WorldWideScience

Sample records for gene cluster sequence

  1. Variations in CCL3L gene cluster sequence and non-specific gene copy numbers

    Directory of Open Access Journals (Sweden)

    Edberg Jeffrey C

    2010-03-01

    Full Text Available. Background: Copy number variations (CNVs) of the gene CC chemokine ligand 3-like 1 (CCL3L1) have been implicated in HIV-1 susceptibility, but the association has been inconsistent. CCL3L1 shares homology with a cluster of genes localized to chromosome 17q12, namely CCL3, CCL3L2, and CCL3L3. These genes are involved in host defense and inflammatory processes. Several CNV assays have been developed for the CCL3L1 gene. Findings: Through pairwise and multiple alignments of these genes, we have shown that the homology between these genes ranges from 50% to 99% in complete gene sequences and from 70% to 100% in the exonic regions, with CCL3L1 and CCL3L3 being identical. Using MEGA 4 and BioEdit, we aligned sense primers, anti-sense primers, and probes used in several previously described assays against pre-multiple alignments of all four chemokine genes. Each set of probes and primers aligned and matched with overlapping sequences in at least two of the four genes, indicating that previously utilized RT-PCR based CNV assays are not specific for only CCL3L1. The four available assays measured median copies of 2 and 3-4 in European and African American individuals, respectively. The concordance between the assays ranged from 0.44 to 0.83, suggesting individual discordant calls and inconsistencies with the assays from the expected gene coverage from the known sequence. Conclusions: This indicates that some of the inconsistencies in the association studies could be due to assays that provide heterogeneous results. Sequence information to determine CNV of the three genes separately would make it possible to test whether their association with the pathogenesis of a human disease or phenotype is affected by an individual gene or by a combination of these genes.
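The specificity check described in this abstract (does a primer match more than one gene in the cluster?) can be illustrated with a small sketch. This is not the authors' MEGA 4 / BioEdit workflow; it is a toy Hamming-distance scan, and the sequences, gene names, and the `max_mismatches` parameter are made-up placeholders rather than real CCL3/CCL3L data.

```python
# Toy sketch: which genes could a primer anneal to?
# All sequences below are invented placeholders, NOT real CCL3/CCL3L sequences.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def best_mismatches(primer, target):
    """Smallest Hamming distance between primer and any same-length window of target."""
    n = len(primer)
    if n > len(target):
        return n
    return min(
        sum(a != b for a, b in zip(primer, target[i:i + n]))
        for i in range(len(target) - n + 1)
    )


def genes_matched(primer, genes, max_mismatches=0):
    """Names of genes the primer matches on either strand within max_mismatches."""
    hits = []
    for name, seq in genes.items():
        d = min(
            best_mismatches(primer, seq),
            best_mismatches(reverse_complement(primer), seq),
        )
        if d <= max_mismatches:
            hits.append(name)
    return hits


toy_genes = {
    "geneA": "ATGCAGGTCTCCACTGCTGCCCTT",
    "geneB": "ATGCAGGTCTCCACTGCTGCACTT",  # near-identical paralog of geneA
    "geneC": "TTTTGGGGCCCCAAAATTTTGGGG",
}
primer = "GGTCTCCACTGC"

hits = genes_matched(primer, toy_genes)
# A primer is only gene-specific if it matches exactly one gene.
is_specific = len(hits) == 1
```

Here the primer matches both `geneA` and `geneB`, so `is_specific` is false, which mirrors the kind of cross-matching the abstract reports for the published assays.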

  2. Sequencing, characterization, and gene expression analysis of the histidine decarboxylase gene cluster of Morganella morganii.

    Science.gov (United States)

    Ferrario, Chiara; Borgo, Francesca; de Las Rivas, Blanca; Muñoz, Rosario; Ricci, Giovanni; Fortina, Maria Grazia

    2014-03-01

    The histidine decarboxylase gene cluster of Morganella morganii DSM30146(T) was sequenced, and four open reading frames, named hdcT1, hdc, hdcT2, and hisRS, were identified. Two putative histidine/histamine antiporters (hdcT1 and hdcT2) were located upstream and downstream of the hdc gene, which encodes a pyridoxal-P dependent histidine decarboxylase, and were followed by the hisRS gene encoding a histidyl-tRNA synthetase. This organization was comparable with the gene cluster of other known Gram-negative bacteria, particularly with that of Klebsiella oxytoca. Recombinant Escherichia coli strains harboring plasmids carrying the M. morganii hdc gene were shown to overproduce histidine decarboxylase after IPTG induction at 37 °C for 4 h. Quantitative RT-PCR experiments revealed that the hdc and hisRS genes were highly induced under acidic and histidine-rich conditions. This work represents the first description and identification of the hdc-related genes in M. morganii. Results support the hypothesis that the histidine decarboxylation reaction in this prolific histamine-producing species may play a role in acid survival. The knowledge of the role and the regulation of genes involved in histidine decarboxylation should improve the design of rational strategies to avoid toxic histamine production in foods.

  3. Detecting Sequence Homology at the Gene Cluster Level with MultiGeneBlast

    NARCIS (Netherlands)

    Medema, Marnix H.; Takano, Eriko; Breitling, Rainer; Nowick, Katja

    2013-01-01

    The genes encoding many biomolecular systems and pathways are genomically organized in operons or gene clusters. With MultiGeneBlast, we provide a user-friendly and effective tool to perform homology searches with operons or gene clusters as basic units, instead of single genes. The contextualizatio

  5. A hybrid distance measure for clustering expressed sequence tags originating from the same gene family.

    Directory of Open Access Journals (Sweden)

    Keng-Hoong Ng

    Full Text Available. BACKGROUND: Clustering is a key step in the processing of Expressed Sequence Tags (ESTs). The primary goal of clustering is to put ESTs from the same transcript of a single gene into a unique cluster. Recent EST clustering algorithms mostly adopt alignment-free distance measures, which tend to yield acceptable clustering accuracies with reasonable computational time. Although these clustering methods work satisfactorily on a majority of EST datasets, they have a common weakness: they are prone to deliver unsatisfactory clustering results when dealing with ESTs from genes derived from the same family. The root cause is that the distance measures applied to them are not sensitive enough to separate these closely related genes. METHODOLOGY/PRINCIPAL FINDINGS: We propose a hybrid distance measure that combines the global and local features extracted from ESTs, with the aim of addressing the clustering problem faced by ESTs derived from the same gene family. The clustering process is implemented using the DBSCAN algorithm. We test the hybrid distance measure on ten EST datasets, and the clustering results are compared with two alignment-free EST clustering tools, i.e. wcd and PEACE. The clustering results indicate that the proposed hybrid distance measure performs relatively better (in terms of clustering accuracy) than both EST clustering tools. CONCLUSIONS/SIGNIFICANCE: The clustering results provide support for the effectiveness of the proposed hybrid distance measure in solving the clustering problem for ESTs that originate from the same gene family. The improvement of clustering accuracies on the experimental datasets has supported the claim that the sensitivity of the hybrid distance measure is sufficient to solve the clustering problem.
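As a rough illustration of the idea in this abstract, the sketch below combines one whole-sequence ("global") distance with one best-shared-region ("local") distance and feeds the result to a plain DBSCAN. The abstract does not specify the features or how they are weighted, so the k-mer cosine distance, the longest-common-substring distance, the `weight` parameter, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq, k=3):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def global_distance(a, b, k=3):
    """1 - cosine similarity of k-mer count vectors (alignment-free, whole sequence)."""
    pa, pb = kmer_profile(a, k), kmer_profile(b, k)
    dot = sum(pa[m] * pb[m] for m in pa)
    norm = sqrt(sum(v * v for v in pa.values())) * sqrt(sum(v * v for v in pb.values()))
    return 1.0 - dot / norm if norm else 1.0


def local_distance(a, b):
    """1 - (longest common substring length / shorter sequence length)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, 1):
            if ch == cb:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return 1.0 - best / min(len(a), len(b))


def hybrid_distance(a, b, weight=0.5):
    """Weighted mix of the global and local distances (weight is an assumption)."""
    return weight * global_distance(a, b) + (1 - weight) * local_distance(a, b)


def dbscan(items, dist, eps, min_pts):
    """Plain DBSCAN over an arbitrary distance; returns one label per item, -1 = noise."""
    n = len(items)
    labels = [None] * n

    def neighbors(i):
        return [j for j in range(n) if dist(items[i], items[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbors(j)
            if len(nb_j) >= min_pts:  # j is itself a core point, so keep expanding
                queue.extend(nb_j)
    return labels


# Invented toy "ESTs": two duplicated sequences plus one unrelated read.
toy_ests = [
    "AAAAAAAACCCC",
    "AAAAAAAACCCC",
    "GGGGGGGGTTTT",
    "GGGGGGGGTTTT",
    "ACGTTGCATGCA",
]
labels = dbscan(toy_ests, hybrid_distance, eps=0.3, min_pts=2)
```

On this toy input the two duplicated sequences form two clusters and the unrelated read is labelled as noise. Real ESTs from one gene family would sit much closer together, which is where the choice of features and `eps` starts to matter.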

  6. Isolation of Hox cluster genes from insects reveals an accelerated sequence evolution rate.

    Directory of Open Access Journals (Sweden)

    Heike Hadrys

    Full Text Available. Among gene families it is the Hox genes and among metazoan animals it is the insects (Hexapoda) that have attracted particular attention for studying the evolution of development. Surprisingly though, no Hox genes have been isolated from 26 out of 35 insect orders yet, and the existing sequences derive mainly from only two orders (61% from Hymenoptera and 22% from Diptera). We have designed insect-specific primers and isolated 37 new partial homeobox sequences of Hox cluster genes (lab, pb, Hox3, ftz, Antp, Scr, abd-a, Abd-B, Dfd, and Ubx) from six insect orders, which are crucial to insect phylogenetics. These new gene sequences provide a first step towards comparative Hox gene studies in insects. Furthermore, comparative distance analyses of homeobox sequences reveal a correlation between gene divergence rate and species radiation success, with insects showing the highest rate of homeobox sequence evolution.

  7. Sequencing and comparative analysis of fugu protocadherin clusters reveal diversity of protocadherin genes among teleosts

    Directory of Open Access Journals (Sweden)

    Rajasegaran Vikneswari

    2007-03-01

    Full Text Available. Background: The synaptic cell adhesion molecules, protocadherins, are a vertebrate innovation that accompanied the emergence of the neural tube and the elaborate central nervous system. In mammals, the protocadherins are encoded by three closely-linked clusters (α, β and γ) of tandem genes and are hypothesized to provide a molecular code for specifying the remarkably-diverse neural connections in the central nervous system. Like mammals, the coelacanth, a lobe-finned fish, contains a single protocadherin locus, also arranged into α, β and γ clusters. Zebrafish, however, possesses two protocadherin loci that contain more than twice the number of genes as the coelacanth, but arranged only into α and γ clusters. To gain further insight into the evolutionary history of protocadherin clusters, we have sequenced and analyzed protocadherin clusters from the compact genome of the pufferfish, Fugu rubripes. Results: Fugu contains two unlinked protocadherin loci, Pcdh1 and Pcdh2, that collectively consist of at least 77 genes. The fugu Pcdh1 locus has been subject to extensive degeneration, resulting in the complete loss of Pcdh1γ cluster. The fugu Pcdh genes have undergone lineage-specific regional gene conversion processes that have resulted in a remarkable regional sequence homogenization among paralogs in the same subcluster. Phylogenetic analyses show that most protocadherin genes are orthologous between fugu and zebrafish either individually or as paralog groups. Based on the inferred phylogenetic relationships of fugu and zebrafish genes, we have reconstructed the evolutionary history of protocadherin clusters in the teleost fish lineage. Conclusion: Our results demonstrate the exceptional evolutionary dynamism of protocadherin genes in vertebrates in general, and in teleost fishes in particular. Besides the 'fish-specific' whole genome duplication, the evolution of protocadherin genes in teleost fishes is influenced by lineage

  8. Gene identification and protein classification in microbial metagenomic sequence data via incremental clustering

    Directory of Open Access Journals (Sweden)

    Li Weizhong

    2008-04-01

    Full Text Available. Background: The identification and study of proteins from metagenomic datasets can shed light on the roles and interactions of the source organisms in their communities. However, metagenomic datasets are characterized by the presence of organisms with varying GC composition, codon usage biases etc., and consequently gene identification is challenging. The vast amount of sequence data also requires faster protein family classification tools. Results: We present a computational improvement to a sequence clustering approach that we developed previously to identify and classify protein coding genes in large microbial metagenomic datasets. The clustering approach can be used to identify protein coding genes in prokaryotes, viruses, and intron-less eukaryotes. The computational improvement is based on an incremental clustering method that does not require the expensive all-against-all computation that was required by the original approach, while still preserving the remote homology detection capabilities. We present evaluations of the clustering approach in protein-coding gene identification and classification, and also present the results of updating the protein clusters from our previous work with recent genomic and metagenomic sequences. The clustering results are available via CAMERA (http://camera.calit2.net). Conclusion: The clustering paradigm is shown to be a very useful tool in the analysis of microbial metagenomic data. The incremental clustering method is shown to be much faster than the original approach in identifying genes, grouping sequences into existing protein families, and also identifying novel families that have multiple members in a metagenomic dataset. These clusters provide a basis for further studies of protein families.
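The general shape of incremental clustering, where each new sequence is compared only against existing cluster representatives rather than against every other sequence, can be sketched as follows. The abstract does not describe the scoring or the data structures used, so the k-mer Jaccard `similarity`, the `threshold` value, the `IncrementalClusters` class, and the toy peptide strings are all hypothetical stand-ins.

```python
def kmers(seq, k=3):
    """Set of overlapping k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of k-mer sets (a stand-in for a real homology score)."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


class IncrementalClusters:
    """Greedy incremental clustering: compare new sequences to representatives only."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.representatives = []  # one representative sequence per cluster
        self.members = []          # members[i] lists the sequences in cluster i

    def add(self, seq):
        """Assign seq to the first cluster whose representative is similar enough,
        or open a new cluster with seq as its representative."""
        for idx, rep in enumerate(self.representatives):
            if similarity(seq, rep) >= self.threshold:
                self.members[idx].append(seq)
                return idx
        self.representatives.append(seq)
        self.members.append([seq])
        return len(self.representatives) - 1


clusters = IncrementalClusters(threshold=0.5)

# First batch builds the initial clusters (toy peptide-like strings).
first_batch = ["MKTAYIAKQRQISFVK", "MKTAYIAKQRQISFVR", "GSHMLEDPVDAFKELG"]
first_ids = [clusters.add(s) for s in first_batch]

# A later batch is folded in without recomputing anything for the first batch.
later_batch = ["MKTAYIAKQRQISFVKS", "WWWWPPPPWWWWPPPP"]
later_ids = [clusters.add(s) for s in later_batch]
```

Each `add` call costs one comparison per existing cluster, so updating with new data scales with the number of clusters rather than with the square of the number of sequences. A sequence that matches no representative simply founds a new cluster, which is how novel families would surface in this scheme.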

  9. Defining reference sequences for Nocardia species by similarity and clustering analyses of 16S rRNA gene sequence data.

    Directory of Open Access Journals (Sweden)

    Manal Helal

    Full Text Available. BACKGROUND: The intra- and inter-species genetic diversity of bacteria and the absence of 'reference', or the most representative, sequences of individual species present a significant challenge for sequence-based identification. The aims of this study were to determine the utility, and compare the performance, of several clustering and classification algorithms to identify the species of 364 sequences of 16S rRNA gene with a defined species in GenBank, and 110 sequences of 16S rRNA gene with no defined species, all within the genus Nocardia. METHODS: A total of 364 16S rRNA gene sequences of Nocardia species were studied. In addition, 110 16S rRNA gene sequences assigned only to the Nocardia genus level at the time of submission to GenBank were used for machine learning classification experiments. Different clustering algorithms were compared with a novel algorithm, the linear mapping (LM) of the distance matrix. Principal Components Analysis was used for the dimensionality reduction and visualization. RESULTS: The LM algorithm achieved the highest performance and classified the set of 364 16S rRNA sequences into 80 clusters, the majority of which (83.52%) corresponded with the original species. The most representative 16S rRNA sequences for individual Nocardia species have been identified as 'centroids' in respective clusters from which the distances to all other sequences were minimized; 110 16S rRNA gene sequences with identifications recorded only at the genus level were classified using machine learning methods. Simple kNN machine learning demonstrated the highest performance and classified Nocardia species sequences with an accuracy of 92.7% and a mean frequency of 0.578. CONCLUSION: The identification of centroids of 16S rRNA gene sequence clusters using novel distance matrix clustering enables the identification of the most representative sequences for each individual species of Nocardia and allows the quantitation of inter- and intra
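Two of the steps named in this abstract lend themselves to a short sketch: picking the cluster member with the smallest total distance to the others (the 'centroid', implemented here as a medoid) and a simple kNN vote for an unlabelled sequence. This does not reproduce the LM algorithm; the distance matrix, species labels, and `k` below are invented toy values.

```python
from collections import Counter


def medoid(indices, dist):
    """Member of `indices` whose summed distance to the other members is smallest."""
    return min(indices, key=lambda i: sum(dist[i][j] for j in indices))


def knn_predict(query_dists, labels, k=3):
    """Majority label among the k reference items closest to the query.

    query_dists[i] is the distance from the query to reference item i.
    Ties are resolved in favour of the label seen first among the nearest items.
    """
    nearest = sorted(range(len(labels)), key=lambda i: query_dists[i])[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]


# Invented pairwise distances between five reference sequences.
dist = [
    [0.0, 0.1, 0.2, 0.9, 0.8],
    [0.1, 0.0, 0.1, 0.8, 0.9],
    [0.2, 0.1, 0.0, 0.9, 0.9],
    [0.9, 0.8, 0.9, 0.0, 0.1],
    [0.8, 0.9, 0.9, 0.1, 0.0],
]
clusters = {"species_A": [0, 1, 2], "species_B": [3, 4]}

# Most representative sequence (by index) for each cluster.
centroids = {name: medoid(members, dist) for name, members in clusters.items()}

# Classify a genus-level-only sequence from its distances to the references.
reference_labels = ["species_A", "species_A", "species_A", "species_B", "species_B"]
query_dists = [0.85, 0.9, 0.8, 0.15, 0.2]
predicted = knn_predict(query_dists, reference_labels, k=3)
```

In practice the distances would come from pairwise alignment of the 16S rRNA gene sequences, and `k` would be tuned on sequences with known species.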

  10. Sequencing and mapping hemoglobin gene clusters in the Australian model dasyurid marsupial Sminthopsis macroura

    Energy Technology Data Exchange (ETDEWEB)

    De Leo, A.A.; Wheeler, D.; Lefevre, C.; Cheng, Jan-Fang; Hope, R.; Kuliwaba, J.; Nicholas, K.R.; Westermanc, M.; Graves, J.A.M.

    2004-07-26

    Comparing globin genes and their flanking sequences across many species has allowed globin gene evolution to be reconstructed in great detail. Marsupial globin sequences have proved to be of exceptional significance. A previous finding of a beta-like omega gene in the alpha cluster in the tammar wallaby suggested that the alpha and beta cluster evolved via genome duplication and loss rather than tandem duplication. To confirm and extend this important finding we isolated and sequenced BACs containing the alpha and beta loci from the distantly related Australian marsupial Sminthopsis macroura. We report that the alpha gene lies in the same BAC as the beta-like omega gene, implying that the alpha-omega juxtaposition is likely to be conserved in all marsupials. The LUC7L gene was found 3' of the S. macroura alpha locus, a gene order shared with humans but not mouse, chicken or fugu. Sequencing a BAC contig that contained the S. macroura beta globin and epsilon globin loci showed that the globin cluster is flanked by olfactory genes, demonstrating a gene arrangement conserved for over 180 MY. Analysis of the region 5' to the S. macroura epsilon globin gene revealed a region similar to the eutherian LCR, containing sequences and potential transcription factor binding sites with homology to eutherian hypersensitive sites 1 to 5. FISH mapping of BACs containing S. macroura alpha and beta globin genes located the beta globin cluster on chromosome 3q and the alpha locus close to the centromere on 1q, resolving contradictory map locations obtained by previous radioactive in situ hybridization.

  11. Sequencing rare marine actinomycete genomes reveals high density of unique natural product biosynthetic gene clusters.

    Science.gov (United States)

    Schorn, Michelle A; Alanjary, Mohammad M; Aguinaldo, Kristen; Korobeynikov, Anton; Podell, Sheila; Patin, Nastassia; Lincecum, Tommie; Jensen, Paul R; Ziemert, Nadine; Moore, Bradley S

    2016-12-01

    Traditional natural product discovery methods have nearly exhausted the accessible diversity of microbial chemicals, making new sources and techniques paramount in the search for new molecules. Marine actinomycete bacteria have recently come into the spotlight as fruitful producers of structurally diverse secondary metabolites, and remain relatively untapped. In this study, we sequenced 21 marine-derived actinomycete strains, rarely studied for their secondary metabolite potential and under-represented in current genomic databases. We found that genome size and phylogeny were good predictors of biosynthetic gene cluster diversity, with larger genomes rivalling the well-known marine producers in the Streptomyces and Salinispora genera. Genomes in the Micrococcineae suborder, however, had consistently the lowest number of biosynthetic gene clusters. By networking individual gene clusters into gene cluster families, we were able to computationally estimate the degree of novelty each genus contributed to the current sequence databases. Based on the similarity measures between all actinobacteria in the Joint Genome Institute's Atlas of Biosynthetic gene Clusters database, rare marine genera show a high degree of novelty and diversity, with Corynebacterium, Gordonia, Nocardiopsis, Saccharomonospora and Pseudonocardia genera representing the highest gene cluster diversity. This research validates that rare marine actinomycetes are important candidates for exploration, as they are relatively unstudied, and their relatives are historically rich in secondary metabolites.

  12. Nucleotide sequence analysis of hypervariable junctions of Haemophilus influenzae pilus gene clusters.

    Science.gov (United States)

    Read, T D; Satola, S W; Farley, M M

    2000-12-01

    Haemophilus influenzae pili are surface structures that promote attachment to human epithelial cells. The five genes that encode pili, hifABCDE, are found inserted in genomes either between pmbA and hpt (hif-1) or between purE and pepN (hif-2). We determined the sequence between the ends of the pilus clusters and bordering genes in a number of H. influenzae strains. The junctions of the hif-1 cluster (limited to biogroup aegyptius isolates) are structurally simple. In contrast, hif-2 junctions are highly diverse, complex assemblies of conserved intergenic sequences (including genes hicA and hicB) with evidence of frequent recombination. Variation at hif-2 junctions seems to be tied to multiple copies of a 23-bp Haemophilus intergenic dyad sequence. The hif-1 cluster appears to have originated in biogroup aegyptius strains from invasion of the hpt-pmbA region by a DNA template containing the hif-2 genes with termini in the hairpin loop of flanking intergenic dyad sequences. The pilus gene clusters are an interesting model of a mobile "pathogenicity island" not associated with a phage, transposon, or insertion element.

  13. Sequence breakpoints in the aflatoxin biosynthesis gene cluster and flanking regions in nonaflatoxigenic Aspergillus flavus isolates.

    Science.gov (United States)

    Chang, Perng-Kuang; Horn, Bruce W; Dorner, Joe W

    2005-11-01

    Aspergillus flavus populations are genetically diverse. Isolates that produce either, neither, or both aflatoxins and cyclopiazonic acid (CPA) are present in the field. We investigated defects in the aflatoxin gene cluster in 38 nonaflatoxigenic A. flavus isolates collected from the southern United States. PCR assays using aflatoxin-gene-specific primers grouped these isolates into eight (A-H) deletion patterns. Patterns C, E, G, and H, which contain 40 kb deletions, were examined for their sequence breakpoints. Pattern C has one breakpoint in the cypA 3' untranslated region (UTR) and another in the verA coding region. Pattern E has a breakpoint in the amdA coding region and another in the ver1 5' UTR. Pattern G contains a deletion identical to the one found in pattern C and has another deletion that extends from the cypA coding region to one end of the chromosome, as suggested by the presence of telomeric sequence repeats, CCCTAATGTTGA. Pattern H has a deletion of the entire aflatoxin gene cluster from the hexA coding region in the sugar utilization gene cluster to the telomeric region. Thus, deletions in the aflatoxin gene cluster among A. flavus isolates are not rare, and the patterns appear to be diverse. Genetic drift may be a driving force that is responsible for the loss of the entire aflatoxin gene cluster in nonaflatoxigenic A. flavus isolates when aflatoxins have lost their adaptive value in nature.

  14. Sequencing, physical organization and kinetic expression of the patulin biosynthetic gene cluster from Penicillium expansum.

    Science.gov (United States)

    Tannous, Joanna; El Khoury, Rhoda; Snini, Selma P; Lippi, Yannick; El Khoury, André; Atoui, Ali; Lteif, Roger; Oswald, Isabelle P; Puel, Olivier

    2014-10-17

    Patulin is a polyketide-derived mycotoxin produced by numerous filamentous fungi. Among them, Penicillium expansum is by far the most problematic species. This fungus is a destructive phytopathogen capable of growing on fruit, provoking the blue mold decay of apples and producing significant amounts of patulin. The biosynthetic pathway of this mycotoxin is chemically well-characterized, but its genetic bases remain largely unknown, with only a few characterized genes in less economically relevant species. The present study consisted of the identification and positional organization of the patulin gene cluster in P. expansum strain NRRL 35695. Several amplification reactions were performed with degenerate primers that were designed based on sequences from the orthologous genes available in other species. An improved genome walking approach was used in order to sequence the remaining adjacent genes of the cluster. RACE-PCR was also carried out from mRNAs to determine the start and stop codons of the coding sequences. The patulin gene cluster in P. expansum consists of 15 genes in the following order: patH, patG, patF, patE, patD, patC, patB, patA, patM, patN, patO, patL, patI, patJ, and patK. These genes share 60-70% identity with orthologous genes grouped differently, within a putative patulin cluster described in a non-producing strain of Aspergillus clavatus. The kinetics of patulin cluster gene expression was studied under patulin-permissive conditions (natural apple-based medium) and patulin-restrictive conditions (Eagle's minimal essential medium), and demonstrated a significant association between gene expression and patulin production. In conclusion, the sequence of the patulin cluster in P. expansum constitutes a key step for a better understanding of the mechanisms leading to patulin production in this fungus. It will allow the role of each gene to be elucidated, and help to define strategies to reduce patulin production in apple-based products.

  15. Complete Genome Sequence of the Filamentous Fungus Aspergillus westerdijkiae Reveals the Putative Biosynthetic Gene Cluster of Ochratoxin A

    Science.gov (United States)

    Chakrabortti, Alolika; Li, Jinming

    2016-01-01

    Ochratoxin A (OTA) is a common mycotoxin that contaminates food and agricultural products. Sequencing of the complete genome of Aspergillus westerdijkiae, a major producer of OTA, reveals more than 50 biosynthetic gene clusters, including a putative OTA biosynthetic gene cluster that encodes a dozen enzymes, transporters, and regulatory proteins. PMID:27635003

  16. Versatile Cosmid Vectors for the Isolation, Expression, and Rescue of Gene Sequences: Studies with the Human α-globin Gene Cluster

    Science.gov (United States)

    Lau, Yun-Fai; Kan, Yuet Wai

    1983-09-01

    We have developed a series of cosmids that can be used as vectors for genomic recombinant DNA library preparations, as expression vectors in mammalian cells for both transient and stable transformations, and as shuttle vectors between bacteria and mammalian cells. These cosmids were constructed by inserting one of the SV2-derived selectable gene markers (SV2-gpt, SV2-DHFR, and SV2-neo) in cosmid pJB8. High efficiency of genomic cloning was obtained with these cosmids and the size of the inserts was 30-42 kilobases. We isolated recombinant cosmids containing the human α-globin gene cluster from these genomic libraries. The simian virus 40 DNA in these selectable gene markers provides the origin of replication and enhancer sequences necessary for replication in permissive cells such as COS 7 cells and thereby allows transient expression of α-globin genes in these cells. These cosmids and their recombinants could also be stably transformed into mammalian cells by using the respective selection systems. Both of the adult α-globin genes were more actively expressed than the embryonic zeta-globin genes in these transformed cell lines. Because of the presence of the cohesive ends of the Charon 4A phage in the cosmids, the transforming DNA sequences could readily be rescued from these stably transformed cells into bacteria by in vitro packaging of total cellular DNA. Thus, these cosmid vectors are potentially useful for direct isolation of structural genes.

  17. Gene Sequence Based Clustering Assists in Dereplication of Pseudoalteromonas luteoviolacea Strains with Identical Inhibitory Activity and Antibiotic Production

    DEFF Research Database (Denmark)

    Vynne, Nikolaj Grønnegaard; Månsson, Maria; Gram, Lone

    2012-01-01

    Some microbial species are chemically homogeneous, and the same secondary metabolites are found in all strains. In contrast, we previously found that five strains of P. luteoviolacea were closely related by 16S rRNA gene sequence but produced two different antibiotic profiles. The purpose of the p... antibacterial profiles based on inhibition assays against Vibrio anguillarum and Staphylococcus aureus. To determine whether chemotype and inhibition profile are reflected by phylogenetic clustering we sequenced 16S rRNA, gyrB and recA genes. Clustering based on 16S rRNA gene sequences alone showed little correlation to chemotypes and inhibition profiles, while clustering based on concatenated 16S rRNA, gyrB, and recA gene sequences resulted in three clusters, two of which uniformly consisted of strains of identical chemotype and inhibition profile. A major time sink in natural products discovery is the effort spent rediscovering known compounds, and this study indicates that phylogeny clustering of bioactive species has the potential to be a useful dereplication tool in biodiscovery efforts.

  18. Leveraging long sequencing reads to investigate R-gene clustering and variation in sugar beet

    Science.gov (United States)

    Host-pathogen interactions are of prime importance to modern agriculture. Plants utilize various types of resistance genes to mitigate pathogen damage. Identification of the specific gene responsible for a specific resistance can be difficult due to duplication and clustering within R-gene families....

  19. Distribution of Suicin Gene Clusters in Streptococcus suis Serotype 2 Belonging to Sequence Types 25 and 28

    Directory of Open Access Journals (Sweden)

    Taryn B. T. Athey

    2016-01-01

    Full Text Available. Recently, we reported the purification and characterization of three distinct lantibiotics (named suicin 90-1330, suicin 3908, and suicin 65) produced by Streptococcus suis. In this study, we investigated the distribution of the three suicin lantibiotic gene clusters among serotype 2 S. suis strains belonging to sequence type (ST) 25 and ST28, the two dominant STs identified in North America. The genomes of 102 strains were interrogated for the presence of suicin gene clusters encoding suicins 90-1330, 3908, and 65. The gene cluster encoding suicin 65 was the most prevalent and mainly found among ST25 strains. In contrast, none of the genes related to suicin 90-1330 production were identified in 51 ST25 strains nor in 35/51 ST28 strains. However, the complete suicin 90-1330 gene cluster was found in ten ST28 strains, although some genes in the cluster were truncated in three of these isolates. The vast majority (101/102) of S. suis strains did not possess any of the genes encoding suicin 3908. In conclusion, this study indicates heterogeneous distribution of suicin genes in S. suis.

  20. ThioFinder: a web-based tool for the identification of thiopeptide gene clusters in DNA sequences.

    Directory of Open Access Journals (Sweden)

    Jing Li

    Full Text Available. Thiopeptides are a growing class of sulfur-rich, highly modified heterocyclic peptides that are mainly active against Gram-positive bacteria including various drug-resistant pathogens. Recent studies also reveal that many thiopeptides inhibit the proliferation of human cancer cells, further expanding their potential for clinical use. Thiopeptide biosynthesis shares a common paradigm, featuring a ribosomally synthesized precursor peptide and conserved posttranslational modifications, to afford a characteristic core system, but differs in tailoring to furnish individual members. Identification of new thiopeptide gene clusters, by taking advantage of the increasing amount of DNA sequence information from bacteria, may facilitate new thiopeptide discovery and enrichment of the unique biosynthetic elements to produce novel drug leads by applying the principle of combinatorial biosynthesis. In this study, we have developed a web-based tool, ThioFinder, to rapidly identify thiopeptide biosynthetic gene clusters from DNA sequence using a profile Hidden Markov Model approach. Fifty-four new putative thiopeptide biosynthetic gene clusters were found in the sequenced bacterial genomes of previously unknown producing microorganisms. ThioFinder is fully supported by an open-access database, ThioBase, which contains sufficient information on the 99 known thiopeptides regarding the chemical structure, biological activity, producing organism, and biosynthetic gene (cluster), along with the associated genome if available. The ThioFinder website offers researchers a unique resource and great flexibility for sequence analysis of thiopeptide biosynthetic gene clusters. ThioFinder is freely available at http://db-mml.sjtu.edu.cn/ThioFinder/.

  1. Expanding our understanding of sequence-function relationships of type II polyketide biosynthetic gene clusters: bioinformatics-guided identification of Frankiamicin A from Frankia sp. EAN1pec.

    Directory of Open Access Journals (Sweden)

    Yasushi Ogasawara

    Full Text Available A large and rapidly increasing number of unstudied "orphan" natural product biosynthetic gene clusters are being uncovered in sequenced microbial genomes. An important goal of modern natural products research is to be able to accurately predict natural product structures and biosynthetic pathways from these gene cluster sequences. This requires both development of bioinformatic methods for global analysis of these gene clusters and experimental characterization of select products produced by gene clusters with divergent sequence characteristics. Here, we conduct global bioinformatic analysis of all available type II polyketide gene cluster sequences and identify a conserved set of gene clusters with unique ketosynthase α/β sequence characteristics in the genomes of Frankia species, a group of Actinobacteria with underexploited natural product biosynthetic potential. Through LC-MS profiling of extracts from several Frankia species grown under various conditions, we identified Frankia sp. EAN1pec as producing a compound with spectral characteristics consistent with the type II polyketide produced by this gene cluster. We isolated the compound, a pentangular polyketide which we named frankiamicin A, and elucidated its structure by NMR and labeled precursor feeding. We also propose biosynthetic and regulatory pathways for frankiamicin A based on comparative genomic analysis and literature precedent, and conduct bioactivity assays of the compound. Our findings provide new information linking this set of Frankia gene clusters with the compound they produce, and our approach has implications for accurate functional prediction of the many other type II polyketide clusters present in bacterial genomes.

  2. Sequencing and transcriptional analysis of the biosynthesis gene cluster of putrescine-producing Lactococcus lactis.

    Science.gov (United States)

    Ladero, Victor; Rattray, Fergal P; Mayo, Baltasar; Martín, María Cruz; Fernández, María; Alvarez, Miguel A

    2011-09-01

    Lactococcus lactis is a prokaryotic microorganism with great importance as a culture starter and has become the model species among the lactic acid bacteria. The long and safe history of use of L. lactis in dairy fermentations has resulted in the classification of this species as GRAS (Generally Regarded As Safe) or QPS (Qualified Presumption of Safety). However, our group has identified several strains of L. lactis subsp. lactis and L. lactis subsp. cremoris that are able to produce putrescine from agmatine via the agmatine deiminase (AGDI) pathway. Putrescine is a biogenic amine that confers undesirable flavor characteristics and may even have toxic effects. The AGDI cluster of L. lactis is composed of a putative regulatory gene, aguR, followed by the genes (aguB, aguD, aguA, and aguC) encoding the catabolic enzymes. These genes are transcribed as an operon that is induced in the presence of agmatine. In some strains, an insertion (IS) element interrupts the transcription of the cluster, which results in a non-putrescine-producing phenotype. Based on this knowledge, a PCR-based test was developed in order to differentiate nonproducing L. lactis strains from those with a functional AGDI cluster. The analysis of the AGDI cluster and its flanking regions revealed that the capacity to produce putrescine via the AGDI pathway could be a specific characteristic that was lost during the adaptation to the milk environment by a process of reductive genome evolution.

  3. Motif-independent prediction of a secondary metabolism gene cluster using comparative genomics: application to sequenced genomes of Aspergillus and ten other filamentous fungal species.

    Science.gov (United States)

    Takeda, Itaru; Umemura, Myco; Koike, Hideaki; Asai, Kiyoshi; Machida, Masayuki

    2014-08-01

    Despite their biological importance, a significant number of genes for secondary metabolite biosynthesis (SMB) remain undetected due largely to the fact that they are highly diverse and are not expressed under a variety of cultivation conditions. Several software tools including SMURF and antiSMASH have been developed to predict fungal SMB gene clusters by finding core genes encoding polyketide synthase, nonribosomal peptide synthetase and dimethylallyltryptophan synthase as well as several others typically present in the cluster. In this work, we have devised a novel comparative genomics method to identify SMB gene clusters that is independent of motif information of the known SMB genes. The method detects SMB gene clusters by searching for a similar order of genes and their presence in nonsyntenic blocks. With this method, we were able to identify many known SMB gene clusters with the core genes in the genomic sequences of 10 filamentous fungi. Furthermore, we have also detected SMB gene clusters without core genes, including the kojic acid biosynthesis gene cluster of Aspergillus oryzae. By varying the detection parameters of the method, a significant difference in the sequence characteristics was detected between the genes residing inside the clusters and those outside the clusters. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  4. Cloning and sequence analysis of Alcaligenes faecalis nifHDK gene cluster

    Institute of Scientific and Technical Information of China (English)

    张海予; 林敏; 萧凤回; 朱新生; 方宣钧; 尤崇杓; 朱玉贤

    1997-01-01

    Total DNA of Alcaligenes faecalis was probed with both the nifH and nifHD sequences from K. pneumoniae. One positive band of about 4.6 kb was discovered. This nifH homologous fragment was cloned into the vector pBluescript SK to construct the recombinant plasmid pBZl. The inserted fragment in pBZl was analyzed by physical mapping and was further subcloned for sequencing. It was found that this A. faecalis nifHDK homolog possessed a typical σ54-dependent promoter region with an upstream activator sequence (UAS) and an A-T rich region. The nifH and nifD ORFs were 888 and 1,476 bp long, respectively. The GC contents of these two genes were about 61.6% and 60.0%. The intergenic regions of nifH-nifD and nifD-nifK were 101 and 105 bp, respectively. There were separate SD sequences upstream of all three genes. The deduced amino acid sequences of the nifH gene product (the Fe-protein) and the nifD gene product (the Mo-Fe-protein) were also highly homologous to other nitrogen-fixing bacteria, especially in th

  5. MIDDAS-M: motif-independent de novo detection of secondary metabolite gene clusters through the integration of genome sequencing and transcriptome data.

    Science.gov (United States)

    Umemura, Myco; Koike, Hideaki; Nagano, Nozomi; Ishii, Tomoko; Kawano, Jin; Yamane, Noriko; Kozone, Ikuko; Horimoto, Katsuhisa; Shin-ya, Kazuo; Asai, Kiyoshi; Yu, Jiujiang; Bennett, Joan W; Machida, Masayuki

    2013-01-01

    Many bioactive natural products are produced as "secondary metabolites" by plants, bacteria, and fungi. During the middle of the 20th century, several secondary metabolites from fungi revolutionized the pharmaceutical industry, for example, penicillin, lovastatin, and cyclosporine. They are generally biosynthesized by enzymes encoded by clusters of coordinately regulated genes, and several motif-based methods have been developed to detect secondary metabolite biosynthetic (SMB) gene clusters using the sequence information of typical SMB core genes such as polyketide synthases (PKS) and non-ribosomal peptide synthetases (NRPS). However, no detection method exists for SMB gene clusters that are functional and do not include core SMB genes at present. To advance the exploration of SMB gene clusters, especially those without known core genes, we developed MIDDAS-M, a motif-independent de novo detection algorithm for SMB gene clusters. We integrated virtual gene cluster generation in an annotated genome sequence with highly sensitive scoring of the cooperative transcriptional regulation of cluster member genes. MIDDAS-M accurately predicted 38 SMB gene clusters that have been experimentally confirmed and/or predicted by other motif-based methods in 3 fungal strains. MIDDAS-M further identified a new SMB gene cluster for ustiloxin B, which was experimentally validated. Sequence analysis of the cluster genes indicated a novel mechanism for peptide biosynthesis independent of NRPS. Because it is fully computational and independent of empirical knowledge about SMB core genes, MIDDAS-M allows a large-scale, comprehensive analysis of SMB gene clusters, including those with novel biosynthetic mechanisms that do not contain any functionally characterized genes.

  6. In silico analysis highlights the frequency and diversity of type 1 lantibiotic gene clusters in genome sequenced bacteria

    LENUS (Irish Health Repository)

    Marsh, Alan J

    2010-11-30

    Background: Lantibiotics are lanthionine-containing, post-translationally modified antimicrobial peptides. These peptides have significant, but largely untapped, potential as preservatives and chemotherapeutic agents. Type 1 lantibiotics are those in which lanthionine residues are introduced into the structural peptide (LanA) through the activity of separate lanthionine dehydratase (LanB) and lanthionine synthetase (LanC) enzymes. Here we take advantage of the conserved nature of LanC enzymes to devise an in silico approach to identify potential lantibiotic-encoding gene clusters in genome sequenced bacteria. Results: In total, 49 novel type 1 lantibiotic clusters were identified which unexpectedly were associated with species, genera and even phyla of bacteria which have not previously been associated with lantibiotic production. Conclusions: Multiple type 1 lantibiotic gene clusters were identified at a frequency that suggests that these antimicrobials are much more widespread than previously thought. These clusters represent a rich repository which can yield a large number of valuable novel antimicrobials and biosynthetic enzymes.

  7. Investigation of pathogenic genes in peri-implantitis from implant clustering failure patients: a whole-exome sequencing pilot study.

    Directory of Open Access Journals (Sweden)

    Soohyung Lee

    Full Text Available Peri-implantitis is a frequently occurring gum disease linked to multi-factorial traits with various environmental and genetic causalities and no known concrete pathogenesis. The varying severity of peri-implantitis among patients with relatively similar environments suggests a genetic aspect which needs to be investigated to understand and regulate the pathogenesis of the disease. Six unrelated individuals with multiple clusterization implant failure due to severe peri-implantitis were chosen for this study. These six individuals had relatively healthy lifestyles, with minimal environmental causalities affecting peri-implantitis. Research was undertaken to investigate pathogenic genes in peri-implantitis albeit with a small number of subjects and incomplete elimination of environmental causalities. Whole-exome sequencing was performed on collected saliva samples via self DNA collection kit. Common variants with minor allele frequencies (MAF) >= 0.05 from all control datasets were eliminated and variants having high and moderate impact and loss of function were used for comparison. Gene set enrichment analysis was performed to reveal functional groups associated with the genetic variants. 2,022 genes were left after filtering against dbSNP, the 1000 Genomes East Asian population, and healthy Korean randomized subsample data (GSK project). 175 (p-value <0.05) out of 927 gene sets were obtained via GSEA (DAVID). The top 10 was chosen (p-value <0.05) from cluster enrichment showing significance of cytoskeleton, cell adhesion, and metal ion binding. Network analysis was applied to find relationships between functional clusters. Among the functional groups, metal ion binding was located in the center of all clusters, indicating dysfunction of regulation in metal ion concentration might affect cell morphology or cell adhesion, resulting in implant failure. This result may demonstrate the feasibility of and provide pilot data for a larger research

  8. Investigation of Pathogenic Genes in Peri-Implantitis from Implant Clustering Failure Patients: A Whole-Exome Sequencing Pilot Study

    Science.gov (United States)

    Lee, Soohyung; Kim, Ji-Young; Hwang, Jihye; Kim, Sanguk; Lee, Jae-Hoon; Han, Dong-Hoo

    2014-01-01

    Peri-implantitis is a frequently occurring gum disease linked to multi-factorial traits with various environmental and genetic causalities and no known concrete pathogenesis. The varying severity of peri-implantitis among patients with relatively similar environments suggests a genetic aspect which needs to be investigated to understand and regulate the pathogenesis of the disease. Six unrelated individuals with multiple clusterization implant failure due to severe peri-implantitis were chosen for this study. These six individuals had relatively healthy lifestyles, with minimal environmental causalities affecting peri-implantitis. Research was undertaken to investigate pathogenic genes in peri-implantitis albeit with a small number of subjects and incomplete elimination of environmental causalities. Whole-exome sequencing was performed on collected saliva samples via self DNA collection kit. Common variants with minor allele frequencies (MAF) >= 0.05 from all control datasets were eliminated and variants having high and moderate impact and loss of function were used for comparison. Gene set enrichment analysis was performed to reveal functional groups associated with the genetic variants. 2,022 genes were left after filtering against dbSNP, the 1000 Genomes East Asian population, and healthy Korean randomized subsample data (GSK project). 175 (p-value <0.05) out of 927 gene sets were obtained via GSEA (DAVID). The top 10 was chosen (p-value <0.05) from cluster enrichment showing significance of cytoskeleton, cell adhesion, and metal ion binding. Network analysis was applied to find relationships between functional clusters. Among the functional groups, metal ion binding was located in the center of all clusters, indicating dysfunction of regulation in metal ion concentration might affect cell morphology or cell adhesion, resulting in implant failure. This result may demonstrate the feasibility of and provide pilot data for a larger research project aimed at discovering biomarkers for early diagnosis of peri-implantitis. PMID:24921256

  9. Whole-genome sequencing suggests a chemokine gene cluster that modifies age at onset in familial Alzheimer's disease.

    Science.gov (United States)

    Lalli, M A; Bettcher, B M; Arcila, M L; Garcia, G; Guzman, C; Madrigal, L; Ramirez, L; Acosta-Uribe, J; Baena, A; Wojta, K J; Coppola, G; Fitch, R; de Both, M D; Huentelman, M J; Reiman, E M; Brunkow, M E; Glusman, G; Roach, J C; Kao, A W; Lopera, F; Kosik, K S

    2015-11-01

    We have sequenced the complete genomes of 72 individuals affected with early-onset familial Alzheimer's disease caused by an autosomal dominant, highly penetrant mutation in the presenilin-1 (PSEN1) gene, and performed genome-wide association testing to identify variants that modify age at onset (AAO) of Alzheimer's disease. Our analysis identified a haplotype of single-nucleotide polymorphisms (SNPs) on chromosome 17 within a chemokine gene cluster associated with delayed onset of mild-cognitive impairment and dementia. Individuals carrying this haplotype had a mean AAO of mild-cognitive impairment at 51.0 ± 5.2 years compared with 41.1 ± 7.4 years for those without these SNPs. This haplotype thus appears to modify Alzheimer's AAO, conferring a large (~10 years) protective effect. The associated locus harbors several chemokines including eotaxin-1 encoded by CCL11, and the haplotype includes a missense polymorphism in this gene. Validating this association, we found plasma eotaxin-1 levels were correlated with disease AAO in an independent cohort from the University of California San Francisco Memory and Aging Center. In this second cohort, the associated haplotype disrupted the typical age-associated increase of eotaxin-1 levels, suggesting a complex regulatory role for this haplotype in the general population. Altogether, these results suggest eotaxin-1 as a novel modifier of Alzheimer's disease AAO and open potential avenues for therapy.

  10. Gene Sequence Based Clustering Assists in Dereplication of Pseudoalteromonas luteoviolacea Strains with Identical Inhibitory Activity and Antibiotic Production

    Directory of Open Access Journals (Sweden)

    Lone Gram

    2012-08-01

    Full Text Available Some microbial species are chemically homogenous, and the same secondary metabolites are found in all strains. In contrast, we previously found that five strains of P. luteoviolacea were closely related by 16S rRNA gene sequence but produced two different antibiotic profiles. The purpose of the present study was to determine whether such bioactivity differences could be linked to genotypes allowing methods from phylogenetic analysis to aid in selection of strains for biodiscovery. Thirteen P. luteoviolacea strains divided into three chemotypes based on production of known antibiotics and four antibacterial profiles based on inhibition assays against Vibrio anguillarum and Staphylococcus aureus. To determine whether chemotype and inhibition profile are reflected by phylogenetic clustering we sequenced 16S rRNA, gyrB and recA genes. Clustering based on 16S rRNA gene sequences alone showed little correlation to chemotypes and inhibition profiles, while clustering based on concatenated 16S rRNA, gyrB, and recA gene sequences resulted in three clusters, two of which uniformly consisted of strains of identical chemotype and inhibition profile. A major time sink in natural products discovery is the effort spent rediscovering known compounds, and this study indicates that phylogeny clustering of bioactive species has the potential to be a useful dereplication tool in biodiscovery efforts.

  11. Genomic sequence analysis of the 238-kb swine segment with a cluster of TRIM and olfactory receptor genes located, but with no class I genes, at the distal end of the SLA class I region.

    Science.gov (United States)

    Ando, Asako; Shigenari, Atsuko; Kulski, Jerzy K; Renard, Christine; Chardon, Patrick; Shiina, Takashi; Inoko, Hidetoshi

    2005-12-01

    Continuous genomic sequence has been previously determined for the swine leukocyte antigen (SLA) class I region from the TNF gene cluster at the border between the major histocompatibility complex (MHC) class III and class I regions to the UBD gene at the telomeric end of the classical class I gene cluster (SLA-1 to SLA-5, SLA-9, SLA-11). To complete the genomic sequence of the entire SLA class I genomic region, we have analyzed the genomic sequences of two BAC clones carrying a continuous 237,633-bp-long segment spanning from the TRIM15 gene to the UBD gene located on the telomeric side of the classical SLA class I gene cluster. Fifteen non-class I genes, including the zinc finger and the tripartite motif (TRIM) ring-finger-related family genes and olfactory receptor genes, were identified in the 238-kilobase (kb) segment, and their location in the segment was similar to their apparent human homologs. In contrast, a human segment (alpha block) spanning about 375 kb from the gene ETF1P1 and from the HLA-J to HLA-F genes was absent from the 238-kb swine segment. We conclude that the gene organization of the MHC non-class I genes located in the telomeric side of the classical SLA class I gene cluster is remarkably similar between the swine and the human segments, although the swine lacks a 375-kb segment corresponding to the human alpha block.

  12. De novo clustering methods outperform reference-based methods for assigning 16S rRNA gene sequences to operational taxonomic units

    Directory of Open Access Journals (Sweden)

    Sarah L. Westcott

    2015-12-01

    Full Text Available Background. 16S rRNA gene sequences are routinely assigned to operational taxonomic units (OTUs) that are then used to analyze complex microbial communities. A number of methods have been employed to carry out the assignment of 16S rRNA gene sequences to OTUs, leading to confusion over which method is optimal. A recent study suggested that a clustering method should be selected based on its ability to generate stable OTU assignments that do not change as additional sequences are added to the dataset. In contrast, we contend that the quality of the OTU assignments, the ability of the method to properly represent the distances between the sequences, is more important. Methods. Our analysis implemented six de novo clustering algorithms (single linkage, complete linkage, average linkage, abundance-based greedy clustering, distance-based greedy clustering, and Swarm) and the open and closed-reference methods. Using two previously published datasets, we used the Matthews correlation coefficient (MCC) to assess the stability and quality of OTU assignments. Results. The stability of OTU assignments did not reflect the quality of the assignments. Depending on the dataset being analyzed, the average linkage and the distance and abundance-based greedy clustering methods generated OTUs that were more likely to represent the actual distances between sequences than the open and closed-reference methods. We also demonstrated that for the greedy algorithms VSEARCH produced assignments that were comparable to those produced by USEARCH, making VSEARCH a viable free and open source alternative to USEARCH. Further interrogation of the reference-based methods indicated that when USEARCH or VSEARCH were used to identify the closest reference, the OTU assignments were sensitive to the order of the reference sequences because the reference sequences can be identical over the region being considered. More troubling was the observation that while both USEARCH and
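    The abstract does not spell out how the MCC was applied to OTU assignments. The sketch below assumes one common convention: every pair of sequences is scored as a true positive (within the distance threshold and in the same OTU), true negative (farther apart and in different OTUs), false positive, or false negative, and the MCC is computed from those four counts. The function name `pairwise_mcc`, the 0.03 threshold, and the toy distances are illustrative assumptions, not values taken from the study.

```python
from itertools import combinations
from math import sqrt


def pairwise_mcc(distance, otu_of, threshold):
    """MCC of an OTU assignment, scored over all pairs of sequences.

    distance  : dict mapping frozenset({a, b}) -> pairwise distance (toy input)
    otu_of    : dict mapping sequence name -> OTU label
    threshold : distance cutoff at or below which a pair "should" share an OTU
    """
    tp = tn = fp = fn = 0
    for a, b in combinations(sorted(otu_of), 2):
        close = distance[frozenset((a, b))] <= threshold
        same_otu = otu_of[a] == otu_of[b]
        if close and same_otu:
            tp += 1          # similar sequences, clustered together
        elif not close and not same_otu:
            tn += 1          # dissimilar sequences, kept apart
        elif same_otu:
            fp += 1          # dissimilar sequences, clustered together
        else:
            fn += 1          # similar sequences, split apart
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy data: A/B and C/D are close, every other pair is distant.
names = ["A", "B", "C", "D"]
distance = {frozenset(pair): 0.10 for pair in combinations(names, 2)}
distance[frozenset(("A", "B"))] = 0.01
distance[frozenset(("C", "D"))] = 0.02

good = {"A": 1, "B": 1, "C": 2, "D": 2}   # matches the distance structure
bad = {"A": 1, "B": 2, "C": 1, "D": 2}    # splits close pairs, merges distant ones

print(pairwise_mcc(distance, good, 0.03))
print(pairwise_mcc(distance, bad, 0.03))
```

    Under this convention, an assignment that agrees with the distance threshold on every pair scores 1, and one that systematically disagrees scores below 0, which is what makes the coefficient usable as a quality measure independent of how stable the assignment is.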

  13. A heritability-based comparison of methods used to cluster 16S rRNA gene sequences into operational taxonomic units

    Directory of Open Access Journals (Sweden)

    Matthew A. Jackson

    2016-08-01

    Full Text Available A variety of methods are available to collapse 16S rRNA gene sequencing reads to the operational taxonomic units (OTUs) used in microbiome analyses. A number of studies have aimed to compare the quality of the resulting OTUs. However, in the absence of a standard method to define and enumerate the different taxa within a microbial community, existing comparisons have been unable to compare the ability of clustering methods to generate units that accurately represent functional taxonomic segregation. We have previously demonstrated heritability of the microbiome and we propose this as a measure of each method's ability to generate OTUs representing biologically relevant units. Our approach assumes that OTUs that best represent the functional units interacting with the hosts' properties will produce the highest heritability estimates. Using 1,750 unselected individuals from the TwinsUK cohort, we compared 11 approaches to OTU clustering in heritability analyses. We find that de novo clustering methods produce more heritable OTUs than reference-based approaches, with VSEARCH and SUMACLUST performing well. We also show that differences resulting from each clustering method are minimal once reads are collapsed by taxonomic assignment, although sample diversity estimates are clearly influenced by OTU clustering approach. These results should help the selection of sequence clustering methods in future microbiome studies, particularly for studies of human host-microbiome interactions.

  14. Subtyping Salmonella enterica serovar enteritidis isolates from different sources by using sequence typing based on virulence genes and clustered regularly interspaced short palindromic repeats (CRISPRs).

    Science.gov (United States)

    Liu, Fenyun; Kariyawasam, Subhashinie; Jayarao, Bhushan M; Barrangou, Rodolphe; Gerner-Smidt, Peter; Ribot, Efrain M; Knabel, Stephen J; Dudley, Edward G

    2011-07-01

    Salmonella enterica subsp. enterica serovar Enteritidis is a major cause of food-borne salmonellosis in the United States. Two major food vehicles for S. Enteritidis are contaminated eggs and chicken meat. Improved subtyping methods are needed to accurately track specific strains of S. Enteritidis related to human salmonellosis throughout the chicken and egg food system. A sequence typing scheme based on virulence genes (fimH and sseL) and clustered regularly interspaced short palindromic repeats (CRISPRs), termed CRISPR-including multi-virulence-locus sequence typing (designated CRISPR-MVLST), was used to characterize 35 human clinical isolates, 46 chicken isolates, 24 egg isolates, and 63 hen house environment isolates of S. Enteritidis. A total of 27 sequence types (STs) were identified among the 167 isolates. CRISPR-MVLST identified three persistent and predominant STs circulating among U.S. human clinical isolates and chicken, egg, and hen house environmental isolates in Pennsylvania, and an ST that was found only in eggs and humans. It also identified a potential environment-specific sequence type. Moreover, cluster analysis based on fimH and sseL identified a number of clusters, of which several were found in more than one outbreak, as well as 11 singletons. Further research is needed to determine if CRISPR-MVLST might help identify the ecological origins of S. Enteritidis strains that contaminate chickens and eggs.

  15. The human met-ase gene (GZMM): Structure, sequence, and close physical linkage to the serine protease gene cluster on 19p13.3

    Energy Technology Data Exchange (ETDEWEB)

    Pilat, D.; Zimmer, M.; Wekerle, H. [Max-Planck-Institut fuer Psychiatrie, Martinsried (Germany)] [and others

    1994-12-01

    Cosmid clones containing the genes for the human and murine natural killer cell serine protease Met-ase (gene symbol GZMM; granzyme M) were identified by screening human and murine cosmid libraries with rat Met-ase (RNIK-Met-1) cDNA. The human gene has a size of 7.5 kb and an exon-intron structure identical to that of serine protease genes located on human chromosomes 5q11-q12, 14q11.2, and 19p13.3 that are expressed by lymphocytes, mast cells, or myelomonocyte precursors. Using cosmid DNA as a probe for fluorescence in situ hybridization, we identified the chromosomal position of human Met-ase as 19p13.3. Interphase studies with two differentially labeled probes for Met-ase and the azurocidin (AZU1), proteinase 3 (PRTN3), and neutrophil elastase (ELA2) gene cluster revealed that the distance of Met-ase from this gene cluster is in the range of 200 to 500 kb. Using differentially labeled mouse cosmid probes, we also mapped the murine gene for Met-ase to chromosomal band 10C, close to the gene for lamin B2. Thus, the Met-ase, AZU1, PRTN3, and ELA2 genes fall into an established region of homology between mouse chromosomal band 10C and human 19p13.3. 35 refs., 4 figs.

  16. Gene Cluster Statistics with Gene Families

    Science.gov (United States)

    Durand, Dannie

    2009-01-01

    Identifying genomic regions that descended from a common ancestor is important for understanding the function and evolution of genomes. In distantly related genomes, clusters of homologous gene pairs are evidence of candidate homologous regions. Demonstrating the statistical significance of such “gene clusters” is an essential component of comparative genomic analyses. However, currently there are no practical statistical tests for gene clusters that model the influence of the number of homologs in each gene family on cluster significance. In this work, we demonstrate empirically that failure to incorporate gene family size in gene cluster statistics results in overestimation of significance, leading to incorrect conclusions. We further present novel analytical methods for estimating gene cluster significance that take gene family size into account. Our methods do not require complete genome data and are suitable for testing individual clusters found in local regions, such as contigs in an unfinished assembly. We consider pairs of regions drawn from the same genome (paralogous clusters), as well as regions drawn from two different genomes (orthologous clusters). Determining cluster significance under general models of gene family size is computationally intractable. By assuming that all gene families are of equal size, we obtain analytical expressions that allow fast approximation of cluster probabilities. We evaluate the accuracy of this approximation by comparing the resulting gene cluster probabilities with cluster probabilities obtained by simulating a realistic, power-law distributed model of gene family size, with parameters inferred from genomic data. Surprisingly, despite the simplicity of the underlying assumption, our method accurately approximates the true cluster probabilities. It slightly overestimates these probabilities, yielding a conservative test. We present additional simulation results indicating the best choice of parameter values for data

  17. A Cluster of Vitellogenin Genes in the Mediterranean Fruit Fly Ceratitis Capitata: Sequence and Structural Conservation in Dipteran Yolk Proteins and Their Genes

    Science.gov (United States)

    Rina, M.; Savakis, C.

    1991-01-01

    Four genes encoding the major egg yolk polypeptides of the Mediterranean fruit fly Ceratitis capitata, vitellogenins 1 and 2 (VG1 and VG2), were cloned, characterized and partially sequenced. The genes are located on the same region of chromosome 5 and are organized in pairs, each encoding the two polypeptides on opposite DNA strands. Restriction and nucleotide sequence analysis indicate that the gene pairs have arisen from an ancestral pair by a relatively recent duplication event. The transcribed part is very similar to that of the Drosophila melanogaster yolk protein genes Yp1, Yp2 and Yp3. The Vg1 genes have two introns at the same positions as those in D. melanogaster Yp3; the Vg2 genes have only one of the introns, as do D. melanogaster Yp1 and Yp2. Comparison of the five polypeptide sequences shows extensive homology, with 27% of the residues being invariable. The sequence similarity of the processed proteins extends in two regions separated by a nonconserved region of varying size. Secondary structure predictions suggest a highly conserved secondary structure pattern in the two regions, which probably correspond to structural and functional domains. The carboxy-end domain of the C. capitata proteins shows the same sequence similarities with triacylglycerol lipases that have been reported previously for the D. melanogaster yolk proteins. Analysis of codon usage shows significant differences between D. melanogaster and C. capitata vitellogenins with the latter exhibiting a less biased representation of synonymous codons. PMID:1903120

  18. Chicken rRNA Gene Cluster Structure.

    Directory of Open Access Journals (Sweden)

    Alexander G Dyomin

    Full Text Available Ribosomal RNA (rRNA) genes, whose activity results in nucleolus formation, constitute an extremely important part of the genome. Despite the extensive exploration into avian genomes, no complete description of avian rRNA gene primary structure has been offered so far. We publish a complete chicken rRNA gene cluster sequence here, including 5'ETS (1836 bp), 18S rRNA gene (1823 bp), ITS1 (2530 bp), 5.8S rRNA gene (157 bp), ITS2 (733 bp), 28S rRNA gene (4441 bp) and 3'ETS (343 bp). The rRNA gene cluster sequence of 11863 bp was assembled from raw reads and deposited to GenBank under accession number KT445934. The assembly was validated through fluorescence in situ hybridization analysis on chicken metaphase chromosomes using computed and synthesized specific probes, as well as through the reference assembly against de novo assembled rRNA gene cluster sequence using sequenced fragments of BAC-clone containing chicken NOR (nucleolus organizer region). The results have confirmed the chicken rRNA gene cluster validity.
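    As a quick consistency check on the figures quoted above, the seven component lengths can be summed and compared with the reported cluster length. The sketch also derives 1-based coordinates for each component, which assumes the components are contiguous and listed in 5' to 3' order; that layout is an assumption made for illustration, not coordinates taken from the GenBank record.

```python
# Component lengths in bp, as listed in the abstract.
components = [
    ("5'ETS", 1836),
    ("18S rRNA gene", 1823),
    ("ITS1", 2530),
    ("5.8S rRNA gene", 157),
    ("ITS2", 733),
    ("28S rRNA gene", 4441),
    ("3'ETS", 343),
]

total = sum(length for _, length in components)


def layout(parts):
    """Return (name, start, end) tuples, 1-based inclusive, assuming no gaps."""
    coords = []
    position = 1
    for name, length in parts:
        coords.append((name, position, position + length - 1))
        position += length
    return coords


for name, start, end in layout(components):
    print(f"{name:15s} {start:6d} {end:6d}")
print("total length:", total)
```

    The component lengths add up to the 11863 bp stated for the assembled cluster, so the listed parts account for the whole sequence with no unassigned bases.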

  19. Pichia stipitis genomics, transcriptomics, and gene clusters

    Science.gov (United States)

    Thomas W. Jeffries; Jennifer R. Headman Van Vleet

    2009-01-01

    Genome sequencing and subsequent global gene expression studies have advanced our understanding of the lignocellulose-fermenting yeast Pichia stipitis. These studies have provided an insight into its central carbon metabolism, and analysis of its genome has revealed numerous functional gene clusters and tandem repeats. Specialized physiological traits are often the...

  20. Network of tRNA Gene Sequences

    Institute of Scientific and Technical Information of China (English)

    WEI Fang-ping; LI Sheng; MA Hong-ru

    2008-01-01

    A network of 3719 tRNA gene sequences was constructed using simplest alignment. Its topology, degree distribution and clustering coefficient were studied. The behaviors of the network shift from fluctuated distribution to scale-free distribution when the similarity degree of the tRNA gene sequences increases. The tRNA gene sequences with the same anticodon identity are more self-organized than those with different anticodon identities and form local clusters in the network. Some vertices of the local cluster have a high connection with other local clusters, and the probable reason was given. Moreover, a network constructed by the same number of random tRNA sequences was used to make comparisons. The relationships between the properties of the tRNA similarity network and the characters of tRNA evolutionary history were discussed.
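    The kind of network described above can be sketched in a few lines: sequences are vertices, and two vertices are linked when their similarity reaches a chosen threshold. The abstract does not define its "simplest alignment", so the sketch assumes position-by-position identity between equal-length sequences; the function names, the 0.9 threshold, and the four toy sequences are hypothetical.

```python
from itertools import combinations


def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this toy similarity needs equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def build_network(seqs, threshold):
    """Link every pair of sequences whose identity is >= threshold."""
    neighbours = {name: set() for name in seqs}
    for u, v in combinations(seqs, 2):
        if identity(seqs[u], seqs[v]) >= threshold:
            neighbours[u].add(v)
            neighbours[v].add(u)
    return neighbours


def clustering_coefficient(neighbours, node):
    """Share of a vertex's neighbour pairs that are themselves linked."""
    nbrs = neighbours[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in neighbours[u])
    return 2 * links / (k * (k - 1))


# Hypothetical toy sequences: t1-t3 differ only at the last base, t4 is unrelated.
seqs = {
    "t1": "GCGGAUUUAG",
    "t2": "GCGGAUUUAC",
    "t3": "GCGGAUUUAA",
    "t4": "AUAUAUAUAU",
}

network = build_network(seqs, threshold=0.9)
degrees = {name: len(nbrs) for name, nbrs in network.items()}
print(degrees)
print({name: clustering_coefficient(network, name) for name in network})
```

    Raising or lowering the threshold changes which edges exist, which is the mechanism behind the shift in degree distribution that the abstract reports as the required similarity increases.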

  1. Complete sequence of a plasmid from a bovine methicillin-resistant Staphylococcus aureus harbouring a novel ica-like gene cluster in addition to antimicrobial and heavy metal resistance genes.

    Science.gov (United States)

    Feßler, Andrea T; Zhao, Qin; Schoenfelder, Sonja; Kadlec, Kristina; Brenner Michael, Geovana; Wang, Yang; Ziebuhr, Wilma; Shen, Jianzhong; Schwarz, Stefan

    2017-02-01

    The multiresistance plasmid pAFS11, obtained from a bovine methicillin-resistant Staphylococcus aureus (MRSA) isolate, was completely sequenced and analysed for its structure and organisation. Moreover, the susceptibility to the heavy metals cadmium and copper was determined by broth macrodilution. The 49,189-bp plasmid harboured the apramycin resistance gene apmA, two copies of the macrolide/lincosamide/streptogramin B resistance gene erm(B) (both located on remnants of a truncated transposon Tn917), the kanamycin/neomycin resistance gene aadD, the tetracycline resistance gene tet(L) and the trimethoprim resistance gene dfrK. The latter three genes were part of a 7,284-bp segment which was bracketed by two copies of IS431. In addition, the cadmium resistance operon cadDX as well as the copper resistance genes copA and mco were located on the plasmid and mediated a reduced susceptibility to cadmium and copper. Moreover, a complete novel ica-like gene cluster of so far unknown genetic origin was detected on this plasmid. The ica-like gene cluster comprised four different genes whose products showed 64.4-76.9% homology to the Ica proteins known to be involved in biofilm formation of the S. aureus strains Mu50, Mu3 and N315. However, 96.2-99.4% homology was seen to proteins from S. sciuri NS1 indicating an S. sciuri origin. The finding of five different antibiotic resistance genes co-located on a plasmid with heavy metal resistance genes and an ica-like gene cluster is alarming. With the acquisition of this plasmid, antimicrobial multiresistance, heavy metal resistances and potential virulence properties may be co-selected and spread via a single horizontal gene transfer event.

  2. The DUB/USP17 deubiquitinating enzymes: A gene family within a tandemly repeated sequence, is also embedded within the copy number variable Beta-defensin cluster

    Directory of Open Access Journals (Sweden)

    Scott Christopher J

    2010-04-01

    Full Text Available Abstract Background The DUB/USP17 subfamily of deubiquitinating enzymes was originally identified as immediate early genes induced in response to cytokine stimulation in mice (DUB-1, DUB-1A, DUB-2, DUB-2A). Subsequently we have identified a number of human family members and shown that one of these (DUB-3) is also cytokine inducible. We originally showed that constitutive expression of DUB-3 can block cell proliferation and more recently we have demonstrated that this is due to its regulation of the ubiquitination and activity of the 'CAAX' box protease RCE1. Results Here we demonstrate that the human DUB/USP17 family members are found on both chromosome 4p16.1, within a block of tandem repeats, and on chromosome 8p23.1, embedded within the copy number variable beta-defensin cluster. In addition, we show that the multiple genes observed in humans and other distantly related mammals have arisen due to the independent expansion of an ancestral sequence within each species. However, it is also apparent, when sequences from humans and the more closely related chimpanzee are compared, that duplication events have taken place prior to these species separating. Conclusions The observation that the DUB/USP17 genes, which can influence cell growth and survival, have evolved from an unstable ancestral sequence which has undergone multiple and varied duplications in the species examined marks this as a unique family. In addition, their presence within the beta-defensin repeat raises the question of whether they may contribute to the influence of this repeat on immune related conditions.

  3. Escherichia coli O-Antigen Gene Clusters of Serogroups O62, O68, O131, O140, O142, and O163: DNA Sequences and Similarity between O62 and O68, and PCR-Based Serogrouping

    Directory of Open Access Journals (Sweden)

    Yanhong Liu

    2015-02-01

    Full Text Available The DNA sequence of the O-antigen gene clusters of Escherichia coli serogroups O62, O68, O131, O140, O142, and O163 was determined, and primers based on the wzx (O-antigen flippase) and/or wzy (O-antigen polymerase) genes within the O-antigen gene clusters were designed and used in PCR assays to identify each serogroup. Specificity was tested with E. coli reference strains, field isolates belonging to the target serogroups, and non-E. coli bacteria. The PCR assays were highly specific for the respective serogroups; however, the PCR assay targeting the O62 wzx gene reacted positively with strains belonging to E. coli O68, which was determined by serotyping. Analysis of the O-antigen gene cluster sequences of serogroups O62 and O68 reference strains showed that they were 94% identical at the nucleotide level, although O62 contained an insertion sequence (IS) element located between the rmlA and rmlC genes within the O-antigen gene cluster. A PCR assay targeting the rmlA and rmlC genes flanking the IS element was used to differentiate O62 and O68 serogroups. The PCR assays developed in this study can be used for the detection and identification of E. coli O62/O68, O131, O140, O142, and O163 strains isolated from different sources.

  4. Diversity of capsular polysaccharide gene clusters in Kpc-producing Klebsiella pneumoniae clinical isolates of sequence type 258 involved in the Italian epidemic.

    Science.gov (United States)

    D'Andrea, Marco Maria; Amisano, Francesco; Giani, Tommaso; Conte, Viola; Ciacci, Nagaia; Ambretti, Simone; Santoriello, Luisa; Rossolini, Gian Maria

    2014-01-01

    Strains of Klebsiella pneumoniae producing KPC-type beta-lactamases (KPC-Kp) are broadly disseminating worldwide and constitute a major healthcare threat given their extensively drug resistant phenotypes and ability to rapidly disseminate in healthcare settings. In this work we report on the characterization of two different capsular polysaccharide (CPS) gene clusters, named cpsBO-4 and cps207-2, from two KPC-Kp clinical strains from Italy belonging in sequence type (ST) 258, which is one of the most successful ST of KPC-Kp spreading worldwide. While cpsBO-4 was different from known 78 K-types according to the recently proposed typing schemes based on the wzi or wzc gene sequences, cps207-2 was classified as K41 by one of these methods. Bioinformatic analysis revealed that they were represented in the genomic sequences of KPC-Kp from strains of ST258 from different countries, and cpsBO-4 was also detected in a KPC-Kp strain of ST442 from Brazil. Investigation of a collection of 46 ST258 and ST512 (a single locus variant of ST258) clinical strains representative of the recent Italian epidemic of KPC-Kp by means of a multiplex PCR typing approach revealed that cpsBO-4 was the most prevalent type, being detected both in ST258 and ST512 strains with a countrywide distribution, while cps207-2 was only detected in ST258 strains with a more restricted distribution.

  5. Diversity of Capsular Polysaccharide Gene Clusters in Kpc-Producing Klebsiella pneumoniae Clinical Isolates of Sequence Type 258 Involved in the Italian Epidemic

    Science.gov (United States)

    D’Andrea, Marco Maria; Amisano, Francesco; Giani, Tommaso; Conte, Viola; Ciacci, Nagaia; Ambretti, Simone; Santoriello, Luisa; Rossolini, Gian Maria

    2014-01-01

    Strains of Klebsiella pneumoniae producing KPC-type beta-lactamases (KPC-Kp) are broadly disseminating worldwide and constitute a major healthcare threat given their extensively drug resistant phenotypes and ability to rapidly disseminate in healthcare settings. In this work we report on the characterization of two different capsular polysaccharide (CPS) gene clusters, named cpsBO-4 and cps207-2, from two KPC-Kp clinical strains from Italy belonging in sequence type (ST) 258, which is one of the most successful ST of KPC-Kp spreading worldwide. While cpsBO-4 was different from known 78 K-types according to the recently proposed typing schemes based on the wzi or wzc gene sequences, cps207-2 was classified as K41 by one of these methods. Bioinformatic analysis revealed that they were represented in the genomic sequences of KPC-Kp from strains of ST258 from different countries, and cpsBO-4 was also detected in a KPC-Kp strain of ST442 from Brazil. Investigation of a collection of 46 ST258 and ST512 (a single locus variant of ST258) clinical strains representative of the recent Italian epidemic of KPC-Kp by means of a multiplex PCR typing approach revealed that cpsBO-4 was the most prevalent type, being detected both in ST258 and ST512 strains with a countrywide distribution, while cps207-2 was only detected in ST258 strains with a more restricted distribution. PMID:24823690

  6. Diversity of capsular polysaccharide gene clusters in Kpc-producing Klebsiella pneumoniae clinical isolates of sequence type 258 involved in the Italian epidemic.

    Directory of Open Access Journals (Sweden)

    Marco Maria D'Andrea

    Full Text Available Strains of Klebsiella pneumoniae producing KPC-type beta-lactamases (KPC-Kp) are broadly disseminating worldwide and constitute a major healthcare threat given their extensively drug resistant phenotypes and ability to rapidly disseminate in healthcare settings. In this work we report on the characterization of two different capsular polysaccharide (CPS) gene clusters, named cpsBO-4 and cps207-2, from two KPC-Kp clinical strains from Italy belonging in sequence type (ST) 258, which is one of the most successful ST of KPC-Kp spreading worldwide. While cpsBO-4 was different from known 78 K-types according to the recently proposed typing schemes based on the wzi or wzc gene sequences, cps207-2 was classified as K41 by one of these methods. Bioinformatic analysis revealed that they were represented in the genomic sequences of KPC-Kp from strains of ST258 from different countries, and cpsBO-4 was also detected in a KPC-Kp strain of ST442 from Brazil. Investigation of a collection of 46 ST258 and ST512 (a single locus variant of ST258) clinical strains representative of the recent Italian epidemic of KPC-Kp by means of a multiplex PCR typing approach revealed that cpsBO-4 was the most prevalent type, being detected both in ST258 and ST512 strains with a countrywide distribution, while cps207-2 was only detected in ST258 strains with a more restricted distribution.

  7. antiSMASH : rapid identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences

    NARCIS (Netherlands)

    Medema, Marnix H.; Blin, Kai; Cimermancic, Peter; de Jager, Victor; Zakrzewski, Piotr; Fischbach, Michael A.; Weber, Tilmann; Takano, Eriko; Breitling, Rainer

    2011-01-01

    Bacterial and fungal secondary metabolism is a rich source of novel bioactive compounds with potential pharmaceutical applications as antibiotics, anti-tumor drugs or cholesterol-lowering drugs. To find new drug candidates, microbiologists are increasingly relying on sequencing genomes of a wide var

  8. Genome sequence of a diabetes-prone rodent reveals a mutation hotspot around the ParaHox gene cluster

    DEFF Research Database (Denmark)

    Hargreaves, Adam D.; Zhou, Long; Christensen, Josef

    2017-01-01

    Pdx1 has been grossly affected by GC-biased mutation, leading to the highest divergence observed for this gene across the Bilateria. In addition to genomic insights into restricted caloric intake in a desert species, the discovery of a localized chromosomal region subject to elevated mutation suggests...

  9. FunGeneClusterS

    DEFF Research Database (Denmark)

    Vesth, Tammi Camilla; Brandl, Julian; Andersen, Mikael Rørdam

    2016-01-01

    Secondary metabolites of fungi are receiving an increasing amount of interest due to their prolific bioactivities and the fact that fungal biosynthesis of secondary metabolites often occurs from co-regulated and co-located gene clusters. This makes the gene clusters attractive for synthetic biology and industrial biotechnology applications. We have previously published a method for accurate prediction of clusters from genome and transcriptome data, which could also suggest cross-chemistry; however, this method was limited both in the number of parameters which could be adjusted as well as in user...

  10. Bioinformatics Prediction of Polyketide Synthase Gene Clusters from Mycosphaerella fijiensis

    OpenAIRE

    Noar, Roslyn D.; Daub, Margaret E.

    2016-01-01

    Mycosphaerella fijiensis, causal agent of black Sigatoka disease of banana, is a Dothideomycete fungus closely related to fungi that produce polyketides important for plant pathogenicity. We utilized the M. fijiensis genome sequence to predict PKS genes and their gene clusters and make bioinformatics predictions about the types of compounds produced by these clusters. Eight PKS gene clusters were identified in the M. fijiensis genome, placing M. fijiensis into the 23rd percentile for the numb...

  11. Lateral transfer of the lux gene cluster.

    Science.gov (United States)

    Kasai, Sabu; Okada, Kazuhisa; Hoshino, Akinori; Iida, Tetsuya; Honda, Takeshi

    2007-02-01

    The lux operon is an uncommon gene cluster. To find the pathway through which the operon has been transferred, we sequenced the operon and both flanking regions in four typical luminous species. In Vibrio cholerae NCIMB 41, a five-gene cluster, most genes of which were highly similar to orthologues present in Gram-positive bacteria, along with the lux operon, is inserted between VC1560 and VC1563, on chromosome 1. Because this entire five-gene cluster is present in Photorhabdus luminescens TT01, about 1.5 Mbp upstream of the operon, we deduced that the operon and the gene cluster were transferred from V. cholerae to an ancestor of Pr. luminescens. Because in both V. fischeri and Shewanella hanedai, luxR and luxI were found just upstream of the operon, we concluded that the operon was transferred from either species to the other. Because most of the genes flanking the operon were highly similar to orthologues present on chromosome 2 of vibrios, we speculated that the operon of most species is located on this chromosome. The undigested genomic DNAs of five luminous species were analysed by pulsed-field gel electrophoresis and Southern hybridization. In all the species except V. cholerae, the operons are located on chromosome 2.

  12. Bioinformatics Prediction of Polyketide Synthase Gene Clusters from Mycosphaerella fijiensis.

    Directory of Open Access Journals (Sweden)

    Roslyn D Noar

    Full Text Available Mycosphaerella fijiensis, causal agent of black Sigatoka disease of banana, is a Dothideomycete fungus closely related to fungi that produce polyketides important for plant pathogenicity. We utilized the M. fijiensis genome sequence to predict PKS genes and their gene clusters and make bioinformatics predictions about the types of compounds produced by these clusters. Eight PKS gene clusters were identified in the M. fijiensis genome, placing M. fijiensis into the 23rd percentile for the number of PKS genes compared to other Dothideomycetes. Analysis of the PKS domains identified three of the PKS enzymes as non-reducing and two as highly reducing. Gene clusters contained types of genes frequently found in PKS clusters including genes encoding transporters, oxidoreductases, methyltransferases, and non-ribosomal peptide synthases. Phylogenetic analysis identified a putative PKS cluster encoding melanin biosynthesis. None of the other clusters were closely aligned with genes encoding known polyketides, however three of the PKS genes fell into clades with clusters encoding alternapyrone, fumonisin, and solanapyrone produced by Alternaria and Fusarium species. A search for homologs among available genomic sequences from 103 Dothideomycetes identified close homologs (>80% similarity) for six of the PKS sequences. One of the PKS sequences was not similar (< 60% similarity) to sequences in any of the 103 genomes, suggesting that it encodes a unique compound. Comparison of the M. fijiensis PKS sequences with those of two other banana pathogens, M. musicola and M. eumusae, showed that these two species have close homologs to five of the M. fijiensis PKS sequences, but three others were not found in either species. RT-PCR and RNA-Seq analysis showed that the melanin PKS cluster was down-regulated in infected banana as compared to growth in culture. Three other clusters, however were strongly upregulated during disease development in banana, suggesting that

  13. Bioinformatics Prediction of Polyketide Synthase Gene Clusters from Mycosphaerella fijiensis.

    Science.gov (United States)

    Noar, Roslyn D; Daub, Margaret E

    2016-01-01

    Mycosphaerella fijiensis, causal agent of black Sigatoka disease of banana, is a Dothideomycete fungus closely related to fungi that produce polyketides important for plant pathogenicity. We utilized the M. fijiensis genome sequence to predict PKS genes and their gene clusters and make bioinformatics predictions about the types of compounds produced by these clusters. Eight PKS gene clusters were identified in the M. fijiensis genome, placing M. fijiensis into the 23rd percentile for the number of PKS genes compared to other Dothideomycetes. Analysis of the PKS domains identified three of the PKS enzymes as non-reducing and two as highly reducing. Gene clusters contained types of genes frequently found in PKS clusters including genes encoding transporters, oxidoreductases, methyltransferases, and non-ribosomal peptide synthases. Phylogenetic analysis identified a putative PKS cluster encoding melanin biosynthesis. None of the other clusters were closely aligned with genes encoding known polyketides, however three of the PKS genes fell into clades with clusters encoding alternapyrone, fumonisin, and solanapyrone produced by Alternaria and Fusarium species. A search for homologs among available genomic sequences from 103 Dothideomycetes identified close homologs (>80% similarity) for six of the PKS sequences. One of the PKS sequences was not similar (< 60% similarity) to sequences in any of the 103 genomes, suggesting that it encodes a unique compound. Comparison of the M. fijiensis PKS sequences with those of two other banana pathogens, M. musicola and M. eumusae, showed that these two species have close homologs to five of the M. fijiensis PKS sequences, but three others were not found in either species. RT-PCR and RNA-Seq analysis showed that the melanin PKS cluster was down-regulated in infected banana as compared to growth in culture. Three other clusters, however were strongly upregulated during disease development in banana, suggesting that they may encode

  14. A Nomadic Subtelomeric Disease Resistance Gene Cluster in Common Bean

    Science.gov (United States)

    The B4 resistance (R)-gene cluster, located in subtelomeric region of chromosome 4, is one of the largest clusters known in common bean (Phaseolus vulgaris, Pv). We sequenced 650 kb spanning this locus and annotated 97 genes, 26 of which correspond to Coiled-coil-Nucleotide-Binding-Site-Leucine-Rich...

  15. Genetic characteristics of vancomycin resistance gene cluster in Enterococcus spp.

    Science.gov (United States)

    Chunhui, Chen; Xiaogang, Xu

    2015-05-01

    Vancomycin-resistant enterococci have become important nosocomial pathogens since they were discovered in the late 1980s. The products encoded by the vancomycin resistance gene clusters in enterococci catalyze the synthesis of peptidoglycan precursors with low affinity for glycopeptide antibiotics, including vancomycin and teicoplanin, and thereby lead to resistance. These vancomycin resistance gene clusters are classified into nine types according to their gene sequences and organization, or into D-Ala:D-Lac (VanA, VanB, VanD and VanM) and D-Ala:D-Ser (VanC, VanE, VanG, VanL and VanN) ligase gene clusters based on the differences in their encoded ligases. Moreover, these gene clusters are characterized by their different resistance levels and infection models. In this review, we summarize the classification, gene organization and infection model of the vancomycin resistance gene clusters in Enterococcus spp.

  16. Probabilistic Clustering of Sequences Inferring new bacterial regulons by comparative genomics

    CERN Document Server

    Van Nimwegen, E; Rajewsky, N; Siggia, E D; Nimwegen, Erik van; Zavolan, Mihaela; Rajewsky, Nikolaus; Siggia, Eric D.

    2002-01-01

    Genome wide comparisons between enteric bacteria yield large sets of conserved putative regulatory sites on a gene by gene basis that need to be clustered into regulons. Using the assumption that regulatory sites can be represented as samples from weight matrices we derive a unique probability distribution for assignments of sites into clusters. Our algorithm, 'PROCSE' (probabilistic clustering of sequences), uses Monte-Carlo sampling of this distribution to partition and align thousands of short DNA sequences into clusters. The algorithm internally determines the number of clusters from the data, and assigns significance to the resulting clusters. We place theoretical limits on the ability of any algorithm to correctly cluster sequences drawn from weight matrices (WMs) when these WMs are unknown. Our analysis suggests that the set of all putative sites for a single genome (e.g. E. coli) is largely inadequate for clustering. When sites from different genomes are combined and all the homologous sites from the ...
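
    The abstract rests on the assumption that regulatory sites are samples from weight matrices. A minimal sketch of that representation follows: it estimates a weight matrix from a handful of aligned sites and scores a candidate site against a uniform background. This is not the PROCSE algorithm itself (no Monte-Carlo sampling over partitions is attempted); the example sites, the pseudocount of 1 and the uniform background are all assumptions made for illustration.

```python
import math

BASES = "ACGT"


def weight_matrix(sites, pseudocount=1.0):
    """Estimate per-position base probabilities from equal-length aligned sites."""
    length = len(sites[0])
    matrix = []
    for i in range(length):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        matrix.append({base: counts[base] / total for base in BASES})
    return matrix


def log_odds(site, matrix, background=0.25):
    """Log-likelihood ratio of a site under the matrix versus a uniform background."""
    return sum(math.log(matrix[i][base] / background) for i, base in enumerate(site))


# Hypothetical aligned sites assumed to belong to one cluster.
example_sites = ["TTGACA", "TTGACA", "TTGATA", "TTGACA"]
wm = weight_matrix(example_sites)

score_member = log_odds("TTGACA", wm)
score_other = log_odds("CCCGGG", wm)
print(round(score_member, 3), round(score_other, 3))
```

    A site resembling the cluster members scores above zero and an unrelated site scores below zero; a probabilistic clustering method compares likelihoods of this kind across many candidate partitions of the sites.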

  17. Identification of nitrogen-fixing genes and gene clusters from metagenomic library of acid mine drainage.

    Science.gov (United States)

    Dai, Zhimin; Guo, Xue; Yin, Huaqun; Liang, Yili; Cong, Jing; Liu, Xueduan

    2014-01-01

    Biological nitrogen fixation is an essential function of acid mine drainage (AMD) microbial communities. However, most acidophiles in AMD environments are uncultured microorganisms and little is known about the diversity of nitrogen-fixing genes and structure of nif gene cluster in AMD microbial communities. In this study, we used metagenomic sequencing to isolate nif genes in the AMD microbial community from Dexing Copper Mine, China. Meanwhile, a metagenome microarray containing 7,776 large-insertion fosmids was constructed to screen novel nif gene clusters. Metagenomic analyses revealed that 742 sequences were identified as nif genes including structural subunit genes nifH, nifD, nifK and various additional genes. The AMD community is massively dominated by the genus Acidithiobacillus. However, the phylogenetic diversity of nitrogen-fixing microorganisms is much higher than previously thought in the AMD community. Furthermore, a 32.5-kb genomic sequence harboring nif, fix and associated genes was screened by metagenome microarray. Comparative genome analysis indicated that most nif genes in this cluster are most similar to those of Herbaspirillum seropedicae, but the organization of the nif gene cluster had significant differences from H. seropedicae. Sequence analysis and reverse transcription PCR also suggested that distinct transcription units of nif genes exist in this gene cluster. nifQ gene falls into the same transcription unit with fixABCX genes, which have not been reported in other diazotrophs before. All of these results indicated that more novel diazotrophs survive in the AMD community.

  18. Identification of nitrogen-fixing genes and gene clusters from metagenomic library of acid mine drainage.

    Directory of Open Access Journals (Sweden)

    Zhimin Dai

    Full Text Available Biological nitrogen fixation is an essential function of acid mine drainage (AMD) microbial communities. However, most acidophiles in AMD environments are uncultured microorganisms and little is known about the diversity of nitrogen-fixing genes and structure of nif gene cluster in AMD microbial communities. In this study, we used metagenomic sequencing to isolate nif genes in the AMD microbial community from Dexing Copper Mine, China. Meanwhile, a metagenome microarray containing 7,776 large-insertion fosmids was constructed to screen novel nif gene clusters. Metagenomic analyses revealed that 742 sequences were identified as nif genes including structural subunit genes nifH, nifD, nifK and various additional genes. The AMD community is massively dominated by the genus Acidithiobacillus. However, the phylogenetic diversity of nitrogen-fixing microorganisms is much higher than previously thought in the AMD community. Furthermore, a 32.5-kb genomic sequence harboring nif, fix and associated genes was screened by metagenome microarray. Comparative genome analysis indicated that most nif genes in this cluster are most similar to those of Herbaspirillum seropedicae, but the organization of the nif gene cluster had significant differences from H. seropedicae. Sequence analysis and reverse transcription PCR also suggested that distinct transcription units of nif genes exist in this gene cluster. nifQ gene falls into the same transcription unit with fixABCX genes, which have not been reported in other diazotrophs before. All of these results indicated that more novel diazotrophs survive in the AMD community.

  19. Interactions between HMG proteins and the core sequence of DNaseI hypersensitive site 2 in the locus control region (LCR) of the human β-like globin gene cluster

    Institute of Scientific and Technical Information of China (English)

    2000-01-01

    HMG proteins are abundant chromosomal non-histone proteins. It has been suggested that the HMG proteins may play an important role in the structure and function of chromatin. In the present study, the binding of HMG proteins (HMG1/2 and HMG14/17) to the core DNA sequence of DNaseI hypersensitive site 2 (HS2core DNA sequence, -10681--10970 bp) in the locus control region (LCR) of the human β-like globin gene cluster has been examined by using both the in vitro nucleosome reconstitution and the gel mobility shift assays. Here we show that HMG1/2 can bind to the naked HS2core DNA sequence, whereas HMG14/17 cannot. Using the in vitro nucleosome reconstitution we demonstrate that HMG14/17 can bind to the HS2core DNA sequence which is assembled into nucleosomes with the core histone octamer transferred from chicken erythrocytes. In contrast, HMG1/2 cannot bind to the nucleosomes reconstituted in vitro with the HS2core DNA sequence. These results indicate that the binding patterns between HMG proteins and the HS2core DNA sequence which exists in different states (the naked DNA or the in vitro reconstituted nucleosomal DNA) are quite different. We speculate that HMG proteins might play a critical role in the regulation of the human β-like globin gene's expression.

  1. Globular Cluster Systems along the Hubble Sequence

    Science.gov (United States)

    Huizinga, Edwin

    1996-07-01

    Globular Cluster Systems (GCSs) provide a powerful tool to differentiate between competing galaxy formation and evolution scenarios. However, our current knowledge of GCSs in spiral galaxies is based mainly on studies of the Galaxy and M31. Even though GCSs have been detected in other spiral galaxies, ground-based observations barely reach the peak of the globular cluster luminosity function, and do not provide accurate colors. We propose a systematic study of the GCSs in 6 edge-on L* spiral galaxies beyond the Local Group, using WFPC2. These galaxies were carefully selected to meet several stringent criteria. With the new dithering techniques, it will be possible to resolve any faint background galaxies and obtain a clean sample of globular clusters for all galaxies in our sample. This will allow us to study the complete luminosity functions, (V-I) color distributions, and GCS richness for L* galaxies as a function of Hubble type (Sa, Sb, Sc). These data will be used to study the relations between the galaxies' bulge and (thin/thick) disk properties and their GCSs. If, for example, GCS properties correlate with bulge properties, this will rule out any strong evolution along the Hubble Sequence towards earlier type spirals, from Sc to Sa, as has recently been proposed by Pfenniger et al. (1994).

  2. GeneTack database: genes with frameshifts in prokaryotic genomes and eukaryotic mRNA sequences.

    Science.gov (United States)

    Antonov, Ivan; Baranov, Pavel; Borodovsky, Mark

    2013-01-01

    Database annotations of prokaryotic genomes and eukaryotic mRNA sequences pay relatively low attention to frame transitions that disrupt protein-coding genes. Frame transitions (frameshifts) could be caused by sequencing errors or indel mutations inside protein-coding regions. Other observed frameshifts are related to recoding events (that evolved to control expression of some genes). Earlier, we have developed an algorithm and software program GeneTack for ab initio frameshift finding in intronless genes. Here, we describe a database (freely available at http://topaz.gatech.edu/GeneTack/db.html) containing genes with frameshifts (fs-genes) predicted by GeneTack. The database includes 206 991 fs-genes from 1106 complete prokaryotic genomes and 45 295 frameshifts predicted in mRNA sequences from 100 eukaryotic genomes. The whole set of fs-genes was grouped into clusters based on sequence similarity between fs-proteins (conceptually translated fs-genes), conservation of the frameshift position and frameshift direction (-1, +1). The fs-genes can be retrieved by similarity search to a given query sequence via a web interface, by fs-gene cluster browsing, etc. Clusters of fs-genes are characterized with respect to their likely origin, such as pseudogenization, phase variation, etc. The largest clusters contain fs-genes with programed frameshifts (related to recoding events).

  3. An alanine tRNA gene cluster from Nephila clavipes.

    Science.gov (United States)

    Luciano, E; Candelas, G C

    1996-06-01

    We report the sequence of a 2.3-kb genomic DNA fragment from the orb-web spider, Nephila clavipes (Nc). The fragment contains four regions of high homology to tRNA(Ala). The members of this irregularly spaced cluster of genes are oriented in the same direction and have the same anticodon (GCA), but their sequence differs at several positions. Initiation and termination signals, as well as consensus intragenic promoter sequences characteristic of tRNA genes, have been identified in all genes. tRNA(Ala) are involved in the regulation of the fibroin synthesis in the large ampullate Nc glands.

  4. Unsupervised statistical clustering of environmental shotgun sequences

    Directory of Open Access Journals (Sweden)

    Bhatnagar Srijak

    2009-10-01

    Full Text Available Abstract Background The development of effective environmental shotgun sequence binning methods remains an ongoing challenge in algorithmic analysis of metagenomic data. While previous methods have focused primarily on supervised learning involving extrinsic data, a first-principles statistical model combined with a self-training fitting method has not yet been developed. Results We derive an unsupervised, maximum-likelihood formalism for clustering short sequences by their taxonomic origin on the basis of their k-mer distributions. The formalism is implemented using a Markov Chain Monte Carlo approach in a k-mer feature space. We introduce a space transformation that reduces the dimensionality of the feature space and a genomic fragment divergence measure that strongly correlates with the method's performance. Pairwise analysis of over 1000 completely sequenced genomes reveals that the vast majority of genomes have sufficient genomic fragment divergence to be amenable for binning using the present formalism. Using a high-performance implementation, the binner is able to classify fragments as short as 400 nt with accuracy over 90% in simulations of low-complexity communities of 2 to 10 species, given sufficient genomic fragment divergence. The method is available as an open source package called LikelyBin. Conclusion An unsupervised binning method based on statistical signatures of short environmental sequences is a viable stand-alone binning method for low complexity samples. For medium and high complexity samples, we discuss the possibility of combining the current method with other methods as part of an iterative process to enhance the resolving power of sorting reads into taxonomic and/or functional bins.
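
    The k-mer distributions on which this kind of binning relies can be illustrated with a short sketch. It is not taken from LikelyBin; the function name `kmer_profile` and the default `k=4` are assumptions made here for illustration, and the sketch only builds the feature vector, not the maximum-likelihood clustering itself.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k=4):
    """Return the normalized k-mer frequency vector of a DNA sequence."""
    seq = seq.upper()
    # Count every overlapping window of length k made up only of A, C, G, T.
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    )
    total = sum(counts.values())
    # Fixed ordering of all 4**k possible k-mers so that profiles are comparable.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    return [counts[km] / total if total else 0.0 for km in kmers]
```

    Two fragments can then be compared through their profiles, for example with a Euclidean or likelihood-based distance; which measure the published method uses is not stated in the abstract.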

  5. Mining Bacterial Genomes for Secondary Metabolite Gene Clusters.

    Science.gov (United States)

    Adamek, Martina; Spohn, Marius; Stegmann, Evi; Ziemert, Nadine

    2017-01-01

    With the emergence of bacterial resistance against frequently used antibiotics, novel antibacterial compounds are urgently needed. Traditional bioactivity-guided drug discovery strategies involve laborious screening efforts and display high rediscovery rates. With the progress in next generation sequencing methods and the knowledge that the majority of antibiotics in clinical use are produced as secondary metabolites by bacteria, mining bacterial genomes for secondary metabolites with antimicrobial activity is a promising approach, which can guide a more time- and cost-effective identification of novel compounds. However, what sounds easy to accomplish comes with several challenges. To date, several tools for the prediction of secondary metabolite gene clusters are available, some of which are based on the detection of signature genes, while others search for specific patterns in gene content or regulation. Apart from the mere identification of gene clusters, several other factors such as determining cluster boundaries and assessing the novelty of the detected cluster are important. For this purpose, comparison of the predicted secondary metabolite genes with different cluster and compound databases is necessary. Furthermore, it is advisable to classify detected clusters into gene cluster families. So far, there is no standardized procedure for genome mining; however, different approaches to overcome all of these challenges exist and are addressed in this chapter. We give practical guidance on the workflow for secondary metabolite gene cluster identification, which includes the determination of gene cluster boundaries, addresses problems occurring with the use of draft genomes, and gives an outlook on the different methods for gene cluster classification. Based on comprehensible examples a protocol is set, which should enable the readers to mine their own genome data for interesting secondary metabolites.

  6. Genomic Analyses of Bacterial Porin-Cytochrome Gene Clusters

    Directory of Open Access Journals (Sweden)

    Liang Shi

    2014-11-01

    Full Text Available The porin-cytochrome (Pcc) protein complex is responsible for trans-outer membrane electron transfer during extracellular reduction of Fe(III) by the dissimilatory metal-reducing bacterium Geobacter sulfurreducens PCA. The identified and characterized Pcc complex of G. sulfurreducens PCA consists of a porin-like outer-membrane protein, a periplasmic 8-heme c-type cytochrome (c-Cyt) and an outer-membrane 12-heme c-Cyt, and the genes encoding the Pcc proteins are clustered in the same regions of the genome (i.e., the pcc gene clusters) of G. sulfurreducens PCA. A survey of additional microbial genomes has identified the pcc gene clusters in all sequenced Geobacter spp. and other bacteria from six different phyla, including Anaeromyxobacter dehalogenans 2CP-1, A. dehalogenans 2CP-C, Anaeromyxobacter sp. K, Candidatus Kuenenia stuttgartiensis, Denitrovibrio acetiphilus DSM 12809, Desulfurispirillum indicum S5, Desulfurivibrio alkaliphilus AHT2, Desulfurobacterium thermolithotrophum DSM 11699, Desulfuromonas acetoxidans DSM 684, Ignavibacterium album JCM 16511, and Thermovibrio ammonificans HB-1. The numbers of genes in the pcc gene clusters vary, ranging from two to nine. Similar to the metal-reducing (Mtr) gene clusters of other Fe(III)-reducing bacteria, such as Shewanella spp., additional genes that encode putative c-Cyts with predicted cellular localizations at the cytoplasmic membrane, periplasm and outer membrane often associate with the pcc gene clusters. This suggests that the Pcc-associated c-Cyts may be part of the pathways for extracellular electron transfer reactions. The presence of pcc gene clusters in microorganisms that do not reduce solid-phase Fe(III) and Mn(IV) oxides, such as D. alkaliphilus AHT2 and I. album JCM 16511, also suggests that some of the pcc gene clusters may be involved in extracellular electron transfer reactions with substrates other than Fe(III) and Mn(IV) oxides.

  7. Interactions between HMG proteins and the core sequence of DNaseI hypersensitive site 2 in the locus control region (LCR) of the human β-like globin gene cluster

    Institute of Scientific and Technical Information of China (English)

    赵晖; 张树冰; 蒋俶; 钱若兰

    2000-01-01

    HMG proteins are abundant chromosomal non-histone proteins. It has been suggested that the HMG proteins may play an important role in the structure and function of chromatin. In the present study, the binding of HMG proteins (HMG1/2 and HMG14/17) to the core DNA sequence of DNaseI hypersensitive site 2 (HS2core DNA sequence, -10681-10970 bp) in the locus control region (LCR) of the human β-like globin gene cluster has been examined by using both in vitro nucleosome reconstitution and gel mobility shift assays. Here we show that HMG1/2 can bind to the naked HS2core DNA sequence, whereas HMG14/17 cannot. Using in vitro nucleosome reconstitution, we demonstrate that HMG14/17 can bind to the HS2core DNA sequence when it is assembled into nucleosomes with the core histone octamer transferred from chicken erythrocytes. In contrast, HMG1/2 cannot bind to the nucleosomes reconstituted in vitro with the HS2core DNA sequence. These results indicate that the binding patterns between HMG proteins and t

  8. Cluster growing process and a sequence of magic numbers

    DEFF Research Database (Denmark)

    Solov'yov, Ilia; Solov'yov, Andrey V.; Greiner, Walter

    2003-01-01

    We present a new theoretical framework for modeling the cluster growing process. Starting from the initial tetrahedral cluster configuration, adding new atoms to the system, and absorbing its energy at each step, we find cluster growing paths up to the cluster sizes of more than 100 atoms. We demonstrate that in this way all known global minimum structures of the Lennard-Jones (LJ) clusters can be found. Our method provides an efficient tool for the calculation and analysis of atomic cluster structure. With its use we justify the magic number sequence for the clusters of noble gas atoms and compare it with experimental observations.

  9. Coelacanth genome sequence reveals the evolutionary history of vertebrate genes.

    Science.gov (United States)

    Noonan, James P; Grimwood, Jane; Danke, Joshua; Schmutz, Jeremy; Dickson, Mark; Amemiya, Chris T; Myers, Richard M

    2004-12-01

    The coelacanth is one of the nearest living relatives of tetrapods. However, a teleost species such as zebrafish or Fugu is typically used as the outgroup in current tetrapod comparative sequence analyses. Such studies are complicated by the fact that teleost genomes have undergone a whole-genome duplication event, as well as individual gene-duplication events. Here, we demonstrate the value of coelacanth genome sequence by complete sequencing and analysis of the protocadherin gene cluster of the Indonesian coelacanth, Latimeria menadoensis. We found that coelacanth has 49 protocadherin cluster genes organized in the same three ordered subclusters, alpha, beta, and gamma, as the 54 protocadherin cluster genes in human. In contrast, whole-genome and tandem duplications have generated two zebrafish protocadherin clusters comprised of at least 97 genes. Additionally, zebrafish protocadherins are far more prone to homogenizing gene conversion events than coelacanth protocadherins, suggesting that recombination- and duplication-driven plasticity may be a feature of teleost genomes. Our results indicate that coelacanth provides the ideal outgroup sequence against which tetrapod genomes can be measured. We therefore present L. menadoensis as a candidate for whole-genome sequencing.

  10. Evolutionary conservation of regulatory elements in vertebrate HOX gene clusters

    Energy Technology Data Exchange (ETDEWEB)

    Santini, Simona; Boore, Jeffrey L.; Meyer, Axel

    2003-12-31

    Due to their high degree of conservation, comparisons of DNA sequences among evolutionarily distantly related genomes permit the identification of functional regions in noncoding DNA. Hox genes are optimal candidate sequences for comparative genome analyses, because they are extremely conserved in vertebrates and occur in clusters. We aligned (Pipmaker) the nucleotide sequences of HoxA clusters of tilapia, pufferfish, striped bass, zebrafish, horn shark, human and mouse (over 500 million years of evolutionary distance). We identified several highly conserved intergenic sequences, likely to be important in gene regulation. Only a few of these putative regulatory elements have been previously described as being involved in the regulation of Hox genes, while several others are new elements that might have regulatory functions. The majority of these newly identified putative regulatory elements contain short fragments that are almost completely conserved and are identical to known binding sites for regulatory proteins (Transfac). The conserved intergenic regions located between the most rostrally expressed genes in the developing embryo are longer and better retained through evolution. We document that presumed regulatory sequences are retained differentially in either Aα or Aβ clusters resulting from a genome duplication in the fish lineage. This observation supports both the hypothesis that the conserved elements are involved in gene regulation and the Duplication-Deletion-Complementation model.

  11. Secondary metabolic gene clusters: evolutionary toolkits for chemical innovation.

    Science.gov (United States)

    Osbourn, Anne

    2010-10-01

    Microbes and plants produce a huge array of secondary metabolites that have important ecological functions. These molecules have long been exploited in medicine as antibiotics, anticancer and anti-infective agents and for a wide range of other applications. Gene clusters for secondary metabolic pathways are common in bacteria and filamentous fungi, and examples have now been discovered in plants. Here, current knowledge of gene clusters across the kingdoms is evaluated with the aim of trying to understand the rules behind cluster existence and evolution. Such knowledge will be crucial in learning how to activate the enormous number of 'silent' gene clusters being revealed by whole-genome sequencing and hence in making available a wealth of novel compounds for evaluation as drug leads and other bioactives. It could also facilitate the development of crop plants with enhanced pest or disease resistance, improved nutritional qualities and/or elevated levels of high-value products.

  12. Comparative genomic analysis of sixty mycobacteriophage genomes: Genome clustering, gene acquisition and gene size

    Science.gov (United States)

    Hatfull, Graham F.; Jacobs-Sera, Deborah; Lawrence, Jeffrey G.; Pope, Welkin H.; Russell, Daniel A.; Ko, Ching-Chung; Weber, Rebecca J.; Patel, Manisha C.; Germane, Katherine L.; Edgar, Robert H.; Hoyte, Natasha N.; Bowman, Charles A.; Tantoco, Anthony T.; Paladin, Elizabeth C.; Myers, Marlana S.; Smith, Alexis L.; Grace, Molly S.; Pham, Thuy T.; O'Brien, Matthew B.; Vogelsberger, Amy M.; Hryckowian, Andrew J.; Wynalek, Jessica L.; Donis-Keller, Helen; Bogel, Matt W.; Peebles, Craig L.; Cresawn, Steve G.; Hendrix, Roger W.

    2010-01-01

    Mycobacteriophages are viruses that infect mycobacterial hosts. Expansion of a collection of sequenced phage genomes to a total of sixty – all infecting a common bacterial host – provides further insight into their diversity and evolution. Of the sixty phage genomes, 55 can be grouped into nine clusters according to their nucleotide sequence similarities, five of which can be further divided into subclusters; five genomes do not cluster with other phages. The sequence diversity between genomes within a cluster varies greatly; for example, the six genomes in cluster D share more than 97.5% average nucleotide similarity with each other. In contrast, similarity between the two genomes in Cluster I is barely detectable by diagonal plot analysis. The total of 6,858 predicted ORFs have been grouped into 1523 phamilies (phams) of related sequences, 46% of which possess only a single member. Only 18.8% of the phams have sequence similarity to non-mycobacteriophage database entries and fewer than 10% of all phams can be assigned functions based on database searching or synteny. Genome clustering facilitates the identification of genes that are in greatest genetic flux and are more likely to have been exchanged horizontally in relatively recent evolutionary time. Although mycobacteriophage genes exhibit smaller average size than genes of their host (205 residues compared to 315), phage genes in higher flux average only ∼100 amino acids, suggesting that the primary units of genetic exchange correspond to single protein domains. PMID:20064525

  13. Predicting gene expression from sequence: a reexamination.

    Directory of Open Access Journals (Sweden)

    Yuan Yuan

    2007-11-01

    Full Text Available Although much of the information regarding gene expression is encoded in the genome, deciphering such information has been very challenging. We reexamined Beer and Tavazoie's (BT) approach to predict mRNA expression patterns of 2,587 genes in Saccharomyces cerevisiae from the information in their respective promoter sequences. Instead of fitting complex Bayesian network models, we trained naïve Bayes classifiers using only the sequence-motif matching scores provided by BT. Our simple models correctly predict expression patterns for 79% of the genes, based on the same criterion and the same cross-validation (CV) procedure as BT, which compares favorably to the 73% accuracy of BT. The fact that our approach did not use position and orientation information of the predicted binding sites but achieved a higher prediction accuracy motivated us to investigate a few biological predictions made by BT. We found that some of their predictions, especially those related to motif orientations and positions, are at best circumstantial. For example, the combinatorial rules suggested by BT for the PAC and RRPE motifs are not unique to the cluster of genes from which the predictive model was inferred, and there are simpler rules that are statistically more significant than BT's. We also show that the CV procedure used by BT to estimate their method's prediction accuracy is inappropriate and may have overestimated the prediction accuracy by about 10%.

  14. Clustering of gene ontology terms in genomes.

    Science.gov (United States)

    Tiirikka, Timo; Siermala, Markku; Vihinen, Mauno

    2014-10-25

    Although protein coding genes occupy only a small fraction of genomes in higher species, they are not randomly distributed within or between chromosomes. Clustering of genes with related function(s) and/or characteristics has been evident at several different levels. To study how common the clustering of functionally related genes is and what kind of functions the end products of these genes are involved in, we collected gene ontology (GO) terms for complete genomes and developed a method to detect previously undefined gene clustering. Exhaustive analysis was performed for seven widely studied species ranging from human to Escherichia coli. To overcome problems related to varying gene lengths and densities, a novel method was developed and a fixed number of genes were analyzed irrespective of the genome span covered. Statistically very significant GO term clustering was apparent in all the investigated genomes. The analysis window, which ranged from 5 to 50 consecutive genes, revealed extensive GO term clusters for genes with widely varying functions. Here, the most interesting and significant results are discussed and the complete dataset for each analyzed species is available at the GOme database at http://bioinf.uta.fi/GOme. The results indicated that clusters of genes with related functions are very common, not only in bacteria, in which operons are frequent, but also in all the studied species irrespective of how complex they are. There are some differences between species but in all of them GO term clusters are common and of widely differing sizes. The presented method can be applied to analyze any genome or part of a genome for which descriptive features are available, and thus is not restricted to ontology terms. This method can also be applied to investigate gene and protein expression patterns. The results pave a way for further studies of mechanisms that shape genome structure and evolutionary forces related to them.
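
    The idea of analyzing a fixed number of consecutive genes, rather than a fixed genomic span, can be sketched as follows. This is a minimal illustration and not the GOme implementation: the function name `go_term_window_counts`, the input layout (one set of GO terms per gene, in chromosomal order) and the default window of 5 genes are assumptions, and the statistical significance testing described in the abstract is not reproduced.

```python
from collections import Counter


def go_term_window_counts(gene_terms, window=5):
    """For each GO term, return the largest number of genes carrying it
    within any run of `window` consecutive genes."""
    best = Counter()
    # Slide a window of a fixed number of genes along the gene order.
    for start in range(max(len(gene_terms) - window + 1, 1)):
        in_window = Counter()
        for terms in gene_terms[start:start + window]:
            in_window.update(set(terms))  # count each gene once per term
        for term, n in in_window.items():
            best[term] = max(best[term], n)
    return dict(best)
```

    A term whose maximum window count is high relative to its genome-wide frequency would be a candidate cluster; deciding what counts as high requires a null model, which this sketch leaves out.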

  15. An Incremental Algorithm of Text Clustering Based on Semantic Sequences

    Institute of Scientific and Technical Information of China (English)

    FENG Zhonghui; SHEN Junyi; BAO Junpeng

    2006-01-01

    This paper proposed an incremental text clustering algorithm based on semantic sequences. Using the similarity relation of semantic sequences and calculating the cover of the set of similar semantic sequences, the algorithm selects the candidate cluster with the minimum entropy overlap value as a result cluster at each step. The comparison of experimental results shows that the precision of the algorithm is higher than that of other algorithms under the same conditions, and that this is especially evident on sets of long documents.

  16. Minimum Information about a Biosynthetic Gene cluster

    NARCIS (Netherlands)

    Medema, M.H.; Kottmann, Renzo; Yilmaz, Pelin; Cummings, Matthew; Biggins, J.B.; Blin, Kai; Bruijn, De Irene; Chooi, Yit Heng; Claesen, Jan; Coates, R.C.; Cruz-Morales, Pablo; Duddela, Srikanth; Düsterhus, Stephanie; Edwards, Daniel J.; Fewer, David P.; Garg, Neha; Geiger, Christoph; Gomez-Escribano, Juan Pablo; Greule, Anja; Hadjithomas, Michalis; Haines, Anthony S.; Helfrich, Eric J.N.; Hillwig, Matthew L.; Ishida, Keishi; Jones, Adam C.; Jones, Carla S.; Jungmann, Katrin; Kegler, Carsten; Kim, Hyun Uk; Kötter, Peter; Krug, Daniel; Masschelein, Joleen; Melnik, Alexey V.; Mantovani, Simone M.; Monroe, Emily A.; Moore, Marcus; Moss, Nathan; Nützmann, Hans Wilhelm; Pan, Guohui; Pati, Amrita; Petras, Daniel; Reen, F.J.; Rosconi, Federico; Rui, Zhe; Tian, Zhenhua; Tobias, Nicholas J.; Tsunematsu, Yuta; Wiemann, Philipp; Wyckoff, Elizabeth; Yan, Xiaohui; Yim, Grace; Yu, Fengan; Xie, Yunchang; Aigle, Bertrand; Apel, Alexander K.; Balibar, Carl J.; Balskus, Emily P.; Barona-Gómez, Francisco; Bechthold, Andreas; Bode, Helge B.; Borriss, Rainer; Brady, Sean F.; Brakhage, Axel A.; Caffrey, Patrick; Cheng, Yi Qiang; Clardy, Jon; Cox, Russell J.; Mot, De René; Donadio, Stefano; Donia, Mohamed S.; Donk, Van Der Wilfred A.; Dorrestein, Pieter C.; Doyle, Sean; Driessen, Arnold J.M.; Ehling-Schulz, Monika; Entian, Karl Dieter; Fischbach, Michael A.; Gerwick, Lena; Gerwick, William H.; Gross, Harald; Gust, Bertolt; Hertweck, Christian; Höfte, Monica; Jensen, Susan E.; Ju, Jianhua; Katz, Leonard; Kaysser, Leonard; Klassen, Jonathan L.; Keller, Nancy P.; Kormanec, Jan; Kuipers, Oscar P.; Kuzuyama, Tomohisa; Kyrpides, Nikos C.; Kwon, Hyung Jin; Lautru, Sylvie; Lavigne, Rob; Lee, Chia Y.; Linquan, Bai; Liu, Xinyu; Liu, Wen; Luzhetskyy, Andriy; Mahmud, Taifo; Mast, Yvonne; Méndez, Carmen; Metsä-Ketelä, Mikko; Micklefield, Jason; Mitchell, Douglas A.; Moore, Bradley S.; Moreira, Leonilde M.; Müller, Rolf; Neilan, Brett A.; Nett, Markus; Nielsen, Jens; O'Gara, Fergal; Oikawa, Hideaki; 
Osbourn, Anne; Osburne, Marcia S.; Ostash, Bohdan; Payne, Shelley M.; Pernodet, Jean Luc; Petricek, Miroslav; Piel, Jörn; Ploux, Olivier; Raaijmakers, Jos M.; Salas, José A.; Schmitt, Esther K.; Scott, Barry; Seipke, Ryan F.; Shen, Ben; Sherman, David H.; Sivonen, Kaarina; Smanski, Michael J.; Sosio, Margherita; Stegmann, Evi; Süssmuth, Roderich D.; Tahlan, Kapil; Thomas, Christopher M.; Tang, Yi; Truman, Andrew W.; Viaud, Muriel; Walton, Jonathan D.; Walsh, Christopher T.; Weber, Tilmann; Wezel, Van Gilles P.; Wilkinson, Barrie; Willey, Joanne M.; Wohlleben, Wolfgang; Wright, Gerard D.; Ziemert, Nadine; Zhang, Changsheng; Zotchev, Sergey B.; Breitling, Rainer; Takano, Eriko; Glöckner, Frank Oliver

    2015-01-01

    A wide variety of enzymatic pathways that produce specialized metabolites in bacteria, fungi and plants are known to be encoded in biosynthetic gene clusters. Information about these clusters, pathways and metabolites is currently dispersed throughout the literature, making it difficult to exploi

  17. Identification and structural analysis of a novel snoRNA gene cluster from Arabidopsis thaliana

    Institute of Scientific and Technical Information of China (English)

    2000-01-01

    A Z2 snoRNA gene cluster, consisting of four antisense snoRNA genes, was identified from Arabidopsis thaliana. The sequence and structural analysis showed that the Z2 snoRNA gene cluster might be transcribed as a polycistronic precursor from an upstream promoter, and that the intergenic spacers of the gene cluster encode 'hairpin' structures similar to the processing recognition signals of the polycistronic snoRNA precursor of the yeast Saccharomyces cerevisiae. The results also revealed that the presence of multiple copies is a common characteristic of plant snoRNA genes, and that the cluster provides a good system for further revealing the transcription and expression mechanism of plant snoRNA gene clusters.

  18. Maximum-likelihood model averaging to profile clustering of site types across discrete linear sequences.

    Directory of Open Access Journals (Sweden)

    Zhang Zhang

    2009-06-01

    Full Text Available A major analytical challenge in computational biology is the detection and description of clusters of specified site types, such as polymorphic or substituted sites within DNA or protein sequences. Progress has been stymied by a lack of suitable methods to detect clusters and to estimate the extent of clustering in discrete linear sequences, particularly when there is no a priori specification of cluster size or cluster count. Here we derive and demonstrate a maximum likelihood method of hierarchical clustering. Our method incorporates a tripartite divide-and-conquer strategy that models sequence heterogeneity, delineates clusters, and yields a profile of the level of clustering associated with each site. The clustering model may be evaluated via model selection using the Akaike Information Criterion, the corrected Akaike Information Criterion, and the Bayesian Information Criterion. Furthermore, model averaging using weighted model likelihoods may be applied to incorporate model uncertainty into the profile of heterogeneity across sites. We evaluated our method by examining its performance on a number of simulated datasets as well as on empirical polymorphism data from diverse natural alleles of the Drosophila alcohol dehydrogenase gene. Our method yielded greater power for the detection of clustered sites across a breadth of parameter ranges, and achieved better accuracy and precision of estimation of clusters, than did the existing empirical cumulative distribution function statistics.
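
    The abstract names three information criteria without giving formulas. For reference, their standard textbook definitions are given below, for a model with k free parameters, maximized likelihood \hat{L} and n observations. The final expression, the Akaike weight of model i among M candidate models, is one common way to weight models for averaging; that the authors use exactly this weighting is an assumption, not something the abstract states.

```latex
\begin{align}
\mathrm{AIC}  &= 2k - 2\ln\hat{L} \\
\mathrm{AICc} &= \mathrm{AIC} + \frac{2k(k+1)}{n-k-1} \\
\mathrm{BIC}  &= k\ln n - 2\ln\hat{L} \\
w_i &= \frac{\exp(-\Delta_i/2)}{\sum_{j=1}^{M}\exp(-\Delta_j/2)},
\qquad \Delta_i = \mathrm{AIC}_i - \min_{j}\mathrm{AIC}_j
\end{align}
```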

  20. Clustering of diverse replicated sequences in the MHC: Evidence for en bloc duplication

    Energy Technology Data Exchange (ETDEWEB)

    Leelayuwat, C.; Pinelli, M. [Univ. Western Australia, Perth (Australia)]; Dawkins, R.L. [Royal Perth Hospital (Australia)]

    1995-07-15

    The MHC contains clusters of polymorphic duplicated genes and gene sequences. It has been thought that these duplicated genes and sequences have arisen from single gene duplications. We compared the cloned region between TNF and HLA-B with the region in close proximity to HLA-A using sequence analysis and DNA hybridization. The results indicate that several sequences existing in the region centromeric of HLA-B are also present in close proximity to HLA-A. These include sequences belonging to the P5, BAT1, and PERB11 gene families as well as HLA class I gene sequences. Interestingly, when the two regions of approximately 200 kilobases are compared, the replicated sequences are organized similarly but in an inverted fashion suggesting the existence of an historical inverted en bloc duplication. Thus, we propose that the origin of these MHC gene clusters involves several mechanisms. In addition to single gene replication, a long-range duplication of a genomic block must have occurred. It is possible that a block at the telomeric end of the MHC represents a basic functional genomic unit conserved and duplicated en bloc. 49 refs., 3 figs., 3 tabs.

  1. Joint Sequence Analysis: Association and Clustering

    Science.gov (United States)

    Piccarreta, Raffaella

    2017-01-01

    In its standard formulation, sequence analysis aims at finding typical patterns in a set of life courses represented as sequences. Recently, some proposals have been introduced to jointly analyze sequences defined on different domains (e.g., work career, partnership, and parental histories). We introduce measures to evaluate whether a set of…

  2. Evolution of homeobox gene clusters in animals: the Giga-cluster and primary versus secondary clustering.

    Directory of Open Access Journals (Sweden)

    David Ellard Keith Ferrier

    2016-04-01

    Full Text Available The Hox gene cluster has been a major focus in evolutionary developmental biology. This is because of its key role in patterning animal development and widespread examples of changes in Hox genes being linked to the evolution of animal body plans and morphologies. Also, the distinctive organisation of the Hox genes into genomic clusters in which the order of the genes along the chromosome corresponds to the order of their activity along the embryo, or during a developmental process, has been a further source of great interest. This is known as Colinearity, and it provides a clear link between genome organisation and the regulation of genes during development, with distinctive changes marking evolutionary transitions. The Hox genes are not alone, however. The homeobox genes are a large super-class, of which the Hox genes are only a small subset, and an ever-increasing number of further gene clusters besides the Hox are being discovered. This is of great interest because of the potential for such gene clusters to help understand major evolutionary transitions, both in terms of changes to development and morphology as well as evolution of genome organisation. However, there is uncertainty in our understanding of homeobox gene cluster evolution at present. This relates to our still rudimentary understanding of the dynamics of genome rearrangements and evolution over the evolutionary timescales being considered when we compare lineages from across the animal kingdom. A major goal is to deduce whether particular instances of clustering are primary (conserved from ancient ancestral clusters) or secondary (reassortment of genes into clusters in lineage-specific fashion). The following summary of the various instances of homeobox gene clusters in animals, and the hypotheses about their evolution, provides a framework for the future resolution of this uncertainty.

  3. DNA splice site sequences clustering method for conservativeness analysis

    Institute of Scientific and Technical Information of China (English)

    Quanwei Zhang; Qinke Peng; Tao Xu

    2009-01-01

    DNA sequences near splice sites are remarkably conserved, and many researchers have contributed to the prediction of splice sites. In order to mine the underlying biological knowledge, we analyze the conservativeness of sequences adjacent to DNA splice sites by clustering. Firstly, we propose a clustering method for DNA splice site sequences that is based on DBSCAN and uses four kinds of dissimilarity calculation methods. Then, we analyze the conservative features of the clustering results and the experimental data set.
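
    A density-based clustering of short, equal-length sequences can be sketched as below. This is a plain textbook DBSCAN written for illustration, not the authors' method: the abstract does not specify its four dissimilarity measures, so the Hamming distance used here, along with the names `hamming` and `dbscan` and the default parameters, is an assumption.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def dbscan(seqs, eps=1, min_pts=2, dist=hamming):
    """Label each sequence with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(seqs)

    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j in range(len(seqs)) if dist(seqs[i], seqs[j]) <= eps]

    cluster = -1
    for i in range(len(seqs)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point, so keep expanding
    return labels
```

    Any other dissimilarity can be passed through the `dist` argument, which is how several measures could be compared on the same data set.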

  4. Evolution of orthologous tandemly arrayed gene clusters

    Directory of Open Access Journals (Sweden)

    Bertrand Denis

    2011-10-01

    Full Text Available Abstract Background Tandemly Arrayed Gene (TAG) clusters are groups of paralogous genes that are found adjacent on a chromosome. TAGs represent an important repertoire of genes in eukaryotes. In addition to tandem duplication events, TAG clusters are affected during their evolution by other mechanisms, such as inversion and deletion events, that affect the order and orientation of genes. The DILTAG algorithm developed in [1] makes it possible to infer a set of optimal evolutionary histories explaining the evolution of a single TAG cluster, from an ancestral single gene, through tandem duplications (simple or multiple, direct or inverted), deletions and inversion events. Results We present a general methodology, which is an extension of DILTAG, for the study of the evolutionary history of a set of orthologous TAG clusters in multiple species. In addition to the speciation events reflected by the phylogenetic tree of the considered species, the evolutionary events that are taken into account are simple or multiple tandem duplications, direct or inverted, simple or multiple deletions, and inversions. We analysed the performance of our algorithm on simulated data sets and we applied it to the protocadherin gene clusters of human, chimpanzee, mouse and rat. Conclusions Our results obtained on simulated data sets showed a good performance in inferring the total number and size distribution of duplication events. A limitation of the algorithm, however, is in dealing with multiple gene deletions, as the running time grows exponentially in this case and the problem quickly becomes intractable.

  5. Identification of the Scopularide Biosynthetic Gene Cluster in Scopulariopsis brevicaulis

    Directory of Open Access Journals (Sweden)

    Mie Bech Lukassen

    2015-07-01

    Full Text Available Scopularide A is a promising potent anticancer lipopeptide isolated from a marine derived Scopulariopsis brevicaulis strain. The compound consists of a reduced carbon chain (3-hydroxy-methyldecanoyl) attached to five amino acids (glycine, l-valine, d-leucine, l-alanine, and l-phenylalanine). Using the newly sequenced S. brevicaulis genome we were able to identify the putative biosynthetic gene cluster using genetic information from the structurally related emericellamide A from Aspergillus nidulans and W493-B from Fusarium pseudograminearum. The scopularide A gene cluster includes a nonribosomal peptide synthetase (NRPS1), a polyketide synthase (PKS2), a CoA ligase, an acyltransferase, and a transcription factor. Homologous recombination was low in S. brevicaulis so the local transcription factor was integrated randomly under a constitutive promoter, which led to a three to four-fold increase in scopularide A production. This indirectly verifies the identity of the proposed biosynthetic gene cluster.

  6. Customer Clustering Based on Customer Purchasing Sequence Data

    Directory of Open Access Journals (Sweden)

    Yen-Chung Liu

    2017-01-01

    Full Text Available Customer clustering has become a priority for enterprises because of the importance of customer relationship management. Customer clustering can improve understanding of the composition and characteristics of customers, thereby enabling the creation of appropriate marketing strategies for each customer group. Previously, different customer clustering approaches have been proposed according to data type, namely customer profile data, customer value data, customer transaction data, and customer purchasing sequence data. This paper considers the customer clustering problem in the context of customer purchasing sequence data. However, two major aspects distinguish this paper from past research: (1) in our model, a customer sequence contains itemsets, which is a more realistic configuration than previous models, which assume a customer sequence would merely consist of items; and (2) in our model, a customer may belong to multiple clusters or no cluster, whereas in existing models a customer is limited to only one cluster. The second difference implies that each cluster discovered using our model represents a crucial type of customer behavior and that a customer can exhibit several types of behavior simultaneously. Finally, extensive experiments are conducted through a retail data set, and the results show that the clusters obtained by our model can provide more accurate descriptions of customer purchasing behaviors.

  7. cis sequence effects on gene expression

    Directory of Open Access Journals (Sweden)

    Jacobs Kevin

    2007-08-01

    Full Text Available Abstract Background Sequence and transcriptional variability within and between individuals are typically studied independently. The joint analysis of sequence and gene expression variation (genetical genomics) provides insight into the role of linked sequence variation in the regulation of gene expression. We investigated the role of sequence variation in cis on gene expression (cis sequence effects) in a group of genes commonly studied in cancer research in lymphoblastoid cell lines. We estimated the proportion of genes exhibiting cis sequence effects and the proportion of gene expression variation explained by cis sequence effects using three different analytical approaches, and compared our results to the literature. Results We generated gene expression profiling data at N = 697 candidate genes from N = 30 lymphoblastoid cell lines for this study and used available candidate gene resequencing data at N = 552 candidate genes to identify N = 30 candidate genes with sufficient variance in both datasets for the investigation of cis sequence effects. We used two additive models and the haplotype phylogeny scanning approach of Templeton (Tree Scanning) to evaluate association between individual SNPs, all SNPs at a gene, and diplotypes, with log-transformed gene expression. SNPs and diplotypes at eight candidate genes exhibited statistically significant (p cis sequence effects in our study, respectively. Conclusion Based on analysis of our results and the extant literature, one in four genes exhibits significant cis sequence effects, and for these genes, about 30% of gene expression variation is accounted for by cis sequence variation. Despite diverse experimental approaches, the presence or absence of significant cis sequence effects is largely supported by previously published studies.

  8. Coordinated evolution of co-expressed gene clusters in the Drosophila transcriptome

    Directory of Open Access Journals (Sweden)

    Jones Corbin D

    2008-01-01

    Full Text Available Abstract Background Co-expression of genes that physically cluster together is a common characteristic of eukaryotic transcriptomes. This organization of transcriptomes suggests that coordinated evolution of gene expression for clustered genes may also be common. Clusters where expression evolution of each gene is not independent of their neighbors are important units for understanding transcriptome evolution. Results We used a common microarray platform to measure gene expression in seven closely related species in the Drosophila melanogaster subgroup, accounting for confounding effects of sequence divergence. To summarize the correlation structure among genes in a chromosomal region, we analyzed the fraction of variation along the first principal component of the correlation matrix. We analyzed the correlation for blocks of consecutive genes to assess patterns of correlation that may be manifest at different scales of coordinated expression. We find that expression of physically clustered genes does evolve in a coordinated manner in many locations throughout the genome. Our analysis shows that relatively few of these clusters are near heterochromatin regions and that these clusters tend to be over-dispersed relative to the rest of the genome. This suggests that these clusters are not the byproduct of local gene clustering. We also analyzed the pattern of co-expression among neighboring genes within a single Drosophila species: D. simulans. For the co-expression clusters identified within this species, we find an under-representation of genes displaying a signature of recurrent adaptive amino acid evolution consistent with previous findings. However, clusters displaying co-evolution of expression among species are enriched for adaptively evolving genes. This finding points to a tie between adaptive sequence evolution and evolution of the transcriptome. Conclusion Our results demonstrate that co-evolution of expression in gene clusters is

  9. A maize-specifically expressed gene cluster in Ustilago maydis.

    Science.gov (United States)

    Basse, Christoph W; Kolb, Sebastian; Kahmann, Regine

    2002-01-01

    The corn pathogen Ustilago maydis requires its host plant maize for development and completion of its sexual cycle. We have identified the fungal mig2-1 gene as being specifically expressed during this biotrophic stage. Intriguingly, mig2-1 is part of a gene cluster comprising five highly homologous and similarly regulated genes designated mig2-1 to mig2-5. Deletion analysis of the mig2-1 promoter provides evidence for negative and positive regulation. The predicted polypeptides of all five genes lack significant homologies to known genes but have characteristic N-terminal secretion sequences. The secretion signals of mig2-1 and mig2-5 were shown to be functional, and secretion of a full length Mig2-1-eGFP fusion protein to the extracellular space was demonstrated. The central domains of the Mig2 proteins are highly variable whereas the C-termini are strongly conserved and share a characteristic pattern of eight cysteine residues. The mig2 gene cluster was conserved in a wide collection of U. maydis strains. Interestingly, some U. maydis isolates from South America had lost the mig2-4 gene as a result of a homologous recombination event. Furthermore, the related Ustilago scitaminea strain, which is pathogenic on sugar cane, appears to lack the mig2 cluster. We describe a model of how the mig2 cluster might have evolved and discuss its possible role in governing host interaction.

  10. The Parallel Maximal Cliques Algorithm for Protein Sequence Clustering

    Directory of Open Access Journals (Sweden)

    Khalid Jaber

    2009-01-01

    Full Text Available Problem statement: Protein sequence clustering is a method used to discover relations between proteins. This method groups the proteins based on their common features. It is a core process in protein sequence classification. Graph theory has been used in protein sequence clustering as a means of partitioning the data into groups, where each group constitutes a cluster. Mohseni-Zadeh introduced a maximal cliques algorithm for protein clustering. Approach: In this study we adapted the maximal cliques algorithm of Mohseni-Zadeh to find cliques in protein sequences, and we then parallelized the algorithm to improve computation times and allow large protein databases to be processed. We used the N-Gram Hirschberg approach proposed by Abdul Rashid to calculate the distance between protein sequences. The task farming parallel program model was used to parallelize the enhanced cliques algorithm. Results: Our parallel maximal cliques algorithm was implemented on the stealth cluster using the C programming language and a hybrid approach that includes both the Message Passing Interface (MPI) library and POSIX threads (PThread) to accelerate protein sequence clustering. Conclusion: Our results showed a good speedup over sequential algorithms for cliques in protein sequences.
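
    To make the idea of clique-based clustering concrete, here is a small sequential Python sketch that enumerates maximal cliques in a similarity graph. It uses the standard Bron-Kerbosch procedure with pivoting, which is a swapped-in textbook technique and not necessarily the Mohseni-Zadeh algorithm the abstract refers to. The protein identifiers and edges are hypothetical, and the MPI/PThread parallelization is not shown.

```python
from typing import Dict, List, Set


def maximal_cliques(graph: Dict[str, Set[str]]) -> List[Set[str]]:
    """Enumerate all maximal cliques of an undirected graph.

    `graph` maps each vertex to the set of its neighbours; an edge is
    assumed to mean "these two sequences are closer than some threshold".
    """
    cliques: List[Set[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        # r: current clique, p: candidates that can extend it,
        # x: vertices already processed (used to reject non-maximal cliques).
        if not p and not x:
            cliques.append(set(r))
            return
        # Pivot on the vertex with the most neighbours among the candidates;
        # only non-neighbours of the pivot need to be tried.
        pivot = max(p | x, key=lambda v: len(graph[v] & p))
        for v in list(p - graph[pivot]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p.remove(v)
            x.add(v)

    expand(set(), set(graph), set())
    return cliques


# Hypothetical similarity graph over five protein identifiers.
graph = {
    "P1": {"P2", "P3"},
    "P2": {"P1", "P3"},
    "P3": {"P1", "P2", "P4"},
    "P4": {"P3"},
    "P5": set(),
}
clusters = sorted(sorted(c) for c in maximal_cliques(graph))
print(clusters)
```

    Note that a vertex such as P3 can appear in more than one clique, so cliques give overlapping groups rather than a strict partition. A task-farming variant could, for instance, hand independent top-level branches to separate workers, but that is left out of this sketch.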

  11. Protein sequence for clustering DNA based on Artificial Neural Networks

    Directory of Open Access Journals (Sweden)

    Gamal. F. Elhadi

    2012-01-01

    Full Text Available DNA is a nucleic acid that contains the genetic instructions used in the development and functioning of all known living organisms and some viruses. Clustering is a process that groups a set of objects into clusters so that the similarity among objects in the same cluster is high, while that among the objects in different clusters is low. In this paper, we propose an approach for clustering DNA sequences using the Self-Organizing Map (SOM) algorithm and protein sequences. The main objective is to analyze biological data and to group DNA into many clusters more easily and efficiently. We use the proposed approach to analyze both large and small numbers of input DNA sequences. The results show that the similarity of the sequences does not depend on the number of input sequences. Our approach depends on evaluating the degree of DNA sequence similarity using a hierarchical dendrogram representation. Representing a large amount of data as a hierarchical tree makes it possible to compare large sequences efficiently.

  12. Unusual Gene Order and Organization of the Sea Urchin Hox Cluster

    OpenAIRE

    Richardson, Paul M.; Lucas, Susan; Cameron, R. Andrew; Rowen, Lee; Nesbitt, Ryan; Bloom, Scott; Rast, Jonathan P.; Berney, Kevin; Arenas-Mena, Cesar; Martinez, Pedro; Davidson, Eric H.; Peterson, Kevin J.; Hood, Leroy

    2005-01-01

    The highly consistent gene order and axial colinear expression patterns found in vertebrate hox gene clusters are less well conserved across the rest of bilaterians. We report the first deuterostome instance of an intact hox cluster with a unique gene order where the paralog groups are not expressed in a sequential manner. The finished sequence from BAC clones from the genome of the sea urchin, Strongylocentrotus purpuratus, reveals a gene order wherein the anterior genes (Hox1, Hox2 and...

  13. Filtering Genes for Cluster and Network Analysis

    Directory of Open Access Journals (Sweden)

    Parkhomenko Elena

    2009-06-01

    Full Text Available Abstract Background Prior to cluster analysis or genetic network analysis it is customary to filter, or remove genes considered to be irrelevant from the set of genes to be analyzed. Often genes whose variation across samples is less than an arbitrary threshold value are deleted. This can improve interpretability and reduce bias. Results This paper introduces modular models for representing network structure in order to study the relative effects of different filtering methods. We show that cluster analysis and principal components are strongly affected by filtering. Filtering methods intended specifically for cluster and network analysis are introduced and compared by simulating modular networks with known statistical properties. To study more realistic situations, we analyze simulated "real" data based on well-characterized E. coli and S. cerevisiae regulatory networks. Conclusion The methods introduced apply very generally, to any similarity matrix describing gene expression. One of the proposed methods, SUMCOV, performed well for all models simulated.
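
    The customary filtering step mentioned in this abstract (dropping genes whose variation across samples falls below an arbitrary threshold) can be sketched in a few lines of Python. This is only the baseline variance filter, not the SUMCOV method proposed in the paper. The gene names, expression values and threshold are hypothetical.

```python
from statistics import variance
from typing import Dict, List


def filter_by_variance(expression: Dict[str, List[float]],
                       threshold: float) -> Dict[str, List[float]]:
    """Keep only genes whose sample variance across samples is >= threshold."""
    return {gene: values
            for gene, values in expression.items()
            if variance(values) >= threshold}


# Hypothetical expression values: one list of per-sample measurements per gene.
expression = {
    "geneA": [1.0, 1.1, 0.9, 1.0],   # nearly constant across samples
    "geneB": [0.2, 2.5, 1.1, 3.9],   # strongly varying
    "geneC": [5.0, 5.0, 5.0, 5.0],   # constant
}
kept = filter_by_variance(expression, threshold=0.1)
print(list(kept))
```

    Because the threshold is arbitrary, the set of retained genes, and hence any downstream clustering, can change substantially with its value, which is the sensitivity the abstract draws attention to.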

  14. Identification of Nitrogen-Fixing Genes and Gene Clusters from Metagenomic Library of Acid Mine Drainage

    OpenAIRE

    Zhimin Dai; Xue Guo; Huaqun Yin; Yili Liang; Jing Cong; Xueduan Liu

    2014-01-01

    Biological nitrogen fixation is an essential function of acid mine drainage (AMD) microbial communities. However, most acidophiles in AMD environments are uncultured microorganisms and little is known about the diversity of nitrogen-fixing genes and structure of nif gene cluster in AMD microbial communities. In this study, we used metagenomic sequencing to isolate nif genes in the AMD microbial community from Dexing Copper Mine, China. Meanwhile, a metagenome microarray containing 7,776 large...

  15. Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering.

    Science.gov (United States)

    Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor; Essex, M

    2015-05-01

    To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice.
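
    The sliding window analysis mentioned above amounts to cutting an alignment into fixed-length, overlapping sub-regions and analysing each separately. A minimal Python sketch of the window coordinates follows. The abstract gives window lengths (1,000 and 2,000 bp) but not the step size, so the `step` parameter and the toy numbers below are assumptions.

```python
from typing import List, Tuple


def sliding_windows(seq_len: int, window: int, step: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) coordinates of every full-length window."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [(start, start + window)
            for start in range(0, seq_len - window + 1, step)]


# Toy example: a 10-position alignment, windows of 4 positions, step of 2.
windows = sliding_windows(seq_len=10, window=4, step=2)
print(windows)
```

    Each window would then be used to build its own tree and to count how many sequences fall into supported clusters, which allows clustering to be compared across window lengths.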

  16. Unique nucleotide polymorphism of ankyrin gene cluster in Arabidopsis

    Indian Academy of Sciences (India)

    Jianchang Du; Xingna Wang; Mingsheng Zhang; Dacheng Tian; Yong-Hua Yang

    2007-01-01

    The ankyrin (ANK) gene cluster is a part of a multigene family encoding ANK transmembrane proteins in Arabidopsis thaliana, and plays an important role in protein–protein interactions and in signal pathways. In contrast to other regions of a genome, the ANK gene cluster exhibits an extremely high level of DNA polymorphism in an ∼5-kb region, without apparent decay. Phylogenetic analysis detects two clear, deeply differentiated haplotypes (dimorphism). The divergence between haplotypes of accession Col-0 and Ler-0 (Hap-C and Hap-L) is estimated to be 10.7%, approximately equal to the 10.5% average divergence between A. thaliana and A. lyrata. Sequence comparisons for the ANK gene cluster homologues in Col-0 indicate that the members evolve independently, and that the similarity among paralogues is lower than between alleles. Very little intralocus recombination or gene conversion is detected in ANK regions. All these characteristics of the ANK gene cluster are consistent with a tandem gene duplication and birth-and-death process. The possible mechanisms for and implications of this elevated nucleotide variation are also discussed, including the suggestion of balancing selection.

  17. Accurate prediction of secondary metabolite gene clusters in filamentous fungi.

    Science.gov (United States)

    Andersen, Mikael R; Nielsen, Jakob B; Klitgaard, Andreas; Petersen, Lene M; Zachariasen, Mia; Hansen, Tilde J; Blicher, Lene H; Gotfredsen, Charlotte H; Larsen, Thomas O; Nielsen, Kristian F; Mortensen, Uffe H

    2013-01-02

    Biosynthetic pathways of secondary metabolites from fungi are currently subject to an intense effort to elucidate the genetic basis for these compounds due to their large potential within pharmaceutics and synthetic biochemistry. The preferred method is methodical gene deletions to identify supporting enzymes for key synthases one cluster at a time. In this study, we design and apply a DNA expression array for Aspergillus nidulans in combination with legacy data to form a comprehensive gene expression compendium. We apply a guilt-by-association-based analysis to predict the extent of the biosynthetic clusters for the 58 synthases active in our set of experimental conditions. A comparison with legacy data shows the method to be accurate in 13 of 16 known clusters and nearly accurate for the remaining 3 clusters. Furthermore, we apply a data clustering approach, which identifies cross-chemistry between physically separate gene clusters (superclusters), and validate this both with legacy data and experimentally by prediction and verification of a supercluster consisting of the synthase AN1242 and the prenyltransferase AN11080, as well as identification of the product compound nidulanin A. We have used A. nidulans for our method development and validation due to the wealth of available biochemical data, but the method can be applied to any fungus with a sequenced and assembled genome, thus supporting further secondary metabolite pathway elucidation in the fungal kingdom.

  18. Physical and genetic map of the major nif gene cluster from Azotobacter vinelandii.

    OpenAIRE

    Jacobson, M R; Brigle, K E; Bennett, L T; Setterquist, R A; Wilson, M. S.; Cash, V L; Beynon, J.; Newton, W.E.; Dean, D R

    1989-01-01

    Determination of a 28,793-base-pair DNA sequence of a region from the Azotobacter vinelandii genome that includes and flanks the nitrogenase structural gene region was completed. This information was used to revise the previously proposed organization of the major nif cluster. The major nif cluster from A. vinelandii encodes 15 nif-specific genes whose products bear significant structural identity to the corresponding nif-specific gene products from Klebsiella pneumoniae. These genes include ...

  19. Physical and genetic map of the major nif gene cluster from Azotobacter vinelandii.

    OpenAIRE

    Jacobson, M. R.; Brigle, K E; Bennett, L T; Setterquist, R. A.; Wilson, M S; Cash, V L; Beynon, J; Newton, W E; Dean, D. R.

    1989-01-01

    Determination of a 28,793-base-pair DNA sequence of a region from the Azotobacter vinelandii genome that includes and flanks the nitrogenase structural gene region was completed. This information was used to revise the previously proposed organization of the major nif cluster. The major nif cluster from A. vinelandii encodes 15 nif-specific genes whose products bear significant structural identity to the corresponding nif-specific gene products from Klebsiella pneumoniae. These genes include ...

  20. The Nature of Red-Sequence Cluster Spiral Galaxies

    Science.gov (United States)

    Kashur, Lane; Barkhouse, Wayne; Sultanova, Madina; Kalawila Vithanage, Sandanuwa; Archer, Haylee; Foote, Gregory; Mathew, Elijah; Rude, Cody; Lopez-Cruz, Omar

    2017-01-01

    Preliminary analysis of the red-sequence galaxy population from a sample of 57 low-redshift galaxy clusters observed using the KPNO 0.9m telescope and 74 clusters from the WINGS dataset indicates that a small fraction of red-sequence galaxies have a morphology consistent with spiral systems. For spiral galaxies to acquire the color of elliptical/S0s at a similar luminosity, they must either have been stripped of their star-forming gas at an earlier epoch, or contain a larger than normal fraction of dust. To test these ideas we have compiled a sample of red-sequence spiral galaxies and examined their infrared properties as measured by 2MASS, WISE, Spitzer, and Herschel. These IR data allow us to estimate the amount of dust in each of our red-sequence spiral galaxies. We compare the estimated dust mass in each of these red-sequence late-type galaxies with spiral galaxies located in the same cluster field but having colors inconsistent with the red-sequence. We thus provide a statistical measure to discriminate between purely passive spiral galaxy evolution and dusty spirals to explain the presence of these late-type systems in cluster red-sequences.

  1. The Biosynthetic Gene Cluster for Andrastin A in Penicillium roqueforti

    Directory of Open Access Journals (Sweden)

    Juan F. Rojas-Aedo

    2017-05-01

    Full Text Available Penicillium roqueforti is a filamentous fungus involved in the ripening of several kinds of blue cheeses. In addition, this fungus produces several secondary metabolites, including the meroterpenoid compound andrastin A, a promising antitumoral compound. However, to date the genomic cluster responsible for the biosynthesis of this compound in P. roqueforti has not been described. In this work, we have sequenced and annotated a genomic region of approximately 29.4 kbp (named the adr gene cluster) that is involved in the biosynthesis of andrastin A in P. roqueforti. This region contains ten genes, named adrA, adrC, adrD, adrE, adrF, adrG, adrH, adrI, adrJ and adrK. Interestingly, the adrB gene, previously found in the adr cluster from P. chrysogenum, was found as a residual pseudogene in the adr cluster from P. roqueforti. RNA-mediated gene silencing of each of the ten genes resulted in significant reductions in andrastin A production, confirming that all of them are involved in the biosynthesis of this compound. Of particular interest was the adrC gene, encoding a major facilitator superfamily transporter. According to our results, this gene is required for the production of andrastin A but does not have any role in its secretion to the extracellular medium. The identification of the adr cluster in P. roqueforti will be important for understanding the molecular basis of andrastin A production and for obtaining P. roqueforti strains that overproduce andrastin A, which might be of interest for the cheese industry.

  2. The Serratia gene cluster encoding biosynthesis of the red antibiotic, prodigiosin, shows species- and strain-dependent genome context variation

    DEFF Research Database (Denmark)

    Harris, Abigail K P; Williamson, Neil R; Slater, Holly

    2004-01-01

    The prodigiosin biosynthesis gene cluster (pig cluster) from two strains of Serratia (S. marcescens ATCC 274 and Serratia sp. ATCC 39006) has been cloned, sequenced and expressed in heterologous hosts. Sequence analysis of the respective pig clusters revealed 14 ORFs in S. marcescens ATCC 274 and...

  3. Disease gene identification strategies for exome sequencing

    NARCIS (Netherlands)

    Gilissen, C.; Hoischen, A.; Brunner, H.G.; Veltman, J.A.

    2012-01-01

    Next generation sequencing can be used to search for Mendelian disease genes in an unbiased manner by sequencing the entire protein-coding sequence, known as the exome, or even the entire human genome. Identifying the pathogenic mutation amongst thousands to millions of genomic variants is a major c

  4. The cylindrospermopsin gene cluster of Aphanizomenon sp. strain 10E6: organization and recombination.

    Science.gov (United States)

    Stüken, Anke; Jakobsen, Kjetill S

    2010-08-01

    Cylindrospermopsin (CYN), a potent hepatotoxin, occurs in freshwaters worldwide. Several cyanobacterial species produce the toxin, but the producing species vary between geographical regions. Aphanizomenon flos-aquae, a common algae species in temperate fresh and brackish waters, is one of the three well-documented CYN producers in European waters. So far, no genetic information on the CYN genes of this species has been available. Here, we describe the complete CYN gene cluster, including flanking regions, from the German Aphanizomenon sp. strain 10E6 using a full genome sequencing approach by 454 pyrosequencing and bioinformatic identification of the gene cluster. In addition, we have sequenced an approximately 7 kb fragment covering the genes cyrC (partially), cyrA and cyrB (partially) of the same gene cluster in the CYN-producing Aphanizomenon sp. strains 10E9 and 22D11. Comparisons with the orthologous gene clusters of the Australian Cylindrospermopsis raciborskii strains AWT205 and CS505 and the partial gene cluster of the Israeli Aphanizomenon ovalisporum strain ILC-146 revealed a high gene sequence similarity, but also extensive rearrangements of gene order. The high sequence similarity (generally higher than that of 16S rRNA gene fragments from the same strains), atypical GC-content and signs of transposase activities support the suggestion that the CYN genes have been horizontally transferred.

  5. Cluster Analysis of Gene Expression Data

    CERN Document Server

    Domany, E

    2002-01-01

    The expression levels of many thousands of genes can be measured simultaneously by DNA microarrays (chips). This novel experimental tool has revolutionized research in molecular biology and generated considerable excitement. A typical experiment uses a few tens of such chips, each dedicated to a single sample - such as tissue extracted from a particular tumor. The results of such an experiment contain several hundred thousand numbers that come in the form of a table of several thousand rows (one for each gene) and 50 - 100 columns (one for each sample). We developed a clustering methodology to mine such data. In this review I provide a very basic introduction to the subject, aimed at a physics audience with no prior knowledge of either gene expression or clustering methods. I explain what genes are, what gene expression is and how it is measured by DNA chips. Next I explain what is meant by "clustering" and how we analyze the massive amounts of data from such experiments, and present results obtained from a...

  6. Bayesian History Reconstruction of Complex Human Gene Clusters on a Phylogeny

    CERN Document Server

    Vinař, Tomáš; Song, Giltae; Siepel, Adam

    2009-01-01

    Clusters of genes that have evolved by repeated segmental duplication present difficult challenges throughout genomic analysis, from sequence assembly to functional analysis. Improved understanding of these clusters is of utmost importance, since they have been shown to be the source of evolutionary innovation, and have been linked to multiple diseases, including HIV and a variety of cancers. Previously, Zhang et al. (2008) developed an algorithm for reconstructing parsimonious evolutionary histories of such gene clusters, using only human genomic sequence data. In this paper, we propose a probabilistic model for the evolution of gene clusters on a phylogeny, and an MCMC algorithm for reconstruction of duplication histories from genomic sequences in multiple species. Several projects are underway to obtain high quality BAC-based assemblies of duplicated clusters in multiple species, and we anticipate that our method will be useful in analyzing these valuable new data sets.

  7. Yeast homologous recombination-based promoter engineering for the activation of silent natural product biosynthetic gene clusters.

    Science.gov (United States)

    Montiel, Daniel; Kang, Hahk-Soo; Chang, Fang-Yuan; Charlop-Powers, Zachary; Brady, Sean F

    2015-07-21

    Large-scale sequencing of prokaryotic (meta)genomic DNA suggests that most bacterial natural product gene clusters are not expressed under common laboratory culture conditions. Silent gene clusters represent a promising resource for natural product discovery and the development of a new generation of therapeutics. Unfortunately, the characterization of molecules encoded by these clusters is hampered owing to our inability to express these gene clusters in the laboratory. To address this bottleneck, we have developed a promoter-engineering platform to transcriptionally activate silent gene clusters in a model heterologous host. Our approach uses yeast homologous recombination, an auxotrophy complementation-based yeast selection system and sequence orthogonal promoter cassettes to exchange all native promoters in silent gene clusters with constitutively active promoters. As part of this platform, we constructed and validated a set of bidirectional promoter cassettes consisting of orthogonal promoter sequences, Streptomyces ribosome binding sites, and yeast selectable marker genes. Using these tools we demonstrate the ability to simultaneously insert multiple promoter cassettes into a gene cluster, thereby expediting the reengineering process. We apply this method to model active and silent gene clusters (rebeccamycin and tetarimycin) and to the silent, cryptic pseudogene-containing, environmental DNA-derived Lzr gene cluster. Complete promoter refactoring and targeted gene exchange in this "dead" cluster led to the discovery of potent indolotryptoline antiproliferative agents, lazarimides A and B. This potentially scalable and cost-effective promoter reengineering platform should streamline the discovery of natural products from silent natural product biosynthetic gene clusters.

  8. Stepwise threshold clustering: a new method for genotyping MHC loci using next-generation sequencing technology.

    Directory of Open Access Journals (Sweden)

    William E Stutz

    Full Text Available Genes of the vertebrate major histocompatibility complex (MHC) are of great interest to biologists because of their important role in immunity and disease, and their extremely high levels of genetic diversity. Next generation sequencing (NGS) technologies are quickly becoming the method of choice for high-throughput genotyping of multi-locus templates like MHC in non-model organisms. Previous approaches to genotyping MHC genes using NGS technologies suffer from two problems: (1) a "gray zone" where low frequency alleles and high frequency artifacts can be difficult to disentangle and (2) a similar sequence problem, where very similar alleles can be difficult to distinguish as two distinct alleles. Here we present a new method for genotyping MHC loci, Stepwise Threshold Clustering (STC), that addresses these problems by taking full advantage of the increase in sequence data provided by NGS technologies. Unlike previous approaches for genotyping MHC with NGS data that attempt to classify individual sequences as alleles or artifacts, STC uses a quasi-Dirichlet clustering algorithm to cluster similar sequences at increasing levels of sequence similarity. By applying frequency and similarity based criteria to clusters rather than individual sequences, STC is able to successfully identify clusters of sequences that correspond to individual or similar alleles present in the genomes of individual samples. Furthermore, STC does not require duplicate runs of all samples, increasing the number of samples that can be genotyped in a given project. We show how the STC method works using a single sample library. We then apply STC to 295 threespine stickleback (Gasterosteus aculeatus) samples from four populations and show that neighboring populations differ significantly in MHC allele pools. We show that STC is a reliable, accurate, efficient, and flexible method for genotyping MHC that will be of use to biologists interested in a variety of downstream applications.

  9. A DOUBLE MAIN SEQUENCE IN THE GLOBULAR CLUSTER NGC 6397

    Energy Technology Data Exchange (ETDEWEB)

    Milone, A. P.; Aparicio, A. [Instituto de Astrofisica de Canarias, E-38200 La Laguna, Tenerife, Canary Islands (Spain); Marino, A. F. [Max Planck Institute for Astrophysics, Postfach 1317, D-85741 Garching (Germany); Piotto, G. [Dipartimento di Astronomia, Universita di Padova, Vicolo dell' Osservatorio 3, Padova I-35122 (Italy); Bedin, L. R.; Anderson, J. [Space Telescope Science Institute, 3800 San Martin Drive, Baltimore, MD 21218 (United States); Cassisi, S. [INAF-Osservatorio Astronomico di Collurania, via Mentore Maggini, I-64100 Teramo (Italy); Rich, R. M., E-mail: milone@iac.es, E-mail: aparicio@iac.es, E-mail: amarino@MPA-Garching.MPG.DE, E-mail: giampaolo.piotto@unipd.it, E-mail: jayander@stsci.edu, E-mail: bedin@stsci.edu, E-mail: cassisi@oa-teramo.inaf.it, E-mail: rmr@astro.ucla.edu [Division of Astronomy and Astrophysics, University of California, Los Angeles, 430 Portola Plaza, Box 951547, Los Angeles, CA 90095-1547 (United States)

    2012-01-20

    High-precision multi-band Hubble Space Telescope (HST) photometry reveals that the main sequence of the globular cluster NGC 6397 splits into two components, containing ~30% and ~70% of the stars. This double sequence is consistent with the idea that the cluster hosts two stellar populations: (1) a primordial population that has a composition similar to field stars, containing ~30% of the stars, and (2) a second generation with enhanced sodium and nitrogen, depleted carbon and oxygen, and a slightly enhanced helium abundance (ΔY ~ 0.01). We examine the color difference between the two sequences across a variety of color baselines and find that the second sequence is anomalously faint in m_F336W. Theoretical isochrones indicate that this could be due to NH depletion.

  10. Semi-supervised consensus clustering for gene expression data analysis

    OpenAIRE

    Wang, Yunli; Pan, Youlian

    2014-01-01

    Background Simple clustering methods such as hierarchical clustering and k-means are widely used for gene expression data analysis; but they are unable to deal with noise and high dimensionality associated with the microarray gene expression data. Consensus clustering appears to improve the robustness and quality of clustering results. Incorporating prior knowledge in clustering process (semi-supervised clustering) has been shown to improve the consistency between the data partitioning and do...

  11. Cosmological Constraints from the Red-Sequence Cluster Survey

    CERN Document Server

    Gladders, M D; Hall, P B; Hoekstra, H; Infante, L; Majumdar, S; Yee, H K C; Gladders, Michael D.; Hall, Patrick B.; Hoekstra, Henk; Infante, Leopoldo; Majumdar, Subhabrata

    2006-01-01

    [abridged] We present a first cosmological analysis of a refined cluster catalog from the Red-Sequence Cluster Survey (RCS). The input cluster sample is derived from 72.07 square degrees of imaging data [...] The catalog contains 956 clusters over 0.35cluster richness and richness error. The calibration of the survey images has been extensively cross-checked against publicly available Sloan Digital Sky Survey imaging [...] We analyze the cluster sample via a general self-calibration technique including scatter in the mass-richness relation [...]. We fit simultaneously for Omega_M and sigma_8, and four parameters describing the calibration of cluster richness to mass, its evolution with redshift, and scatter in the richness-mass relation. The principal goal of this general analysis is to establish the consistency (or lack thereof) between the fitted parameters (both cosmological and cluster mass observables) and available results on both from independent measures. From an unconstraine...

  12. Evolution and differential expression of a vertebrate vitellogenin gene cluster

    Directory of Open Access Journals (Sweden)

    Kongshaug Heidi

    2009-01-01

    Full Text Available Abstract Background The multiplicity or loss of the vitellogenin (vtg) gene family in vertebrates has been argued to have broad implications for the mode of reproduction (placental or non-placental), cleavage pattern (meroblastic or holoblastic) and character of the egg (pelagic or benthic). Earlier proposals for the existence of three forms of vertebrate vtgs present conflicting models for their origin and subsequent duplication. Results By integrating phylogenetics of novel vtg transcripts from old and modern teleosts with syntenic analyses of all available genomic variants of non-metatherian vertebrates we identify the gene orthologies between the Sarcopterygii (tetrapod branch) and Actinopterygii (fish branch). We argue that the vertebrate vtg gene cluster originated in proto-chromosome m, but that vtg genes have subsequently duplicated and rearranged following whole genome duplications. Sequencing of a novel fourth vtg transcript in labrid species, and the presence of duplicated paralogs in certain model organisms, support the notion that lineage-specific gene duplications frequently occur in teleosts. The data show that the vtg gene cluster is more conserved between acanthomorph teleosts and tetrapods than in ostariophysan teleosts such as the zebrafish. The differential expression of the labrid vtg genes is further consistent with the notion that neofunctionalized Aa-type vtgs are important determinants of the pelagic or benthic character of the eggs in acanthomorph teleosts. Conclusion The vertebrate vtg gene cluster existed prior to the separation of Sarcopterygii from Actinopterygii >450 million years ago, a period associated with the second round of whole genome duplication. The presence of higher copy numbers in a more highly expressed subcluster is particularly prevalent in teleosts. The differential expression and latent neofunctionalization of vtg genes in acanthomorph teleosts is an adaptive feature associated with oocyte hydration

  13. Motif-independent de novo detection of secondary metabolite gene clusters-toward identification from filamentous fungi.

    Science.gov (United States)

    Umemura, Myco; Koike, Hideaki; Machida, Masayuki

    2015-01-01

    Secondary metabolites are produced mostly by clustered genes that are essential to their biosynthesis. The transcriptional expression of these genes is often cooperatively regulated by a transcription factor located inside or close to a cluster. Most of the secondary metabolism biosynthesis (SMB) gene clusters identified to date contain so-called core genes with distinctive sequence features, such as polyketide synthase (PKS) and non-ribosomal peptide synthetase (NRPS). Recent efforts in sequencing fungal genomes have revealed far more SMB gene clusters than expected based on the number of core genes in the genomes. Several bioinformatics tools have been developed to survey SMB gene clusters using the sequence motif information of the core genes, including SMURF and antiSMASH. More recently, as sequencing techniques have made it possible to obtain large-scale genomic and transcriptomic data, motif-independent prediction methods for SMB gene clusters, including MIDDAS-M, have been developed. Most of these methods detect clusters in which the genes are cooperatively regulated at the transcriptional level, thus allowing the identification of novel SMB gene clusters regardless of the presence of the core genes. Another type of method, MIPS-CG, uses the characteristics of SMB genes, which are highly enriched in non-syntenic blocks (NSBs), enabling prediction even without transcriptome data, although the results have not been evaluated in detail. Considering that a large portion of SMB gene clusters might be sufficiently expressed only in limited, uncommon conditions, it seems that prediction of SMB gene clusters by bioinformatics followed by experimental validation is the only way to efficiently uncover hidden SMB gene clusters. Here, we describe and discuss possible novel approaches for the determination of SMB gene clusters that have not been identified using conventional methods.

  14. Genome Sequences of Newly Isolated Mycobacteriophages Forming Cluster S.

    Science.gov (United States)

    Mills, Monique L; Bragg, Judd; Bruce, Asri; Dehn, Ari; Drouin, Jordan; Hefner, Morgan; Katon, Dylan; McHugh, Dustin; Zeba, Franck; Bowman, Charles A; Cresawn, Steven G; Jacobs-Sera, Deborah; Russell, Daniel A; Pope, Welkin H; Hatfull, Graham F; Dunbar, David A; Zegers, Gerard P; Page, Shallee T

    2016-09-29

    We describe the genomes of two mycobacteriophages, MosMoris and Gattaca, newly isolated on Mycobacterium smegmatis. The two phages are very similar to each other, differing in 61 single nucleotide polymorphisms and six small insertion/deletions. Both have extensive nucleotide sequence similarity to mycobacteriophage Marvin and together form cluster S. Copyright © 2016 Mills et al.

  15. Synaptotagmin gene content of the sequenced genomes

    Directory of Open Access Journals (Sweden)

    Craxton Molly

    2004-07-01

    Full Text Available Abstract Background Synaptotagmins exist as a large gene family in mammals. There is much interest in the function of certain family members which act crucially in the regulated synaptic vesicle exocytosis required for efficient neurotransmission. Knowledge of the functions of other family members is relatively poor and the presence of Synaptotagmin genes in plants indicates a role for the family as a whole which is wider than neurotransmission. Identification of the Synaptotagmin genes within completely sequenced genomes can provide the entire Synaptotagmin gene complement of each sequenced organism. Defining the detailed structures of all the Synaptotagmin genes and their encoded products can provide a useful resource for functional studies and a deeper understanding of the evolution of the gene family. The current rapid increase in the number of sequenced genomes from different branches of the tree of life, together with the public deposition of evolutionarily diverse transcript sequences make such studies worthwhile. Results I have compiled a detailed list of the Synaptotagmin genes of Caenorhabditis, Anopheles, Drosophila, Ciona, Danio, Fugu, Mus, Homo, Arabidopsis and Oryza by examining genomic and transcript sequences from public sequence databases together with some transcript sequences obtained by cDNA library screening and RT-PCR. I have compared all of the genes and investigated the relationship between plant Synaptotagmins and their non-Synaptotagmin counterparts. Conclusions I have identified and compared 98 Synaptotagmin genes from 10 sequenced genomes. Detailed comparison of transcript sequences reveals abundant and complex variation in Synaptotagmin gene expression and indicates the presence of Synaptotagmin genes in all animals and land plants. Amino acid sequence comparisons indicate patterns of conservation and diversity in function. 
Phylogenetic analysis shows the origin of Synaptotagmins in multicellular eukaryotes and their

  16. Characterization of the fumonisin B2 biosynthetic gene cluster in Aspergillus niger and A. awamori.

    Science.gov (United States)

    Aspergillus niger and A. awamori strains isolated from grapes cultivated in Mediterranean basin were examined for fumonisin B2 (FB2) production and presence/absence of sequences within the fumonisin biosynthetic gene (fum) cluster. Presence of 13 regions in the fum cluster was evaluated by PCR assay...

  17. Detecting genomic clustering of risk variants from sequence data: cases versus controls.

    Science.gov (United States)

    Schaid, Daniel J; Sinnwell, Jason P; McDonnell, Shannon K; Thibodeau, Stephen N

    2013-11-01

    As the ability to measure dense genetic markers approaches the limit of the DNA sequence itself, taking advantage of possible clustering of genetic variants in, and around, a gene would benefit genetic association analyses, and likely provide biological insights. The greatest benefit might be realized when multiple rare variants cluster in a functional region. Several statistical tests have been developed, one of which is based on the popular Kulldorff scan statistic for spatial clustering of disease. We extended another popular spatial clustering method, Tango's statistic, to genomic sequence data. An advantage of Tango's method is that it is rapid to compute, and when a single test statistic is computed, its distribution is well approximated by a scaled χ² distribution, making computation of p values very rapid. We compared the Type-I error rates and power of several clustering statistics, as well as the omnibus sequence kernel association test. Although our version of Tango's statistic, which we call the "Kernel Distance" statistic, took approximately half the time to compute compared with the Kulldorff scan statistic, it had slightly less power than the scan statistic. Our results showed that the Ionita-Laza version of Kulldorff's scan statistic had the greatest power over a range of clustering scenarios.

  18. Gene Expression Data Knowledge Discovery using Global and Local Clustering

    CERN Document Server

    H, Swathi

    2010-01-01

    To understand complex biological systems, the research community has produced a huge corpus of gene expression data. A large number of clustering approaches have been proposed for the analysis of gene expression data. However, extracting important biological knowledge remains difficult. To address this task, clustering techniques are used. In this paper, a hybrid Hierarchical k-Means algorithm is used for clustering and biclustering gene expression data. To discover both local and global clustering structure, biclustering and clustering algorithms are utilized. A validation technique, Figure of Merit, is used to determine the quality of clustering results. Appropriate knowledge is mined from the clusters by embedding a BLAST similarity search program into the clustering and biclustering process. Appropriate ...

  19. A phylogenomic gene cluster resource: The phylogenetically inferred groups (PhIGs) database

    Energy Technology Data Exchange (ETDEWEB)

    Dehal, Paramvir S.; Boore, Jeffrey L.

    2005-08-25

    We present here the PhIGs database, a phylogenomic resource for sequenced genomes. Although many methods exist for clustering gene families, very few attempt to create truly orthologous clusters sharing descent from a single ancestral gene across a range of evolutionary depths. Although these non-phylogenetic gene family clusters have been used broadly for gene annotation, errors are known to be introduced by the artifactual association of slowly evolving paralogs and lack of annotation for those more rapidly evolving. A full phylogenetic framework is necessary for accurate inference of function and for many studies that address pattern and mechanism of the evolution of the genome. The automated generation of evolutionary gene clusters, creation of gene trees, determination of orthology and paralogy relationships, and the correlation of this information with gene annotations, expression information, and genomic context is an important resource to the scientific community.

  20. Red Sequence Cluster Finding in the Millennium Simulation

    CERN Document Server

    Cohn, J D; White, M; Croton, D; Ellingson, E

    2007-01-01

    We investigate halo mass selection properties of red-sequence cluster finders using galaxy populations of the Millennium Simulation (MS). A clear red sequence exists for MS galaxies in massive halos at redshifts z < 1, and we use this knowledge to inform a cluster-finding algorithm applied to 500 Mpc/h projections of the simulated volume. At low redshift (z = 0.4), we find that 90% of the clusters found have galaxy membership dominated by a single, real-space halo, and that 10% are blended systems for which no single halo contributes a majority of a cluster's membership. At z = 1, the fraction of blends increases to ~20%, as weaker redshift evolution in observed color extends the comoving length probed by a fixed color cut. Other factors contributing to the high-z increase include broadening of the red sequence and increased confusion from a larger number of intermediate mass halos hosting bright red galaxies of magnitude similar to those in higher mass halos. We show that a bimodal, log-normal model des...
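
    The clean-versus-blended distinction in the abstract above can be illustrated with a minimal sketch. It assumes each cluster member galaxy is tagged with the ID of its host halo; the function names and the toy halo IDs are hypothetical and are not taken from the paper. A cluster is called blended when no single halo supplies a majority of its members.

```python
from collections import Counter

def dominant_halo_fraction(member_halo_ids):
    """Return (halo_id, fraction) for the halo contributing the most members."""
    counts = Counter(member_halo_ids)
    halo_id, n_members = counts.most_common(1)[0]
    return halo_id, n_members / len(member_halo_ids)

def is_blended(member_halo_ids):
    """A cluster is blended if no single halo contributes a majority (> 50%)."""
    _, fraction = dominant_halo_fraction(member_halo_ids)
    return fraction <= 0.5

# Toy examples (halo IDs are illustrative only).
clean_cluster = [1, 1, 1, 2]       # halo 1 supplies 3 of 4 members
blended_cluster = [1, 1, 2, 2, 3]  # the best halo supplies only 2 of 5
print(is_blended(clean_cluster), is_blended(blended_cluster))
```

    Applying `is_blended` to every detected cluster and averaging the result would give a blend fraction of the kind quoted in the abstract, under the stated assumption about how membership is recorded.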

  1. GENE SEQUENCE HOMOLOGY OF CHEMOKINES ACROSS SPECIES

    Science.gov (United States)

    The abundance of expressed gene and protein sequences available in the biological information databases facilitates comparison of protein homologies. A high degree of sequence similarity typically implies homology regarding structure and function and may provide clues to antibody cross-react...

  2. Protein sequences clustering of herpes virus by using Tribe Markov clustering (Tribe-MCL)

    Science.gov (United States)

    Bustamam, A.; Siswantining, T.; Febriyani, N. L.; Novitasari, I. D.; Cahyaningrum, R. D.

    2017-07-01

    The herpes virus can be found anywhere, and one of its important characteristics is its ability to cause acute and chronic infection at certain times, so that infection can lead to severe complications. The herpes virus is composed of DNA containing protein and wrapped by glycoproteins. In this work, the herpes virus family is classified and analyzed by clustering its protein sequences using the Tribe Markov Clustering (Tribe-MCL) algorithm. Tribe-MCL is an efficient clustering method, based on the theory of Markov chains, that classifies protein families from protein sequences using pre-computed sequence similarity information. We implement the Tribe-MCL algorithm using the open source program R. We select 24 protein sequences of herpes virus obtained from the NCBI database. The dataset consists of three types of glycoprotein: B, F, and H. Each type is represented by eight herpes viruses that infect humans. In our simulation using different inflation factors, r = 1.5, 2, 3, we find varying numbers of clusters. The greater the inflation factor, the greater the number of clusters. Each protein is grouped together with proteins of the same type.
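
    The role of the inflation factor can be seen in a minimal sketch of the generic Markov clustering iteration (expansion by matrix squaring, then inflation by element-wise powering and column renormalisation). This is not the authors' R implementation; the function names, the self-loop weight, the convergence tolerance and the toy similarity matrix are all assumptions made for illustration.

```python
def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    out = [[0.0] * n for _ in range(n)]
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        for i in range(n):
            out[i][j] = m[i][j] / total if total else 0.0
    return out

def expand(m):
    """Expansion step: square the matrix (two-step random walks)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inflate(m, r):
    """Inflation step: raise entries to the power r, then renormalise columns."""
    return normalize_columns([[x ** r for x in row] for row in m])

def mcl(similarity, inflation=2.0, max_iter=50, tol=1e-9):
    """Cluster items from a symmetric similarity matrix; returns sorted index lists."""
    n = len(similarity)
    # Adding self-loops is a common convention (an assumption here).
    m = normalize_columns([[similarity[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        new = inflate(expand(m), inflation)
        delta = max(abs(new[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = new
        if delta < tol:
            break
    # Each row that keeps non-negligible mass is an attractor; the columns it
    # attracts form one cluster.
    clusters = {frozenset(j for j in range(n) if m[i][j] > 1e-6) for i in range(n)}
    return sorted(sorted(c) for c in clusters if c)

# Toy similarity matrix: two tight families joined by one weak link (2-3).
S = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 0.1, 0, 0],
     [0, 0, 0.1, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]
print(mcl(S, inflation=2.0))
```

    Larger inflation values sharpen each column more aggressively, which is the mechanism behind the abstract's observation that higher inflation factors yield more clusters.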

  3. Transcriptional analysis of exopolysaccharides biosynthesis gene clusters in Lactobacillus plantarum.

    Science.gov (United States)

    Vastano, Valeria; Perrone, Filomena; Marasco, Rosangela; Sacco, Margherita; Muscariello, Lidia

    2016-04-01

    Exopolysaccharides (EPS) from lactic acid bacteria contribute to specific rheology and texture of fermented milk products and find applications also in non-dairy foods and in therapeutics. Recently, four clusters of genes (cps) associated with surface polysaccharide production have been identified in Lactobacillus plantarum WCFS1, a probiotic and food-associated lactobacillus. These clusters are involved in cell surface architecture and probably in release and/or exposure of immunomodulating bacterial molecules. Here we show a transcriptional analysis of these clusters. Indeed, RT-PCR experiments revealed that the cps loci are organized in five operons. Moreover, by reverse transcription-qPCR analysis performed on L. plantarum WCFS1 (wild type) and WCFS1-2 (ΔccpA), we demonstrated that expression of three cps clusters is under the control of the global regulator CcpA. These results, together with the identification of putative CcpA target sequences (catabolite responsive element CRE) in the regulatory region of four out of five transcriptional units, strongly suggest for the first time a role of the master regulator CcpA in EPS gene transcription among lactobacilli.

  4. Arrangement of the Clostridium baratii F7 toxin gene cluster with identification of a σ factor that recognizes the botulinum toxin gene cluster promoters.

    Science.gov (United States)

    Dover, Nir; Barash, Jason R; Burke, Julianne N; Hill, Karen K; Detter, John C; Arnon, Stephen S

    2014-01-01

    Botulinum neurotoxin (BoNT) is the most poisonous substance known and its eight toxin types (A to H) are distinguished by the inability of polyclonal antibodies that neutralize one toxin type to neutralize any of the other seven toxin types. Infant botulism, an intestinal toxemia orphan disease, is the most common form of human botulism in the United States. It results from swallowed spores of Clostridium botulinum (or rarely, neurotoxigenic Clostridium butyricum or Clostridium baratii) that germinate and temporarily colonize the lumen of the large intestine, where, as vegetative cells, they produce botulinum toxin. Botulinum neurotoxin is encoded by the bont gene that is part of a toxin gene cluster that includes several accessory genes. We sequenced for the first time the complete botulinum neurotoxin gene cluster of nonproteolytic C. baratii type F7. Like the type E and the nonproteolytic type F6 botulinum toxin gene clusters, the C. baratii type F7 had an orfX toxin gene cluster that lacked the regulatory botR gene which is found in proteolytic C. botulinum strains and codes for an alternative σ factor. In the absence of botR, we identified a putative alternative regulatory gene located upstream of the C. baratii type F7 toxin gene cluster. This putative regulatory gene codes for a predicted σ factor that contains DNA-binding-domain homologues to the DNA-binding domains both of BotR and of other members of the TcdR-related group 5 of the σ70 family that are involved in the regulation of toxin gene expression in clostridia. We showed that this TcdR-related protein in association with RNA polymerase core enzyme specifically binds to the C. baratii type F7 botulinum toxin gene cluster promoters. This TcdR-related protein may therefore be involved in regulating the expression of the genes of the botulinum toxin gene cluster in neurotoxigenic C. baratii.

  5. Global Analysis of miRNA Gene Clusters and Gene Families Reveals Dynamic and Coordinated Expression

    Directory of Open Access Journals (Sweden)

    Li Guo

    2014-01-01

    Full Text Available To further understand the potential expression relationships of miRNAs in miRNA gene clusters and gene families, a global analysis was performed in 4 paired tumor (breast cancer) and adjacent normal tissue samples using deep sequencing datasets. The compositions of miRNA gene clusters and families are not random, and clustered and homologous miRNAs may have close relationships with overlapped miRNA species. Members in the miRNA group always had various expression levels, and some even showed larger expression divergence. Despite the dynamic expression as well as individual difference, these miRNAs always indicated consistent or similar deregulation patterns. The consistent deregulation expression may contribute to dynamic and coordinated interaction between different miRNAs in the regulatory network. Further, we found that those clustered or homologous miRNAs that were also identified as sense and antisense miRNAs showed larger expression divergence. miRNA gene clusters and families indicated important biological roles, and the specific distribution and expression further enrich and ensure the flexible and robust regulatory network.

  6. Ultra-fast sequence clustering from similarity networks with SiLiX

    Directory of Open Access Journals (Sweden)

    Duret Laurent

    2011-04-01

    Full Text Available Abstract Background The number of gene sequences that are available for comparative genomics approaches is increasing extremely quickly. A current challenge is to be able to handle this huge number of sequences in order to build families of homologous sequences in a reasonable time. Results We present the software package SiLiX that implements a novel method which reconsiders single linkage clustering with a graph theoretical approach. A parallel version of the algorithms is also presented. As a demonstration of the ability of our software, we clustered more than 3 million sequences from about 2 billion BLAST hits in 7 minutes, with a high clustering quality, both in terms of sensitivity and specificity. Conclusions Comparing state-of-the-art software, SiLiX presents the best up-to-date capabilities to face the problem of clustering large collections of sequences. SiLiX is freely available at http://lbbe.univ-lyon1.fr/SiLiX.
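
    Single linkage clustering viewed as a graph problem amounts to finding connected components of the similarity network. The sketch below is a generic union-find illustration of that idea, not the SiLiX code; the function name and the toy hit list are hypothetical, and it assumes the input pairs have already passed whatever similarity filter is in use.

```python
def single_linkage_families(hit_pairs):
    """Group sequence IDs into families: connected components of the hit graph."""
    parent = {}

    def find(x):
        # Register unseen IDs, then walk to the root with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Each accepted pairwise hit links two sequences into the same family.
    for a, b in hit_pairs:
        union(a, b)

    families = {}
    for seq_id in list(parent):
        families.setdefault(find(seq_id), set()).add(seq_id)
    return sorted(sorted(members) for members in families.values())

# Toy hit list (sequence IDs are illustrative only).
hits = [("a", "b"), ("b", "c"), ("d", "e")]
print(single_linkage_families(hits))  # [['a', 'b', 'c'], ['d', 'e']]
```

    Because each hit is processed once and union-find operations are nearly constant time, this formulation scales roughly linearly with the number of hits, which is consistent with the throughput the abstract reports for very large BLAST outputs.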

  7. Developmental expression and gene/enzyme identifications in the alpha esterase gene cluster of Drosophila melanogaster.

    Science.gov (United States)

    Campbell, P M; de Q Robin, G C; Court, L N; Dorrian, S J; Russell, R J; Oakeshott, J G

    2003-10-01

    Here we show how the 10 genes of the alpha esterase cluster of Drosophila melanogaster have diverged substantially in their expression profiles. Together with previously described sequence divergence this suggests substantial functional diversification. By peptide mass fingerprinting and in vitro gene expression we have also shown that two of the genes encode the isozymes EST9 (formerly ESTC) and EST23. EST9 is the major 'alpha staining' esterase in zymograms of gut tissues in feeding stages while orthologues of EST23 confer resistance to organophosphorus insecticides in other higher Diptera. The results for EST9 and EST23 concur with previous suggestions that the products of the alpha esterase cluster function in digestion and detoxification of xenobiotic esters. However, many of the other genes in the cluster show developmental or tissue-specific expression that seems inconsistent with such roles. Furthermore, there is generally poor correspondence between the mRNA expression patterns of the remaining eight genes and isozymes previously characterized by standard techniques of electrophoresis and staining, suggesting that the alpha cluster might only account for a small minority of the esterase isozyme profile.

  8. Natural product proteomining, a quantitative proteomics platform, allows rapid discovery of biosynthetic gene clusters for different classes of natural products.

    Science.gov (United States)

    Gubbens, Jacob; Zhu, Hua; Girard, Geneviève; Song, Lijiang; Florea, Bogdan I; Aston, Philip; Ichinose, Koji; Filippov, Dmitri V; Choi, Young H; Overkleeft, Herman S; Challis, Gregory L; van Wezel, Gilles P

    2014-06-19

    Information on gene clusters for natural product biosynthesis is accumulating rapidly because of the current boom of available genome sequencing data. However, linking a natural product to a specific gene cluster remains challenging. Here, we present a widely applicable strategy for the identification of gene clusters for specific natural products, which we name natural product proteomining. The method is based on using fluctuating growth conditions that ensure differential biosynthesis of the bioactivity of interest. Subsequent combination of metabolomics and quantitative proteomics establishes correlations between abundance of natural products and concomitant changes in the protein pool, which allows identification of the relevant biosynthetic gene cluster. We used this approach to elucidate gene clusters for different natural products in Bacillus and Streptomyces, including a novel juglomycin-type antibiotic. Natural product proteomining does not require prior knowledge of the gene cluster or secondary metabolite and therefore represents a general strategy for identification of all types of gene clusters.

  9. Estimating variation within the genes and inferring the phylogeny of 186 sequenced diverse Escherichia coli genomes

    DEFF Research Database (Denmark)

    Kaas, Rolf Sommer; Rundsten, Carsten Friis; Ussery, David

    2012-01-01

    more biologically relevant, especially considering that many of these genome sequences are draft quality. The E. coli pan-genome for this set of isolates contains 16,373 gene clusters. A core-gene tree, based on alignment and a pan-genome tree based on gene presence/absence, maps the relatedness...

  10. Characterization of a major cluster of nif, fix, and associated genes in a sugarcane endophyte, Acetobacter diazotrophicus.

    Science.gov (United States)

    Lee, S; Reth, A; Meletzus, D; Sevilla, M; Kennedy, C

    2000-12-01

    A major 30.5-kb cluster of nif and associated genes of Acetobacter diazotrophicus (syn. Gluconacetobacter diazotrophicus), a nitrogen-fixing endophyte of sugarcane, was sequenced and analyzed. This cluster represents the largest assembly of contiguous nif-fix and associated genes so far characterized in any diazotrophic bacterial species. Northern blots and promoter sequence analysis indicated that the genes are organized into eight transcriptional units. The overall arrangement of genes is most like that of the nif-fix cluster in Azospirillum brasilense, while the individual gene products are more similar to those in species of Rhizobiaceae or in Rhodobacter capsulatus.

  11. Identification and structural analysis of a novel snoRNA gene cluster from Arabidopsis thaliana

    Institute of Scientific and Technical Information of China (English)

    周惠; 孟清; 屈良鹄

    2000-01-01

    A 22 snoRNA gene cluster, consisting of four antisense snoRNA genes, was identified from Arabidopsis thaliana. The sequence and structural analysis showed that the 22 snoRNA gene cluster might be transcribed as a polycistronic precursor from an upstream promoter, and that the intergenic spacers of the gene cluster encode 'hairpin' structures similar to the processing recognition signals of the yeast Saccharomyces cerevisiae polycistronic snoRNA precursor. The results also revealed that multiple-copy snoRNA genes are a common characteristic in plants, and that the cluster provides a good system for further revealing the transcription and expression mechanism of plant snoRNA gene clusters.

  12. Gene ordering in partitive clustering using microarray expressions.

    Science.gov (United States)

    Ray, Shubhra Sankar; Bandyopadhyay, Sanghamitra; Pal, Sankar K

    2007-08-01

    A central step in the analysis of gene expression data is the identification of groups of genes that exhibit similar expression patterns. Clustering and ordering the genes using gene expression data into homogeneous groups was shown to be useful in functional annotation, tissue classification, regulatory motif identification, and other applications. Although there is a rich literature on gene ordering in the hierarchical clustering framework for gene expression analysis, there is no work addressing and evaluating the importance of gene ordering in the partitive clustering framework, to the best knowledge of the authors. Outside the framework of hierarchical clustering, different gene ordering algorithms are applied on the whole data set, and the domain of partitive clustering is still unexplored with gene ordering approaches. A new hybrid method is proposed for ordering genes in each of the clusters obtained from a partitive clustering solution, using microarray gene expressions. Two existing algorithms for optimally ordering cities in the travelling salesman problem (TSP), namely, FRAG_GALK and Concorde, are hybridized individually with a self-organizing map to show the importance of gene ordering in the partitive clustering framework. We validated our hybrid approach using yeast and fibroblast data and showed that our approach improves the result quality of the partitive clustering solution by identifying subclusters within big clusters, grouping functionally correlated genes within clusters, minimizing the summation of gene expression distances, and maximizing the biological gene ordering using MIPS categorization. Moreover, the new hybrid approach finds comparable or sometimes superior biological gene order in less computation time than those obtained by optimal leaf ordering in a hierarchical clustering solution.

  13. Gene ordering in partitive clustering using microarray expressions

    Indian Academy of Sciences (India)

    Shubhra Sankar Ray; Sanghamitra Bandyopadhyay; Sankar K Pal

    2007-08-01

    A central step in the analysis of gene expression data is the identification of groups of genes that exhibit similar expression patterns. Clustering and ordering the genes into homogeneous groups using gene expression data has been shown to be useful in functional annotation, tissue classification, regulatory motif identification, and other applications. Although there is a rich literature on gene ordering in the hierarchical clustering framework for gene expression analysis, to the best of the authors' knowledge no work has addressed and evaluated the importance of gene ordering in the partitive clustering framework. Outside the framework of hierarchical clustering, gene ordering algorithms are applied to the whole data set, and the domain of partitive clustering remains unexplored with gene ordering approaches. A new hybrid method is proposed for ordering genes in each of the clusters obtained from a partitive clustering solution, using microarray gene expressions. Two existing algorithms for optimally ordering cities in the travelling salesman problem (TSP), namely FRAG_GALK and Concorde, are each hybridized with a self-organizing map to show the importance of gene ordering in the partitive clustering framework. We validated our hybrid approach using yeast and fibroblast data and showed that it improves the quality of the partitive clustering solution by identifying subclusters within big clusters, grouping functionally correlated genes within clusters, minimizing the summed gene expression distances, and maximizing biological gene ordering using MIPS categorization. Moreover, the new hybrid approach finds comparable or sometimes superior biological gene order in less computation time than optimal leaf ordering in a hierarchical clustering solution.

  14. A discriminative approach for unsupervised clustering of DNA sequence motifs.

    Directory of Open Access Journals (Sweden)

    Philip Stegmaier

    Full Text Available Algorithmic comparison of DNA sequence motifs is a problem in bioinformatics that has received increased attention in recent years. Its main applications concern characterization of potentially novel motifs and clustering of a motif collection in order to remove redundancy. Despite growing interest in motif clustering, the question of which motif clusters to aim for has so far not been systematically addressed. Here we analyzed motif similarities in a comprehensive set of vertebrate transcription factor classes. For this we developed enhanced similarity scores by inclusion of the information coverage (IC) criterion, which evaluates the fraction of information an alignment covers in aligned motifs. A network-based method enabled us to identify motif clusters with high correspondence to DNA-binding domain phylogenies and prior experimental findings. Based on this analysis we derived a set of motif families representing distinct binding specificities. These motif families were used to train a classifier which was further integrated into a novel algorithm for unsupervised motif clustering. Application of the new algorithm demonstrated its superiority to previously published methods and its ability to reproduce entrained motif families. As a result, our work proposes a probabilistic approach to decide whether two motifs represent common or distinct binding specificities.

  15. A putative gene cluster from a Lyngbya wollei bloom that encodes paralytic shellfish toxin biosynthesis.

    Directory of Open Access Journals (Sweden)

    Troco K Mihali

    Full Text Available Saxitoxin and its analogs cause the paralytic shellfish-poisoning syndrome, adversely affecting human health and coastal shellfish industries worldwide. Here we report the isolation, sequencing, annotation, and predicted pathway of the saxitoxin biosynthetic gene cluster in the cyanobacterium Lyngbya wollei. The gene cluster spans 36 kb and encodes enzymes for the biosynthesis and export of the toxins. The Lyngbya wollei saxitoxin gene cluster differs from previously identified saxitoxin clusters in that it contains genes unique to this cluster, and the carbamoyltransferase is truncated and replaced by an acyltransferase, explaining the unique toxin profile presented by Lyngbya wollei. These findings will enable the creation of toxin probes for water monitoring purposes, as well as proof-of-concept for the combinatorial biosynthesis of these naturally occurring alkaloids for the production of novel, biologically active compounds.

  16. The rise of operon-like gene clusters in plants.

    Science.gov (United States)

    Boycheva, Svetlana; Daviet, Laurent; Wolfender, Jean-Luc; Fitzpatrick, Teresa B

    2014-07-01

    Gene clusters are common features of prokaryotic genomes also present in eukaryotes. Most clustered genes known are involved in the biosynthesis of secondary metabolites. Although horizontal gene transfer is a primary source of prokaryotic gene cluster (operon) formation and has been reported to occur in eukaryotes, the predominant source of cluster formation in eukaryotes appears to arise de novo or through gene duplication followed by neo- and sub-functionalization or translocation. Here we aim to provide an overview of the current knowledge and open questions related to plant gene cluster functioning, assembly, and regulation. We also present potential research approaches and point out the benefits of a better understanding of gene clusters in plants for both fundamental and applied plant science.

  17. Clostridium botulinum strain Af84 contains three neurotoxin gene clusters: bont/A2, bont/F4 and bont/F5.

    Directory of Open Access Journals (Sweden)

    Nir Dover

    Full Text Available Sanger and shotgun sequencing of Clostridium botulinum strain Af84 type Af and its botulinum neurotoxin gene (bont) clusters identified the presence of three bont gene clusters rather than the expected two. The three toxin gene clusters consisted of bont subtypes A2, F4 and F5. The bont/A2 and bont/F4 gene clusters were located within the chromosome (the latter in a novel location), while the bont/F5 toxin gene cluster was located within a large 246 kb plasmid. These findings are the first identification of a C. botulinum strain that contains three botulinum neurotoxin gene clusters.

  18. Horizontal transfer of a nitrate assimilation gene cluster and ecological transitions in fungi: a phylogenetic study.

    Directory of Open Access Journals (Sweden)

    Jason C Slot

    Full Text Available High affinity nitrate assimilation genes in fungi occur in a cluster (fHANT-AC) that can be coordinately regulated. The clustered genes include nrt2, which codes for a high affinity nitrate transporter; euknr, which codes for nitrate reductase; and NAD(P)H-nir, which codes for nitrite reductase. Homologs of genes in the fHANT-AC occur in other eukaryotes and prokaryotes, but they have only been found clustered in the oomycete Phytophthora (heterokonts). We performed independent and concatenated phylogenetic analyses of homologs of all three genes in the fHANT-AC. Phylogenetic analyses limited to fungal sequences suggest that the fHANT-AC has been transferred horizontally from a basidiomycete (mushrooms and smuts) to an ancestor of the ascomycetous mold Trichoderma reesei. Phylogenetic analyses of sequences from diverse eukaryotes and eubacteria, and cluster structure, are consistent with a hypothesis that the fHANT-AC was assembled in a lineage leading to the oomycetes and was subsequently transferred to the Dikarya (Ascomycota+Basidiomycota), which is a derived fungal clade that includes the vast majority of terrestrial fungi. We propose that the acquisition of high affinity nitrate assimilation contributed to the success of Dikarya on land by allowing exploitation of nitrate in aerobic soils, and the subsequent transfer of a complete assimilation cluster improved the fitness of T. reesei in a new niche. Horizontal transmission of this cluster of functionally integrated genes supports the "selfish operon" hypothesis for maintenance of gene clusters.

  19. Identification and characterization of a novel diterpene gene cluster in Aspergillus nidulans.

    Directory of Open Access Journals (Sweden)

    Kirsi Bromann

    Full Text Available Fungal secondary metabolites are a rich source of medically useful compounds due to their pharmaceutical and toxic properties. Sequencing of fungal genomes has revealed numerous secondary metabolite gene clusters, yet products of many of these biosynthetic pathways are unknown since the expression of the clustered genes usually remains silent in normal laboratory conditions. Therefore, to discover new metabolites, it is important to find ways to induce the expression of genes in these otherwise silent biosynthetic clusters. We discovered a novel secondary metabolite in Aspergillus nidulans by predicting a biosynthetic gene cluster with genomic mining. A Zn(II)2Cys6-type transcription factor, PbcR, was identified, and its role as a pathway-specific activator for the predicted gene cluster was demonstrated. Overexpression of pbcR upregulated the transcription of seven genes in the identified cluster and led to the production of a diterpene compound, which was characterized with GC/MS as ent-pimara-8(14),15-diene. A change in morphology was also observed in the strains overexpressing pbcR. The activation of a cryptic gene cluster by overexpression of its putative Zn(II)2Cys6-type transcription factor led to discovery of a novel secondary metabolite in Aspergillus nidulans. Quantitative real-time PCR and DNA array analysis allowed us to predict the borders of the biosynthetic gene cluster. Furthermore, we identified a novel fungal pimaradiene cyclase gene as well as genes encoding 3-hydroxy-3-methyl-glutaryl-coenzyme A (HMG-CoA) reductase and a geranylgeranyl pyrophosphate (GGPP) synthase. None of these genes have been previously implicated in the biosynthesis of terpenes in Aspergillus nidulans. These results identify the first Aspergillus nidulans diterpene gene cluster and suggest a biosynthetic pathway for ent-pimara-8(14),15-diene.

  20. Identification and Characterization of a Novel Diterpene Gene Cluster in Aspergillus nidulans

    Science.gov (United States)

    Bromann, Kirsi; Toivari, Mervi; Viljanen, Kaarina; Vuoristo, Anu; Ruohonen, Laura; Nakari-Setälä, Tiina

    2012-01-01

    Fungal secondary metabolites are a rich source of medically useful compounds due to their pharmaceutical and toxic properties. Sequencing of fungal genomes has revealed numerous secondary metabolite gene clusters, yet products of many of these biosynthetic pathways are unknown since the expression of the clustered genes usually remains silent in normal laboratory conditions. Therefore, to discover new metabolites, it is important to find ways to induce the expression of genes in these otherwise silent biosynthetic clusters. We discovered a novel secondary metabolite in Aspergillus nidulans by predicting a biosynthetic gene cluster with genomic mining. A Zn(II)2Cys6–type transcription factor, PbcR, was identified, and its role as a pathway-specific activator for the predicted gene cluster was demonstrated. Overexpression of pbcR upregulated the transcription of seven genes in the identified cluster and led to the production of a diterpene compound, which was characterized with GC/MS as ent-pimara-8(14),15-diene. A change in morphology was also observed in the strains overexpressing pbcR. The activation of a cryptic gene cluster by overexpression of its putative Zn(II)2Cys6–type transcription factor led to discovery of a novel secondary metabolite in Aspergillus nidulans. Quantitative real-time PCR and DNA array analysis allowed us to predict the borders of the biosynthetic gene cluster. Furthermore, we identified a novel fungal pimaradiene cyclase gene as well as genes encoding 3-hydroxy-3-methyl-glutaryl-coenzyme A (HMG-CoA) reductase and a geranylgeranyl pyrophosphate (GGPP) synthase. None of these genes have been previously implicated in the biosynthesis of terpenes in Aspergillus nidulans. These results identify the first Aspergillus nidulans diterpene gene cluster and suggest a biosynthetic pathway for ent-pimara-8(14),15-diene. PMID:22506079

  1. ROUGH SET BASED CLUSTERING OF GENE EXPRESSION DATA: A SURVEY

    Directory of Open Access Journals (Sweden)

    J.JEBA EMILYN

    2010-12-01

    Full Text Available Microarray technology has made it possible to simultaneously monitor the expression levels of thousands of genes during important biological processes and across collections of related samples. However, the high dimensionality of gene expression data makes it difficult to analyze. Many clustering algorithms are available. In this paper we first briefly introduce the concepts of microarray technology and discuss the basic elements of clustering gene expression data. We then introduce rough clustering and explore its advantage over strict and fuzzy clustering. We also explain why rough clustering is preferred over other conventional methods by presenting a survey of a few clustering algorithms based on rough set theory for gene expression data. We conclude that this area is a promising research field for the research community.

  2. Diversity and evolution of MicroRNA gene clusters

    Institute of Scientific and Technical Information of China (English)

    2009-01-01

    microRNA (miRNA) gene clusters are a group of miRNA genes clustered within a proximal distance on a chromosome. Although a large number of miRNA clusters have been uncovered in animal and plant genomes, the functional consequences of this arrangement are still poorly understood. Located in a polycistron, the coexpressed miRNA clusters are pivotal in coordinately regulating multiple processes, including embryonic development, cell cycles and cell differentiation. In this review, based on recent progress, we discuss the genomic diversity of miRNA gene clusters, the coordination of expression and function of the clustered miRNAs, and the evolutionarily adaptive processes with gain and loss of the clustering miRNA genes mediated by duplication and transposition events.

  3. Diversity and evolution of MicroRNA gene clusters

    Institute of Scientific and Technical Information of China (English)

    ZHANG YanFeng; ZHANG Rui; SU Bing

    2009-01-01

    microRNA (miRNA) gene clusters are a group of miRNA genes clustered within a proximal distance on a chromosome. Although a large number of miRNA clusters have been uncovered in animal and plant genomes, the functional consequences of this arrangement are still poorly understood. Located in a polycistron, the coexpressed miRNA clusters are pivotal in coordinately regulating multiple processes, including embryonic development, cell cycles and cell differentiation. In this review, based on recent progress, we discuss the genomic diversity of miRNA gene clusters, the coordination of expression and function of the clustered miRNAs, and the evolutionarily adaptive processes with gain and loss of the clustering miRNA genes mediated by duplication and transposition events.

  4. Nemertean toxin genes revealed through transcriptome sequencing.

    Science.gov (United States)

    Whelan, Nathan V; Kocot, Kevin M; Santos, Scott R; Halanych, Kenneth M

    2014-11-27

    Nemerteans are one of few animal groups that have evolved the ability to utilize toxins for both defense and subduing prey, but little is known about specific nemertean toxins. In particular, no study has identified specific toxin genes even though peptide toxins are known from some nemertean species. Information about toxin genes is needed to better understand evolution of toxins across animals and possibly provide novel targets for pharmaceutical and industrial applications. We sequenced and annotated transcriptomes of two free-living and one commensal nemertean and annotated an additional six publicly available nemertean transcriptomes to identify putative toxin genes. Approximately 63-74% of predicted open reading frames in each transcriptome were annotated with gene names, and all species had similar percentages of transcripts annotated with each higher-level GO term. Every nemertean analyzed possessed genes with high sequence similarities to known animal toxins including those from stonefish, cephalopods, and sea anemones. One toxin-like gene found in all nemerteans analyzed had high sequence similarity to Plancitoxin-1, a DNase II hepatotoxin that may function well at low pH, which suggests that the acidic body walls of some nemerteans could work to enhance the efficacy of protein toxins. The highest number of toxin-like genes found in any one species was seven and the lowest was three. The diversity of toxin-like nemertean genes found here is greater than previously documented, and these animals are likely an ideal system for exploring toxin evolution and industrial applications of toxins. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  5. Unusual Gene Order and Organization of the Sea Urchin Hox Cluster

    Energy Technology Data Exchange (ETDEWEB)

    Cameron, R A; Rowen, L; Nesbitt, R; Bloom, S; Rast, J P; Berney, K; Arenas-Mena, C; Martinez, P; Lucas, S; Richardson, P M; Davidson, E H; Peterson, K J; Hood, L

    2005-10-11

    The highly consistent gene order and axial colinear expression patterns found in vertebrate hox gene clusters are less well conserved across the rest of bilaterians. We report the first deuterostome instance of an intact hox cluster with a unique gene order where the paralog groups are not expressed in a sequential manner. The finished sequence from BAC clones from the genome of the sea urchin, Strongylocentrotus purpuratus, reveals a gene order wherein the anterior genes (Hox1, Hox2 and Hox3) lie nearest the posterior genes in the cluster such that the most 3' gene is Hox5. (The gene order is: 5'-Hox1, 2, 3, 11/13c, 11/13b, 11/13a, 9/10, 8, 7, 6, 5-3'.) The finished sequence result is corroborated by restriction mapping evidence and BAC-end scaffold analyses. Comparisons with a putative ancestral deuterostome Hox gene cluster suggest that the rearrangements leading to the sea urchin gene order were many and complex.

  6. Unusual Gene Order and Organization of the Sea Urchin Hox Cluster

    Energy Technology Data Exchange (ETDEWEB)

    Richardson, Paul M.; Lucas, Susan; Cameron, R. Andrew; Rowen, Lee; Nesbitt, Ryan; Bloom, Scott; Rast, Jonathan P.; Berney, Kevin; Arenas-Mena, Cesar; Martinez, Pedro; Davidson, Eric H.; Peterson, Kevin J.; Hood, Leroy

    2005-05-10

    The highly consistent gene order and axial colinear expression patterns found in vertebrate hox gene clusters are less well conserved across the rest of bilaterians. We report the first deuterostome instance of an intact hox cluster with a unique gene order where the paralog groups are not expressed in a sequential manner. The finished sequence from BAC clones from the genome of the sea urchin, Strongylocentrotus purpuratus, reveals a gene order wherein the anterior genes (Hox1, Hox2 and Hox3) lie nearest the posterior genes in the cluster such that the most 3' gene is Hox5. (The gene order is: 5'-Hox1, 2, 3, 11/13c, 11/13b, 11/13a, 9/10, 8, 7, 6, 5-3'.) The finished sequence result is corroborated by restriction mapping evidence and BAC-end scaffold analyses. Comparisons with a putative ancestral deuterostome Hox gene cluster suggest that the rearrangements leading to the sea urchin gene order were many and complex.

  7. Sequence variations in the FAD2 gene in seeded pumpkins.

    Science.gov (United States)

    Ge, Y; Chang, Y; Xu, W L; Cui, C S; Qu, S P

    2015-12-21

    Seeded pumpkins are important economic crops; the seeds contain various unsaturated fatty acids, such as oleic acid and linoleic acid, which are crucial for human and animal nutrition. The fatty acid desaturase-2 (FAD2) gene encodes delta-12 desaturase, which converts oleic acid to linoleic acid. However, little is known about sequence variations in FAD2 in seeded pumpkins. Twenty-seven FAD2 clones from 27 accessions of Cucurbita moschata, Cucurbita maxima, Cucurbita pepo, and Cucurbita ficifolia were obtained (1152 bp in total; a single gene without introns). Nucleotide identities of more than 90% were detected among the 27 FAD2 clones. Nucleotide substitution, rather than nucleotide insertion and deletion, led to sequence polymorphism in the 27 FAD2 clones. Furthermore, the 27 selected FAD2 clones all encoded the FAD2 enzyme (delta-12 desaturase), with amino acid sequence identities from 91.7 to 100% over 384 amino acids. The same main functional domain, between amino acids 47 and 329, was identified. The four species clustered separately based on differences in the sequences that were identified using the unweighted pair group method with arithmetic mean. Geographic origin and species were found to be closely related to sequence variation in FAD2.

  8. A genome-wide analysis of nonribosomal peptide synthetase gene clusters and their peptides in a Planktothrix rubescens strain

    Directory of Open Access Journals (Sweden)

    Nederbragt Alexander J

    2009-08-01

    Full Text Available Abstract Background Cyanobacteria often produce several different oligopeptides, with unknown biological functions, by nonribosomal peptide synthetases (NRPS). Although some cyanobacterial NRPS gene cluster types are well described, the entire NRPS genomic content within a single cyanobacterial strain has never been investigated. Here we have combined a genome-wide analysis using massive parallel pyrosequencing ("454") and mass spectrometry screening of oligopeptides produced in the strain Planktothrix rubescens NIVA CYA 98 in order to identify all putative gene clusters for oligopeptides. Results Thirteen types of oligopeptides were uncovered by mass spectrometry (MS) analyses. Microcystin, cyanopeptolin and aeruginosin synthetases, highly similar to already characterized NRPS, were present in the genome. Two novel NRPS gene clusters were associated with production of anabaenopeptins and microginins, respectively. Sequence-depth of the genome and real-time PCR data revealed three copies of the microginin gene cluster. Since NRPS gene cluster candidates for microviridin and oscillatorin synthesis could not be found, putative (gene encoded) precursor peptide sequences to microviridin and oscillatorin were found in the genes mdnA and oscA, respectively. The genes flanking the microviridin and oscillatorin precursor genes encode putative modifying enzymes of the precursor oligopeptides. We therefore propose ribosomal pathways involving modifications and cyclisation for microviridin and oscillatorin. The microviridin, anabaenopeptin and cyanopeptolin gene clusters are situated in close proximity to each other, constituting an oligopeptide island. Conclusion Altogether seven nonribosomal peptide synthetase (NRPS) gene clusters and two gene clusters putatively encoding ribosomal oligopeptide biosynthetic pathways were revealed. Our results demonstrate that whole genome shotgun sequencing combined with MS-directed determination of oligopeptides successfully

  9. DNACLUST: accurate and efficient clustering of phylogenetic marker genes

    Directory of Open Access Journals (Sweden)

    Liu Bo

    2011-06-01

    Full Text Available Abstract Background Clustering is a fundamental operation in the analysis of biological sequence data. New DNA sequencing technologies have dramatically increased the rate at which we can generate data, resulting in datasets that cannot be efficiently analyzed by traditional clustering methods. This is particularly true in the context of taxonomic profiling of microbial communities through direct sequencing of phylogenetic markers (e.g. 16S rRNA), the domain that motivated the work described in this paper. Many analysis approaches rely on an initial clustering step aimed at identifying sequences that belong to the same operational taxonomic unit (OTU). When defining OTUs (which have no universally accepted definition), scientists must balance a trade-off between computational efficiency and biological accuracy, as accurately estimating an environment's phylogenetic composition requires computationally-intensive analyses. We propose that efficient and mathematically well defined clustering methods can benefit existing taxonomic profiling approaches in two ways: (i) the resulting clusters can be substituted for OTUs in certain applications; and (ii) the clustering effectively reduces the size of the data-sets that need to be analyzed by complex phylogenetic pipelines (e.g., only one sequence per cluster needs to be provided to downstream analyses). Results To address the challenges outlined above, we developed DNACLUST, a fast clustering tool specifically designed for clustering highly-similar DNA sequences. Given a set of sequences and a sequence similarity threshold, DNACLUST creates clusters whose radius is guaranteed not to exceed the specified threshold. Underlying DNACLUST is a greedy clustering strategy that owes its performance to novel sequence alignment and k-mer based filtering algorithms. DNACLUST can also produce multiple sequence alignments for every cluster, allowing users to manually inspect clustering results, and enabling more

  10. Computing gene expression data with a knowledge-based gene clustering approach.

    Science.gov (United States)

    Rosa, Bruce A; Oh, Sookyung; Montgomery, Beronda L; Chen, Jin; Qin, Wensheng

    2010-01-01

    Computational analysis methods for gene expression data gathered in microarray experiments can be used to identify the functions of previously unstudied genes. While obtaining the expression data is not a difficult task, interpreting and extracting the information from the datasets is challenging. In this study, a knowledge-based approach which identifies and saves important functional genes before filtering based on variability and fold change differences was utilized to study light regulation. Two clustering methods were used to cluster the filtered datasets, and clusters containing a key light regulatory gene were located. The common genes to both of these clusters were identified, and the genes in the common cluster were ranked based on their coexpression to the key gene. This process was repeated for 11 key genes in 3 treatment combinations. The initial filtering method reduced the dataset size from 22,814 probes to an average of 1134 genes, and the resulting common cluster lists contained an average of only 14 genes. These common cluster lists scored higher gene enrichment scores than two individual clustering methods. In addition, the filtering method increased the proportion of light responsive genes in the dataset from 1.8% to 15.2%, and the cluster lists increased this proportion to 18.4%. The relatively short length of these common cluster lists compared to gene groups generated through typical clustering methods or coexpression networks narrows the search for novel functional genes while increasing the likelihood that they are biologically relevant.
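
    The "common cluster" step described above (take the cluster containing a key gene from each of two clustering methods, intersect them, and rank the shared genes by coexpression with the key gene) can be sketched as follows. This is a minimal illustration under stated assumptions: Pearson correlation stands in for the coexpression measure, and the function names and the representation of each clustering as a list of gene sets are hypothetical rather than taken from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def common_cluster(key_gene, clustering_a, clustering_b, expression):
    """Genes sharing a cluster with `key_gene` in both clusterings, ranked by coexpression.

    `clustering_a` and `clustering_b` are lists of gene sets produced by two
    different clustering methods; `expression` maps gene -> expression profile.
    The key gene is assumed to appear in exactly one cluster of each clustering.
    """
    cluster_a = next(c for c in clustering_a if key_gene in c)
    cluster_b = next(c for c in clustering_b if key_gene in c)
    shared = (set(cluster_a) & set(cluster_b)) - {key_gene}
    key_profile = expression[key_gene]
    # Highest correlation first; gene name breaks ties deterministically.
    return sorted(shared, key=lambda g: (-pearson(key_profile, expression[g]), g))
```

    Intersecting the two clusters keeps only genes that both methods agree on, which is consistent with the short common cluster lists the abstract reports.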

  11. RCSLenS: The Red Cluster Sequence Lensing Survey

    CERN Document Server

    Hildebrandt, H; Heymans, C; Blake, C; Erben, T; Miller, L; Nakajima, R; van Waerbeke, L; Viola, M; Buddendiek, A; Harnois-Déraps, J; Hojjati, A; Joachimi, B; Joudaki, S; Kitching, T D; Wolf, C; Gwyn, S; Kuijken, K; Sheikhbahaee, Z; Tudorica, A; Yee, H K C

    2016-01-01

    We present the Red-sequence Cluster Lensing Survey (RCSLenS), an application of the methods developed for the Canada France Hawaii Telescope Lensing Survey (CFHTLenS) to the ~785 deg$^2$, multi-band imaging data of the Red-sequence Cluster Survey 2 (RCS2). This project represents the largest public, sub-arcsecond seeing, multi-band survey to date that is suited for weak gravitational lensing measurements. With a careful assessment of systematic errors in shape measurements and photometric redshifts we extend the use of this data set to allow cross-correlation analyses between weak lensing observables and other data sets. We describe the imaging data, the data reduction, masking, multi-colour photometry, photometric redshifts, shape measurements, tests for systematic errors, and a blinding scheme to allow for more objective measurements. In total we analyse 761 pointings with r-band coverage, which constitutes our lensing sample. Residual large-scale B-mode systematics prevent the use of this shear catalogue fo...

  12. Sequencing and Gene Expression Analysis of Leishmania tropica LACK Gene.

    Directory of Open Access Journals (Sweden)

    Nour Hammoudeh

    2014-12-01

    Full Text Available The Leishmania homologue of receptors for activated C kinase (LACK) antigen is a 36-kDa protein that provokes a very early immune response against Leishmania infection. There are several reports on the expression of LACK through different life-cycle stages of the genus Leishmania, but only a few of them have focused on L. tropica. The present study provides details of the cloning, DNA sequencing and gene expression of LACK in this parasite species. First, several local isolates of Leishmania parasites were typed in our laboratory using PCR to verify the Leishmania parasite species. After that, the LACK gene was amplified and cloned into a vector for sequencing. Finally, the expression of this molecule in logarithmic and stationary growth phase promastigotes, as well as in amastigotes, was evaluated by reverse transcription PCR (RT-PCR). The typing result confirmed that all our local isolates belong to L. tropica. The LACK gene sequence was determined and high similarity was observed with the sequences of other Leishmania species. Furthermore, the expression of the LACK gene in both promastigote and amastigote forms was confirmed. Overall, the data set the stage for future studies of the properties and immune role of LACK gene products.

  13. Prediction of operon-like gene clusters in the Arabidopsis thaliana genome based on co-expression analysis of neighboring genes.

    Science.gov (United States)

    Wada, Masayoshi; Takahashi, Hiroki; Altaf-Ul-Amin, Md; Nakamura, Kensuke; Hirai, Masami Y; Ohta, Daisaku; Kanaya, Shigehiko

    2012-07-15

    Operon-like arrangements of genes occur in eukaryotes ranging from yeasts and filamentous fungi to nematodes, plants, and mammals. In plants, several examples of operon-like gene clusters involved in metabolic pathways have recently been characterized, e.g. the cyclic hydroxamic acid pathways in maize, the avenacin biosynthesis gene clusters in oat, the thalianol pathway in Arabidopsis thaliana, and the diterpenoid momilactone cluster in rice. Such operon-like gene clusters are defined by their co-regulation or neighboring positions within the immediate vicinity of chromosomal regions. A comprehensive analysis of the expression of neighboring genes is therefore a crucial step in revealing the complete set of operon-like gene clusters within a genome. Genome-wide prediction of operon-like gene clusters should contribute to functional annotation efforts and also provide novel insight into the evolutionary aspects of acquiring certain biological functions. We predicted co-expressed gene clusters by comparing the Pearson correlation coefficient of neighboring genes and randomly selected gene pairs, based on a statistical method that takes false discovery rate (FDR) into consideration for 1469 microarray gene expression datasets of A. thaliana. We estimated that A. thaliana contains 100 operon-like gene clusters in total. We predicted 34 statistically significant gene clusters consisting of 3 to 22 genes each, based on a stringent FDR threshold of 0.1. Functional relationships among genes in individual clusters were estimated by sequence similarity and functional annotation of genes. Duplicated gene pairs (determined based on BLAST with a cutoff of E... Operon-like clusters tend to include genes encoding bio-machinery associated with ribosomes, the ubiquitin/proteasome system, secondary metabolic pathways, lipid and fatty-acid metabolism, and the lipid transfer system.
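
    The core comparison in this abstract, Pearson correlation of neighboring genes against a background of randomly selected gene pairs, can be sketched as below. This is a simplified illustration only: it computes an empirical p-value against the random-pair background and does not reproduce the FDR procedure used in the study; the function names, the chromosome-ordered gene list, and the add-one p-value estimator are assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def neighbor_correlations(gene_order, expression):
    """Correlation for every pair of adjacent genes along a chromosome."""
    return [
        (a, b, pearson(expression[a], expression[b]))
        for a, b in zip(gene_order, gene_order[1:])
    ]


def random_pair_correlations(gene_order, expression, n_pairs, seed=0):
    """Background distribution: correlations of randomly selected gene pairs."""
    rng = random.Random(seed)
    background = []
    for _ in range(n_pairs):
        a, b = rng.sample(gene_order, 2)
        background.append(pearson(expression[a], expression[b]))
    return background


def empirical_p(observed, background):
    """Fraction of background correlations at least as large as `observed`.

    The add-one correction keeps the estimate strictly above zero.
    """
    hits = sum(1 for r in background if r >= observed)
    return (1 + hits) / (1 + len(background))
```

    Runs of adjacent genes whose pairwise p-values stay small would be candidate co-expressed clusters; a multiple-testing correction would still be needed before calling any of them significant.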

  14. RCSLenS: The Red Cluster Sequence Lensing Survey

    Science.gov (United States)

    Hildebrandt, H.; Choi, A.; Heymans, C.; Blake, C.; Erben, T.; Miller, L.; Nakajima, R.; van Waerbeke, L.; Viola, M.; Buddendiek, A.; Harnois-Déraps, J.; Hojjati, A.; Joachimi, B.; Joudaki, S.; Kitching, T. D.; Wolf, C.; Gwyn, S.; Johnson, N.; Kuijken, K.; Sheikhbahaee, Z.; Tudorica, A.; Yee, H. K. C.

    2016-11-01

    We present the Red Cluster Sequence Lensing Survey (RCSLenS), an application of the methods developed for the Canada-France-Hawaii Telescope Lensing Survey (CFHTLenS) to the ~785 deg², multi-band imaging data of the Red-sequence Cluster Survey 2. This project represents the largest public, sub-arcsecond seeing, multi-band survey to date that is suited for weak gravitational lensing measurements. With a careful assessment of systematic errors in shape measurements and photometric redshifts, we extend the use of this data set to allow cross-correlation analyses between weak lensing observables and other data sets. We describe the imaging data, the data reduction, masking, multi-colour photometry, photometric redshifts, shape measurements, tests for systematic errors, and a blinding scheme to allow for more objective measurements. In total, we analyse 761 pointings with r-band coverage, which constitutes our lensing sample. Residual large-scale B-mode systematics prevent the use of this shear catalogue for cosmic shear science. The effective number density of lensing sources over an unmasked area of 571.7 deg² and down to a magnitude limit of r ~ 24.5 is 8.1 galaxies per arcmin² (weighted: 5.5 arcmin⁻²) distributed over 14 patches on the sky. Photometric redshifts based on four-band griz data are available for 513 pointings covering an unmasked area of 383.5 deg². We present weak lensing mass reconstructions of some example clusters as well as the full survey representing the largest areas that have been mapped in this way. All our data products are publicly available through Canadian Astronomy Data Centre at http://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/en/community/rcslens/query.html in a format very similar to the CFHTLenS data release.

  15. The nucleotide sequences of two leghemoglobin genes from soybean

    DEFF Research Database (Denmark)

    Wiborg, O; Hyldig-Nielsen, J J; Jensen, E O

    1982-01-01

    We present the complete nucleotide sequences of two leghemoglobin genes isolated from soybean DNA. Both genes contain three intervening sequences in identical positions. Comparison of the coding sequences with known amino-acid sequences of soybean leghemoglobins suggests that the two genes...

  16. MS/MS networking guided analysis of molecule and gene cluster families.

    Science.gov (United States)

    Nguyen, Don Duy; Wu, Cheng-Hsuan; Moree, Wilna J; Lamsa, Anne; Medema, Marnix H; Zhao, Xiling; Gavilan, Ronnie G; Aparicio, Marystella; Atencio, Librada; Jackson, Chanaye; Ballesteros, Javier; Sanchez, Joel; Watrous, Jeramie D; Phelan, Vanessa V; van de Wiel, Corine; Kersten, Roland D; Mehnaz, Samina; De Mot, René; Shank, Elizabeth A; Charusanti, Pep; Nagarajan, Harish; Duggan, Brendan M; Moore, Bradley S; Bandeira, Nuno; Palsson, Bernhard Ø; Pogliano, Kit; Gutiérrez, Marcelino; Dorrestein, Pieter C

    2013-07-09

    The ability to correlate the production of specialized metabolites to the genetic capacity of the organism that produces such molecules has become an invaluable tool in aiding the discovery of biotechnologically applicable molecules. Here, we accomplish this task by matching molecular families with gene cluster families, making these correlations to 60 microbes at one time instead of connecting one molecule to one organism at a time, as is traditionally done. We can correlate these families through the use of nanospray desorption electrospray ionization MS/MS, an ambient pressure MS technique, in conjunction with MS/MS networking and peptidogenomics. We matched the molecular families of peptide natural products produced by 42 bacilli and 18 pseudomonads through the generation of amino acid sequence tags from MS/MS data of specific clusters found in the MS/MS network. These sequence tags were then linked to biosynthetic gene clusters in publicly accessible genomes, providing us with the ability to link particular molecules with the genes that produced them. As an example of its use, this approach was applied to two unsequenced Pseudoalteromonas species, leading to the discovery of the gene cluster for a molecular family, the bromoalterochromides, in the previously sequenced strain P. piscicida JCM 20779(T). The approach itself is not limited to 60 related strains, because spectral networking can be readily adopted to look at molecular family-gene cluster families of hundreds or more diverse organisms in one single MS/MS network.

  17. MS/MS networking guided analysis of molecule and gene cluster families

    Science.gov (United States)

    Nguyen, Don Duy; Wu, Cheng-Hsuan; Moree, Wilna J.; Lamsa, Anne; Medema, Marnix H.; Zhao, Xiling; Gavilan, Ronnie G.; Aparicio, Marystella; Atencio, Librada; Jackson, Chanaye; Ballesteros, Javier; Sanchez, Joel; Watrous, Jeramie D.; Phelan, Vanessa V.; van de Wiel, Corine; Kersten, Roland D.; Mehnaz, Samina; De Mot, René; Shank, Elizabeth A.; Charusanti, Pep; Nagarajan, Harish; Duggan, Brendan M.; Moore, Bradley S.; Bandeira, Nuno; Palsson, Bernhard Ø.; Pogliano, Kit; Gutiérrez, Marcelino; Dorrestein, Pieter C.

    2013-01-01

    The ability to correlate the production of specialized metabolites to the genetic capacity of the organism that produces such molecules has become an invaluable tool in aiding the discovery of biotechnologically applicable molecules. Here, we accomplish this task by matching molecular families with gene cluster families, making these correlations to 60 microbes at one time instead of connecting one molecule to one organism at a time, as is traditionally done. We can correlate these families through the use of nanospray desorption electrospray ionization MS/MS, an ambient pressure MS technique, in conjunction with MS/MS networking and peptidogenomics. We matched the molecular families of peptide natural products produced by 42 bacilli and 18 pseudomonads through the generation of amino acid sequence tags from MS/MS data of specific clusters found in the MS/MS network. These sequence tags were then linked to biosynthetic gene clusters in publicly accessible genomes, providing us with the ability to link particular molecules with the genes that produced them. As an example of its use, this approach was applied to two unsequenced Pseudoalteromonas species, leading to the discovery of the gene cluster for a molecular family, the bromoalterochromides, in the previously sequenced strain P. piscicida JCM 20779T. The approach itself is not limited to 60 related strains, because spectral networking can be readily adopted to look at molecular family–gene cluster families of hundreds or more diverse organisms in one single MS/MS network. PMID:23798442

  18. Simultaneous clustering of multiple gene expression and physical interaction datasets.

    Directory of Open Access Journals (Sweden)

    Manikandan Narayanan

    2010-04-01

    Full Text Available Many genome-wide datasets are routinely generated to study different aspects of biological systems, but integrating them to obtain a coherent view of the underlying biology remains a challenge. We propose simultaneous clustering of multiple networks as a framework to integrate large-scale datasets on the interactions among and activities of cellular components. Specifically, we develop an algorithm JointCluster that finds sets of genes that cluster well in multiple networks of interest, such as coexpression networks summarizing correlations among the expression profiles of genes and physical networks describing protein-protein and protein-DNA interactions among genes or gene-products. Our algorithm provides an efficient solution to a well-defined problem of jointly clustering networks, using techniques that permit certain theoretical guarantees on the quality of the detected clustering relative to the optimal clustering. These guarantees coupled with an effective scaling heuristic and the flexibility to handle multiple heterogeneous networks make our method JointCluster an advance over earlier approaches. Simulation results showed JointCluster to be more robust than alternate methods in recovering clusters implanted in networks with high false positive rates. In systematic evaluation of JointCluster and some earlier approaches for combined analysis of the yeast physical network and two gene expression datasets under glucose and ethanol growth conditions, JointCluster discovers clusters that are more consistently enriched for various reference classes capturing different aspects of yeast biology or yield better coverage of the analysed genes. These robust clusters, which are supported across multiple genomic datasets and diverse reference classes, agree with known biology of yeast under these growth conditions, elucidate the genetic control of coordinated transcription, and enable functional predictions for a number of uncharacterized genes.

  19. Clustered and transient earthquake sequences in mid-continents

    Science.gov (United States)

    Liu, M.; Stein, S. A.; Wang, H.; Luo, G.

    2012-12-01

    Earthquakes result from sudden release of strain energy on faults. On plate boundary faults, strain energy is constantly accumulating from steady and relatively rapid relative plate motion, so large earthquakes continue to occur so long as motion continues on the boundary. In contrast, such steady accumulation of strain energy does not occur on faults in mid-continents, because the far-field tectonic loading is not steadily distributed between faults, and because stress perturbations from complex fault interactions and other stress triggers can be significant relative to the slow tectonic stressing. Consequently, mid-continental earthquakes are often temporally clustered and transient, and spatially migrating. This behavior is well illustrated by large earthquakes in North China in the past two millennia, during which no large earthquake repeated on the same fault segment, but moment release between large fault systems was complementary. Slow tectonic loading in mid-continents also causes long aftershock sequences. We show that the recent small earthquakes in the Tangshan region of North China are aftershocks of the 1976 Tangshan earthquake (M 7.5), rather than indicators of a new phase of seismic activity in North China, as many fear. Understanding the transient behavior of mid-continental earthquakes has important implications for assessing earthquake hazards. The sequence of large earthquakes in the New Madrid Seismic Zone (NMSZ) in the central US, which includes a cluster of M~7 events in 1811-1812 and perhaps a few similar ones in the past millennium, is likely a transient process, releasing previously accumulated elastic strain on recently activated faults. If so, this earthquake sequence will eventually end. Using simple analysis and numerical modeling, we show that the large NMSZ earthquakes may be ending now or in the near future.

  20. Super-paramagnetic clustering of yeast gene expression profiles

    CERN Document Server

    Getz, G; Domany, E; Zhang, M Q

    2000-01-01

    High-density DNA arrays, used to monitor gene expression at a genomic scale, have produced vast amounts of information which require the development of efficient computational methods to analyze them. The important first step is to extract the fundamental patterns of gene expression inherent in the data. This paper describes the application of a novel clustering algorithm, Super-Paramagnetic Clustering (SPC), to the analysis of gene expression profiles that were generated recently during a study of the yeast cell cycle. SPC was used to organize genes into biologically relevant clusters that are suggestive of their co-regulation. Some of the advantages of SPC are its robustness against noise and initialization, a clear signature of cluster formation and splitting, and an unsupervised self-organized determination of the number of clusters at each resolution. Our analysis revealed interesting correlated behavior of several groups of genes which has not been previously identified.

  1. Super-paramagnetic clustering of yeast gene expression profiles

    Science.gov (United States)

    Getz, G.; Levine, E.; Domany, E.; Zhang, M. Q.

    2000-04-01

    High-density DNA arrays, used to monitor gene expression at a genomic scale, have produced vast amounts of information which require the development of efficient computational methods to analyze them. The important first step is to extract the fundamental patterns of gene expression inherent in the data. This paper describes the application of a novel clustering algorithm, super-paramagnetic clustering (SPC), to the analysis of gene expression profiles that were generated recently during a study of the yeast cell cycle. SPC was used to organize genes into biologically relevant clusters that are suggestive of their co-regulation. Some of the advantages of SPC are its robustness against noise and initialization, a clear signature of cluster formation and splitting, and an unsupervised self-organized determination of the number of clusters at each resolution. Our analysis revealed interesting correlated behavior of several groups of genes which has not been previously identified.

  2. Development of assays using hexokinase and phosphoglucomutase gene sequences that distinguish strains of Leishmania tropica from different zymodemes and microsatellite clusters and their application to Palestinian foci of cutaneous leishmaniasis.

    Directory of Open Access Journals (Sweden)

    Kifaya Azmi

    Full Text Available BACKGROUND/OBJECTIVES: Palestinian strains of L. tropica characterized by multilocus enzyme electrophoresis (MLEE) fall into two zymodemes, either MON-137 or MON-307. METHODOLOGY/PRINCIPAL FINDINGS: Assays employing PCR and subsequent RFLP were applied to sequences found in the Hexokinase (HK) gene, an enzyme that is not used in MLEE, and the Phosphoglucomutase (PGM) gene, an enzyme that is used for MLEE, to see if they would facilitate consigning local strains of L. tropica to either zymodeme MON-137 or zymodeme MON-307. Following amplification and subsequent double digestion with the restriction endonucleases MboI and HaeIII, variation in the restriction patterns of the sequence from the HK gene distinguished strains of L. tropica, L. major and L. infantum and also exposed two genotypes (G) among the strains of L. tropica: HK-LtG1, associated with strains of L. tropica of the zymodemes MON-137 and MON-265, and HK-LtG2, associated with strains of L. tropica of the zymodemes MON-307, MON-288, MON-275 and MON-54. Following amplification and subsequent digestion by the restriction endonuclease MboI, variation in the sequence from the PGM gene also exposed two genotypes among the strains of L. tropica: PGM-G1, associated only with strains of L. tropica of the zymodeme MON-137; and PGM-G2, associated with strains of L. tropica of the zymodemes MON-265, MON-307, MON-288, MON-275 and MON-54, and also with six strains of L. major, five of L. infantum and one of L. donovani. The use of the HK and PGM gene sequences enabled distinction of the L. tropica strains of the zymodeme MON-137 from those of the zymodeme MON-265. This genotyping system 'correctly' identified reference strains of L. tropica of known zymodemal affiliation and also from clinical samples, with a level of sensitivity down to <1 fg in the case of the former and to 1 pg of DNA in the case of the latter. CONCLUSIONS/SIGNIFICANCE: Both assays proved useful for identifying leishmanial parasites in clinical

  3. MADIBA: A web server toolkit for biological interpretation of Plasmodium and plant gene clusters

    Directory of Open Access Journals (Sweden)

    Louw Abraham I

    2008-02-01

    Full Text Available Abstract Background Microarray technology makes it possible to identify changes in gene expression of an organism, under various conditions. Data mining is thus essential for deducing significant biological information such as the identification of new biological mechanisms or putative drug targets. While many algorithms and software have been developed for analysing gene expression, the extraction of relevant information from experimental data is still a substantial challenge, requiring significant time and skill. Description MADIBA (MicroArray Data Interface for Biological Annotation) facilitates the assignment of biological meaning to gene expression clusters by automating the post-processing stage. A relational database has been designed to store the data from gene to pathway for Plasmodium, rice and Arabidopsis. Tools within the web interface allow rapid analyses for the identification of the Gene Ontology terms relevant to each cluster; visualising the metabolic pathways where the genes are implicated, their genomic localisations, putative common transcriptional regulatory elements in the upstream sequences, and an analysis specific to the organism being studied. Conclusion MADIBA is an integrated, online tool that will assist researchers in interpreting their results and understanding the meaning of the co-expression of a cluster of genes. Functionality of MADIBA was validated by analysing a number of gene clusters from several published experiments – expression profiling of the Plasmodium life cycle, and salt stress treatments of Arabidopsis and rice. In most of the cases, the same conclusions found by the authors were quickly and easily obtained after analysing the gene clusters with MADIBA.

  4. Cloning and sequencing genes related to preeclampsia

    Institute of Scientific and Technical Information of China (English)

    SHI Juan-zi; LIU Yan-fang; YAO Yuan-qing; YAN Wei; ZHU Feng; ZHAO Zhong-liang

    2001-01-01

    To clone genes specifically expressed in the placenta of patients with preeclampsia, and to explain the mechanism in the etiopathology of preeclampsia. Methods: The placentae of preeclamptic and normotensive pregnant subjects were used as models, and the cDNA library was constructed and 20 differentially expressed fragments were cloned after a new version of PCR-based subtractive hybridization. The false positive clones were identified by reverse dot blot analysis. With one of the obtained genes taken as the probe, the placentas of 10 normal pregnant women and 10 preeclamptic patients were studied by using dot hybridization methods. Results: Six false positive clones were identified by reverse dot blot, and the remaining 14 clones were identified as preeclampsia-related genes. These clones were sequenced and analyzed with the BLAST analysis system. Eleven of the 14 clones were genes already known, among which one belongs to the necdin family; the remaining 3 were identified as novel genes. These 3 genes were acknowledged by GenBank, with the accession numbers AF232216, AF232217, AF233648. The results of dot hybridization using the necdin gene as probe were as follows: (1) This mRNA was present in the placental tissues of normal pregnancy as well as in those of preeclampsia. (2) The intensity of transcription of this mRNA in the placental tissues of preeclampsia increased significantly compared with that of the normal pregnancy (P<0.05). Conclusions: This study for the first time reported this group of genes, especially the necdin-expressing gene, which are related to the etiopathology of preeclampsia. In addition, the overtranscription of the necdin gene has been found in preeclampsia. It is helpful in further studies of the etiology of preeclampsia.

  5. Organization and Differential Regulation of a Cluster of Lignin Peroxidase Genes of Phanerochaete chrysosporium

    Science.gov (United States)

    Stewart, Philip; Cullen, Daniel

    1999-01-01

    The lignin peroxidases of Phanerochaete chrysosporium are encoded by a minimum of 10 closely related genes. Physical and genetic mapping of a cluster of eight lip genes revealed six genes occurring in pairs and transcriptionally convergent, suggesting that portions of the lip family arose by gene duplication events. The completed sequence of lipG and lipJ, together with previously published sequences, allowed phylogenetic and intron/exon classifications, indicating two main branches within the lip family. Competitive reverse transcription-PCR was used to assess lip transcript levels in both carbon- and nitrogen-limited media. Transcript patterns showed differential regulation of lip genes in response to medium composition. No apparent correlation was observed between genomic organization and transcript levels. Both constitutive and upregulated transcripts, structurally unrelated to peroxidases, were identified within the lip cluster. PMID:10348854

  6. An Sp185/333 gene cluster from the purple sea urchin and putative microsatellite-mediated gene diversification

    Directory of Open Access Journals (Sweden)

    Buckley Katherine M

    2010-10-01

    Full Text Available Abstract Background The immune system of the purple sea urchin, Strongylocentrotus purpuratus, is complex and sophisticated. An important component of sea urchin immunity is the Sp185/333 gene family, which is significantly upregulated in immunologically challenged animals. The Sp185/333 genes are less than 2 kb with two exons and are members of a large diverse family composed of greater than 40 genes. The S. purpuratus genome assembly, however, contains only six Sp185/333 genes. This underrepresentation could be due to the difficulties that large gene families present in shotgun assembly, where multiple similar genes can be collapsed into a single consensus gene. Results To understand the genomic organization of the Sp185/333 gene family, a BAC insert containing Sp185/333 genes was assembled, with careful attention to avoiding artifacts resulting from collapse or artificial duplication/expansion of very similar genes. Twelve candidate BAC assemblies were generated with varying parameters and the optimal assembly was identified by PCR, restriction digests, and subclone sequencing. The validated assembly contained six Sp185/333 genes that were clustered in a 34 kb region at one end of the BAC with five of the six genes tightly clustered within 20 kb. The Sp185/333 genes in this cluster were no more similar to each other than to previously sequenced Sp185/333 genes isolated from three different animals. This was unexpected given their proximity and putative effects of gene homogenization in closely linked, similar genes. All six genes displayed significant similarity including both 5' and 3' flanking regions, which were bounded by microsatellites. Three of the Sp185/333 genes and their flanking regions were tandemly duplicated such that each repeated segment consisted of a gene plus 0.7 kb 5' and 2.4 kb 3' of the gene (4.5 kb total). Both edges of the segmental duplications were bounded by different microsatellites. Conclusions The high sequence

  7. Clusters of Antibiotic Resistance Genes Enriched Together Stay Together in Swine Agriculture.

    Science.gov (United States)

    Johnson, Timothy A; Stedtfeld, Robert D; Wang, Qiong; Cole, James R; Hashsham, Syed A; Looft, Torey; Zhu, Yong-Guan; Tiedje, James M

    2016-04-12

    Antibiotic resistance is a worldwide health risk, but the influence of animal agriculture on the genetic context and enrichment of individual antibiotic resistance alleles remains unclear. Using quantitative PCR followed by amplicon sequencing, we quantified and sequenced 44 genes related to antibiotic resistance, mobile genetic elements, and bacterial phylogeny in microbiomes from U.S. laboratory swine and from swine farms from three Chinese regions. We identified highly abundant resistance clusters: groups of resistance and mobile genetic element alleles that cooccur. For example, the abundance of genes conferring resistance to six classes of antibiotics together with class 1 integrase and the abundance of IS6100-type transposons in three Chinese regions are directly correlated. These resistance cluster genes likely colocalize in microbial genomes in the farms. Resistance cluster alleles were dramatically enriched (up to 1 to 10% as abundant as 16S rRNA) and indicate that multidrug-resistant bacteria are likely the norm rather than an exception in these communities. This enrichment largely occurred independently of phylogenetic composition; thus, resistance clusters are likely present in many bacterial taxa. Furthermore, resistance clusters contain resistance genes that confer resistance to antibiotics independently of their particular use on the farms. Selection for these clusters is likely due to the use of only a subset of the broad range of chemicals to which the clusters confer resistance. The scale of animal agriculture and its wastes, the enrichment and horizontal gene transfer potential of the clusters, and the vicinity of large human populations suggest that managing this resistance reservoir is important for minimizing human risk. Agricultural antibiotic use results in clusters of cooccurring resistance genes that together confer resistance to multiple antibiotics. The use of a single antibiotic could select for an entire suite of resistance genes if

  8. Enzymology of aminoglycoside biosynthesis-deduction from gene clusters.

    Science.gov (United States)

    Wehmeier, Udo F; Piepersberg, Wolfgang

    2009-01-01

    The classical aminoglycosides are, with very few exceptions, typically actinobacterial secondary metabolites with antimicrobial activities all mediated by inhibiting translation on the 30S subunit of the bacterial ribosome. Some chemically related natural products inhibit glucosidases by mimicking oligo-alpha-1,4-glucosides. The biochemistry of the aminoglycoside biosynthetic pathways is still a developing field since none of the pathways has been analyzed to completeness as yet. In this chapter we treat the enzymology of aminoglycoside biosyntheses as far as it becomes apparent from recent investigations based on the availability of DNA sequence data of biosynthetic gene clusters for all major structural classes of these bacterial metabolites. We give a more general overview of the field, including descriptions of some key enzymes in various aminoglycoside pathways, whereas Chapter 20 provides a detailed account of the better-studied enzymology thus far known for the neomycin and butirosin pathways.

  9. Physical and genetic map of the major nif gene cluster from Azotobacter vinelandii.

    Science.gov (United States)

    Jacobson, M R; Brigle, K E; Bennett, L T; Setterquist, R A; Wilson, M S; Cash, V L; Beynon, J; Newton, W E; Dean, D R

    1989-02-01

    Determination of a 28,793-base-pair DNA sequence of a region from the Azotobacter vinelandii genome that includes and flanks the nitrogenase structural gene region was completed. This information was used to revise the previously proposed organization of the major nif cluster. The major nif cluster from A. vinelandii encodes 15 nif-specific genes whose products bear significant structural identity to the corresponding nif-specific gene products from Klebsiella pneumoniae. These genes include nifH, nifD, nifK, nifT, nifY, nifE, nifN, nifX, nifU, nifS, nifV, nifW, nifZ, nifM, and nifF. Although there are significant spatial differences, the identified A. vinelandii nif-specific genes have the same sequential arrangement as the corresponding nif-specific genes from K. pneumoniae. Twelve other potential genes whose expression could be subject to nif-specific regulation were also found interspersed among the identified nif-specific genes. These potential genes do not encode products that are structurally related to the identified nif-specific gene products. Eleven potential nif-specific promoters were identified within the major nif cluster, and nine of these are preceded by an appropriate upstream activator sequence. A + T-rich regions were identified between 8 of the 11 proposed nif promoter sequences and their upstream activator sequences. Site-directed deletion-and-insertion mutagenesis was used to establish a genetic map of the major nif cluster.

  10. Identification of the Fucose Synthetase Gene in the Colanic Acid Gene Cluster of Escherichia coli K-12

    OpenAIRE

    Andrianopoulos, Kanella; Wang, Lei; Reeves, Peter R.

    1998-01-01

    GDP–l-fucose, the substrate for fucosyltransferases for addition of fucose to polysaccharides or glycoproteins in both procaryotes and eucaryotes, is made from GDP–d-mannose. l-Fucose is a component of bacterial surface antigens, including the extracellular polysaccharide colanic acid produced by most Escherichia coli strains. We previously sequenced the E. coli colanic acid gene cluster and identified one of the GDP–l-fucose biosynthetic pathway genes, gmd. We report here the identification ...

  11. AutoSOME: a clustering method for identifying gene expression modules without prior knowledge of cluster number

    Directory of Open Access Journals (Sweden)

    Cooper James B

    2010-03-01

    Full Text Available Abstract Background Clustering the information content of large high-dimensional gene expression datasets has widespread application in "omics" biology. Unfortunately, the underlying structure of these natural datasets is often fuzzy, and the computational identification of data clusters generally requires knowledge about cluster number and geometry. Results We integrated strategies from machine learning, cartography, and graph theory into a new informatics method for automatically clustering self-organizing map ensembles of high-dimensional data. Our new method, called AutoSOME, readily identifies discrete and fuzzy data clusters without prior knowledge of cluster number or structure in diverse datasets including whole genome microarray data. Visualization of AutoSOME output using network diagrams and differential heat maps reveals unexpected variation among well-characterized cancer cell lines. Co-expression analysis of data from human embryonic and induced pluripotent stem cells using AutoSOME identifies >3400 up-regulated genes associated with pluripotency, and indicates that a recently identified protein-protein interaction network characterizing pluripotency was underestimated by a factor of four. Conclusions By effectively extracting important information from high-dimensional microarray data without prior knowledge or the need for data filtration, AutoSOME can yield systems-level insights from whole genome microarray expression studies. Due to its generality, this new method should also have practical utility for a variety of data-intensive applications, including the results of deep sequencing experiments. AutoSOME is available for download at http://jimcooperlab.mcdb.ucsb.edu/autosome.

  12. Genomic organization and sequences of immunoglobulin light chain genes in a primitive vertebrate suggest coevolution of immunoglobulin gene organization.

    Science.gov (United States)

    Shamblott, M J; Litman, G W

    1989-01-01

    The genomic organization and sequence of immunoglobulin light chain genes in Heterodontus francisci (horned shark), a phylogenetically primitive vertebrate, have been characterized. Light chain variable (VL) and joining (JL) segments are separated by 380 nucleotides and together with the single constant region exon (CL), occupy less than 2.7 kb, the closest linkage described thus far for a rearranging gene system. The VL segment is flanked by a characteristic recombination signal sequence possessing a 12 nucleotide spacer; the recombination signal sequence flanking the JL segment is 23 nucleotides. The VL genes, unlike heavy chain genes, possess a typical upstream regulatory octamer as well as conserved enhancer core sequences in the intervening sequence separating JL and CL. Restriction mapping and genomic Southern blotting are consistent with the presence of multiple light chain gene clusters. There appear to be considerably fewer light than heavy chain genes. Heavy and light chain clusters show no evidence of genomic linkage using field inversion gel electrophoresis. The findings of major differences in the organization and functional rearrangement properties of immunoglobulin genes in species representing different levels of vertebrate evolution, but consistent similarity in the organization of heavy and light chain genes within a species, suggest that these systems may be coevolving. PMID:2511000

  13. Motif-Independent De Novo Detection of Secondary Metabolite Gene Clusters – Towards Identification of Novel Secondary Metabolisms from Filamentous Fungi -

    Directory of Open Access Journals (Sweden)

    Myco Umemura

    2015-05-01

    Full Text Available Secondary metabolites are produced mostly by clustered genes that are essential to their biosynthesis. The transcriptional expression of these genes is often cooperatively regulated by a transcription factor located inside or close to a cluster. Most of the secondary metabolism biosynthesis (SMB) gene clusters identified to date contain so-called core genes with distinctive sequence features, such as polyketide synthase (PKS) and non-ribosomal peptide synthetase (NRPS). Recent efforts in sequencing fungal genomes have revealed far more SMB gene clusters than expected based on the number of core genes in the genomes. Several bioinformatics tools have been developed to survey SMB gene clusters using the sequence motif information of the core genes, including SMURF and antiSMASH. More recently, as sequencing techniques have made large-scale genomic and transcriptomic data available, motif-independent prediction methods for SMB gene clusters, including MIDDAS-M, have been developed. Most of these methods detect clusters in which the genes are cooperatively regulated at the transcriptional level, thus allowing the identification of novel SMB gene clusters regardless of the presence of core genes. Another type of method, MIPS-CG, uses the characteristic enrichment of SMB genes in non-syntenic blocks (NSBs), enabling prediction even without transcriptome data, although the results have not been evaluated in detail. Considering that a large portion of SMB gene clusters might be sufficiently expressed only under limited, uncommon conditions, it seems that prediction of SMB gene clusters by bioinformatics followed by experimental validation is the only way to efficiently uncover hidden SMB gene clusters. Here, we describe and discuss possible novel approaches for the determination of SMB gene clusters that have not been identified using conventional methods.

  14. Mapping gene clusters within arrayed metagenomic libraries to expand the structural diversity of biomedically relevant natural products.

    Science.gov (United States)

    Owen, Jeremy G; Reddy, Boojala Vijay B; Ternei, Melinda A; Charlop-Powers, Zachary; Calle, Paula Y; Kim, Jeffrey H; Brady, Sean F

    2013-07-16

    Complex microbial ecosystems contain large reservoirs of unexplored biosynthetic diversity. Here we provide an experimental framework and data analysis tool to facilitate the targeted discovery of natural-product biosynthetic gene clusters from the environment. Multiplex sequencing of barcoded PCR amplicons is followed by sequence similarity directed data parsing to identify sequences bearing close resemblance to biosynthetically or biomedically interesting gene clusters. Amplicons are then mapped onto arrayed metagenomic libraries to guide the recovery of targeted gene clusters. When applied to adenylation- and ketosynthase-domain amplicons derived from saturating soil DNA libraries, our analysis pipeline led to the recovery of biosynthetic clusters predicted to encode for previously uncharacterized glycopeptide- and lipopeptide-like antibiotics; thiocoraline-, azinomycin-, and bleomycin-like antitumor agents; and a rapamycin-like immunosuppressant. The utility of the approach is demonstrated by using recovered eDNA sequences to generate glycopeptide derivatives. The experiments described here constitute a systematic interrogation of a soil metagenome for gene clusters capable of encoding naturally occurring derivatives of biomedically relevant natural products. Our results show that previously undetected biosynthetic gene clusters with potential biomedical relevance are very common in the environment. This general process should permit the routine screening of environmental samples for gene clusters capable of encoding the systematic expansion of the structural diversity seen in biomedically relevant families of natural products.

  15. Characterisation of the paralytic shellfish toxin biosynthesis gene clusters in Anabaena circinalis AWQC131C and Aphanizomenon sp. NH-5

    Directory of Open Access Journals (Sweden)

    Neilan Brett A

    2009-03-01

    Full Text Available Abstract Background Saxitoxin and its analogues, collectively known as the paralytic shellfish toxins (PSTs), are neurotoxic alkaloids and are the cause of the syndrome named paralytic shellfish poisoning. PSTs are produced by a unique biosynthetic pathway, which involves reactions that are rare in microbial metabolic pathways. Nevertheless, distantly related organisms such as dinoflagellates and cyanobacteria appear to produce these toxins using the same pathway. Hypothesised explanations for such an unusual phylogenetic distribution of this shared uncommon metabolic pathway include a polyphyletic origin, an involvement of symbiotic bacteria, and horizontal gene transfer. Results We describe the identification, annotation and bioinformatic characterisation of the putative paralytic shellfish toxin biosynthesis clusters in an Australian isolate of Anabaena circinalis and an American isolate of Aphanizomenon sp., both members of the Nostocales. These putative PST gene clusters span approximately 28 kb and contain genes coding for the biosynthesis and export of the toxin. A putative insertion/excision site in the Australian Anabaena circinalis AWQC131C was identified, and the organization and evolution of the gene clusters are discussed. A biosynthetic pathway leading to the formation of saxitoxin and its analogues in these organisms is proposed. Conclusion The PST biosynthesis gene cluster presents a mosaic structure, whereby genes have apparently transposed in segments of varying size, resulting in different gene arrangements in all three sxt clusters sequenced so far. The gene cluster organizational structure and sequence similarity seem to reflect the phylogeny of the producer organisms, indicating that the gene clusters have an ancient origin, or that their lateral transfer was also an ancient event.
The knowledge we gain from the characterisation of the PST biosynthesis gene clusters, including the identity and sequence of the genes involved

  16. The O28 Antigen Gene Clusters of Salmonella enterica subsp. enterica Serovar Dakar and Serovar Pomona Are Different

    Directory of Open Access Journals (Sweden)

    Clifford G. Clark

    2010-01-01

    Full Text Available A 10 kb O-antigen gene cluster was sequenced from a Salmonella enterica subsp. enterica Dakar O28 reference strain and from two S. Pomona serogroup O28 isolates. The two S. Pomona O antigen gene clusters showed only moderate identity with the S. Dakar O28 gene cluster, suggesting that the O antigen oligosaccharides may contain one or more sugars conferring the O28 epitope but may otherwise be different. These novel findings are absolutely critical for the correct interpretation of molecular serotyping assays targeting genes within the O antigen gene clusters of these Salmonella serotypes and suggest the possibility that the O antigen gene clusters of other Salmonella serovars may also be heterogenous.

  17. clusterProfiler: an R package for comparing biological themes among gene clusters.

    Science.gov (United States)

    Yu, Guangchuang; Wang, Li-Gen; Han, Yanyan; He, Qing-Yu

    2012-05-01

    Increasing quantitative data generated from transcriptomics and proteomics require integrative strategies for analysis. Here, we present an R package, clusterProfiler that automates the process of biological-term classification and the enrichment analysis of gene clusters. The analysis module and visualization module were combined into a reusable workflow. Currently, clusterProfiler supports three species, including humans, mice, and yeast. Methods provided in this package can be easily extended to other species and ontologies. The clusterProfiler package is released under Artistic-2.0 License within Bioconductor project. The source code and vignette are freely available at http://bioconductor.org/packages/release/bioc/html/clusterProfiler.html.
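    The enrichment analysis of a gene cluster mentioned above is commonly framed as an over-representation test based on the hypergeometric distribution. The following is a minimal sketch in Python of that general idea, not the package's own R code; the function name and its parameters are hypothetical, and whether it matches the package's exact procedure is an assumption.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, term_size, cluster_size, overlap):
    """Upper-tail hypergeometric p-value, P(X >= overlap).

    Assumed setting (illustrative only): cluster_size genes are drawn without
    replacement from a universe of universe_size genes, of which term_size
    are annotated with the biological term of interest.
    """
    max_overlap = min(term_size, cluster_size)
    total = comb(universe_size, cluster_size)
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, cluster_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical numbers: 10 genes in the universe, 5 carry the term,
# and all 5 genes of the cluster carry it.
p_value = hypergeom_enrichment_p(10, 5, 5, 5)
print(p_value)
```

    A small p-value suggests that the term is over-represented in the cluster relative to the background; in practice the p-values for many terms would also need a multiple-testing correction.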

  18. Some statistical properties of gene expression clustering for array data

    DEFF Research Database (Denmark)

    Abreu, G C G; Pinheiro, A; Drummond, R D;

    2010-01-01

    DNA arrays have been a rich source of data for the study of genomic expression of a wide variety of biological systems. Gene clustering is one of the paradigms widely used to assess the significance of a gene (or group of genes). However, most of the gene clustering techniques are applied to cDNA array data without a corresponding statistical error measure. We propose an easy-to-implement and simple-to-use technique that uses bootstrap re-sampling to evaluate the statistical error of the nodes provided by SOM-based clustering. Comparisons between SOM and parametric clustering are presented for simulated as well as for two real data sets. We also implement a bootstrap-based pre-processing procedure for SOM that improves the false discovery ratio of differentially expressed genes. Code in Matlab is freely available, as well as some supplementary material, at the following address: https...
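    The bootstrap idea described in this record can be illustrated with a small sketch. It is not the authors' Matlab code and it does not train a SOM: as a simplifying assumption, the node positions are fixed one-dimensional values, each expression value is assigned to its nearest node, and the spread of each node's mean across bootstrap resamples is reported as its standard error. All names are hypothetical.

```python
import random
import statistics


def assign_to_nodes(values, nodes):
    """Group values by the index of their nearest node (1-D, fixed nodes)."""
    groups = [[] for _ in nodes]
    for v in values:
        nearest = min(range(len(nodes)), key=lambda i: abs(v - nodes[i]))
        groups[nearest].append(v)
    return groups


def bootstrap_node_error(values, nodes, n_boot=500, seed=0):
    """Bootstrap standard error of the mean of the values mapped to each node."""
    rng = random.Random(seed)
    boot_means = [[] for _ in nodes]
    for _ in range(n_boot):
        # Resample the data with replacement, keeping the sample size.
        sample = rng.choices(values, k=len(values))
        for i, group in enumerate(assign_to_nodes(sample, nodes)):
            if group:  # a node can receive no values in a given resample
                boot_means[i].append(statistics.fmean(group))
    return [
        statistics.stdev(m) if len(m) > 1 else float("nan")
        for m in boot_means
    ]


# Hypothetical expression values forming two groups, and two fixed nodes.
values = [0.1, 0.2, 0.0, 5.0, 5.1, 4.9]
errors = bootstrap_node_error(values, nodes=[0.0, 5.0])
print(errors)
```

    A node whose bootstrap error is large relative to the distance between nodes would be a candidate for an unstable cluster assignment.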

  19. Identification and comparative analyses of Siamois cluster genes in Xenopus laevis and tropicalis.

    Science.gov (United States)

    Haramoto, Yoshikazu; Saijyo, Tomohito; Tanaka, Toshiaki; Furuno, Nobuaki; Suzuki, Atsushi; Ito, Yuzuru; Kondo, Mariko; Taira, Masanori; Takahashi, Shuji

    2017-06-15

    Two siamois-related homeobox genes, siamois (sia1) and twin (sia2), have been reported in Xenopus laevis. These genes are expressed in the blastula chordin- and noggin-expressing (BCNE) center and the Nieuwkoop center, and have complete secondary axis-inducing activity when over-expressed on the ventral side of the embryo. Using whole genome sequences of X. tropicalis and X. laevis, we identified two additional siamois-related genes, which are tandemly duplicated near sia1 and sia2 to form the siamois gene cluster. Four siamois genes in X. tropicalis are transcribed at blastula to gastrula stages. In X. laevis, the siamois gene cluster is present on both homeologous chromosomes, XLA3L and XLA3S. Transcripts from seven siamois genes (three on XLA3L and four on XLA3S) in X. laevis were detected at blastula to gastrula stages. A transcribed gene, sia1p.S, encodes an inactive protein without a homeodomain. When over-expressed ventrally, all siamois-related genes tested in this study except for sia1p.S induced a complete secondary axis, indicating that X. tropicalis and X. laevis have four and six active siamois-related genes, respectively. Of note, each gene required different amounts of mRNA for full activity. These results suggest the possibility that siamois cluster genes have functional redundancy to endow robustness and quickness to organizer formation in Xenopus species.

  20. The Incidence of Strong-Lensing Clusters in the Red-Sequence Cluster Survey

    CERN Document Server

    Gladders, Michael D.; Hoekstra, Henk; Yee, H K C; Hall, Patrick B.; Barrientos, L F

    2003-01-01

    The incidence of giant arcs due to strong-lensing clusters of galaxies is known to be discrepant with current theoretical expectations. This result derives from a comparison of several cluster samples to predictions in the framework of the currently favored $\Lambda$CDM cosmology, and one possible explanation for the discrepancy is that this cosmological model is not correct. In this paper we discuss the incidence of giant arcs in the Red-Sequence Cluster Survey (RCS), which again shows significant disagreement with theoretical predictions. We briefly describe a total of eight strong lens systems, seven of which are discussed here for the first time. Based on the details of these systems, in particular on the ratio of single to multiple arc systems, we argue that it may be possible to explain this discrepancy in the currently favored cosmology, by modifying the details of the lenses themselves. Specifically, the high incidence of multiple arc systems and their overall high redshift suggests that a sub-populat...

  1. Acquisition and Evolution of Plant Pathogenesis–Associated Gene Clusters and Candidate Determinants of Tissue-Specificity in Xanthomonas

    Science.gov (United States)

    Van Sluys, Marie-Anne; White, Frank F.; Ryan, Robert P.; Dow, J. Maxwell; Rabinowicz, Pablo; Salzberg, Steven L.; Leach, Jan E.; Sonti, Ramesh; Brendel, Volker; Bogdanove, Adam J.

    2008-01-01

    Background Xanthomonas is a large genus of plant-associated and plant-pathogenic bacteria. Collectively, members cause diseases on over 392 plant species. Individually, they exhibit marked host- and tissue-specificity. The determinants of this specificity are unknown. Methodology/Principal Findings To assess potential contributions to host- and tissue-specificity, pathogenesis-associated gene clusters were compared across genomes of eight Xanthomonas strains representing vascular or non-vascular pathogens of rice, brassicas, pepper and tomato, and citrus. The gum cluster for extracellular polysaccharide is conserved except for gumN and sequences downstream. The xcs and xps clusters for type II secretion are conserved, except in the rice pathogens, in which xcs is missing. In the otherwise conserved hrp cluster, sequences flanking the core genes for type III secretion vary with respect to insertion sequence element and putative effector gene content. Variation at the rpf (regulation of pathogenicity factors) cluster is more pronounced, though genes with established functional relevance are conserved. A cluster for synthesis of lipopolysaccharide varies highly, suggesting multiple horizontal gene transfers and reassortments, but this variation does not correlate with host- or tissue-specificity. Phylogenetic trees based on amino acid alignments of gum, xps, xcs, hrp, and rpf cluster products generally reflect strain phylogeny. However, amino acid residues at four positions correlate with tissue specificity, revealing hpaA and xpsD as candidate determinants. Examination of genome sequences of xanthomonads Xylella fastidiosa and Stenotrophomonas maltophilia revealed that the hrp, gum, and xcs clusters are recent acquisitions in the Xanthomonas lineage. Conclusions/Significance Our results provide insight into the ancestral Xanthomonas genome and indicate that differentiation with respect to host- and tissue-specificity involved not major modifications or wholesale

  2. Acquisition and evolution of plant pathogenesis-associated gene clusters and candidate determinants of tissue-specificity in xanthomonas.

    Directory of Open Access Journals (Sweden)

    Hong Lu

    Full Text Available BACKGROUND: Xanthomonas is a large genus of plant-associated and plant-pathogenic bacteria. Collectively, members cause diseases on over 392 plant species. Individually, they exhibit marked host- and tissue-specificity. The determinants of this specificity are unknown. METHODOLOGY/PRINCIPAL FINDINGS: To assess potential contributions to host- and tissue-specificity, pathogenesis-associated gene clusters were compared across genomes of eight Xanthomonas strains representing vascular or non-vascular pathogens of rice, brassicas, pepper and tomato, and citrus. The gum cluster for extracellular polysaccharide is conserved except for gumN and sequences downstream. The xcs and xps clusters for type II secretion are conserved, except in the rice pathogens, in which xcs is missing. In the otherwise conserved hrp cluster, sequences flanking the core genes for type III secretion vary with respect to insertion sequence element and putative effector gene content. Variation at the rpf (regulation of pathogenicity factors) cluster is more pronounced, though genes with established functional relevance are conserved. A cluster for synthesis of lipopolysaccharide varies highly, suggesting multiple horizontal gene transfers and reassortments, but this variation does not correlate with host- or tissue-specificity. Phylogenetic trees based on amino acid alignments of gum, xps, xcs, hrp, and rpf cluster products generally reflect strain phylogeny. However, amino acid residues at four positions correlate with tissue specificity, revealing hpaA and xpsD as candidate determinants. Examination of genome sequences of xanthomonads Xylella fastidiosa and Stenotrophomonas maltophilia revealed that the hrp, gum, and xcs clusters are recent acquisitions in the Xanthomonas lineage. CONCLUSIONS/SIGNIFICANCE: Our results provide insight into the ancestral Xanthomonas genome and indicate that differentiation with respect to host- and tissue-specificity involved not major

  3. Regulator of complement activation (RCA) gene cluster in Xenopus tropicalis.

    Science.gov (United States)

    Oshiumi, Hiroyuki; Suzuki, Yuzuru; Matsumoto, Misako; Seya, Tsukasa

    2009-05-01

    Genome and expressed sequence tag information of Xenopus tropicalis suggested that short-consensus repeat (SCR)-containing proteins are encoded by three genes that map within a 300-kb region downstream of PFKFB2, which is a marker gene for the regulator of complement activation (RCA) loci in human and chicken. Based on this observation, we cloned the three cDNAs of these proteins using the 3'- or 5'-RACE technique. Given their primary structures and proximity to the PFKFB2 locus, we named them amphibian RCA protein (ARC) 1, 2, and 3. Expression in human HEK293 or CHO cells suggested that ARC1 is a soluble protein of Mr approximately 67 kDa, ARC2 is a membrane protein with Mr 44 kDa, and ARC3 is a secretory protein with a putative transmembrane region. They were N-glycosylated during maturation. In human and chicken RCA clusters, the genes for soluble, GPI-anchored, and membrane forms of SCR proteins are arranged in that order from distal to proximal relative to the PFKFB2 gene. However, the amphibian ARC1, 2, and 3 resembled one another and did not reflect the same order found in human and chicken RCA genes. This may be due to self-duplication of ARCs to form a family, which evolved after the amphibia separated from the ancestor of the amniotes, which possessed soluble, GPI-anchored, and membrane forms of SCR protein members. Taken together, frog possesses an RCA locus, but the constitution of the ARC proteins differs from that of the amniotes, with a unique self-resemblance.

  4. Cloning and Characterization of the Polyether Salinomycin Biosynthesis Gene Cluster of Streptomyces albus XM211

    OpenAIRE

    Jiang, Chunyan; Wang, Hougen; Kang, Qianjin; Liu, Jing; Bai, Linquan

    2012-01-01

    Salinomycin is widely used in animal husbandry as a food additive due to its antibacterial and anticoccidial activities. However, its biosynthesis had only been studied by feeding experiments with isotope-labeled precursors. A strategy with degenerate primers based on the polyether-specific epoxidase sequences was successfully developed to clone the salinomycin gene cluster. Using this strategy, a putative epoxidase gene, slnC, was cloned from the salinomycin producer Streptomyces albus XM211...

  5. Clustering Algorithms: Their Application to Gene Expression Data

    Science.gov (United States)

    Oyelade, Jelili; Isewon, Itunuoluwa; Oladipupo, Funke; Aromolaran, Olufemi; Uwoghiren, Efosa; Ameh, Faridah; Achas, Moses; Adebiyi, Ezekiel

    2016-01-01

    Gene expression data hide vital information required to understand the biological processes that take place in a particular organism in relation to its environment. Deciphering the hidden patterns in gene expression data offers a tremendous opportunity to strengthen the understanding of functional genomics. The complexity of biological networks and the number of genes involved increase the challenges of comprehending and interpreting the resulting mass of data, which consists of millions of measurements; these data also exhibit vagueness, imprecision, and noise. Therefore, the use of clustering techniques is a first step toward addressing these challenges, and is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. The clustering of gene expression data has proven useful in revealing the natural structure inherent in gene expression data, understanding gene functions, cellular processes, and subtypes of cells, mining useful information from noisy data, and understanding gene regulation. The other benefit of clustering gene expression data is the identification of homology, which is very important in vaccine design. This review examines the various clustering algorithms applicable to gene expression data in order to discover and provide useful knowledge of the appropriate clustering technique that will guarantee stability and a high degree of accuracy in its analysis procedure. PMID:27932867
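    As one concrete instance of the clustering techniques this record surveys, the sketch below applies plain k-means (Lloyd's algorithm) to gene expression profiles. It is an illustration chosen here, not a method singled out by the review; the deterministic initialisation from the first k profiles and all names are assumptions made to keep the example short and reproducible.

```python
import math


def kmeans(profiles, k, n_iter=50):
    """Cluster expression profiles (lists of floats) into k groups.

    Centroids start at the first k profiles (an illustrative simplification;
    random or k-means++ initialisation is more usual in practice).
    """
    centroids = [list(p) for p in profiles[:k]]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each profile goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical profiles: four genes measured under two conditions.
profiles = [[0.0, 0.0], [10.0, 10.0], [0.1, 0.2], [9.9, 10.1]]
labels, centroids = kmeans(profiles, k=2)
print(labels, centroids)
```

    Genes sharing a label have similar expression profiles under Euclidean distance; other distance measures, such as correlation-based ones, are also common for expression data and would change the grouping.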

  6. Apple contains receptor-like genes homologous to the Cladosporium fulvum resistance gene family of tomato with a cluster of genes cosegregating with Vf apple scab resistance.

    Science.gov (United States)

    Vinatzer, B A; Patocchi, A; Gianfranceschi, L; Tartarini, S; Zhang, H B; Gessler, C; Sansavini, S

    2001-04-01

    Scab caused by the fungal pathogen Venturia inaequalis is the most common disease of cultivated apple (Malus x domestica Borkh.). Monogenic resistance against scab is found in some small-fruited wild Malus species and has been used in apple breeding for scab resistance. Vf resistance of Malus floribunda 821 is the most widely used scab resistance source. Because breeding a high-quality cultivar in perennial fruit trees takes dozens of years, cloning disease resistance genes and using them in the transformation of high-quality apple varieties would be advantageous. We report the identification of a cluster of receptor-like genes with homology to the Cladosporium fulvum (Cf) resistance gene family of tomato on bacterial artificial chromosome clones derived from the Vf scab resistance locus. Three members of the cluster were sequenced completely. Similar to the Cf gene family of tomato, the deduced amino acid sequences coded by these genes contain an extracellular leucine-rich repeat domain and a transmembrane domain. The transcription of three members of the cluster was determined by reverse transcription-polymerase chain reaction to be constitutive, and the transcription and translation start of one member was verified by 5' rapid amplification of cDNA ends. We discuss the parallels between Cf resistance of tomato and Vf resistance of apple and the possibility that one of the members of the gene cluster is the Vf gene. Cf homologs from other regions of the apple genome also were identified and are likely to present other scab resistance genes.

  7. Assembly of the Red Sequence in Infrared-Selected Galaxy Clusters from the IRAC Shallow Cluster Survey

    CERN Document Server

    Snyder, Gregory F; Mancone, Conor M; Zeimann, Gregory R; Stanford, S A; Gonzalez, Anthony H; Stern, Daniel; Eisenhardt, Peter R M; Brown, Michael J I; Dey, Arjun; Jannuzi, Buell; Perlmutter, Saul

    2012-01-01

    We present results for the assembly and star formation histories of massive (~L*) red sequence galaxies in 11 spectroscopically confirmed, infrared-selected galaxy clusters at 1.0 ~ 4, contained some red spheroids by z ~ 1.5, and were actively assembling much of their final mass during 1 < z < 2 in the form of younger stars. Qualitatively, the slopes of the cluster color-magnitude relations are consistent with no significant evolution relative to local clusters.

  8. Identification and analysis of the paulomycin biosynthetic gene cluster and titer improvement of the paulomycins in Streptomyces paulus NRRL 8115.

    Directory of Open Access Journals (Sweden)

    Jine Li

    Full Text Available The paulomycins are a group of glycosylated compounds featuring a unique paulic acid moiety. To locate their biosynthetic gene clusters, the genomes of two paulomycin producers, Streptomyces paulus NRRL 8115 and Streptomyces sp. YN86, were sequenced. The paulomycin biosynthetic gene clusters were defined by comparative analyses of the two genomes together with the genome of the third paulomycin producer Streptomyces albus J1074. Subsequently, the identity of the paulomycin biosynthetic gene cluster was confirmed by inactivation of two genes involved in biosynthesis of the paulomycose branched chain (pau11) and the ring A moiety (pau18) in Streptomyces paulus NRRL 8115. After determining the gene cluster boundaries, a convergent biosynthetic model was proposed for paulomycin based on the deduced functions of the pau genes. Finally, a paulomycin high-producing strain was constructed by expressing an activator-encoding gene (pau13) in S. paulus, setting the stage for future investigations.

  9. Minimum Information about a Biosynthetic Gene cluster : commentary

    NARCIS (Netherlands)

    Medema, Marnix H; Kottmann, Renzo; Yilmaz, Pelin; Cummings, Matthew; Biggins, John B; Blin, Kai; de Bruijn, Irene; Chooi, Yit Heng; Claesen, Jan; Coates, R Cameron; Cruz-Morales, Pablo; Duddela, Srikanth; Dusterhus, Stephanie; Edwards, Daniel J; Fewer, David P; Garg, Neha; Geiger, Christoph; Gomez-Escribano, Juan Pablo; Greule, Anja; Hadjithomas, Michalis; Haines, Anthony S; Helfrich, Eric J N; Hillwig, Matthew L; Ishida, Keishi; Jones, Adam C; Jones, Carla S; Jungmann, Katrin; Kegler, Carsten; Kim, Hyun Uk; Kotter, Peter; Krug, Daniel; Masschelein, Joleen; Melnik, Alexey V; Mantovani, Simone M; Monroe, Emily A; Moore, Marcus; Moss, Nathan; Nutzmann, Hans-Wilhelm; Pan, Guohui; Pati, Amrita; Petras, Daniel; Reen, F Jerry; Rosconi, Federico; Rui, Zhe; Tian, Zhenhua; Tobias, Nicholas J; Tsunematsu, Yuta; Wiemann, Philipp; Wyckoff, Elizabeth; Yan, Xiaohui; Yim, Grace; Yu, Fengan; Xie, Yunchang; Aigle, Bertrand; Apel, Alexander K; Balibar, Carl J; Balskus, Emily P; Barona-Gomez, Francisco; Bechthold, Andreas; Bode, Helge B; Borriss, Rainer; Brady, Sean F; Brakhage, Axel A; Caffrey, Patrick; Cheng, Yi-Qiang; Clardy, Jon; Cox, Russell J; De Mot, Rene; Donadio, Stefano; Donia, Mohamed S; van der Donk, Wilfred A; Dorrestein, Pieter C; Doyle, Sean; Driessen, Arnold J M; Ehling-Schulz, Monika; Entian, Karl-Dieter; Fischbach, Michael A; Gerwick, Lena; Gerwick, William H; Gross, Harald; Gust, Bertolt; Hertweck, Christian; Hofte, Monica; Jensen, Susan E; Ju, Jianhua; Katz, Leonard; Kaysser, Leonard; Klassen, Jonathan L; Keller, Nancy P; Kormanec, Jan; Kuipers, Oscar P; Kuzuyama, Tomohisa; Kyrpides, Nikos C; Kwon, Hyung-Jin; Lautru, Sylvie; Lavigne, Rob; Lee, Chia Y; Linquan, Bai; Liu, Xinyu; Liu, Wen; Luzhetskyy, Andriy; Mahmud, Taifo; Mast, Yvonne; Mendez, Carmen; Metsa-Ketela, Mikko; Micklefield, Jason; Mitchell, Douglas A; Moore, Bradley S; Moreira, Leonilde M; Muller, Rolf; Neilan, Brett A; Nett, Markus; Nielsen, Jens; O'Gara, Fergal; Oikawa, Hideaki; Osbourn, Anne; 
Osburne, Marcia S; Ostash, Bohdan; Payne, Shelley M; Pernodet, Jean-Luc; Petricek, Miroslav; Piel, Jorn; Ploux, Olivier; Raaijmakers, Jos M; Salas, Jose A; Schmitt, Esther K; Scott, Barry; Seipke, Ryan F; Shen, Ben; Sherman, David H; Sivonen, Kaarina; Smanski, Michael J; Sosio, Margherita; Stegmann, Evi; Sussmuth, Roderich D; Tahlan, Kapil; Thomas, Christopher M; Tang, Yi; Truman, Andrew W; Viaud, Muriel; Walton, Jonathan D; Walsh, Christopher T; Weber, Tilmann; van Wezel, Gilles P; Wilkinson, Barrie; Willey, Joanne M; Wohlleben, Wolfgang; Wright, Gerard D; Ziemert, Nadine; Zhang, Changsheng; Zotchev, Sergey B; Breitling, Rainer; Takano, Eriko; Glockner, Frank Oliver

    2015-01-01

    A wide variety of enzymatic pathways that produce specialized metabolites in bacteria, fungi and plants are known to be encoded in biosynthetic gene clusters. Information about these clusters, pathways and metabolites is currently dispersed throughout the literature, making it difficult to exploit.

  10. Functional identification of gene cluster for the aniline metabolic pathway mediated by transposable element

    Institute of Scientific and Technical Information of China (English)

    LIANG Quanfeng; Takeo Masahiro; LIN Min; CHEN Ming; XU Yuquan; ZHANG Wei; PING Shuzhen; LU Wei; SONG Xianlong; WANG Weiwei; GENG Lizhao

    2005-01-01

    A convenient and widely applicable method has been developed to clone the aniline metabolic gene cluster in this study. Three positive recombinant plasmids, pDA1, pDB2 and pDB11, were cloned from a genomic library of the aniline-degrading strain AD9. The results of the aniline dioxygenase (AD) and catechol 2,3-dioxygenase (C23O) activity assays showed that pDA1 and pDB11 contain aniline dioxygenase genes and catechol 2,3-dioxygenase genes, respectively. Sequence analysis of the total 24.7-kb region revealed that this region contains 25 ORFs, of which 17 genes are involved in the metabolism of aniline. In the gene cluster, the first five genes (tadQTA1A2B) and the subsequent gene (tadR1) were predicted to encode a multi-component aniline dioxygenase and a LysR-type regulator, respectively, while the others (tadD1C1D2C2EFGIJKL) were expected to encode meta-cleavage pathway enzymes for catechol degradation. The gene cluster was surrounded by two IS1071 sequences.

  11. plantiSMASH: automated identification, annotation and expression analysis of plant biosynthetic gene clusters

    DEFF Research Database (Denmark)

    Kautsar, Satria A.; Suarez Duran, Hernando G.; Blin, Kai

    2017-01-01

    of predicted biosynthetic enzyme-coding genes, and facilitates comparative genomic analysis to study the evolutionary conservation of each cluster. Applied on 48 high-quality plant genomes, plantiSMASH identifies a rich diversity of candidate plant BGCs. These results will guide further experimental...... exploration of the nature and dynamics of gene clustering in plant metabolism. Moreover, spurred by the continuing decrease in costs of plant genome sequencing, they will allow genome mining technologies to be applied to plant natural product discovery. The plantiSMASH web server, precalculated results...

  12. Organization, expression and evolution of a disease resistance gene cluster in soybean.

    Science.gov (United States)

    Graham, Michelle A; Marek, Laura Fredrick; Shoemaker, Randy C

    2002-01-01

    PCR amplification was previously used to identify a cluster of resistance gene analogues (RGAs) on soybean linkage group J. Resistance to powdery mildew (Rmd-c), Phytophthora stem and root rot (Rps2), and an ineffective nodulation gene (Rj2) map within this cluster. BAC fingerprinting and RGA-specific primers were used to develop a contig of BAC clones spanning this region in cultivar "Williams 82" [rps2, Rmd (adult onset), rj2]. Two cDNAs with homology to the TIR/NBD/LRR family of R-genes have also been mapped to opposite ends of a BAC in the contig Gm_Isb001_091F11 (BAC 91F11). Sequence analyses of BAC 91F11 identified 16 different resistance-like gene (RLG) sequences with homology to the TIR/NBD/LRR family of disease resistance genes. Four of these RLGs represent two potentially novel classes of disease resistance genes: TIR/NBD domains fused inframe to a putative defense-related protein (NtPRp27-like) and TIR domains fused inframe to soybean calmodulin Ca(2+)-binding domains. RT-PCR analyses using gene-specific primers allowed us to monitor the expression of individual genes in different tissues and developmental stages. Three genes appeared to be constitutively expressed, while three were differentially expressed. Analyses of the R-genes within this BAC suggest that R-gene evolution in soybean is a complex and dynamic process. PMID:12524363

  13. The first determination of DNA sequence of a specific gene.

    Science.gov (United States)

    Inouye, Masayori

    2016-05-10

    How and when was the first DNA sequence of a gene determined? In 1977, F. Sanger came up with an innovative technology to sequence DNA by using chain terminators, and determined the entire DNA sequence of the 5375-base genome of bacteriophage φX 174 (Sanger et al., 1977). While Sanger's achievement has been recognized as the first DNA sequencing of genes, we had determined the DNA sequence of a gene, albeit a partial sequence, 11 years before Sanger's DNA sequence (Okada et al., 1966).

  14. Recurring cluster and operon assembly for Phenylacetate degradation genes

    Directory of Open Access Journals (Sweden)

    McInerney James O

    2009-02-01

    Full Text Available Abstract Background A large number of theories have been advanced to explain why genes involved in the same biochemical processes are often co-located in genomes. Most of these theories have been dismissed because empirical data do not match the expectations of the models. In this work we test the hypothesis that cluster formation is most likely due to a selective pressure to gradually co-localise protein products and that operon formation is not an inevitable conclusion of the process. Results We have selected an exemplar well-characterised biochemical pathway, the phenylacetate degradation pathway, and we show that its complex history is only compatible with a model where a selective advantage accrues from moving genes closer together. This selective pressure is likely to be reasonably weak and only twice in our dataset of 102 genomes do we see independent formation of a complete cluster containing all the catabolic genes in the pathway. Additionally, de novo clustering of genes clearly occurs repeatedly, even though recombination should result in the random dispersal of such genes in their respective genomes. Interspecies gene transfer has frequently replaced in situ copies of genes resulting in clusters that have similar content but very different evolutionary histories. Conclusion Our model for cluster formation in prokaryotes, therefore, consists of a two-stage selection process. The first stage is selection to move genes closer together, either because of macromolecular crowding, chromatin relaxation or transcriptional regulation pressure. This proximity opportunity sets up a separate selection for co-transcription.

  15. Comparative analysis of a cryptic thienamycin-like gene cluster identified in Streptomyces flavogriseus by genome mining.

    Science.gov (United States)

    Blanco, Gloria

    2012-06-01

    In silico database searches allowed the identification in the S. flavogriseus ATCC 33331 genome of a carbapenem gene cluster highly related to the S. cattleya thienamycin one. This is the second cluster found for a complex highly substituted carbapenem. Comparative analysis revealed that both gene clusters display a high degree of synteny in gene organization and in protein conservation. Although the cluster appears to be silent under our laboratory conditions, the putative metabolic product was predicted from bioinformatics analyses using sequence comparison tools. These data, together with previous reports concerning epithienamycins production by S. flavogriseus strains, suggest that the cluster metabolic product might be a thienamycin-like carbapenem, possibly the epimeric epithienamycin. This finding might help in understanding the biosynthetic pathway to thienamycin and other highly substituted carbapenems. It also provides another example of genome mining in Streptomyces sequenced genomes as a powerful approach for novel antibiotic discovery.

  16. Genetic localization and in vivo characterization of a Monascus azaphilone pigment biosynthetic gene cluster.

    Science.gov (United States)

    Balakrishnan, Bijinu; Karki, Suman; Chiu, Shih-Hau; Kim, Hyun-Ju; Suh, Jae-Won; Nam, Bora; Yoon, Yeo-Min; Chen, Chien-Chi; Kwon, Hyung-Jin

    2013-07-01

    Monascus spp. produce several well-known polyketides such as monacolin K, citrinin, and azaphilone pigments. In this study, the azaphilone pigment biosynthetic gene cluster was identified through T-DNA random mutagenesis in Monascus purpureus. The albino mutant W13 bears a T-DNA insertion upstream of a transcriptional regulator gene (mppR1). The transcription of mppR1 and the nearby polyketide synthase gene (MpPKS5) was significantly repressed in the W13 mutant. Targeted inactivation of MpPKS5 also gave rise to an albino mutant, confirming that mppR1 and MpPKS5 belong to an azaphilone pigment biosynthetic gene cluster. This M. purpureus sequence was used to identify the whole biosynthetic gene cluster in the Monascus pilosus genome. MpPKS5 contains SAT/KS/AT/PT/ACP/MT/R domains, and this domain organization is preserved in other azaphilone polyketide synthases. This biosynthetic gene cluster also encodes fatty acid synthase (FAS), which is predicted to assist the synthesis of 3-oxooctanoyl-CoA and 3-oxodecanoyl-CoA. These 3-oxoacyl compounds are proposed to be incorporated into the azaphilone backbone to complete the pigment biosynthesis. A monooxygenase gene (an azaH and tropB homolog) that is located far downstream of the FAS gene is proposed to be involved in pyrone ring formation. A homology search on other fungal genome sequences suggests that this azaphilone pigment gene cluster also exists in the Penicillium marneffei and Talaromyces stipitatus genomes.

  17. Hox gene clusters in the Indonesian coelacanth, Latimeria menadoensis.

    Science.gov (United States)

    Koh, Esther G L; Lam, Kevin; Christoffels, Alan; Erdmann, Mark V; Brenner, Sydney; Venkatesh, Byrappa

    2003-02-01

    The Hox genes encode transcription factors that play a key role in specifying body plans of metazoans. They are organized into clusters that contain up to 13 paralogue group members. The complex morphology of vertebrates has been attributed to the duplication of Hox clusters during vertebrate evolution. In contrast to the single Hox cluster in the amphioxus (Branchiostoma floridae), an invertebrate-chordate, mammals have four clusters containing 39 Hox genes. Ray-finned fishes (Actinopterygii) such as zebrafish and fugu possess more than four Hox clusters. The coelacanth occupies a basal phylogenetic position among lobe-finned fishes (Sarcopterygii), which gave rise to the tetrapod lineage. The lobe fins of sarcopterygians are considered to be the evolutionary precursors of tetrapod limbs. Thus, the characterization of Hox genes in the coelacanth should provide insights into the origin of tetrapod limbs. We have cloned the complete second exon of 33 Hox genes from the Indonesian coelacanth, Latimeria menadoensis, by extensive PCR survey and genome walking. Phylogenetic analysis shows that 32 of these genes have orthologs in the four mammalian HOX clusters, including three genes (HoxA6, D1, and D8) that are absent in ray-finned fishes. The remaining coelacanth gene is an ortholog of hoxc1 found in zebrafish but absent in mammals. Our results suggest that coelacanths have four Hox clusters bearing a gene complement more similar to mammals than to ray-finned fishes, but with an additional gene, HoxC1, which has been lost during the evolution of mammals from lobe-finned fishes.

  18. A Rough Set based Gene Expression Clustering Algorithm

    Directory of Open Access Journals (Sweden)

    J. J. Emilyn

    2011-01-01

    Full Text Available Problem statement: Microarray technology helps in monitoring the expression levels of thousands of genes across collections of related samples. Approach: The main goal in the analysis of large and heterogeneous gene expression datasets was to identify groups of genes that get expressed in a set of experimental conditions. Results: Several clustering techniques have been proposed for identifying gene signatures and understanding their role, and many of them have been applied to gene expression data, but with only partial success. The main aim of this work was to develop a clustering algorithm that would successfully identify gene patterns. The proposed novel clustering technique (RCGED) provides an efficient way of finding the hidden and unique gene expression patterns. It overcomes the restriction of one object being placed in only one cluster. Conclusion/Recommendations: The proposed algorithm is termed intelligent because it automatically determines the optimum number of clusters. The proposed algorithm was tested on a colon cancer dataset and the results were compared with the Rough Fuzzy K Means algorithm.

  19. Phylogeny of the Insect Homeobox Gene (Hox) Cluster

    Institute of Scientific and Technical Information of China (English)

    Sangeeta Dhawan; K. P. Gopinathan

    2005-01-01

    The homeobox (Hox) genes form an evolutionarily conserved family encoding transcription factors that play major roles in segmental identity and organ specification across species. The canonical grouping of Hox genes present in the HOM-C cluster of Drosophila or related clusters in other organisms includes eight "typical" genes, which are localized in the order labial (lab), proboscipedia (pb), Deformed (Dfd), Sex combs reduced (Scr), Antennapedia (Antp), Ultrabithorax (Ubx), abdominalA (abdA), and AbdominalB (AbdB). The members of the Hox cluster are expressed in a distinct anterior to posterior order in the embryo. Analysis of the relatedness of different members of the Hox gene cluster to each other in four evolutionarily diverse insect taxa revealed that the loci pb/Dfd and AbdB, which are farthest apart in linkage, had a high degree of evolutionary relatedness, indicating that pb/Dfd type anterior genes and AbdB are closest to the ancestral anterior and posterior Hox genes, respectively. The greater relatedness of other posterior genes Ubx and abdA to the more anterior genes such as Antp and Scr suggested that they arose by gene duplications in the more anterior members rather than the posterior AbdB.

  20. Preliminary study on mitochondrial 16S rRNA gene sequences and phylogeny of flatfishes (Pleuronectiformes)

    Institute of Scientific and Technical Information of China (English)

    2005-01-01

    A 605 bp section of the mitochondrial 16S rRNA gene from Paralichthys olivaceus, Pseudorhombus cinnamomeus, Psetta maxima and Kareius bicoloratus, which represent 3 families of the order Pleuronectiformes, was amplified by PCR and sequenced to study the molecular systematics of Pleuronectiformes, in comparison with related gene sequences of 6 other flatfish downloaded from GenBank. Phylogenetic analysis based on genetic distances among the related gene sequences of the 10 flatfish showed that this method was ideal for exploring the relationships between species, genera and families. Phylogenetic trees constructed with the neighbor-joining, maximum parsimony and maximum likelihood methods accord with the general rule of Pleuronectiformes evolution, but they also produced some confusion. In contrast to data from morphological characters, P. olivaceus clustered with K. bicoloratus, but P. cinnamomeus did not cluster with P. olivaceus, which is worth further study.

  1. The Accelerated Build-up of the Red Sequence in High Redshift Galaxy Clusters

    CERN Document Server

    Cerulo, P; Lidman, C; Demarco, R; Huertas-Company, M; Mei, S; Sánchez-Janssen, R; Barrientos, L F; Muñoz, R P

    2016-01-01

    We analyse the evolution of the red sequence in a sample of galaxy clusters at redshifts $0.8 11.5$) red sequence galaxies in the WINGS clusters, which do not include only the brightest cluster galaxies and which are not present in the HCS clusters, suggesting that they formed at epochs later than $z=0.8$. The comparison with the luminosity distribution of a sample of passive red sequence galaxies drawn from the COSMOS/UltraVISTA field in the photometric redshift range $0.8sequence in clusters is more developed at the faint end, suggesting that halo mass plays an important role in setting the time-scales for the build-up of the red sequence.

  2. Evolution of the RH gene family in vertebrates revealed by brown hagfish (Eptatretus atami) genome sequences.

    Science.gov (United States)

    Suzuki, Akinori; Komata, Hidero; Iwashita, Shogo; Seto, Shotaro; Ikeya, Hironobu; Tabata, Mitsutoshi; Kitano, Takashi

    2017-02-01

    In vertebrates, there are four major genes in the RH (Rhesus) gene family, RH, RHAG, RHBG, and RHCG. These genes are thought to have been formed by the two rounds of whole-genome duplication (2R-WGD) in the common ancestor of all vertebrates. In our previous work, where we analyzed details of the gene duplication process of this gene family, three nucleotide sequences belonging to this family were identified in Far Eastern brook lamprey (Lethenteron reissneri), and the phylogenetic positions of the genes were determined. Lampreys, along with hagfishes, are cyclostomata (jawless fishes), which is a sister group of gnathostomata (jawed vertebrates). Although those results suggested that one gene was orthologous to the gnathostome RHCG genes, we did not identify clear orthologues for other genes. In this study, therefore, we identified three novel cDNA sequences that belong to the RH gene family using de novo transcriptome analysis of another cyclostome: the brown hagfish (Eptatretus atami). We also determined the nucleotide sequences for the RHBG and RHCG genes in a red stingray (Dasyatis akajei), which belongs to the cartilaginous fishes. The phylogenetic tree showed that two brown hagfish genes, which were probably duplicated in the cyclostome lineage, formed a cluster with the gnathostome RHAG genes, whereas another brown hagfish gene formed a cluster with the gnathostome RHCG genes. We estimated that the RH genes had a higher evolutionary rate than the RHAG, RHBG, and RHCG genes. Interestingly, in the RHBG genes, only the bird lineage showed a higher rate of nonsynonymous substitutions. It is likely that this higher rate was caused by a state of relaxed functional constraints rather than by positive selection or pseudogenization.

  3. Shared gene structures and clusters of mutually exclusive spliced exons within the metazoan muscle myosin heavy chain genes.

    Directory of Open Access Journals (Sweden)

    Martin Kollmar

    Full Text Available Multicellular animals possess two to three different types of muscle tissues. Striated muscles have considerable ultrastructural similarity and contain a core set of proteins including the muscle myosin heavy chain (Mhc) protein. The ATPase activity of this myosin motor protein largely dictates muscle performance at the molecular level. Two different solutions to adjusting myosin properties to different muscle subtypes have been identified so far: Vertebrates and nematodes contain many independent differentially expressed Mhc genes while arthropods have single Mhc genes with clusters of mutually exclusive spliced exons (MXEs). The availability of hundreds of metazoan genomes now allowed us to study whether the ancient bilateria already contained MXEs, how MXE complexity subsequently evolved, and whether additional scenarios to control contractile properties in different muscles could be proposed. By reconstructing the Mhc genes from 116 metazoans we showed that all intron positions within the motor domain coding regions are conserved in all bilateria analysed. The last common ancestor of the bilateria already contained a cluster of MXEs coding for part of the loop-2 actin-binding sequence. Subsequently the protostomes and later the arthropods gained many further clusters while MXEs got completely lost independently in several branches (vertebrates and nematodes) and species (for example the annelid Helobdella robusta and the salmon louse Lepeophtheirus salmonis). Several bilateria have been found to encode multiple Mhc genes that might all or in part contain clusters of MXEs. Notable examples are a cluster of six tandemly arrayed Mhc genes, of which two contain MXEs, in the owl limpet Lottia gigantea and four Mhc genes with three encoding MXEs in the predatory mite Metaseiulus occidentalis. Our analysis showed that similar solutions to provide different myosin isoforms (multiple genes or clusters of MXEs or both) have independently been developed

  4. Subsampled open-reference clustering creates consistent, comprehensive OTU definitions and scales to billions of sequences

    Directory of Open Access Journals (Sweden)

    Jai Ram Rideout

    2014-08-01

    Full Text Available We present a performance-optimized algorithm, subsampled open-reference OTU picking, for assigning marker gene (e.g., 16S rRNA) sequences generated on next-generation sequencing platforms to operational taxonomic units (OTUs) for microbial community analysis. This algorithm provides benefits over de novo OTU picking (clustering can be performed largely in parallel, reducing runtime) and closed-reference OTU picking (all reads are clustered, not only those that match a reference database sequence with high similarity). Because more of our algorithm can be run in parallel relative to “classic” open-reference OTU picking, it makes open-reference OTU picking tractable on massive amplicon sequence data sets (though on smaller data sets, “classic” open-reference OTU clustering is often faster). We illustrate that here by applying it to the first 15,000 samples sequenced for the Earth Microbiome Project (1.3 billion V4 16S rRNA amplicons). To the best of our knowledge, this is the largest OTU picking run ever performed, and we estimate that our new algorithm runs in less than 1/5 of the time that would be required for “classic” open-reference OTU picking. We show that subsampled open-reference OTU picking yields results that are highly correlated with those generated by “classic” open-reference OTU picking through comparisons on three well-studied datasets. An implementation of this algorithm is provided in the popular QIIME software package, which uses uclust for read clustering. All analyses were performed using QIIME’s uclust wrappers, though we provide details (aided by the open-source code in our GitHub repository) that will allow implementation of subsampled open-reference OTU picking independently of QIIME (e.g., in a compiled programming language, where runtimes should be further reduced). Our analyses should generalize to other implementations of these OTU picking algorithms. Finally, we present a comparison of parameter settings in
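    The difference between closed-reference, de novo and open-reference OTU picking described in this record can be illustrated with a minimal sketch. It covers only the "classic" open-reference idea (match reads against a reference first, then cluster the failures de novo); the subsampling step of the published algorithm is not shown. The function names, the toy identity measure on equal-length reads, and the greedy centroid rule are assumptions made for illustration and are not QIIME or uclust code.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length reads (toy measure)."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def open_reference_otus(reads, reference, threshold=0.97):
    """Hypothetical helper: assign reads to reference OTUs, then cluster failures de novo."""
    otus = {name: [] for name in reference}
    failures = []

    # Step 1 (closed-reference): every read is compared with the reference
    # independently of the other reads, so this loop could run in parallel.
    for read in reads:
        best = max(reference, key=lambda name: identity(read, reference[name]), default=None)
        if best is not None and identity(read, reference[best]) >= threshold:
            otus[best].append(read)
        else:
            failures.append(read)

    # Step 2 (de novo): greedy centroid clustering of reads that missed the
    # reference. Each read depends on centroids created before it, so this
    # part is inherently serial.
    centroids = {}
    for read in failures:
        for name, seed in centroids.items():
            if identity(read, seed) >= threshold:
                otus[name].append(read)
                break
        else:
            name = f"denovo{len(centroids)}"
            centroids[name] = read
            otus[name] = [read]
    return otus
```

    In this sketch, closed-reference picking alone would discard everything in `failures`, whereas de novo picking alone would run the serial second loop over all reads.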

  5. Genome mining demonstrates the widespread occurrence of gene clusters encoding bacteriocins in cyanobacteria.

    Science.gov (United States)

    Wang, Hao; Fewer, David P; Sivonen, Kaarina

    2011-01-01

    Cyanobacteria are a rich source of natural products with interesting biological activities. Many of these are peptides and the end products of a non-ribosomal pathway. However, several cyanobacterial peptide classes were recently shown to be produced through the proteolytic cleavage and post-translational modification of short precursor peptides. A new class of bacteriocins produced through the proteolytic cleavage and heterocyclization of precursor proteins was recently identified from marine cyanobacteria. Here we show the widespread occurrence of bacteriocin gene clusters in cyanobacteria through comparative analysis of 58 cyanobacterial genomes. A total of 145 bacteriocin gene clusters were discovered through genome mining. These clusters encoded 290 putative bacteriocin precursors. They ranged in length from 28 to 164 amino acids with very little sequence conservation of the core peptide. The gene clusters could be classified into seven groups according to their gene organization and domain composition. This classification is supported by phylogenetic analysis, which further indicated independent evolutionary trajectories of gene clusters in different groups. Our data suggests that cyanobacteria are a prolific source of low-molecular weight post-translationally modified peptides.

  6. Genome mining demonstrates the widespread occurrence of gene clusters encoding bacteriocins in cyanobacteria.

    Directory of Open Access Journals (Sweden)

    Hao Wang

    Full Text Available Cyanobacteria are a rich source of natural products with interesting biological activities. Many of these are peptides and the end products of a non-ribosomal pathway. However, several cyanobacterial peptide classes were recently shown to be produced through the proteolytic cleavage and post-translational modification of short precursor peptides. A new class of bacteriocins produced through the proteolytic cleavage and heterocyclization of precursor proteins was recently identified from marine cyanobacteria. Here we show the widespread occurrence of bacteriocin gene clusters in cyanobacteria through comparative analysis of 58 cyanobacterial genomes. A total of 145 bacteriocin gene clusters were discovered through genome mining. These clusters encoded 290 putative bacteriocin precursors. They ranged in length from 28 to 164 amino acids with very little sequence conservation of the core peptide. The gene clusters could be classified into seven groups according to their gene organization and domain composition. This classification is supported by phylogenetic analysis, which further indicated independent evolutionary trajectories of gene clusters in different groups. Our data suggests that cyanobacteria are a prolific source of low-molecular weight post-translationally modified peptides.

  7. Characterization of the largest effector gene cluster of Ustilago maydis.

    Directory of Open Access Journals (Sweden)

    Thomas Brefort

    2014-07-01

    Full Text Available In the genome of the biotrophic plant pathogen Ustilago maydis, many of the genes coding for secreted protein effectors modulating virulence are arranged in gene clusters. The vast majority of these genes encode novel proteins whose expression is coupled to plant colonization. The largest of these gene clusters, cluster 19A, encodes 24 secreted effectors. Deletion of the entire cluster results in severe attenuation of virulence. Here we present the functional analysis of this genomic region. We show that a 19A deletion mutant behaves like an endophyte, i.e. is still able to colonize plants and complete the infection cycle. However, tumors, the most conspicuous symptoms of maize smut disease, are only rarely formed and fungal biomass in infected tissue is significantly reduced. The generation and analysis of strains carrying sub-deletions identified several genes significantly contributing to tumor formation after seedling infection. Another of the effectors could be linked specifically to anthocyanin induction in the infected tissue. As the individual contributions of these genes to tumor formation were small, we studied the response of maize plants to the whole cluster mutant as well as to several individual mutants by array analysis. This revealed distinct plant responses, demonstrating that the respective effectors have discrete plant targets. We propose that the analysis of plant responses to effector mutant strains that lack a strong virulence phenotype may be a general way to visualize differences in effector function.

  8. Characterization of the largest effector gene cluster of Ustilago maydis.

    Science.gov (United States)

    Brefort, Thomas; Tanaka, Shigeyuki; Neidig, Nina; Doehlemann, Gunther; Vincon, Volker; Kahmann, Regine

    2014-07-01

    In the genome of the biotrophic plant pathogen Ustilago maydis, many of the genes coding for secreted protein effectors modulating virulence are arranged in gene clusters. The vast majority of these genes encode novel proteins whose expression is coupled to plant colonization. The largest of these gene clusters, cluster 19A, encodes 24 secreted effectors. Deletion of the entire cluster results in severe attenuation of virulence. Here we present the functional analysis of this genomic region. We show that a 19A deletion mutant behaves like an endophyte, i.e. is still able to colonize plants and complete the infection cycle. However, tumors, the most conspicuous symptoms of maize smut disease, are only rarely formed and fungal biomass in infected tissue is significantly reduced. The generation and analysis of strains carrying sub-deletions identified several genes significantly contributing to tumor formation after seedling infection. Another of the effectors could be linked specifically to anthocyanin induction in the infected tissue. As the individual contributions of these genes to tumor formation were small, we studied the response of maize plants to the whole cluster mutant as well as to several individual mutants by array analysis. This revealed distinct plant responses, demonstrating that the respective effectors have discrete plant targets. We propose that the analysis of plant responses to effector mutant strains that lack a strong virulence phenotype may be a general way to visualize differences in effector function.

  9. Whole Genome Sequencing Demonstrates Limited Transmission within Identified Mycobacterium tuberculosis Clusters in New South Wales, Australia

    Science.gov (United States)

    Gurjav, Ulziijargal; Outhred, Alexander C.; Jelfs, Peter; McCallum, Nadine; Wang, Qinning; Hill-Cawthorne, Grant A.; Marais, Ben J.; Sintchenko, Vitali

    2016-01-01

    Australia has a low tuberculosis incidence rate with most cases occurring among recent immigrants. Given suboptimal cluster resolution achieved with 24-locus mycobacterium interspersed repetitive unit (MIRU-24) genotyping, the added value of whole genome sequencing was explored. MIRU-24 profiles of all Mycobacterium tuberculosis culture-confirmed tuberculosis cases diagnosed between 2009 and 2013 in New South Wales (NSW), Australia, were examined and clusters identified. The relatedness of cases within the largest MIRU-24 clusters was assessed using whole genome sequencing and phylogenetic analyses. Of 1841 culture-confirmed TB cases, 91.9% (1692/1841) had complete demographic and genotyping data. East-African Indian (474; 28.0%) and Beijing (470; 27.8%) lineage strains predominated. The overall rate of MIRU-24 clustering was 20.1% (340/1692) and was highest among Beijing lineage strains (35.7%; 168/470). One Beijing and three East-African Indian (EAI) clonal complexes were responsible for the majority of observed clusters. Whole genome sequencing of the 4 largest clusters (30 isolates) demonstrated diverse single nucleotide polymorphisms (SNPs) within identified clusters. All sequenced EAI strains and 70% of Beijing lineage strains clustered by MIRU-24 typing demonstrated distinct SNP profiles. The superior resolution provided by whole genome sequencing demonstrated limited M. tuberculosis transmission within NSW, even within identified MIRU-24 clusters. Routine whole genome sequencing could provide valuable public health guidance in low burden settings. PMID:27737005
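    The proportions quoted in this abstract can be re-derived from the reported counts. The short check below uses only numbers that appear in the abstract; the helper name `pct` is an assumption for illustration.

```python
def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100 * part / whole, 1)


# Counts taken directly from the abstract.
complete_data = pct(1692, 1841)      # cases with complete demographic and genotyping data
eai_share = pct(474, 1692)           # East-African Indian lineage strains
beijing_share = pct(470, 1692)       # Beijing lineage strains
clustering_rate = pct(340, 1692)     # overall MIRU-24 clustering rate
beijing_clustering = pct(168, 470)   # clustering rate among Beijing lineage strains

print(complete_data, eai_share, beijing_share, clustering_rate, beijing_clustering)
```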

  10. A method for clustering of miRNA sequences using fragmented programming

    Science.gov (United States)

    Ivashchenko, Anatoly; Pyrkova, Anna; Niyazova, Raigul

    2016-01-01

    Clustering of miRNA sequences is an important problem in molecular genetics and the associated cellular biology. Thousands of such sequences are known today through advances in sophisticated molecular tools, sequencing techniques, computational resources and rule-based mathematical models. Analysis of such large-scale miRNA sequences for inferring patterns towards deducing cellular function is a great challenge in modern molecular biology. Therefore, it is of interest to develop mathematical models specific for miRNA sequences. The process is to group (cluster) such miRNA sequences using well-defined known features. We describe a method for clustering of miRNA sequences using fragmented programming. Subsequently, we illustrate the utility of the model using a dendrogram (a tree diagram) for publicly known A. thaliana miRNA nucleotide sequences towards the inference of observed conserved patterns. PMID:27212839

  11. Environments and Morphologies of Red Sequence Galaxies with Residual Star Formation in Massive Clusters

    OpenAIRE

    Crossett, Jacob P.; Pimbblet, Kevin A.; Stott, John P; Jones, D. Heath

    2013-01-01

    We present a photometric investigation into recent star formation in galaxy clusters at z ~ 0.1. We use spectral energy distribution templates to quantify recent star formation in large X-ray selected clusters from the LARCS survey using matched GALEX NUV photometry. These clusters all have signs of red sequence galaxy recent star formation (as indicated by blue NUV-R colour), regardless of cluster morphology and size. A trend in environment is found for these galaxies, such that they prefer ...

  12. Estimating variation within the genes and inferring the phylogeny of 186 sequenced diverse Escherichia coli genomes

    Directory of Open Access Journals (Sweden)

    Kaas Rolf S

    2012-10-01

    Full Text Available Abstract Background Escherichia coli exists in commensal and pathogenic forms. By measuring the variation of individual genes across more than a hundred sequenced genomes, gene variation can be studied in detail, including the number of mutations found for any given gene. This knowledge will be useful for creating better phylogenies, for determination of molecular clocks and for improved typing techniques. Results We find 3,051 gene clusters/families present in at least 95% of the genomes and 1,702 gene clusters present in 100% of the genomes. The former 'soft core' of about 3,000 gene families is perhaps more biologically relevant, especially considering that many of these genome sequences are draft quality. The E. coli pan-genome for this set of isolates contains 16,373 gene clusters. A core-gene tree, based on alignment, and a pan-genome tree, based on gene presence/absence, map the relatedness of the 186 sequenced E. coli genomes. The core-gene tree displays high confidence and divides the E. coli strains into the observed MLST type clades and also separates defined phylotypes. Conclusion The results of comparing a large and diverse E. coli dataset support the theory that reliable and good resolution phylogenies can be inferred from the core-genome. The results further suggest that the resolution at the isolate level may subsequently be improved by targeting more variable genes. The use of whole genome sequencing will make it possible to eliminate, or at least reduce, the need for several typing steps used in traditional epidemiology.
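    The strict core (100% of genomes), 'soft core' (at least 95%) and pan-genome counts in this record are all summaries of one gene presence/absence table. A minimal sketch of that bookkeeping follows; the function name and the dictionary layout are assumptions for illustration, not the authors' pipeline.

```python
from math import ceil


def summarize_pan_genome(presence, n_genomes, soft_core_fraction=0.95):
    """Count strict-core, soft-core and pan-genome gene clusters.

    `presence` maps a gene-cluster ID to the set of genome IDs that carry it
    (hypothetical input layout).
    """
    # Smallest number of genomes that satisfies the soft-core fraction.
    soft_min = ceil(soft_core_fraction * n_genomes)
    strict_core = sum(1 for genomes in presence.values() if len(genomes) == n_genomes)
    soft_core = sum(1 for genomes in presence.values() if len(genomes) >= soft_min)
    pan = sum(1 for genomes in presence.values() if genomes)
    return {"strict_core": strict_core, "soft_core": soft_core, "pan": pan}
```

    With 186 genomes and a 95% cutoff, `soft_min` is 177, so a gene cluster may be missing from up to nine genomes and still count as soft core, which is why the soft core tolerates draft-quality assemblies better than the strict core.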

  13. ESPRIT-Forest: Parallel clustering of massive amplicon sequence data in subquadratic time.

    Science.gov (United States)

    Cai, Yunpeng; Zheng, Wei; Yao, Jin; Yang, Yujie; Mai, Volker; Mao, Qi; Sun, Yijun

    2017-04-01

    The rapid development of sequencing technology has led to an explosive accumulation of genomic sequence data. Clustering is often the first step to perform in sequence analysis, and hierarchical clustering is one of the most commonly used approaches for this purpose. However, it is currently computationally expensive to perform hierarchical clustering of extremely large sequence datasets due to its quadratic time and space complexities. In this paper we developed a new algorithm called ESPRIT-Forest for parallel hierarchical clustering of sequences. The algorithm achieves subquadratic time and space complexity and maintains a high clustering accuracy comparable to the standard method. The basic idea is to organize sequences into a pseudo-metric based partitioning tree for sub-linear time searching of nearest neighbors, and then use a new multiple-pair merging criterion to construct clusters in parallel using multiple threads. The new algorithm was tested on the human microbiome project (HMP) dataset, currently one of the largest published microbial 16S rRNA sequence datasets. Our experiment demonstrated that with the power of parallel computing it is now computationally feasible to perform hierarchical clustering analysis of tens of millions of sequences. The software is available at http://www.acsu.buffalo.edu/∼yijunsun/lab/ESPRIT-Forest.html.
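    To make the quadratic cost mentioned in this abstract concrete, the sketch below implements the standard baseline, naive average-linkage hierarchical clustering with an all-pairs distance table. It is not ESPRIT-Forest: the partitioning tree and multiple-pair merging are not reproduced, and the function names and the Hamming-based distance on equal-length sequences are assumptions for illustration.

```python
from itertools import combinations


def hamming_fraction(a: str, b: str) -> float:
    """Fraction of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def average_linkage(seqs, cutoff):
    """Merge the closest pair of clusters until the smallest average distance exceeds cutoff."""
    # All-pairs distance table: this is the quadratic time and space cost.
    dist = {(i, j): hamming_fraction(seqs[i], seqs[j])
            for i, j in combinations(range(len(seqs)), 2)}
    clusters = [[i] for i in range(len(seqs))]

    def avg(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        total = sum(dist[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (a, b), d = min(
            (((x, y), avg(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > cutoff:
            break
        # a < b is guaranteed by combinations, so deleting b leaves a in place.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]
```

    For n sequences the table holds n(n-1)/2 entries, which is the bottleneck the abstract refers to when datasets reach tens of millions of sequences.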

  14. An improved algorithm for clustering gene expression data.

    Science.gov (United States)

    Bandyopadhyay, Sanghamitra; Mukhopadhyay, Anirban; Maulik, Ujjwal

    2007-11-01

    Recent advancements in microarray technology allow simultaneous monitoring of the expression levels of a large number of genes over different time points. Clustering is an important tool for analyzing such microarray data, typical properties of which are its inherent uncertainty, noise and imprecision. In this article, a two-stage clustering algorithm, which employs a recently proposed variable string length genetic scheme and a multiobjective genetic clustering algorithm, is proposed. It is based on the novel concept of points having significant membership to multiple classes. An iterated version of the well-known Fuzzy C-Means is also utilized for clustering. The significant superiority of the proposed two-stage clustering algorithm as compared to the average linkage method, Self Organizing Map (SOM) and a recently developed weighted Chinese restaurant-based clustering method (CRC), widely used methods for clustering gene expression data, is established on a variety of artificial and publicly available real life data sets. The biological relevance of the clustering solutions is also analyzed.

  15. Gene and translation initiation site prediction in metagenomic sequences

    Energy Technology Data Exchange (ETDEWEB)

    Hyatt, Philip Douglas [ORNL; LoCascio, Philip F [ORNL; Hauser, Loren John [ORNL; Uberbacher, Edward C [ORNL

    2012-01-01

    Gene prediction in metagenomic sequences remains a difficult problem. Current sequencing technologies do not achieve sufficient coverage to assemble the individual genomes in a typical sample; consequently, sequencing runs produce a large number of short sequences whose exact origin is unknown. Since these sequences are usually smaller than the average length of a gene, algorithms must make predictions based on very little data. We present MetaProdigal, a metagenomic version of the gene prediction program Prodigal, that can identify genes in short, anonymous coding sequences with a high degree of accuracy. The novel value of the method consists of enhanced translation initiation site identification, ability to identify sequences that use alternate genetic codes and confidence values for each gene call. We compare the results of MetaProdigal with other methods and conclude with a discussion of future improvements.

  16. Molecular cloning and characterization of the human beta-like globin gene cluster.

    Science.gov (United States)

    Fritsch, E F; Lawn, R M; Maniatis, T

    1980-04-01

    The genes encoding human embryonic (epsilon), fetal (G gamma, A gamma) and adult (delta, beta) beta-like globin polypeptides were isolated as a set of overlapping cloned DNA fragments from bacteriophage lambda libraries of high molecular weight (15-20 kb) chromosomal DNA. The 65 kb of DNA represented in these overlapping clones contains the genes for all five beta-like polypeptides, including the embryonic epsilon-globin gene, for which the chromosomal location was previously unknown. All five genes are transcribed from the same DNA strand and are arranged in the order 5'-epsilon-(13.3 kb)-G gamma-(3.5 kb)-A gamma-(13.9 kb)-delta-(5.4 kb)-beta-3'. Thus the genes are positioned on the chromosome in the order of their expression during development. In addition to the five known beta-like globin genes, we have detected two other beta-like globin sequences which do not correspond to known polypeptides. One of these sequences has been mapped to the A gamma-delta intergenic region while the other is located 6-9 kb 5' to the epsilon gene. Cross hybridization experiments between the intergenic sequences of the gene cluster have revealed a nonglobin repeat sequence (*) which is interspersed with the globin genes in the following manner: 5'-**epsilon-*G gamma-A gamma*-**delta-beta*-3'. Fine structure mapping of the region located 5' to the delta-globin gene revealed two repeats with a maximum size of 400 bp, which are separated by approximately 700 bp of DNA not repeated within the cluster. Preliminary experiments indicate that this repeat family is also repeated many times in the human genome.

  17. Genome classification by gene distribution: An overlapping subspace clustering approach

    Directory of Open Access Journals (Sweden)

    Halgamuge Saman K

    2008-04-01

    Full Text Available Abstract Background Genomes of lower organisms have been observed with a large amount of horizontal gene transfers, which cause difficulties in their evolutionary study. Bacteriophage genomes are a typical example. One recent approach that addresses this problem is the unsupervised clustering of genomes based on gene order and genome position, which helps to reveal species relationships that may not be apparent from traditional phylogenetic methods. Results We propose the use of an overlapping subspace clustering algorithm for such genome classification problems. The advantage of subspace clustering over traditional clustering is that it can associate clusters with gene arrangement patterns, preserving genomic information in the clusters produced. Additionally, overlapping capability is desirable for the discovery of multiple conserved patterns within a single genome, such as those acquired from different species via horizontal gene transfers. The proposed method involves a novel strategy to vectorize genomes based on their gene distribution. A number of existing subspace clustering and biclustering algorithms were evaluated to identify the best framework upon which to develop our algorithm; we extended a generic subspace clustering algorithm called HARP to incorporate overlapping capability. The proposed algorithm was assessed and applied on bacteriophage genomes. The phage grouping results are consistent overall with the Phage Proteomic Tree and showed common genomic characteristics among the TP901-like, Sfi21-like and sk1-like phage groups. Among 441 phage genomes, we identified four significantly conserved distribution patterns structured by the terminase, portal, integrase, holin and lysin genes. We also observed a subgroup of Sfi21-like phages comprising a distinctive divergent genome organization and identified nine new phage members to the Sfi21-like genus: Staphylococcus 71, phiPVL108, Listeria A118, 2389, Lactobacillus phi AT3, A2

  18. Interpolation based consensus clustering for gene expression time series.

    Science.gov (United States)

    Chiu, Tai-Yu; Hsu, Ting-Chieh; Yen, Chia-Cheng; Wang, Jia-Shung

    2015-04-16

    Unsupervised analyses such as clustering are the essential tools required to interpret time-series expression data from microarrays. Several clustering algorithms have been developed to analyze gene expression data. Early methods such as k-means, hierarchical clustering, and self-organizing maps are popular for their simplicity. However, because of noise and uncertainty of measurement, these common algorithms have low accuracy. Moreover, because gene expression is a temporal process, the relationship between successive time points should be considered in the analyses. In addition, biological processes are generally continuous; therefore, the datasets collected from time series experiments are often found to have an insufficient number of data points and, as a result, compensation for missing data can also be an issue. An affinity propagation-based clustering algorithm for time-series gene expression data is proposed. The algorithm explores the relationship between genes using a sliding-window mechanism to extract a large number of features. In addition, the time-course datasets are resampled with spline interpolation to predict the unobserved values. Finally, a consensus process is applied to enhance the robustness of the method. Some real gene expression datasets were analyzed to demonstrate the accuracy and efficiency of the algorithm. The proposed algorithm has benefitted from the use of cubic B-splines interpolation, sliding-window, affinity propagation, gene relativity graph, and a consensus process, and, as a result, provides both appropriate and effective clustering of time-series gene expression data. The proposed method was tested with gene expression data from the Yeast galactose dataset, the Yeast cell-cycle dataset (Y5), and the Yeast sporulation dataset, and the results illustrated the relationships between the expressed genes, which may give some insights into the biological processes involved.

  19. Identification and functional analysis of gene cluster involvement in biosynthesis of the cyclic lipopeptide antibiotic pelgipeptin produced by Paenibacillus elgii

    Directory of Open Access Journals (Sweden)

    Qian Chao-Dong

    2012-09-01

    Full Text Available Abstract Background Pelgipeptin, a potent antibacterial and antifungal agent, is a non-ribosomally synthesised lipopeptide antibiotic. This compound consists of a β-hydroxy fatty acid and nine amino acids. To date, there is no information about its biosynthetic pathway. Results A potential pelgipeptin synthetase gene cluster (plp) was identified from Paenibacillus elgii B69 through genome analysis. The gene cluster spans 40.8 kb with eight open reading frames. Among the genes in this cluster, three large genes, plpD, plpE, and plpF, were shown to encode non-ribosomal peptide synthetases (NRPSs), with one, seven, and one module(s), respectively. Bioinformatic analysis of the substrate specificity of all nine adenylation domains indicated that the sequence of the NRPS modules is well collinear with the order of amino acids in pelgipeptin. Additional biochemical analysis of four recombinant adenylation domains (PlpD A1, PlpE A1, PlpE A3, and PlpF A1) provided further evidence that the plp gene cluster is involved in pelgipeptin biosynthesis. Conclusions In this study, a gene cluster (plp) responsible for the biosynthesis of pelgipeptin was identified from the genome sequence of Paenibacillus elgii B69. The identification of the plp gene cluster provides an opportunity to develop novel lipopeptide antibiotics by genetic engineering.

  20. Alignment of Red-Sequence Cluster Dwarf Galaxies: From the Frontier Fields to the Local Universe

    Science.gov (United States)

    Barkhouse, Wayne Alan; Archer, Haylee; Burgad, Jaford; Foote, Gregory; Rude, Cody; Lopez-Cruz, Omar

    2015-08-01

    Galaxy clusters are the largest virialized structures in the universe. Due to their high density and mass, they are an excellent laboratory for studying the environmental effects on galaxy evolution. Numerical simulations have predicted that tidal torques acting on dwarf galaxies as they fall into the cluster environment will cause the major axis of the galaxies to align with their radial position vector (a line that extends from the cluster center to the galaxy's center). We have undertaken a study to measure the redshift evolution of the alignment of red-sequence cluster dwarf galaxies based on a sample of 57 low-redshift Abell clusters imaged at KPNO using the 0.9-meter telescope, and 64 clusters from the WINGS dataset. To supplement our low-redshift sample, we have included galaxies selected from the Hubble Space Telescope Frontier fields. Leveraging the HST data allows us to look for evolutionary changes in the alignment of red-sequence cluster dwarf galaxies over a redshift range of 0 < z < 0.35. The alignment of the major axis of the dwarf galaxies is measured by fitting a Sersic function to each red-sequence galaxy using GALFIT. The quality of each model is checked visually after subtracting the model from the galaxy. The cluster sample is then combined by scaling each cluster by r200. We present our preliminary results based on the alignment of the red-sequence dwarf galaxies with: 1) the major axis of the brightest cluster galaxy, 2) the major axis of the cluster defined by the position of cluster members, and 3) a radius vector pointing from the cluster center to individual dwarf galaxies. Our combined cluster sample is sub-divided into different radial regions and redshift bins.

  1. Clusters of Galaxies at 1 < z < 2 The Spitzer Adaptation of the Red-Sequence Cluster Survey

    CERN Document Server

    Wilson, G; Lacy, M; Yee, H; Surace, J; Lonsdale, C; Hoekstra, H; Majumdar, S; Gilbank, D; Gladders, M; Wilson, Gillian; Muzzin, Adam; Lacy, Mark; Yee, Howard; Surace, Jason; Lonsdale, Carol; Hoekstra, Henk; Majumdar, Subhabrata; Gilbank, David

    2006-01-01

    As the densest galaxy environments in the universe, clusters are vital to our understanding of the role that environment plays in galaxy formation and evolution. Unfortunately, the evolution of high-redshift cluster galaxies is poorly understood because of the "cluster desert" that exists at 1 2 to the quiescent population at z < 1. The existing seven-passband Spitzer data (3.6, 4.5, 5.8, 8.0, 24, 70, 160 micron) will allow us to make the first measurements of the evolution of the cluster red-sequence, IR luminosity function, and the mid-IR dust-obscured star-formation rate for 1 < z < 2 clusters.

  2. [Molecular phylogeny of the gayal inferred from the analysis of cytochrome b gene entire sequences].

    Science.gov (United States)

    Li, Shi-Ping; Chang, Hong; Ma, Guo-Long; Chen, Hong-Yu; Ji, De-Jun; Geng, Rong-Qing

    2008-01-01

    The gayal (Bos frontalis) is a very rare, semi-wild and semi-domestic bovine species. Considerable disagreement remains about the gayal's origin and phylogenetic status. The complete cytochrome b (Cyt b) gene sequences (1,140 bp) of 11 gayals were sequenced and analyzed. Combined with other complete bovine Cyt b sequences from GenBank, phylogenetic trees of the genus Bos were reconstructed by neighbor-joining (NJ) and maximum parsimony (MP) methods with Bubalus bubalis as the outgroup. Sequence analysis showed that, among the 1,140 sites of the complete Cyt b gene sequences of the 11 gayals, 95 variable sites (8.33% of all sites) and 6 haplotypes were found, showing abundant genetic diversity in the mitochondrial Cyt b gene of the gayals. Both NJ and MP trees demonstrated that the gayals in this study were clearly divided into three branches: one clustering with Bos taurus, another clustering with Bos indicus, and the third clustering with Bos gaurus. The phylogenetic analysis suggested that the gayal might be the domesticated form of the gaur (Bos gaurus), and that a large proportion of the gayal bloodline has been introgressed by other bovine species.
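
    The variable-site and haplotype counts reported in the record above follow from a simple pass over an alignment. The sketch below is illustrative only: the function name and the toy alignment are assumptions, and the original study's software is not stated in the abstract.

```python
from typing import List, Tuple


def variable_sites_and_haplotypes(aligned: List[str]) -> Tuple[int, float, int]:
    """Return (variable site count, percent variable, haplotype count)."""
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("need a non-empty alignment of equal-length sequences")
    length = len(aligned[0])
    # A site is variable if the alignment column holds more than one base
    variable = sum(1 for column in zip(*aligned) if len(set(column)) > 1)
    # A haplotype is a distinct full-length sequence
    haplotypes = len(set(aligned))
    return variable, 100.0 * variable / length, haplotypes


# Toy alignment of four hypothetical 6-bp sequences
toy_alignment = ["ATGACC", "ATGACC", "ATGGCC", "TTGGCC"]
n_variable, percent_variable, n_haplotypes = variable_sites_and_haplotypes(toy_alignment)

# The percentage quoted in the abstract: 95 variable sites out of 1,140
reported_percent = round(100 * 95 / 1140, 2)
```

    The last line reproduces the abstract's arithmetic: 95 of 1,140 sites corresponds to 8.33%.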

  3. Marker2sequence, mine your QTL regions for candidate genes

    NARCIS (Netherlands)

    Chibon, P.Y.F.R.P.; Schoof, H.; Visser, R.G.F.; Finkers, H.J.

    2012-01-01

    Marker2sequence (M2S) aims at mining quantitative trait loci (QTLs) for candidate genes. For each gene, within the QTL region, M2S uses data integration technology to integrate putative gene function with associated gene ontology terms, proteins, pathways and literature. As a typical QTL region

  4. Ontology-Driven Co-clustering of Gene Expression Data

    Science.gov (United States)

    Cordero, Francesca; Pensa, Ruggero G.; Visconti, Alessia; Ienco, Dino; Botta, Marco

    The huge volume of gene expression data produced by microarrays and other high-throughput techniques has encouraged the development of new computational techniques to evaluate the data and to formulate new biological hypotheses. To this purpose, co-clustering techniques are widely used: these identify groups of genes that show similar activity patterns under a specific subset of the experimental conditions by measuring the similarity in expression within these groups. However, in many applications, distance metrics based only on expression levels fail to capture biologically meaningful clusters.

  5. A Probabilistic Genome-Wide Gene Reading Frame Sequence Model

    DEFF Research Database (Denmark)

    Have, Christian Theil; Mørk, Søren

    We introduce a new type of probabilistic sequence model that models the sequential composition of reading frames of genes in a genome. Our approach extends gene finders with a model of the sequential composition of genes at the genome level, effectively producing a sequential genome annotation as output. The model can be used to obtain the most probable genome annotation based on a combination of (i) a gene finder score of each gene candidate and (ii) the sequence of the reading frames of gene candidates through a genome. The model, as well as a higher-order variant, is developed and tested ... and are evaluated by the effect on prediction performance. Since bacterial gene finding to a large extent is a solved problem, it forms an ideal proving ground for evaluating the explicit modeling of larger-scale gene sequence composition of genomes. We conclude that the sequential composition of gene reading frames ...

  6. The Effect of Pre-Main Sequence Stars on Star Cluster Dynamics

    CERN Document Server

    Wiersma, R; Zwart, S P

    2006-01-01

    We investigate the effects of the addition of pre-main sequence evolution to star cluster simulations. We allowed stars to follow pre-main sequence tracks that begin at the deuterium burning birthline and end at the zero age main sequence. We compared our simulations to ones in which the stars began their lives at the zero age main sequence, and also investigated the effects of particular choices for initial binary orbital parameters. We find that the inclusion of the pre-main sequence phase results in a slightly higher core concentration, lower binary fraction, and fewer hard binary systems. In general, the global properties of star clusters remain almost unchanged, but the properties of the binary star population in the cluster can be dramatically modified by the correct treatment of the pre-main sequence stage.

  7. Copy number of pilus gene clusters in Haemophilus influenzae and variation in the hifE pilin gene.

    Science.gov (United States)

    Read, T D; Satola, S W; Opdyke, J A; Farley, M M

    1998-04-01

    Brazilian purpuric fever (BPF)-associated Haemophilus influenzae biogroup aegyptius strain F3031 contains two identical copies of a five gene cluster (hifA to hifE) encoding pili similar to well-characterized Hif fimbriae of H. influenzae type b. HifE, the putative pilus tip adhesin of F3031, shares only 40% amino acid sequence similarity with the same molecule from type b strains, whereas the other four proteins have 75 to 95% identity. To determine whether pilus cluster duplication and the hifE(F3031) allele were special features of BPF-associated bacteria, we analyzed a collection of H. influenzae strains by PCR with hifA- and hifE-specific oligonucleotides, by Southern hybridization with a hifC gene probe, and by nucleotide sequencing. The presence of two pilus clusters was limited to some H. influenzae biogroup aegyptius strains. The hifE(F3031) allele was limited to H. influenzae biogroup aegyptius. Two strains contained one copy of hifE(F3031) and one copy of a variant hifE allele. We determined the nucleotide sequences of four hifE genes from H. influenzae biogroup aegyptius and H. influenzae capsule serotypes a and c. The predicted proteins produced by these genes demonstrated only 35 to 70% identity to the three published HifE proteins from nontypeable H. influenzae, serotype b, and BPF strains. The C-terminal third of the molecules implicated in chaperone binding was the most highly conserved region. Three conserved domains in the otherwise highly variable N-terminal putative receptor-binding region of HifE were similar to conserved portions in the N terminus of Neisseria pilus adhesin PilC. We concluded that two pilus clusters and hifE(F3031) were not specific for BPF-causing H. influenzae, and we also identified portions of HifE possibly involved in binding mammalian cell receptors.

  8. CLUSEAN: a computer-based framework for the automated analysis of bacterial secondary metabolite biosynthetic gene clusters.

    Science.gov (United States)

    Weber, T; Rausch, C; Lopez, P; Hoof, I; Gaykova, V; Huson, D H; Wohlleben, W

    2009-03-10

    Bacterial secondary metabolites are an important source of antimicrobial and cytostatic drugs. These molecules are often synthesized in a stepwise fashion by multimodular megaenzymes that are encoded in clusters of genes encoding enzymes for precursor supply and modification. In this work, we present an open source software pipeline, CLUSEAN (CLUster SEquence ANalyzer), that helps to annotate and analyze such gene clusters. CLUSEAN integrates standard analysis tools, like BLAST and HMMer, with specific tools for the identification of the functional domains and motifs in nonribosomal peptide synthetases (NRPS)/type I polyketide synthases (PKS) and the prediction of specificities of NRPS.

  9. Sequencing genes in silico using single nucleotide polymorphisms

    Directory of Open Access Journals (Sweden)

    Zhang Xinyi

    2012-01-01

    Full Text Available Abstract Background The advent of high throughput sequencing technology has enabled the 1000 Genomes Project Pilot 3 to generate complete sequence data for more than 906 genes and 8,140 exons representing 697 subjects. The 1000 Genomes database provides a critical opportunity for further interpreting disease associations with single nucleotide polymorphisms (SNPs) discovered from genetic association studies. Currently, direct sequencing of candidate genes or regions on a large number of subjects remains both cost- and time-prohibitive. Results To accelerate the translation from discovery to functional studies, we propose an in silico gene sequencing method (ISS), which predicts phased sequences of intragenic regions, using SNPs. The key underlying idea of our method is to infer diploid sequences (a pair of phased sequences/alleles) at every functional locus utilizing the deep sequencing data from the 1000 Genomes Project and SNP data from the HapMap Project, and to build prediction models using flanking SNPs. Using this method, we have developed a database of prediction models for 611 known genes. Sequence prediction accuracy for these genes is 96.26% on average (ranges 79%-100%). This database of prediction models can be enhanced and scaled up to include new genes as the 1000 Genomes Project sequences additional genes on additional individuals. Applying our predictive model for the KCNJ11 gene to the Wellcome Trust Case Control Consortium (WTCCC) Type 2 diabetes cohort, we demonstrate how the prediction of phased sequences inferred from GWAS SNP genotype data can be used to facilitate interpretation and identify a probable functional mechanism such as protein changes. Conclusions Prior to the general availability of routine sequencing of all subjects, the ISS method proposed here provides a time- and cost-effective approach to broadening the characterization of disease associated SNPs and regions, and facilitating the prioritization of candidate

  10. A Novel Type Pathway-Specific Regulator and Dynamic Genome Environments of a Solanapyrone Biosynthesis Gene Cluster in the Fungus Ascochyta rabiei.

    Science.gov (United States)

    Kim, Wonyong; Park, Jeong-Jin; Gang, David R; Peever, Tobin L; Chen, Weidong

    2015-11-01

    Secondary metabolite genes are often clustered together and situated in particular genomic regions, like the subtelomere, that can facilitate niche adaptation in fungi. Solanapyrones are toxic secondary metabolites produced by fungi occupying different ecological niches. Full-genome sequencing of the ascomycete Ascochyta rabiei revealed a solanapyrone biosynthesis gene cluster embedded in an AT-rich region proximal to a telomere end and surrounded by Tc1/Mariner-type transposable elements. The highly AT-rich environment of the solanapyrone cluster is likely the product of repeat-induced point mutations. Several secondary metabolism-related genes were found in the flanking regions of the solanapyrone cluster. Although the solanapyrone cluster appears to be resistant to repeat-induced point mutations, a P450 monooxygenase gene adjacent to the cluster has been degraded by such mutations. Among the six solanapyrone cluster genes (sol1 to sol6), sol4 encodes a novel type of Zn(II)2Cys6 zinc cluster transcription factor. Deletion of sol4 resulted in the complete loss of solanapyrone production but did not compromise growth, sporulation, or virulence. Gene expression studies with the sol4 deletion and sol4-overexpressing mutants delimited the boundaries of the solanapyrone gene cluster and revealed that sol4 is likely a specific regulator of solanapyrone biosynthesis and appears to be necessary and sufficient for induction of the solanapyrone cluster genes. Despite the dynamic surrounding genomic regions, the solanapyrone gene cluster has maintained its integrity, suggesting important roles of solanapyrones in fungal biology.

  11. Cloning and sequence analysis of chitin synthase gene fragments of Demodex mites.

    Science.gov (United States)

    Zhao, Ya-e; Wang, Zheng-hang; Xu, Yang; Xu, Ji-ru; Liu, Wen-yan; Wei, Meng; Wang, Chu-ying

    2012-10-01

    To our knowledge, few reports on Demodex studied at the molecular level are available at present. In this study our group, for the first time, cloned, sequenced and analyzed the chitin synthase (CHS) gene fragments of Demodex folliculorum, Demodex brevis, and Demodex canis (three isolates from each species) from Xi'an, China, by designing specific primers based on the only partial sequence of the CHS gene of D. canis from Japan, retrieved from GenBank. Results show that amplification was successful only in three D. canis isolates and one D. brevis isolate out of the nine Demodex isolates. The obtained fragments were sequenced and found to be 339 bp for D. canis and 338 bp for D. brevis. The CHS gene sequence similarities between the three Xi'an D. canis isolates and one Japanese D. canis isolate ranged from 99.7% to 100.0%, and those between four D. canis isolates and one D. brevis isolate were 99.1%-99.4%. Phylogenetic trees based on maximum parsimony (MP) and maximum likelihood (ML) methods shared the same clusters, in accordance with the traditional classification. Two open reading frames (ORFs) were identified in each CHS gene sequenced, and their corresponding amino acid sequences were located at the catalytic domain. The relatively conserved sequences could be deduced to be a CHS class A gene, which is associated with chitin synthesis in the integument of Demodex mites.

  12. Semi-automatic time-series transfer functions via temporal clustering and sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Woodring, Jonathan L [Los Alamos National Laboratory; Shen, H W [OHIO STATE UNIV.

    2009-01-01

    When creating transfer functions for time-varying data, it is not clear what range of values to use for classification, as data value ranges and distributions change over time. In order to generate time-varying transfer functions, they search the data for classes that have similar behavior over time, assuming that data points that behave similarly belong to the same feature. They utilize a method they call temporal clustering and sequencing to find dynamic features in value space and create a corresponding transfer function. First, clustering finds groups of data points that have the same value space activity over time. Then, sequencing derives a progression of clusters over time, creating chains that follow value distribution changes. Finally, the cluster sequences are used to create transfer functions, as sequences describe the value range distributions over time in a data set.

  13. Complete nucleotide sequence and gene rearrangement of the mitochondrial genome of Occidozyga martensii

    Indian Academy of Sciences (India)

    En Li; Xiaoqiang Li; Xiaobing Wu; Ge Feng; Man Zhang; Haitao Shi; Lijun Wang; Jianping Jiang

    2014-12-01

    In this study, the complete nucleotide sequence (18,321 bp) of the mitochondrial (mt) genome of the round-tongued floating frog, Occidozyga martensii, was determined. Although the base composition and codon usage of O. martensii conformed to the typical vertebrate patterns, this mt genome contained 23 tRNAs (a tandem duplication of the tRNA-Met gene). The LTPF tRNA-gene cluster and the derived position of the ND5 gene downstream of the control region were present in this mitogenome. Moreover, we found that in the WANCY tRNA-gene cluster, the tRNA-Asn gene was located between the tRNA-Tyr and COI genes instead of between the tRNA-Ala and tRNA-Cys genes, which is a novel mtDNA gene rearrangement in vertebrates. Based on the concatenated nucleotide sequences of the 13 protein-coding genes, phylogenetic analysis (BI, ML, MP) was performed to further clarify the phylogenetic relations of this species within anurans.

  14. Escherichia coli contains a protein that is homologous in function and N-terminal sequence to the protein encoded by the nifS gene of Azotobacter vinelandii and that can participate in the synthesis of the Fe-S cluster of dihydroxy-acid dehydratase.

    Science.gov (United States)

    Flint, D H

    1996-07-05

    In this paper, I report the purification of a protein from Escherichia coli that is very similar in sequence, molecular weight, and the reactions it can catalyze to the protein encoded by the Azotobacter vinelandii nifS gene. This E. coli protein contains pyridoxal phosphate as a cofactor and catalyzes the removal of sulfur from cysteine to form alanine and S⁰. When dithiothreitol is present along with cysteine, the S⁰ formed is reduced to S²⁻. This protein has a reactive sulfhydryl group that is essential for activity. As isolated, this sulfhydryl group appears to be in a disulfide linkage with the sulfhydryl group from the phosphopantetheine moiety of the acyl carrier protein. The purified E. coli protein can mobilize the sulfur from cysteine and contribute it to the formation of a [4Fe-4S] cluster on the apoprotein of E. coli dihydroxy-acid dehydratase. A mechanism is proposed for the early stages of the synthesis of Fe-S clusters using this protein and sulfur in the S⁰ oxidation state.

  15. SATB1 regulates β-like globin genes through matrix related nuclear relocation of the cluster

    Energy Technology Data Exchange (ETDEWEB)

    Gong, Huan; Wang, Zhao; Zhao, Guo-wei; Lv, Xiang; Wei, Gong-hong; Wang, Li [National Laboratory of Medical Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences (CAMS) and Peking Union Medical College (PUMC), 5 Dong Dan San Tiao, Beijing 100005 (China); Liu, De-pei, E-mail: liudp@pumc.edu.cn [National Laboratory of Medical Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences (CAMS) and Peking Union Medical College (PUMC), 5 Dong Dan San Tiao, Beijing 100005 (China); Liang, Chih-chuan [National Laboratory of Medical Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences (CAMS) and Peking Union Medical College (PUMC), 5 Dong Dan San Tiao, Beijing 100005 (China)

    2009-05-22

    The nuclear location and relocation of genes play crucial regulatory roles in gene expression. SATB1, a MAR-binding protein, has been found to regulate β-like globin genes through chromatin remodeling. In this study, we generated K562 cells over-expressing wild-type or nuclear matrix targeting sequences (NMTS)-deficient SATB1 and found that, like wild-type SATB1, NMTS-deficient SATB1 induces looping of the β-globin cluster out of its chromosome territory (CT), but it is unable to associate the cluster with the nuclear matrix as wild-type SATB1 does and has no regulatory function on the β-globin cluster. In addition, our data showed that the trans-acting factor occupancies and chromatin modifications at the β-globin cluster were differentially affected by wild-type and NMTS-deficient SATB1. These results indicate that SATB1 regulates β-like globin genes at the nuclear level interlaced with the chromatin and DNA levels, and emphasize the importance of the nuclear matrix binding activity of SATB1 for its regulatory function.

  16. De novo transcriptome sequencing of axolotl blastema for identification of differentially expressed genes during limb regeneration

    Science.gov (United States)

    2013-01-01

    Background Salamanders are unique among vertebrates in their ability to completely regenerate amputated limbs through the mediation of blastema cells located at the stump ends. This regeneration is nerve-dependent because blastema formation and regeneration do not occur after limb denervation. To obtain the genomic information of blastema tissues, de novo transcriptomes from both blastema tissues and denervated stump ends of Ambystoma mexicanum (axolotls) 14 days post-amputation were sequenced and compared using Solexa DNA sequencing. Results The sequencing done for this study produced 40,688,892 reads that were assembled into 307,345 transcribed sequences. The N50 of transcribed sequence length was 562 bases. A similarity search with known proteins identified 39,200 different genes to be expressed during limb regeneration with a cut-off E-value exceeding 10^-5. We annotated assembled sequences by using gene descriptions, gene ontology, and clusters of orthologous group terms. Targeted searches using these annotations showed that the majority of the genes were in the categories of essential metabolic pathways, transcription factors and conserved signaling pathways, and novel candidate genes for regenerative processes. We discovered and confirmed numerous sequences of the candidate genes by using quantitative polymerase chain reaction and in situ hybridization. Conclusion The results of this study demonstrate that de novo transcriptome sequencing allows gene expression analysis in a species lacking genome information and provides the most comprehensive mRNA sequence resources for axolotls. The characterization of the axolotl transcriptome can help elucidate the molecular mechanisms underlying blastema formation during limb regeneration. PMID:23815514
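
    The N50 statistic quoted in this record can be illustrated with a minimal sketch. It assumes the common definition (the length L such that sequences of length L or longer hold at least half of all assembled bases); the record does not say which assembler or exact definition was used, and the function name and toy lengths below are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Assumed definition: the length L such that sequences of length >= L
    together contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy lengths for illustration only (not data from the study).
toy_lengths = [100, 200, 300, 400, 500]
print(n50(toy_lengths))  # prints 400
```

    Sorting from longest to shortest and accumulating until half of the total is reached keeps the calculation to a single pass after the sort.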

  17. An approximation polynomial-time algorithm for a sequence bi-clustering problem

    Science.gov (United States)

    Kel'manov, A. V.; Khamidullin, S. A.

    2015-06-01

    We consider a strongly NP-hard problem of partitioning a finite sequence of vectors in Euclidean space into two clusters using the criterion of the minimal sum of the squared distances from the elements of the clusters to the centers of the clusters. The center of one of the clusters is to be optimized and is determined as the mean value over all vectors in this cluster. The center of the other cluster is fixed at the origin. Moreover, the partition is such that the difference between the indices of two successive vectors in the first cluster is bounded above and below by prescribed constants. A 2-approximation polynomial-time algorithm is proposed for this problem.
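
    The objective described in this record can be made concrete with a short sketch. It only evaluates the cost of one candidate partition and checks the index-gap constraint; it is not the 2-approximation algorithm itself, which the abstract does not spell out. The function name, argument names, and toy data are assumptions made for illustration.

```python
def bicluster_cost(vectors, first_cluster, gap_min, gap_max):
    """Cost of one candidate partition of a vector sequence.

    first_cluster holds the indices assigned to the cluster whose center
    is its own mean; every other vector belongs to the cluster whose
    center is fixed at the origin. Successive indices in first_cluster
    must differ by at least gap_min and at most gap_max.
    """
    idx = sorted(first_cluster)
    if not idx:
        raise ValueError("first cluster must not be empty")
    for a, b in zip(idx, idx[1:]):
        if not gap_min <= b - a <= gap_max:
            raise ValueError("index gap constraint violated")

    dim = len(vectors[0])
    members = [vectors[i] for i in idx]
    # Center of the first cluster: the mean of its members.
    center = [sum(v[k] for v in members) / len(members) for k in range(dim)]
    origin = [0.0] * dim

    chosen = set(idx)
    cost = 0.0
    for i, v in enumerate(vectors):
        ref = center if i in chosen else origin
        cost += sum((x - c) ** 2 for x, c in zip(v, ref))
    return cost


# Toy sequence (hypothetical): indices 1 and 3 form the first cluster.
toy = [(0, 0), (4, 4), (0, 1), (6, 4), (1, 0)]
print(bicluster_cost(toy, [1, 3], gap_min=1, gap_max=3))  # prints 4.0
```

    An exhaustive search over admissible index sets would use this function as its inner loop, which is what makes the exact problem expensive and an approximation algorithm attractive.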

  18. Gene cluster analysis for the biosynthesis of elgicins, novel lantibiotics produced by Paenibacillus elgii B69

    Directory of Open Access Journals (Sweden)

    Teng Yi

    2012-03-01

    Full Text Available Abstract Background The recent increase in bacterial resistance to antibiotics has promoted the exploration of novel antibacterial materials. As a result, many researchers are undertaking work to identify new lantibiotics because of their potent antimicrobial activities. The objective of this study was to provide details of a lantibiotic-like gene cluster in Paenibacillus elgii B69 and to produce the antibacterial substances coded by this gene cluster based on culture screening. Results Analysis of the P. elgii B69 genome sequence revealed the presence of a lantibiotic-like gene cluster composed of five open reading frames (elgT1, elgC, elgT2, elgB, and elgA). Screening of culture extracts for active substances possessing the predicted properties of the encoded product led to the isolation of four novel peptides (elgicins AI, AII, B, and C) with a broad inhibitory spectrum. The molecular weights of these peptides were 4536, 4593, 4706, and 4820 Da, respectively. The N-terminal sequence of elgicin B was Leu-Gly-Asp-Tyr, which corresponded to the partial sequence of the peptide ElgA encoded by elgA. Edman degradation suggested that the product elgicin B is derived from ElgA. By correlating the results of electrospray ionization-mass spectrometry analyses of elgicins AI, AII, and C, these peptides are deduced to have originated from the same precursor, ElgA. Conclusions A novel lantibiotic-like gene cluster was shown to be present in P. elgii B69. Four new lantibiotics with a broad inhibitory spectrum were isolated, and these appear to be promising antibacterial agents.

  19. Cloning and characterization of the polyether salinomycin biosynthesis gene cluster of Streptomyces albus XM211.

    Science.gov (United States)

    Jiang, Chunyan; Wang, Hougen; Kang, Qianjin; Liu, Jing; Bai, Linquan

    2012-02-01

    Salinomycin is widely used in animal husbandry as a food additive due to its antibacterial and anticoccidial activities. However, its biosynthesis had only been studied by feeding experiments with isotope-labeled precursors. A strategy with degenerate primers based on the polyether-specific epoxidase sequences was successfully developed to clone the salinomycin gene cluster. Using this strategy, a putative epoxidase gene, slnC, was cloned from the salinomycin producer Streptomyces albus XM211. The targeted replacement of slnC and subsequent trans-complementation proved its involvement in salinomycin biosynthesis. A 127-kb DNA region containing slnC was sequenced, including genes for polyketide assembly and release, oxidative cyclization, modification, export, and regulation. In order to gain insight into the salinomycin biosynthesis mechanism, 13 gene replacements and deletions were conducted. Including slnC, 7 genes were identified as essential for salinomycin biosynthesis and putatively responsible for polyketide chain release, oxidative cyclization, modification, and regulation. Moreover, 6 genes were found to be relevant to salinomycin biosynthesis and possibly involved in precursor supply, removal of aberrant extender units, and regulation. Sequence analysis and a series of gene replacements suggest a proposed pathway for the biosynthesis of salinomycin. The information presented here expands the understanding of polyether biosynthesis mechanisms and paves the way for targeted engineering of salinomycin activity and productivity.

  20. Precision Measurements of the Cluster Red Sequence using an Error Corrected Gaussian Mixture Model

    Energy Technology Data Exchange (ETDEWEB)

    Hao, Jiangang; /Fermilab /Michigan U.; Koester, Benjamin P.; /Chicago U.; Mckay, Timothy A.; /Michigan U.; Rykoff, Eli S.; /UC, Santa Barbara; Rozo, Eduardo; /Ohio State U.; Evrard, August; /Michigan U.; Annis, James; /Fermilab; Becker, Matthew; /Chicago U.; Busha, Michael; /KIPAC, Menlo Park /SLAC; Gerdes, David; /Michigan U.; Johnston, David E.; /Northwestern U. /Brookhaven

    2009-07-01

    The red sequence is an important feature of galaxy clusters and plays a crucial role in optical cluster detection. Measurements of the slope and scatter of the red sequence are affected both by the selection of red sequence galaxies and by measurement errors. In this paper, we describe a new error corrected Gaussian Mixture Model for red sequence galaxy identification. Using this technique, we can remove the effects of measurement error and extract unbiased information about the intrinsic properties of the red sequence. We use this method to select red sequence galaxies in each of the 13,823 clusters in the maxBCG catalog, and measure the red sequence ridgeline location and scatter of each. These measurements provide precise constraints on the variation of the average red galaxy populations in the observed frame with redshift. We find that the scatter of the red sequence ridgeline increases mildly with redshift, and that the slope decreases with redshift. We also observe that the slope does not strongly depend on cluster richness. Using similar methods, we show that this behavior is mirrored in a spectroscopic sample of field galaxies, further emphasizing that ridgeline properties are independent of environment. These precise measurements serve as an important observational check on simulations and mock galaxy catalogs. The observed trends in the slope and scatter of the red sequence ridgeline with redshift are clues to possible intrinsic evolution of the cluster red-sequence itself. Most importantly, the methods presented in this work lay the groundwork for further improvements in optically-based cluster cosmology.

  1. Evolution of coding and non-coding genes in HOX clusters of a marsupial

    Directory of Open Access Journals (Sweden)

    Yu Hongshi

    2012-06-01

    Full Text Available Abstract Background The HOX gene clusters are thought to be highly conserved amongst mammals and other vertebrates, but the long non-coding RNAs have only been studied in detail in human and mouse. The sequencing of the kangaroo genome provides an opportunity to compare the HOX clusters of a mammal with a distinct body plan with those of other mammals. Results Here we report a comparative analysis of HOX gene clusters between an Australian marsupial of the kangaroo family and the eutherians. There was a strikingly high level of conservation of HOX gene sequence and structure and of non-protein-coding genes, including the microRNAs miR-196a, miR-196b, miR-10a and miR-10b and the long non-coding RNAs HOTAIR, HOTAIRM1 and HOXA11AS that play critical roles in regulating gene expression and controlling development. By microRNA deep sequencing and comparative genomic analyses, two conserved microRNAs (miR-10a and miR-10b) were identified and one new candidate microRNA with typical hairpin precursor structure that is expressed in both fibroblasts and testes was found. MicroRNA target prediction analysis showed that several known microRNA targets, such as miR-10, miR-414 and miR-464, were found in the tammar HOX clusters. In addition, several novel and putative miRNAs were identified that originated from elsewhere in the tammar genome and that target the tammar HOXB and HOXD clusters. Conclusions This study confirms that the emergence of known long non-coding RNAs in the HOX clusters clearly predates the marsupial-eutherian divergence 160 Ma ago. It also identified a new potentially functional microRNA as well as conserved miRNAs. These non-coding RNAs may participate in the regulation of HOX genes to influence the body plan of this marsupial.

  2. Structure and gene cluster of the O-antigen of Escherichia coli O133.

    Science.gov (United States)

    Shashkov, Alexander S; Zhang, Yuanyuan; Sun, Qiangzheng; Guo, Xi; Senchenkova, Sof'ya N; Perepelov, Andrei V; Knirel, Yuriy A

    2016-07-22

    The O-specific polysaccharide (O-antigen) of Escherichia coli O133 was obtained by mild acid hydrolysis of the lipopolysaccharide of E. coli O133. The structure of the hexasaccharide repeating unit of the polysaccharide was elucidated by (1)H and (13)C NMR spectroscopy, including a two-dimensional (1)H-(1)H ROESY experiment: Functions of genes in the O-antigen gene cluster were putatively identified by comparison with sequences in the available databases and, particularly, an encoded predicted multifunctional glycosyltransferase was assigned to three α-l-rhamnosidic linkages.

  3. Coupled Two-Way Clustering Analysis of Gene Microarray Data

    CERN Document Server

    Getz, G; Domany, E

    2000-01-01

    We present a novel coupled two-way clustering approach to gene microarray data analysis. The main idea is to identify subsets of the genes and samples, such that when one of these is used to cluster the other, stable and significant partitions emerge. The search for such subsets is a computationally complex task: we present an algorithm, based on iterative clustering, which performs such a search. This analysis is especially suitable for gene microarray data, where the contributions of a variety of biological mechanisms to the gene expression levels are entangled in a large body of experimental data. The method was applied to two gene microarray data sets, on colon cancer and leukemia. By identifying relevant subsets of the data and focusing on them we were able to discover partitions and correlations that were masked and hidden when the full dataset was used in the analysis. Some of these partitions have clear biological interpretation; others can serve to identify possible directions for future research.

  4. Coupled two-way clustering analysis of gene microarray data

    Science.gov (United States)

    Getz, Gad; Levine, Erel; Domany, Eytan

    2000-10-01

    We present a coupled two-way clustering approach to gene microarray data analysis. The main idea is to identify subsets of the genes and samples, such that when one of these is used to cluster the other, stable and significant partitions emerge. The search for such subsets is a computationally complex task. We present an algorithm, based on iterative clustering, that performs such a search. This analysis is especially suitable for gene microarray data, where the contributions of a variety of biological mechanisms to the gene expression levels are entangled in a large body of experimental data. The method was applied to two gene microarray data sets, on colon cancer and leukemia. By identifying relevant subsets of the data and focusing on them we were able to discover partitions and correlations that were masked and hidden when the full dataset was used in the analysis. Some of these partitions have clear biological interpretation; others can serve to identify possible directions for future research.

  5. Gene conversion-like events in the diversification of human rearranged IGHV3-23*01 gene sequences

    Directory of Open Access Journals (Sweden)

    Bhargavi Duvvuri

    2012-06-01

    Full Text Available Gene conversion (GCV) as a mechanism of immunoglobulin diversification is well established in a few species. However, definitive evidence of GCV-like events in human immunoglobulin genes is scarce. GCV is mediated by activation-induced cytidine deaminase (AID). The lack of evidence of GCV in human rearranged immunoglobulin gene sequences is puzzling given the presence of highly similar germline donors and all the enzymatic machinery required for GCV. In this study, we undertook a computational analysis of rearranged IGHV3-23*01 gene sequences from common variable immunodeficiency (CVID) patients and healthy individuals to survey ‘GCV-like’ activities. Our search identified strong evidence of GCV-like patterns. Germline VH sequences were identified as potential donors for clustered mutations in rearranged IGHV3-23*01 gene sequences. We identified minimum and maximum sequence identities between donor and recipient sequences that can serve as targets for GCV and our findings are consistent with those reported in literature. We observed that GCV-like tracts are flanked by activation-induced cytidine deaminase (AID) hotspot motifs. Structural modeling of the IGHV3-23*01 gene sequence revealed that hypermutable bases flanking GCV-like tracts are in the single stranded DNA (ssDNA) of stable stem-loop structures (SLSs). SsDNA is inherently fragile and also an optimal target for AID. We speculate that GCV could have been initiated by the targeting of hypermutable bases in ssDNA state in stable SLSs, plausibly by AID. We have observed that the frequency of GCV-like events is significantly higher in rearranged IGHV3-23*01 sequences from healthy individuals compared to that of CVID patients. GCV, unlike SHM, can result in multiple base substitutions that can alter many amino acids. The extensive changes in antibody affinity by GCV-like events, as identified in this study would be instrumental in protecting humans against pathogens that diversify their genome by

  6. A Papaver somniferum 10-gene cluster for synthesis of the anticancer alkaloid noscapine.

    Science.gov (United States)

    Winzer, Thilo; Gazda, Valeria; He, Zhesi; Kaminski, Filip; Kern, Marcelo; Larson, Tony R; Li, Yi; Meade, Fergus; Teodor, Roxana; Vaistij, Fabián E; Walker, Carol; Bowser, Tim A; Graham, Ian A

    2012-06-29

    Noscapine is an antitumor alkaloid from opium poppy that binds tubulin, arrests metaphase, and induces apoptosis in dividing human cells. Elucidation of the biosynthetic pathway will enable improvement in the commercial production of noscapine and related bioactive molecules. Transcriptomic analysis revealed the exclusive expression of 10 genes encoding five distinct enzyme classes in a high noscapine-producing poppy variety, HN1. Analysis of an F(2) mapping population indicated that these genes are tightly linked in HN1, and bacterial artificial chromosome sequencing confirmed that they exist as a complex gene cluster for plant alkaloids. Virus-induced gene silencing resulted in accumulation of pathway intermediates, allowing gene function to be linked to noscapine synthesis and a novel biosynthetic pathway to be proposed.

  7. A human gut microbial gene catalogue established by metagenomic sequencing

    DEFF Research Database (Denmark)

    dos Santos, Marcelo Bertalan Quintanilha; Sicheritz-Pontén, Thomas; Nielsen, Henrik Bjørn

    2010-01-01

    To understand the impact of gut microbes on human health and well-being it is crucial to assess their genetic potential. Here we describe the Illumina-based metagenomic sequencing, assembly and characterization of 3.3 million non-redundant microbial genes, derived from 576.7 gigabases of sequence, from faecal samples of 124 European individuals. The gene set, ~150 times larger than the human gene complement, contains an overwhelming majority of the prevalent (more frequent) microbial genes of the cohort and probably includes a large proportion of the prevalent human intestinal microbial genes...

  8. Yeast DNA sequences initiating gene expression in Escherichia coli.

    Science.gov (United States)

    Lewin, Astrid; Tran, Thi Tuyen; Jacob, Daniela; Mayer, Martin; Freytag, Barbara; Appel, Bernd

    2004-01-01

    DNA transfer between pro- and eukaryotes occurs either during natural horizontal gene transfer or as a result of the employment of gene technology. We analysed the capacity of DNA sequences from a eukaryotic donor organism (Saccharomyces cerevisiae) to serve as promoter region in a prokaryotic recipient (Escherichia coli) by creating fusions between promoterless luxAB genes from Vibrio harveyi and random DNA sequences from S. cerevisiae and measuring the luminescence of transformed E. coli. Fifty-four out of 100 randomly analysed S. cerevisiae DNA sequences caused considerable gene expression in E. coli. Determination of transcription start sites within six selected yeast sequences in E. coli confirmed the existence of bacterial -10 and -35 consensus sequences at appropriate distances upstream from transcription initiation sites. Our results demonstrate that the probability of transcription of transferred eukaryotic DNA in bacteria is extremely high and does not require the insertion of the transferred DNA behind a promoter of the recipient genome.
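
    The -10 and -35 elements mentioned in this record can be searched for with a simple scan. The sketch below assumes the textbook E. coli sigma-70 consensus hexamers (TTGACA at -35, TATAAT at -10) and a 15 to 19 bp spacer window; the record does not state which motifs or spacings the authors accepted, and real promoters usually deviate from the exact consensus, so this strict matcher will miss most of them. The function name and the toy sequence are hypothetical.

```python
import re

MINUS_35 = "TTGACA"  # textbook sigma-70 -35 consensus
MINUS_10 = "TATAAT"  # textbook sigma-70 -10 consensus


def find_consensus_promoters(seq, spacer_min=15, spacer_max=19):
    """Return (start, spacer_length) for exact -35/-10 consensus pairs.

    A lookahead is used so that overlapping candidates are all reported;
    for each start position the shortest admissible spacer is returned.
    """
    seq = seq.upper()
    pattern = re.compile(
        f"(?=({MINUS_35}[ACGT]{{{spacer_min},{spacer_max}}}?{MINUS_10}))"
    )
    hits = []
    for match in pattern.finditer(seq):
        spacer = len(match.group(1)) - len(MINUS_35) - len(MINUS_10)
        hits.append((match.start(), spacer))
    return hits


# Toy sequence (hypothetical): consensus boxes separated by a 17 bp spacer.
toy = "GG" + MINUS_35 + "A" * 17 + MINUS_10 + "CC"
print(find_consensus_promoters(toy))  # prints [(2, 17)]
```

    A more realistic scan would score partial matches, for example with a position weight matrix, rather than require exact hexamers.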

  9. SEQUENCE POLYMORPHISMS OF FOUR CHLOROPLAST GENES IN FOUR ACACIA SPECIES

    Directory of Open Access Journals (Sweden)

    Anthonius Y.P.B.C. Widyatmoko

    2011-06-01

    Full Text Available Sequence polymorphisms among and within four Acacia species,  A. aulacocarpa, A. auriculiformis, A. crassicarpa, and A. mangium, were investigated using four chloroplast DNA genes (atpA, petA, rbcL, and rpoA. The phylogenetic relationship among these species is discussed in light of the results of the sequence information. No intraspecific sequence variation was found in the four genes of the four species, and a conservative rate of mutation of the chloroplast DNA genes was also confirmed in the Acacia species. In the atpA and petA of the four genes, all four species possessed identical sequences, and no sequence variation was found among the four Acacia species. In the rbcL and rpoA genes, however, sequence polymorphisms were revealed among these species. Acacia aulacocarpa and A. crassicarpa shared an identical sequence, and A. auriculiformis and A. mangium also showed no sequence variation.  The fact that A. mangium and A. auriculiformis shared identical sequences as did A. aulacocarpa and A. crassicarpa indicated that the two respective species were extremely closely related. Although a putative natural hybrid of A. aulacocarpa and A. auriculiformis has been reported, our results suggested that natural hybridization should be further verified using molecular markers.

  10. Portuguese Lexical Clusters and CVC Sequences in Speech Perception and Production.

    Science.gov (United States)

    Cunha, Conceição

    2015-01-01

    This paper investigates similarities between lexical consonant clusters and CVC sequences differing in the presence or absence of a lexical vowel in speech perception and production in two Portuguese varieties. The frequent high vowel deletion in the European variety (EP) and the realization of intervening vocalic elements between lexical clusters in Brazilian Portuguese (BP) may minimize the contrast between lexical clusters and CVC sequences in the two Portuguese varieties. In order to test this hypothesis we present a perception experiment with 72 participants and a physiological analysis of 3-dimensional movement data from 5 EP and 4 BP speakers. The perceptual results confirmed a gradual confusion of lexical clusters and CVC sequences in EP, which corresponded roughly to the gradient consonantal overlap found in production. © 2015 S. Karger AG, Basel.

  11. Molecular analysis of SCARECROW genes expressed in white lupin cluster roots.

    Science.gov (United States)

    Sbabou, Laila; Bucciarelli, Bruna; Miller, Susan; Liu, Junqi; Berhada, Fatiha; Filali-Maltouf, Abdelkarim; Allan, Deborah; Vance, Carroll

    2010-03-01

    The Scarecrow (SCR) transcription factor plays a crucial role in root cell radial patterning and is required for maintenance of the quiescent centre and differentiation of the endodermis. In response to phosphorus (P) deficiency, white lupin (Lupinus albus L.) root surface area increases some 50-fold to 70-fold due to the development of cluster (proteoid) roots. Previously it was reported that SCR-like expressed sequence tags (ESTs) were expressed during early cluster root development. Here the cloning of two white lupin SCR genes, LaSCR1 and LaSCR2, is reported. The predicted amino acid sequences of both LaSCR gene products are highly similar to AtSCR and contain C-terminal conserved GRAS family domains. LaSCR1 and LaSCR2 transcript accumulation localized to the endodermis of both normal and cluster roots as shown by in situ hybridization and gene promoter::reporter staining. Transcript analysis as evaluated by quantitative real-time-PCR (qRT-PCR) and RNA gel hybridization indicated that the two LaSCR genes are expressed predominantly in roots. Expression of LaSCR genes was not directly responsive to the P status of the plant but was a function of cluster root development. Suppression of LaSCR1 in transformed roots of lupin and Medicago via RNAi (RNA interference) delivered through Agrobacterium rhizogenes resulted in decreased root numbers, reflecting the potential role of LaSCR1 in maintaining root growth in these species. The results suggest that the functional orthologues of AtSCR have been characterized.

  12. Comparative sequence analyses of the neurotoxin complex genes in Clostridium botulinum serotypes A, B, E, and F

    Directory of Open Access Journals (Sweden)

    Ajay K. Singh

    2012-09-01

    Full Text Available Neurotoxin complex (NTC) genes are arranged in two known clusters, hemagglutinin (HA) and open reading frame X (ORFX). NTC genes have been analyzed in four serotypes (A, B, E and F) of Clostridium botulinum that cause human botulism. Analysis of the amino acid sequences of NT genes demonstrated significant differences among subtypes and among the four serotypes. The phylogram of the NT genes reveals that serotypes A1 and B1 are much closer to each other than to serotypes E1 and F1. However, the non-toxic non-hemagglutinin (NTNH) gene is highly conserved among the four serotypes. The phylogram of the NTNH gene reveals that serotypes A and F are more closely related to each other than to serotypes B and E. Additionally, the sequences of the HA and ORFX genes are very divergent, but these genes are specific to subtypes and serotypes of Clostridium botulinum. Information derived from sequence analyses of the NTC has direct implications for the development of detection tools and therapeutic countermeasures for botulism.

  13. Complete nucleotide sequence of primitive vertebrate immunoglobulin light chain genes.

    Science.gov (United States)

    Shamblott, M J; Litman, G W

    1989-06-01

    Antibody to Heterodontus francisci (horned shark) immunoglobulin light chain was used to screen a spleen cDNA expression library, and recombinant clones encoding light chain genes were isolated. The complete sequences of the mature coding regions of two light chain genes in this phylogenetically distant vertebrate have been determined and are reported here. Comparisons of the sequences are consistent with the presence of mammalian-like framework and complementarity-determining regions. The predicted amino acid sequences of the genes are more related to mammalian lambda than to kappa light chains. The nucleotide sequences of the genes are most related to mammalian T-cell antigen receptor beta chain. Heterodontus light chain genes may reflect characteristics of the common ancestor of immunoglobulin and T-cell antigen receptors before its evolutionary diversification.

  14. Some statistical properties of gene expression clustering for array data

    DEFF Research Database (Denmark)

    Abreu, G C G; Pinheiro, A; Drummond, R D

    2010-01-01

    ...DNA array data without a corresponding statistical error measure. We propose an easy-to-implement and simple-to-use technique that uses bootstrap re-sampling to evaluate the statistical error of the nodes provided by SOM-based clustering. Comparisons between SOM and parametric clustering are presented for simulated as well as for two real data sets. We also implement a bootstrap-based pre-processing procedure for SOM, that improves the false discovery ratio of differentially expressed genes. Code in Matlab is freely available, as well as some supplementary material, at the following address: https...

  15. Comprehensive assessment of sequence variation within the copy number variable defensin cluster on 8p23 by target enriched in-depth 454 sequencing

    Directory of Open Access Journals (Sweden)

    Zhang Xinmin

    2011-05-01

    Full Text Available Abstract Background In highly copy number variable (CNV) regions such as the human defensin gene locus, comprehensive assessment of sequence variations is challenging. PCR approaches are practically restricted to tiny fractions, and next-generation sequencing (NGS) approaches to whole individual genomes, e.g. by the 1000 Genomes Project, are limited by the affordable sequence depth. Combining target enrichment with NGS may represent a feasible approach. Results As a proof of principle, we enriched a ~850 kb section comprising the CNV defensin gene cluster DEFB, the invariable DEFA part and 11 control regions from two genomes by sequence capture and sequenced it by 454 technology. 6,651 differences to the human reference genome were found. Comparison to HapMap genotypes revealed sensitivities and specificities in the range of 94% to 99% for the identification of variations. Using error probabilities for rigorous filtering revealed 2,886 unique single nucleotide variations (SNVs), including 358 putative novel ones. DEFB CN determinations by haplotype ratios were in agreement with alternative methods. Conclusion Although currently labor-intensive and costly, target enriched NGS provides a powerful tool for the comprehensive assessment of SNVs in highly polymorphic CNV regions of individual genomes. Furthermore, it reveals considerable amounts of putative novel variations and simultaneously allows CN estimation.
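
    The sensitivity and specificity figures in this record follow the usual confusion-matrix definitions, sketched below. The record does not describe how calls were matched to HapMap genotypes, so the per-site boolean encoding, the function name, and the toy data are assumptions for illustration only.

```python
def sensitivity_specificity(called, truth):
    """Compute (sensitivity, specificity) from paired per-site booleans.

    called[i] is True if a variant was called at site i; truth[i] is True
    if the reference genotype set reports a variant at that site.
    """
    if len(called) != len(truth):
        raise ValueError("called and truth must have the same length")
    tp = sum(c and t for c, t in zip(called, truth))
    fn = sum((not c) and t for c, t in zip(called, truth))
    tn = sum((not c) and (not t) for c, t in zip(called, truth))
    fp = sum(c and (not t) for c, t in zip(called, truth))
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one true variant and one true non-variant")
    sensitivity = tp / (tp + fn)  # fraction of true variants that were called
    specificity = tn / (tn + fp)  # fraction of non-variant sites left uncalled
    return sensitivity, specificity


# Toy example (hypothetical): 4 true variant sites and 6 non-variant sites.
truth = [True] * 4 + [False] * 6
called = [True, True, True, False] + [False] * 5 + [True]
sens, spec = sensitivity_specificity(called, truth)
print(round(sens, 3), round(spec, 3))  # prints 0.75 0.833
```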

  16. A conserved cluster of three PRD-class homeobox genes (homeobrain, rx and orthopedia) in the Cnidaria and Protostomia

    Directory of Open Access Journals (Sweden)

    Mazza Maureen E

    2010-07-01

    Full Text Available Abstract Background Homeobox genes are a superclass of transcription factors with diverse developmental regulatory functions, which are found in plants, fungi and animals. In animals, several Antennapedia (ANTP)-class homeobox genes reside in extremely ancient gene clusters (for example, the Hox, ParaHox, and NKL clusters) and the evolution of these clusters has been implicated in the morphological diversification of animal body plans. By contrast, similarly ancient gene clusters have not been reported among the other classes of homeobox genes (that is, the LIM, POU, PRD and SIX classes). Results Using a combination of in silico queries and phylogenetic analyses, we found that a cluster of three PRD-class homeobox genes (Homeobrain (hbn), Rax (rx) and Orthopedia (otp)) is present in cnidarians, insects and mollusks (a partial cluster comprising hbn and rx is present in the placozoan Trichoplax adhaerens). We failed to identify this 'HRO' cluster in deuterostomes; in fact, the Homeobrain gene appears to be missing from the chordate genomes we examined, although it is present in hemichordates and echinoderms. To illuminate the ancestral organization and function of this ancient cluster, we mapped the constituent genes against the assembled genome of a model cnidarian, the sea anemone Nematostella vectensis, and characterized their spatiotemporal expression using in situ hybridization. In N. vectensis, these genes reside in a span of 33 kb with the same gene order as previously reported in insects. Comparisons of genomic sequences and expressed sequence tags revealed the presence of alternative transcripts of Nv-otp and two highly unusual protein-coding polymorphisms in the terminal helix of the Nv-rx homeodomain. A population genetic survey revealed the Rx polymorphisms to be widespread in natural populations. During larval development, all three genes are expressed in the ectoderm, in non-overlapping territories along the oral-aboral axis, with distinct

  17. Insights into the evolutionary origins of clostridial neurotoxins from analysis of the Clostridium botulinum strain A neurotoxin gene cluster

    Directory of Open Access Journals (Sweden)

    Meiering Elizabeth M

    2008-11-01

    Full Text Available Abstract Background Clostridial neurotoxins (CNTs) are the most deadly toxins known and causal agents of botulism and tetanus neuroparalytic diseases. Despite considerable progress in understanding CNT structure and function, the evolutionary origins of CNTs remain a mystery as they are unique to Clostridium and possess a sequence and structural architecture distinct from other protein families. Uncovering the origins of CNTs would be a significant contribution to our understanding of how pathogens evolve and generate novel toxin families. Results The C. botulinum strain A genome was examined for potential homologues of CNTs. A key link was identified between the neurotoxin and the flagellin gene (CBO0798) located immediately upstream of the BoNT/A neurotoxin gene cluster. This flagellin sequence displayed the strongest sequence similarity to the neurotoxin and NTNH homologue out of all proteins encoded within C. botulinum strain A. The CBO0798 gene contains a unique hypervariable region, which in closely related flagellins encodes a collagenase-like domain. Remarkably, these collagenase-containing flagellins were found to possess the characteristic HEXXH zinc-protease motif responsible for the neurotoxin's endopeptidase activity. Additional links to collagenase-related sequences and functions were detected by further analysis of CNTs and surrounding genes, including sequence similarities to collagen-adhesion domains and collagenases. Furthermore, the neurotoxin's HCRn domain was found to exhibit both structural and sequence similarity to eukaryotic collagen jelly-roll domains. Conclusion Multiple lines of evidence suggest that the neurotoxin and adjacent genes evolved from an ancestral collagenase-like gene cluster, linking CNTs to another major family of clostridial proteolytic toxins. Duplication, reshuffling and assembly of neighboring genes within the BoNT/A neurotoxin gene cluster may have led to the neurotoxin's unique architecture. This

  18. Phylogeny of the Leucosphyrus Group of Anopheles (Cellia) (Diptera: Culicidae) Based on Mitochondrial Gene Sequences

    Science.gov (United States)

    2007-01-01

    of the COI gene were UEA9.2 (5'-CTA ACA TTT TTT CCT CAA CAT TTT TTA CC-3') and UEA10.2 (5'-TTA TTA CTT AAT AAY CCT ART TCT C-3'), both designed for this...elegans. Monophyly of the LIIl is ambiguous because An. latens and An. leucosphyrus sequences clustered together in a poorly supported clade

  19. Transcription mediated insulation and interference direct gene cluster expression switches.

    Science.gov (United States)

    Nguyen, Tania; Fischl, Harry; Howe, Françoise S; Woloszczuk, Ronja; Serra Barros, Ana; Xu, Zhenyu; Brown, David; Murray, Struan C; Haenni, Simon; Halstead, James M; O'Connor, Leigh; Shipkovenska, Gergana; Steinmetz, Lars M; Mellor, Jane

    2014-11-19

    In yeast, many tandemly arranged genes show peak expression in different phases of the metabolic cycle (YMC) or in different carbon sources, indicative of regulation by a bi-modal switch, but it is not clear how these switches are controlled. Using native elongating transcript analysis (NET-seq), we show that transcription itself is a component of bi-modal switches, facilitating reciprocal expression in gene clusters. HMS2, encoding a growth-regulated transcription factor, switches between sense- or antisense-dominant states that also coordinate up- and down-regulation of transcription at neighbouring genes. Engineering HMS2 reveals alternative mono-, di- or tri-cistronic and antisense transcription units (TUs), using different promoter and terminator combinations, that underlie state-switching. Promoters or terminators are excluded from functional TUs by read-through transcriptional interference, while antisense TUs insulate downstream genes from interference. We propose that the balance of transcriptional insulation and interference at gene clusters facilitates gene expression switches during intracellular and extracellular environmental change.

  20. A polyketide synthase-peptide synthetase gene cluster from an uncultured bacterial symbiont of Paederus beetles.

    Science.gov (United States)

    Piel, Jörn

    2002-10-29

    Many drug candidates from marine and terrestrial invertebrates are suspected metabolites of uncultured bacterial symbionts. The antitumor polyketides of the pederin family, isolated from beetles and sponges, are an example. Drug development from such sources is commonly hampered by low yields and the difficulty of sustaining invertebrate cultures. To obtain insight into the true producer and find alternative supplies of these rare drug candidates, the putative pederin biosynthesis genes were cloned from total DNA of Paederus fuscipes beetles, which use this compound for chemical defense. Sequence analysis of the gene cluster and adjacent regions revealed the presence of ORFs with typical bacterial architecture and homologies. The ped cluster, which is present only in beetle specimens with high pederin content, is located on a 54-kb region bordered by transposase pseudogenes and encodes a mixed modular polyketide synthase/nonribosomal peptide synthetase. Notably, none of the modules contains regions with homology to acyltransferase domains, but two copies of isolated monodomain acyltransferase genes were found at the upstream end of the cluster. In line with an involvement in pederin biosynthesis, the upstream cluster region perfectly mirrors pederin structure. The unexpected presence of additional polyketide synthase/nonribosomal peptide synthetase modules reveals surprising insights into the evolutionary relationship between pederin-type pathways in beetles and sponges.

  1. Identification and sequence analysis of Tapasin gene in guinea fowl

    Directory of Open Access Journals (Sweden)

    Varuna P. Panicker

    2014-12-01

    Full Text Available Aim: An attempt has been made to identify and study the nucleotide sequence variability in the exon 5 - exon 6 region of the guinea fowl Tapasin gene. Materials and Methods: Blood samples were collected from randomly selected birds (12 guinea fowl) and the Tapasin gene was amplified using chicken-specific primers designed from sequences submitted to GenBank. Polymerase chain reaction conditions were standardized so as to get only single amplicons. The products obtained were then cloned and sequenced, and the sequences were analyzed using suitable software. Results: The amplicon size of the Tapasin gene in guinea fowl was the same as that reported in chicken, with areas of transitions and transversions. The sequence variations reported in these coding sequences might influence the protein structure, which may be correlated with the increased immune status of the bird when compared with chicken breeds. Conclusion: Tapasin is an immunologically important gene that plays an important role in the immune status of the bird. Sequence variations in the gene can be correlated with the altered immune status of the bird.

  2. Comparison of methods for genomic localization of gene trap sequences

    Directory of Open Access Journals (Sweden)

    Ferrin Thomas E

    2006-09-01

    Full Text Available Abstract Background Gene knockouts in a model organism such as mouse provide a valuable resource for the study of basic biology and human disease. Determining which gene has been inactivated by an untargeted gene trapping event poses a challenging annotation problem because gene trap sequence tags, which represent sequence near the vector insertion site of a trapped gene, are typically short and often contain unresolved residues. To understand better the localization of these sequences on the mouse genome, we compared stand-alone versions of the alignment programs BLAT, SSAHA, and MegaBLAST. A set of 3,369 sequence tags was aligned to build 34 of the mouse genome using default parameters for each algorithm. Known genome coordinates for the cognate set of full-length genes (1,659 sequences) were used to evaluate localization results. Results In general, all three programs performed well in terms of localizing sequences to a general region of the genome, with only relatively subtle errors identified for a small proportion of the sequence tags. However, large differences in performance were noted with regard to correctly identifying exon boundaries. BLAT correctly identified the vast majority of exon boundaries, while SSAHA and MegaBLAST missed the majority of exon boundaries. SSAHA consistently reported the fewest false positives and is the fastest algorithm. MegaBLAST was comparable to BLAT in speed, but was the most susceptible to localizing sequence tags incorrectly to pseudogenes. Conclusion The differences in performance for sequence tags and full-length reference sequences were surprisingly small. Characteristic variations in localization results for each program were noted that affect the localization of sequence at exon boundaries, in particular.

  3. Environments and Morphologies of Red Sequence Galaxies with Residual Star Formation in Massive Clusters

    CERN Document Server

    Crossett, Jacob P; Stott, John P; Jones, D Heath

    2013-01-01

    We present a photometric investigation into recent star formation in galaxy clusters at z ~ 0.1. We use spectral energy distribution templates to quantify recent star formation in large X-ray selected clusters from the LARCS survey using matched GALEX NUV photometry. These clusters all have signs of red sequence galaxy recent star formation (as indicated by blue NUV-R colour), regardless of cluster morphology and size. A trend in environment is found for these galaxies, such that they prefer to occupy low density, high cluster radius environments. The morphology of these UV bright galaxies suggests that they are in fact red spirals, which we confirm with light curves and Galaxy Zoo voting percentages as morphological proxies. These UV bright galaxies are therefore seen to be either truncated spiral galaxies, caught by ram pressure in falling into the cluster, or high mass spirals, with the photometry dominated by the older stellar population.

  4. Deletion of a regulatory gene within the cpk gene cluster reveals novel antibacterial activity in Streptomyces coelicolor A3(2)

    NARCIS (Netherlands)

    Gottelt, Marco; Kol, Stefan; Gomez-Escribano, Juan Pablo; Bibb, Mervyn; Takano, Eriko; Herron, P.R.

    2010-01-01

    Genome sequencing of Streptomyces coelicolor A3(2) revealed an uncharacterized type I polyketide synthase gene cluster (cpk). Here we describe the discovery of a novel antibacterial activity (abCPK) and a yellow-pigmented secondary metabolite (yCPK) after deleting a presumed pathway-specific regulato

  5. Deletion of a regulatory gene within the cpk gene cluster reveals novel antibacterial activity in Streptomyces coelicolor A3(2)

    NARCIS (Netherlands)

    Gottelt, Marco; Kol, Stefan; Gomez-Escribano, Juan Pablo; Bibb, Mervyn; Takano, Eriko

    Genome sequencing of Streptomyces coelicolor A3(2) revealed an uncharacterized type I polyketide synthase gene cluster (cpk). Here we describe the discovery of a novel antibacterial activity (abCPK) and a yellow-pigmented secondary metabolite (yCPK) after deleting a presumed pathway-specific

  6. Global analysis of biosynthetic gene clusters reveals vast potential of secondary metabolite production in Penicillium species

    DEFF Research Database (Denmark)

    Nielsen, Jens Christian; Grijseels, Sietske; Prigent, Sylvain

    2017-01-01

    Filamentous fungi produce a wide range of bioactive compounds with important pharmaceutical applications, such as antibiotic penicillins and cholesterol-lowering statins. However, less attention has been paid to fungal secondary metabolites compared to those from bacteria. In this study, we sequenced the genomes of 9 Penicillium species and, together with 15 published genomes, we investigated the secondary metabolism of Penicillium and identified an immense, unexploited potential for producing secondary metabolites by this genus. A total of 1,317 putative biosynthetic gene clusters (BGCs) were identified, and polyketide synthase and non-ribosomal peptide synthetase based BGCs were grouped into gene cluster families and mapped to known pathways. The grouping of BGCs allowed us to study the evolutionary trajectory of pathways based on 6-methylsalicylic acid (6-MSA) synthases. Finally, we cross...

  7. Cloning and Sequencing of glnZ and the other Genes Clustered Around glnZ from Azospirillum brasilense Yu62

    Institute of Scientific and Technical Information of China (English)

    陈三凤; 杨红; 李季伦

    2001-01-01

    A positive clone carrying the glnZ gene was obtained from a genomic library of Azospirillum brasilense Yu62 by in situ hybridization. Subcloning and sequence analysis of this clone showed that glnZ lies on a 3.7 kb SalI fragment. The glnZ coding region is 336 bp long and encodes the Pz protein, which consists of 112 amino acids and has a reported molecular weight of 1.12 kD. Comparison of the Pz amino acid sequence against the GenBank database using the Blastax software showed that Pz shares more than 66% identity with the GlnK proteins of several other nitrogen-fixing bacteria and of Escherichia coli, and more than 64% identity with PII proteins. Upstream of glnZ is a partial ubiH-like gene, which has 31% identity and 50% similarity with the N-terminus of the E. coli ubiH gene (required for synthesis of coenzyme Q, ubiquinone). Downstream of glnZ is an aat-like gene, which has 26% identity and 42% similarity with the aat genes (encoding aspartate aminotransferase) of E. coli and Bacillus subtilis. Downstream of the aat-like gene is a partial ftsK-like gene, which has 42% identity and 56% similarity with the N-terminus of the E. coli ftsK gene (involved in peptidoglycan synthesis). These genes are deposited in GenBank under accession number AF279917.

    The glnZ gene of A. brasilense was located on a 3.7 kb SalI fragment, and ubiH-like, aat-like and ftsK-like genes were clustered around glnZ. A partial ubiH-like gene (required for ubiquinone synthesis) precedes glnZ and is transcribed from the opposite strand. The ubiH-like gene of A. brasilense has 31% identity and 50% similarity with the ubiH gene of E. coli. The aat (aspartate aminotransferase)-like and ftsK (involved in peptidoglycan synthesis during cell septation)-like genes were found clustered downstream of glnZ. The aat-like gene of A. brasilense has 26% identity and 42% similarity with the aat gene of E. coli. The ftsK-like gene of A. brasilense has 42% identity and 56% similarity with the ftsK gene of E. coli. The glnZ gene encodes the 112-amino-acid Pz polypeptide, mapping between positions 1057 and 1395 bp. The amino acid sequence of Pz from A. brasilense is more than 66% identical to that

  8. Open-Source Sequence Clustering Methods Improve the State Of the Art.

    Science.gov (United States)

    Kopylova, Evguenia; Navas-Molina, Jose A; Mercier, Céline; Xu, Zhenjiang Zech; Mahé, Frédéric; He, Yan; Zhou, Hong-Wei; Rognes, Torbjørn; Caporaso, J Gregory; Knight, Rob

    2016-01-01

    Sequence clustering is a common early step in amplicon-based microbial community analysis, when raw sequencing reads are clustered into operational taxonomic units (OTUs) to reduce the run time of subsequent analysis steps. Here, we evaluated the performance of recently released state-of-the-art open-source clustering software products, namely, OTUCLUST, Swarm, SUMACLUST, and SortMeRNA, against current principal options (UCLUST and USEARCH) in QIIME, hierarchical clustering methods in mothur, and USEARCH's most recent clustering algorithm, UPARSE. All the latest open-source tools showed promising results, reporting up to 60% fewer spurious OTUs than UCLUST, indicating that the underlying clustering algorithm can vastly reduce the number of these derived OTUs. Furthermore, we observed that stringent quality filtering, such as is done in UPARSE, can cause a significant underestimation of species abundance and diversity, leading to incorrect biological results. Swarm, SUMACLUST, and SortMeRNA have been included in the QIIME 1.9.0 release. IMPORTANCE Massive collections of next-generation sequencing data call for fast, accurate, and easily accessible bioinformatics algorithms to perform sequence clustering. A comprehensive benchmark is presented, including open-source tools and the popular USEARCH suite. Simulated, mock, and environmental communities were used to analyze sensitivity, selectivity, species diversity (alpha and beta), and taxonomic composition. The results demonstrate that recent clustering algorithms can significantly improve accuracy and preserve estimated diversity without the application of aggressive filtering. Moreover, these tools are all open source, apply multiple levels of multithreading, and scale to the demands of modern next-generation sequencing data, which is essential for the analysis of massive multidisciplinary studies such as the Earth Microbiome Project (EMP) (J. A. Gilbert, J. K. Jansson, and R. Knight, BMC Biol 12:69, 2014, http

  9. Gene duplication, modularity and adaptation in the evolution of the aflatoxin gene cluster

    Directory of Open Access Journals (Sweden)

    Jakobek Judy L

    2007-07-01

    Full Text Available Abstract Background The biosynthesis of aflatoxin (AF) involves over 20 enzymatic reactions in a complex polyketide pathway that converts acetate and malonate to the intermediates sterigmatocystin (ST) and O-methylsterigmatocystin (OMST), the respective penultimate and ultimate precursors of AF. Although these precursors are chemically and structurally very similar, their accumulation differs at the species level for Aspergilli. Notable examples are A. nidulans that synthesizes only ST, A. flavus that makes predominantly AF, and A. parasiticus that generally produces either AF or OMST. Whether these differences are important in the evolutionary/ecological processes of species adaptation and diversification is unknown. Equally unknown are the specific genomic mechanisms responsible for ordering and clustering of genes in the AF pathway of Aspergillus. Results To elucidate the mechanisms that have driven formation of these clusters, we performed systematic searches of aflatoxin cluster homologs across five Aspergillus genomes. We found a high level of gene duplication and identified seven modules consisting of highly correlated gene pairs (aflA/aflB, aflR/aflS, aflX/aflY, aflF/aflE, aflT/aflQ, aflC/aflW, and aflG/aflL). With the exception of A. nomius, contrasts of mean Ka/Ks values across all cluster genes showed significant differences in selective pressure between section Flavi and non-section Flavi species. A. nomius mean Ka/Ks values were more similar to partial clusters in A. fumigatus and A. terreus. Overall, mean Ka/Ks values were significantly higher for section Flavi than for non-section Flavi species. Conclusion Our results implicate several genomic mechanisms in the evolution of ST, OMST and AF cluster genes. Gene modules may arise from duplications of a single gene, whereby the function of the pre-duplication gene is retained in the copy (aflF/aflE) or the copies may partition the ancestral function (aflA/aflB). In some gene modules, the

  10. Functional Analysis of the Fusarielin Biosynthetic Gene Cluster

    Directory of Open Access Journals (Sweden)

    Aida Droce

    2016-12-01

    Full Text Available Fusarielins are polyketides with a decalin core produced by various species of Aspergillus and Fusarium. Although the responsible gene cluster has been identified, the biosynthetic pathway remains to be elucidated. In the present study, members of the gene cluster were deleted individually in a Fusarium graminearum strain overexpressing the local transcription factor. The results suggest that a trans-acting enoyl reductase (FSL5) assists the polyketide synthase FSL1 in biosynthesis of a polyketide product, which is released by hydrolysis by a trans-acting thioesterase (FSL2). Deletion of the epimerase (FSL3) resulted in accumulation of an unstable compound, which could be the released product. A novel compound, named prefusarielin, accumulated in the deletion mutant of the cytochrome P450 monooxygenase FSL4. Unlike the known fusarielins from Fusarium, this compound does not contain oxygenized decalin rings, suggesting that FSL4 is responsible for the oxygenation.

  11. Organization of nif gene cluster in Frankia sp. EuIK1 strain, a symbiont of Elaeagnus umbellata.

    Science.gov (United States)

    Oh, Chang Jae; Kim, Ho Bang; Kim, Jitae; Kim, Won Jin; Lee, Hyoungseok; An, Chung Sun

    2012-01-01

    The nucleotide sequence of a 20.5-kb genomic region harboring nif genes was determined and analyzed. The fragment was obtained from Frankia sp. EuIK1 strain, an indigenous symbiont of Elaeagnus umbellata. A total of 20 ORFs including 12 nif genes were identified and subjected to comparative analysis with the genome sequences of 3 Frankia strains representing diverse host plant specificities. The nucleotide and deduced amino acid sequences showed highest levels of identity with orthologous genes from an Elaeagnus-infecting strain. The gene organization patterns around the nif gene clusters were well conserved among all 4 Frankia strains. However, characteristic features appeared in the location of the nifV gene for each Frankia strain, depending on the type of host plant. Sequence analysis was performed to determine the transcription units and suggested that there could be an independent operon starting from the nifW gene in the EuIK strain. Considering the organization patterns and their total extensions on the genome, we propose that the nif gene clusters remained stable despite genetic variations occurring in the Frankia genomes.

  12. Loss of Bloom syndrome protein destabilizes human gene cluster architecture.

    Science.gov (United States)

    Killen, Michael W; Stults, Dawn M; Adachi, Noritaka; Hanakahi, Les; Pierce, Andrew J

    2009-09-15

    Bloom syndrome confers strong predisposition to malignancy in multiple tissue types. The Bloom syndrome patient (BLM) protein defective in the disease biochemically functions as a Holliday junction dissolvase and human cells lacking functional BLM show 10-fold elevated rates of sister chromatid exchange. Collectively, these phenomena suggest that dysregulated mitotic recombination drives the genomic instability underpinning the development of cancer in these individuals. Here we use physical analysis of the highly repeated, highly self-similar human ribosomal RNA gene clusters as sentinel biomarkers for dysregulated homologous recombination to demonstrate that loss of BLM protein function causes a striking increase in spontaneous molecular level genomic restructuring. Analysis of single-cell derived sub-clonal populations from wild-type human cell lines shows that gene cluster architecture is ordinarily very faithfully preserved under mitosis, but is so unstable in cell lines derived from BLMs as to make gene cluster architecture in different sub-clonal populations essentially unrecognizable one from another. Human cells defective in a different RecQ helicase, the WRN protein involved in the premature aging Werner syndrome, do not exhibit the gene cluster instability (GCI) phenotype, indicating that the BLM protein specifically, rather than RecQ helicases generally, holds back this recombination-mediated genomic instability. An ataxia-telangiectasia defective cell line also shows elevated rDNA GCI, although not to the extent of BLM defective cells. Genomic restructuring mediated by dysregulated recombination between the abundant low-copy repeats in the human genome may prove to be an important additional mechanism of genomic instability driving the initiation and progression of human cancer.

  13. Degenerative primer design and gene sequencing validation for select turkey genes.

    Science.gov (United States)

    Hutsko, Stephanie L; Lilburn, Michael S; Wick, Macdonald

    2016-06-01

    We successfully designed and validated degenerative primers for turkey genes MUC2, RPS13, TBP and TFF2 based on chicken sequences in order to use gene transcription analysis to evaluate (quantify) the mucin transcription to probiotic supplementation in turkeys. Primers were designed for the genes MUC2, TFF2, RPS13 and TBP using a degenerative primer design method based on the available Gallus gallus sequences. All primer sets, which produced a single PCR amplicon of the expected sizes, were cloned into the TOPO® vector and then transformed into TOP10® competent cells. Plasmid DNA isolation was performed on the TOP10® cell culture and sent for sequencing. Sequences were analyzed using NCBI BLAST. All genes sequenced had over 90% homology with both the chicken and predicted turkey sequences. The sequences were used to design new 100% homologous primer sets for the genes of interest. © 2016 Poultry Science Association Inc.

  14. Tetrachloroethene Dehalogenase from Dehalospirillum multivorans: Cloning, Sequencing of the Encoding Genes, and Expression of the pceA Gene in Escherichia coli

    Science.gov (United States)

    Neumann, Anke; Wohlfarth, Gert; Diekert, Gabriele

    1998-01-01

    The genes encoding tetrachloroethene reductive dehalogenase, a corrinoid-Fe/S protein, of Dehalospirillum multivorans were cloned and sequenced. The pceA gene is upstream of pceB and overlaps it by 4 bp. The presence of a σ70-like promoter sequence upstream of pceA and of a ρ-independent terminator downstream of pceB indicated that both genes are cotranscribed. This assumption is supported by reverse transcriptase PCR data. The pceA and pceB genes encode putative 501- and 74-amino-acid proteins, respectively, with calculated molecular masses of 55,887 and 8,354 Da, respectively. Four peptides obtained after trypsin treatment of tetrachloroethene (PCE) dehalogenase were found in the deduced amino acid sequence of pceA. The N-terminal amino acid sequence of the PCE dehalogenase isolated from D. multivorans was found 30 amino acids downstream of the N terminus of the deduced pceA product. The pceA gene contained a nucleotide stretch highly similar to binding motifs for two Fe4S4 clusters or for one Fe4S4 cluster and one Fe3S4 cluster. A consensus sequence for the binding of a corrinoid was not found in pceA. No significant similarities to genes in the databases were detected in sequence comparisons. The pceB gene contained two membrane-spanning helices as indicated by two hydrophobic stretches in the hydropathic plot. Sequence comparisons of pceB revealed no sequence similarities to genes present in the databases. Only in the presence of pUBS 520 supplying the recombinant bacteria with high levels of the rare Escherichia coli tRNA4Arg was pceA expressed, albeit nonfunctionally, in recombinant E. coli BL21 (DE3). PMID:9696761

  15. Regulatory sequence of cupin family gene

    Energy Technology Data Exchange (ETDEWEB)

    Hood, Elizabeth; Teoh, Thomas

    2017-07-25

    This invention is in the field of plant biology and agriculture and relates to novel seed-specific promoter regions. The present invention further provides methods of producing proteins and other products of interest and methods of controlling expression of nucleic acid sequences of interest using the seed-specific promoter regions.

  16. Evaluation of clustering algorithms for gene expression data using gene ontology annotations

    Institute of Scientific and Technical Information of China (English)

    MA Ning; ZHANG Zheng-guo

    2012-01-01

    Background Clustering is a useful exploratory technique for interpreting gene expression data to reveal groups of genes sharing common functional attributes. Biologists frequently face the problem of choosing an appropriate algorithm. We aimed to provide a standalone, easily accessible and biologically oriented criterion for expression data clustering evaluation. Methods An external criterion utilizing annotation-based similarities between genes is proposed in this work. Gene ontology information is employed as the annotation source. Comparisons among six widely used clustering algorithms over various types of gene expression data sets were carried out based on the criterion proposed. Results The rank of these algorithms given by the criterion coincides with our common knowledge. Single-linkage has significantly poorer performance, even worse than the random algorithm. Ward's method achieves the best performance in most cases. Conclusions The criterion proposed has a strong ability to distinguish among different clustering algorithms with different distance measurements. It is also demonstrated that analyzing main contributors of the criterion may offer some guidelines in finding local compact clusters. In addition, we suggest using Ward's algorithm for gene expression data analysis.

  17. Multiple stellar populations in Magellanic Cloud clusters. V. The split main sequence of the young cluster NGC1866

    CERN Document Server

    Milone, A P; D'Antona, F; Bedin, L R; Piotto, G; Jerjen, H; Anderson, J; Dotter, A; Di Criscienzo, M; Lagioia, E P

    2016-01-01

    One of the most unexpected results in the field of stellar populations of the last few years is the discovery that some Magellanic-Cloud globular clusters younger than ~400 Myr exhibit bimodal main sequences (MSs) in their color-magnitude diagrams (CMDs). Moreover, these young clusters host an extended main sequence turn off (eMSTO) in close analogy with what is observed in most ~1-2 Gyr old clusters of both Magellanic Clouds. We use high-precision Hubble-Space-Telescope photometry to study the young star cluster NGC1866 in the Large Magellanic Cloud. We discover an eMSTO and a split MS. The analysis of the CMD reveals that (i) the blue MS is the less populous one, hosting about one-third of the total number of MS stars; (ii) red-MS stars are more centrally concentrated than blue-MS stars; (iii) the fraction of blue-MS stars with respect to the total number of MS stars drops by a factor of ~2 in the upper MS with F814W <~19.7. The comparison between the observed CMDs and stellar models reveals that the o...

  18. Evolutionary dynamics of rRNA gene clusters in cichlid fish

    Directory of Open Access Journals (Sweden)

    Nakajima Rafael T

    2012-10-01

    Full Text Available Abstract Background Among multigene families, ribosomal RNA (rRNA) genes are the most frequently studied and have been explored as cytogenetic markers to study the evolutionary history of karyotypes among animals and plants. In this report, we applied cytogenetic and genomic methods to investigate the organization of rRNA genes among cichlid fishes. Cichlids are a group of fishes that are of increasing scientific interest due to their rapid and convergent adaptive radiation, which has led to extensive ecological diversity. Results The present paper reports the cytogenetic mapping of the 5S rRNA genes from 18 South American, 22 African and one Asian species and the 18S rRNA genes from 3 African species. The data obtained were comparatively analyzed with previously published information related to the mapping of rRNA genes in cichlids. The number of 5S rRNA clusters per diploid genome ranged from 2 to 15, with the most common pattern being the presence of 2 chromosomes bearing a 5S rDNA cluster. Regarding 18S rDNA mapping, the number of sites ranged from 2 to 6, with the most common pattern being the presence of 2 sites per diploid genome. Furthermore, searching the Oreochromis niloticus genome database led to the identification of a total of 59 copies of 5S rRNA and 38 copies of 18S rRNA genes that were distributed in several genomic scaffolds. The rRNA genes were frequently flanked by transposable elements (TEs) and spread throughout the genome, complementing the FISH analysis that detects only clustered copies of rRNA genes. Conclusions The organization of rRNA gene clusters seems to reflect their intense and particular evolutionary pathway and not the evolutionary history of the associated taxa. The possible role of TEs as one source of rRNA gene movement, which could generate the spreading of ribosomal clusters/copies, is discussed. The present paper reinforces the notion that the integration of cytogenetic data and genomic analysis provides a

  19. Evolutionary dynamics of rRNA gene clusters in cichlid fish

    Science.gov (United States)

    2012-01-01

    Background Among multigene families, ribosomal RNA (rRNA) genes are the most frequently studied and have been explored as cytogenetic markers to study the evolutionary history of karyotypes among animals and plants. In this report, we applied cytogenetic and genomic methods to investigate the organization of rRNA genes among cichlid fishes. Cichlids are a group of fishes that are of increasing scientific interest due to their rapid and convergent adaptive radiation, which has led to extensive ecological diversity. Results The present paper reports the cytogenetic mapping of the 5S rRNA genes from 18 South American, 22 African and one Asian species and the 18S rRNA genes from 3 African species. The data obtained were comparatively analyzed with previously published information related to the mapping of rRNA genes in cichlids. The number of 5S rRNA clusters per diploid genome ranged from 2 to 15, with the most common pattern being the presence of 2 chromosomes bearing a 5S rDNA cluster. Regarding 18S rDNA mapping, the number of sites ranged from 2 to 6, with the most common pattern being the presence of 2 sites per diploid genome. Furthermore, searching the Oreochromis niloticus genome database led to the identification of a total of 59 copies of 5S rRNA and 38 copies of 18S rRNA genes that were distributed in several genomic scaffolds. The rRNA genes were frequently flanked by transposable elements (TEs) and spread throughout the genome, complementing the FISH analysis that detects only clustered copies of rRNA genes. Conclusions The organization of rRNA gene clusters seems to reflect their intense and particular evolutionary pathway and not the evolutionary history of the associated taxa. The possible role of TEs as one source of rRNA gene movement, which could generate the spreading of ribosomal clusters/copies, is discussed. The present paper reinforces the notion that the integration of cytogenetic data and genomic analysis provides a more complete picture for

  20. Citrus plastid-related gene profiling based on expressed sequence tag analyses

    Directory of Open Access Journals (Sweden)

    Tercilio Calsa Jr.

    2007-01-01

    Full Text Available Plastid-related sequences, derived from putative nuclear or plastome genes, were searched in a large collection of expressed sequence tags (ESTs) and genomic sequences from the Citrus Biotechnology initiative in Brazil. The identified putative Citrus chloroplast gene sequences were compared to those from Arabidopsis, Eucalyptus and Pinus. Differential expression profiling for plastid-directed nuclear-encoded proteins and photosynthesis-related gene expression variation between Citrus sinensis and Citrus reticulata, when inoculated or not with Xylella fastidiosa, were also analyzed. Presumed Citrus plastome regions were more similar to Eucalyptus. Some putative genes appeared to be preferentially expressed in vegetative tissues (leaves and bark) or in reproductive organs (flowers and fruits). Genes preferentially expressed in fruit and flower may be associated with hypothetical physiological functions. Expression pattern clustering analysis suggested that photosynthesis- and carbon fixation-related genes appeared to be up- or down-regulated in a resistant or susceptible Citrus species after Xylella inoculation in comparison to non-infected controls, generating novel information which may be helpful to develop novel genetic manipulation strategies to control Citrus variegated chlorosis (CVC).

  1. Genome-wide upstream motif analysis of Cryptosporidium parvum genes clustered by expression profile.

    Science.gov (United States)

    Oberstaller, Jenna; Joseph, Sandeep J; Kissinger, Jessica C

    2013-07-29

    There are very few molecular genetic tools available to study the apicomplexan parasite Cryptosporidium parvum. The organism is not amenable to continuous in vitro cultivation or transfection, and purification of intracellular developmental stages in sufficient numbers for most downstream molecular applications is difficult and expensive since animal hosts are required. As such, very little is known about gene regulation in C. parvum. We have clustered whole-genome gene expression profiles generated from a previous study of seven post-infection time points of 3,281 genes to identify genes that show similar expression patterns throughout the first 72 hours of in vitro epithelial cell culture. We used the algorithms MEME, AlignACE and FIRE to identify conserved, overrepresented DNA motifs in the upstream promoter region of genes with similar expression profiles. The most overrepresented motifs were E2F (5'-TGGCGCCA-3'); G-box (5'-G.GGGG-3'); a well-documented ApiAP2 binding motif (5'-TGCAT-3'), and an unknown motif (5'-[A/C] AACTA-3'). We generated a recombinant C. parvum DNA-binding protein domain from a putative ApiAP2 transcription factor [CryptoDB: cgd8_810] and determined its binding specificity using protein-binding microarrays. We demonstrate that cgd8_810 can putatively bind the overrepresented G-box motif, implicating this ApiAP2 in the regulation of many gene clusters. Several DNA motifs were identified in the upstream sequences of gene clusters that might serve as potential cis-regulatory elements. These motifs, in concert with protein DNA binding site data, establish for the first time the beginnings of a global C. parvum gene regulatory map that will contribute to our understanding of the development of this zoonotic parasite.
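
    As a rough illustration of how the four motifs quoted in the abstract above could be counted in an upstream sequence, the following minimal sketch treats each motif as a regular expression and scans both strands. It is not the MEME, AlignACE or FIRE procedure used in the study; the function names and the choice to count overlapping matches on both strands are assumptions made for this example.

```python
import re

# Motifs quoted in the abstract, written as regular expressions.
# "." matches any base; "[AC]" encodes the [A/C] ambiguity.
MOTIFS = {
    "E2F": "TGGCGCCA",
    "G-box": "G.GGGG",
    "ApiAP2": "TGCAT",
    "unknown": "[AC]AACTA",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_motif(seq: str, pattern: str) -> int:
    """Count possibly overlapping matches of a motif on both strands.

    Note: a palindromic motif (such as TGGCGCCA) is counted once per
    strand, i.e. twice per site. This is a design choice of the sketch.
    """
    # A zero-width lookahead allows overlapping occurrences to be found.
    regex = re.compile(f"(?=({pattern}))")
    seq = seq.upper()
    forward = sum(1 for _ in regex.finditer(seq))
    reverse = sum(1 for _ in regex.finditer(reverse_complement(seq)))
    return forward + reverse


def scan_upstream(seq: str) -> dict:
    """Return motif name -> match count for one upstream sequence."""
    return {name: count_motif(seq, pat) for name, pat in MOTIFS.items()}
```

    Calling scan_upstream on each upstream region of an expression cluster and comparing the counts with those from a background set would be one simple way to look for overrepresentation; the statistical model needed for that comparison is not shown here.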

  2. Genetic diversity within Clostridium botulinum serotypes, botulinum neurotoxin gene clusters and toxin subtypes.

    Science.gov (United States)

    Hill, Karen K; Smith, Theresa J

    2013-01-01

    Clostridium botulinum is a species of spore-forming anaerobic bacteria defined by the expression of any one or two of seven serologically distinct botulinum neurotoxins (BoNTs) designated BoNT/A-G. This Gram-positive bacterium was first identified in 1897 and since then the paralyzing and lethal effects of its toxin have resulted in the recognition of different forms of the intoxication known as food-borne, infant, or wound botulism. Early microbiological and biochemical characterization of C. botulinum isolates revealed that the bacteria within the species had different characteristics and expressed different toxin types. To organize the variable bacterial traits within the species, Group I-IV designations were created. Interestingly, it was observed that isolates within different Groups could express the same toxin type and conversely a single Group could express different toxin types. This discordant phylogeny between the toxin and the host bacteria indicated that horizontal gene transfer of the toxin was responsible for the variation observed within the species. The recent availability of multiple C. botulinum genomic sequences has offered the ability to bioinformatically analyze the locations of the bont genes, the composition of their toxin gene clusters, and the genes flanking these regions to understand their variation. Comparison of the genomic sequences representing multiple serotypes indicates that the bont genes are not in random locations. Instead the analyses revealed specific regions where the toxin genes occur within the genomes representing serotype A, B, C, E, and F C. botulinum strains and C. butyricum type E strains. The genomic analyses have provided evidence of horizontal gene transfer, site-specific insertion, and recombination events. These events have contributed to the variation observed among the neurotoxins, the toxin gene clusters and the bacteria that contain them, and has supported the historical microbiological, and biochemical

  3. The major resistance gene cluster in lettuce is highly duplicated and spans several megabases.

    Science.gov (United States)

    Meyers, B C; Chin, D B; Shen, K A; Sivaramakrishnan, S; Lavelle, D O; Zhang, Z; Michelmore, R W

    1998-11-01

    At least 10 Dm genes conferring resistance to the oomycete downy mildew fungus Bremia lactucae map to the major resistance cluster in lettuce. We investigated the structure of this cluster in the lettuce cultivar Diana, which contains Dm3. A deletion breakpoint map of the chromosomal region flanking Dm3 was saturated with a variety of molecular markers. Several of these markers are components of a family of resistance gene candidates (RGC2) that encode a nucleotide binding site and a leucine-rich repeat region. These motifs are characteristic of plant disease resistance genes. Bacterial artificial chromosome clones were identified by using duplicated restriction fragment length polymorphism markers from the region, including the nucleotide binding site-encoding region of RGC2. Twenty-two distinct members of the RGC2 family were characterized from the bacterial artificial chromosomes; at least two additional family members exist. The RGC2 family is highly divergent; the nucleotide identity was as low as 53% between the most distantly related copies. These RGC2 genes span at least 3.5 Mb. Eighteen members were mapped on the deletion breakpoint map. A comparison between the phylogenetic and physical relationships of these sequences demonstrated that closely related copies are physically separated from one another and indicated that complex rearrangements have shaped this region. Analysis of low-copy genomic sequences detected no genes, including RGC2, in the Dm3 region, other than sequences related to retrotransposons and transposable elements. The related but divergent family of RGC2 genes may act as a resource for the generation of new resistance phenotypes through infrequent recombination or unequal crossing over.

  4. The Histidine Decarboxylase Gene Cluster of Lactobacillus parabuchneri Was Gained by Horizontal Gene Transfer and Is Mobile within the Species

    Science.gov (United States)

    Wüthrich, Daniel; Berthoud, Hélène; Wechsler, Daniel; Eugster, Elisabeth; Irmler, Stefan; Bruggmann, Rémy

    2017-01-01

    Histamine in food can cause intolerance reactions in consumers. Lactobacillus parabuchneri (L. parabuchneri) is one of the major causes of elevated histamine levels in cheese. Despite its significant economic impact and negative influence on human health, no genomic study has been published so far. We sequenced and analyzed 18 L. parabuchneri strains of which 12 were histamine positive and 6 were histamine negative. We determined the complete genome of the histamine positive strain FAM21731 with PacBio as well as Illumina and the genomes of the remaining 17 strains using the Illumina technology. We developed the synteny aware ortholog finding algorithm SynOrf to compare the genomes and we show that the histidine decarboxylase (HDC) gene cluster is located in a genomic island. It is very likely that the HDC gene cluster was transferred from other lactobacilli, as it is highly conserved within several lactobacilli species. Furthermore, we have evidence that the HDC gene cluster was transferred within the L. parabuchneri species. PMID:28261177

  5. ON THE POWER AND LIMITS OF SEQUENCE SIMILARITY BASED CLUSTERING OF PROTEINS INTO FAMILIES

    DEFF Research Database (Denmark)

    Wiwie, Christian; Röttger, Richard

    2017-01-01

    ...important to also unravel the proteomic repertoire of an organism. A classical computational approach for detecting protein families is a sequence-based similarity calculation coupled with a subsequent cluster analysis. In this work we have intensively analyzed various clustering tools on a large scale. We... used the data to investigate the behavior of the tools' parameters underlining the diversity of the protein families. Furthermore, we trained regression models for predicting the expected performance of a clustering tool for an unknown data set and aimed to also suggest optimal parameters... in an automated fashion. Our analysis demonstrates the benefits and limitations of the clustering of proteins with low sequence similarity, indicating that each protein family requires its own distinct set of tools and parameters. All results, a tool prediction service, and additional supporting material is also...
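
    The "similarity calculation coupled with a subsequent cluster analysis" mentioned in the abstract above can be sketched in a few lines. The example below is only an illustration under stated assumptions: difflib's SequenceMatcher ratio stands in for a real alignment-based similarity score, and single-linkage grouping with a fixed threshold stands in for the clustering tools evaluated in the study. The threshold value and all names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Crude similarity in [0, 1]; a stand-in for an alignment score."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def cluster_sequences(seqs: dict, threshold: float = 0.7) -> list:
    """Single-linkage clustering of named sequences.

    Two sequences end up in the same cluster if they are connected by a
    chain of pairs whose similarity is at or above the threshold.
    """
    parent = {name: name for name in seqs}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if similarity(seqs[a], seqs[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    # Sorted output makes the result deterministic.
    return sorted(sorted(group) for group in groups.values())
```

    Raising or lowering the threshold changes how many families are reported, which is the kind of parameter sensitivity the abstract refers to.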

  6. Genetic variations and haplotype diversity of the UGT1 gene cluster in the Chinese population.

    Directory of Open Access Journals (Sweden)

    Jing Yang

    Full Text Available Vertebrates require tremendous molecular diversity to defend against numerous small hydrophobic chemicals. UDP-glucuronosyltransferases (UGTs) are a large family of detoxification enzymes that glucuronidate xenobiotics and endobiotics, facilitating their excretion from the body. The UGT1 gene cluster contains a tandem array of variable first exons, each preceded by a specific promoter, and a common set of downstream constant exons, similar to the genomic organization of the protocadherin (Pcdh), immunoglobulin, and T-cell receptor gene clusters. To assist pharmacogenomics studies in Chinese, we sequenced nine first exons, promoter and intronic regions, and five common exons of the UGT1 gene cluster in a population sample of 253 unrelated Chinese individuals. We identified 101 polymorphisms and found 15 novel SNPs. We then computed allele frequencies for each polymorphism and reconstructed their linkage disequilibrium (LD) map. The UGT1 cluster can be divided into five linkage blocks: Block 9 (UGT1A9), Block 9/7/6 (UGT1A9, UGT1A7, and UGT1A6), Block 5 (UGT1A5), Block 4/3 (UGT1A4 and UGT1A3), and Block 3' UTR. Furthermore, we inferred haplotypes and selected their tagSNPs. Finally, comparing our data with those of three other populations of the HapMap project revealed ethnic specificity of the UGT1 genetic diversity in Chinese. These findings have important implications for future molecular genetic studies of the UGT1 gene cluster as well as for personalized medical therapies in Chinese.
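
    To make the allele-frequency and linkage disequilibrium steps in the abstract above concrete, here is a minimal sketch of the standard r^2 statistic for two biallelic sites. It assumes phased haplotypes coded as 0/1, which is a simplification: the study started from genotypes and inferred haplotypes, and its software is not named in the abstract. The function names are hypothetical.

```python
def allele_frequency(alleles):
    """Frequency of the allele coded 1 in a list of 0/1 haplotype alleles."""
    return sum(alleles) / len(alleles)


def ld_r_squared(site_a, site_b):
    """r^2 between two biallelic sites from phased 0/1 haplotypes.

    D = p_AB - p_A * p_B and r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)).
    """
    n = len(site_a)
    if n == 0 or n != len(site_b):
        raise ValueError("sites must be non-empty and of equal length")
    p_a = allele_frequency(site_a)
    p_b = allele_frequency(site_b)
    # Fraction of haplotypes carrying allele 1 at both sites.
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")  # at least one site is monomorphic
    return d * d / denom
```

    Computing this value for every pair of polymorphisms gives the matrix from which linkage blocks are usually delineated; the block-calling rule itself is not part of this sketch.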

  7. Bacillus sp. CDB3 isolated from cattle dip-sites possesses two ars gene clusters

    Institute of Scientific and Technical Information of China (English)

    Somanath Bhat; Xi Luo; Zhiqiang Xu; Lixia Liu; Ren Zhang

    2011-01-01

    Contamination of soil and water by arsenic is a global problem. In Australia, the dipping of cattle in arsenic-containing solution to control cattle ticks in the last century has left many sites heavily contaminated with arsenic and other toxicants. We had previously isolated five soil bacterial strains (CDB1-5) highly resistant to arsenic. To understand the resistance mechanism, molecular studies have been carried out. Two chromosome-encoded arsenic resistance (ars) gene clusters have been cloned from CDB3 (Bacillus sp.). They both function in Escherichia coli, and cluster 1 exerts a much higher resistance to the toxic metalloid. Cluster 2 is smaller, possessing four open reading frames (ORFs), arsRorf2BC, similar to that identified in the Bacillus subtilis Skin element. Among the eight ORFs in cluster 1, five are analogs of common ars genes found in other bacteria, however, organized in a unique order arsRBCDA instead of arsRDABC. Three other putative genes are located directly downstream and designated as arsTIP based on the homologies of their theoretical translation sequences respectively to thioredoxin reductases, iron-sulphur cluster proteins and protein phosphatases. The latter two are novel to any known ars operons. The arsD gene from Bacillus species was cloned for the first time, and the predicted protein differs from the well-studied E. coli ArsD by lacking two pairs of C-terminal cysteine residues. Its functional involvement in arsenic resistance has been confirmed by a deletion experiment. There also exists an inverted repeat in the intergenic region between arsC and arsD, implying some unknown transcription regulation.

  8. Apicidin F: characterization and genetic manipulation of a new secondary metabolite gene cluster in the rice pathogen Fusarium fujikuroi.

    Directory of Open Access Journals (Sweden)

    Eva-Maria Niehaus

    Full Text Available The fungus F. fujikuroi is well known for its production of gibberellins causing the 'bakanae' disease of rice. Besides these plant hormones, it is able to produce other secondary metabolites (SMs), such as pigments and mycotoxins. Genome sequencing revealed altogether 45 potential SM gene clusters, most of which are cryptic and silent. In this study we characterize a new non-ribosomal peptide synthetase (NRPS) gene cluster that is responsible for the production of the cyclic tetrapeptide apicidin F (APF). This new SM has structural similarities to the known histone deacetylase inhibitor apicidin. To gain insight into the biosynthetic pathway, most of the 11 cluster genes were deleted, and the mutants were analyzed by HPLC-DAD and HPLC-HRMS for their ability to produce APF or new derivatives. Structure elucidation was carried out by HPLC-HRMS and NMR analysis. We identified two new derivatives of APF named apicidin J and K. Furthermore, we studied the regulation of APF biosynthesis and showed that the cluster genes are expressed under conditions of high nitrogen and acidic pH in a manner dependent on the nitrogen regulator AreB, and the pH regulator PacC. In addition, over-expression of the atypical pathway-specific transcription factor (TF)-encoding gene APF2 led to elevated expression of the cluster genes under inducing and even repressing conditions and to significantly increased product yields. Bioinformatic analyses allowed the identification of a putative Apf2 DNA-binding ("Api-box") motif in the promoters of the APF genes. Point mutations in this sequence motif caused a drastic decrease of APF production, indicating that this motif is essential for activating the cluster genes. Finally, we provide a model of the APF biosynthetic pathway based on chemical identification of derivatives in the cultures of deletion mutants.

  9. Genome-scale analysis of positional clustering of mouse testis-specific genes

    Directory of Open Access Journals (Sweden)

    Lee Bernett TK

    2005-01-01

    Full Text Available Abstract Background Genes are not randomly distributed on a chromosome, as was once thought, even after removal of tandem repeats. The positional clustering of co-expressed genes is known in prokaryotes and has recently been reported in several eukaryotic organisms such as Caenorhabditis elegans, Drosophila melanogaster, and Homo sapiens. In order to further investigate the mode of tissue-specific gene clustering in higher eukaryotes, we have performed a genome-scale analysis of positional clustering of the mouse testis-specific genes. Results Our computational analysis shows that a large proportion of testis-specific genes are clustered in groups of 2 to 5 genes in the mouse genome. The number of clusters is much higher than expected by chance even after removal of tandem repeats. Conclusion Our result suggests that testis-specific genes tend to cluster on the mouse chromosomes. This provides another piece of evidence for the hypothesis that clusters of tissue-specific genes do exist.
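
    The claim that the number of clusters is "much higher than expected by chance" is the kind of statement a permutation test can check. The sketch below is a generic illustration, not the authors' method: it assumes genes are given in chromosomal order as a list of flags (True for tissue-specific), defines a cluster as a run of at least two adjacent flagged genes, and shuffles the flags to build a null distribution. All names and the cluster definition are assumptions.

```python
import random


def count_clusters(flags, min_size=2):
    """Count runs of at least min_size consecutive truthy flags."""
    clusters, run = 0, 0
    for flag in flags:
        if flag:
            run += 1
        else:
            if run >= min_size:
                clusters += 1
            run = 0
    if run >= min_size:  # a run that reaches the end of the chromosome
        clusters += 1
    return clusters


def permutation_p_value(flags, n_perm=1000, seed=0):
    """Empirical p-value for observing at least this many clusters.

    Gene order is kept fixed and the tissue-specific labels are shuffled,
    so the number of flagged genes is the same in every permutation.
    """
    rng = random.Random(seed)
    observed = count_clusters(flags)
    shuffled = list(flags)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if count_clusters(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

    Tandem duplicates would need to be collapsed before building the flag list if the test is meant to mirror the "after removal of tandem repeats" condition in the abstract.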

  10. Stellar Models of Multiple Populations in Globular Clusters. I. The Main Sequence of NGC 6752

    CERN Document Server

    Dotter, Aaron; Conroy, Charlie; Milone, A P; Marino, A F; Yong, David

    2014-01-01

    We present stellar atmosphere and evolution models of main sequence stars in two stellar populations of the Galactic globular cluster NGC 6752. These populations represent the two extremes of light-element abundance variations in the cluster. NGC 6752 is a benchmark cluster in the study of multiple stellar populations because of the rich array of spectroscopic abundances and panchromatic Hubble Space Telescope photometry. The spectroscopic abundances are used to compute stellar atmosphere and evolution models. The synthetic spectra for the two populations show significant differences in the ultraviolet and, for the coolest temperatures, in the near-infrared. The stellar evolution models exhibit insignificant differences in the H-R diagram except on the lower main sequence. The appearance of multiple sequences in the colour-magnitude diagrams (CMDs) of NGC 6752 is almost exclusively due to spectral effects caused by the abundance variations. The models reproduce the observed splitting and/or broadening of sequ...

  11. Assessment of clusters of transcription factor binding sites in relationship to human promoter, CpG islands and gene expression

    Directory of Open Access Journals (Sweden)

    Sakaki Yoshiyuki

    2004-02-01

    Full Text Available Abstract Background Gene expression is regulated mainly by transcription factors (TFs) that interact with regulatory cis-elements on DNA sequences. To identify functional regulatory elements, computer searching can predict TF binding sites (TFBS) using position weight matrices (PWMs) that represent positional base frequencies of collected experimentally determined TFBS. A disadvantage of this approach is the large output of results for genomic DNA. One strategy to identify genuine TFBS is to utilize local concentrations of predicted TFBS. It is unclear whether there is a general tendency for TFBS to cluster at promoter regions, although this is the case for certain TFBS. Also unclear is the identification of TFs that have TFBS concentrated in promoters and to what level this occurs. This study hopes to answer some of these questions. Results We developed the cluster score measure to evaluate the correlation between predicted TFBS clusters and promoter sequences for each PWM. Non-promoter sequences were used as a control. Using the cluster score, we identified a PWM group called PWM-PCP, in which TFBS clusters positively correlate with promoters, and another PWM group called PWM-NCP, in which TFBS clusters negatively correlate with promoters. The PWM-PCP group comprises 47% of the 199 vertebrate PWMs, while the PWM-NCP group comprises 11%. After reducing the effect of CpG islands (CGI) against the clusters using partial correlation coefficients among three properties (promoter, CGI and predicted TFBS cluster), we identified two PWM groups including those strongly correlated with CGI and those not correlated with CGI. Conclusion Not all PWMs predict TFBS correlated with human promoter sequences. Two main PWM groups were identified: (1) those that show TFBS clustered in promoters associated with CGI, and (2) those that show TFBS clustered in promoters independent of CGI. Assessment of PWM matches will allow more positive interpretation of TFBS in
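
    For readers unfamiliar with PWM-based prediction as described in the abstract above, the following sketch builds a log-odds matrix from a handful of aligned sites and scans a sequence for windows above a score threshold. It is a textbook-style illustration only: the pseudocount, the uniform background and the hits-per-kilobase density are assumptions, and the density is not the "cluster score" defined in the study, whose formula is not given in the abstract.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds position weight matrix from equal-length binding sites."""
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm


def scan(seq, pwm, threshold):
    """Return (start, score) for every window scoring at or above threshold.

    The sequence is assumed to contain only A, C, G and T.
    """
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


def hits_per_kb(seq, pwm, threshold):
    """Predicted-site density, a simple proxy for local TFBS concentration."""
    return 1000 * len(scan(seq, pwm, threshold)) / len(seq)
```

    Comparing hits_per_kb between promoter and non-promoter sequences for each matrix gives a crude view of whether predicted sites concentrate at promoters.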

  12. Identification of co-regulated genes through Bayesian clustering of predicted regulatory binding sites.

    Science.gov (United States)

    Qin, Zhaohui S; McCue, Lee Ann; Thompson, William; Mayerhofer, Linda; Lawrence, Charles E; Liu, Jun S

    2003-04-01

    The identification of co-regulated genes and their transcription-factor binding sites (TFBS) are key steps toward understanding transcription regulation. In addition to effective laboratory assays, various computational approaches for the detection of TFBS in promoter regions of coexpressed genes have been developed. The availability of complete genome sequences combined with the likelihood that transcription factors and their cognate sites are often conserved during evolution has led to the development of phylogenetic footprinting. The modus operandi of this technique is to search for conserved motifs upstream of orthologous genes from closely related species. The method can identify hundreds of TFBS without prior knowledge of co-regulation or coexpression. Because many of these predicted sites are likely to be bound by the same transcription factor, motifs with similar patterns can be put into clusters so as to infer the sets of co-regulated genes, that is, the regulons. This strategy utilizes only genome sequence information and is complementary to and confirmative of gene expression data generated by microarray experiments. However, the limited data available to characterize individual binding patterns, the variation in motif alignment, motif width, and base conservation, and the lack of knowledge of the number and sizes of regulons make this inference problem difficult. We have developed a Gibbs sampling-based Bayesian motif clustering (BMC) algorithm to address these challenges. Tests on simulated data sets show that BMC produces many fewer errors than hierarchical and K-means clustering methods. The application of BMC to hundreds of predicted gamma-proteobacterial motifs correctly identified many experimentally reported regulons, inferred the existence of previously unreported members of these regulons, and suggested novel regulons.

  13. Mechanism of Gene Amplification via Yeast Autonomously Replicating Sequences

    Directory of Open Access Journals (Sweden)

    Shelly Sehgal

    2015-01-01

    Full Text Available The present investigation was aimed at understanding the molecular mechanism of gene amplification. Interplay of fragile sites in promoting gene amplification was also elucidated. The amplification promoting sequences (APS) were chosen from the Saccharomyces cerevisiae ARS, 5S rRNA regions of Plantago ovata and P. lagopus, proposed sites of replication pausing at the Ste20 gene locus of S. cerevisiae, and the bent DNA sequences within fragile site FRA11A in humans. The gene amplification assays showed that plasmids bearing APS from yeast and human beings led to enhanced protein concentration as compared to the wild type. Both the in silico and in vitro analyses pointed to the strong bending potential of these APS. In addition, high mitotic stability and presence of TTTT repeats and SAR amongst these sequences encourage gene amplification. Phylogenetic analysis of S. cerevisiae ARS was also conducted. The combinatorial power of different aspects of APS analyzed in the present investigation was harnessed to reach a consensus about the factors which stimulate gene expression in the presence of these sequences. It was concluded that the mechanism of gene amplification was that AT rich tracts present in fragile sites of yeast serve as binding sites for MAR/SAR and DNA unwinding elements. The DNA protein interactions necessary for ORC activation are facilitated by DNA bending. These specific bindings at ORC promote repeated rounds of DNA replication leading to gene amplification.

  14. Cloning large natural product gene clusters from the environment: Piecing environmental DNA gene clusters back together with TAR

    OpenAIRE

    Kim, Jeffrey H.; Feng, Zhiyang; Bauer, John D.; Kallifidas, Dimitris; Calle, Paula Y.; Brady, Sean F

    2010-01-01

    A single gram of soil can contain thousands of unique bacterial species, of which only a small fraction is regularly cultured in the laboratory. Although the fermentation of cultured microorganisms has provided access to numerous bioactive secondary metabolites, with these same methods it is not possible to characterize the natural products encoded by the uncultured majority. The heterologous expression of biosynthetic gene clusters cloned from DNA extracted directly from environmental sample...

  15. Biased distribution of DNA uptake sequences towards genome maintenance genes

    DEFF Research Database (Denmark)

    Davidsen, T.; Rodland, E.A.; Lagesen, K.

    2004-01-01

    Repeated sequence signatures are characteristic features of all genomic DNA. We have made a rigorous search for repeat genomic sequences in the human pathogens Neisseria meningitidis, Neisseria gonorrhoeae and Haemophilus influenzae and found that by far the most frequent 9-10mers residing within... coding regions are the DNA uptake sequences (DUS) required for natural genetic transformation. More importantly, we found a significantly higher density of DUS within genes involved in DNA repair, recombination, restriction-modification and replication than in any other annotated gene group...
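
    The search for the most frequent 9-10mers and the comparison of their density between gene groups, as described in the abstract above, can be illustrated with a simple k-mer count. This is a minimal sketch rather than the authors' pipeline; the function names are hypothetical and any sequence passed in is a placeholder, not an actual DUS.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def most_frequent_kmers(seq, k, top=5):
    """Return the top k-mers as (kmer, count) pairs, most frequent first."""
    return kmer_counts(seq, k).most_common(top)


def occurrences_per_kb(seq, kmer):
    """Density of one k-mer (overlapping matches) per 1000 bases."""
    if not seq:
        return 0.0
    return 1000 * kmer_counts(seq, len(kmer))[kmer.upper()] / len(seq)
```

    Applying occurrences_per_kb to the concatenated coding sequences of each functional gene group, on both strands if desired, would give the per-group densities that the abstract compares.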

  16. Targeted sequencing of cancer-related genes in colorectal cancer using next-generation sequencing.

    Directory of Open Access Journals (Sweden)

    Sae-Won Han

    Full Text Available Recent advances in sequencing technology have enabled comprehensive profiling of genetic alterations in cancer. We have established a targeted sequencing platform using next-generation sequencing (NGS) technology for clinical use, which can provide mutation and copy number variation data. NGS was performed with a paired-end library enriched with exons of 183 cancer-related genes. Normal and tumor tissue pairs of 60 colorectal adenocarcinomas were used to test feasibility. Somatic mutation and copy number alteration were analyzed. A total of 526 somatic non-synonymous sequence variations were found in 113 genes. Among these, 278 single nucleotide variations were 232 different somatic point mutations. 216 SNV were 79 known single nucleotide polymorphisms in the dbSNP. 32 indels were 28 different indel mutations. Median number of mutated genes per tumor was 4 (range 0-23). Copy number gain (>X2 fold) was found in 65 genes in 40 patients, whereas copy number loss (genes in 39 patients. The most frequently altered genes (mutation and/or copy number alteration) were APC in 35 patients (58%), TP53 in 34 (57%), and KRAS in 24 (40%). Altered gene list revealed ErbB signaling pathway as the most commonly involved pathway (25 patients, 42%). Targeted sequencing platform using NGS technology is feasible for clinical use and provides comprehensive genetic alteration data.

  17. Ancient expansion of the hox cluster in lepidoptera generated four homeobox genes implicated in extra-embryonic tissue formation.

    Directory of Open Access Journals (Sweden)

    Laura Ferguson

    2014-10-01

    Full Text Available Gene duplications within the conserved Hox cluster are rare in animal evolution, but in Lepidoptera an array of divergent Hox-related genes (Shx genes) has been reported between pb and zen. Here, we use genome sequencing of five lepidopteran species (Polygonia c-album, Pararge aegeria, Callimorpha dominula, Cameraria ohridella, Hepialus sylvina) plus a caddisfly outgroup (Glyphotaelius pellucidus) to trace the evolution of the lepidopteran Shx genes. We demonstrate that Shx genes originated by tandem duplication of zen early in the evolution of large clade Ditrysia; Shx are not found in a caddisfly and a member of the basally diverging Hepialidae (swift moths). Four distinct Shx genes were generated early in ditrysian evolution, and were stably retained in all descendent Lepidoptera except the silkmoth which has additional duplications. Despite extensive sequence divergence, molecular modelling indicates that all four Shx genes have the potential to encode stable homeodomains. The four Shx genes have distinct spatiotemporal expression patterns in early development of the Speckled Wood butterfly (Pararge aegeria), with ShxC demarcating the future sites of extraembryonic tissue formation via strikingly localised maternal RNA in the oocyte. All four genes are also expressed in presumptive serosal cells, prior to the onset of zen expression. Lepidopteran Shx genes represent an unusual example of Hox cluster expansion and integration of novel genes into ancient developmental regulatory networks.

  18. Ancient expansion of the hox cluster in lepidoptera generated four homeobox genes implicated in extra-embryonic tissue formation.

    Science.gov (United States)

    Ferguson, Laura; Marlétaz, Ferdinand; Carter, Jean-Michel; Taylor, William R; Gibbs, Melanie; Breuker, Casper J; Holland, Peter W H

    2014-10-01

    Gene duplications within the conserved Hox cluster are rare in animal evolution, but in Lepidoptera an array of divergent Hox-related genes (Shx genes) has been reported between pb and zen. Here, we use genome sequencing of five lepidopteran species (Polygonia c-album, Pararge aegeria, Callimorpha dominula, Cameraria ohridella, Hepialus sylvina) plus a caddisfly outgroup (Glyphotaelius pellucidus) to trace the evolution of the lepidopteran Shx genes. We demonstrate that Shx genes originated by tandem duplication of zen early in the evolution of large clade Ditrysia; Shx are not found in a caddisfly and a member of the basally diverging Hepialidae (swift moths). Four distinct Shx genes were generated early in ditrysian evolution, and were stably retained in all descendent Lepidoptera except the silkmoth which has additional duplications. Despite extensive sequence divergence, molecular modelling indicates that all four Shx genes have the potential to encode stable homeodomains. The four Shx genes have distinct spatiotemporal expression patterns in early development of the Speckled Wood butterfly (Pararge aegeria), with ShxC demarcating the future sites of extraembryonic tissue formation via strikingly localised maternal RNA in the oocyte. All four genes are also expressed in presumptive serosal cells, prior to the onset of zen expression. Lepidopteran Shx genes represent an unusual example of Hox cluster expansion and integration of novel genes into ancient developmental regulatory networks.

  19. Whole genome sequence of two Rathayibacter toxicus strains reveals a tunicamycin biosynthetic cluster similar to Streptomyces chartreusis

    Science.gov (United States)

    Sechler, Aaron J.; Tancos, Matthew A.; Schneider, David J.; King, Jonas G.; Fennessey, Christine M.; Schroeder, Brenda K.; Murray, Timothy D.; Luster, Douglas G.; Schneider, William L.

    2017-01-01

    Rathayibacter toxicus is a forage grass associated Gram-positive bacterium of major concern to food safety and agriculture. This species is listed by USDA-APHIS as a plant pathogen select agent because it produces a tunicamycin-like toxin that is lethal to livestock and may be vectored by nematode species native to the U.S. The complete genomes of two strains of R. toxicus, including the type strain FH-79, were sequenced and analyzed in comparison with all available, complete R. toxicus genomes. Genome sizes ranged from 2,343,780 to 2,394,755 nucleotides, with 2079 to 2137 predicted open reading frames; all four strains showed remarkable synteny over nearly the entire genome, with only a small transposed region. A cluster of genes with similarity to the tunicamycin biosynthetic cluster from Streptomyces chartreusis was identified. The tunicamycin gene cluster (TGC) in R. toxicus contained 14 genes in two transcriptional units, with all of the functional elements for tunicamycin biosynthesis present. The TGC had a significantly lower GC content (52%) than the rest of the genome (61.5%), suggesting that the TGC may have originated from a horizontal transfer event. Further analysis indicated numerous remnants of other potential horizontal transfer events are present in the genome. In addition to the TGC, genes potentially associated with carotenoid and exopolysaccharide production, bacteriocins and secondary metabolites were identified. A CRISPR array is evident. There were relatively few plant-associated cell-wall hydrolyzing enzymes, but there were numerous secreted serine proteases that share sequence homology to the pathogenicity-associated protein Pat-1 of Clavibacter michiganensis. Overall, the genome provides clear insight into the possible mechanisms for toxin production in R. toxicus, providing a basis for future genetic approaches. PMID:28796837
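
    The observation in the abstract above that the tunicamycin gene cluster has a lower GC content (52%) than the rest of the genome (61.5%) suggests a simple sliding-window check for regions of atypical base composition. The sketch below illustrates that idea only; the window size, step and drop threshold are arbitrary assumptions, and it is not the analysis performed by the authors.

```python
def gc_content(seq):
    """Fraction of G and C among the A, C, G and T bases of a sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else float("nan")


def sliding_gc(seq, window, step):
    """Return (start, GC fraction) for each full window along the sequence."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


def low_gc_windows(seq, window, step, drop=0.05):
    """Windows whose GC fraction is at least `drop` below the overall value."""
    overall = gc_content(seq)
    return [
        (start, gc)
        for start, gc in sliding_gc(seq, window, step)
        if gc <= overall - drop
    ]
```

    Runs of adjacent low-GC windows are candidates for horizontally acquired regions, although base composition alone is only suggestive and would normally be combined with other evidence.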

  20. Main-sequence variable stars in young open cluster NGC 1893

    OpenAIRE

    Lata, Sneh; Yadav, Ram Kesh; Pandey, A.K.(Indian Institute of Technology Bombay (IIT), Mumbai, India); Richichi, Andrea; Eswaraiah, C.; Kumar, Brajesh; Kappelmann, Norbert; Sharma, Saurabh

    2014-01-01

    In this paper we present time series photometry of 104 variable stars in the cluster region NGC 1893. The association of the present variable candidates to the cluster NGC 1893 has been determined by using $(U-B)/(B-V)$ and $(J-H)/(H-K)$ two-colour diagrams, and the $V/(V-I)$ colour-magnitude diagram. Forty-five stars are found to be main-sequence variables and these could be B-type variable stars associated with the cluster. We classified these objects as $\beta$ Cep, slowly pulsating B stars an...

  1. Non-ribosomal peptide synthetases: Identifying the cryptic gene clusters and decoding the natural product

    Indian Academy of Sciences (India)

    MANGAL SINGH; SANDEEP CHAUDHARY; DIPTI SAREEN

    2017-03-01

    Non-ribosomal peptide synthetases (NRPSs) and polyketide synthases (PKSs) present in bacteria and fungi are the major multi-modular enzyme complexes which synthesize secondary metabolites like the pharmacologically important antibiotics and siderophores. Each of the multiple modules of an NRPS activates a different amino or aryl acid, followed by their condensation to synthesize a linear or cyclic natural product. The studies on NRPS domains, the knowledge of their gene cluster architecture and tailoring enzymes have helped in the in silico genetic screening of the ever-expanding sequenced microbial genomic data for the identification of novel NRPS/PKS clusters and thus deciphering novel non-ribosomal peptides (NRPs). Adenylation domain is an integral part of the NRPSs and is the substrate selecting unit for the final assembled NRP. In some cases, it also requires a small protein, the MbtH homolog, for its optimum activity. The presence of putative adenylation domain and MbtH homologs in a sequenced genome can help identify the novel secondary metabolite producers. The role of the adenylation domain in the NRPS gene clusters and its characterization as a tool for the discovery of novel cryptic NRPS gene clusters are discussed.

  2. A score system for quality evaluation of RNA sequence tags: an improvement for gene expression profiling

    Directory of Open Access Journals (Sweden)

    Pinheiro Daniel G

    2009-06-01

    Full Text Available Abstract Background High-throughput molecular approaches for gene expression profiling, such as Serial Analysis of Gene Expression (SAGE), Massively Parallel Signature Sequencing (MPSS) or Sequencing-by-Synthesis (SBS), represent powerful techniques that provide global transcription profiles of different cell types through sequencing of short fragments of transcripts, denominated sequence tags. These techniques have improved our understanding about the relationships between these expression profiles and cellular phenotypes. Despite this, more reliable datasets are still necessary. In this work, we present a web-based tool named S3T: Score System for Sequence Tags, to index sequenced tags in accordance with their reliability. This is done through a series of evaluations based on a defined rule set. S3T allows the identification/selection of tags considered more reliable for further gene expression analysis. Results This methodology was applied to a public SAGE dataset. In order to compare data before and after filtering, a hierarchical clustering analysis was performed in samples from the same type of tissue, in distinct biological conditions, using these two datasets. Our results provide evidence suggesting that it is possible to find more congruous clusters after using the S3T scoring system. Conclusion These results substantiate the proposed application to generate more reliable data. This is a significant contribution for determination of global gene expression profiles. The library analysis with S3T is freely available at http://gdm.fmrp.usp.br/s3t/. S3T source code and datasets can also be downloaded from the aforementioned website.

  3. Engineered Streptomyces avermitilis host for heterologous expression of biosynthetic gene cluster for secondary metabolites.

    Science.gov (United States)

    Komatsu, Mamoru; Komatsu, Kyoko; Koiwai, Hanae; Yamada, Yuuki; Kozone, Ikuko; Izumikawa, Miho; Hashimoto, Junko; Takagi, Motoki; Omura, Satoshi; Shin-ya, Kazuo; Cane, David E; Ikeda, Haruo

    2013-07-19

    An industrial microorganism, Streptomyces avermitilis, which is a producer of anthelmintic macrocyclic lactones, avermectins, has been constructed as a versatile model host for heterologous expression of genes encoding secondary metabolite biosynthesis. Twenty of the entire biosynthetic gene clusters for secondary metabolites were successively cloned and introduced into a versatile model host S. avermitilis SUKA17 or 22. Almost all S. avermitilis transformants carrying the entire gene cluster produced metabolites as a result of the expression of biosynthetic gene clusters introduced. A few transformants were unable to produce metabolites, but their production was restored by the expression of biosynthetic genes using an alternative promoter or the expression of a regulatory gene in the gene cluster that controls the expression of biosynthetic genes in the cluster using an alternative promoter. Production of metabolites in some transformants of the versatile host was higher than that of the original producers, and cryptic biosynthetic gene clusters in the original producer were also expressed in a versatile host.

  4. Metabolic diversification--independent assembly of operon-like gene clusters in different plants.

    Science.gov (United States)

    Field, Ben; Osbourn, Anne E

    2008-04-25

    Operons are clusters of unrelated genes with related functions that are a feature of prokaryotic genomes. Here, we report on an operon-like gene cluster in the plant Arabidopsis thaliana that is required for triterpene synthesis (the thalianol pathway). The clustered genes are coexpressed, as in bacterial operons. However, despite the resemblance to a bacterial operon, this gene cluster has been assembled from plant genes by gene duplication, neofunctionalization, and genome reorganization, rather than by horizontal gene transfer from bacteria. Furthermore, recent assembly of operon-like gene clusters for triterpene synthesis has occurred independently in divergent plant lineages (Arabidopsis and oat). Thus, selection pressure may act during the formation of certain plant metabolic pathways to drive gene clustering.

  5. Wide Distribution of Foxicin Biosynthetic Gene Clusters in Streptomyces Strains - An Unusual Secondary Metabolite with Various Properties.

    Science.gov (United States)

    Greule, Anja; Marolt, Marija; Deubel, Denise; Peintner, Iris; Zhang, Songya; Jessen-Trefzer, Claudia; De Ford, Christian; Burschel, Sabrina; Li, Shu-Ming; Friedrich, Thorsten; Merfort, Irmgard; Lüdeke, Steffen; Bisel, Philippe; Müller, Michael; Paululat, Thomas; Bechthold, Andreas

    2017-01-01

    Streptomyces diastatochromogenes Tü6028 is known to produce the polyketide antibiotic polyketomycin. The deletion of the pokOIV oxygenase gene led to a non-polyketomycin-producing mutant. Instead, novel compounds were produced by the mutant, which have not been detected before in the wild type strain. Four different compounds were identified and named foxicins A-D. Foxicin A was isolated and its structure was elucidated as an unusual nitrogen-containing quinone derivative using various spectroscopic methods. Through genome mining, the foxicin biosynthetic gene cluster was identified in the draft genome sequence of S. diastatochromogenes. The cluster spans 57 kb and encodes three PKS type I modules, one NRPS module and 41 additional enzymes. A foxBII gene-inactivated mutant of S. diastatochromogenes Tü6028 ΔpokOIV is unable to produce foxicins. Homologous fox biosynthetic gene clusters were found in more than 20 additional Streptomyces strains, overall in about 2.6% of all sequenced Streptomyces genomes. However, the production of foxicin-like compounds in these strains has never been described indicating that the clusters are expressed at a very low level or are silent under fermentation conditions. Foxicin A acts as a siderophore through interacting with ferric ions. Furthermore, it is a weak inhibitor of the Escherichia coli aerobic respiratory chain and shows moderate antibiotic activity. The wide distribution of the cluster and the various properties of the compound indicate a major role of foxicins in Streptomyces strains.

  6. Wide Distribution of Foxicin Biosynthetic Gene Clusters in Streptomyces Strains – An Unusual Secondary Metabolite with Various Properties

    Science.gov (United States)

    Greule, Anja; Marolt, Marija; Deubel, Denise; Peintner, Iris; Zhang, Songya; Jessen-Trefzer, Claudia; De Ford, Christian; Burschel, Sabrina; Li, Shu-Ming; Friedrich, Thorsten; Merfort, Irmgard; Lüdeke, Steffen; Bisel, Philippe; Müller, Michael; Paululat, Thomas; Bechthold, Andreas

    2017-01-01

    Streptomyces diastatochromogenes Tü6028 is known to produce the polyketide antibiotic polyketomycin. The deletion of the pokOIV oxygenase gene led to a non-polyketomycin-producing mutant. Instead, novel compounds were produced by the mutant, which have not been detected before in the wild type strain. Four different compounds were identified and named foxicins A–D. Foxicin A was isolated and its structure was elucidated as an unusual nitrogen-containing quinone derivative using various spectroscopic methods. Through genome mining, the foxicin biosynthetic gene cluster was identified in the draft genome sequence of S. diastatochromogenes. The cluster spans 57 kb and encodes three PKS type I modules, one NRPS module and 41 additional enzymes. A foxBII gene-inactivated mutant of S. diastatochromogenes Tü6028 ΔpokOIV is unable to produce foxicins. Homologous fox biosynthetic gene clusters were found in more than 20 additional Streptomyces strains, overall in about 2.6% of all sequenced Streptomyces genomes. However, the production of foxicin-like compounds in these strains has never been described indicating that the clusters are expressed at a very low level or are silent under fermentation conditions. Foxicin A acts as a siderophore through interacting with ferric ions. Furthermore, it is a weak inhibitor of the Escherichia coli aerobic respiratory chain and shows moderate antibiotic activity. The wide distribution of the cluster and the various properties of the compound indicate a major role of foxicins in Streptomyces strains. PMID:28270798

  7. Molecular analysis of the bovine coronavirus S1 gene by direct sequencing of diarrheic fecal specimens

    Directory of Open Access Journals (Sweden)

    E. Takiuchi

    2008-04-01

    Full Text Available Bovine coronavirus (BCoV) causes severe diarrhea in newborn calves and is associated with winter dysentery in adult cattle and respiratory infections in calves and feedlot cattle. The BCoV S protein plays a fundamental role in viral attachment and entry into the host cell, and is cleaved into two subunits termed S1 (amino terminal) and S2 (carboxy terminal). The present study describes a strategy for the sequencing of the BCoV S1 gene directly from fecal diarrheic specimens that were previously identified as BCoV positive by RT-PCR assay for N gene detection. A consensus sequence of 2681 nucleotides was obtained through direct sequencing of seven overlapping PCR fragments of the S gene. The samples did not undergo cell culture passage prior to PCR amplification and sequencing. The structural analysis was based on the genomic differences between Brazilian strains and other known BCoV from different geographical regions. The phylogenetic analysis of the entire S1 gene showed that the BCoV Brazilian strains were more distant from the Mebus strain (97.8% identity for nucleotides and 96.8% identity for amino acids) and more similar to the BCoV-ENT strain (98.7% for nucleotides and 98.7% for amino acids). Based on the phylogenetic analysis of the hypervariable region of the S1 subunit, these strains clustered with the American (BCoV-ENT, 182NS) and Canadian (BCQ20, BCQ2070, BCQ9, BCQ571, BCQ1523) calf diarrhea and the Canadian winter dysentery (BCQ7373, BCQ2590) strains, but clustered on a separate branch of the Korean and respiratory BCoV strains. The BCoV strains of the present study were not clustered in the same branch of previously published Brazilian strains (AY606193, AY606194). These data agree with the genealogical construction and suggest that at least two different BCoV strains are circulating in Brazil.

  8. Divergence and transcriptional analysis of the division cell wall (dcw) gene cluster in Neisseria spp.

    Science.gov (United States)

    Snyder, Lori A S; Shafer, William M; Saunders, Nigel J

    2003-01-01

    Three of the 18 open reading frames in the division and cell wall synthesis cluster of the pathogenic Neisseria spp. are not present in the clusters of other bacterial species. The region containing two of these, dcaB and dcaC, displays interstrain and interspecies variability uncharacteristic of such clusters. 3' of dcaB is a Correia repeat enclosed element (CREE), which is only present in some strains. It has been suggested that this CREE is a transcriptional terminator, although we demonstrate otherwise. A gearbox-like promoter within this CREE is active in Escherichia coli but not in Neisseria meningitidis. There is an active promoter 5' of dcaC, although its sequence is not conserved. The presence of similarly located promoters has not been demonstrated in other species. In Neisseria lactamica, this promoter involves another dcw-associated CREE, the first demonstration of active promoter generation at the 5' end of this common intergenic, apparently mobile, element. Upstream of this promoter is an inverted pair of neisserial uptake signal sequences, which are commonly considered to be transcriptional terminators. It has been proposed to terminate transcription in this location, although we have demonstrated transcript extending through this uptake signal sequence. dcaC contains a 108 bp tandem repeat, which is present in different copy numbers in the neisserial strains examined. This investigation reveals extensive sequence variation, disputes the presence of transcriptional terminators and identifies active internal promoters in this normally highly conserved cluster of essential genes, and addresses the transcriptional activity of two common neisserial intergenic components.

  9. Pulsation of Pre-Main Sequence Stars in Young Open Clusters

    Science.gov (United States)

    Zwintz, Konstanze; Weiss, Werner W.

    2001-08-01

    The aim of this proposal is to determine observationally the parameter space of the pre-main sequence instability strip. For that purpose we intend to obtain photometric time series with high time resolution and low noise level of the stars in young open clusters (IC 4996, NGC 6910 and NGC 6383) and to identify pre-main sequence pulsators. Several cluster members have the spectral types of interest (A-F) and lie between the birthline and the zero-age main sequence. Up to now the number of known pre-main sequence pulsators is far too small to determine reliably the hot and cool borders of the corresponding instability region. Its definition is indispensable for a better understanding of the internal structure and evolution of such stars.

  10. Complete nucleotide sequences of two adjacent early vaccinia virus genes located within the inverted terminal repetition.

    Science.gov (United States)

    Venkatesan, S; Gershowitz, A; Moss, B

    1982-11-01

    The proximal part of the 10,000-base pair (bp) inverted terminal repetition of vaccinia virus DNA encodes at least three early mRNAs. A 2,236-bp segment of the repetition was sequenced to characterize two of the genes. This task was facilitated by constructing a series of recombinants containing overlapping deletions; oligonucleotide linkers with synthetic restriction sites provided points for radioactive labeling before sequencing by the chemical degradation method of Maxam and Gilbert (Methods Enzymol. 65:499-560, 1980). The ends of the transcripts were mapped by hybridizing labeled DNA fragments to early viral RNA and resolving nuclease S1-protected fragments in sequencing gels, by sequencing cDNA clones, and from the lengths of the RNAs. The nucleotide sequences for at least 60 bp upstream of both transcriptional initiation sites are more than 80% adenine·thymine rich and contain long runs of adenines and thymines with some homology to procaryotic and eucaryotic consensus sequences. The gene transcribed in the rightward direction encodes an RNA of approximately 530 nucleotides with a single open reading frame of 420 nucleotides. Preceding the first AUG, there is a heptanucleotide that can hybridize to the 3' end of 18S rRNA with only one mismatch. The derived amino acid sequence of the protein indicated a molecular weight of 15,500. The gene transcribed in the leftward direction encodes an RNA 1,000 to 1,100 nucleotides long with an open reading frame of 996 nucleotides and a leader sequence of only 5 to 6 nucleotides. The derived amino acid sequence of this protein indicated a molecular weight of 38,500. The 3' ends of the two transcripts were located within 100 bp of each other. Although there are adenine·thymine-rich clusters near the putative transcriptional termination sites, specific AATAAA polyadenylic acid signal sequences are absent.

  11. Data Preprocessing in Cluster Analysis of Gene Expression

    Institute of Scientific and Technical Information of China (English)

    杨春梅; 万柏坤; 高晓峰

    2003-01-01

    Considering that DNA microarray technology has generated an explosion of gene expression data and that efficient methods are urgently needed to analyse and visualize such massive datasets, we investigate the data preprocessing methods used in cluster analysis, normalization or logarithm of the matrix, by using hierarchical clustering, principal component analysis (PCA) and self-organizing maps (SOMs). The results illustrate that when using the Euclidean distance as the distance metric, the logarithm of the relative expression level is the best preprocessing method, while data preprocessed by normalization cannot attain the expected results because the data structure is ruined. If there are only a few principal components, PCA is an effective method to extract the frame structure, while SOMs are more suitable for a specific structure.
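One reason a logarithm can help before Euclidean-distance clustering is that it makes equal fold changes in opposite directions equidistant from the baseline. The sketch below illustrates only this general property with made-up expression ratios; it does not reproduce the datasets or experiments of the study, and the function names are hypothetical.

```python
import math


def log2_transform(profile):
    """Take log2 of each relative expression level (values must be > 0)."""
    return [math.log2(x) for x in profile]


def euclidean(a, b):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Made-up relative expression ratios across three arrays.
baseline = [1.0, 1.0, 1.0]
induced = [4.0, 4.0, 4.0]       # 4-fold up
repressed = [0.25, 0.25, 0.25]  # 4-fold down

# On raw ratios, induction looks far more distant from baseline than repression.
raw_up = euclidean(baseline, induced)
raw_down = euclidean(baseline, repressed)

# After log2, both 4-fold changes sit at the same distance from baseline.
log_up = euclidean(log2_transform(baseline), log2_transform(induced))
log_down = euclidean(log2_transform(baseline), log2_transform(repressed))
```

The transformed profiles can then be passed to any hierarchical clustering routine that accepts a Euclidean distance matrix.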

  12. Methylome sequencing in triple-negative breast cancer reveals distinct methylation clusters with prognostic value.

    Science.gov (United States)

    Stirzaker, Clare; Zotenko, Elena; Song, Jenny Z; Qu, Wenjia; Nair, Shalima S; Locke, Warwick J; Stone, Andrew; Armstrong, Nicola J; Robinson, Mark D; Dobrovic, Alexander; Avery-Kiejda, Kelly A; Peters, Kate M; French, Juliet D; Stein, Sandra; Korbie, Darren J; Trau, Matt; Forbes, John F; Scott, Rodney J; Brown, Melissa A; Francis, Glenn D; Clark, Susan J

    2015-02-02

    Epigenetic alterations in the cancer methylome are common in breast cancer and provide novel options for tumour stratification. Here, we perform whole-genome methylation capture sequencing on small amounts of DNA isolated from formalin-fixed, paraffin-embedded tissue from triple-negative breast cancer (TNBC) and matched normal samples. We identify differentially methylated regions (DMRs) enriched with promoters associated with transcription factor binding sites and DNA hypersensitive sites. Importantly, we stratify TNBCs into three distinct methylation clusters associated with better or worse prognosis and identify 17 DMRs that show a strong association with overall survival, including DMRs located in the Wilms tumour 1 (WT1) gene, bi-directional-promoter and antisense WT1-AS. Our data reveal that coordinated hypermethylation can occur in oestrogen receptor-negative disease, and that characterizing the epigenetic framework provides a potential signature to stratify TNBCs. Together, our findings demonstrate the feasibility of profiling the cancer methylome with limited archival tissue to identify regulatory regions associated with cancer.

  13. Sequence Variability in Staphylococcal Enterotoxin Genes seb, sec, and sed

    Directory of Open Access Journals (Sweden)

    Sophia Johler

    2016-06-01

    Full Text Available Ingestion of staphylococcal enterotoxins preformed by Staphylococcus aureus in food leads to staphylococcal food poisoning, the most prevalent foodborne intoxication worldwide. There are five major staphylococcal enterotoxins: SEA, SEB, SEC, SED, and SEE. While variants of these toxins have been described and were linked to specific hosts or levels of enterotoxin production, data on sequence variation are still limited. In this study, we aim to extend the knowledge on promoter and gene variants of the major enterotoxins SEB, SEC, and SED. To this end, we determined seb, sec, and sed promoter and gene sequences of a well-characterized set of enterotoxigenic Staphylococcus aureus strains originating from foodborne outbreaks, human infections, human nasal colonization, rabbits, and cattle. New nucleotide sequence variants were detected for all three enterotoxins and a novel amino acid sequence variant of SED was detected in a strain associated with human nasal colonization. While the seb promoter and gene sequences exhibited a high degree of variability, the sec and sed promoter and gene were more conserved. Interestingly, a truncated variant of sed was detected in all tested sed-harboring rabbit strains. The generated data represent a further step towards improved understanding of strain-specific differences in enterotoxin expression and host-specific variation in enterotoxin sequences.

  14. Multiple stellar populations in Magellanic Cloud clusters. IV. The double main sequence of the young cluster NGC1755

    CERN Document Server

    Milone, A P; D'Antona, F; Bedin, L R; Da Costa, G S; Jerjen, H; Mackey, A D

    2016-01-01

    Nearly all the star clusters with ages of ~1-2 Gyr in both Magellanic Clouds exhibit an extended main-sequence turn off (eMSTO) whose origin is under debate. The main scenarios suggest that the eMSTO could be either due to multiple generations of stars with different ages or to coeval stellar populations with different rotation rates. In this paper we use Hubble Space Telescope images to investigate the ~80-Myr-old cluster NGC1755 in the LMC. We find that the MS is split, with the blue and the red MS hosting about 25% and 75% of the total number of MS stars, respectively. Moreover, the MSTO of NGC1755 is broadened in close analogy with what is observed in the ~300-Myr-old NGC1856 and in most intermediate-age Magellanic-Cloud clusters. We demonstrate that both the split MS and the eMSTO are not due to photometric errors, field-star contamination, differential reddening, or non-interacting binaries. These findings make NGC1755 the youngest cluster with an eMSTO. We compare the observed CMD with isochron...

  15. Cloning and sequence analysis of chitin synthase gene fragments of Demodex mites

    Institute of Scientific and Technical Information of China (English)

    Ya-e ZHAO; Zheng-hang WANG; Yang XU; Ji-ru XU; Wen-yan LIU; Meng WEI; Chu-ying WANG

    2012-01-01

    To our knowledge, few reports on Demodex studied at the molecular level are available at present. In this study our group, for the first time, cloned, sequenced and analyzed the chitin synthase (CHS) gene fragments of Demodex folliculorum, Demodex brevis, and Demodex canis (three isolates from each species) from Xi'an, China, by designing specific primers based on the only partial sequence of the CHS gene of D. canis from Japan, retrieved from GenBank. Results show that amplification was successful only in three D. canis isolates and one D. brevis isolate out of the nine Demodex isolates. The obtained fragments were sequenced to be 339 bp for D. canis and 338 bp for D. brevis. The CHS gene sequence similarities between the three Xi'an D. canis isolates and one Japanese D. canis isolate ranged from 99.7% to 100.0%, and those between four D. canis isolates and one D. brevis isolate were 99.1%-99.4%. Phylogenetic trees based on maximum parsimony (MP) and maximum likelihood (ML) methods shared the same clusters, according with the traditional classification. Two open reading frames (ORFs) were identified in each CHS gene sequenced, and their corresponding amino acid sequences were located at the catalytic domain. The relatively conserved sequences could be deduced to be a CHS class A gene, which is associated with chitin synthesis in the integument of Demodex mites.

  16. MSClust: A Multi-Seeds Based Clustering Algorithm for microbiome profiling using 16S rRNA Sequence

    Science.gov (United States)

    Chen, Wei; Cheng, Yongmei; Zhang, Clarence; Zhang, Shaowu; Zhao, Hongyu

    2013-01-01

    Recent developments of next generation sequencing technologies have led to rapid accumulation of 16S rRNA sequences for microbiome profiling. One key step in data processing is to cluster short sequences into operational taxonomic units (OTUs). Although many methods have been proposed for OTU inference, a major challenge is the balance between inference accuracy and computational efficiency, where inference accuracy is often sacrificed to accommodate the need to analyze large numbers of sequences. Inspired by the hierarchical clustering method and a modified greedy network clustering algorithm, we propose a novel multi-seeds based heuristic clustering method, named MSClust, for OTU inference. MSClust first adaptively selects multi-seeds instead of one seed for each candidate cluster, and the reads are then processed using a greedy clustering strategy. Through many numerical examples, we demonstrate that MSClust uses less memory and achieves better biological accuracy than existing heuristic clustering methods while preserving efficiency and scalability. PMID:23899776
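For orientation, the single-seed greedy strategy that the abstract contrasts with MSClust can be sketched as follows. This is not MSClust itself: it keeps one seed per cluster, and it assumes pre-aligned, equal-length reads so that identity reduces to a position-by-position comparison. The function names and the 0.97 default threshold are illustrative assumptions.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions; assumes pre-aligned, equal-length reads."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(reads, threshold=0.97):
    """Single-seed greedy clustering.

    Each read joins the first cluster whose seed it matches at >= threshold;
    otherwise it founds a new cluster and becomes that cluster's seed.
    The result depends on input order.
    """
    clusters = []  # list of (seed, members) pairs
    for read in reads:
        for seed, members in clusters:
            if identity(seed, read) >= threshold:
                members.append(read)
                break
        else:
            clusters.append((read, [read]))
    return clusters


# Toy reads: two near-identical sequences, one unrelated, one exact repeat.
reads = ["ACGTACGTAC", "ACGTACGTAA", "TTTTTTTTTT", "ACGTACGTAC"]
otus = greedy_cluster(reads, threshold=0.9)
```

Because every read is compared only against seeds, the cost grows with the number of clusters rather than the number of read pairs, which is the efficiency side of the accuracy trade-off the abstract describes.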

  17. Phylogenetic relationships of citrus and its relatives based on matK gene sequences.

    Directory of Open Access Journals (Sweden)

    Tshering Penjor

    Full Text Available The genus Citrus includes mandarin, orange, lemon, grapefruit and lime, which have high economic and nutritional value. The family Rutaceae can be divided into 7 subfamilies, including Aurantioideae. The genus Citrus belongs to the subfamily Aurantioideae. In this study, we sequenced the chloroplast matK genes of 135 accessions from 22 genera of Aurantioideae and analyzed them phylogenetically. Our study includes many accessions that have not been examined in other studies. The subfamily Aurantioideae has been classified into 2 tribes, Clauseneae and Citreae, and our current molecular analysis clearly discriminates Citreae from Clauseneae by using only 1 chloroplast DNA sequence. Our study confirms previous observations on the molecular phylogeny of Aurantioideae in many aspects. However, we have provided novel information on these genetic relationships. For example, inconsistent with the previous observation, and consistent with our preliminary study using the chloroplast rbcL genes, our analysis showed that Feroniella oblata is not nested in Citrus species and is closely related with Feronia limonia. Furthermore, we have shown that Murraya paniculata is similar to Merrillia caloxylon and is dissimilar to Murraya koenigii. We found that "true citrus fruit trees" could be divided into 2 subclusters. One subcluster included Citrus, Fortunella, and Poncirus, while the other cluster included Microcitrus and Eremocitrus. Compared to previous studies, our current study is the most extensive phylogenetic study of Citrus species since it includes 93 accessions. The results indicate that Citrus species can be classified into 3 clusters: a citron cluster, a pummelo cluster, and a mandarin cluster. Although most mandarin accessions belonged to the mandarin cluster, we found some exceptions. We also obtained the information on the genetic background of various species of acid citrus grown in Japan. Because the genus Citrus contains many important accessions

  18. Phylogenetic relationships of citrus and its relatives based on matK gene sequences.

    Science.gov (United States)

    Penjor, Tshering; Yamamoto, Masashi; Uehara, Miki; Ide, Manami; Matsumoto, Natsumi; Matsumoto, Ryoji; Nagano, Yukio

    2013-01-01

    The genus Citrus includes mandarin, orange, lemon, grapefruit and lime, which have high economic and nutritional value. The family Rutaceae can be divided into 7 subfamilies, including Aurantioideae. The genus Citrus belongs to the subfamily Aurantioideae. In this study, we sequenced the chloroplast matK genes of 135 accessions from 22 genera of Aurantioideae and analyzed them phylogenetically. Our study includes many accessions that have not been examined in other studies. The subfamily Aurantioideae has been classified into 2 tribes, Clauseneae and Citreae, and our current molecular analysis clearly discriminates Citreae from Clauseneae by using only 1 chloroplast DNA sequence. Our study confirms previous observations on the molecular phylogeny of Aurantioideae in many aspects. However, we have provided novel information on these genetic relationships. For example, inconsistent with the previous observation, and consistent with our preliminary study using the chloroplast rbcL genes, our analysis showed that Feroniella oblata is not nested in Citrus species and is closely related with Feronia limonia. Furthermore, we have shown that Murraya paniculata is similar to Merrillia caloxylon and is dissimilar to Murraya koenigii. We found that "true citrus fruit trees" could be divided into 2 subclusters. One subcluster included Citrus, Fortunella, and Poncirus, while the other cluster included Microcitrus and Eremocitrus. Compared to previous studies, our current study is the most extensive phylogenetic study of Citrus species since it includes 93 accessions. The results indicate that Citrus species can be classified into 3 clusters: a citron cluster, a pummelo cluster, and a mandarin cluster. Although most mandarin accessions belonged to the mandarin cluster, we found some exceptions. We also obtained the information on the genetic background of various species of acid citrus grown in Japan. Because the genus Citrus contains many important accessions, we have

  19. Hierarchical clustering of breast cancer methylomes revealed differentially methylated and expressed breast cancer genes.

    Directory of Open Access Journals (Sweden)

    I-Hsuan Lin

    Full Text Available Oncogenic transformation of normal cells often involves epigenetic alterations, including histone modification and DNA methylation. We conducted whole-genome bisulfite sequencing to determine the DNA methylomes of normal breast, fibroadenoma, invasive ductal carcinomas and MCF7. The emergence, disappearance, expansion and contraction of kilobase-sized hypomethylated regions (HMRs) and the hypomethylation of the megabase-sized partially methylated domains (PMDs) are the major forms of methylation changes observed in breast tumor samples. Hierarchical clustering of HMRs revealed tumor-specific hypermethylated clusters and differentially methylated enhancers specific to normal or breast cancer cell lines. Joint analysis of gene expression and DNA methylation data of normal breast and breast cancer cells identified differentially methylated and expressed genes associated with breast and/or ovarian cancers in cancer-specific HMR clusters. Furthermore, aberrant patterns of X-chromosome inactivation (XCI) were found in breast cancer cell lines as well as breast tumor samples in the TCGA BRCA (breast invasive carcinoma) dataset. They were characterized by a differentially hypermethylated XIST promoter, reduced expression of XIST, and over-expression of hypomethylated X-linked genes. High expression of these genes was significantly associated with lower survival rates in breast cancer patients. Comprehensive analysis of the normal and breast tumor methylomes suggests selective targeting of DNA methylation changes during breast cancer progression. The weak causal relationship between DNA methylation and gene expression observed in this study is evidence of a more complex role of DNA methylation in the regulation of gene expression in human epigenetics that deserves further investigation.

  20. Hunting down frame shifts: Ecological analysis of diverse functional gene sequences

    Directory of Open Access Journals (Sweden)

    Michal Strejcek

    2015-11-01

    Full Text Available Functional gene ecological analyses using amplicon sequencing can be challenging as translated sequences are often burdened with shifted reading frames. The aim of this work was to evaluate several bioinformatics tools designed to correct errors which arise during sequencing in an effort to reduce the number of frame-shifts (FS). Genes encoding alpha subunits of biphenyl (bphA) and benzoate (benA) dioxygenases were used as model sequences. FrameBot, a FS correction tool, was able to reduce the number of detected FS to zero. However, up to 43.1% of sequences were discarded by FrameBot as non-specific targets. Therefore, we proposed a de novo mode of FrameBot for FS correction, which works on a similar basis as common chimera identifying platforms and is not dependent on reference sequences. By nature of the FrameBot de novo design, it is crucial to provide it with data as error free as possible. We tested the ability of several publicly available correction tools to decrease the number of errors in the data sets. The combination of Maximum Expected Error (MEE) filtering and single linkage pre-clustering (SLP) proved the most efficient read processing. Applying FrameBot de novo on the processed data enabled analysis of BphA sequences with minimal losses of potentially functional sequences not homologous to those previously known. This experiment also demonstrated the extensive diversity of dioxygenases in soil. A script which performs FrameBot de novo is presented in the supplementary material to the study and the tool was implemented into FunGene Pipeline available at http://fungene.cme.msu.edu/FunGenePipeline/ and https://github.com/rdpstaff/Framebot.
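
    The Maximum Expected Error filtering step mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes the usual definition of a read's expected error count (the sum of the per-base error probabilities implied by its Phred scores) and Phred+33 quality encoding. The function names, the threshold of 1.0 and the two toy reads are hypothetical, and this is not the tool or the settings used in the study.

```python
def expected_errors(quality_string, phred_offset=33):
    """Sum of per-base error probabilities implied by the Phred scores.

    Assumes Phred+33 encoding; a score Q corresponds to an error
    probability of 10 ** (-Q / 10).
    """
    return sum(10 ** (-(ord(ch) - phred_offset) / 10) for ch in quality_string)


def mee_filter(reads, max_expected_errors=1.0):
    """Keep (sequence, quality) pairs whose expected error count is within the threshold."""
    return [
        (seq, qual)
        for seq, qual in reads
        if expected_errors(qual) <= max_expected_errors
    ]


# Two made-up reads: one with Phred 40 at every base, one with Phred 2.
reads = [
    ("ACGTACGT", "IIIIIIII"),
    ("ACGTACGT", "########"),
]
kept = mee_filter(reads, max_expected_errors=1.0)
print(len(kept), "of", len(reads), "reads kept")
```

    Filtering on the summed error probability rather than on the mean quality score penalizes long reads with many mediocre bases, which is usually the point of this kind of filter.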

  1. The Red-Sequence Luminosity Function in Galaxy Clusters since z~1

    CERN Document Server

    Gilbank, David G; Ellingson, E; Gladders, M D; Loh, Y -S; Barrientos, L F; Barkhouse, W A

    2007-01-01

    We use a statistical sample of ~500 rich clusters taken from 72 square degrees of the Red-Sequence Cluster Survey (RCS-1) to study the evolution of ~30,000 red-sequence galaxies in clusters over the redshift range 0.35 -19.7) with their numbers increasing towards the present epoch. This is consistent with the `down-sizing` picture in which star-formation ended at earlier times for the most massive (luminous) galaxies and more recently for less massive (fainter) galaxies. We observe a richness dependence to the down-sizing effect in the sense that, at a given redshift, the drop-off of faint red galaxies is greater for poorer (less massive) clusters, suggesting that star-formation ended earlier for galaxies in more massive clusters. The decrease in faint red-sequence galaxies is accompanied by an increase in faint blue galaxies, implying that the process responsible for this evolution of faint galaxies is the termination of star-formation, possibly with little or no need for merging. At the bright end, we also ...

  2. Bayesian clustering of DNA sequences using Markov chains and a stochastic partition model.

    Science.gov (United States)

    Jääskinen, Väinö; Parkkinen, Ville; Cheng, Lu; Corander, Jukka

    2014-02-01

    In many biological applications it is necessary to cluster DNA sequences into groups that represent underlying organismal units, such as named species or genera. In metagenomics this grouping needs typically to be achieved on the basis of relatively short sequences which contain different types of errors, making the use of a statistical modeling approach desirable. Here we introduce a novel method for this purpose by developing a stochastic partition model that clusters Markov chains of a given order. The model is based on a Dirichlet process prior and we use conjugate priors for the Markov chain parameters which enables an analytical expression for comparing the marginal likelihoods of any two partitions. To find a good candidate for the posterior mode in the partition space, we use a hybrid computational approach which combines the EM-algorithm with a greedy search. This is demonstrated to be faster and yield highly accurate results compared to earlier suggested clustering methods for the metagenomics application. Our model is fairly generic and could also be used for clustering of other types of sequence data for which Markov chains provide a reasonable way to compress information, as illustrated by experiments on shotgun sequence type data from an Escherichia coli strain.
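
    The idea of comparing partitions through marginal likelihoods can be sketched in Python as follows. This is a simplified illustration, not the authors' model: it assumes first-order chains over the alphabet ACGT, a symmetric Dirichlet prior with a hypothetical concentration of 0.5 on each row of the transition matrix, and it ignores both the Dirichlet process prior over partitions and the EM/greedy search. All names and the toy sequences are made up.

```python
from collections import Counter
from math import lgamma

ALPHABET = "ACGT"


def transition_counts(sequences):
    """Count first-order transitions (a -> b) pooled over all sequences in a cluster."""
    counts = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] += 1
    return counts


def log_marginal(sequences, alpha=0.5):
    """Log marginal likelihood of a cluster's transitions.

    Each row of the transition matrix gets a symmetric Dirichlet(alpha)
    prior, so the row parameters integrate out analytically
    (Dirichlet-multinomial).
    """
    counts = transition_counts(sequences)
    k = len(ALPHABET)
    total = 0.0
    for a in ALPHABET:
        row = [counts[(a, b)] for b in ALPHABET]
        total += lgamma(k * alpha) - lgamma(k * alpha + sum(row))
        total += sum(lgamma(alpha + n) - lgamma(alpha) for n in row)
    return total


def log_partition_score(partition, alpha=0.5):
    """Score a partition as the sum of its clusters' log marginal likelihoods."""
    return sum(log_marginal(cluster, alpha) for cluster in partition)


# Toy data: two groups that use the same letters with different transition patterns.
alternating = ["ACACACACACAC", "CACACACACACA"]
blocky = ["AAAAAACCCCCC", "CCCCCCAAAAAA"]

split = log_partition_score([alternating, blocky])
merged = log_partition_score([alternating + blocky])
print("split:", round(split, 2), "merged:", round(merged, 2))
```

    On this toy input the two-cluster partition scores higher than the merged one, because each group is explained well by its own transition matrix but poorly by a shared one.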

  3. Cloning and sequencing of the trpE gene from Arthrobacter globiformis ATCC 8010 and several related subsurface Arthrobacter isolates

    Energy Technology Data Exchange (ETDEWEB)

    Chernova, T.; Viswanathan, V.K.; Austria, N.; Nichols, B.P.

    1998-09-01

    Tryptophan-dependent mutants of Arthrobacter globiformis ATCC 8010 were isolated and trp genes were cloned by complementation and marker rescue of the auxotrophic strains. Rescue studies and preliminary sequence analysis reveal that at least the genes trpE, trpC, and trpB are clustered together in this organism. In addition, sequence analysis of the entire trpE gene, which encodes component I of anthranilate synthase, is described. Segments of the trpE gene from 17 subsurface isolates of Arthrobacter sp. were amplified by PCR and sequenced. The partial trpE sequences from the various strains were aligned and subjected to phylogenetic analysis. The data suggest that in addition to single base changes, recombination and genetic exchange play a major role in the evolution of the Arthrobacter genome.

  4. Combinatorial pooling enables selective sequencing of the barley gene space.

    Directory of Open Access Journals (Sweden)

    Stefano Lonardi

    2013-04-01

    Full Text Available For the vast majority of species, including many economically or ecologically important organisms, progress in biological research is hampered by the lack of a reference genome sequence. Despite recent advances in sequencing technologies, several factors still limit the availability of such a critical resource. At the same time, many research groups and international consortia have already produced BAC libraries and physical maps and now are in a position to proceed with the development of whole-genome sequences organized around a physical map anchored to a genetic map. We propose a BAC-by-BAC sequencing protocol that combines combinatorial pooling design and second-generation sequencing technology to efficiently approach de novo selective genome sequencing. We show that combinatorial pooling is a cost-effective and practical alternative to exhaustive DNA barcoding when preparing sequencing libraries for hundreds or thousands of DNA samples, such as, in this case, gene-bearing minimum-tiling-path BAC clones. The novelty of the protocol hinges on the computational ability to efficiently compare hundreds of millions of short reads and assign them to the correct BAC clones (deconvolution) so that the assembly can be carried out clone-by-clone. Experimental results on simulated data for the rice genome show that the deconvolution is very accurate, and the resulting BAC assemblies have high quality. Results on real data for a gene-rich subset of the barley genome confirm that the deconvolution is accurate and the BAC assemblies have good quality. While our method cannot provide the level of completeness that one would achieve with a comprehensive whole-genome sequencing project, we show that it is quite successful in reconstructing the gene sequences within BACs. In the case of plants such as barley, this level of sequence knowledge is sufficient to support critical end-point objectives such as map-based cloning and marker-assisted breeding.
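
    The pooling-and-deconvolution idea can be sketched in Python under strong simplifying assumptions. Here every clone is placed in a distinct combination of pools, and a read is assigned to a clone only when the set of pools it was seen in matches exactly one clone signature. The combinations-based design, the function names and the clone labels are hypothetical stand-ins. The abstract does not specify the pooling design, and real data would need tolerance for missing or spurious pool observations.

```python
from itertools import combinations


def design_pools(clones, n_pools, pools_per_clone):
    """Give each clone a distinct set of pools (its signature)."""
    signatures = combinations(range(n_pools), pools_per_clone)
    design = {clone: frozenset(sig) for clone, sig in zip(clones, signatures)}
    if len(design) < len(clones):
        raise ValueError("not enough distinct pool combinations for all clones")
    return design


def deconvolve(observed_pools, design):
    """Return the clone whose signature equals the observed pool set, else None."""
    observed = frozenset(observed_pools)
    matches = [clone for clone, sig in design.items() if sig == observed]
    return matches[0] if len(matches) == 1 else None


# Ten hypothetical clones fit into 5 pools with 2 pools per clone (C(5, 2) = 10).
clones = [f"BAC{i:02d}" for i in range(10)]
design = design_pools(clones, n_pools=5, pools_per_clone=2)

print(deconvolve({0, 1}, design))     # read seen in pools 0 and 1
print(deconvolve({0, 1, 2}, design))  # ambiguous pool set, left unassigned
```

    Ten clones are handled with five sequencing libraries instead of ten barcoded ones, which is the cost argument the abstract makes for pooling over exhaustive barcoding.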

  5. Multiple stellar populations in Magellanic Cloud clusters - V. The split main sequence of the young cluster NGC 1866

    Science.gov (United States)

    Milone, A. P.; Marino, A. F.; D'Antona, F.; Bedin, L. R.; Piotto, G.; Jerjen, H.; Anderson, J.; Dotter, A.; Criscienzo, M. Di; Lagioia, E. P.

    2017-03-01

    One of the most unexpected results in the field of stellar populations of the last few years is the discovery that some Magellanic Cloud globular clusters younger than ∼400 Myr exhibit bimodal main sequences (MSs) in their colour-magnitude diagrams (CMDs). Moreover, these young clusters host an extended main-sequence turn-off (eMSTO) in close analogy with what is observed in most ∼1-2 Gyr old clusters of both Magellanic Clouds. We use high-precision Hubble Space Telescope photometry to study the young star cluster NGC 1866 in the Large Magellanic Cloud. We discover an eMSTO and a split MS. The analysis of the CMD reveals that (i) the blue MS is the less populous one, hosting about one-third of the total number of MS stars; (ii) red MS stars are more centrally concentrated than blue MS stars; (iii) the fraction of blue MS stars with respect to the total number of MS stars drops by a factor of ∼2 in the upper MS with mF814W ≲ 19.7. The comparison between the observed CMDs and stellar models reveals that the observations are consistent with ∼200 Myr old highly rotating stars on the red MS, with rotation close to the critical value, plus a non-rotating stellar population spanning an age interval between ∼140 and 220 Myr, on the blue MS. Notably, neither stellar populations with different ages only, nor coeval stellar models with different rotation rates, properly reproduce the observed split MS and eMSTO. We discuss these results in the context of the eMSTO and multiple MS phenomenon.

  6. Yersinia spp. Identification Using Copy Diversity in the Chromosomal 16S rRNA Gene Sequence.

    Science.gov (United States)

    Hao, Huijing; Liang, Junrong; Duan, Ran; Chen, Yuhuang; Liu, Chang; Xiao, Yuchun; Li, Xu; Su, Mingming; Jing, Huaiqi; Wang, Xin

    2016-01-01

    API 20E strip test, the standard for Enterobacteriaceae identification, is not sufficient to discriminate some Yersinia species because some biochemical reactions are unstable and some species present the same biochemical profile, e.g. Yersinia frederiksenii and Yersinia intermedia, which need a variety of molecular biology methods as auxiliaries for identification. The 16S rRNA gene is considered a valuable tool for assigning bacterial strains to species. However, the resolution of the 16S rRNA gene may be insufficient for discrimination because of the high similarity of sequences between some species and heterogeneity within copies at the intra-genomic level. In this study, we randomly selected five 16S rRNA gene clones for each of 768 Yersinia strains, and collected 3,840 sequences of the 16S rRNA gene from 10 species, which were divided into 439 patterns. The similarity among the five clones of the 16S rRNA gene is over 99% for most strains. Identical sequences were found in strains of different species. A phylogenetic tree was constructed using the five 16S rRNA gene sequences for each strain, in which the phylogenetic classifications are consistent with biochemical tests, and species that are difficult to identify by biochemical phenotype can be differentiated. Most Yersinia strains form distinct groups within each species. However, Yersinia kristensenii, a heterogeneous species, clusters with some Yersinia enterocolitica and Yersinia frederiksenii/intermedia strains, although this does not affect the overall efficiency of the species classification. In conclusion, through analysis derived from integrated information from multiple 16S rRNA gene sequences, the discrimination ability of Yersinia species is improved using our method.

  7. Yersinia spp. Identification Using Copy Diversity in the Chromosomal 16S rRNA Gene Sequence.

    Directory of Open Access Journals (Sweden)

    Huijing Hao

    Full Text Available API 20E strip test, the standard for Enterobacteriaceae identification, is not sufficient to discriminate some Yersinia species because some biochemical reactions are unstable and some species present the same biochemical profile, e.g. Yersinia frederiksenii and Yersinia intermedia, which need a variety of molecular biology methods as auxiliaries for identification. The 16S rRNA gene is considered a valuable tool for assigning bacterial strains to species. However, the resolution of the 16S rRNA gene may be insufficient for discrimination because of the high similarity of sequences between some species and heterogeneity within copies at the intra-genomic level. In this study, we randomly selected five 16S rRNA gene clones for each of 768 Yersinia strains, and collected 3,840 sequences of the 16S rRNA gene from 10 species, which were divided into 439 patterns. The similarity among the five clones of the 16S rRNA gene is over 99% for most strains. Identical sequences were found in strains of different species. A phylogenetic tree was constructed using the five 16S rRNA gene sequences for each strain, in which the phylogenetic classifications are consistent with biochemical tests, and species that are difficult to identify by biochemical phenotype can be differentiated. Most Yersinia strains form distinct groups within each species. However, Yersinia kristensenii, a heterogeneous species, clusters with some Yersinia enterocolitica and Yersinia frederiksenii/intermedia strains, although this does not affect the overall efficiency of the species classification. In conclusion, through analysis derived from integrated information from multiple 16S rRNA gene sequences, the discrimination ability of Yersinia species is improved using our method.

  8. Coupled Two-Way Clustering Analysis of Breast Cancer and Colon Cancer Gene Expression Data

    CERN Document Server

    Getz, Gad; Gal, Hilah; Kela, Itai; Domany, Eytan; Notterman, Dan A.

    2003-01-01

    We present and review Coupled Two Way Clustering, a method designed to mine gene expression data. The method identifies submatrices of the total expression matrix, whose clustering analysis reveals partitions of samples (and genes) into biologically relevant classes. We demonstrate, on data from colon and breast cancer, that we are able to identify partitions that elude standard clustering analysis.

  9. Two distinct sequences of blue straggler stars in the globular cluster M30

    CERN Document Server

    Ferraro, F R; Dalessandro, E; Lanzoni, B; Sills, A; Rood, R T; Pecci, F Fusi; Karakas, A I; Miocchi, P; Bovinelli, S (doi:10.1038/nature08607)

    2010-01-01

    Stars in globular clusters are generally believed to have all formed at the same time, early in the Galaxy's history. 'Blue stragglers' are stars massive enough that they should have evolved into white dwarfs long ago. Two possible mechanisms have been proposed for their formation: mass transfer between binary companions and stellar mergers resulting from direct collisions between two stars. Recently, the binary explanation was claimed to be dominant. Here we report that there are two distinct parallel sequences of blue stragglers in M30. This globular cluster is thought to have undergone 'core collapse', during which both the collision rate and the mass transfer activity in binary systems would have been enhanced. We suggest that the two observed sequences arise from the cluster core collapse, with the bluer population arising from direct stellar collisions and the redder one arising from the evolution of close binaries that are probably still experiencing an active phase of mass transfer.

  10. Cloning, Sequencing and Phylogenetic Study of rbcL Gene from Cyanobacteria Arthrospira and Spirulina

    Institute of Scientific and Technical Information of China (English)

    Liu Jinjie(刘金姐); Zhang Xuecheng; Sui Zhenghong; Mao Yunxiang; Sun Xue

    2004-01-01

    The large subunit gene of rubisco (rbcL) of the cyanobacteria Arthrospira platensis FACHB341, A. platensis FACHB439, A. maxima OUQDSM and Spirulina sp. FACHB440 was cloned, sequenced and characterized. Results show that the GC content of the gene in Spirulina sp. FACHB440 is higher than that in the others. The alignments based on deduced amino acid sequences indicate that the sequence of Spirulina sp. FACHB440 differs from that of the three Arthrospira samples, though they have the same conserved functional sites (95, 98, 121, 124, 221, 257). The nucleotide sequence similarity among the three strains of the genus Arthrospira (96.5-99.6%) is higher than that between Arthrospira and Spirulina (78.1-78.5%). By comparison with the corresponding sequences of other cyanobacteria, a phylogenetic tree with two clusters was constructed. A. platensis FACHB341, A. maxima OUQDSM and A. platensis FACHB439 form a monophyletic lineage, which is fully supported by bootstrap values (1000), while Spirulina sp. FACHB440 and Anabaena sp. PCC7120 cluster in another lineage with a bootstrap value of 909.
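
    The two quantities compared in this abstract, GC content and pairwise nucleotide similarity, can be computed as in the short Python sketch below. It assumes the sequences are already aligned and that gap columns are skipped when computing identity. The abstract does not say how similarity was calculated, so that convention is an assumption, and the function names and toy sequences are made up rather than taken from the rbcL data.

```python
def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns, skipping columns with a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        matches += a == b
    return 100.0 * matches / compared


# Toy aligned fragments, for illustration only.
print(gc_content("GGCCAT"))
print(percent_identity("ACGT-A", "ACGAAA"))
```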

  11. Proteolipid protein 1 gene sequencing of hereditary spastic paraplegia

    Institute of Scientific and Technical Information of China (English)

    Yu Gao; Lumei Chi; Yinshi Jin; Guangxian Nan

    2012-01-01

    PCR amplification and sequencing of whole blood DNA from an individual with hereditary spastic paraplegia, as well as family members, revealed a fragment of proteolipid protein 1 (PLP1) gene exon 1, which excluded the possibility of isomer 1 expression for this family. The fragment sequences of exon 3 and exon 5 were consistent with the proteolipid protein 1 sequence at NCBI. In the proband samples, a PLP1 point mutation in exon 4 was detected at base position 844, T→C, phenylalanine→leucine. In proband samples from a male cousin, the base at position 844 was C, but gene sequencing signals revealed mixed signals of T and C, indicating possible mutation at this locus. Results demonstrated that changes in PLP1 exon 4 amino acids were associated with onset of hereditary spastic paraplegia.

  12. Speeding disease gene discovery by sequence based candidate prioritization

    Directory of Open Access Journals (Sweden)

    Porteous David J

    2005-03-01

    Full Text Available Abstract Background Regions of interest identified through genetic linkage studies regularly exceed 30 centimorgans in size and can contain hundreds of genes. Traditionally this number is reduced by matching functional annotation to knowledge of the disease or phenotype in question. However, here we show that disease genes share patterns of sequence-based features that can provide a good basis for automatic prioritization of candidates by machine learning. Results We examined a variety of sequence-based features and found that for many of them there are significant differences between the sets of genes known to be involved in human hereditary disease and those not known to be involved in disease. We have created an automatic classifier called PROSPECTR based on those features using the alternating decision tree algorithm which ranks genes in the order of likelihood of involvement in disease. On average, PROSPECTR enriches lists for disease genes two-fold 77% of the time, five-fold 37% of the time and twenty-fold 11% of the time. Conclusion PROSPECTR is a simple and effective way to identify genes involved in Mendelian and oligogenic disorders. It performs markedly better than the single existing sequence-based classifier on novel data. PROSPECTR could save investigators looking at large regions of interest time and effort by prioritizing positional candidate genes for mutation detection and case-control association studies.

  13. Copy number variants in the kallikrein gene cluster.

    Directory of Open Access Journals (Sweden)

    Pernilla Lindahl

    Full Text Available The kallikrein gene family (KLK1-KLK15) is the largest contiguous group of protease genes within the human genome and is associated with both risk and outcome of cancer and other diseases. We searched for copy number variants in all KLK genes using quantitative PCR analysis and analysis of inheritance patterns of single nucleotide polymorphisms. Two deletions were identified: one 2235-bp deletion in KLK9 present in 1.2% of alleles, and one 3394-bp deletion in KLK15 present in 4.0% of alleles. Each deletion eliminated one complete exon and created out-of-frame coding that eliminated the catalytic triad of the resulting truncated gene product, which therefore likely is a non-functional protein. Deletion breakpoints identified by DNA sequencing located the KLK9 deletion breakpoint to a long interspersed element (LINE) repeated sequence, while the deletion in KLK15 is located in a single copy sequence. To search for an association between each deletion and risk of prostate cancer (PC), we analyzed a cohort of 667 biopsied men (266 PC cases and 401 men with no evidence of PC at biopsy) using short deletion-specific PCR assays. There was no association between evidence of PC in this cohort and the presence of either gene deletion. Haplotyping revealed a single origin of each deletion, with most recent common ancestor estimates of 3000-8000 and 6000-14,000 years for the deletions in KLK9 and KLK15, respectively. The presence of the deletions on the same haplotypes in 1000 Genomes data of both European and African populations indicates an early origin of both deletions. The old age in combination with homozygous presence of loss-of-function variants suggests that some kallikrein-related peptidases have non-essential functions.

  14. A human gut microbial gene catalogue established by metagenomic sequencing

    DEFF Research Database (Denmark)

    dos Santos, Marcelo Bertalan Quintanilha; Sicheritz-Pontén, Thomas; Nielsen, Henrik Bjørn;

    2010-01-01

    To understand the impact of gut microbes on human health and well-being it is crucial to assess their genetic potential. Here we describe the Illumina-based metagenomic sequencing, assembly and characterization of 3.3 million non-redundant microbial genes, derived from 576.7 gigabases of sequence...... gut metagenome and the minimal gut bacterial genome in terms of functions present in all individuals and most bacteria, respectively....

  15. Nearly identical bacteriophage structural gene sequences are widely distributed in both marine and freshwater environments.

    Science.gov (United States)

    Short, Cindy M; Suttle, Curtis A

    2005-01-01

    Primers were designed to amplify a 592-bp region within a conserved structural gene (g20) found in some cyanophages. The goal was to use this gene as a proxy to infer genetic richness in natural cyanophage communities and to determine if sequences were more similar in similar environments. Gene products were amplified from samples from the Gulf of Mexico, the Arctic, Southern, and Northeast and Southeast Pacific Oceans, an Arctic cyanobacterial mat, a catfish production pond, lakes in Canada and Germany, and a depth of ca. 3,246 m in the Chukchi Sea. Amplicons were separated by denaturing gradient gel electrophoresis, and selected bands were sequenced. Phylogenetic analysis revealed four previously unknown groups of g20 clusters, two of which were entirely found in freshwater. Also, sequences with >99% identities were recovered from environments that differed greatly in temperature and salinity. For example, nearly identical sequences were recovered from the Gulf of Mexico, the Southern Pacific Ocean, an Arctic freshwater cyanobacterial mat, and Lake Constance, Germany. These results imply that closely related hosts and the viruses infecting them are distributed widely across environments or that horizontal gene exchange occurs among phage communities from very different environments. Moreover, the amplification of g20 products from deep in the cyanobacterium-sparse Chukchi Sea suggests that this primer set targets bacteriophages other than those infecting cyanobacteria.

  16. Phylogeny and identification of Pantoea species and typing of Pantoea agglomerans strains by multilocus gene sequencing.

    Science.gov (United States)

    Delétoile, Alexis; Decré, Dominique; Courant, Stéphanie; Passet, Virginie; Audo, Jennifer; Grimont, Patrick; Arlet, Guillaume; Brisse, Sylvain

    2009-02-01

    Pantoea agglomerans and other Pantoea species cause infections in humans and are also pathogenic to plants, but the diversity of Pantoea strains and their possible association with hosts and disease remain poorly known, and identification of Pantoea species is difficult. We characterized 36 Pantoea strains, including 28 strains of diverse origins initially identified as P. agglomerans, by multilocus gene sequencing based on six protein-coding genes, by biochemical tests, and by antimicrobial susceptibility testing. Phylogenetic analysis and comparison with other species of Enterobacteriaceae revealed that the genus Pantoea is highly diverse. Most strains initially identified as P. agglomerans by use of API 20E strips belonged to a compact sequence cluster together with the type strain, but other strains belonged to diverse phylogenetic branches corresponding to other species of Pantoea or Enterobacteriaceae and to probable novel species. Biochemical characteristics such as fosfomycin resistance and utilization of d-tartrate could differentiate P. agglomerans from other Pantoea species. All 20 strains of P. agglomerans could be distinguished by multilocus sequence typing, revealing the very high discrimination power of this method for strain typing and population structure in this species, which is subdivided into two phylogenetic groups. PCR detection of the repA gene, associated with pathogenicity in plants, was positive in all clinical strains of P. agglomerans, suggesting that clinical and plant-associated strains do not form distinct populations. We provide a multilocus gene sequencing method that is a powerful tool for Pantoea species delineation and identification and for strain tracking.

  17. Identification of Legionella pneumophila serogroups and other Legionella species by mip gene sequencing.

    Science.gov (United States)

    Haroon, Attiya; Koide, Michio; Higa, Futoshi; Tateyama, Masao; Fujita, Jiro

    2012-04-01

    The virulence factor known as the macrophage infectivity potentiator (mip) is responsible for the intracellular survival of Legionella species. In this study, we investigated the potential of the mip gene sequence to differentiate isolates of different species of Legionella and different serogroups of Legionella pneumophila. We used 35 clinical L. pneumophila isolates and one clinical isolate each of Legionella micdadei, Legionella longbeachae, and Legionella dumoffii (collected from hospitals all over Japan between 1980 and 2007). We used 19 environmental Legionella anisa isolates (collected in the Okinawa, Nara, Osaka, and Hyogo prefectures between 1987 and 2007) and two Legionella type strains. We extracted bacterial genomic DNA and amplified the mip gene by PCR. PCR products were purified by agarose gel electrophoresis and the mip gene was then sequenced. The L. pneumophila isolates could be divided into two groups: one group was very similar to the type strain and was composed of serogroup (SG) 1 isolates only; the second group had more sequence variations and was composed of SG1 isolates as well as SG2, SG3, SG5, and SG10 isolates. Phylogenetic analysis displayed one cluster for L. anisa isolates, while other Legionella species were present at discrete levels. Our findings show that mip gene sequencing is an effective technique for differentiating L. pneumophila strains from other Legionella species.

  18. Evolutionary formation of gene clusters by reorganization: the meleagrin/roquefortine paradigm in different fungi.

    Science.gov (United States)

    Martín, Juan F; Liras, Paloma

    2016-02-01

    The biosynthesis of secondary metabolites in fungi is catalyzed by enzymes encoded by genes linked in clusters that are frequently co-regulated at the transcriptional level. Formation of gene clusters may take place by de novo assembly of genes recruited from other cellular functions, but novel gene clusters are also formed by reorganization of progenitor clusters and are distributed by horizontal gene transfer. This article reviews (i) the published information on the roquefortine/meleagrin/neoxaline gene clusters of Penicillium chrysogenum (Penicillium rubens) and the short roquefortine cluster of Penicillium roqueforti, and (ii) the correlation of the genes present in those clusters with the enzymes and metabolites derived from these pathways. The P. chrysogenum roq/mel cluster consists of seven genes and includes a gene (roqT) encoding a 12-TMS transporter protein of the MFS family. Interestingly, the orthologous P. roqueforti gene cluster has only four genes and the roqT gene is present as a residual pseudogene that encodes only small peptides. Two of the genes present in the central region of the P. chrysogenum roq/mel cluster have been lost during the evolutionary formation of the short cluster and the order of the structural genes in the cluster has been rearranged. The two lost genes encode an N1 atom hydroxylase (nox) and a roquefortine scaffold-reorganizing oxygenase (sro). As a consequence, P. roqueforti has lost the ability to convert the roquefortine-type carbon skeleton to the glandicoline/meleagrin-type scaffold and is unable to produce glandicoline B, meleagrin and neoxaline. The loss of this genetic information is not recent and probably occurred millions of years ago when a progenitor Penicillium strain became adapted to life in a few rich habitats such as cheese, fermented cereal grains or silage. P. roqueforti may be considered a "domesticated" variant of a progenitor common to contemporary P. chrysogenum and related Penicillia.

  19. Functional clustering of time series gene expression data by Granger causality

    Science.gov (United States)

    2012-01-01

    Background A common approach for time series gene expression data analysis includes the clustering of genes with similar expression patterns throughout time. Clustered gene expression profiles point to the joint contribution of groups of genes to a particular cellular process. However, since genes belong to intricate networks, other features, besides comparable expression patterns, should provide additional information for the identification of functionally similar genes. Results In this study we perform gene clustering through the identification of Granger causality between and within sets of time series gene expression data. Granger causality is based on the idea that the cause of an event cannot come after its consequence. Conclusions This kind of analysis can be used as a complementary approach for functional clustering, wherein genes would be clustered not solely based on their expression similarity but on their topological proximity built according to the intensity of Granger causality among them. PMID:23107425

  20. Functional clustering of time series gene expression data by Granger causality

    Directory of Open Access Journals (Sweden)

    Fujita André

    2012-10-01

    Full Text Available Abstract Background A common approach for time series gene expression data analysis includes the clustering of genes with similar expression patterns throughout time. Clustered gene expression profiles point to the joint contribution of groups of genes to a particular cellular process. However, since genes belong to intricate networks, other features, besides comparable expression patterns, should provide additional information for the identification of functionally similar genes. Results In this study we perform gene clustering through the identification of Granger causality between and within sets of time series gene expression data. Granger causality is based on the idea that the cause of an event cannot come after its consequence. Conclusions This kind of analysis can be used as a complementary approach for functional clustering, wherein genes would be clustered not solely based on their expression similarity but on their topological proximity built according to the intensity of Granger causality among them.
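
    A pairwise Granger causality test, the building block behind this kind of clustering, can be sketched in plain Python. The example is deliberately reduced: it uses a single lag, ordinary least squares solved through the normal equations, and two simulated series. The study works with sets of gene expression time series, so the function names, the lag of 1 and the synthetic data here are all illustrative assumptions and not the authors' procedure.

```python
import random


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def rss(rows, targets):
    """Residual sum of squares of an ordinary least-squares fit."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum(
        (t - sum(b * v for b, v in zip(beta, r))) ** 2
        for r, t in zip(rows, targets)
    )


def granger_f(cause, effect):
    """F statistic for 'cause' helping to predict 'effect' at lag 1."""
    targets = effect[1:]
    steps = range(1, len(effect))
    restricted = [[1.0, effect[t - 1]] for t in steps]            # own past only
    full = [[1.0, effect[t - 1], cause[t - 1]] for t in steps]    # plus past of cause
    rss_r, rss_u = rss(restricted, targets), rss(full, targets)
    dof = len(targets) - 3
    return (rss_r - rss_u) / (rss_u / dof)


# Simulated data: y follows x with a one-step delay, so x should Granger-cause y.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, 200)]

f_xy = granger_f(x, y)
f_yx = granger_f(y, x)
print("x -> y:", f_xy, " y -> x:", f_yx)
```

    The asymmetry between the two statistics reflects the temporal ordering described in the abstract: past values of the cause improve prediction of the effect, but not the reverse.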

  1. Gene Discovery in the Apicomplexa as Revealed by EST Sequencing and Assembly of a Comparative Gene Database

    Science.gov (United States)

    Li, Li; Brunk, Brian P.; Kissinger, Jessica C.; Pape, Deana; Tang, Keliang; Cole, Robert H.; Martin, John; Wylie, Todd; Dante, Mike; Fogarty, Steven J.; Howe, Daniel K.; Liberator, Paul; Diaz, Carmen; Anderson, Jennifer; White, Michael; Jerome, Maria E.; Johnson, Emily A.; Radke, Jay A.; Stoeckert, Christian J.; Waterston, Robert H.; Clifton, Sandra W.; Roos, David S.; Sibley, L. David

    2003-01-01

    Large-scale EST sequencing projects for several important parasites within the phylum Apicomplexa were undertaken for the purpose of gene discovery. Included were several parasites of medical importance (Plasmodium falciparum, Toxoplasma gondii) and others of veterinary importance (Eimeria tenella, Sarcocystis neurona, and Neospora caninum). A total of 55,192 ESTs, deposited into dbEST/GenBank, were included in the analyses. The resulting sequences have been clustered into nonredundant gene assemblies and deposited into a relational database that supports a variety of sequence and text searches. This database has been used to compare the gene assemblies using BLAST similarity comparisons to the public protein databases to identify putative genes. Of these new entries, ∼15%–20% represent putative homologs with a conservative cutoff of p … [Accession-number lists for S. neurona, Eimeria tenella, and Neospora caninum omitted.] PMID:12618375

  2. Gene prioritization and clustering by multi-view text mining.

    Science.gov (United States)

    Yu, Shi; Tranchevent, Leon-Charles; De Moor, Bart; Moreau, Yves

    2010-01-14

    Text mining has become a useful tool for biologists trying to understand the genetics of diseases. In particular, it can help identify the most interesting candidate genes for a disease for further experimental analysis. Many text mining approaches have been introduced, but the effect of disease-gene identification varies in different text mining models. Thus, the idea of incorporating more text mining models may be beneficial to obtain more refined and accurate knowledge. However, how to effectively combine these models still remains a challenging question in machine learning. In particular, it is a non-trivial issue to guarantee that the integrated model performs better than the best individual model. We present a multi-view approach to retrieve biomedical knowledge using different controlled vocabularies. These controlled vocabularies are selected on the basis of nine well-known bio-ontologies and are applied to index the vast amounts of gene-based free-text information available in the MEDLINE repository. The text mining result specified by a vocabulary is considered as a view and the obtained multiple views are integrated by multi-source learning algorithms. We investigate the effect of integration in two fundamental computational disease gene identification tasks: gene prioritization and gene clustering. The performance of the proposed approach is systematically evaluated and compared on real benchmark data sets. In both tasks, the multi-view approach demonstrates significantly better performance than other comparing methods. In practical research, the relevance of specific vocabulary pertaining to the task is usually unknown. In such case, multi-view text mining is a superior and promising strategy for text-based disease gene identification.

  3. Human paraoxonase gene cluster overexpression alleviates angiotensin II-induced cardiac hypertrophy in mice.

    Science.gov (United States)

    Pei, Jian-Fei; Yan, Yun-Fei; Tang, Xiaoqiang; Zhang, Yang; Cui, Shen-Shen; Zhang, Zhu-Qin; Chen, Hou-Zao; Liu, De-Pei

    2016-11-01

    Cardiac hypertrophy is the strongest predictor of the development of heart failure, and anti-hypertrophic treatment holds the key to improving the clinical syndrome and increasing the survival rates for heart failure. The paraoxonase (PON) gene cluster (PC) protects against atherosclerosis and coronary artery diseases. However, the role of PC in the heart is largely unknown. To evaluate the roles of PC in cardiac hypertrophy, transgenic mice carrying the intact human PON1, PON2, and PON3 genes and their flanking sequences were studied. We demonstrated that the PC transgene (PC-Tg) protected mice from cardiac hypertrophy induced by Ang II; these mice had reduced heart weight/body weight ratios, decreased left ventricular wall thicknesses and increased fractional shortening compared with wild-type (WT) control. The same protective tendency was also observed with an Apoe (-/-) background. Mechanistically, PC-Tg normalized the disequilibrium of matrix metalloproteinases (MMPs)/tissue inhibitors of MMPs (TIMPs) in hypertrophic hearts, which might contribute to the protective role of PC-Tg in cardiac fibrosis and, thus, protect against cardiac remodeling. Taken together, our results identify a novel anti-hypertrophic role for the PON gene cluster, suggesting a possible strategy for the treatment of cardiac hypertrophy through elevating the levels of the PON gene family.

  4. Automated cleaning and pre-processing of immunoglobulin gene sequences from high-throughput sequencing

    Directory of Open Access Journals (Sweden)

    Miri eMichaeli

    2012-12-01

    Full Text Available High throughput sequencing (HTS) yields tens of thousands to millions of sequences that require a large amount of pre-processing work to clean various artifacts. Such cleaning cannot be performed manually. Existing programs are not suitable for immunoglobulin (Ig) genes, which are variable and often highly mutated. This paper describes Ig-HTS-Cleaner (Ig High Throughput Sequencing Cleaner), a program containing a simple cleaning procedure that successfully deals with pre-processing of Ig sequences derived from HTS, and Ig-Indel-Identifier (Ig Insertion – Deletion Identifier), a program for identifying legitimate and artifact insertions and/or deletions (indels). Our programs were designed for analyzing Ig gene sequences obtained by 454 sequencing, but they are applicable to all types of sequences and sequencing platforms. Ig-HTS-Cleaner and Ig-Indel-Identifier have been implemented in Java and saved as executable JAR files, supported on Linux and MS Windows. No special requirements are needed in order to run the programs, except for correctly constructing the input files as explained in the text. The programs' performance has been tested and validated on real and simulated data sets.

  5. Phylogenetic diversity of sequences of cyanophage photosynthetic gene psbA in marine and freshwaters.

    Science.gov (United States)

    Chénard, C; Suttle, C A

    2008-09-01

    Many cyanophage isolates which infect the marine cyanobacteria Synechococcus spp. and Prochlorococcus spp. contain a gene homologous to psbA, which codes for the D1 protein involved in photosynthesis. In the present study, cyanophage psbA gene fragments were readily amplified from freshwater and marine samples, confirming their widespread occurrence in aquatic communities. Phylogenetic analyses demonstrated that sequences from freshwaters have an evolutionary history that is distinct from that of their marine counterparts. Similarly, sequences from cyanophages infecting Prochlorococcus and Synechococcus spp. were readily discriminated, as were sequences from podoviruses and myoviruses. Viral psbA sequences from the same geographic origins clustered within different clades. For example, cyanophage psbA sequences from the Arctic Ocean fell within the Synechococcus as well as Prochlorococcus phage groups. Moreover, as psbA sequences are not confined to a single family of phages, they provide an additional genetic marker that can be used to explore the diversity and evolutionary history of cyanophages in aquatic environments.

  6. Multidimensional metrics for estimating phage abundance, distribution, gene density, and sequence coverage in metagenomes

    Directory of Open Access Journals (Sweden)

    Ramy Karam Aziz

    2015-05-01

    Full Text Available Phages are the most abundant biological entities on Earth and play major ecological roles, yet the current sequenced phage genomes do not adequately represent their diversity, and little is known about the abundance and distribution of these sequenced genomes in nature. Although the study of phage ecology has benefited tremendously from the emergence of metagenomic sequencing, a systematic survey of phage genes and genomes in various ecosystems is still lacking, and fundamental questions about phage biology, lifestyle, and ecology remain unanswered. To address these questions and improve comparative analysis of phages in different metagenomes, we screened a core set of publicly available metagenomic samples for sequences related to completely sequenced phages using the web tool, Phage Eco-Locator. We then adopted and deployed an array of mathematical and statistical metrics for a multidimensional estimation of the abundance and distribution of phage genes and genomes in various ecosystems. Experiments using those metrics individually showed their usefulness in emphasizing the pervasive, yet uneven, distribution of known phage sequences in environmental metagenomes. Using these metrics in combination allowed us to resolve phage genomes into clusters that correlated with their genotypes and taxonomic classes as well as their ecological properties. We propose adding this set of metrics to current metaviromic analysis pipelines, where they can provide insight regarding phage mosaicism, habitat specificity, and evolution.

  7. Dinoflagellate 17S rRNA sequence inferred from the gene sequence: Evolutionary implications

    Science.gov (United States)

    Herzog, Michel; Maroteaux, Luc

    1986-01-01

    We present the complete sequence of the nuclear-encoded small-ribosomal-subunit RNA inferred from the cloned gene sequence of the dinoflagellate Prorocentrum micans. The dinoflagellate 17S rRNA sequence of 1798 nucleotides is contained in a family of 200 tandemly repeated genes per haploid genome. A tentative model of the secondary structure of P. micans 17S rRNA is presented. This sequence is compared with the small-ribosomal-subunit rRNA of Xenopus laevis (Animalia), Saccharomyces cerevisiae (Fungi), Zea mays (Planta), Dictyostelium discoideum (Protoctista), and Halobacterium volcanii (Monera). Although the secondary structure of the dinoflagellate 17S rRNA presents most of the eukaryotic characteristics, it contains sufficient archaeobacterial-like structural features to reinforce the view that dinoflagellates branch off very early from the eukaryotic lineage. PMID:16578795

  8. The lineage-specific evolution of aquaporin gene clusters facilitated tetrapod terrestrial adaptation.

    Directory of Open Access Journals (Sweden)

    Roderick Nigel Finn

    Full Text Available A major physiological barrier for aquatic organisms adapting to terrestrial life is desiccation in the aerial environment. This barrier was nevertheless overcome by the Devonian ancestors of extant Tetrapoda, but the origin of specific molecular mechanisms that solved this water problem remains largely unknown. Here we show that an ancient aquaporin gene cluster evolved specifically in the sarcopterygian lineage, and subsequently diverged into paralogous forms of AQP2, -5, or -6 to mediate water conservation in extant Tetrapoda. To determine the origin of these apomorphic genomic traits, we combined aquaporin sequencing from jawless and jawed vertebrates with broad taxon assembly of >2,000 transcripts amongst 131 deuterostome genomes and developed a model based upon Bayesian inference that traces their convergent roots to stem subfamilies in basal Metazoa and Prokaryota. This approach uncovered an unexpected diversity of aquaporins in every lineage investigated, and revealed that the vertebrate superfamily consists of 17 classes of aquaporins (Aqp0 - Aqp16). The oldest orthologs associated with water conservation in modern Tetrapoda are traced to a cluster of three aqp2-like genes in Actinistia that likely arose >500 Ma through duplication of an aqp0-like gene present in a jawless ancestor. In sea lamprey, we show that aqp0 first arose in a protocluster comprised of a novel aqp14 paralog and a fused aqp01 gene. To corroborate these findings, we conducted phylogenetic analyses of five syntenic nuclear receptor subfamilies, which, together with observations of extensive genome rearrangements, support the coincident loss of ancestral aqp2-like orthologs in Actinopterygii. We thus conclude that the divergence of sarcopterygian-specific aquaporin gene clusters was permissive for the evolution of water conservation mechanisms that facilitated tetrapod terrestrial adaptation.

  9. A pyrosequencing assay for the quantitative methylation analysis of the PCDHB gene cluster, the major factor in neuroblastoma methylator phenotype.

    Science.gov (United States)

    Banelli, Barbara; Brigati, Claudio; Di Vinci, Angela; Casciano, Ida; Forlani, Alessandra; Borzì, Luana; Allemanni, Giorgio; Romani, Massimo

    2012-03-01

    Epigenetic alterations are hallmarks of cancer and powerful biomarkers, whose clinical utilization is made difficult by the absence of standardization and of common methods of data interpretation. The coordinate methylation of many loci in cancer is defined as 'CpG island methylator phenotype' (CIMP) and identifies clinically distinct groups of patients. In neuroblastoma (NB), CIMP is defined by a methylation signature, which includes different loci, but its predictive power on outcome is entirely recapitulated by the PCDHB cluster only. We have developed a robust and cost-effective pyrosequencing-based assay that could facilitate the clinical application of CIMP in NB. This assay permits the unbiased simultaneous amplification and sequencing of 17 out of 19 genes of the PCDHB cluster for quantitative methylation analysis, taking into account all the sequence variations. As some of these variations were at CpG doublets, we bypassed the data interpretation conducted by the methylation analysis software to assign the corrected methylation value at these sites. The final result of the assay is the mean methylation level of 17 gene fragments in the protocadherin B (PCDHB) cluster. We have utilized this assay to compare the methylation levels of the PCDHB cluster between high-risk and very low-risk NB patients, confirming the predictive value of CIMP. Our results demonstrate that the pyrosequencing-based assay herein described is a powerful instrument for the analysis of this gene cluster that may simplify the data comparison between different laboratories and, in perspective, could facilitate its clinical application. Furthermore, our results demonstrate that, in principle, pyrosequencing can be efficiently utilized for the methylation analysis of gene clusters with high internal homologies.

  10. Characterization and expression of genes from the RubisCO gene cluster of the chemoautotrophic symbiont of Solemya velum: cbbLSQO.

    Science.gov (United States)

    Schwedock, Julie; Harmer, Tara L; Scott, Kathleen M; Hektor, Harm J; Seitz, Angelica P; Fontana, Matthew C; Distel, Daniel L; Cavanaugh, Colleen M

    2004-09-01

    Chemoautotrophic endosymbionts residing in Solemya velum gills provide this shallow water clam with most of its nutritional requirements. The cbb gene cluster of the S. velum symbiont, including cbbL and cbbS, which encode the large and small subunits of the carbon-fixing enzyme ribulose 1,5-bisphosphate carboxylase/oxygenase (RubisCO), was cloned and expressed in Escherichia coli. The recombinant RubisCO had a high specific activity, approximately 3 micromol min(-1) mg protein (-1), and a KCO2 of 40.3 microM. Based on sequence identity and phylogenetic analyses, these genes encode a form IA RubisCO, both subunits of which are closely related to those of the symbiont of the deep-sea hydrothermal vent gastropod Alviniconcha hessleri and the photosynthetic bacterium Allochromatium vinosum. In the cbb gene cluster of the S. velum symbiont, the cbbLS genes were followed by cbbQ and cbbO, which are found in some but not all cbb gene clusters and whose products are implicated in enhancing RubisCO activity post-translationally. cbbQ shares sequence similarity with nirQ and norQ, found in denitrification clusters of Pseudomonas stutzeri and Paracoccus denitrificans. The 3' region of cbbO from the S. velum symbiont, like that of the three other known cbbO genes, shares similarity to the 3' region of norD in the denitrification cluster. This is the first study to explore the cbb gene structure for a chemoautotrophic endosymbiont, which is critical both as an initial step in evaluating cbb operon structure in chemoautotrophic endosymbionts and in understanding the patterns and forces governing RubisCO evolution and physiology.

  11. Gravitation field algorithm and its application in gene cluster

    Directory of Open Access Journals (Sweden)

    Zheng Ming

    2010-09-01

    Full Text Available Abstract Background Searching for optima is one of the most challenging tasks in clustering genes from available experimental data or given functions. SA, GA, PSO and other similar efficient global optimization methods are used by biotechnologists. All these algorithms are based on the imitation of natural phenomena. Results This paper proposes a novel searching optimization algorithm called Gravitation Field Algorithm (GFA), which is derived from the famous astronomy theory Solar Nebular Disk Model (SNDM) of planetary formation. GFA simulates the gravitation field and outperforms GA and SA in some multimodal function optimization problems. GFA can also be applied to unimodal functions. GFA clusters the dataset from the Gene Expression Omnibus well. Conclusions The mathematical proof demonstrates that GFA converges to the global optimum with probability 1 under three conditions for mass functions of one independent variable. In addition to these results, the fundamental optimization concept in this paper is used to analyze how SA and GA affect the global search and the inherent defects in SA and GA. Some results and source code (in Matlab) are publicly available at http://ccst.jlu.edu.cn/CSBG/GFA.

  12. Adaptive evolution of the FADS gene cluster within Africa.

    Directory of Open Access Journals (Sweden)

    Rasika A Mathias

    Full Text Available Long chain polyunsaturated fatty acids (LC-PUFAs) are essential for brain structure, development, and function, and adequate dietary quantities of LC-PUFAs are thought to have been necessary for both brain expansion and the increase in brain complexity observed during modern human evolution. Previous studies conducted in largely European populations suggest that humans have limited capacity to synthesize brain LC-PUFAs such as docosahexaenoic acid (DHA) from plant-based medium chain (MC) PUFAs due to limited desaturase activity. Population-based differences in LC-PUFA levels and their product-to-substrate ratios can, in part, be explained by polymorphisms in the fatty acid desaturase (FADS) gene cluster, which have been associated with increased conversion of MC-PUFAs to LC-PUFAs. Here, we show evidence that these high efficiency converter alleles in the FADS gene cluster were likely driven to near fixation in African populations by positive selection ∼85 kya. We hypothesize that selection at FADS variants, which increase LC-PUFA synthesis from plant-based MC-PUFAs, played an important role in allowing African populations obligatorily tethered to marine sources for LC-PUFAs in isolated geographic regions, to rapidly expand throughout the African continent 60-80 kya.

  13. Cluster-Based Multipolling Sequencing Algorithm for Collecting RFID Data in Wireless LANs

    Science.gov (United States)

    Choi, Woo-Yong; Chatterjee, Mainak

    2015-03-01

    With the growing use of RFID (Radio Frequency Identification), it is becoming important to devise ways to read RFID tags in real time. Access points (APs) of IEEE 802.11-based wireless Local Area Networks (LANs) are being integrated with RFID networks that can efficiently collect real-time RFID data. Several schemes, such as multipolling methods based on the dynamic search algorithm and random sequencing, have been proposed. However, as the number of RFID readers associated with an AP increases, it becomes difficult for the dynamic search algorithm to derive the multipolling sequence in real time. Though multipolling methods can eliminate the polling overhead, we still need to enhance the performance of the multipolling methods based on random sequencing. To that extent, we propose a real-time cluster-based multipolling sequencing algorithm that drastically eliminates more than 90% of the polling overhead, particularly so when the dynamic search algorithm fails to derive the multipolling sequence in real time.

  14. Sequence Analysis of the ank Gene of Granulocytic Ehrlichiae

    OpenAIRE

    2000-01-01

    The ank gene of the agent of human granulocytic ehrlichiosis (HGE) codes for a protein with a predicted molecular size of 131.2 kDa that is recognized by serum from both dogs and humans infected with granulocytic ehrlichiae. As part of an effort to assess the phylogenetic relatedness of granulocytic ehrlichiae from different geographic regions and in different host species, the ank gene was PCR amplified and sequenced from a variety of sources. These included 10 blood specimens from patients ...

  15. Sequence Similarity of Clostridium difficile Strains by Analysis of Conserved Genes and Genome Content Is Reflected by Their Ribotype Affiliation

    Science.gov (United States)

    Kurka, Hedwig; Ehrenreich, Armin; Ludwig, Wolfgang; Monot, Marc; Rupnik, Maja; Barbut, Frederic; Indra, Alexander; Dupuy, Bruno; Liebl, Wolfgang

    2014-01-01

    PCR-ribotyping is a broadly used method for the classification of isolates of Clostridium difficile, an emerging intestinal pathogen, causing infections with increased disease severity and incidence in several European and North American countries. We have now carried out clustering analysis with selected genes of numerous C. difficile strains as well as gene content comparisons of their genomes in order to broaden our view of the relatedness of strains assigned to different ribotypes. We analyzed the genomic content of 48 C. difficile strains representing 21 different ribotypes. The calculation of distance matrix-based dendrograms using the neighbor joining method for 14 conserved genes (standard phylogenetic marker genes) from the genomes of the C. difficile strains demonstrated that the genes from strains with the same ribotype generally clustered together. Further, certain ribotypes always clustered together and formed ribotype groups, i.e. ribotypes 078, 033 and 126, as well as ribotypes 002 and 017, indicating their relatedness. Comparisons of the gene contents of the genomes of ribotypes that clustered according to the conserved gene analysis revealed that the number of common genes of the ribotypes belonging to each of these three ribotype groups were very similar for the 078/033/126 group (at most 69 specific genes between the different strains with the same ribotype) but less similar for the 002/017 group (86 genes difference). It appears that the ribotype is indicative not only of a specific pattern of the amplified 16S–23S rRNA intergenic spacer but also reflects specific differences in the nucleotide sequences of the conserved genes studied here. It can be anticipated that the sequence deviations of more genes of C. difficile strains are correlated with their PCR-ribotype. In conclusion, the results of this study corroborate and extend the concept of clonal C. difficile lineages, which correlate with ribotypes affiliation. PMID:24482682
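
    The abstract above builds dendrograms from distance matrices with the neighbor joining method. As a hedged illustration of that method's first step only, the sketch below computes the standard Q-matrix, Q(i, j) = (n - 2) d(i, j) - Σ_k d(i, k) - Σ_k d(j, k), and picks the pair with the smallest value. The function name and the five-taxon distance matrix are hypothetical toy inputs, not C. difficile data.

```python
import numpy as np

def nj_first_join(dist):
    """Return the pair (i, j) joined first by neighbor joining, plus the Q-matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    row_sums = d.sum(axis=1)
    # Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)
    q = (n - 2) * d - row_sums[:, None] - row_sums[None, :]
    # A taxon is never joined with itself, so mask the diagonal.
    np.fill_diagonal(q, np.inf)
    i, j = np.unravel_index(np.argmin(q), q.shape)
    return int(i), int(j), q

# Toy symmetric distance matrix for five taxa (illustrative values only).
example = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
i, j, q = nj_first_join(example)
```

    Note that the joined pair (taxa 0 and 1) is not the pair with the smallest raw distance (taxa 3 and 4); the row-sum correction is what distinguishes neighbor joining from simple nearest-pair clustering.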

  16. Sequence and gene expression evolution of paralogous genes in willows.

    Science.gov (United States)

    Harikrishnan, Srilakshmy L; Pucholt, Pascal; Berlin, Sofia

    2015-12-22

    Whole genome duplications (WGD) have had strong impacts on species diversification by triggering evolutionary novelties; however, relatively little is known about the balance between gene loss and forces involved in the retention of duplicated genes originating from a WGD. We analyzed putative Salicoid duplicates in willows, originating from the Salicoid WGD, which took place more than 45 Mya. Contigs were constructed by de novo assembly of RNA-seq data derived from leaves and roots from two genotypes. Among the 48,508 contigs, 3,778 pairs were, based on fourfold synonymous third-codon transversion rates and syntenic positions, predicted to be Salicoid duplicates. Both copies were in most cases expressed in both tissues and 74% were significantly differentially expressed. Mean Ka/Ks was 0.23, suggesting that the Salicoid duplicates are evolving by purifying selection. Gene Ontology enrichment analyses showed that functions related to DNA- and nucleic acid binding were over-represented among the non-differentially expressed Salicoid duplicates, while functions related to biosynthesis and metabolism were over-represented among the differentially expressed Salicoid duplicates. We propose that the differentially expressed Salicoid duplicates are regulatory neo- and/or subfunctionalized, while the non-differentially expressed are dose sensitive and hence functionally conserved. Multiple evolutionary processes thus drive the retention of Salicoid duplicates in willows.

  17. Evolution of the C-Type Lectin-Like Receptor Genes of the DECTIN-1 Cluster in the NK Gene Complex

    Directory of Open Access Journals (Sweden)

    Susanne Sattler

    2012-01-01

    Full Text Available Pattern recognition receptors are crucial in initiating and shaping innate and adaptive immune responses and often belong to families of structurally and evolutionarily related proteins. The human C-type lectin-like receptors encoded in the DECTIN-1 cluster within the NK gene complex contain prominent receptors with pattern recognition function, such as DECTIN-1 and LOX-1. All members of this cluster share significant homology and are considered to have arisen from subsequent gene duplications. Recent developments in sequencing and the availability of comprehensive sequence data comprising many species showed that the receptors of the DECTIN-1 cluster are not only homologous to each other but also highly conserved between species. Even in Caenorhabditis elegans, genes displaying homology to the mammalian C-type lectin-like receptors have been detected. In this paper, we conduct a comprehensive phylogenetic survey and give an up-to-date overview of the currently available data on the evolutionary emergence of the DECTIN-1 cluster genes.

  18. The cluster index of regularly varying sequences with applications to limit theory for functions of multivariate Markov chains

    DEFF Research Database (Denmark)

    Mikosch, Thomas Valentin; Wintenberger, Olivier

    2014-01-01

    We introduce the cluster index of a multivariate stationary sequence and characterize the index in terms of the spectral tail process. This index plays a major role in limit theory for partial sums of sequences. We illustrate the use of the cluster index by characterizing infinite variance stable...

  19. Analysis and validation of genome-specific DNA variations in 5' flanking conserved sequences of wheat low-molecular-weight glutenin subunit genes

    Institute of Scientific and Technical Information of China (English)

    LONG; Hai; WEI; Yuming

    2006-01-01

    The thirty-three 5' flanking conserved sequences of the known low-molecular-weight subunit (LMW-GS) genes have been divided into eight clusters, which was in agreement with the classification based on the deduced N-terminal protein sequences. The DNA polymorphism between the eight clusters was obtained by sequence alignment, and a total of 34 polymorphic positions were observed in the approximately 200 bp regions, among which 18 polymorphic positions were candidate SNPs. Seven cluster-specific primer sets were designed for seven out of eight clusters containing cluster-specific bases, with which the genomic DNA of the ditelosomic lines of group 1 chromosomes of a wheat variety 'Chinese Spring' was employed to carry out chromosome assignment. The subsequent cloning and DNA sequencing of PCR fragments validated the sequences specificity of the 5' flanking conserved sequences between LMW-GS gene groups in different genomes. These results suggested that the coding and 5' flanking regions of LMW-GS genes are likely to have evolved in a concerted fashion. The seven primer sets developed in this study could be used to isolate the complete ORFs of seven groups of LMW-GS genes, respectively, and therefore possess great value for further research in the contributions of a single LMW-GS gene to wheat quality in the complex genetic background and the efficient selections of quality-related components in breeding programs.

  20. MeSH key terms for validation and annotation of gene expression clusters

    Energy Technology Data Exchange (ETDEWEB)

    Rechtsteiner, A. (Andreas); Rocha, L. M. (Luis Mateus)

    2004-01-01

    Integration of different sources of information is a great challenge for the analysis of gene expression data, and for the field of Functional Genomics in general. As the availability of numerical data from high-throughput methods increases, so does the need for technologies that assist in the validation and evaluation of the biological significance of results extracted from these data. In mRNA assaying with microarrays, for example, numerical analysis often attempts to identify clusters of co-expressed genes. The important task to find the biological significance of the results and validate them has so far mostly fallen to the biological expert who had to perform this task manually. One of the most promising avenues to develop automated and integrative technology for such tasks lies in the application of modern Information Retrieval (IR) and Knowledge Management (KM) algorithms to databases with biomedical publications and data. Examples of databases available for the field are bibliographic databases containing scientific publications (e.g. MEDLINE/PUBMED), databases containing sequence data (e.g. GenBank) and databases of semantic annotations (e.g. the Gene Ontology Consortium and Medical Subject Headings (MeSH)). We present here an approach that uses the MeSH terms and their concept hierarchies to validate and obtain functional information for gene expression clusters. The controlled and hierarchical MeSH vocabulary is used by the National Library of Medicine (NLM) to index all the articles cited in MEDLINE. Such indexing with a controlled vocabulary eliminates some of the ambiguity due to polysemy (terms that have multiple meanings) and synonymy (multiple terms have similar meaning) that would be encountered if terms would be extracted directly from the articles due to differing article contexts or author preferences and background. Further, the hierarchical organization of the MeSH terms can illustrate the conceptual/functional relationships of genes

  1. Discovery of clubroot-resistant genes in Brassica napus by transcriptome sequencing.

    Science.gov (United States)

    Chen, S W; Liu, T; Gao, Y; Zhang, C; Peng, S D; Bai, M B; Li, S J; Xu, L; Zhou, X Y; Lin, L B

    2016-01-01

    Clubroot significantly affects plants of the Brassicaceae family and is one of the main diseases causing serious losses in B. napus yield. Few studies have investigated the clubroot-resistance mechanism in B. napus. Identification of clubroot-resistant genes may be used in clubroot-resistant breeding, as well as to elucidate the molecular mechanism behind B. napus clubroot-resistance. We used three B. napus transcriptome samples to construct a transcriptome sequencing library by using Illumina HiSeq™ 2000 sequencing and bioinformatic analysis. In total, 171 million high-quality reads were obtained, containing 96,149 unigenes of N50-value. We aligned the obtained unigenes with the Nr, Swiss-Prot, clusters of orthologous groups, and gene ontology databases and annotated their functions. In the Kyoto encyclopedia of genes and genomes database, 25,033 unigenes (26.04%) were assigned to 124 pathways. Many genes, including broad-spectrum disease-resistance genes, specific clubroot-resistant genes, and genes related to indole-3-acetic acid (IAA) signal transduction, cytokinin synthesis, and myrosinase synthesis in the Huashuang 3 variety of B. napus were found to be related to clubroot-resistance. The effective clubroot-resistance observed in this variety may be due to the induced increased expression of these disease-resistant genes and strong inhibition of the IAA signal transduction, cytokinin synthesis, and myrosinase synthesis. Unigenes 0048482 and 0061770 shared 94% nucleotide similarity with the Crr1 gene. Furthermore, unigene 0061770 could have originated from an inversion of the Crr1 5'-end sequence.

  2. The evolution of the galaxy Red Sequence in simulated clusters and groups

    CERN Document Server

    Romeo, Alessio D; Covone, G; Sommer-Larsen, J; Antonuccio-Delogu, V; Capaccioli, M

    2008-01-01

    N-body + hydrodynamical simulations of the formation and evolution of galaxy groups and clusters in a LambdaCDM cosmology are used to follow the build-up of the colour-magnitude relation in two clusters and in 12 groups. We have found that galaxies, starting from the more massive, move to the Red Sequence (RS) as they age over time and eventually settle on a "dead sequence" (DS) once they have stopped their bulk star formation activity. Fainter galaxies keep having significant star formation out to very recent epochs and lie broader around the RS. Environment plays a role as galaxies in groups and cluster outskirts hold star formation activity longer than the central cluster regions. However galaxies experiencing infall from the outskirts to the central parts keep star formation on until they settle on to the DS of the core galaxies. Merging contributes to mass assembly until z~1, after which major events only involve the brightest cluster galaxies. The emerging scenario is that the evoluti...

  3. Opossum carboxylesterases: sequences, phylogeny and evidence for CES gene duplication events predating the marsupial-eutherian common ancestor

    Directory of Open Access Journals (Sweden)

    Chan Jeannie

    2008-02-01

    Full Text Available Abstract Background Carboxylesterases (CES) perform diverse metabolic roles in mammalian organisms in the detoxification of a broad range of drugs and xenobiotics and may also serve in specific roles in lipid, cholesterol, pheromone and lung surfactant metabolism. Five CES families have been reported in mammals with human CES1 and CES2 the most extensively studied. Here we describe the genetics, expression and phylogeny of CES isozymes in the opossum and report on the sequences and locations of CES1, CES2 and CES6 'like' genes within two gene clusters on chromosome one. We also discuss the likely sequence of gene duplication events generating multiple CES genes during vertebrate evolution. Results We report a cDNA sequence for an opossum CES and present evidence for CES1 and CES2 like genes expressed in opossum liver and intestine and for distinct gene locations of five opossum CES genes, CES1, CES2.1, CES2.2, CES2.3 and CES6, on chromosome 1. Phylogenetic and sequence alignment studies compared the predicted amino acid sequences for opossum CES with those for human, mouse, chicken, frog, salmon and Drosophila CES gene products. Phylogenetic analyses produced congruent phylogenetic trees depicting a rapid early diversification into at least five distinct CES gene family clusters: CES2, CES1, CES7, CES3, and CES6. Molecular divergence estimates based on a Bayesian relaxed clock approach revealed an origin for the five mammalian CES gene families between 328–378 MYA. Conclusion The deduced amino acid sequence for an opossum cDNA was consistent with its identity as a mammalian CES2 gene product (designated CES2.1). Distinct gene locations for opossum CES1 (1: 446,222,550–446,274,850), three CES2 genes (1: 677,773,395–677,927,030) and a CES6 gene (1: 677,585,520–677,730,419) were observed on chromosome 1. Opossum CES1 and multiple CES2 genes were expressed in liver and intestine. Amino acid sequences for opossum CES1 and three CES2 gene products

  4. Time-series clustering of gene expression in irradiated and bystander fibroblasts: an application of FBPA clustering

    Directory of Open Access Journals (Sweden)

    Markatou Marianthi

    2011-01-01

    Full Text Available Abstract Background The radiation bystander effect is an important component of the overall biological response of tissues and organisms to ionizing radiation, but the signaling mechanisms between irradiated and non-irradiated bystander cells are not fully understood. In this study, we measured a time-series of gene expression after α-particle irradiation and applied the Feature Based Partitioning around medoids Algorithm (FBPA), a new clustering method suitable for sparse time series, to identify signaling modules that act in concert in the response to direct irradiation and bystander signaling. We compared our results with those of an alternate clustering method, Short Time series Expression Miner (STEM). Results While computational evaluations of both clustering results were similar, FBPA provided more biological insight. After irradiation, gene clusters were enriched for signal transduction, cell cycle/cell death and inflammation/immunity processes; but only FBPA separated clusters by function. In bystanders, gene clusters were enriched for cell communication/motility, signal transduction and inflammation processes; but biological functions did not separate as clearly with either clustering method as they did in irradiated samples. Network analysis confirmed p53 and NF-κB transcription factor-regulated gene clusters in irradiated and bystander cells and suggested novel regulators, such as KDM5B/JARID1B (lysine (K)-specific demethylase 5B) and HDACs (histone deacetylases), which could epigenetically coordinate gene expression after irradiation. Conclusions In this study, we have shown that a new time series clustering method, FBPA, can provide new leads to the mechanisms regulating the dynamic cellular response to radiation. The findings implicate epigenetic control of gene expression in addition to transcription factor networks.
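
    The abstract above names partitioning around medoids on features derived from sparse time series. As a rough illustration only, the sketch below runs a plain swap-based k-medoids on two made-up features per gene. It is not the FBPA method itself: the gene names, expression values, the choice of features (net change and range) and the Manhattan distance are all assumptions introduced for the example.

```python
# Hypothetical toy data: four genes measured at four time points.
profiles = {
    "geneA": [0.0, 1.0, 2.0, 3.0],
    "geneB": [0.1, 1.1, 2.2, 2.9],
    "geneC": [3.0, 2.0, 1.0, 0.0],
    "geneD": [2.9, 2.1, 0.9, 0.2],
}


def features(series):
    """Two illustrative features (an assumption): net change and overall range."""
    return (series[-1] - series[0], max(series) - min(series))


def manhattan(a, b):
    """Sum of absolute coordinate differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def total_cost(points, medoids):
    """Sum over all points of the distance to the nearest medoid."""
    return sum(min(manhattan(p, points[m]) for m in medoids) for p in points)


def pam(points, k):
    """Plain swap-based k-medoids: accept any swap that lowers the total cost."""
    medoids = list(range(k))  # naive start: the first k points
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in range(len(points)):
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    # Label each point with the index of its nearest medoid.
    labels = [min(medoids, key=lambda m: manhattan(p, points[m])) for p in points]
    return medoids, labels


genes = list(profiles)
points = [features(profiles[g]) for g in genes]
medoids, labels = pam(points, k=2)

# Group gene names under the name of the medoid gene they were assigned to.
clusters = {}
for gene, label in zip(genes, labels):
    clusters.setdefault(genes[label], []).append(gene)
print(clusters)
```

    Medoids are actual data points, so each cluster is represented by a real gene profile rather than an averaged one, which is one common reason to prefer k-medoids over k-means for small or sparse data.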

  5. Pre-main sequence variable stars in young open cluster NGC 1893

    OpenAIRE

    Lata, Sneh; Pandey, A.K.(Indian Institute of Technology Bombay (IIT), Mumbai, India); Chen, W. P.; Maheswar, G.; Chauhan, Neelam

    2012-01-01

    We present results of multi-epoch (fourteen nights during 2007-2010) $V$-band photometry of the cluster NGC 1893 region to identify photometric variable stars in the cluster. The study identified a total of 53 stars showing photometric variability. The members associated with the region are identified on the basis of spectral energy distribution, $J-H/H-K$ two colour diagram and $V/V-I$ colour-magnitude diagram. The ages and masses of the majority of pre-main-sequence sources are found to be ...

  6. Recurrent adenylation domain replacement in the microcystin synthetase gene cluster

    Directory of Open Access Journals (Sweden)

    Laakso Kati

    2007-10-01

    Full Text Available Abstract Background Microcystins are small cyclic heptapeptide toxins produced by a range of distantly related cyanobacteria. Microcystins are synthesized on large NRPS-PKS enzyme complexes. Many structural variants of microcystins are produced simultaneously. A recombination event between the first module of mcyB (mcyB1) and mcyC in the microcystin synthetase gene cluster is linked to the simultaneous production of microcystin variants in strains of the genus Microcystis. Results Here we undertook a phylogenetic study to investigate the order and timing of recombination between the mcyB1 and mcyC genes in a diverse selection of microcystin producing cyanobacteria. Our results provide support for complex evolutionary processes taking place at the mcyB1 and mcyC adenylation domains which recognize and activate the amino acids found at X and Z positions. We find evidence for recent recombination between mcyB1 and mcyC in strains of the genera Anabaena, Microcystis, and Hapalosiphon. We also find clear evidence for independent adenylation domain conversion of mcyB1 by unrelated peptide synthetase modules in strains of the genera Nostoc and Microcystis. The recombination events replace only the adenylation domain in each case and the condensation domains of mcyB1 and mcyC are not transferred together with the adenylation domain. Our findings demonstrate that the mcyB1 and mcyC adenylation domains are recombination hotspots in the microcystin synthetase gene cluster. Conclusion Recombination is thought to be one of the main mechanisms driving the diversification of NRPSs. However, there is very little information on how recombination takes place in nature. This study demonstrates that functional peptide synthetases are created in nature through transfer of adenylation domains without the concomitant transfer of condensation domains.

  7. Nucleotide Sequence of the Protective Antigen Gene of Bacillus Anthracis

    Science.gov (United States)

    1988-02-02

    Montie, S. Kadis, and S. I. Ajl (ed.), Microbial toxins, vol. 3. Academic Press, Inc., New York. 23. Little, S. F., and G. B. Knudson. 1986...Takkinen, and L. Kaariainen. 1981. Nucleotide sequence of the promoter and NH2-terminal signal peptide region of the α-amylase gene from Bacillus

  8. Application of Multi-SOM clustering approach to macrophage gene expression analysis.

    Science.gov (United States)

    Ghouila, Amel; Yahia, Sadok Ben; Malouche, Dhafer; Jmel, Haifa; Laouini, Dhafer; Guerfali, Fatma Z; Abdelhak, Sonia

    2009-05-01

    The production of increasingly reliable and accessible gene expression data has stimulated the development of computational tools to interpret such data and to organize them efficiently. Clustering techniques are widely recognized as useful exploratory tools for gene expression data analysis. Genes that show similar expression patterns over a wide range of experimental conditions can be clustered together. This relies on the hypothesis that genes that belong to the same cluster are coregulated and involved in related functions. Nevertheless, clustering algorithms still show limits, particularly in estimating the number of clusters and interpreting hierarchical dendrograms, which may significantly influence the outputs of the analysis process. We propose here a multi-level SOM-based clustering algorithm named Multi-SOM. Through the use of clustering validity indices, Multi-SOM overcomes the problem of estimating the number of clusters. To test the validity of the proposed clustering algorithm, we first tested it on supervised training data sets. Results were evaluated by computing the number of misclassified samples. We have then used Multi-SOM for the analysis of macrophage gene expression data generated in vitro from the same individual blood infected with 5 different pathogens. This analysis led to the identification of sets of tightly coregulated genes across different pathogens. Gene Ontology tools were then used to estimate the biological significance of the clustering, which showed that the obtained clusters are coherent and biologically significant.

  9. [Sequence characterization of the 5'-Flanking region of the GHR gene in Tibetan sheep].

    Science.gov (United States)

    Ma, Zhi-Jie; Wei, Ya-Ping; Zhong, Jin-Cheng; Chen, Zhi-Hua; Lu, Hong; Tong, Zi-Bao

    2007-08-01

    The 5'-flanking sequence (including the P1 promoter and exon 1A) of the GHR gene in Oura-type Tibetan sheep (O. aries) was cloned by the T-A method and sequenced (GenBank accession No. EF116490). Characterization and comparison of this sequence with mouflon (O. musimon), goat (C. hircus), cattle (B. taurus) and European bison (B. bonasus) orthologues were also conducted. Results showed that: 1) The 5'-flanking region contained many potential transcriptional factor binding sites such as those for C/EBPb, C/EBP, SP1, Cap, USF, HFH-2, HNF-3b, and Oct-1, which might have an important effect on transcription activation and regulation as well as tissue-specific expression. The rate of repetitive sequences was 2.55% and no SINEs, LINEs, LTR anti-transcription elements or DNA transposon elements were found, although one (TG)11 microsatellite was found. 2) In the P1 promoter region, sequence homology between the Tibetan sheep and mouflon, goat, cattle and European bison was 99.7%, 94.2%, 85.9% and 86.5%, respectively, while that for exon 1A was 99.0%, 97.0%, 92.7% and 94.6%, respectively. 3) The molecular phylogenetic tree among these species, constructed by the neighbor-joining method based on the sequences of the non-coding region of the GHR genes, placed the two Bovinae species on one branch and the three Caprinae species on the other. Tibetan sheep and mouflon were joined first, followed by the goat, and then the Bovinae species, including the cattle and European bison. This phylogenetic clustering agreed not only with the taxonomy, but also with the phylogenetic clustering based on the mitochondrial DNA of these species.
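
    Pairwise percentages like those reported above are typically computed from aligned sequences. The sketch below shows one simple way to do this, assuming the two sequences are already aligned to equal length and that columns containing a gap are skipped. Other conventions for gaps exist, the abstract does not say which was used, and the sequences here are invented for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical positions among gap-free alignment columns.

    Assumption: both inputs come from the same alignment (equal length),
    and any column with a gap in either sequence is ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Invented 10-base example with a single mismatch at position 5.
identity = percent_identity("ACGTACGTAC", "ACGTTCGTAC")
print(f"{identity:.1f}% identity")
```

    Because the denominator changes with the gap convention, percentages from different tools are only comparable when the same rule is applied.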

  10. Thermodynamics-based models of transcriptional regulation with gene sequence.

    Science.gov (United States)

    Wang, Shuqiang; Shen, Yanyan; Hu, Jinxing

    2015-12-01

    Quantitative models of gene regulatory activity have the potential to improve our mechanistic understanding of transcriptional regulation. However, the few models available today have been based on simplistic assumptions about the sequences being modeled or heuristic approximations of the underlying regulatory mechanisms. In this work, we have developed a thermodynamics-based model to predict gene expression driven by any DNA sequence. The proposed model relies on a continuous time, differential equation description of transcriptional dynamics. The sequence features of the promoter are exploited to derive the binding affinity based on statistical molecular thermodynamics. Experimental results show that the proposed model can effectively identify the activity levels of transcription factors and the regulatory parameters. Compared with previous models, the proposed model yields more biologically meaningful results.
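
    To make the two ingredients mentioned above concrete (a binding probability from statistical thermodynamics feeding a differential equation for transcript level), here is a minimal single-site sketch. It is not the authors' model: the one-site occupancy formula, the parameter names energy_kT, alpha and delta, and all numerical values are assumptions chosen for illustration.

```python
import math


def occupancy(energy_kT, tf_conc):
    """Probability that a single site is bound, from a two-state partition function.

    Assumption: binding energy is given in units of kT and tf_conc is a
    dimensionless relative concentration.
    """
    weight = tf_conc * math.exp(-energy_kT)  # Boltzmann weight of the bound state
    return weight / (1.0 + weight)


def simulate_mrna(occ, alpha=2.0, delta=0.5, dt=0.01, steps=5000):
    """Forward-Euler integration of dm/dt = alpha * occ - delta * m, from m = 0."""
    m = 0.0
    for _ in range(steps):
        m += dt * (alpha * occ - delta * m)
    return m


occ = occupancy(energy_kT=0.0, tf_conc=1.0)
m_final = simulate_mrna(occ)
# For this linear equation the steady state is alpha * occ / delta.
print(occ, m_final)
```

    A stronger site (more negative energy) or a higher factor concentration raises the occupancy, which in turn raises the steady-state transcript level in proportion.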

  11. Functional analysis of alcS, a gene of the alc cluster in Aspergillus nidulans.

    Science.gov (United States)

    Flipphi, Michel; Robellet, Xavier; Dequier, Emmanuel; Leschelle, Xavier; Felenbok, Béatrice; Vélot, Christian

    2006-04-01

    The ethanol utilization pathway (alc system) of Aspergillus nidulans requires two structural genes, alcA and aldA, which encode the two enzymes (alcohol dehydrogenase and aldehyde dehydrogenase, respectively) allowing conversion of ethanol into acetate via acetaldehyde, and a regulatory gene, alcR, encoding the pathway-specific autoregulated transcriptional activator. The alcR and alcA genes are clustered with three other genes that are also positively regulated by alcR, although they are dispensable for growth on ethanol. In this study, we characterized alcS, the most abundantly transcribed of these three genes. alcS is strictly co-regulated with alcA, and encodes a 262-amino acid protein. Sequence comparison with protein databases detected a putative conserved domain that is characteristic of the novel GPR1/FUN34/YaaH membrane protein family. It was shown that the AlcS protein is located in the plasma membrane. Deletion or overexpression of alcS did not result in any obvious phenotype. In particular, AlcS does not appear to be essential for the transport of ethanol, acetaldehyde or acetate. Basic Local Alignment Search Tool analysis against the A. nidulans genome led to the identification of two novel ethanol- and ethylacetate-induced genes encoding other members of the GPR1/FUN34/YaaH family, AN5226 and AN8390.

  12. Quantitative modeling of a gene's expression from its intergenic sequence.

    Directory of Open Access Journals (Sweden)

    Md Abul Hassan Samee

    2014-03-01

    Full Text Available Modeling a gene's expression from its intergenic locus and trans-regulatory context is a fundamental goal in computational biology. Owing to the distributed nature of cis-regulatory information and the poorly understood mechanisms that integrate such information, gene locus modeling is a more challenging task than modeling individual enhancers. Here we report the first quantitative model of a gene's expression pattern as a function of its locus. We model the expression readout of a locus in two tiers: 1) combinatorial regulation by transcription factors bound to each enhancer is predicted by a thermodynamics-based model and 2) independent contributions from multiple enhancers are linearly combined to fit the gene expression pattern. The model does not require any prior knowledge about enhancers contributing toward a gene's expression. We demonstrate that the model captures the complex multi-domain expression patterns of anterior-posterior patterning genes in the early Drosophila embryo. Altogether, we model the expression patterns of 27 genes; these include several gap genes, pair-rule genes, and anterior, posterior, trunk, and terminal genes. We find that the model-selected enhancers for each gene overlap strongly with its experimentally characterized enhancers. Our findings also suggest the presence of sequence-segments in the locus that would contribute ectopic expression patterns and hence were "shut down" by the model. We applied our model to identify the transcription factors responsible for forming the stripe boundaries of the studied genes. The resulting network of regulatory interactions exhibits a high level of agreement with known regulatory influences on the target genes. Finally, we analyzed whether and why our assumption of enhancer independence was necessary for the genes we studied. We found a deterioration of expression when binding sites in one enhancer were allowed to influence the readout of another enhancer. Thus, interference
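
    The second tier described in this abstract, linearly combining independent enhancer outputs to fit an expression pattern, can be illustrated with an ordinary least-squares fit. The sketch below handles exactly two enhancers and solves the 2x2 normal equations by hand. The enhancer outputs, the target pattern and the function name fit_two_weights are hypothetical, and the published model may constrain or regularize its weights in ways not shown here.

```python
def fit_two_weights(e1, e2, target):
    """Least-squares weights (w1, w2) minimizing sum((w1*e1 + w2*e2 - target)^2)."""
    s11 = sum(a * a for a in e1)
    s12 = sum(a * b for a, b in zip(e1, e2))
    s22 = sum(b * b for b in e2)
    t1 = sum(a * y for a, y in zip(e1, target))
    t2 = sum(b * y for b, y in zip(e2, target))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("enhancer outputs are collinear; weights are not unique")
    # Cramer's rule on the normal equations.
    w1 = (t1 * s22 - t2 * s12) / det
    w2 = (t2 * s11 - t1 * s12) / det
    return w1, w2


# Hypothetical per-enhancer outputs along four positions of an embryo axis.
enhancer_1 = [0.0, 1.0, 0.0, 0.0]
enhancer_2 = [0.0, 0.0, 1.0, 1.0]
# Hypothetical observed expression pattern at the same positions.
observed = [0.0, 0.7, 0.3, 0.3]

w1, w2 = fit_two_weights(enhancer_1, enhancer_2, observed)
predicted = [w1 * a + w2 * b for a, b in zip(enhancer_1, enhancer_2)]
print(w1, w2, predicted)
```

    With more than two enhancers the same idea extends to a general linear least-squares solve, and a weight near zero corresponds to a segment that contributes little to the fitted pattern.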

  13. The Red Sequence of High-Redshift Clusters: A Comparison with Cosmological Galaxy Formation Models

    Science.gov (United States)

    Menci, N.

    2008-10-01

    We compare the results from a state-of-the-art semi-analytic model of galaxy formation with spectroscopic observations of the distant galaxy clusters observed in the range 1≲ z≲ 1.5. In our model we find that i) a well-defined, narrow red sequence (RS) is obtained already by z≈ 1.2; this is more populated than the field RS, analogously to what is observed and predicted at z=0; ii) the predicted RS colors and width have average values of 1 and 0.15, respectively, with a cluster-to-cluster variance. The width of the RS of cluster galaxies is 5-10 times lower than the corresponding field value; iii) the predicted distribution of stellar ages of RS galaxies at z=1.2 is peaked at the value τ=3.7 Gyr for both cluster and field; however, for the latter the distribution is significantly skewed toward lower ages. When compared with observations, the above findings show an overall consistency, although the average value ≈ 0.07 of the observed cluster RS width at z≈1.2 is smaller than the corresponding model central value. We discuss the physical origin and the significance of the above results in the framework of cosmological galaxy formation.

  14. The Red Sequence of High-Redshift Clusters: a Comparison with Cosmological Galaxy Formation Models

    CERN Document Server

    Menci, N; Gobat, R; Strazzullo, V; Rettura, A; Mei, S; Demarco, R

    2008-01-01

    We compare the results from a semi-analytic model of galaxy formation with spectro-photometric observations of distant galaxy clusters observed in the range 0.8< z< 1.3. We investigate the properties of their red sequence (RS) galaxies and compare them with those of the field at the same redshift. In our model we find that i) a well-defined, narrow RS is obtained already by z= 1.2; this is found to be more populated than the field RS, analogously to what is observed and predicted at z=0; ii) the predicted U-V rest-frame colors and scatter of the cluster RS at z=1.2 have average values of 1 and 0.15 respectively, with a cluster-to-cluster variance of 0.2 and 0.06, respectively. The scatter of the RS of cluster galaxies is around 5 times smaller than the corresponding field value; iii) when the RS galaxies are considered, the mass growth histories of field and cluster galaxies at z=1.2 are similar, with 90 % of the stellar mass of RS galaxies at z=1.2 already formed at cosmic times t=2.5 Gyr, and 50 % at t=1...

  15. Sequence validation of candidates for selectively important genes in sunflower.

    Directory of Open Access Journals (Sweden)

    Mark A Chapman

    Full Text Available Analyses aimed at identifying genes that have been targeted by past selection provide a powerful means for investigating the molecular basis of adaptive differentiation. In the case of crop plants, such studies have the potential to not only shed light on important evolutionary processes, but also to identify genes of agronomic interest. In this study, we test for evidence of positive selection at the DNA sequence level in a set of candidate genes previously identified in a genome-wide scan for genotypic evidence of selection during the evolution of cultivated sunflower. In the majority of cases, we were able to confirm the effects of selection in shaping diversity at these loci. Notably, the genes that were found to be under selection via our sequence-based analyses were devoid of variation in the cultivated sunflower gene pool. This result confirms a possible strategy for streamlining the search for adaptively important loci by pre-screening the derived population to identify the strongest candidates before sequencing them in the ancestral population.

  16. Cloning and sequencing of a Moraxella bovis pilin gene.

    Science.gov (United States)

    Marrs, C F; Schoolnik, G; Koomey, J M; Hardy, J; Rothbard, J; Falkow, S

    1985-07-01

    Moraxella bovis pili have been shown to play a major role in both infectivity and protective immunity of bovine infectious keratoconjunctivitis. Sonicated M. bovis DNA from the piliated strain EPP63 was inserted into the vector lambda gt11 with EcoRI linkers. Recombinant phage were screened with an oligonucleotide probe based on the amino-terminal portion of the DNA sequence of a Neisseria gonorrhoeae pilin gene. Two candidate phages produced a protein that comigrated with EPP63 beta pilin in sodium dodecyl sulfate-polyacrylamide gels and bound anti-pilus antisera. The 1.9-kilobase insert from one of these, lambda gt11M182, was subcloned in both orientations into pBR322, forming the plasmids pMxB7 and pMxB9, both of which produced beta pilin, as did pMxB12, a HindIII deletion derivative of pMxB7. In HB101(pMxB12), the M. bovis pilin protein was shown to be primarily localized in the inner membrane. The entire 939-base-pair insert of pMxB12 was sequenced, revealing a ribosome binding site just upstream of the coding region and an AT-rich region further upstream containing some potential RNA polymerase recognition sites. The translation of the sequence predicts a six-amino-acid leader sequence preceding the phenylalanine that begins the mature protein. Codon usage analysis of the M. bovis beta pilin gene revealed greater use of the CUA codon for leucine than usual for a well-expressed Escherichia coli gene. Comparisons of the M. bovis EPP63 beta pilin protein sequence with other pilin gene sequences are presented.
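
    The codon usage analysis mentioned above amounts to counting how often each synonymous codon appears in a coding sequence. The sketch below tallies codons in frame and reports the share of each leucine codon, using the six leucine codons of the standard genetic code. The short coding sequence is invented for illustration and is not the M. bovis pilin gene.

```python
from collections import Counter

# The six leucine codons of the standard genetic code (RNA alphabet).
LEUCINE_CODONS = ("UUA", "UUG", "CUU", "CUC", "CUA", "CUG")


def codon_counts(cds):
    """Count in-frame codons, reading from the first base; trailing bases are dropped."""
    rna = cds.upper().replace("T", "U")
    usable = len(rna) - len(rna) % 3
    return Counter(rna[i:i + 3] for i in range(0, usable, 3))


def leucine_fractions(cds):
    """Fraction of leucine residues encoded by each leucine codon that occurs."""
    counts = codon_counts(cds)
    total = sum(counts[c] for c in LEUCINE_CODONS)
    if total == 0:
        return {}
    return {c: counts[c] / total for c in LEUCINE_CODONS if counts[c]}


# Invented toy coding sequence: ATG CTA CTA CTG TTA TAA
toy_cds = "ATGCTACTACTGTTATAA"
counts = codon_counts(toy_cds)
fractions = leucine_fractions(toy_cds)
print(fractions)
```

    Comparing such fractions against those of highly expressed genes in a host organism is one simple way to spot codons, such as CUA here, that are used more often than the host typically prefers.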

  17. Transgene-induced silencing of the zoosporogenesis-specific NIFC gene cluster of Phytophthora infestans involves chromatin alterations.

    Science.gov (United States)

    Judelson, Howard S; Tani, Shuji

    2007-07-01

    Clustered within the genome of the oomycete phytopathogen Phytophthora infestans are four genes encoding spore-specific nuclear LIM interactor-interacting factors (NIF proteins, a type of transcriptional regulator) that are moderately conserved in DNA sequence. NIFC1, NIFC2, and NIFC3 are zoosporogenesis-induced and grouped within 4 kb, and 20 kb away resides a sporulation-induced form, NIFS. To test the function of the NIFC family, plasmids expressing full-length hairpin constructs of NIFC1 or NIFC2 were stably transformed into P. infestans. This triggered silencing of the cognate gene in about one-third of transformants, and all three NIFC genes were usually cosilenced. However, NIFS escaped silencing despite its high sequence similarity to the NIFC genes. Silencing of the three NIFC genes impaired zoospore cyst germination by 60% but did not affect other aspects of the life cycle. Silencing was transcriptional based on nuclear run-on assays and associated with tighter chromatin packing based on nuclease accessibility experiments. The chromatin alterations extended a few hundred nucleotides beyond the boundaries of the transcribed region of the NIFC cluster and were not associated with increased DNA methylation. A plasmid expressing a short hairpin RNA having sequence similarity only to NIFC1 silenced both that gene and an adjacent member of the gene cluster, likely due to the expansion of a heterochromatic domain from the targeted locus. These data help illuminate the mechanism of silencing in Phytophthora and suggest that caution should be used when interpreting silencing experiments involving closely spaced genes.

  18. Sequence and analysis of the gene for bacteriophage T3 RNA polymerase.

    Science.gov (United States)

    McGraw, N J; Bailey, J N; Cleaves, G R; Dembinski, D R; Gocke, C R; Joliffe, L K; MacWright, R S; McAllister, W T

    1985-01-01

    The RNA polymerases encoded by bacteriophages T3 and T7 have similar structures, but exhibit nearly exclusive template specificities. We have determined the nucleotide sequence of the region of T3 DNA that encodes the T3 RNA polymerase (the gene 1.0 region), and have compared this sequence with the corresponding region of T7 DNA. The predicted amino acid sequence of the T3 RNA polymerase exhibits very few changes when compared to the T7 enzyme (82% of the residues are identical). Significant differences appear to cluster in three distinct regions in the amino-terminal half of the protein. Analysis of the data from both enzymes suggests features that may be important for polymerase function. In particular, a region that differs between the T3 and T7 enzymes exhibits significant homology to the bi-helical domain that is common to many sequence-specific DNA binding proteins. The region that flanks the structural gene contains a number of regulatory elements including: a promoter for the E. coli RNA polymerase, a potential processing site for RNase III and a promoter for the T3 polymerase. The promoter for the T3 RNA polymerase is located only 12 base pairs distal to the stop codon for the structural gene. PMID:3903658

  19. Identification of the nik Gene Cluster of Brucella suis: Regulation and Contribution to Urease Activity

    Science.gov (United States)

    Jubier-Maurin, Véronique; Rodrigue, Agnès; Ouahrani-Bettache, Safia; Layssac, Marion; Mandrand-Berthelot, Marie-Andrée; Köhler, Stephan; Liautard, Jean-Pierre

    2001-01-01

    Analysis of a Brucella suis 1330 gene fused to a gfp reporter, and identified as being induced in J774 murine macrophage-like cells, allowed the isolation of a gene homologous to nikA, the first gene of the Escherichia coli operon encoding the specific transport system for nickel. DNA sequence analysis of the corresponding B. suis nik locus showed that it was highly similar to that of E. coli except for localization of the nikR regulatory gene, which lies upstream from the structural nikABCDE genes and in the opposite orientation. Protein sequence comparisons suggested that the deduced nikABCDE gene products belong to a periplasmic binding protein-dependent transport system. The nikA promoter-gfp fusion was activated in vitro by low oxygen tension and metal ion deficiency and was repressed by NiCl2 excess. Insertional inactivation of nikA strongly reduced the activity of the nickel metalloenzyme urease, which was restored by addition of a nickel excess. Moreover, the nikA mutant of B. suis was functionally complemented with the E. coli nik gene cluster, leading to the recovery of urease activity. Reciprocally, an E. coli strain harboring a deleted nik operon recovered hydrogenase activity by heterologous complementation with the B. suis nik locus. Taking into account these results, we propose that the nik locus of B. suis encodes a nickel transport system. The results further suggest that nickel could enter B. suis via other transport systems. Intracellular growth rates of the B. suis wild-type and nikA mutant strains in human monocytes were similar, indicating that nikA was not essential for this step of infection. We discuss a possible role of nickel transport in maintaining enzymatic activities which could be crucial for survival of the bacteria under the environmental conditions encountered within the host. PMID:11133934

  20. Comparisons of Graph-structure Clustering Methods for Gene Expression Data

    Institute of Scientific and Technical Information of China (English)

    Zhuo FANG; Lei LIU; Jiong YANG; Qing-Ming LUO; Yi-Xue LI

    2006-01-01

    Although many numerical clustering algorithms have been applied to gene expression data analysis, the essential step is still biological interpretation by manual inspection. The correlation between genetic co-regulation and affiliation to a common biological process is what biologists expect. Here, we introduce some clustering algorithms that are based on graph structure constituted by biological knowledge. After applying a widely used dataset, we compared the resulting clusters of two of these algorithms in terms of the homogeneity of clusters, coherence of annotation and matching ratio. The results show that the clusters of knowledge-guided analysis are the kernel parts of the clusters of Gene Ontology (GO)-Cluster software, which contain the genes whose expression is most strongly correlated and most consistent with biological functions. Moreover, knowledge-guided analysis seems much more applicable than GO-Cluster in a larger dataset.

  1. Prevalence and characteristics of pks genotoxin gene cluster-positive clinical Klebsiella pneumoniae isolates in Taiwan

    Science.gov (United States)

    Chen, Ying-Tsong; Lai, Yi-Chyi; Tan, Mei-Chen; Hsieh, Li-Yun; Wang, Jann-Tay; Shiau, Yih-Ru; Wang, Hui-Ying; Lin, Ann-Chi; Lai, Jui-Fen; Huang, I-Wen; Lauderdale, Tsai-Ling

    2017-01-01

    The pks gene cluster encodes enzymes responsible for the synthesis of colibactin, a genotoxin that has been shown to induce DNA damage and contribute to increased virulence. The present study investigated the prevalence of pks in clinical K. pneumoniae isolates from a national surveillance program in Taiwan, and identified microbiological and molecular factors associated with pks-carriage. The pks gene cluster was detected in 67 (16.7%) of 400 isolates from various specimen types. Multivariate analysis revealed that isolates of K1, K2, K20, and K62 capsular types (p < 0.001), and those more susceptible to antimicrobial agents (p = 0.001) were independent factors strongly associated with pks-carriage. Phylogenetic studies on the sequence type (ST) and pulsed-field gel electrophoresis patterns indicated that the pks-positive isolates belong to a clonal group of ST23 in K1, a locally expanding ST65 clone in K2, a ST268-related K20 group, and a highly clonal ST36:K62 group. Carriage of rmpA, iutC, and ybtA, the genes associated with hypervirulence, was significantly higher in the pks-positive isolates than the pks-negative isolates (95.5% vs. 13.2%, p < 0.001). Further studies to determine the presence of hypervirulent pks-bearing bacterial populations in the flora of community residents and their association with different disease entities may be warranted. PMID:28233784

  2. Accurate prediction of secondary metabolite gene clusters in filamentous fungi

    DEFF Research Database (Denmark)

    Andersen, Mikael Rørdam; Nielsen, Jakob Blæsbjerg; Klitgaard, Andreas

    2013-01-01

    Biosynthetic pathways of secondary metabolites from fungi are currently subject to an intense effort to elucidate the genetic basis for these compounds due to their large potential within pharmaceutics and synthetic biochemistry. The preferred method is methodical gene deletions to identify suppo...... used A. nidulans for our method development and validation due to the wealth of available biochemical data, but the method can be applied to any fungus with a sequenced and assembled genome, thus supporting further secondary metabolite pathway elucidation in the fungal kingdom....

  3. Molecular diversity at the major cluster of disease resistance genes in cultivated and wild Lactuca spp.

    Science.gov (United States)

    Sicard, D; Woo, S S; Arroyo-Garcia, R; Ochoa, O; Nguyen, D; Korol, A; Nevo, E; Michelmore, R

    1999-08-01

    Diversity was analyzed in wild and cultivated Lactuca germplasm using molecular markers derived from resistance genes of the NBS-LRR type. Three molecular markers, one microsatellite marker and two SCAR markers that amplified LRR-encoding regions, were developed from sequences of resistance gene homologs at the main resistance gene cluster in lettuce. Variation for these markers was assessed in germplasm including accessions of cultivated lettuce, Lactuca sativa L. and three wild Lactuca spp., L. serriola L., L. saligna and L. virosa L. Diversity was also studied within and between natural populations of L. serriola from Israel and California; the former is close to the center of diversity for Lactuca spp. while the latter is an area of more recent colonization. Large numbers of haplotypes were detected indicating the presence of numerous resistance genes in wild species. The diversity in haplotypes provided evidence for gene duplication and unequal crossing-over during the evolution of this cluster of resistance genes. However, there was no evidence for duplications and deletions within the LRR-encoding regions studied. The three markers were highly correlated with resistance phenotypes in L. sativa. They were able to discriminate between accessions that had previously been shown to be resistant to all known isolates of Bremia lactucae. Therefore, these markers will be highly informative for the establishment of core collections and marker-aided selection. A hierarchical analysis of the population structure of L. serriola showed that countries, as well as locations, were significantly differentiated. These differences may reflect local founder effects and/or divergent selection.

  4. Whole genome sequencing as a tool to investigate a cluster of seven cases of listeriosis in Austria and Germany, 2011-2013.

    Science.gov (United States)

    Schmid, D; Allerberger, F; Huhulescu, S; Pietzka, A; Amar, C; Kleta, S; Prager, R; Preußel, K; Aichinger, E; Mellmann, A

    2014-05-01

    A cluster of seven human cases of listeriosis occurred in Austria and in Germany between April 2011 and July 2013. The Listeria monocytogenes serovar (SV) 1/2b isolates shared pulsed-field gel electrophoresis (PFGE) and fluorescent amplified fragment length polymorphism (fAFLP) patterns indistinguishable from those from five food producers. The seven human isolates, a control strain with a different PFGE/fAFLP profile and ten food isolates were subjected to whole genome sequencing (WGS) in a blinded fashion. A gene-by-gene comparison (multilocus sequence typing (MLST)+) was performed, and the resulting whole genome allelic profiles were compared using SeqSphere(+) software version 1.0. On analysis of 2298 genes, the four human outbreak isolates from 2012 to 2013 had different alleles at ≤6 genes, i.e. differed by ≤6 genes from each other; the dendrogram placed these isolates in between five Austrian unaged soft cheese isolates from producer A (≤19-gene difference from the human cluster) and two Austrian ready-to-eat meat isolates from producer B (≤8-gene difference from the human cluster). Both food products appeared on grocery bills prospectively collected by these outbreak cases after hospital discharge. Epidemiological results on food consumption and MLST+ clearly separated the three cases in 2011 from the four 2012-2013 outbreak cases (≥48 different genes). We showed that WGS is capable of discriminating L. monocytogenes SV1/2b clones not distinguishable by PFGE and fAFLP. The listeriosis outbreak described clearly underlines the potential of sequence-based typing methods to offer enhanced resolution and comparability of typing systems for public health applications.
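
    The gene-by-gene comparison described above amounts to counting the loci at which two allelic profiles carry different allele numbers. The following is a minimal sketch of that idea only, not the SeqSphere+ implementation; the isolate names, locus names, allele numbers and the treatment of missing loci (skipped) are all illustrative assumptions.

```python
from itertools import combinations


def allele_distance(profile_a, profile_b):
    """Count loci with different allele numbers.

    Loci absent from either profile, or recorded as None (not called),
    are skipped. This handling of missing data is an assumption.
    """
    differences = 0
    for locus, allele_a in profile_a.items():
        allele_b = profile_b.get(locus)
        if allele_a is None or allele_b is None:
            continue
        if allele_a != allele_b:
            differences += 1
    return differences


def pairwise_distances(profiles):
    """Return {(isolate_1, isolate_2): distance} for all isolate pairs."""
    return {
        (a, b): allele_distance(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


# Hypothetical toy profiles: locus name -> allele number (None = not called).
profiles = {
    "human_1": {"locus1": 1, "locus2": 4, "locus3": 2, "locus4": 7},
    "human_2": {"locus1": 1, "locus2": 4, "locus3": 3, "locus4": 7},
    "food_A": {"locus1": 2, "locus2": 5, "locus3": 3, "locus4": None},
}

distances = pairwise_distances(profiles)
for pair, distance in distances.items():
    print(pair, distance)
```

    A threshold on such distances (for example the ≤6-gene difference reported for the 2012-2013 outbreak isolates) could then be used to group isolates; the choice of threshold is study-specific.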

  5. Selections of data preprocessing methods and similarity metrics for gene cluster analysis

    Institute of Scientific and Technical Information of China (English)

    YANG Chunmei; WAN Baikun; GAO Xiaofeng

    2006-01-01

    Clustering is one of the major exploratory techniques for gene expression data analysis. Only with suitable similarity metrics and when datasets are properly preprocessed, can results of high quality be obtained in cluster analysis. In this study, gene expression datasets with external evaluation criteria were preprocessed as normalization by line, normalization by column or logarithm transformation by base-2, and were subsequently clustered by hierarchical clustering, k-means clustering and self-organizing maps (SOMs) with Pearson correlation coefficient or Euclidean distance as similarity metric. Finally, the quality of clusters was evaluated by adjusted Rand index. The results illustrate that k-means clustering and SOMs have distinct advantages over hierarchical clustering in gene clustering, and SOMs are a bit better than k-means when randomly initialized. It also shows that hierarchical clustering prefers Pearson correlation coefficient as similarity metric and dataset normalized by line. Meanwhile, k-means clustering and SOMs can produce better clusters with Euclidean distance and logarithm transformed datasets. These results will afford valuable reference to the implementation of gene expression cluster analysis.
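
    The adjusted Rand index used above to score clusterings against external criteria can be computed from the contingency table of the two labelings. The sketch below is a plain-Python illustration of the standard pair-counting formula, not the authors' code; the function name and the convention of returning 1.0 for degenerate inputs are assumptions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: treated as perfect agreement

    # Pairs of items placed together in both, in the reference, in the prediction.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = together_true * together_pred / total_pairs
    maximum = (together_true + together_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (together_both - expected) / (maximum - expected)


# Cluster labels are arbitrary names, so a relabeled copy scores 1.0.
same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same, crossed)
```

    Because the index is corrected for chance, a clustering that cuts across the reference classes can score below zero, as the second call illustrates.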

  6. Imprinted genes show unique patterns of sequence conservation

    Directory of Open Access Journals (Sweden)

    Helms Volkhard

    2010-11-01

    Full Text Available Abstract Background Genomic imprinting is an evolutionarily conserved mechanism of epigenetic gene regulation in placental mammals that results in silencing of one of the parental alleles. In order to decipher interactions between allele-specific DNA methylation of imprinted genes and evolutionary conservation, we performed a genome-wide comparative investigation of genomic sequences and highly conserved elements of imprinted genes in human and mouse. Results Evolutionarily conserved elements in imprinted regions differ from those associated with autosomal genes in various ways. Whereas for maternally expressed genes strong divergence of protein-encoding sequences is most prominent, paternally expressed genes exhibit substantial conservation of coding and noncoding sequences. Conserved elements in imprinted regions are marked by enrichment of CpG dinucleotides and low (TpG+CpA)/(2·CpG) ratios indicate reduced CpG deamination. Interestingly, paternally and maternally expressed genes can be distinguished by differences in G+C and CpG contents that might be associated with unusual epigenetic features. Especially noncoding conserved elements of paternally expressed genes are exceptionally G+C and CpG rich. In addition, we confirmed a frequent occurrence of intronic CpG islands and observed a decelerated degeneration of ancient LINE-1 repeats. We also found a moderate enrichment of YY1 and CTCF binding sites in imprinted regions and identified several short sequence motifs in highly conserved elements that might act as additional regulatory elements. Conclusions We discovered several novel conserved DNA features that might be related to allele-specific DNA methylation. Our results hint at reduced CpG deamination rates in imprinted regions, which affects mostly noncoding conserved elements of paternally expressed genes. Pronounced differences between maternally and paternally expressed genes imply specific modes of evolution as a result of differences in
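
    The (TpG+CpA)/(2·CpG) ratio mentioned in this abstract can be computed directly from dinucleotide counts on one strand, since deamination of a methylated CpG leaves TpG on that strand and CpA on the complement. The sketch below is only an illustration of the ratio as written; the function names are hypothetical and counting overlapping dinucleotides on a single strand is an assumption about the procedure.

```python
def count_dinucleotide(sequence, dinucleotide):
    """Count occurrences of a dinucleotide, including overlapping positions."""
    return sum(
        1
        for i in range(len(sequence) - 1)
        if sequence[i:i + 2] == dinucleotide
    )


def deamination_ratio(sequence):
    """Return (TpG + CpA) / (2 * CpG), or None when the sequence has no CpG."""
    sequence = sequence.upper()
    cpg = count_dinucleotide(sequence, "CG")
    if cpg == 0:
        return None  # ratio undefined without any CpG
    tpg = count_dinucleotide(sequence, "TG")
    cpa = count_dinucleotide(sequence, "CA")
    return (tpg + cpa) / (2 * cpg)


# Toy sequences: a CpG-rich stretch gives a low ratio,
# a stretch with CpG decay products gives a higher one.
print(deamination_ratio("ACGTCGCG"))
print(deamination_ratio("TGTGCACG"))
```

    Lower values correspond to better-preserved CpG sites, which is the sense in which the abstract reads low ratios as reduced CpG deamination.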

  7. The type F6 neurotoxin gene cluster locus of group II Clostridium botulinum has evolved by successive disruption of two different ancestral precursors.

    Science.gov (United States)

    Carter, Andrew T; Stringer, Sandra C; Webb, Martin D; Peck, Michael W

    2013-01-01

    Genome sequences of five different Group II (nonproteolytic) Clostridium botulinum type F6 strains were compared at a 50-kb locus containing the neurotoxin gene cluster. A clonal origin for these strains is indicated by the fact that sequences were identical except for strain Eklund 202F, with 10 single-nucleotide polymorphisms and a 15-bp deletion. The essential topB gene encoding topoisomerase III was found to have been split by the apparent insertion of 34.4 kb of foreign DNA (in a similar manner to that in Group II C. botulinum type E where the rarA gene has been disrupted by a neurotoxin gene cluster). The foreign DNA, which includes the intact 13.6-kb type F6 neurotoxin gene cluster, bears not only a newly introduced topB gene but also two nonfunctional botulinum neurotoxin gene remnants, a type B and a type E. This observation, combined with the discovery of bacteriophage integrase genes and IS4 elements, suggests that several rounds of recombination/horizontal gene transfer have occurred at this locus. The simplest explanation for the current genotype is that the ancestral bacterium, a Group II C. botulinum type B strain, received DNA firstly from a strain containing a type E neurotoxin gene cluster, then from a strain containing a type F6 neurotoxin gene cluster. Each event disrupted the previously functional neurotoxin gene. This degree of successive recombination at one hot spot is without precedent in C. botulinum, and it is also the first description of a Group II C. botulinum genome containing more than one neurotoxin gene sequence.

  8. Evidence for the Universality of Properties of Red-Sequence Galaxies in X-ray- and Red-Sequence-Selected Clusters at z ~ 1

    CERN Document Server

    Foltz, Ryan; Wilson, Gillian; van der Burg, Remco; Muzzin, Adam; Lidman, Chris; Demarco, Ricardo; Nantais, Julie; DeGroot, Andrew; Yee, Howard

    2015-01-01

    We study the slope, intercept, and scatter of the color-magnitude and color-mass relations for a sample of ten infrared red-sequence-selected clusters at z ~ 1. The quiescent galaxies in these clusters formed the bulk of their stars above z ~ 3 with an age spread Δt ~ 1 Gyr. We compare UVJ color-color and spectroscopic-based galaxy selection techniques, and find a 15% difference in the galaxy populations classified as quiescent by these methods. We compare the color-magnitude relations from our red-sequence selected sample with X-ray- and photometric-redshift-selected cluster samples of similar mass and redshift. Within uncertainties, we are unable to detect any difference in the ages and star formation histories of quiescent cluster members in clusters selected by different methods, suggesting that the dominant quenching mechanism is insensitive to cluster baryon partitioning at z ~ 1.

  9. EVIDENCE FOR THE UNIVERSALITY OF PROPERTIES OF RED-SEQUENCE GALAXIES IN X-RAY- AND RED-SEQUENCE-SELECTED CLUSTERS AT z ∼ 1

    Energy Technology Data Exchange (ETDEWEB)

    Foltz, R.; Wilson, G.; DeGroot, A. [Department of Physics and Astronomy, University of California Riverside, 900 University Avenue, Riverside, CA 92521 (United States); Rettura, A. [Infrared Processing and Analysis Center, California Institute of Technology, KS 314-6, Pasadena, CA 91125 (United States); Van der Burg, R. F. J. [Laboratoire AIM, IRFU/Service d’Astrophysique—CEA/DSM—CNRS—Université Paris Diderot, Bât. 709, CEA-Saclay, F-91191 Gif-sur-Yvette Cedex (France); Muzzin, A. [Institute of Astronomy, University of Cambridge, Madingley Road, Cambridge, CB3 0HA (United Kingdom); Lidman, C. [Australian Astronomical Observatory, P.O. Box 915, North Ryde NSW 1670 (Australia); Demarco, R. [Department of Astronomy, Universidad de Concepcion, Barrio Universitario. Casilla 160-C, Concepcion (Chile); Nantais, Julie [Grupo Astronomía, Departamento de Ciencias Físicas, Universidad Andrés Bello, República 220, Santiago (Chile); Yee, H., E-mail: ryan.foltz@email.ucr.edu, E-mail: gillian.wilson@ucr.edu, E-mail: adegr001@ucr.edu, E-mail: arettura@astro.caltech.edu, E-mail: remco.van-der-burg@cea.fr, E-mail: avmuzzin@ast.cam.ac.uk, E-mail: clidman@aao.gov.au, E-mail: rdemarco@astro-udec.cl, E-mail: julie.nantais@unab.cl, E-mail: hyee@astro.utoronto.ca [Dept of Astronomy and Astrophysics, University of Toronto, 50 Saint George Street, Toronto, ON M5S 3H4 (Canada)

    2015-10-20

    We study the slope, intercept, and scatter of the color–magnitude and color–mass relations for a sample of 10 infrared red-sequence-selected clusters at z ∼ 1. The quiescent galaxies in these clusters formed the bulk of their stars above z ≳ 3 with an age spread Δt ≳ 1 Gyr. We compare UVJ color–color and spectroscopic-based galaxy selection techniques, and find a 15% difference in the galaxy populations classified as quiescent by these methods. We compare the color–magnitude relations from our red-sequence selected sample with X-ray- and photometric-redshift-selected cluster samples of similar mass and redshift. Within uncertainties, we are unable to detect any difference in the ages and star formation histories of quiescent cluster members in clusters selected by different methods, suggesting that the dominant quenching mechanism is insensitive to cluster baryon partitioning at z ∼ 1.

  10. Comparative Analysis of Cluster Validity Indices in Identifying Some Possible Genes Mediating Certain Cancers.

    Science.gov (United States)

    Ghosh, Anupam; Dhara, Bibhas Chandra; De, Rajat K

    2013-04-01

    In this article, we compare the performance of 19 cluster validity indices, in identifying some possible genes mediating certain cancers, based on gene expression data. For the purpose of this comparison, we have developed a method. The proposed method involves cluster generation, selection of the best k-value or c-values, cluster identification, identifying the altered gene cluster, scoring an altered gene cluster and determining the best k-value or c-value exploring through biological repositories. The effectiveness of the method has been demonstrated on three gene expression data sets dealing with human lung cancer, colon cancer, and leukemia. Here, we have used three clustering algorithms, i.e., k-means, PAM and fuzzy c-means. We have used biochemical pathways related to these cancers and p-value statistics for validating the study. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  11. Multilocus sequence typing of Candida tropicalis shows clonal cluster enrichment in azole-resistant isolates from patients in Shanghai, China.

    Science.gov (United States)

    Wang, Ying; Shi, Ce; Liu, Jin-Yan; Li, Wen-Jing; Zhao, Yue; Xiang, Ming-Jie

    2016-10-01

    To explore the putative correlation between the multilocus sequence types (MLST) and antifungal susceptibility of clinical Candida tropicalis isolates in Mainland China. Eighty-two clinical C. tropicalis isolates were collected from sixty-nine patients at Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China, from July 2012 to February 2015, and antifungal susceptibility tests were performed. Genetic profiles of those 82 isolates (30 azole-resistant and 52 azole-susceptible) were characterised by multilocus sequence typing. Phylogenetic analysis of the data was conducted with the clustering method, using UPGMA (unweighted pair group method with arithmetic averages) and the minimal spanning tree algorithm. MLST clonal clusters were analysed using the eBURST V3 package. Of the six gene fragments identified in multilocus sequence typing, SAPT4 presented the highest typing efficiency, whereas SAPT2 was the least efficient. Of the 44 diploid sequence types (DSTs) differentiated, 32 DSTs and 12 genotypes were identified as new to the C. tropicalis DST database. Twenty (45.45%) of the 44 DSTs were assigned to seven major groups based on eBURST analysis. Of these, Group 6, which contained DST 376, DST 505, DST 506 and DST 507, accounted for 76.7% of the 30 azole-resistant isolates. However, the genetic relationships among the azole-susceptible isolates were relatively decentralised. This MLST analysis of the putative correlation between the MLST types and antifungal susceptibility of clinical C. tropicalis isolates in Mainland China shows that DSTs 376, 505, 506 and 507 are closely related azole-resistant C. tropicalis clones.

  12. Maize Gene Atlas Developed by RNA Sequencing and Comparative Evaluation of Transcriptomes Based on RNA Sequencing and Microarrays

    Science.gov (United States)

    Sekhon, Rajandeep S.; Briskine, Roman; Hirsch, Candice N.; Myers, Chad L.; Springer, Nathan M.; Buell, C. Robin; de Leon, Natalia; Kaeppler, Shawn M.

    2013-01-01

    Transcriptome analysis is a valuable tool for identification and characterization of genes and pathways underlying plant growth and development. We previously published a microarray-based maize gene atlas from the analysis of 60 unique spatially and temporally separated tissues from 11 maize organs [1]. To enhance the coverage and resolution of the maize gene atlas, we have analyzed 18 selected tissues representing five organs using RNA sequencing (RNA-Seq). For a direct comparison of the two methodologies, the same RNA samples originally used for our microarray-based atlas were evaluated using RNA-Seq. Both technologies produced similar transcriptome profiles as evident from high Pearson's correlation statistics ranging from 0.70 to 0.83, and from nearly identical clustering of the tissues. RNA-Seq provided enhanced coverage of the transcriptome, with 82.1% of the filtered maize genes detected as expressed in at least one tissue by RNA-Seq compared to only 56.5% detected by microarrays. Further, from the set of 465 maize genes that have been historically well characterized by mutant analysis, 427 show significant expression in at least one tissue by RNA-Seq compared to 390 by microarray analysis. RNA-Seq provided higher resolution for identifying tissue-specific expression as well as for distinguishing the expression profiles of closely related paralogs as compared to microarray-derived profiles. Co-expression analysis derived from the microarray and RNA-Seq data revealed that broadly similar networks result from both platforms, and that co-expression estimates are stable even when constructed from mixed data including both RNA-Seq and microarray expression data. The RNA-Seq information provides a useful complement to the microarray-based maize gene atlas and helps to further understand the dynamics of transcription during maize development. PMID:23637782
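
    The platform comparison above rests on Pearson's correlation between expression profiles of the same samples measured by two technologies. A minimal plain-Python sketch of that statistic follows; the expression values are invented for illustration, and any transformation applied before correlating (for example a log transform) is left out as an unknown.

```python
from math import sqrt


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance_x = sum((a - mean_x) ** 2 for a in x)
    variance_y = sum((b - mean_y) ** 2 for b in y)
    if variance_x == 0 or variance_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(variance_x * variance_y)


# Hypothetical expression of five genes in one tissue on two platforms.
microarray = [2.1, 5.3, 7.8, 3.3, 9.0]
rna_seq = [1.8, 5.9, 8.4, 2.9, 9.7]

r = pearson_correlation(microarray, rna_seq)
print(round(r, 3))
```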

  13. Morphological evolution of cluster red sequence galaxies in the past 9 Gyr

    CERN Document Server

    De Propris, Roberto; Phillipps, Steve

    2016-01-01

    Galaxies arrive on the red sequences of clusters at high redshift (z > 1) once their star formation is quenched and evolve passively thereafter. However, we have previously found that cluster red sequence galaxies (CRSGs) undergo significant morphological evolution subsequent to the cessation of star formation, at some point in the past 9-10 Gyr. Through a detailed study of a large sample of cluster red sequence galaxies spanning $0.2

  14. Structural organization, sequence, and expression of the mouse HEXA gene encoding the alpha subunit of hexosaminidase A.

    Science.gov (United States)

    Wakamatsu, N; Benoit, G; Lamhonwah, A M; Zhang, Z X; Trasler, J M; Triggs-Raine, B L; Gravel, R A

    1994-11-01

    Genomic clones of the mouse HEXA gene encoding the alpha subunit of lysosomal beta-hexosaminidase A have been isolated, analyzed, and sequenced. The HEXA gene spans approximately 26 kb and consists of 14 exons and 13 introns. The 5' flanking region of the gene has three candidate GC boxes and a number of potential promoter and regulatory elements. Promoter analysis using deletion constructs of 5' flanking sequence fused to the bacterial chloramphenicol acetyltransferase (CAT) gene showed that 150 bp of 5' sequence was sufficient for expression in transfected monkey kidney COS cells. Determination of the sequence of the 5' end of the Hex alpha mRNA by an "anchor-ligation PCR" procedure showed that transcription is initiated from a cluster of sites centered -42, -32, and -21 bp from the first in-frame ATG. Northern blot analysis from 11 different tissues showed over five times the steady-state level of Hex alpha mRNA in testis as compared to that found in three different brain regions; the lowest level (about 1/3 of brain) was found in liver. Comparison of the 5' flanking sequence with that of the human HEXA gene revealed 78% identity within the first 100 bp. These data suggest that the mouse HEXA gene is controlled mainly by sequences located within 150 bp of the 5' flanking region, and we speculate that it may have a role, not only in brain and other tissues, but also in reproductive function in the adult male mouse.

  15. Molecular phylogenetic and sequence variation analysis of dimeric α-amylase inhibitor genes in wheat and its wild relative species

    Directory of Open Access Journals (Sweden)

    Bharati Pandey

    2016-06-01

    Full Text Available Dimeric alpha-amylase inhibitors provide protection against insects that are highly dependent on starch for their energy. In order to study their molecular evolution and sequence variation, we have sequenced dimeric α-amylase inhibitor genes from different genomes in Triticeae, including Indian bread and durum wheat genotypes. Using BLAST, the obtained sequences show very high homology with other inhibitors available in the GenBank database and share 10 conserved cysteine residues. The frequency of significant SNPs in the α-amylase inhibitor gene was 1 out of 60 bases. The phylogenetic analysis based on deduced amino acid sequences revealed that the genes encoding dimeric α-amylase inhibitors formed three groups and that genes isolated from Indian bread wheat clustered with 0.19 inhibitors. In addition, we predicted that dimeric α-amylase inhibitors co-localize to chloroplast and mitochondria, except for the sequences isolated from Aegilops tauschii. Fingerprinting analysis done with ScanProsite confirmed biologically meaningful signatures. Multiple sequence alignment of dimeric α-amylase proteins from different plant species revealed a conserved secondary structure region, indicating homology at the sequence and structural levels. The protein sequences obtained from wheat and its wild relative species are very similar, indicating high conservation of these proteins.

  16. The gsdf gene locus harbors evolutionary conserved and clustered genes preferentially expressed in fish previtellogenic oocytes.

    Science.gov (United States)

    Gautier, Aude; Le Gac, Florence; Lareyre, Jean-Jacques

    2011-02-01

    display a different cellular localization compared to that of the gsdf gene, indicating that the latter gene is not co-regulated. Interestingly, our study identifies new clustered genes that are specifically expressed in previtellogenic oocytes (nup54, aff1, klhl8, sdad1).

  17. Genetic Diversity among Parents of Hybrid Rice Based on Cluster Analysis of Morphological Traits and Simple Sequence Repeat Markers

    Institute of Scientific and Technical Information of China (English)

    WANG Sheng-jun; LU Zuo-mei; WAN Jian-min

    2006-01-01

    The genetic diversity of 41 parental lines popularized in commercial hybrid rice production in China was studied by using cluster analysis of morphological traits and simple sequence repeat (SSR) markers. Forty-one entries were assigned into two clusters (i.e., early or medium-maturing cluster; medium or late-maturing cluster) and further assigned into six sub-clusters based on morphological trait cluster analysis. The early or medium-maturing cluster was composed of 15 maintainer lines, four early-maturing restorer lines and two thermo-sensitive genic male sterile lines, and the medium or late-maturing cluster included 16 restorer lines and 4 medium or late-maturing maintainer lines. Moreover, the SSR cluster analysis classified 41 entries into two clusters (i.e., maintainer line cluster and restorer line cluster) and seven sub-clusters. The maintainer line cluster consisted of all 19 maintainer lines and two thermo-sensitive genic male sterile lines, while the restorer line cluster was composed of all 20 restorer lines. The SSR analysis fitted better with the pedigree information. From the viewpoint of hybrid rice breeding, the results suggested that SSR analysis might be a better method to study the diversity of parental lines in indica hybrid rice.

  18. A cluster finding algorithm based on the multi-band identification of red-sequence galaxies

    CERN Document Server

    Oguri, Masamune

    2014-01-01

    We present a new algorithm, CAMIRA, to identify clusters of galaxies in wide-field imaging survey data. We base our algorithm on the stellar population synthesis model to predict colours of red-sequence galaxies at a given redshift for an arbitrary set of bandpass filters, with additional calibration using a sample of spectroscopic galaxies to improve the accuracy of the model prediction. We run the algorithm on ~11960 deg^2 of imaging data from the Sloan Digital Sky Survey (SDSS) Data Release 8 to construct a catalogue of 71743 clusters in the redshift range 0.1cluster catalogue with external cluster catalogues to find that our photometric cluster redshift estimates are accurate with low bias and scatter, and that the corrected richness correlates well with X-ray luminosities and temperatures. We use the publicly available Canada-France-Hawaii Telescope Lensing Survey (CFHTLenS) she...

  19. A comparative analysis of the observed white dwarf cooling sequence from globular clusters

    CERN Document Server

    Campos, Fabíola; Romero, A D; Kepler, S O; Ourique, G; Costa, J E S; Bonatto, C J; Winget, D E; Montgomery, M H; Pacheco, T A; Bedin, L R

    2015-01-01

    We report our study of features at the observed red end of the white dwarf cooling sequences for three Galactic globular clusters: NGC 6397, 47 Tucanae and M 4. We use deep colour-magnitude diagrams constructed from archival Hubble Space Telescope (ACS) to systematically investigate the blue turn at faint magnitudes and the age determinations for each cluster. We find that the age difference between NGC 6397 and 47 Tuc is 1.98 (+0.44/-0.26) Gyr, consistent with the picture that metal-rich halo clusters were formed later than metal-poor halo clusters. We self-consistently include the effect of metallicity on the progenitor age and the initial-to-final mass relation. In contrast with previous investigations that invoked a single white dwarf mass for each cluster, the data shows a spread of white dwarf masses that better reproduce the shape and location of the blue turn. This effect alone, however, does not completely reproduce the observational data - the blue turn retains some mystery. In this contex...

  20. Informational structure of genetic sequences and nature of gene splicing

    Science.gov (United States)

    Trifonov, E. N.

    1991-10-01

    Only about 1/20 of the DNA of higher organisms codes for proteins, by means of the classical triplet code. The rest of the DNA sequence is largely silent, with unclear functions, if any. The triplet code is not the only code (message) carried by the sequences. There are three levels of molecular communication, where the same sequence "talks" to various biomolecules, while having, respectively, three different appearances: DNA, RNA and protein. Since the molecular structures and, hence, sequence-specific preferences of these are substantially different, the original DNA sequence has to carry simultaneously three types of sequence patterns (codes, messages), thus being a composite structure in which one and the same letter (nucleotide) is frequently involved in several overlapping codes of different nature. This multiplicity and overlapping of the codes is a unique feature of Gnomic, the language of genetic sequences. The coexisting codes have to be degenerate in various degrees to allow an optimal and concerted performance of all the encoded functions. There is an obvious conflict between the best possible performance of a given function and the necessity to compromise the quality of a given sequence pattern in favor of other patterns. It appears that the major role of various changes in the sequences on their "ontogenetic" way from DNA to RNA to protein, like RNA editing and splicing, or protein post-translational modifications, is to resolve such conflicts. New data are presented strongly indicating that gene splicing is such a device to resolve the conflict between the code of DNA folding in chromatin and the triplet code for protein synthesis.

  1. Challenges in microarray class discovery: a comprehensive examination of normalization, gene selection and clustering

    Science.gov (United States)

    2010-01-01

    Background Cluster analysis, and in particular hierarchical clustering, is widely used to extract information from gene expression data. The aim is to discover new classes, or sub-classes, of either individuals or genes. Performing a cluster analysis commonly involves decisions on how to handle missing values, standardize the data and select genes. In addition, pre-processing, involving various types of filtration and normalization procedures, can have an effect on the ability to discover biologically relevant classes. Here we consider cluster analysis in a broad sense and perform a comprehensive evaluation that covers several aspects of cluster analyses, including normalization. Result We evaluated 2780 cluster analysis methods on seven publicly available 2-channel microarray data sets with common reference designs. Each cluster analysis method differed in data normalization (5 normalizations were considered), missing value imputation (2), standardization of data (2), gene selection (19) or clustering method (11). The cluster analyses are evaluated using known classes, such as cancer types, and the adjusted Rand index. The performances of the different analyses vary between the data sets and it is difficult to give general recommendations. However, normalization, gene selection and clustering method are all variables that have a significant impact on the performance. In particular, gene selection is important and it is generally necessary to include a relatively large number of genes in order to get good performance. Selecting genes with high standard deviation or using principal component analysis are shown to be the preferred gene selection methods. Hierarchical clustering using Ward's method, k-means clustering and Mclust are the clustering methods considered in this paper that achieve the highest adjusted Rand. Normalization can have a significant positive impact on the ability to cluster individuals, and there are indications that background correction is
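
    One of the gene selection strategies named above, keeping the genes with the highest standard deviation across samples, is simple to sketch. The example below is a hedged illustration rather than the procedure used in the paper; the gene names, expression values, function name and the use of the sample standard deviation are assumptions.

```python
from statistics import stdev


def select_high_variance_genes(expression, n_genes):
    """Return the names of the n_genes genes with the largest sample
    standard deviation across samples, most variable first."""
    ranked = sorted(
        expression,
        key=lambda gene: stdev(expression[gene]),
        reverse=True,
    )
    return ranked[:n_genes]


# Hypothetical expression matrix: gene -> values across four samples.
expression = {
    "geneA": [1.0, 1.0, 1.0, 1.0],  # constant, uninformative for clustering
    "geneB": [0.0, 5.0, 0.0, 5.0],  # strongly variable
    "geneC": [2.0, 3.0, 2.0, 3.0],  # mildly variable
}

selected = select_high_variance_genes(expression, n_genes=2)
print(selected)
```

    The retained genes would then be passed to the clustering step; the abstract suggests that keeping a relatively large number of genes is generally needed for good performance.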

  2. Challenges in microarray class discovery: a comprehensive examination of normalization, gene selection and clustering

    Directory of Open Access Journals (Sweden)

    Landfors Mattias

    2010-10-01

    Full Text Available Abstract Background Cluster analysis, and in particular hierarchical clustering, is widely used to extract information from gene expression data. The aim is to discover new classes, or sub-classes, of either individuals or genes. Performing a cluster analysis commonly involves decisions on how to handle missing values, standardize the data and select genes. In addition, pre-processing, involving various types of filtration and normalization procedures, can have an effect on the ability to discover biologically relevant classes. Here we consider cluster analysis in a broad sense and perform a comprehensive evaluation that covers several aspects of cluster analyses, including normalization. Result We evaluated 2780 cluster analysis methods on seven publicly available 2-channel microarray data sets with common reference designs. Each cluster analysis method differed in data normalization (5 normalizations were considered), missing value imputation (2), standardization of data (2), gene selection (19) or clustering method (11). The cluster analyses are evaluated using known classes, such as cancer types, and the adjusted Rand index. The performances of the different analyses vary between the data sets and it is difficult to give general recommendations. However, normalization, gene selection and clustering method are all variables that have a significant impact on the performance. In particular, gene selection is important and it is generally necessary to include a relatively large number of genes in order to get good performance. Selecting genes with high standard deviation or using principal component analysis are shown to be the preferred gene selection methods. Hierarchical clustering using Ward's method, k-means clustering and Mclust are the clustering methods considered in this paper that achieve the highest adjusted Rand. Normalization can have a significant positive impact on the ability to cluster individuals, and there are indications that

  3. Evolution of C2H2-zinc finger genes and subfamilies in mammals: Species-specific duplication and loss of clusters, genes and effector domains

    Directory of Open Access Journals (Sweden)

    Aubry Muriel

    2008-06-01

    Full Text Available Abstract Background C2H2 zinc finger genes (C2H2-ZNF) constitute the largest class of transcription factors in humans and one of the largest gene families in mammals. Often arranged in clusters in the genome, these genes are thought to have undergone a massive expansion in vertebrates, primarily by tandem duplication. However, this view is based on limited datasets restricted to a single chromosome or a specific subset of genes belonging to the large KRAB domain-containing C2H2-ZNF subfamily. Results Here, we present the first comprehensive study of the evolution of the C2H2-ZNF family in mammals. We assembled the complete repertoire of human C2H2-ZNF genes (718 in total), about 70% of which are organized into 81 clusters across all chromosomes. Based on an analysis of their N-terminal effector domains, we identified two new C2H2-ZNF subfamilies encoding genes with a SET or a HOMEO domain. We searched for the syntenic counterparts of the human clusters in other mammals for which complete gene data are available: chimpanzee, mouse, rat and dog. Cross-species comparisons show a large variation in the numbers of C2H2-ZNF genes within homologous mammalian clusters, suggesting differential patterns of evolution. Phylogenetic analysis of selected clusters reveals that the disparity in C2H2-ZNF gene repertoires across mammals not only originates from differential gene duplication but also from gene loss. Further, we discovered variations among orthologs in the number of zinc finger motifs and association of the effector domains, the latter often undergoing sequence degeneration. Combined with phylogenetic studies, physical maps and an analysis of the exon-intron organization of genes from the SCAN and KRAB domains-containing subfamilies, this result suggests that the SCAN subfamily emerged first, followed by the SCAN-KRAB and finally by the KRAB subfamily. Conclusion Our results are in agreement with the "birth and death hypothesis" for the evolution of

  4. Identification of the microbiota in carious dentin lesions using 16S rRNA gene sequencing.

    Science.gov (United States)

    Obata, Junko; Takeshita, Toru; Shibata, Yukie; Yamanaka, Wataru; Unemori, Masako; Akamine, Akifumi; Yamashita, Yoshihisa

    2014-01-01

    While mutans streptococci have long been assumed to be the specific pathogen responsible for human dental caries, the concept of a complex dental caries-associated microbiota has received significant attention in recent years. Molecular analyses revealed the complexity of the microbiota with the predominance of Lactobacillus and Prevotella in carious dentin lesions. However, characterization of the dentin caries-associated microbiota has not been extensively explored in different ethnicities and races. In the present study, the bacterial communities in the carious dentin of Japanese subjects were analyzed comprehensively with molecular approaches using the 16S rRNA gene. Carious dentin lesion samples were collected from 32 subjects aged 4-76 years, and the 16S rRNA genes, amplified from the extracted DNA with universal primers, were sequenced with a pyrosequencer. The bacterial composition was classified into clusters I, II, and III according to the relative abundance (high, middle, low) of Lactobacillus. The bacterial composition in cluster II was composed of relatively high proportions of Olsenella and Propionibacterium or subdominated by heterogeneous genera. The bacterial communities in cluster III were characterized by the predominance of Atopobium, Prevotella, or Propionibacterium with Streptococcus or Actinomyces. Some samples in clusters II and III, mainly related to Atopobium and Propionibacterium, were novel combinations of microbiota in carious dentin lesions and may be characteristic of the Japanese population. Clone library analysis revealed that Atopobium sp. HOT-416 and P. acidifaciens were specific species associated with dentinal caries among these genera in a Japanese population. We summarized the bacterial composition of dentinal carious lesions in a Japanese population using next-generation sequencing and found typical Japanese types with Atopobium or Propionibacterium predominating.

  5. Phylogenetic diversity of Klebsiella pneumoniae and Klebsiella oxytoca clinical isolates revealed by randomly amplified polymorphic DNA, gyrA and parC genes sequencing and automated ribotyping.

    Science.gov (United States)

    Brisse, S; Verhoef, J

    2001-05-01

    The infra-specific phylogenetic diversity and genetic structure of both Klebsiella pneumoniae and Klebsiella oxytoca was investigated using a combination of randomly amplified polymorphic DNA (RAPD) analysis, sequencing of gyrA and parC genes, and automated ribotyping. After RAPD analysis with four independent primers of 120 clinical isolates collected from 22 European hospitals in 13 countries, K. pneumoniae isolates fell into three clusters and K. oxytoca isolates fell into two clusters, while Klebsiella planticola isolates formed a sixth cluster. Each cluster was geographically widespread. K. pneumoniae cluster I (KpI) accounted for 80% of the isolates of this species and included reference strains of the three subspecies K. pneumoniae subsp. pneumoniae, K. pneumoniae subsp. ozaenae and K. pneumoniae subsp. rhinoscleromatis. Clusters KpII and KpIII were equally represented, as were the two K. oxytoca clusters. Individualization of each cluster was fully confirmed by phylogenetic analysis of gyrA and parC gene sequences. In addition, sequence data supported the evolutionary separation of K. pneumoniae from a phylogenetic group including K. oxytoca, Klebsiella terrigena, K. planticola and Klebsiella ornithinolytica. Automated ribotyping using Mlu I appeared suitable for identification of each Klebsiella cluster. The adonitol fermentation test was found to be useful for cluster identification in K. pneumoniae, since it was negative in all strains of clusters KpIII and in some KpII strains, but always positive in cluster KpI. The usefulness of gyrA and parC sequence data for population genetics and cluster identification in bacteria was demonstrated, even for the phylogenetic positioning of quinolone-resistant isolates.

  6. Cloning, sequencing and phylogenic analysis of duck prion gene

    Institute of Scientific and Technical Information of China (English)

    WANG Qigui; ZHANG Lei; HU Xiaoxiang; FAN Baoliang; LI Ning; LI Hui; WU Changxin

    2004-01-01

    The duck prion gene was cloned and sequenced. Similar to mammalian prion protein (PrP), duck prion is encoded by a single exon of a single-copy gene in the genome, which was confirmed by Southern blot analysis. All of the structural features of mammalian PrP were also identified in the duck PrP. Compared with mammalian PrP, it exhibited a general similarity of 30%. When compared with chicken PrP, it showed a higher homology of 97%. A phylogenetic tree was constructed to trace the evolution of the prion gene in animals.

  7. Identification of rat genes by TWINSCAN gene prediction, RT-PCR, and direct sequencing

    DEFF Research Database (Denmark)

    Wu, Jia Qian; Shteynberg, David; Arumugam, Manimozhiyan

    2004-01-01

    The publication of a draft sequence of a third mammalian genome--that of the rat--suggests a need to rethink genome annotation. New mammalian sequences will not receive the kind of labor-intensive annotation efforts that are currently being devoted to human. In this paper, we demonstrate...... an alternative approach: reverse transcription-polymerase chain reaction (RT-PCR) and direct sequencing based on dual-genome de novo predictions from TWINSCAN. We tested 444 TWINSCAN-predicted rat genes that showed significant homology to known human genes implicated in disease but that were partially...

  8. Resolving arthropod phylogeny: exploring phylogenetic signal within 41 kb of protein-coding nuclear gene sequence.

    Science.gov (United States)

    Regier, Jerome C; Shultz, Jeffrey W; Ganley, Austen R D; Hussey, April; Shi, Diane; Ball, Bernard; Zwick, Andreas; Stajich, Jason E; Cummings, Michael P; Martin, Joel W; Cunningham, Clifford W

    2008-12-01

    or intermediate categories, whereas groups supported only by a single gene region tended to be from genes of the fast category, arguing that fast genes provide a less consistent signal. (2) A sensitivity analysis was performed in which increasing numbers of genes were excluded, beginning with the fastest. The number of strongly supported nodes increased up to a point and then decreased slightly. Recovery of Hexapoda required removal of fast genes. Support for Mandibulata (Pancrustacea + Myriapoda) also increased, at times to "strong" levels, with removal of the fastest genes. (3) Concordance selection was evaluated by clustering genes according to their ability to recover Pancrustacea, Euchelicerata, or Myriapoda and analyzing the three clusters separately. All clusters of genes recovered the three concordance clades but were at times inconsistent in the relationships recovered among and within these clades, a result that indicates that the a priori concordance criteria may bias phylogenetic signal in unexpected ways. In a further attempt to increase support of taxonomic relationships, sequence data from 49 additional taxa for three slow genes (i.e., EF-1 alpha, EF-2, and Pol II) were combined with the various 13-taxon data sets. The 62-taxon analyses supported the results of the 13-taxon analyses and provided increased support for additional pancrustacean clades found in an earlier analysis including only EF-1 alpha, EF-2, and Pol II.

  9. Recursive Cluster Elimination (RCE) for classification and feature selection from gene expression data

    Directory of Open Access Journals (Sweden)

    Showe Louise C

    2007-05-01

    Full Text Available Abstract Background Classification studies using gene expression datasets are usually based on small numbers of samples and tens of thousands of genes. The selection of those genes that are important for distinguishing the different sample classes being compared poses a challenging problem in high dimensional data analysis. We describe a new procedure for selecting significant genes as recursive cluster elimination (RCE) rather than recursive feature elimination (RFE). We have tested this algorithm on six datasets and compared its performance with that of two related classification procedures with RFE. Results We have developed a novel method for selecting significant genes in comparative gene expression studies. This method, which we refer to as SVM-RCE, combines K-means, a clustering method, to identify correlated gene clusters, and Support Vector Machines (SVMs), a supervised machine learning classification method, to identify and score (rank) those gene clusters for the purpose of classification. K-means is used initially to group genes into clusters. Recursive cluster elimination (RCE) is then applied to iteratively remove those clusters of genes that contribute the least to the classification performance. SVM-RCE identifies the clusters of correlated genes that are most significantly differentially expressed between the sample classes. Utilization of gene clusters, rather than individual genes, enhances the supervised classification accuracy of the same data as compared to the accuracy when either SVM or Penalized Discriminant Analysis (PDA) with recursive feature elimination (SVM-RFE and PDA-RFE) are used to remove genes based on their individual discriminant weights. Conclusion SVM-RCE provides improved classification accuracy with complex microarray data sets when it is compared to the classification accuracy of the same datasets using either SVM-RFE or PDA-RFE. SVM-RCE identifies clusters of correlated genes that when considered together
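The elimination loop described in this abstract can be sketched roughly as follows. This is a simplified illustration and not the authors' SVM-RCE implementation: the gene clusters are assumed to be given (the K-means step is skipped), and a leave-one-out nearest-centroid classifier stands in for the SVM scoring so the example needs only the standard library. All function names and the toy data are hypothetical.

```python
def loo_centroid_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier restricted to `genes`.

    Stand-in for the SVM score in SVM-RCE. Assumes every class has at least
    two samples, so no class disappears when one sample is held out.
    """
    correct = 0
    for i, sample in enumerate(X):
        sums, counts = {}, {}
        # Accumulate per-class sums over all samples except the held-out one.
        for j, other in enumerate(X):
            if j == i:
                continue
            acc = sums.setdefault(y[j], [0.0] * len(genes))
            for k, g in enumerate(genes):
                acc[k] += other[g]
            counts[y[j]] = counts.get(y[j], 0) + 1

        def sq_dist(label):
            # Squared distance from the held-out sample to a class centroid.
            return sum((sample[g] - sums[label][k] / counts[label]) ** 2
                       for k, g in enumerate(genes))

        predicted = min(sorted(sums), key=sq_dist)
        correct += predicted == y[i]
    return correct / len(X)


def recursive_cluster_elimination(X, y, clusters, n_keep=1):
    """Repeatedly drop the gene cluster with the lowest classification score."""
    remaining = dict(clusters)
    history = []  # (dropped cluster id, its score), in elimination order
    while len(remaining) > n_keep:
        scores = {cid: loo_centroid_accuracy(X, y, genes)
                  for cid, genes in remaining.items()}
        worst = min(sorted(scores), key=scores.get)
        history.append((worst, scores[worst]))
        del remaining[worst]
    return remaining, history


# Toy data: genes 0-1 separate the classes, genes 2-3 do not.
X = [[1.0, 1.2, 5.0, 3.0], [0.9, 1.1, 3.0, 5.0], [1.1, 0.8, 4.0, 4.0],
     [3.0, 3.1, 5.0, 3.0], [2.9, 3.3, 3.0, 5.0], [3.2, 2.9, 4.0, 4.0]]
y = ["A", "A", "A", "B", "B", "B"]
clusters = {"informative": [0, 1], "noise": [2, 3]}
kept, dropped = recursive_cluster_elimination(X, y, clusters)
print(sorted(kept), dropped)
```

Scoring whole clusters rather than single genes is the point of the design: correlated genes are kept or discarded together instead of being ranked one at a time as in RFE.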

  10. Dissection of Two Complex Clusters of Resistance Genes in Lettuce (Lactuca sativa).

    Science.gov (United States)

    Christopoulou, Marilena; McHale, Leah K; Kozik, Alex; Reyes-Chin Wo, Sebastian; Wroblewski, Tadeusz; Michelmore, Richard W

    2015-07-01

    Of the over 50 phenotypic resistance genes mapped in lettuce, 25 colocalize to three major resistance clusters (MRC) on chromosomes 1, 2, and 4. Similarly, the majority of candidate resistance genes encoding nucleotide binding-leucine rich repeat (NLR) proteins genetically colocalize with phenotypic resistance loci. MRC1 and MRC4 span over 66 and 63 Mb containing 84 and 21 NLR-encoding genes, respectively, as well as 765 and 627 genes that are not related to NLR genes. Forward and reverse genetic approaches were applied to dissect MRC1 and MRC4. Transgenic lines exhibiting silencing were selected using silencing of β-glucuronidase as a reporter. Silencing of two of five NLR-encoding gene families resulted in abrogation of nine of 14 tested resistance phenotypes mapping to these two regions. At MRC1, members of the coiled coil-NLR-encoding RGC1 gene family were implicated in host and nonhost resistance through requirement for Dm5/8- and Dm45-mediated resistance to downy mildew caused by Bremia lactucae as well as the hypersensitive response to effectors AvrB, AvrRpm1, and AvrRpt2 of the nonpathogen Pseudomonas syringae. At MRC4, RGC12 family members, which encode toll interleukin receptor-NLR proteins, were implicated in Dm4-, Dm7-, Dm11-, and Dm44-mediated resistance to B. lactucae. Lesions were identified in the sequence of a candidate gene within dm7 loss-of-resistance mutant lines, confirming that RGC12G confers Dm7.

  11. Structure, Function, and Regulation of the Aldouronate Utilization Gene Cluster from Paenibacillus sp. Strain JDR-2

    Science.gov (United States)

    Chow, Virginia; Nong, Guang; Preston, James F.

    2007-01-01

    Direct bacterial conversion of the hemicellulose fraction of hardwoods and crop residues to biobased products depends upon extracellular depolymerization of methylglucuronoxylan (MeGAXn), followed by assimilation and intracellular conversion of aldouronates and xylooligosaccharides to fermentable xylose. Paenibacillus sp. strain JDR-2, an aggressively xylanolytic bacterium, secretes a multimodular cell-associated GH10 endoxylanase (XynA1) that catalyzes depolymerization of MeGAXn and rapidly assimilates the principal products, β-1,4-xylobiose, β-1,4-xylotriose, and MeGAX3, the aldotetrauronate 4-O-methylglucuronosyl-α-1,2-xylotriose. Genomic libraries derived from this bacterium have now allowed cloning and sequencing of a unique aldouronate utilization gene cluster comprised of genes encoding signal transduction regulatory proteins, ABC transporter proteins, and the enzymes AguA (GH67 α-glucuronidase), XynA2 (GH10 endoxylanase), and XynB (GH43 β-xylosidase/α-arabinofuranosidase). Expression of these genes, as well as xynA1 encoding the secreted GH10 endoxylanase, is induced by growth on MeGAXn and repressed by glucose. Sequences in the yesN, lplA, and xynA2 genes within the cluster and in the distal xynA1 gene show significant similarity to catabolite responsive element (cre) defined in Bacillus subtilis for recognition of the catabolite control protein (CcpA) and consequential repression of catabolic regulons. The aldouronate utilization gene cluster in Paenibacillus sp. strain JDR-2 operates as a regulon, coregulated with the expression of xynA1, conferring the ability for efficient assimilation and catabolism of the aldouronate product generated by a multimodular cell surface-anchored GH10 endoxylanase. This cluster offers a desirable metabolic potential for bacterial conversion of hemicellulose fractions of hardwood and crop residues to biobased products. PMID:17921311

  12. Cloning and sequencing of a Moraxella bovis pilin gene.

    OpenAIRE

    1985-01-01

    Moraxella bovis pili have been shown to play a major role in both infectivity and protective immunity of bovine infectious keratoconjunctivitis. Sonicated M. bovis DNA from the piliated strain EPP63 was inserted into the vector lambda gt11 with EcoRI linkers. Recombinant phage were screened with an oligonucleotide probe based on the amino-terminal portion of the DNA sequence of a Neisseria gonorrhoeae pilin gene. Two candidate phages produced a protein that comigrated with EPP63 beta pilin in...

  13. The build-up of the red-sequence in galaxy clusters since z~0.8

    CERN Document Server

    De Lucia, G; Aragón-Salamanca, A; Clowe, D; Halliday, C; Jablonka, P; Milvang-Jensen, B; Pellò, R; Poirier, S; Rudnick, G; Saglia, R; Simard, L; White, S D M

    2004-01-01

    We study the rest-frame (U-V) color-magnitude relation in 4 clusters at redshifts 0.7-0.8, drawn from the ESO Distant Cluster Survey. We confirm that red-sequence galaxies in these clusters can be described as an old, passively-evolving population and we demonstrate, by comparison with the Coma cluster, that there has been significant evolution in the stellar mass distribution of red-sequence galaxies since z~0.75. The EDisCS clusters exhibit a deficiency of low luminosity passive red galaxies. Defining as `faint' all galaxies in the passive evolution corrected range 0.4>~ L/L*>~0.1, the luminous-to-faint ratio of red-sequence galaxies varies from 0.34+/-0.06 for the Coma cluster to 0.81+/-0.18 for the high redshift clusters. These results exclude a synchronous formation of all red-sequence galaxies and suggest that a large fraction of the faint red galaxies in current clusters moved on to the red-sequence relatively recently. Their star formation activity presumably came to an end at z<~0.8.

  14. A novel and complete gene cluster involved in the degradation of aniline by Delftia sp. AN3

    Institute of Scientific and Technical Information of China (English)

    ZHANG Tao; ZHANG Jinglei; LIU Shuangjiang; LIU Zhipei

    2008-01-01

    A recombinant strain, Escherichia coli JM109-AN1, was obtained by constructing a genomic library of the total DNA of Delftia sp. AN3 in E. coli JM109 and screening for catechol 2,3-dioxygenase activity. This recombinant strain could grow on aniline as sole carbon, nitrogen and energy source. Enzymatic assays revealed that the exogenous genes, including aniline dioxygenase (AD) and catechol 2,3-dioxygenase (C23O) genes, were well expressed in the recombinant strain, with the activities of AD and C23O up to 0.31 U/mg wet cell and 1.92 U/mg crude proteins, respectively. The AD or C23O of strain AN3 could only catalyze aniline or catechol but not any other substituted substrates. This recombinant strain contained a recombinant plasmid, pKC505-AN1, in which a 29.7-kb DNA fragment from Delftia sp. AN3 was inserted. Sequencing and open reading frame (orfs) analysis of this 29.7 kb fragment revealed that it contained at least 27 orfs, among them a gene cluster (consisting of at least 16 genes, named danQTA1A2BRDCEFG1HIJKG2) that was responsible for the complete metabolism of aniline to TCA-cycle intermediates. This gene cluster could be divided into two main parts: the upper sequences, consisting of 7 genes (danQTA1A2BRD), were predicted to encode a multi-component aniline dioxygenase and a LysR-type regulator, and the central genes (danCEFG1HIJKG2) were expected to encode meta-cleavage pathway enzymes for catechol degradation to TCA-cycle intermediates. Unlike clusters tad from Delftia tsuruhatensis AD9 and tdn from Pseudomonas putida UCC22, in this gene cluster all the genes were in the same transcriptional direction. There was only one set of C23O gene (danC) and ferredoxin-like protein gene (danD). The presence of only one set of these two genes and the specificity of AD and C23O might be the reason why strain AN3 could only degrade aniline. The products of danQTA1A2BRDC showed 99%-100% identity to those from Delftia acidovorans 7N, and 50%-85% identity to those of tad cluster from D. tsuruhatensis AD9 in

  15. Fragmentation of an aflatoxin-like gene cluster in a forest pathogen

    Science.gov (United States)

    Secondary metabolic pathway genes are typically clustered in fungi. An exception to this paradigm is seen for genes required for the production of dothistromin, an aflatoxin-like virulence factor produced by the pine needle pathogen Dothistroma septosporum. In contrast to the tight clustering of gen...

  16. Full-length minor ampullate spidroin gene sequence.

    Directory of Open Access Journals (Sweden)

    Gefei Chen

    Full Text Available Spider silk includes seven protein based fibers and glue-like substances produced by glands in the spider's abdomen. Minor ampullate silk is used to make the auxiliary spiral of the orb-web and also for wrapping prey, has a high tensile strength and does not supercontract in water. So far, only partial cDNA sequences have been obtained for minor ampullate spidroins (MiSps). Here we describe the first MiSp full-length gene sequence from the spider species Araneus ventricosus, using a multidimensional PCR approach. Comparative analysis of the sequence reveals regulatory elements, as well as unique spidroin gene and protein architecture including the presence of an unusually large intron. The spliced full-length transcript of the MiSp gene is 5440 bp in size and encodes 1766 amino acid residues organized into conserved nonrepetitive N- and C-terminal domains and a central predominantly repetitive region composed of four units that are iterated in a non-regular manner. The repeats are more conserved within A. ventricosus MiSp than compared to repeats from homologous proteins, and are interrupted by two nonrepetitive spacer regions, which have 100% identity even at the nucleotide level.

  17. antiSMASH 4.0-improvements in chemistry prediction and gene cluster boundary identification

    DEFF Research Database (Denmark)

    Blin, Kai; Wolf, Thomas; Chevrette, Marc G.

    2017-01-01

    Many antibiotics, chemotherapeutics, crop protection agents and food preservatives originate from molecules produced by bacteria, fungi or plants. In recent years, genome mining methodologies have been widely adopted to identify and characterize the biosynthetic gene clusters encoding......, including prediction of gene cluster boundaries using the ClusterFinder method or the newly integrated CASSIS algorithm, improved substrate specificity prediction for non-ribosomal peptide synthetase adenylation domains based on the new SANDPUMA algorithm, improved predictions for terpene and ribosomally...

  18. The Red Sequence at Birth in the Galaxy Cluster Cl J1449+0856 at z = 2

    Science.gov (United States)

    Strazzullo, V.; Daddi, E.; Gobat, R.; Valentino, F.; Pannella, M.; Dickinson, M.; Renzini, A.; Brammer, G.; Onodera, M.; Finoguenov, A.; Cimatti, A.; Carollo, C. M.; Arimoto, N.

    2016-12-01

    We use Hubble Space Telescope/WFC3 imaging to study the red population in the IR-selected, X-ray detected, low-mass cluster Cl J1449+0856 at z = 2, one of the few bona fide established clusters discovered at this redshift, and likely a typical progenitor of an average massive cluster today. This study explores the presence and significance of an early red sequence in the core of this structure, investigating the nature of red-sequence galaxies, highlighting environmental effects on cluster galaxy populations at high redshift, and at the same time underlining similarities and differences with other distant dense environments. Our results suggest that the red population in the core of Cl J1449+0856 is made of a mixture of quiescent and dusty star-forming galaxies, with a seedling of the future red sequence already growing in the very central cluster region, and already characterizing the inner cluster core with respect to lower-density environments. On the other hand, the color-magnitude diagram of this cluster is definitely different from that of lower-redshift z ≲ 1 clusters, as well as of some rare particularly evolved massive clusters at similar redshift, and it is suggestive of a transition phase between active star formation and passive evolution occurring in the protocluster and established lower-redshift cluster regimes.

  19. [Phylogenetic and Bioinformatics Analysis of Replicase Gene Sequence of Cucumber Green Mottle Mosaic Virus].

    Science.gov (United States)

    Liang, Chaoqiong; Meng, Yan; Luo, Laixin; Liu, Pengfei; Li, Jianqiang

    2015-11-01

    The replicase genes of five isolates of Cucumber green mottle mosaic virus (CGMMV) from Jiangsu, Zhejiang, Hunan and Beijing were amplified, sequenced and analyzed. The nucleotide sequence similarities of the 129 kD and 57 kD replicase genes of CGMMV-No. 1, CGMMV-No. 2, CGMMV-No. 3, CGMMV-No. 4 and CGMMV-No. 5 were 99.64% and 99.74%, respectively. The similarities of the 129 kD and 57 kD replicase genes of CGMMV-No. 1, CGMMV-No. 3 and CGMMV-No. 4 were 99.95% and 99.94%, while they were lower between CGMMV-No. 2 and the remaining four reference sequences, ranging only from 99.16% to 99.27% and from 99.04% to 99.18%. All reference sequences could be divided into six groups in neighbor-joining (NJ) phylogenetic trees based on the replicase gene sequences of the 129 kD and 57 kD proteins, respectively. CGMMV-No. 1, CGMMV-No. 3 and CGMMV-No. 4 clustered together with the Shandong isolate (Accession No. KJ754195) in both NJ trees; CGMMV-No. 5 clustered together with the Liaoning isolate (Accession No. EF611826) in both NJ trees; CGMMV-No. 2 clustered together with the Korea watermelon isolate (Accession No. AF417242) in the phylogenetic tree of the 129 kD replicase gene of CGMMV. Interestingly, CGMMV-No. 2 was classified as an independent group in the phylogenetic tree of the 57 kD replicase gene of CGMMV. There were no significant hydrophobic or highly coiled coil regions on the 129 kD and 57 kD proteins of the tested CGMMV isolates. Except for the 129 kD protein of CGMMV-No. 4, the rest were unstable proteins. The numbers of transmembrane helical segments (TMHs) in the 129 kD protein of CGMMV-No. 1, CGMMV-No. 2, CGMMV-No. 3 and CGMMV-No. 5 were 6, 6, 2 and 4, respectively, and those in the 57 kD protein of CGMMV-No. 2, CGMMV-No. 4 and CGMMV-No. 5 were 13, 13 and 5. The numbers of glycosylation sites on the 129 kD protein of the tested CGMMV isolates were 2, 4, 4, 4 and 4, and those on the 57 kD protein were 2, 5, 2, 5 and 2. There were differences between the disorders, globulins, phosphorylation sites and B cell antigen epitopes of 129 kD and 57

  20. Increased glycopeptide production after overexpression of shikimate pathway genes being part of the balhimycin biosynthetic gene cluster

    DEFF Research Database (Denmark)

    Thykær, Jette; Nielsen, Jens; Wohlleben, W.

    2010-01-01

    Amycolatopsis balhimycina produces the vancomycin-analogue balhimycin. The strain therefore serves as a model strain for glycopeptide antibiotic production. Previous characterisation of the balhimycin biosynthetic cluster had shown that the border sequences contained both, a putative 3-deoxy...

  1. Discussion on validity of Rana maoershanensis based on partial sequence of 16S rRNA gene

    Institute of Scientific and Technical Information of China (English)

    2010-01-01

    Rana maoershanensis, found in Mt. Maoershan in Guangxi, China, was reported as a new species in 2007, but there was no molecular data for this frog. The partial sequences (543 bp) of the 16S rRNA gene from 12 specimens of 3 brown frog species (Rana hanluica, R. maoershanensis and R. chensinensis) were analyzed with 17 specimens of 9 species from GenBank. The nucleotide sequence divergences between R. maoershanensis and the other brown frog species were 4.5%-6.5%, with 22-30 nucleotide substitutions at this locus. The phylogenetic relationships based on MP, ML, and Bayesian inference indicate that the brown frogs from southern China diverged into three groups (clades A, B and C). R. maoershanensis clustered together in a well-supported subclade (B-1). It is suggested that R. maoershanensis is a valid species.

  2. Base J represses genes at the end of polycistronic gene clusters in Leishmania major by promoting RNAP II termination.

    Science.gov (United States)

    Reynolds, David L; Hofmeister, Brigitte T; Cliffe, Laura; Siegel, T Nicolai; Anderson, Britta A; Beverley, Stephen M; Schmitz, Robert J; Sabatini, Robert

    2016-08-01

    The genomes of kinetoplastids are organized into polycistronic gene clusters that are flanked by the modified DNA base J. Previous work has established a role of base J in promoting RNA polymerase II termination in Leishmania spp. where the loss of J leads to termination defects and transcription into adjacent gene clusters. It remains unclear whether these termination defects affect gene expression and whether read through transcription is detrimental to cell growth, thus explaining the essential nature of J. We now demonstrate that reduction of base J at specific sites within polycistronic gene clusters in L. major leads to read through transcription and increased expression of downstream genes in the cluster. Interestingly, subsequent transcription into the opposing polycistronic gene cluster does not lead to downregulation of sense mRNAs. These findings indicate a conserved role for J regulating transcription termination and expression of genes within polycistronic gene clusters in trypanosomatids. In contrast to the expectations often attributed to opposing transcription, the essential nature of J in Leishmania spp. is related to its role in gene repression rather than preventing transcriptional interference resulting from read through and dual strand transcription.

  3. Blast fungus-induction and developmental and tissue-specific expression of a rice P450 CYP72A gene cluster

    Institute of Scientific and Technical Information of China (English)

    WANG Yaling; LI Qun; HE Zuhua

    2004-01-01

    The cytochrome P450 gene superfamily is widely involved in diverse processes of plant development and environmental responses, including defense response to pathogens. We previously isolated a rice cDNA fragment in a DD-PCR screening for blast fungus-induced genes. In the current study, we isolated a CYP72A gene cluster consisting of 7 P450 CYP72A genes (CYP72A17~23) with the conserved cDNA sequence through the public rice genome data. There are a total of 14 putative CYP72A members in the rice genome, with high diversity in the N-terminal sequences but high homology in the C-terminal sequences of those 14 putative proteins. We analyzed expression profiles of the 7 cloned CYP72A genes during pathogen infection and development. The results showed that expression of CYP72A18, 19, 22 and 23 was differentially regulated in the incompatible and compatible interactions between rice and blast fungus. Except for CYP72A20, a pseudogene, the other 6 CYP72A genes also exhibited temporal and spatial expression patterns. These findings provide fundamental data for rice P450 gene function analysis.

  4. Co-expression of six tightly clustered odorant receptor genes in the antenna of the malaria mosquito Anopheles gambiae

    Directory of Open Access Journals (Sweden)

    Tim Karner

    2015-03-01

    Full Text Available The behavior of female malaria mosquitoes, Anopheles gambiae, especially seeking out blood hosts or selecting oviposition sites, highly depends on the detection of relevant odorants by their sense of smell. This is mediated by olfactory sensory neurons (OSNs) which express distinct odorant receptor (OR) types. In the genome of A. gambiae 76 genes have been annotated to encode putative odorant receptors and the majority of these AgOR genes are arranged in clusters. To assess whether clustered AgOR genes are expressed in a characteristic manner we explored the topographic expression pattern of six tightly adjoined AgOR genes in the female antenna. Whole mount fluorescence in situ hybridization experiments were performed to visualize the olfactory neurons which express a distinct AgOR type in order to determine the number and the distribution of the cells. We found that within the thirteen antennal segments about 75 cells contain mRNA for the four receptor types AgOR13, AgOR15, AgOR17 and AgOR55. Moreover, about half of these cells also transcribe mRNA for the subtypes AgOR16 and AgOR47. Subsequent RT-PCR experiments with primer pairs spanning the coding regions of adjacent AgOR genes revealed the existence of polycistronic mRNA. This result indicates that individual genes were not transcribed but mRNA was comprised of coding sequence from several genes within the studied cluster. Taken together, the data indicate a unique principle for the expression of odorant receptor genes arranged in a large cluster and suggest that the corresponding olfactory neurons are endowed with a distinct set of odorant receptor types.

  5. Cluster based on sequence comparison of homologous proteins of 95 organism species - Gclust Server | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Full Text Available Gclust Server: Cluster based on sequence comparison of homologous proteins of 95 organism species. Data detail... Data name: Cluster based on sequence comparison of homologous proteins of 95 organism species. Description of...

  6. Sequence analysis of a few species of termites (Order: Isoptera) on the basis of partial characterization of COII gene.

    Science.gov (United States)

    Sobti, Ranbir Chander; Kumari, Mamtesh; Sharma, Vijay Lakshmi; Sodhi, Monika; Mukesh, Manishi; Shouche, Yogesh

    2009-11-01

    The present study aimed to obtain the nucleotide sequences of part of the mitochondrial COII gene amplified from individuals of five species of termites (Isoptera: Termitidae: Macrotermitinae). Four of them belonged to the genus Odontotermes (O. obesus, O. horni, O. bhagwatii and Odontotermes sp.) and one to Microtermes (M. obesi). Partial COII gene fragments were amplified using specific primers. The sequences obtained were characterized to calculate the frequency of each nucleotide base, and a high A + T content was observed. The interspecific pairwise sequence divergence among Odontotermes species ranged from 6.5% to 17.1% across the COII fragment. The sequence divergence of M. obesi ranged from 2.5% with Odontotermes sp. to 19.0% with O. bhagwatii. Phylogenetic trees drawn using the neighbour-joining distance method revealed three main clades clustering all the individuals according to their genera and families.
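    The two quantities reported in this abstract, A + T content and pairwise sequence divergence, can be illustrated with a minimal Python sketch. It assumes the sequences are already aligned to equal length and uses the uncorrected proportion of differing sites (p-distance); the abstract does not say which distance measure the authors applied, so this is an assumption. The function names and the toy sequences are hypothetical and are not data from the study.

```python
from itertools import combinations


def at_content(seq: str) -> float:
    """Fraction of A and T among the unambiguous bases (A, C, G, T) of a sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "AT" for b in bases) / len(bases)


def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected proportion of differing sites between two aligned sequences.

    Sites with a gap ('-') or an ambiguous base ('N') in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def pairwise_divergence(aligned: dict) -> dict:
    """p-distance for every pair of named, aligned sequences."""
    return {
        (x, y): p_distance(aligned[x], aligned[y])
        for x, y in combinations(aligned, 2)
    }


# Invented toy alignment, for illustration only.
toy = {
    "taxon_A": "ATGGCATTAT",
    "taxon_B": "ATGGCTTTAT",
    "taxon_C": "ATAGCTTTGT",
}

if __name__ == "__main__":
    for pair, dist in pairwise_divergence(toy).items():
        print(pair, f"{100 * dist:.1f}%")
    print("A+T content of taxon_A:", at_content(toy["taxon_A"]))
```

    A distance matrix of this kind is the usual input to neighbour-joining tree construction. Published analyses often apply a substitution-model correction instead of the raw p-distance.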

  7. Wind Farm Dynamic Equivalence Based on the Wind Turbine Output Active Power Sequence Clustering

    Directory of Open Access Journals (Sweden)

    Zhang Ge

    2016-01-01

    Full Text Available To reduce the complexity of simulation models containing wind farms while keeping accuracy unchanged, this paper puts forward a dynamic equivalence method that aims to keep the output characteristic at the connection point of the wind farm consistent. Based on the output power sequences of the wind turbines, a geometric template matching algorithm is used to obtain the characteristics of each power sequence, and an Attribute Threshold Clustering Algorithm is then used to classify the wind turbines. Within each cluster, the wind turbine parameters are made equivalent according to the principle of constant power output characteristics and are then distinguished according to AMPSO. Finally, the paper takes a practical wind farm as an example and simulates a system-side fault and a variation in wind speed, comparing the output characteristics of the detailed model and the equivalent model. The results show that the output characteristic at the connection point of the wind farm remains consistent after equivalencing and that the clustering algorithm can reflect the operating characteristics of the wind turbines at every moment of any time period. The results also show that the equivalence method is reasonable and effective and has some value in engineering applications.

  8. The Rest-Frame Optical Luminosity Function of Cluster Galaxies at z<0.8 and the Assembly of the Cluster Red Sequence

    CERN Document Server

    Rudnick, Gregory; Pello, Roser; Aragon-Salamanca, Alfonso; Marchesini, Danilo; Clowe, Douglas; De Lucia, Gabriella; Halliday, Claire; Jablonka, Pascale; Milvang-Jensen, Bo; Poggianti, Bianca; Saglia, Roberto; Simard, Luc; White, Simon; Zaritsky, Dennis

    2009-01-01

    We present the rest-frame optical luminosity function (LF) of red sequence galaxies in 16 clusters at 0.4<z<0.8 drawn from the ESO Distant Cluster Survey (EDisCS). We compare our clusters to an analogous sample from the Sloan Digital Sky Survey (SDSS) and match the EDisCS clusters to their most likely descendants. We measure all LFs down to M ~ M* + (2.5 - 3.5). At z<0.8, the bright end of the LF is consistent with passive evolution but there is a significant build-up of the faint end of the red sequence towards lower redshift. There is a weak dependence of the LF on cluster velocity dispersion for EDisCS but no such dependence for the SDSS clusters. We find tentative evidence that red sequence galaxies brighter than a threshold magnitude are already in place, and that this threshold evolves to fainter magnitudes toward lower redshifts. We compare the EDisCS LFs with the LF of co-eval red sequence galaxies in the field and find that the bright ends of the LFs agree. However, relative to the number of br...

  9. A novel snoRNA gene cluster in yeast is transcribed as polycistronic pre-snoRNAs

    Institute of Scientific and Technical Information of China (English)

    陆勇军; 周惠; 周惟欣; 朱远琪; 屈良鹄

    1999-01-01

    Small nucleolar RNAs (snoRNAs) play an important role in eukaryotic rRNA biogenesis. By combining a computer search of the EMBL database with experimental procedures, a novel snoRNA coding sequence (Z8) was identified and characterized from the genome of the yeast Saccharomyces cerevisiae. The Z8 snoRNA gene encodes a box C/D antisense snoRNA which, as deduced from structure analysis, guides the 2'-O-ribose methylation at U2421 of 25S rRNA. After disruption of the Z8 snoRNA gene, methylation at the corresponding site was abolished, but no growth delay was observed at various culture temperatures. Z8 is the first gene of a gene cluster consisting of three cognate snoRNA genes located in an intergenic region of chromosome XIII. This gene cluster is co-transcribed as a polycistronic precursor from a+247 bp U snoRNA gene promoter, followed by processing to release individual snoRNAs, representing a new expression pattern of snoRNA genes.

  10. Expressed sequence tag analysis of functional genes associated with adventitious rooting in Liriodendron hybrids.

    Science.gov (United States)

    Zhong, Y D; Sun, X Y; Liu, E Y; Li, Y Q; Gao, Z; Yu, F X

    2016-06-24

    Liriodendron hybrids (Liriodendron chinense x L. tulipifera) are important landscaping and afforestation hardwood trees. To date, little genomic research on adventitious rooting has been reported in these hybrids, as well as in the genus Liriodendron. In the present study, we used adventitious roots to construct the first cDNA library for Liriodendron hybrids. A total of 5176 expressed sequence tags (ESTs) were generated and clustered into 2921 unigenes. Among these unigenes, 2547 had significant homology to the non-redundant protein database representing a wide variety of putative functions. Homologs of these genes regulated many aspects of adventitious rooting, including those for auxin signal transduction and root hair development. Results of quantitative real-time polymerase chain reaction showed that AUX1, IRE, and FB1 were highly expressed in adventitious roots and the expression of AUX1, ARF1, NAC1, RHD1, and IRE increased during the development of adventitious roots. Additionally, 181 simple sequence repeats were identified from 166 ESTs and more than 91.16% of these were dinucleotide and trinucleotide repeats. To the best of our knowledge, the present study reports the identification of the genes associated with adventitious rooting in the genus Liriodendron for the first time and provides a valuable resource for future genomic studies. Expression analysis of selected genes could allow us to identify regulatory genes that may be essential for adventitious rooting.
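    The identification of dinucleotide and trinucleotide simple sequence repeats (SSRs) mentioned in this abstract can be sketched with a regular-expression scan. This is a minimal illustration, not the authors' pipeline: the abstract does not name the tool or the thresholds used, so the minimum repeat counts below (6 for dinucleotides, 5 for trinucleotides), the function name, and the example sequence are all assumptions.

```python
import re

# Assumed thresholds: motif length -> minimum number of tandem copies.
DEFAULT_MIN_REPEATS = {2: 6, 3: 5}


def find_ssrs(seq, min_repeats=None):
    """Return a sorted list of (start, motif, copies) for perfect tandem repeats.

    Only motif lengths listed in min_repeats are searched. Motifs made of a
    single repeated base (e.g. AA, TTT) are skipped as homopolymers.
    """
    if min_repeats is None:
        min_repeats = DEFAULT_MIN_REPEATS
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in min_repeats.items():
        # One captured motif followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue
            copies = len(match.group(0)) // unit_len
            hits.append((match.start(), motif, copies))
    return sorted(hits)


# Invented example sequence: (AG)x7 and (CTT)x5 embedded in flanking bases.
example = "GGC" + "AG" * 7 + "TTC" + "CTT" * 5 + "GA"

if __name__ == "__main__":
    for start, motif, copies in find_ssrs(example):
        print(f"position {start}: ({motif})x{copies}")
```

    Dedicated SSR-mining tools additionally handle compound and imperfect repeats, which this sketch ignores.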

  11. In silico phylogenetic and virulence gene profile analyses of avian pathogenic Escherichia coli genome sequences

    Directory of Open Access Journals (Sweden)

    Thaís C.G. Rojas

    2014-02-01

    Full Text Available Avian pathogenic Escherichia coli (APEC) infections are responsible for significant losses in the poultry industry worldwide. A zoonotic risk has been attributed to APEC strains because they resemble extraintestinal pathogenic E. coli (ExPEC) associated with illness in humans, mainly urinary tract infections and neonatal meningitis. Here, we present in silico analyses of pathogenic E. coli genome sequences, including recently available APEC genomes. The phylogenetic tree, based on multi-locus sequence typing (MLST) of seven housekeeping genes, revealed high diversity in allelic composition. Despite this diversity, the phylogenetic tree was able to cluster the different pathotypes together. An in silico virulence gene profile was also determined for each of these strains, based on the presence or absence of 83 well-known virulence genes/traits described in pathogenic E. coli strains. The MLST phylogeny and the virulence gene profiles demonstrated a certain genetic similarity between Brazilian APEC strains, APEC isolated in the United States, UPEC (uropathogenic E. coli), and diarrheagenic strains isolated from humans. This correlation corroborates and reinforces the hypothesis of zoonotic potential proposed for APEC.
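    A presence/absence virulence gene profile of the kind described in this abstract can be represented as a binary vector over a fixed gene panel, and two strains can then be compared with a set-overlap measure. The sketch below uses the Jaccard index as one plausible choice; the abstract does not state how profiles were compared, so the measure is an assumption, and the gene and strain names are placeholders rather than real virulence genes or isolates.

```python
def virulence_profile(genome_genes, panel):
    """Binary presence/absence vector of panel genes for one genome."""
    return tuple(1 if gene in genome_genes else 0 for gene in panel)


def jaccard(profile_a, profile_b):
    """Shared present genes divided by genes present in at least one profile."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same gene panel")
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    # Two profiles with no genes present at all are treated as identical.
    return both / either if either else 1.0


# Placeholder panel and gene sets, invented for illustration.
panel = ["geneA", "geneB", "geneC", "geneD", "geneE"]
strain_1 = {"geneA", "geneB", "geneC"}
strain_2 = {"geneB", "geneC", "geneD"}

if __name__ == "__main__":
    p1 = virulence_profile(strain_1, panel)
    p2 = virulence_profile(strain_2, panel)
    print(p1, p2, jaccard(p1, p2))
```

    With a real panel, the same vectors could feed a hierarchical clustering step to group strains by virulence gene content.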

  12. Manual annotation and analysis of the defensin gene cluster in the C57BL/6J mouse reference genome

    Directory of Open Access Journals (Sweden)

    Dougan Gordon

    2009-12-01

    Full Text Available Abstract Background Host defense peptides are a critical component of the innate immune system. Human alpha- and beta-defensin genes are subject to copy number variation (CNV), and historically the organization of mouse alpha-defensin genes has been poorly defined. Here we present the first full manual genomic annotation of the mouse defensin region on Chromosome 8 of the reference strain C57BL/6J, and the analysis of the orthologous regions of the human and rat genomes. Problems were identified with the reference assemblies of all three genomes. Defensins have been studied for over two decades and their naming has become a critical issue due to incorrect identification of defensin genes derived from different mouse strains and the duplicated nature of this region. Results The defensin gene cluster region on mouse Chromosome 8 A2 contains 98 gene loci: 53 are likely active defensin genes and 22 are defensin pseudogenes. Several TATA box motifs were found for human and mouse defensin genes that likely impact gene expression. Three novel defensin genes belonging to the Cryptdin Related Sequences (CRS) family were identified. All additional mouse defensin loci on Chromosomes 1, 2 and 14 were annotated and unusual splice variants identified. Comparison of the mouse alpha-defensins in the three main mouse reference gene sets, Ensembl, Mouse Genome Informatics (MGI), and NCBI RefSeq, reveals significant inconsistencies in annotation and nomenclature. We are collaborating with the Mouse Genome Nomenclature Committee (MGNC) to establish a standardized naming scheme for alpha-defensins. Conclusions Prior to this analysis, there was no reliable reference gene set available for the mouse strain C57BL/6J defensin genes, demonstrating that manual intervention is still critical for the annotation of complex gene families and heavily duplicated regions. Accurate gene annotation is facilitated by the annotation of pseudogenes and regulatory elements. 
Manually curated gene

  13. Nonlinear biosynthetic gene cluster dose effect on penicillin production by Penicillium chrysogenum.

    Science.gov (United States)

    Nijland, Jeroen G; Ebbendorf, Bjorg; Woszczynska, Marta; Boer, Rémon; Bovenberg, Roel A L; Driessen, Arnold J M

    2010-11-01

    Industrial penicillin production levels by the filamentous fungus Penicillium chrysogenum increased dramatically by classical strain improvement. High-yielding strains contain multiple copie