WorldWideScience

Sample records for based gene discovery

  1. Speeding disease gene discovery by sequence based candidate prioritization

    Directory of Open Access Journals (Sweden)

    Porteous David J

    2005-03-01

    Full Text Available Abstract Background Regions of interest identified through genetic linkage studies regularly exceed 30 centimorgans in size and can contain hundreds of genes. Traditionally this number is reduced by matching functional annotation to knowledge of the disease or phenotype in question. However, here we show that disease genes share patterns of sequence-based features that can provide a good basis for automatic prioritization of candidates by machine learning. Results We examined a variety of sequence-based features and found that for many of them there are significant differences between the sets of genes known to be involved in human hereditary disease and those not known to be involved in disease. We have created an automatic classifier called PROSPECTR based on those features using the alternating decision tree algorithm which ranks genes in the order of likelihood of involvement in disease. On average, PROSPECTR enriches lists for disease genes two-fold 77% of the time, five-fold 37% of the time and twenty-fold 11% of the time. Conclusion PROSPECTR is a simple and effective way to identify genes involved in Mendelian and oligogenic disorders. It performs markedly better than the single existing sequence-based classifier on novel data. PROSPECTR could save investigators looking at large regions of interest time and effort by prioritizing positional candidate genes for mutation detection and case-control association studies.
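
    The enrichment figures quoted above can be read with a small calculation. The sketch below assumes one common definition of fold enrichment (list length divided by the 1-based rank of the true disease gene); this definition, the function name and the toy gene labels are illustrative assumptions, not taken from the PROSPECTR paper.

```python
def fold_enrichment(ranked_genes, disease_gene):
    """Fold enrichment under the assumed definition:
    number of candidates divided by the 1-based rank of the disease gene."""
    rank = ranked_genes.index(disease_gene) + 1
    return len(ranked_genes) / rank


# Hypothetical ranked candidate list for one linkage interval (best first).
ranked = ["G7", "G3", "G9", "G1", "G5", "G2", "G8", "G4", "G6", "G10"]

# The disease gene sits at rank 2 of 10, so the list is enriched five-fold.
print(fold_enrichment(ranked, "G3"))
```

    Under this reading, a gene ranked in the top half of its interval counts as at least two-fold enrichment, and one in the top twentieth as at least twenty-fold.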

  2. Gene set-based module discovery in the breast cancer transcriptome

    Directory of Open Access Journals (Sweden)

    Zhang Michael Q

    2009-02-01

    Full Text Available Abstract Background Although microarray-based studies have revealed a global view of gene expression in cancer cells, we still have little knowledge about the regulatory mechanisms underlying the transcriptome. Several computational methods applied to yeast data have recently succeeded in identifying expression modules, which are defined as co-expressed gene sets under common regulatory mechanisms. However, such module discovery methods have not been applied to cancer transcriptome data. Results In order to decode oncogenic regulatory programs in cancer cells, we developed a novel module discovery method termed EEM by extending a previously reported module discovery method, and applied it to breast cancer expression data. Starting from seed gene sets prepared based on cis-regulatory elements, ChIP-chip data, and gene locus information, EEM identified 10 principal expression modules in breast cancer based on their expression coherence. Moreover, EEM depicted their activity profiles, which predict regulatory programs in each subtype of breast tumors. For example, our analysis revealed that the expression module regulated by the Polycomb repressive complex 2 (PRC2) is downregulated in triple negative breast cancers, suggesting similarity of transcriptional programs between stem cells and aggressive breast cancer cells. We also found that the activity of the PRC2 expression module is negatively correlated with the expression of EZH2, a component of PRC2 which belongs to the E2F expression module. E2F-driven EZH2 overexpression may be responsible for the repression of the PRC2 expression modules in triple negative tumors. Furthermore, our network analysis predicts regulatory circuits in breast cancer cells. Conclusion These results demonstrate that the gene set-based module discovery approach is a powerful tool to decode regulatory programs in cancer cells.

  3. Weighted gene co-expression based biomarker discovery for psoriasis detection.

    Science.gov (United States)

    Sundarrajan, Sudharsana; Arumugam, Mohanapriya

    2016-11-15

    Psoriasis is a chronic inflammatory disease of the skin with an unknown aetiology. The disease manifests itself as red and silvery scaly plaques distributed over the scalp, lower back and extensor aspects of the limbs. After receiving scant consideration for quite a few years, psoriasis has now become a prominent focus for new drug development. A group of closely connected and differentially co-expressed genes may act in a network and may serve as molecular signatures for an underlying phenotype. Weighted gene coexpression network analysis (WGCNA), a systems biology approach, has been utilized for identification of new molecular targets for psoriasis. Gene coexpression relationships were investigated in 58 psoriatic lesional samples, resulting in five gene modules clustered based on the gene coexpression patterns. The coexpression pattern was validated using three psoriatic datasets. Ten highly connected and informative genes from each module were selected and termed psoriasis-specific hub signatures. A random forest based binary classifier built using the expression profiles of signature genes robustly distinguished psoriatic samples from the normal samples in the validation set with an accuracy of 0.95 to 1. These signature genes may serve as potential candidates for biomarker discovery leading to new therapeutic targets. WGCNA, the network based approach, has provided an alternative path to mine out key controllers and drivers of psoriasis. The study principle from the current work can be extended to other pathological conditions.
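
    The idea of "highly connected" hub genes can be illustrated with a minimal sketch. It uses the unsigned WGCNA-style adjacency |r|^beta and ranks genes by summed adjacency; the soft-threshold power beta = 6, the toy expression values and the function names are assumptions for illustration, not values or code from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumes non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def connectivity(expr, beta=6):
    """Sum of unsigned adjacencies |r|**beta from each gene to every other gene."""
    genes = list(expr)
    k = {g: 0.0 for g in genes}
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            w = abs(pearson(expr[a], expr[b])) ** beta
            k[a] += w
            k[b] += w
    return k


# Hypothetical expression profiles (gene -> values across five samples).
expr = {
    "g1": [1, 2, 3, 4, 5],
    "g2": [2, 4, 6, 8, 11],
    "g3": [5, 4, 3, 2, 0],
    "g4": [3, 1, 4, 1, 5],
}

k = connectivity(expr)
hubs = sorted(k, key=k.get, reverse=True)  # most connected gene first
print(hubs)
```

    Raising |r| to a power suppresses weak correlations, so a gene such as g4 that is only loosely correlated with the rest ends up with near-zero connectivity.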

  4. A genomics based discovery of secondary metabolite biosynthetic gene clusters in Aspergillus ustus.

    Directory of Open Access Journals (Sweden)

    Borui Pi

    Full Text Available Secondary metabolites (SMs) produced by Aspergillus have been extensively studied for their crucial roles in human health, medicine and industrial production. However, the resulting information is almost exclusively derived from a few model organisms, including A. nidulans and A. fumigatus, and little is known about rare pathogens. In this study, we performed a genomics based discovery of SM biosynthetic gene clusters in Aspergillus ustus, a rare human pathogen. A total of 52 gene clusters were identified in the draft genome of A. ustus 3.3904, such as the sterigmatocystin biosynthesis pathway that is commonly found in Aspergillus species. In addition, several SM biosynthetic gene clusters were identified for the first time in Aspergillus that were possibly acquired by horizontal gene transfer, including the vrt cluster that is responsible for viridicatumtoxin production. Comparative genomics revealed that A. ustus shared the largest number of SM biosynthetic gene clusters with A. nidulans, but far fewer with other Aspergilli like A. niger and A. oryzae. These findings would help to understand the diversity and evolution of SM biosynthesis pathways in the genus Aspergillus, and we hope they will also promote the development of fungal identification methodology in the clinic.

  5. Sleeping Beauty transposon insertional mutagenesis based mouse models for cancer gene discovery

    Science.gov (United States)

    Moriarity, Branden S; Largaespada, David A

    2016-01-01

    Large-scale genomic efforts to study human cancer, such as The Cancer Genome Atlas (TCGA), have identified numerous cancer drivers in a wide variety of tumor types. However, this approach has limitations: the mutations and expression or copy number changes that are identified are not always clearly functionally relevant, and only annotated genes and genetic elements are thoroughly queried. The use of complementary, nonbiased, functional approaches to identify drivers of cancer development and progression is ideal to maximize the rate at which cancer discoveries are achieved. One such approach that has been successful is the use of the Sleeping Beauty (SB) transposon-based mutagenesis system in mice. This system uses a conditionally expressed transposase and mutagenic transposon allele to target mutagenesis to somatic cells of a given tissue in mice to cause random mutations leading to tumor development. Analysis of tumors for transposon common insertion sites (CIS) identifies candidate cancer genes specific to that tumor type. While similar screens have been performed in mice with the PiggyBac (PB) transposon and viral approaches, we limit extensive discussion to SB. Here we discuss the basic structure of these screens, the screens that have been performed, and the methods used to identify CIS. PMID:26051241

  6. Discovery of time-delayed gene regulatory networks based on temporal gene expression profiling

    Directory of Open Access Journals (Sweden)

    Guo Zheng

    2006-01-01

    Full Text Available Abstract Background It is one of the ultimate goals for modern biological research to fully elucidate the intricate interplays and the regulations of the molecular determinants that propel and characterize the progression of versatile life phenomena, to name a few, cell cycling, developmental biology, aging, and the progressive and recurrent pathogenesis of complex diseases. The vast amount of large-scale and genome-wide time-resolved data is becoming increasingly available, which provides the golden opportunity to unravel the challenging reverse-engineering problem of time-delayed gene regulatory networks. Results In particular, this methodological paper aims to reconstruct regulatory networks from temporal gene expression data by using delayed correlations between genes, i.e., pairwise overlaps of expression levels shifted in time relative to each other. We have thus developed a novel model-free computational toolbox termed TdGRN (Time-delayed Gene Regulatory Network) to address the underlying regulations of genes that can span any unit(s) of time intervals. This bioinformatics toolbox has provided a unified approach to uncovering time trends of gene regulations through decision analysis of the newly designed time-delayed gene expression matrix. We have applied the proposed method to yeast cell cycling and human HeLa cell cycling and have discovered most of the underlying time-delayed regulations that are supported by multiple lines of experimental evidence and that are remarkably consistent with the current knowledge on phase characteristics for the cell cyclings. Conclusion We established a usable and powerful model-free approach to dissecting high-order dynamic trends of gene-gene interactions. We have carefully validated the proposed algorithm by applying it to two publicly available cell cycling datasets. In addition to uncovering the time trends of gene regulations for cell cycling, this unified approach can also be used to study the complex
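
    The notion of a delayed correlation between two genes can be sketched in a few lines. The example below scans a range of time lags and keeps the lag with the strongest Pearson correlation; it is a generic illustration under stated assumptions (Pearson correlation, the function names and the toy time series are all hypothetical) and is not the TdGRN algorithm itself.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length series (assumes non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def best_delay(regulator, target, max_lag):
    """Return (lag, r) with the largest |r| between regulator[t] and target[t + lag]."""
    best = (0, 0.0)
    for lag in range(max_lag + 1):
        x = regulator[:len(regulator) - lag]  # regulator values with a partner
        y = target[lag:]                      # target values shifted by `lag`
        r = pearson(x, y)
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best


# Hypothetical profiles: the target repeats the regulator two time points later.
regulator = [0, 1, 4, 2, 0, 1, 3, 5, 2, 0]
target = [1, 1] + regulator[:-2]

lag, r = best_delay(regulator, target, max_lag=3)
print(lag, round(r, 3))
```

    Each extra lag shortens the overlapping window by one time point, so in practice the maximum lag has to stay small relative to the length of the series.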

  7. Seed-based systematic discovery of specific transcription factor target genes.

    Science.gov (United States)

    Mrowka, Ralf; Blüthgen, Nils; Fähling, Michael

    2008-06-01

    Reliable prediction of specific transcription factor target genes is a major challenge in systems biology and functional genomics. Current sequence-based methods yield many false predictions, due to the short and degenerated DNA-binding motifs. Here, we describe a new systematic genome-wide approach, the seed-distribution-distance method, that searches large-scale genome-wide expression data for genes that are similarly expressed as known targets. This method is used to identify genes that are likely targets, allowing sequence-based methods to focus on a subset of genes, giving rise to fewer false-positive predictions. We show by cross-validation that this method is robust in recovering specific target genes. Furthermore, this method identifies genes with typical functions and binding motifs of the seed. The method is illustrated by predicting novel targets of the transcription factor nuclear factor kappaB (NF-kappaB). Among the new targets is optineurin, which plays a key role in the pathogenesis of acquired blindness caused by adult-onset primary open-angle glaucoma. We show experimentally that the optineurin gene and other predicted genes are targets of NF-kappaB. Thus, our data provide a missing link in the signalling of NF-kappaB and the damping function of optineurin in signalling feedback of NF-kappaB. We present a robust and reliable method to enhance the genome-wide prediction of specific transcription factor target genes that exploits the vast amount of expression information available in public databases today. PMID:18485006

  8. Independent Gene Discovery and Testing

    Science.gov (United States)

    Palsule, Vrushalee; Coric, Dijana; Delancy, Russell; Dunham, Heather; Melancon, Caleb; Thompson, Dennis; Toms, Jamie; White, Ashley; Shultz, Jeffry

    2010-01-01

    A clear understanding of basic gene structure is critical when teaching molecular genetics, the central dogma and the biological sciences. We sought to create a gene-based teaching project to improve students' understanding of gene structure and to integrate this into a research project that can be implemented by instructors at the secondary level…

  9. SSHscreen and SSHdb, generic software for microarray based gene discovery: application to the stress response in cowpea

    Directory of Open Access Journals (Sweden)

    Oelofse Dean

    2010-04-01

    Full Text Available Abstract Background Suppression subtractive hybridization is a popular technique for gene discovery from non-model organisms without an annotated genome sequence, such as cowpea (Vigna unguiculata (L.) Walp.). We aimed to use this method to enrich for genes expressed during drought stress in a drought tolerant cowpea line. However, current methods were inefficient in screening libraries and management of the sequence data, and thus there was a need to develop software tools to facilitate the process. Results Forward and reverse cDNA libraries enriched for cowpea drought response genes were screened on microarrays, and the R software package SSHscreen 2.0.1 was developed (i) to normalize the data effectively using spike-in control spot normalization, and (ii) to select clones for sequencing based on the calculation of enrichment ratios with associated statistics. Enrichment ratio 3 values for each clone showed that 62% of the forward library and 34% of the reverse library clones were significantly differentially expressed by drought stress (adjusted p value 88% of the clones in both libraries were derived from rare transcripts in the original tester samples, thus supporting the notion that suppression subtractive hybridization enriches for rare transcripts. A set of 118 clones were chosen for sequencing, and drought-induced cowpea genes were identified, the most interesting encoding a late embryogenesis abundant Lea5 protein, a glutathione S-transferase, a thaumatin, a universal stress protein, and a wound induced protein. A lipid transfer protein and several components of photosynthesis were down-regulated by the drought stress. Reverse transcriptase quantitative PCR confirmed the enrichment ratio values for the selected cowpea genes. SSHdb, a web-accessible database, was developed to manage the clone sequences and combine the SSHscreen data with sequence annotations derived from BLAST and Blast2GO. The self-BLAST function within SSHdb grouped

  10. Genome-wide SNP discovery and linkage analysis in barley based on genes responsive to abiotic stress.

    Science.gov (United States)

    Rostoks, Nils; Mudie, Sharon; Cardle, Linda; Russell, Joanne; Ramsay, Luke; Booth, Allan; Svensson, Jan T; Wanamaker, Steve I; Walia, Harkamal; Rodriguez, Edmundo M; Hedley, Peter E; Liu, Hui; Morris, Jenny; Close, Timothy J; Marshall, David F; Waugh, Robbie

    2005-12-01

    More than 2,000 genome-wide barley single nucleotide polymorphisms (SNPs) were developed by resequencing unigene fragments from eight diverse accessions. The average genome-wide SNP frequency observed in 877 unigenes was 1 SNP per 200 bp. However, SNP frequency was highly variable with the least number of SNP and SNP haplotypes observed within European cultivated germplasm reflecting effects of breeding history on genetic diversity. More than 300 SNP loci were mapped genetically in three experimental mapping populations which allowed the construction of an integrated SNP map incorporating a large number of RFLP, AFLP and SSR markers (1,237 loci in total). The genes used for SNP discovery were selected based on their transcriptional response to a variety of abiotic stresses. A set of known barley abiotic stress QTL was positioned on the linkage map, while the available sequence and gene expression information facilitated the identification of genes potentially associated with these traits. Comparison of the sequenced SNP loci to the rice genome sequence identified several regions of highly conserved gene order providing a framework for marker saturation in barley genomic regions of interest. The integration of genome-wide SNP and expression data with available genetic and phenotypic information will facilitate the identification of gene function in barley and other non-model organisms. PMID:16244872

  11. High-Throughput, Motility-Based Sorter for Microswimmers and Gene Discovery Platform

    Science.gov (United States)

    Yuan, Jinzhou; Raizen, David; Bau, Haim

    2015-11-01

    Animal motility varies with genotype, disease progression, aging, and environmental conditions. In many studies, it is desirable to carry out high throughput motility-based sorting to isolate rare animals for, among other things, forward genetic screens to identify genetic pathways that regulate phenotypes of interest. Many commonly used screening processes are labor-intensive, lack sensitivity, and require extensive investigator training. Here, we describe a sensitive, high throughput, automated, motility-based method for sorting nematodes. Our method was implemented in a simple microfluidic device capable of sorting many thousands of animals per hour per module, and is amenable to parallelism. The device successfully enriched for known C. elegans motility mutants. Furthermore, using this device, we isolated low-abundance mutants capable of suppressing the somnogenic effects of the flp-13 gene, which regulates sleep-like quiescence in C. elegans. Subsequent genomic sequencing led to the identification of a flp-13-suppressor gene. This research was supported, in part, by NIH NIA Grant 5R03AG042690-02.

  12. Genomics-Based Discovery of Plant Genes for Synthetic Biology of Terpenoid Fragrances: A Case Study in Sandalwood oil Biosynthesis.

    Science.gov (United States)

    Celedon, J M; Bohlmann, J

    2016-01-01

    Terpenoid fragrances are powerful mediators of ecological interactions in nature and have a long history of traditional and modern industrial applications. Plants produce a great diversity of fragrant terpenoid metabolites, which make them a superb source of biosynthetic genes and enzymes. Advances in fragrance gene discovery have enabled new approaches in synthetic biology of high-value speciality molecules toward applications in the fragrance and flavor, food and beverage, cosmetics, and other industries. Rapid developments in transcriptome and genome sequencing of nonmodel plant species have accelerated the discovery of fragrance biosynthetic pathways. In parallel, advances in metabolic engineering of microbial and plant systems have established platforms for synthetic biology applications of some of the thousands of plant genes that underlie fragrance diversity. While many fragrance molecules (eg, simple monoterpenes) are abundant in readily renewable plant materials, some highly valuable fragrant terpenoids (eg, santalols, ambroxides) are rare in nature and interesting targets for synthetic biology. As a representative example for genomics/transcriptomics enabled gene and enzyme discovery, we describe a strategy used successfully for elucidation of a complete fragrance biosynthetic pathway in sandalwood (Santalum album) and its reconstruction in yeast (Saccharomyces cerevisiae). We address questions related to the discovery of specific genes within large gene families and recovery of rare gene transcripts that are selectively expressed in recalcitrant tissues. To substantiate the validity of the approaches, we describe the combination of methods used in the gene and enzyme discovery of a cytochrome P450 in the fragrant heartwood of tropical sandalwood, responsible for the fragrance defining, final step in the biosynthesis of (Z)-santalols. PMID:27480682

  13. An ensemble method for gene discovery based on DNA microarray data

    Institute of Scientific and Technical Information of China (English)

    2004-01-01

    The advent of DNA microarray technology has offered the promise of casting new insights onto deciphering secrets of life by monitoring activities of thousands of genes simultaneously. Current analyses of microarray data focus on precise classification of biological types, for example, tumor versus normal tissues. A further scientifically challenging task is to extract disease-relevant genes from the bewildering amounts of raw data, which is one of the most critical themes in the post-genomic era, but it is generally ignored due to lack of an efficient approach. In this paper, we present a novel ensemble method for gene extraction that can be tailored to fulfill multiple biological tasks including (i) precise classification of biological types; (ii) disease gene mining; and (iii) target-driven gene networking. We also give a numerical application for (i) and (ii) using a public microarray data set and set aside a separate paper to address (iii).

  14. Discovery of molecular associations among aging, stem cells, and cancer based on gene expression profiling

    Institute of Scientific and Technical Information of China (English)

    Xiaosheng Wang

    2013-01-01

    The emergence of a huge volume of "omics" data enables a computational approach to the investigation of the biology of cancer. The cancer informatics approach is a useful supplement to the traditional experimental approach. I reviewed several reports that used a bioinformatics approach to analyze the associations among aging, stem cells, and cancer by microarray gene expression profiling. The high expression of aging- or human embryonic stem cell-related molecules in cancer suggests that certain important mechanisms are commonly underlying aging, stem cells, and cancer. These mechanisms are involved in cell cycle regulation, metabolic process, DNA damage response, apoptosis, p53 signaling pathway, immune/inflammatory response, and other processes, suggesting that cancer is a developmental and evolutional disease that is strongly related to aging. Moreover, these mechanisms demonstrate that the initiation, proliferation, and metastasis of cancer are associated with the deregulation of stem cells. These findings provide insights into the biology of cancer. Certainly, the findings that are obtained by the informatics approach should be justified by experimental validation. This review also noted that next-generation sequencing data provide enriched sources for cancer informatics study.

  15. Discovery of molecular associations among aging, stem cells, and cancer based on gene expression profiling.

    Science.gov (United States)

    Wang, Xiaosheng

    2013-04-01

    The emergence of a huge volume of "omics" data enables a computational approach to the investigation of the biology of cancer. The cancer informatics approach is a useful supplement to the traditional experimental approach. I reviewed several reports that used a bioinformatics approach to analyze the associations among aging, stem cells, and cancer by microarray gene expression profiling. The high expression of aging- or human embryonic stem cell-related molecules in cancer suggests that certain important mechanisms are commonly underlying aging, stem cells, and cancer. These mechanisms are involved in cell cycle regulation, metabolic process, DNA damage response, apoptosis, p53 signaling pathway, immune/inflammatory response, and other processes, suggesting that cancer is a developmental and evolutional disease that is strongly related to aging. Moreover, these mechanisms demonstrate that the initiation, proliferation, and metastasis of cancer are associated with the deregulation of stem cells. These findings provide insights into the biology of cancer. Certainly, the findings that are obtained by the informatics approach should be justified by experimental validation. This review also noted that next-generation sequencing data provide enriched sources for cancer informatics study.

  16. Gene discovery in Triatoma infestans

    Directory of Open Access Journals (Sweden)

    de Burgos Nelia

    2011-03-01

    Full Text Available Abstract Background Triatoma infestans is the most relevant vector of Chagas disease in the southern cone of South America. Since its genome has not yet been studied, sequencing of Expressed Sequence Tags (ESTs is one of the most powerful tools for efficiently identifying large numbers of expressed genes in this insect vector. Results In this work, we generated 826 ESTs, resulting in an increase of 47% in the number of ESTs available for T. infestans. These ESTs were assembled in 471 unique sequences, 151 of which represent 136 new genes for the Reduviidae family. Conclusions Among the putative new genes for the Reduviidae family, we identified and described an interesting subset of genes involved in development and reproduction, which constitute potential targets for insecticide development.

  17. Gene discovery and molecular marker development, based on high-throughput transcript sequencing of Paspalum dilatatum Poir.

    Directory of Open Access Journals (Sweden)

    Andrea Giordano

    Full Text Available BACKGROUND: Paspalum dilatatum Poir. (common name dallisgrass) is a native grass species of South America, with special relevance to dairy and red meat production. P. dilatatum exhibits higher forage quality than other C4 forage grasses and is tolerant to frost and water stress. This species is predominantly cultivated in an apomictic monoculture, with an inherent high risk that biotic and abiotic stresses could potentially devastate productivity. Therefore, advanced breeding strategies that characterise and use available genetic diversity, or assess germplasm collections effectively, are required to deliver advanced cultivars for production systems. However, there are limited genomic resources available for this forage grass species. RESULTS: Transcriptome sequencing using second-generation sequencing platforms has been employed using pooled RNA from different tissues (stems, roots, leaves and inflorescences) at the final reproductive stage of P. dilatatum cultivar Primo. A total of 324,695 sequence reads were obtained, corresponding to c. 102 Mbp. The sequences were assembled, generating 20,169 contigs of a combined length of 9,336,138 nucleotides. The contigs were BLAST analysed against the fully sequenced grass species of Oryza sativa subsp. japonica, Brachypodium distachyon, the closely related Sorghum bicolor and foxtail millet (Setaria italica) genomes as well as against the UniRef 90 protein database, allowing a comprehensive gene ontology analysis to be performed. The contigs generated from the transcript sequencing were also analysed for the presence of simple sequence repeats (SSRs). A total of 2,339 SSR motifs were identified within 1,989 contigs and corresponding primer pairs were designed. Empirical validation of a cohort of 96 SSRs was performed, with 34% being polymorphic between sexual and apomictic biotypes. CONCLUSIONS: The development of genetic and genomic resources for P. dilatatum will contribute to gene discovery and expression
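
    Scanning contigs for simple sequence repeats can be sketched with a regular expression. The thresholds below (motifs of 2 to 6 nucleotides repeated at least four times in tandem), the function name and the toy contig are assumptions for illustration; the abstract does not state which tool or cutoffs the authors used.

```python
import re

# A motif of 2-6 nucleotides followed by at least three more tandem copies.
# The lazy quantifier makes the shortest repeating unit win (AG, not AGAG).
SSR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{3,}")


def find_ssrs(seq):
    """Return (start, motif, repeat_count) for each tandem repeat found in seq."""
    hits = []
    for m in SSR_PATTERN.finditer(seq.upper()):
        motif = m.group(1)
        hits.append((m.start(), motif, len(m.group(0)) // len(motif)))
    return hits


# Hypothetical contig carrying an (AG)6 and a (CAT)4 repeat.
contig = "TTGCA" + "AG" * 6 + "CCGG" + "CAT" * 4 + "GGA"
print(find_ssrs(contig))
```

    Primer design would then target the unique sequence flanking each reported start position, which is why the sketch returns coordinates rather than only counts.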

  18. Discovery and identification of candidate sex-related genes based on transcriptome sequencing of Russian sturgeon (Acipenser gueldenstaedtii) gonads.

    Science.gov (United States)

    Chen, Yadong; Xia, Yongtao; Shao, Changwei; Han, Lei; Chen, Xuejie; Yu, Mengjun; Sha, Zhenxia

    2016-07-01

    As the Russian sturgeon (Acipenser gueldenstaedtii) is an important food and is the main source of caviar, it is necessary to discover the genes associated with its sex differentiation. However, the complicated life and maturity cycles of the Russian sturgeon restrict the accurate identification of sex in early development. To generate a first look at specific sex-related genes, we sequenced the transcriptome of gonads in different development stages (1, 2, and 5 yr old stages) with next-generation RNA sequencing. We generated >60 million raw reads, and the filtered reads were assembled into 263,341 contigs, which produced 38,505 unigenes. Genes involved in signal transduction mechanisms were the most abundant, suggesting that development of sturgeon gonads is under control of signal transduction mechanisms. Differentially expressed gene analysis suggests that more genes for protein synthesis, cytochrome c oxidase subunits, and ribosomal proteins were expressed in female gonads than in male. Meanwhile, male gonads expressed more transposable element transposase, reverse transcriptase, and transposase-related genes than female. In total, 342, 782, and 7,845 genes were detected in intersex, male, and female transcriptomes, respectively. The female gonad expressed more genes than the male gonad, and more genes were involved in female gonadal development. Genes (sox9, foxl2) are differentially expressed in different sexes and may be important sex-related genes in Russian sturgeon. Sox9 genes are responsible for the development of male gonads and foxl2 for female gonads.

  19. Maximizing biomarker discovery by minimizing gene signatures

    Directory of Open Access Journals (Sweden)

    Chang Chang

    2011-12-01

    Full Text Available Abstract Background The use of gene signatures can potentially be of considerable value in the field of clinical diagnosis. However, gene signatures defined with different methods can vary considerably even when applied to the same disease and the same endpoint. Previous studies have shown that the correct selection of subsets of genes from microarray data is key for the accurate classification of disease phenotypes, and a number of methods have been proposed for the purpose. However, these methods refine the subsets by only considering each single feature, and they do not confirm the association between the genes identified in each gene signature and the phenotype of the disease. We proposed an innovative new method termed Minimize Feature's Size (MFS) based on multiple level similarity analyses and association between the genes and disease for breast cancer endpoints by comparing classifier models generated from the second phase of MicroArray Quality Control (MAQC-II), trying to develop effective meta-analysis strategies to transform the MAQC-II signatures into a robust and reliable set of biomarkers for clinical applications. Results We analyzed the similarity of the multiple gene signatures in an endpoint and between the two endpoints of breast cancer at probe and gene levels. The results indicate that disease-related genes can be preferably selected as the components of a gene signature, and that the gene signatures for the two endpoints could be interchangeable. The minimized signatures were built at probe level by using MFS for each endpoint. By applying the approach, we generated a much smaller gene signature with predictive power similar to that of the gene signatures from MAQC-II. Conclusions Our results indicate that gene signatures of both large and small sizes could perform equally well in clinical applications. Besides, consistency and biological significances can be detected among different gene signatures, reflecting the

  20. Antibiotic resistance gene discovery in food-producing animals.

    Science.gov (United States)

    Allen, Heather K

    2014-06-01

    Numerous environmental reservoirs contribute to the widespread antibiotic resistance problem in human pathogens. One environmental reservoir of particular importance is the intestinal bacteria of food-producing animals. In this review I examine recent discoveries of antibiotic resistance genes in agricultural animals. Two types of antibiotic resistance gene discoveries will be discussed: the use of classic microbiological and molecular techniques, such as culturing and PCR, to identify known genes not previously reported in animals; and the application of high-throughput technologies, such as metagenomics, to identify novel genes and gene transfer mechanisms. These discoveries confirm that antibiotics should be limited to prudent uses.

  1. Species-independent MicroRNA Gene Discovery

    KAUST Repository

    Kamanu, Timothy K.

    2012-12-01

    MicroRNA (miRNA) are a class of small endogenous non-coding RNA that are mainly negative transcriptional and post-transcriptional regulators in both plants and animals. Recent studies have shown that miRNA are involved in different types of cancer and other incurable diseases such as autism and Alzheimer’s. Functional miRNAs are excised from hairpin-like sequences that are known as miRNA genes. There are about 21,000 known miRNA genes, most of which have been determined using experimental methods. miRNA genes are classified into different groups (miRNA families). This study reports about 19,000 unknown miRNA genes in nine species whereby approximately 15,300 predictions were computationally validated to contain at least one experimentally verified functional miRNA product. The predictions are based on a novel computational strategy which relies on miRNA family groupings and exploits the physics and geometry of miRNA genes to unveil the hidden palindromic signals and symmetries in miRNA gene sequences. Unlike conventional computational miRNA gene discovery methods, the algorithm developed here is species-independent: it allows prediction at higher accuracy and resolution from arbitrary RNA/DNA sequences in any species and thus enables examination of repeat-prone genomic regions which are thought to be non-informative or ’junk’ sequences. The information non-redundancy of uni-directional RNA sequences compared to information redundancy of bi-directional DNA is demonstrated, a fact that is overlooked by most pattern discovery algorithms. A novel method for computing upstream and downstream miRNA gene boundaries based on mathematical/statistical functions is suggested, as well as cutoffs for annotation of miRNA genes in different miRNA families. Another tool is proposed to allow hypotheses generation and visualization of data matrices, intra- and inter-species chromosomal distribution of miRNA genes or miRNA families. Our results indicate that: miRNA and mi

  2. Genome-enabled Discovery of Carbon Sequestration Genes

    Energy Technology Data Exchange (ETDEWEB)

    Tuskan, Gerald A [ORNL; Tschaplinski, Timothy J [ORNL; Kalluri, Udaya C [ORNL; Yin, Tongming [ORNL; Yang, Xiaohan [ORNL; Zhang, Xinye [ORNL; Engle, Nancy L [ORNL; Ranjan, Priya [ORNL; Basu, Manojit M [ORNL; Gunter, Lee E [ORNL; Jawdy, Sara [ORNL; Martin, Madhavi Z [ORNL; Campbell, Alina S [ORNL; DiFazio, Stephen P [ORNL; Davis, John M [University of Florida; Hinchee, Maud [ORNL; Pinnacchio, Christa [U.S. Department of Energy, Joint Genome Institute; Meilan, R [Purdue University; Busov, V. [Michigan Technological University; Strauss, S [Oregon State University

    2009-01-01

    The fate of carbon below ground is likely to be a major factor determining the success of carbon sequestration strategies involving plants. Despite their importance, molecular processes controlling belowground C allocation and partitioning are poorly understood. This project is leveraging the Populus trichocarpa genome sequence to discover genes important to C sequestration in plants and soils. The focus is on the identification of genes that provide key control points for the flow and chemical transformations of carbon in roots, concentrating on genes that control the synthesis of chemical forms of carbon that result in slower turnover rates of soil organic matter (i.e., increased recalcitrance). We propose to enhance carbon allocation and partitioning to roots by 1) modifying the auxin signaling pathway, and the invertase family, which controls sucrose metabolism, and by 2) increasing root proliferation through transgenesis with genes known to control fine root proliferation (e.g., ANT), 3) increasing the production of recalcitrant C metabolites by identifying genes controlling secondary C metabolism by a major mQTL-based gene discovery effort, and 4) increasing aboveground productivity by enhancing drought tolerance to achieve maximum C sequestration. This broad, integrated approach is aimed at ultimately enhancing root biomass as well as root detritus longevity, providing the best prospects for significant enhancement of belowground C sequestration.

  3. Characterization of Capsicum annuum genetic diversity and population structure based on parallel polymorphism discovery with a 30K unigene Pepper GeneChip.

    Directory of Open Access Journals (Sweden)

    Theresa A Hill

    Full Text Available The widely cultivated pepper, Capsicum spp., important as a vegetable and spice crop world-wide, is one of the most diverse crops. To enhance breeding programs, a detailed characterization of Capsicum diversity including morphological, geographical and molecular data is required. Currently, molecular data characterizing Capsicum genetic diversity is limited. The development and application of high-throughput genome-wide markers in Capsicum will facilitate more detailed molecular characterization of germplasm collections, genetic relationships, and the generation of ultra-high density maps. We have developed the Pepper GeneChip® array from Affymetrix for polymorphism detection and expression analysis in Capsicum. Probes on the array were designed from 30,815 unigenes assembled from expressed sequence tags (ESTs). Our array design provides a maximum redundancy of 13 probes per base pair position allowing integration of multiple hybridization values per position to detect single position polymorphism (SPP). Hybridization of genomic DNA from 40 diverse C. annuum lines, used in breeding and research programs, and a representative from three additional cultivated species (C. frutescens, C. chinense and C. pubescens) detected 33,401 SPP markers within 13,323 unigenes. Among the C. annuum lines, 6,426 SPPs covering 3,818 unigenes were identified. An estimated three-fold reduction in diversity was detected in non-pungent compared with pungent lines; however, we were able to detect 251 highly informative markers across these C. annuum lines. In addition, an 8.7 cM region without polymorphism was detected around Pun1 in non-pungent C. annuum. An analysis of genetic relatedness and diversity using the software Structure revealed clustering of the germplasm which was confirmed with statistical support by principal component analysis (PCA) and phylogenetic analysis. This research demonstrates the effectiveness of parallel high-throughput discovery and

  4. Characterization of Capsicum annuum genetic diversity and population structure based on parallel polymorphism discovery with a 30K unigene Pepper GeneChip.

    Science.gov (United States)

    Hill, Theresa A; Ashrafi, Hamid; Reyes-Chin-Wo, Sebastian; Yao, JiQiang; Stoffel, Kevin; Truco, Maria-Jose; Kozik, Alexander; Michelmore, Richard W; Van Deynze, Allen

    2013-01-01

    The widely cultivated pepper, Capsicum spp., important as a vegetable and spice crop world-wide, is one of the most diverse crops. To enhance breeding programs, a detailed characterization of Capsicum diversity including morphological, geographical and molecular data is required. Currently, molecular data characterizing Capsicum genetic diversity is limited. The development and application of high-throughput genome-wide markers in Capsicum will facilitate more detailed molecular characterization of germplasm collections, genetic relationships, and the generation of ultra-high density maps. We have developed the Pepper GeneChip® array from Affymetrix for polymorphism detection and expression analysis in Capsicum. Probes on the array were designed from 30,815 unigenes assembled from expressed sequence tags (ESTs). Our array design provides a maximum redundancy of 13 probes per base pair position allowing integration of multiple hybridization values per position to detect single position polymorphism (SPP). Hybridization of genomic DNA from 40 diverse C. annuum lines, used in breeding and research programs, and a representative from three additional cultivated species (C. frutescens, C. chinense and C. pubescens) detected 33,401 SPP markers within 13,323 unigenes. Among the C. annuum lines, 6,426 SPPs covering 3,818 unigenes were identified. An estimated three-fold reduction in diversity was detected in non-pungent compared with pungent lines; however, we were able to detect 251 highly informative markers across these C. annuum lines. In addition, an 8.7 cM region without polymorphism was detected around Pun1 in non-pungent C. annuum. An analysis of genetic relatedness and diversity using the software Structure revealed clustering of the germplasm which was confirmed with statistical support by principal component analysis (PCA) and phylogenetic analysis. This research demonstrates the effectiveness of parallel high-throughput discovery and application of genome

  5. DNA Coding Based Knowledge Discovery Algorithm

    Institute of Scientific and Technical Information of China (English)

    LI Ji-yun; GENG Zhao-feng; SHAO Shi-huang

    2002-01-01

    A novel DNA coding based knowledge discovery algorithm is proposed, and an example verifying its validity is given. It is proved that this algorithm can efficiently discover new simplified rules from the original rule set.

  6. Bioinformatics Assisted Gene Discovery and Annotation of Human Genome

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    As the sequencing stage of the Human Genome Project nears its end, work has begun on discovering novel genes from genome sequences and annotating their biological functions. This review covers the major bioinformatics tools and technologies currently available for large-scale gene discovery and annotation from human genome sequences. Some ideas about possible future developments are also provided.

  7. Indexer Based Dynamic Web Services Discovery

    CERN Document Server

    Bashir, Saba; Javed, M Younus; Khan, Aihab; Khiyal, Malik Sikandar Hayat

    2010-01-01

    Recent advancement in web services plays an important role in business-to-business and business-to-consumer interaction. A discovery mechanism is not only used to find a suitable service but also provides collaboration between service providers and consumers by using standard protocols. A static web service discovery mechanism is not only time consuming but also requires continuous human interaction. This paper proposes an efficient dynamic web services discovery mechanism that can locate relevant and updated web services from service registries and repositories with timestamp based on indexing value and categorization for faster and more efficient discovery of services. The proposed prototype focuses on quality of service issues and introduces the concepts of a local cache, categorization of services, an indexing mechanism, a CSP (Constraint Satisfaction Problem) solver, aging and the use of a translator. The performance of the proposed framework is evaluated by implementing the algorithm, and the correctness of our method is shown. The results of p...

  8. SNP marker discovery in koala TLR genes.

    Directory of Open Access Journals (Sweden)

    Jian Cui

    Full Text Available Toll-like receptors (TLRs) play a crucial role in the early defence against invading pathogens, yet our understanding of TLRs in marsupial immunity is limited. Here, we describe the characterisation of nine TLRs from a koala immune tissue transcriptome and one TLR from a draft sequence of the koala genome and the subsequent development of an assay to study genetic diversity in these genes. We surveyed genetic diversity in 20 koalas from New South Wales, Australia and showed that one gene, TLR10, is monomorphic, while the other nine TLR genes have between two and 12 alleles. 40 SNPs (16 non-synonymous) were identified across the ten TLR genes. These markers provide a springboard to future studies on innate immunity in the koala, a species under threat from two major infectious diseases.

  9. SNP marker discovery in koala TLR genes.

    Science.gov (United States)

    Cui, Jian; Frankham, Greta J; Johnson, Rebecca N; Polkinghorne, Adam; Timms, Peter; O'Meally, Denis; Cheng, Yuanyuan; Belov, Katherine

    2015-01-01

    Toll-like receptors (TLRs) play a crucial role in the early defence against invading pathogens, yet our understanding of TLRs in marsupial immunity is limited. Here, we describe the characterisation of nine TLRs from a koala immune tissue transcriptome and one TLR from a draft sequence of the koala genome and the subsequent development of an assay to study genetic diversity in these genes. We surveyed genetic diversity in 20 koalas from New South Wales, Australia and showed that one gene, TLR10 is monomorphic, while the other nine TLR genes have between two and 12 alleles. 40 SNPs (16 non-synonymous) were identified across the ten TLR genes. These markers provide a springboard to future studies on innate immunity in the koala, a species under threat from two major infectious diseases.

  10. Rice mutant resources for gene discovery

    NARCIS (Netherlands)

    Hirochika, H.; Guiderdoni, E.; An, G.; Hsing, Y.I.; Eun, M.Y.; Han, C.D.; Upadhyaya, N.; Ramachandran, S.; Zhang, Q.F.; Pereira, A.B.; Sundaresan, V.; Leung, H.

    2004-01-01

    With the completion of genomic sequencing of rice, rice has been firmly established as a model organism for both basic and applied research. The next challenge is to uncover the functions of genes predicted by sequence analysis. Considering the amount of effort and the diversity of disciplines requi

  11. Risk genes for schizophrenia: translational opportunities for drug discovery.

    Science.gov (United States)

    Winchester, Catherine L; Pratt, Judith A; Morris, Brian J

    2014-07-01

    Despite intensive research over many years, the treatment of schizophrenia remains a major health issue. Current and emerging treatments for schizophrenia are based upon the classical dopamine and glutamate hypotheses of disease. Existing first and second generation antipsychotic drugs based upon the dopamine hypothesis are limited by their inability to treat all symptom domains and their undesirable side effect profiles. Third generation drugs based upon the glutamate hypothesis of disease are currently under evaluation but are more likely to be used as add on treatments. Hence there is a large unmet clinical need. A major challenge in neuropsychiatric disease research is the relatively limited knowledge of disease mechanisms. However, as our understanding of the genetic causes of the disease evolves, novel strategies for the development of improved therapeutic agents will become apparent. In this review we consider the current status of knowledge of the genetic basis of schizophrenia, including methods for identifying genetic variants associated with the disorder and how they impact on gene function. Although the genetic architecture of schizophrenia is complex, some targets amenable to pharmacological intervention can be discerned. We conclude that many challenges lie ahead but the stratification of patients according to biobehavioural constructs that cross existing disease classifications but with common genetic and neurobiological bases, offer opportunities for new approaches to effective drug discovery.

  12. Gene prioritization for imaging genetics studies using Gene Ontology and a stratified False Discovery Rate approach

    Directory of Open Access Journals (Sweden)

    Sejal ePatel

    2016-04-01

    Full Text Available Imaging genetics is an emerging field in which associations between genes and neuroimaging-based quantitative phenotypes are used to explore the functional role of genes in neuroanatomy and neurophysiology in the context of healthy function and neuropsychiatric disorders. The main obstacle for researchers in the field is the high dimensionality of the data in both the imaging phenotypes and the genetic variants commonly typed. In this article, we develop a novel method that utilizes Gene Ontology, an online database, to select and prioritize certain genes, employing a stratified false discovery rate (sFDR) approach to investigate their associations with imaging phenotypes. sFDR has the potential to increase power in genome wide association studies (GWAS), and is quickly gaining traction as a method for multiple testing correction. Our novel approach addresses the pressing need in genetic research to move beyond candidate gene studies without being overburdened by a loss of power due to multiple testing. As an example of our methodology, we perform a GWAS of hippocampal volume using both the Enhancing NeuroImaging Genetics through Meta-Analysis (ENIGMA2) and the Alzheimer’s Disease Neuroimaging Initiative datasets. The analysis of ENIGMA2 data yielded a set of SNPs with sFDR values between 10 and 20%. Our approach demonstrates a potential method to prioritize genes based on biological systems impaired in a disease.
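    The stratified FDR idea in the abstract above can be illustrated with a minimal sketch. It assumes the common formulation in which Benjamini-Hochberg q-values are computed separately within each stratum (for example, a high-priority stratum of SNPs in ontology-selected genes and a stratum of all remaining SNPs). The abstract does not describe the authors' exact procedure, so the function names, the stratum labels and the p-values below are hypothetical and serve only as an illustration.

```python
from typing import Dict, List, Sequence


def bh_qvalues(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that q-values
    # are monotone non-decreasing in the p-value rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def stratified_fdr(pvalues: Sequence[float], strata: Sequence[str]) -> List[float]:
    """Compute BH q-values separately within each stratum (assumed sFDR form)."""
    groups: Dict[str, List[int]] = {}
    for idx, label in enumerate(strata):
        groups.setdefault(label, []).append(idx)
    result = [1.0] * len(pvalues)
    for indices in groups.values():
        stratum_q = bh_qvalues([pvalues[i] for i in indices])
        for idx, q in zip(indices, stratum_q):
            result[idx] = q
    return result


# Hypothetical p-values: three SNPs in a prioritized stratum, five in the rest.
example_p = [0.001, 0.02, 0.03, 0.5, 0.01, 0.6, 0.7, 0.8]
example_strata = ["prior"] * 3 + ["other"] * 5
stratified_q = stratified_fdr(example_p, example_strata)
pooled_q = bh_qvalues(example_p)
```

    In this toy input, a moderately small p-value in the small prioritized stratum receives a lower q-value than it would under a single pooled correction, which is the sense in which stratification can recover power when the prior grouping is informative.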

  13. Discovery of pinoresinol reductase genes in sphingomonads.

    Science.gov (United States)

    Fukuhara, Y; Kamimura, N; Nakajima, M; Hishiyama, S; Hara, H; Kasai, D; Tsuji, Y; Narita-Yamada, S; Nakamura, S; Katano, Y; Fujita, N; Katayama, Y; Fukuda, M; Kajita, S; Masai, E

    2013-01-10

    Bacterial genes for the degradation of major dilignols produced in lignifying xylem are expected to be useful tools for the structural modification of lignin in plants. For this purpose, we isolated pinZ involved in the conversion of pinoresinol from Sphingobium sp. strain SYK-6. pinZ showed 43-77% identity at amino acid level with bacterial NmrA-like proteins of unknown function, a subgroup of atypical short chain dehydrogenases/reductases, but revealed only 15-21% identity with plant pinoresinol/lariciresinol reductases. PinZ completely converted racemic pinoresinol to lariciresinol, showing a specific activity of 46±3 U/mg in the presence of NADPH at 30°C. In contrast, the activity for lariciresinol was negligible. This substrate preference is similar to a pinoresinol reductase, AtPrR1, of Arabidopsis thaliana; however, the specific activity of PinZ toward (±)-pinoresinol was significantly higher than that of AtPrR1. The role of pinZ and a pinZ ortholog of Novosphingobium aromaticivorans DSM 12444 were also characterized.

  14. Beegle: from literature mining to disease-gene discovery.

    Science.gov (United States)

    ElShal, Sarah; Tranchevent, Léon-Charles; Sifrim, Alejandro; Ardeshirdavani, Amin; Davis, Jesse; Moreau, Yves

    2016-01-29

    Disease-gene identification is a challenging process that has multiple applications within functional genomics and personalized medicine. Typically, this process involves both finding genes known to be associated with the disease (through literature search) and carrying out preliminary experiments or screens (e.g. linkage or association studies, copy number analyses, expression profiling) to determine a set of promising candidates for experimental validation. This requires extensive time and monetary resources. We describe Beegle, an online search and discovery engine that attempts to simplify this process by automating the typical approaches. It starts by mining the literature to quickly extract a set of genes known to be linked with a given query, then it integrates the learning methodology of Endeavour (a gene prioritization tool) to train a genomic model and rank a set of candidate genes to generate novel hypotheses. In a realistic evaluation setup, Beegle has an average recall of 84% in the top 100 returned genes as a search engine, which improves the discovery engine by 12.6% in the top 5% prioritized genes. Beegle is publicly available at http://beegle.esat.kuleuven.be/.

  15. Ontology Based Qos Driven Web Service Discovery

    Directory of Open Access Journals (Sweden)

    R Suganyakala

    2011-07-01

    Full Text Available In today's scenario web services have become a grand vision to implement the business process functionalities. With the increase in the number of similar web services, one of the essential challenges is to discover relevant web services with regard to user specification. Relevancy of web service discovery can be improved by augmenting semantics through expressive formats like OWL. QoS based service selection will play a significant role in meeting the non-functional user requirements. Hence QoS and semantics have been used as finer search constraints to discover the most relevant service. In this paper, we describe a QoS framework for ontology based web service discovery. The QoS factors taken into consideration are execution time, response time, throughput, scalability, reputation, accessibility and availability. The behavior of each web service at various instances is observed over a period of time and their QoS based performance is analyzed.

  16. Technology development for gene discovery and full-length sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Marcelo Bento Soares

    2004-07-19

    In previous years, with support from the U.S. Department of Energy, we developed methods for construction of normalized and subtracted cDNA libraries, and constructed hundreds of high-quality libraries for production of Expressed Sequence Tags (ESTs). Our clones were made widely available to the scientific community through the IMAGE Consortium, and millions of ESTs were produced from our libraries either by collaborators or by our own sequencing laboratory at the University of Iowa. During this grant period, we focused on (1) the development of a method for preferential cloning of tissue-specific and/or rare transcripts, (2) its utilization to expedite EST-based gene discovery for the NIH Mouse Brain Molecular Anatomy Project, (3) further development and optimization of a method for construction of full-length-enriched cDNA libraries, and (4) modification of a plasmid vector to maximize efficiency of full-length cDNA sequencing by the transposon-mediated approach. It is noteworthy that the technology developed for preferential cloning of rare mRNAs enabled identification of over 2,000 mouse transcripts differentially expressed in the hippocampus. In addition, the method that we optimized for construction of full-length-enriched cDNA libraries was successfully utilized for the production of approximately fifty libraries from the developing mouse nervous system, from which over 2,500 full-ORF-containing cDNAs have been identified and accurately sequenced in their entirety either by our group or by the NIH-Mammalian Gene Collection Program Sequencing Team.

  17. Does Discovery-Based Instruction Enhance Learning?

    OpenAIRE

    Alfieri, L.; Brooks, PJ; Aldrich, NJ; Tenenbaum, HR

    2011-01-01

    Discovery learning approaches to education have recently come under scrutiny (Tobias & Duffy, 2009), with many studies indicating limitations to discovery learning practices. Therefore, 2 meta-analyses were conducted using a sample of 164 studies: The 1st examined the effects of unassisted discovery learning versus explicit instruction, and the 2nd examined the effects of enhanced and/or assisted discovery versus other types of instruction (e.g., explicit, unassisted discovery). Random effect...

  18. Gene discovery of modular diterpene metabolism in nonmodel systems.

    Science.gov (United States)

    Zerbe, Philipp; Hamberger, Björn; Yuen, Macaire M S; Chiang, Angela; Sandhu, Harpreet K; Madilao, Lina L; Nguyen, Anh; Hamberger, Britta; Bach, Søren Spanner; Bohlmann, Jörg

    2013-06-01

    Plants produce over 10,000 different diterpenes of specialized (secondary) metabolism, and fewer diterpenes of general (primary) metabolism. Specialized diterpenes may have functions in ecological interactions of plants with other organisms and also benefit humanity as pharmaceuticals, fragrances, resins, and other industrial bioproducts. Examples of high-value diterpenes are taxol and forskolin pharmaceuticals or ambroxide fragrances. Yields and purity of diterpenes obtained from natural sources or by chemical synthesis are often insufficient for large-volume or high-end applications. Improvement of agricultural or biotechnological diterpene production requires knowledge of biosynthetic genes and enzymes. However, specialized diterpene pathways are extremely diverse across the plant kingdom, and most specialized diterpenes are taxonomically restricted to a few plant species, genera, or families. Consequently, there is no single reference system to guide gene discovery and rapid annotation of specialized diterpene pathways. Functional diversification of genes and plasticity of enzyme functions of these pathways further complicate correct annotation. To address this challenge, we used a set of 10 different plant species to develop a general strategy for diterpene gene discovery in nonmodel systems. The approach combines metabolite-guided transcriptome resources, custom diterpene synthase (diTPS) and cytochrome P450 reference gene databases, phylogenies, and, as shown for select diTPSs, single and coupled enzyme assays using microbial and plant expression systems. In the 10 species, we identified 46 new diTPS candidates and over 400 putatively terpenoid-related P450s in a resource of nearly 1 million predicted transcripts of diterpene-accumulating tissues. Phylogenetic patterns of lineage-specific blooms of genes guided functional characterization.

  19. Does Discovery-Based Instruction Enhance Learning?

    Science.gov (United States)

    Alfieri, Louis; Brooks, Patricia J.; Aldrich, Naomi J.; Tenenbaum, Harriet R.

    2011-01-01

    Discovery learning approaches to education have recently come under scrutiny (Tobias & Duffy, 2009), with many studies indicating limitations to discovery learning practices. Therefore, 2 meta-analyses were conducted using a sample of 164 studies: The 1st examined the effects of unassisted discovery learning versus explicit instruction, and the…

  20. Metagenomics and novel gene discovery: promise and potential for novel therapeutics.

    Science.gov (United States)

    Culligan, Eamonn P; Sleator, Roy D; Marchesi, Julian R; Hill, Colin

    2014-04-01

    Metagenomics provides a means of assessing the total genetic pool of all the microbes in a particular environment, in a culture-independent manner. It has revealed unprecedented diversity in microbial community composition, which is further reflected in the encoded functional diversity of the genomes, a large proportion of which consists of novel genes. Herein, we review both sequence-based and functional metagenomic methods to uncover novel genes and outline some of the associated problems of each type of approach, as well as potential solutions. Furthermore, we discuss the potential for metagenomic biotherapeutic discovery, with a particular focus on the human gut microbiome and finally, we outline how the discovery of novel genes may be used to create bioengineered probiotics.

  1. Inflammatory bowel disease gene discovery. CRADA final report

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1997-09-09

    The ultimate goal of this project is to identify the human gene(s) responsible for the disorder known as IBD. The work was planned in two phases. The desired products resulting from Phase 1 were BAC clone(s) containing the genetic marker(s) identified by gene/Networks, Inc. as potentially linked to IBD, plasmid subclones of those BAC(s), and new genetic markers developed from these plasmid subclones. The newly developed markers would be genotyped by gene/Networks, Inc. to ascertain evidence for linkage or non-linkage of IBD to this region. If non-linkage was indicated, the project would move to investigation of other candidate chromosomal regions. Where linkage was indicated, the project would move to Phase 2, in which a physical map of the candidate region(s) would be developed. The products of this phase would be contig(s) of BAC clones in the region exhibiting linkage to IBD, as well as plasmid subclones of the BACs and further genetic marker development. There would also be continued genotyping with new polymorphic markers during this phase. It was anticipated that clones identified and developed during these two phases would provide the physical resources for eventual disease gene discovery.

  2. Psychiatric gene discoveries shape evidence on ADHD's biology.

    Science.gov (United States)

    Thapar, A; Martin, J; Mick, E; Arias Vásquez, A; Langley, K; Scherer, S W; Schachar, R; Crosbie, J; Williams, N; Franke, B; Elia, J; Glessner, J; Hakonarson, H; Owen, M J; Faraone, S V; O'Donovan, M C; Holmans, P

    2016-09-01

    A strong motivation for undertaking psychiatric gene discovery studies is to provide novel insights into unknown biology. Although attention-deficit hyperactivity disorder (ADHD) is highly heritable, and large, rare copy number variants (CNVs) contribute to risk, little is known about its pathogenesis and it remains commonly misunderstood. We assembled and pooled five ADHD and control CNV data sets from the United Kingdom, Ireland, United States of America, Northern Europe and Canada. Our aim was to test for enrichment of neurodevelopmental gene sets, implicated by recent exome-sequencing studies of (a) schizophrenia and (b) autism as a means of testing the hypothesis that common pathogenic mechanisms underlie ADHD and these other neurodevelopmental disorders. We also undertook hypothesis-free testing of all biological pathways. We observed significant enrichment of individual genes previously found to harbour schizophrenia de novo non-synonymous single-nucleotide variants (SNVs; P=5.4 × 10^-4) and targets of the Fragile X mental retardation protein (P=0.0018). No enrichment was observed for activity-regulated cytoskeleton-associated protein (P=0.23) or N-methyl-D-aspartate receptor (P=0.74) post-synaptic signalling gene sets previously implicated in schizophrenia. Enrichment of ADHD CNV hits for genes impacted by autism de novo SNVs (P=0.019 for non-synonymous SNV genes) did not survive Bonferroni correction. Hypothesis-free testing yielded several highly significantly enriched biological pathways, including ion channel pathways. Enrichment findings were robust to multiple testing corrections and to sensitivity analyses that excluded the most significant sample. The findings reveal that CNVs in ADHD converge on biologically meaningful gene clusters, including ones now established as conferring risk of other neurodevelopmental disorders. PMID:26573769
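    The Bonferroni step mentioned in the abstract above can be sketched as follows. The p-values are the ones quoted in the abstract, but the number of tests being corrected for is not stated there. The sketch therefore assumes, purely for illustration, that exactly these five gene-set tests were corrected together at a family-wise level of 0.05; the set labels and the function name are hypothetical.

```python
from typing import Dict


def bonferroni_survivors(pvalues: Dict[str, float], alpha: float = 0.05) -> Dict[str, bool]:
    """Return, for each test, whether its p-value passes the Bonferroni threshold."""
    # Bonferroni divides the family-wise alpha by the number of tests.
    threshold = alpha / len(pvalues)
    return {name: p <= threshold for name, p in pvalues.items()}


# P-values as quoted in the abstract; treating them as one family of
# five tests is an assumption made only for this illustration.
gene_set_p = {
    "schizophrenia_de_novo_SNV": 5.4e-4,
    "FMRP_targets": 0.0018,
    "ARC_complex": 0.23,
    "NMDAR_complex": 0.74,
    "autism_de_novo_SNV": 0.019,
}
survivors = bonferroni_survivors(gene_set_p)
```

    Under this assumed family of five tests the per-test threshold is 0.01, so a nominal P of 0.019 would not pass while the two smallest p-values would, in line with the pattern the abstract reports.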

  3. Psychiatric gene discoveries shape evidence on ADHD's biology

    Science.gov (United States)

    Thapar, A; Martin, J; Mick, E; Arias Vásquez, A; Langley, K; Scherer, S W; Schachar, R; Crosbie, J; Williams, N; Franke, B; Elia, J; Glessner, J; Hakonarson, H; Owen, M J; Faraone, S V; O'Donovan, M C; Holmans, P

    2016-01-01

    A strong motivation for undertaking psychiatric gene discovery studies is to provide novel insights into unknown biology. Although attention-deficit hyperactivity disorder (ADHD) is highly heritable, and large, rare copy number variants (CNVs) contribute to risk, little is known about its pathogenesis and it remains commonly misunderstood. We assembled and pooled five ADHD and control CNV data sets from the United Kingdom, Ireland, United States of America, Northern Europe and Canada. Our aim was to test for enrichment of neurodevelopmental gene sets, implicated by recent exome-sequencing studies of (a) schizophrenia and (b) autism as a means of testing the hypothesis that common pathogenic mechanisms underlie ADHD and these other neurodevelopmental disorders. We also undertook hypothesis-free testing of all biological pathways. We observed significant enrichment of individual genes previously found to harbour schizophrenia de novo non-synonymous single-nucleotide variants (SNVs; P=5.4 × 10^-4) and targets of the Fragile X mental retardation protein (P=0.0018). No enrichment was observed for activity-regulated cytoskeleton-associated protein (P=0.23) or N-methyl-D-aspartate receptor (P=0.74) post-synaptic signalling gene sets previously implicated in schizophrenia. Enrichment of ADHD CNV hits for genes impacted by autism de novo SNVs (P=0.019 for non-synonymous SNV genes) did not survive Bonferroni correction. Hypothesis-free testing yielded several highly significantly enriched biological pathways, including ion channel pathways. Enrichment findings were robust to multiple testing corrections and to sensitivity analyses that excluded the most significant sample. The findings reveal that CNVs in ADHD converge on biologically meaningful gene clusters, including ones now established as conferring risk of other neurodevelopmental disorders. PMID:26573769

  4. Genome Enabled Discovery of Carbon Sequestration Genes in Poplar

    Energy Technology Data Exchange (ETDEWEB)

    Filichkin, Sergei; Etherington, Elizabeth; Ma, Caiping; Strauss, Steve

    2007-02-22

    The goals of the S.H. Strauss laboratory portion of 'Genome-enabled discovery of carbon sequestration genes in poplar' are (1) to explore the functions of candidate genes using Populus transformation by inserting genes provided by Oak Ridge National Laboratory (ORNL) and the University of Florida (UF) into poplar; (2) to expand the poplar transformation toolkit by developing transformation methods for important genotypes; and (3) to allow induced expression, and efficient gene suppression, in roots and other tissues. As part of the transformation improvement effort, OSU developed transformation protocols for the Populus trichocarpa 'Nisqually-1' clone and an early flowering P. alba clone, 6K10. Complete descriptions of the transformation systems were published (Ma et al. 2004, Meilan et al. 2004). Twenty-one 'Nisqually-1' and 622 6K10 transgenic plants were generated. To identify root-predominant promoters, a set of three promoters was tested for tissue-specific expression patterns in poplar and in Arabidopsis as a model system. A novel gene, ET304, was identified by analyzing a collection of poplar enhancer trap lines generated at OSU (Filichkin et al. 2006a, 2006b). Other promoters include the pGgMT1 root-predominant promoter from Casuarina glauca and the pAtPIN2 promoter from the Arabidopsis root-specific PIN2 gene. OSU tested two induction systems, alcohol- and estrogen-inducible, in multiple poplar transgenics. Ethanol proved to be the more efficient when tested in tissue culture and greenhouse conditions. Two estrogen-inducible systems were evaluated in transgenic Populus, neither of which functioned reliably in tissue culture conditions. GATEWAY-compatible plant binary vectors were designed to compare the silencing efficiency of homologous (direct) RNAi vs. heterologous (transitive) RNAi inverted repeats. A set of genes was targeted for post-transcriptional silencing in the model Arabidopsis system; these include the floral

  6. Database systems for knowledge-based discovery.

    Science.gov (United States)

    Jagarlapudi, Sarma A R P; Kishan, K V Radha

    2009-01-01

    Several database systems have been developed to provide valuable information in a structured format to users ranging from the bench chemist to the biologist and from the medical practitioner to the pharmaceutical scientist. The advent of information technology and computational power enhanced the ability to access large volumes of data in the form of a database where one could do compilation, searching, archiving, analysis, and finally knowledge derivation. Although data are of variable types, the tools used for database creation, searching and retrieval are similar. GVK BIO has been developing databases from publicly available scientific literature in specific areas like medicinal chemistry, clinical research, and mechanism-based toxicity so that the structured databases containing vast data could be used in several areas of research. These databases were classified as reference centric or compound centric depending on the way the database systems were designed. Integration of these databases with knowledge derivation tools would enhance the value of these systems toward better drug design and discovery. PMID:19727614

  8. Amyotrophic Lateral Sclerosis: An Emerging Era of Collaborative Gene Discovery

    Science.gov (United States)

    Gwinn, Katrina; Corriveau, Roderick A.; Mitsumoto, Hiroshi; Bednarz, Kate; Brown, Robert H.; Cudkowicz, Merit; Gordon, Paul H.; Hardy, John; Kasarskis, Edward J.; Kaufmann, Petra; Miller, Robert; Sorenson, Eric; Tandan, Rup; Traynor, Bryan J.; Nash, Josefina; Sherman, Alex; Mailman, Matthew D.; Ostell, James; Bruijn, Lucie; Cwik, Valerie; Rich, Stephen S.; Singleton, Andrew; Refolo, Larry; Andrews, Jaime; Zhang, Ran; Conwit, Robin; Keller, Margaret A.

    2007-01-01

    Amyotrophic lateral sclerosis (ALS) is the most common form of motor neuron disease (MND). It is currently incurable and treatment is largely limited to supportive care. Family history is associated with an increased risk of ALS, and many Mendelian causes have been discovered. However, most forms of the disease are not obviously familial. Recent advances in human genetics have enabled genome-wide analyses of single nucleotide polymorphisms (SNPs) that make it possible to study complex genetic contributions to human disease. Genome-wide SNP analyses require a large sample size and thus depend upon collaborative efforts to collect and manage the biological samples and corresponding data. Public availability of biological samples (such as DNA), phenotypic and genotypic data further enhances research endeavors. Here we discuss a large collaboration among academic investigators, government, and non-government organizations which has created a public repository of human DNA, immortalized cell lines, and clinical data to further gene discovery in ALS. This resource currently maintains samples and associated phenotypic data from 2332 MND subjects and 4692 controls. This resource should facilitate genetic discoveries which we anticipate will ultimately provide a better understanding of the biological mechanisms of neurodegeneration in ALS. PMID:18060051

  9. The Matchmaker Exchange: a platform for rare disease gene discovery.

    Science.gov (United States)

    Philippakis, Anthony A; Azzariti, Danielle R; Beltran, Sergi; Brookes, Anthony J; Brownstein, Catherine A; Brudno, Michael; Brunner, Han G; Buske, Orion J; Carey, Knox; Doll, Cassie; Dumitriu, Sergiu; Dyke, Stephanie O M; den Dunnen, Johan T; Firth, Helen V; Gibbs, Richard A; Girdea, Marta; Gonzalez, Michael; Haendel, Melissa A; Hamosh, Ada; Holm, Ingrid A; Huang, Lijia; Hurles, Matthew E; Hutton, Ben; Krier, Joel B; Misyura, Andriy; Mungall, Christopher J; Paschall, Justin; Paten, Benedict; Robinson, Peter N; Schiettecatte, François; Sobreira, Nara L; Swaminathan, Ganesh J; Taschner, Peter E; Terry, Sharon F; Washington, Nicole L; Züchner, Stephan; Boycott, Kym M; Rehm, Heidi L

    2015-10-01

    There are few better examples of the need for data sharing than in the rare disease community, where patients, physicians, and researchers must search for "the needle in a haystack" to uncover rare, novel causes of disease within the genome. Impeding the pace of discovery has been the existence of many small siloed datasets within individual research or clinical laboratory databases and/or disease-specific organizations, hoping for serendipitous occasions when two distant investigators happen to learn they have a rare phenotype in common and can "match" these cases to build evidence for causality. However, serendipity has never proven to be a reliable or scalable approach in science. As such, the Matchmaker Exchange (MME) was launched to provide a robust and systematic approach to rare disease gene discovery through the creation of a federated network connecting databases of genotypes and rare phenotypes using a common application programming interface (API). The core building blocks of the MME have been defined and assembled. Three MME services have now been connected through the API and are available for community use. Additional databases that support internal matching are anticipated to join the MME network as it continues to grow. PMID:26295439

  10. Ontology-based knowledge discovery in pharmacogenomics.

    Science.gov (United States)

    Coulet, Adrien; Smaïl-Tabbone, Malika; Napoli, Amedeo; Devignes, Marie-Dominique

    2011-01-01

    One current challenge in biomedicine is to analyze large amounts of complex biological data in order to extract domain knowledge. This work focuses on the use of knowledge-based techniques such as knowledge discovery (KD) and knowledge representation (KR) in pharmacogenomics, where knowledge units represent genotype-phenotype relationships in the context of a given treatment. One objective is to design a knowledge base (KB, here also referred to as an ontology) and then to use it in the KD process itself. A method is proposed for dealing with two main tasks: (1) building a KB from heterogeneous data related to genotype, phenotype, and treatment, and (2) applying KD techniques to knowledge assertions to extract genotype-phenotype relationships. An application was carried out on a clinical trial concerned with the variability of drug response to montelukast treatment. Genotype-genotype and genotype-phenotype associations were retrieved together with new associations, allowing the extension of the initial KB. This experiment shows the potential of KR and KD processes, especially for designing a KB, checking KB consistency, and reasoning for problem solving.

  11. Discovery of gene expression-based pharmacodynamic biomarker for a p53 context-specific anti-tumor drug Wee1 inhibitor

    Directory of Open Access Journals (Sweden)

    Mizuarai Shinji

    2009-06-01

    Full Text Available Abstract Background Wee1 is a tyrosine kinase regulating S-G2 cell cycle transition through the inactivating phosphorylation of CDC2. The inhibition of Wee1 kinase by a selective small molecule inhibitor significantly enhances the anti-tumor efficacy of DNA damaging agents, specifically in p53 negative tumors by abrogating S-G2 checkpoints, while normal cells with wild-type p53 are not severely damaged due to the intact function of the G1 checkpoint mediated by p53. Since the measurement of mRNA expression requires a very small amount of biopsy tissue and is highly quantitative, the development of a pharmacodynamic (PD) biomarker leveraging mRNA expression is eagerly anticipated in order to estimate target engagement of anti-cancer agents. Results In order to find the Wee1 inhibition signature, mRNA expression profiling was first performed in both p53 positive and negative cancer cell lines treated with gemcitabine and a Wee1 inhibitor, MK-1775. We next carried out mRNA expression profiling of skin samples derived from xenograft models treated with the Wee1 inhibitor to identify a Wee1 inhibitor-regulatory gene set. Then, the genes that were commonly modulated in both cancer cell lines and rat skin samples were extracted as a Wee1 inhibition signature that could potentially be used as a PD biomarker independent of p53 status. The expression of the Wee1 inhibition signature was found to be regulated in a dose-dependent manner by the Wee1 inhibitor, and was significantly correlated with the inhibition level of a direct substrate, phosphorylated-CDC2. Individual genes in this Wee1 inhibition signature are known to regulate S-G2 cell cycle progression or checkpoints, which is consistent with the mode-of-action of the Wee1 inhibitor. Conclusion We report here the identification of an mRNA gene signature that was specifically changed by gemcitabine and Wee1 inhibitor combination treatment by molecular profiling. Given the common regulation of

  12. Graph-Based Methods for Discovery Browsing with Semantic Predications

    OpenAIRE

    Wilkowski, Bartlomiej; Fiszman, Marcelo; Miller, Christopher M.; Hristovski, Dimitar; Arabandi, Sivaram; Rosemblat, Graciela; Rindflesch, Thomas C.

    2011-01-01

    We present an extension to literature-based discovery that goes beyond making discoveries to a principled way of navigating through selected aspects of some biomedical domain. The method is a type of “discovery browsing” that guides the user through the research literature on a specified phenomenon. Poorly understood relationships may be explored through novel points of view, and potentially interesting relationships need not be known ahead of time. In a process of “cooperative reciprocity” t...

  13. Discovery of the faithfulness gene: a model of transmission and transformation of scientific information.

    Science.gov (United States)

    Green, Eva G T; Clémence, Alain

    2008-09-01

    The purpose of this paper is to study the diffusion and transformation of scientific information in everyday discussions. Based on rumour models and social representations theory, the impact of interpersonal communication and pre-existing beliefs on transmission of the content of a scientific discovery was analysed. In three experiments, a communication chain was simulated to investigate how laypeople make sense of a genetic discovery first published in a scientific outlet, then reported in a mainstream newspaper and finally discussed in groups. Study 1 (N=40) demonstrated a transformation of information when the scientific discovery moved along the communication chain. During successive narratives, scientific expert terminology disappeared while scientific information associated with lay terminology persisted. Moreover, the idea of a discovery of a faithfulness gene emerged. Study 2 (N=70) revealed that transmission of the scientific message varied as a function of attitudes towards genetic explanations of behaviour (pro-genetics vs. anti-genetics). Pro-genetics employed more scientific terminology than anti-genetics. Study 3 (N=75) showed that endorsement of genetic explanations was related to descriptive accounts of the scientific information, whereas rejection of genetic explanations was related to evaluative accounts of the information.

  14. The discovery of the microphthalmia locus and its gene, Mitf.

    Science.gov (United States)

    Arnheiter, Heinz

    2010-12-01

    The history of the discovery of the microphthalmia locus and its gene, now called Mitf, is a testament to the triumph of serendipity. Although the first microphthalmia mutation was discovered among the descendants of a mouse that was irradiated for the purpose of mutagenesis, the mutation most likely was not radiation induced but occurred spontaneously in one of the parents of a later breeding. Although Mitf might eventually have been identified by other molecular genetic techniques, it was first cloned from a chance transgene insertion at the microphthalmia locus. And although Mitf was found to encode a member of a well-known transcription factor family, its analysis might still be in its infancy had Mitf not turned out to be of crucial importance for the physiology and pathology of many distinct organs, including eye, ear, immune system, bone, and skin, and in particular for melanoma. In fact, nearly seven decades of Mitf research have led to many insights about development, function, degeneration, and malignancies of a number of specific cell types, and it is hoped that these insights will one day lead to therapies benefitting those afflicted with diseases originating in these cell types.

  15. Gene expression, single nucleotide variant and fusion transcript discovery in archival material from breast tumors.

    Directory of Open Access Journals (Sweden)

    Nadine Norton

    Full Text Available Advantages of RNA-Seq over array-based platforms are quantitative gene expression and discovery of expressed single nucleotide variants (eSNVs) and fusion transcripts from a single platform, but the sensitivity for each of these characteristics is unknown. We measured gene expression in a set of manually degraded RNAs and in nine pairs of matched fresh-frozen and FFPE RNA isolated from breast tumor, with the hybridization-based NanoString nCounter (226-gene panel) and with whole-transcriptome RNA-Seq using RiboZeroGold ScriptSeq V2 library preparation kits. We performed correlation analyses of gene expression between samples and across platforms. We then specifically assessed whole-transcriptome expression of lincRNA and discovery of eSNVs and fusion transcripts in the FFPE RNA-Seq data. For gene expression in the manually degraded samples, we observed Pearson correlations of >0.94 and >0.80 with NanoString and ScriptSeq protocols, respectively. Gene expression data for matched fresh-frozen and FFPE samples yielded mean Pearson correlations of 0.874 and 0.783 for NanoString (226 genes) and ScriptSeq whole-transcriptome protocols, respectively (p < 2 × 10⁻¹⁶). Specifically for lincRNAs, we observed superb Pearson correlation (0.988) between matched fresh-frozen and FFPE pairs. FFPE samples across NanoString and RNA-Seq platforms gave a mean Pearson correlation of 0.838. In FFPE libraries, we detected 53.4% of high-confidence SNVs and 24% of high-confidence fusion transcripts. Sensitivity of fusion transcript detection was not improved by an increase in depth of sequencing of up to 3-fold (increase from ~56 to ~159 million reads). Both NanoString and ScriptSeq RNA-Seq technologies yield reliable gene expression data for degraded and FFPE material. The high degree of correlation between NanoString and RNA-Seq platforms suggests discovery-based whole-transcriptome studies from FFPE material will produce reliable expression data. The RiboZeroGold ScriptSeq protocol

  16. Abiotic Stress Tolerance: From Gene Discovery in Model Organisms to Crop Improvement

    Institute of Scientific and Technical Information of China (English)

    Ray Bressan; Hans Bohnert; Jian-Kang Zhu

    2009-01-01

    Productive and sustainable agriculture necessitates growing plants in sub-optimal environments with less input of precious resources such as fresh water. For a better understanding and rapid improvement of abiotic stress tolerance, it is important to link physiological and biochemical work to molecular studies in genetically tractable model organisms. With the use of several technologies for the discovery of stress tolerance genes and their appropriate alleles, transgenic approaches to improving stress tolerance in crops remarkably parallel breeding principles, with a greatly expanded germplasm base, and will succeed eventually.

  17. Literature-based knowledge discovery: the state of the art

    CERN Document Server

    Liu, Xiaoyong

    2012-01-01

    The literature-based knowledge discovery method was introduced by Dr. Swanson in 1986, when he hypothesized a connection between Raynaud's phenomenon and dietary fish oil; the field of literature-based discovery (LBD) was born from then on. During the subsequent two decades, LBD research has attracted scientists from information science, computer science, biomedical science, and other fields, and it has become a part of knowledge discovery and text mining. This paper summarizes the development of LBD in recent years in two parts: methodology research and applied research. Lastly, some open problems are pointed out as future research directions.

  18. Data mining as a discovery tool for imprinted genes.

    Science.gov (United States)

    Brideau, Chelsea; Soloway, Paul

    2012-01-01

    This chapter serves as an introduction to the collection of genome-wide sequence and epigenomic data, as well as the use of these data in training generalized linear models (glm) to predict imprinted status. This is meant to be an introduction to the method, so only the most straightforward examples will be covered. For instance, the examples given below refer to 11 classes of genomic regions (the entire gene body, introns, exons, 5' UTR, 3' UTR, and 1, 10, and 100 kb upstream and downstream of each gene). One could also build models based on combinations of these regions. Likewise, models could be built on combinations of epigenetic features, or on combinations of both genomic regions and epigenetic features. This chapter relies heavily on computational methods, including basic programming. However, this chapter is not meant to be an introduction to programming. Throughout the chapter, the reader will be provided with example code in the Perl programming language. PMID:22907493
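    As a rough illustration of the modelling idea in this abstract, the sketch below fits a logistic regression (a binomial GLM with a logit link) by plain gradient descent. It is written in Python rather than the Perl used in the chapter, and it is not the chapter's code. The two feature names, the toy values and the labels are all hypothetical and synthetic; real inputs would be per-gene feature measurements for each genomic region class.

```python
import math


def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_glm(X, y, lr=0.5, epochs=2000):
    """Fit weights and intercept by batch gradient descent on the log-loss."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(k):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability that a gene is imprinted."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Synthetic toy data (hypothetical features, not from the chapter):
# column 0 = feature density 10 kb upstream, column 1 = feature density in introns.
X = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]]
y = [1, 1, 1, 0, 0, 0]  # 1 = imprinted, 0 = not imprinted (made-up labels)

w, b = fit_logistic_glm(X, y)
print(predict_proba(w, b, [0.85, 0.15]))  # probability for a gene resembling class 1
print(predict_proba(w, b, [0.15, 0.85]))  # probability for a gene resembling class 0
```

    In practice one column would be added per region class or per combination of regions and epigenetic features, and an established statistics package would normally replace the hand-written fitting loop.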

  19. SPARCoC: a new framework for molecular pattern discovery and cancer gene identification.

    Directory of Open Access Journals (Sweden)

    Shiqian Ma

    Full Text Available It is challenging to cluster cancer patients of a certain histopathological type into molecular subtypes of clinical importance and identify gene signatures directly relevant to the subtypes. Current clustering approaches have inherent limitations, which prevent them from gauging the subtle heterogeneity of the molecular subtypes. In this paper we present a new framework: SPARCoC (Sparse-CoClust), which is based on a novel Common-background and Sparse-foreground Decomposition (CSD) model and the Maximum Block Improvement (MBI) co-clustering technique. SPARCoC has clear advantages compared with widely-used alternative approaches: hierarchical clustering (Hclust) and nonnegative matrix factorization (NMF). We apply SPARCoC to the study of lung adenocarcinoma (ADCA), an extremely heterogeneous histological type, and a significant challenge for molecular subtyping. For testing and verification, we use high quality gene expression profiling data of lung ADCA patients, and identify prognostic gene signatures which could cluster patients into subgroups that are significantly different in their overall survival (with p-values < 0.05). Our results are only based on gene expression profiling data analysis, without incorporating any other feature selection or clinical information; we are able to replicate our findings with completely independent datasets. SPARCoC is broadly applicable to large-scale genomic data to empower pattern discovery and cancer gene identification.

  20. Spark, an application based on Serendipitous Knowledge Discovery.

    Science.gov (United States)

    Workman, T Elizabeth; Fiszman, Marcelo; Cairelli, Michael J; Nahl, Diane; Rindflesch, Thomas C

    2016-04-01

    Findings from information-seeking behavior research can inform application development. In this report we provide a system description of Spark, an application based on findings from Serendipitous Knowledge Discovery studies and data structures known as semantic predications. Background information and the previously published IF-SKD model (outlining Serendipitous Knowledge Discovery in online environments) illustrate the potential use of information-seeking behavior in application design. A detailed overview of the Spark system illustrates how methodologies in design and retrieval functionality enable production of semantic predication graphs tailored to evoke Serendipitous Knowledge Discovery in users.

  1. TILLING in forage grasses for gene discovery and breeding improvement.

    Science.gov (United States)

    Manzanares, Chloe; Yates, Steven; Ruckle, Michael; Nay, Michelle; Studer, Bruno

    2016-09-25

    Mutation breeding has a long-standing history and in some major crop species, many of the most important cultivars have their origin in germplasm generated by mutation induction. For almost two decades, methods for TILLING (Targeting Induced Local Lesions IN Genomes) have been established in model plant species such as Arabidopsis (Arabidopsis thaliana L.), enabling the functional analysis of genes. Recent advances in mutation detection by second generation sequencing technology have brought its utility to major crop species. However, it has remained difficult to apply similar approaches in forage and turf grasses, mainly due to their outbreeding nature maintained by an efficient self-incompatibility system. Starting with a description of the extent to which traditional mutagenesis methods have contributed to crop yield increase in the past, this review focuses on technological approaches to implement TILLING-based strategies for the improvement of forage grass breeding through forward and reverse genetics. We present first results from TILLING in allogamous forage grasses for traits such as stress tolerance and evaluate prospects for rapid implementation of beneficial alleles to forage grass breeding. In conclusion, large-scale induced mutation resources, used for forward genetic screens, constitute a valuable tool to increase the genetic diversity for breeding and can be generated with relatively small investments in forage grasses. Furthermore, large libraries of sequenced mutations can be readily established, providing enhanced opportunities to discover mutations in genes controlling traits of agricultural importance and to study gene functions by reverse genetics. PMID:26924175

  2. Resource Discovery in Activity-Based Sensor Networks

    DEFF Research Database (Denmark)

    Bucur, Doina; Bardram, Jakob

    This paper proposes a service discovery protocol for sensor networks that is specifically tailored for use in human-centered pervasive environments. It uses the high-level concept of computational activities (as logical bundles of data and resources) to give sensors in Activity-Based Sensor Networks (ABSNs) knowledge about their usage even at the network layer. ABSN redesigns classical network-level service discovery protocols to include and use this logical structuring of the network for a more practically applicable service discovery scheme. Noting that in practical settings activity-based sensor patches are localized, ABSN designs a completely distributed, hybrid discovery protocol which is proactive in a neighbourhood zone and reactive outside, tailored so that any query among the sensors of one activity is routed through the network with minimum overhead, guided by the bounds of that activity...

  3. Biomarker Discovery by Novel Sensors Based on Nanoproteomics Approaches

    Directory of Open Access Journals (Sweden)

    Manuel Fuentes

    2012-02-01

    Full Text Available In recent years, proteomics has facilitated biomarker discovery by coupling high-throughput techniques with novel nanosensors. In the present review, we focus on the study of label-based and label-free detection systems, as well as nanotechnology approaches, indicating their advantages and applications in biomarker discovery. In addition, several disease biomarkers are presented in order to show the clinical importance of improving sensitivity and selectivity by using nanoproteomics approaches as novel sensors.

  4. Tree-Based Neighbor Discovery in Urban Vehicular Sensor Networks

    OpenAIRE

    Heejun Roh; Wonjun Lee

    2012-01-01

    In urban vehicular sensor networks, vehicles equipped with onboard sensors monitor some area, and the result can be shared with neighboring vehicles so that they can correct their own sensing data. However, because vehicle topology changes more frequently than that of a conventional wireless sensor network, a vehicle must be able to discover its neighboring vehicles. Therefore, an efficient neighbor discovery algorithm should be designed for vehicular sensor networks. In this paper, two efficient tree-based neighbor discovery...

  5. Gene discovery using next-generation pyrosequencing to develop ESTs for Phalaenopsis orchids

    Directory of Open Access Journals (Sweden)

    Fu Chih-Hsiung

    2011-07-01

    Full Text Available Abstract Background Orchids are one of the most diversified angiosperms, but few genomic resources are available for these non-model plants. In addition to its ecological significance, Phalaenopsis is economically important to the floriculture industry worldwide. We aimed to use massively parallel 454 pyrosequencing for a global characterization of the Phalaenopsis transcriptome. Results To maximize sequence diversity, we pooled RNA from 10 samples of different tissues, various developmental stages, and biotic- or abiotic-stressed plants. We obtained 206,960 expressed sequence tags (ESTs) with an average read length of 228 bp. These reads were assembled into 8,233 contigs and 34,630 singletons. The unigenes were searched against the NCBI non-redundant (NR) protein database. Based on sequence similarity with known proteins, these analyses identified 22,234 different genes (E-value cutoff, e-7). Assembled sequences were annotated with Gene Ontology, Gene Family and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Among these annotations, over 780 unigenes encoding putative transcription factors were identified. Conclusion Pyrosequencing was effective in identifying a large set of unigenes from Phalaenopsis. The informative EST dataset we developed constitutes a much-needed resource for discovery of genes involved in various biological processes in Phalaenopsis and other orchid species. These transcribed sequences will narrow the gap between study of model organisms with many genomic resources and species that are important for ecological and evolutionary studies.

  6. Discovery of signature genes in gastric cancer associated with prognosis.

    Science.gov (United States)

    Zhao, X; Cai, H; Wang, X; Ma, L

    2016-01-01

    Gene expression profiles of gastric cancer (GC) were analyzed with bioinformatics tools to identify signature genes associated with prognosis. Four gene expression data sets (accession number: GSE2685, GSE30727, GSE38932 and GSE26253) were downloaded from Gene Expression Omnibus. Differentially expressed genes (DEGs) were screened out using the significance analysis of microarrays (SAM) algorithm. P-value 1 were set as the threshold. A co-expression network was constructed for the GC-related genes with the R package WGCNA. Modules were disclosed with the WGCNA algorithm. Survival-related signature genes were screened out via COX single-variable regression. A total of 3210 GC-related genes were identified from the 3 data sets. Significantly enriched GO biological process terms included cell death, cell proliferation, apoptosis, response to hormone and phosphorylation. Pathways like viral carcinogenesis, metabolism, EBV viral infection, and PI3K-AKT signaling pathway were significantly over-represented in the DEGs. A gene co-expression network including 2414 genes was constructed, from which 7 modules were revealed. A total of 17 genes were identified as signature genes, such as DAB2, ALDH2, CD58, CITED2, BNIP3L, SLC43A2, FAU and COL5A1. Many signature genes associated with prognosis of GC were identified in the present study, some of which have been implicated in the pathogenesis of GC. These findings could not only improve the knowledge about GC, but also provide clues for clinical treatments. PMID:26774142

  7. Discovery of mammalian genes that participate in virus infection

    Directory of Open Access Journals (Sweden)

    Sheng Jinsong

    2004-11-01

    Full Text Available Abstract Background Viruses are obligate intracellular parasites that rely upon the host cell for different steps in their life cycles. The characterization of cellular genes required for virus infection and/or cell killing will be essential for understanding viral life cycles, and may provide cellular targets for new antiviral therapies. Results Candidate genes required for lytic reovirus infection were identified by tagged sequence mutagenesis, a process that permits rapid identification of genes disrupted by gene entrapment. One hundred fifty-one reovirus resistant clones were selected from cell libraries containing 2 × 10^5 independently disrupted genes, of which 111 contained mutations in previously characterized genes and functionally anonymous transcription units. Collectively, the genes associated with reovirus resistance differed from genes targeted by random gene entrapment in that known mutational hot spots were underrepresented, and a number of mutations appeared to cluster around specific cellular processes, including: IGF-II expression/signalling, vesicular transport/cytoskeletal trafficking and apoptosis. Notably, several of the genes have been directly implicated in the replication of reovirus and other viruses at different steps in the viral lifecycle. Conclusions Tagged sequence mutagenesis provides a rapid, genome-wide strategy to identify candidate cellular genes required for virus infection. The candidate genes provide a starting point for mechanistic studies of cellular processes that participate in the virus lifecycle and may provide targets for novel anti-viral therapies.

  8. Traditional Chinese Medicine-Based Network Pharmacology Could Lead to New Multicompound Drug Discovery

    Directory of Open Access Journals (Sweden)

    Jian Li

    2012-01-01

    Full Text Available Current strategies for drug discovery have reached a bottleneck where the paradigm is generally “one gene, one drug, one disease.” However, using holistic and systemic views, network pharmacology may be the next paradigm in drug discovery. Based on network pharmacology, a combinational drug with two or more compounds could offer beneficial synergistic effects for complex diseases. Interestingly, traditional Chinese medicine (TCM) has practiced a holistic view for over 3,000 years, and its distinguishing feature is the use of herbal formulas to treat diseases based on a unique pattern classification. Though TCM herbal formulas are acknowledged as a great source for drug discovery, no drug discovery strategies compatible with the multidimensional complexities of TCM herbal formulas have been developed. In this paper, we highlight some novel paradigms in TCM-based network pharmacology and new drug discovery. A multiple compound drug can be discovered by merging herbal formula-based pharmacological networks with TCM pattern-based disease molecular networks. Herbal formulas would be a source for multiple compound drug candidates, and the TCM pattern in the disease would be an indication for a new drug.

  9. Computational method for discovery of estrogen responsive genes

    DEFF Research Database (Denmark)

    Tang, Suisheng; Tan, Sin Lam; Ramadoss, Suresh Kumar;

    2004-01-01

    Estrogen has a profound impact on human physiology and affects numerous genes. The classical estrogen reaction is mediated by its receptors (ERs), which bind to the estrogen response elements (EREs) in target gene's promoter region. Due to tedious and expensive experiments, a limited number of...... human genes are functionally well characterized. It is still unclear how many and which human genes respond to estrogen treatment. We propose a simple, economic, yet effective computational method to predict a subclass of estrogen responsive genes. Our method relies on the similarity of ERE frames...... across different promoters in the human genome. Matching ERE frames of a test set of 60 known estrogen responsive genes to the collection of over 18,000 human promoters, we obtained 604 candidate genes. Evaluating our result by comparison with the published microarray data and literature, we found that...

  10. SECURE SERVICE DISCOVERY BASED ON PROBE PACKET MECHANISM FOR MANETS

    Directory of Open Access Journals (Sweden)

    S. Pariselvam

    2015-03-01

    Full Text Available In MANETs, the service discovery process is always considered crucial because these networks do not possess a centralized infrastructure for communication. Moreover, different services available through the network necessitate varying categories. Hence, a need arises for devising a secure probe based service discovery mechanism to reduce the complexity of providing services to network users. In this paper, we propose a Secure Service Discovery Based on Probe Packet Mechanism (SSDPPM) for identifying DoS attacks in MANETs, which presents a new approach for estimating the level of trust present in each routing path of a mobile ad hoc network by using probe packets. Probing based service discovery mechanisms mainly establish a mobile node’s genuineness using a test packet, called a probe, that travels the entire network in order to compute the degree of trust maintained between the mobile nodes and its impact on network performance. The performance of SSDPPM is investigated through a wide range of network related parameters, such as packet delivery, throughput, control overhead and total overhead, using the ns-2.26 network simulator. On average, SSDPPM improves packet delivery ratio by 23% and throughput by 19% compared with the existing service discovery mechanisms available in the literature.

  11. Marinopyrroles: Unique Drug Discoveries Based on Marine Natural Products.

    Science.gov (United States)

    Li, Rongshi

    2016-01-01

    Natural products provide a successful supply of new chemical entities (NCEs) for drug discovery to treat human diseases. Approximately half of the NCEs are based on natural products and their derivatives. Notably, marine natural products, a largely untapped resource, have contributed to drug discovery and development with eight drugs or cosmeceuticals approved by the U.S. Food and Drug Administration and European Medicines Agency, and ten candidates undergoing clinical trials. Collaborative efforts from drug developers, biologists, organic, medicinal, and natural product chemists have elevated drug discoveries to new levels. These efforts are expected to continue to improve the efficiency of natural product-based drugs. Marinopyrroles are examined here as a case study for potential anticancer and antibiotic agents. PMID:26332654

  12. Gene Expression Data Knowledge Discovery using Global and Local Clustering

    OpenAIRE

    H, Swathi.

    2010-01-01

    To understand complex biological systems, the research community has produced a huge corpus of gene expression data. A large number of clustering approaches have been proposed for the analysis of gene expression data. However, extracting important biological knowledge remains difficult. To address this task, clustering techniques are used. In this paper, a hybrid hierarchical k-means algorithm is used for clustering and biclustering gene expression data. To discover both local and global cl...

  13. Using concepts in literature-based discovery : Simulating Swanson's Raynaud-fish oil and migraine-magnesium discoveries

    NARCIS (Netherlands)

    Weeber, M; Klein, H; de Jong-van den Berg, LTW; Vos, R

    2001-01-01

    Literature-based discovery has resulted in new knowledge. In the biomedical context, Don R. Swanson has generated several literature-based hypotheses that have been corroborated experimentally and clinically. In this paper, we propose a two-step model of the discovery process in which hypotheses are

  14. PiggyBac Transposon Mutagenesis: A Tool for Cancer Gene Discovery in Mice

    OpenAIRE

    Rad, Roland; Rad, Lena; Wang, Wei; Cadinanos, Juan; Vassiliou, George; Rice, Stephen; Campos, Lia S.; Yusa, Kosuke; Banerjee, Ruby; Li, Meng Amy; de la Rosa, Jorge; Strong, Alexander; Lu, Dong; Ellis, Peter; Conte, Nathalie

    2010-01-01

    Transposons are mobile DNA segments that can disrupt gene function by inserting in or near genes. Here we show that insertional mutagenesis by the PiggyBac transposon can be used for cancer gene discovery in mice. PiggyBac transposition in genetically engineered transposon/transposase mice induced cancers whose type (hematopoietic versus solid) and latency were dependent on the regulatory elements introduced into transposons. Analysis of 63 hematopoietic tumors revealed the unique qualities o...

  15. Structural choice based on knowledge discovery system

    Institute of Scientific and Technical Information of China (English)

    邢方亮; 王光远

    2002-01-01

    Structural choice is a significant decision having an important influence on structural function, social economics, structural reliability and construction cost. A Case Based Reasoning system, with its retrieval part constructed with a KDD subsystem, is put forward to make a decision for a large scale engineering project. A typical CBR system consists of four parts: case representation, case retriever, evaluation, and adaptation. A case library is a set of parameterized excellent and successful structures. For a structural choice, the key point is that the system must be able to detect the pattern classes hidden in the case library and classify the input parameters into classes properly. That is done by using the KDD Data Mining algorithm based on Self-Organizing Feature Maps (SOFM), which makes the whole system more adaptive, self-organizing, self-learning and open.

  16. A comparative review of estimates of the proportion unchanged genes and the false discovery rate

    Directory of Open Access Journals (Sweden)

    Broberg Per

    2005-08-01

    Full Text Available Abstract Background In the analysis of microarray data one generally produces a vector of p-values that for each gene give the likelihood of obtaining equally strong evidence of change by pure chance. The distribution of these p-values is a mixture of two components corresponding to the changed genes and the unchanged ones. The focus of this article is how to estimate the proportion unchanged and the false discovery rate (FDR) and how to make inferences based on these concepts. Six published methods for estimating the proportion of unchanged genes are reviewed, two alternatives are presented, and all are tested on both simulated and real data. All estimates but one make do without any parametric assumptions concerning the distributions of the p-values. Furthermore, the estimation and use of the FDR and the closely related q-value is illustrated with examples. Five published estimates of the FDR and one new are presented and tested. Implementations in R code are available. Results A simulation model based on the distribution of real microarray data plus two real data sets were used to assess the methods. The proposed alternative methods for estimating the proportion unchanged fared very well, and gave evidence of low bias and very low variance. Different methods perform well depending upon whether there are few or many regulated genes. Furthermore, the methods for estimating FDR showed a varying performance, and were sometimes misleading. The new method had a very low error. Conclusion The concept of the q-value or false discovery rate is useful in practical research, despite some theoretical and practical shortcomings. However, it seems possible to challenge the performance of the published methods, and there is likely scope for further developing the estimates of the FDR. The new methods provide the scientist with more options to choose a suitable method for any particular experiment. The article advocates the use of the conjoint information
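
    The two quantities discussed in this abstract can be illustrated with a minimal Python sketch. It is not one of the estimators reviewed or proposed in the article. It assumes the widely used threshold-based (Storey-type) estimate of the proportion of unchanged genes, in which p-values above a cutoff `lam` are taken to come mostly from unchanged genes, together with the Benjamini-Hochberg step-up rule for q-values. The function names, the cutoff of 0.5 and the p-values are invented for illustration.

```python
def estimate_pi0(pvalues, lam=0.5):
    """Threshold-based estimate of the proportion of unchanged genes.

    Assumption: p-values above `lam` come almost entirely from unchanged
    genes, whose p-values are uniform on [0, 1].
    """
    m = len(pvalues)
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / ((1.0 - lam) * m))


def q_values(pvalues, pi0=1.0):
    """Step-up q-values; pi0=1.0 gives Benjamini-Hochberg adjusted p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that q-values
    # never decrease as p-values increase.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[i] / rank)
        q[i] = running_min
    return q


# Illustrative p-values only (not data from the article).
pvals = [0.001, 0.004, 0.01, 0.02, 0.03, 0.04, 0.55, 0.65, 0.8, 0.95]
pi0_hat = estimate_pi0(pvals)
qvals = q_values(pvals, pi0=pi0_hat)
```

    Calling `q_values` with the default `pi0=1.0` gives the more conservative Benjamini-Hochberg values; passing an estimated proportion below 1 scales every q-value down by that factor.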

  17. Phylogeny based discovery of regulatory elements

    Directory of Open Access Journals (Sweden)

    Cohen Barak A

    2006-05-01

    Full Text Available Abstract Background Algorithms that locate evolutionarily conserved sequences have become powerful tools for finding functional DNA elements, including transcription factor binding sites; however, most methods do not take advantage of an explicit model for the constrained evolution of functional DNA sequences. Results We developed a probabilistic framework that combines an HKY85 model, which assigns probabilities to different base substitutions between species, and weight matrix models of transcription factor binding sites, which describe the probabilities of observing particular nucleotides at specific positions in the binding site. The method incorporates the phylogenies of the species under consideration and takes into account the position specific variation of transcription factor binding sites. Using our framework we assessed the suitability of alignments of genomic sequences from commonly used species as substrates for comparative genomic approaches to regulatory motif finding. We then applied this technique to Saccharomyces cerevisiae and related species by examining all possible six base pair DNA sequences (hexamers) and identifying sequences that are conserved in a significant number of promoters. By combining similar conserved hexamers we reconstructed known cis-regulatory motifs and made predictions of previously unidentified motifs. We tested one prediction experimentally, finding it to be a regulatory element involved in the transcriptional response to glucose. Conclusion The experimental validation of a regulatory element prediction missed by other large-scale motif finding studies demonstrates that our approach is a useful addition to the current suite of tools for finding regulatory motifs.
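
    The weight matrix component mentioned in this abstract can be sketched in a few lines of Python. This covers only single-sequence scoring with a position weight matrix; it does not implement the HKY85 substitution model or the phylogenetic part of the authors' framework. The matrix values, the uniform background and the function names are hypothetical.

```python
import math

# Hypothetical weight matrix for a 4-bp site: one dictionary of base
# probabilities per position (illustrative values only).
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
# Assumed uniform background nucleotide frequencies.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_odds(site, pwm=PWM, background=BACKGROUND):
    """Log2 odds of `site` under the weight matrix versus the background."""
    return sum(math.log2(col[base] / background[base])
               for col, base in zip(pwm, site))


def best_site(sequence, pwm=PWM):
    """Return (start, score) of the highest-scoring window in `sequence`."""
    width = len(pwm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the weight matrix")
    windows = [(log_odds(sequence[i:i + width], pwm), i)
               for i in range(len(sequence) - width + 1)]
    score, start = max(windows)
    return start, score


start, score = best_site("TTACGTGG")
```

    In a comparative setting, each window would be scored across aligned species rather than in one sequence, which is where an explicit substitution model such as the one named in the abstract would enter.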

  18. GENOME-ENABLED DISCOVERY OF CARBON SEQUESTRATION GENES IN POPLAR

    Energy Technology Data Exchange (ETDEWEB)

    DAVIS J M

    2007-10-11

    Plants utilize carbon by partitioning the reduced carbon obtained through photosynthesis into different compartments and into different chemistries within a cell and subsequently allocating such carbon to sink tissues throughout the plant. Since the phytohormones auxin and cytokinin are known to influence sink strength in tissues such as roots (Skoog & Miller 1957, Nordstrom et al. 2004), we hypothesized that altering the expression of genes that regulate auxin-mediated (e.g., AUX/IAA or ARF transcription factors) or cytokinin-mediated (e.g., RR transcription factors) control of root growth and development would impact carbon allocation and partitioning belowground (Fig. 1 - Renewal Proposal). Specifically, the ARF, AUX/IAA and RR transcription factor gene families mediate the effects of the growth regulators auxin and cytokinin on cell expansion, cell division and differentiation into root primordia. Invertases (IVR), whose transcript abundance is enhanced by both auxin and cytokinin, are critical components of carbon movement and therefore of carbon allocation. Thus, we initiated comparative genomic studies to identify the AUX/IAA, ARF, RR and IVR gene families in the Populus genome that could impact carbon allocation and partitioning. Bioinformatics searches using Arabidopsis gene sequences as queries identified regions with high degrees of sequence similarities in the Populus genome. These Populus sequences formed the basis of our transgenic experiments. Transgenic modification of gene expression involving members of these gene families was hypothesized to have profound effects on carbon allocation and partitioning.

  19. Validation of Context Based Service Discovery Protocol for Ubiquitous Applications

    Directory of Open Access Journals (Sweden)

    Anandi Giridharan

    2012-11-01

    Full Text Available Service Discovery Protocol (SDP) is important in ubiquitous applications, where a large number of devices and software components collaborate unobtrusively and provide numerous services without user intervention. Existing service discovery schemes use a service matching process in order to offer services of interest to the users. Potentially, the context information of the users and surrounding environment can be used to improve the quality of service matching. We propose a C-IOB (Context-Information, Observation and Belief) based service discovery model, which deals with the above challenges by processing the context information and by formulating beliefs on the basis of observations. With these formulated beliefs the required services will be provided to the users. In this work, we present an approach for automated validation of the C-IOB based service discovery model in a typical ubiquitous museum environment, where the external behavior of the system can be predicted and compared to a model of expected behavior from the original requirements. Formal specification using an SDL (Specification and Description Language) based system has been used to conduct verification and validation of the system. The purpose of this framework is to provide a formal basis for the performance evaluation and behavioral study of the SDP.

  20. KBERG: KnowledgeBase for Estrogen Responsive Genes

    DEFF Research Database (Denmark)

    Tang, Suisheng; Zhang, Zhuo; Tan, Sin Lam;

    2007-01-01

    Estrogen has a profound impact on human physiology affecting transcription of numerous genes. To decipher functional characteristics of estrogen responsive genes, we developed KnowledgeBase for Estrogen Responsive Genes (KBERG). Genes in KBERG were derived from Estrogen Responsive Gene Database...... is based on ab initio discovery of common cis-elements from the orthologous gene cluster from human, mouse and rat, thus reflecting a degree of promoter sequence preservation during evolution. The identified motifs are linked to transcription factor binding sites based on the TRANSFAC database. In addition...

  1. Gene discovery in the horned beetle Onthophagus taurus

    Directory of Open Access Journals (Sweden)

    Yang Youngik

    2010-12-01

    Full Text Available Abstract Background Horned beetles, in particular in the genus Onthophagus, are important models for studies on sexual selection, biological radiations, the origin of novel traits, developmental plasticity, biocontrol, conservation, and forensic biology. Despite their growing prominence as models for studying both basic and applied questions in biology, little genomic or transcriptomic data are available for this genus. We used massively parallel pyrosequencing (Roche 454-FLX platform) to produce a comprehensive EST dataset for the horned beetle Onthophagus taurus. To maximize sequence diversity, we pooled RNA extracted from a normalized library encompassing diverse developmental stages and both sexes. Results We used 454 pyrosequencing to sequence ESTs from all post-embryonic stages of O. taurus. Approximately 1.36 million reads assembled into 50,080 non-redundant sequences encompassing a total of 26.5 Mbp. The non-redundant sequences match over half of the genes in Tribolium castaneum, the most closely related species with a sequenced genome. Analyses of Gene Ontology annotations and biochemical pathways indicate that the O. taurus sequences reflect a wide and representative sampling of biological functions and biochemical processes. An analysis of sequence polymorphisms revealed that SNP frequency was negatively related to overall expression level and the number of tissue types in which a given gene is expressed. The most variable genes were enriched for a limited number of GO annotations whereas the least variable genes were enriched for a wide range of GO terms directly related to fitness. Conclusions This study provides the first large-scale EST database for horned beetles, a much-needed resource for advancing the study of these organisms. Furthermore, we identified instances of gene duplications and alternative splicing, useful for future study of gene regulation, and a large number of SNP markers that could be used in population

  2. Advances in tau-based drug discovery

    Science.gov (United States)

    Noble, Wendy; Pooler, Amy M.; Hanger, Diane P.

    2011-01-01

    Introduction Tauopathies, including Alzheimer’s disease (AD) and some frontotemporal dementias, are neurodegenerative diseases characterised by pathological lesions comprised of tau protein. There is currently a significant and urgent unmet need for disease-modifying therapies for these conditions and recently attention has turned to tau as a potential target for intervention. Areas covered Increasing evidence has highlighted pathways associated with tau-mediated neurodegeneration as important targets for drug development. Here, the authors review recently published papers in this area and summarise the genetic and pharmacological approaches that have shown efficacy in reducing tau-associated neurodegeneration. These include the use of agents to prevent abnormal tau processing and increase tau clearance, therapies targeting the immune system, and the manipulation of tau pre-mRNA to modify tau isoform expression. Expert opinion Several small molecule tau-based treatments are currently being assessed in clinical trials, the outcomes of which are eagerly awaited. Current evidence suggests that therapies targeting tau are likely, at least in part, to form the basis of an effective and safe treatment for Alzheimer’s disease and related neurodegenerative disorders in which tau deposition is evident. PMID:22003359

  3. Gene Expression Data Knowledge Discovery using Global and Local Clustering

    CERN Document Server

    H, Swathi

    2010-01-01

    To understand complex biological systems, the research community has produced a huge corpus of gene expression data. A large number of clustering approaches have been proposed for the analysis of gene expression data. However, extracting important biological knowledge remains difficult. To address this task, clustering techniques are used. In this paper, a hybrid hierarchical k-means algorithm is used for clustering and biclustering gene expression data. To discover both local and global clustering structure, biclustering and clustering algorithms are utilized. A validation technique, Figure of Merit, is used to determine the quality of clustering results. Appropriate knowledge is mined from the clusters by embedding a BLAST similarity search program into the clustering and biclustering process. To discover both local and global clustering structure, biclustering and clustering algorithms are utilized. To determine the quality of clustering results, a validation technique, Figure of Merit, is used. Appropriate ...

  4. Literature mining for the discovery of hidden connections between drugs, genes and diseases.

    Science.gov (United States)

    Frijters, Raoul; van Vugt, Marianne; Smeets, Ruben; van Schaik, René; de Vlieg, Jacob; Alkema, Wynand

    2010-09-23

    The scientific literature represents a rich source for retrieval of knowledge on associations between biomedical concepts such as genes, diseases and cellular processes. A commonly used method to establish relationships between biomedical concepts from literature is co-occurrence. Apart from its use in knowledge retrieval, the co-occurrence method is also well-suited to discover new, hidden relationships between biomedical concepts following a simple ABC-principle, in which A and C have no direct relationship, but are connected via shared B-intermediates. In this paper we describe CoPub Discovery, a tool that mines the literature for new relationships between biomedical concepts. Statistical analysis using ROC curves showed that CoPub Discovery performed well over a wide range of settings and keyword thesauri. We subsequently used CoPub Discovery to search for new relationships between genes, drugs, pathways and diseases. Several of the newly found relationships were validated using independent literature sources. In addition, new predicted relationships between compounds and cell proliferation were validated and confirmed experimentally in an in vitro cell proliferation assay. The results show that CoPub Discovery is able to identify novel associations between genes, drugs, pathways and diseases that have a high probability of being biologically valid. This makes CoPub Discovery a useful tool to unravel the mechanisms behind disease, to find novel drug targets, or to find novel applications for existing drugs.
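
    The ABC-principle described in this abstract can be sketched as follows. This is a minimal illustration of co-occurrence-based discovery in general, not CoPub Discovery's implementation: it ranks candidates only by the number of shared intermediates, whereas the tool's actual scoring is not given here. The concept names and co-occurrence pairs are invented.

```python
from collections import defaultdict


def abc_candidates(cooccurrences, a):
    """Rank concepts C that share intermediates B with `a` but never
    co-occur with `a` directly.

    `cooccurrences` is an iterable of (concept, concept) pairs, for example
    pairs of terms found together in the same abstract.
    """
    neighbours = defaultdict(set)
    for x, y in cooccurrences:
        neighbours[x].add(y)
        neighbours[y].add(x)

    links = defaultdict(set)  # candidate C -> set of shared B-intermediates
    for b in neighbours[a]:
        for c in neighbours[b]:
            if c != a and c not in neighbours[a]:
                links[c].add(b)

    # Most shared intermediates first; ties broken alphabetically.
    return sorted(links.items(), key=lambda kv: (-len(kv[1]), kv[0]))


# Toy co-occurrence data (hypothetical, for illustration only).
pairs = [
    ("disease_A", "gene_B1"),
    ("disease_A", "gene_B2"),
    ("gene_B1", "drug_C1"),
    ("gene_B2", "drug_C1"),
    ("gene_B2", "drug_C2"),
]
ranked = abc_candidates(pairs, "disease_A")
```

    Here `drug_C1` ranks first because it is linked to `disease_A` through two intermediates, while `drug_C2` has only one. A concept that already co-occurs with `disease_A` is excluded, since that relationship is not hidden.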

  5. Transposons for cancer gene discovery: Sleeping Beauty and beyond

    OpenAIRE

    Collier, Lara S.; Largaespada, David A

    2007-01-01

    The use of Sleeping Beauty transposons as somatic mutagens to discover cancer genes in hematopoietic tumors and sarcomas has been documented. Here, we discuss the future of Sleeping Beauty for cancer genetic studies and the potential use of additional transposable elements for somatic mutagenesis.

  6. Gene Discovery and Functional Analyses in the Model Plant Arabidopsis

    Institute of Scientific and Technical Information of China (English)

    Cai-Ping Feng; John Mundy

    2006-01-01

    The present mini-review describes newer methods and strategies, including transposon and T-DNA insertions, TILLING, Deleteagene, and RNA interference, to functionally analyze genes of interest in the model plant Arabidopsis. The relative advantages and disadvantages of the systems are also discussed.

  7. Gene Discovery and Functional Analyses in the Model Plant Arabidopsis

    DEFF Research Database (Denmark)

    Feng, Cai-ping; Mundy, J.

    2006-01-01

    The present mini-review describes newer methods and strategies, including transposon and T-DNA insertions, TILLING, Deleteagene, and RNA interference, to functionally analyze genes of interest in the model plant Arabidopsis. The relative advantages and disadvantages of the systems are also...

  8. Motif discovery in promoters of genes co-localized and co-expressed during myeloid cells differentiation

    Science.gov (United States)

    Coppe, Alessandro; Ferrari, Francesco; Bisognin, Andrea; Danieli, Gian Antonio; Ferrari, Sergio; Bicciato, Silvio; Bortoluzzi, Stefania

    2009-01-01

    Genes co-expressed may be under similar promoter-based and/or position-based regulation. Although data on expression, position and function of human genes are available, their true integration still represents a challenge for computational biology, hampering the identification of regulatory mechanisms. We carried out an integrative analysis of genomic position, functional annotation and promoters of genes expressed in myeloid cells. Promoter analysis was conducted by a novel multi-step method for discovering putative regulatory elements, i.e. over-represented motifs, in a selected set of promoters, as compared with a background model. The combination of transcriptional, structural and functional data allowed the identification of sets of promoters pertaining to groups of genes co-expressed and co-localized in regions of the human genome. The application of motif discovery to 26 groups of genes co-expressed in myeloid cells differentiation and co-localized in the genome showed that there are more over-represented motifs in promoters of co-expressed and co-localized genes than in promoters of simply co-expressed genes (CEG). Motifs, which are similar to the binding sequences of known transcription factors, non-uniformly distributed along promoter sequences and/or occurring in highly co-expressed subset of genes were identified. Co-expressed and co-localized gene sets were grouped in two co-expressed genomic meta-regions, putatively representing functional domains of a high-level expression regulation. PMID:19059999
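
    The idea of a motif being over-represented in a selected set of promoters relative to a background can be illustrated with a hypergeometric tail test. This is a standard enrichment test substituted here for illustration; the abstract describes its own multi-step method, whose details are not given. The sequences, the motif and the function names are hypothetical, and the selected promoters are assumed to be a subset of the background.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when N promoters are drawn without replacement from M,
    of which n contain the motif."""
    upper = min(n, N)
    return sum(comb(n, i) * comb(M - n, N - i)
               for i in range(k, upper + 1)) / comb(M, N)


def motif_enrichment(motif, selected, background):
    """Return (hits in selected set, enrichment p-value).

    Assumption: every sequence in `selected` also appears in `background`.
    """
    M = len(background)
    n = sum(motif in seq for seq in background)
    N = len(selected)
    k = sum(motif in seq for seq in selected)
    return k, hypergeom_sf(k, M, n, N)


# Toy promoter sequences (invented for illustration).
background = ["ACGTGA", "TTTTTT", "GGCACG", "CCCCCC", "ACGAAA",
              "TTGGCC", "AAAAAA", "GGGGGG", "CACGTT", "TATATA"]
selected = ["ACGTGA", "GGCACG", "ACGAAA", "TTTTTT"]
hits, p_value = motif_enrichment("ACG", selected, background)
```

    A real analysis would correct for the many motifs tested and would usually match motifs with mismatches or weight matrices rather than exact substrings.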

  9. Improving functional modules discovery by enriching interaction networks with gene profiles

    KAUST Repository

    Salem, Saeed

    2013-05-01

    Recent advances in proteomic and transcriptomic technologies have resulted in the accumulation of a vast amount of high-throughput data that span multiple biological processes and characteristics in different organisms. Much of the data come in the form of interaction networks and mRNA expression arrays. An important task in systems biology is functional module discovery, where the goal is to uncover well-connected sub-networks (modules). These discovered modules help to unravel the underlying mechanisms of the observed biological processes. While most of the existing module discovery methods use only the interaction data, in this work we propose CLARM, which discovers biological modules by incorporating gene profile data with protein-protein interaction networks. We demonstrate the effectiveness of CLARM on Yeast and Human interaction datasets, and gene expression and molecular function profiles. Experiments on these real datasets show that the CLARM approach is competitive with well established functional module discovery methods.

  10. Profile-based short linear protein motif discovery

    Directory of Open Access Journals (Sweden)

    Haslam Niall J

    2012-05-01

    Full Text Available Abstract Background Short linear protein motifs are attracting increasing attention as functionally independent sites, typically 3–10 amino acids in length that are enriched in disordered regions of proteins. Multiple methods have recently been proposed to discover over-represented motifs within a set of proteins based on simple regular expressions. Here, we extend these approaches to profile-based methods, which provide a richer motif representation. Results The profile motif discovery method MEME performed relatively poorly for motifs in disordered regions of proteins. However, when we applied evolutionary weighting to account for redundancy amongst homologous proteins, and masked out poorly conserved regions of disordered proteins, the performance of MEME is equivalent to that of regular expression methods. However, the two approaches returned different subsets within both a benchmark dataset, and a more realistic discovery dataset. Conclusions Profile-based motif discovery methods complement regular expression based methods. Whilst profile-based methods are computationally more intensive, they are likely to discover motifs currently overlooked by regular expression methods.

  11. Grouped graphical Granger modeling for gene expression regulatory networks discovery

    OpenAIRE

    Lozano, Aurélie C.; Abe, Naoki; Yan LIU; Rosset, Saharon

    2009-01-01

    We consider the problem of discovering gene regulatory networks from time-series microarray data. Recently, graphical Granger modeling has gained considerable attention as a promising direction for addressing this problem. These methods apply graphical modeling methods on time-series data and invoke the notion of ‘Granger causality’ to make assertions on causality through inference on time-lagged effects. Existing algorithms, however, have neglected an important aspect of the problem—the grou...

  12. Mobility Prediction Based Neighborhood Discovery for Mobile Ad Hoc Networks

    OpenAIRE

    Li, Xu; Mitton, Nathalie; Simplot-Ryl, David

    2010-01-01

    Hello protocol is the basic technique for neighborhood discovery in wireless ad hoc networks. It requires nodes to claim their existence/aliveness by periodic 'hello' messages. Central to any hello protocol is the determination of 'hello' message transmission rate. No fixed optimal rate exists in the presence of node mobility. The rate should in fact adapt to it, high for high mobility and low for low mobility. In this paper, we propose a novel mobility prediction based hello protocol, named ...

  13. Web Service Description and Discovery Based on Semantic Model

    Institute of Scientific and Technical Information of China (English)

    YANG Xuemei; XU Lizhen; DONG Yisheng; WANG Yongli

    2006-01-01

    This paper proposes a novel semantic model of Web service description and discovery that extends the profile model of the Web Ontology Language for Services (OWL-S). Similarity matching of Web services is implemented by computing a weighted sum of the semantic similarity value, based on a specific domain ontology, and a dynamic evaluation of the degree to which quality of service (QoS) requirements are satisfied. Experiments show that the proposed semantic matching model is efficient.

  14. Cross-pollination of research findings, although uncommon, may accelerate discovery of human disease genes

    Directory of Open Access Journals (Sweden)

    Duda Marlena

    2012-11-01

    Full Text Available Abstract Background Technological leaps in genome sequencing have resulted in a surge in discovery of human disease genes. These discoveries have led to increased clarity on the molecular pathology of disease and have also demonstrated considerable overlap in the genetic roots of human diseases. In light of this large genetic overlap, we tested whether cross-disease research approaches lead to faster, more impactful discoveries. Methods We leveraged several gene-disease association databases to calculate a Mutual Citation Score (MCS) for 10,853 pairs of genetically related diseases to measure the frequency of cross-citation between research fields. To assess the importance of cooperative research, we computed an Individual Disease Cooperation Score (ICS) and the average publication rate for each disease. Results For all disease pairs with one gene in common, we found that the degree of genetic overlap was a poor predictor of cooperation (r2=0.3198) and that the vast majority of disease pairs (89.56%) never cited previous discoveries of the same gene in a different disease, irrespective of the level of genetic similarity between the diseases. A fraction (0.25%) of the pairs demonstrated cross-citation in greater than 5% of their published genetic discoveries and 0.037% cross-referenced discoveries more than 10% of the time. We found strong positive correlations between ICS and publication rate (r2=0.7931), and an even stronger correlation between the publication rate and the number of cross-referenced diseases (r2=0.8585). These results suggested that cross-disease research may have the potential to yield novel discoveries at a faster pace than singular disease research. Conclusions Our findings suggest that the frequency of cross-disease study is low despite the high level of genetic similarity among many human diseases, and that collaborative methods may accelerate and increase the impact of new genetic discoveries. Until we have a better

  15. Pine Gene Discovery Project - Final Report - 08/31/1997 - 02/28/2001

    Energy Technology Data Exchange (ETDEWEB)

    Whetten, R. W.; Sederoff, R. R.; Kinlaw, C.; Retzel, E.

    2001-04-30

    Integration of pines into the large scope of plant biology research depends on study of pines in parallel with study of annual plants, and on availability of research materials from pine to plant biologists interested in comparing pine with annual plant systems. The objectives of the Pine Gene Discovery Project were to obtain 10,000 partial DNA sequences of genes expressed in loblolly pine, to determine which of those pine genes were similar to known genes from other organisms, and to make the DNA sequences and isolated pine genes available to plant researchers to stimulate integration of pines into the wider scope of plant biology research. Those objectives have been completed, and the results are available to the public. Requests for pine genes have been received from a number of laboratories that would otherwise not have included pine in their research, indicating that progress is being made toward the goal of integrating pine research into the larger molecular biology research community.

  16. Marfan Syndrome and Related Disorders: 25 Years of Gene Discovery.

    Science.gov (United States)

    Verstraeten, Aline; Alaerts, Maaike; Van Laer, Lut; Loeys, Bart

    2016-06-01

    Marfan syndrome (MFS) is a rare, autosomal-dominant, multisystem disorder, presenting with skeletal, ocular, skin, and cardiovascular symptoms. Significant clinical overlap with other systemic connective tissue diseases, including Loeys-Dietz syndrome (LDS), Shprintzen-Goldberg syndrome (SGS), and the MASS phenotype, has been documented. In MFS and LDS, the cardiovascular manifestations account for the major cause of patient morbidity and mortality, rendering them the main target for therapeutic intervention. Over the past decades, gene identification studies confidently linked the aforementioned syndromes, as well as nonsyndromic aneurysmal disease, to genetic defects in proteins related to the transforming growth factor (TGF)-β pathway, greatly expanding our knowledge on the disease mechanisms and providing us with novel therapeutic targets. As a result, the focus of the developing pharmacological treatment strategies is shifting from hemodynamic stress management to TGF-β antagonism. In this review, we discuss the insights that have been gained in the molecular biology of MFS and related disorders over the past 25 years. PMID:26919284

  17. A Wavelet-Based Approach to Pattern Discovery in Melodies

    DEFF Research Database (Denmark)

    Velarde, Gissel; Meredith, David; Weyde, Tillman

    2016-01-01

    We present a computational method for pattern discovery based on the application of the wavelet transform to symbolic representations of melodies or monophonic voices. We model the importance of a discovered pattern in terms of the compression ratio that can be achieved by using it to describe...... transform (CWT) at a single scale using the Haar wavelet. These representations are segmented using various approaches and the segments are then concatenated based on their similarity. The concatenated segments are compared, clustered and ranked. The method was evaluated on two musicological tasks...

  18. Data Mining and Knowledge Discovery via Logic-Based Methods

    CERN Document Server

    Triantaphyllou, Evangelos

    2010-01-01

    There are many approaches to data mining and knowledge discovery (DM&KD), including neural networks, closest neighbor methods, and various statistical methods. This monograph, however, focuses on the development and use of a novel approach, based on mathematical logic, that the author and his research associates have worked on over the last 20 years. The methods presented in the book deal with key DM&KD issues in an intuitive manner and in a natural sequence. Compared to other DM&KD methods, those based on mathematical logic offer a direct and often intuitive approach for extracting easily int

  19. Fragment-based approaches and computer-aided drug discovery.

    Science.gov (United States)

    Rognan, Didier

    2012-01-01

    Fragment-based design has significantly modified drug discovery strategies and paradigms in the last decade. Besides technological advances and novel therapeutic avenues, one of the most significant changes brought by this new discipline has occurred in the minds of drug designers. Fragment-based approaches have markedly impacted rational computer-aided design both in method development and in applications. The present review illustrates the importance of molecular fragments in many aspects of rational ligand design, and discusses how thinking in "fragment space" has boosted computational biology and chemistry. PMID:21710380

  20. ACFIS: a web server for fragment-based drug discovery.

    Science.gov (United States)

    Hao, Ge-Fei; Jiang, Wen; Ye, Yuan-Nong; Wu, Feng-Xu; Zhu, Xiao-Lei; Guo, Feng-Biao; Yang, Guang-Fu

    2016-07-01

    In order to foster innovation and improve the effectiveness of drug discovery, there is a considerable interest in exploring unknown 'chemical space' to identify new bioactive compounds with novel and diverse scaffolds. Hence, fragment-based drug discovery (FBDD) was developed rapidly due to its advanced expansive search for 'chemical space', which can lead to a higher hit rate and ligand efficiency (LE). However, computational screening of fragments is always hampered by the promiscuous binding model. In this study, we developed a new web server Auto Core Fragment in silico Screening (ACFIS). It includes three computational modules, PARA_GEN, CORE_GEN and CAND_GEN. ACFIS can generate core fragment structure from the active molecule using fragment deconstruction analysis and perform in silico screening by growing fragments to the junction of core fragment structure. An integrated energy calculation rapidly identifies which fragments fit the binding site of a protein. We constructed a simple interface to enable users to view top-ranking molecules in 2D and the binding mode in 3D for further experimental exploration. This makes the ACFIS a highly valuable tool for drug discovery. The ACFIS web server is free and open to all users at http://chemyang.ccnu.edu.cn/ccb/server/ACFIS/. PMID:27150808

  1. From mouse to humans: discovery of the CACNG2 pain susceptibility gene.

    Science.gov (United States)

    Nissenbaum, J

    2012-10-01

    Chronic pain is a major healthcare problem affecting the daily lives of millions with enormous financial costs. The notorious variability of pain and the lack of efficient pain relief pharmaceuticals pose both genetic and therapeutic challenges. There are several genetic approaches that aim to dissect pain phenotypes into their genetic components. Gene mapping using model organisms for various pain phenotypes has led to the identification of novel genes affecting susceptibility and response to pain stimuli. Translational studies have succeeded in tying those genes to human pain syndromes, thus suggesting new targets for drug discovery. This short review introduces a perspective on pain genetics and the trajectory from pain phenotype to pain gene, involving fine-mapping strategies, bioinformatic analysis and microarray profiling alongside human association analysis. This integrated approach has led to identification of CACNG2 as a novel neuropathic pain gene affecting pain susceptibility both in mice and humans. It also serves as a prototype for efficient and economic discovery of pain genes. Comparisons to other methods as well as future directions of pain genetics are also discussed.

  2. Melody-based knowledge discovery in musical pieces

    Science.gov (United States)

    Rybnik, Mariusz; Jastrzebska, Agnieszka

    2016-06-01

    The paper is focused on automated knowledge discovery in musical pieces, based on transformations of digital musical notation. Usually a single musical piece is analyzed to discover its structure as well as the traits of separate voices. Melody and rhythm are processed with the use of three proposed operators that serve as meta-data. In this work we focus on melody, so the processed data is labeled using fuzzy labels created for detecting various voice characteristics. A comparative analysis of two musical pieces may be performed as well, comparing them in terms of various rhythmic or melodic traits (as a whole or with voice separation).

  3. A Metadata based Knowledge Discovery Methodology for Seeding Translational Research.

    Science.gov (United States)

    Kothari, Cartik R; Payne, Philip R O

    2015-01-01

    In this paper, we present a semantic, metadata based knowledge discovery methodology for identifying teams of researchers from diverse backgrounds who can collaborate on interdisciplinary research projects: projects in areas that have been identified as high-impact areas at The Ohio State University. This methodology involves the semantic annotation of keywords and the postulation of semantic metrics to improve the efficiency of the path exploration algorithm as well as to rank the results. Results indicate that our methodology can discover groups of experts from diverse areas who can collaborate on translational research projects.

  4. Tales of one gene discovery of a novel candidate receptor in mammalian taste

    OpenAIRE

    Huang, Angela Lilly

    2007-01-01

    There are five basic taste modalities in mammals: bitter, sweet, sour, salty, and Umami (taste of MSG and L-amino acids). Receptors for bitter, sweet, and Umami were previously discovered. Identities of receptors for salty and sour taste modalities remained elusive. In this dissertation, I will present: 1) development of a novel bioinformatics screen to discover candidate receptors; 2) discovery of a novel gene, PKD2L1, in taste receptor cells; 3) evidence demonstrating PKD2L1-expressing tast...

  5. Syn-lethality: an integrative knowledge base of synthetic lethality towards discovery of selective anticancer therapies.

    Science.gov (United States)

    Li, Xue-juan; Mishra, Shital K; Wu, Min; Zhang, Fan; Zheng, Jie

    2014-01-01

    Synthetic lethality (SL) is a novel strategy for anticancer therapies, whereby mutations of two genes will kill a cell but mutation of a single gene will not. Therefore, a cancer-specific mutation combined with a drug-induced mutation, if they have SL interactions, will selectively kill cancer cells. While numerous SL interactions have been identified in yeast, only a few are known in humans. There is a pressing need to systematically discover and understand SL interactions specific to human cancer. In this paper, we present Syn-Lethality, the first integrative knowledge base of SL that is dedicated to human cancer. It integrates experimentally discovered and verified human SL gene pairs into a network, associated with annotations of gene function, pathway, and molecular mechanisms. It also includes yeast SL genes from high-throughput screenings which are mapped to orthologous human genes. Such an integrative knowledge base, organized as a relational database with a user interface for searching and network visualization, will greatly expedite the discovery of novel anticancer drug targets based on synthetic lethality interactions. The database can be downloaded as a stand-alone Java application.

  6. Aptamer-based multiplexed proteomic technology for biomarker discovery.

    Directory of Open Access Journals (Sweden)

    Larry Gold

    Full Text Available BACKGROUND: The interrogation of proteomes ("proteomics") in a highly multiplexed and efficient manner remains a coveted and challenging goal in biology and medicine. METHODOLOGY/PRINCIPAL FINDINGS: We present a new aptamer-based proteomic technology for biomarker discovery capable of simultaneously measuring thousands of proteins from small sample volumes (15 µL of serum or plasma). Our current assay measures 813 proteins with low limits of detection (1 pM median), 7 logs of overall dynamic range (~100 fM-1 µM), and 5% median coefficient of variation. This technology is enabled by a new generation of aptamers that contain chemically modified nucleotides, which greatly expand the physicochemical diversity of the large randomized nucleic acid libraries from which the aptamers are selected. Proteins in complex matrices such as plasma are measured with a process that transforms a signature of protein concentrations into a corresponding signature of DNA aptamer concentrations, which is quantified on a DNA microarray. Our assay takes advantage of the dual nature of aptamers as both folded protein-binding entities with defined shapes and unique nucleotide sequences recognizable by specific hybridization probes. To demonstrate the utility of our proteomics biomarker discovery technology, we applied it to a clinical study of chronic kidney disease (CKD). We identified two well known CKD biomarkers as well as an additional 58 potential CKD biomarkers. These results demonstrate the potential utility of our technology to rapidly discover unique protein signatures characteristic of various disease states. CONCLUSIONS/SIGNIFICANCE: We describe a versatile and powerful tool that allows large-scale comparison of proteome profiles among discrete populations. This unbiased and highly multiplexed search engine will enable the discovery of novel biomarkers in a manner that is unencumbered by our incomplete knowledge of biology, thereby helping to advance the next

  7. Theme discovery from gene lists for identification and viewing of multiple functional groups

    Directory of Open Access Journals (Sweden)

    Wong Garry

    2005-06-01

    Full Text Available Abstract Background High throughput methods of the genome era produce vast amounts of data in the form of gene lists. These lists are large and difficult to interpret without advanced computational or bioinformatic tools. Most existing methods analyse a gene list as a single entity although it is comprised of multiple gene groups associated with separate biological functions. Therefore it is imperative to define and visualize gene groups with unique functionality within gene lists. Results In order to analyse the functional heterogeneity within a gene list, we have developed a method that clusters genes to groups with homogenous functionalities. The method uses Non-negative Matrix Factorization (NMF) to create several clustering results with varying numbers of clusters. The obtained clustering results are combined into a simple graphical presentation showing the functional groups over-represented in the analyzed gene list. We demonstrate its performance on two data sets and show results that improve upon existing methods. The comparison also shows that our method creates a more simplified view that aids in discovery of biological themes within the list and discards less informative classes from the results. Conclusion The presented method and associated software are useful for the identification and interpretation of biological functions associated with gene lists and are especially useful for the analysis of large lists.

  8. Knowledge based cluster ensemble for cancer discovery from biomolecular data.

    Science.gov (United States)

    Yu, Zhiwen; Wong, Hau-San; You, Jane; Yang, Qinmin; Liao, Hongying

    2011-06-01

    The adoption of microarray techniques in biological and medical research provides a new way for cancer diagnosis and treatment. In order to perform successful diagnosis and treatment of cancer, discovering and classifying cancer types correctly is essential. Class discovery is one of the most important tasks in cancer classification using biomolecular data. Most of the existing works adopt single clustering algorithms to perform class discovery from biomolecular data. However, single clustering algorithms have limitations, which include a lack of robustness, stability, and accuracy. In this paper, we propose a new cluster ensemble approach called knowledge based cluster ensemble (KCE) which incorporates the prior knowledge of the data sets into the cluster ensemble framework. Specifically, KCE represents the prior knowledge of a data set in the form of pairwise constraints. Then, the spectral clustering algorithm (SC) is adopted to generate a set of clustering solutions. Next, KCE transforms pairwise constraints into confidence factors for these clustering solutions. After that, a consensus matrix is constructed by considering all the clustering solutions and their corresponding confidence factors. The final clustering result is obtained by partitioning the consensus matrix. Compared with single clustering algorithms and conventional cluster ensemble approaches, knowledge based cluster ensemble approaches are more robust, stable and accurate. The experiments on cancer data sets show that: 1) KCE works well on these data sets; 2) KCE not only outperforms most of the state-of-the-art single clustering algorithms, but also outperforms most of the state-of-the-art cluster ensemble approaches.
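
The middle steps described above (pairwise constraints, confidence factors, consensus matrix) can be sketched roughly as follows. The abstract does not give the formulas, so this sketch assumes, purely for illustration, that a solution's confidence factor is the fraction of must-link and cannot-link constraints it satisfies, and that the consensus matrix is a confidence-weighted co-association matrix. The spectral clustering step and the final partitioning of the matrix are left out.

```python
def confidence(labels, must_link, cannot_link):
    """Fraction of pairwise constraints that one clustering solution satisfies.

    Assumed definition for illustration; the paper may compute it differently.
    """
    total = len(must_link) + len(cannot_link)
    if total == 0:
        return 1.0
    ok = sum(labels[i] == labels[j] for i, j in must_link)
    ok += sum(labels[i] != labels[j] for i, j in cannot_link)
    return ok / total

def consensus_matrix(solutions, must_link, cannot_link):
    """Confidence-weighted co-association matrix over all clustering solutions."""
    n = len(solutions[0])
    weights = [confidence(s, must_link, cannot_link) for s in solutions]
    norm = sum(weights) or 1.0  # avoid division by zero if every weight is 0
    return [[sum(w for w, s in zip(weights, solutions) if s[i] == s[j]) / norm
             for j in range(n)] for i in range(n)]

# Three hypothetical clustering solutions for four samples.
solutions = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
must_link, cannot_link = [(0, 1)], [(0, 2)]
for row in consensus_matrix(solutions, must_link, cannot_link):
    print(row)
```

In this toy input the third solution violates both constraints, so it receives weight 0 and contributes nothing to the consensus matrix.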

  9. Targeting metalloproteins by fragment-based lead discovery.

    Science.gov (United States)

    Johnson, Sherida; Barile, Elisa; Farina, Biancamaria; Purves, Angela; Wei, Jun; Chen, Li-Hsing; Shiryaev, Sergey; Zhang, Ziming; Rodionova, Irina; Agrawal, Arpita; Cohen, Seth M; Osterman, Andrei; Strongin, Alex; Pellecchia, Maurizio

    2011-08-01

    It has been estimated that nearly one-third of functional proteins contain a metal ion. These constitute a wide variety of possible drug targets including metalloproteinases, dehydrogenases, oxidoreductases, hydrolases, deacetylases, or many others in which the metal ion is either of catalytic or of structural nature. Despite the predominant role of a metal ion in so many classes of drug targets, current high-throughput screening techniques do not usually produce viable hits against these proteins, likely due to the lack of proper metal-binding pharmacophores in the current screening libraries. Herein, we describe a novel fragment-based drug discovery approach using a metal-targeting fragment library that is based on a variety of distinct classes of metal-binding groups designed to reliably anchor the fragments at the target's metal ions. We show that the approach can effectively identify novel, potent and selective agents that can be readily developed into metalloprotein-targeted therapeutics.

  10. Discovery and development of DNA methylation-based biomarkers for lung cancer.

    Science.gov (United States)

    Walter, Kimberly; Holcomb, Thomas; Januario, Tom; Yauch, Robert L; Du, Pan; Bourgon, Richard; Seshagiri, Somasekar; Amler, Lukas C; Hampton, Garret M; S Shames, David

    2014-02-01

    Lung cancer remains the primary cause of cancer-related deaths worldwide. Improved tools for early detection and therapeutic stratification would be expected to increase the survival rate for this disease. Alterations in the molecular pathways that drive lung cancer, which include epigenetic modifications, may provide biomarkers to help address this major unmet clinical need. Epigenetic changes, which are defined as heritable changes in gene expression that do not alter the primary DNA sequence, are one of the hallmarks of cancer, and prevalent in all types of cancer. These modifications represent a rich source of biomarkers that have the potential to be implemented in clinical practice. This perspective describes recent advances in the discovery of epigenetic biomarkers in lung cancer, specifically those that result in the methylation of DNA at CpG sites. We discuss one approach for methylation-based biomarker assay development that describes the discovery at a genome-scale level, which addresses some of the practical considerations for design of assays that can be implemented in the clinic. We emphasize that an integrated technological approach will enable the development of clinically useful DNA methylation-based biomarker assays. While this article focuses on current literature and primary research findings in lung cancer, the principles we describe here apply to the discovery and development of epigenetic biomarkers for other types of cancer.

  11. Parallel Density-Based Clustering for Discovery of Ionospheric Phenomena

    Science.gov (United States)

    Pankratius, V.; Gowanlock, M.; Blair, D. M.

    2015-12-01

    Ionospheric total electron content maps derived from global networks of dual-frequency GPS receivers can reveal a plethora of ionospheric features in real-time and are key to space weather studies and natural hazard monitoring. However, growing data volumes from expanding sensor networks are making manual exploratory studies challenging. As the community is heading towards Big Data ionospheric science, automation and Computer-Aided Discovery become indispensable tools for scientists. One problem of machine learning methods is that they require domain-specific adaptations in order to be effective and useful for scientists. Addressing this problem, our Computer-Aided Discovery approach allows scientists to express various physical models as well as perturbation ranges for parameters. The search space is explored through an automated system and parallel processing of batched workloads, which finds corresponding matches and similarities in empirical data. We discuss density-based clustering as a particular method we employ in this process. Specifically, we adapt Density-Based Spatial Clustering of Applications with Noise (DBSCAN). This algorithm groups geospatial data points based on density. Clusters of points can be of arbitrary shape, and the number of clusters is not predetermined by the algorithm; only two input parameters need to be specified: (1) a distance threshold, (2) a minimum number of points within that threshold. We discuss an implementation of DBSCAN for batched workloads that is amenable to parallelization on manycore architectures such as Intel's Xeon Phi accelerator with 60+ general-purpose cores. This manycore parallelization can cluster large volumes of ionospheric total electron content data quickly. Potential applications for cluster detection include the visualization, tracing, and examination of traveling ionospheric disturbances or other propagating phenomena. Acknowledgments. We acknowledge support from NSF ACI-1442997 (PI V. Pankratius).
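
For readers unfamiliar with DBSCAN, the two input parameters named above (a distance threshold and a minimum number of points within it) can be seen in a minimal serial sketch. This is a textbook-style version on 2-D points with Euclidean distance, not the batched manycore implementation the abstract describes; the names `eps` and `min_pts` and the sample coordinates are assumptions for illustration.

```python
from math import hypot

def dbscan(points, eps, min_pts):
    """Label each 2-D point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # All points (including i itself) within distance eps of point i.
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if hypot(xi - xj, yi - yj) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later be relabeled as a border point
            continue
        cluster += 1                   # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:   # j is also a core point: keep expanding
                queue.extend(nbrs)
    return labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))  # → [0, 0, 0, 1, 1, 1, -1]
```

The neighbor search here is a brute-force scan, quadratic in the number of points; a production implementation would normally use a spatial index, and each batch could be clustered independently to expose parallelism.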

  12. Systematic discovery of unannotated genes in 11 yeast species using a database of orthologous genomic segments

    LENUS (Irish Health Repository)

    OhEigeartaigh, Sean S

    2011-07-26

    Abstract Background In standard BLAST searches, no information other than the sequences of the query and the database entries is considered. However, in situations where two genes from different species have only borderline similarity in a BLAST search, the discovery that the genes are located within a region of conserved gene order (synteny) can provide additional evidence that they are orthologs. Thus, for interpreting borderline search results, it would be useful to know whether the syntenic context of a database hit is similar to that of the query. This principle has often been used in investigations of particular genes or genomic regions, but to our knowledge it has never been implemented systematically. Results We made use of the synteny information contained in the Yeast Gene Order Browser database for 11 yeast species to carry out a systematic search for protein-coding genes that were overlooked in the original annotations of one or more yeast genomes but which are syntenic with their orthologs. Such genes tend to have been overlooked because they are short, highly divergent, or contain introns. The key features of our software - called SearchDOGS - are that the database entries are classified into sets of genomic segments that are already known to be orthologous, and that very weak BLAST hits are retained for further analysis if their genomic location is similar to that of the query. Using SearchDOGS we identified 595 additional protein-coding genes among the 11 yeast species, including two new genes in Saccharomyces cerevisiae. We found additional genes for the mating pheromone a-factor in six species including Kluyveromyces lactis. Conclusions SearchDOGS has proven highly successful for identifying overlooked genes in the yeast genomes. We anticipate that our approach can be adapted for study of further groups of species, such as bacterial genomes. More generally, the concept of doing sequence similarity searches against databases to which external
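
The retention rule described above (keep very weak BLAST hits when their genomic location matches that of the query) can be sketched as a simple filter. This is not the SearchDOGS code: the `Hit` record, the `segment_id` field standing in for membership of a set of orthologous genomic segments, and both E-value thresholds are hypothetical and chosen only to make the idea concrete.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    name: str
    evalue: float
    segment_id: str  # hypothetical id of the orthologous genomic segment set

def retain_hits(hits, query_segment_id, strict=1e-5, lenient=10.0):
    """Keep strong hits anywhere, and weak hits only if they are syntenic.

    `strict` and `lenient` are illustrative E-value cutoffs, not published values.
    """
    kept = []
    for h in hits:
        syntenic = h.segment_id == query_segment_id
        if h.evalue <= strict or (syntenic and h.evalue <= lenient):
            kept.append(h)
    return kept

hits = [
    Hit("strong", 1e-20, "seg9"),
    Hit("weak_syntenic", 0.5, "seg1"),
    Hit("weak_elsewhere", 0.5, "seg7"),
]
print([h.name for h in retain_hits(hits, "seg1")])  # → ['strong', 'weak_syntenic']
```

The two weak hits have the same E-value; only the one in the query's syntenic context is kept for further analysis.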

  13. Music snippet extraction via melody-based repeated pattern discovery

    Institute of Scientific and Technical Information of China (English)

    XU JiePing; ZHAO Yang; CHEN Zhe; LIU ZiLi

    2009-01-01

    In this paper, we present a complete set of procedures to automatically extract a music snippet, defined as the most representative or the highlighted excerpt of a music clip. We first generate a modified and compact similarity matrix based on selected features and distance metrics, and then several improved techniques for music repeated pattern discovery are utilized because a music snippet is usually a part of the repeated melody, main theme or chorus. During the process, redundant and wrongly detected patterns are discarded, boundaries are corrected using beat information, and final clusters are also further sorted according to the occurrence frequency and energy information. Subsequently, following our methods, we designed a music snippet extraction system which allows users to detect snippets. Experiments performed on the system show the superiority of our proposed approach.

  14. Gene Discovery of Modular Diterpene Metabolism in Nonmodel Systems

    Science.gov (United States)

    Zerbe, Philipp; Hamberger, Björn; Yuen, Macaire M.S.; Chiang, Angela; Sandhu, Harpreet K.; Madilao, Lina L.; Nguyen, Anh; Hamberger, Britta; Bach, Søren Spanner; Bohlmann, Jörg

    2013-01-01

    Plants produce over 10,000 different diterpenes of specialized (secondary) metabolism, and fewer diterpenes of general (primary) metabolism. Specialized diterpenes may have functions in ecological interactions of plants with other organisms and also benefit humanity as pharmaceuticals, fragrances, resins, and other industrial bioproducts. Examples of high-value diterpenes are taxol and forskolin pharmaceuticals or ambroxide fragrances. Yields and purity of diterpenes obtained from natural sources or by chemical synthesis are often insufficient for large-volume or high-end applications. Improvement of agricultural or biotechnological diterpene production requires knowledge of biosynthetic genes and enzymes. However, specialized diterpene pathways are extremely diverse across the plant kingdom, and most specialized diterpenes are taxonomically restricted to a few plant species, genera, or families. Consequently, there is no single reference system to guide gene discovery and rapid annotation of specialized diterpene pathways. Functional diversification of genes and plasticity of enzyme functions of these pathways further complicate correct annotation. To address this challenge, we used a set of 10 different plant species to develop a general strategy for diterpene gene discovery in nonmodel systems. The approach combines metabolite-guided transcriptome resources, custom diterpene synthase (diTPS) and cytochrome P450 reference gene databases, phylogenies, and, as shown for select diTPSs, single and coupled enzyme assays using microbial and plant expression systems. In the 10 species, we identified 46 new diTPS candidates and over 400 putatively terpenoid-related P450s in a resource of nearly 1 million predicted transcripts of diterpene-accumulating tissues. Phylogenetic patterns of lineage-specific blooms of genes guided functional characterization. PMID:23613273

  15. Discovery of dominant and dormant genes from expression data using a novel generalization of SNR for multi-class problems

    Directory of Open Access Journals (Sweden)

    Chung I-Fang

    2008-10-01

    Full Text Available Abstract Background The Signal-to-Noise-Ratio (SNR is often used for identification of biomarkers for two-class problems and no formal and useful generalization of SNR is available for multiclass problems. We propose innovative generalizations of SNR for multiclass cancer discrimination through introduction of two indices, Gene Dominant Index and Gene Dormant Index (GDIs. These two indices lead to the concepts of dominant and dormant genes with biological significance. We use these indices to develop methodologies for discovery of dominant and dormant biomarkers with interesting biological significance. The dominancy and dormancy of the identified biomarkers and their excellent discriminating power are also demonstrated pictorially using the scatterplot of individual gene and 2-D Sammon's projection of the selected set of genes. Using information from the literature we have shown that the GDI based method can identify dominant and dormant genes that play significant roles in cancer biology. These biomarkers are also used to design diagnostic prediction systems. Results and discussion To evaluate the effectiveness of the GDIs, we have used four multiclass cancer data sets (Small Round Blue Cell Tumors, Leukemia, Central Nervous System Tumors, and Lung Cancer. For each data set we demonstrate that the new indices can find biologically meaningful genes that can act as biomarkers. We then use six machine learning tools, Nearest Neighbor Classifier (NNC, Nearest Mean Classifier (NMC, Support Vector Machine (SVM classifier with linear kernel, and SVM classifier with Gaussian kernel, where both SVMs are used in conjunction with one-vs-all (OVA and one-vs-one (OVO strategies. We found GDIs to be very effective in identifying biomarkers with strong class specific signatures. 
With all six tools and for all data sets we could achieve better or comparable prediction accuracies usually with fewer marker genes than results reported in the literature using the
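
    As a point of reference for the record above, the two-class SNR used for gene ranking is commonly written as the difference of class means divided by the sum of class standard deviations. The sketch below shows only that two-class form; the GDI generalization is not defined in the abstract, so it is not attempted here. The function name and the expression values are hypothetical.

```python
from statistics import mean, stdev

def snr(class_a, class_b):
    """Two-class signal-to-noise ratio: (mean_a - mean_b) / (sd_a + sd_b).

    Uses the sample standard deviation; this is the common two-class
    form, not the multiclass GDI indices described in the abstract.
    """
    spread = stdev(class_a) + stdev(class_b)
    if spread == 0:
        raise ValueError("both classes have zero variance")
    return (mean(class_a) - mean(class_b)) / spread

# Hypothetical expression values of one gene in two classes.
tumor = [8.0, 9.0, 10.0]
normal = [2.0, 3.0, 4.0]
print(snr(tumor, normal))
```

    A large positive or negative score marks a gene whose expression separates the two classes well relative to its within-class spread.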

  16. FORGE Canada Consortium: outcomes of a 2-year national rare-disease gene-discovery project.

    Science.gov (United States)

    Beaulieu, Chandree L; Majewski, Jacek; Schwartzentruber, Jeremy; Samuels, Mark E; Fernandez, Bridget A; Bernier, Francois P; Brudno, Michael; Knoppers, Bartha; Marcadier, Janet; Dyment, David; Adam, Shelin; Bulman, Dennis E; Jones, Steve J M; Avard, Denise; Nguyen, Minh Thu; Rousseau, Francois; Marshall, Christian; Wintle, Richard F; Shen, Yaoqing; Scherer, Stephen W; Friedman, Jan M; Michaud, Jacques L; Boycott, Kym M

    2014-06-01

    Inherited monogenic disease has an enormous impact on the well-being of children and their families. Over half of the children living with one of these conditions are without a molecular diagnosis because of the rarity of the disease, the marked clinical heterogeneity, and the reality that there are thousands of rare diseases for which causative mutations have yet to be identified. It is in this context that in 2010 a Canadian consortium was formed to rapidly identify mutations causing a wide spectrum of pediatric-onset rare diseases by using whole-exome sequencing. The FORGE (Finding of Rare Disease Genes) Canada Consortium brought together clinicians and scientists from 21 genetics centers and three science and technology innovation centers from across Canada. From nation-wide requests for proposals, 264 disorders were selected for study from the 371 submitted; disease-causing variants (including in 67 genes not previously associated with human disease; 41 of these have been genetically or functionally validated, and 26 are currently under study) were identified for 146 disorders over a 2-year period. Here, we present our experience with four strategies employed for gene discovery and discuss FORGE's impact in a number of realms, from clinical diagnostics to the broadening of the phenotypic spectrum of many diseases to the biological insight gained into both disease states and normal human development. Lastly, on the basis of this experience, we discuss the way forward for rare-disease genetic discovery both in Canada and internationally.

  17. MAGIC Database and Interfaces: An Integrated Package for Gene Discovery and Expression

    Directory of Open Access Journals (Sweden)

    Lee H. Pratt

    2006-03-01

    Full Text Available The rapidly increasing rate at which biological data is being produced requires a corresponding growth in relational databases and associated tools that can help laboratories contend with that data. With this need in mind, we describe here a Modular Approach to a Genomic, Integrated and Comprehensive (MAGIC) Database. This Oracle 9i database derives from an initial focus in our laboratory on gene discovery via production and analysis of expressed sequence tags (ESTs), and subsequently on gene expression as assessed by both EST clustering and microarrays. The MAGIC Gene Discovery portion of the database focuses on information derived from DNA sequences and on its biological relevance. In addition to MAGIC SEQ-LIMS, which is designed to support activities in the laboratory, it contains several additional subschemas. The latter include MAGIC Admin for database administration, MAGIC Sequence for sequence processing as well as sequence and clone attributes, MAGIC Cluster for the results of EST clustering, MAGIC Polymorphism in support of microsatellite and single-nucleotide-polymorphism discovery, and MAGIC Annotation for electronic annotation by BLAST and BLAT. The MAGIC Microarray portion is a MIAME-compliant database with two components at present. These are MAGIC Array-LIMS, which makes possible remote entry of all information into the database, and MAGIC Array Analysis, which provides data mining and visualization. Because all aspects of interaction with the MAGIC Database are via a web browser, it is ideally suited not only for individual research laboratories but also for core facilities that serve clients at any distance.

  18. Evaluation of gene association methods for coexpression network construction and biological knowledge discovery.

    Directory of Open Access Journals (Sweden)

    Sapna Kumari

    Full Text Available BACKGROUND: Constructing coexpression networks and performing network analysis using large-scale gene expression data sets is an effective way to uncover new biological knowledge; however, the methods used for gene association in constructing these coexpression networks have not been thoroughly evaluated. Since different methods lead to structurally different coexpression networks and provide different information, selecting the optimal gene association method is critical. METHODS AND RESULTS: In this study, we compared eight gene association methods - Spearman rank correlation, Weighted Rank Correlation, Kendall, Hoeffding's D measure, Theil-Sen, Rank Theil-Sen, Distance Covariance, and Pearson - and focused on their true knowledge discovery rates in associating pathway genes and constructing coordination networks of regulatory genes. We also examined how the different methods behave on microarray data with different properties, and whether the biological processes affect the efficiency of different methods. CONCLUSIONS: We found that the Spearman, Hoeffding and Kendall methods are effective in identifying coexpressed pathway genes, whereas the Theil-Sen, Rank Theil-Sen, Spearman, and Weighted Rank methods perform well in identifying coordinated transcription factors that control the same biological processes and traits. Surprisingly, the widely used Pearson method is generally less efficient, and so is the Distance Covariance method, which can find gene pairs of multiple relationships. Some of our analyses clearly show that the Pearson and Distance Covariance methods behave distinctly from the other six methods. The efficiencies of different methods vary with the data properties to some degree and are largely contingent upon the biological processes, which necessitates a pre-analysis to identify the best performing method for gene association and coexpression network construction.
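
    To illustrate why the choice of association measure matters, the sketch below compares two of the eight methods named above, Pearson and Spearman, on a pair of monotonically but non-linearly related profiles. It is a minimal standard-library illustration; the gene profiles are hypothetical and the study's evaluation pipeline is not reproduced.

```python
from statistics import mean

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical expression profiles: gene_b is a monotone, non-linear
# function of gene_a.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [1.0, 8.0, 27.0, 64.0, 125.0]
print(pearson(gene_a, gene_b), spearman(gene_a, gene_b))
```

    The rank-based measure reports a perfect association for this pair while Pearson reports a weaker one, which is one way two methods can yield structurally different networks from the same data.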

  19. An improved procedure for gene selection from microarray experiments using false discovery rate criterion

    Directory of Open Access Journals (Sweden)

    Yang Mark CK

    2006-01-01

    Full Text Available Abstract Background A large number of genes usually show differential expressions in a microarray experiment with two types of tissues, and the p-values of a proper statistical test are often used to quantify the significance of these differences. The genes with small p-values are then picked as the genes responsible for the differences in the tissue RNA expressions. One key question is what threshold should be used to consider the p-values small. There is always a trade-off between this threshold and the rate of false claims. Recent statistical literature shows that the false discovery rate (FDR) criterion is a powerful and reasonable criterion to pick those genes with differential expression. Moreover, the power of detection can be increased by knowing the number of non-differential expression genes. While this number is unknown in practice, there are methods to estimate it from data. The purpose of this paper is to present a new method of estimating this number and use it for the FDR procedure construction. Results A combination of test functions is used to estimate the number of differentially expressed genes. A simulation study shows that the proposed method has a higher power to detect these genes than other existing methods, while still keeping the FDR under control. The improvement can be substantial if the proportion of true differentially expressed genes is large. This procedure has also been tested with good results using a real dataset. Conclusion For a given expected FDR, the method proposed in this paper has better power to pick genes that show differentiation in their expression than two other well known methods.
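
    For orientation, the standard Benjamini-Hochberg step-up procedure can be sketched as below. This is not the estimator proposed in the paper, whose details the abstract does not give. The optional `m0` argument is an assumption added here to show how a known or estimated number of non-differentially expressed genes would relax the thresholds; the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05, m0=None):
    """Return sorted indices of hypotheses rejected at FDR level alpha.

    m0 is an assumed (or estimated) number of true null hypotheses;
    when omitted, the total number of tests is used, which gives the
    standard Benjamini-Hochberg procedure.
    """
    m = len(p_values)
    denom = m if m0 is None else m0
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: keep the largest rank whose p-value is under its threshold.
        if p_values[idx] <= rank / denom * alpha:
            cutoff = rank
    return sorted(order[:cutoff])

# Hypothetical p-values for eight genes.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(p))
print(benjamini_hochberg(p, m0=4))
```

    Passing a smaller `m0` never rejects fewer hypotheses than the default, which is the sense in which knowing the number of non-differential genes can increase power.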

  20. Evolutionary signatures amongst disease genes permit novel methods for gene prioritization and construction of informative gene-based networks.

    Directory of Open Access Journals (Sweden)

    Nolan Priedigkeit

    2015-02-01

    Full Text Available Genes involved in the same function tend to have similar evolutionary histories, in that their rates of evolution covary over time. This coevolutionary signature, termed Evolutionary Rate Covariation (ERC), is calculated using only gene sequences from a set of closely related species and has demonstrated potential as a computational tool for inferring functional relationships between genes. To further define applications of ERC, we first established that roughly 55% of genetic diseases possess an ERC signature between their contributing genes. At a false discovery rate of 5% we report 40 such diseases including cancers, developmental disorders and mitochondrial diseases. Given these coevolutionary signatures between disease genes, we then assessed ERC's ability to prioritize known disease genes out of a list of unrelated candidates. We found that in the presence of an ERC signature, the true disease gene is effectively prioritized to the top 6% of candidates on average. We then apply this strategy to a melanoma-associated region on chromosome 1 and identify MCL1 as a potential causative gene. Furthermore, to gain global insight into disease mechanisms, we used ERC to predict molecular connections between 310 nominally distinct diseases. The resulting "disease map" network associates several diseases with related pathogenic mechanisms and unveils many novel relationships between clinically distinct diseases, such as between Hirschsprung's disease and melanoma. Taken together, these results demonstrate the utility of molecular evolution as a gene discovery platform and show that evolutionary signatures can be used to build informative gene-based networks.

  1. Cancer Biomarker Discovery: Lectin-Based Strategies Targeting Glycoproteins

    Directory of Open Access Journals (Sweden)

    David Clark

    2012-01-01

    Full Text Available Biomarker discovery can identify molecular markers in various cancers that can be used for detection, screening, diagnosis, and monitoring of disease progression. Lectin-affinity is a technique that can be used for the enrichment of glycoproteins from a complex sample, facilitating the discovery of novel cancer biomarkers associated with a disease state.

  2. Genome-based discovery, structure prediction and functional analysis of cyclic lipopeptide antibiotics in Pseudomonas species.

    Science.gov (United States)

    de Bruijn, Irene; de Kock, Maarten J D; Yang, Meng; de Waard, Pieter; van Beek, Teris A; Raaijmakers, Jos M

    2007-01-01

    Analysis of microbial genome sequences has revealed numerous genes involved in antibiotic biosynthesis. In Pseudomonads, several gene clusters encoding non-ribosomal peptide synthetases (NRPSs) were predicted to be involved in the synthesis of cyclic lipopeptide (CLP) antibiotics. Most of these predictions, however, are untested and the association between genome sequence and biological function of the predicted metabolite is lacking. Here we report the genome-based identification of previously unknown CLP gene clusters in plant pathogenic Pseudomonas syringae strains B728a and DC3000 and in plant beneficial Pseudomonas fluorescens Pf0-1 and SBW25. For P. fluorescens SBW25, a model strain in studying bacterial evolution and adaptation, the structure of the CLP with a predicted 9-amino acid peptide moiety was confirmed by chemical analyses. Mutagenesis confirmed that the three identified NRPS genes are essential for CLP synthesis in strain SBW25. CLP production was shown to play a key role in motility, biofilm formation and in activity of SBW25 against zoospores of Phytophthora infestans. This is the first time that an antimicrobial metabolite has been identified from strain SBW25. The results indicate that genome mining may enable the discovery of unknown gene clusters and traits that are highly relevant in the lifestyle of plant beneficial and plant pathogenic bacteria.

  3. Proxy-Based IPv6 Neighbor Discovery Scheme for Wireless LAN Based Mesh Networks

    Science.gov (United States)

    Lee, Jihoon; Jeon, Seungwoo; Kim, Jaehoon

    A multi-hop Wireless LAN-based mesh network (WMN) provides high capacity and self-configuring capabilities. Because data forwarding and path selection are based on MAC addresses, a WMN requires additional operations to achieve global connectivity using IPv6 addresses. The neighbor discovery operation over WLAN mesh networks requires repeated all-node broadcasting, which places a heavy burden on the entire mesh network. In this letter, we propose a proxy neighbor discovery scheme for optimized IPv6 communication over WMN to reduce network overhead and communication latency. Using simulation experiments, we show that the control overhead and communication setup latency can be significantly reduced using the proxy-based neighbor discovery mechanism.

  4. How might we increase success in marine-based drug discovery?

    Science.gov (United States)

    Desbois, Andrew P

    2014-09-01

    Drug discovery from marine organisms has been underway for > 60 years and there have been notable successes in discovering, developing and introducing clinical agents derived from marine sources. Examples include the analgesic ziconotide and the anticancer compound trabectedin. However, in light of the pressing need for new drugs, particularly those with anti-infective and anticancer properties, there is strong justification for increased exploration of marine organisms as sources of novel compounds. This article considers approaches that might enhance our chances of delivering new medicines from marine-based drug discovery efforts. Consideration is given to the organisms and habitats deserving of more attention and how we might make best use of these marine genetic resources. In particular, the opportunities offered by synthetic biology are highlighted because these methods allow drug discoverers to explore pathways in 'non-culturable' species and turn on natural product biosynthesis genes that are difficult to activate under laboratory conditions (so-called 'silent' gene clusters). PMID:24909595

  5. PiggyBac transposon mutagenesis: a tool for cancer gene discovery in mice.

    Science.gov (United States)

    Rad, Roland; Rad, Lena; Wang, Wei; Cadinanos, Juan; Vassiliou, George; Rice, Stephen; Campos, Lia S; Yusa, Kosuke; Banerjee, Ruby; Li, Meng Amy; de la Rosa, Jorge; Strong, Alexander; Lu, Dong; Ellis, Peter; Conte, Nathalie; Yang, Fang Tang; Liu, Pentao; Bradley, Allan

    2010-11-19

    Transposons are mobile DNA segments that can disrupt gene function by inserting in or near genes. Here, we show that insertional mutagenesis by the PiggyBac transposon can be used for cancer gene discovery in mice. PiggyBac transposition in genetically engineered transposon-transposase mice induced cancers whose type (hematopoietic versus solid) and latency were dependent on the regulatory elements introduced into transposons. Analysis of 63 hematopoietic tumors revealed that PiggyBac is capable of genome-wide mutagenesis. The PiggyBac screen uncovered many cancer genes not identified in previous retroviral or Sleeping Beauty transposon screens, including Spic, which encodes a PU.1-related transcription factor, and Hdac7, a histone deacetylase gene. PiggyBac and Sleeping Beauty have different integration preferences. To maximize the utility of the tool, we engineered 21 mouse lines to be compatible with both transposon systems in constitutive, tissue- or temporal-specific mutagenesis. Mice with different transposon types, copy numbers, and chromosomal locations support wide applicability. PMID:20947725

  6. Next-generation diagnostics and disease-gene discovery with the Exomiser.

    Science.gov (United States)

    Smedley, Damian; Jacobsen, Julius O B; Jäger, Marten; Köhler, Sebastian; Holtgrewe, Manuel; Schubach, Max; Siragusa, Enrico; Zemojtel, Tomasz; Buske, Orion J; Washington, Nicole L; Bone, William P; Haendel, Melissa A; Robinson, Peter N

    2015-12-01

    Exomiser is an application that prioritizes genes and variants in next-generation sequencing (NGS) projects for novel disease-gene discovery or differential diagnostics of Mendelian disease. Exomiser comprises a suite of algorithms for prioritizing exome sequences using random-walk analysis of protein interaction networks, clinical relevance and cross-species phenotype comparisons, as well as a wide range of other computational filters for variant frequency, predicted pathogenicity and pedigree analysis. In this protocol, we provide a detailed explanation of how to install Exomiser and use it to prioritize exome sequences in a number of scenarios. Exomiser requires ∼3 GB of RAM and roughly 15-90 s of computing time on a standard desktop computer to analyze a variant call format (VCF) file. Exomiser is freely available for academic use from http://www.sanger.ac.uk/science/tools/exomiser.

  7. TargetMine, an integrated data warehouse for candidate gene prioritisation and target discovery.

    Directory of Open Access Journals (Sweden)

    Yi-An Chen

    Full Text Available Prioritising candidate genes for further experimental characterisation is a non-trivial challenge in drug discovery and biomedical research in general. An integrated approach that combines results from multiple data types is best suited for optimal target selection. We developed TargetMine, a data warehouse for efficient target prioritisation. TargetMine utilises the InterMine framework, with new data models such as protein-DNA interactions integrated in a novel way. It enables complicated searches that are difficult to perform with existing tools and it also offers integration of custom annotations and in-house experimental data. We proposed an objective protocol for target prioritisation using TargetMine and set up a benchmarking procedure to evaluate its performance. The results show that the protocol can identify known disease-associated genes with high precision and coverage. A demonstration version of TargetMine is available at http://targetmine.nibio.go.jp/.

  8. A new evaluation methodology for literature-based discovery systems.

    Science.gov (United States)

    Yetisgen-Yildiz, Meliha; Pratt, Wanda

    2009-08-01

    While medical researchers formulate new hypotheses to test, they need to identify connections to their work from other parts of the medical literature. However, the current volume of information has become a great barrier for this task. Recently, many literature-based discovery (LBD) systems have been developed to help researchers identify new knowledge that bridges gaps across distinct sections of the medical literature. Each LBD system uses different methods for mining the connections from text and ranking the identified connections, but none of the currently available LBD evaluation approaches can be used to compare the effectiveness of these methods. In this paper, we present an evaluation methodology for LBD systems that allows comparisons across different systems. We demonstrate the abilities of our evaluation methodology by using it to compare the performance of different correlation-mining and ranking approaches used by existing LBD systems. This evaluation methodology should help other researchers compare approaches, make informed algorithm choices, and ultimately help to improve the performance of LBD systems overall.

  9. Metabolomics-based discovery of diagnostic biomarkers for onchocerciasis.

    Directory of Open Access Journals (Sweden)

    Judith R Denery

    Full Text Available BACKGROUND: Development of robust, sensitive, and reproducible diagnostic tests for understanding the epidemiology of neglected tropical diseases is an integral aspect of the success of worldwide control and elimination programs. In the treatment of onchocerciasis, clinical diagnostics that can function in an elimination scenario are non-existent and desperately needed. Due to its sensitivity and quantitative reproducibility, liquid chromatography-mass spectrometry (LC-MS)-based metabolomics is a powerful approach to this problem. METHODOLOGY/PRINCIPAL FINDINGS: Analysis of an African sample set comprised of 73 serum and plasma samples revealed a set of 14 biomarkers that showed excellent discrimination between Onchocerca volvulus-positive and negative individuals by multivariate statistical analysis. Application of this biomarker set to an additional sample set from onchocerciasis endemic areas where long-term ivermectin treatment has been successful revealed that the biomarker set may also distinguish individuals with worms of compromised viability from those with active infection. Machine learning extended the utility of the biomarker set from a complex multivariate analysis to a binary format applicable for adaptation to a field-based diagnostic, validating the use of complex data mining tools applied to infectious disease biomarker discovery and diagnostic development. CONCLUSIONS/SIGNIFICANCE: An LC-MS metabolomics-based diagnostic has the potential to monitor the progression of onchocerciasis in both endemic and non-endemic geographic areas, as well as provide an essential tool to multinational programs in the ongoing fight against this neglected tropical disease. Ultimately this technology can be expanded for the diagnosis of other filarial and/or neglected tropical diseases.

  10. GalenOWL: Ontology-based drug recommendations discovery

    Directory of Open Access Journals (Sweden)

    Doulaverakis Charalampos

    2012-12-01

    Full Text Available Abstract Background Identification of drug-drug and drug-diseases interactions can pose a difficult problem to cope with, as the increasingly large number of available drugs coupled with the ongoing research activities in the pharmaceutical domain, make the task of discovering relevant information difficult. Although international standards, such as the ICD-10 classification and the UNII registration, have been developed in order to enable efficient knowledge sharing, medical staff needs to be constantly updated in order to effectively discover drug interactions before prescription. The use of Semantic Web technologies has been proposed in earlier works, in order to tackle this problem. Results This work presents a semantic-enabled online service, named GalenOWL, capable of offering real time drug-drug and drug-diseases interaction discovery. For enabling this kind of service, medical information and terminology had to be translated to ontological terms and be appropriately coupled with medical knowledge of the field. International standards such as the aforementioned ICD-10 and UNII, provide the backbone of the common representation of medical data, while the medical knowledge of drug interactions is represented by a rule base which makes use of the aforementioned standards. Details of the system architecture are presented while also giving an outline of the difficulties that had to be overcome. A comparison of the developed ontology-based system with a similar system developed using a traditional business logic rule engine is performed, giving insights on the advantages and drawbacks of both implementations. Conclusions The use of Semantic Web technologies has been found to be a good match for developing drug recommendation systems. Ontologies can effectively encapsulate medical knowledge and rule-based reasoning can capture and encode the drug interactions knowledge.

  11. INTELLIGENT SEARCH ENGINE-BASED UNIVERSAL DESCRIPTION, DISCOVERY AND INTEGRATION FOR WEB SERVICE DISCOVERY

    Directory of Open Access Journals (Sweden)

    Tamilarasi Karuppiah

    2014-01-01

    Full Text Available The Web Services standard has been broadly acknowledged by industry and academic research along with the progress of web technology and e-business. An increasing number of web applications have been bundled as web services that can be published, positioned and invoked across the web. The issues regarding their publication and innovation become most important as web services multiply and become more advanced and mutually dependent. With the intention of discovering web services effectively and within a minimum time period, this study proposes a Universal Description, Discovery and Integration (UDDI) registry with an intelligent search engine. To publish and discover web services, the web services are first published in the UDDI registry and the published web services are then indexed. To improve the efficiency of discovery, the indexed web services are saved in an index database. The search query is compared with the index database to discover web services, and the discovered web services are returned to the service customer. The way the web services are accessed is stored in a log file, which is then used to provide personalized web services to the user. The efficient search capability of the proposed system significantly improves the discovery of web services, and the system is capable of returning the most appropriate web service.
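
    The publish, index and query flow described above can be sketched with a simple in-memory inverted index. This is an illustrative assumption rather than the system's actual design: the service names, descriptions and match-count ranking are hypothetical, and no UDDI API is involved.

```python
from collections import defaultdict

def build_index(services):
    """Map each lower-cased description token to the services that use it."""
    index = defaultdict(set)
    for name, description in services.items():
        for token in description.lower().split():
            index[token].add(name)
    return index

def search(index, query):
    """Rank services by the number of query tokens they match (ties by name)."""
    scores = defaultdict(int)
    for token in query.lower().split():
        for name in index.get(token, ()):
            scores[name] += 1
    return sorted(scores, key=lambda n: (-scores[n], n))

# Hypothetical published services and their descriptions.
published = {
    "WeatherService": "current weather forecast by city",
    "CurrencyService": "currency exchange rate conversion",
    "CityInfo": "city population and weather summary",
}
index = build_index(published)
print(search(index, "weather forecast"))
```

    Building the index once at publication time means each query touches only the postings for its own tokens instead of scanning every registered description.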

  13. Leveraging gene-environment interactions and endotypes for asthma gene discovery.

    Science.gov (United States)

    Bønnelykke, Klaus; Ober, Carole

    2016-03-01

    Asthma is a heterogeneous clinical syndrome that includes subtypes of disease with different underlying causes and disease mechanisms. Asthma is caused by a complex interaction between genes and environmental exposures; early-life exposures in particular play an important role. Asthma is also heritable, and a number of susceptibility variants have been discovered in genome-wide association studies, although the known risk alleles explain only a small proportion of the heritability. In this review, we present evidence supporting the hypothesis that focusing on more specific asthma phenotypes, such as childhood asthma with severe exacerbations, and on relevant exposures that are involved in gene-environment interactions (GEIs), such as rhinovirus infections, will improve detection of asthma genes and our understanding of the underlying mechanisms. We will discuss the challenges of considering GEIs and the advantages of studying responses to asthma-associated exposures in clinical birth cohorts, as well as in cell models of GEIs, to dissect the context-specific nature of genotypic risks, to prioritize variants in genome-wide association studies, and to identify pathways involved in pathogenesis in subgroups of patients. We propose that such approaches, in spite of their many challenges, present great opportunities for better understanding of asthma pathogenesis and heterogeneity and, ultimately, for improving prevention and treatment of disease. PMID:26947980

  14. Reconstructing Sessions from Data Discovery and Access Logs to Build a Semantic Knowledge Base for Improving Data Discovery

    Directory of Open Access Journals (Sweden)

    Yongyao Jiang

    2016-04-01

    Full Text Available Big geospatial data are archived and made available through online web discovery and access. However, finding the right data for scientific research and application development is still a challenge. This paper aims to improve data discovery by mining user knowledge from log files. Specifically, this paper focuses on user web session reconstruction as a critical step for extracting usage patterns. However, reconstructing user sessions from raw web logs has always been difficult, as a session identifier tends to be missing in most data portals. To address this problem, we propose two session identification methods, including time-clustering-based and time-referrer-based methods. We also present the workflow of session reconstruction and discuss the approach of selecting appropriate thresholds for relevant steps in the workflow. The proposed session identification methods and workflow are shown to be able to extract data access patterns for further pattern analyses of user behavior and improvement of data discovery through more relevant data ranking, suggestion, and navigation.
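
    As a baseline for the session identification problem described above, the sketch below splits one user's request timestamps into sessions with a fixed inactivity timeout. It is a simpler technique than the time-clustering-based and time-referrer-based methods the paper proposes, whose details the abstract does not give. The 30-minute default and the sample timestamps are assumptions.

```python
def split_sessions(timestamps, max_gap=1800):
    """Group request times (in seconds) into sessions.

    A new session starts whenever the gap to the previous request
    exceeds max_gap seconds (assumed default: 30 minutes).
    """
    sessions = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= max_gap:
            sessions[-1].append(t)  # continue the current session
        else:
            sessions.append([t])    # gap too large: open a new session
    return sessions

# Hypothetical request times for one client, in seconds.
requests = [0, 60, 120, 4000, 4100]
print(split_sessions(requests))
```

    The threshold choice drives the result: too small a timeout fragments one visit into several sessions, and too large a timeout merges unrelated visits.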

  15. A New Algorithm of Service Discovery Based on DHT for Mobile Application

    Directory of Open Access Journals (Sweden)

    De-gan Zhang

    2011-10-01

    Full Text Available To enhance the efficiency and coverage of service discovery, we put forward a new service discovery algorithm for mobile applications based on DHT (Distributed Hash Table) and Small-World Theory. In the traditional DHT discovery algorithm, each node maintains a finger-table that stores the node information of adjacent nodes. Using Small-World Theory, we propose adding a remote node to the finger-table together with the corresponding remote index. This differs from selecting the remote connection node randomly: we select the remote connection node by a calculation at the local node, which assures the coverage range of service discovery without increasing the length of the finger-table and simplifies the calculation and maintenance of the finger-table. The simulation showed that the algorithm can effectively reduce the path length of service discovery and improve the success rate of service discovery.
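
    For readers unfamiliar with finger-tables, the sketch below builds one in the style of the Chord DHT, where entry i points to the first node at or after node + 2^i on the identifier ring. Chord is used only as a familiar stand-in: the abstract does not specify the paper's table layout or how its remote node is calculated, so neither is modelled here. The ring contents and identifier width are hypothetical.

```python
from bisect import bisect_left

def successor(ring, key):
    """First node id >= key on the sorted ring, wrapping past the end."""
    i = bisect_left(ring, key)
    return ring[i % len(ring)]

def finger_table(ring, node, bits):
    """Chord-style fingers: successor of (node + 2**i) mod 2**bits for each i."""
    size = 2 ** bits
    return [successor(ring, (node + 2 ** i) % size) for i in range(bits)]

# Hypothetical 6-bit identifier ring with ten nodes.
ring = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
print(finger_table(ring, 8, 6))
```

    Because the finger distances double at each entry, a lookup can roughly halve the remaining ring distance per hop, which is the general property that long-range links exploit to shorten discovery paths.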

  16. Discovery of core biotic stress responsive genes in Arabidopsis by weighted gene co-expression network analysis.

    Science.gov (United States)

    Amrine, Katherine C H; Blanco-Ulate, Barbara; Cantu, Dario

    2015-01-01

    Intricate signal networks and transcriptional regulators translate the recognition of pathogens into defense responses. In this study, we carried out a gene co-expression analysis of all currently publicly available microarray data, which were generated in experiments that studied the interaction of the model plant Arabidopsis thaliana with microbial pathogens. This work was conducted to identify (i) modules of functionally related co-expressed genes that are differentially expressed in response to multiple biotic stresses, and (ii) hub genes that may function as core regulators of disease responses. Using Weighted Gene Co-expression Network Analysis (WGCNA) we constructed an undirected network leveraging a rich curated expression dataset comprising 272 microarrays that involved microbial infections of Arabidopsis plants with a wide array of fungal and bacterial pathogens with biotrophic, hemibiotrophic, and necrotrophic lifestyles. WGCNA produced a network with scale-free and small-world properties composed of 205 distinct clusters of co-expressed genes. Modules of functionally related co-expressed genes that are differentially regulated in response to multiple pathogens were identified by integrating differential gene expression testing with functional enrichment analyses of gene ontology terms, known disease associated genes, transcriptional regulators, and cis-regulatory elements. The significance of functional enrichments was validated by comparisons with randomly generated networks. Network topology was then analyzed to identify intra- and inter-modular gene hubs. Based on high connectivity, and centrality in meta-modules that are clearly enriched in defense responses, we propose a list of 66 target genes for reverse genetic experiments to further dissect the Arabidopsis immune system. Our results show that statistical-based data trimming prior to network analysis allows the integration of expression datasets generated by different groups, under different

  17. Discovery of core biotic stress responsive genes in Arabidopsis by weighted gene co-expression network analysis.

    Directory of Open Access Journals (Sweden)

    Katherine C H Amrine

    Full Text Available Intricate signal networks and transcriptional regulators translate the recognition of pathogens into defense responses. In this study, we carried out a gene co-expression analysis of all currently publicly available microarray data, which were generated in experiments that studied the interaction of the model plant Arabidopsis thaliana with microbial pathogens. This work was conducted to identify (i) modules of functionally related co-expressed genes that are differentially expressed in response to multiple biotic stresses, and (ii) hub genes that may function as core regulators of disease responses. Using Weighted Gene Co-expression Network Analysis (WGCNA), we constructed an undirected network leveraging a rich curated expression dataset comprising 272 microarrays that involved microbial infections of Arabidopsis plants with a wide array of fungal and bacterial pathogens with biotrophic, hemibiotrophic, and necrotrophic lifestyles. WGCNA produced a network with scale-free and small-world properties composed of 205 distinct clusters of co-expressed genes. Modules of functionally related co-expressed genes that are differentially regulated in response to multiple pathogens were identified by integrating differential gene expression testing with functional enrichment analyses of gene ontology terms, known disease associated genes, transcriptional regulators, and cis-regulatory elements. The significance of functional enrichments was validated by comparisons with randomly generated networks. Network topology was then analyzed to identify intra- and inter-modular gene hubs. Based on high connectivity, and centrality in meta-modules that are clearly enriched in defense responses, we propose a list of 66 target genes for reverse genetic experiments to further dissect the Arabidopsis immune system. Our results show that statistical-based data trimming prior to network analysis allows the integration of expression datasets generated by different groups
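
The network construction and hub ranking described in this abstract can be illustrated with a minimal sketch. It assumes the unsigned WGCNA adjacency, a_ij = |cor(x_i, x_j)|^beta, and ranks genes by connectivity (the sum of their adjacencies). The value beta = 6, the toy expression values, and all function names are assumptions made for illustration; the abstract does not state the settings the authors used.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned adjacency |cor|^beta for every gene pair.

    expr maps a gene name to its expression values across arrays.
    beta=6 is an assumed default, not a value taken from the abstract.
    """
    genes = list(expr)
    adj = {g: {} for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            a = abs(pearson(expr[g], expr[h])) ** beta
            adj[g][h] = adj[h][g] = a
    return adj


def connectivity(adj):
    """Connectivity of each gene: the sum of its adjacencies to all others."""
    return {g: sum(neighbours.values()) for g, neighbours in adj.items()}


# Toy data (hypothetical): A, B and C are co-expressed, D is not.
expr = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10],
    "C": [1, 2, 3, 4, 6],
    "D": [3, 1, 4, 1, 5],
}
adj = soft_threshold_adjacency(expr)
k = connectivity(adj)
hub_ranking = sorted(k, key=k.get, reverse=True)
```

Raising the correlation to a power suppresses weak correlations, so genes with many strong partners rise to the top of `hub_ranking`.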

  18. Phylogenomic Analysis of Natural Products Biosynthetic Gene Clusters Allows Discovery of Arseno-Organic Metabolites in Model Streptomycetes.

    Science.gov (United States)

    Cruz-Morales, Pablo; Kopp, Johannes Florian; Martínez-Guerrero, Christian; Yáñez-Guerra, Luis Alfonso; Selem-Mojica, Nelly; Ramos-Aboites, Hilda; Feldmann, Jörg; Barona-Gómez, Francisco

    2016-01-01

    Natural products from microbes have provided humans with beneficial antibiotics for millennia. However, a decline in the pace of antibiotic discovery exerts pressure on human health as antibiotic resistance spreads, a challenge that may be better faced by unveiling chemical diversity produced by microbes. Current microbial genome mining approaches have revitalized research into antibiotics, but the empirical nature of these methods limits the chemical space that is explored. Here, we address the problem of finding novel pathways by incorporating evolutionary principles into genome mining. We recapitulated the evolutionary history of twenty-three enzyme families previously uninvestigated in the context of natural product biosynthesis in Actinobacteria, the most proficient producers of natural products. Our genome evolutionary analyses were based on the assumption that expanded, repurposed enzyme families from central metabolism occur frequently and thus have the potential to catalyze new conversions in the context of natural products biosynthesis. Our analyses led to the discovery of biosynthetic gene clusters coding for hidden chemical diversity, as validated by comparing our predictions with those from state-of-the-art genome mining tools, as well as by experimentally demonstrating the existence of a biosynthetic pathway for arseno-organic metabolites in Streptomyces coelicolor and Streptomyces lividans, using a combined gene knockout and metabolite profile strategy. As our approach does not rely solely on sequence similarity searches of previously identified biosynthetic enzymes, these results establish the basis for the development of an evolutionary-driven genome mining tool termed EvoMining that complements current platforms. We anticipate that by doing so real 'chemical dark matter' will be unveiled. PMID:27289100

  19. Phylogenomic Analysis of Natural Products Biosynthetic Gene Clusters Allows Discovery of Arseno-Organic Metabolites in Model Streptomycetes

    Science.gov (United States)

    Cruz-Morales, Pablo; Kopp, Johannes Florian; Martínez-Guerrero, Christian; Yáñez-Guerra, Luis Alfonso; Selem-Mojica, Nelly; Ramos-Aboites, Hilda; Feldmann, Jörg; Barona-Gómez, Francisco

    2016-01-01

    Natural products from microbes have provided humans with beneficial antibiotics for millennia. However, a decline in the pace of antibiotic discovery exerts pressure on human health as antibiotic resistance spreads, a challenge that may be better faced by unveiling chemical diversity produced by microbes. Current microbial genome mining approaches have revitalized research into antibiotics, but the empirical nature of these methods limits the chemical space that is explored. Here, we address the problem of finding novel pathways by incorporating evolutionary principles into genome mining. We recapitulated the evolutionary history of twenty-three enzyme families previously uninvestigated in the context of natural product biosynthesis in Actinobacteria, the most proficient producers of natural products. Our genome evolutionary analyses were based on the assumption that expanded, repurposed enzyme families from central metabolism occur frequently and thus have the potential to catalyze new conversions in the context of natural products biosynthesis. Our analyses led to the discovery of biosynthetic gene clusters coding for hidden chemical diversity, as validated by comparing our predictions with those from state-of-the-art genome mining tools, as well as by experimentally demonstrating the existence of a biosynthetic pathway for arseno-organic metabolites in Streptomyces coelicolor and Streptomyces lividans, using a combined gene knockout and metabolite profile strategy. As our approach does not rely solely on sequence similarity searches of previously identified biosynthetic enzymes, these results establish the basis for the development of an evolutionary-driven genome mining tool termed EvoMining that complements current platforms. We anticipate that by doing so real ‘chemical dark matter’ will be unveiled. PMID:27289100

  20. Phylogenomic Analysis of Natural Products Biosynthetic Gene Clusters Allows Discovery of Arseno-Organic Metabolites in Model Streptomycetes.

    Science.gov (United States)

    Cruz-Morales, Pablo; Kopp, Johannes Florian; Martínez-Guerrero, Christian; Yáñez-Guerra, Luis Alfonso; Selem-Mojica, Nelly; Ramos-Aboites, Hilda; Feldmann, Jörg; Barona-Gómez, Francisco

    2016-01-01

    Natural products from microbes have provided humans with beneficial antibiotics for millennia. However, a decline in the pace of antibiotic discovery exerts pressure on human health as antibiotic resistance spreads, a challenge that may be better faced by unveiling chemical diversity produced by microbes. Current microbial genome mining approaches have revitalized research into antibiotics, but the empirical nature of these methods limits the chemical space that is explored. Here, we address the problem of finding novel pathways by incorporating evolutionary principles into genome mining. We recapitulated the evolutionary history of twenty-three enzyme families previously uninvestigated in the context of natural product biosynthesis in Actinobacteria, the most proficient producers of natural products. Our genome evolutionary analyses were based on the assumption that expanded, repurposed enzyme families from central metabolism occur frequently and thus have the potential to catalyze new conversions in the context of natural products biosynthesis. Our analyses led to the discovery of biosynthetic gene clusters coding for hidden chemical diversity, as validated by comparing our predictions with those from state-of-the-art genome mining tools, as well as by experimentally demonstrating the existence of a biosynthetic pathway for arseno-organic metabolites in Streptomyces coelicolor and Streptomyces lividans, using a combined gene knockout and metabolite profile strategy. As our approach does not rely solely on sequence similarity searches of previously identified biosynthetic enzymes, these results establish the basis for the development of an evolutionary-driven genome mining tool termed EvoMining that complements current platforms. We anticipate that by doing so real 'chemical dark matter' will be unveiled.

  1. Gene discovery in the hamster: a comparative genomics approach for gene annotation by sequencing of hamster testis cDNAs

    Directory of Open Access Journals (Sweden)

    Khan Shafiq A

    2003-06-01

    Full Text Available Abstract Background Complete genome annotation will likely be achieved through a combination of computer-based analysis of available genome sequences and direct experimental characterization of expressed regions of individual genomes. We have utilized a comparative genomics approach involving the sequencing of randomly selected hamster testis cDNAs to begin to identify genes not previously annotated on the human, mouse, rat and Fugu (pufferfish) genomes. Results 735 distinct sequences were analyzed for their relatedness to known sequences in public databases. Eight of these sequences were derived from previously unidentified genes and expression of these genes in testis was confirmed by Northern blotting. The genomic locations of each sequence were mapped in human, mouse, rat and pufferfish, where applicable, and the structure of their cognate genes was derived using computer-based predictions, genomic comparisons and analysis of uncharacterized cDNA sequences from human and macaque. Conclusion The use of a comparative genomics approach resulted in the identification of eight cDNAs that correspond to previously uncharacterized genes in the human genome. The proteins encoded by these genes included a new member of the kinesin superfamily, a SET/MYND-domain protein, and six proteins for which no specific function could be predicted. Each gene was expressed primarily in testis, suggesting that they may play roles in the development and/or function of testicular cells.

  2. Wi-Fi Protocol Vulnerability Discovery Based on Fuzzy Testing

    Directory of Open Access Journals (Sweden)

    Kunhua Zhu

    2013-08-01

    Full Text Available To detect whether wireless network equipment contains protocol vulnerabilities, the authors use a modular design to implement a new fuzz-testing ("fuzzy test") framework suited to Wi-Fi protocol vulnerability discovery. The framework is independent of the transmission medium; it generates malformed packets and launches attacks against the target system. The paper first describes wireless network protocol vulnerability discovery and fuzz testing, then focuses on the technical scheme of the test framework and the details of its implementation, and analyzes its application. In the experimental stage, the framework is applied to a wireless network gateway, and the results show that it is well suited to mining protocol vulnerabilities in wireless network equipment.
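
The packet-generation step of a fuzz-testing framework like the one this abstract describes can be sketched roughly as follows. This is a generic mutation-based fuzzer, not the authors' framework: the function names, the mutation strategies, and the placeholder seed frame are all assumptions, and the sketch only builds test cases in memory without transmitting anything.

```python
import random


def mutate(frame: bytes, rng: random.Random, max_overwrites: int = 4) -> bytes:
    """Return a malformed copy of a non-empty frame.

    Overwrites a few random bytes, then optionally truncates the frame
    or appends random trailing bytes.
    """
    data = bytearray(frame)
    for _ in range(rng.randint(1, max_overwrites)):
        data[rng.randrange(len(data))] = rng.randrange(256)
    action = rng.choice(("keep", "truncate", "extend"))
    if action == "truncate" and len(data) > 1:
        del data[rng.randrange(1, len(data)):]  # always keeps at least one byte
    elif action == "extend":
        data.extend(rng.randrange(256) for _ in range(rng.randint(1, 16)))
    return bytes(data)


def generate_cases(seed_frame: bytes, count: int, seed: int = 0) -> list:
    """Build a reproducible list of malformed frames from one seed frame."""
    rng = random.Random(seed)
    return [mutate(seed_frame, rng) for _ in range(count)]


# Placeholder seed: 24 arbitrary bytes standing in for a captured frame header.
SEED_FRAME = b"\x80\x00" + bytes(22)
cases = generate_cases(SEED_FRAME, 50)
```

Seeding the random generator makes every run reproducible, which helps when a particular malformed frame needs to be replayed against the device under test.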

  3. Mobility Prediction Based Neighborhood Discovery in Mobile Ad Hoc Networks.

    OpenAIRE

    Li, Xu; Mitton, Nathalie; Simplot-Ryl, David

    2011-01-01

    Hello protocol is the basic technique for neighborhood discovery in wireless ad hoc networks. It requires nodes to claim their existence/aliveness by periodic 'hello' messages. Central to a hello protocol is the determination of 'hello' message transmission rate. No fixed optimal rate exists in the presence of node mobility. The rate should in fact adapt to it, high for high mobility and low for low mobility. In this paper, we propose a novel mobility prediction bas...

  4. Topological and functional discovery in a gene coexpression meta-network of gastric cancer.

    Science.gov (United States)

    Aggarwal, Amit; Guo, Dong Li; Hoshida, Yujin; Yuen, Siu Tsan; Chu, Kent-Man; So, Samuel; Boussioutas, Alex; Chen, Xin; Bowtell, David; Aburatani, Hiroyuki; Leung, Suet Yi; Tan, Patrick

    2006-01-01

    Gastric cancer is a leading cause of global cancer mortality, but comparatively little is known about the cellular pathways regulating different aspects of the gastric cancer phenotype. To achieve a better understanding of gastric cancer at the levels of systems topology, functional modules, and constituent genes, we assembled and systematically analyzed a consensus gene coexpression meta-network of gastric cancer incorporating >300 tissue samples from four independent patient populations (the "gastrome"). We find that the gastrome exhibits a hierarchical scale-free architecture, with an internal structure comprising multiple deeply embedded modules associated with diverse cellular functions. Individual modules display distinct subtopologies, with some (cellular proliferation) being integrated within the primary network, and others (ribosomal biosynthesis) being relatively isolated. One module associated with intestinal differentiation exhibited a remarkably high degree of autonomy, raising the possibility that its specific topological features may contribute towards the frequent occurrence of intestinal metaplasia in gastric cancer. At the single-gene level, we discovered a novel conserved interaction between the PLA2G2A prognostic marker and the EphB2 receptor, and used tissue microarrays to validate the PLA2G2A/EphB2 association. Finally, because EphB2 is a known target of the Wnt signaling pathway, we tested and provide evidence that the Wnt pathway may also similarly regulate PLA2G2A. Many of these findings were not discernible by studying the single patient populations in isolation. Thus, besides enhancing our knowledge of gastric cancer, our results show the broad utility of applying meta-analytic approaches to genome-wide data for the purposes of biological discovery. PMID:16397236

  5. Community structure discovery method based on the Gaussian kernel similarity matrix

    Science.gov (United States)

    Guo, Chonghui; Zhao, Haipeng

    2012-03-01

    Community structure discovery in complex networks is a popular research topic, and overlapping community structure discovery has become one of its hot spots. Based on the Gaussian kernel similarity matrix and spectral bisection, this paper proposes a new community structure discovery method. First, by adjusting the Gaussian kernel parameter to change the scale of similarity, we can find the corresponding non-overlapping community structure at the parameter value for which the modularity is largest. Second, changes in the Gaussian kernel parameter cause unstable nodes to jump between communities, so with a slight change to the non-overlapping community discovery method, we can find the overlapping community nodes. Finally, synthetic data and the karate club and political books datasets are used to test the proposed method, which is compared with some other community discovery methods to demonstrate its feasibility and effectiveness.
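
The two building blocks named in this abstract, a Gaussian kernel similarity matrix and spectral bisection, can be sketched as below. The sketch assumes each node has a coordinate vector, computes W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), and splits the nodes by the sign of the Fiedler vector of the Laplacian L = D - W, found here by shifted power iteration. How the paper derives similarities from a network, and its modularity-based choice of the kernel parameter, are not given in the abstract and are not reproduced; all names and the toy points are hypothetical.

```python
import math


def gaussian_similarity(points, sigma):
    """W[i][j] = exp(-||x_i - x_j||^2 / (2 sigma^2)) with a zero diagonal."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            w[i][j] = w[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w


def fiedler_vector(w, iterations=500):
    """Approximate the eigenvector of the second-smallest eigenvalue of L = D - W.

    Power iteration on (shift*I - L); the constant eigenvector is removed
    at every step by subtracting the mean.
    """
    n = len(w)
    degree = [sum(row) for row in w]
    shift = 2.0 * max(degree)  # upper bound on the eigenvalues of L
    v = [i - (n - 1) / 2.0 for i in range(n)]  # deterministic, zero-mean start
    for _ in range(iterations):
        nxt = [
            (shift - degree[i]) * v[i] + sum(w[i][j] * v[j] for j in range(n))
            for i in range(n)
        ]
        mean = sum(nxt) / n
        nxt = [x - mean for x in nxt]
        norm = math.sqrt(sum(x * x for x in nxt))
        v = [x / norm for x in nxt]
    return v


def spectral_bisection(points, sigma):
    """Assign each point to one of two communities by the Fiedler vector's sign."""
    v = fiedler_vector(gaussian_similarity(points, sigma))
    return [x > 0 for x in v]


# Two well-separated toy groups (hypothetical data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = spectral_bisection(points, sigma=1.0)
```

A smaller sigma makes similarity decay faster with distance, which is the "scale of similarity" knob the abstract refers to.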

  6. Biochemical genomics for gene discovery in benzylisoquinoline alkaloid biosynthesis in opium poppy and related species.

    Science.gov (United States)

    Dang, Thu Thuy T; Onoyovwi, Akpevwe; Farrow, Scott C; Facchini, Peter J

    2012-01-01

    Benzylisoquinoline alkaloids (BIAs) are a large, diverse group of ∼2500 specialized plant metabolites. Many BIAs display potent pharmacological activities, including the narcotic analgesics codeine and morphine, the vasodilator papaverine, the cough suppressant and potential anticancer drug noscapine, the antimicrobial agents sanguinarine and berberine, and the muscle relaxant (+)-tubocurarine. Opium poppy remains the sole commercial source for codeine, morphine, and a variety of semisynthetic drugs, including oxycodone and buprenorphine, derived primarily from the biosynthetic pathway intermediate thebaine. Recent advances in transcriptomics, proteomics, and metabolomics have created unprecedented opportunities for isolating and characterizing novel BIA biosynthetic genes. Here, we describe the application of next-generation sequencing and cDNA microarrays for selecting gene candidates based on comparative transcriptome analysis. We outline the basic mass spectrometric techniques to perform deep proteome and targeted metabolite analyses on BIA-producing plant tissues and provide methodologies for functionally characterizing biosynthetic gene candidates through in vitro enzyme assays and transient gene silencing in planta. PMID:22999177

  7. Adeno-associated virus at 50: a golden anniversary of discovery, research, and gene therapy success--a personal perspective.

    Science.gov (United States)

    Hastie, Eric; Samulski, R Jude

    2015-05-01

    Fifty years after the discovery of adeno-associated virus (AAV) and more than 30 years after the first gene transfer experiment was conducted, dozens of gene therapy clinical trials are in progress, one vector is approved for use in Europe, and breakthroughs in virus modification and disease modeling are paving the way for a revolution in the treatment of rare diseases, cancer, as well as HIV. This review will provide a historical perspective on the progression of AAV for gene therapy from discovery to the clinic, focusing on contributions from the Samulski lab regarding basic science and cloning of AAV, optimized large-scale production of vectors, preclinical large animal studies and safety data, vector modifications for improved efficacy, and successful clinical applications.

  8. Natural genetic variation in cassava (Manihot esculenta Crantz) landraces as a tool for gene discovery

    International Nuclear Information System (INIS)

    Cassava landraces are the earliest form of the modern cultivars and represent the first step in cassava domestication. Our forward genetic analysis uses this resource to discover spontaneous mutations in sucrose/starch and carotenoid synthesis/accumulation and to develop both evolutionary and breeding perspectives on gene function related to those traits. Biochemical phenotype variants for the synthesis and accumulation of carotenoid, free sugar and starch were identified. Six subtractive cDNA libraries were prepared to construct a high quality (phred > 20) EST database with 1645 entries. Macroarray analysis was performed to identify differentially expressed genes, with the aim of identifying candidate genes related to the sugary phenotype. cDNA sequences for genes coding for specific enzymes in the two pathways were obtained. Gene expression analysis for genes coding for specific enzymes was performed by RNA blot and Real Time PCR analysis. Chromoplast-associated proteins of yellow storage root were fractionated and a peptide sequence database with 906 entries (MASCOT validated) was constructed. For sucrose/starch metabolism, a sugary class of cassava was identified carrying mutations in the BEI and GBSS genes. For pigmented cassava, a pink color phenotype showed absence of expression of the gene CasLYB while an intense yellow phenotype showed down-regulation of the gene CasHYb. Heat shock proteins were identified as the major proteins associated with the chromoplast. Analysis of genetic diversity for the GBSS gene in the natural population identified 22 haplotypes and large nucleotide diversity in four subsets of the population. Single segregating populations derived from F2, half-sib and S1 populations showed segregation for the sugary phenotype (93% of the individuals), the waxy phenotype (38% of the individuals) and glycogen-like starch (2% of the individuals). Here we summarize our current results for the genetic analysis of these variants and recent progress in the direction of mapping of

  9. Challenges in microarray class discovery: a comprehensive examination of normalization, gene selection and clustering

    Directory of Open Access Journals (Sweden)

    Landfors Mattias

    2010-10-01

    background correction is preferable, in particular if the gene selection is successful. However, this is an area that needs to be studied further in order to draw any general conclusions. Conclusions The choice of cluster analysis, and in particular gene selection, has a large impact on the ability to cluster individuals correctly based on expression profiles. Normalization has a positive effect, but the relative performance of different normalizations is an area that needs more research. In summary, although clustering, gene selection and normalization are considered standard methods in bioinformatics, our comprehensive analysis shows that selecting the right methods, and the right combinations of methods, is far from trivial and that much is still unexplored in what is considered to be the most basic analysis of genomic data.

  10. Gene invasion in distant eukaryotic lineages: discovery of mutually exclusive genetic elements reveals marine biodiversity.

    Science.gov (United States)

    Monier, Adam; Sudek, Sebastian; Fast, Naomi M; Worden, Alexandra Z

    2013-09-01

    Inteins are rare, translated genetic parasites mainly found in bacteria and archaea, while spliceosomal introns are distinctly eukaryotic features abundant in most nuclear genomes. Using targeted metagenomics, we discovered an intein in an Atlantic population of the photosynthetic eukaryote, Bathycoccus, harbored by the essential spliceosomal protein PRP8 (processing factor 8 protein). Although previously thought exclusive to fungi, we also identified PRP8 inteins in parasitic (Capsaspora) and predatory (Salpingoeca) protists. Most new PRP8 inteins were at novel insertion sites that, surprisingly, were not in the most conserved regions of the gene. Evolutionarily, Dikarya fungal inteins at PRP8 insertion site a appeared more related to the Bathycoccus intein at a unique insertion site, than to other fungal and opisthokont inteins. Strikingly, independent analyses of Pacific and Atlantic samples revealed an intron at the same codon as the Bathycoccus PRP8 intein. The two elements are mutually exclusive and neither was found in cultured Bathycoccus or other picoprasinophyte genomes. Thus, wild Bathycoccus contain one of few non-fungal eukaryotic inteins known and a rare polymorphic intron. Our data indicate at least two Bathycoccus ecotypes exist, associated respectively with oceanic or mesotrophic environments. We hypothesize that intein propagation is facilitated by marine viruses; and, while intron gain is still poorly understood, presence of a spliceosomal intron where a locus lacks an intein raises the possibility of new, intein-primed mechanisms for intron gain. The discovery of nucleus-encoded inteins and associated sequence polymorphisms in uncultivated marine eukaryotes highlights their diversity and reveals potential sexual boundaries between populations indistinguishable by common marker genes. PMID:23635865

  11. Paradigm of tunable clustering using Binarization of Consensus Partition Matrices (Bi-CoPaM) for gene discovery.

    Directory of Open Access Journals (Sweden)

    Basel Abu-Jamous

    Full Text Available Clustering analysis has a growing role in the study of co-expressed genes for gene discovery. Conventional binary and fuzzy clustering do not embrace the biological reality that some genes may be irrelevant for a problem and not be assigned to a cluster, while other genes may participate in several biological functions and should simultaneously belong to multiple clusters. Also, these algorithms cannot generate tight clusters that focus on their cores or wide clusters that overlap and contain all possibly relevant genes. In this paper, a new clustering paradigm is proposed. In this paradigm, all three eventualities of a gene being exclusively assigned to a single cluster, being assigned to multiple clusters, and being not assigned to any cluster are possible. These possibilities are realised through the primary novelty of the introduction of tunable binarization techniques. Results from multiple clustering experiments are aggregated to generate one fuzzy consensus partition matrix (CoPaM), which is then binarized to obtain the final binary partitions. This is referred to as Binarization of Consensus Partition Matrices (Bi-CoPaM). The method has been tested with a set of synthetic datasets and a set of five real yeast cell-cycle datasets. The results demonstrate its validity in generating relevant tight, wide, and complementary clusters that can meet requirements of different gene discovery studies.
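
The aggregate-then-binarize idea in this abstract can be illustrated with a small sketch. It assumes the partition matrices from the individual clustering runs have already been relabelled so that row c refers to the same cluster in every run, averages them into a fuzzy consensus matrix, and applies a simple value threshold. The threshold rule, the function names, and the toy matrices are illustrative assumptions and may differ from the binarization techniques defined in the paper.

```python
def consensus_matrix(partitions):
    """Average aligned binary partition matrices (clusters x genes).

    Entry [c][g] is the fraction of runs that placed gene g in cluster c.
    """
    runs = len(partitions)
    n_clusters = len(partitions[0])
    n_genes = len(partitions[0][0])
    return [
        [sum(p[c][g] for p in partitions) / runs for g in range(n_genes)]
        for c in range(n_clusters)
    ]


def binarize(copam, threshold):
    """Assign a gene to every cluster whose consensus membership reaches threshold.

    A gene can end up in one cluster, in several, or in none.
    """
    return [[1 if m >= threshold else 0 for m in row] for row in copam]


# Four hypothetical runs, two clusters (rows), four genes (columns).
partitions = [
    [[1, 0, 1, 1], [0, 1, 0, 0]],
    [[1, 0, 1, 1], [0, 1, 0, 0]],
    [[1, 0, 1, 0], [0, 1, 0, 1]],
    [[1, 0, 0, 0], [0, 1, 1, 1]],
]
copam = consensus_matrix(partitions)
wide = binarize(copam, 0.5)   # gene 3 lands in both clusters
tight = binarize(copam, 0.8)  # genes 2 and 3 are left unassigned
```

Raising the threshold tightens clusters around their cores, while lowering it widens them and lets them overlap, which is the tunability the abstract describes.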

  12. PaGenBase: a pattern gene database for the global and dynamic understanding of gene function.

    Directory of Open Access Journals (Sweden)

    Jian-Bo Pan

    Full Text Available Pattern genes are a group of genes that have a modularized expression behavior under serial physiological conditions. The identification of pattern genes will provide a path toward a global and dynamic understanding of gene functions and their roles in particular biological processes or events, such as development and pathogenesis. In this study, we present PaGenBase, a novel repository for the collection of tissue- and time-specific pattern genes, including specific genes, selective genes, housekeeping genes and repressed genes. The PaGenBase database is now freely accessible at http://bioinf.xmu.edu.cn/PaGenBase/. In the current version (PaGenBase 1.0), the database contains 906,599 pattern genes derived from the literature or from data mining of more than 1,145,277 gene expression profiles in 1,062 distinct samples collected from 11 model organisms. Four statistical parameters were used to quantitatively evaluate the pattern genes. Moreover, three methods (quick search, advanced search and browse) were designed for rapid and customized data retrieval. The potential applications of PaGenBase are also briefly described. In summary, PaGenBase will serve as a resource for the global and dynamic understanding of gene function and will facilitate high-level investigations in a variety of fields, including the study of development, pathogenesis and novel drug discovery.

  13. Location Discovery Based on Fuzzy Geometry in Passive Sensor Networks

    Directory of Open Access Journals (Sweden)

    Rui Wang

    2011-01-01

    Full Text Available Location discovery with uncertainty using passive sensor networks in the nation's power grid is known to be challenging, due to the massive scale and inherent complexity. For bearings-only target localization in passive sensor networks, the approach of fuzzy geometry is introduced to investigate the fuzzy measurability for a moving target in R2 space. The fuzzy analytical bias expressions and the geometrical constraints are derived for bearings-only target localization. The interplay between fuzzy geometry of target localization and the fuzzy estimation bias for the case of fuzzy linear observer trajectory is analyzed in detail in sensor networks, which can realize the 3-dimensional localization including fuzzy estimate position and velocity of the target by measuring the fuzzy azimuth angles at intervals of fixed time. Simulation results show that the resulting estimate position outperforms the traditional least squares approach for localization with uncertainty.

  14. An Affinity Propagation-Based DNA Motif Discovery Algorithm.

    Science.gov (United States)

    Sun, Chunxiao; Huo, Hongwei; Yu, Qiang; Guo, Haitao; Sun, Zhigang

    2015-01-01

    The planted (l, d) motif search (PMS) is one of the fundamental problems in bioinformatics, which plays an important role in locating transcription factor binding sites (TFBSs) in DNA sequences. Nowadays, identifying weak motifs and reducing the effect of local optimum are still important but challenging tasks for motif discovery. To solve the tasks, we propose a new algorithm, APMotif, which first applies the Affinity Propagation (AP) clustering in DNA sequences to produce informative and good candidate motifs and then employs Expectation Maximization (EM) refinement to obtain the optimal motifs from the candidate motifs. Experimental results both on simulated data sets and real biological data sets show that APMotif usually outperforms four other widely used algorithms in terms of high prediction accuracy.
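
The planted (l, d) motif problem itself can be stated in a few lines of code: a string of length l is a solution if every input sequence contains a substring within Hamming distance d of it. The sketch below checks that condition and enumerates solutions exhaustively. It is only a definition-level illustration for small l, not the APMotif algorithm (which the abstract describes as Affinity Propagation clustering followed by EM refinement); the function names and toy sequences are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equally long strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif, sequence, d):
    """True if some substring of sequence is within d mismatches of motif."""
    l = len(motif)
    return any(
        hamming(motif, sequence[i:i + l]) <= d
        for i in range(len(sequence) - l + 1)
    )


def is_planted_motif(motif, sequences, d):
    """True if motif occurs, with at most d mismatches, in every sequence."""
    return all(occurs_within(motif, s, d) for s in sequences)


def brute_force_motifs(sequences, l, d, alphabet="ACGT"):
    """Enumerate all 4^l candidates; feasible only for small l."""
    candidates = ("".join(p) for p in product(alphabet, repeat=l))
    return [m for m in candidates if is_planted_motif(m, sequences, d)]


# Toy input: ACGT is planted exactly once and twice with one mutation.
sequences = ["TTACGTAA", "GGACCTGG", "CCAGGTCC"]
found = brute_force_motifs(sequences, l=4, d=1)
```

The exponential size of the candidate space is what motivates heuristic approaches such as the clustering-plus-refinement strategy in the abstract.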

  15. An Affinity Propagation-Based DNA Motif Discovery Algorithm

    Directory of Open Access Journals (Sweden)

    Chunxiao Sun

    2015-01-01

    Full Text Available The planted (l, d) motif search (PMS) is one of the fundamental problems in bioinformatics, which plays an important role in locating transcription factor binding sites (TFBSs) in DNA sequences. Nowadays, identifying weak motifs and reducing the effect of local optimum are still important but challenging tasks for motif discovery. To solve the tasks, we propose a new algorithm, APMotif, which first applies the Affinity Propagation (AP) clustering in DNA sequences to produce informative and good candidate motifs and then employs Expectation Maximization (EM) refinement to obtain the optimal motifs from the candidate motifs. Experimental results both on simulated data sets and real biological data sets show that APMotif usually outperforms four other widely used algorithms in terms of high prediction accuracy.

  16. DNA microarray-based mutation discovery and genotyping.

    Science.gov (United States)

    Gresham, David

    2011-01-01

    DNA microarrays provide an efficient means of identifying single-nucleotide polymorphisms (SNPs) in DNA samples and characterizing their frequencies in individual and mixed samples. We have studied the parameters that determine the sensitivity of DNA probes to SNPs and found that the melting temperature (Tm) of the probe is the primary determinant of probe sensitivity. An isothermal-melting temperature DNA microarray design, in which the Tm of all probes is tightly distributed, can be implemented by varying the length of DNA probes within a single DNA microarray. I describe guidelines for designing isothermal-melting temperature DNA microarrays and protocols for labeling and hybridizing DNA samples to DNA microarrays for SNP discovery, genotyping, and quantitative determination of allele frequencies in mixed samples.
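
The idea of holding melting temperature roughly constant by varying probe length can be sketched as follows. The abstract does not say how melting temperature is computed, so this sketch substitutes the Wallace rule of thumb, 2 °C per A/T plus 4 °C per G/C, which is a crude estimate intended for short oligonucleotides; practical designs typically use nearest-neighbor thermodynamic models. The function names, the length bounds, and the target temperature are assumptions.

```python
def wallace_tm(probe):
    """Rule-of-thumb melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    at = probe.count("A") + probe.count("T")
    gc = probe.count("G") + probe.count("C")
    return 2 * at + 4 * gc


def isothermal_probe(target, start, target_tm, min_len=15, max_len=30):
    """Shortest probe starting at `start` whose estimated Tm reaches target_tm.

    Returns None if no length between min_len and max_len is sufficient
    or the target sequence runs out.
    """
    for length in range(min_len, max_len + 1):
        probe = target[start:start + length]
        if len(probe) < length:
            break  # ran off the end of the target sequence
        if wallace_tm(probe) >= target_tm:
            return probe
    return None


# AT-rich regions need longer probes than GC-rich ones for the same Tm.
at_rich = isothermal_probe("AT" * 20, 0, target_tm=50)
gc_rich = isothermal_probe("GC" * 20, 0, target_tm=50)
```

Here the AT-rich probe comes out longer than the GC-rich one, which is how length variation narrows the spread of melting temperatures across an array.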

  17. An Evaluation of Active Learning Causal Discovery Methods for Reverse-Engineering Local Causal Pathways of Gene Regulation.

    Science.gov (United States)

    Ma, Sisi; Kemmeren, Patrick; Aliferis, Constantin F; Statnikov, Alexander

    2016-01-01

    Reverse-engineering of causal pathways that implicate diseases and vital cellular functions is a fundamental problem in biomedicine. Discovery of the local causal pathway of a target variable (that consists of its direct causes and direct effects) is essential for effective intervention and can facilitate accurate diagnosis and prognosis. Recent research has provided several active learning methods that can leverage passively observed high-throughput data to draft causal pathways and then refine the inferred relations with a limited number of experiments. The current study provides a comprehensive evaluation of the performance of active learning methods for local causal pathway discovery in real biological data. Specifically, 54 active learning methods/variants from 3 families of algorithms were applied for local causal pathways reconstruction of gene regulation for 5 transcription factors in S. cerevisiae. Four aspects of the methods' performance were assessed, including adjacency discovery quality, edge orientation accuracy, complete pathway discovery quality, and experimental cost. The results of this study show that some methods provide significant performance benefits over others and therefore should be routinely used for local causal pathway discovery tasks. This study also demonstrates the feasibility of local causal pathway reconstruction in real biological systems with significant quality and low experimental cost.

  18. A novel approach to the discovery of survival biomarkers in glioblastoma using a joint analysis of DNA methylation and gene expression.

    Science.gov (United States)

    Smith, Ashley A; Huang, Yen-Tsung; Eliot, Melissa; Houseman, E Andres; Marsit, Carmen J; Wiencke, John K; Kelsey, Karl T

    2014-06-01

    Glioblastoma multiforme (GBM) is the most aggressive of all brain tumors, with a median survival of less than 1.5 years. Recently, epigenetic alterations were found to play key roles in both glioma genesis and clinical outcome, demonstrating the need to integrate genetic and epigenetic data in predictive models. To enhance current models through discovery of novel predictive biomarkers, we employed a genome-wide, agnostic strategy to specifically capture both methylation-directed changes in gene expression and alternative associations of DNA methylation with disease survival in glioma. Human GBM-associated DNA methylation, gene expression, IDH1 mutation status, and survival data were obtained from The Cancer Genome Atlas. DNA methylation loci and expression probes were paired by gene, and their subsequent association with survival was determined by applying an accelerated failure time model to previously published alternative and expression-based association equations. Significant associations were seen in 27 unique methylation/expression pairs with expression-based, alternative, and combinatorial associations observed (10, 13, and 4 pairs, respectively). The majority of the predictive DNA methylation loci were located within CpG islands, and all but three of the locus pairs were negatively correlated with survival. This finding suggests that for most loci, methylation/expression pairs are inversely related, consistent with methylation-associated gene regulatory action. Our results indicate that changes in DNA methylation are associated with altered survival outcome through both coordinated changes in gene expression and alternative mechanisms. Furthermore, our approach offers an alternative method of biomarker discovery using a priori gene pairing and precise targeting to identify novel sites for locus-specific therapeutic intervention.

  19. De novo Assembly and Characterization of the Transcriptome of Broomcorn Millet (Panicum miliaceum L.) for Gene Discovery and Marker Development.

    Science.gov (United States)

    Yue, Hong; Wang, Le; Liu, Hui; Yue, Wenjie; Du, Xianghong; Song, Weining; Nie, Xiaojun

    2016-01-01

    Broomcorn millet (Panicum miliaceum L.) is one of the world's oldest cultivated cereals, which is well-adapted to extreme environments such as drought, heat, and salinity with an efficient C4 carbon fixation. Discovery and identification of genes involved in these processes will provide valuable information to improve the crop for meeting the challenge of global climate change. However, the lack of genetic resources and genomic information makes gene discovery and molecular mechanism studies very difficult. Here, we sequenced and assembled the transcriptome of broomcorn millet using Illumina sequencing technology. After sequencing, a total of 45,406,730 and 51,160,820 clean paired-end reads were obtained for two genotypes Yumi No. 2 and Yumi No. 3. These reads were mixed and then assembled into 113,643 unigenes, with the length ranging from 351 to 15,691 bp, of which 62,543 contigs could be assigned to 315 gene ontology (GO) categories. Kyoto Encyclopedia of Genes and Genomes (KEGG) and cluster of orthologous groups (COG) analyses mapped 15,514 unigenes into 202 KEGG pathways and 51,020 unigenes to 25 COG categories, respectively. Furthermore, 35,216 simple sequence repeats (SSRs) were identified in 27,055 unigene sequences, of which trinucleotides were the most abundant repeat unit, accounting for 66.72% of SSRs. In addition, 292 differentially expressed genes were identified between the two genotypes, which were significantly enriched in 88 GO terms and 12 KEGG pathways. Finally, the expression patterns of four selected transcripts were validated through quantitative reverse transcription polymerase chain reaction analysis. Our study for the first time sequenced and assembled the transcriptome of broomcorn millet, which not only provided a rich sequence resource for gene discovery and marker development in this important crop, but will also facilitate the further investigation of the molecular mechanism of its favored agronomic traits and beyond. PMID
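
    As a rough illustration of how SSRs can be located in assembled unigenes, the sketch below scans a sequence for perfect tandem repeats with a regular expression. The abstract does not name the tool or thresholds used, so the motif length range (2 to 6 bases), the minimum of five repeats, and the function name `find_ssrs` are all assumptions made for this example.

```python
import re


def find_ssrs(seq: str, min_repeats: int = 5):
    """Return (start, motif, repeat_count) for perfect tandem repeats of a
    2-6 base motif occurring at least min_repeats times in a row."""
    # Group 2 is the motif (shortest tried first); the backreference
    # requires the same motif to repeat (min_repeats - 1) more times.
    pattern = re.compile(r"(([ACGT]{2,6}?)\2{%d,})" % (min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(2)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAA...
        hits.append((m.start(), motif, len(m.group(1)) // len(motif)))
    return hits


# Toy sequence: a (CAG)5 trinucleotide repeat and an (AG)6 dinucleotide repeat.
example = "TTGA" + "CAG" * 5 + "TT" + "AG" * 6 + "CC"
print(find_ssrs(example))
```

    Counting the motif lengths in the returned list would give the kind of repeat-unit breakdown the abstract reports, although dedicated SSR finders also handle compound and imperfect repeats.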

  20. An Integrated Approach to Gene Discovery and Marker Development in Atlantic Cod (Gadus morhua)

    OpenAIRE

    Bowman, Sharen; Hubert, Sophie; Higgins, Brent; Stone, Cynthia; Kimball, Jennifer; Borza, Tudor; Bussey, Jillian Tarrant; Simpson, Gary; Kozera, Catherine; Curtis, Bruce A.; Hall, Jennifer R.; Hori, Tiago S.; Feng, Charles Y.; Rise, Marlies; Booman, Marije

    2010-01-01

    Atlantic cod is a species that has been overexploited by the capture fishery. Programs to domesticate this species are underway in several countries, including Canada, to provide an alternative route for production. Selective breeding programs have been successfully applied in the domestication of other species, with genomics-based approaches used to augment conventional methods of animal production in recent years. Genomics tools, such as gene sequences and sets of variable markers, also hav...

  1. Applications of Fiberoptics-Based Nanosensors to Drug Discovery

    Science.gov (United States)

    Vo-Dinh, Tuan; Scaffidi, Jonathan; Gregas, Molly; Zhang, Yan; Seewaldt, Victoria

    2013-01-01

    Background Fiber-optic nanosensors are fabricated by heating and pulling optical fibers to yield sub-micron diameter tips, and have been used for in vitro analysis of individual living mammalian cells. Immobilization of bioreceptors (e.g., antibodies, peptides, DNA, etc) selective to target analyte molecules of interest provides molecular specificity. Excitation light can be launched into the fiber, and the resulting evanescent field at the tip of the nanofiber can be used to excite target molecules bound to the bioreceptor molecules. The fluorescence or surface-enhanced Raman scattering produced by the analyte molecules is detected using an ultra-sensitive photodetector. Objective This article provides an overview of the development and application of fiber-optic nanosensors for drug discovery. Conclusions The nanosensors provide minimally invasive tools to probe sub-cellular compartments inside single living cells for health effect studies (e.g., detection of benzopyrene adducts) and medical applications (e.g., monitoring of apoptosis in cells treated with anti-cancer drugs). PMID:23496274

  2. Informatics-Based Discovery of Disease-Associated Immune Profiles

    Science.gov (United States)

    Delmas, Amber; Oikonomopoulos, Angelos; Lacey, Precious N.; Fallahi, Mohammad; Hommes, Daniel W.; Sundrud, Mark S.

    2016-01-01

    Advances in flow and mass cytometry are enabling ultra-high resolution immune profiling in mice and humans on an unprecedented scale. However, the resulting high-content datasets challenge traditional views of cytometry data, which are both limited in scope and biased by pre-existing hypotheses. Computational solutions are now emerging (e.g., Citrus, AutoGate, SPADE) that automate cell gating or enable visualization of relative subset abundance within healthy versus diseased mice or humans. Yet these tools require significant computational fluency and fail to show quantitative relationships between discrete immune phenotypes and continuous disease variables. Here we describe a simple informatics platform that uses hierarchical clustering and nearest neighbor algorithms to associate manually gated immune phenotypes with clinical or pre-clinical disease endpoints of interest in a rapid and unbiased manner. Using this approach, we identify discrete immune profiles that correspond with either weight loss or histologic colitis in a T cell transfer model of inflammatory bowel disease (IBD), and show distinct nodes of immune dysregulation in the IBDs, Crohn’s disease and ulcerative colitis. This streamlined informatics approach for cytometry data analysis leverages publicly available software, can be applied to manually or computationally gated cytometry data, is suitable for any clinical or pre-clinical setting, and embraces ultra-high content flow and mass cytometry as a discovery engine. PMID:27669154

  3. Discovery of tetrahydroisoquinoline-based CXCR4 antagonists.

    Science.gov (United States)

    Truax, Valarie M; Zhao, Huanyu; Katzman, Brooke M; Prosser, Anthony R; Alcaraz, Ana A; Saindane, Manohar T; Howard, Randy B; Culver, Deborah; Arrendale, Richard F; Gruddanti, Prahbakar R; Evers, Taylor J; Natchus, Michael G; Snyder, James P; Liotta, Dennis C; Wilson, Lawrence J

    2013-11-14

    A de novo hit-to-lead effort involving the redesign of benzimidazole-containing antagonists of the CXCR4 receptor resulted in the discovery of a novel series of 1,2,3,4-tetrahydroisoquinoline (TIQ) analogues. In general, this series of compounds shows good potencies (3-650 nM) in assays involving CXCR4 function, including both inhibition of attachment of X4 HIV-1IIIB virus in MAGI-CCR5/CXCR4 cells and inhibition of calcium release in Chem-1 cells. Series profiling permitted the identification of TIQ-(R)-stereoisomer 15 as a potent and selective CXCR4 antagonist lead candidate with a promising in vitro profile. The drug-like properties of 15 were determined in ADME in vitro studies, revealing low metabolic liability potential. Further in vivo evaluations included pharmacokinetic experiments in rats and mice, where 15 was shown to have oral bioavailability (F = 63%) and resulted in the mobilization of white blood cells (WBCs) in a dose-dependent manner. PMID:24936240

  4. De novo transcriptomic analysis of peripheral blood lymphocytes from the Chinese goose: gene discovery and immune system pathway description.

    Directory of Open Access Journals (Sweden)

    Mansoor Tariq

    Full Text Available The Chinese goose is one of the most economically important poultry birds and is a natural reservoir for many avian viruses. However, the nature and regulation of the innate and adaptive immune systems of this waterfowl species are not completely understood due to limited information on the goose genome. Recently, transcriptome sequencing technology was applied in the genomic studies focused on novel gene discovery. Thus, this study described the transcriptome of the goose peripheral blood lymphocytes to identify immunity relevant genes. The transcriptome of the goose peripheral blood lymphocytes was sequenced by Illumina-Solexa technology and assembled de novo. In total, 211,198 unigenes were assembled from the 69.36 million cleaned reads. The average length, N50 size and the maximum length of the assembled unigenes were 687 bp, 1,298 bp and 18,992 bp, respectively. A total of 36,854 unigenes showed similarity by BLAST search against the NCBI non-redundant (Nr) protein database. For functional classification, 163,161 unigenes were classified into three Gene Ontology (GO) categories and 67 subcategories. A total of 15,334 unigenes were annotated into 25 eukaryotic orthologous groups (KOG) categories. The Kyoto Encyclopedia of Genes and Genomes (KEGG) database annotated 39,585 unigenes into six biological functional groups and 308 pathways. Among the 2,757 unigenes that participated in the 15 immune system KEGG pathways, 125 of the most important immune relevant genes were summarized and analyzed by STRING analysis to identify gene interactions and relationships. Moreover, 10 genes were confirmed by PCR and analyzed. Of these 125 unigenes, 109 unigenes, approximately 87%, were not previously identified in the goose. This de novo transcriptome analysis could provide important Chinese goose sequence information and highlights the value of new gene discovery, pathways investigation and immune system gene identification, and comparison with other avian species as useful

  5. SHAPE-BASED TIME SERIES SIMILARITY MEASURE AND PATTERN DISCOVERY ALGORITHM

    Institute of Scientific and Technical Information of China (English)

    Zeng Fanzi; Qiu Zhengding; Li Dongsheng; Yue Jianhai

    2005-01-01

    Pattern discovery from time series is of fundamental importance. Most pattern discovery algorithms for time series capture the values of the time series based on some kind of similarity measure. Because they are affected by scale and baseline, value-based methods cause problems when the objective is to capture the shape. Thus, a similarity measure based on shape, the Sh measure, is proposed, and the properties of this similarity and the corresponding proofs are given. Then a time series shape pattern discovery algorithm based on the Sh measure is put forward. The proposed algorithm terminates in a finite number of iterations with given computational and storage complexity. Finally, experiments on synthetic datasets and sunspot datasets demonstrate that the time series shape pattern algorithm is valid.

  6. Improving pattern discovery and visualization of SAGE data through poisson-based self-adaptive neural networks.

    Science.gov (United States)

    Zheng, Huiru; Wang, Haiying; Azuaje, Francisco

    2008-07-01

    Serial analysis of gene expression (SAGE) allows a detailed, simultaneous analysis of thousands of genes without the need for prior, complete gene sequence information. However, due to its inherent complexity and the lack of complete structural and functional knowledge, mining vast collections of SAGE data to extract useful knowledge poses great challenges to traditional analytical techniques. Moreover, SAGE data are characterized by a specific statistical model that has not been incorporated into traditional data analysis techniques. The analysis of SAGE data requires advanced, intelligent computational techniques, which consider the underlying biology and the statistical nature of SAGE data. By addressing the statistical properties demonstrated by SAGE data, this paper presents a new self-adaptive neural network, Poisson-based growing self-organizing map (PGSOM), which implements novel weight adaptation and neuron growing strategies. An empirical study of key dynamic mechanisms of PGSOM is presented. It was tested on three datasets, including synthetic and experimental SAGE data. The results indicate that, in comparison to traditional techniques, the PGSOM offers significant advantages in the context of pattern discovery and visualization in SAGE data. The pattern discovery and visualization platform discussed in this paper can be applied to other problem domains where the data are better approximated by a Poisson distribution.

  7. Plant gravitropic signal transduction: A network analysis leads to gene discovery

    Science.gov (United States)

    Wyatt, Sarah

    Gravity plays a fundamental role in plant growth and development. Although a significant body of research has helped define the events of gravity perception, the role of the plant growth regulator auxin, and the mechanisms resulting in the gravity response, the events of signal transduction, those that link the biophysical action of perception to a biochemical signal that results in auxin redistribution, those that regulate the gravitropic effects on plant growth, remain, for the most part, a “black box.” Using a cold effect, dubbed the gravity persistent signal (GPS) response, we developed a mutant screen to specifically identify components of the signal transduction pathway. Cloning of the GPS genes has identified new proteins involved in gravitropic signaling. We have further exploited the GPS response using a multi-faceted approach including gene expression microarrays, proteomics analysis, bioinformatics analysis, and continued mutant analysis to identify additional genes and physiological and biochemical processes. Gene expression data provided the foundation of a regulatory network for gravitropic signaling. Based on these gene expression data and related data sets/information from the literature/repositories, we constructed a gravitropic signaling network for Arabidopsis inflorescence stems. To generate the network, both a dynamic Bayesian network approach and a time-lagged correlation coefficient approach were used. The dynamic Bayesian network added existing information of protein-protein interaction while the time-lagged correlation coefficient allowed incorporation of temporal regulation and thus could incorporate the time-course metric from the data set. Thus the methods complemented each other and provided us with a more comprehensive evaluation of connections. Each method generated a list of possible interactions associated with a statistical significance value. The two networks were then overlaid to generate a more rigorous, intersected

  8. Human transporter database: comprehensive knowledge and discovery tools in the human transporter genes.

    Directory of Open Access Journals (Sweden)

    Adam Y Ye

    Full Text Available Transporters are essential in homeostatic exchange of endogenous and exogenous substances at the systematic, organic, cellular, and subcellular levels. Gene mutations of transporters are often related to pharmacogenetics traits. Recent developments in high throughput technologies on genomics, transcriptomics and proteomics allow in depth studies of transporter genes in normal cellular processes and diverse disease conditions. The flood of high throughput data has resulted in an urgent need for an updated knowledgebase with curated, organized, and annotated human transporters in an easily accessible way. Using a pipeline with the combination of automated keywords query, sequence similarity search and manual curation on transporters, we collected 1,555 human non-redundant transporter genes to develop the Human Transporter Database (HTD) (http://htd.cbi.pku.edu.cn). Based on the extensive annotations, global properties of the transporter genes were illustrated, such as expression patterns and polymorphisms in relationships with their ligands. We noted that the human transporters were enriched in many fundamental biological processes such as oxidative phosphorylation and cardiac muscle contraction, and significantly associated with Mendelian and complex diseases such as epilepsy and sudden infant death syndrome. Overall, HTD provides a well-organized interface to facilitate research communities to search detailed molecular and genetic information of transporters for development of personalized medicine.

  9. Human transporter database: comprehensive knowledge and discovery tools in the human transporter genes.

    Science.gov (United States)

    Ye, Adam Y; Liu, Qing-Rong; Li, Chuan-Yun; Zhao, Min; Qu, Hong

    2014-01-01

    Transporters are essential in homeostatic exchange of endogenous and exogenous substances at the systematic, organic, cellular, and subcellular levels. Gene mutations of transporters are often related to pharmacogenetics traits. Recent developments in high throughput technologies on genomics, transcriptomics and proteomics allow in depth studies of transporter genes in normal cellular processes and diverse disease conditions. The flood of high throughput data has resulted in an urgent need for an updated knowledgebase with curated, organized, and annotated human transporters in an easily accessible way. Using a pipeline with the combination of automated keywords query, sequence similarity search and manual curation on transporters, we collected 1,555 human non-redundant transporter genes to develop the Human Transporter Database (HTD) (http://htd.cbi.pku.edu.cn). Based on the extensive annotations, global properties of the transporter genes were illustrated, such as expression patterns and polymorphisms in relationships with their ligands. We noted that the human transporters were enriched in many fundamental biological processes such as oxidative phosphorylation and cardiac muscle contraction, and significantly associated with Mendelian and complex diseases such as epilepsy and sudden infant death syndrome. Overall, HTD provides a well-organized interface to facilitate research communities to search detailed molecular and genetic information of transporters for development of personalized medicine.

  10. GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists

    Directory of Open Access Journals (Sweden)

    Steinfeld Israel

    2009-02-01

    Full Text Available Abstract Background Since the inception of the GO annotation project, a variety of tools have been developed that support exploring and searching the GO database. In particular, a variety of tools that perform GO enrichment analysis are currently available. Most of these tools require as input a target set of genes and a background set and seek enrichment in the target set compared to the background set. A few tools also exist that support analyzing ranked lists. The latter typically rely on simulations or on union-bound correction for assigning statistical significance to the results. Results GOrilla is a web-based application that identifies enriched GO terms in ranked lists of genes, without requiring the user to provide explicit target and background sets. This is particularly useful in many typical cases where genomic data may be naturally represented as a ranked list of genes (e.g. by level of expression or of differential expression). GOrilla employs a flexible threshold statistical approach to discover GO terms that are significantly enriched at the top of a ranked gene list. Building on a complete theoretical characterization of the underlying distribution, called mHG, GOrilla computes an exact p-value for the observed enrichment, taking threshold multiple testing into account without the need for simulations. This enables rigorous statistical analysis of thousands of genes and thousands of GO terms on the order of seconds. The output of the enrichment analysis is visualized as a hierarchical structure, providing a clear view of the relations between enriched GO terms. Conclusion GOrilla is an efficient GO analysis tool with unique features that make it a useful addition to the existing repertoire of GO enrichment tools. GOrilla's unique features and advantages over other threshold free enrichment tools include rigorous statistics, fast running time and an effective graphical representation.
GOrilla is publicly available at: http://cbl-gorilla.cs.technion.ac.il
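
    The flexible-threshold idea can be sketched as follows: for each cutoff n in the ranked list, compute the hypergeometric tail probability of seeing at least as many annotated genes in the top n, and keep the minimum over all cutoffs. This is only a minimal illustration of a minimum-hypergeometric score, not GOrilla's implementation; the function names are hypothetical, and the sketch does not perform the exact p-value correction for testing multiple thresholds that the abstract describes.

```python
from math import comb


def hg_tail(N: int, B: int, n: int, b: int) -> float:
    """P(X >= b) for X ~ Hypergeometric(population N, B annotated, n drawn)."""
    total = comb(N, n)
    hits = sum(comb(B, k) * comb(N - B, n - k)
               for k in range(b, min(n, B) + 1))
    return hits / total


def mhg_score(labels):
    """labels[i] is 1 if the gene at rank i carries the GO term, else 0.
    Returns (minimum tail probability, cutoff n where it is attained)."""
    N, B = len(labels), sum(labels)
    best, best_n, b = 1.0, 0, 0
    for n, lab in enumerate(labels, start=1):
        b += lab  # annotated genes among the top n
        p = hg_tail(N, B, n, b)
        if p < best:
            best, best_n = p, n
    return best, best_n


# Three annotated genes concentrated at the top of a list of ten.
score, cutoff = mhg_score([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
print(score, cutoff)
```

    Because the minimum is taken over many cutoffs, the raw score is smaller than a true p-value and must be corrected before it is interpreted as significance.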

  11. Kerfdr: a semi-parametric kernel-based approach to local false discovery rate estimation

    Directory of Open Access Journals (Sweden)

    Robin Stephane

    2009-03-01

    Full Text Available Abstract Background The use of current high-throughput genetic, genomic and post-genomic data leads to the simultaneous evaluation of a large number of statistical hypotheses and, at the same time, to the multiple-testing problem. As an alternative to the too conservative Family-Wise Error-Rate (FWER), the False Discovery Rate (FDR) has appeared over the last ten years as more appropriate to handle this problem. However, one drawback of FDR is related to a given rejection region for the considered statistics, attributing the same value to those that are close to the boundary and those that are not. As a result, the local FDR has been recently proposed to quantify the specific probability for a given null hypothesis to be true. Results In this context we present a semi-parametric approach based on kernel estimators which is applied to different high-throughput biological data such as patterns in DNA sequences, gene expression and genome-wide association studies. Conclusion The proposed method has the practical advantages, over existing approaches, to consider complex heterogeneities in the alternative hypothesis, to take into account prior information (from an expert judgment or previous studies) by allowing a semi-supervised mode, and to deal with truncated distributions such as those obtained in Monte-Carlo simulations. This method has been implemented and is available through the R package kerfdr via the CRAN or at http://stat.genopole.cnrs.fr/software/kerfdr.
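
    To make the local FDR concrete, the sketch below uses the standard two-component mixture view, in which the observed density is f(x) = pi0 * f0(x) + (1 - pi0) * f1(x) and the local FDR at x is pi0 * f0(x) / f(x). It is not the kerfdr algorithm: it assumes a standard normal null f0, takes the null proportion pi0 as given, and estimates the overall density f with a plain Gaussian kernel. The function names and the bandwidth are hypothetical.

```python
import math


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def kde(x: float, sample, bandwidth: float) -> float:
    """Gaussian kernel density estimate of the mixture density f at x."""
    return sum(normal_pdf(x, s, bandwidth) for s in sample) / len(sample)


def local_fdr(x: float, sample, pi0: float, bandwidth: float = 0.5) -> float:
    """Estimated probability that the null is true for a statistic equal to x,
    capped at 1 because the density estimate is only approximate."""
    return min(1.0, pi0 * normal_pdf(x) / kde(x, sample, bandwidth))


# Toy z-scores: 41 null-like values spread around 0 and 5 values near 4.
null_like = [-2.0 + 0.1 * i for i in range(41)]
signal = [3.5, 3.8, 4.0, 4.2, 4.5]
z = null_like + signal
pi0 = len(null_like) / len(z)
print(local_fdr(0.0, z, pi0), local_fdr(4.0, z, pi0))
```

    Unlike a rejection-region FDR, the value differs for each statistic, so a z-score of 4 receives a much lower local FDR than one near the centre of the null distribution.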

  12. Helping Students Understand Gene Regulation with Online Tools: A Review of MEME and Melina II, Motif Discovery Tools for Active Learning in Biology

    Directory of Open Access Journals (Sweden)

    David Treves

    2012-08-01

    Full Text Available Review of: MEME and Melina II, which are two free and easy-to-use online motif discovery tools that can be employed to actively engage students in learning about gene regulatory elements.

  13. Affinity-Based Screening Technology and HCV Drug Discovery

    Institute of Scientific and Technical Information of China (English)

    LI Bin

    2003-01-01

    NS5A is one of the non-structural gene products encoded by Hepatitis C virus (HCV) and related viruses that are essential for viral replication. The amino acid sequence of NS5A is conserved between different HCV genotypes and the primary amino acid sequence of NS5A is unique to HCV and closely related viruses. Importantly, NS5A is unrelated to any human protein. This indicates that drugs designed to block the actions of NS5A could inhibit the replication of HCV without showing toxic side effects in human host cells, thus making NS5A inhibitors ideal anti-viral drugs. However, there are presently no functional assays for this essential viral protein. Therefore, conventional high throughput screening (HTS) approaches cannot be used to discover antiviral drugs against NS5A.

  14. Accelerated Discovery in Photocatalysis using a Mechanism-Based Screening Method.

    Science.gov (United States)

    Hopkinson, Matthew N; Gómez-Suárez, Adrián; Teders, Michael; Sahoo, Basudev; Glorius, Frank

    2016-03-18

    Herein, we report a conceptually novel mechanism-based screening approach to accelerate discovery in photocatalysis. In contrast to most screening methods, which consider reactions as discrete entities, this approach instead focuses on a single constituent mechanistic step of a catalytic reaction. Using luminescence spectroscopy to investigate the key quenching step in photocatalytic reactions, an initial screen of 100 compounds led to the discovery of two promising substrate classes. Moreover, a second, more focused screen provided mechanistic insights useful in developing proof-of-concept reactions. Overall, this fast and straightforward approach both facilitated the discovery and aided the development of new light-promoted reactions and suggests that mechanism-based screening strategies could become useful tools in the hunt for new reactivity. PMID:27000485

  16. Analysis of Gene Expression Profiles in Leaf Tissues of Cultivated Peanuts and Development of EST-SSR Markers and Gene Discovery.

    Science.gov (United States)

    Guo, Baozhu; Chen, Xiaoping; Hong, Yanbin; Liang, Xuanqiang; Dang, Phat; Brenneman, Tim; Holbrook, Corley; Culbreath, Albert

    2009-01-01

    Peanut is vulnerable to a range of foliar diseases such as spotted wilt caused by Tomato spotted wilt virus (TSWV), early (Cercospora arachidicola) and late (Cercosporidium personatum) leaf spots, southern stem rot (Sclerotium rolfsii), and sclerotinia blight (Sclerotinia minor). In this study, we report the generation of 17,376 peanut expressed sequence tags (ESTs) from leaf tissues of a peanut cultivar (Tifrunner, resistant to TSWV and leaf spots) and a breeding line (GT-C20, susceptible to TSWV and leaf spots). After trimming vector and discarding low quality sequences, a total of 14,432 high-quality ESTs were selected for further analysis and deposition to GenBank. Sequence clustering resulted in 6,888 unique ESTs composed of 1,703 tentative consensus (TCs) sequences and 5,185 singletons. A large number of ESTs (5,717) representing genes of unknown functions were also identified. Among the unique sequences, there were 856 EST-SSRs identified. A total of 290 new EST-based SSR markers were developed and examined for amplification and polymorphism in cultivated peanut and wild species. Resequencing information of selected amplified alleles revealed that allelic diversity could be attributed mainly to differences in repeat type and length in the SSR regions. In addition, a few additional INDEL mutations and substitutions were observed in the regions flanking the microsatellite regions. Some defense-related transcripts were also identified, such as putative oxalate oxidase (EU024476) and NBS-LRR domains. EST data in this study have provided a new source of information for gene discovery and development of SSR markers in cultivated peanut. A total of 16,931 ESTs have been deposited to the NCBI GenBank database with accession numbers ES751523 to ES768453. PMID:19584933

  17. New construction for expert system based on innovative knowledge discovery technology

    Institute of Scientific and Technical Information of China (English)

    YANG BingRu; SONG Wei; XU ZhangYan

    2007-01-01

    Knowledge acquisition is the bottleneck of expert systems. To solve this problem, KD (D&K), a comprehensive knowledge discovery process model in which a database and a knowledge base cooperate, and its related technology are proposed. Then, based on KD (D&K) and related technology, a new construction of Expert System based on Knowledge Discovery (ESKD) is proposed. As the key knowledge acquisition component of ESKD, KD (D&K) is composed of KDD* and KDK*. KDD*, the new process model based on a double-base cooperating mechanism, and KDK*, the new process model based on a double-basis fusion mechanism, are introduced in turn. The overall framework of ESKD is proposed. Some sub-systems and the dynamic knowledge base system are discussed. Finally, the effectiveness and advantages of ESKD are tested on a real-world agriculture database. We hope that ESKD may be useful for the new generation of expert systems.

  18. IMG-ABC: An Atlas of Biosynthetic Gene Clusters to Fuel the Discovery of Novel Secondary Metabolites

    Energy Technology Data Exchange (ETDEWEB)

    Chen, I-Min; Chu, Ken; Ratner, Anna; Palaniappan, Krishna; Huang, Jinghua; Reddy, T. B.K.; Cimermancic, Peter; Fischbach, Michael; Ivanova, Natalia; Markowitz, Victor; Kyrpides, Nikos; Pati, Amrita

    2014-10-28

    In the discovery of secondary metabolites (SMs), large-scale analysis of sequence data is a promising exploration path that remains largely underutilized due to the lack of relevant computational resources. We present IMG-ABC (https://img.jgi.doe.gov/abc/) -- An Atlas of Biosynthetic gene Clusters within the Integrated Microbial Genomes (IMG) system. IMG-ABC is a rich repository of both validated and predicted biosynthetic clusters (BCs) in cultured isolates, single-cells and metagenomes linked with the SM chemicals they produce and enhanced with focused analysis tools within IMG. The underlying scalable framework enables traversal of phylogenetic dark matter and chemical structure space -- serving as a doorway to a new era in the discovery of novel molecules.

  19. Discovery of germline-related genes in Cephalochordate amphioxus: A genome wide survey using genome annotation and transcriptome data.

    Science.gov (United States)

    Yue, Jia-Xing; Li, Kun-Lung; Yu, Jr-Kai

    2015-12-01

    The generation of germline cells is a critical process in the reproduction of multicellular organisms. Studies in animal models have identified a common repertoire of genes that play essential roles in primordial germ cell (PGC) formation. However, comparative studies also indicate that the timing and regulation of this core genetic program vary considerably in different animals, raising the intriguing questions regarding the evolution of PGC developmental mechanisms in metazoans. Cephalochordates (commonly called amphioxus or lancelets) represent one of the invertebrate chordate groups and can provide important information about the evolution of developmental mechanisms in the chordate lineage. In this study, we used genome and transcriptome data to identify germline-related genes in two distantly related cephalochordate species, Branchiostoma floridae and Asymmetron lucayanum. Branchiostoma and Asymmetron diverged more than 120 MYA, and the most conspicuous difference between them is their gonadal morphology. We used important germline developmental genes in several model animals to search the amphioxus genome and transcriptome dataset for conserved homologs. We also annotated the assembled transcriptome data using Gene Ontology (GO) terms to facilitate the discovery of putative genes associated with germ cell development and reproductive functions in amphioxus. We further confirmed the expression of 14 genes in developing oocytes or mature eggs using whole mount in situ hybridization, suggesting their potential functions in amphioxus germ cell development. The results of this global survey provide a useful resource for testing potential functions of candidate germline-related genes in cephalochordates and for investigating differences in gonad developmental mechanisms between Branchiostoma and Asymmetron species.

  20. Discovery of technical methanation catalysts based on computational screening

    DEFF Research Database (Denmark)

    Sehested, Jens; Larsen, Kasper Emil; Kustov, Arkadii;

    2007-01-01

    Methanation is a classical reaction in heterogeneous catalysis and significant effort has been put into improving the industrially preferred nickel-based catalysts. Recently, a computational screening study showed that nickel-iron alloys should be more active than the pure nickel catalyst and at ...

  1. A Service Discovery and Automatic Deployment Component-Based Software Infrastructure for Ubiquitous Computing

    OpenAIRE

    FLISSI, A; GRANSART, C; Merle, P.

    2005-01-01

    Software applications running on mobile devices are more and more needed. These applications have strong requirements to address: device heterogeneity, limited resources, networked communications, and security. Moreover it is required to have appropriate application design, discovery, deployment, and execution paradigms. These requirements are similar to those of any ubiquitous computing application. In this paper, we present a component-based software infrastructure...

  2. Infrared and Raman Spectroscopy: A Discovery-Based Activity for the General Chemistry Curriculum

    Science.gov (United States)

    Borgsmiller, Karen L.; O'Connell, Dylan J.; Klauenberg, Kathryn M.; Wilson, Peter M.; Stromberg, Christopher J.

    2012-01-01

    A discovery-based method is described for incorporating the concepts of IR and Raman spectroscopy into the general chemistry curriculum. Students use three sets of springs to model the properties of single, double, and triple covalent bonds. Then, Gaussian 03W molecular modeling software is used to illustrate the relationship between bond…

  3. Microwave-Assisted Esterification: A Discovery-Based Microscale Laboratory Experiment

    Science.gov (United States)

    Reilly, Maureen K.; King, Ryan P.; Wagner, Alexander J.; King, Susan M.

    2014-01-01

    An undergraduate organic chemistry laboratory experiment has been developed that features a discovery-based microscale Fischer esterification utilizing a microwave reactor. Students individually synthesize a unique ester from known sets of alcohols and carboxylic acids. Each student identifies the best reaction conditions given their particular…

  4. Agent-based decision making through intelligent knowledge discovery

    OpenAIRE

    Fernández Caballero, Antonio; Sokolova, Marina

    2008-01-01

    Monitoring the negative effects of urban pollution and making decisions in real time help clarify the consequences for human health. Large amounts of raw data describe this situation, and we apply intelligent agents to extract knowledge from it. Further modeling and simulation yield new knowledge about the tendencies of the situation's development and about its structure. An agent-based decision support system can help to foresee possible ways of situation development and contribute to effect...

  5. Knowledge discovery based on experiential learning corporate culture management

    Science.gov (United States)

    Tu, Kai-Jan

    2014-10-01

    A good corporate culture based on humanistic theory can make an enterprise's management very effective and give all of its members strong cohesion and centripetal force. With an experiential learning model, the enterprise can establish a corporate culture with an enthusiastic learning spirit, gain the innovation ability needed for positive knowledge growth, and meet fierce global marketing competition. A case study of Trend's corporate culture offers support for the industry knowledge growth rate equation as a contribution to experiential learning corporate culture management.

  6. Knowledge Discovery Based on Grid

    Institute of Scientific and Technical Information of China (English)

    张丽芳

    2009-01-01

    After introducing the concept of knowledge discovery on the grid, this paper proposes the basic design principles and components of a grid knowledge discovery architecture and designs a novel framework for knowledge discovery on the grid. The full process of centralized data mining and distributed data mining on this architecture is then analyzed. Finally, future work is outlined.

  7. Meiosis-specific gene discovery in plants: RNA-Seq applied to isolated Arabidopsis male meiocytes

    Directory of Open Access Journals (Sweden)

    May Gregory D

    2010-12-01

    Background: Meiosis is a critical process in the reproduction and life cycle of flowering plants in which homologous chromosomes pair, synapse, recombine and segregate. Understanding meiosis will not only advance our knowledge of the mechanisms of genetic recombination, but also has substantial applications in crop improvement. Despite the tremendous progress in the past decade in other model organisms (e.g., Saccharomyces cerevisiae and Drosophila melanogaster), the global identification of meiotic genes in flowering plants has remained a challenge due to the lack of efficient methods to collect pure meiocytes for analyzing the temporal and spatial gene expression patterns during meiosis, and for the sensitive identification and quantitation of novel genes. Results: A high-throughput approach to identify meiosis-specific genes by combining isolated meiocytes, RNA-Seq, bioinformatic and statistical analysis pipelines was developed. By analyzing the studied genes that have a meiosis function, a pipeline for identifying meiosis-specific genes has been defined. More than 1,000 genes that are specifically or preferentially expressed in meiocytes have been identified as candidate meiosis-specific genes. A group of 55 genes that have mitochondrial genome origins and a significant number of transposable element (TE) genes (1,036) were also found to have up-regulated expression levels in meiocytes. Conclusion: These findings advance our understanding of meiotic genes, gene expression and regulation, especially the transcript profiles of MGI genes and TE genes, and provide a framework for functional analysis of genes in meiosis.

  8. Ligand-based receptor tyrosine kinase partial agonists: New paradigm for cancer drug discovery?

    Science.gov (United States)

    Riese, David J.

    2010-01-01

    Introduction: Receptor tyrosine kinases (RTKs) are validated targets for oncology drug discovery, and several RTK antagonists have been approved for the treatment of human malignancies. Nonetheless, the discovery and development of RTK antagonists has lagged behind the discovery and development of agents that target G-protein coupled receptors. In part, this is because it has been difficult to discover analogs of naturally-occurring RTK agonists that function as antagonists. Areas covered: Here we describe ligands of ErbB receptors that function as partial agonists for these receptors, thereby enabling these ligands to antagonize the activity of full agonists for these receptors. We provide insights into the mechanisms by which these ligands function as antagonists. We discuss how information concerning these mechanisms can be translated into screens for novel small molecule- and antibody-based antagonists of ErbB receptors and how such antagonists hold great potential as targeted cancer chemotherapeutics. Expert opinion: While there have been a number of key findings in this field, the identification of the structural basis of ligand functional specificity is still of the greatest importance. While it is true that, with some notable exceptions, peptide hormones and growth factors have not proven to be good platforms for oncology drug discovery, addressing the fundamental issues of antagonistic partial agonists for receptor tyrosine kinases has the potential to steer oncology drug discovery in new directions. Mechanism-based approaches are now emerging to enable the discovery of RTK partial agonists that may antagonize both agonist-dependent and -independent RTK signaling and may hold tremendous promise as targeted cancer chemotherapeutics. PMID:21532939

  9. Microfluidic-Based Multi-Organ Platforms for Drug Discovery

    Directory of Open Access Journals (Sweden)

    Ahmad Rezaei Kolahchi

    2016-09-01

    Development of predictive multi-organ models before implementing costly clinical trials is central to screening the toxicity, efficacy, and side effects of new therapeutic agents. Despite significant efforts that have been recently made to develop biomimetic in vitro tissue models, the clinical application of such platforms is still far from reality. Recent advances in physiologically-based pharmacokinetic and pharmacodynamic (PBPK-PD) modeling, micro- and nanotechnology, and in silico modeling have enabled single- and multi-organ platforms for investigation of new chemical agents and tissue-tissue interactions. This review provides an overview of the principles of designing microfluidic-based organ-on-chip models for drug testing and highlights the current state of the art in developing predictive multi-organ models for studying the cross-talk of interconnected organs. We further discuss the challenges associated with establishing a predictive body-on-chip (BOC) model, such as scaling, cell types, the common medium, and principles of the study design for characterizing the interaction of drugs with multiple targets.

  10. Ataxin1L is a regulator of HSC function highlighting the utility of cross-tissue comparisons for gene discovery.

    Directory of Open Access Journals (Sweden)

    Juliette J Kahle

    2013-03-01

    Hematopoietic stem cells (HSCs) are rare quiescent cells that continuously replenish the cellular components of the peripheral blood. Observing that the ataxia-associated gene Ataxin-1-like (Atxn1L) was highly expressed in HSCs, we examined its role in HSC function through in vitro and in vivo assays. Mice lacking Atxn1L had greater numbers of HSCs that regenerated the blood more quickly than their wild-type counterparts. Molecular analyses indicated Atxn1L null HSCs had gene expression changes that regulate a program consistent with their higher level of proliferation, suggesting that Atxn1L is a novel regulator of HSC quiescence. To determine if additional brain-associated genes were candidates for hematologic regulation, we examined genes encoding proteins from autism- and ataxia-associated protein-protein interaction networks for their representation in hematopoietic cell populations. The interactomes were found to be highly enriched for proteins encoded by genes specifically expressed in HSCs relative to their differentiated progeny. Our data suggest a heretofore unappreciated similarity between regulatory modules in the brain and HSCs, offering a new strategy for novel gene discovery in both systems.

  11. Fragment-Based Discovery of 6-Arylindazole JAK Inhibitors.

    Science.gov (United States)

    Ritzén, Andreas; Sørensen, Morten D; Dack, Kevin N; Greve, Daniel R; Jerre, Anders; Carnerup, Martin A; Rytved, Klaus A; Bagger-Bahnsen, Jesper

    2016-06-01

    Janus kinase (JAK) inhibitors are emerging as novel and efficacious drugs for treating psoriasis and other inflammatory skin disorders, but their full potential is hampered by systemic side effects. To overcome this limitation, we set out to discover soft drug JAK inhibitors for topical use. A fragment screen yielded an indazole hit that was elaborated into a potent JAK inhibitor using structure-based design. Growing the fragment by installing a phenol moiety in the 6-position afforded a greatly improved potency. Fine-tuning the substituents on the phenol and sulfonamide moieties afforded a set of compounds with lead-like properties, but they were found to be phototoxic and unstable in the presence of light. PMID:27326341

  12. SDAA: Towards Service Discovery Anywhere Anytime Mobile Based Application

    Directory of Open Access Journals (Sweden)

    Mehedi Masud

    2016-01-01

    Providing on-demand services based on customers' current location is an urgent need for many societies and individuals, especially for women, elderly people, single mothers, sick people, etc. Considering the need to provide localized services, this paper proposes a mobile application framework that allows an individual to receive services from neighborhood peers anywhere, anytime. The application allows an individual to find and select reliable service providers near his or her location. It also gives interested individuals an opportunity to use their free time to provide services to the community and earn some extra money. This application will benefit many stakeholders, such as elderly people, women at home, and people traveling in an unfamiliar place. A prototype application was developed, and an empirical evaluation was conducted to obtain qualitative measures of users' acceptance of and satisfaction with the application. Users' satisfaction was observed to be high.

  13. Discovery of possible gene relationships through the application of self-organizing maps to DNA microarray databases.

    Directory of Open Access Journals (Sweden)

    Rocio Chavez-Alvarez

    DNA microarrays and cell cycle synchronization experiments have made possible the study of the mechanisms of cell cycle regulation of Saccharomyces cerevisiae by simultaneously monitoring the expression levels of thousands of genes at specific time points. On the other hand, pattern recognition techniques can contribute to the analysis of such massive measurements, providing a model of gene expression level evolution through the cell cycle process. In this paper, we propose the use of one such technique, an unsupervised artificial neural network called a Self-Organizing Map (SOM), which has been successfully applied to processes involving very noisy signals, classifying and organizing them, and assisting in the discovery of behavior patterns without requiring prior knowledge about the process under analysis. As a test bed for the use of SOMs in finding possible relationships among genes and their possible contribution in some biological processes, we selected 282 S. cerevisiae genes that have been shown through biological experiments to have an activity during the cell cycle. The expression level of these genes was analyzed in five of the most cited time series DNA microarray databases used in the study of the cell cycle of this organism. With the use of SOM, it was possible to find clusters of genes with similar behavior in the five databases along two cell cycles. This result suggested that some of these genes might be biologically related or might have a regulatory relationship, as was corroborated by comparing some of the clusters obtained with SOMs against a previously reported regulatory network that was generated using biological knowledge, such as protein-protein interactions, gene expression levels, metabolism dynamics, promoter binding, and modification, regulation and transport of proteins. The methodology described in this paper could be applied to the study of gene relationships of other biological processes in different organisms.
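
    The clustering idea in the abstract above can be illustrated with a very small one-dimensional SOM. This is only a sketch under stated assumptions, not the authors' implementation: the gene names, the four-point expression profiles, the two-unit map and all training parameters (`epochs`, `lr0`, `sigma0`) are hypothetical values chosen for illustration, and only the Python standard library is used.

```python
import math
import random


def bmu(weights, x):
    """Index of the best-matching unit (smallest squared Euclidean distance)."""
    return min(range(len(weights)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))


def train_som(data, n_units=2, epochs=40, lr0=0.5, sigma0=0.5, seed=0):
    """Train a 1-D SOM (n_units >= 2) with a Gaussian neighbourhood."""
    rng = random.Random(seed)
    # Initialise the units from evenly spaced samples of the data set.
    step = (len(data) - 1) / (n_units - 1)
    weights = [list(data[round(i * step)]) for i in range(n_units)]
    order = list(range(len(data)))
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs           # linear decay schedule
        lr = lr0 * frac                       # learning rate
        sigma = max(sigma0 * frac, 1e-3)      # neighbourhood width
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            c = bmu(weights, x)
            for i, w in enumerate(weights):
                # Units close to the winner on the map are pulled more strongly.
                h = math.exp(-((i - c) ** 2) / (2 * sigma ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
    return weights


# Hypothetical expression profiles (four time points per gene).
profiles = {
    "gene_r1": [0.0, 0.3, 0.7, 1.0],
    "gene_r2": [0.1, 0.4, 0.6, 0.9],
    "gene_r3": [0.0, 0.2, 0.8, 1.0],
    "gene_f1": [1.0, 0.7, 0.3, 0.0],
    "gene_f2": [0.9, 0.6, 0.4, 0.1],
    "gene_f3": [1.0, 0.8, 0.2, 0.0],
}

weights = train_som(list(profiles.values()))
clusters = {gene: bmu(weights, p) for gene, p in profiles.items()}
print(clusters)
```

    Genes whose profiles map to the same unit form one cluster. A realistic analysis would use a larger two-dimensional map and normalised expression data; those choices are outside what the abstract specifies.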

  14. Climate Solutions based on advanced scientific discoveries of Allatra physics

    Science.gov (United States)

    Vershigora, Valery

    2016-05-01

    Global climate change is one of the most important international problems of the 21st century. The overall rapid increase in the dynamics of cataclysms, which have been observed in recent decades, is particularly alarming. How do modern scientists predict the occurrence of certain events? In meteorology, unusually powerful cumulonimbus clouds are one of the main conditions for the emergence of a tornado. The former, in their turn, are formed during the invasion of cold air on the overheated land surface. The satellite captures the cloud front, and, based on these pictures, scientists make assumptions about the possibility of occurrence of the respective natural phenomena. In fact, mankind visually observes and draws conclusions about the consequences of the physical phenomena which have already taken place in the invisible world, so the conclusions of scientists are assumptions by their nature, rather than precise knowledge of the causes of the origin of these phenomena in the physics of microcosm. The latest research in the field of the particle physics and neutrino astrophysics, which was conducted by a working team of scientists of ALLATRA International Public Movement (hereinafter ALLATRA SCIENCE group).

  15. Network-based discovery through mechanistic systems biology. Implications for applications--SMEs and drug discovery: where the action is.

    Science.gov (United States)

    Benson, Neil

    2015-08-01

    Phase II attrition remains the most important challenge for drug discovery. Tackling the problem requires improved understanding of the complexity of disease biology. Systems biology approaches to this problem can, in principle, deliver this. This article reviews the reports of the application of mechanistic systems models to drug discovery questions and discusses the added value. Although we are on the journey to the virtual human, the length, path and rate of learning from this remain an open question. Success will be dependent on the will to invest and make the most of the insight generated along the way. PMID:26464089

  16. Discovery of sequence motifs related to coexpression of genes using evolutionary computation

    OpenAIRE

    Fogel, Gary B.; Weekes, Dana G.; Varga, Gabor; Dow, Ernst R.; Harlow, Harry B.; Onyia, Jude E.; Su, Chen

    2004-01-01

    Transcription factors are key regulatory elements that control gene expression. Recognition of transcription factor binding site (TFBS) motifs in the upstream region of coexpressed genes is therefore critical towards a true understanding of the regulation of gene expression. The task of discovering eukaryotic TFBSs remains a challenging problem. Here, we demonstrate that evolutionary computation can be used to search for TFBSs in upstream regions of genes known to be coexpressed. Evolutionar...

  17. Simulation-based Discovery of Cyclic Peptide Nanotubes

    Science.gov (United States)

    Ruiz Pestana, Luis A.

    Today, there is a growing need for environmentally friendly synthetic membranes with selective transport capabilities to address some of society's most pressing issues, such as carbon dioxide pollution, or access to clean water. While conventional membranes cannot stand up to the challenge, thin nanocomposite membranes, where vertically aligned subnanometer pores (e.g. nanotubes) are embedded in a thin polymeric film, promise to overcome some of the current limitations, namely, achieving a monodisperse distribution of subnanometer size pores, vertical pore alignment across the membrane thickness, and tunability of the pore surface chemistry. Self-assembled cyclic peptide nanotubes (CPNs) are particularly promising as selective nanopores because their pore size can be controlled at the subnanometer level and because they exhibit high chemical design flexibility and display remarkable mechanical stability. In addition, when conjugated with polymer chains, the cyclic peptides can co-assemble in block copolymer domains to form nanoporous thin films. CPNs are thus well positioned to tackle persistent challenges in molecular separation applications. However, our poor understanding of the physics underlying their remarkable properties prevents the rational design and implementation of CPNs in technologically relevant membranes. In this dissertation, we use a simulation-based approach, in particular molecular dynamics (MD) simulations, to investigate the critical knowledge gaps hindering the implementation of CPNs. Computational mechanical tests show that, despite the weak nature of the stabilizing hydrogen bonds and the small cross section, CPNs display a Young's modulus of approximately 20 GPa and a maximum strength of around 1 GPa, placing them among the strongest proteinaceous materials known. Simulations of the self-assembly process reveal that CPNs grow by self-similar coarsening, contrary to other low-dimensional peptide systems, such as amyloids, that are believed to grow through

  18. An integrative data analysis platform for gene set analysis and knowledge discovery in a data warehouse framework.

    Science.gov (United States)

    Chen, Yi-An; Tripathi, Lokesh P; Mizuguchi, Kenji

    2016-01-01

    Data analysis is one of the most critical and challenging steps in drug discovery and disease biology. A user-friendly resource to visualize and analyse high-throughput data provides a powerful medium for both experimental and computational biologists to understand vastly different biological data types and obtain a concise, simplified and meaningful output for better knowledge discovery. We have previously developed TargetMine, an integrated data warehouse optimized for target prioritization. Here we describe how upgraded and newly modelled data types in TargetMine can now survey the wider biological and chemical data space, relevant to drug discovery and development. To enhance the scope of TargetMine from target prioritization to broad-based knowledge discovery, we have also developed a new auxiliary toolkit to assist with data analysis and visualization in TargetMine. This toolkit features interactive data analysis tools to query and analyse the biological data compiled within the TargetMine data warehouse. The enhanced system enables users to discover new hypotheses interactively by performing complicated searches with no programming and obtaining the results in an easy to comprehend output format. Database URL: http://targetmine.mizuguchilab.org. PMID:26989145

  20. The Discovery of Aurora Kinase Inhibitor by Multi-Docking-Based Virtual Screening

    Directory of Open Access Journals (Sweden)

    Jun-Tae Kim

    2014-11-01

    We report the discovery of an Aurora kinase inhibitor using fragment-based virtual screening with a multi-docking strategy. Among a number of fragments collected from eMolecules, we found four fragment molecules showing potent activity (>50% at 100 μM) against Aurora kinase. Based on the explored fragment scaffold, we selected two compounds in our synthesized library and validated the biological activity against Aurora kinase.

  1. The Discovery of Aurora Kinase Inhibitor by Multi-Docking-Based Virtual Screening

    OpenAIRE

    Jun-Tae Kim; Seo Hee Jung; Sun Young Kang; Chung-Kyu Ryu; Nam Sook Kang

    2014-01-01

    We report the discovery of an Aurora kinase inhibitor using fragment-based virtual screening with a multi-docking strategy. Among a number of fragments collected from eMolecules, we found four fragment molecules showing potent activity (>50% at 100 μM) against Aurora kinase. Based on the explored fragment scaffold, we selected two compounds in our synthesized library and validated the biological activity against Aurora kinase.

  2. An Agent-Based Focused Crawling Framework for Topic- and Genre-Related Web Document Discovery

    OpenAIRE

    Pappas, Nikolaos; Katsimpras, Georgios; Stamatatos, Efstathios

    2012-01-01

    The discovery of web documents about certain topics is an important task for web-based applications including web document retrieval, opinion mining and knowledge extraction. In this paper, we propose an agent-based focused crawling framework able to retrieve topic- and genre-related web documents. Starting from a simple topic query, a set of focused crawler agents explore in parallel topic-specific web paths using dynamic seed URLs that belong to certain web genres and are collected from web...

  3. Discovery by the Epistasis Project of an epistatic interaction between the GSTM3 gene and the HHEX/IDE/KIF11 locus in the risk of Alzheimer's disease

    NARCIS (Netherlands)

    J.M. Bullock (James); C. Medway (Christopher); M. Cortina-Borja (Mario); J.C. Turton (James); J.A. Prince (Jonathan); C.A. Ibrahim-Verbaas (Carla); M. Schuur (Maaike); M.M.B. Breteler (Monique); C.M. van Duijn (Cock); P.G. Kehoe (Patrick); R. Barber (Rachel); E. Coto (Eliecer); V. Alvarez (Victoria); P. Deloukas (Panagiotis); N. Hammond (Naomi); O. Combarros (Onofre); I. Mateo (Ignacio); D.R. Warden (Donald); M.G. Lehmann (Michael); O. Belbin (Olivia); K. Brown (Kristelle); G.K. Wilcock (Gordon); R. Heun (Reinhard); H. Kölsch (Heike); A.D. Smith; D.J. Lehmann (Donald); K. Morgan (Kevin)

    2013-01-01

    Despite recent discoveries in the genetics of sporadic Alzheimer's disease, there remains substantial "hidden heritability." It is thought that some of this missing heritability may be because of gene-gene, i.e., epistatic, interactions. We examined potential epistasis between 110 candi

  4. Motif discovery in promoters of genes co-localized and co-expressed during myeloid cells differentiation

    OpenAIRE

    Coppe, Alessandro; Ferrari, Francesco; Bisognin, Andrea; Danieli, Gian Antonio; Ferrari, Sergio; Bicciato, Silvio; Bortoluzzi, Stefania

    2008-01-01

    Co-expressed genes may be under similar promoter-based and/or position-based regulation. Although data on expression, position and function of human genes are available, their true integration still represents a challenge for computational biology, hampering the identification of regulatory mechanisms. We carried out an integrative analysis of genomic position, functional annotation and promoters of genes expressed in myeloid cells. Promoter analysis was conducted by a novel multi-step method...

  5. Discovery and characterization of the first genuine avian leptin gene in the rock dove (Columba livia).

    Science.gov (United States)

    Friedman-Einat, Miriam; Cogburn, Larry A; Yosefi, Sara; Hen, Gideon; Shinder, Dmitry; Shirak, Andrey; Seroussi, Eyal

    2014-09-01

    Leptin, the key regulator of mammalian energy balance, has been at the center of a great controversy in avian biology for the last 15 years since initial reports of a putative leptin gene (LEP) in chickens. Here, we characterize a novel LEP in rock dove (Columba livia) with low similarity of the predicted protein sequence (30% identity, 47% similarity) to the human ortholog. Searching the Sequence-Read-Archive database revealed leptin transcripts, in the dove's liver, with 2 noncoding exons preceding 2 coding exons. This unusual 4-exon structure was validated by sequencing of a GC-rich product (76% GC, 721 bp) amplified from liver RNA by RT-PCR. Sequence alignment of the dove leptin with orthologous leptins indicated that it consists of a leader peptide (21 amino acids; aa) followed by the mature protein (160 aa), which has a putative structure typical of 4-helical-bundle cytokines except that it is 12 aa longer than human leptin. Extra residues (10 aa) were located within the loop between 2 5'-helices, interrupting the amino acid motif that is conserved in tetrapods and considered essential for activation of leptin receptor (LEPR) but not for receptor binding per se. Quantitative RT-PCR of 11 tissues showed highest (P < .05) expression of LEP in the dove's liver, whereas the dove LEPR peaked (P < .01) in the pituitary. Both genes were prominently expressed in the gonads and at lower levels in tissues involved in mammalian leptin signaling (adipose; hypothalamus). A bioassay based on activation of the chicken LEPR in vitro showed leptin activity in the dove's circulation, suggesting that dove LEP encodes an active protein, despite the interrupted loop motif. Providing tools to study energy-balance control from an evolutionary perspective, our original demonstration of leptin signaling in dove predicts a more ancient role of leptin in growth and reproduction in birds, rather than appetite control.

  6. Generalization-based discovery of spatial association rules with linguistic cloud models

    Institute of Scientific and Technical Information of China (English)

    杨斌; 田永青; 朱仲英

    2004-01-01

    Extraction of interesting and general spatial association rules from large spatial databases is an important task in the development of spatial database systems. In this paper, we investigate the generalization-based knowledge discovery mechanism that integrates attribute-oriented induction on nonspatial data and spatial merging and generalization on spatial data. Furthermore, we present linguistic cloud models for knowledge representation and uncertainty handling to enhance current generalization-based methods. With these models, spatial and nonspatial attribute values are well generalized at higher concept levels, allowing discovery of strong spatial association rules. Combining the cloud-model-based generalization method with the Apriori algorithm for mining association rules from a spatial database shows the benefits in effectiveness and flexibility.
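
    The Apriori step mentioned in the abstract above can be sketched in a few lines of Python. This is a generic, minimal version of the Apriori algorithm and not the authors' code: the cloud-model generalization is not reproduced, and the spatial predicates, the thresholds and the function names `apriori` and `rules` are hypothetical, chosen only to show how support and confidence select strong rules from already generalized data.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every single item that occurs in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Keep the candidates whose support reaches the threshold.
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        k += 1
        # Join step: merge frequent itemsets into candidates of size k.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in level
                             for s in combinations(c, k - 1))}
    return frequent


def rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) for all sufficiently strong rules."""
    found = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    found.append((lhs, itemset - lhs, support, confidence))
    return found


# Hypothetical generalized spatial predicates, one set per spatial object.
objects = [
    {"close_to(water)", "is_a(large_town)", "adjacent_to(highway)"},
    {"close_to(water)", "is_a(large_town)"},
    {"close_to(water)", "adjacent_to(highway)"},
    {"is_a(large_town)", "adjacent_to(highway)"},
    {"close_to(water)", "is_a(large_town)"},
]

frequent = apriori(objects, min_support=0.5)
for lhs, rhs, support, confidence in rules(frequent, min_confidence=0.7):
    print(sorted(lhs), "->", sorted(rhs), round(support, 2), round(confidence, 2))
```

    With these toy thresholds only the pair close_to(water) and is_a(large_town) is frequent beyond single predicates, so the two rules between them are the only ones reported.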

  7. Mass Spectrometry-Based Proteomics in Molecular Diagnostics: Discovery of Cancer Biomarkers Using Tissue Culture

    Directory of Open Access Journals (Sweden)

    Debasish Paul

    2013-01-01

    Accurate diagnosis and proper monitoring of cancer patients remain a key obstacle to successful cancer treatment and prevention. Hence the need for biomarker discovery, which is crucial to current oncological and other clinical practice and has the potential to impact diagnosis and prognosis. In fact, most biomarkers have been discovered using proteomics-based approaches. Although high-throughput mass spectrometry-based proteomic approaches such as SILAC, 2D-DIGE, and iTRAQ are overcoming the pitfalls of conventional techniques, serum proteomics still faces the hurdle of a wide range of protein concentrations, and the limited availability of patient tissue samples also constrains biomarker discovery. Thus, researchers have looked for alternatives, and profiling of candidate biomarkers through tissue culture of tumor cell lines has emerged as a promising option. It is a rich source of tumor cell-derived proteins, thereby representing a wide array of potential biomarkers. Interestingly, most of the clinical biomarkers in use today (CA 125, CA 15.3, CA 19.9, and PSA) were discovered through tissue culture-based systems and tissue extracts. This paper emphasizes the tissue culture-based discovery of candidate biomarkers through various mass spectrometry-based proteomic approaches.

  8. Discovery of sequence motifs related to coexpression of genes using evolutionary computation

    Science.gov (United States)

    Fogel, Gary B.; Weekes, Dana G.; Varga, Gabor; Dow, Ernst R.; Harlow, Harry B.; Onyia, Jude E.; Su, Chen

    2004-01-01

    Transcription factors are key regulatory elements that control gene expression. Recognition of transcription factor binding site (TFBS) motifs in the upstream region of coexpressed genes is therefore critical towards a true understanding of the regulations of gene expression. The task of discovering eukaryotic TFBSs remains a challenging problem. Here, we demonstrate that evolutionary computation can be used to search for TFBSs in upstream regions of genes known to be coexpressed. Evolutionary computation was used to search for TFBSs of genes regulated by octamer-binding factor and nuclear factor kappa B. The discovered binding sites included experimentally determined known binding motifs as well as lists of putative, previously unknown TFBSs. We believe that this method to search nucleotide sequence information efficiently for similar motifs will be useful for discovering TFBSs that affect gene regulation. PMID:15266008

  9. An integrative approach to species discovery in odonates: from character-based DNA barcoding to ecology.

    Science.gov (United States)

    Damm, Sandra; Schierwater, Bernd; Hadrys, Heike

    2010-09-01

    Modern taxonomy requires an analytical approach incorporating all lines of evidence into decision-making. Such an approach can enhance both species identification and species discovery. The character-based DNA barcode method provides a molecular data set that can be incorporated into classical taxonomic data such that the discovery of new species can be made in an analytical framework that includes multiple sources of data. We here illustrate such a corroborative framework in a dragonfly model system that permits the discovery of two new but visually cryptic species. In the African dragonfly genus Trithemis, three distinct genetic clusters can be detected which could not be identified by using classical taxonomic characters. In order to test the hypothesis of two new species, DNA barcodes from different sequence markers (ND1 and COI) were combined with morphological, ecological and biogeographic data sets. Phylogenetic analyses and incorporation of all data sets into a scheme called the taxonomic circle strongly support the hypothesis of two new species. Our case study suggests an analytical approach to modern taxonomy that integrates data sets from different disciplines, thereby increasing the ease and reliability of both species discovery and species assignment.

  10. Prior knowledge driven Granger causality analysis on gene regulatory network discovery

    OpenAIRE

    Yao, Shun; Yoo, Shinjae; Yu, Dantong

    2015-01-01

    Background Our study focuses on discovering gene regulatory networks from time series gene expression data using the Granger causality (GC) model. However, the number of available time points (T) usually is much smaller than the number of target genes (n) in biological datasets. The widely applied pairwise GC model (PGC) and other regularization strategies can lead to a significant number of false identifications when n>>T. Results In this study, we proposed a new method, viz., CGC-2SPR (CGC ...

  11. Discovery of error-tolerant biclusters from noisy gene expression data

    OpenAIRE

    Gupta Rohit; Rao Navneet; Kumar Vipin

    2011-01-01

    Abstract Background An important analysis performed on microarray gene-expression data is to discover biclusters, which denote groups of genes that are coherently expressed for a subset of conditions. Various biclustering algorithms have been proposed to find different types of biclusters from these real-valued gene-expression data sets. However, these algorithms suffer from several limitations such as inability to explicitly handle errors/noise in the data; difficulty in discovering small bi...

  12. Thesaurus-based disambiguation of gene symbols

    Directory of Open Access Journals (Sweden)

    Wain Hester M

    2005-06-01

    Abstract Background Massive text mining of the biological literature holds great promise of relating disparate information and discovering new knowledge. However, disambiguation of gene symbols is a major bottleneck. Results We developed a simple thesaurus-based disambiguation algorithm that can operate with very little training data. The thesaurus comprises the information from five human genetic databases and MeSH. The extent of the homonym problem for human gene symbols is shown to be substantial (33% of the genes in our combined thesaurus had one or more ambiguous symbols), not only because one symbol can refer to multiple genes, but also because a gene symbol can have many non-gene meanings. A test set of 52,529 Medline abstracts, containing 690 ambiguous human gene symbols taken from OMIM, was automatically generated. Overall accuracy of the disambiguation algorithm was up to 92.7% on the test set. Conclusion The ambiguity of human gene symbols is substantial, not only because one symbol may denote multiple genes but particularly because many symbols have other, non-gene meanings. The proposed disambiguation approach resolves most ambiguities in our test set with high accuracy, including the important gene/not a gene decisions. The algorithm is fast and scalable, enabling gene-symbol disambiguation in massive text mining applications.

  13. A multi-gene transcriptional profiling approach to the discovery of cell signature markers

    OpenAIRE

    Wada, Youichiro; Li, Dan; Merley, Anne; Zukauskas, Andrew; Aird, William C.; Dvorak, Harold F.; Shih, Shou-Ching

    2010-01-01

    A profile of transcript abundances from multiple genes constitutes a molecular signature if the expression pattern is unique to one cell type. Here we measure mRNA copy numbers per cell by normalizing per million copies of 18S rRNA and identify 6 genes (TIE1, KDR, CDH5, TIE2, EFNA1 and MYO5C) out of 79 genes tested as excellent molecular signature markers for endothelial cells (ECs) in vitro. The selected genes are uniformly expressed in ECs of 4 different origins but weakly or not expressed ...

  14. Computational discovery of Epstein-Barr virus targeted human genes and signalling pathways.

    Science.gov (United States)

    Mei, Suyu; Zhang, Kun

    2016-01-01

    Epstein-Barr virus (EBV) plays important roles in the origin and the progression of human carcinomas, e.g. diffuse large B cell tumors, T cell lymphomas, etc. Discovering EBV targeted human genes and signaling pathways is vital to understand EBV tumorigenesis. In this study we propose a noise-tolerant homolog knowledge transfer method to reconstruct functional protein-protein interactions (PPI) networks between Epstein-Barr virus and Homo sapiens. The training set is augmented via homolog instances and the homolog noise is counteracted by support vector machine (SVM). Additionally we propose two methods to define subcellular co-localization (i.e. stringent and relaxed), based on which to further derive physical PPI networks. Computational results show that the proposed method achieves sound performance of cross validation and independent test. In the space of 648,672 EBV-human protein pairs, we obtain 51,485 functional interactions (7.94%), 869 stringent physical PPIs and 46,050 relaxed physical PPIs. Fifty-eight evidences are found from the latest database and recent literature to validate the model. This study reveals that Epstein-Barr virus interferes with normal human cell life, such as cholesterol homeostasis, blood coagulation, EGFR binding, p53 binding, Notch signaling, Hedgehog signaling, etc. The proteome-wide predictions are provided in the supplementary file for further biomedical research. PMID:27470517

  16. Exploring the Transcriptome Landscape of Pomegranate Fruit Peel for Natural Product Biosynthetic Gene and SSR Marker Discovery

    Institute of Scientific and Technical Information of China (English)

    Nadia Nicole Ono; Monica Therese Britton; Joseph Nathaniel Fass; Charles Meyer Nicolet; Dawei Lin; Li Tian

    2011-01-01

    Pomegranate fruit peel is rich in bioactive plant natural products, such as hydrolyzable tannins and anthocyanins. Despite their documented roles in human nutrition and fruit quality, genes involved in natural product biosynthesis have not been cloned from pomegranate and very little sequence information is available on pomegranate in the public domain. Shotgun transcriptome sequencing of pomegranate fruit peel cDNA was performed using RNA-Seq on the Illumina Genome Analyzer platform. Over 100 million raw sequence reads were obtained and assembled into 9,839 transcriptome assemblies (TAs) (>200 bp). Candidate genes for hydrolyzable tannin, anthocyanin, flavonoid, terpenoid and fatty acid biosynthesis and/or regulation were identified. Three lipid transfer proteins were obtained that may contribute to the previously reported IgE reactivity of pomegranate fruit extracts. In addition, 115 SSR markers were identified from the pomegranate fruit peel transcriptome and primers were designed for 77 SSR markers. The pomegranate fruit peel transcriptome set provides a valuable platform for natural product biosynthetic gene and SSR marker discovery in pomegranate. This work also demonstrates that next-generation transcriptome sequencing is an economical and effective approach for investigating natural product biosynthesis, identifying genes controlling important agronomic traits, and discovering molecular markers in non-model specialty crop species.

  17. ETS gene fusions in prostate cancer: from discovery to daily clinical practice.

    NARCIS (Netherlands)

    Tomlins, S.A.; Bjartell, A.; Chinnaiyan, A.M.; Jenster, G.; Nam, R.K.; Rubin, M.A.; Schalken, J.A.

    2009-01-01

    CONTEXT: In 2005, fusions between the androgen-regulated transmembrane protease serine 2 gene, TMPRSS2, and E twenty-six (ETS) transcription factors were discovered in prostate cancer. OBJECTIVE: To review advances in our understanding of ETS gene fusions, focusing on challenges affecting translatio

  18. Discovery of putative capsaicin biosynthetic genes by RNA-Seq and digital gene expression analysis of pepper

    Science.gov (United States)

    Zhang, Zi-Xin; Zhao, Shu-Niu; Liu, Gao-Feng; Huang, Zu-Mei; Cao, Zhen-Mu; Cheng, Shan-Han; Lin, Shi-Sen

    2016-01-01

    The Indian pepper ‘Guijiangwang’ (Capsicum frutescens L.), one of the world’s hottest chili peppers, is rich in capsaicinoids. The accumulation of the alkaloid capsaicin and its analogs in the epidermal cells of the placenta contribute to the pungency of Capsicum fruits. To identify putative genes involved in capsaicin biosynthesis, RNA-Seq was used to analyze the pepper’s expression profiles over five developmental stages. Five cDNA libraries were constructed from the total RNA of placental tissue and sequenced using an Illumina HiSeq 2000. More than 19 million clean reads were obtained from each library, and greater than 50% of the reads were assignable to reference genes. Digital gene expression (DGE) profile analysis using Solexa sequencing was performed at five fruit developmental stages and resulted in the identification of 135 genes of known function; their expression patterns were compared to the capsaicin accumulation pattern. Ten genes of known function were identified as most likely to be involved in regulating capsaicin synthesis. Additionally, 20 new candidate genes were identified related to capsaicin synthesis. We use a combination of RNA-Seq and DGE analyses to contribute to the understanding of the biosynthetic regulatory mechanism(s) of secondary metabolites in a nonmodel plant and to identify candidate enzyme-encoding genes. PMID:27756914

  19. Discovery of mitochondrial chimeric-gene associated with cytoplasmic male sterility of HL-rice

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    The mitochondrial genome libraries of the HL-type sterile line (A) and maintainer line (B) have been constructed. The mitochondrial gene atp6 was used to screen the libraries, because the Southern and Northern blot results differed between the sterile and maintainer lines. Sequencing analysis of positive clones showed that there were two copies of the atp6 gene in the sterile line and only one in the maintainer line. One copy of atp6 in the sterile line was identical to that in the maintainer line; the other showed a different flanking sequence from the 49th nucleotide downstream of the termination codon of the atp6 gene. A new chimeric gene, orfH79, was found in the region. OrfH79 had homology to the mitochondrial genes coxⅡ and orfl07, and was specific to HL-sterile cytoplasm.

  20. Correlating overrepresented upstream motifs to gene expression a computational approach to regulatory element discovery in eukaryotes

    CERN Document Server

    Caselle, M; Provero, P

    2002-01-01

    Gene regulation in eukaryotes is mainly effected through transcription factors binding to rather short recognition motifs generally located upstream of the coding region. We present a novel computational method to identify regulatory elements in the upstream region of eukaryotic genes. The genes are grouped in sets sharing an overrepresented short motif in their upstream sequence. For each set, the average expression level from a microarray experiment is determined: if this level is significantly higher or lower than the average taken over the whole genome, then the overrepresented motif shared by the genes in the set is likely to play a role in their regulation. The method was tested by applying it to the genome of Saccharomyces cerevisiae, using the publicly available results of a DNA microarray experiment, in which expression levels for virtually all the genes were measured during the diauxic shift from fermentation to respiration. Several known motifs were correctly identified, and a new candidate regulat...
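    The comparison described in this abstract (set average versus genome-wide average) can be sketched as a simple z-score. The abstract does not state which significance test the authors used, so the statistic below is an assumption for illustration only; the gene names and expression values are invented.

```python
import math
from statistics import mean, pstdev


def motif_set_zscore(expression, gene_set):
    """Z-score of a gene set's mean expression against the genome-wide mean.

    expression: dict mapping gene -> expression value (e.g. a log ratio)
    gene_set:   genes that share an overrepresented upstream motif
    """
    all_values = list(expression.values())
    genome_mean = mean(all_values)
    genome_sd = pstdev(all_values)
    set_values = [expression[g] for g in gene_set]
    # Standard error of a mean of len(set_values) draws from the genome.
    standard_error = genome_sd / math.sqrt(len(set_values))
    return (mean(set_values) - genome_mean) / standard_error


# Invented toy data: g1 and g2 share a motif and are both induced.
expression = {"g1": 2.0, "g2": 2.2, "g3": -0.1,
              "g4": 0.1, "g5": -0.2, "g6": 0.0}
z_induced = motif_set_zscore(expression, ["g1", "g2"])
z_other = motif_set_zscore(expression, ["g3", "g5"])
print(round(z_induced, 2), round(z_other, 2))
```

    A large positive or negative score marks a motif whose gene set departs from the genome-wide average. With many motifs tested at once, a multiple-testing correction would also be needed.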

  1. Gun Possession among American Youth: A Discovery-Based Approach to Understand Gun Violence

    OpenAIRE

    Kelly V Ruggles; Sonali Rajan

    2014-01-01

    OBJECTIVE: To apply discovery-based computational methods to nationally representative data from the Centers for Disease Control and Prevention's Youth Risk Behavior Surveillance System to better understand and visualize the behavioral factors associated with gun possession among adolescent youth. RESULTS: Our study uncovered the multidimensional nature of gun possession across nearly five million unique data points over a ten year period (2001-2011). Specifically, we automated odds ratio cal...
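    As a reminder of the quantity being automated here, an odds ratio for one behavior against one outcome can be computed from a 2x2 table. This is a minimal sketch with invented counts. The confidence interval uses the standard log (Woolf) method, which may or may not be what the study used.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI from a 2x2 table.

    a: exposed, outcome present     b: exposed, outcome absent
    c: unexposed, outcome present   d: unexposed, outcome absent
    All four cells must be non-zero for this formula.
    """
    ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# Invented counts: 20 of 100 exposed respondents report the outcome,
# versus 10 of 100 unexposed respondents.
ratio, lower, upper = odds_ratio(20, 80, 10, 90)
print(ratio, lower, upper)
```

    Automating the calculation then amounts to looping this function over every pair of survey items of interest.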

  2. Research on Hotspot Discovery in Internet Public Opinions Based on Improved K-Means

    OpenAIRE

    Gensheng Wang

    2013-01-01

    Discovering hotspots in Internet public opinion effectively is an active research field, and it plays a key role in helping governments and corporations find useful information in the mass of data on the Internet. An improved K-means algorithm for hotspot discovery in Internet public opinion is presented, based on an analysis of the defects and calculation principle of the original K-means algorithm. First, some new methods are designed to preprocess website texts, sele...
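    For reference, the original K-means procedure that the paper sets out to improve can be sketched as follows. This is the plain Lloyd iteration with a naive initialization, not the improved variant, whose details are cut off in this record. The 2-D points stand in for text feature vectors and are invented.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, max_iters=100):
    """Plain K-means: returns (labels, centroids).

    Initialization simply takes the first k points, which keeps the
    sketch deterministic but is one of the known weaknesses of the
    basic algorithm.
    """
    centroids = [list(p) for p in points[:k]]
    labels = []
    for _ in range(max_iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[j])  # keep an empty cluster in place
        if updated == centroids:
            break  # converged: assignments can no longer change
        centroids = updated
    return labels, centroids


points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

    Sensitivity to the initial centroids and the need to fix k in advance are the usual targets of "improved" variants. Which of these the paper addresses cannot be determined from the truncated abstract.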

  3. Natural Genetic Variation in Cassava (Manihot esculenta Crantz) Landraces: A Tool for Gene Discovery

    International Nuclear Information System (INIS)

    Cassava landraces are the earliest form of the modern cultivars and represent the first step in cassava domestication. Our forward genetic analysis uses this resource to discover spontaneous mutations in sucrose/starch and carotenoid synthesis/accumulation and to develop both an evolutionary and a breeding perspective of gene function related to those traits. Biochemical phenotype variants for the synthesis and accumulation of carotenoid, free sugar and starch were identified. Six subtractive cDNA libraries were prepared to construct a high-quality (phred > 20) EST database with 1,645 entries. Macroarray and microarray analyses were performed to identify differentially expressed genes, with the aim of identifying candidate genes related to the sugary phenotype and carotenoid diversity. cDNA sequences for genes coding for specific enzymes in the two pathways were obtained. Expression analysis of genes coding for specific enzymes was performed by RNA blot and real-time PCR analysis. Chromoplast-associated proteins of yellow storage root were fractionated and a peptide sequence database with 906 sequence entries (MASCOT validated) was constructed. For sucrose/starch metabolism, a sugary class of cassava was identified, carrying a mutation in the BEI and GBSS genes. For the pigmented cassava, a pink color phenotype showed absence of expression of the gene CasLYB, while an intense yellow phenotype showed downregulation of the gene CasHYb. Heat shock proteins were identified as the major proteins associated with carotenoid. Genetic diversity for the GBSS gene in the natural population identified 22 haplotypes and a large nucleotide diversity in four population subsets. Single segregating populations derived from F2, half-sibling and S1 populations showed segregation for the sugary phenotype (93% of individuals), waxy phenotype (38% of individuals) and glycogen-like starch (2% of individuals). Here we summarize our current results for the genetic analysis of these variants and recent

  4. Structure-Based Drug Discovery for Prion Disease Using a Novel Binding Simulation

    Directory of Open Access Journals (Sweden)

    Daisuke Ishibashi

    2016-07-01

    The accumulation of abnormal prion protein (PrPSc) converted from the normal cellular isoform of PrP (PrPC) is assumed to induce pathogenesis in prion diseases. Therefore, drug discovery studies for these diseases have focused on the protein conversion process. We used a structure-based drug discovery algorithm (termed Nagasaki University Docking Engine: NUDE) that ran on an intensive supercomputer with a graphics-processing unit to identify several compounds with anti-prion effects. Among the candidates showing a high binding score, the compounds exhibited direct interaction with recombinant PrP in vitro, and drastically reduced PrPSc and protein aggresomes in the prion-infected cells. The fragment molecular orbital calculation showed that the van der Waals interaction played a key role in PrPC binding as the intermolecular interaction mode. Furthermore, PrPSc accumulation and microgliosis were significantly reduced in the brains of treated mice, suggesting that the drug candidates provided protection from prion disease, although further in vivo tests are needed to confirm these findings. This NUDE-based approach to structure-based drug discovery for normal protein structures is likely to be useful for the development of drugs to treat other conformational disorders, such as Alzheimer's disease.

  5. A multi-gene transcriptional profiling approach to the discovery of cell signature markers.

    Science.gov (United States)

    Wada, Youichiro; Li, Dan; Merley, Anne; Zukauskas, Andrew; Aird, William C; Dvorak, Harold F; Shih, Shou-Ching

    2011-01-01

    A profile of transcript abundances from multiple genes constitutes a molecular signature if the expression pattern is unique to one cell type. Here we measure mRNA copy numbers per cell by normalizing per million copies of 18S rRNA and identify 6 genes (TIE1, KDR, CDH5, TIE2, EFNA1 and MYO5C) out of 79 genes tested as excellent molecular signature markers for endothelial cells (ECs) in vitro. The selected genes are uniformly expressed in ECs of 4 different origins but weakly or not expressed in 4 non-EC cell lines. A multi-gene transcriptional profile of these 6 genes clearly distinguishes ECs from non-ECs in vitro. We conclude that (i) a profile of mRNA copy numbers per cell from a well-chosen multi-gene panel can act as a sensitive and accurate cell type signature marker, and (ii) the method described here can be applied to in vivo cell fingerprinting and molecular diagnosis. PMID:20972619
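    The normalization in this abstract (mRNA copies expressed per million copies of 18S rRNA) is simple arithmetic and can be sketched as below. The marker criterion in `is_signature_marker` is a hypothetical rule added only to make the example complete; the abstract does not give the authors' actual selection threshold, and all numbers are invented.

```python
def copies_per_million_18s(mrna_copies, rrna_18s_copies):
    """Express an mRNA copy number per one million copies of 18S rRNA."""
    return mrna_copies * 1_000_000 / rrna_18s_copies


def is_signature_marker(ec_values, non_ec_values, min_fold=10.0):
    """Hypothetical criterion: the weakest endothelial-cell (EC) value must
    exceed the strongest non-EC value by at least min_fold."""
    return min(ec_values) >= min_fold * max(non_ec_values)


# Invented measurement: 250 mRNA copies alongside 5,000,000 18S rRNA copies.
normalized = copies_per_million_18s(250, 5_000_000)

# Invented normalized values for one gene in 4 EC and 4 non-EC cell types.
ec_profile = [52.0, 48.5, 61.0, 50.0]
non_ec_profile = [0.4, 0.0, 1.2, 0.9]
marker = is_signature_marker(ec_profile, non_ec_profile)
print(normalized, marker)
```

    Normalizing to 18S rRNA puts all genes and cell types on a common scale, which is what allows the per-gene values to be combined into a multi-gene profile.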

  6. Discovery of clubroot-resistant genes in Brassica napus by transcriptome sequencing.

    Science.gov (United States)

    Chen, S W; Liu, T; Gao, Y; Zhang, C; Peng, S D; Bai, M B; Li, S J; Xu, L; Zhou, X Y; Lin, L B

    2016-01-01

    Clubroot significantly affects plants of the Brassicaceae family and is one of the main diseases causing serious losses in B. napus yield. Few studies have investigated the clubroot-resistance mechanism in B. napus. Identification of clubroot-resistant genes may be used in clubroot-resistant breeding, as well as to elucidate the molecular mechanism behind B. napus clubroot-resistance. We used three B. napus transcriptome samples to construct a transcriptome sequencing library by using Illumina HiSeq™ 2000 sequencing and bioinformatic analysis. In total, 171 million high-quality reads were obtained, containing 96,149 unigenes of N50-value. We aligned the obtained unigenes with the Nr, Swiss-Prot, clusters of orthologous groups, and gene ontology databases and annotated their functions. In the Kyoto encyclopedia of genes and genomes database, 25,033 unigenes (26.04%) were assigned to 124 pathways. Many genes, including broad-spectrum disease-resistance genes, specific clubroot-resistant genes, and genes related to indole-3-acetic acid (IAA) signal transduction, cytokinin synthesis, and myrosinase synthesis in the Huashuang 3 variety of B. napus were found to be related to clubroot-resistance. The effective clubroot-resistance observed in this variety may be due to the induced increased expression of these disease-resistant genes and strong inhibition of the IAA signal transduction, cytokinin synthesis, and myrosinase synthesis. The homology observed between unigenes 0048482, 0061770 and the Crr1 gene shared 94% nucleotide similarity. Furthermore, unigene 0061770 could have originated from an inversion of the Crr1 5'-end sequence. PMID:27525940

  7. Gene discovery for the bark beetle-vectored fungal tree pathogen Grosmannia clavigera

    Directory of Open Access Journals (Sweden)

    Robertson Gordon

    2010-10-01

    Abstract Background Grosmannia clavigera is a bark beetle-vectored fungal pathogen of pines that causes wood discoloration and may kill trees by disrupting nutrient and water transport. Trees respond to attacks from beetles and associated fungi by releasing terpenoid and phenolic defense compounds. It is unclear which genes are important for G. clavigera's ability to overcome antifungal pine terpenoids and phenolics. Results We constructed seven cDNA libraries from eight G. clavigera isolates grown under various culture conditions, and Sanger sequenced the 5' and 3' ends of 25,000 cDNA clones, resulting in 44,288 high quality ESTs. The assembled dataset of unique transcripts (unigenes) consists of 6,265 contigs and 2,459 singletons that mapped to 6,467 locations on the G. clavigera reference genome, representing ~70% of the predicted G. clavigera genes. Although only 54% of the unigenes matched characterized proteins at the NCBI database, this dataset extensively covers major metabolic pathways, cellular processes, and genes necessary for response to environmental stimuli and genetic information processing. Furthermore, we identified genes expressed in spores prior to germination, and genes involved in response to treatment with lodgepole pine phloem extract (LPPE). Conclusions We provide a comprehensively annotated EST dataset for G. clavigera that represents a rich resource for gene characterization in this and other ophiostomatoid fungi. Genes expressed in response to LPPE treatment are indicative of fungal oxidative stress response. We identified two clusters of potentially functionally related genes responsive to LPPE treatment. Furthermore, we report a simple method for identifying contig misassemblies in de novo assembled EST collections caused by gene overlap on the genome.

  9. Discovery of diversity in xylan biosynthetic genes by transcriptional profiling of a heteroxylan containing mucilaginous tissue

    Directory of Open Access Journals (Sweden)

    Jacob Kruger Jensen

    2013-06-01

    The exact biochemical steps of xylan backbone synthesis remain elusive. In Arabidopsis, three non-redundant genes from two glycosyltransferase (GT) families, IRX9 and IRX14 from GT43 and IRX10 from GT47, are candidates for forming the xylan backbone. In other plants, evidence exists that different tissues express these three genes at widely different levels, which suggests that diversity in the makeup of the xylan synthase complex exists. Recently we have profiled the transcripts present in the developing mucilaginous tissue of psyllium (Plantago ovata Forsk). This tissue was found to have high expression levels of an IRX10 homolog, but very low levels of the two GT43 family members. This contrasts with recent wheat endosperm tissue profiling that found a relatively high abundance of the GT43 family members. We have performed an in-depth analysis of all GT genes expressed in four developmental stages of the psyllium mucilaginous layer and in a single stage of the psyllium stem using RNA-Seq. This analysis revealed several IRX10 homologs, an expansion in GT61 (homologs of At3g18170/At3g18180), and several GTs from other GT families that are highly abundant and specifically expressed in the mucilaginous tissue. Our current hypothesis is that the four IRX10 genes present in the mucilaginous tissues have evolved to function without the GT43 genes. These four genes represent some of the most divergent IRX10 genes identified to date. Conversely, those present in the psyllium stem are very similar to those in other eudicots. This suggests these genes are under selective pressure, likely due to the synthesis of the various xylan structures present in mucilage, which has a different biochemical role than that present in secondary walls. The numerous GT61 family members also show a wide sequence diversity and may be responsible for the larger number of side chain structures present in the psyllium mucilage.

  10. Analysis of cassava (Manihot esculenta) ESTs: A tool for the discovery of genes

    International Nuclear Information System (INIS)

    Cassava (Manihot esculenta) is the main source of calories for more than 1,000 million people around the world and has been consolidated as the fourth most important crop after rice, corn and wheat. Cassava is considered tolerant to abiotic and biotic stress conditions; nevertheless, these characteristics are mainly present in non-commercial varieties. Genetic breeding strategies represent an alternative for introducing the desirable characteristics into commercial varieties. A fundamental step for accelerating the genetic breeding process in cassava is the identification of genes associated with these characteristics. One rapid strategy for the identification of genes is to use a large collection of ESTs (expressed sequence tags). In this study, a complete analysis of cassava ESTs was done. The cassava ESTs represent 80,459 sequences, which were assembled into a set of 29,231 unique genes (unigenes), comprising 10,945 contigs and 18,286 singletons. These 29,231 unique genes represent about 80% of the genes of the cassava genome. Between 5% and 10% of the cassava unigenes do not show similarity to any sequences present in the NCBI database and could be considered cassava-specific genes. A functional category was assigned to a group of sequences of the unigene set (29%) following the Gene Ontology vocabulary. The molecular function component was the best represented with 43% of the sequences, followed by the biological process component (38%) and finally the cellular component with 19%. In the cassava EST collection, 3,709 microsatellites were identified, and they could be used as molecular markers. This study represents an important contribution to the knowledge of the functional genomic structure of cassava and constitutes an important tool for the identification of genes associated with agricultural characteristics of interest that could be employed in cassava breeding programs.

  11. Comparison of seven methods for producing Affymetrix expression scores based on False Discovery Rates in disease profiling data

    Directory of Open Access Journals (Sweden)

    Gruber Stephen B

    2005-02-01

    Full Text Available Abstract Background A critical step in processing oligonucleotide microarray data is combining the information in multiple probes to produce a single number that best captures the expression level of an RNA transcript. Several systematic studies comparing multiple methods for array processing have used tightly controlled calibration data sets as the basis for comparison. Here we compare performances for seven processing methods using two data sets originally collected for disease profiling studies. An emphasis is placed on understanding sensitivity for detecting differentially expressed genes in terms of two key statistical determinants: test statistic variability for non-differentially expressed genes, and test statistic size for truly differentially expressed genes. Results In the two data sets considered here, up to seven-fold variation across the processing methods was found in the number of genes detected at a given false discovery rate (FDR). The best performing methods called up to 90% of the same genes differentially expressed, had less variable test statistics under randomization, and had a greater number of large test statistics in the experimental data. Poor performance of one method was directly tied to a tendency to produce highly variable test statistic values under randomization. Based on an overall measure of performance, two of the seven methods (Dchip and a trimmed mean approach) are superior in the two data sets considered here. Two other methods (MAS5 and GCRMA-EB) are inferior, while results for the other three methods are mixed. Conclusions Choice of processing method has a major impact on differential expression analysis of microarray data. Previously reported performance analyses using tightly controlled calibration data sets are not highly consistent with results reported here using data from human tissue samples. Performance of array processing methods in disease profiling and other realistic biological studies should be
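
    To make "the number of genes detected at a given false discovery rate" concrete, the following is a minimal sketch of one common way to obtain such a count, the Benjamini-Hochberg step-up procedure. The abstract does not state which FDR estimator the authors used (it refers to randomization), so the procedure, the function name benjamini_hochberg and the example p-values are all assumptions made for illustration.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return the indices of tests declared significant at the given FDR level.

    Illustrative sketch of the Benjamini-Hochberg step-up rule; this is an
    assumed stand-in, not the estimator used in the study above.
    """
    m = len(p_values)
    # Rank the tests from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value lies under the BH line.
        if p_values[idx] <= fdr * rank / m:
            cutoff_rank = rank
    # Every test ranked at or below the cutoff is called significant.
    return sorted(order[:cutoff_rank])


# Hypothetical per-gene p-values from one processing method.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
detected = benjamini_hochberg(example_p, fdr=0.05)
print(len(detected), "genes detected at FDR 0.05")
```

    Running the same function on the p-values produced by each processing method gives the per-method gene counts that a comparison of this kind would tabulate.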

  12. Discovery and Characterization of Two Novel Salt-Tolerance Genes in Puccinellia tenuiflora

    Directory of Open Access Journals (Sweden)

    Ying Li

    2014-09-01

    Full Text Available Puccinellia tenuiflora is a monocotyledonous halophyte that is able to survive in extreme saline soil environments at an alkaline pH range of 9–10. In this study, we transformed full-length cDNAs of P. tenuiflora into Saccharomyces cerevisiae by using the full-length cDNA over-expressing gene-hunting system to identify novel salt-tolerance genes. In all, 32 yeast clones overexpressing P. tenuiflora cDNA were obtained by screening under NaCl stress conditions; of these, 31 clones showed stronger tolerance to NaCl and were amplified using polymerase chain reaction (PCR) and sequenced. Four novel genes encoding proteins with unknown function were identified; these genes had no homology with genes from higher plants. Of the four isolated genes, two that encoded proteins with two transmembrane domains showed the strongest resistance to 1.3 M NaCl. RT-PCR and northern blot analysis of P. tenuiflora cultured cells confirmed the endogenous NaCl-induced expression of the two proteins. Both of the proteins conferred better tolerance in yeasts to high salt, alkaline and osmotic conditions, some heavy metals and H2O2 stress. Thus, we inferred that the two novel proteins might alleviate oxidative and other stresses in P. tenuiflora.

  13. Discovery and characterization of novel vascular and hematopoietic genes downstream of etsrp in zebrafish.

    Directory of Open Access Journals (Sweden)

    Gustavo A Gomez

    Full Text Available The transcription factor Etsrp is required for vasculogenesis and primitive myelopoiesis in zebrafish. When ectopically expressed, etsrp is sufficient to induce the expression of many vascular and myeloid genes in zebrafish. The mammalian homolog of etsrp, ER71/Etv2, is also essential for vascular and hematopoietic development. To identify genes downstream of etsrp, gain-of-function experiments were performed for etsrp in zebrafish embryos followed by transcription profile analysis by microarray. Subsequent in vivo expression studies resulted in the identification of fourteen genes with blood and/or vascular expression, six of these being completely novel. Regulation of these genes by etsrp was confirmed by ectopic induction in etsrp overexpressing embryos and decreased expression in etsrp deficient embryos. Additional functional analysis of two newly discovered genes, hapln1b and sh3gl3, demonstrates their importance in embryonic vascular development. The results described here identify a group of genes downstream of etsrp likely to be critical for vascular and/or myeloid development.

  14. Contributions of computational chemistry and biophysical techniques to fragment-based drug discovery.

    Science.gov (United States)

    Gozalbes, Rafael; Carbajo, Rodrigo J; Pineda-Lucena, Antonio

    2010-01-01

    In the last decade, fragment-based drug discovery (FBDD) has evolved from a novel approach in the search of new hits to a valuable alternative to the high-throughput screening (HTS) campaigns of many pharmaceutical companies. The increasing relevance of FBDD in the drug discovery universe has been concomitant with an implementation of the biophysical techniques used for the detection of weak inhibitors, e.g. NMR, X-ray crystallography or surface plasmon resonance (SPR). At the same time, computational approaches have also been progressively incorporated into the FBDD process and nowadays several computational tools are available. These stretch from the filtering of huge chemical databases in order to build fragment-focused libraries comprising compounds with adequate physicochemical properties, to more evolved models based on different in silico methods such as docking, pharmacophore modelling, QSAR and virtual screening. In this paper we will review the parallel evolution and complementarities of biophysical techniques and computational methods, providing some representative examples of drug discovery success stories by using FBDD.

  15. Enhancing service discovery using cat swarm optimisation based web service clustering

    Directory of Open Access Journals (Sweden)

    Sunaina Kotekar

    2016-09-01

    Full Text Available Web service discovery is a critical task in service oriented application development. Due to extensive proliferation in the number of available services, it is challenging to obtain all the relevant services available for a given task. For the retrieval of most relevant Web services, a user would have to use those service-specific terms that best describe and match the natural language documentation contained within a service description. This process can be time intensive, due to functional diversity of available services in a repository. Domain specific clustering of Web Services based on the similarities of their functionalities would greatly boost the ability of a Web service search engine to retrieve the most relevant service. In this paper, we propose a novel technique to cluster service documents into functionally similar service groups using the Cat Swarm Optimisation Algorithm. We present experimental results that show that the proposed technique was effective and enhanced the process of service discovery.

  16. A wavelet-based approach to the discovery of themes and sections in monophonic melodies

    DEFF Research Database (Denmark)

    Velarde, Gissel; Meredith, David

    We present the computational method submitted to the MIREX 2014 Discovery of Repeated Themes & Sections task, and the results on the monophonic version of the JKU Patterns Development Database. In the context of pattern discovery in monophonic music, the idea behind our method is that, with a good...... melodic structure in terms of segments, it should be possible to gather similar segments into clusters and rank their salience within the piece. We present an approach to this problem and how we address it. In general terms, we represent melodies either as raw 1D pitch signals or as these signals filtered...... with the continuous wavelet transform (CWT) using the Haar wavelet. We then segment the signal either into constant duration segments or at the resulting coefficients’ modulus local maxima. Segments are concatenated based on their contiguous city-block distance. The concatenated segments are compared using city...

  17. Research on Hotspot Discovery in Internet Public Opinions Based on Improved K-Means

    Directory of Open Access Journals (Sweden)

    Gensheng Wang

    2013-01-01

    Full Text Available How to discover hotspots in Internet public opinion effectively is an active research topic, and it plays a key role in helping governments and corporations find useful information in the mass of data on the Internet. An improved K-means algorithm for hotspot discovery in Internet public opinion is presented, based on an analysis of the existing defects and the calculation principle of the original K-means algorithm. First, new methods are designed to preprocess website texts, to select and express the characteristics of website texts, and to define the similarity between two website texts. Second, the clustering principle and the method of selecting the initial classification centers are analyzed and improved in order to overcome the limitations of the original K-means algorithm. Finally, the experimental results verify that the improved algorithm can improve the clustering stability and classification accuracy of hotspot discovery in Internet public opinion when used in practice.

  18. Research on hotspot discovery in internet public opinions based on improved K-means.

    Science.gov (United States)

    Wang, Gensheng

    2013-01-01

    How to discover hotspots in Internet public opinion effectively is an active research topic, and it plays a key role in helping governments and corporations find useful information in the mass of data on the Internet. An improved K-means algorithm for hotspot discovery in Internet public opinion is presented, based on an analysis of the existing defects and the calculation principle of the original K-means algorithm. First, new methods are designed to preprocess website texts, to select and express the characteristics of website texts, and to define the similarity between two website texts. Second, the clustering principle and the method of selecting the initial classification centers are analyzed and improved in order to overcome the limitations of the original K-means algorithm. Finally, the experimental results verify that the improved algorithm can improve the clustering stability and classification accuracy of hotspot discovery in Internet public opinion when used in practice.
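
    For orientation, the baseline that the abstract sets out to improve can be sketched as follows. This is the plain (original) K-means loop, not the improved algorithm, whose details the abstract does not give. Taking the first k points as initial centers is a deliberately naive assumption that mirrors the initialization weakness the abstract mentions; the function name kmeans and the toy 2-D vectors standing in for text feature vectors are hypothetical.

```python
def kmeans(points, k, iterations=20):
    """Plain K-means on equal-length numeric tuples (illustrative sketch).

    Initial centers are simply the first k points, an assumed and naive
    choice; the improved algorithm described above selects them differently.
    """
    centers = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center
        # (squared Euclidean distance).
        for i, p in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # leave an empty cluster's center where it is
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centers


# Hypothetical feature vectors for four short texts.
docs = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels, centers = kmeans(docs, k=2)
print(labels, centers)
```

    Because both initial centers here fall in the same true cluster, the first pass produces a poor split that later passes have to repair, which illustrates why the choice of initial centers affects clustering stability.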

  19. Use of model organism and disease databases to support matchmaking for human disease gene discovery.

    Science.gov (United States)

    Mungall, Christopher J; Washington, Nicole L; Nguyen-Xuan, Jeremy; Condit, Christopher; Smedley, Damian; Köhler, Sebastian; Groza, Tudor; Shefchek, Kent; Hochheiser, Harry; Robinson, Peter N; Lewis, Suzanna E; Haendel, Melissa A

    2015-10-01

    The Matchmaker Exchange application programming interface (API) allows searching a patient's genotypic or phenotypic profiles across clinical sites, for the purposes of cohort discovery and variant disease causal validation. This API can be used not only to search for matching patients, but also to match against public disease and model organism data. This public disease data enable matching known diseases and variant-phenotype associations using phenotype semantic similarity algorithms developed by the Monarch Initiative. The model data can provide additional evidence to aid diagnosis, suggest relevant models for disease mechanism and treatment exploration, and identify collaborators across the translational divide. The Monarch Initiative provides an implementation of this API for searching multiple integrated sources of data that contextualize the knowledge about any given patient or patient family into the greater biomedical knowledge landscape. While this corpus of data can aid diagnosis, it is also the beginning of research to improve understanding of rare human diseases. PMID:26269093

  20. Transcriptome analysis and discovery of genes involved in immune pathways from hepatopancreas of microbial challenged mitten crab Eriocheir sinensis.

    Directory of Open Access Journals (Sweden)

    Xihong Li

    Full Text Available BACKGROUND: The Chinese mitten crab Eriocheir sinensis is an economically important crustacean that has been seriously affected by various diseases, which calls for more information on immune-relevant genes at the genome level. Recently, high-throughput RNA sequencing (RNA-seq) technology has provided a powerful and efficient method for transcript analysis and immune gene discovery. METHODS/PRINCIPAL FINDINGS: A cDNA library from hepatopancreas of E. sinensis challenged by a mixture of three pathogen strains (Gram-positive bacteria Micrococcus luteus, Gram-negative bacteria Vibrio alginolyticus and fungi Pichia pastoris; 10^8 cfu·mL^-1) was constructed and randomly sequenced using the Illumina technique. In total, 39.76 million clean reads were assembled into 70,300 unigenes. After ruling out short-length and low-quality sequences, 52,074 non-redundant unigenes were compared to public databases for homology searching and 17,617 of them showed high similarity to sequences in the NCBI non-redundant protein (Nr) database. For function classification and pathway assignment, 18,734 (36.00%) unigenes were categorized to three Gene Ontology (GO) categories, 12,243 (23.51%) were classified to 25 Clusters of Orthologous Groups (COG), and 8,983 (17.25%) were assigned to six Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Potentially, 24, 14, 47 and 132 unigenes were characterized as being involved in the Toll, IMD, JAK-STAT and MAPK pathways, respectively. CONCLUSIONS/SIGNIFICANCE: This is the first systematic transcriptome analysis of components relating to innate immune pathways in E. sinensis. Functional genes and putative pathways identified here will contribute to a better understanding of the immune system and to the prevention of various diseases in crab.

  1. Discovery of Phytophthora infestans genes expressed in planta through mining of cDNA libraries.

    Directory of Open Access Journals (Sweden)

    Roberto Sierra

    Full Text Available BACKGROUND: Phytophthora infestans (Mont.) de Bary causes late blight of potato and tomato, and has a broad host range within the Solanaceae family. Most studies of the Phytophthora-Solanum pathosystem have focused on gene expression in the host and have not analyzed pathogen gene expression in planta. METHODOLOGY/PRINCIPAL FINDINGS: We describe in detail an in silico approach to mine ESTs from inoculated host plants deposited in a database in order to identify particular pathogen sequences associated with disease. We identified candidate effector genes through mining of 22,795 ESTs corresponding to P. infestans cDNA libraries in compatible and incompatible interactions with hosts from the Solanaceae family. CONCLUSIONS/SIGNIFICANCE: We annotated genes of P. infestans expressed in planta associated with late blight using different approaches and assigned putative functions to 373 out of the 501 sequences found in the P. infestans genome draft, including putative secreted proteins, domains associated with pathogenicity and poorly characterized proteins ideal for further experimental studies. Our study provides a methodology for analyzing cDNA libraries and provides an understanding of the plant-oomycete pathosystems that is independent of the host, condition, or type of sample by identifying genes of the pathogen expressed in planta.

  2. Semantic MEDLINE for discovery browsing: using semantic predications and the literature-based discovery paradigm to elucidate a mechanism for the obesity paradox.

    Science.gov (United States)

    Cairelli, Michael J; Miller, Christopher M; Fiszman, Marcelo; Workman, T Elizabeth; Rindflesch, Thomas C

    2013-01-01

    Applying the principles of literature-based discovery (LBD), we elucidate the paradox that obesity is beneficial in critical care despite contributing to disease generally. Our approach enhances a previous extension to LBD, called "discovery browsing," and is implemented using Semantic MEDLINE, which summarizes the results of a PubMed search into an interactive graph of semantic predications. The methodology allows a user to construct argumentation underpinning an answer to a biomedical question by engaging the user in an iterative process between system output and user knowledge. Components of the Semantic MEDLINE output graph identified as "interesting" by the user both contribute to subsequent searches and are constructed into a logical chain of relationships constituting an explanatory network in answer to the initial question. Based on this methodology we suggest that phthalates leached from plastic in critical care interventions activate PPAR gamma, which is anti-inflammatory and abundant in obese patients.

  3. Physics-based gene identification: proof of concept for Plasmodium falciparum.

    Science.gov (United States)

    Yeramian, Edouard; Bonnefoy, Serge; Langsley, Gordon

    2002-01-01

    The ab initio prediction of new genes in eukaryotic genomes represents a difficult task, notably for the identification of complex split genes. A Physics-Based Gene Identification (PBGI) method was formulated recently (Yeramian, Gene, 255, 139-150, 151-168, 2000a,b) to address this problem, taking as a model the Plasmodium falciparum genome. Here, the predictive power of this method is put under experimental test for this genome. The presented results demonstrate the usefulness of the PBGI as a gene-identification tool for P. falciparum, notably for the discovery of new genes with no homology to known genes. Perspectives opened by this new method for other eukaryotic genomes are also mentioned.

  4. Generation of ESTs in Vitis vinifera wine grape (Cabernet Sauvignon) and table grape (Muscat Hamburg) and discovery of new candidate genes with potential roles in berry development.

    Science.gov (United States)

    Peng, Fred Y; Reid, Karen E; Liao, Nancy; Schlosser, James; Lijavetzky, Diego; Holt, Robert; Martínez Zapater, José M; Jones, Steven; Marra, Marco; Bohlmann, Jörg; Lund, Steven T

    2007-11-01

    We report the generation and analysis of a total of 77,583 expressed sequence tags (ESTs) from two grapevine (Vitis vinifera L.) cultivars, Cabernet Sauvignon (wine grape) and Muscat Hamburg (table grape) with a focus on EST sequence quality and assembly optimization. The majority of the ESTs were derived from normalized cDNA libraries representing berry pericarp and seed developmental series, pooled non-berry tissues including root, flower, and leaf in Cabernet Sauvignon, and pooled tissues of berry, seed, and flower in Muscat Hamburg. EST and unigene sequence quality were determined by computational filtering coupled with small-scale contig reassembly, manual review, and BLAST analyses. EST assembly was optimized to better discriminate among closely related paralogs using two independent grape sequence sets, a previously published set of Vitis spp. gene families and our EST dataset derived from pooled leaf, flower, and root tissues of Cabernet Sauvignon. Sequence assembly within individual libraries indicated that those prepared from pooled tissues contributed the most to gene discovery. Annotations based upon searches against multiple databases including tomato and strawberry sequences helped to identify putative functions of ESTs and unigenes, particularly with respect to fleshy fruit development. Sequence comparison among the three wine grape libraries identified a number of genes preferentially expressed in the pericarp tissue, including transcription factors, receptor-like protein kinases, and hexose transporters. Gene ontology (GO) classification in the biological process aspect showed that GO categories corresponding to 'transport' and 'cell organization and biogenesis', which are associated with metabolite movement and cell wall structural changes during berry ripening, were higher in pericarp than in other tissues in the wine grape studied. The sequence data were used to characterize potential roles of new genes in berry development and composition. 

  5. Discovery of molecular mechanisms of traditional Chinese medicinal formula Si-Wu-Tang using gene expression microarray and connectivity map.

    Directory of Open Access Journals (Sweden)

    Zhining Wen

    Full Text Available To pursue a systematic approach to discovery of mechanisms of action of traditional Chinese medicine (TCM), we used microarrays, bioinformatics and the "Connectivity Map" (CMAP) to examine TCM-induced changes in gene expression. We demonstrated that this approach can be used to elucidate new molecular targets using a model TCM herbal formula Si-Wu-Tang (SWT) which is widely used for women's health. The human breast cancer MCF-7 cells treated with 0.1 µM estradiol or 2.56 mg/ml of SWT showed dramatic gene expression changes, while no significant change was detected for ferulic acid, a known bioactive compound of SWT. Pathway analysis using differentially expressed genes related to the treatment effect identified that expression of genes in the nuclear factor erythroid 2-related factor 2 (Nrf2) cytoprotective pathway was most significantly affected by SWT, but not by estradiol or ferulic acid. The Nrf2-regulated genes HMOX1, GCLC, GCLM, SLC7A11 and NQO1 were upregulated by SWT in a dose-dependent manner, which was validated by real-time RT-PCR. Consistently, treatment with SWT and its four herbal ingredients resulted in an increased antioxidant response element (ARE)-luciferase reporter activity in MCF-7 and HEK293 cells. Furthermore, the gene expression profile of differentially expressed genes related to SWT treatment was used to compare with those of 1,309 compounds in the CMAP database. The CMAP profiles of estradiol-treated MCF-7 cells showed an excellent match with SWT treatment, consistent with SWT's widely claimed use for women's diseases and indicating a phytoestrogenic effect. The CMAP profiles of chemopreventive agents withaferin A and resveratrol also showed high similarity to the profiles of SWT. This study identified SWT as an Nrf2 activator and phytoestrogen, suggesting its use as a nontoxic chemopreventive agent, and demonstrated the feasibility of combining microarray gene expression profiling with CMAP mining to discover mechanisms

  6. Discovery and identification of candidate genes from the chitinase gene family for Verticillium dahliae resistance in cotton

    Science.gov (United States)

    Xu, Jun; Xu, Xiaoyang; Tian, Liangliang; Wang, Guilin; Zhang, Xueying; Wang, Xinyu; Guo, Wangzhen

    2016-01-01

    Verticillium dahliae, a destructive and soil-borne fungal pathogen, causes massive losses in cotton yields. However, the resistance mechanism to V. dahliae in cotton is still poorly understood. Accumulating evidence indicates that chitinases are crucial hydrolytic enzymes, which attack fungal pathogens by catalyzing the fungal cell wall degradation. As a large gene family, to date, the chitinase genes (Chis) have not been systematically analyzed and effectively utilized in cotton. Here, we identified 47, 49, 92, and 116 Chis from four sequenced cotton species, diploid Gossypium raimondii (D5), G. arboreum (A2), tetraploid G. hirsutum acc. TM-1 (AD1), and G. barbadense acc. 3–79 (AD2), respectively. The orthologous genes did not show one-to-one correspondence between the diploid and tetraploid cotton species, implying changes in the number of Chis in different cotton species during the evolution of Gossypium. Phylogenetic classification indicated that these Chis could be classified into six groups, with distinguishable structural characteristics. The expression patterns of Chis indicated their various expressions in different organs and tissues, and in the V. dahliae response. Silencing of Chi23, Chi32, or Chi47 in cotton significantly impaired the resistance to V. dahliae, suggesting these genes might act as positive regulators in disease resistance to V. dahliae. PMID:27354165

  7. Discovery and identification of candidate genes from the chitinase gene family for Verticillium dahliae resistance in cotton.

    Science.gov (United States)

    Xu, Jun; Xu, Xiaoyang; Tian, Liangliang; Wang, Guilin; Zhang, Xueying; Wang, Xinyu; Guo, Wangzhen

    2016-06-29

    Verticillium dahliae, a destructive and soil-borne fungal pathogen, causes massive losses in cotton yields. However, the resistance mechanism to V. dahliae in cotton is still poorly understood. Accumulating evidence indicates that chitinases are crucial hydrolytic enzymes, which attack fungal pathogens by catalyzing the fungal cell wall degradation. As a large gene family, to date, the chitinase genes (Chis) have not been systematically analyzed and effectively utilized in cotton. Here, we identified 47, 49, 92, and 116 Chis from four sequenced cotton species, diploid Gossypium raimondii (D5), G. arboreum (A2), tetraploid G. hirsutum acc. TM-1 (AD1), and G. barbadense acc. 3-79 (AD2), respectively. The orthologous genes did not show one-to-one correspondence between the diploid and tetraploid cotton species, implying changes in the number of Chis in different cotton species during the evolution of Gossypium. Phylogenetic classification indicated that these Chis could be classified into six groups, with distinguishable structural characteristics. The expression patterns of Chis indicated their various expressions in different organs and tissues, and in the V. dahliae response. Silencing of Chi23, Chi32, or Chi47 in cotton significantly impaired the resistance to V. dahliae, suggesting these genes might act as positive regulators in disease resistance to V. dahliae.

  8. Focusing on shared subpockets - new developments in fragment based drug discovery

    Science.gov (United States)

    Abdelraheem, Eman M. M.; Camacho, Carlos; Dömling, Alexander

    2016-01-01

    Introduction Protein–protein interactions (PPIs) are important targets for understanding fundamental biology and for the development of therapeutic agents. Based on different physicochemical properties, numerous pieces of software (e.g. PocketQuery, Anchor and FTMap) have been reported to find pockets on protein surfaces and have applications in facilitating the design and discovery of small molecular weight compounds which bind to these pockets. Areas covered The authors discuss a pocket-centric method of analyzing protein-protein interaction interfaces, which prioritizes their pockets for small molecule drug discovery, and the importance of multicomponent reaction (MCR) chemistry as a starting point for undruggable targets. The authors also provide their perspectives on the field. Expert opinion Only the tight interplay of efficient computational methods capable of screening a large chemical space and fast synthetic chemistry will lead to progress in the rational design of PPI antagonists in the future. Early drug discovery platforms will also benefit from efficient rapid feedback loops from early clinical research back to molecular design and the medicinal chemistry bench. PMID:26296101

  9. Context-aware computing-based reducing cost of service method in resource discovery and interaction

    Institute of Scientific and Technical Information of China (English)

    TANG Shan-cheng; HOU Yi-bin

    2004-01-01

    Reducing the cost of service is an important goal for resource discovery and interaction technologies. The shortcomings of the transhipment method and the hibernation method are, respectively, that they increase the overall cost of service and slow resource discovery. To overcome these shortcomings, a context-aware computing-based method is developed. This method first analyzes how devices use resource discovery and interaction technologies in order to identify the types of context relevant to reducing the cost of service; it then chooses effective measures, such as stopping broadcasts and hibernating, according to the information supplied by the context, rather than relying on the simple hibernation of the transhipment method. The experimental results indicate that under the worst condition this method overcomes the shortcomings of the transhipment method, makes the "poor" devices hibernate longer than the hibernation method so as to reduce the cost of service more effectively, and discovers resources faster than the hibernation method; under the best condition it is far better than the hibernation method in all aspects.

  10. Fuzzy-Based Knowledge Discovery from Heterogeneous Data in Planting Systems for Elderly LOHAS

    Institute of Scientific and Technical Information of China (English)

    Hung-Chih Hsueh; Jung-Yi Jiang; Jen-Sheng Tsai; Wen-Hao Tsai; Kuan-Rong Lee; Yau-Hwang Kuo

    2015-01-01

    In this paper, we propose a knowledge discovery method based on the fuzzy set theory to help elders with plant cultivation. Initially, the fuzzy sets are constructed by using the feature selection and statistical interval estimation. The min-max inference and the center of gravity defuzzification method are then used to output a candidate pattern set. Finally, a pattern discovery is adopted to obtain the patterns from the candidate set for the cultivation suggestions by considering the frequency weight and user’s experience. In order to demonstrate the performance of our method in planting systems, we conduct a clicks-and-mortar cultivation platform, namely Eden Garden, for the elderly lifestyles of health and sustainability (LOHAS). The experimental results show that the accuracy rate of our knowledge discovery method can reach up to 85%. Moreover, the results of the LOHAS index scale table present that the happiness of the elders is increasing while the elders are using our proposed method.
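
    The two named steps, min-max inference and center-of-gravity defuzzification, can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the soil-moisture input, the watering output, the triangular membership functions and the two rules are hypothetical and are not taken from the paper, whose fuzzy sets are built by feature selection and interval estimation.

```python
def triangle(x, left, peak, right):
    """Triangular membership function with support (left, right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def infer_watering(moisture):
    """Min-max inference followed by center-of-gravity defuzzification.

    Hypothetical rules: IF soil is dry THEN water much;
                        IF soil is wet THEN water little.
    Returns a crisp watering level in 0..100, or None if no rule fires.
    """
    # Rule firing strengths from the (assumed) input fuzzy sets.
    dry = triangle(moisture, 0, 20, 60)
    wet = triangle(moisture, 40, 80, 100)
    weighted_sum = 0.0
    total = 0.0
    for y in range(0, 101):  # sampled output universe
        # "min": clip each output set at its rule's firing strength.
        much = min(dry, triangle(y, 50, 75, 100))
        little = min(wet, triangle(y, 0, 25, 50))
        # "max": aggregate the clipped sets.
        mu = max(much, little)
        weighted_sum += y * mu
        total += mu
    if total == 0.0:
        return None
    # Center of gravity of the aggregated output set.
    return weighted_sum / total


for moisture in (20, 50, 80):
    print(moisture, infer_watering(moisture))
```

    With these assumed sets, a dry reading pulls the crisp output toward the "much" set and a wet reading toward the "little" set, while a reading that fires both rules equally lands midway between them.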

  11. Symbolic representation based on trend features for knowledge discovery in long time series

    Institute of Scientific and Technical Information of China (English)

    Hong YIN; Shu-qiang YANG; Xiao-qian ZHU; Shao-dong MA; Lu-min ZHANG

    2015-01-01

    The symbolic representation of time series has attracted much research interest recently. The high dimensionality typical of the data is challenging, especially as the time series becomes longer. The wide distribution of sensors collecting more and more data exacerbates the problem. Representing a time series effectively is an essential task for decision-making activities such as classification, prediction, and knowledge discovery. In this paper, we propose a new symbolic representation method for long time series based on trend features, called trend feature symbolic approximation (TFSA). The method uses a two-step mechanism to segment long time series rapidly. Unlike some previous symbolic methods, it focuses on retaining most of the trend features and patterns of the original series. A time series is represented by trend symbols, which are also suitable for use in knowledge discovery, such as association rules mining. TFSA provides the lower bounding guarantee. Experimental results show that, compared with some previous methods, it not only has better segmentation efficiency and classification accuracy, but also is applicable for use in knowledge discovery from time series.
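
    The general idea of representing a series by trend symbols can be sketched as below. This is not TFSA: the abstract does not describe its two-step segmentation or its symbol alphabet, so the fixed-length segments, the three symbols (u, d, f) and the function name trend_symbols are assumptions chosen only to show what a trend-based symbolic string looks like.

```python
def trend_symbols(series, segment_length, flat_tolerance=0.0):
    """Map each fixed-length segment to 'u' (up), 'd' (down) or 'f' (flat).

    Illustrative only: the trend of a segment is judged from the difference
    between its last and first values, and a trailing partial segment is
    dropped. TFSA itself segments adaptively and keeps richer trend features.
    """
    symbols = []
    for start in range(0, len(series) - segment_length + 1, segment_length):
        segment = series[start:start + segment_length]
        change = segment[-1] - segment[0]
        if change > flat_tolerance:
            symbols.append("u")
        elif change < -flat_tolerance:
            symbols.append("d")
        else:
            symbols.append("f")
    return "".join(symbols)


# Hypothetical sensor readings: rising, then level, then falling.
readings = [1, 2, 3, 3, 3, 3, 5, 4, 2]
print(trend_symbols(readings, segment_length=3))
```

    A string of this kind is short enough to feed into symbolic mining methods such as association rule mining, which is the use the abstract has in mind for its own representation.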

  12. Integration of lyoplate based flow cytometry and computational analysis for standardized immunological biomarker discovery.

    Directory of Open Access Journals (Sweden)

    Federica Villanova

    Full Text Available Discovery of novel immune biomarkers for monitoring of disease prognosis and response to therapy in immune-mediated inflammatory diseases is an important unmet clinical need. Here, we establish a novel framework for immunological biomarker discovery, comparing a conventional (liquid) flow cytometry platform (CFP) and a unique lyoplate-based flow cytometry platform (LFP) in combination with advanced computational data analysis. We demonstrate that LFP had higher sensitivity compared to CFP, with increased detection of cytokines (IFN-γ and IL-10) and activation markers (Foxp3 and CD25). Fluorescent intensity of cells stained with lyophilized antibodies was increased compared to cells stained with liquid antibodies. LFP, using a plate loader, allowed medium-throughput processing of samples with comparable intra- and inter-assay variability between platforms. Automated computational analysis identified novel immunophenotypes that were not detected with manual analysis. Our results establish a new flow cytometry platform for standardized and rapid immunological biomarker discovery with wide application to immune-mediated diseases.

  13. A REGISTRY BASED DISCOVERY MECHANISM FOR E-LEARNING WEB SERVICES

    Directory of Open Access Journals (Sweden)

    Demian Antony D’Mello

    2012-10-01

    Full Text Available E-learning is currently taking the shape of a Web Service in various applications, i.e., learners can search for suitable content, book it, pay for it and consume it. This paper shows how the search aspects for e-learning content can technically be combined with the recent standardization efforts that aim at content exchangeability and efficient reuse. A repository for learning object publication and search is proposed that essentially adapts the UDDI framework used in commercial Web Services to the e-learning context. To adopt Web Services technology towards the reusability and aggregation of e-learning services, the conceptual Web Services architecture and its building blocks need to be augmented. The objective of this research is to design a broker-based registry architecture for e-learning Web services that facilitates effective e-learning content/service discovery for consumption or composition. The implementation, followed by experimentation, showed that the proposed e-learning discovery architecture facilitates effective discovery with moderate performance in terms of overall response.

  14. Transcriptome analysis of Catharanthus roseus for gene discovery and expression profiling.

    Science.gov (United States)

    Verma, Mohit; Ghangal, Rajesh; Sharma, Raghvendra; Sinha, Alok K; Jain, Mukesh

    2014-01-01

    The medicinal plant Catharanthus roseus accumulates a wide range of terpenoid indole alkaloids, which are well-documented therapeutic agents. In this study, deep transcriptome sequencing of C. roseus was carried out to identify the pathways and enzymes (genes) involved in biosynthesis of these compounds. About 343 million reads were generated from different tissues (leaf, flower and root) of C. roseus using the Illumina platform. Optimization of de novo assembly involving a two-step process resulted in a total of 59,220 unique transcripts with an average length of 1284 bp. Comprehensive functional annotation and gene ontology (GO) analysis revealed the representation of many genes involved in different biological processes and molecular functions. In total, 65% of C. roseus transcripts showed homology with sequences available in various public repositories, while the remaining 35% of unigenes may be considered C. roseus specific. In silico analysis revealed the presence of 11,620 genic simple sequence repeats (excluding mono-nucleotide repeats) and 1820 transcription factor encoding genes in the C. roseus transcriptome. Expression analysis showed roots and leaves to be actively participating in bisindole alkaloid production, with a clear indication that enzymes involved in the pathway of vindoline and vinblastine biosynthesis are restricted to aerial tissues. Such a large-scale transcriptome study provides a rich source for understanding plant-specialized metabolism and is expected to promote research towards production of plant-derived pharmaceuticals. PMID:25072156

  15. Transcriptome analysis of Catharanthus roseus for gene discovery and expression profiling.

    Directory of Open Access Journals (Sweden)

    Mohit Verma

    Full Text Available The medicinal plant Catharanthus roseus accumulates a wide range of terpenoid indole alkaloids, which are well-documented therapeutic agents. In this study, deep transcriptome sequencing of C. roseus was carried out to identify the pathways and enzymes (genes) involved in biosynthesis of these compounds. About 343 million reads were generated from different tissues (leaf, flower and root) of C. roseus using the Illumina platform. Optimization of de novo assembly involving a two-step process resulted in a total of 59,220 unique transcripts with an average length of 1284 bp. Comprehensive functional annotation and gene ontology (GO) analysis revealed the representation of many genes involved in different biological processes and molecular functions. In total, 65% of C. roseus transcripts showed homology with sequences available in various public repositories, while the remaining 35% of unigenes may be considered C. roseus specific. In silico analysis revealed the presence of 11,620 genic simple sequence repeats (excluding mono-nucleotide repeats) and 1820 transcription factor encoding genes in the C. roseus transcriptome. Expression analysis showed roots and leaves to be actively participating in bisindole alkaloid production, with a clear indication that enzymes involved in the pathway of vindoline and vinblastine biosynthesis are restricted to aerial tissues. Such a large-scale transcriptome study provides a rich source for understanding plant-specialized metabolism and is expected to promote research towards production of plant-derived pharmaceuticals.

  16. A Sorghum Mutant Resource as an Efficient Platform for Gene Discovery in Grasses.

    Science.gov (United States)

    Jiao, Yinping; Burke, John; Chopra, Ratan; Burow, Gloria; Chen, Junping; Wang, Bo; Hayes, Chad; Emendack, Yves; Ware, Doreen; Xin, Zhanguo

    2016-07-01

    Sorghum (Sorghum bicolor) is a versatile C4 crop and a model for research in family Poaceae. High-quality genome sequence is available for the elite inbred line BTx623, but functional validation of genes remains challenging due to the limited genomic and germplasm resources available for comprehensive analysis of induced mutations. In this study, we generated 6400 pedigreed M4 mutant pools from EMS-mutagenized BTx623 seeds through single-seed descent. Whole-genome sequencing of 256 phenotyped mutant lines revealed >1.8 million canonical EMS-induced mutations, affecting >95% of genes in the sorghum genome. The vast majority (97.5%) of the induced mutations were distinct from natural variations. To demonstrate the utility of the sequenced sorghum mutant resource, we performed reverse genetics to identify eight genes potentially affecting drought tolerance, three of which had allelic mutations and two of which exhibited exact cosegregation with the phenotype of interest. Our results establish that a large-scale resource of sequenced pedigreed mutants provides an efficient platform for functional validation of genes in sorghum, thereby accelerating sorghum breeding. Moreover, findings made in sorghum could be readily translated to other members of the Poaceae via integrated genomics approaches. PMID:27354556

  17. Large-Scale Discovery of Disease-Disease and Disease-Gene Associations.

    Science.gov (United States)

    Gligorijevic, Djordje; Stojanovic, Jelena; Djuric, Nemanja; Radosavljevic, Vladan; Grbovic, Mihajlo; Kulathinal, Rob J; Obradovic, Zoran

    2016-01-01

    Data-driven phenotype analyses on Electronic Health Record (EHR) data have recently drawn benefits across many areas of clinical practice, uncovering new links in the medical sciences that can potentially affect the well-being of millions of patients. In this paper, EHR data is used to discover novel relationships between diseases by studying their comorbidities (co-occurrences in patients). A novel embedding model is designed to extract knowledge from disease comorbidities by learning from a large-scale EHR database comprising more than 35 million inpatient cases spanning nearly a decade, revealing significant improvements on disease phenotyping over current computational approaches. In addition, the use of the proposed methodology is extended to discover novel disease-gene associations by including valuable domain knowledge from genome-wide association studies. To evaluate our approach, its effectiveness is compared against a held-out set, where it again produced very compelling results. For selected diseases, we further identify candidate gene lists for which disease-gene associations were not studied previously. Our approach thus provides biomedical researchers with new tools to filter genes of interest, reducing costly lab studies. PMID:27578529

  18. Using Osteoclast Differentiation as a Model for Gene Discovery in an Undergraduate Cell Biology Laboratory

    Science.gov (United States)

    Birnbaum, Mark J.; Picco, Jenna; Clements, Meghan; Witwicka, Hanna; Yang, Meiheng; Hoey, Margaret T.; Odgren, Paul R.

    2010-01-01

    A key goal of molecular/cell biology/biotechnology is to identify essential genes in virtually every physiological process to uncover basic mechanisms of cell function and to establish potential targets of drug therapy combating human disease. This article describes a semester-long, project-oriented molecular/cellular/biotechnology laboratory…

  19. Network-based drug discovery by integrating systems biology and computational technologies.

    Science.gov (United States)

    Leung, Elaine L; Cao, Zhi-Wei; Jiang, Zhi-Hong; Zhou, Hua; Liu, Liang

    2013-07-01

    Network-based intervention has become a trend in treating systemic diseases, but it relies on regimen optimization and valid multi-target actions of the drugs. The complex multi-component nature of medicinal herbs may serve as a valuable resource for network-based multi-target drug discovery because of their potential synergistic treatment effects. Recently, multiple systems biology platforms have proven robust and powerful in uncovering molecular mechanisms and connections between drugs and the dynamic networks they target. However, optimization methods for drug combinations remain insufficient, owing to the lack of tighter integration across multiple '-omics' databases. Newly developed algorithm- or network-based computational models can tightly integrate '-omics' databases and optimize combinational regimens in drug development, which encourages developing medicinal herbs into a new wave of network-based multi-target drugs. However, challenges remain for further integration of medicinal herb databases with multiple systems biology platforms for multi-target drug optimization, owing to the uncertain reliability of individual data sets and the limited width, depth and degree of standardization of herbal medicine data. Standardizing the methodology and terminology of systems biology and herbal databases would facilitate the integration. Enhancing publicly accessible databases and increasing the number of studies that use systems biology platforms on herbal medicine would be helpful. Further integration across various '-omics' platforms and computational tools would accelerate development of network-based drug discovery and network medicine.

  20. Gene discovery in the threatened elkhorn coral: 454 sequencing of the Acropora palmata transcriptome.

    Directory of Open Access Journals (Sweden)

    Nicholas R Polato

    Full Text Available BACKGROUND: Cnidarians, including corals and anemones, offer unique insights into metazoan evolution because they harbor genetic similarities with vertebrates beyond that found in model invertebrates and retain genes known only from non-metazoans. Cataloging genes expressed in Acropora palmata, a foundation species of reefs in the Caribbean and western Atlantic, will advance our understanding of the genetic basis of ecologically important traits in corals and comes at a time when sequencing efforts in other cnidarians allow for multi-species comparisons. RESULTS: A cDNA library from a sample enriched for symbiont-free larval tissue was sequenced on the 454 GS-FLX platform. Over 960,000 reads were obtained and assembled into 42,630 contigs. Annotation data was acquired for 57% of the assembled sequences. Analysis of the assembled sequences indicated that 83-100% of all A. palmata transcripts were tagged, and provided a rough estimate of the total number of genes expressed in our samples (~18,000-20,000). The coral annotation data contained many of the same molecular components as in the Bilateria, particularly in pathways associated with oxidative stress and DNA damage repair, and provided evidence that homologs of p53, a key player in DNA repair pathways, have experienced selection along the branch separating Cnidaria and Bilateria. Transcriptome-wide screens of paralog groups and transition/transversion ratios highlighted genes including green fluorescent proteins, carbonic anhydrase, and oxidative stress proteins, as well as functional groups involved in protein and nucleic acid metabolism and the formation of structural molecules. These results provide a starting point for study of adaptive evolution in corals. CONCLUSIONS: Currently available transcriptome data now make comparative studies of the mechanisms underlying coral's evolutionary success possible. Here we identified candidate genes that enable corals to maintain genomic integrity despite

  1. Gene expression and epigenetic discovery screen reveal methylation of SFRP2 in prostate cancer.

    LENUS (Irish Health Repository)

    Perry, Antoinette S

    2013-04-15

    Aberrant activation of Wnts is common in human cancers, including prostate. Hypermethylation associated transcriptional silencing of Wnt antagonist genes SFRPs (Secreted Frizzled-Related Proteins) is a frequent oncogenic event. The significance of this is not known in prostate cancer. The objectives of our study were to (i) profile Wnt signaling related gene expression and (ii) investigate methylation of Wnt antagonist genes in prostate cancer. Using TaqMan Low Density Arrays, we identified 15 Wnt signaling related genes with significantly altered expression in prostate cancer; the majority of which were upregulated in tumors. Notably, histologically benign tissue from men with prostate cancer appeared more similar to tumor (r = 0.76) than to benign prostatic hyperplasia (BPH; r = 0.57, p < 0.001). Overall, the expression profile was highly similar between tumors of high (≥ 7) and low (≤ 6) Gleason scores. Pharmacological demethylation of PC-3 cells with 5-Aza-CdR reactivated 39 genes (≥ 2-fold); 40% of which inhibit Wnt signaling. Methylation frequencies in prostate cancer were 10% (2/20) (SFRP1), 64.86% (48/74) (SFRP2), 0% (0/20) (SFRP4) and 60% (12/20) (SFRP5). SFRP2 methylation was detected at significantly lower frequencies in high-grade prostatic intraepithelial neoplasia (HGPIN; 30%, (6/20), p = 0.0096), tumor adjacent benign areas (8.82%, (7/69), p < 0.0001) and BPH (11.43% (4/35), p < 0.0001). The quantitative level of SFRP2 methylation (normalized index of methylation) was also significantly higher in tumors (116) than in the other samples (HGPIN = 7.45, HB = 0.47, and BPH = 0.12). We show that SFRP2 hypermethylation is a common event in prostate cancer. SFRP2 methylation in combination with other epigenetic markers may be a useful biomarker of prostate cancer.

  2. rVISTA for Comparative Sequence-Based Discovery of Functional Transcription Factor Binding Sites

    Energy Technology Data Exchange (ETDEWEB)

    Loots, Gabriela G.; Ovcharenko, Ivan; Pachter, Lior; Dubchak, Inna; Rubin, Edward M.

    2002-03-08

    Identifying transcriptional regulatory elements represents a significant challenge in annotating the genomes of higher vertebrates. We have developed a computational tool, rVISTA, for high-throughput discovery of cis-regulatory elements that combines transcription factor binding site prediction and the analysis of inter-species sequence conservation. Here, we illustrate the ability of rVISTA to identify true transcription factor binding sites through the analysis of AP-1 and NFAT binding sites in the 1 Mb well-annotated cytokine gene cluster (Hs5q31; Mm11). Exploiting the orthologous human-mouse data set eliminated 95 percent of the 38,000 binding sites predicted upon analysis of the human sequence alone, while still identifying 87 percent of the experimentally verified binding sites in this region.
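
    The idea of combining binding site prediction with inter-species conservation can be sketched as a simple filter over predicted sites. This is not rVISTA's actual algorithm: the function names, the use of plain percent identity over the site's alignment columns, and the 80% threshold are assumptions made only for illustration.

```python
def percent_identity(a, b):
    """Percent of aligned columns where both sequences carry the same base."""
    if not a:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def conserved_sites(human_aln, mouse_aln, sites, min_identity=80.0):
    """Keep predicted sites whose alignment columns are well conserved.

    human_aln and mouse_aln are equal-length aligned sequences ('-' for gaps);
    sites is a list of (start, end) column ranges predicted on the human row.
    """
    kept = []
    for start, end in sites:
        identity = percent_identity(human_aln[start:end], mouse_aln[start:end])
        if identity >= min_identity:
            kept.append((start, end))
    return kept


# Toy alignment: the first predicted site is identical in both species,
# the second differs at two of seven positions and is filtered out.
human = "TGACTCAGGGTTTACGT"
mouse = "TGACTCAGCATTAACGA"
kept = conserved_sites(human, mouse, [(0, 7), (9, 16)])
```

    Filtering of this kind trades a small loss of true sites for a large reduction in predictions, which is the trade-off the abstract quantifies.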

  3. Leveraging a Sturge-Weber Gene Discovery: An Agenda for Future Research.

    Science.gov (United States)

    Comi, Anne M; Sahin, Mustafa; Hammill, Adrienne; Kaplan, Emma H; Juhász, Csaba; North, Paula; Ball, Karen L; Levin, Alex V; Cohen, Bernard; Morris, Jill; Lo, Warren; Roach, E Steve

    2016-05-01

    Sturge-Weber syndrome (SWS) is a vascular neurocutaneous disorder that results from a somatic mosaic mutation in GNAQ, which is also responsible for isolated port-wine birthmarks. Infants with SWS are born with a cutaneous capillary malformation (port-wine birthmark) of the forehead or upper eyelid which can signal an increased risk of brain and/or eye involvement prior to the onset of specific symptoms. This symptom-free interval represents a time when a targeted intervention could help to minimize the neurological and ophthalmologic manifestations of the disorder. This paper summarizes a 2015 SWS workshop in Bethesda, Maryland that was sponsored by the National Institutes of Health. Meeting attendees included a diverse group of clinical and translational researchers with a goal of establishing research priorities for the next few years. The initial portion of the meeting included a thorough review of the recent genetic discovery and what is known of the pathogenesis of SWS. Breakout sessions related to neurology, dermatology, and ophthalmology aimed to establish SWS research priorities in each field. Key priorities for future development include the need for clinical consensus guidelines, further work to develop a clinical trial network, improvement of tissue banking for research purposes, and the need for multiple animal and cell culture models of SWS.

  4. Strategies for enhancing the effectiveness of metagenomic-based enzyme discovery in lignocellulolytic microbial communities

    Energy Technology Data Exchange (ETDEWEB)

    DeAngelis, K.M.; Gladden, J.G.; Allgaier, M.; D' haeseleer, P.; Fortney, J.L.; Reddy, A.; Hugenholtz, P.; Singer, S.W.; Vander Gheynst, J.; Silver, W.L.; Simmons, B.; Hazen, T.C.

    2010-03-01

    Producing cellulosic biofuels from plant material has recently emerged as a key U.S. Department of Energy goal. For this technology to be commercially viable on a large scale, it is critical to make production cost efficient by streamlining both the deconstruction of lignocellulosic biomass and fuel production. Many natural ecosystems efficiently degrade lignocellulosic biomass and harbor enzymes that, when identified, could be used to increase the efficiency of commercial biomass deconstruction. However, ecosystems most likely to yield relevant enzymes, such as tropical rain forest soil in Puerto Rico, are often too complex for enzyme discovery using current metagenomic sequencing technologies. One potential strategy to overcome this problem is to selectively cultivate the microbial communities from these complex ecosystems on biomass under defined conditions, generating less complex biomass-degrading microbial populations. To test this premise, we cultivated microbes from Puerto Rican soil or green waste compost under precisely defined conditions in the presence of dried ground switchgrass (Panicum virgatum L.) or lignin, respectively, as the sole carbon source. Phylogenetic profiling of the two feedstock-adapted communities using SSU rRNA gene amplicon pyrosequencing or phylogenetic microarray analysis revealed that the adapted communities were significantly simplified compared to the natural communities from which they were derived. Several members of the lignin-adapted and switchgrass-adapted consortia are related to organisms previously characterized as biomass degraders, while others were from less well-characterized phyla. The decrease in complexity of these communities makes them good candidates for metagenomic sequencing and will likely enable the reconstruction of a greater number of full-length genes, leading to the discovery of novel lignocellulose-degrading enzymes adapted to feedstocks and conditions of interest.

  5. miR2Gene: pattern discovery of single gene, multiple genes, and pathways by enrichment analysis of their microRNA regulators

    Directory of Open Access Journals (Sweden)

    Qiu Chengxiang

    2011-12-01

    Full Text Available Abstract Background In recent years, a number of tools have been developed to explore microRNAs (miRNAs) by analyzing their target genes. However, a reverse problem, that is, inferring patterns of protein-coding genes through their miRNA regulators, has not been explored. As various miRNA annotation data become available, exploring gene patterns by analyzing the prior knowledge of their miRNA regulators is becoming more feasible. Results In this study, we developed a tool, miR2Gene, for this purpose. Various sets of miRNAs, according to prior rules such as function, associated disease, tissue specificity, family, and cluster, were integrated with miR2Gene. For given genes, miR2Gene evaluates the enrichment of the predicted miRNAs that regulate them in each miRNA set. This tool can be used for single genes, multiple genes, and KEGG pathways. For the KEGG pathway, genes with enriched miRNA sets are highlighted according to various rules. We confirmed the usefulness of miR2Gene through case studies. Conclusions miR2Gene represents a novel and useful tool that integrates miRNA knowledge for protein-coding gene analysis. miR2Gene is freely available at http://cmbi.hsc.pku.edu.cn/mir2gene.
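
    Enrichment of a gene's predicted miRNA regulators in an annotated miRNA set is commonly scored with a one-sided hypergeometric test. The abstract does not say which statistic miR2Gene uses, so the sketch below is only an assumed, generic formulation; the function name and parameter names are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) for a hypergeometric draw.

    N: total number of annotated miRNAs (the background)
    K: miRNAs in the set of interest (e.g. one disease or family)
    n: miRNAs predicted to regulate the query gene(s)
    k: overlap between the predicted regulators and the set
    """
    total = comb(N, n)
    upper = min(n, K)
    # Sum the probability of every overlap at least as large as the one seen.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Toy case: both of a gene's 2 regulators fall in a 2-member set out of 4.
p = hypergeom_pvalue(2, 2, 2, 4)
```

    When many miRNA sets are tested for the same gene list, the resulting p-values would normally need a multiple-testing correction before being interpreted.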

  6. Discovery of inhibitors of aberrant gene transcription from Libraries of DNA binding molecules: inhibition of LEF-1-mediated gene transcription and oncogenic transformation.

    Science.gov (United States)

    Stover, James S; Shi, Jin; Jin, Wei; Vogt, Peter K; Boger, Dale L

    2009-03-11

    The screening of a >9000 compound library of synthetic DNA binding molecules for selective binding to the consensus sequence of the transcription factor LEF-1 followed by assessment of the candidate compounds in a series of assays that characterized functional activity (disruption of DNA-LEF-1 binding) at the intended target and site (inhibition of intracellular LEF-1-mediated gene transcription) resulting in a desired phenotypic cellular change (inhibit LEF-1-driven cell transformation) provided two lead compounds: lefmycin-1 and lefmycin-2. The sequence of screens defining the approach assures that activity in the final functional assay may be directly related to the inhibition of gene transcription and DNA binding properties of the identified molecules. Central to the implementation of this generalized approach to the discovery of DNA binding small molecule inhibitors of gene transcription was (1) the use of a technically nondemanding fluorescent intercalator displacement (FID) assay for initial assessment of the DNA binding affinity and selectivity of a library of compounds for any sequence of interest, and (2) the technology used to prepare a sufficiently large library of DNA binding compounds.

  7. Yeast homologous recombination-based promoter engineering for the activation of silent natural product biosynthetic gene clusters.

    Science.gov (United States)

    Montiel, Daniel; Kang, Hahk-Soo; Chang, Fang-Yuan; Charlop-Powers, Zachary; Brady, Sean F

    2015-07-21

    Large-scale sequencing of prokaryotic (meta)genomic DNA suggests that most bacterial natural product gene clusters are not expressed under common laboratory culture conditions. Silent gene clusters represent a promising resource for natural product discovery and the development of a new generation of therapeutics. Unfortunately, the characterization of molecules encoded by these clusters is hampered owing to our inability to express these gene clusters in the laboratory. To address this bottleneck, we have developed a promoter-engineering platform to transcriptionally activate silent gene clusters in a model heterologous host. Our approach uses yeast homologous recombination, an auxotrophy complementation-based yeast selection system and sequence orthogonal promoter cassettes to exchange all native promoters in silent gene clusters with constitutively active promoters. As part of this platform, we constructed and validated a set of bidirectional promoter cassettes consisting of orthogonal promoter sequences, Streptomyces ribosome binding sites, and yeast selectable marker genes. Using these tools we demonstrate the ability to simultaneously insert multiple promoter cassettes into a gene cluster, thereby expediting the reengineering process. We apply this method to model active and silent gene clusters (rebeccamycin and tetarimycin) and to the silent, cryptic pseudogene-containing, environmental DNA-derived Lzr gene cluster. Complete promoter refactoring and targeted gene exchange in this "dead" cluster led to the discovery of potent indolotryptoline antiproliferative agents, lazarimides A and B. This potentially scalable and cost-effective promoter reengineering platform should streamline the discovery of natural products from silent natural product biosynthetic gene clusters.

  8. GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists

    OpenAIRE

    Steinfeld Israel; Navon Roy; Eden Eran; Lipson Doron; Yakhini Zohar

    2009-01-01

    Abstract Background Since the inception of the GO annotation project, a variety of tools have been developed that support exploring and searching the GO database. In particular, a variety of tools that perform GO enrichment analysis are currently available. Most of these tools require as input a target set of genes and a background set and seek enrichment in the target set compared to the background set. A few tools also exist that support analyzing ranked lists. The latter typically rely on ...

  9. An Isogenic Human ESC Platform for Functional Evaluation of Genome-wide-Association-Study-Identified Diabetes Genes and Drug Discovery.

    Science.gov (United States)

    Zeng, Hui; Guo, Min; Zhou, Ting; Tan, Lei; Chong, Chi Nok; Zhang, Tuo; Dong, Xue; Xiang, Jenny Zhaoying; Yu, Albert S; Yue, Lixia; Qi, Qibin; Evans, Todd; Graumann, Johannes; Chen, Shuibing

    2016-09-01

    Genome-wide association studies (GWASs) have increased our knowledge of loci associated with a range of human diseases. However, applying such findings to elucidate pathophysiology and promote drug discovery remains challenging. Here, we created isogenic human ESCs (hESCs) with mutations in GWAS-identified susceptibility genes for type 2 diabetes. In pancreatic beta-like cells differentiated from these lines, we found that mutations in CDKAL1, KCNQ1, and KCNJ11 led to impaired glucose secretion in vitro and in vivo, coinciding with defective glucose homeostasis. CDKAL1 mutant insulin+ cells were also hypersensitive to glucolipotoxicity. A high-content chemical screen identified a candidate drug that rescued CDKAL1-specific defects in vitro and in vivo by inhibiting the FOS/JUN pathway. Our approach of a proof-of-principle platform, which uses isogenic hESCs for functional evaluation of GWAS-identified loci and identification of a drug candidate that rescues gene-specific defects, paves the way for precision therapy of metabolic diseases. PMID:27524441

  11. Fragment-based discovery of hepatitis C virus NS5b RNA polymerase inhibitors

    Energy Technology Data Exchange (ETDEWEB)

    Antonysamy, Stephen S.; Aubol, Brandon; Blaney, Jeff; Browner, Michelle F.; Giannetti, Anthony M.; Harris, Seth F.; Hébert, Normand; Hendle, Jörg; Hopkins, Stephanie; Jefferson, Elizabeth; Kissinger, Charles; Leveque, Vincent; Marciano, David; McGee, Ethel; Nájera, Isabel; Nolan, Brian; Tomimoto, Masaki; Torres, Eduardo; Wright, Tobi (SGX); (Roche)

    2009-07-22

    Non-nucleoside inhibitors of HCV NS5b RNA polymerase were discovered by a fragment-based lead discovery approach, beginning with crystallographic fragment screening. The NS5b binding affinity and biochemical activity of fragment hits and inhibitors were determined by surface plasmon resonance (Biacore) and an enzyme inhibition assay, respectively. Crystallographic fragment screening hits with ~1-10 mM binding affinity (K_D) were iteratively optimized to give leads with ~200 nM biochemical activity and low µM cellular activity in a Replicon assay.

  12. Novel Technology for Protein-Protein Interaction-based Targeted Drug Discovery

    Directory of Open Access Journals (Sweden)

    Jung Me Hwang

    2011-12-01

    Full Text Available We have developed a simple but highly efficient in-cell protein-protein interaction (PPI) discovery system based on the translocation properties of protein kinase C- and its C1a domain in live cells. This system allows the visual detection of trimeric and dimeric protein interactions including cytosolic, nuclear, and/or membrane proteins with their cognate ligands. In addition, this system can be used to identify pharmacological small compounds that inhibit specific PPIs. These properties make this PPI system an attractive tool for screening drug candidates and mapping the protein interactome.

  13. Structure Based Discovery of Small Molecules to Regulate the Activity of Human Insulin Degrading Enzyme

    OpenAIRE

    Bilal Çakir; Onur Dağliyan; Ezgi Dağyildiz; İbrahim Bariş; Ibrahim Halil Kavakli; Seda Kizilel; Metin Türkay

    2012-01-01

    Structure Based Discovery of Small Molecules to Regulate the Activity of Human Insulin Degrading Enzyme. Bilal Çakir1, Onur Dağliyan1, Ezgi Dağyildiz1, İbrahim Bariş1, Ibrahim Halil Kavakli1,2*, Seda Kizilel1*, Metin Türkay3*. 1 Department of Chemical and Biological Engineering, Koç University, Sariyer, Istanbul, Turkey; 2 Department of Molecular Biology and Genetics, Koç University, Sariyer, Istanbul, Turkey; 3 Department of Industrial Engineering, Koç University...

  14. Mechanism of FCA-based Folksonomy Knowledge Discovery

    Institute of Scientific and Technical Information of China (English)

    张云中

    2012-01-01

    Folksonomy knowledge discovery has become an effective way to address the semantic problems and user problems inherent in folksonomy. To examine the mechanism of FCA-based folksonomy knowledge discovery, this paper first summarizes the shortcomings of current research in the area. Then, building on an analysis of the elements of FCA-based folksonomy knowledge discovery and the relationships among them, it constructs a spiral evolution model to reveal the objective laws governing the process. Finally, it identifies three core topics of FCA-based folksonomy knowledge discovery: folksonomy user behavior, folksonomy user preferences and folksonomy semantic relations.

  15. A Performance/Cost Evaluation for a GPU-Based Drug Discovery Application on Volunteer Computing

    Directory of Open Access Journals (Sweden)

    Ginés D. Guerrero

    2014-01-01

    Full Text Available Bioinformatics is an interdisciplinary research field that develops tools for the analysis of large biological databases, and, thus, the use of high performance computing (HPC) platforms is mandatory for the generation of useful biological knowledge. The latest generation of graphics processing units (GPUs) has democratized the use of HPC as they push desktop computers to cluster-level performance. Many applications within this field have been developed to leverage these powerful and low-cost architectures. However, these applications still need to scale to larger GPU-based systems to enable remarkable advances in the fields of healthcare, drug discovery, genome research, etc. The inclusion of GPUs in HPC systems exacerbates power and temperature issues, increasing the total cost of ownership (TCO). This paper explores the benefits of volunteer computing to scale bioinformatics applications as an alternative to owning large GPU-based local infrastructures. We use as a benchmark a GPU-based drug discovery application called BINDSURF, whose computational requirements go beyond a single desktop machine. Volunteer computing is presented as a cheap and valid HPC system for those bioinformatics applications that need to process huge amounts of data and where the response time is not a critical factor.

  16. Functional Analysis and Discovery of Microbial Genes Transforming Metallic and Organic Pollutants: Database and Experimental Tools

    Energy Technology Data Exchange (ETDEWEB)

    Lawrence P. Wackett; Lynda B.M. Ellis

    2004-12-09

    Microbial functional genomics is faced with a burgeoning list of genes which are denoted as unknown or hypothetical for lack of any knowledge about their function. The majority of microbial genes encode enzymes. Enzymes are the catalysts of metabolism; catabolism, anabolism, stress responses, and many other cell functions. A major problem facing microbial functional genomics is proposed here to derive from the breadth of microbial metabolism, much of which remains undiscovered. The breadth of microbial metabolism has been surveyed by the PIs and represented according to reaction types on the University of Minnesota Biocatalysis/Biodegradation Database (UM-BBD): http://umbbd.ahc.umn.edu/search/FuncGrps.html The database depicts metabolism of 49 chemical functional groups, representing most of current knowledge. Twice that number of chemical groups are proposed here to be metabolized by microbes. Thus, at least 50% of the unique biochemical reactions catalyzed by microbes remain undiscovered. This further suggests that many unknown and hypothetical genes encode functions yet undiscovered. This gap will be partly filled by the current proposal. The UM-BBD will be greatly expanded as a resource for microbial functional genomics. Computational methods will be developed to predict microbial metabolism which is not yet discovered. Moreover, a concentrated effort to discover new microbial metabolism will be conducted. The research will focus on metabolism of direct interest to DOE, dealing with the transformation of metals, metalloids, organometallics and toxic organics. This is precisely the type of metabolism which has been characterized most poorly to date. Moreover, these studies will directly impact functional genomic analysis of DOE-relevant genomes.

  17. AMHC: Adaptive Multi-Hop Clustering based Resource Discovery Architecture for Large Scale MANETs

    Directory of Open Access Journals (Sweden)

    Saad Al-Ahmadi

    2014-05-01

    Full Text Available In this study we propose an efficient clustering protocol called AMHC, used for resource discovery in large-scale Mobile Ad hoc Networks (MANETs). AMHC is an Adaptive Multi-Hop Clustering protocol that generates several non-overlapping network localities (clusters) with explicitly elected cluster-heads. Every cluster member is on average d hops away from its cluster-head, where d is an integer parameter of the protocol. The generated set of clusters is highly stable and has a low restructuring frequency, taking into consideration the dynamic network topology caused by node mobility and depleted energy. The head election process is a distributed process based on a node weight formula calculated by every node independently. The node's weight involves the current energy level, the current neighborhood degree and the distance (in number of hops) between the nominated head and the voting node. The cluster-head is responsible for coordinating intra-cluster and inter-cluster resource discovery activities. Inter-cluster communication is handled through gateway nodes, which hear from more than one cluster and are able to connect clusters with each other. The aim of AMHC is to identify all the possible gateways in order to create a highly fault-tolerant architecture. AMHC is an asynchronous, scalable and robust architecture capable of handling a large number of resource queries with a high degree of power and communication efficiency. We conducted a comparative study using simulation to demonstrate AMHC's efficiency and superiority over other recently proposed clustering algorithms in the literature. The comparison is based on the number of generated clusters, average cluster size, cluster stability and node re-affiliation. These results show a lot of promise for AMHC as an efficient, energy-aware, load-balanced and fault-tolerant resource discovery architecture for large-scale MANETs.
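
    The abstract names the three inputs to the head-election weight (energy level, neighborhood degree, hop distance) but does not give the formula. The Python sketch below is only an illustration under stated assumptions: the linear combination, the coefficients, and the names `Candidate`, `weight` and `elect_head` are hypothetical and are not taken from the AMHC paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    node_id: str
    energy: float  # remaining energy, assumed normalized to [0, 1]
    degree: int    # number of one-hop neighbours
    hops: int      # hop distance from the voting node


def weight(c: Candidate, max_degree: int, max_hops: int,
           w_energy: float = 0.5, w_degree: float = 0.3,
           w_hops: float = 0.2) -> float:
    """Hypothetical linear weight: more energy, more neighbours and
    fewer hops all raise the score. Coefficients are assumptions."""
    degree_score = c.degree / max_degree if max_degree else 0.0
    closeness = 1.0 - c.hops / max_hops if max_hops else 1.0
    return w_energy * c.energy + w_degree * degree_score + w_hops * closeness


def elect_head(candidates: list[Candidate], max_hops: int) -> Candidate:
    """Each voting node can run this locally on the candidates it knows;
    ties are broken by node_id so the outcome is deterministic."""
    max_degree = max(c.degree for c in candidates)
    return max(candidates,
               key=lambda c: (weight(c, max_degree, max_hops), c.node_id))


candidates = [
    Candidate("a", energy=0.9, degree=4, hops=1),
    Candidate("b", energy=0.4, degree=5, hops=2),
]
head = elect_head(candidates, max_hops=2)
```

    In this toy input, node "a" wins because its higher energy and shorter hop distance outweigh the slightly larger neighbourhood of node "b".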

  18. Novel Gene Discovery of Crops in China: Status, Challenging, and Perspective

    Institute of Scientific and Technical Information of China (English)

    邱丽娟; 王建康; 万建民; 郭勇; 黎裕; 王晓波; 周国安; 刘章雄; 周时荣; 李新海; 马有志

    2011-01-01

    a gene level and hence for molecular breeding. This paper reviewed progress of novel gene discovery studies in major crops, such as rice, wheat, maize, soybean, cotton, and oilseed rape in China. In last decade, Chinese scientists have achieved a number of breakthroughs on novel gene identification in crops, including: (1) Various distinctive materials for gene discovery were created, such as core collections of germplasms based on crop genetic diversity, establishment of genetic populations based on genetic resources with favorite traits, assessment of mutants derived from mutagenesis, and so on; (2) Technology and methods of gene discovery were further developed, especially the gene-based integration of various discovery technologies with combination of biometric algorithm improvement of gene/QTLs, and therefore the efficiency of gene discovery was increased; (3) Mapping genes/QTLs related to important agronomic traits of crops has become a common method for genetic studies. A number of genes/QTLs associated with disease and insect resistance, stress tolerance, good quality, nutrient use efficiency and high yield have been mapped, of which more than 500 genes have been positioned on chromosomes precisely by fine mapping; (4) Great progress in cloning and functional analysis of crop genes in China, particularly in rice, has drawn world-wide attention. More than 300 genes have been cloned in the main crops, among which more than 70 genes have been functionally validated in crops. While gene discovery in crops becomes more and more efficient, large-scale and towards utilization in the world, Chinese scientists are also making new findings in this field. However, the quality and quantity of crop gene discovery in China is still far from satisfying the needs for molecular breeding and the overall level of novel gene discovery is still behind top labs/institutions in the world. Gene discovery in different crops has developed unevenly, the number of genes discovered is not

  19. Automated conserved noncoding sequence (CNS) discovery reveals differences in gene content and promoter evolution among grasses

    Directory of Open Access Journals (Sweden)

    Gina Turco

    2013-07-01

    Full Text Available Conserved noncoding sequences (CNSs) are islands of noncoding sequence that, like protein coding exons, show less divergence in sequence between related species than functionless DNA. Several CNSs have been demonstrated experimentally to function as cis-regulatory regions. However, the specific functions of most CNSs remain unknown. Previous searches for CNSs in plants have either anchored on exons and only identified nearby sequences or required years of painstaking manual annotation. Here we present an open source tool that can accurately identify CNSs between any two related species with sequenced genomes, including both those immediately adjacent to exons and distal sequences separated by >12 KB of noncoding sequence. We have used this tool to characterize new motifs, associate CNSs with additional functions and identify previously undetected genes encoding RNA and protein in the genomes of five grass species. We provide a list of 15,363 orthologous CNSs conserved across all grasses tested. We were also able to identify regulatory sequences present in the common ancestor of grasses that have been lost in one or more extant grass lineages. Lists of orthologous gene pairs and associated CNSs are provided for reference inbred lines of Arabidopsis, Japonica rice, foxtail millet, sorghum, Brachypodium and maize.

  20. The Analysis of Image Segmentation Hierarchies with a Graph-based Knowledge Discovery System

    Science.gov (United States)

    Tilton, James C.; Cooke, Diane J.; Ketkar, Nikhil; Aksoy, Selim

    2008-01-01

    Currently available pixel-based analysis techniques do not effectively extract the information content from the increasingly available high spatial resolution remotely sensed imagery data. A general consensus is that object-based image analysis (OBIA) is required to effectively analyze this type of data. OBIA is usually a two-stage process: image segmentation followed by an analysis of the segmented objects. We are exploring an approach to OBIA in which hierarchical image segmentations provided by the Recursive Hierarchical Segmentation (RHSEG) software developed at NASA GSFC are analyzed by the Subdue graph-based knowledge discovery system developed by a team at Washington State University. In this paper we discuss our initial approach to representing the RHSEG-produced hierarchical image segmentations in a graphical form understandable by Subdue, and provide results on real and simulated data. We also discuss planned improvements designed to more effectively and completely convey the hierarchical segmentation information to Subdue and to improve processing efficiency.

  1. Gene discovery and transcript analyses in the corn smut pathogen Ustilago maydis: expressed sequence tag and genome sequence comparison

    Directory of Open Access Journals (Sweden)

    Saville Barry J

    2007-09-01

    Full Text Available Abstract Background Ustilago maydis is the basidiomycete fungus responsible for common smut of corn and is a model organism for the study of fungal phytopathogenesis. To aid in the annotation of the genome sequence of this organism, several expressed sequence tag (EST) libraries were generated from a variety of U. maydis cell types. In addition to utility in the context of gene identification and structure annotation, the ESTs were analyzed to identify differentially abundant transcripts and to detect evidence of alternative splicing and anti-sense transcription. Results Four cDNA libraries were constructed using RNA isolated from U. maydis diploid teliospores (U. maydis strains 518 × 521) and haploid cells of strain 521 grown under nutrient rich, carbon starved, and nitrogen starved conditions. Using the genome sequence as a scaffold, the 15,901 ESTs were assembled into 6,101 contiguous expressed sequences (contigs); among these, 5,482 corresponded to predicted genes in the MUMDB (MIPS Ustilago maydis database), while 619 aligned to regions of the genome not yet designated as genes in MUMDB. A comparison of EST abundance identified numerous genes that may be regulated in a cell type or starvation-specific manner. The transcriptional response to nitrogen starvation was assessed using RT-qPCR. The results of this suggest that there may be cross-talk between the nitrogen and carbon signalling pathways in U. maydis. Bioinformatic analysis identified numerous examples of alternative splicing and anti-sense transcription. While intron retention was the predominant form of alternative splicing in U. maydis, other varieties were also evident (e.g. exon skipping). Selected instances of both alternative splicing and anti-sense transcription were independently confirmed using RT-PCR. Conclusion Through this work: 1) substantial sequence information has been provided for U. maydis genome annotation; 2) new genes were identified through the discovery of 619

  2. Developing a distributed HTML5-based search engine for geospatial resource discovery

    Science.gov (United States)

    ZHOU, N.; XIA, J.; Nebert, D.; Yang, C.; Gui, Z.; Liu, K.

    2013-12-01

    With the explosive growth of data, Geospatial Cyberinfrastructure (GCI) components are developed to manage geospatial resources, such as data discovery and data publishing. However, efficient discovery of geospatial resources remains challenging because: (1) existing GCIs are usually developed for users of specific domains, so users may have to visit a number of GCIs to find appropriate resources; (2) the complexity of the decentralized network environment usually results in slow response and poor user experience; (3) users who use different browsers and devices may have very different user experiences because of the diversity of front-end platforms (e.g. Silverlight, Flash or HTML). To address these issues, we developed a distributed, HTML5-based search engine. Specifically, (1) the search engine adopts a brokering approach to retrieve geospatial metadata from various distributed GCIs; (2) the asynchronous record retrieval mode enhances the search performance and user interactivity; (3) the HTML5-based search engine is able to provide unified access capabilities for users with different devices (e.g. tablet and smartphone).

  3. AGENTS AND OWL-S BASED SEMANTIC WEB SERVICE DISCOVERY WITH USER PREFERENCE SUPPORT

    Directory of Open Access Journals (Sweden)

    Rohallah Benaboud

    2013-05-01

    Full Text Available Service-oriented computing (SOC) is an interdisciplinary paradigm that revolutionizes the very fabric of distributed software development. Applications that adopt service-oriented architectures (SOA) can evolve during their lifespan and adapt to changing or unpredictable environments more easily. SOA is built around the concept of Web Services. Although Web services constitute a revolution in the World Wide Web, they are always regarded as non-autonomous entities and can be exploited only after their discovery. With the help of software agents, Web services are becoming more efficient and more dynamic. The topic of this paper is the development of an agent-based approach for Web service discovery and selection in which OWL-S is used to describe Web services, QoS and service customer requests. We develop an efficient semantic service matching that takes into account concept properties to match concepts in Web service and service customer request descriptions. Our approach is based on an architecture composed of four layers: Web service and Request description layer, Functional match layer, QoS computing layer and Reputation computing layer.

  4. Discovery of Gene Sources for Economic Traits in Hanwoo by Whole-genome Resequencing

    Science.gov (United States)

    Shin, Younhee; Jung, Ho-jin; Jung, Myunghee; Yoo, Seungil; Subramaniyam, Sathiyamoorthy; Markkandan, Kesavan; Kang, Jun-Mo; Rai, Rajani; Park, Junhyung; Kim, Jong-Joo

    2016-01-01

    Hanwoo, a Korean native cattle (Bos taurus coreana), has great economic value due to high meat quality. Also, the breed has genetic variations that are associated with production traits such as health, disease resistance, reproduction, growth as well as carcass quality. In this study, next generation sequencing technologies and the availability of an appropriate reference genome were applied to discover a large amount of single nucleotide polymorphisms (SNPs) in ten Hanwoo bulls. Analysis of whole-genome resequencing generated a total of 26.5 Gb data, of which 594,716,859 and 592,990,750 reads covered 98.73% and 93.79% of the bovine reference genomes of UMD 3.1 and Btau 4.6.1, respectively. In total, 2,473,884 and 2,402,997 putative SNPs were discovered, of which 1,095,922 (44.3%) and 982,674 (40.9%) novel SNPs were discovered against UMD3.1 and Btau 4.6.1, respectively. Among the SNPs, the 46,301 (UMD 3.1) and 28,613 SNPs (Btau 4.6.1) that were identified as Hanwoo-specific SNPs were included in the functional genes that may be involved in the mechanisms of milk production, tenderness, juiciness, marbling of Hanwoo beef and yellow hair. Most of the Hanwoo-specific SNPs were identified in the promoter region, suggesting that the SNPs influence differential expression of the regulated genes relative to the relevant traits. In particular, the non-synonymous (ns) SNPs found in CORIN, which is a negative regulator of Agouti, might be a causal variant to determine yellow hair of Hanwoo. Our results will provide abundant genetic sources of variation to characterize Hanwoo genetics and for subsequent breeding. PMID:26954201
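
    The novel-SNP percentages quoted in this abstract can be reproduced from the reported counts. The short Python check below is an illustration added to this record, not part of the original study; the helper name `percent_novel` is hypothetical.

```python
def percent_novel(novel: int, total: int) -> float:
    """Return novel/total as a percentage rounded to one decimal place."""
    return round(100.0 * novel / total, 1)


# Putative and novel SNP counts as reported in the abstract,
# keyed by the bovine reference assembly they were called against.
counts = {
    "UMD 3.1": {"total": 2_473_884, "novel": 1_095_922},
    "Btau 4.6.1": {"total": 2_402_997, "novel": 982_674},
}

novel_pct = {
    ref: percent_novel(c["novel"], c["total"]) for ref, c in counts.items()
}
```

    Both values agree with the 44.3% and 40.9% stated in the text.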

  5. Pigmentation in sand pear (Pyrus pyrifolia) fruit: biochemical characterization, gene discovery and expression analysis with exocarp pigmentation mutant.

    Science.gov (United States)

    Wang, Yue-zhi; Zhang, Shujun; Dai, Mei-song; Shi, Ze-bin

    2014-05-01

    -membrane transport of lignin, cutin, and suberin precursors suggests that the transport process could also affect the composition of exocarp and take a role in the regulation of exocarp pigmentation. Results from this study provide a base for the analysis of the molecular mechanism underlying sand pear russet/green exocarp mutation, and presents a comprehensive list of candidate genes that could be used to further investigate the trait mutation at the molecular level. PMID:24445590

  6. Identification and Validation of HCC-specific Gene Transcriptional Signature for Tumor Antigen Discovery.

    Science.gov (United States)

    Petrizzo, Annacarmen; Caruso, Francesca Pia; Tagliamonte, Maria; Tornesello, Maria Lina; Ceccarelli, Michele; Costa, Valerio; Aprile, Marianna; Esposito, Roberta; Ciliberto, Gennaro; Buonaguro, Franco M; Buonaguro, Luigi

    2016-07-08

    A novel two-step bioinformatics strategy was applied for identification of signatures with therapeutic implications in hepatitis-associated HCC. Transcriptional profiles from HBV- and HCV-associated HCC samples were compared with non-tumor liver controls. Resulting HCC modulated genes were subsequently compared with different non-tumor tissue samples. Two related signatures were identified, namely "HCC-associated" and "HCC-specific". Expression data were validated by RNA-Seq analysis carried out on unrelated HCC samples and protein expression was confirmed according to The Human Protein Atlas (http://proteinatlas.org/), a public repository of immunohistochemistry data. Among all, aldo-keto reductase family 1 member B10 and IGF2 mRNA-binding protein 3 were found to be strictly HCC-specific, with no expression in 18/20 normal tissues. Target peptides for vaccine design were predicted for both proteins associated with the most prevalent HLA-class I and II alleles. The described novel strategy proved feasible for identifying HCC-specific proteins as high-potential targets for HCC immunotherapy.

  7. Developing computer-based training programs for basic mammalian histology: Didactic versus discovery-based design

    Science.gov (United States)

    Fabian, Henry Joel

    Educators have long tried to understand what stimulates students to learn. The Swiss psychologist and zoologist, Jean Claude Piaget, suggested that students are stimulated to learn when they attempt to resolve confusion. He reasoned that students try to explain the world with the knowledge they have acquired in life. When they find their own explanations to be inadequate to explain phenomena, students find themselves in a temporary state of confusion. This prompts students to seek more plausible explanations. At this point, students are primed for learning (Piaget 1964). The Piagetian approach described above is called learning by discovery. To promote discovery learning, a teacher must first allow the student to recognize his misconception and then provide a plausible explanation to replace that misconception (Chinn and Brewer 1993). One application of this method is found in the various learning cycles, which have been demonstrated to be effective means for teaching science (Renner and Lawson 1973, Lawson 1986, Marek and Methven 1991, and Glasson & Lalik 1993). In contrast to the learning cycle, tutorial computer programs are generally not designed to correct student misconceptions, but rather follow a passive, didactic method of teaching. In the didactic or expositional method, the student is told about a phenomenon, but is neither encouraged to explore it, nor explain it in his own terms (Schneider and Renner 1980).

  8. Key Object Discovery and Tracking Based on Context-Aware Saliency

    Directory of Open Access Journals (Sweden)

    Geng Zhang

    2013-01-01

    Full Text Available In this paper, we propose an online key object discovery and tracking system based on visual saliency. We formulate the problem as a temporally consistent binary labelling task on a conditional random field and solve it by using a particle filter. We also propose a context‐aware saliency measurement, which can be used to improve the accuracy of any static or dynamic saliency maps. Our refined saliency maps provide clearer indications as to where the key object lies. Based on good saliency cues, we can further segment the key object inside the resulting bounding box, considering the spatial and temporal context. We tested our system extensively on different video clips. The results show that our method has significantly improved the saliency maps and tracks the key object accurately.

  9. A Chord-based resource scheduling approach in drug discovery grid

    Institute of Scientific and Technical Information of China (English)

    Chen Shudong; Zhang Wenju; Zhang Jun; Ma Fanyuan; Shen Jianhua

    2007-01-01

    This paper presents a resource scheduling approach for grid computing environments. Using P2P technology, this novel approach can schedule dynamic grid computing resources efficiently. Grid computing resources in different domains are organized into a structured P2P overlay network. Available resource information is published in the form of grid services. Task requests for computational resources are also presented as grid services. The problem of resource scheduling is thereby translated into service discovery. Unlike central scheduling approaches that collect available resource information, this Chord-based approach forwards task requests in the overlay network and discovers resources that satisfy these tasks. Using this approach, the computational resources of a grid system can be scheduled dynamically according to the real-time workload on each peer. Furthermore, the approach is applied to DDG, a grid system for drug discovery and design, to evaluate its performance. Experimental results show that the computational resources of a grid system can be managed efficiently, and that the system can maintain load balance and robustness.
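
    The abstract does not describe how requests are mapped onto peers, so the following Python sketch only illustrates the general Chord idea it builds on: peers and keys are hashed onto one identifier ring, and a key belongs to the first peer at or after its identifier. The ring size, peer names and the `Ring` class are assumptions made for illustration, and the sketch uses a global view of the ring rather than Chord's finger-table routing between peers.

```python
import hashlib
from bisect import bisect_left

M = 16  # identifier bits; an assumed, deliberately small ring


def chord_id(name: str, m: int = M) -> int:
    """Hash a peer name or request key onto the 2**m identifier ring."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


class Ring:
    """Global-view stand-in for a Chord overlay (no finger tables)."""

    def __init__(self, peers: list[str]) -> None:
        # Sort peers by ring identifier so lookups can use binary search.
        self._nodes = sorted((chord_id(p), p) for p in peers)
        self._ids = [i for i, _ in self._nodes]

    def successor(self, key: str) -> str:
        """Return the peer responsible for key: the first peer whose
        identifier is >= the key's identifier, wrapping around the ring."""
        idx = bisect_left(self._ids, chord_id(key))
        return self._nodes[idx % len(self._nodes)][1]


# Hypothetical peer names and task key.
peers = ["peer-a", "peer-b", "peer-c"]
ring = Ring(peers)
owner = ring.successor("docking-task-1")
```

    A real Chord node keeps only a logarithmic number of pointers to other peers and forwards the request hop by hop, which is closer to the request forwarding the abstract describes.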

  10. De Novo Regulatory Motif Discovery Identifies Significant Motifs in Promoters of Five Classes of Plant Dehydrin Genes.

    Science.gov (United States)

    Zolotarov, Yevgen; Strömvik, Martina

    2015-01-01

    Plants accumulate dehydrins in response to osmotic stresses. Dehydrins are divided into five different classes, which are thought to be regulated in different manners. To better understand differences in transcriptional regulation of the five dehydrin classes, de novo motif discovery was performed on 350 dehydrin promoter sequences from a total of 51 plant genomes. Overrepresented motifs were identified in the promoters of five dehydrin classes. The Kn dehydrin promoters contain motifs linked with meristem specific expression, as well as motifs linked with cold/dehydration and abscisic acid response. KS dehydrin promoters contain a motif with a GATA core. SKn and YnSKn dehydrin promoters contain motifs that match elements connected with cold/dehydration, abscisic acid and light response. YnKn dehydrin promoters contain motifs that match abscisic acid and light response elements, but not cold/dehydration response elements. Conserved promoter motifs are present in the dehydrin classes and across different plant lineages, indicating that dehydrin gene regulation is likely also conserved.

  11. The fragile X mental retardation syndrome 20 years after the FMR1 gene discovery: an expanding universe of knowledge.

    Science.gov (United States)

    Rousseau, François; Labelle, Yves; Bussières, Johanne; Lindsay, Carmen

    2011-08-01

    The fragile X mental retardation (FXMR) syndrome is one of the most frequent causes of mental retardation. Affected individuals display a wide range of additional characteristic features including behavioural and physical phenotypes, and the extent to which individuals are affected is highly variable. For these reasons, elucidation of the pathophysiology of this disease has been an important challenge to the scientific community. 1991 marks the year of the discovery of both the FMR1 gene mutations involved in this disease, and of their dynamic nature. Although a mouse model for the disease has been available for 16 years and extensive research has been performed on the FMR1 protein (FMRP), we still understand little about how the disease develops, and no treatment has yet been shown to be effective. In this review, we summarise current knowledge on FXMR with an emphasis on the technical challenges of molecular diagnostics, on its prevalence and dynamics among populations, and on the potential of screening for FMR1 mutations.

  12. A joint modeling approach for uncovering associations between gene expression, bioactivity and chemical structure in early drug discovery to guide lead selection and genomic biomarker development.

    Science.gov (United States)

    Perualila-Tan, Nolen; Kasim, Adetayo; Talloen, Willem; Verbist, Bie; Göhlmann, Hinrich W H; Shkedy, Ziv

    2016-08-01

    The modern drug discovery process involves multiple sources of high-dimensional data. This imposes the challenge of data integration. A typical example is the integration of chemical structure (fingerprint features), phenotypic bioactivity (bioassay read-outs) data for targets of interest, and transcriptomic (gene expression) data in early drug discovery to better understand the chemical and biological mechanisms of candidate drugs, and to facilitate early detection of safety issues prior to later and expensive phases of drug development cycles. In this paper, we discuss a joint model for the transcriptomic and the phenotypic variables conditioned on the chemical structure. This modeling approach can be used to uncover, for a given set of compounds, the association between gene expression and biological activity taking into account the influence of the chemical structure of the compound on both variables. The model allows the detection of genes that are associated with the bioactivity data, facilitating the identification of potential genomic biomarkers for compound efficacy. In addition, the effect of every structural feature on both genes and pIC50 and their associations can be simultaneously investigated. Two oncology projects are used to illustrate the applicability and usefulness of the joint model to integrate multi-source high-dimensional information to aid drug discovery. PMID:27269248

  13. Gene expression module-based chemical function similarity search

    OpenAIRE

    Li, Yun; Hao, Pei; Zheng, Siyuan; Tu, Kang; Fan, Haiwei; Zhu, Ruixin; Ding, Guohui; Dong, Changzheng; Wang, Chuan; Li, Xuan; Thiesen, H.-J.; Chen, Y. Eugene; Jiang, HuaLiang; Liu, Lei; Li, Yixue

    2008-01-01

    Investigation of biological processes using selective chemical interventions is generally applied in biomedical research and drug discovery. Many studies of this kind make use of gene expression experiments to explore cellular responses to chemical interventions. Recently, some research groups constructed libraries of chemical related expression profiles, and introduced similarity comparison into chemical induced transcriptome analysis. Resembling sequence similarity alignment, expression pat...

  14. 1-Mb resolution array-based comparative genomic hybridization using a BAC clone set optimized for cancer gene analysis

    NARCIS (Netherlands)

    Greshock, J; Naylor, TL; Margolin, A; Diskin, S; Cleaver, SH; Futreal, PA; deJong, PJ; Zhao, SY; Liebman, M; Weber, BL

    2004-01-01

    Array-based comparative genomic hybridization (aCGH) is a recently developed tool for genome-wide determination of DNA copy number alterations. This technology has tremendous potential for disease-gene discovery in cancer and developmental disorders as well as numerous other applications. However, w

  15. Fragment-based hit discovery and structure-based optimization of aminotriazoloquinazolines as novel Hsp90 inhibitors.

    Science.gov (United States)

    Casale, Elena; Amboldi, Nadia; Brasca, Maria Gabriella; Caronni, Dannica; Colombo, Nicoletta; Dalvit, Claudio; Felder, Eduard R; Fogliatto, Gianpaolo; Galvani, Arturo; Isacchi, Antonella; Polucci, Paolo; Riceputi, Laura; Sola, Francesco; Visco, Carlo; Zuccotto, Fabio; Casuscelli, Francesco

    2014-08-01

    In the last decade the heat shock protein 90 (Hsp90) has emerged as a major therapeutic target and many efforts have been dedicated to the discovery of Hsp90 inhibitors as new potent anticancer agents. Here we report the identification of a novel class of Hsp90 inhibitors by means of a biophysical FAXS-NMR based screening of a library of fragments. The use of X-ray structure information combined with modeling studies enabled the fragment evolution of the initial triazoloquinazoline hit to a class of compounds with nanomolar potency and drug-like properties suited for further lead optimization.

  16. A review of Fuzzy Based QoS Web Service Discovery

    Directory of Open Access Journals (Sweden)

    R.Buvanesvari

    2013-03-01

    Full Text Available Web services have recently become an important concern for developers, and selecting a specific service is a crucial task. Some approaches develop extensive description and publication mechanisms, while others use syntactic, semantic, and structural reviews of Web service specifications. Finding the most suitable web service in a large collection of web services is crucial for the successful execution of applications. In many cases, the value of a QoS property may not be precisely defined. Recently, fuzzy approaches, which can deal with fuzzy constraints, have been proposed and are considered among the dominant approaches in Web services. Fuzzy logic can therefore be applied to represent such imprecise QoS constraints. In this paper, we present an overview that focuses on fuzzy-based approaches to Web service discovery. The paper also summarizes and analyzes the challenges that fuzzy mechanisms raise for web services in order to assess their benefits and limitations.
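
    The idea of representing an imprecise QoS constraint with fuzzy logic can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and does not come from the reviewed paper: the linear membership shapes, the thresholds, the three hypothetical services, and the use of `min` as the fuzzy AND.

```python
def falling(value, full, zero):
    """Membership that is 1.0 at or below `full`, 0.0 at or above `zero`, linear between."""
    if value <= full:
        return 1.0
    if value >= zero:
        return 0.0
    return (zero - value) / (zero - full)


def rising(value, zero, full):
    """Membership that is 0.0 at or below `zero`, 1.0 at or above `full`, linear between."""
    if value <= zero:
        return 0.0
    if value >= full:
        return 1.0
    return (value - zero) / (full - zero)


def match_degree(service):
    """Degree to which a service is both 'fast' and 'reliable' (illustrative thresholds)."""
    fast = falling(service["response_ms"], full=200, zero=1000)
    reliable = rising(service["availability"], zero=0.90, full=0.99)
    return min(fast, reliable)  # min is one common choice for fuzzy AND


# Hypothetical candidate services with made-up QoS values.
services = {
    "A": {"response_ms": 150, "availability": 0.95},
    "B": {"response_ms": 600, "availability": 0.995},
    "C": {"response_ms": 1200, "availability": 0.999},
}

# Rank candidates by how well they satisfy the fuzzy constraint.
ranked = sorted(services, key=lambda name: match_degree(services[name]), reverse=True)
print(ranked)
```

    A crisp filter such as "response time under 200 ms" would reject service B outright; the graded membership keeps it as a partial match, which is the kind of imprecision the abstract refers to.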

  17. FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS

    Directory of Open Access Journals (Sweden)

    Breno C. Costa

    2013-11-01

    Full Text Available Nowadays electric utilities have to handle non-technical losses caused by fraud and theft committed by some of their consumers. In order to minimize these losses, some methodologies have been created to detect consumers that might be fraudsters. In this context, the use of classification techniques can improve the hit rate of fraud detection and increase financial income. This paper proposes the use of the knowledge-discovery in databases process, based on artificial neural networks, to classify the consumers to be inspected. An experiment performed in a Brazilian electric power distribution company indicated that the proposed approach improved on the previous methods used by that company by over 50%.

  18. Predicting high-throughput screening results with scalable literature-based discovery methods.

    Science.gov (United States)

    Cohen, T; Widdows, D; Stephan, C; Zinner, R; Kim, J; Rindflesch, T; Davies, P

    2014-10-08

    The identification of new therapeutic uses for existing agents has been proposed as a means to mitigate the escalating cost of drug development. A common approach to such repurposing involves screening libraries of agents for activities against cell lines. In silico methods using knowledge from the biomedical literature have been proposed to constrain the costs of screening by identifying agents that are likely to be effective a priori. However, results obtained with these methods are seldom evaluated empirically. Conversely, screening experiments have been criticized for their inability to reveal the biological basis of their results. In this paper, we evaluate the ability of a scalable literature-based approach, discovery-by-analogy, to identify a small number of active agents within a large library screened for activity against prostate cancer cells. The methods used permit retrieval of the knowledge used to infer their predictions, providing a plausible biological basis for predicted activity.

  19. REALIZING THE NEED FOR SIMILARITY BASED REASONING OF CLOUD SERVICE DISCOVERY

    Directory of Open Access Journals (Sweden)

    S. BHAMA

    2011-12-01

    Full Text Available With the growing abundance of information on the web, it becomes the need of the hour to enrich data with semantics that can be understood and processed by machines. Currently, much of the effort in the area of semantics is focused on the representation of semantic data and its reasoning, which is the processing of semantic information associated with that data. This paper aims at realizing the need for similarity based reasoning of cloud service discovery. It forms a basic requirement of a cloud client to discover the most appropriate cloud service from the list of available services published by service providers. Cloud ontology provides a set of concepts, individuals and relationships among them. The similarity among cloud services can be determined from the semantic similarity of concepts and hence the relevant service can be retrieved.

  20. Anti-cancer Parasporin Toxins are Associated with Different Environments: Discovery of Two Novel Parasporin 5-like Genes.

    Science.gov (United States)

    Ammons, David R; Short, John D; Bailey, Jeffery; Hinojosa, Gabriela; Tavarez, Lourdes; Salazar, Martha; Rampersad, Joanne N

    2016-02-01

    Cry toxins are primarily a family of insecticidal toxins produced by the bacterium Bacillus thuringiensis (Bt). However, some Cry toxins, called parasporins (PSs), are non-insecticidal and have been shown to differentially kill human cancer cells. Based on amino acid homology, there are currently six different classes of parasporins (PS1-6). It is not known what role parasporins play in nature, nor if certain PSs are associated with Bt found in particular environments. Herein, we present ten parasporin-containing isolates of Bt from the Caribbean island of Trinidad. Genes coding for PS1 and PS6 were found in isolates associated mainly with artificial aquatic environments (e.g., barrels with rain water), while Bt possessing two novel PS5-like genes (ps5-1 and ps5-2) were isolated from manure collected directly from the rectum of cattle. The amino acid sequences inferred from the two PS5-like genes were 51% homologous to each other, while being only 41 or 45% similar to PS5Aa1/Cry64Aa, the only reported member of the parasporin five class. The low level of amino acid homology between the two PS5-like genes and PS5Aa1 indicates that the two PS5-like genes may represent a new class of parasporins, or greatly expand the level of diversity within the current parasporin 5 class. PMID:26563301

  1. Knowledge-based discovery for designing CRISPR-CAS systems against invading mobilomes in thermophiles.

    Science.gov (United States)

    Chellapandi, P; Ranjani, J

    2015-09-01

    Clustered regularly interspaced short palindromic repeats (CRISPRs) are direct features of prokaryotic genomes involved in resistance to their bacterial viruses and phages. Herein, we have identified CRISPR loci together with CRISPR-associated sequences (CAS) genes to reveal their immunity against genome invaders in thermophilic archaea and bacteria. The genomic survey in this study implied that the genomic distribution of CRISPR-CAS systems varied from strain to strain and was determined by the degree of invading mobilomes. Direct repeats were found to be equal to some extent in many thermophiles, but their spacers differed in each strain. Phylogenetic analyses of the CAS superfamily revealed that the genes cmr, csh, csx11, HD domain and devR belonged to subtypes of the cas gene family. The members of the cas gene family of thermophiles were functionally diverged within closely related genomes and may contribute to the development of several defense strategies. Nevertheless, genome dynamics, geological variation and host defense mechanisms contributed to the sharing of their molecular functions across the thermophiles. A thermophilic archaeon, Thermococcus gammatolerans, and the thermophilic bacteria Petrotoga mobilis and Thermotoga lettingae showed a superoperon-like appearance in clustering cas genes, which typically evolved for their defense pathways. A cmr operon with a specific promoter was identified in a thermophilic archaeon, Caldivirga maquilingensis. Overall, we concluded that a knowledge-based genomic survey and phylogeny-based functional assignment suggest ways of designing a reliable genetic regulatory circuit from natural CRISPR-CAS systems (acquired defense pathways) for thermophiles in future synthetic biology.

  2. Discovery of Potent Myeloid Cell Leukemia 1 (Mcl-1) Inhibitors Using Fragment-Based Methods and Structure-Based Design

    Energy Technology Data Exchange (ETDEWEB)

    Friberg, Anders [Vanderbilt Univ. School of Medicine, Nashville, TN (United States); Vigil, Dominico [Vanderbilt Univ. School of Medicine, Nashville, TN (United States); Zhao, Bin [Vanderbilt Univ. School of Medicine, Nashville, TN (United States); Daniels, R. Nathan [Vanderbilt Univ. School of Medicine, Nashville, TN (United States); Burke, Jason P. [Vanderbilt Univ. School of Medicine, Nashville, TN (United States); Garcia-Barrantes, Pedro M. [Vanderbilt Univ. School of Medicine, Nashville, TN (United States); Camper, DeMarco [Vanderbilt Univ. School of Medicine, Nashville, TN (United States); Chauder, Brian A. [Vanderbilt Univ. School of Medicine, Nashville, TN (United States); Lee, Taekyu [Vanderbilt Univ. School of Medicine, Nashville, TN (United States); Olejniczak, Edward T. [Vanderbilt Univ. School of Medicine, Nashville, TN (United States); Fesik, Stephen W. [Vanderbilt Univ. School of Medicine, Nashville, TN (United States)

    2012-12-17

    Myeloid cell leukemia 1 (Mcl-1), a member of the Bcl-2 family of proteins, is overexpressed and amplified in various cancers and promotes the aberrant survival of tumor cells that otherwise would undergo apoptosis. Here we describe the discovery of potent and selective Mcl-1 inhibitors using fragment-based methods and structure-based design. NMR-based screening of a large fragment library identified two chemically distinct hit series that bind to different sites on Mcl-1. Members of the two fragment classes were merged together to produce lead compounds that bind to Mcl-1 with a dissociation constant of <100 nM with selectivity for Mcl-1 over Bcl-xL and Bcl-2. Structures of merged compounds when complexed to Mcl-1 were obtained by X-ray crystallography and provide detailed information about the molecular recognition of small-molecule ligands binding Mcl-1. The compounds represent starting points for the discovery of clinically useful Mcl-1 inhibitors for the treatment of a wide variety of cancers.

  3. Dipeptidyl peptidase 9 substrates and their discovery: current progress and the application of mass spectrometry-based approaches.

    Science.gov (United States)

    Wilson, Claire H; Zhang, Hui Emma; Gorrell, Mark D; Abbott, Catherine A

    2016-09-01

    The enzyme members of the dipeptidyl peptidase 4 (DPP4) gene family have the very unusual capacity to cleave the post-proline bond to release dipeptides from the N-terminus of peptide/protein substrates. DPP4 and related enzymes are current and potential therapeutic targets in the treatment of type II diabetes, inflammatory conditions and cancer. Despite this, the precise biological function of individual dipeptidyl peptidases (DPPs), other than DPP4, and knowledge of their in vivo substrates remains largely unknown. For many years, identification of physiological DPP substrates has been difficult due to limitations in the available tools. Now, with advances in mass spectrometry based approaches, we can discover DPP substrates on a system wide-scale. Application of these approaches has helped reveal some of the in vivo natural substrates of DPP8 and DPP9 and their unique biological roles. In this review, we provide a general overview of some tools and approaches available for protease substrate discovery and their applicability to the DPPs with a specific focus on DPP9 substrates. This review provides comment upon potential approaches for future substrate elucidation. PMID:27410463

  4. Dynamic Structure-Based Pharmacophore Model Development: A New and Effective Addition in the Histone Deacetylase 8 (HDAC8) Inhibitor Discovery

    Directory of Open Access Journals (Sweden)

    Keun Woo Lee

    2011-12-01

    Full Text Available Histone deacetylase 8 (HDAC8) is an enzyme involved in deacetylating the amino groups of terminal lysine residues, thereby repressing the transcription of various genes including tumor suppressor gene. The over expression of HDAC8 was observed in many cancers and thus inhibition of this enzyme has emerged as an efficient cancer therapeutic strategy. In an effort to facilitate the future discovery of HDAC8 inhibitors, we developed two pharmacophore models containing six and five pharmacophoric features, respectively, using the representative structures from two molecular dynamic (MD) simulations performed in Gromacs 4.0.5 package. Various analyses of trajectories obtained from MD simulations have displayed the changes upon inhibitor binding. Thus utilization of the dynamically-responded protein structures in pharmacophore development has the added advantage of considering the conformational flexibility of protein. The MD trajectories were clustered based on single-linkage method and representative structures were taken to be used in the pharmacophore model development. Active site complimenting structure-based pharmacophore models were developed using Discovery Studio 2.5 program and validated using a dataset of known HDAC8 inhibitors. Virtual screening of chemical database coupled with drug-like filter has identified drug-like hit compounds that match the pharmacophore models. Molecular docking of these hits reduced the false positives and identified two potential compounds to be used in future HDAC8 inhibitor design.

  5. Dynamic structure-based pharmacophore model development: a new and effective addition in the histone deacetylase 8 (HDAC8) inhibitor discovery.

    Science.gov (United States)

    Thangapandian, Sundarapandian; John, Shalini; Lee, Yuno; Kim, Songmi; Lee, Keun Woo

    2011-01-01

    Histone deacetylase 8 (HDAC8) is an enzyme involved in deacetylating the amino groups of terminal lysine residues, thereby repressing the transcription of various genes including tumor suppressor gene. The over expression of HDAC8 was observed in many cancers and thus inhibition of this enzyme has emerged as an efficient cancer therapeutic strategy. In an effort to facilitate the future discovery of HDAC8 inhibitors, we developed two pharmacophore models containing six and five pharmacophoric features, respectively, using the representative structures from two molecular dynamic (MD) simulations performed in Gromacs 4.0.5 package. Various analyses of trajectories obtained from MD simulations have displayed the changes upon inhibitor binding. Thus utilization of the dynamically-responded protein structures in pharmacophore development has the added advantage of considering the conformational flexibility of protein. The MD trajectories were clustered based on single-linkage method and representative structures were taken to be used in the pharmacophore model development. Active site complimenting structure-based pharmacophore models were developed using Discovery Studio 2.5 program and validated using a dataset of known HDAC8 inhibitors. Virtual screening of chemical database coupled with drug-like filter has identified drug-like hit compounds that match the pharmacophore models. Molecular docking of these hits reduced the false positives and identified two potential compounds to be used in future HDAC8 inhibitor design. PMID:22272142

  6. Computational Materials Science and Chemistry: Accelerating Discovery and Innovation through Simulation-Based Engineering and Science

    Energy Technology Data Exchange (ETDEWEB)

    Crabtree, George [Argonne National Lab. (ANL), Argonne, IL (United States); Glotzer, Sharon [University of Michigan; McCurdy, Bill [University of California Davis; Roberto, Jim [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)

    2010-07-26

    This report is based on a SC Workshop on Computational Materials Science and Chemistry for Innovation on July 26-27, 2010, to assess the potential of state-of-the-art computer simulations to accelerate understanding and discovery in materials science and chemistry, with a focus on potential impacts in energy technologies and innovation. The urgent demand for new energy technologies has greatly exceeded the capabilities of today's materials and chemical processes. To convert sunlight to fuel, efficiently store energy, or enable a new generation of energy production and utilization technologies requires the development of new materials and processes of unprecedented functionality and performance. New materials and processes are critical pacing elements for progress in advanced energy systems and virtually all industrial technologies. Over the past two decades, the United States has developed and deployed the world's most powerful collection of tools for the synthesis, processing, characterization, and simulation and modeling of materials and chemical systems at the nanoscale, dimensions of a few atoms to a few hundred atoms across. These tools, which include world-leading x-ray and neutron sources, nanoscale science facilities, and high-performance computers, provide an unprecedented view of the atomic-scale structure and dynamics of materials and the molecular-scale basis of chemical processes. For the first time in history, we are able to synthesize, characterize, and model materials and chemical behavior at the length scale where this behavior is controlled. This ability is transformational for the discovery process and, as a result, confers a significant competitive advantage. Perhaps the most spectacular increase in capability has been demonstrated in high performance computing. Over the past decade, computational power has increased by a factor of a million due to advances in hardware and software. This rate of improvement, which shows no sign of

  7. A gene-based information gain method for detecting gene-gene interactions in case-control studies.

    Science.gov (United States)

    Li, Jin; Huang, Dongli; Guo, Maozu; Liu, Xiaoyan; Wang, Chunyu; Teng, Zhixia; Zhang, Ruijie; Jiang, Yongshuai; Lv, Hongchao; Wang, Limei

    2015-11-01

    Currently, most methods for detecting gene-gene interactions (GGIs) in genome-wide association studies are divided into SNP-based methods and gene-based methods. Generally, the gene-based methods can be more powerful than SNP-based methods. Some gene-based entropy methods can only capture the linear relationship between genes. We therefore proposed a nonparametric gene-based information gain method (GBIGM) that can capture both linear relationship and nonlinear correlation between genes. Through simulation with different odds ratio, sample size and prevalence rate, GBIGM was shown to be valid and more powerful than classic KCCU method and SNP-based entropy method. In the analysis of data from 17 genes on rheumatoid arthritis, GBIGM was more effective than the other two methods as it obtains fewer significant results, which was important for biological verification. Therefore, GBIGM is a suitable and powerful tool for detecting GGIs in case-control studies.
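
    To illustrate why an entropy-based measure can capture a nonlinear relationship between genes, here is a minimal sketch of a generic information-gain calculation. It is not the authors' GBIGM statistic, whose exact definition is not given in this abstract; the function names and the four-sample XOR-style toy data are assumptions made purely for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def conditional_entropy(labels, features):
    """Entropy of `labels` that remains after grouping samples by `features`."""
    n = len(labels)
    groups = {}
    for feature, label in zip(features, labels):
        groups.setdefault(feature, []).append(label)
    return sum(len(group) / n * entropy(group) for group in groups.values())


def information_gain(labels, features):
    """Reduction in label entropy obtained by knowing `features`."""
    return entropy(labels) - conditional_entropy(labels, features)


# Toy data: disease status depends on the two genes only jointly (XOR pattern).
gene_a = [0, 0, 1, 1]
gene_b = [0, 1, 0, 1]
status = [0, 1, 1, 0]

gain_a = information_gain(status, gene_a)
gain_b = information_gain(status, gene_b)
gain_joint = information_gain(status, list(zip(gene_a, gene_b)))

# Extra information available only when both genes are considered together.
interaction_gain = gain_joint - gain_a - gain_b
print(gain_a, gain_b, gain_joint, interaction_gain)
```

    In this toy case each gene alone carries no information about status, while the pair determines it completely, so a measure restricted to linear, per-gene effects would miss the interaction.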

  8. Gene cloning based on long oligonucleotide probes

    International Nuclear Information System (INIS)

    The most commonly used technique for gene cloning has been to use oligonucleotide probes based on protein sequence data. This approach requires a characterized and purified protein so that at least a portion of the amino acid sequence can be determined and used to infer the corresponding DNA sequence. Based on the amino acid sequence information, either short or long oligonucleotide probes can be synthesized chemically. Long probes are typically 30-100 nucleotides long and are a single sequence based on a best guess for each codon. The long probe approach was first used to screen for three different genes: bovine trypsin inhibitor, human insulin-like growth factor I, and human factor IX. There are three advantages of long probes. (1) Any stretch of amino acid sequence 10 or longer can be used. (2) The amino acid sequence need not be absolutely correct. (3) These probes can be used to screen high-complexity libraries with fewer false positives. In spite of the uncertainties over codon selection, the long probe approach is currently the method of choice in screening for genes based on protein sequence data.

  9. Discovery of a Series of Acridinones as Mechanism-Based Tubulin Assembly Inhibitors with Anticancer Activity.

    Science.gov (United States)

    Magalhaes, Luma G; Marques, Fernando B; da Fonseca, Marina B; Rogério, Kamilla R; Graebin, Cedric S; Andricopulo, Adriano D

    2016-01-01

    Microtubules play critical roles in vital cell processes, including cell growth, division, and migration. Microtubule-targeting small molecules are chemotherapeutic agents that are widely used in the treatment of cancer. Many of these compounds are structurally complex natural products (e.g., paclitaxel, vinblastine, and vincristine) with multiple stereogenic centers. Because of the scarcity of their natural sources and the difficulty of their partial or total synthesis, as well as problems related to their bioavailability, toxicity, and resistance, there is an urgent need for novel microtubule binding agents that are effective for treating cancer but do not have these disadvantages. In the present work, our lead discovery effort toward less structurally complex synthetic compounds led to the discovery of a series of acridinones inspired by the structure of podophyllotoxin, a natural product with important microtubule assembly inhibitory activity, as novel mechanism-based tubulin assembly inhibitors with potent anticancer properties and low toxicity. The compounds were evaluated in vitro by wound healing assays employing the metastatic and triple negative breast cancer cell line MDA-MB-231. Four compounds with IC50 values between 0.294 and 1.7 μM were identified. These compounds showed selective cytotoxicity against MDA-MB-231 and DU-145 cancer cell lines and promoted cell cycle arrest in G2/M phase and apoptosis. Consistent with molecular modeling results, the acridinones inhibited tubulin assembly in in vitro polymerization assays with IC50 values between 0.9 and 13 μM. Their binding to the colchicine-binding site of tubulin was confirmed through competitive assays. PMID:27508497

  10. Discovery of a Series of Acridinones as Mechanism-Based Tubulin Assembly Inhibitors with Anticancer Activity

    Science.gov (United States)

    Magalhaes, Luma G.; Marques, Fernando B.; da Fonseca, Marina B.; Rogério, Kamilla R.; Graebin, Cedric S.; Andricopulo, Adriano D.

    2016-01-01

    Microtubules play critical roles in vital cell processes, including cell growth, division, and migration. Microtubule-targeting small molecules are chemotherapeutic agents that are widely used in the treatment of cancer. Many of these compounds are structurally complex natural products (e.g., paclitaxel, vinblastine, and vincristine) with multiple stereogenic centers. Because of the scarcity of their natural sources and the difficulty of their partial or total synthesis, as well as problems related to their bioavailability, toxicity, and resistance, there is an urgent need for novel microtubule binding agents that are effective for treating cancer but do not have these disadvantages. In the present work, our lead discovery effort toward less structurally complex synthetic compounds led to the discovery of a series of acridinones inspired by the structure of podophyllotoxin, a natural product with important microtubule assembly inhibitory activity, as novel mechanism-based tubulin assembly inhibitors with potent anticancer properties and low toxicity. The compounds were evaluated in vitro by wound healing assays employing the metastatic and triple negative breast cancer cell line MDA-MB-231. Four compounds with IC50 values between 0.294 and 1.7 μM were identified. These compounds showed selective cytotoxicity against MDA-MB-231 and DU-145 cancer cell lines and promoted cell cycle arrest in G2/M phase and apoptosis. Consistent with molecular modeling results, the acridinones inhibited tubulin assembly in in vitro polymerization assays with IC50 values between 0.9 and 13 μM. Their binding to the colchicine-binding site of tubulin was confirmed through competitive assays. PMID:27508497

  11. Discovery of Subtype Selective Janus Kinase (JAK) Inhibitors by Structure-Based Virtual Screening.

    Science.gov (United States)

    Bajusz, Dávid; Ferenczy, György G; Keserű, György M

    2016-01-25

    Janus kinase inhibitors represent a promising opportunity for the pharmaceutical intervention of various inflammatory and oncological indications. Subtype selective inhibition of these enzymes, however, is still a very challenging goal. In this study, a novel, customized virtual screening protocol was developed with the intention of providing an efficient tool for the discovery of subtype selective JAK2 inhibitors. The screening protocol involves protein ensemble-based docking calculations combined with an Interaction Fingerprint (IFP) based scoring scheme for estimating ligand affinities and selectivities, respectively. The methodology was validated in retrospective studies and was applied prospectively to screen a large database of commercially available compounds. Six compounds were identified and confirmed in vitro, with an indazole-based hit exhibiting promising selectivity for JAK2 vs JAK1. Having demonstrated that the described methodology is capable of identifying subtype selective chemical starting points with a favorable hit rate (11%), we believe that the presented screening concept can be useful for other kinase targets with challenging selectivity profiles. PMID:26682735

  12. Sensor Network-Based and User-Friendly User Location Discovery for Future Smart Homes.

    Science.gov (United States)

    Ahvar, Ehsan; Lee, Gyu Myoung; Han, Son N; Crespi, Noel; Khan, Imran

    2016-01-01

    User location is crucial context information for future smart homes where many location based services will be proposed. This location necessarily means that User Location Discovery (ULD) will play an important role in future smart homes. Concerns about privacy and the need to carry a mobile or a tag device within a smart home currently make conventional ULD systems uncomfortable for users. Future smart homes will need a ULD system to consider these challenges. This paper addresses the design of such a ULD system for context-aware services in future smart homes stressing the following challenges: (i) users' privacy; (ii) device-/tag-free; and (iii) fault tolerance and accuracy. On the other hand, emerging new technologies, such as the Internet of Things, embedded systems, intelligent devices and machine-to-machine communication, are penetrating into our daily life with more and more sensors available for use in our homes. Considering this opportunity, we propose a ULD system that is capitalizing on the prevalence of sensors for the home while satisfying the aforementioned challenges. The proposed sensor network-based and user-friendly ULD system relies on different types of inexpensive sensors, as well as a context broker with a fuzzy-based decision-maker. The context broker receives context information from different types of sensors and evaluates that data using the fuzzy set theory. We demonstrate the performance of the proposed system by illustrating a use case, utilizing both an analytical model and simulation.

  13. Assessment of cardiovascular risk based on a data-driven knowledge discovery approach.

    Science.gov (United States)

    Mendes, D; Paredes, S; Rocha, T; Carvalho, P; Henriques, J; Cabiddu, R; Morais, J

    2015-01-01

    The cardioRisk project addresses the development of personalized risk assessment tools for patients who have been admitted to the hospital with acute myocardial infarction. Although there are models available that assess the short-term risk of death/new events for such patients, these models were established in circumstances that do not take into account the present clinical interventions and, in some cases, the risk factors used by such models are not easily available in clinical practice. The integration of the existing risk tools (applied in the clinician's daily practice) with data-driven knowledge discovery mechanisms based on data routinely collected during hospitalizations, will be a breakthrough in overcoming some of these difficulties. In this context, the development of simple and interpretable models (based on recent datasets), unquestionably will facilitate and will introduce confidence in this integration process. In this work, a simple and interpretable model based on a real dataset is proposed. It consists of a decision tree model structure that uses a reduced set of six binary risk factors. The validation is performed using a recent dataset provided by the Portuguese Society of Cardiology (11113 patients), which originally comprised 77 risk factors. A sensitivity, specificity and accuracy of, respectively, 80.42%, 77.25% and 78.80% were achieved showing the effectiveness of the approach.
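
    The sensitivity, specificity and accuracy quoted above are standard confusion-matrix ratios. The sketch below shows how they are computed. The counts are made-up toy values, not the cardioRisk or Portuguese Society of Cardiology data, and the function name is an assumption introduced for illustration.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp: events correctly predicted as high risk
    fn: events missed (predicted low risk)
    tn: non-events correctly predicted as low risk
    fp: non-events wrongly predicted as high risk
    """
    sensitivity = tp / (tp + fn)                 # share of true events detected
    specificity = tn / (tn + fp)                 # share of non-events correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # share of all predictions that are correct
    return sensitivity, specificity, accuracy


# Toy counts for illustration only.
sensitivity, specificity, accuracy = classification_metrics(tp=80, fn=20, tn=70, fp=30)
print(f"sensitivity={sensitivity:.2%} specificity={specificity:.2%} accuracy={accuracy:.2%}")
```

    Accuracy is a weighted average of sensitivity and specificity, with weights set by the proportion of events in the dataset, so it always lies between the two, as it does for the figures reported in the abstract.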

  14. Sensor Network-Based and User-Friendly User Location Discovery for Future Smart Homes.

    Science.gov (United States)

    Ahvar, Ehsan; Lee, Gyu Myoung; Han, Son N; Crespi, Noel; Khan, Imran

    2016-01-01

    User location is crucial context information for future smart homes where many location based services will be proposed. This location necessarily means that User Location Discovery (ULD) will play an important role in future smart homes. Concerns about privacy and the need to carry a mobile or a tag device within a smart home currently make conventional ULD systems uncomfortable for users. Future smart homes will need a ULD system to consider these challenges. This paper addresses the design of such a ULD system for context-aware services in future smart homes stressing the following challenges: (i) users' privacy; (ii) device-/tag-free; and (iii) fault tolerance and accuracy. On the other hand, emerging new technologies, such as the Internet of Things, embedded systems, intelligent devices and machine-to-machine communication, are penetrating into our daily life with more and more sensors available for use in our homes. Considering this opportunity, we propose a ULD system that is capitalizing on the prevalence of sensors for the home while satisfying the aforementioned challenges. The proposed sensor network-based and user-friendly ULD system relies on different types of inexpensive sensors, as well as a context broker with a fuzzy-based decision-maker. The context broker receives context information from different types of sensors and evaluates that data using the fuzzy set theory. We demonstrate the performance of the proposed system by illustrating a use case, utilizing both an analytical model and simulation. PMID:27355951

  15. Sensor Network-Based and User-Friendly User Location Discovery for Future Smart Homes

    Science.gov (United States)

    Ahvar, Ehsan; Lee, Gyu Myoung; Han, Son N.; Crespi, Noel; Khan, Imran

    2016-01-01

    User location is crucial context information for future smart homes where many location based services will be proposed. This location necessarily means that User Location Discovery (ULD) will play an important role in future smart homes. Concerns about privacy and the need to carry a mobile or a tag device within a smart home currently make conventional ULD systems uncomfortable for users. Future smart homes will need a ULD system to consider these challenges. This paper addresses the design of such a ULD system for context-aware services in future smart homes stressing the following challenges: (i) users’ privacy; (ii) device-/tag-free; and (iii) fault tolerance and accuracy. On the other hand, emerging new technologies, such as the Internet of Things, embedded systems, intelligent devices and machine-to-machine communication, are penetrating into our daily life with more and more sensors available for use in our homes. Considering this opportunity, we propose a ULD system that is capitalizing on the prevalence of sensors for the home while satisfying the aforementioned challenges. The proposed sensor network-based and user-friendly ULD system relies on different types of inexpensive sensors, as well as a context broker with a fuzzy-based decision-maker. The context broker receives context information from different types of sensors and evaluates that data using the fuzzy set theory. We demonstrate the performance of the proposed system by illustrating a use case, utilizing both an analytical model and simulation. PMID:27355951

  16. Sensor Network-Based and User-Friendly User Location Discovery for Future Smart Homes

    Directory of Open Access Journals (Sweden)

    Ehsan Ahvar

    2016-06-01

    Full Text Available User location is crucial context information for future smart homes where many location based services will be proposed. This location necessarily means that User Location Discovery (ULD) will play an important role in future smart homes. Concerns about privacy and the need to carry a mobile or a tag device within a smart home currently make conventional ULD systems uncomfortable for users. Future smart homes will need a ULD system to consider these challenges. This paper addresses the design of such a ULD system for context-aware services in future smart homes stressing the following challenges: (i) users’ privacy; (ii) device-/tag-free; and (iii) fault tolerance and accuracy. On the other hand, emerging new technologies, such as the Internet of Things, embedded systems, intelligent devices and machine-to-machine communication, are penetrating into our daily life with more and more sensors available for use in our homes. Considering this opportunity, we propose a ULD system that is capitalizing on the prevalence of sensors for the home while satisfying the aforementioned challenges. The proposed sensor network-based and user-friendly ULD system relies on different types of inexpensive sensors, as well as a context broker with a fuzzy-based decision-maker. The context broker receives context information from different types of sensors and evaluates that data using the fuzzy set theory. We demonstrate the performance of the proposed system by illustrating a use case, utilizing both an analytical model and simulation.
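
    The abstract above describes a context broker that evaluates readings from several kinds of sensors with fuzzy set theory. The sketch below is only an illustration of that general idea, not the authors' design: the sensor names, the ramp-shaped membership functions, their thresholds, and the averaging rule are all assumptions made for the example. Sensors that report nothing (None) are skipped, which is one simple way to tolerate a failed sensor.

```python
def ramp(x, low, high):
    """Fuzzy membership rising linearly from 0 at `low` to 1 at `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


# Hypothetical membership functions for "this reading suggests presence".
# The sensor names and thresholds are assumptions, not values from the paper.
MEMBERSHIP = {
    "motion": lambda v: ramp(v, 0.0, 1.0),
    "sound_db": lambda v: ramp(v, 30.0, 60.0),
    "pressure_kg": lambda v: ramp(v, 5.0, 40.0),
}


def presence_degree(readings):
    """Degree in [0, 1] that a room is occupied.

    Averages the membership degrees of the sensors that actually reported,
    so a missing sensor (value None) does not block the decision.
    """
    degrees = [
        MEMBERSHIP[name](value)
        for name, value in readings.items()
        if value is not None
    ]
    if not degrees:
        return 0.0
    return sum(degrees) / len(degrees)


def locate_user(rooms):
    """Return the room whose sensors give the highest presence degree."""
    return max(rooms, key=lambda room: presence_degree(rooms[room]))


# Made-up readings for two rooms; the kitchen pressure sensor is offline.
rooms = {
    "kitchen": {"motion": 0.9, "sound_db": 55.0, "pressure_kg": None},
    "bedroom": {"motion": 0.1, "sound_db": 32.0, "pressure_kg": 0.0},
}
print(locate_user(rooms))
```

    Averaging is only one possible aggregation; a rule base with min/max operators would be an equally plausible reading of "fuzzy-based decision-maker".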

  17. A systems-genetics approach and data mining tool to assist in the discovery of genes underlying complex traits in Oryza sativa.

    Directory of Open Access Journals (Sweden)

    Stephen P Ficklin

    Full Text Available Many traits of biological and agronomic significance in plants are controlled in a complex manner where multiple genes and environmental signals affect the expression of the phenotype. In Oryza sativa (rice), thousands of quantitative genetic signals have been mapped to the rice genome. In parallel, thousands of gene expression profiles have been generated across many experimental conditions. Through the discovery of networks with real gene co-expression relationships, it is possible to identify co-localized genetic and gene expression signals that implicate complex genotype-phenotype relationships. In this work, we used a knowledge-independent, systems genetics approach to discover a high-quality set of co-expression networks, termed Gene Interaction Layers (GILs). Twenty-two GILs were constructed from 1,306 Affymetrix microarray rice expression profiles that were pre-clustered to allow for improved capture of gene co-expression relationships. Functional genomic and genetic data, including over 8,000 QTLs and 766 phenotype-tagged SNPs (p-value <= 0.001) from genome-wide association studies, both covering over 230 different rice traits, were integrated with the GILs. An online systems genetics data-mining resource, the GeneNet Engine, was constructed to enable dynamic discovery of gene sets (i.e. network modules) that overlap with genetic traits. GeneNet Engine does not provide the exact set of genes underlying a given complex trait, but through the evidence of gene-marker correspondence, co-expression, and functional enrichment, site visitors can identify genes with potential shared causality for a trait which could then be used for experimental validation. A set of 2 million SNPs was incorporated into the database and serves as a potential set of testable biomarkers for genes in modules that overlap with genetic traits. Herein, we describe two modules found using GeneNet Engine, one with significant overlap with the trait amylose content and

  18. Evidence-based gene predictions in plant genomes

    Science.gov (United States)

    Automated evidence-based gene building is a rapid and cost-effective way to provide reliable gene annotations on newly sequenced genomes. One of the limitations of evidence-based gene builders, however, is their requirement for gene expression evidence—known proteins, full-length cDNAs, or expressed...

  19. Computational Materials Science and Chemistry: Accelerating Discovery and Innovation through Simulation-Based Engineering and Science

    Energy Technology Data Exchange (ETDEWEB)

    Crabtree, George [Argonne National Lab. (ANL), Argonne, IL (United States); Glotzer, Sharon [University of Michigan; McCurdy, Bill [University of California Davis; Roberto, Jim [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)

    2010-07-26

    This report is based on a SC Workshop on Computational Materials Science and Chemistry for Innovation on July 26-27, 2010, to assess the potential of state-of-the-art computer simulations to accelerate understanding and discovery in materials science and chemistry, with a focus on potential impacts in energy technologies and innovation. The urgent demand for new energy technologies has greatly exceeded the capabilities of today's materials and chemical processes. To convert sunlight to fuel, efficiently store energy, or enable a new generation of energy production and utilization technologies requires the development of new materials and processes of unprecedented functionality and performance. New materials and processes are critical pacing elements for progress in advanced energy systems and virtually all industrial technologies. Over the past two decades, the United States has developed and deployed the world's most powerful collection of tools for the synthesis, processing, characterization, and simulation and modeling of materials and chemical systems at the nanoscale, dimensions of a few atoms to a few hundred atoms across. These tools, which include world-leading x-ray and neutron sources, nanoscale science facilities, and high-performance computers, provide an unprecedented view of the atomic-scale structure and dynamics of materials and the molecular-scale basis of chemical processes. For the first time in history, we are able to synthesize, characterize, and model materials and chemical behavior at the length scale where this behavior is controlled. This ability is transformational for the discovery process and, as a result, confers a significant competitive advantage. Perhaps the most spectacular increase in capability has been demonstrated in high performance computing. Over the past decade, computational power has increased by a factor of a million due to advances in hardware and software. This rate of improvement, which shows no sign of

  20. Genomic sequence-based discovery of novel angucyclinone antibiotics from marine Streptomyces sp. W007.

    Science.gov (United States)

    Zhang, Hongyu; Wang, Hongbo; Wang, Yipeng; Cui, Hongli; Xie, Zeping; Pu, Yang; Pei, Shiqian; Li, Fuchao; Qin, Song

    2012-07-01

    A large number of novel bioactive compounds have been discovered from microbial secondary metabolites through traditional bioactivity screening. Recent fermentation studies indicated that the crude extract of marine Streptomyces sp. W007 possessed great potential in agricultural fungal disease control against Phomopsis asparagi, Polystigma deformans, Cladosporium cucumerinum, Monilinia fructicola, and Colletotrichum lagenarium. To further evaluate the biosynthetic potential of secondary metabolites, we sequenced the genome of Streptomyces sp. W007 and analyzed the identifiable secondary metabolite gene clusters. Moreover, one gene cluster with type II PKS implied that Streptomyces sp. W007 could produce aromatic polyketides of the angucyclinone antibiotic class. Accordingly, two novel compounds, 3-hydroxy-1-keto-3-methyl-8-methoxy-1,2,3,4-tetrahydro-benz[α]anthracene and kiamycin, with potent cytotoxicities against human cancer cell lines, were isolated from the culture broth of Streptomyces sp. W007. In addition, four other known angucyclinone antibiotics were obtained. The gene cluster for these angucyclinone antibiotics could be assigned to 20 genes. This work provides powerful evidence for the interplay between genomic analysis and traditional natural product isolation research. PMID:22536997

  1. Evidence Based Selection of Housekeeping Genes

    OpenAIRE

    de Jonge, Hendrik J.M.; Fehrmann, Rudolf S. N.; Eveline S. J. M. de Bont; Hofstra, Robert M. W.; Gerbens, Frans; Kamps, Willem A.; Vries, Elisabeth G. E.; van der Zee, Ate G.J.; te Meerman, Gerard J.; ter Elst, Arja

    2007-01-01

    For accurate and reliable gene expression analysis, normalization of gene expression data against housekeeping genes (reference or internal control genes) is required. It is known that commonly used housekeeping genes (e.g. ACTB, GAPDH, HPRT1, and B2M) vary considerably under different experimental conditions and therefore their use for normalization is limited. We performed a meta-analysis of 13,629 human gene array samples in order to identify the most stable expressed genes. Here we show n...

  2. Accelerating Gene Discovery by Phenotyping Whole-Genome Sequenced Multi-mutation Strains and Using the Sequence Kernel Association Test (SKAT).

    Science.gov (United States)

    Timbers, Tiffany A; Garland, Stephanie J; Mohan, Swetha; Flibotte, Stephane; Edgley, Mark; Muncaster, Quintin; Au, Vinci; Li-Leger, Erica; Rosell, Federico I; Cai, Jerry; Rademakers, Suzanne; Jansen, Gert; Moerman, Donald G; Leroux, Michel R

    2016-08-01

    Forward genetic screens represent powerful, unbiased approaches to uncover novel components in any biological process. Such screens suffer from a major bottleneck, however, namely the cloning of corresponding genes causing the phenotypic variation. Reverse genetic screens have been employed as a way to circumvent this issue, but can often be limited in scope. Here we demonstrate an innovative approach to gene discovery. Using C. elegans as a model system, we used a whole-genome sequenced multi-mutation library, from the Million Mutation Project, together with the Sequence Kernel Association Test (SKAT), to rapidly screen for and identify genes associated with a phenotype of interest, namely defects in dye-filling of ciliated sensory neurons. Such anomalies in dye-filling are often associated with the disruption of cilia, organelles which in humans are implicated in sensory physiology (including vision, smell and hearing), development and disease. Beyond identifying several well characterised dye-filling genes, our approach uncovered three genes not previously linked to ciliated sensory neuron development or function. From these putative novel dye-filling genes, we confirmed the involvement of BGNT-1.1 in ciliated sensory neuron function and morphogenesis. BGNT-1.1 functions at the trans-Golgi network of sheath cells (glia) to influence dye-filling and cilium length, in a cell non-autonomous manner. Notably, BGNT-1.1 is the orthologue of human B3GNT1/B4GAT1, a glycosyltransferase associated with Walker-Warburg syndrome (WWS). WWS is a multigenic disorder characterised by muscular dystrophy as well as brain and eye anomalies. Together, our work unveils an effective and innovative approach to gene discovery, and provides the first evidence that B3GNT1-associated Walker-Warburg syndrome may be considered a ciliopathy. PMID:27508411

  3. Accelerating Gene Discovery by Phenotyping Whole-Genome Sequenced Multi-mutation Strains and Using the Sequence Kernel Association Test (SKAT)

    Science.gov (United States)

    Garland, Stephanie J.; Mohan, Swetha; Flibotte, Stephane; Muncaster, Quintin; Cai, Jerry; Rademakers, Suzanne; Moerman, Donald G.; Leroux, Michel R.

    2016-01-01

    Forward genetic screens represent powerful, unbiased approaches to uncover novel components in any biological process. Such screens suffer from a major bottleneck, however, namely the cloning of corresponding genes causing the phenotypic variation. Reverse genetic screens have been employed as a way to circumvent this issue, but can often be limited in scope. Here we demonstrate an innovative approach to gene discovery. Using C. elegans as a model system, we used a whole-genome sequenced multi-mutation library, from the Million Mutation Project, together with the Sequence Kernel Association Test (SKAT), to rapidly screen for and identify genes associated with a phenotype of interest, namely defects in dye-filling of ciliated sensory neurons. Such anomalies in dye-filling are often associated with the disruption of cilia, organelles which in humans are implicated in sensory physiology (including vision, smell and hearing), development and disease. Beyond identifying several well characterised dye-filling genes, our approach uncovered three genes not previously linked to ciliated sensory neuron development or function. From these putative novel dye-filling genes, we confirmed the involvement of BGNT-1.1 in ciliated sensory neuron function and morphogenesis. BGNT-1.1 functions at the trans-Golgi network of sheath cells (glia) to influence dye-filling and cilium length, in a cell non-autonomous manner. Notably, BGNT-1.1 is the orthologue of human B3GNT1/B4GAT1, a glycosyltransferase associated with Walker-Warburg syndrome (WWS). WWS is a multigenic disorder characterised by muscular dystrophy as well as brain and eye anomalies. Together, our work unveils an effective and innovative approach to gene discovery, and provides the first evidence that B3GNT1-associated Walker-Warburg syndrome may be considered a ciliopathy. PMID:27508411

  4. Enabling Metabolomics Based Biomarker Discovery Studies Using Molecular Phenotyping of Exosome-Like Vesicles.

    Directory of Open Access Journals (Sweden)

    Tatiana Altadill

    Full Text Available Identification of sensitive and specific biomarkers with clinical and translational utility will require smart experimental strategies that would augment expanding the breadth and depth of molecular measurements within the constraints of currently available technologies. Exosomes represent an information rich matrix to discern novel disease mechanisms that are thought to contribute to pathologies such as dementia and cancer. Although proteomics and transcriptomic studies have been reported using Exosome-Like Vesicles (ELVs) from different sources, exosomal metabolome characterization and its modulation in health and disease remains to be elucidated. Here we describe methodologies for UPLC-ESI-MS based small molecule profiling of ELVs from human plasma and cell culture media. In this study, we present evidence that indeed ELVs carry a rich metabolome that could not only augment the discovery of low abundance biomarkers but may also help explain the molecular basis of disease progression. This approach could be easily translated to other studies seeking to develop predictive biomarkers that can subsequently be used with simplified targeted approaches.

  5. Discovery of an Oxybenzylglycine Based Peroxisome Proliferator Activated Receptor Alpha Selective

    Energy Technology Data Exchange (ETDEWEB)

    Li, J.; Kennedy, L; Shi, Y; Tao, S; Ye, X; Chen, S; Wang, Y; Hernandez, A; Wang, W; et al.

    2010-01-01

    A 1,3-oxybenzylglycine based compound 2 (BMS-687453) was discovered to be a potent and selective peroxisome proliferator activated receptor (PPAR) α agonist, with an EC50 of 10 nM for human PPARα and ~410-fold selectivity vs human PPARγ in PPAR-GAL4 transactivation assays. Similar potencies and selectivity were also observed in the full length receptor co-transfection assays. Compound 2 has negligible cross-reactivity against a panel of human nuclear hormone receptors including PPARδ. Compound 2 demonstrated an excellent pharmacological and safety profile in preclinical studies and thus was chosen as a development candidate for the treatment of atherosclerosis and dyslipidemia. The X-ray cocrystal structures of the early lead compound 12 and compound 2 in complex with PPARα ligand binding domain (LBD) were determined. The role of the crystal structure of compound 12 with PPARα in the development of the SAR that ultimately resulted in the discovery of compound 2 is discussed.

  6. Structure based discovery of small molecules to regulate the activity of human insulin degrading enzyme.

    Directory of Open Access Journals (Sweden)

    Bilal Çakir

    Full Text Available BACKGROUND: Insulin-degrading enzyme (IDE) is an allosteric Zn+2 metalloprotease involved in the degradation of many peptides including amyloid-β, and insulin that play key roles in Alzheimer's disease (AD) and type 2 diabetes mellitus (T2DM), respectively. Therefore, the use of therapeutic agents that regulate the activity of IDE would be a viable approach towards generating pharmaceutical treatments for these diseases. Crystal structure of IDE revealed that N-terminal has an exosite which is ∼30 Å away from the catalytic region and serves as a regulation site by orientation of the substrates of IDE to the catalytic site. It is possible to find small molecules that bind to the exosite of IDE and enhance its proteolytic activity towards different substrates. METHODOLOGY/PRINCIPAL FINDINGS: In this study, we applied structure based drug design method combined with experimental methods to discover four novel molecules that enhance the activity of human IDE. The novel compounds, designated as D3, D4, D6, and D10 enhanced IDE mediated proteolysis of substrate V, insulin and amyloid-β, while enhanced degradation profiles were obtained towards substrate V and insulin in the presence of D10 only. CONCLUSION/SIGNIFICANCE: This paper describes the first examples of a computer-aided discovery of IDE regulators, showing that in vitro and in vivo activation of this important enzyme with small molecules is possible.

  7. Structure Based Discovery of Small Molecules to Regulate the Activity of Human Insulin Degrading Enzyme

    Science.gov (United States)

    Çakir, Bilal; Dağliyan, Onur; Dağyildiz, Ezgi; Bariş, İbrahim; Kavakli, Ibrahim Halil; Kizilel, Seda; Türkay, Metin

    2012-01-01

    Background Insulin-degrading enzyme (IDE) is an allosteric Zn+2 metalloprotease involved in the degradation of many peptides including amyloid-β, and insulin that play key roles in Alzheimer's disease (AD) and type 2 diabetes mellitus (T2DM), respectively. Therefore, the use of therapeutic agents that regulate the activity of IDE would be a viable approach towards generating pharmaceutical treatments for these diseases. Crystal structure of IDE revealed that N-terminal has an exosite which is ∼30 Å away from the catalytic region and serves as a regulation site by orientation of the substrates of IDE to the catalytic site. It is possible to find small molecules that bind to the exosite of IDE and enhance its proteolytic activity towards different substrates. Methodology/Principal Findings In this study, we applied structure based drug design method combined with experimental methods to discover four novel molecules that enhance the activity of human IDE. The novel compounds, designated as D3, D4, D6, and D10 enhanced IDE mediated proteolysis of substrate V, insulin and amyloid-β, while enhanced degradation profiles were obtained towards substrate V and insulin in the presence of D10 only. Conclusion/Significance This paper describes the first examples of a computer-aided discovery of IDE regulators, showing that in vitro and in vivo activation of this important enzyme with small molecules is possible. PMID:22355395

  8. Gun possession among American youth: a discovery-based approach to understand gun violence.

    Directory of Open Access Journals (Sweden)

    Kelly V Ruggles

    Full Text Available OBJECTIVE: To apply discovery-based computational methods to nationally representative data from the Centers for Disease Control and Prevention's Youth Risk Behavior Surveillance System to better understand and visualize the behavioral factors associated with gun possession among adolescent youth. RESULTS: Our study uncovered the multidimensional nature of gun possession across nearly five million unique data points over a ten year period (2001-2011). Specifically, we automated odds ratio calculations for 55 risk behaviors to assemble a comprehensive table of associations for every behavior combination. Downstream analyses included the hierarchical clustering of risk behaviors based on their association "fingerprint" to (1) visualize and assess which behaviors frequently co-occur and (2) evaluate which risk behaviors are consistently found to be associated with gun possession. From these analyses, we identified more than 40 behavioral factors, including heroin use, using snuff on school property, having been injured in a fight, and having been a victim of sexual violence, that have been and continue to be strongly associated with gun possession. Additionally, we identified six behavioral clusters based on association similarities: (1) physical activity and nutrition; (2) disordered eating, suicide and sexual violence; (3) weapon carrying and physical safety; (4) alcohol, marijuana and cigarette use; (5) drug use on school property; and (6) overall drug use. CONCLUSIONS: Use of computational methodologies identified multiple risk behaviors, beyond more commonly discussed indicators of poor mental health, that are associated with gun possession among youth. Implications for prevention efforts and future interdisciplinary work applying computational methods to behavioral science data are described.
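
    The abstract above mentions automating odds ratio calculations for every combination of risk behaviors. A minimal sketch of that kind of calculation follows, assuming each survey response is a dictionary of yes/no answers. The function names, the behavior labels, and the toy records are hypothetical, and the Wald 95% confidence interval is a standard textbook choice rather than something stated in the abstract.

```python
import math
from itertools import combinations


def contingency(records, x, y):
    """Count the 2x2 table for behaviors x and y.

    Returns (a, b, c, d): x&y, x&not-y, not-x&y, not-x&not-y.
    """
    a = sum(1 for r in records if r[x] and r[y])
    b = sum(1 for r in records if r[x] and not r[y])
    c = sum(1 for r in records if not r[x] and r[y])
    d = sum(1 for r in records if not r[x] and not r[y])
    return a, b, c, d


def odds_ratio(a, b, c, d):
    """Odds ratio (a*d)/(b*c) with a Wald 95% confidence interval."""
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - 1.96 * se_log)
    upper = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, lower, upper


def association_table(records, behaviors):
    """Odds ratio for every pair of behaviors; None when a cell is empty."""
    table = {}
    for x, y in combinations(behaviors, 2):
        cells = contingency(records, x, y)
        table[(x, y)] = odds_ratio(*cells) if all(cells) else None
    return table


# Made-up toy responses, for illustration only.
records = (
    [{"gun": True, "fight": True}] * 3
    + [{"gun": True, "fight": False}] * 1
    + [{"gun": False, "fight": True}] * 2
    + [{"gun": False, "fight": False}] * 6
)
table = association_table(records, ["gun", "fight"])
print(table[("gun", "fight")])
```

    Each row of such a table can serve as a behavior's association "fingerprint" for the downstream hierarchical clustering the abstract describes.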

  9. Immunophenotype Discovery, Hierarchical Organization, and Template-Based Classification of Flow Cytometry Samples

    Science.gov (United States)

    Azad, Ariful; Rajwa, Bartek; Pothen, Alex

    2016-01-01

    We describe algorithms for discovering immunophenotypes from large collections of flow cytometry samples and using them to organize the samples into a hierarchy based on phenotypic similarity. The hierarchical organization is helpful for effective and robust cytometry data mining, including the creation of collections of cell populations characteristic of different classes of samples, robust classification, and anomaly detection. We summarize a set of samples belonging to a biological class or category with a statistically derived template for the class. Whereas individual samples are represented in terms of their cell populations (clusters), a template consists of generic meta-populations (a group of homogeneous cell populations obtained from the samples in a class) that describe key phenotypes shared among all those samples. We organize an FC data collection in a hierarchical data structure that supports the identification of immunophenotypes relevant to clinical diagnosis. A robust template-based classification scheme is also developed, but our primary focus is on the discovery of phenotypic signatures and inter-sample relationships in an FC data collection. This collective analysis approach is more efficient and robust since templates describe phenotypic signatures common to cell populations in several samples while ignoring noise and small sample-specific variations. We have applied the template-based scheme to analyze several datasets, including one representing a healthy immune system and one of acute myeloid leukemia (AML) samples. The last task is challenging due to the phenotypic heterogeneity of the several subtypes of AML. However, we identified thirteen immunophenotypes corresponding to subtypes of AML and were able to distinguish acute promyelocytic leukemia (APL) samples with the markers provided. Clinically, this is helpful since APL has a different treatment regimen from other subtypes of AML. Core algorithms used in our data analysis are

  10. Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence

    Directory of Open Access Journals (Sweden)

    Luo Ming-Cheng

    2011-01-01

    Full Text Available Abstract Background Many plants have large and complex genomes with an abundance of repeated sequences. Many plants are also polyploid. Both of these attributes typify the genome architecture in the tribe Triticeae, whose members include economically important wheat, rye and barley. Large genome sizes, an abundance of repeated sequences, and polyploidy present challenges to genome-wide SNP discovery using next-generation sequencing (NGS) of total genomic DNA by making alignment and clustering of short reads generated by the NGS platforms difficult, particularly in the absence of a reference genome sequence. Results An annotation-based, genome-wide SNP discovery pipeline is reported using NGS data for large and complex genomes without a reference genome sequence. Roche 454 shotgun reads with low genome coverage of one genotype are annotated in order to distinguish single-copy sequences and repeat junctions from repetitive sequences and sequences shared by paralogous genes. Multiple genome equivalents of shotgun reads of another genotype generated with SOLiD or Solexa are then mapped to the annotated Roche 454 reads to identify putative SNPs. A pipeline program package, AGSNP, was developed and used for genome-wide SNP discovery in Aegilops tauschii, the diploid source of the wheat D genome, which has a genome size of 4.02 Gb, of which 90% is repetitive sequences. Genomic DNA of Ae. tauschii accession AL8/78 was sequenced with the Roche 454 NGS platform. Genomic DNA and cDNA of Ae. tauschii accession AS75 was sequenced primarily with SOLiD, although some Solexa and Roche 454 genomic sequences were also generated. A total of 195,631 putative SNPs were discovered in gene sequences, 155,580 putative SNPs were discovered in uncharacterized single-copy regions, and another 145,907 putative SNPs were discovered in repeat junctions. These SNPs were dispersed across the entire Ae. tauschii genome. To assess the false positive SNP discovery rate, DNA

  11. Xenogenomics: Genomic Bioprospecting in Indigenous and Exotic Plants Through EST Discovery, cDNA Microarray-Based Expression Profiling and Functional Genomics

    Directory of Open Access Journals (Sweden)

    German C. Spangenberg

    2006-04-01

    Full Text Available To date, the overwhelming majority of genomics programs in plants have been directed at model or crop plant species, meaning that very little of the naturally occurring sequence diversity found in plants is available for characterization and exploitation. In contrast, ‘xenogenomics’ refers to the discovery and functional analysis of novel genes and alleles from indigenous and exotic species, permitting bioprospecting of biodiversity using high-throughput genomics experimental approaches. Such a program has been initiated to bioprospect for genetic determinants of abiotic stress tolerance in indigenous Australian flora and native Antarctic plants. Uniquely adapted Poaceae and Fabaceae species with enhanced tolerance to salt, drought, elevated soil aluminium concentration, and freezing stress have been identified, based primarily on their eco-physiology, and have been subjected to structural and functional genomics analyses. For each species, EST collections have been derived from plants subjected to appropriate abiotic stresses. Transcript profiling with spotted unigene cDNA micro-arrays has been used to identify genes that are transcriptionally modulated in response to abiotic stress. Candidate genes identified on the basis of sequence annotation or transcript profiling have been assayed in planta and other in vivo systems for their capacity to confer novel phenotypes. Comparative genomics analysis of novel genes and alleles identified in the xenogenomics target plant species has subsequently been undertaken with reference to key model and crop plants.

  12. Gene discovery in EST sequences from the wheat leaf rust fungus Puccinia triticina sexual spores, asexual spores and haustoria, compared to other rust and corn smut fungi

    Directory of Open Access Journals (Sweden)

    Wynhoven Brian

    2011-03-01

    Full Text Available Abstract Background Rust fungi are biotrophic basidiomycete plant pathogens that cause major diseases on plants and trees world-wide, affecting agriculture and forestry. Their biotrophic nature precludes many established molecular genetic manipulations and lines of research. The generation of genomic resources for these microbes is leading to novel insights into biology such as interactions with the hosts and guiding directions for breakthrough research in plant pathology. Results To support gene discovery and gene model verification in the genome of the wheat leaf rust fungus, Puccinia triticina (Pt), we have generated Expressed Sequence Tags (ESTs) by sampling several life cycle stages. We focused on several spore stages and isolated haustorial structures from infected wheat, generating 17,684 ESTs. We produced sequences from both the sexual (pycniospores, aeciospores and teliospores) and asexual (germinated urediniospores) stages of the life cycle. From pycniospores and aeciospores, produced by infecting the alternate host, meadow rue (Thalictrum speciosissimum), 4,869 and 1,292 reads were generated, respectively. We generated 3,703 ESTs from teliospores produced on the senescent primary wheat host. Finally, we generated 6,817 reads from haustoria isolated from infected wheat as well as 1,003 sequences from germinated urediniospores. Along with 25,558 previously generated ESTs, we compiled a database of 13,328 non-redundant sequences (4,506 singlets and 8,822 contigs). Fungal genes were predicted using the EST version of the self-training GeneMarkS algorithm. To refine the EST database, we compared EST sequences by BLASTN to a set of 454 pyrosequencing-generated contigs and Sanger BAC-end sequences derived both from the Pt genome, and to ESTs and genome reads from wheat. A collection of 6,308 fungal genes was identified and compared to sequences of the cereal rusts, Puccinia graminis f. sp. tritici (Pgt) and stripe rust, P. striiformis f. sp

  13. Design Process Optimization Based on Design Process Gene Mapping

    Institute of Scientific and Technical Information of China (English)

    LI Bo; TONG Shu-rong

    2011-01-01

    The idea of genetic engineering is introduced into the area of product design to improve the design efficiency. A method towards design process optimization based on the design process gene is proposed through analyzing the correlation between the design process gene and characteristics of the design process. The concept of the design process gene is analyzed and categorized into five categories that are the task specification gene, the concept design gene, the overall design gene, the detailed design gene and the processing design gene in the light of five design phases. The elements and their interactions involved in each kind of design process gene signprocess gene mapping is drawn with its structure disclosed based on its function that process gene.

  14. Random SNPs Discovery from Genome and Target Gene of Turbot (Scophthalmus maximus) by Using Magnetic Beads

    Institute of Scientific and Technical Information of China (English)

    董晓丽; 徐建勇; 陈松林

    2013-01-01

    Magnetic-bead enrichment was used to randomly screen for SNPs in turbot (Scophthalmus maximus) genomic DNA. Thirty-five SNPs were detected in 7,820 bp of DNA sequence, a frequency of about 0.448%. More than 68% of the SNPs were caused by base transitions and fewer than 29% by transversions. Applying the same method to the target gene KC70 revealed seven SNPs in a 725 bp fragment. These findings suggest that the method is useful not only for random SNP discovery in the genome but also for SNP discovery in target genes.
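
    The figures in the abstract above can be checked with a few lines of arithmetic, and the transition/transversion distinction it relies on is easy to state in code. The sketch below is only an illustration: the function names are hypothetical, and the classification rule is the standard one (a transition swaps purine for purine or pyrimidine for pyrimidine; anything else is a transversion), not a procedure taken from the paper.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def snp_class(ref, alt):
    """Classify a single-base substitution as a transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError("ref and alt must be two different bases from A, C, G, T")
    same_ring_type = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_ring_type else "transversion"


def snp_density_percent(n_snps, n_bases):
    """SNPs per sequenced base, expressed as a percentage."""
    return 100.0 * n_snps / n_bases


# Counts reported in the abstract: 35 SNPs in 7,820 bp; 7 SNPs in 725 bp (KC70).
print(round(snp_density_percent(35, 7820), 3))  # genome-wide screen
print(round(snp_density_percent(7, 725), 3))    # KC70 fragment
print(snp_class("A", "G"), snp_class("A", "C"))
```

    The first value reproduces the 0.448% stated in the abstract; the KC70 percentage is not given there and is simply the same ratio applied to the reported counts.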

  15. Automated discovery of tissue-targeting enhancers and transcription factors from binding motif and gene function data.

    Directory of Open Access Journals (Sweden)

    Geetu Tuteja

    2014-01-01

    Full Text Available Identifying enhancers regulating gene expression remains an important and challenging task. While recent sequencing-based methods provide epigenomic characteristics that correlate well with enhancer activity, it remains onerous to comprehensively identify all enhancers across development. Here we introduce a computational framework to identify tissue-specific enhancers evolving under purifying selection. First, we incorporate high-confidence binding site predictions with target gene functional enrichment analysis to identify transcription factors (TFs) likely functioning in a particular context. We then search the genome for clusters of binding sites for these TFs, overcoming previous constraints associated with biased manual curation of TFs or enhancers. Applying our method to the placenta, we find 33 known and implicate 17 novel TFs in placental function, and discover 2,216 putative placenta enhancers. Using luciferase reporter assays, 31/36 (86%) tested candidates drive activity in placental cells. Our predictions agree well with recent epigenomic data in human and mouse, yet over half our loci, including 7/8 (87%) tested regions, are novel. Finally, we establish that our method is generalizable by applying it to 5 additional tissues: heart, pancreas, blood vessel, bone marrow, and liver.

  16. Progress in Chimeric Vector and Chimeric Gene Based Cardiovascular Gene Therapy

    Institute of Scientific and Technical Information of China (English)

    HU Chun-Song; YOON Young-sup; ISNER Jeffrey M.; LOSORDO Douglas W.

    2003-01-01

    Gene therapy for cardiovascular diseases has developed from preliminary animal experiments to clinical trials. However, the vectors and target genes currently used in gene therapy are mainly viral or nonviral vectors carrying a single target gene. Each vector system has its own advantages and limitations. Chimeric vectors, which combine the advantages of viral and nonviral vectors, chimeric target genes, which combine two or more target genes, and novel gene delivery modes are being developed. In this article, we summarize the progress in cardiovascular gene therapy based on chimeric vectors and chimeric genes, covering proliferative or occlusive vascular diseases such as atherosclerosis and restenosis, hypertonic vascular disease such as hypertension, and cardiac diseases such as myocardial ischemia, dilated cardiomyopathy and heart failure, and even heart transplantation. The development of chimeric vectors, chimeric genes and the cardiovascular gene therapy based on them is promising.

  17. Practice-Based Knowledge Discovery for Comparative Effectiveness Research: An Organizing Framework.

    Science.gov (United States)

    Lucero, Robert J; Bakken, Suzanne

    2013-03-01

    Electronic health information systems can increase the ability of health-care organizations to investigate the effects of clinical interventions. The authors present an organizing framework that integrates outcomes and informatics research paradigms to guide knowledge discovery in electronic clinical databases. They illustrate its application using the example of hospital acquired pressure ulcers (HAPU). The Knowledge Discovery through Informatics for Comparative Effectiveness Research (KDI-CER) framework was conceived as a heuristic to conceptualize study designs and address potential methodological limitations imposed by using a single research perspective. Advances in informatics research can play a complementary role in advancing the field of outcomes research including CER. The KDI-CER framework can be used to facilitate knowledge discovery from routinely collected electronic clinical data.

  18. Combining Metabolite-Based Pharmacophores with Bayesian Machine Learning Models for Mycobacterium tuberculosis Drug Discovery

    Science.gov (United States)

    Sarker, Malabika; Li, Shao-Gang; Mittal, Nisha; Kumar, Pradeep; Wang, Xin; Stratton, Thomas P.; Zimmerman, Matthew; Talcott, Carolyn; Bourbon, Pauline; Travers, Mike; Yadav, Maneesh

    2015-01-01

    Integrated computational approaches for Mycobacterium tuberculosis (Mtb) are useful to identify new molecules that could lead to future tuberculosis (TB) drugs. Our approach uses information derived from the TBCyc pathway and genome database, the Collaborative Drug Discovery TB database combined with 3D pharmacophores and dual event Bayesian models of whole-cell activity and lack of cytotoxicity. We have prioritized a large number of molecules that may act as mimics of substrates and metabolites in the TB metabolome. We computationally searched over 200,000 commercial molecules using 66 pharmacophores based on substrates and metabolites from Mtb and further filtering with Bayesian models. We ultimately tested 110 compounds in vitro that resulted in two compounds of interest, BAS 04912643 and BAS 00623753 (MIC of 2.5 and 5 μg/mL, respectively). These molecules were used as a starting point for hit-to-lead optimization. The most promising class proved to be the quinoxaline di-N-oxides, evidenced by transcriptional profiling to induce mRNA level perturbations most closely resembling known protonophores. One of these, SRI58 exhibited an MIC = 1.25 μg/mL versus Mtb and a CC50 in Vero cells of >40 μg/mL, while featuring fair Caco-2 A-B permeability (2.3 x 10−6 cm/s), kinetic solubility (125 μM at pH 7.4 in PBS) and mouse metabolic stability (63.6% remaining after 1 h incubation with mouse liver microsomes). Despite demonstration of how a combined bioinformatics/cheminformatics approach afforded a small molecule with promising in vitro profiles, we found that SRI58 did not exhibit quantifiable blood levels in mice. PMID:26517557

  19. Antibody-Array-Based Proteomic Screening of Serum Markers in Systemic Lupus Erythematosus: A Discovery Study.

    Science.gov (United States)

    Wu, Tianfu; Ding, Huihua; Han, Jie; Arriens, Cristina; Wei, Chungwen; Han, Weilu; Pedroza, Claudia; Jiang, Shan; Anolik, Jennifer; Petri, Michelle; Sanz, Ignacio; Saxena, Ramesh; Mohan, Chandra

    2016-07-01

    A discovery study was carried out where serum samples from 22 systemic lupus erythematosus (SLE) patients and matched healthy controls were hybridized to antibody-coated glass slide arrays that interrogated the level of 274 human proteins. On the basis of these screens, 48 proteins were selected for ELISA-based validation in an independent cohort of 28 SLE patients. Whereas AXL, ferritin, and sTNFRII were significantly elevated in patients with active lupus nephritis (LN) relative to SLE patients who were quiescent, other molecules such as OPN, sTNFRI, sTNFRII, IGFBP2, SIGLEC5, FAS, and MMP10 exhibited the capacity to distinguish SLE from healthy controls with ROC AUC exceeding 90%, all with p serum markers were next tested in a cohort of 45 LN patients, where serum was obtained at the time of renal biopsy. In these patients, sTNFRII exhibited the strongest correlation with eGFR (r = -0.50, p = 0.0014) and serum creatinine (r = 0.57, p = 0.0001), although AXL, FAS, and IGFBP2 also correlated with these clinical measures of renal function. When concurrent renal biopsies from these patients were examined, serum FAS, IGFBP2, and TNFRII showed significant positive correlations with renal pathology activity index, while sTNFRII displayed the highest correlation with concurrently scored renal pathology chronicity index (r = 0.57, p = 0.001). Finally, in a longitudinal cohort of seven SLE patients examined at ∼3 month intervals, AXL, ICAM-1, IGFBP2, SIGLEC5, sTNFRII, and VCAM-1 demonstrated the ability to track with concurrent disease flare, with significant subject to subject variation. In summary, serum proteins have the capacity to identify patients with active nephritis, flares, and renal pathology activity or chronicity changes, although larger longitudinal cohort studies are warranted. PMID:27211902

  20. Common minor histocompatibility antigen discovery based upon patient clinical outcomes and genomic data.

    Directory of Open Access Journals (Sweden)

    Paul M Armistead

    Full Text Available BACKGROUND: Minor histocompatibility antigens (mHA) mediate much of the graft vs. leukemia (GvL) effect and graft vs. host disease (GvHD) in patients who undergo allogeneic stem cell transplantation (SCT). Therapeutic decision making and treatments based upon mHAs will require the evaluation of multiple candidate mHAs and the selection of those with the potential to have the greatest impact on clinical outcomes. We hypothesized that common, immunodominant mHAs, which are presented by HLA-A, B, and C molecules, can mediate clinically significant GvL and/or GvHD, and that these mHAs can be identified through association of genomic data with clinical outcomes. METHODOLOGY/PRINCIPAL FINDINGS: Because most mHAs result from donor/recipient cSNP disparities, we genotyped 57 myeloid leukemia patients and their donors at 13,917 cSNPs. We correlated the frequency of genetically predicted mHA disparities with clinical evidence of an immune response and then computationally screened all peptides mapping to the highly associated cSNPs for their ability to bind to HLA molecules. As proof-of-concept, we analyzed one predicted antigen, T4A, whose mHA mismatch trended towards improved overall and disease free survival in our cohort. T4A mHA mismatches occurred at the maximum theoretical frequency for any given SCT. T4A-specific CD8+ T lymphocytes (CTLs) were detected in 3 of 4 evaluable post-transplant patients predicted to have a T4A mismatch. CONCLUSIONS/SIGNIFICANCE: Our method is the first to combine clinical outcomes data with genomics and bioinformatics methods to predict and confirm a mHA. Refinement of this method should enable the discovery of clinically relevant mHAs in the majority of transplant patients and possibly lead to novel immunotherapeutics.

  1. Combining Metabolite-Based Pharmacophores with Bayesian Machine Learning Models for Mycobacterium tuberculosis Drug Discovery.

    Directory of Open Access Journals (Sweden)

    Sean Ekins

    Full Text Available Integrated computational approaches for Mycobacterium tuberculosis (Mtb) are useful to identify new molecules that could lead to future tuberculosis (TB) drugs. Our approach uses information derived from the TBCyc pathway and genome database, the Collaborative Drug Discovery TB database combined with 3D pharmacophores and dual event Bayesian models of whole-cell activity and lack of cytotoxicity. We have prioritized a large number of molecules that may act as mimics of substrates and metabolites in the TB metabolome. We computationally searched over 200,000 commercial molecules using 66 pharmacophores based on substrates and metabolites from Mtb and further filtering with Bayesian models. We ultimately tested 110 compounds in vitro that resulted in two compounds of interest, BAS 04912643 and BAS 00623753 (MIC of 2.5 and 5 μg/mL, respectively). These molecules were used as a starting point for hit-to-lead optimization. The most promising class proved to be the quinoxaline di-N-oxides, evidenced by transcriptional profiling to induce mRNA level perturbations most closely resembling known protonophores. One of these, SRI58 exhibited an MIC = 1.25 μg/mL versus Mtb and a CC50 in Vero cells of >40 μg/mL, while featuring fair Caco-2 A-B permeability (2.3 x 10−6 cm/s), kinetic solubility (125 μM at pH 7.4 in PBS) and mouse metabolic stability (63.6% remaining after 1 h incubation with mouse liver microsomes). Despite demonstration of how a combined bioinformatics/cheminformatics approach afforded a small molecule with promising in vitro profiles, we found that SRI58 did not exhibit quantifiable blood levels in mice.

  2. Identifying Liver Cancer and Its Relations with Diseases, Drugs, and Genes: A Literature-Based Approach

    Science.gov (United States)

    Song, Min

    2016-01-01

    In biomedicine, scientific literature is a valuable source for knowledge discovery. Mining knowledge from textual data has become an increasingly important task as the volume of scientific literature grows at an unprecedented rate. In this paper, we propose a framework for examining a given disease based on existing information provided by scientific literature. Disease-related entities that include diseases, drugs, and genes are systematically extracted and analyzed using a three-level network-based approach. A paper-entity network and an entity co-occurrence network (macro-level) are explored and used to construct six entity-specific networks (meso-level). Important diseases, drugs, and genes as well as salient entity relations (micro-level) are identified from these networks. Results obtained from the literature-based mining can serve to assist clinical applications. PMID:27195695

  3. Target-based vs. phenotypic screenings in Leishmania drug discovery: A marriage of convenience or a dialogue of the deaf?

    Science.gov (United States)

    Reguera, Rosa M.; Calvo-Álvarez, Estefanía; Álvarez-Velilla, Raquel; Balaña-Fouce, Rafael

    2014-01-01

    Drug discovery programs sponsored by public or private initiatives pursue the same ambitious goal: a crushing defeat of major Neglected Tropical Diseases (NTDs) during this decade. Both target-based and target-free screenings have pros and cons when it comes to finding potential small-molecule leads among chemical libraries consisting of myriads of compounds. Within the target-based strategy, crystals of pathogen recombinant-proteins are being used to obtain three-dimensional (3D) structures in silico for the discovery of structure-based inhibitors. On the other hand, genetically modified parasites expressing easily detectable reporters are in the pipeline of target-free (phenotypic) screenings. Furthermore, lead compounds can be scaled up to in vivo preclinical trials using rodent models of infection monitoring parasite loads by means of cutting-edge bioimaging devices. As such, those preferred are fluorescent and bioluminescent readouts due to their reproducibility and rapidity, which reduces the number of animals used in the trials and allows for an earlier stage detection of the infective process as compared with classical methods. In this review, we focus on the current differences between target-based and phenotypic screenings in Leishmania, as an approach that leads to the discovery of new potential drugs against leishmaniasis. PMID:25516847

  4. A two-genome microarray for the rice pathogens Xanthomonas oryzae pv. oryzae and X. oryzae pv. oryzicola and its use in the discovery of a difference in their regulation of hrp genes

    Directory of Open Access Journals (Sweden)

    Lin Ye

    2008-06-01

    Full Text Available Abstract Background Xanthomonas oryzae pv. oryzae (Xoo) and X. oryzae pv. oryzicola (Xoc) are bacterial pathogens of the worldwide staple and grass model, rice. Xoo and Xoc are closely related but Xoo invades rice vascular tissue to cause bacterial leaf blight, a serious disease of rice in many parts of the world, and Xoc colonizes the mesophyll parenchyma to cause bacterial leaf streak, a disease of emerging importance. Both pathogens depend on hrp genes for type III secretion to infect their host. We constructed a 50–70 mer oligonucleotide microarray based on available genome data for Xoo and Xoc and compared gene expression in Xoo strain PXO99A and Xoc strain BLS256 grown in the rich medium PSB vs. XOM2, a minimal medium previously reported to induce hrp genes in Xoo strain T7174. Results Three biological replicates of the microarray experiment to compare global gene expression in representative strains of Xoo and Xoc grown in PSB vs. XOM2 were carried out. The non-specific error rate and the correlation coefficients across biological replicates and among duplicate spots revealed that the microarray data were robust. 247 genes of Xoo and 39 genes of Xoc were differentially expressed in the two media with a false discovery rate of 5% and with a minimum fold-change of 1.75. Semi-quantitative RT-PCR assays confirmed differential expression of 16 genes each for Xoo and Xoc selected for validation. The differentially expressed genes represent 17 functional categories. Conclusion We describe here the construction and validation of a two-genome microarray for the two pathovars of X. oryzae. Microarray analysis revealed that using representative strains, a greater number of Xoo genes than Xoc genes are differentially expressed in XOM2 relative to PSB, and that these include hrp genes and other genes important in interactions with rice. An exception was the rax genes, which are required for production of the host resistance elicitor AvrXa21
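
    The selection rule quoted in this abstract (false discovery rate of 5%, minimum fold-change of 1.75) can be sketched as a two-condition filter. The abstract does not say how the false discovery rate was controlled, so the Benjamini-Hochberg adjustment below is an assumption, as is treating the fold-change threshold symmetrically for up- and down-regulation. All names and the toy values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_genes(records, fdr=0.05, min_fold=1.75):
    """Filter (gene, fold_change, p_value) records; fold_change is a ratio > 0."""
    q_values = benjamini_hochberg([p for _, _, p in records])
    selected = []
    for (gene, fold, _), q in zip(records, q_values):
        # Assumption: a ratio of 0.5 counts as a 2-fold change (down-regulation).
        magnitude = max(fold, 1.0 / fold)
        if q <= fdr and magnitude >= min_fold:
            selected.append(gene)
    return selected


# Toy records, invented for illustration only.
toy = [
    ("geneA", 2.4, 0.001),
    ("geneB", 1.2, 0.002),
    ("geneC", 0.5, 0.03),
    ("geneD", 3.0, 0.40),
]
print(select_genes(toy))
```

    In the toy input, geneB fails on fold-change and geneD fails on the adjusted p-value, which shows why both conditions are applied together.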

  5. Genome-Based Studies of Marine Microorganisms to Maximize the Diversity of Natural Products Discovery for Medical Treatments

    Directory of Open Access Journals (Sweden)

    Xin-Qing Zhao

    2011-01-01

    Full Text Available Marine microorganisms are a rich source of natural products, which play important roles in the pharmaceutical industry. Over the past decade, genome-based studies of marine microorganisms have unveiled the tremendous diversity of the producers of natural products and have also made it more efficient to harness the strain diversity, chemical diversity and genetic diversity of marine microorganisms for the rapid discovery and generation of new natural products. In the meantime, genomic information retrieved from marine symbiotic microorganisms can also be employed for the discovery of new medical molecules from yet-unculturable microorganisms. In this paper, the recent progress in the genomic research of marine microorganisms is reviewed; new tools for genome mining as well as advances in the activation of orphan pathways and metagenomic studies are summarized. Genome-based research of marine microorganisms will maximize the biodiscovery process and solve the problems of supply and sustainability of drug molecules for medical treatments.

  6. Evidence based selection of housekeeping genes.

    Directory of Open Access Journals (Sweden)

    Hendrik J M de Jonge

    Full Text Available For accurate and reliable gene expression analysis, normalization of gene expression data against housekeeping genes (reference or internal control genes) is required. It is known that commonly used housekeeping genes (e.g. ACTB, GAPDH, HPRT1, and B2M) vary considerably under different experimental conditions and therefore their use for normalization is limited. We performed a meta-analysis of 13,629 human gene array samples in order to identify the most stable expressed genes. Here we show novel candidate housekeeping genes (e.g. RPS13, RPL27, RPS20 and OAZ1) with enhanced stability among a multitude of different cell types and varying experimental conditions. None of the commonly used housekeeping genes were present in the top 50 of the most stable expressed genes. In addition, using 2,543 diverse mouse gene array samples we were able to confirm the enhanced stability of the candidate novel housekeeping genes in another mammalian species. Therefore, the identified novel candidate housekeeping genes seem to be the most appropriate choice for normalizing gene expression data.
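
    One simple way to rank genes by expression stability across many samples is the coefficient of variation (standard deviation divided by mean). The abstract does not state which stability measure the authors used, so this choice is an assumption, and the gene labels and values below are invented placeholders.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (mean must be non-zero)."""
    return pstdev(values) / mean(values)


def rank_by_stability(expression):
    """Order genes from most to least stable.

    expression: dict mapping gene label -> list of expression values,
    one value per sample.
    """
    return sorted(expression, key=lambda gene: coefficient_of_variation(expression[gene]))


# Placeholder data, invented for illustration only.
toy_expression = {
    "variable_gene": [100.0, 300.0, 50.0, 150.0],
    "stable_gene": [100.0, 102.0, 98.0, 101.0],
    "moderate_gene": [100.0, 120.0, 90.0, 110.0],
}
print(rank_by_stability(toy_expression))
```

    A ranking of this kind is only as good as the sample collection behind it, which is presumably why the study draws on thousands of arrays spanning many cell types and conditions.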

  7. Recent progress in polymer-based gene delivery vectors

    Institute of Scientific and Technical Information of China (English)

    HUANG Shiwen; ZHUO Renxi

    2003-01-01

    The gene delivery system is one of the three components of a gene medicine and is the bottleneck of current gene therapy. Nonviral vectors offer advantages over viral systems in safety, ease of manufacturing, etc. As important nonviral vectors, polymer gene delivery systems have gained increasing attention and have begun to show increasing promise. In this review, the fundamentals of and recent progress in polymer-based gene delivery vectors are reviewed.

  8. Computational drug discovery

    Institute of Scientific and Technical Information of China (English)

    Si-sheng OU-YANG; Jun-yan LU; Xiang-qian KONG; Zhong-jie LIANG; Cheng LUO; Hualiang JIANG

    2012-01-01

    Computational drug discovery is an effective strategy for accelerating and economizing the drug discovery and development process. Because of the dramatic increase in the availability of biological macromolecule and small molecule information, the applicability of computational drug discovery has been extended and broadly applied to nearly every stage in the drug discovery and development workflow, including target identification and validation, lead discovery and optimization and preclinical tests. Over the past decades, computational drug discovery methods such as molecular docking, pharmacophore modeling and mapping, de novo design, molecular similarity calculation and sequence-based virtual screening have been greatly improved. In this review, we present an overview of these important computational methods, platforms and successful applications in this field.

  9. Milp-hyperbox classification for structure-based drug design in the discovery of small molecule inhibitors of SIRTUIN6

    OpenAIRE

    Tardu, Mehmet; Rahim, Fatih; Kavaklı, İbrahim Halil; Türkay, Metin

    2016-01-01

    Virtual screening of chemical libraries following experimental assays of drug candidates is a common procedure in structure-based drug discovery. However, virtual screening of chemical libraries with millions of compounds requires a lot of time for computing and data analysis. A priori classification of compounds in the libraries as low- and high-binding free energy sets decreases the number of compounds for virtual screening experiments. This classification also reduces the required computati...

  10. The Analysis of Multiple Genome Comparisons in Genus Escherichia and Its Application to the Discovery of Uncharacterised Metabolic Genes in Uropathogenic Escherichia coli CFT073

    Directory of Open Access Journals (Sweden)

    William A. Bryant

    2009-01-01

    Full Text Available A survey of a complete gene synteny comparison has been carried out between twenty fully sequenced strains from the genus Escherichia with the aim of finding yet uncharacterised genes implicated in the metabolism of uropathogenic strains of E. coli (UPEC). Several sets of adjacent colinear genes have been identified which are present in all four UPEC included in this study (CFT073, F11, UTI89, and 536), annotated with putative metabolic functions, but are not found in any other strains considered. An operon closely homologous to that encoding the L-sorbose degradation pathway in Klebsiella pneumoniae has been identified in E. coli CFT073; this operon is present in all of the UPEC considered, but only in 7 of the other 16 strains. The operon's function has been confirmed by cloning the genes into E. coli DH5α and testing for growth on L-sorbose. The functional genomic approach combining in silico and in vitro work presented here can be used as a basis for the discovery of other uncharacterised genes contributing to bacterial survival in specific environments.
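
    The core filter described in this abstract (genes present in every UPEC strain and in no other strain) reduces to set operations on a presence/absence table. The sketch below shows only that step and ignores synteny and colinearity, which the study also used. The four UPEC strain names come from the abstract; the function name, the gene identifiers and the extra strain are hypothetical.

```python
def group_specific_genes(presence, target_strains):
    """Return genes found in every target strain and in no other strain.

    presence: dict mapping strain name -> set of gene identifiers.
    target_strains: iterable of strain names that must all carry the gene.
    """
    target_strains = set(target_strains)
    targets = [presence[strain] for strain in target_strains]
    others = [genes for strain, genes in presence.items() if strain not in target_strains]
    shared = set.intersection(*targets)
    seen_elsewhere = set().union(*others)
    return shared - seen_elsewhere


# Gene identifiers g1..g4 and "other_strain" are invented placeholders.
toy_presence = {
    "CFT073": {"g1", "g2", "g3"},
    "F11": {"g1", "g2"},
    "UTI89": {"g1", "g2", "g4"},
    "536": {"g1", "g2"},
    "other_strain": {"g2", "g4"},
}
upec = ["CFT073", "F11", "UTI89", "536"]
print(group_specific_genes(toy_presence, upec))
```

    A strict filter like this would miss the L-sorbose operon mentioned above, which also occurs in 7 of the 16 other strains; relaxing the second condition to a count threshold would be one way to capture such cases.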

  11. Ontology-Based Context-Aware Service Discovery for Pervasive Environments

    NARCIS (Netherlands)

    Pawar, P.; Tokmakoff, A.

    2006-01-01

    Existing service discovery protocols use a service matching process in order to offer services of interest to the clients. Potentially, the context information of the services and client can be used to improve the quality of service matching. To make use of context information in service matching, s

  12. Towards a goal-based service framework for dynamic service discovery and composition

    NARCIS (Netherlands)

    Bonino da Silva Santos, Luiz Olavo; Silva, Eduardo Goncalves; Ferreira Pires, Luis; Sinderen, van Marten

    2009-01-01

    Service-Oriented Computing allows new applications to be developed by using and/or combining services offered by different providers. Service discovery and composition are performed aiming to comply with the client’s request in terms of functionality and expected outcome. In this paper we present a

  13. Drugs, structures, fragments : substructure-based approaches to GPCR drug discovery and design

    NARCIS (Netherlands)

    Horst, Eelke van der

    2012-01-01

    This thesis is all about cheminformatics, and its impact on drug discovery. A number of strategies are discussed that apply computational methods for the analysis and design of G protein-coupled receptor (GPCR) ligands. Frequent substructure mining is applied to find the common structural motifs tha

  14. Endophytes : exploiting biodiversity for the improvement of natural product-based drug discovery

    NARCIS (Netherlands)

    Staniek, Agata; Woerdenbag, Herman J.; Kayser, Oliver

    2008-01-01

    Endophytes, microorganisms that colonize internal tissues of all plant species, create a huge biodiversity with yet unknown novel natural products, presumed to push forward the frontiers of drug discovery. Next to the clinically acknowledged antineoplastic agent, paclitaxel, endophyte research has y

  15. Optimizing Neighbor Discovery for Ad hoc Networks based on the Bluetooth PAN Profile

    DEFF Research Database (Denmark)

    Kuijpers, Gerben; Nielsen, Thomas Toftegaard; Prasad, Ramjee

    2002-01-01

    . This paper introduces a neighbor discovery mechanism that utilizes the resources in the Bluetooth PAN profile more efficiently. The performance of the new mechanism is investigated using an IPv6 network simulator and compared with emulated broadcasting. It is shown that the signaling overhead can...

  16. New insight into genes in association with asthma: literature-based mining and network centrality analysis

    Institute of Scientific and Technical Information of China (English)

    LIANG Rui; WANG Lei; WANG Gang

    2013-01-01

    Background Asthma is a heterogeneous disease for which a strong genetic basis has been firmly established. Until now no studies have been undertaken to systematically explore the network of asthma-related genes using an internally developed literature-based discovery approach. This study was to explore asthma-related genes by using literature-based mining and network centrality analysis. Methods Literature involving asthma-related genes was searched in PubMed from 2001 to 2011. Integration of natural language processing with network centrality analysis was used to identify asthma susceptibility genes and their interaction network. Asthma susceptibility genes were classified into three functional groups by gene ontology (GO) analysis and the key genes were confirmed by establishing asthma-related networks and pathways. Results Three hundred and twenty-six genes related to asthma such as IGHE (IgE), interleukin (IL)-4, 5, 6, 10, 13, 17A, and tumor necrosis factor (TNF)-alpha were identified. GO analysis indicated some biological processes (developmental processes, signal transduction, death, etc.), cellular components (non-structural extracellular, plasma membrane and extracellular matrix), and molecular functions (signal transduction activity) that were involved in asthma. Furthermore, 22 asthma-related pathways such as the Toll-like receptor signaling pathway, hematopoietic cell lineage, JAK-STAT signaling pathway, chemokine signaling pathway, and cytokine-cytokine receptor interaction, and 17 hub genes, such as JAK3, CCR1-3, CCR5-7, CCR8, were found. Conclusions Our study provides a remarkably detailed and comprehensive picture of asthma susceptibility genes and their interacting network. Further identification of these genes and molecular pathways may play a prominent role in establishing rational therapeutic approaches for asthma.
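
    Network centrality analysis of the kind mentioned in this abstract scores each gene by its position in an interaction network, and high-scoring nodes are reported as hubs. The abstract does not say which centrality measure was used, so normalized degree centrality below is an assumption. The node labels and edges are invented placeholders and make no biological claim.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return normalized degree centrality for an undirected edge list."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    # Divide by the maximum possible number of neighbors, n - 1.
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


def top_hubs(edges, k):
    """Return the k highest-centrality nodes, ties broken alphabetically."""
    scores = degree_centrality(edges)
    return sorted(scores, key=lambda node: (-scores[node], node))[:k]


# Placeholder network, invented for illustration only.
toy_edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3"), ("G5", "G4")]
print(top_hubs(toy_edges, 2))
```

    Other measures such as betweenness or closeness centrality can rank the same network differently, so a hub list depends on the measure chosen.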

  17. Analysis of tumor suppressor genes based on gene ontology and the KEGG pathway.

    Directory of Open Access Journals (Sweden)

    Jing Yang

    Full Text Available Cancer is a serious disease that causes many deaths every year. We urgently need to design effective treatments to cure this disease. Tumor suppressor genes (TSGs) are a type of gene that can protect cells from becoming cancerous. In view of this, correct identification of TSGs is an alternative method for identifying effective cancer therapies. In this study, we performed gene ontology (GO) and pathway enrichment analysis of the TSGs and non-TSGs. Some popular feature selection methods, including minimum redundancy maximum relevance (mRMR) and incremental feature selection (IFS), were employed to analyze the enrichment features. Accordingly, some GO terms and KEGG pathways, such as biological adhesion, cell cycle control, genomic stability maintenance and cell death regulation, were extracted, which are important factors for identifying TSGs. We hope these findings can help in building effective prediction methods for identifying TSGs and thereby, promoting the discovery of effective cancer treatments.

  18. Recent developments in StemBase: a tool to study gene expression in human and murine stem cells

    OpenAIRE

    Krzyzanowski Paul M; Porter Christopher J; Huska Matthew R; Palidwor Gareth A; Sandie Reatha; Muro Enrique M; Perez-Iratxeta Carolina; Andrade-Navarro Miguel A

    2009-01-01

    Abstract Background Currently one of the largest online repositories for human and mouse stem cell gene expression data, StemBase was first designed as a simple web-interface to DNA microarray data generated by the Canadian Stem Cell Network to facilitate the discovery of gene functions relevant to stem cell control and differentiation. Findings Since its creation, StemBase has grown in both size and scope into a system with analysis tools that examine either the whole database at once, or sl...

  19. Use of arbitrary DNA primers, polyacrylamide gel electrophoresis and silver staining for identity testing, gene discovery and analysis of gene expression

    International Nuclear Information System (INIS)

    To understand chemically-induced genomic differences in soybean mutants differing in their ability to enter the nitrogen-fixing symbiosis involving Bradyrhizobium japonicum, molecular techniques were developed to aid map-based, or positional, cloning. DNA marker technology involving single arbitrary primers was used to enrich regional RFLP linkage data. Molecular techniques, including two-dimensional pulsed-field gel electrophoresis, were developed to obtain the first physical mapping in soybean, leading to the conclusion that in the region of marker pA-36 on linkage group H, 1 cM equals about 500 kb. High molecular weight DNA was isolated and cloned into yeast or bacterial artificial chromosomes (YACs/BACs). YACs were used to analyze soybean genome structure, revealing that over half of the genome contains repetitive DNA. Genetic and molecular tools are now available to facilitate the isolation of plant genes directly involved in symbiosis. The further characterization of these genes, along with the determination of the mechanisms that lead to the mutation, will be of value to other plants and induced mutation research. (author)

  20. Rapid countermeasure discovery against Francisella tularensis based on a metabolic network reconstruction.

    Directory of Open Access Journals (Sweden)

    Sidhartha Chaudhury

    Full Text Available In the future, we may be faced with the need to provide treatment for an emergent biological threat against which existing vaccines and drugs have limited efficacy or availability. To prepare for this eventuality, our objective was to use a metabolic network-based approach to rapidly identify potential drug targets and prospectively screen and validate novel small-molecule antimicrobials. Our target organism was the fully virulent Francisella tularensis subspecies tularensis Schu S4 strain, a highly infectious intracellular pathogen that is the causative agent of tularemia and is classified as a category A biological agent by the Centers for Disease Control and Prevention. We proceeded with a staggered computational and experimental workflow that used a strain-specific metabolic network model, homology modeling and X-ray crystallography of protein targets, and ligand- and structure-based drug design. Selected compounds were subsequently filtered based on physiological-based pharmacokinetic modeling, and we selected a final set of 40 compounds for experimental validation of antimicrobial activity. We began screening these compounds in whole bacterial cell-based assays in biosafety level 3 facilities in the 20th week of the study and completed the screens within 12 weeks. Six compounds showed significant growth inhibition of F. tularensis, and we determined their respective minimum inhibitory concentrations and mammalian cell cytotoxicities. The most promising compound had a low molecular weight, was non-toxic, and abolished bacterial growth at 13 µM, with putative activity against pantetheine-phosphate adenylyltransferase, an enzyme involved in the biosynthesis of coenzyme A, encoded by gene coaD. The novel antimicrobial compounds identified in this study serve as starting points for lead optimization, animal testing, and drug development against tularemia. Our integrated in silico/in vitro approach had an overall 15% success rate in terms of

  1. In-depth cDNA Library Sequencing Provides Quantitative Gene Expression Profiling in Cancer Biomarker Discovery

    Institute of Scientific and Technical Information of China (English)

    Wanling Yang; Dingge Ying; Yu-Lung Lau

    2009-01-01

    procedures may allow detection of many expression features for less abundant gene variants. With the reduction of sequencing cost and the emergence of new-generation sequencing technology, in-depth sequencing of cDNA pools or libraries may represent a better and more powerful tool in gene expression profiling and cancer biomarker detection. We also propose using sequence-specific subtraction to remove hundreds of the most abundant housekeeping genes to increase sequencing depth without affecting the relative expression ratios of other genes, as transcripts from as few as 300 of the most abundantly expressed genes constitute about 20% of the total transcriptome. In-depth sequencing also offers the unique advantage of detecting unknown forms of transcripts, such as alternative splicing variants, fusion genes, and regulatory RNAs, as well as detecting mutations and polymorphisms that may play important roles in disease pathogenesis.

  2. Discovery and evaluation of candidate sex-determining genes and xenobiotics in the gonads of lake sturgeon (Acipenser fulvescens).

    Science.gov (United States)

    Hale, Matthew C; Jackson, James R; Dewoody, J Andrew

    2010-07-01

    Modern pyrosequencing has the potential to uncover many interesting aspects of genome evolution, even in lineages where genomic resources are scarce. In particular, 454 pyrosequencing of nonmodel species has been used to characterize expressed sequence tags, xenobiotics, gene ontologies, and relative levels of gene expression. Herein, we use pyrosequencing to study the evolution of genes expressed in the gonads of a polyploid fish, the lake sturgeon (Acipenser fulvescens). Using 454 pyrosequencing of transcribed genes, we produced more than 125 MB of sequence data from 473,577 high-quality sequencing reads. Sequences that passed stringent quality control thresholds were assembled into 12,791 male contigs and 32,629 female contigs. Average depth of coverage was 4.2x for the male assembly and 5.5x for the female assembly. Analytical rarefaction indicates that our assemblies include most of the genes expressed in lake sturgeon gonads. Over 86,700 sequencing reads were assigned gene ontologies, many to general housekeeping genes like protein, RNA, and ion binding genes. We searched specifically for sex-determining genes and documented significant sex differences in the expression of two genes involved in animal sex determination, DMRT1 and TRA-1. DMRT1 is the master sex-determining gene in birds and in medaka (Oryzias latipes) whereas TRA-1 helps direct sexual differentiation in nematodes. We also searched the lake sturgeon assembly for evidence of xenobiotic organisms that may exist as endosymbionts. Our results suggest that exogenous parasites (trematodes) and pathogens (protozoans) apparently have infected lake sturgeon gonads, and the trematodes have horizontally transferred some genes to the lake sturgeon genome.

  3. Cyclodextrin-based gene delivery systems

    OpenAIRE

    Ortiz-Mellet, Carmen; García Fernández, José M.; Benito, Juan M.

    2011-01-01

    Cyclodextrin (CD) history has been largely dominated by their unique ability to form inclusion complexes with guests fitting in their hydrophobic cavity. Chemical functionalization was soon recognized as a powerful means of improving CD applications in a wide range of fields, including drug delivery, sensing or enzyme mimicking. However, 100 years after their discovery, CDs are still perceived as novel nanoobjects of undeveloped potential. This critical review provides an overview of different...

  4. Gene Sequence Based Clustering Assists in Dereplication of Pseudoalteromonas luteoviolacea Strains with Identical Inhibitory Activity and Antibiotic Production

    DEFF Research Database (Denmark)

    Vynne, Nikolaj Grønnegaard; Månsson, Maria; Gram, Lone

    2012-01-01

    Some microbial species are chemically homogenous, and the same secondary metabolites are found in all strains. In contrast, we previously found that five strains of P. luteoviolacea were closely related by 16S rRNA gene sequence but produced two different antibiotic profiles. The purpose...... antibacterial profiles based on inhibition assays against Vibrio anguillarum and Staphylococcus aureus. To determine whether chemotype and inhibition profile are reflected by phylogenetic clustering we sequenced 16S rRNA, gyrB and recA genes. Clustering based on 16S rRNA gene sequences alone showed little...... correlation to chemotypes and inhibition profiles, while clustering based on concatenated 16S rRNA, gyrB, and recA gene sequences resulted in three clusters, two of which uniformly consisted of strains of identical chemotype and inhibition profile. A major time sink in natural products discovery is the effort...

  5. A unified view of Automata-based algorithms for Frequent Episode Discovery

    CERN Document Server

    Achar, Avinash; Sastry, P S

    2010-01-01

    The Frequent Episode Discovery framework is a popular framework in Temporal Data Mining with many applications. Over the years, many different notions of episode frequency have been proposed, along with different algorithms for episode discovery. In this paper we present a unified view of all such frequency counting algorithms. We present a generic algorithm such that all current algorithms are special cases of it. This unified view allows one to gain insights into the different frequencies, and we present quantitative relationships among them. Our unified view also helps in obtaining correctness proofs for various algorithms, as we show here. We also point out how this unified view helps us consider generalizations of the algorithm that can discover episodes with general partial orders.
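
    As a minimal sketch of what an automata-based frequency count can look like, the snippet below counts non-overlapped occurrences of a serial episode with a single finite-state automaton that resets each time it completes. It illustrates one frequency notion only and is not the generic algorithm of the paper; the function name `count_non_overlapped` and the representation of events as plain hashable symbols (timestamps omitted) are assumptions made here for illustration.

```python
from typing import Hashable, Iterable, Sequence


def count_non_overlapped(events: Iterable[Hashable],
                         episode: Sequence[Hashable]) -> int:
    """Count non-overlapped occurrences of a serial episode.

    Hypothetical helper: a single automaton waits for the next event type
    of the episode, and resets to its start state after each full match.
    """
    if not episode:
        raise ValueError("episode must contain at least one event type")
    state = 0   # how many event types of the episode have been matched
    count = 0
    for event in events:
        if event == episode[state]:
            state += 1
            if state == len(episode):   # one complete occurrence seen
                count += 1
                state = 0               # reset, so occurrences never overlap
    return count


if __name__ == "__main__":
    stream = list("ABCABCAB")
    print(count_non_overlapped(stream, list("ABC")))
```

    Other frequency notions (for example, window-based counts) would need several automata running at once; this sketch deliberately keeps to the simplest case.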

  6. Structuring osteosarcoma knowledge: an osteosarcoma-gene association database based on literature mining and manual annotation.

    Science.gov (United States)

    Poos, Kathrin; Smida, Jan; Nathrath, Michaela; Maugg, Doris; Baumhoer, Daniel; Neumann, Anna; Korsching, Eberhard

    2014-01-01

    Osteosarcoma (OS) is the most common primary bone cancer exhibiting high genomic instability. This genomic instability affects multiple genes and microRNAs to a varying extent depending on patient and tumor subtype. Massive research is ongoing to identify genes including their gene products and microRNAs that correlate with disease progression and might be used as biomarkers for OS. However, the genomic complexity hampers the identification of reliable biomarkers. Up to now, clinico-pathological factors are the key determinants to guide prognosis and therapeutic treatments. Each day, new studies about OS are published and complicate the acquisition of information to support biomarker discovery and therapeutic improvements. Thus, it is necessary to provide a structured and annotated view on the current OS knowledge that is quickly and easily accessible to researchers of the field. Therefore, we developed a publicly available database and Web interface that serves as a resource for OS-associated genes and microRNAs. Genes and microRNAs were collected using an automated dictionary-based gene recognition procedure followed by manual review and annotation by experts of the field. In total, 911 genes and 81 microRNAs related to 1331 PubMed abstracts were collected (last update: 29 October 2013). Users can evaluate genes and microRNAs according to their potential prognostic and therapeutic impact, the experimental procedures, the sample types, the biological contexts and microRNA target gene interactions. Additionally, a pathway enrichment analysis of the collected genes highlights different aspects of OS progression. OS requires pathways commonly deregulated in cancer but also features OS-specific alterations like deregulated osteoclast differentiation. To our knowledge, this is the first effort of an OS database containing manually reviewed and annotated up-to-date OS knowledge. It might be a useful resource especially for the bone tumor research community, as specific

  7. Implementation of a deidentified federated data network for population-based cohort discovery

    OpenAIRE

    Anderson, Nicholas; Abend, Aaron; Mandel, Aaron; Geraghty, Estella; Gabriel, Davera; Wynden, Rob; Kamerick, Michael; Anderson, Kent; Rainwater, Julie; Tarczy-Hornoch, Peter

    2011-01-01

    Objective The Cross-Institutional Clinical Translational Research project explored a federated query tool and looked at how this tool can facilitate clinical trial cohort discovery by managing access to aggregate patient data located within unaffiliated academic medical centers. Methods The project adapted software from the Informatics for Integrating Biology and the Bedside (i2b2) program to connect three Clinical Translational Research Award sites: University of Washington, Seattle, Univers...

  8. NMR in drug discovery. From screening to structure-based design of antitumoral agents

    OpenAIRE

    Rodríguez Mías, Ricard Aleix

    2006-01-01

    Nuclear Magnetic Resonance has attracted increasing interest in the drug discovery field, which has led to its wide use at nearly every stage of drug development. For this reason, in the present thesis we propose to use some of the tools offered by NMR to target various systems related to cancer. Initially we intended to get acquainted with the most prominent NMR methodologies for the detection and characterization of binding events; and for this goal various proteins involve...

  9. Discovery of Novel Human Epidermal Growth Factor Receptor-2 Inhibitors by Structure-based Virtual Screening

    OpenAIRE

    Shi, Zheng; Yu, Tian; Sun, Rong; Wang, Shan; Chen, Xiao-Qian; Cheng, Li-jia; Liu, Rong

    2016-01-01

    Background: Human epidermal growth factor receptor-2 (HER2) is a transmembrane receptor-like protein, and aberrant signaling of HER2 is implicated in many human cancers, such as ovarian, gastric, and prostate cancer and, most notably, breast cancer. Moreover, it has been in the spotlight in recent years as a promising new target for therapy of breast cancer. Objective: Since virtual screening has become an integral part of the drug discovery process, it is of great significant t...

  10. Drugs, structures, fragments: substructure-based approaches to GPCR drug discovery and design

    OpenAIRE

    Horst, Eelke van der

    2012-01-01

    This thesis is all about cheminformatics, and its impact on drug discovery. A number of strategies are discussed that apply computational methods for the analysis and design of G protein-coupled receptor (GPCR) ligands. Frequent substructure mining is applied to find the common structural motifs that are discriminative for predefined classes of GPCR ligands. In addition, this approach is extended to cluster GPCRs to suggest a new classification for this receptor superfamily. Furthermore, subst...

  11. Statistical design for biospecimen cohort size in proteomics-based biomarker discovery and verification studies.

    Science.gov (United States)

    Skates, Steven J; Gillette, Michael A; LaBaer, Joshua; Carr, Steven A; Anderson, Leigh; Liebler, Daniel C; Ransohoff, David; Rifai, Nader; Kondratovich, Marina; Težak, Živana; Mansfield, Elizabeth; Oberg, Ann L; Wright, Ian; Barnes, Grady; Gail, Mitchell; Mesri, Mehdi; Kinsinger, Christopher R; Rodriguez, Henry; Boja, Emily S

    2013-12-01

    Protein biomarkers are needed to deepen our understanding of cancer biology and to improve our ability to diagnose, monitor, and treat cancers. Important analytical and clinical hurdles must be overcome to allow the most promising protein biomarker candidates to advance into clinical validation studies. Although contemporary proteomics technologies support the measurement of large numbers of proteins in individual clinical specimens, sample throughput remains comparatively low. This problem is amplified in typical clinical proteomics research studies, which routinely suffer from a lack of proper experimental design, resulting in analysis of too few biospecimens to achieve adequate statistical power at each stage of a biomarker pipeline. To address this critical shortcoming, a joint workshop was held by the National Cancer Institute (NCI), National Heart, Lung, and Blood Institute (NHLBI), and American Association for Clinical Chemistry (AACC) with participation from the U.S. Food and Drug Administration (FDA). An important output from the workshop was a statistical framework for the design of biomarker discovery and verification studies. Herein, we describe the use of quantitative clinical judgments to set statistical criteria for clinical relevance and the development of an approach to calculate biospecimen sample size for proteomic studies in discovery and verification stages prior to clinical validation stage. This represents a first step toward building a consensus on quantitative criteria for statistical design of proteomics biomarker discovery and verification research.
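
    To make the idea of a biospecimen sample-size calculation concrete, here is a hedged sketch using the textbook normal-approximation formula for comparing two group means, n per group = 2((z_{1-alpha/2} + z_{power}) / d)^2, where d is the standardized effect size. This is not the workshop's statistical framework, which the abstract does not spell out; the function name `samples_per_group` and the default alpha and power values are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def samples_per_group(effect_size: float,
                      alpha: float = 0.05,
                      power: float = 0.80) -> int:
    """Biospecimens needed per group for a two-sided, two-sample comparison.

    Hypothetical helper based on the standard normal approximation;
    effect_size is the mean difference divided by the common standard
    deviation (Cohen's d).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = standard_normal.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    for d in (1.0, 0.5):
        print(d, samples_per_group(d))
```

    Halving the detectable effect size roughly quadruples the required cohort, which is one reason underpowered discovery studies are so common.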

  12. Coupled transcriptome and proteome analysis of human lymphotropic tumor viruses: insights on the detection and discovery of viral genes

    Directory of Open Access Journals (Sweden)

    Dresang Lindsay R

    2011-12-01

    Background: Kaposi's sarcoma-associated herpesvirus (KSHV) and Epstein-Barr virus (EBV) are related human tumor viruses that cause primary effusion lymphomas (PEL) and Burkitt's lymphomas (BL), respectively. Viral genes expressed in naturally-infected cancer cells contribute to disease pathogenesis; knowing which viral genes are expressed is critical in understanding how these viruses cause cancer. To evaluate the expression of viral genes, we used high-resolution separation and mass spectrometry coupled with custom tiling arrays to align the viral proteomes and transcriptomes of three PEL and two BL cell lines under latent and lytic culture conditions. Results: The majority of viral genes were efficiently detected at the transcript and/or protein level on manipulating the viral life cycle. Overall the correlation of expressed viral proteins and transcripts was highly complementary in both validating and providing orthogonal data with latent/lytic viral gene expression. Our approach also identified novel viral genes in both KSHV and EBV, and extends viral genome annotation. Several previously uncharacterized genes were validated at both transcript and protein levels. Conclusions: This systems biology approach coupling proteome and transcriptome measurements provides a comprehensive view of viral gene expression that could not have been attained using each methodology independently. Detection of viral proteins in combination with viral transcripts is a potentially powerful method for establishing virus-disease relationships.

  13. Cultivation of hard-to-culture subsurface mercury-resistant bacteria and discovery of new merA gene sequences

    DEFF Research Database (Denmark)

    Rasmussen, L D; Zawadsky, C; Binnerup, S J;

    2008-01-01

    was increased up to 2,800 times and numbers of mCFU were similar to the total number of mercury-resistant bacteria in the soils. Denaturing gradient gel electrophoresis analysis of DNA extracted from membranes suggested stimulation of growth of hard-to-culture bacteria during the preincubation. A total of 25...... sequencing of merA of selected isolates led to the discovery of new merA sequences. With phylum-specific merA primers, PCR products were obtained for Alpha- and Betaproteobacteria and Actinobacteria but not for Bacteroidetes and Firmicutes. The similarity to known sequences ranged between 89 and 95%. One...

  14. Combining SNP discovery from next-generation sequencing data with bulked segregant analysis (BSA) to fine-map genes in polyploid wheat

    Directory of Open Access Journals (Sweden)

    Trick Martin

    2012-01-01

    Background: Next generation sequencing (NGS) technologies are providing new ways to accelerate fine-mapping and gene isolation in many species. To date, the majority of these efforts have focused on diploid organisms with readily available whole genome sequence information. In this study, as a proof of concept, we tested the use of NGS for SNP discovery in tetraploid wheat lines differing for the previously cloned grain protein content (GPC) gene GPC-B1. Bulked segregant analysis (BSA) was used to define a subset of putative SNPs within the candidate gene region, which were then used to fine-map GPC-B1. Results: We used Illumina paired end technology to sequence mRNA (RNAseq) from near isogenic lines differing across a ~30-cM interval including the GPC-B1 locus. After discriminating for SNPs between the two homoeologous wheat genomes and additional quality filtering, we identified inter-varietal SNPs in wheat unigenes between the parental lines. The relative frequency of these SNPs was examined by RNAseq in two bulked samples made up of homozygous recombinant lines differing for their GPC phenotype. SNPs that were enriched at least 3-fold in the corresponding pool (6.5% of all SNPs) were further evaluated. Marker assays were designed for a subset of the enriched SNPs and mapped using DNA from individuals of each bulk. Thirty nine new SNP markers, corresponding to 67% of the validated SNPs, mapped across a 12.2-cM interval including GPC-B1. This translated to 1 SNP marker per 0.31 cM defining the GPC-B1 gene to within 13-18 genes in syntenic cereal genomes and to a 0.4 cM interval in wheat. Conclusions: This study exemplifies the use of RNAseq for SNP discovery in polyploid species and supports the use of BSA as an effective way to target SNPs to specific genetic intervals to fine-map genes in unsequenced genomes.
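
    The bulk-enrichment filter and the marker-density figure above can be sketched as follows. The abstract does not say how fold enrichment was computed, so the ratio of alternate-allele read frequencies between the two bulks (with a small pseudocount to avoid division by zero) is an assumption, as are the names `SnpCounts`, `fold_enrichment`, and `enriched_snps` and the toy read counts.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SnpCounts:
    """Read counts for one SNP in the two bulks (hypothetical layout)."""
    name: str
    high_alt: int    # alternate-allele reads in the high-GPC bulk
    high_total: int  # all reads covering the SNP in the high-GPC bulk
    low_alt: int
    low_total: int


def fold_enrichment(snp: SnpCounts, pseudocount: float = 0.5) -> float:
    """Ratio of alternate-allele frequencies, high bulk over low bulk."""
    high = (snp.high_alt + pseudocount) / (snp.high_total + 2 * pseudocount)
    low = (snp.low_alt + pseudocount) / (snp.low_total + 2 * pseudocount)
    return high / low


def enriched_snps(snps: Iterable[SnpCounts], min_fold: float = 3.0) -> List[str]:
    """Names of SNPs enriched at least min_fold in either bulk."""
    selected = []
    for snp in snps:
        fold = fold_enrichment(snp)
        if max(fold, 1.0 / fold) >= min_fold:
            selected.append(snp.name)
    return selected


if __name__ == "__main__":
    toy = [
        SnpCounts("snp_a", 45, 50, 5, 50),    # strongly enriched in high bulk
        SnpCounts("snp_b", 25, 50, 24, 50),   # no enrichment
        SnpCounts("snp_c", 4, 40, 30, 40),    # enriched in low bulk
    ]
    print(enriched_snps(toy))
    # Marker density reported in the abstract: 12.2 cM over 39 markers.
    print(round(12.2 / 39, 2))
```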

  15. HANDS: a tool for genome-wide discovery of subgenome-specific base-identity in polyploids.

    KAUST Repository

    Mithani, Aziz

    2013-09-24

    The analysis of polyploid genomes is problematic because homeologous subgenome sequences are closely related. This relatedness makes it difficult to assign individual sequences to the specific subgenome from which they are derived, and hinders the development of polyploid whole genome assemblies. We here present a next-generation sequencing (NGS)-based approach for assignment of subgenome-specific base-identity at sites containing homeolog-specific polymorphisms (HSPs): 'HSP base Assignment using NGS data through Diploid Similarity' (HANDS). We show that HANDS correctly predicts subgenome-specific base-identity at >90% of assayed HSPs in the hexaploid bread wheat (Triticum aestivum) transcriptome, thus providing a substantial increase in accuracy versus previous methods for homeolog-specific base assignment. We conclude that HANDS enables rapid and accurate genome-wide discovery of homeolog-specific base-identity, a capability having multiple applications in polyploid genomics.

  16. MICROBLOG-BASED THEME DISCOVERY

    Institute of Scientific and Technical Information of China (English)

    何翔; 顾春华; 丁军

    2013-01-01

    To meet the need of microblog marketing to find delivery targets, we propose a method for discovering microblog theme communities that combines content-oriented analysis with link-relationship analysis. The method incorporates link-analysis techniques for authority (leader) discovery, text classification, and max-flow community discovery, and employs multiple pruning strategies, yielding an efficient and accurate microblog theme crawler. Experiments were run on collected real-world data, and the results were analyzed along several dimensions.

  17. Inhibition of Shikimate Kinase and Type II Dehydroquinase for Antibiotic Discovery: Structure-Based Design and Simulation Studies.

    Science.gov (United States)

    Gonzalez-Bello, Concepcion

    2016-01-01

    The loss of effectiveness of current antibiotics caused by the development of drug resistance has become a severe threat to public health. Current widely used antibiotics are surprisingly targeted at a few bacterial functions - cell wall, DNA, RNA, and protein biosynthesis - and resistance to them is widespread and well identified. There is therefore great interest in the discovery of novel drugs and therapies to tackle antimicrobial resistance, in particular drugs that target other essential processes for bacterial survival. In the past few years a great deal of effort has been focused on the discovery of new inhibitors of the enzymes involved in the biosynthesis of aromatic amino acids, also known as the shikimic acid pathway, in which chorismic acid is synthesized. The latter compound is the synthetic precursor of L-Phe, L-Tyr, L-Trp, and other important aromatic metabolites. These enzymes are recognized as attractive targets for the development of new antibacterial agents because they are essential in important pathogenic bacteria, such as Mycobacterium tuberculosis and Helicobacter pylori, but do not have any counterpart in human cells. This review is focused on two key enzymes of this pathway, shikimate kinase and type II dehydroquinase. An overview of the use of structure-based design and computational studies for the discovery of selective inhibitors of these enzymes will be provided. A detailed view of the structural changes caused by these inhibitors in the catalytic arrangement of these enzymes, which are responsible for the inhibition of their activity, is described. PMID:26303426

  18. Novel definition files for human GeneChips based on GeneAnnot

    Directory of Open Access Journals (Sweden)

    Ferrari Sergio

    2007-11-01

    Background: Improvements in genome sequence annotation revealed discrepancies in the original probeset/gene assignment in Affymetrix microarray and the existence of differences between annotations and effective alignments of probes and transcription products. In the current generation of Affymetrix human GeneChips, most probesets include probes matching transcripts from more than one gene and probes which do not match any transcribed sequence. Results: We developed a novel set of custom Chip Definition Files (CDF) and the corresponding Bioconductor libraries for Affymetrix human GeneChips, based on the information contained in the GeneAnnot database. GeneAnnot-based CDFs are composed of unique custom-probesets, including only probes matching a single gene. Conclusion: GeneAnnot-based custom CDFs solve the problem of a reliable reconstruction of expression levels and eliminate the existence of more than one probeset per gene, which often leads to discordant expression signals for the same transcript when gene differential expression is the focus of the analysis. GeneAnnot CDFs are freely distributed and fully compliant with Affymetrix standards and all available software for gene expression analysis. The CDF libraries are available from http://www.xlab.unimo.it/GA_CDF, along with supplementary information (CDF libraries, installation guidelines and R code, CDF statistics, and analysis results).

  19. Discovery of genes related to witches broom disease in Paulownia tomentosa × Paulownia fortunei by a De Novo assembled transcriptome.

    Directory of Open Access Journals (Sweden)

    Rongning Liu

    In spite of its economic importance, very little molecular genetics and genomic research has been targeted at the family Paulownia spp. The scarcity of genetic information on this plant is a major obstacle to studying the mechanisms of its ability to resist Paulownia Witches' Broom (PaWB) disease. Analysis of the Paulownia transcriptome and its expression profile data is essential to extending the genetic resources on this species, and will thus greatly improve our studies on Paulownia. In the current study, we performed the de novo assembly of a transcriptome on P. tomentosa × P. fortunei using the short-read sequencing technology (Illumina). A total of 203,664 unigenes with a mean length of 1,328 bp were obtained. Of these unigenes, 32,976 (30% of all unigenes) containing complete structures were chosen. Eukaryotic clusters of orthologous groups, gene orthology, and the Kyoto Encyclopedia of Genes and Genomes annotations were performed for these unigenes. Genes related to PaWB disease resistance were analyzed in detail. To our knowledge, this is the first study to elucidate the genetic makeup of Paulownia. This transcriptome provides a quick way to understand Paulownia, increases the number of gene sequences available for further functional genomics studies and provides clues to the identification of potential PaWB disease resistance genes. This study has provided a comprehensive insight into gene expression profiles at different states, which facilitates the study of each gene's roles in the developmental process and in PaWB disease resistance.

  20. Discovery of genes related to witches broom disease in Paulownia tomentosa × Paulownia fortunei by a De Novo assembled transcriptome.

    Science.gov (United States)

    Liu, Rongning; Dong, Yanpeng; Fan, Guoqiang; Zhao, Zhenli; Deng, Minjie; Cao, Xibing; Niu, Suyan

    2013-01-01

    In spite of its economic importance, very little molecular genetics and genomic research has been targeted at the family Paulownia spp. The scarcity of genetic information on this plant is a major obstacle to studying the mechanisms of its ability to resist Paulownia Witches' Broom (PaWB) disease. Analysis of the Paulownia transcriptome and its expression profile data is essential to extending the genetic resources on this species, and will thus greatly improve our studies on Paulownia. In the current study, we performed the de novo assembly of a transcriptome on P. tomentosa × P. fortunei using the short-read sequencing technology (Illumina). A total of 203,664 unigenes with a mean length of 1,328 bp were obtained. Of these unigenes, 32,976 (30% of all unigenes) containing complete structures were chosen. Eukaryotic clusters of orthologous groups, gene orthology, and the Kyoto Encyclopedia of Genes and Genomes annotations were performed for these unigenes. Genes related to PaWB disease resistance were analyzed in detail. To our knowledge, this is the first study to elucidate the genetic makeup of Paulownia. This transcriptome provides a quick way to understand Paulownia, increases the number of gene sequences available for further functional genomics studies and provides clues to the identification of potential PaWB disease resistance genes. This study has provided a comprehensive insight into gene expression profiles at different states, which facilitates the study of each gene's roles in the developmental process and in PaWB disease resistance. PMID:24278262

  1. HMM-Based Gene Annotation Methods

    Energy Technology Data Exchange (ETDEWEB)

    Haussler, David; Hughey, Richard; Karplus, Keven

    1999-09-20

    Development of new statistical methods and computational tools to identify genes in human genomic DNA, and to provide clues to their functions by identifying features such as transcription factor binding sites, tissue-specific expression and splicing patterns, and remote homologies at the protein level with genes of known function.
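
    For readers unfamiliar with how a hidden Markov model labels a DNA sequence, the sketch below runs the Viterbi algorithm on a toy two-state model (coding versus noncoding). It illustrates the general technique only, not the tools developed in this project: the states, the probability tables `START`, `TRANS`, and `EMIT`, and the function name `viterbi` are all made up here, and real gene finders use far richer state structures.

```python
import math
from typing import Dict, List

# Hypothetical toy parameters: GC-rich emissions stand in for "coding".
STATES = ("noncoding", "coding")
START: Dict[str, float] = {"noncoding": 0.5, "coding": 0.5}
TRANS: Dict[str, Dict[str, float]] = {
    "noncoding": {"noncoding": 0.9, "coding": 0.1},
    "coding": {"noncoding": 0.1, "coding": 0.9},
}
EMIT: Dict[str, Dict[str, float]] = {
    "noncoding": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
    "coding": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
}


def viterbi(seq: str) -> List[str]:
    """Most probable state path for seq under the toy HMM (log space)."""
    if not seq:
        return []
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for landing in state s at this base.
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            pointers[s] = prev
            new_score[s] = (score[prev] + math.log(TRANS[prev][s])
                            + math.log(EMIT[s][base]))
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    print(viterbi("AAAAAAAAGGGGGGGG"))
```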

  2. Discovery of genes related to insecticide resistance in Bactrocera dorsalis by functional genomic analysis of a de novo assembled transcriptome.

    Directory of Open Access Journals (Sweden)

    Ju-Chun Hsu

    Insecticide resistance has recently become a critical concern for control of many insect pest species. Genome sequencing and global quantification of gene expression through analysis of the transcriptome can provide useful information relevant to this challenging problem. The oriental fruit fly, Bactrocera dorsalis, is one of the world's most destructive agricultural pests, and recently it has been used as a target for studies of genetic mechanisms related to insecticide resistance. However, prior to this study, the molecular data available for this species was largely limited to genes identified through homology. To provide a broader pool of gene sequences of potential interest with regard to insecticide resistance, this study uses whole transcriptome analysis developed through de novo assembly of short reads generated by next-generation sequencing (NGS). The transcriptome of B. dorsalis was initially constructed using Illumina's Solexa sequencing technology. Qualified reads were assembled into contigs and potential splicing variants (isotigs). A total of 29,067 isotigs have putative homologues in the non-redundant (nr) protein database from NCBI, and 11,073 of these correspond to distinct D. melanogaster proteins in the RefSeq database. Approximately 5,546 isotigs contain coding sequences that are at least 80% complete and appear to represent B. dorsalis genes. We observed a strong correlation between the completeness of the assembled sequences and the expression intensity of the transcripts. The assembled sequences were also used to identify large numbers of genes potentially belonging to families related to insecticide resistance. A total of 90 P450-, 42 GST- and 37 COE-related genes, representing three major enzyme families involved in insecticide metabolism and resistance, were identified. In addition, 36 isotigs were discovered to contain target site sequences related to four classes of resistance genes. Identified sequence motifs were also

  3. De novo assembly, gene annotation, and marker discovery in stored-product pest Liposcelis entomophila (Enderlein) using transcriptome sequences.

    Directory of Open Access Journals (Sweden)

    Dan-Dan Wei

    BACKGROUND: As a major stored-product pest insect, Liposcelis entomophila has developed high levels of resistance to various insecticides in grain storage systems. However, the molecular mechanisms underlying resistance and environmental stress have not been characterized. To date, there is a lack of genomic information for this species. Therefore, studies aimed at profiling the L. entomophila transcriptome would provide a better understanding of the biological functions at the molecular level. METHODOLOGY/PRINCIPAL FINDINGS: We applied Illumina sequencing technology to sequence the transcriptome of L. entomophila. A total of 54,406,328 clean reads were obtained and de novo assembled into 54,220 unigenes, with an average length of 571 bp. Through a similarity search, 33,404 (61.61%) unigenes were matched to known proteins in the NCBI non-redundant (Nr) protein database. These unigenes were further functionally annotated with gene ontology (GO), cluster of orthologous groups of proteins (COG), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. A large number of genes potentially involved in insecticide resistance were manually curated, including 68 putative cytochrome P450 genes, 37 putative glutathione S-transferase (GST) genes, 19 putative carboxyl/cholinesterase (CCE) genes, and 126 other transcripts containing target site sequences or encoding detoxification genes representing eight types of resistance enzymes. Furthermore, to gain insight into the molecular basis of the L. entomophila response to thermal stresses, 25 heat shock protein (Hsp) genes were identified. In addition, 1,100 SSRs and 57,757 SNPs were detected and 231 pairs of SSR primers were designed for investigating genetic diversity in the future. CONCLUSIONS/SIGNIFICANCE: We developed a comprehensive transcriptomic database for L. entomophila. These sequences and putative molecular markers would further promote our understanding of the molecular mechanisms underlying
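
    As an illustration of how SSR (simple sequence repeat) markers can be located in assembled unigenes, the sketch below scans a sequence for perfect tandem repeats with a regular expression. The abstract does not name the SSR-detection tool or thresholds used, so the function name `find_ssrs`, the motif-length range, and the minimum repeat count are assumptions for this example only.

```python
import re
from typing import List, Tuple


def find_ssrs(sequence: str,
              min_motif: int = 2,
              max_motif: int = 6,
              min_repeats: int = 4) -> List[Tuple[int, str, int]]:
    """Find perfect tandem repeats; returns (start, motif, repeat_count).

    Hypothetical helper: the lazy quantifier makes the regex prefer the
    shortest motif that satisfies the repeat threshold at each position.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


if __name__ == "__main__":
    print(find_ssrs("TTGACACACACACGG"))
    print(find_ssrs("ATGATGATGATGC"))
```

    With a minimum motif length of 2, a homopolymer run such as AAAAAAAA is reported as a repeated "AA" motif; a production pipeline would usually handle mononucleotide repeats and imperfect repeats separately.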

  4. Sleeping Beauty Transposon Mutagenesis as a Tool for Gene Discovery in the NOD Mouse Model of Type 1 Diabetes.

    Science.gov (United States)

    Elso, Colleen M; Chu, Edward P F; Alsayb, May A; Mackin, Leanne; Ivory, Sean T; Ashton, Michelle P; Bröer, Stefan; Silveira, Pablo A; Brodnicki, Thomas C

    2015-12-01

    A number of different strategies have been used to identify genes for which genetic variation contributes to type 1 diabetes (T1D) pathogenesis. Genetic studies in humans have identified >40 loci that affect the risk for developing T1D, but the underlying causative alleles are often difficult to pinpoint or have subtle biological effects. A complementary strategy to identifying "natural" alleles in the human population is to engineer "artificial" alleles within inbred mouse strains and determine their effect on T1D incidence. We describe the use of the Sleeping Beauty (SB) transposon mutagenesis system in the nonobese diabetic (NOD) mouse strain, which harbors a genetic background predisposed to developing T1D. Mutagenesis in this system is random, but a green fluorescent protein (GFP)-polyA gene trap within the SB transposon enables early detection of mice harboring transposon-disrupted genes. The SB transposon also acts as a molecular tag to, without additional breeding, efficiently identify mutated genes and prioritize mutant mice for further characterization. We show here that the SB transposon is functional in NOD mice and can produce a null allele in a novel candidate gene that increases diabetes incidence. We propose that SB transposon mutagenesis could be used as a complementary strategy to traditional methods to help identify genes that, when disrupted, affect T1D pathogenesis. PMID:26438296

  5. Accelerating Novel Candidate Gene Discovery in Neurogenetic Disorders via Whole-Exome Sequencing of Prescreened Multiplex Consanguineous Families

    Directory of Open Access Journals (Sweden)

    Anas M. Alazami

    2015-01-01

    Full Text Available Our knowledge of disease genes in neurological disorders is incomplete. With the aim of closing this gap, we performed whole-exome sequencing on 143 multiplex consanguineous families in whom known disease genes had been excluded by autozygosity mapping and candidate gene analysis. This prescreening step led to the identification of 69 recessive genes not previously associated with disease, of which 33 are here described (SPDL1, TUBA3E, INO80, NID1, TSEN15, DMBX1, CLHC1, C12orf4, WDR93, ST7, MATN4, SEC24D, PCDHB4, PTPN23, TAF6, TBCK, FAM177A1, KIAA1109, MTSS1L, XIRP1, KCTD3, CHAF1B, ARV1, ISCA2, PTRH2, GEMIN4, MYOCD, PDPR, DPH1, NUP107, TMEM92, EPB41L4A, and FAM120AOS). We also encountered instances in which the phenotype departed significantly from the established clinical presentation of a known disease gene. Overall, a likely causal mutation was identified in >73% of our cases. This study contributes to the global effort toward a full compendium of disease genes affecting brain function.

  6. Accelerating novel candidate gene discovery in neurogenetic disorders via whole-exome sequencing of prescreened multiplex consanguineous families.

    Science.gov (United States)

    Alazami, Anas M; Patel, Nisha; Shamseldin, Hanan E; Anazi, Shamsa; Al-Dosari, Mohammed S; Alzahrani, Fatema; Hijazi, Hadia; Alshammari, Muneera; Aldahmesh, Mohammed A; Salih, Mustafa A; Faqeih, Eissa; Alhashem, Amal; Bashiri, Fahad A; Al-Owain, Mohammed; Kentab, Amal Y; Sogaty, Sameera; Al Tala, Saeed; Temsah, Mohamad-Hani; Tulbah, Maha; Aljelaify, Rasha F; Alshahwan, Saad A; Seidahmed, Mohammed Zain; Alhadid, Adnan A; Aldhalaan, Hesham; AlQallaf, Fatema; Kurdi, Wesam; Alfadhel, Majid; Babay, Zainab; Alsogheer, Mohammad; Kaya, Namik; Al-Hassnan, Zuhair N; Abdel-Salam, Ghada M H; Al-Sannaa, Nouriya; Al Mutairi, Fuad; El Khashab, Heba Y; Bohlega, Saeed; Jia, Xiaofei; Nguyen, Henry C; Hammami, Rakad; Adly, Nouran; Mohamed, Jawahir Y; Abdulwahab, Firdous; Ibrahim, Niema; Naim, Ewa A; Al-Younes, Banan; Meyer, Brian F; Hashem, Mais; Shaheen, Ranad; Xiong, Yong; Abouelhoda, Mohamed; Aldeeri, Abdulrahman A; Monies, Dorota M; Alkuraya, Fowzan S

    2015-01-13

    Our knowledge of disease genes in neurological disorders is incomplete. With the aim of closing this gap, we performed whole-exome sequencing on 143 multiplex consanguineous families in whom known disease genes had been excluded by autozygosity mapping and candidate gene analysis. This prescreening step led to the identification of 69 recessive genes not previously associated with disease, of which 33 are here described (SPDL1, TUBA3E, INO80, NID1, TSEN15, DMBX1, CLHC1, C12orf4, WDR93, ST7, MATN4, SEC24D, PCDHB4, PTPN23, TAF6, TBCK, FAM177A1, KIAA1109, MTSS1L, XIRP1, KCTD3, CHAF1B, ARV1, ISCA2, PTRH2, GEMIN4, MYOCD, PDPR, DPH1, NUP107, TMEM92, EPB41L4A, and FAM120AOS). We also encountered instances in which the phenotype departed significantly from the established clinical presentation of a known disease gene. Overall, a likely causal mutation was identified in >73% of our cases. This study contributes to the global effort toward a full compendium of disease genes affecting brain function.

  7. A roadmap for natural product discovery based on large-scale genomics and metabolomics

    Science.gov (United States)

    Actinobacteria encode a wealth of natural product biosynthetic gene clusters, whose systematic study is complicated by numerous repetitive motifs. By combining several metrics we developed a method for global classification of these gene clusters into families (GCFs) and analyzed the biosynthetic ca...

  8. Genome-based discovery, structure prediction and functional analysis of cyclic lipopeptide antibiotics in Pseudomonas species

    NARCIS (Netherlands)

    Bruijn, de I.; Kock, de M.J.D.; Meng, Y.; Waard, de P.; Beek, van T.A.; Raaijmakers, J.M.

    2007-01-01

    Analysis of microbial genome sequences has revealed numerous genes involved in antibiotic biosynthesis. In Pseudomonads, several gene clusters encoding non-ribosomal peptide synthetases (NRPSs) were predicted to be involved in the synthesis of cyclic lipopeptide (CLP) antibiotics. Most of these pre

  9. Gene discovery from Jatropha curcas by sequencing of ESTs from normalized and full-length enriched cDNA library from developing seeds

    Directory of Open Access Journals (Sweden)

    Sugantham Priyanka Annabel

    2010-10-01

    Full Text Available Abstract Background Jatropha curcas L. is promoted as an important non-edible biodiesel crop worldwide. Jatropha oil, which is a triacylglycerol, can be directly blended with petro-diesel or transesterified with methanol and used as biodiesel. Genetic improvement in jatropha is needed to increase the seed yield, oil content, drought and pest resistance, and to modify oil composition so that it becomes a technically and economically preferred source for biodiesel production. However, genetic improvement efforts in jatropha could not take advantage of genetic engineering methods due to the lack of cloned genes from this species. To overcome this hurdle, the current gene discovery project was initiated with the objective of isolating as many functional genes as possible from J. curcas by large-scale sequencing of expressed sequence tags (ESTs). Results A normalized and full-length enriched cDNA library was constructed from developing seeds of J. curcas. The cDNA library contained about 1 × 10^6 clones and the average insert size of the clones was 2.1 kb. In total, 12,084 ESTs were sequenced to an average high-quality read length of 576 bp. Contig analysis revealed 2258 contigs and 4751 singletons. Contig size ranged from 2 to 23 ESTs and there were 7333 ESTs in the contigs. This resulted in 7009 unigenes, which were annotated by BLASTX. It showed 3982 unigenes with significant similarity to known genes and 2836 unigenes with significant similarity to genes of unknown, hypothetical and putative proteins. The remaining 191 unigenes, which did not show similarity with any genes in the public database, may encode unique genes. Functional classification revealed unigenes related to a broad range of cellular, molecular and biological functions. Among the 7009 unigenes, 6233 unigenes were identified to be potential full-length genes. Conclusions The high quality normalized cDNA library was constructed from developing seeds of J. curcas for the first time and 7009 unigenes coding
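
    The counts reported in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a reader's consistency check, not part of the study; the variable names and the derived percentages are illustrative assumptions, while the input numbers are taken from the abstract.

```python
# Counts as reported in the abstract (the variable names are illustrative).
total_ests = 12084
contigs = 2258
singletons = 4751
ests_in_contigs = 7333
reported_unigenes = 7009
potential_full_length = 6233

# BLASTX annotation classes as reported in the abstract.
annotation = {
    "similar_to_known_genes": 3982,
    "similar_to_unknown_hypothetical_putative": 2836,
    "no_similarity": 191,
}

# A unigene is either a contig or a singleton.
unigenes = contigs + singletons

# Every sequenced EST is either part of a contig or a singleton.
ests_accounted_for = ests_in_contigs + singletons

# Derived quantities (not stated in the abstract; computed here for illustration).
mean_ests_per_contig = ests_in_contigs / contigs
full_length_fraction = potential_full_length / reported_unigenes

if __name__ == "__main__":
    print("unigenes:", unigenes)
    print("ESTs accounted for:", ests_accounted_for)
    print("annotated unigenes:", sum(annotation.values()))
    print("mean ESTs per contig: %.2f" % mean_ests_per_contig)
    print("potential full-length fraction: %.1f%%" % (100 * full_length_fraction))
```

    With these inputs the three totals agree with the reported figures (7009 unigenes and 12,084 ESTs).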

  10. Combining knowledge discovery from databases (KDD) and case-based reasoning (CBR) to support diagnosis of medical images

    Science.gov (United States)

    Stranieri, Andrew; Yearwood, John; Pham, Binh

    1999-07-01

    The development of data warehouses for the storage and analysis of very large corpora of medical image data represents a significant trend in health care and research. Amongst other benefits, the trend toward warehousing enables the use of techniques for automatically discovering knowledge from large and distributed databases. In this paper, we present an application design for knowledge discovery from databases (KDD) techniques that enhance the performance of the problem solving strategy known as case-based reasoning (CBR) for the diagnosis of radiological images. The problem of diagnosing the abnormality of the cervical spine is used to illustrate the method. The design of a case-based medical image diagnostic support system has three essential characteristics. The first is a case representation that comprises textual descriptions of the image, visual features that are known to be useful for indexing images, and additional visual features to be discovered by data mining many existing images. The second characteristic of the approach presented here involves the development of a case base that comprises an optimal number and distribution of cases. The third characteristic involves the automatic discovery, using KDD techniques, of adaptation knowledge to enhance the performance of the case-based reasoner. Together, the three characteristics of our approach can overcome real-time efficiency obstacles that otherwise militate against the use of CBR in the domain of medical image analysis.

  11. The Discovery of Quinoxaline-Based Metathesis Catalysts from Synthesis of Grazoprevir (MK-5172).

    Science.gov (United States)

    Williams, Michael J; Kong, Jongrock; Chung, Cheol K; Brunskill, Andrew; Campeau, Louis-Charles; McLaughlin, Mark

    2016-05-01

    Olefin metathesis (OM) is a reliable and practical synthetic methodology for challenging carbon-carbon bond formations. While existing catalysts can effect many of these transformations, the synthesis and development of new catalysts is essential to increase the application breadth of OM and to achieve improved catalyst activity. The unexpected initial discovery of a novel olefin metathesis catalyst derived from synthetic efforts toward the HCV therapeutic agent grazoprevir (MK-5172) is described. This initial finding has evolved into a class of tunable, shelf-stable ruthenium OM catalysts that are easily prepared and exhibit unique catalytic activity. PMID:27123552

  12. The Discovery of Quinoxaline-Based Metathesis Catalysts from Synthesis of Grazoprevir (MK-5172).

    Science.gov (United States)

    Williams, Michael J; Kong, Jongrock; Chung, Cheol K; Brunskill, Andrew; Campeau, Louis-Charles; McLaughlin, Mark

    2016-05-01

    Olefin metathesis (OM) is a reliable and practical synthetic methodology for challenging carbon-carbon bond formations. While existing catalysts can effect many of these transformations, the synthesis and development of new catalysts is essential to increase the application breadth of OM and to achieve improved catalyst activity. The unexpected initial discovery of a novel olefin metathesis catalyst derived from synthetic efforts toward the HCV therapeutic agent grazoprevir (MK-5172) is described. This initial finding has evolved into a class of tunable, shelf-stable ruthenium OM catalysts that are easily prepared and exhibit unique catalytic activity.

  13. Identifying disease feature genes based on cellular localized gene functional modules and regulation networks

    Institute of Scientific and Technical Information of China (English)

    ZHANG Min; ZHU Jing; GUO Zheng; LI Xia; YANG Da; WANG Lei; RAO Shaoqi

    2006-01-01

    Identifying disease-relevant genes and functional modules, based on gene expression profiles and gene functional knowledge, is of high importance for studying disease mechanisms and subtyping disease phenotypes. Using gene categories of biological process and cellular component in Gene Ontology, we propose an approach to selecting functional modules enriched with differentially expressed genes, and identifying the feature functional modules of high disease discriminating abilities. Using the differentially expressed genes in each feature module as the feature genes, we reveal the relevance of the modules to the studied diseases. Using three datasets for prostate cancer, gastric cancer, and leukemia, we have demonstrated that the proposed modular approach is of high power in identifying functionally integrated feature gene subsets that are highly relevant to the disease mechanisms. Our analysis has also shown that the critical disease-relevant genes might be better recognized from the gene regulation network, which is constructed using the characterized functional modules, giving important clues to the concerted mechanisms of the modules responding to complex disease states. In addition, the proposed approach to selecting the disease-relevant genes by jointly considering the gene functional knowledge suggests a new way for precisely classifying disease samples with clear biological interpretations, which is critical for the clinical diagnosis and the elucidation of the pathogenic basis of complex diseases.

  14. High-throughput sequence analysis of turbot (Scophthalmus maximus) transcriptome using 454-pyrosequencing for the discovery of antiviral immune genes.

    Directory of Open Access Journals (Sweden)

    Patricia Pereiro

    Full Text Available BACKGROUND: Turbot (Scophthalmus maximus L.) is an important aquacultural resource both in Europe and Asia. However, there is little information on gene sequences available in public databases. Currently, one of the main problems affecting the culture of this flatfish is mortality due to several pathogens, especially viral diseases which are not treatable. In order to identify new genes involved in immune defense, we conducted 454-pyrosequencing of the turbot transcriptome after different immune stimulations. METHODOLOGY/PRINCIPAL FINDINGS: Turbot were injected with viral stimuli to increase the expression level of immune-related genes. High-throughput deep sequencing using 454-pyrosequencing technology yielded 915,256 high-quality reads. These sequences were assembled into 55,404 contigs that were subjected to annotation steps. Intriguingly, 55.16% of the deduced proteins were not significantly similar to any sequences in the databases used for the annotation and only 0.85% of the BLASTx top-hits matched S. maximus protein sequences. This relatively low level of annotation is possibly due to the limited information for this species and other flatfish in the database. These results suggest the identification of a large number of new genes in turbot and in fish in general. A more detailed analysis showed the presence of putative members of several innate and specific immune pathways. CONCLUSIONS/SIGNIFICANCE: To our knowledge, this study is the first transcriptome analysis using 454-pyrosequencing for turbot. Previously, there were only 12,471 ESTs and fewer than 1,500 nucleotide sequences for S. maximus in the NCBI database. Our results provide a rich source of data (55,404 contigs and 181,845 singletons) for discovering and identifying new genes, which will serve as a basis for microarray construction, gene expression characterization and for identification of genetic markers to be used in several applications. Immune stimulation in turbot was very

  15. In silico network topology-based prediction of gene essentiality

    CERN Document Server

    da Silva, Joao Paulo Muller; Mombach, Jose Carlos Merino; Vieira, Renata; da Silva, Jose Guliherme Camargo; Lemke, Ney; Sinigaglia, Marialva

    2007-01-01

    The identification of genes essential for survival is important for the understanding of the minimal requirements for cellular life and for drug design. As experimental studies with the purpose of building a catalog of essential genes for a given organism are time-consuming and laborious, a computational approach which could predict gene essentiality with high accuracy would be of great value. We present here a novel computational approach, called NTPGE (Network Topology-based Prediction of Gene Essentiality), that relies on network topology features of a gene to estimate its essentiality. The first step of NTPGE is to construct the integrated molecular network for a given organism comprising protein physical, metabolic and transcriptional regulation interactions. The second step consists in training a decision tree-based machine learning algorithm on known essential and non-essential genes of the organism of interest, considering as learning attributes the network topology information for each of these genes...

  16. Perfused drop microfluidic device for brain slice culture-based drug discovery.

    Science.gov (United States)

    Liu, Jing; Pan, Liping; Cheng, Xuanhong; Berdichevsky, Yevgeny

    2016-06-01

    Living slices of brain tissue are widely used to model brain processes in vitro. In addition to basic neurophysiology studies, brain slices are also extensively used for pharmacology, toxicology, and drug discovery research. In these experiments, high parallelism and throughput are critical. Capability to conduct long-term electrical recording experiments may also be necessary to address disease processes that require protein synthesis and neural circuit rewiring. We developed a novel perfused drop microfluidic device for use with long term cultures of brain slices (organotypic cultures). Slices of hippocampus were placed into wells cut in polydimethylsiloxane (PDMS) film. Fluid level in the wells was hydrostatically controlled such that a drop was formed around each slice. The drops were continuously perfused with culture medium through microchannels. We found that viable organotypic hippocampal slice cultures could be maintained for at least 9 days in vitro. PDMS microfluidic network could be readily integrated with substrate-printed microelectrodes for parallel electrical recordings of multiple perfused organotypic cultures on a single MEA chip. We expect that this highly scalable perfused drop microfluidic device will facilitate high-throughput drug discovery and toxicology. PMID:27194028

  17. A DHT-Based Discovery Service for the Internet of Things

    Directory of Open Access Journals (Sweden)

    Federica Paganelli

    2012-01-01

    Full Text Available Current trends towards the Future Internet are envisaging the conception of novel services endowed with context-aware and autonomic capabilities to improve end users’ quality of life. The Internet of Things paradigm is expected to contribute towards this ambitious vision by proposing models and mechanisms enabling the creation of networks of “smart things” on a large scale. It is widely recognized that efficient mechanisms for discovering available resources and capabilities are required to realize such vision. The contribution of this work consists in a novel discovery service for the Internet of Things. The proposed solution adopts a peer-to-peer approach for guaranteeing scalability, robustness, and easy maintenance of the overall system. While most existing peer-to-peer discovery services proposed for the IoT support solely exact match queries on a single attribute (i.e., the object identifier), our solution can handle multiattribute and range queries. We defined a layered approach by distinguishing three main aspects: multiattribute indexing, range query support, peer-to-peer routing. We chose to adopt an over-DHT indexing scheme to guarantee ease of design and implementation principles. We report on the implementation of a Proof of Concept in a dangerous goods monitoring scenario, and, finally, we discuss test results for structural properties and query performance evaluation.

  18. Prediction of eukaryotic gene structures based on multilevel optimization

    Institute of Scientific and Technical Information of China (English)

    ZHOU Yanhong; YANG Lei; WANG Hui; LU Feng; WAN Honghui

    2004-01-01

    Computational gene structure prediction, which is valuable for finding new genes and understanding the composition of genomes, plays a very important role in various kinds of genome projects. For eukaryotic gene structures, however, the prediction accuracy of existing methods is still limited. This paper presents a method of predicting eukaryotic gene structures based on multilevel optimization. The complicated problem of predicting gene structure in a eukaryotic DNA sequence containing multiple genes can be decomposed into a series of sub-problems at several levels with decreasing complexity, including the gene level (single-exon gene, multi-exon gene), the element level (exon, intron, etc.), and the feature level (functional site signals, codon usage preference, etc.). On the basis of this decomposition, a multilevel model for the prediction of complex gene structures is created by a multilevel optimization process, in which the models dealing with sub-problems at the low complexity level are first optimized individually, and then optimally combined to form models for the sub-problems at higher complexity levels. Based on the multilevel model, a dynamic programming algorithm is designed to search for optimal gene structures from DNA sequences, and a new program GeneKey (1.0) for the prediction of eukaryotic gene structures is developed. Testing results with widely used datasets demonstrate that the prediction accuracies of GeneKey (1.0) at the nucleotide level, exon level and gene level are all higher than those of the well-known program GENSCAN. A web server of GeneKey (1.0) is available at http://infosci.hust.edu.cn

  19. Gene Network Biological Validity Based on Gene-Gene Interaction Relevance

    OpenAIRE

    Francisco Gómez-Vela; Norberto Díaz-Díaz

    2014-01-01

    In recent years, gene networks have become one of the most useful tools for modeling biological processes. Many gene network inference algorithms have been developed as techniques for extracting knowledge from gene expression data. Ensuring the reliability of the inferred gene relationships is a crucial task in any study in order to prove that the algorithms used are precise. Usually, this validation process can be carried out using prior biological knowledge. The metabolic pathways stored in...

  20. DNA-energetics-based analyses suggest additional genes in prokaryotes

    Indian Academy of Sciences (India)

    Garima Khandelwal; Jalaj Gupta; B Jayaram

    2012-07-01

    We present here a novel methodology for predicting new genes in prokaryotic genomes on the basis of inherent energetics of DNA. Regions of higher thermodynamic stability were identified, which were filtered based on already known annotations to yield a set of potentially new genes. These were then processed for their compatibility with the stereo-chemical properties of proteins and tripeptide frequencies of proteins in Swissprot data, which results in a reliable set of new genes in a genome. Quite surprisingly, the methodology identifies new genes even in well-annotated genomes. Also, the methodology can handle genomes of any GC-content, size and number of annotated genes.

  1. Gene-based and semantic structure of the Gene Ontology as a complex network

    Science.gov (United States)

    Coronnello, Claudia; Tumminello, Michele; Miccichè, Salvatore

    2016-09-01

    The last decade has seen the advent and consolidation of ontology-based tools for the identification and biological interpretation of classes of genes, such as the Gene Ontology. The Gene Ontology (GO) is constantly evolving over time. The information accumulated over time and included in the GO is encoded in the definition of terms and in the setting up of semantic relations amongst terms. Here we investigate the Gene Ontology from a complex network perspective. We consider the semantic network of terms naturally associated with the semantic relationships provided by the Gene Ontology consortium. Moreover, the GO is a natural example of a bipartite network of terms and genes. Here we are interested in studying the properties of the projected network of terms, i.e. a gene-based weighted network of GO terms, in which a link between any two terms is set if at least one gene is annotated in both terms. One aim of the present paper is to compare the structural properties of the semantic and the gene-based network. The relative importance of terms is very similar in the two networks, but the community structure changes. We show that in some cases GO terms that appear to be distinct from a semantic point of view are instead connected, and appear in the same community when considering their gene content. The identification of such gene-based communities of terms might therefore be the basis of a simple protocol aiming at improving the semantic structure of GO. Information about terms that share large gene content might also be important from a biomedical point of view, as it might reveal how genes over-expressed in a certain term also affect other biological processes, molecular functions and cellular components not directly linked according to GO semantics.
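
    The projection described in this abstract, from a bipartite term-gene network to a weighted network of terms, can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the term and gene labels are made up, and using the number of shared genes as the link weight is an assumption, since the abstract only states that a link exists when at least one gene is annotated in both terms.

```python
from collections import defaultdict
from itertools import combinations


def project_terms(annotations):
    """Project a bipartite term-gene network onto its terms.

    annotations maps each term to the set of genes annotated to it.
    Returns a dict mapping (term_a, term_b), with term_a < term_b, to the
    number of genes the two terms share (assumed weight definition).
    """
    # Invert the mapping: for every gene, collect the terms it is annotated to.
    gene_to_terms = defaultdict(set)
    for term, genes in annotations.items():
        for gene in genes:
            gene_to_terms[gene].add(term)

    # Each gene contributes one unit of weight to every pair of its terms.
    weights = defaultdict(int)
    for terms in gene_to_terms.values():
        for term_a, term_b in combinations(sorted(terms), 2):
            weights[(term_a, term_b)] += 1
    return dict(weights)


# Hypothetical toy annotations (not real GO terms or genes).
toy_annotations = {
    "term_A": {"g1", "g2", "g3"},
    "term_B": {"g2", "g3"},
    "term_C": {"g4"},
    "term_D": {"g3", "g4"},
}

if __name__ == "__main__":
    for pair, weight in sorted(project_terms(toy_annotations).items()):
        print(pair, weight)
```

    In the toy data, term_A and term_C share no gene and therefore get no link, whereas term_A and term_B share two genes and get a link of weight 2.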

  2. Discovery of gene-gene interactions across multiple independent data sets of late onset Alzheimer disease from the Alzheimer Disease Genetics Consortium.

    Science.gov (United States)

    Hohman, Timothy J; Bush, William S; Jiang, Lan; Brown-Gentry, Kristin D; Torstenson, Eric S; Dudek, Scott M; Mukherjee, Shubhabrata; Naj, Adam; Kunkle, Brian W; Ritchie, Marylyn D; Martin, Eden R; Schellenberg, Gerard D; Mayeux, Richard; Farrer, Lindsay A; Pericak-Vance, Margaret A; Haines, Jonathan L; Thornton-Wells, Tricia A

    2016-02-01

    Late-onset Alzheimer disease (AD) has a complex genetic etiology, involving locus heterogeneity, polygenic inheritance, and gene-gene interactions; however, the investigation of interactions in recent genome-wide association studies has been limited. We used a biological knowledge-driven approach to evaluate gene-gene interactions for consistency across 13 data sets from the Alzheimer Disease Genetics Consortium. Fifteen single nucleotide polymorphism (SNP)-SNP pairs within 3 gene-gene combinations were identified: SIRT1 × ABCB1, PSAP × PEBP4, and GRIN2B × ADRA1A. In addition, we extend a previously identified interaction from an endophenotype analysis between RYR3 × CACNA1C. Finally, post hoc gene expression analyses of the implicated SNPs further implicate SIRT1 and ABCB1, and implicate CDH23 which was most recently identified as an AD risk locus in an epigenetic analysis of AD. The observed interactions in this article highlight ways in which genotypic variation related to disease may depend on the genetic context in which it occurs. Further, our results highlight the utility of evaluating genetic interactions to explain additional variance in AD risk and identify novel molecular mechanisms of AD pathogenesis. PMID:26827652

  3. A Monte Carlo-based framework enhances the discovery and interpretation of regulatory sequence motifs

    Directory of Open Access Journals (Sweden)

    Seitzer Phillip

    2012-11-01

    Full Text Available Abstract Background Discovery of functionally significant short, statistically overrepresented subsequence patterns (motifs) in a set of sequences is a challenging problem in bioinformatics. Oftentimes, not all sequences in the set contain a motif. These non-motif-containing sequences complicate the algorithmic discovery of motifs. Filtering the non-motif-containing sequences from the larger set of sequences while simultaneously determining the identity of the motif is, therefore, desirable and a non-trivial problem in motif discovery research. Results We describe MotifCatcher, a framework that extends the sensitivity of existing motif-finding tools by employing random sampling to effectively remove non-motif-containing sequences from the motif search. We developed two implementations of our algorithm, each built around a commonly used motif-finding tool, and applied our algorithm to three diverse chromatin immunoprecipitation (ChIP) data sets. In each case, the motif finder with the MotifCatcher extension demonstrated improved sensitivity over the motif finder alone. Our approach organizes candidate functionally significant discovered motifs into a tree, which allowed us to make additional insights. In all cases, we were able to support our findings with experimental work from the literature. Conclusions Our framework demonstrates that additional processing at the sequence entry level can significantly improve the performance of existing motif-finding tools. For each biological data set tested, we were able to propose novel biological hypotheses supported by experimental work from the literature. Specifically, in Escherichia coli, we suggested binding site motifs for 6 non-traditional LexA protein binding sites; in Saccharomyces cerevisiae, we hypothesize 2 disparate mechanisms for novel binding sites of the Cse4p protein; and in Halobacterium sp. NRC-1, we discovered subtle differences in a general transcription factor (GTF) binding site motif

  4. Knowledge-Based, Central Nervous System (CNS) Lead Selection and Lead Optimization for CNS Drug Discovery.

    Science.gov (United States)

    Ghose, Arup K; Herbertz, Torsten; Hudkins, Robert L; Dorsey, Bruce D; Mallamo, John P

    2012-01-18

    The central nervous system (CNS) is the major area that is affected by aging. Alzheimer's disease (AD), Parkinson's disease (PD), brain cancer, and stroke are the CNS diseases that will cost trillions of dollars for their treatment. Achievement of appropriate blood-brain barrier (BBB) penetration is often considered a significant hurdle in the CNS drug discovery process. On the other hand, BBB penetration may be a liability for many of the non-CNS drug targets, and a clear understanding of the physicochemical and structural differences between CNS and non-CNS drugs may assist both research areas. Because of the numerous and challenging issues in CNS drug discovery and the low success rates, pharmaceutical companies are beginning to deprioritize their drug discovery efforts in the CNS arena. Prompted by these challenges and to aid in the design of high-quality, efficacious CNS compounds, we analyzed the physicochemical property and the chemical structural profiles of 317 CNS and 626 non-CNS oral drugs. The conclusions derived provide an ideal property profile for lead selection and the property modification strategy during the lead optimization process. A list of substructural units that may be useful for CNS drug design was also provided here. A classification tree was also developed to differentiate between CNS drugs and non-CNS oral drugs. The combined analysis provided the following guidelines for designing high-quality CNS drugs: (i) topological molecular polar surface area of <76 Å² (25-60 Å²), (ii) at least one (one or two, including one aliphatic amine) nitrogen, (iii) fewer than seven (two to four) linear chains outside of rings, (iv) fewer than three (zero or one) polar hydrogen atoms, (v) volume of 740-970 Å³, (vi) solvent accessible surface area of 460-580 Å², and (vii) positive QikProp parameter CNS. The ranges within parentheses may be used during lead optimization. One violation to this proposed profile may be acceptable. The
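
    The seven guidelines listed in this abstract lend themselves to a simple rule check. The sketch below is a hedged illustration, not the authors' software: the `Descriptors` class, its field names, and the example values are hypothetical, the descriptor values are assumed to be precomputed elsewhere (for instance by QikProp), and treating the volume and surface-area ranges as inclusive is an assumption.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Descriptors:
    """Precomputed descriptors for one compound (hypothetical container)."""
    tpsa: float           # topological molecular polar surface area, in square angstroms
    nitrogens: int        # number of nitrogen atoms
    linear_chains: int    # linear chains outside of rings
    polar_hydrogens: int  # polar hydrogen atoms
    volume: float         # molecular volume, in cubic angstroms
    sasa: float           # solvent accessible surface area, in square angstroms
    qikprop_cns: float    # QikProp CNS parameter


def violations(d: Descriptors) -> List[str]:
    """Return the guidelines (i)-(vii) from the abstract that the compound fails."""
    checks = [
        ("(i) TPSA < 76", d.tpsa < 76),
        ("(ii) at least one nitrogen", d.nitrogens >= 1),
        ("(iii) fewer than seven linear chains", d.linear_chains < 7),
        ("(iv) fewer than three polar hydrogens", d.polar_hydrogens < 3),
        ("(v) volume 740-970", 740 <= d.volume <= 970),  # inclusive bounds assumed
        ("(vi) SASA 460-580", 460 <= d.sasa <= 580),      # inclusive bounds assumed
        ("(vii) positive QikProp CNS", d.qikprop_cns > 0),
    ]
    return [name for name, passed in checks if not passed]


def fits_cns_profile(d: Descriptors, max_violations: int = 1) -> bool:
    """The abstract notes that one violation of the profile may be acceptable."""
    return len(violations(d)) <= max_violations


# Hypothetical descriptor values, chosen only to exercise the rules.
candidate = Descriptors(tpsa=45.0, nitrogens=1, linear_chains=3,
                        polar_hydrogens=1, volume=850.0, sasa=520.0,
                        qikprop_cns=1.0)
one_miss = replace(candidate, tpsa=90.0)
two_misses = replace(candidate, tpsa=90.0, nitrogens=0)

if __name__ == "__main__":
    for label, compound in [("candidate", candidate),
                            ("one_miss", one_miss),
                            ("two_misses", two_misses)]:
        print(label, fits_cns_profile(compound), violations(compound))
```

    The narrower ranges given in parentheses in the abstract, intended for lead optimization, could be checked the same way with a second list of thresholds.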

  5. Seed-based biclustering of gene expression data.

    Directory of Open Access Journals (Sweden)

    Jiyuan An

    Full Text Available BACKGROUND: Accumulated biological research outcomes show that biological functions do not depend on individual genes, but on complex gene networks. Microarray data are widely used to cluster genes according to their expression levels across experimental conditions. However, functionally related genes generally do not show coherent expression across all conditions since any given cellular process is active only under a subset of conditions. Biclustering finds gene clusters that have similar expression levels across a subset of conditions. This paper proposes a seed-based algorithm that identifies coherent genes in an exhaustive, but efficient manner. METHODS: In order to find the biclusters in a gene expression dataset, we exhaustively select combinations of genes and conditions as seeds to create candidate bicluster tables. The tables have two columns: (a) a gene set, and (b) the conditions on which the gene set has dissimilar expression levels to the seed. First, the genes with less than the maximum number of dissimilar conditions are identified and a table of these genes is created. Second, the rows that have the same dissimilar conditions are grouped together. Third, the table is sorted in ascending order based on the number of dissimilar conditions. Finally, beginning with the first row of the table, a test is run repeatedly to determine whether the cardinality of the gene set in the row is greater than the minimum threshold number of genes in a bicluster. If so, a bicluster is output and the corresponding row is removed from the table. Repeating this process, all biclusters in the table are systematically identified until the table becomes empty. CONCLUSIONS: This paper presents a novel biclustering algorithm for the identification of additive biclusters. Since it involves exhaustively testing combinations of genes and conditions, the additive biclusters can be found more readily.
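    The four table-building steps in the METHODS paragraph can be sketched for a single seed as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name, the `tol` threshold, and the definition of "dissimilar" as an absolute difference from the seed greater than `tol` are all hypothetical, the seed here is one gene over all conditions rather than a gene-and-condition combination, and rows that fail the size test are simply dropped because the abstract does not say what happens to them.

```python
from collections import defaultdict


def seed_biclusters(expr, seed_gene, max_dissimilar, min_genes, tol=0.5):
    """Sketch of the table procedure for one seed gene.

    expr maps a gene name to its list of expression values, one per condition.
    """
    seed = expr[seed_gene]
    n_conditions = len(seed)

    # Steps 1 and 2: keep genes with fewer than max_dissimilar dissimilar
    # conditions, and group genes that share the same dissimilar conditions.
    table = defaultdict(list)
    for gene, values in expr.items():
        dissimilar = frozenset(
            c for c in range(n_conditions) if abs(values[c] - seed[c]) > tol
        )
        if len(dissimilar) < max_dissimilar:
            table[dissimilar].append(gene)

    # Step 3: sort rows in ascending order of the number of dissimilar conditions.
    rows = sorted(table.items(), key=lambda row: (len(row[0]), sorted(row[0])))

    # Step 4: walk the rows and emit those whose gene set is large enough
    # (assumed here to mean at least min_genes genes).
    biclusters = []
    for dissimilar, genes in rows:
        if len(genes) >= min_genes:
            conditions = [c for c in range(n_conditions) if c not in dissimilar]
            biclusters.append((sorted(genes), conditions))
    return biclusters
```

    Each returned pair holds a gene set and the conditions on which those genes resemble the seed. An exhaustive search in the sense of the abstract would repeat this over many seeds.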

  6. Transcriptome analysis of the white body of the squid Euprymna tasmanica with emphasis on immune and hematopoietic gene discovery.

    Directory of Open Access Journals (Sweden)

    Karla A Salazar

    Full Text Available In the mutualistic relationship between the squid Euprymna tasmanica and the bioluminescent bacterium Vibrio fischeri, several host factors, including immune-related proteins, are known to interact and respond specifically and exclusively to the presence of the symbiont. In squid and octopus, the white body is considered to be an immune organ mainly due to the fact that blood cells, or hemocytes, are known to be present in high numbers and in different developmental stages. Hence, the white body has been described as the site of hematopoiesis in cephalopods. However, to our knowledge, there are no studies showing any molecular evidence of such functions. In this study, we performed a transcriptomic analysis of white body tissue of the Southern dumpling squid, E. tasmanica. Our primary goal was to gain insights into the functions of this tissue and to test for the presence of gene transcripts associated with hematopoietic and immune processes. Several hematopoiesis genes including CPSF1, GATA 2, TFIID, and FGFR2 were found to be expressed in the white body. In addition, transcripts associated with immune-related signal transduction pathways, such as the toll-like receptor/NF-κB and MAPK pathways, were also found, as well as other immune genes previously identified in E. tasmanica's sister species, E. scolopes. This study is the first to analyze an immune organ within cephalopods, and to provide gene expression data supporting the white body as a hematopoietic tissue.

  7. Gene expression profiling of coelomic cells and discovery of immune-related genes in the earthworm, Eisenia andrei, using expressed sequence tags.

    Science.gov (United States)

    Tak, Eun Sik; Cho, Sung-Jin; Park, Soon Cheol

    2015-01-01

    The coelomic cells of the earthworm consist of leukocytes, chlorogocytes, and coelomocytes, which play an important role in innate immunity reactions. To gain insight into the expression profiles of coelomic cells of the earthworm, Eisenia andrei, we analyzed 1151 expressed sequence tags (ESTs) derived from the cDNA library of the coelomic cells. Among the 1151 ESTs analyzed, 493 ESTs (42.8%) showed a significant similarity to known genes and represented 164 unique genes, of which 93 were singletons and 71 were represented by two or more ESTs. Among the 164 unique genes sequenced, we found 24 immune-related and cell defense genes. Furthermore, real-time PCR analysis showed that mRNA levels of lysenin-related proteins in coelomic cells of E. andrei were upregulated after the injection of Bacillus subtilis bacteria. This EST data set provides a valuable resource for future research on the earthworm immune system. PMID:25496401

  8. Gene-Set Local Hierarchical Clustering (GSLHC)--A Gene Set-Based Approach for Characterizing Bioactive Compounds in Terms of Biological Functional Groups.

    Directory of Open Access Journals (Sweden)

    Feng-Hsiang Chung

    Full Text Available Gene-set-based analysis (GSA), which uses the relative importance of functional gene-sets, or molecular signatures, as units for analysis of genome-wide gene expression data, has exhibited major advantages with respect to greater accuracy, robustness, and biological relevance, over individual gene analysis (IGA), which uses log-ratios of individual genes for analysis. Yet IGA remains the dominant mode of analysis of gene expression data. The Connectivity Map (CMap), an extensive database on genomic profiles of effects of drugs and small molecules and widely used for studies related to repurposed drug discovery, has been mostly employed in IGA mode. Here, we constructed a GSA-based version of CMap, Gene-Set Connectivity Map (GSCMap), in which all the genomic profiles in CMap are converted, using gene-sets from the Molecular Signatures Database, to functional profiles. We showed that GSCMap essentially eliminated cell-type dependence, a weakness of CMap in IGA mode, and yielded significantly better performance on sample clustering and drug-target association. As a first application of GSCMap we constructed the platform Gene-Set Local Hierarchical Clustering (GSLHC) for discovering insights on coordinated actions of biological functions and facilitating classification of heterogeneous subtypes on drug-driven responses. GSLHC was shown to tightly cluster drugs of known similar properties. We used GSLHC to identify the therapeutic properties and putative targets of 18 compounds of previously unknown characteristics listed in CMap, eight of which suggest anti-cancer activities. The GSLHC website http://cloudr.ncu.edu.tw/gslhc/ contains 1,857 local hierarchical clusters accessible by querying 555 of the 1,309 drugs and small molecules listed in CMap. We expect GSCMap and GSLHC to be widely useful in providing new insights into the biological effect of bioactive compounds, in drug repurposing, and in function-based classification of complex diseases.

  9. Gene-Set Local Hierarchical Clustering (GSLHC)--A Gene Set-Based Approach for Characterizing Bioactive Compounds in Terms of Biological Functional Groups.

    Science.gov (United States)

    Chung, Feng-Hsiang; Jin, Zhen-Hua; Hsu, Tzu-Ting; Hsu, Chueh-Lin; Liu, Hsueh-Chuan; Lee, Hoong-Chien

    2015-01-01

    Gene-set-based analysis (GSA), which uses the relative importance of functional gene-sets, or molecular signatures, as units for analysis of genome-wide gene expression data, has exhibited major advantages with respect to greater accuracy, robustness, and biological relevance, over individual gene analysis (IGA), which uses log-ratios of individual genes for analysis. Yet IGA remains the dominant mode of analysis of gene expression data. The Connectivity Map (CMap), an extensive database on genomic profiles of effects of drugs and small molecules and widely used for studies related to repurposed drug discovery, has been mostly employed in IGA mode. Here, we constructed a GSA-based version of CMap, Gene-Set Connectivity Map (GSCMap), in which all the genomic profiles in CMap are converted, using gene-sets from the Molecular Signatures Database, to functional profiles. We showed that GSCMap essentially eliminated cell-type dependence, a weakness of CMap in IGA mode, and yielded significantly better performance on sample clustering and drug-target association. As a first application of GSCMap we constructed the platform Gene-Set Local Hierarchical Clustering (GSLHC) for discovering insights on coordinated actions of biological functions and facilitating classification of heterogeneous subtypes on drug-driven responses. GSLHC was shown to tightly cluster drugs of known similar properties. We used GSLHC to identify the therapeutic properties and putative targets of 18 compounds of previously unknown characteristics listed in CMap, eight of which suggest anti-cancer activities. The GSLHC website http://cloudr.ncu.edu.tw/gslhc/ contains 1,857 local hierarchical clusters accessible by querying 555 of the 1,309 drugs and small molecules listed in CMap. We expect GSCMap and GSLHC to be widely useful in providing new insights into the biological effect of bioactive compounds, in drug repurposing, and in function-based classification of complex diseases. PMID:26473729

  10. Comprehensive Phenotyping in Multiple Sclerosis: Discovery Based Proteomics and the Current Understanding of Putative Biomarkers

    Directory of Open Access Journals (Sweden)

    Kevin C. O’Connor

    2006-01-01

    Full Text Available Currently, there is no single test for multiple sclerosis (MS). Diagnosis is confirmed through clinical evaluation, abnormalities revealed by magnetic resonance imaging (MRI), and analysis of cerebrospinal fluid (CSF) chemistry. The early and accurate diagnosis of the disease, monitoring of progression, and gauging of therapeutic intervention are important but elusive elements of patient care. Moreover, a deeper understanding of the disease pathology is needed, including discovery of accurate biomarkers for MS. Herein we review putative biomarkers of MS relating to neurodegeneration and contributions to neuropathology, with particular focus on autoimmunity. In addition, novel assessments of biomarkers not driven by hypotheses are discussed, featuring our application of advanced proteomics and metabolomics for comprehensive phenotyping of CSF and blood. This strategy allows comparison of component expression levels in CSF and serum between MS and control groups. Examination of these preliminary data suggests that several CSF proteins in MS are differentially expressed, and thus, represent putative biomarkers deserving of further evaluation.

  11. A Fluorescence Displacement Assay for Antidepressant Drug Discovery Based on Ligand-Conjugated Quantum Dots

    Energy Technology Data Exchange (ETDEWEB)

    Chang, Jerry [Vanderbilt University; Tomlinson, Ian [Oak Ridge National Laboratory (ORNL); Warnement, Michael [Vanderbilt University; Iwamoto, Hideki [Vanderbilt University

    2011-01-01

    The serotonin (5-hydroxytryptamine, 5-HT) transporter (SERT) protein plays a central role in terminating 5-HT neurotransmission and is the most important therapeutic target for the treatment of major depression and anxiety disorders. We report an innovative, versatile, and target-selective quantum dot (QD) labeling approach for SERT in single Xenopus oocytes that can be adopted as a drug-screening platform. Our labeling approach employs a custom-made, QD-tagged indoleamine derivative ligand, IDT318, that is structurally similar to 5-HT and accesses the primary binding site with enhanced human SERT selectivity. Incubating QD-labeled oocytes with paroxetine (Paxil), a high-affinity SERT-specific inhibitor, showed a concentration- and time-dependent decrease in QD fluorescence, demonstrating the utility of our approach for the identification of SERT modulators. Furthermore, with the development of ligands aimed at other pharmacologically relevant targets, our approach may potentially form the basis for a multitarget drug discovery platform.

  12. The Complete Genome Sequence of Plodia Interpunctella Granulovirus: Evidence for Horizontal Gene Transfer and Discovery of an Unusual Inhibitor-of-Apoptosis Gene.

    Science.gov (United States)

    Harrison, Robert L; Rowley, Daniel L; Funk, C Joel

    2016-01-01

    The Indianmeal moth, Plodia interpunctella (Lepidoptera: Pyralidae), is a common pest of stored goods with a worldwide distribution. The complete genome sequence for a larval pathogen of this moth, the baculovirus Plodia interpunctella granulovirus (PiGV), was determined by next-generation sequencing. The PiGV genome was found to be 112,536 bp in length with a 44.2% G+C nucleotide distribution. A total of 123 open reading frames (ORFs) and seven homologous regions (hrs) were identified and annotated. Phylogenetic inference using concatenated alignments of 36 baculovirus core genes placed PiGV in the "b" clade of viruses from genus Betabaculovirus with a branch length suggesting that PiGV represents a distinct betabaculovirus species. In addition to the baculovirus core genes and orthologues of other genes found in other betabaculovirus genomes, the PiGV genome sequence contained orthologues of the bidensovirus NS3 gene, as well as ORFs that occur in alphabaculoviruses but not betabaculoviruses. While PiGV contained an orthologue of inhibitor of apoptosis-5 (iap-5), an orthologue of inhibitor of apoptosis-3 (iap-3) was not present. Instead, the PiGV sequence contained an ORF (PiGV ORF81) encoding an IAP homologue with sequence similarity to insect cellular IAPs, but not to viral IAPs. Phylogenetic analysis of baculovirus and insect IAP amino acid sequences suggested that the baculovirus IAP-3 genes and the PiGV ORF81 IAP homologue represent different lineages arising from more than one acquisition event. The presence of genes from other sources in the PiGV genome highlights the extent to which baculovirus gene content is shaped by horizontal gene transfer.

  13. Volatility Discovery

    DEFF Research Database (Denmark)

    Dias, Gustavo Fruet; Scherrer, Cristina; Papailias, Fotis

    There is a large literature that investigates how homogenous securities traded on different markets incorporate new information (price discovery analysis). We extend this concept to the stochastic volatility process and investigate how markets contribute to the efficient stochastic volatility whi...

  14. Connectivity Map-based discovery of parbendazole reveals targetable human osteogenic pathway.

    Science.gov (United States)

    Brum, Andrea M; van de Peppel, Jeroen; van der Leije, Cindy S; Schreuders-Koedam, Marijke; Eijken, Marco; van der Eerden, Bram C J; van Leeuwen, Johannes P T M

    2015-10-13

    Osteoporosis is a common skeletal disorder characterized by low bone mass leading to increased bone fragility and fracture susceptibility. In this study, we have identified pathways that stimulate differentiation of bone forming osteoblasts from human mesenchymal stromal cells (hMSCs). Gene expression profiling was performed in hMSCs differentiated toward osteoblasts (at 6 h). Significantly regulated genes were analyzed in silico, and the Connectivity Map (CMap) was used to identify candidate bone stimulatory compounds. The signature of parbendazole matches the expression changes observed for osteogenic hMSCs. Parbendazole stimulates osteoblast differentiation as indicated by increased alkaline phosphatase activity, mineralization, and up-regulation of bone marker genes (alkaline phosphatase/ALPL, osteopontin/SPP1, and bone sialoprotein II/IBSP) in a subset of the hMSC population resistant to the apoptotic effects of parbendazole. These osteogenic effects are independent of glucocorticoids because parbendazole does not up-regulate glucocorticoid receptor (GR) target genes and is not inhibited by the GR antagonist mifepristone. Parbendazole causes profound cytoskeletal changes including degradation of microtubules and increased focal adhesions. Stabilization of microtubules by pretreatment with Taxol inhibits osteoblast differentiation. Parbendazole up-regulates bone morphogenetic protein 2 (BMP-2) gene expression and activity. Cotreatment with the BMP-2 antagonist DMH1 limits, but does not block, parbendazole-induced mineralization. Using the CMap we have identified a previously unidentified lineage-specific, bone anabolic compound, parbendazole, which induces osteogenic differentiation through a combination of cytoskeletal changes and increased BMP-2 activity. PMID:26420877

  15. Discovery of new risk loci for IgA nephropathy implicates genes involved in immunity against intestinal pathogens.

    Science.gov (United States)

    Kiryluk, Krzysztof; Li, Yifu; Scolari, Francesco; Sanna-Cherchi, Simone; Choi, Murim; Verbitsky, Miguel; Fasel, David; Lata, Sneh; Prakash, Sindhuri; Shapiro, Samantha; Fischman, Clara; Snyder, Holly J; Appel, Gerald; Izzi, Claudia; Viola, Battista Fabio; Dallera, Nadia; Del Vecchio, Lucia; Barlassina, Cristina; Salvi, Erika; Bertinetto, Francesca Eleonora; Amoroso, Antonio; Savoldi, Silvana; Rocchietti, Marcella; Amore, Alessandro; Peruzzi, Licia; Coppo, Rosanna; Salvadori, Maurizio; Ravani, Pietro; Magistroni, Riccardo; Ghiggeri, Gian Marco; Caridi, Gianluca; Bodria, Monica; Lugani, Francesca; Allegri, Landino; Delsante, Marco; Maiorana, Mariarosa; Magnano, Andrea; Frasca, Giovanni; Boer, Emanuela; Boscutti, Giuliano; Ponticelli, Claudio; Mignani, Renzo; Marcantoni, Carmelita; Di Landro, Domenico; Santoro, Domenico; Pani, Antonello; Polci, Rosaria; Feriozzi, Sandro; Chicca, Silvana; Galliani, Marco; Gigante, Maddalena; Gesualdo, Loreto; Zamboli, Pasquale; Battaglia, Giovanni Giorgio; Garozzo, Maurizio; Maixnerová, Dita; Tesar, Vladimir; Eitner, Frank; Rauen, Thomas; Floege, Jürgen; Kovacs, Tibor; Nagy, Judit; Mucha, Krzysztof; Pączek, Leszek; Zaniew, Marcin; Mizerska-Wasiak, Małgorzata; Roszkowska-Blaim, Maria; Pawlaczyk, Krzysztof; Gale, Daniel; Barratt, Jonathan; Thibaudin, Lise; Berthoux, Francois; Canaud, Guillaume; Boland, Anne; Metzger, Marie; Panzer, Ulf; Suzuki, Hitoshi; Goto, Shin; Narita, Ichiei; Caliskan, Yasar; Xie, Jingyuan; Hou, Ping; Chen, Nan; Zhang, Hong; Wyatt, Robert J; Novak, Jan; Julian, Bruce A; Feehally, John; Stengel, Benedicte; Cusi, Daniele; Lifton, Richard P; Gharavi, Ali G

    2014-11-01

    We performed a genome-wide association study (GWAS) of IgA nephropathy (IgAN), the most common form of glomerulonephritis, with discovery and follow-up in 20,612 individuals of European and East Asian ancestry. We identified six new genome-wide significant associations, four in ITGAM-ITGAX, VAV3 and CARD9 and two new independent signals at HLA-DQB1 and DEFA. We replicated the nine previously reported signals, including known SNPs in the HLA-DQB1 and DEFA loci. The cumulative burden of risk alleles is strongly associated with age at disease onset. Most loci are either directly associated with risk of inflammatory bowel disease (IBD) or maintenance of the intestinal epithelial barrier and response to mucosal pathogens. The geospatial distribution of risk alleles is highly suggestive of multi-locus adaptation, and genetic risk correlates strongly with variation in local pathogens, particularly helminth diversity, suggesting a possible role for host-intestinal pathogen interactions in shaping the genetic landscape of IgAN. PMID:25305756

  16. Toxins and drug discovery.

    Science.gov (United States)

    Harvey, Alan L

    2014-12-15

    Components from venoms have stimulated many drug discovery projects, with some notable successes. These are briefly reviewed, from captopril to ziconotide. However, there have been many more disappointments on the road from toxin discovery to approval of a new medicine. Drug discovery and development is an inherently risky business, and the main causes of failure during development programmes are outlined in order to highlight steps that might be taken to increase the chances of success with toxin-based drug discovery. These include having a clear focus on unmet therapeutic needs, concentrating on targets that are well-validated in terms of their relevance to the disease in question, making use of phenotypic screening rather than molecular-based assays, and working with development partners with the resources required for the long and expensive development process. PMID:25448391

  17. The discovery of archaea origin phosphomannomutase in algae based on the algal transcriptome

    Institute of Scientific and Technical Information of China (English)

    FENG Yanjing; CHI Shan; LIU Cui; CHEN Shengping; YU Jun; WANG Xumin; LIU Tao

    2014-01-01

    Phosphomannomutase (PMM; EC 5.4.2.8) is an enzyme that catalyzes the interconversion reaction between mannose-6-phosphate and mannose-1-phosphate. However, its systematic molecular and functional investigation in algae has not hitherto been reported. In this work, following the completion of the 1000 Plant Project (OneKP), in which more than 218 species of Chromista, including 19 marine phaeophytes, 22 marine rhodophytes, 171 chlorophytes, 5 cryptophytes, 4 haptophytes, and 5 glaucophytes were sequenced, we used a gene analysis method to analyze the PMM gene sequences in algae and confirm the existence of the PMM gene in the transcriptomic sequencing data of Rhodophyta and Ochrophyta. Our results showed that only one type of PMM with four conserved motifs exists in Chromista, and that it is similar to human PMM. Moreover, the phylogenetic tree revealed that algal PMM possibly originated from archaea.

  18. AAV-Based Targeting Gene Therapy

    Directory of Open Access Journals (Sweden)

    Wenfang Shi

    2008-01-01

    Full Text Available Since the first parvovirus serotype, AAV2, was isolated from humans and used as a vector for gene therapy, there has been significant progress in AAV vector development. AAV vectors have been extensively investigated for a broad range of gene therapy applications and have been considered a first-choice vector because of their efficient infectivity, stable expression and non-pathogenicity. However, adverse events in AAV-mediated in vivo gene therapy studies pose new challenges for further applications. A deeper understanding of the viral life cycle, viral structure and replication, infection mechanism and efficiency of AAV DNA integration, in terms of the contributing viral and host-cell factors and circumstances, would help in evaluating the advantages and disadvantages and provide more insight for possible clinical applications. This review focuses on recent progress in gene delivery to target cells via receptor-ligand interaction and in the regulation of specific DNA integration. AAV receptors and the intracellular trafficking of virus particles are also discussed.

  19. A general co-expression network-based approach to gene expression analysis: comparison and applications

    Directory of Open Access Journals (Sweden)

    Zhang Weixiong

    2010-02-01

    Full Text Available Abstract Background Co-expression network-based approaches have become popular in analyzing microarray data, such as for detecting functional gene modules. However, co-expression networks are often constructed by ad hoc methods, and network-based analyses have not been shown to outperform the conventional cluster analyses, partially due to the lack of an unbiased evaluation metric. Results Here, we develop a general co-expression network-based approach for analyzing both genes and samples in microarray data. Our approach consists of a simple but robust rank-based network construction method, a parameter-free module discovery algorithm and a novel reference network-based metric for module evaluation. We report some interesting topological properties of rank-based co-expression networks that are very different from that of value-based networks in the literature. Using a large set of synthetic and real microarray data, we demonstrate the superior performance of our approach over several popular existing algorithms. Applications of our approach to yeast, Arabidopsis and human cancer microarray data reveal many interesting modules, including a fatal subtype of lymphoma and a gene module regulating yeast telomere integrity, which were missed by the existing methods. Conclusions We demonstrated that our novel approach is very effective in discovering the modular structures in microarray data, both for genes and for samples. As the method is essentially parameter-free, it may be applied to large data sets where the number of clusters is difficult to estimate. The method is also very general and can be applied to other types of data. A MATLAB implementation of our algorithm can be downloaded from http://cs.utsa.edu/~jruan/Software.html.
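    The abstract does not spell out how the rank-based network is built, so the following is only one plausible reading, offered as a sketch: each gene is linked to the `d` genes whose expression profiles correlate most strongly with its own, so that edges depend on similarity ranks rather than on a fixed correlation cutoff. The function names, the parameter `d`, and the choice of Pearson correlation are assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def rank_based_network(expr, d):
    """Link every gene to its d most correlated genes; edges are undirected."""
    genes = list(expr)
    edges = set()
    for g in genes:
        others = [h for h in genes if h != g]
        # Rank the other genes by similarity to g, most similar first.
        others.sort(key=lambda h: pearson(expr[g], expr[h]), reverse=True)
        for h in others[:d]:
            edges.add(frozenset((g, h)))
    return edges
```

    A value-based network would instead keep every pair whose correlation exceeds a threshold, which is the contrast the abstract draws.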

  20. Increased complexity of gene structure and base composition in vertebrates

    Institute of Scientific and Technical Information of China (English)

    Ying Wu; Huizhong Yuan; Shengjun Tan; Jian-Qun Chen; Dacheng Tian; Haiwang Yang

    2011-01-01

    How the structure and base composition of genes changed with the evolution of vertebrates remains a puzzling question. Here we analyzed 895 orthologous protein-coding genes in six multicellular animals: human, chicken, zebrafish, sea squirt, fruit fly, and worm. Our analyses reveal that many gene regions, particularly introns and 3' UTRs, gradually expanded throughout the evolution of vertebrates from their invertebrate ancestors, and that the number of exons per gene increased. Studies based on all protein-coding genes in each genome provide consistent results. We also find that GC-content increased in many gene regions (especially the 5' UTR) in the evolution of endotherms, except in coding exons. Analysis of individual genomes shows that the 3' UTR demonstrated a stronger length and GC-content correlation with introns than the 5' UTR, and that genes with large introns in all six species demonstrated relatively similar GC-content. Our data indicate a great increase in complexity in vertebrate genes, and we propose that the requirement for morphological and functional changes is probably the driving force behind the evolution of structure and base composition complexity in multicellular animal genes.

  1. Core Collection Based Backcrossing: An Efficient Approach for Breeding, Germplasm Enhancement and Gene Discovery

    Institute of Scientific and Technical Information of China (English)

    J.Z. Jia; R.H. Zhou; X.Y. Zhang; L. Zhang; Y.L. Li; J. Wang; X.Z. Liu; L.F. Gao; S.B. Liu

    2007-01-01

    Plant germplasm underpins much of crop development. Millions of germplasm accessions have been collected and conserved ex situ, and the major challenge is now how to exploit and utilize this abundant resource.

  2. Discovery of New Anti-Schistosomal Hits by Integration of QSAR-Based Virtual Screening and High Content Screening.

    Science.gov (United States)

    Neves, Bruno J; Dantas, Rafael F; Senger, Mario R; Melo-Filho, Cleber C; Valente, Walter C G; de Almeida, Ana C M; Rezende-Neto, João M; Lima, Elid F C; Paveley, Ross; Furnham, Nicholas; Muratov, Eugene; Kamentsky, Lee; Carpenter, Anne E; Braga, Rodolpho C; Silva-Junior, Floriano P; Andrade, Carolina Horta

    2016-08-11

    Schistosomiasis is a debilitating neglected tropical disease, caused by flatworms of Schistosoma genus. The treatment relies on a single drug, praziquantel (PZQ), making the discovery of new compounds extremely urgent. In this work, we integrated QSAR-based virtual screening (VS) of Schistosoma mansoni thioredoxin glutathione reductase (SmTGR) inhibitors and high content screening (HCS) aiming to discover new antischistosomal agents. Initially, binary QSAR models for inhibition of SmTGR were developed and validated using the Organization for Economic Co-operation and Development (OECD) guidance. Using these models, we prioritized 29 compounds for further testing in two HCS platforms based on image analysis of assay plates. Among them, 2-[2-(3-methyl-4-nitro-5-isoxazolyl)vinyl]pyridine and 2-(benzylsulfonyl)-1,3-benzothiazole, two compounds representing new chemical scaffolds have activity against schistosomula and adult worms at low micromolar concentrations and therefore represent promising antischistosomal hits for further hit-to-lead optimization. PMID:27396732

  3. FluKB: A Knowledge-Based System for Influenza Vaccine Target Discovery and Analysis of the Immunological Properties of Influenza Viruses

    DEFF Research Database (Denmark)

    Simon, Christian; Kudahl, Ulrich Johan; Sun, Jing;

    2015-01-01

    FluKB is a knowledge-based system focusing on data and analytical tools for influenza vaccine discovery. The main goal of FluKB is to provide access to curated influenza sequence and epitope data and enhance the analysis of influenza sequence diversity and the analysis of targets of immune...... framework allowing the implementation of analytical workflows and includes standard search tools, such as keyword search and sequence similarity queries, as well as advanced tools for the analysis of sequence variability. The advanced analytical tools for vaccine discovery include visual mapping of T- and B......-cell vaccine targets and assessment of neutralizing antibody coverage. FluKB supports the discovery of vaccine targets and the analysis of viral diversity and its implications for vaccine discovery as well as potential T-cell breadth and antibody cross neutralization involving multiple strains. Flu...

  4. Development of urinary pseudotargeted LC-MS-based metabolomics method and its application in hepatocellular carcinoma biomarker discovery.

    Science.gov (United States)

    Shao, Yaping; Zhu, Bin; Zheng, Ruiyin; Zhao, Xinjie; Yin, Peiyuan; Lu, Xin; Jiao, Binghua; Xu, Guowang; Yao, Zhenzhen

    2015-02-01

    Hepatocellular carcinoma (HCC) is a highly malignant cancer and a major cause of cancer-related death. Effective biomarkers for HCC diagnosis are urgently needed. To identify potential metabolite biomarkers, we developed a urinary pseudotargeted method based on liquid chromatography-hybrid triple quadrupole linear ion trap mass spectrometry (LC-QTRAP MS). Compared with the nontargeted method, the pseudotargeted method achieves better data quality, which benefits the discovery of differential metabolites. The established method was applied to the investigation of cirrhosis (CIR) and HCC. Urinary nucleosides, bile acids, citric acid, and several amino acids were significantly changed in the liver disease groups compared with the controls, reflecting the dysregulation of purine metabolism, energy metabolism, and amino acid metabolism in liver diseases. Furthermore, some metabolites such as cyclic adenosine monophosphate, glutamine, and short- and medium-chain acylcarnitines differed between HCC and CIR. On the basis of binary logistic regression, butyrylcarnitine (carnitine C4:0) and hydantoin-5-propionic acid were defined as combinational markers to distinguish HCC from CIR. The area under the curve was 0.786 and 0.773 for the discovery stage and validation stage samples, respectively. These data show that the established pseudotargeted method complements targeted and nontargeted methods for metabolomics studies.
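    The area under the curve quoted above summarizes how well a combined marker score separates two groups. As a minimal sketch of how such a value can be computed, the function below uses the rank interpretation of the AUC: the fraction of case/control pairs in which the case receives the higher score, with ties counted as one half. The function name and the scores in the usage note are hypothetical and are not data from the study.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve from two lists of classifier scores.

    pos_scores are scores for the positive group (for example HCC),
    neg_scores for the comparison group (for example CIR).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0      # positive sample ranked above negative sample
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos_scores) * len(neg_scores))
```

    For instance, `auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])` counts 8 correctly ordered pairs out of 9. In the study, the scores would come from the fitted logistic regression model.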

  5. Insights into shell deposition in the Antarctic bivalve Laternula elliptica: gene discovery in the mantle transcriptome using 454 pyrosequencing

    Directory of Open Access Journals (Sweden)

    Power Deborah M

    2010-06-01

    Full Text Available Abstract Background The Antarctic clam, Laternula elliptica, is an infaunal stenothermal bivalve mollusc with a circumpolar distribution. It plays a significant role in bentho-pelagic coupling and hence has been proposed as a sentinel species for climate change monitoring. Previous studies have shown that this mollusc displays a high level of plasticity with regard to shell deposition and damage repair against a background of genetic homogeneity. The Southern Ocean has amongst the lowest present-day CaCO3 saturation rate of any ocean region, and is predicted to be among the first to become undersaturated under current ocean acidification scenarios. Hence, this species presents as an ideal candidate for studies into the processes of calcium regulation and shell deposition in our changing ocean environments. Results 454 sequencing of L. elliptica mantle tissue generated 18,290 contigs with an average size of 535 bp (ranging between 142 bp and 5.591 kb). BLAST sequence similarity searching assigned putative function to 17% of the data set, with a significant proportion of these transcripts being involved in binding and potentially of a secretory nature, as defined by GO molecular function and biological process classifications. These results indicated that the mantle is a transcriptionally active tissue which is actively proliferating. All transcripts were screened against an in-house database of genes shown to be involved in extracellular matrix formation and calcium homeostasis in metazoans. Putative identifications were made for a number of classical shell deposition genes, such as tyrosinase, carbonic anhydrase and metalloprotease 1, along with novel members of the family 2 G-Protein Coupled Receptors (GPCRs). A membrane transport protein (SEC61) was also characterised and this demonstrated the utility of the clam sequence data as a resource for examining cold adapted amino acid substitutions. The sequence data contained 46,235 microsatellites and 13

  6. Automated Sample Preparation Platform for Mass Spectrometry-Based Plasma Proteomics and Biomarker Discovery

    Directory of Open Access Journals (Sweden)

    Vilém Guryča

    2014-03-01

    Full Text Available The identification of novel biomarkers from human plasma remains a critical need in order to develop and monitor drug therapies for nearly all disease areas. The discovery of novel plasma biomarkers is, however, significantly hampered by the complexity and dynamic range of proteins within plasma, as well as the inherent variability in composition from patient to patient. In addition, it is widely accepted that most soluble plasma biomarkers for diseases such as cancer will be represented by tissue leakage products, circulating in plasma at low levels. It is therefore necessary to find approaches with the prerequisite level of sensitivity in such a complex biological matrix. Strategies for fractionating the plasma proteome have been suggested, but improvements in sensitivity are often negated by the resultant process variability. Here we describe an approach using multidimensional chromatography and on-line protein derivatization, which allows for higher sensitivity, whilst minimizing the process variability. In order to evaluate this automated process fully, we demonstrate three levels of processing and compare sensitivity, throughput and reproducibility. We demonstrate that high sensitivity analysis of the human plasma proteome is possible down to the low ng/mL or even high pg/mL level with a high degree of technical reproducibility.

  7. Potential of Glutamate-Based Drug Discovery for Next Generation Antidepressants

    Directory of Open Access Journals (Sweden)

    Shigeyuki Chaki

    2015-09-01

    Full Text Available Recently, ketamine has been demonstrated to exert rapid-acting antidepressant effects in patients with depression, including those with treatment-resistant depression, and this discovery has been regarded as the most significant advance in drug development for the treatment of depression in over 50 years. To overcome unwanted side effects of ketamine, numerous approaches targeting glutamatergic systems have been vigorously investigated. For example, among agents targeting the NMDA receptor, the efficacies of selective GluN2B receptor antagonists and a low-trapping antagonist, as well as glycine site modulators such as GLYX-13 and sarcosine, have been demonstrated clinically. Moreover, agents acting on metabotropic glutamate receptors, such as mGlu2/3 and mGlu5 receptors, have been proposed as useful approaches to mimicking the antidepressant effects of ketamine. Neural and synaptic mechanisms mediating the antidepressant effects of ketamine are being delineated, most of which indicate that ketamine improves abnormalities in synaptic transmission and connectivity observed in depressive states via the AMPA receptor and brain-derived neurotrophic factor-dependent mechanisms. Interestingly, some of the above agents may share some neural and synaptic mechanisms with ketamine. These studies should provide important insights for the development of superior pharmacotherapies for depression with more potent and faster onsets of action.

  8. An Innovative Cell Microincubator for Drug Discovery Based on 3D Silicon Structures

    Directory of Open Access Journals (Sweden)

    Francesca Aredia

    2016-01-01

    Full Text Available We recently employed three-dimensional (3D) silicon microstructures (SMSs), consisting of arrays of 3 μm-thick silicon walls separated by 50 μm-deep, 5 μm-wide gaps, as microincubators for monitoring the biomechanical properties of tumor cells. They were here applied to investigate the in vitro behavior of HT1080 human fibrosarcoma cells driven to apoptosis by the chemotherapeutic drug Bleomycin. Our results, obtained by fluorescence microscopy, demonstrated that HT1080 cells exhibited a great ability to colonize the narrow gaps. Remarkably, HT1080 cells grown on 3D-SMS, when treated with the DNA damaging agent Bleomycin under conditions leading to apoptosis, tended to shrink, reducing their volume and mimicking the normal behavior of apoptotic cells, and were prone to leave the gaps. Finally, we performed label-free detection of cells adherent to the vertical silicon wall, inside the gap of 3D-SMS, by exploiting optical low coherence reflectometry using infrared, low power radiation. This kind of approach may become a new tool for increasing automation in the drug discovery area. Our results open new perspectives in view of future applications of the 3D-SMS as the core element of a lab-on-a-chip suitable for screening the effect of new molecules potentially able to kill tumor cells.

  9. A sequence-based approach to identify reference genes for gene expression analysis

    Directory of Open Access Journals (Sweden)

    Chari Raj

    2010-08-01

    Full Text Available Abstract Background An important consideration when analyzing both microarray and quantitative PCR expression data is the selection of appropriate genes as endogenous controls or reference genes. This step is especially critical when identifying genes differentially expressed between datasets. Moreover, reference genes suitable in one context (e.g. lung cancer) may not be suitable in another (e.g. breast cancer). Currently, the main approach to identify reference genes involves the mining of expression microarray data for highly expressed and relatively constant transcripts across a sample set. A caveat here is the requirement for transcript normalization prior to analysis, and measurements obtained are relative, not absolute. Alternatively, as sequencing-based technologies provide digital quantitative output, absolute quantification ensues, and reference gene identification becomes more accurate. Methods Serial analysis of gene expression (SAGE) profiles of non-malignant and malignant lung samples were compared using a permutation test to identify the most stably expressed genes across all samples. Subsequently, the specificity of the reference genes was evaluated across multiple tissue types, their constancy of expression was assessed using quantitative RT-PCR (qPCR), and their impact on differential expression analysis of microarray data was evaluated. Results We show that (i) conventional reference genes such as ACTB and GAPDH are highly variable between cancerous and non-cancerous samples, (ii) reference genes identified for lung cancer do not perform well for other cancer types (breast and brain), (iii) reference genes identified through SAGE show low variability using qPCR in a different cohort of samples, and (iv) normalization of a lung cancer gene expression microarray dataset with or without our reference genes yields different results for differential gene expression and subsequent analyses. Specifically, key established pathways in lung

  10. Prediction of Tumor Outcome Based on Gene Expression Data

    Institute of Scientific and Technical Information of China (English)

    Liu Juan; Hitoshi Iba

    2004-01-01

    Gene expression microarray data can be used to classify tumor types. We propose a new procedure to classify human tumor samples based on microarray gene expression by using a hybrid supervised learning method called MOEA+WV (Multi-Objective Evolutionary Algorithm+Weighted Voting). MOEA is used to search for relatively few subsets of informative genes from the high-dimensional gene space, and WV is used as a classification tool. This new method has been applied to predict the subtypes of lymphoma and the outcomes of medulloblastoma. The results are relatively accurate and meaningful compared to those from other methods.

  11. Low-coverage, whole-genome sequencing of Artocarpus camansi (Moraceae) for phylogenetic marker development and gene discovery

    Science.gov (United States)

    Gardner, Elliot M.; Johnson, Matthew G.; Ragone, Diane; Wickett, Norman J.; Zerega, Nyree J. C.

    2016-01-01

    Premise of the study: We used moderately low-coverage (17×) whole-genome sequencing of Artocarpus camansi (Moraceae) to develop genomic resources for Artocarpus and Moraceae. Methods and Results: A de novo assembly of Illumina short reads (251,378,536 pairs, 2 × 100 bp) accounted for 93% of the predicted genome size. Predicted coding regions were used in a three-way orthology search with published genomes of Morus notabilis and Cannabis sativa. Phylogenetic markers for Moraceae were developed from 333 inferred single-copy exons. Ninety-eight putative MADS-box genes were identified. Analysis of all predicted coding regions resulted in preliminary annotation of 49,089 genes. An analysis of synonymous substitutions for pairs of orthologs (Ks analysis) in M. notabilis and A. camansi strongly suggested a lineage-specific whole-genome duplication in Artocarpus. Conclusions: This study substantially increases the genomic resources available for Artocarpus and Moraceae and demonstrates the value of low-coverage de novo assemblies for nonmodel organisms with moderately large genomes. PMID:27437173

  13. Fast and Rigorous Computation of Gene and Pathway Scores from SNP-Based Summary Statistics.

    Science.gov (United States)

    Lamparter, David; Marbach, Daniel; Rueedi, Rico; Kutalik, Zoltán; Bergmann, Sven

    2016-01-01

    Integrating single nucleotide polymorphism (SNP) p-values from genome-wide association studies (GWAS) across genes and pathways is a strategy to improve statistical power and gain biological insight. Here, we present Pascal (Pathway scoring algorithm), a powerful tool for computing gene and pathway scores from SNP-phenotype association summary statistics. For gene score computation, we implemented analytic and efficient numerical solutions to calculate test statistics. We examined in particular the maximum and the sum of chi-squared statistics, which measure the strongest and the average association signals per gene, respectively. For pathway scoring, we use a modified Fisher method, which not only offers significant power improvement over more traditional enrichment strategies, but also eliminates the problem of arbitrary threshold selection inherent in any binary membership based pathway enrichment approach. We demonstrate the marked increase in power by analyzing summary statistics from dozens of large meta-studies for various traits. Our extensive testing indicates that our method not only excels in rigorous type I error control, but also results in more biologically meaningful discoveries.
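
    The two gene-level statistics and the Fisher combination mentioned in this abstract can be illustrated with a minimal sketch. It is not Pascal's implementation: it assumes independent SNPs (Pascal itself accounts for correlation between SNPs), it uses the plain Fisher method rather than the modified one the authors describe, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def chi2_from_p(p):
    """Convert a two-sided p-value into a 1-df chi-squared statistic."""
    z = NormalDist().inv_cdf(1.0 - p / 2.0)
    return z * z


def gene_statistics(snp_pvalues):
    """Return (sum, max) of per-SNP chi-squared statistics for one gene.

    The sum reflects the average signal, the max the strongest signal.
    Assumption: SNPs are treated as independent, which real data violate.
    """
    stats = [chi2_from_p(p) for p in snp_pvalues]
    return sum(stats), max(stats)


def fisher_combined_p(pvalues):
    """Plain Fisher method: -2 * sum(ln p) follows chi-squared with 2k df."""
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # equals the statistic / 2
    # Closed-form survival function of a chi-squared with even df (2k).
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    total, strongest = gene_statistics([0.05, 0.5])
    print(round(total, 3), round(strongest, 3))
    print(round(fisher_combined_p([0.05, 0.05]), 4))
```

    Turning the sum or the maximum into a gene p-value requires the null distribution under the observed SNP correlation structure, which is the part this sketch leaves out.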

  14. A new gene ontology-based measure for the functional similarity of gene products

    Institute of Scientific and Technical Information of China (English)

    QI Guo-long; QIAN Shi-yu; FANG Ji-qian

    2013-01-01

    Background Although biomedical ontologies have standardized the representation of gene products across species and databases, a method for determining the functional similarities of gene products has not yet been developed. Methods We proposed a new semantic similarity measure based on Gene Ontology that considers the semantic influences from all of the ancestor terms in a graph. Our measure was compared with Resnik's measure in two applications, which were based on the association of the measure used with the gene co-expression and the protein-protein interactions. Results The results showed a considerable association between the semantic similarity and the expression correlation and between the semantic similarity and the protein-protein interactions, and our measure performed the best overall. Conclusion These results revealed the potential value of our newly proposed semantic similarity measure in studying the functional relevance of gene products.
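
    For orientation, the baseline that this abstract compares against, Resnik's measure, scores two terms by the information content of their most informative common ancestor. The sketch below shows only that baseline, not the authors' new measure, and the toy ontology and annotation counts are invented for illustration.

```python
import math

# Hypothetical toy ontology: each term maps to its set of parent terms.
PARENTS = {
    "root": set(),
    "binding": {"root"},
    "catalysis": {"root"},
    "ion_binding": {"binding"},
    "protein_binding": {"binding"},
}

# Hypothetical counts of gene products annotated to a term or its descendants.
COUNTS = {
    "root": 100,
    "binding": 40,
    "catalysis": 60,
    "ion_binding": 10,
    "protein_binding": 30,
}


def ancestors(term):
    """Return the term together with all of its ancestors in the graph."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """IC(t) = -ln p(t), with p(t) the annotation frequency of the term."""
    return -math.log(COUNTS[term] / COUNTS["root"])


def resnik(term_a, term_b):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(information_content(t) for t in common)


if __name__ == "__main__":
    print(resnik("ion_binding", "protein_binding"))
    print(resnik("ion_binding", "catalysis"))
```

    Because Resnik's score depends on a single ancestor, two term pairs that share the same most informative ancestor receive the same score; a measure that weighs all ancestor terms, as the abstract describes, is one way to separate them.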

  15. Genome-scale identification of cell-wall related genes in Arabidopsis based on co-expression network analysis

    Directory of Open Access Journals (Sweden)

    Wang Shan

    2012-08-01

    Full Text Available Abstract Background Identification of the novel genes relevant to plant cell-wall (PCW) synthesis represents a highly important and challenging problem. Although substantial efforts have been invested into studying this problem, the vast majority of the PCW related genes remain unknown. Results Here we present a computational study focused on identification of the novel PCW genes in Arabidopsis based on the co-expression analyses of transcriptomic data collected under 351 conditions, using a bi-clustering technique. Our analysis identified 217 highly co-expressed gene clusters (modules) under some experimental conditions, each containing at least one gene annotated as PCW related according to the Purdue Cell Wall Gene Families database. These co-expression modules cover 349 known/annotated PCW genes and 2,438 new candidates. For each candidate gene, we annotated the specific PCW synthesis stages in which it is involved and predicted the detailed function. In addition, for the co-expressed genes in each module, we predicted and analyzed their cis regulatory motifs in the promoters using our motif discovery pipeline, providing strong evidence that the genes in each co-expression module are transcriptionally co-regulated. From all the co-expression modules, we infer that 108 modules are related to four major PCW synthesis components, using three complementary methods. Conclusions We believe our approach and data presented here will be useful for further identification and characterization of PCW genes. All the predicted PCW genes, co-expression modules, motifs and their annotations are available at a web-based database: http://csbl.bmb.uga.edu/publications/materials/shanwang/CWRPdb/index.html.

  16. Case-Based Reasoning System Based on Knowledge Discovery

    Institute of Scientific and Technical Information of China (English)

    倪志伟; 蔡庆生

    2003-01-01

    Nowadays the research and exploitation of case-based systems are getting more and more attention. Case-Based Reasoning (CBR) is a strategy for solving object cases based on the source cases that are prompted by the object ones. CBR is not only a psychological theory of human knowledge, but will also be a new cornerstone of intelligent computer system technology. Case-based systems are adopted in more and more application fields in order to obtain better results, especially in fields that are ill-defined and lack expert knowledge. But a lot of knowledge is required in CBR, and we face the same knowledge acquisition bottleneck as in expert systems. Data Mining (DM) and Knowledge Discovery in Databases (KDD) are the most useful means of solving this kind of problem and making knowledge acquisition more automated. In this paper, we discuss data mining technology in CBR; in particular, we propose knowledge discovery in case base (KDC) and discuss this concept in detail. Finally, the structure of CBR based on DM is put forward.

  17. First discovery of two polyketide synthase genes for mitorubrinic acid and mitorubrinol yellow pigment biosynthesis and implications in virulence of Penicillium marneffei.

    Directory of Open Access Journals (Sweden)

    Patrick C Y Woo

    Full Text Available BACKGROUND: The genome of P. marneffei, the most important thermal dimorphic fungus causing respiratory, skin and systemic mycosis in China and Southeast Asia, possesses 23 polyketide synthase (PKS) genes and 2 polyketide synthase nonribosomal peptide synthase hybrid (PKS-NRPS) genes, which is of high diversity compared to other thermal dimorphic pathogenic fungi. We hypothesized that the yellow pigment in the mold form of P. marneffei could also be synthesized by one or more PKS genes. METHODOLOGY/PRINCIPAL FINDINGS: All 23 PKS and 2 PKS-NRPS genes of P. marneffei were systematically knocked down. A loss of the yellow pigment was observed in the mold form of the pks11 knockdown, pks12 knockdown and pks11pks12 double knockdown mutants. Sequence analysis showed that PKS11 and PKS12 are fungal non-reducing PKSs. Ultra high performance liquid chromatography-photodiode array detector/electrospray ionization-quadrupole time of flight-mass spectrometry (MS) and MS/MS analysis of the culture filtrates of wild type P. marneffei and the pks11 knockdown, pks12 knockdown and pks11pks12 double knockdown mutants showed that the yellow pigment is composed of mitorubrinic acid and mitorubrinol. The survival of mice challenged with the pks11 knockdown, pks12 knockdown and pks11pks12 double knockdown mutants was significantly better than those challenged with wild type P. marneffei (P<0.05). There was also a statistically significant decrease in survival of pks11 knockdown, pks12 knockdown and pks11pks12 double knockdown mutants compared to wild type P. marneffei in both J774 and THP1 macrophages (P<0.05). CONCLUSIONS/SIGNIFICANCE: The yellow pigment of the mold form of P. marneffei is composed of mitorubrinol and mitorubrinic acid. This represents the first discovery of PKS genes responsible for mitorubrinol and mitorubrinic acid biosynthesis. pks12 and pks11 are probably responsible for sequential use in the biosynthesis of mitorubrinol and mitorubrinic acid

  18. Gene discovery in Eimeria tenella by immunoscreening cDNA expression libraries of sporozoites and schizonts with chicken intestinal antibodies.

    Science.gov (United States)

    Réfega, Susana; Girard-Misguich, Fabienne; Bourdieu, Christiane; Péry, Pierre; Labbé, Marie

    2003-04-01

    Specific antibodies were produced ex vivo from intestinal culture of Eimeria tenella infected chickens. The specificity of these intestinal antibodies was tested against different parasite stages. These antibodies were used to immunoscreen first generation schizont and sporozoite cDNA libraries permitting the identification of new E. tenella antigens. We obtained a total of 119 cDNA clones which were subjected to sequence analysis. The sequences coding for the proteins inducing local immune responses were compared with nucleotide or protein databases and with expressed sequence tags (ESTs) databases. We identified new Eimeria genes coding for heat shock proteins, a ribosomal protein, a pyruvate kinase and a pyridoxine kinase. Specific features of other sequences are discussed.

  19. MOLECULAR MODELING AND DRUG DISCOVERY OF POTENTIAL INHIBITORS FOR ANTICANCER TARGET GENE MELK (MATERNAL EMBRYONIC LEUCINE ZIPPER KINASE)

    Directory of Open Access Journals (Sweden)

    Sabitha. K

    2011-12-01

    Full Text Available Maternal embryonic leucine zipper kinase (MELK), a member of the AMP serine/threonine kinase family, exhibits multiple features consistent with the potential utility of this gene as an anticancer target. Reports show that MELK functions as a cancer-specific protein kinase, and that down-regulation of MELK results in growth suppression of breast cancer cells. Many inhibitors that bind to kinases are already in clinical trials. In our study we took a library of different inhibitors and docked them using GLIDE Induced Fit. From the docking results we conclude that Syk inhibitor II, Rho kinase inhibitor IV, p38 MAP Kinase Inhibitor III, HA 1004, Dihydrochloride and IKK-2 inhibitor VI have good binding affinity towards MELK and may have anticancer activity.

  20. Discovery of Antischistosomal Drug Leads Based on Tetraazamacrocyclic Derivatives and Their Metal Complexes.

    Science.gov (United States)

    Khan, M O Faruk; Keiser, Jennifer; Amoyaw, P N A; Hossain, Mohammad F; Vargas, Mireille; Le, Justin G; Simpson, Natalie C; Roewe, Kimberly D; Freeman, TaRynn N Carder; Hasley, Travis R; Maples, Randall D; Archibald, Stephen J; Hubin, Timothy J

    2016-09-01

    Praziquantel (PZQ) is the only drug available for the treatment of schistosomiasis, and since its large-scale use might be associated with the onset of resistance, new antischistosomal drugs should be developed. A series of 26 synthetic tetraazamacrocyclic derivatives and their metal complexes were synthesized, characterized, and screened for antischistosomal activity by application of a phased screening program. The compounds were first screened against newly transformed schistosomula (NTS) of harvested Schistosoma mansoni cercariae, then against adult worms, and finally, in vivo using the mouse model of S. mansoni infection. At a concentration of 33 μM, incubation with a total of 12 compounds resulted in the mortality of NTS at the 62% to 100% level. Five of these showing 100% inhibition of viability of NTS at 10 μM were selected for further screening for determination of the 50% inhibitory concentrations (IC50s) against both NTS and adult worms. Against NTS, all 5 compounds showed IC50s comparable to the IC50 of the standard drug, PZQ (0.87 to 9.65 μM for the 5 compounds versus 2.20 μM for PZQ). Three of these, which are the bisquinoline derivative of cyclen and its Fe(2+) and Mn(2+) complexes, showed micromolar IC50s (1.62 μM, 1.34 μM, and 4.12 μM, respectively, versus 0.10 μM for PZQ) against adult worms. In vivo, the worm burden reductions were 12.3%, 88.4%, and 74.5%, respectively, at a single oral dose of 400 mg/kg of body weight. The Fe(2+) complex exhibited activity in vivo comparable to that of PZQ, pointing to the discovery of a novel drug lead for schistosomiasis. PMID:27324765

  1. From amplification to gene in thyroid cancer: A high-resolution mapped bacterial-artificial-chromosome resource for cancer chromosome aberrations guides gene discovery after comparative genome hybridization

    Energy Technology Data Exchange (ETDEWEB)

    Chen, X.N.; Gonsky, R.; Korenberg, J.R. [UCLA School of Medicine, Los Angeles, CA (United States). Cedars-Sinai Research Inst.; Knauf, J.A.; Fagin, J.A. [Univ. of Cincinnati, OH (United States). Div. of Endocrinology/Metabolism; Wang, M.; Lai, E.H. [Univ. of North Carolina, Chapel Hill, NC (United States). Dept. of Pharmacology; Chissoe, S. [Washington Univ. School of Medicine, St. Louis, MO (United States). Genome Sequencing

    1998-08-01

    Chromosome rearrangements associated with neoplasms provide a rich resource for definition of the pathways of tumorigenesis. The power of comparative genome hybridization (CGH) to identify novel genes depends on the existence of suitable markers, which are lacking throughout most of the genome. The authors now report a general approach that translates CGH data into higher-resolution genomic-clone data that are then used to define the genes located in aneuploid regions. They used CGH to study 33 thyroid-tumor DNAs and two tumor-cell-line DNAs. The results revealed amplifications of chromosome band 2p21, with less-intense amplification on 2p13, 19q13.1, and 1p36 and with least-intense amplification on 1p34, 1q42, 5q31, 5q33-34, 9q32-34, and 14q32. To define the 2p21 region amplified, a dense array of 373 FISH-mapped chromosome 2 bacterial artificial chromosomes (BACs) was constructed, and 87 of these were hybridized to a tumor-cell line. Four BACs carried genomic DNA that was amplified in these cells. The maximum amplified region was narrowed to 3-6 Mb by multicolor FISH with the flanking BACs, and the minimum amplicon size was defined by a contig of 420 kb. Sequence analysis of the amplified BAC 1D9 revealed a fragment of the gene, encoding protein kinase C epsilon (PKCε), that was then shown to be amplified and rearranged in tumor cells. In summary, CGH combined with a dense mapped resource of BACs and large-scale sequencing has led directly to the definition of PKCε as a previously unmapped candidate gene involved in thyroid tumorigenesis.

  2. Discovery of miRNAs and Their Corresponding miRNA Genes in Atlantic Cod (Gadus morhua): Use of Stable miRNAs as Reference Genes Reveals Subgroups of miRNAs That Are Highly Expressed in Particular Organs

    Science.gov (United States)

    Andreassen, Rune; Rangnes, Fredrik; Sivertsen, Maria; Chiang, Michelle; Tran, Michelle; Worren, Merete Molton

    2016-01-01

    Background Atlantic cod (Gadus morhua) is among the economically most important species in the northern Atlantic Ocean and a model species for studying development of the immune system in vertebrates. MicroRNAs (miRNAs) are an abundant class of small RNA molecules that regulate fundamental biological processes at the post-transcriptional level. Detailed knowledge about a species' miRNA repertoire is necessary to study how the miRNA transcriptome modulates gene expression. We have therefore discovered and characterized mature miRNAs and their corresponding miRNA genes in Atlantic cod. We have also performed a validation study to identify suitable reference genes for RT-qPCR analysis of miRNA expression in Atlantic cod. Finally, we utilized the newly characterized miRNA repertoire and the dedicated RT-qPCR method to reveal miRNAs that are highly expressed in certain organs. Results The discovery analysis revealed 490 mature miRNAs (401 unique sequences) along with precursor sequences and genomic location of the miRNA genes. Twenty-six of these were novel miRNA genes. Validation studies ranked gmo-miR-17-1-5p or the two-gene combination gmo-miR25-3p and gmo-miR210-5p as most suitable qPCR reference genes. Analysis by RT-qPCR revealed 45 miRNAs with significantly higher expression in tissues from one or a few organs. Comparisons to other vertebrates indicate that some of these miRNAs may regulate processes like growth, lipid metabolism, immune response to microbial infections and scar damage repair. Three teleost-specific and three novel Atlantic cod miRNAs were among the differentially expressed miRNAs. Conclusions The number of known mature miRNAs was considerably increased by our identification of miRNAs and miRNA genes in Atlantic cod. This will benefit further functional studies of miRNA expression using deep sequencing methods. The validation study showed that stable miRNAs are suitable reference genes for RT-qPCR analysis of miRNA expression.
Applying RT-qPCR we

  3. Discovery of miRNAs and Their Corresponding miRNA Genes in Atlantic Cod (Gadus morhua): Use of Stable miRNAs as Reference Genes Reveals Subgroups of miRNAs That Are Highly Expressed in Particular Organs.

    Directory of Open Access Journals (Sweden)

    Rune Andreassen

    Full Text Available Atlantic cod (Gadus morhua) is among the economically most important species in the northern Atlantic Ocean and a model species for studying development of the immune system in vertebrates. MicroRNAs (miRNAs) are an abundant class of small RNA molecules that regulate fundamental biological processes at the post-transcriptional level. Detailed knowledge about a species' miRNA repertoire is necessary to study how the miRNA transcriptome modulates gene expression. We have therefore discovered and characterized mature miRNAs and their corresponding miRNA genes in Atlantic cod. We have also performed a validation study to identify suitable reference genes for RT-qPCR analysis of miRNA expression in Atlantic cod. Finally, we utilized the newly characterized miRNA repertoire and the dedicated RT-qPCR method to reveal miRNAs that are highly expressed in certain organs. The discovery analysis revealed 490 mature miRNAs (401 unique sequences) along with precursor sequences and genomic location of the miRNA genes. Twenty-six of these were novel miRNA genes. Validation studies ranked gmo-miR-17-1-5p or the two-gene combination gmo-miR25-3p and gmo-miR210-5p as most suitable qPCR reference genes. Analysis by RT-qPCR revealed 45 miRNAs with significantly higher expression in tissues from one or a few organs. Comparisons to other vertebrates indicate that some of these miRNAs may regulate processes like growth, lipid metabolism, immune response to microbial infections and scar damage repair. Three teleost-specific and three novel Atlantic cod miRNAs were among the differentially expressed miRNAs. The number of known mature miRNAs was considerably increased by our identification of miRNAs and miRNA genes in Atlantic cod. This will benefit further functional studies of miRNA expression using deep sequencing methods. The validation study showed that stable miRNAs are suitable reference genes for RT-qPCR analysis of miRNA expression.
Applying RT-qPCR we have identified

  4. Beyond Discovery

    DEFF Research Database (Denmark)

    Korsgaard, Steffen; Sassmannshausen, Sean Patrick

    2015-01-01

    In this chapter we explore four alternatives to the dominant discovery view of entrepreneurship: the development view, the construction view, the evolutionary view, and the Neo-Austrian view. We outline the main critique points of the discovery view presented in these four alternatives, as well as their central concepts and conceptualization of the entrepreneurial function. On this basis we discuss three central themes that cut across the four alternatives: process, uncertainty, and agency. These themes provide new foci for entrepreneurship research and can help to generate new research questions.

  5. Discovery of TNF inhibitors from a DNA-encoded chemical library based on Diels-Alder cycloaddition.

    Science.gov (United States)

    Buller, Fabian; Zhang, Yixin; Scheuermann, Jörg; Schäfer, Juliane; Bühlmann, Peter; Neri, Dario

    2009-10-30

    DNA-encoded chemical libraries are promising tools for the discovery of ligands toward protein targets of pharmaceutical relevance. DNA-encoded small molecules can be enriched in affinity-based selections and their unique DNA "barcode" allows the amplification and identification by high-throughput sequencing. We describe selection experiments using a DNA-encoded 4000-compound library generated by Diels-Alder cycloadditions. High-throughput sequencing enabled the identification and relative quantification of library members before and after selection. Sequence enrichment profiles corresponding to the "bar-coded" library members were validated by affinity measurements of single compounds. We were able to affinity mature trypsin inhibitors and identify a series of albumin binders for the conjugation of pharmaceuticals. Furthermore, we discovered a ligand for the antiapoptotic Bcl-xL protein and a class of tumor necrosis factor (TNF) binders that completely inhibited TNF-mediated killing of L-M fibroblasts in vitro.
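
    The relative quantification of library members before and after selection can be pictured as a fold-enrichment calculation on barcode read counts. The sketch below is only an illustration under assumed inputs: the compound names, the counts, and the optional pseudocount are hypothetical and are not taken from the study.

```python
def fold_enrichment(before, after, pseudocount=0.0):
    """Ratio of each compound's relative read frequency after vs. before selection.

    `before` and `after` map a compound barcode to its sequencing read count.
    A pseudocount > 0 avoids division by zero for barcodes absent before selection.
    """
    compounds = set(before) | set(after)
    n = len(compounds)
    total_before = sum(before.values()) + pseudocount * n
    total_after = sum(after.values()) + pseudocount * n
    result = {}
    for c in compounds:
        freq_before = (before.get(c, 0) + pseudocount) / total_before
        freq_after = (after.get(c, 0) + pseudocount) / total_after
        result[c] = freq_after / freq_before
    return result


if __name__ == "__main__":
    # Hypothetical read counts for two library members.
    before = {"cmpd_A": 10, "cmpd_B": 10}
    after = {"cmpd_A": 90, "cmpd_B": 10}
    for name, ratio in sorted(fold_enrichment(before, after).items()):
        print(name, round(ratio, 2))
```

    Values above 1 mark barcodes that became more frequent after affinity selection; as the abstract notes, such sequence-derived profiles still need to be validated by affinity measurements of single compounds.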

  6. SNP-based high density genetic map and mapping of btwd1 dwarfing gene in barley.

    Science.gov (United States)

    Ren, Xifeng; Wang, Jibin; Liu, Lipan; Sun, Genlou; Li, Chengdao; Luo, Hong; Sun, Dongfa

    2016-01-01

    A high-density linkage map is a valuable tool for functional genomics and breeding. A newly developed sequence-based marker technology, restriction site associated DNA (RAD) sequencing, has proven powerful for the rapid discovery and genotyping of genome-wide single nucleotide polymorphism (SNP) markers and for high-density genetic map construction. The objective of this research was to construct a high-density genetic map of barley using RAD sequencing. A total of 1894 high-quality SNP markers were developed and mapped onto all seven chromosomes together with 68 SSR markers. These 1962 markers constituted a total genetic length of 1375.8 cM, with an average of 0.7 cM between adjacent loci. The number of markers within each linkage group ranged from 209 to 396. The new recessive dwarfing gene btwd1 in Huaai 11 was mapped onto the high-density linkage map. The results showed that btwd1 is positioned between SNP markers 7HL_6335336 and 7_249275418 on chromosome 7H, at genetic distances of 0.9 cM and 0.7 cM, respectively. The SNP-based high-density genetic map developed and the dwarfing gene btwd1 mapped in this study provide critical information for positional cloning of the btwd1 gene and molecular breeding of barley. PMID:27530597

  7. Discovery of biaryls as RORγ inverse agonists by using structure-based design.

    Science.gov (United States)

    Enyedy, Istvan J; Powell, Noel A; Caravella, Justin; van Vloten, Kurt; Chao, Jianhua; Banerjee, Daliya; Marcotte, Douglas; Silvian, Laura; McKenzie, Andres; Hong, Victor Sukbong; Fontenot, Jason D

    2016-05-15

    RORγ plays a critical role in controlling a pro-inflammatory gene expression program in several lymphocyte lineages including T cells, γδ T cells, and innate lymphoid cells. RORγ-mediated inflammation has been linked to susceptibility to Crohn's disease, arthritis, and psoriasis. Thus, inverse agonists of RORγ have the potential to modulate inflammation. Our goal was to optimize two RORγ inverse agonists: T0901317 from the literature and compound 1, which we obtained from internal screening. We used information from internal X-ray structures to design two libraries that led to a new biaryl series. PMID:27080181

  8. Transcriptome sequencing and annotation of the microalgae Dunaliella tertiolecta: Pathway description and gene discovery for production of next-generation biofuels

    Directory of Open Access Journals (Sweden)

    Bibby Kyle

    2011-03-01

    Full Text Available Abstract Background Biodiesel or ethanol derived from lipids or starch produced by microalgae may overcome many of the sustainability challenges previously ascribed to petroleum-based fuels and first generation plant-based biofuels. The paucity of microalgae genome sequences, however, limits gene-based biofuel feedstock optimization studies. Here we describe the sequencing and de novo transcriptome assembly for the non-model microalgae species, Dunaliella tertiolecta, and identify pathways and genes of importance related to biofuel production. Results Next generation DNA pyrosequencing technology applied to D. tertiolecta transcripts produced 1,363,336 high quality reads with an average length of 400 bases. Following quality and size trimming, ~45% of the high quality reads were assembled into 33,307 isotigs with a 31-fold coverage and 376,482 singletons. Assembled sequences and singletons were subjected to BLAST similarity searches and annotated with Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology (KO) identifiers. These analyses identified the majority of lipid and starch biosynthesis and catabolism pathways in D. tertiolecta. Conclusions The construction of metabolic pathways involved in the biosynthesis and catabolism of fatty acids, triacylglycerols, and starch in D. tertiolecta as well as the assembled transcriptome provide a foundation for the molecular genetics and functional genomics required to direct metabolic engineering efforts that seek to enhance the quantity and character of microalgae-based biofuel feedstock.

  9. Toward Omics-Based, Systems Biomedicine, and Path and Drug Discovery Methodologies for Depression-Inflammation Research.

    Science.gov (United States)

    Maes, Michael; Nowak, Gabriel; Caso, Javier R; Leza, Juan Carlos; Song, Cai; Kubera, Marta; Klein, Hans; Galecki, Piotr; Noto, Cristiano; Glaab, Enrico; Balling, Rudi; Berk, Michael

    2016-07-01

    Meta-analyses confirm that depression is accompanied by signs of inflammation including increased levels of acute phase proteins, e.g., C-reactive protein, and pro-inflammatory cytokines, e.g., interleukin-6. Supporting the translational significance of this, a meta-analysis showed that anti-inflammatory drugs may have antidepressant effects. Here, we argue that inflammation and depression research needs to get onto a new track. Firstly, the choice of inflammatory biomarkers in depression research was often too selective and did not consider the broader pathways. Secondly, although mild inflammatory responses are present in depression, other immune-related pathways cannot be disregarded as new drug targets, e.g., activation of cell-mediated immunity, oxidative and nitrosative stress (O&NS) pathways, autoimmune responses, bacterial translocation, and activation of the toll-like receptor and neuroprogressive pathways. Thirdly, anti-inflammatory treatments are sometimes used without full understanding of their effects on the broader pathways underpinning depression. Since many of the activated immune-inflammatory pathways in depression actually confer protection against an overzealous inflammatory response, targeting these pathways may result in unpredictable and unwanted results. Furthermore, this paper discusses the required improvements in research strategy, i.e., path and drug discovery processes, omics-based techniques, and systems biomedicine methodologies. Firstly, novel methods should be employed to examine the intracellular networks that control and modulate the immune, O&NS and neuroprogressive pathways using omics-based assays, including genomics, transcriptomics, proteomics, metabolomics, epigenomics, immunoproteomics and metagenomics. Secondly, systems biomedicine analyses are essential to unravel the complex interactions between these cellular networks, pathways, and the multifactorial trigger factors and to delineate new drug targets in the cellular

  10. A contig-based strategy for the genome-wide discovery of microRNAs without complete genome resources.

    Directory of Open Access Journals (Sweden)

    Jun-Zhi Wen

    Full Text Available MicroRNAs (miRNAs) are important regulators of many cellular processes and exist in a wide range of eukaryotes. High-throughput sequencing is a mainstream method of miRNA identification through which it is possible to obtain the complete small RNA profile of an organism. Currently, most approaches to miRNA identification rely on a reference genome for the prediction of hairpin structures. However, many species of economic and phylogenetic importance are non-model organisms without complete genome sequences, and this limits miRNA discovery. Here, to overcome this limitation, we have developed a contig-based miRNA identification strategy. We applied this method to a triploid species of edible banana (GCTCV-119, Musa spp. AAA group) and identified 180 pre-miRNAs and 314 mature miRNAs, three times as many as were predicted by the available dataset-based methods (represented by EST+GSS). Based on the recently published miRNA data set of Musa acuminata, the recall rate and precision of our strategy are estimated to be 70.6% and 92.2%, respectively, significantly better than those of the EST+GSS-based strategy (10.2% and 50.0%, respectively). Our novel, efficient and cost-effective strategy facilitates the study of the functional and evolutionary role of miRNAs, as well as miRNA-based molecular breeding, in non-model species of economic or evolutionary interest.
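
    Recall and precision against a reference miRNA set follow the standard definitions, which can be sketched as below. The identifiers are illustrative placeholders and not the banana predictions from the study; how the authors matched predictions to reference entries is not described in the abstract.

```python
def precision_recall(predicted, reference):
    """Precision = TP / |predicted|; recall = TP / |reference|."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Illustrative identifiers only (hypothetical prediction and reference sets).
predicted = {"miR156", "miR159", "miR166", "novel_1"}
reference = {"miR156", "miR159", "miR166", "miR172", "miR396"}

precision, recall = precision_recall(predicted, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```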

  11. Integrative Genomics-Based Discovery of Novel Regulators of the Innate Antiviral Response.

    Directory of Open Access Journals (Sweden)

    Robin van der Lee

    2015-10-01

    Full Text Available The RIG-I-like receptor (RLR) pathway is essential for detecting cytosolic viral RNA to trigger the production of type I interferons (IFNα/β) that initiate an innate antiviral response. Through systematic assessment of a wide variety of genomics data, we discovered 10 molecular signatures of known RLR pathway components that collectively predict novel members. We demonstrate that RLR pathway genes, among others, tend to evolve rapidly, interact with viral proteins, contain a limited set of protein domains, are regulated by specific transcription factors, and form a tightly connected interaction network. Using a Bayesian approach to integrate these signatures, we propose likely novel RLR regulators. RNAi knockdown experiments revealed a high prediction accuracy, identifying 94 genes among 187 candidates tested (~50%) that affected viral RNA-induced production of IFNβ. The discovered antiviral regulators may participate in a wide range of processes that highlight the complexity of antiviral defense (e.g. MAP3K11, CDK11B, PSMA3, TRIM14, HSPA9B, CDC37, NUP98, G3BP1), and include uncharacterized factors (DDX17, C6orf58, C16orf57, PKN2, SNW1). Our validated RLR pathway list (http://rlr.cmbi.umcn.nl/), obtained using a combination of integrative genomics and experiments, is a new resource for innate antiviral immunity research.
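
    One common way to integrate several signatures in a Bayesian framework is a naive Bayes combination of per-signature likelihood ratios. The sketch below assumes that form and assumes conditional independence between signatures; the abstract does not specify the authors' model. The signature names, likelihood ratios, and prior are hypothetical.

```python
import math


def posterior_probability(gene_features, likelihood_ratios, prior):
    """Combine prior odds with one likelihood ratio per signature (naive Bayes).

    likelihood_ratios maps a signature name to a pair
    (ratio if the signature is present, ratio if it is absent).
    """
    log_odds = math.log(prior / (1.0 - prior))
    for name, present in gene_features.items():
        ratio_present, ratio_absent = likelihood_ratios[name]
        log_odds += math.log(ratio_present if present else ratio_absent)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical values chosen for illustration only.
likelihood_ratios = {
    "rapid_evolution": (3.0, 0.8),
    "viral_protein_interaction": (5.0, 0.9),
}
prior = 0.01

p_both = posterior_probability(
    {"rapid_evolution": True, "viral_protein_interaction": True},
    likelihood_ratios, prior)
p_none = posterior_probability(
    {"rapid_evolution": False, "viral_protein_interaction": False},
    likelihood_ratios, prior)
print(round(p_both, 4), round(p_none, 4))
```

    Genes carrying both signatures end up with a posterior above the prior, and genes carrying neither fall below it, which is the behavior a candidate ranking would rely on.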

  12. A new set of ESTs and cDNA clones from full-length and normalized libraries for gene discovery and functional characterization in citrus

    Directory of Open Access Journals (Sweden)

    Alamar Santiago

    2009-09-01

    Full Text Available Abstract Background Interpretation of ever-increasing raw sequence information generated by modern genome sequencing technologies faces multiple challenges, such as gene function analysis and genome annotation. Indeed, nearly 40% of genes in plants encode proteins of unknown function. Functional characterization of these genes is one of the main challenges in modern biology. In this regard, the availability of full-length cDNA clones may fill in the gap created between sequence information and biological knowledge. Full-length cDNA clones facilitate functional analysis of the corresponding genes enabling manipulation of their expression in heterologous systems and the generation of a variety of tagged versions of the native protein. In addition, the development of full-length cDNA sequences has the power to improve the quality of genome annotation. Results We developed an integrated method to generate a new normalized EST collection enriched in full-length and rare transcripts of different citrus species from multiple tissues and developmental stages. We constructed a total of 15 cDNA libraries, from which we isolated 10,898 high-quality ESTs representing 6142 different genes. Percentages of redundancy and proportion of full-length clones range from 8 to 33, and 67 to 85, respectively, indicating good efficiency of the approach employed. The new EST collection adds 2113 new citrus ESTs, representing 1831 unigenes, to the collection of citrus genes available in the public databases. To facilitate functional analysis, cDNAs were introduced in a Gateway-based cloning vector for high-throughput functional analysis of genes in planta. Herein, we describe the technical methods used in the library construction, sequence analysis of clones and the overexpression of CitrSEP, a citrus homolog to the Arabidopsis SEP3 gene, in Arabidopsis as an example of a practical application of the engineered Gateway vector for functional analysis. Conclusion The new

  13. De Novo Deep Transcriptome Analysis of Medicinal Plants for Gene Discovery in Biosynthesis of Plant Natural Products.

    Science.gov (United States)

    Han, R; Rai, A; Nakamura, M; Suzuki, H; Takahashi, H; Yamazaki, M; Saito, K

    2016-01-01

    The study of the transcriptome, the entire pool of transcripts in an organism or in single cells at a certain physiological or pathological stage, is indispensable in unraveling the connection and regulation between DNA and protein. Before the advent of deep sequencing, the microarray was the main approach to handling transcripts. Despite obvious shortcomings, including limited dynamic range and difficulty in comparing results from distinct experiments, microarrays were widely applied. During the past decade, next-generation sequencing (NGS) has revolutionized our understanding of genomics in a fast, high-throughput, cost-effective, and tractable manner. With the adoption of NGS, efforts to elucidate the genes responsible for producing active compounds in medicinal plants have become markedly more efficient and fruitful. The whole process involves several steps, from plant material sampling, to cDNA library preparation, to deep sequencing, after which bioinformatics takes over to assemble the enormous yet fragmentary data and to comb it for information. The unprecedentedly rapid development of such technologies provides many choices to facilitate the task, which can cause confusion when choosing a suitable methodology for a specific purpose. Here, we review the general approaches for deep transcriptome analysis and then focus on their application in discovering biosynthetic pathways of medicinal plants that produce important secondary metabolites. PMID:27480681

  14. Discovery of a strongly-interrelated gene network in corals under constant darkness by correlation analysis after wavelet transform on complex network model.

    Directory of Open Access Journals (Sweden)

    Longlong Liu

    Full Text Available Coral reefs occupy a relatively small portion of sea area, yet serve as a crucial source of biodiversity by establishing harmonious ecosystems with marine plants and animals. Previous research mainly focused on screening several key genes induced by stress. Here we propose a novel method, correlation analysis after wavelet transform on a complex network model, to explore the effect of light on gene expression in the coral Acropora millepora based on microarray data. In this method, the wavelet transform and the concept of complex networks were adopted, and 50 key genes with large differences were finally captured, including both annotated genes and novel genes without accurate annotation. These results shed light on the coral's response to light changes and on the genome-wide interaction among genes under the control of biorhythm, and hence help us to better protect coral reef ecosystems. Further studies are needed to explore how functional connections are related to structural connections, and how connectivity arises from the interactions within and between different systems. The method introduced in this study for analyzing microarray data will allow researchers to explore genome-wide interaction networks with their own datasets and understand the relevant biological processes.
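
    The general idea of correlating expression profiles after a wavelet transform and linking strongly correlated genes can be sketched as follows. The sketch assumes a single-level Haar transform, Pearson correlation, and a fixed threshold, none of which are specified in the abstract; the gene names and profiles are hypothetical.

```python
import math


def haar_approximation(signal):
    """One level of the Haar wavelet transform (approximation coefficients)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    return [(signal[i] + signal[i + 1]) / math.sqrt(2)
            for i in range(0, len(signal), 2)]


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def correlation_network(profiles, threshold=0.9):
    """Edges between genes whose transformed profiles correlate strongly."""
    smoothed = {gene: haar_approximation(values)
                for gene, values in profiles.items()}
    genes = sorted(smoothed)
    return [(g, h)
            for i, g in enumerate(genes)
            for h in genes[i + 1:]
            if abs(pearson(smoothed[g], smoothed[h])) >= threshold]


# Hypothetical expression time series (not data from the study).
profiles = {
    "gene_a": [1, 2, 3, 4, 5, 6, 7, 8],
    "gene_b": [2, 4, 6, 8, 10, 12, 14, 16],
    "gene_c": [8, 1, 2, 9, 3, 7, 1, 6],
}
edges = correlation_network(profiles)
print(edges)
```

    Highly connected nodes in the resulting edge list would be the natural candidates for "key genes", although the abstract does not say which network statistic the authors used.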

  15. Discovery of potential new gene variants and inflammatory cytokine associations with fibromyalgia syndrome by whole exome sequencing.

    Directory of Open Access Journals (Sweden)

    Jinong Feng

    Full Text Available Fibromyalgia syndrome (FMS) is a chronic musculoskeletal pain disorder affecting 2% to 5% of the general population. Both genetic and environmental factors may be involved. To ascertain in an unbiased manner which genes play a role in the disorder, we performed complete exome sequencing on a subset of FMS patients. Out of 150 nuclear families (trios), DNA from 19 probands was subjected to complete exome sequencing. Since >80,000 SNPs were found per proband, the data were further filtered, including analysis of those with stop codons, a rare frequency (<2.5%) in the 1000 Genomes database, and presence in at least 2/19 probands sequenced. Two nonsense mutations, W32X in C11orf40 and Q100X in ZNF77, among 150 FMS trios had a significantly elevated frequency of transmission to affected probands (p = 0.026 and p = 0.032, respectively) and were present in a subset of 13% and 11% of FMS patients, respectively. Among 9 patients bearing more than one of the variants we have described, 4 had onset of symptoms between the ages of 10 and 18. The subset with the C11orf40 mutation had elevated plasma levels of the inflammatory cytokines, MCP-1 and IP-10, compared with unaffected controls or FMS patients with the wild-type allele. Similarly, patients with the ZNF77 mutation have elevated levels of the inflammatory cytokine, IL-12, compared with controls or patients with the wild type allele. Our results strongly implicate an inflammatory basis for FMS, as well as specific cytokine dysregulation, in at least 35% of our FMS cohort.
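
    The three filtering criteria named above (stop codons, population frequency below 2.5%, presence in at least 2 of the sequenced probands) can be expressed as a simple filter. This is a minimal sketch that assumes the criteria are applied jointly; the record layout, gene names, and values are hypothetical and do not reproduce the study's pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    is_stop_gain: bool            # variant introduces a stop codon
    population_frequency: float   # e.g. allele frequency in a reference panel
    proband_ids: frozenset        # sequenced probands carrying the variant


def filter_variants(variants, max_frequency=0.025, min_probands=2):
    """Keep rare stop-gain variants seen in at least `min_probands` probands."""
    return [v for v in variants
            if v.is_stop_gain
            and v.population_frequency < max_frequency
            and len(v.proband_ids) >= min_probands]


# Hypothetical variants for illustration only.
variants = [
    Variant("GENE1", True, 0.010, frozenset({"P01", "P07"})),
    Variant("GENE2", True, 0.100, frozenset({"P02", "P03", "P04"})),  # too common
    Variant("GENE3", False, 0.001, frozenset({"P05", "P06"})),        # not stop-gain
    Variant("GENE4", True, 0.002, frozenset({"P08"})),                # one proband
]
kept = [v.gene for v in filter_variants(variants)]
print(kept)
```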

  16. Prediction on the inhibition ratio of pyrrolidine derivatives on matrix metalloproteinase based on gene expression programming.

    Science.gov (United States)

    Li, Yuqin; You, Guirong; Jia, Baoxiu; Si, Hongzong; Yao, Xiaojun

    2014-01-01

    Quantitative structure-activity relationships (QSAR) were developed to predict the inhibition ratio of pyrrolidine derivatives on matrix metalloproteinase via the heuristic method (HM) and gene expression programming (GEP). The descriptors of 33 pyrrolidine derivatives were calculated by the software CODESSA, which can calculate quantum chemical, topological, geometrical, constitutional, and electrostatic descriptors. HM was also used for the preselection of 5 appropriate molecular descriptors. Linear and nonlinear QSAR models were developed based on HM and GEP separately, and the two prediction models gave good correlation coefficients (R²) of 0.93 and 0.94. The two QSAR models are useful in predicting the inhibition ratio of pyrrolidine derivatives on matrix metalloproteinase during the discovery of new anticancer drugs and provide theoretical information for studying the new drugs. PMID:24971318

  17. Prediction on the Inhibition Ratio of Pyrrolidine Derivatives on Matrix Metalloproteinase Based on Gene Expression Programming

    Directory of Open Access Journals (Sweden)

    Yuqin Li

    2014-01-01

    Full Text Available Quantitative structure-activity relationships (QSAR) were developed to predict the inhibition ratio of pyrrolidine derivatives on matrix metalloproteinase via the heuristic method (HM) and gene expression programming (GEP). The descriptors of 33 pyrrolidine derivatives were calculated by the software CODESSA, which can calculate quantum chemical, topological, geometrical, constitutional, and electrostatic descriptors. HM was also used for the preselection of 5 appropriate molecular descriptors. Linear and nonlinear QSAR models were developed based on HM and GEP separately, and the two prediction models gave good correlation coefficients (R²) of 0.93 and 0.94. The two QSAR models are useful in predicting the inhibition ratio of pyrrolidine derivatives on matrix metalloproteinase during the discovery of new anticancer drugs and provide theoretical information for studying the new drugs.

  18. In Vitro Assessment of the Inflammatory Breast Cancer Cell Line SUM 149: Discovery of 2 Single Nucleotide Polymorphisms in the RNase L Gene

    Directory of Open Access Journals (Sweden)

    Brandon T. Nokes, Heather E. Cunliffe, Bonnie LaFleur, David W. Mount, Robert B. Livingston, Bernard W. Futscher, Julie E. Lang

    2013-01-01

    Full Text Available Background: Inflammatory breast cancer (IBC) is a rare, highly aggressive form of breast cancer. The mechanism of IBC carcinogenesis remains unknown. We sought to evaluate potential genetic risk factors for IBC and whether or not the IBC cell lines SUM149 and SUM190 demonstrated evidence of viral infection. Methods: We performed single nucleotide polymorphism (SNP) genotyping for 2 variants of the ribonuclease (RNase) L gene that have been correlated with the risk of prostate cancer due to a possible viral etiology. We evaluated dose-response to treatment with interferon-alpha (IFN-α) and assayed for evidence of the putative human mammary tumor virus (HMTV), which has been implicated in IBC, in SUM149 cells. A bioinformatic analysis was performed to evaluate expression of RNase L in IBC and non-IBC. Results: 2 of 2 IBC cell lines were homozygous for RNase L common missense variants 462 and 541, whereas 2 of 10 non-IBC cell lines were homozygous positive for the 462 variant (p = 0.09) and 0 of 10 non-IBC cell lines were homozygous positive for the 541 variant (p = 0.015). Our real-time polymerase chain reaction (RT-PCR) and Southern blot analysis for sequences of HMTV revealed no evidence of the putative viral genome. Conclusion: We discovered 2 SNPs in the RNase L gene that were homozygously present in IBC cell lines. The 462 variant was absent in non-IBC lines. Our discovery of these SNPs present in IBC cell lines suggests a possible biomarker for risk of IBC. We found no evidence of HMTV in SUM149 cells. A query of a panel of human IBC and non-IBC samples showed no difference in RNase L expression. Further studies of the RNase L 462 and 541 variants in IBC tissues are warranted to validate our in vitro findings.

  19. Discovery of precursor and mature microRNAs and their putative gene targets using high-throughput sequencing in pineapple (Ananas comosus var. comosus).

    Science.gov (United States)

    Yusuf, Noor Hydayaty Md; Ong, Wen Dee; Redwan, Raimi Mohamed; Latip, Mariam Abd; Kumar, S Vijay

    2015-10-15

    MicroRNAs (miRNAs) are a class of small, endogenous non-coding RNAs that negatively regulate gene expression, resulting in the silencing of target mRNA transcripts through mRNA cleavage or translational inhibition. MiRNAs play significant roles in various biological and physiological processes in plants. However, the miRNA-mediated gene regulatory network in pineapple, the model tropical non-climacteric fruit, remains largely unexplored. Here, we report a complete list of pineapple mature miRNAs obtained from high-throughput small RNA sequencing and precursor miRNAs (pre-miRNAs) obtained from ESTs. Two small RNA libraries were constructed from pineapple fruits and leaves, respectively, using Illumina's Solexa technology. Sequence similarity analysis using miRBase revealed 579,179 reads homologous to 153 miRNAs from 41 miRNA families. In addition, a pineapple fruit transcriptome library consisting of approximately 30,000 EST contigs constructed using Solexa sequencing was used for the discovery of pre-miRNAs. In all, four pre-miRNAs were identified (MIR156, MIR399, MIR444 and MIR2673). Furthermore, the same pineapple transcriptome was used to dissect the function of the miRNAs in pineapple by predicting their putative targets in conjunction with their regulatory networks. In total, 23 metabolic pathways were found to be regulated by miRNAs in pineapple. The use of high-throughput sequencing in pineapples to unveil the presence of miRNAs and their regulatory pathways provides insight into the repertoire of miRNA regulation used exclusively in this non-climacteric model plant. PMID:26115767

  20. Discovery Learning, Representation, and Explanation within a Computer-Based Simulation: Finding the Right Mix

    Science.gov (United States)

    Rieber, Lloyd P.; Tzeng, Shyh-Chii; Tribble, Kelly

    2004-01-01

    The purpose of this research was to explore how adult users interact and learn during an interactive computer-based simulation supplemented with brief multimedia explanations of the content. A total of 52 college students interacted with a computer-based simulation of Newton's laws of motion in which they had control over the motion of a simple…

  1. Geo-Caching: Place-Based Discovery of Virginia State Parks and Museums

    Science.gov (United States)

    Gray, Howard Richard

    2007-01-01

    The use of Global Positioning Systems (GPS) units has exploded in recent years along with the computer technology to access this data-based information. Geo-caching is an exciting game using GPS that provides place-based information regarding the public lands, facilities and cultural heritage programs within the Virginia Parks and Museum system.…

  2. Characterization of Genes for Beef Marbling Based on Applying Gene Coexpression Network

    Directory of Open Access Journals (Sweden)

    Dajeong Lim

    2014-01-01

    Full Text Available Marbling is an important trait in characterizing beef quality and a major factor in determining the price of beef in the Korean beef market. Marbling is a complex trait and requires a system-level approach to identify candidate genes related to it. To find candidate genes associated with marbling, we applied weighted gene coexpression network analysis to the expression values of bovine genes. Hub genes were identified; they were topologically centered, with large degree and betweenness centrality (BC) values in the global network. We performed gene expression analysis to detect candidate genes in M. longissimus with divergent marbling phenotypes (marbling scores 2 to 7) using qRT-PCR. The results demonstrate that transmembrane protein 60 (TMEM60) and dihydropyrimidine dehydrogenase (DPYD) are associated with increasing marbling fat. We suggest that the network-based approach in livestock may be an important method for analyzing the complex effects of candidate genes associated with complex traits like marbling or tenderness.

  3. Modeling Gene Networks in Saccharomyces cerevisiae Based on Gene Expression Profiles

    Directory of Open Access Journals (Sweden)

    Yulin Zhang

    2015-01-01

    Full Text Available Detailed and innovative analysis of gene regulatory network structures may reveal novel insights into biological mechanisms. Here we study how the gene regulatory network in Saccharomyces cerevisiae can differ under aerobic and anaerobic conditions. To achieve this, we discretized the gene expression profiles and calculated the self-entropy of down- and upregulation of gene expression as well as the joint entropy. Based on these quantities, the uncertainty coefficient was calculated for each gene triplet, following which separate gene logic networks were constructed for the aerobic and anaerobic conditions. Four structural parameters (average degree, average clustering coefficient, average shortest path, and average betweenness) were used to compare the structure of the corresponding aerobic and anaerobic logic networks. Five genes were identified as putative key components of the two energy metabolisms. Furthermore, community analysis using the Newman fast algorithm revealed two significant communities for the aerobic but only one for the anaerobic network. DAVID Gene Functional Classification suggests that, under aerobic conditions, one such community reflects the cell cycle and cell replication, while the other one is linked to the mitochondrial respiratory chain function.
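
    The entropy quantities mentioned above can be made concrete with a small sketch. It computes the uncertainty coefficient for a pair of discretized profiles, U(X|Y) = (H(X) + H(Y) - H(X,Y)) / H(X); the study works with gene triplets, so the pairwise form here is a simplification for illustration. The binary profiles are hypothetical.

```python
import math
from collections import Counter


def entropy(states):
    """Shannon entropy (in bits) of a sequence of discrete states."""
    n = len(states)
    return -sum((count / n) * math.log2(count / n)
                for count in Counter(states).values())


def uncertainty_coefficient(x, y):
    """U(X|Y): fraction of the entropy of X that is explained by Y."""
    h_x = entropy(x)
    if h_x == 0:
        return 0.0  # X is constant, so there is nothing to explain
    h_xy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    return (h_x + entropy(y) - h_xy) / h_x


# Hypothetical discretized profiles: 1 = upregulated, 0 = downregulated.
gene_x = [1, 0, 1, 0, 1, 0, 1, 0]
gene_y = [1, 0, 1, 0, 1, 0, 1, 0]   # identical to gene_x
gene_z = [1, 1, 0, 0, 1, 1, 0, 0]   # statistically independent of gene_x

u_xy = uncertainty_coefficient(gene_x, gene_y)
u_xz = uncertainty_coefficient(gene_x, gene_z)
print(u_xy, u_xz)
```

    A coefficient near 1 means one profile almost fully determines the other, and a coefficient near 0 means they are unrelated; a logic network would keep only the relations whose coefficient exceeds some chosen cutoff.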

  4. Discovery of binding proteins for a protein target using protein-protein docking-based virtual screening.

    Science.gov (United States)

    Zhang, Changsheng; Tang, Bo; Wang, Qian; Lai, Luhua

    2014-10-01

    Target structure-based virtual screening, which employs protein-small molecule docking to identify potential ligands, has been widely used in small-molecule drug discovery. In the present study, we used a protein-protein docking program to identify proteins that bind to a specific target protein. In the testing phase, an all-to-all protein-protein docking run on a large dataset was performed. The three-dimensional rigid docking program SDOCK was used to examine protein-protein docking on all protein pairs in the dataset. Both the binding affinity and features of the binding energy landscape were considered in the scoring function in order to distinguish positive binding pairs from negative binding pairs. Thus, the lowest docking score, the average Z-score, and convergency of the low-score solutions were incorporated in the analysis. The hybrid scoring function was optimized in the all-to-all docking test. The docking method and the hybrid scoring function were then used to screen for proteins that bind to tumor necrosis factor-α (TNFα), which is a well-known therapeutic target for rheumatoid arthritis and other autoimmune diseases. A protein library containing 677 proteins was used for the screen. Proteins with scores among the top 20% were further examined. Sixteen proteins from the top-ranking 67 proteins were selected for experimental study. Two of these proteins showed significant binding to TNFα in an in vitro binding study. The results of the present study demonstrate the power and potential application of protein-protein docking for the discovery of novel binding proteins for specific protein targets.
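
    The screening logic (score each candidate with a hybrid function, then keep the top-scoring fraction) can be sketched as below. The abstract names the three ingredients of the hybrid score but not how they are combined, so the linear form, the unit weights, and the sign conventions (lower docking score and Z-score are better, higher convergence is better) are assumptions of this sketch. The protein names and values are hypothetical.

```python
import math


def hybrid_score(lowest_score, mean_z, convergence, weights=(1.0, 1.0, 1.0)):
    """Assumed linear combination; lower values indicate better candidates."""
    w_low, w_z, w_conv = weights
    return w_low * lowest_score + w_z * mean_z - w_conv * convergence


def top_fraction(scores, fraction=0.2):
    """Names of the best-scoring (lowest) fraction of candidates."""
    n_keep = max(1, math.ceil(fraction * len(scores)))
    return sorted(scores, key=scores.get)[:n_keep]


# Hypothetical docking summaries: (lowest score, mean Z-score, convergence).
candidates = {
    "P1": (-30.0, -2.0, 0.8),
    "P2": (-20.0, -1.0, 0.5),
    "P3": (-25.0, -1.5, 0.2),
    "P4": (-10.0, -0.5, 0.1),
    "P5": (-15.0, -0.2, 0.3),
}
scores = {name: hybrid_score(*values) for name, values in candidates.items()}
selected = top_fraction(scores, fraction=0.2)
print(selected)
```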

  5. The gene expression data of Mycobacterium tuberculosis based on Affymetrix gene chips provide insight into regulatory and hypothetical genes

    Directory of Open Access Journals (Sweden)

    Fu-Liu Casey S

    2007-05-01

    Full Text Available Abstract Background Tuberculosis remains a leading infectious disease and a global public health threat. Its control and management have been complicated by multi-drug resistance and latent infection, which prompts scientists to find new and more effective drugs. With the completion of the genome sequence of the etiologic bacterium, Mycobacterium tuberculosis, it is now feasible to search for new drug targets by sieving through a large number of gene products and conducting genome-scale experiments based on microarray technology. However, the full potential of genome-wide microarray analysis in configuring interrelationships among all genes in M. tuberculosis has yet to be realized. To date, it is only possible to assign a function to 52% of proteins predicted in the genome. Results We conducted a functional-genomics study using the high-resolution Affymetrix oligonucleotide GeneChip. Approximately one-half of the genes in the genome of M. tuberculosis, including more than 100 predicted conserved hypotheticals, were found to be always expressed during the log phase of in vitro growth. The gene expression profiles were analyzed and visualized through cluster analysis to epitomize the full details of genomic behavior. Broad patterns derived from genome-wide expression experiments in this study have provided insight into the interrelationships among genes in the basic cellular processes of M. tuberculosis. Conclusion Our results have confirmed several known gene clusters in energy production, information pathways, and lipid metabolism, and also hinted at potential roles of hypothetical and regulatory proteins.

  6. Patients, evidence and genes: an exploration of GPs' perspectives on gene-based personalized nutrition advice

    NARCIS (Netherlands)

    Bouwman, L.I.; Molder, te H.F.M.; Hiddink, G.J.

    2008-01-01

    Background. Nutrigenomics science examines the response of individuals to food compounds using post-genomics technology. It is expected that in the future, personalized nutrition advice can be provided based on information about genetic make-up. Objectives. Gene-based personalized nutrition advice e

  7. Knowledge acquisition and discovery for the textual case-based cooking system WIKITAAABLE

    OpenAIRE

    Badra, Fadi; Cojan, Julien; Cordier, Amélie; Lieber, Jean; Meilender, Thomas; Mille, Alain; Molli, Pascal; Nauer, Emmanuel; Napoli, Amedeo; Skaf-Molli, Hala; Toussaint, Yannick

    2009-01-01

    The textual case-based cooking system WIKITAAABLE participates in the second Computer Cooking Contest (CCC). It is an extension of the TAAABLE system, which participated in the first CCC. WIKITAAABLE's architecture is composed of a semantic wiki used for the collaborative acquisition of knowledge (recipe, ontology, adaptation knowledge) and of a case-based inference engine using this knowledge for retrieving and adapting recipes. This architecture allows various mod...

  8. Opposition-Based Discrete PSO Using Natural Encoding for Classification Rule Discovery

    OpenAIRE

    Naveed Kazim Khan; Abdul Rauf Baig; Muhammad Amjad Iqbal

    2012-01-01

    In this paper we present a new Discrete Particle Swarm Optimization approach to induce rules from discrete data. The proposed algorithm, called Opposition-based Natural Discrete PSO (ONDPSO), initializes its population by taking into account the discrete nature of the data. Particles are encoded using a Natural Encoding scheme. Each member of the population updates its position iteratively on the basis of a newly designed position update rule. Opposition-based learning is implemented in the ...

  9. GOBO: gene expression-based outcome for breast cancer online.

    Directory of Open Access Journals (Sweden)

    Markus Ringnér

    Full Text Available Microarray-based gene expression analysis holds promise of improving prognostication and treatment decisions for breast cancer patients. However, the heterogeneity of breast cancer emphasizes the need for validation of prognostic gene signatures in larger sample sets stratified into relevant subgroups. Here, we describe a multifunctional user-friendly online tool, GOBO (http://co.bmc.lu.se/gobo), allowing a range of different analyses to be performed in an 1881-sample breast tumor data set, and a 51-sample breast cancer cell line set, both generated on Affymetrix U133A microarrays. GOBO supports a wide range of applications including: 1) rapid assessment of gene expression levels in subgroups of breast tumors and cell lines, 2) identification of co-expressed genes for creation of potential metagenes, 3) association with outcome for gene expression levels of single genes, sets of genes, or gene signatures in multiple subgroups of the 1881-sample breast cancer data set. The design and implementation of GOBO facilitate easy incorporation of additional query functions and applications, as well as additional data sets irrespective of tumor type and array platform.

  10. Cancer classification based on gene expression using neural networks.

    Science.gov (United States)

    Hu, H P; Niu, Z J; Bai, Y P; Tan, X H

    2015-12-21

    Based on gene expression, we have classified 53 colon cancer patients with UICC II into two groups: relapse and no relapse. Samples were taken from each patient, and gene information was extracted. Of the 53 samples examined, 500 genes were considered proper through analyses by S-Kohonen, BP, and SVM neural networks. Classification accuracy obtained by S-Kohonen neural network reaches 91%, which was more accurate than classification by BP and SVM neural networks. The results show that S-Kohonen neural network is more plausible for classification and has a certain feasibility and validity as compared with BP and SVM neural networks.

  11. Representation Discovery using Harmonic Analysis

    CERN Document Server

    Mahadevan, Sridhar

    2008-01-01

    Representations are at the heart of artificial intelligence (AI). This book is devoted to the problem of representation discovery: how can an intelligent system construct representations from its experience? Representation discovery re-parameterizes the state space - prior to the application of information retrieval, machine learning, or optimization techniques - facilitating later inference processes by constructing new task-specific bases adapted to the state space geometry. This book presents a general approach to representation discovery using the framework of harmonic analysis, in particu

  12. Improving low-level plasma protein mass spectrometry-based detection for candidate biomarker discovery and validation

    Energy Technology Data Exchange (ETDEWEB)

    Page, Jason S.; Kelly, Ryan T.; Camp, David G.; Smith, Richard D.

    2008-09-01

    Methods. To improve the detection of low abundance protein candidate biomarker discovery and validation, particularly in complex biological fluids such as blood plasma, increased sensitivity is desired using mass spectrometry (MS)-based instrumentation. A key current limitation on the sensitivity of electrospray ionization (ESI) MS is due to the fact that many sample molecules in solution are never ionized, and the vast majority of the ions that are created are lost during transmission from atmospheric pressure to the low pressure region of the mass analyzer. Two key technologies, multi-nanoelectrospray emitters and the electrodynamic ion funnel have recently been developed and refined at Pacific Northwest National Laboratory (PNNL) to greatly improve the ionization and transmission efficiency of ESI MS based analyses. Multi-emitter based ESI enables the flow from a single source (typically a liquid chromatography [LC] column) to be divided among an array of emitters (Figure 1). The flow rate delivered to each emitter is thus reduced, allowing the well-documented benefits of nanoelectrospray 1 for both sensitivity and quantitation to be realized for higher flow rate separations. To complement the increased ionization efficiency afforded by multi-ESI, tandem electrodynamic ion funnels have also been developed at PNNL, and shown to greatly improve ion transmission efficiency in the ion source interface.2, 3 These technologies have been integrated into a triple quadrupole mass spectrometer for multiple reaction monitoring (MRM) of probable biomarker candidates in blood plasma and show promise for the identification of new species even at low level concentrations.

  13. Analysis of expressed sequence tags from Actinidia: applications of a cross species EST database for gene discovery in the areas of flavor, health, color and ripening

    Directory of Open Access Journals (Sweden)

    Richardson Annette C

    2008-07-01

    Full Text Available Abstract Background Kiwifruit (Actinidia spp.) are a relatively new, but economically important crop grown in many different parts of the world. Commercial success is driven by the development of new cultivars with novel consumer traits including flavor, appearance, healthful components and convenience. To increase our understanding of the genetic diversity and gene-based control of these key traits in Actinidia, we have produced a collection of 132,577 expressed sequence tags (ESTs). Results The ESTs were derived mainly from four Actinidia species (A. chinensis, A. deliciosa, A. arguta and A. eriantha) and fell into 41,858 non redundant clusters (18,070 tentative consensus sequences and 23,788 EST singletons). Analysis of flavor and fragrance-related gene families (acyltransferases and carboxylesterases) and pathways (terpenoid biosynthesis) is presented in comparison with a chemical analysis of the compounds present in Actinidia including esters, acids, alcohols and terpenes. ESTs are identified for most genes in color pathways controlling chlorophyll degradation and carotenoid biosynthesis. In the health area, data are presented on the ESTs involved in ascorbic acid and quinic acid biosynthesis showing not only that genes for many of the steps in these pathways are represented in the database, but that genes encoding some critical steps are absent. In the convenience area, genes related to different stages of fruit softening are identified. Conclusion This large EST resource will allow researchers to undertake the tremendous challenge of understanding the molecular basis of genetic diversity in the Actinidia genus as well as provide an EST resource for comparative fruit genomics. The various bioinformatics analyses we have undertaken demonstrate the extent of coverage of ESTs for genes encoding different biochemical pathways in Actinidia.

  14. An Efficient Grid Service Discovery Mechanism Based on the Locality Principle

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    With the explosion of services in grid environment, it's necessary to develop a mechanism which has the ability of discovering suitable grid services efficiently. This paper attempts to establish a layered resource management model based on the locality principle which classifies services into different domains and virtual organizations (VOs) according to their shared purposes. We propose an ontology-based search method applying the ontology theory for characterizing semantic information. In addition, we extend the UDDI in querying, storing, and so on. Simulation experiments have shown that our mechanism achieves higher performance in precision, recall and query response time.

  15. Structure-based drug design to the discovery of new 2-aminothiazole CDK2 inhibitors.

    Science.gov (United States)

    Vulpetti, Anna; Casale, Elena; Roletto, Fulvia; Amici, Raffaella; Villa, Manuela; Pevarello, Paolo

    2006-03-01

    N-(5-Bromo-1,3-thiazol-2-yl)butanamide (compound 1) was found active (IC50=808 nM) in a high throughput screening (HTS) for CDK2 inhibitors. By exploiting crystal structures of several complexes between CDK2 and inhibitors and applying structure-based drug design (SBDD), we rapidly discovered a very potent and selective CDK2 inhibitor 4-[(5-isopropyl-1,3-thiazol-2-yl)amino] benzenesulfonamide (compound 4, IC50=20 nM). The syntheses, structure-based analog design, kinases inhibition data and X-ray crystallographic structures of CDK2/inhibitor complexes are reported.

  16. Tree-Based Methods for Discovery of Association between Flow Cytometry Data and Clinical Endpoints.

    Science.gov (United States)

    Eliot, M; Azzoni, L; Firnhaber, C; Stevens, W; Glencross, D K; Sanne, I; Montaner, L J; Foulkes, A S

    2009-01-01

    We demonstrate the application and comparative interpretations of three tree-based algorithms for the analysis of data arising from flow cytometry: classification and regression trees (CARTs), random forests (RFs), and logic regression (LR). Specifically, we consider the question of what best predicts CD4 T-cell recovery in HIV-1 infected persons starting antiretroviral therapy with CD4 count between 200 and 350 cells/μL. A comparison to a more standard contingency table analysis is provided. While contingency table analysis and RFs provide information on the importance of each potential predictor variable, CART and LR offer additional insight into the combinations of variables that together are predictive of the outcome. In all cases considered, baseline CD3-DR-CD56+CD16+ emerges as an important predictor variable, while the tree-based approaches identify additional variables as potentially informative. Application of tree-based methods to our data suggests that a combination of baseline immune activation states, with emphasis on CD8 T-cell activation, may be a better predictor than any single T-cell/innate cell subset analyzed. Taken together, we show that tree-based methods can be successfully applied to flow cytometry data to better inform and discover associations that may not emerge in the context of a univariate analysis.
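
    The abstract above does not give implementation details, so the following is only a minimal sketch of the split-selection step that CART-style trees are commonly built on: choosing the threshold on one predictor that minimizes weighted Gini impurity. The function names, the single-predictor restriction, and the toy values in the usage note are assumptions for illustration, not the authors' code or data.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the best binary split.

    Candidate thresholds are midpoints between consecutive distinct values.
    If no split improves on the unsplit node, the threshold is None.
    """
    n = len(values)
    pairs = list(zip(values, labels))
    best = (None, gini(labels))
    distinct = sorted(set(values))
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data: one baseline marker and a binary outcome.
marker = [1, 2, 3, 10, 11, 12]
outcome = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(marker, outcome)
```

    With these toy values the two classes separate cleanly, so the selected split leaves both child nodes pure. A full tree would apply the same search recursively over all predictors.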

  17. Tree-Based Methods for Discovery of Association between Flow Cytometry Data and Clinical Endpoints

    Directory of Open Access Journals (Sweden)

    M. Eliot

    2009-01-01

    Full Text Available We demonstrate the application and comparative interpretations of three tree-based algorithms for the analysis of data arising from flow cytometry: classification and regression trees (CARTs), random forests (RFs), and logic regression (LR). Specifically, we consider the question of what best predicts CD4 T-cell recovery in HIV-1 infected persons starting antiretroviral therapy with CD4 count between 200 and 350 cells/μL. A comparison to a more standard contingency table analysis is provided. While contingency table analysis and RFs provide information on the importance of each potential predictor variable, CART and LR offer additional insight into the combinations of variables that together are predictive of the outcome. In all cases considered, baseline CD3-DR-CD56+CD16+ emerges as an important predictor variable, while the tree-based approaches identify additional variables as potentially informative. Application of tree-based methods to our data suggests that a combination of baseline immune activation states, with emphasis on CD8 T-cell activation, may be a better predictor than any single T-cell/innate cell subset analyzed. Taken together, we show that tree-based methods can be successfully applied to flow cytometry data to better inform and discover associations that may not emerge in the context of a univariate analysis.

  18. Sucrose ester based cationic liposomes as effective non-viral gene vectors for gene delivery.

    Science.gov (United States)

    Zhao, Yinan; Zhu, Jie; Zhou, Hengjun; Guo, Xin; Tian, Tian; Cui, Shaohui; Zhen, Yuhong; Zhang, Shubiao; Xu, Yuhong

    2016-09-01

    As sucrose esters (SEs) are natural and biodegradable excipients with excellent drug dissolution and drug absorption/permeation in controlled release systems, we incorporated SE into liposomes for gene delivery for the first time in this article. A peptide-based lipid (CDO14), a Gemini-based quaternary ammonium lipid (CTA14), and a mono-head quaternary ammonium lipid (CPA14), with SE as helper lipid, were prepared into liposomes which could enhance the interactions between liposomes and pDNA. Most importantly, the liposomes with helper lipid SE showed higher transfection and lower cytotoxicity than those without SE in HeLa and A549 cells. It was also found that the transfection efficiency increased with the increase of SE content. The selected liposome, CDO14/SE, was able to deliver siRNA against luciferase for silencing gene in lung tumors of mice, with little in vivo toxicity. The results convincingly demonstrated SEs could be highly desirable candidates for gene delivery systems. PMID:27232309

  19. PCR-based detection of gene transfer vectors: application to gene doping surveillance.

    Science.gov (United States)

    Perez, Irene C; Le Guiner, Caroline; Ni, Weiyi; Lyles, Jennifer; Moullier, Philippe; Snyder, Richard O

    2013-12-01

    Athletes who illicitly use drugs to enhance their athletic performance are at risk of being banned from sports competitions. Consequently, some athletes may seek new doping methods that they expect to be capable of circumventing detection. With advances in gene transfer vector design and therapeutic gene transfer, and demonstrations of safety and therapeutic benefit in humans, there is an increased probability of the pursuit of gene doping by athletes. In anticipation of the potential for gene doping, assays have been established to directly detect complementary DNA of genes that are top candidates for use in doping, as well as vector control elements. The development of molecular assays that are capable of exposing gene doping in sports can serve as a deterrent and may also identify athletes who have illicitly used gene transfer for performance enhancement. PCR-based methods to detect foreign DNA with high reliability, sensitivity, and specificity include TaqMan real-time PCR, nested PCR, and internal threshold control PCR. PMID:23912835

  20. The Development of Learning Devices Based Guided Discovery Model to Improve Understanding Concept and Critical Thinking Mathematically Ability of Students at Islamic Junior High School of Medan

    Science.gov (United States)

    Yuliani, Kiki; Saragih, Sahat

    2015-01-01

    The purpose of this research was to: 1) develop learning devices based on the guided discovery model to improve the conceptual understanding and mathematical critical thinking ability of students at an Islamic junior high school; 2) describe the improvement in conceptual understanding and mathematical critical thinking ability of students at MTs by using…

  1. A Discovery-Based Hydrochlorination of Carvone Utilizing a Guided-Inquiry Approach to Determine the Product Structure from ¹³C NMR Spectra

    Science.gov (United States)

    Pelter, Michael W.; Walker, Natalie M.

    2012-01-01

    This experiment describes a discovery-based method for the regio- and stereoselective hydrochlorination of carvone, appropriate for a 3-h second-semester organic chemistry laboratory. The product is identified through interpretation of the ¹³C NMR and DEPT spectra, which are obtained on an Anasazi EFT-60 at 15 MHz as neat samples. A…

  2. Opposition-Based Discrete PSO Using Natural Encoding for Classification Rule Discovery

    Directory of Open Access Journals (Sweden)

    Naveed Kazim Khan

    2012-11-01

    Full Text Available In this paper we present a new Discrete Particle Swarm Optimization approach to induce rules from discrete data. The proposed algorithm, called Opposition-based Natural Discrete PSO (ONDPSO), initializes its population by taking into account the discrete nature of the data. Particles are encoded using a Natural Encoding scheme. Each member of the population updates its position iteratively on the basis of a newly designed position update rule. Opposition-based learning is implemented in the optimization process. The encoding scheme and position update rule used by the algorithm allow individual terms corresponding to different attributes within the rule's antecedent to be a disjunction of the values of those attributes. The performance of the proposed algorithm is evaluated against seven different datasets using a tenfold testing scheme. The achieved median accuracy is compared against various evolutionary and non-evolutionary classification techniques. The algorithm produces promising results by creating highly accurate and precise rules for each dataset.
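
    The abstract does not spell out how opposition-based learning is applied, so the sketch below only illustrates the general idea under stated assumptions: each attribute value is coded as an integer index 0..k-1, the "opposite" of an index is its mirror image k-1-x, and the initial swarm keeps the fittest of the random particles and their opposites. The function names and the toy fitness function are hypothetical and are not taken from ONDPSO.

```python
import random


def opposite(position, cardinalities):
    """Mirror each integer-coded attribute value within its range 0..k-1."""
    return [k - 1 - x for x, k in zip(position, cardinalities)]


def init_population(size, cardinalities, fitness, rng):
    """Opposition-based initialization (illustrative sketch).

    Draw `size` random particles, add the opposite of each one, and keep
    the `size` candidates with the highest fitness.
    """
    candidates = []
    for _ in range(size):
        particle = [rng.randrange(k) for k in cardinalities]
        candidates.append(particle)
        candidates.append(opposite(particle, cardinalities))
    candidates.sort(key=fitness, reverse=True)
    return candidates[:size]


# Hypothetical setup: three attributes with 3, 3 and 4 possible values,
# and a stand-in fitness (sum of indices) in place of rule quality.
cardinalities = [3, 3, 4]
swarm = init_population(5, cardinalities, fitness=sum, rng=random.Random(0))
```

    In a rule-discovery setting the fitness would be a rule-quality measure computed on training data. The sum is used here only so the sketch runs on its own.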

  3. Efficient Independence-Based MAP Approach for Robust Markov Networks Structure Discovery

    CERN Document Server

    Bromberg, Facundo

    2011-01-01

    This work introduces the IB-score, a family of independence-based score functions for robust learning of Markov networks independence structures. Markov networks are a widely used graphical representation of probability distributions, with many applications in several fields of science. The main advantage of the IB-score is the possibility of computing it without the need of estimation of the numerical parameters, an NP-hard problem, usually solved through an approximate, data-intensive, iterative optimization. We derive a formal expression for the IB-score from first principles, mainly maximum a posteriori and conditional independence properties, and exemplify several instantiations of it, resulting in two novel algorithms for structure learning: IBMAP-HC and IBMAP-TS. Experimental results over both artificial and real world data show these algorithms achieve important error reductions in the learnt structures when compared with the state-of-the-art independence-based structure learning algorithm GSMN, achie...

  4. Collaborative case-based reasoning for knowledge discovery of elders health assessment system.

    Science.gov (United States)

    Hu, Ping; Gu, Dong-Xiao; Zhu, Yu

    2014-01-01

    The existing Elders Health Assessment (EHA) system, based on single-case-library reasoning, has a low intelligence level, poor coordination, and limited capabilities for assessment decision support. To effectively support knowledge reuse in the EHA system, this paper proposes collaborative case reasoning and applies it to the whole knowledge reuse process of the EHA system. It proposes a multi-case-library reasoning application framework for the EHA knowledge reuse system, and studies key techniques such as case representation, case retrieval, case optimization and correction, and reuse. For case representation, XML-based multi-case representation for case organization and storage is applied to facilitate case retrieval and management. For retrieval, a Knowledge-Guided Approach with Nearest-Neighbor is proposed. Given the complexity of EHA, Gray Relational Analysis with weighted Euclidean Distance is used to measure similarity so as to improve case retrieval accuracy.
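
    As a rough illustration of the retrieval step mentioned above, the sketch below ranks stored cases by weighted Euclidean distance to a query and returns the nearest ones. It deliberately omits the gray relational analysis and the knowledge-guided filtering, which the abstract does not describe in detail. The case structure, feature values, and weights are hypothetical.

```python
import math


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def retrieve(query, case_library, weights, k=1):
    """Return the k stored cases closest to the query (nearest neighbor)."""
    ranked = sorted(
        case_library,
        key=lambda case: weighted_distance(query, case["features"], weights),
    )
    return ranked[:k]


# Hypothetical case library with normalized features in [0, 1].
library = [
    {"id": "A", "features": [0.2, 0.8, 0.5]},
    {"id": "B", "features": [0.9, 0.1, 0.4]},
    {"id": "C", "features": [0.3, 0.7, 0.9]},
]
weights = [0.5, 0.3, 0.2]  # assumed feature importances
nearest = retrieve([0.25, 0.75, 0.6], library, weights, k=2)
nearest_ids = [case["id"] for case in nearest]
```

    Here case A is closest to the query and case C second, because case C differs mainly on the feature with the smallest weight.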

  5. Discovery and characterization of the first non-coding RNA that regulates gene expression, micF RNA: A historical perspective

    Institute of Scientific and Technical Information of China (English)

    Delihas, Nicholas

    2015-01-01

    The first evidence that RNA can function as a regulator of gene expression came from experiments with prokaryotes in the 1980s. It was shown that Escherichia coli micF is an independent gene, has its own promoter, and encodes a small non-coding RNA that base pairs with and inhibits translation of a target messenger RNA in response to environmental stress conditions. The micF RNA was isolated, sequenced and shown to be a primary transcript. In vitro experiments showed binding to the target ompF mRNA. Secondary structure probing revealed an imperfect micF RNA/ompF RNA duplex interaction and the presence of a non-canonical base pair. Several transcription factors, including OmpR, regulate micF transcription in response to environmental factors. micF has also been found in other bacterial species; however, recently Gerhart Wagner and Jörg Vogel showed pleiotropic effects and found micF inhibits expression of multiple target mRNAs; importantly, one is the global regulatory gene lrp. In addition, micF RNA was found to interact with its targets in different ways; it either inhibits ribosome binding or induces degradation of the message. Thus the concept and initial experimental evidence that RNA can regulate gene expression was born with prokaryotes.

  6. Pharmacophore-based discovery of FXR agonists. Part I: Model development and experimental validation

    OpenAIRE

    Schuster, Daniela; Markt, Patrick; Grienke, Ulrike; Mihaly-Bison, Judit; Binder, Markus; Noha, Stefan M.; Rollinger, Judith M.; Stuppner, Hermann; Bochkov, Valery N.; Wolber, Gerhard

    2011-01-01

    The farnesoid X receptor (FXR) is involved in glucose and lipid metabolism regulation, which makes it an attractive target for the metabolic syndrome, dyslipidemia, atherosclerosis, and type 2 diabetes. In order to find novel FXR agonists, a structure-based pharmacophore model collection was developed and theoretically evaluated against virtual databases including the ChEMBL database. The most suitable models were used to screen the National Cancer Institute (NCI) database. Biological evaluat...

  7. The discovery of novel tartrate-based TNF-α converting enzyme (TACE) inhibitors

    Energy Technology Data Exchange (ETDEWEB)

    Rosner, Kristin E.; Guo, Zhuyan; Orth, Peter; Shipps, Jr., Gerald W.; Belanger, David B.; Chan, Tin Yau; Curran, Patrick J.; Dai, Chaoyang; Deng, Yongqi; Girijavallabhan, Vinay M.; Hong, Liwu; Lavey, Brian J.; Lee, Joe F.; Li, Dansu; Liu, Zhidan; Popovici-Muller, Janeta; Ting, Pauline C.; Vaccaro, Henry; Wang, Li; Wang, Tong; Yu, W.; Zhou, G.; Niu, X.; Sun, J.; Kozlowski, J.A.; Lundell, D.J.; Madison, V.; McKittrick, B.; Piwinski, J.J.; Shih, N.Y.; Siddiqui, M. Arshad; Strickland, Corey O. (SPRI)

    2010-09-17

    A novel series of non-hydroxamate TNF-α convertase (TACE) inhibitors has been discovered. These compounds are bis-amides of L-tartaric acid (tartrate) and coordinate to the active site zinc in a tridentate manner. They are selective for TACE over other MMPs. We report the first X-ray crystal structure for a tartrate-based TACE inhibitor.

  8. From raw data to biological discoveries: a computational analysis pipeline for mass spectrometry-based proteomics.

    Science.gov (United States)

    Lavallée-Adam, Mathieu; Park, Sung Kyu Robin; Martínez-Bartolomé, Salvador; He, Lin; Yates, John R

    2015-11-01

    In the last two decades, computational tools for mass spectrometry-based proteomics data analysis have evolved from a few stand-alone software solutions serving specific goals, such as the identification of amino acid sequences based on mass spectrometry spectra, to large-scale complex pipelines integrating multiple computer programs to solve a collection of problems. This software evolution has been mostly driven by the appearance of novel technologies that allowed the community to tackle complex biological problems, such as the identification of proteins that are differentially expressed in two samples under different conditions. The achievement of such objectives requires a large suite of programs to analyze the intricate mass spectrometry data. Our laboratory addresses complex proteomics questions by producing and using algorithms and software packages. Our current computational pipeline includes, among other things, tools for mass spectrometry raw data processing, peptide and protein identification and quantification, post-translational modification analysis, and protein functional enrichment analysis. In this paper, we describe a suite of software packages we have developed to process mass spectrometry-based proteomics data and we highlight some of the new features of previously published programs as well as tools currently under development.

  9. From Raw Data to Biological Discoveries: A Computational Analysis Pipeline for Mass Spectrometry-Based Proteomics

    Science.gov (United States)

    Lavallée-Adam, Mathieu; Park, Sung Kyu Robin; Martínez-Bartolomé, Salvador; He, Lin; Yates, John R.

    2015-11-01

    In the last two decades, computational tools for mass spectrometry-based proteomics data analysis have evolved from a few stand-alone software solutions serving specific goals, such as the identification of amino acid sequences based on mass spectrometry spectra, to large-scale complex pipelines integrating multiple computer programs to solve a collection of problems. This software evolution has been mostly driven by the appearance of novel technologies that allowed the community to tackle complex biological problems, such as the identification of proteins that are differentially expressed in two samples under different conditions. The achievement of such objectives requires a large suite of programs to analyze the intricate mass spectrometry data. Our laboratory addresses complex proteomics questions by producing and using algorithms and software packages. Our current computational pipeline includes, among other things, tools for mass spectrometry raw data processing, peptide and protein identification and quantification, post-translational modification analysis, and protein functional enrichment analysis. In this paper, we describe a suite of software packages we have developed to process mass spectrometry-based proteomics data and we highlight some of the new features of previously published programs as well as tools currently under development.

  10. Discovery of potent nitrotriazole-based antitrypanosomal agents: In vitro and in vivo evaluation.

    Science.gov (United States)

    Papadopoulou, Maria V; Bloomer, William D; Rosenzweig, Howard S; O'Shea, Ivan P; Wilkinson, Shane R; Kaiser, Marcel; Chatelain, Eric; Ioset, Jean-Robert

    2015-10-01

    3-Nitro-1H-1,2,4-triazole- and 2-nitro-1H-imidazole-based amides with an aryloxy-phenyl core were synthesized and evaluated as antitrypanosomal agents. All 3-nitrotriazole-based derivatives were extremely potent anti-Trypanosoma cruzi agents at sub nM concentrations and exhibited a high degree of selectivity for the parasite. The 2-nitroimidazole analogs were only moderately active against T. cruzi amastigotes and exhibited low selectivity. Both types of compound were active against Leishmania donovani axenic amastigotes with excellent selectivity for the parasite, whereas three 2-nitroimidazole-based analogs were also moderately active against infected macrophages. However, no compound demonstrated selective activity against Trypanosoma brucei rhodesiense. The most potent in vitro anti-T. cruzi compounds were tested in an acute murine model and reduced the parasites to an undetectable level after five days of treatment at 13 mg/kg/day. Such compounds are potential inhibitors of T. cruzi CYP51 and, being excellent substrates for the type I nitroreductase (NTR) which is specific to trypanosomatids, work as prodrugs and constitute a new generation of effective and more affordable antitrypanosomal agents. PMID:26344593

  11. Fragment-hopping-based discovery of a novel chemical series of proto-oncogene PIM-1 kinase inhibitors.

    Directory of Open Access Journals (Sweden)

    Gustavo Saluste

    Full Text Available A new chemical series, triazolo[4,5-b]pyridines, has been identified as an inhibitor of PIM-1 by a chemotype hopping strategy based on a chemically feasible fragment database. In this case, structure-based virtual screening and in silico chemogenomics provide added value to the previously reported strategy of prioritizing among proposed novel scaffolds. Pairwise comparison between compound 3, recently discontinued from Phase I clinical trials, and molecule 8, bearing the selected novel scaffold, shows that the primary activities are similar (IC50 in the 20 to 150 nM range). At the same time, some ADME properties (for example, an increase of more than 45% in metabolic stability in human liver microsomes) and the off-target selectivity (for example, an increase of more than 2 log units in IC50 vs. FLT3) are improved, and the intellectual property (IP) position is enhanced. The discovery of a reliable starting point that fulfills critical criteria for a plausible medicinal chemistry project is demonstrated in this prospective study.

  12. Discovery and molecular mapping of a new gene conferring resistance to stem rust, Sr53, derived from Aegilops geniculata and characterization of spontaneous translocation stocks with reduced alien chromatin

    Science.gov (United States)

    This study reports the discovery and molecular mapping of a resistance gene effective against stem rust races RKQQC and TTKSK (Ug99) derived from Aegilops geniculata (2n=4x=28, UgUgMgMg). Two populations from the crosses TA5599 (T5DL-5MgL.5MgS)/TA3809 (ph1b mutant in Chinese Spring background) and T...

  13. Prediction and discovery of new geothermal resources in the Great Basin: Multiple evidence of a large undiscovered resource base

    Science.gov (United States)

    Coolbaugh, M.F.; Raines, G.L.; Zehner, R.E.; Shevenell, L.; Williams, C.F.

    2006-01-01

    Geothermal potential maps by themselves cannot directly be used to estimate undiscovered resources. To address the undiscovered resource base in the Great Basin, a new and relatively quantitative methodology is presented. The methodology involves three steps, the first being the construction of a data-driven probabilistic model of the location of known geothermal systems using weights of evidence. The second step is the construction of a degree-of-exploration model. This degree-of-exploration model uses expert judgment in a fuzzy logic context to estimate how well each spot in the state has been explored, using as constraints digital maps of the depth to the water table, presence of the carbonate aquifer, and the location, depth, and type of drill-holes. Finally, the exploration model and the data-driven occurrence model are combined together quantitatively using area-weighted modifications to the weights-of-evidence equations. Using this methodology in the state of Nevada, the number of undiscovered geothermal systems with reservoir temperatures ≥100 °C is estimated at 157, which is 3.2 times greater than the 69 known systems. Currently, nine of the 69 known systems are producing electricity. If it is conservatively assumed that an additional nine for a total of 18 of the known systems will eventually produce electricity, then the model predicts 59 known and undiscovered geothermal systems are capable of producing electricity under current economic conditions in the state, a figure that is more than six times higher than the current number. Many additional geothermal systems could potentially become economic under improved economic conditions or with improved methods of reservoir stimulation (Enhanced Geothermal Systems). This large predicted geothermal resource base appears corroborated by recent grass-roots geothermal discoveries in the state of Nevada. At least two and possibly three newly recognized geothermal systems with estimated reservoir temperatures
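
    For readers unfamiliar with the first step, the sketch below shows the standard weights-of-evidence calculation for a single binary evidence map, using the usual textbook definitions of the positive weight W+, the negative weight W-, and their contrast. It does not reproduce the area-weighted modifications or the degree-of-exploration model described in the abstract, and the cell counts in the usage lines are invented for illustration.

```python
import math


def weights_of_evidence(n_cells, n_sites, n_pattern, n_pattern_sites):
    """Weights of evidence for one binary evidence pattern.

    n_cells          total unit cells in the study area
    n_sites          cells containing a known occurrence
    n_pattern        cells where the evidence pattern is present
    n_pattern_sites  cells with both the pattern and an occurrence

    Assumes non-degenerate counts (no probability equal to 0 or 1).
    """
    p_b_given_d = n_pattern_sites / n_sites
    p_b_given_not_d = (n_pattern - n_pattern_sites) / (n_cells - n_sites)
    w_plus = math.log(p_b_given_d / p_b_given_not_d)
    w_minus = math.log((1 - p_b_given_d) / (1 - p_b_given_not_d))
    contrast = w_plus - w_minus
    return w_plus, w_minus, contrast


# Hypothetical counts: 1000 cells, 20 known systems, and an evidence
# pattern covering 200 cells that contains 15 of the known systems.
w_plus, w_minus, contrast = weights_of_evidence(1000, 20, 200, 15)
```

    A positive W+ together with a negative W- indicates that the pattern is spatially associated with known occurrences. The contrast summarizes the strength of that association.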

  14. Blog Community Discovery Based on PCM Clustering Algorithm

    Institute of Scientific and Technical Information of China (English)

    柳助民; 李绍滋; 林达真; 柯逍; 曹冬林

    2009-01-01

    Traditional community discovery algorithms cannot identify the core and boundary members of a community. To address this shortcoming, this paper proposes a Blog community discovery algorithm based on the soft clustering algorithm PCM, which identifies the core and boundary of a Blog community. First, a random walk method is used to compute a symmetric social distance that measures the closeness between two Blogs. Then, on the basis of this symmetric social distance, the PCM clustering algorithm is applied to the Blogs to obtain the probability that each member belongs to each community. Finally, the core and boundary of each community are determined by setting corresponding probability thresholds. Experiments show that the algorithm obtains the probability of each member belonging to a community, and that the core and boundary members of the community can be identified from this probability.
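
    The last two steps described above (soft membership from PCM, then thresholding into core and boundary) can be illustrated with a minimal sketch. It assumes that PCM refers to possibilistic c-means and uses that algorithm's standard typicality formula. The distances to community prototypes, the scale parameters `eta`, and both thresholds are hypothetical inputs chosen for illustration; the random-walk distance and the iterative prototype update are not reproduced here.

```python
import numpy as np

def pcm_typicality(dist, eta, m=2.0):
    """Possibilistic c-means typicality: u = 1 / (1 + (d^2 / eta)^(1/(m-1))).

    dist: (n_blogs, n_communities) distances to community prototypes (assumed given).
    eta:  (n_communities,) positive scale parameter per community (assumed given).
    """
    dist = np.asarray(dist, dtype=float)
    eta = np.asarray(eta, dtype=float)
    return 1.0 / (1.0 + (dist ** 2 / eta) ** (1.0 / (m - 1.0)))

def label_members(typicality, core_thr=0.7, boundary_thr=0.3):
    """Label each (blog, community) pair using two hypothetical thresholds."""
    u = np.asarray(typicality)
    return np.where(u >= core_thr, "core",
                    np.where(u >= boundary_thr, "boundary", "outside"))

# Toy example: three blogs, two communities, illustrative distances only.
dist = [[0.5, 3.0],
        [1.0, 1.0],
        [3.0, 0.5]]
u = pcm_typicality(dist, eta=[1.0, 1.0])
labels = label_members(u).tolist()
print(labels)
# prints: [['core', 'outside'], ['boundary', 'boundary'], ['outside', 'core']]
```

    Unlike fuzzy c-means memberships, these typicality values need not sum to one across communities, so a blog can sit on the boundary of several communities at once, as the second blog does here.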

  15. Abiotic Methane in Land-Based Serpentinized Peridotites: New Discoveries and Isotope Surprises

    Science.gov (United States)

    Whiticar, M. J.; Etiope, G.

    2014-12-01

    Until 2008, abiotic methane in land-based serpentinized ultramafic rocks was documented (including gas and C- and H-isotope compositions) only at sites in Oman, the Philippines, New Zealand and Turkey. Methane emanates from seeps and/or hyperalkaline water springs along faults and is associated with molecular hydrogen. These were considered to be very unusual and rare occurrences of gas. Now, methane is documented for peridotite-based springs or seeps (in ophiolites, orogenic massifs or intrusions) in the US, Canada, Costa Rica, Greece, Italy, Japan, New Caledonia, Portugal, Spain and the United Arab Emirates. Gas flux measurements indicate that methane can also escape from the ground as invisible microseepage through fractured peridotites, even far from seeps and springs. Methane C-isotope ratios range from -6 to -37 permil (VPDB) for dominantly abiotic methane. The more 13C-depleted values, e.g., in California, likely reflect mixing with biotic gas (microbial and thermogenic gas). Methane H-isotope ratios cover a wide range from -118 to -333 permil (VSMOW). The combination of C- and H-isotopes clearly distinguishes biotic from abiotic methane. Radiocarbon (14C) analysis of bubbling seeps in Italian peridotites indicates that the methane is fossil (pMC [...] temperatures of land-based peridotites (generally [...] methane production. We discuss some hypotheses concerning gas generation in water vs. dry systems (i.e., in deeper, older waters or in unsaturated rocks). We also discuss low vs. high temperatures, i.e., generation at present-day low-temperature conditions or at the higher temperatures that may have occurred in the early stages of peridotite emplacement on land.
