Zhang Michael Q
Full Text Available Abstract Background Although microarray-based studies have revealed global view of gene expression in cancer cells, we still have little knowledge about regulatory mechanisms underlying the transcriptome. Several computational methods applied to yeast data have recently succeeded in identifying expression modules, which is defined as co-expressed gene sets under common regulatory mechanisms. However, such module discovery methods are not applied cancer transcriptome data. Results In order to decode oncogenic regulatory programs in cancer cells, we developed a novel module discovery method termed EEM by extending a previously reported module discovery method, and applied it to breast cancer expression data. Starting from seed gene sets prepared based on cis-regulatory elements, ChIP-chip data, and gene locus information, EEM identified 10 principal expression modules in breast cancer based on their expression coherence. Moreover, EEM depicted their activity profiles, which predict regulatory programs in each subtypes of breast tumors. For example, our analysis revealed that the expression module regulated by the Polycomb repressive complex 2 (PRC2 is downregulated in triple negative breast cancers, suggesting similarity of transcriptional programs between stem cells and aggressive breast cancer cells. We also found that the activity of the PRC2 expression module is negatively correlated to the expression of EZH2, a component of PRC2 which belongs to the E2F expression module. E2F-driven EZH2 overexpression may be responsible for the repression of the PRC2 expression modules in triple negative tumors. Furthermore, our network analysis predicts regulatory circuits in breast cancer cells. Conclusion These results demonstrate that the gene set-based module discovery approach is a powerful tool to decode regulatory programs in cancer cells.
Full Text Available Secondary metabolites (SMs produced by Aspergillus have been extensively studied for their crucial roles in human health, medicine and industrial production. However, the resulting information is almost exclusively derived from a few model organisms, including A. nidulans and A. fumigatus, but little is known about rare pathogens. In this study, we performed a genomics based discovery of SM biosynthetic gene clusters in Aspergillus ustus, a rare human pathogen. A total of 52 gene clusters were identified in the draft genome of A. ustus 3.3904, such as the sterigmatocystin biosynthesis pathway that was commonly found in Aspergillus species. In addition, several SM biosynthetic gene clusters were firstly identified in Aspergillus that were possibly acquired by horizontal gene transfer, including the vrt cluster that is responsible for viridicatumtoxin production. Comparative genomics revealed that A. ustus shared the largest number of SM biosynthetic gene clusters with A. nidulans, but much fewer with other Aspergilli like A. niger and A. oryzae. These findings would help to understand the diversity and evolution of SM biosynthesis pathways in genus Aspergillus, and we hope they will also promote the development of fungal identification methodology in clinic.
Moriarity, Branden S; Largaespada, David A
Large-scale genomic efforts to study human cancer, such as the cancer gene atlas (TCGA), have identified numerous cancer drivers in a wide variety of tumor types. However, there are limitations to this approach, the mutations and expression or copy number changes that are identified are not always clearly functionally relevant, and only annotated genes and genetic elements are thoroughly queried. The use of complimentary, nonbiased, functional approaches to identify drivers of cancer development and progression is ideal to maximize the rate at which cancer discoveries are achieved. One such approach that has been successful is the use of the Sleeping Beauty (SB) transposon-based mutagenesis system in mice. This system uses a conditionally expressed transposase and mutagenic transposon allele to target mutagenesis to somatic cells of a given tissue in mice to cause random mutations leading to tumor development. Analysis of tumors for transposon common insertion sites (CIS) identifies candidate cancer genes specific to that tumor type. While similar screens have been performed in mice with the PiggyBac (PB) transposon and viral approaches, we limit extensive discussion to SB. Here we discuss the basic structure of these screens, screens that have been performed, methods used to identify CIS. PMID:26051241
Full Text Available Abstract Background It is one of the ultimate goals for modern biological research to fully elucidate the intricate interplays and the regulations of the molecular determinants that propel and characterize the progression of versatile life phenomena, to name a few, cell cycling, developmental biology, aging, and the progressive and recurrent pathogenesis of complex diseases. The vast amount of large-scale and genome-wide time-resolved data is becoming increasing available, which provides the golden opportunity to unravel the challenging reverse-engineering problem of time-delayed gene regulatory networks. Results In particular, this methodological paper aims to reconstruct regulatory networks from temporal gene expression data by using delayed correlations between genes, i.e., pairwise overlaps of expression levels shifted in time relative each other. We have thus developed a novel model-free computational toolbox termed TdGRN (Time-delayed Gene Regulatory Network to address the underlying regulations of genes that can span any unit(s of time intervals. This bioinformatics toolbox has provided a unified approach to uncovering time trends of gene regulations through decision analysis of the newly designed time-delayed gene expression matrix. We have applied the proposed method to yeast cell cycling and human HeLa cell cycling and have discovered most of the underlying time-delayed regulations that are supported by multiple lines of experimental evidence and that are remarkably consistent with the current knowledge on phase characteristics for the cell cyclings. Conclusion We established a usable and powerful model-free approach to dissecting high-order dynamic trends of gene-gene interactions. We have carefully validated the proposed algorithm by applying it to two publicly available cell cycling datasets. In addition to uncovering the time trends of gene regulations for cell cycling, this unified approach can also be used to study the complex
Mrowka, Ralf; Blüthgen, Nils; Fähling, Michael
Reliable prediction of specific transcription factor target genes is a major challenge in systems biology and functional genomics. Current sequence-based methods yield many false predictions, due to the short and degenerated DNA-binding motifs. Here, we describe a new systematic genome-wide approach, the seed-distribution-distance method, that searches large-scale genome-wide expression data for genes that are similarly expressed as known targets. This method is used to identify genes that are likely targets, allowing sequence-based methods to focus on a subset of genes, giving rise to fewer false-positive predictions. We show by cross-validation that this method is robust in recovering specific target genes. Furthermore, this method identifies genes with typical functions and binding motifs of the seed. The method is illustrated by predicting novel targets of the transcription factor nuclear factor kappaB (NF-kappaB). Among the new targets is optineurin, which plays a key role in the pathogenesis of acquired blindness caused by adult-onset primary open-angle glaucoma. We show experimentally that the optineurin gene and other predicted genes are targets of NF-kappaB. Thus, our data provide a missing link in the signalling of NF-kappaB and the damping function of optineurin in signalling feedback of NF-kappaB. We present a robust and reliable method to enhance the genome-wide prediction of specific transcription factor target genes that exploits the vast amount of expression information available in public databases today. PMID:18485006
Palsule, Vrushalee; Coric, Dijana; Delancy, Russell; Dunham, Heather; Melancon, Caleb; Thompson, Dennis; Toms, Jamie; White, Ashley; Shultz, Jeffry
A clear understanding of basic gene structure is critical when teaching molecular genetics, the central dogma and the biological sciences. We sought to create a gene-based teaching project to improve students' understanding of gene structure and to integrate this into a research project that can be implemented by instructors at the secondary level…
Full Text Available Abstract Background Suppression subtractive hybridization is a popular technique for gene discovery from non-model organisms without an annotated genome sequence, such as cowpea (Vigna unguiculata (L. Walp. We aimed to use this method to enrich for genes expressed during drought stress in a drought tolerant cowpea line. However, current methods were inefficient in screening libraries and management of the sequence data, and thus there was a need to develop software tools to facilitate the process. Results Forward and reverse cDNA libraries enriched for cowpea drought response genes were screened on microarrays, and the R software package SSHscreen 2.0.1 was developed (i to normalize the data effectively using spike-in control spot normalization, and (ii to select clones for sequencing based on the calculation of enrichment ratios with associated statistics. Enrichment ratio 3 values for each clone showed that 62% of the forward library and 34% of the reverse library clones were significantly differentially expressed by drought stress (adjusted p value 88% of the clones in both libraries were derived from rare transcripts in the original tester samples, thus supporting the notion that suppression subtractive hybridization enriches for rare transcripts. A set of 118 clones were chosen for sequencing, and drought-induced cowpea genes were identified, the most interesting encoding a late embryogenesis abundant Lea5 protein, a glutathione S-transferase, a thaumatin, a universal stress protein, and a wound induced protein. A lipid transfer protein and several components of photosynthesis were down-regulated by the drought stress. Reverse transcriptase quantitative PCR confirmed the enrichment ratio values for the selected cowpea genes. SSHdb, a web-accessible database, was developed to manage the clone sequences and combine the SSHscreen data with sequence annotations derived from BLAST and Blast2GO. The self-BLAST function within SSHdb grouped
Full Text Available Abstract Background Identification of essential proteins is always a challenging task since it requires experimental approaches that are time-consuming and laborious. With the advances in high throughput technologies, a large number of protein-protein interactions are available, which have produced unprecedented opportunities for detecting proteins' essentialities from the network level. There have been a series of computational approaches proposed for predicting essential proteins based on network topologies. However, the network topology-based centrality measures are very sensitive to the robustness of network. Therefore, a new robust essential protein discovery method would be of great value. Results In this paper, we propose a new centrality measure, named PeC, based on the integration of protein-protein interaction and gene expression data. The performance of PeC is validated based on the protein-protein interaction network of Saccharomyces cerevisiae. The experimental results show that the predicted precision of PeC clearly exceeds that of the other fifteen previously proposed centrality measures: Degree Centrality (DC, Betweenness Centrality (BC, Closeness Centrality (CC, Subgraph Centrality (SC, Eigenvector Centrality (EC, Information Centrality (IC, Bottle Neck (BN, Density of Maximum Neighborhood Component (DMNC, Local Average Connectivity-based method (LAC, Sum of ECC (SoECC, Range-Limited Centrality (RL, L-index (LI, Leader Rank (LR, Normalized α-Centrality (NC, and Moduland-Centrality (MC. Especially, the improvement of PeC over the classic centrality measures (BC, CC, SC, EC, and BN is more than 50% when predicting no more than 500 proteins. Conclusions We demonstrate that the integration of protein-protein interaction network and gene expression data can help improve the precision of predicting essential proteins. The new centrality measure, PeC, is an effective essential protein discovery method.
Yuan, Jinzhou; Raizen, David; Bau, Haim
Animal motility varies with genotype, disease progression, aging, and environmental conditions. In many studies, it is desirable to carry out high throughput motility-based sorting to isolate rare animals for, among other things, forward genetic screens to identify genetic pathways that regulate phenotypes of interest. Many commonly used screening processes are labor-intensive, lack sensitivity, and require extensive investigator training. Here, we describe a sensitive, high throughput, automated, motility-based method for sorting nematodes. Our method was implemented in a simple microfluidic device capable of sorting many thousands of animals per hour per module, and is amenable to parallelism. The device successfully enriched for known C. elegans motility mutants. Furthermore, using this device, we isolated low-abundance mutants capable of suppressing the somnogenic effects of the flp-13 gene, which regulates sleep-like quiescence in C. elegans. Subsequent genomic sequencing led to the identification of a flp-13-suppressor gene. This research was supported, in part, by NIH NIA Grant 5R03AG042690-02.
Celedon, J M; Bohlmann, J
Terpenoid fragrances are powerful mediators of ecological interactions in nature and have a long history of traditional and modern industrial applications. Plants produce a great diversity of fragrant terpenoid metabolites, which make them a superb source of biosynthetic genes and enzymes. Advances in fragrance gene discovery have enabled new approaches in synthetic biology of high-value speciality molecules toward applications in the fragrance and flavor, food and beverage, cosmetics, and other industries. Rapid developments in transcriptome and genome sequencing of nonmodel plant species have accelerated the discovery of fragrance biosynthetic pathways. In parallel, advances in metabolic engineering of microbial and plant systems have established platforms for synthetic biology applications of some of the thousands of plant genes that underlie fragrance diversity. While many fragrance molecules (eg, simple monoterpenes) are abundant in readily renewable plant materials, some highly valuable fragrant terpenoids (eg, santalols, ambroxides) are rare in nature and interesting targets for synthetic biology. As a representative example for genomics/transcriptomics enabled gene and enzyme discovery, we describe a strategy used successfully for elucidation of a complete fragrance biosynthetic pathway in sandalwood (Santalum album) and its reconstruction in yeast (Saccharomyces cerevisiae). We address questions related to the discovery of specific genes within large gene families and recovery of rare gene transcripts that are selectively expressed in recalcitrant tissues. To substantiate the validity of the approaches, we describe the combination of methods used in the gene and enzyme discovery of a cytochrome P450 in the fragrant heartwood of tropical sandalwood, responsible for the fragrance defining, final step in the biosynthesis of (Z)-santalols. PMID:27480682
Christopher J. Hayes
Full Text Available PCR is a common and often indispensable technique used in medical and biological research labs for a variety of applications. Real-time quantitative PCR (RT-qPCR has become a definitive technique for quantitating differences in gene expression levels between samples. Yet, in spite of this importance, reliable methods to quantitate nucleic acid amounts in a higher throughput remain elusive. In the following paper, a unique design to quantify gene expression levels at the nanoscale in a continuous flow system is presented. Fully automated, high-throughput, low volume amplification of deoxynucleotides (DNA in a droplet based microfluidic system is described. Unlike some conventional qPCR instrumentation that use integrated fluidic circuits or plate arrays, the instrument performs qPCR in a continuous, micro-droplet flowing process with droplet generation, distinctive reagent mixing, thermal cycling and optical detection platforms all combined on one complete instrument. Detailed experimental profiling of reactions of less than 300 nl total volume is achieved using the platform demonstrating the dynamic range to be 4 order logs and consistent instrument sensitivity. Furthermore, reduced pipetting steps by as much as 90% and a unique degree of hands-free automation makes the analytical possibilities for this instrumentation far reaching. In conclusion, a discussion of the first demonstrations of this approach to perform novel, continuous high-throughput biological screens is presented. The results generated from the instrument, when compared with commercial instrumentation, demonstrate the instrument reliability and robustness to carry out further studies of clinical significance with added throughput and economic benefits.
LI Xia; RAO Shaoqi; ZHANG Tianwen; GUO Zheng; ZHANG Qingpu; Kathy L. MOSER; Eric J. TOPOL
The advent of DNA microarray technology has offered the promise of casting new insights onto deciphering secrets of life by monitoring activities of thousands of genes simultaneously.Current analyses of microarray data focus on precise classification of biological types,for example,tumor versus normal tissues.A further scientific challenging task is to extract disease-relevant genes from the bewildering amounts of raw data,which is one of the most critical themes in the post-genomic era,but it is generally ignored due to lack of an efficient approach.In this paper,we present a novel ensemble method for gene extraction that can be tailored to fulfill multiple biological tasks including(i)precise classification of biological types;(ii)disease gene mining; and(iii)target-driven gene networking.We also give a numerical application for(i)and(ii)using a public microarrary data set and set aside a separate paper to address(iii).
The emergence of a huge volume of "omics" data enables a computational approach to the investigation of the biology of cancer.The cancer informatics approach is a useful supplement to the traditional experimental approach.I reviewed several reports that used a bioinformatics approach to analyze the associations among aging,stem cells,and cancer by microarray gene expression profiling.The high expression of aging-or human embryonic stem cell-related molecules in cancer suggests that certain important mechanisms are commonly underlying aging,stem cells,and cancer.These mechanisms are involved in cell cycle regulation,metabolic process,DNA damage response,apoptosis,p53 signaling pathway,immune/inflammatory response,and other processes,suggesting that cancer is a developmental and evolutional disease that is strongly related to aging.Moreover,these mechanisms demonstrate that the initiation,proliferation,and metastasis of cancer are associated with the deregulation of stem cells.These findings provide insights into the biology of cancer.Certainly,the findings that are obtained by the informatics approach should be justified by experimental validation.This review also noted that next-generation sequencing data provide enriched sources for cancer informatics study.
de Burgos Nelia
Full Text Available Abstract Background Triatoma infestans is the most relevant vector of Chagas disease in the southern cone of South America. Since its genome has not yet been studied, sequencing of Expressed Sequence Tags (ESTs is one of the most powerful tools for efficiently identifying large numbers of expressed genes in this insect vector. Results In this work, we generated 826 ESTs, resulting in an increase of 47% in the number of ESTs available for T. infestans. These ESTs were assembled in 471 unique sequences, 151 of which represent 136 new genes for the Reduviidae family. Conclusions Among the putative new genes for the Reduviidae family, we identified and described an interesting subset of genes involved in development and reproduction, which constitute potential targets for insecticide development.
Full Text Available BACKGROUND: Paspalum dilatatum Poir. (common name dallisgrass is a native grass species of South America, with special relevance to dairy and red meat production. P. dilatatum exhibits higher forage quality than other C4 forage grasses and is tolerant to frost and water stress. This species is predominantly cultivated in an apomictic monoculture, with an inherent high risk that biotic and abiotic stresses could potentially devastate productivity. Therefore, advanced breeding strategies that characterise and use available genetic diversity, or assess germplasm collections effectively are required to deliver advanced cultivars for production systems. However, there are limited genomic resources available for this forage grass species. RESULTS: Transcriptome sequencing using second-generation sequencing platforms has been employed using pooled RNA from different tissues (stems, roots, leaves and inflorescences at the final reproductive stage of P. dilatatum cultivar Primo. A total of 324,695 sequence reads were obtained, corresponding to c. 102 Mbp. The sequences were assembled, generating 20,169 contigs of a combined length of 9,336,138 nucleotides. The contigs were BLAST analysed against the fully sequenced grass species of Oryza sativa subsp. japonica, Brachypodium distachyon, the closely related Sorghum bicolor and foxtail millet (Setaria italica genomes as well as against the UniRef 90 protein database allowing a comprehensive gene ontology analysis to be performed. The contigs generated from the transcript sequencing were also analysed for the presence of simple sequence repeats (SSRs. A total of 2,339 SSR motifs were identified within 1,989 contigs and corresponding primer pairs were designed. Empirical validation of a cohort of 96 SSRs was performed, with 34% being polymorphic between sexual and apomictic biotypes. CONCLUSIONS: The development of genetic and genomic resources for P. dilatatum will contribute to gene discovery and expression
Jupiter, Daniel; Chen, Hailin; VanBuren, Vincent
Background Although expression microarrays have become a standard tool used by biologists, analysis of data produced by microarray experiments may still present challenges. Comparison of data from different platforms, organisms, and labs may involve complicated data processing, and inferring relationships between genes remains difficult. Results STARNET 2 is a new web-based tool that allows post hoc visual analysis of correlations that are derived from expression microarray data. STARNET 2 fa...
Karuppiah, Valliappan; Li, Yingxin; Sun, Wei; Feng, Guofang; Li, Zhiyong
Phenazines represent a large group of nitrogen-containing heterocyclic compounds produced by the diverse group of bacteria including actinobacteria. In this study, a total of 197 actinobacterial strains were isolated from seven different marine sponge species in the South China Sea using five different culture media. Eighty-seven morphologically different actinobacterial strains were selected and grouped into 13 genera, including Actinoalloteichus, Kocuria, Micrococcus, Micromonospora, Mycobacterium, Nocardiopsis, Prauserella, Rhodococcus, Saccharopolyspora, Salinispora, Serinicoccus, and Streptomyces by the phylogenetic analysis of 16S rRNA gene. Based on the screening of phzE genes, ten strains, including five Streptomyces, two Nocardiopsis, one Salinispora, one Micrococcus, and one Serinicoccus were found to be potential for phenazine production. The level of phzE gene expression was highly expressed in Nocardiopsis sp. 13-33-15, 13-12-13, and Serinicoccus sp. 13-12-4 on the fifth day of fermentation. Finally, 1,6-dihydroxy phenazine (1) from Nocardiopsis sp. 13-33-15 and 13-12-13, and 1,6-dimethoxy phenazine (2) from Nocardiopsis sp. 13-33-15 were isolated and identified successfully based on ESI-MS and NMR analysis. The compounds 1 and 2 showed antibacterial activity against Bacillus mycoides SJ14, Staphylococcus aureus SJ51, Escherichia coli SJ42, and Micrococcus luteus SJ47. This study suggests that the integrated approach of gene screening and chemical analysis is an effective strategy to find the target compounds and lays the basis for the production of phenazine from the sponge-associated actinobacteria. PMID:25820602
Grados, Marco A.
Objective: To provide a contemporary perspective on genetic discovery methods applied to obsessive-compulsive disorder (OCD) and Tourette syndrome (TS). Method: A review of research trends in genetics research in OCD and TS is conducted, with emphasis on novel approaches. Results: Genome-wide association studies (GWAS) are now in progress in OCD…
Full Text Available Abstract Background The use of gene signatures can potentially be of considerable value in the field of clinical diagnosis. However, gene signatures defined with different methods can be quite various even when applied the same disease and the same endpoint. Previous studies have shown that the correct selection of subsets of genes from microarray data is key for the accurate classification of disease phenotypes, and a number of methods have been proposed for the purpose. However, these methods refine the subsets by only considering each single feature, and they do not confirm the association between the genes identified in each gene signature and the phenotype of the disease. We proposed an innovative new method termed Minimize Feature's Size (MFS based on multiple level similarity analyses and association between the genes and disease for breast cancer endpoints by comparing classifier models generated from the second phase of MicroArray Quality Control (MAQC-II, trying to develop effective meta-analysis strategies to transform the MAQC-II signatures into a robust and reliable set of biomarker for clinical applications. Results We analyzed the similarity of the multiple gene signatures in an endpoint and between the two endpoints of breast cancer at probe and gene levels, the results indicate that disease-related genes can be preferably selected as the components of gene signature, and that the gene signatures for the two endpoints could be interchangeable. The minimized signatures were built at probe level by using MFS for each endpoint. By applying the approach, we generated a much smaller set of gene signature with the similar predictive power compared with those gene signatures from MAQC-II. Conclusions Our results indicate that gene signatures of both large and small sizes could perform equally well in clinical applications. Besides, consistency and biological significances can be detected among different gene signatures, reflecting the
Akle, Sebastian; Chun, Sung; Jordan, Daniel M; Cassa, Christopher A
Clinical sequencing is expanding, but causal variants are still not identified in the majority of cases. These unsolved cases can aid in gene discovery when individuals with similar phenotypes are identified in systems such as the Matchmaker Exchange. We describe risks for gene discovery in this growing set of unsolved cases. In a set of rare disease cases with the same phenotype, it is not difficult to find two individuals with the same phenotype that carry variants in the same gene. We quantify the risk of false-positive association in a cohort of individuals with the same phenotype, using the prior probability of observing a variant in each gene from over 60,000 individuals (Exome Aggregation Consortium). Based on the number of individuals with a genic variant, cohort size, specific gene, and mode of inheritance, we calculate a P value that the match represents a true association. A match in two of 10 patients in MECP2 is statistically significant (P = 0.0014), whereas a match in TTN would not reach significance, as expected (P > 0.999). Finally, we analyze the probability of matching in clinical exome cases to estimate the number of cases needed to identify genes related to different disorders. We offer Rare Disease Match, an online tool to mitigate the uncertainty of false-positive associations. PMID:26378430
Kamanu, Timothy K.
MicroRNA (miRNA) are a class of small endogenous non-coding RNA that are mainly negative transcriptional and post-transcriptional regulators in both plants and animals. Recent studies have shown that miRNA are involved in different types of cancer and other incurable diseases such as autism and Alzheimer’s. Functional miRNAs are excised from hairpin-like sequences that are known as miRNA genes. There are about 21,000 known miRNA genes, most of which have been determined using experimental methods. miRNA genes are classified into different groups (miRNA families). This study reports about 19,000 unknown miRNA genes in nine species whereby approximately 15,300 predictions were computationally validated to contain at least one experimentally verified functional miRNA product. The predictions are based on a novel computational strategy which relies on miRNA family groupings and exploits the physics and geometry of miRNA genes to unveil the hidden palindromic signals and symmetries in miRNA gene sequences. Unlike conventional computational miRNA gene discovery methods, the algorithm developed here is species-independent: it allows prediction at higher accuracy and resolution from arbitrary RNA/DNA sequences in any species and thus enables examination of repeat-prone genomic regions which are thought to be non-informative or ’junk’ sequences. The information non-redundancy of uni-directional RNA sequences compared to information redundancy of bi-directional DNA is demonstrated, a fact that is overlooked by most pattern discovery algorithms. A novel method for computing upstream and downstream miRNA gene boundaries based on mathematical/statistical functions is suggested, as well as cutoffs for annotation of miRNA genes in different miRNA families. Another tool is proposed to allow hypotheses generation and visualization of data matrices, intra- and inter-species chromosomal distribution of miRNA genes or miRNA families. Our results indicate that: miRNA and mi
Georg K Gerber
Full Text Available An important research problem in computational biology is the identification of expression programs, sets of co-expressed genes orchestrating normal or pathological processes, and the characterization of the functional breadth of these programs. The use of human expression data compendia for discovery of such programs presents several challenges including cellular inhomogeneity within samples, genetic and environmental variation across samples, uncertainty in the numbers of programs and sample populations, and temporal behavior. We developed GeneProgram, a new unsupervised computational framework based on Hierarchical Dirichlet Processes that addresses each of the above challenges. GeneProgram uses expression data to simultaneously organize tissues into groups and genes into overlapping programs with consistent temporal behavior, to produce maps of expression programs, which are sorted by generality scores that exploit the automatically learned groupings. Using synthetic and real gene expression data, we showed that GeneProgram outperformed several popular expression analysis methods. We applied GeneProgram to a compendium of 62 short time-series gene expression datasets exploring the responses of human cells to infectious agents and immune-modulating molecules. GeneProgram produced a map of 104 expression programs, a substantial number of which were significantly enriched for genes involved in key signaling pathways and/or bound by NF-kappaB transcription factors in genome-wide experiments. Further, GeneProgram discovered expression programs that appear to implicate surprising signaling pathways or receptor types in the response to infection, including Wnt signaling and neurotransmitter receptors. We believe the discovered map of expression programs involved in the response to infection will be useful for guiding future biological experiments; genes from programs with low generality scores might serve as new drug targets that exhibit minimal
Clark, Peter E.; Cookson, Michael S.
Mutations or aberrations of the von Hippel-Lindau gene are responsible for the hereditary neoplastic syndrome that bears the same name, as well as for the majority of sporadic clear cell renal cell carcinomas. The discovery of this gene and subsequent clarification of its mechanism of action have led to a series of targeted treatments for advanced kidney cancer and have dramatically changed how we manage this disease. The discovery of the VHL gene is a prime example of how discoveries at the ...
Theresa A Hill
Full Text Available The widely cultivated pepper, Capsicum spp., important as a vegetable and spice crop world-wide, is one of the most diverse crops. To enhance breeding programs, a detailed characterization of Capsicum diversity including morphological, geographical and molecular data is required. Currently, molecular data characterizing Capsicum genetic diversity is limited. The development and application of high-throughput genome-wide markers in Capsicum will facilitate more detailed molecular characterization of germplasm collections, genetic relationships, and the generation of ultra-high density maps. We have developed the Pepper GeneChip® array from Affymetrix for polymorphism detection and expression analysis in Capsicum. Probes on the array were designed from 30,815 unigenes assembled from expressed sequence tags (ESTs. Our array design provides a maximum redundancy of 13 probes per base pair position allowing integration of multiple hybridization values per position to detect single position polymorphism (SPP. Hybridization of genomic DNA from 40 diverse C. annuum lines, used in breeding and research programs, and a representative from three additional cultivated species (C. frutescens, C. chinense and C. pubescens detected 33,401 SPP markers within 13,323 unigenes. Among the C. annuum lines, 6,426 SPPs covering 3,818 unigenes were identified. An estimated three-fold reduction in diversity was detected in non-pungent compared with pungent lines, however, we were able to detect 251 highly informative markers across these C. annuum lines. In addition, an 8.7 cM region without polymorphism was detected around Pun1 in non-pungent C. annuum. An analysis of genetic relatedness and diversity using the software Structure revealed clustering of the germplasm which was confirmed with statistical support by principle components analysis (PCA and phylogenetic analysis. This research demonstrates the effectiveness of parallel high-throughput discovery and
Hill, Theresa A; Ashrafi, Hamid; Reyes-Chin-Wo, Sebastian; Yao, JiQiang; Stoffel, Kevin; Truco, Maria-Jose; Kozik, Alexander; Michelmore, Richard W; Van Deynze, Allen
The widely cultivated pepper, Capsicum spp., important as a vegetable and spice crop world-wide, is one of the most diverse crops. To enhance breeding programs, a detailed characterization of Capsicum diversity including morphological, geographical and molecular data is required. Currently, molecular data characterizing Capsicum genetic diversity is limited. The development and application of high-throughput genome-wide markers in Capsicum will facilitate more detailed molecular characterization of germplasm collections, genetic relationships, and the generation of ultra-high density maps. We have developed the Pepper GeneChip® array from Affymetrix for polymorphism detection and expression analysis in Capsicum. Probes on the array were designed from 30,815 unigenes assembled from expressed sequence tags (ESTs). Our array design provides a maximum redundancy of 13 probes per base pair position allowing integration of multiple hybridization values per position to detect single position polymorphism (SPP). Hybridization of genomic DNA from 40 diverse C. annuum lines, used in breeding and research programs, and a representative from three additional cultivated species (C. frutescens, C. chinense and C. pubescens) detected 33,401 SPP markers within 13,323 unigenes. Among the C. annuum lines, 6,426 SPPs covering 3,818 unigenes were identified. An estimated three-fold reduction in diversity was detected in non-pungent compared with pungent lines, however, we were able to detect 251 highly informative markers across these C. annuum lines. In addition, an 8.7 cM region without polymorphism was detected around Pun1 in non-pungent C. annuum. An analysis of genetic relatedness and diversity using the software Structure revealed clustering of the germplasm which was confirmed with statistical support by principle components analysis (PCA) and phylogenetic analysis. This research demonstrates the effectiveness of parallel high-throughput discovery and application of genome
LI Ji-yun; GENG Zhao-feng; SHAO Shi-huang
A novel DNA coding based knowledge discovery algorithm was proposed, an example which verified its validity was given. It is proved that this algorithm can discover new simplified rules from the original rule set efficiently.
As the sequencing stage of human genome project is near the end, the work has begun for discovering novel genes from genome sequences and annotating their biological functions. Here are reviewed current major bioinformatics tools and technologies available for large scale gene discovery and annotation from human genome sequences. Some ideas about possible future development are also provided.
G. M. Souza
Full Text Available Sugarcane is a highly productive crop used for centuries as the main source of sugar and recently to produce ethanol, a renewable bio-fuel energy source. There is increased interest in this crop due to the impending need to decrease fossil fuel usage. Sugarcane has a highly polyploid genome. Expressed sequence tag (EST sequencing has significantly contributed to gene discovery and expression studies used to associate function with sugarcane genes. A significant amount of data exists on regulatory events controlling responses to herbivory, drought, and phosphate deficiency, which cause important constraints on yield and on endophytic bacteria, which are highly beneficial. The means to reduce drought, phosphate deficiency, and herbivory by the sugarcane borer have a negative impact on the environment. Improved tolerance for these constraints is being sought. Sugarcane's ability to accumulate sucrose up to 16% of its culm dry weight is a challenge for genetic manipulation. Genome-based technology such as cDNA microarray data indicates genes associated with sugar content that may be used to develop new varieties improved for sucrose content or for traits that restrict the expansion of the cultivated land. The genes can also be used as molecular markers of agronomic traits in traditional breeding programs.
Bashir, Saba; Javed, M Younus; Khan, Aihab; Khiyal, Malik Sikandar Hayat
Recent advancement in web services plays an important role in business to business and business to consumer interaction. Discovery mechanism is not only used to find a suitable service but also provides collaboration between service providers and consumers by using standard protocols. A static web service discovery mechanism is not only time consuming but requires continuous human interaction. This paper proposed an efficient dynamic web services discovery mechanism that can locate relevant and updated web services from service registries and repositories with timestamp based on indexing value and categorization for faster and efficient discovery of service. The proposed prototype focuses on quality of service issues and introduces concept of local cache, categorization of services, indexing mechanism, CSP (Constraint Satisfaction Problem) solver, aging and usage of translator. Performance of proposed framework is evaluated by implementing the algorithm and correctness of our method is shown. The results of p...
G. M. Souza; M.-A. Van-Sluys; Vincentz, M.; Silva-Filho, M. C.; Menossi, M.
Sugarcane is a highly productive crop used for centuries as the main source of sugar and recently to produce ethanol, a renewable bio-fuel energy source. There is increased interest in this crop due to the impending need to decrease fossil fuel usage. Sugarcane has a highly polyploid genome. Expressed sequence tag (EST) sequencing has significantly contributed to gene discovery and expression studies used to associate function with sugarcane genes. A significant amount of data exists on regul...
Mitreva, Makedonka; Papenfuss, Antony T.; Whittington, Camilla M; Locke, Devin P.; Mardis, Elaine; Wilson, Richard K.; Abubucker, Sahar; Wong, Emily Sw; Hsu, Artur; Kuchei, Philip W.; Belov, Katherine; Warren, Wesley
Background: To date, few peptides in the complex mixture of platypus venom have been identified and sequenced, in part due to the limited amounts of platypus venom available to study. We have constructed and sequenced a cDNA library from an active platypus venom gland to identify the remaining components. Results: We identified 83 novel putative platypus venom genes from 13 toxin families, which are homologous to known toxins from a wide range of vertebrates (fish, reptiles, insectivores)...
Casari, G; Daruvar, Dea; Sander, C.; Schneider, Reinhard
Scientific history was made in completing the yeast genuine sequence, yet its 13 Mb are a mere starting point. Two challenges loom large: to decipher the function of all genes and to describe the workings of the eukaryotic cell in full molecular detail. A combination of experimental and theoretical approaches will be brought to bear on these challenges. What will be next in yeast genome analysis from the point of view of bioinformatics?
D'Aquin, Mathieu; Badra, Fadi; Lafrogne, Sandrine; Lieber, Jean; Napoli, Amedeo; Szathmary, Laszlo
In case-based reasoning, the adaptation step depends in general on domain-dependent knowledge, which motivates studies on adaptation knowledge acquisition (AKA). CABAMAKA is an AKA system based on principles of knowledge discovery from databases. This system explores the variations within the case base to elicit adaptation knowledge. It has been successfully tested in an application of case-based decision support to breast cancer treatment.
Patel, Sejal; Park, Min Tae M.; Chakravarty, M. Mallar; Knight, Jo
Imaging genetics is an emerging field in which the association between genes and neuroimaging-based quantitative phenotypes are used to explore the functional role of genes in neuroanatomy and neurophysiology in the context of healthy function and neuropsychiatric disorders. The main obstacle for researchers in the field is the high dimensionality of the data in both the imaging phenotypes and the genetic variants commonly typed. In this article, we develop a novel method that utilizes Gene Ontology, an online database, to select and prioritize certain genes, employing a stratified false discovery rate (sFDR) approach to investigate their associations with imaging phenotypes. sFDR has the potential to increase power in genome wide association studies (GWAS), and is quickly gaining traction as a method for multiple testing correction. Our novel approach addresses both the pressing need in genetic research to move beyond candidate gene studies, while not being overburdened with a loss of power due to multiple testing. As an example of our methodology, we perform a GWAS of hippocampal volume using both the Enhancing NeuroImaging Genetics through Meta-Analysis (ENIGMA2) and the Alzheimer's Disease Neuroimaging Initiative datasets. The analysis of ENIGMA2 data yielded a set of SNPs with sFDR values between 10 and 20%. Our approach demonstrates a potential method to prioritize genes based on biological systems impaired in a disease. PMID:27092072
ElShal, Sarah; Tranchevent, Léon-Charles; Sifrim, Alejandro; Ardeshirdavani, Amin; Davis, Jesse; Moreau, Yves
Disease-gene identification is a challenging process that has multiple applications within functional genomics and personalized medicine. Typically, this process involves both finding genes known to be associated with the disease (through literature search) and carrying out preliminary experiments or screens (e.g. linkage or association studies, copy number analyses, expression profiling) to determine a set of promising candidates for experimental validation. This requires extensive time and monetary resources. We describe Beegle, an online search and discovery engine that attempts to simplify this process by automating the typical approaches. It starts by mining the literature to quickly extract a set of genes known to be linked with a given query, then it integrates the learning methodology of Endeavour (a gene prioritization tool) to train a genomic model and rank a set of candidate genes to generate novel hypotheses. In a realistic evaluation setup, Beegle has an average recall of 84% in the top 100 returned genes as a search engine, which improves the discovery engine by 12.6% in the top 5% prioritized genes. Beegle is publicly available at http://beegle.esat.kuleuven.be/. PMID:26384564
on the activation of different downstream pathways. Thus it seems that different genetic backgrounds can lead to similar clinical manifestations, and as well determines the susceptibility to IBD. In the previous micro array based expression studies on UC the main target has been to point to new...
Full Text Available In today's scenario web services have become a grand vision to implement the business process functionalities. With increase in number of similar web services, one of the essential challenges is to discover relevant web service with regard to user specification. Relevancy of web service discovery can be improved by augmenting semantics through expressive formats like OWL. QoS based service selection will play a significant role in meeting the non-functional user requirements. Hence QoS and semantics has been used as finer search constraints to discover the most relevant service. In this paper, we describe a QoS framework for ontology based web service discovery. The QoS factors taken into consideration are execution time, response time, throughput, scalability, reputation, accessibility and availability. The behavior of each web service at various instances is observed over a period of time and their QoS based performance is analyzed.
The history of the discovery of the microphthalmia locus and its gene, now called Mitf, is a testament to the triumph of serendipity. Although the first microphthalmia mutation was discovered among the descendants of a mouse that was irradiated for the purpose of mutagenesis, the mutation most likely was not radiation-induced but occurred spontaneously in one of the parents of a later breeding. Although Mitf might eventually have been identified by other molecular genetic techniques, it was f...
Marcelo Bento Soares
In previous years, with support from the U.S. Department of Energy, we developed methods for construction of normalized and subtracted cDNA libraries, and constructed hundreds of high-quality libraries for production of Expressed Sequence Tags (ESTs). Our clones were made widely available to the scientific community through the IMAGE Consortium, and millions of ESTs were produced from our libraries either by collaborators or by our own sequencing laboratory at the University of Iowa. During this grant period, we focused on (1) the development of a method for preferential cloning of tissue-specific and/or rare transcripts, (2) its utilization to expedite EST-based gene discovery for the NIH Mouse Brain Molecular Anatomy Project, (3) further development and optimization of a method for construction of full-length-enriched cDNA libraries, and (4) modification of a plasmid vector to maximize efficiency of full-length cDNA sequencing by the transposon-mediated approach. It is noteworthy that the technology developed for preferential cloning of rare mRNAs enabled identification of over 2,000 mouse transcripts differentially expressed in the hippocampus. In addition, the method that we optimized for construction of full-length-enriched cDNA libraries was successfully utilized for the production of approximately fifty libraries from the developing mouse nervous system, from which over 2,500 full-ORF-containing cDNAs have been identified and accurately sequenced in their entirety either by our group or by the NIH-Mammalian Gene Collection Program Sequencing Team.
Alfieri, L.; Brooks, PJ; Aldrich, NJ; Tenenbaum, HR
Discovery learning approaches to education have recently come under scrutiny (Tobias & Duffy, 2009), with many studies indicating limitations to discovery learning practices. Therefore, 2 meta-analyses were conducted using a sample of 164 studies: The 1st examined the effects of unassisted discovery learning versus explicit instruction, and the 2nd examined the effects of enhanced and/or assisted discovery versus other types of instruction (e.g., explicit, unassisted discovery). Random effect...
Li, Zhong; Zhang, Huiping
Background Polymorphisms in the μ-, δ- and κ-opioid receptor genes (OPRM1, OPRD1 and OPRK1) have been reported to be associated with substance (alcohol or drug) dependence. The influence of an individual gene on a disease trait should be more evident when analyzed in the context of gene-gene interactions. Thus, we assessed the joint effect of variants in these three opioid receptor genes on alcohol, cocaine, or opioid dependence. Methods Genotype data for 13 OPRM1 Single Nucleotide Polymorphi...
Alfieri, Louis; Brooks, Patricia J.; Aldrich, Naomi J.; Tenenbaum, Harriet R.
Discovery learning approaches to education have recently come under scrutiny (Tobias & Duffy, 2009), with many studies indicating limitations to discovery learning practices. Therefore, 2 meta-analyses were conducted using a sample of 164 studies: The 1st examined the effects of unassisted discovery learning versus explicit instruction, and the…
Thapar, A; Martin, J; Mick, E; Arias Vásquez, A; Langley, K; Scherer, S W; Schachar, R; Crosbie, J; Williams, N; Franke, B; Elia, J; Glessner, J; Hakonarson, H; Owen, M J; Faraone, S V; O'Donovan, M C; Holmans, P
A strong motivation for undertaking psychiatric gene discovery studies is to provide novel insights into unknown biology. Although attention-deficit hyperactivity disorder (ADHD) is highly heritable, and large, rare copy number variants (CNVs) contribute to risk, little is known about its pathogenesis and it remains commonly misunderstood. We assembled and pooled five ADHD and control CNV data sets from the United Kingdom, Ireland, United States of America, Northern Europe and Canada. Our aim was to test for enrichment of neurodevelopmental gene sets, implicated by recent exome-sequencing studies of (a) schizophrenia and (b) autism as a means of testing the hypothesis that common pathogenic mechanisms underlie ADHD and these other neurodevelopmental disorders. We also undertook hypothesis-free testing of all biological pathways. We observed significant enrichment of individual genes previously found to harbour schizophrenia de novo non-synonymous single-nucleotide variants (SNVs; P=5.4 × 10−4) and targets of the Fragile X mental retardation protein (P=0.0018). No enrichment was observed for activity-regulated cytoskeleton-associated protein (P=0.23) or N-methyl-D-aspartate receptor (P=0.74) post-synaptic signalling gene sets previously implicated in schizophrenia. Enrichment of ADHD CNV hits for genes impacted by autism de novo SNVs (P=0.019 for non-synonymous SNV genes) did not survive Bonferroni correction. Hypothesis-free testing yielded several highly significantly enriched biological pathways, including ion channel pathways. Enrichment findings were robust to multiple testing corrections and to sensitivity analyses that excluded the most significant sample. The findings reveal that CNVs in ADHD converge on biologically meaningful gene clusters, including ones now established as conferring risk of other neurodevelopmental disorders. PMID:26573769
Thapar, A; Martin, J; Mick, E; Arias Vásquez, A; Langley, K; Scherer, S W; Schachar, R; Crosbie, J; Williams, N; Franke, B; Elia, J; Glessner, J; Hakonarson, H; Owen, M J; Faraone, S V; O'Donovan, M C; Holmans, P
A strong motivation for undertaking psychiatric gene discovery studies is to provide novel insights into unknown biology. Although attention-deficit hyperactivity disorder (ADHD) is highly heritable, and large, rare copy number variants (CNVs) contribute to risk, little is known about its pathogenesis and it remains commonly misunderstood. We assembled and pooled five ADHD and control CNV data sets from the United Kingdom, Ireland, United States of America, Northern Europe and Canada. Our aim was to test for enrichment of neurodevelopmental gene sets, implicated by recent exome-sequencing studies of (a) schizophrenia and (b) autism as a means of testing the hypothesis that common pathogenic mechanisms underlie ADHD and these other neurodevelopmental disorders. We also undertook hypothesis-free testing of all biological pathways. We observed significant enrichment of individual genes previously found to harbour schizophrenia de novo non-synonymous single-nucleotide variants (SNVs; P=5.4 × 10(-4)) and targets of the Fragile X mental retardation protein (P=0.0018). No enrichment was observed for activity-regulated cytoskeleton-associated protein (P=0.23) or N-methyl-D-aspartate receptor (P=0.74) post-synaptic signalling gene sets previously implicated in schizophrenia. Enrichment of ADHD CNV hits for genes impacted by autism de novo SNVs (P=0.019 for non-synonymous SNV genes) did not survive Bonferroni correction. Hypothesis-free testing yielded several highly significantly enriched biological pathways, including ion channel pathways. Enrichment findings were robust to multiple testing corrections and to sensitivity analyses that excluded the most significant sample. The findings reveal that CNVs in ADHD converge on biologically meaningful gene clusters, including ones now established as conferring risk of other neurodevelopmental disorders. PMID:26573769
Filichkin, Sergei; Etherington, Elizabeth; Ma, Caiping; Strauss, Steve
The goals of the S.H. Strauss laboratory portion of 'Genome-enabled discovery of carbon sequestration genes in poplar' are (1) to explore the functions of candidate genes using Populus transformation by inserting genes provided by Oakridge National Laboratory (ORNL) and the University of Florida (UF) into poplar; (2) to expand the poplar transformation toolkit by developing transformation methods for important genotypes; and (3) to allow induced expression, and efficient gene suppression, in roots and other tissues. As part of the transformation improvement effort, OSU developed transformation protocols for Populus trichocarpa 'Nisqually-1' clone and an early flowering P. alba clone, 6K10. Complete descriptions of the transformation systems were published (Ma et. al. 2004, Meilan et. al 2004). Twenty-one 'Nisqually-1' and 622 6K10 transgenic plants were generated. To identify root predominant promoters, a set of three promoters were tested for their tissue-specific expression patterns in poplar and in Arabidopsis as a model system. A novel gene, ET304, was identified by analyzing a collection of poplar enhancer trap lines generated at OSU (Filichkin et. al 2006a, 2006b). Other promoters include the pGgMT1 root-predominant promoter from Casuarina glauca and the pAtPIN2 promoter from Arabidopsis root specific PIN2 gene. OSU tested two induction systems, alcohol- and estrogen-inducible, in multiple poplar transgenics. Ethanol proved to be the more efficient when tested in tissue culture and greenhouse conditions. Two estrogen-inducible systems were evaluated in transgenic Populus, neither of which functioned reliably in tissue culture conditions. GATEWAY-compatible plant binary vectors were designed to compare the silencing efficiency of homologous (direct) RNAi vs. heterologous (transitive) RNAi inverted repeats. A set of genes was targeted for post transcriptional silencing in the model Arabidopsis system; these include the floral
Jagarlapudi, Sarma A R P; Kishan, K V Radha
Several database systems have been developed to provide valuable information from the bench chemist to biologist, medical practitioner to pharmaceutical scientist in a structured format. The advent of information technology and computational power enhanced the ability to access large volumes of data in the form of a database where one could do compilation, searching, archiving, analysis, and finally knowledge derivation. Although, data are of variable types the tools used for database creation, searching and retrieval are similar. GVK BIO has been developing databases from publicly available scientific literature in specific areas like medicinal chemistry, clinical research, and mechanism-based toxicity so that the structured databases containing vast data could be used in several areas of research. These databases were classified as reference centric or compound centric depending on the way the database systems were designed. Integration of these databases with knowledge derivation tools would enhance the value of these systems toward better drug design and discovery. PMID:19727614
Saba Bashir,; Farhan Hassan Khan; M. Younus Javed; Aihab Khan; Malik Sikandar Hayat Khiyal
Recent advancement in web services plays an important role in business to business and business to consumer interaction. Discovery mechanism is not only used to find a suitable service but also provides collaboration between service providers and consumers by using standard protocols. A static web service discovery mechanism is not only time consuming but requires continuous human interaction. This paper proposed an efficient dynamic web services discovery mechanism that can locate relevant a...
Philippakis, Anthony A; Azzariti, Danielle R; Beltran, Sergi; Brookes, Anthony J; Brownstein, Catherine A; Brudno, Michael; Brunner, Han G; Buske, Orion J; Carey, Knox; Doll, Cassie; Dumitriu, Sergiu; Dyke, Stephanie O M; den Dunnen, Johan T; Firth, Helen V; Gibbs, Richard A; Girdea, Marta; Gonzalez, Michael; Haendel, Melissa A; Hamosh, Ada; Holm, Ingrid A; Huang, Lijia; Hurles, Matthew E; Hutton, Ben; Krier, Joel B; Misyura, Andriy; Mungall, Christopher J; Paschall, Justin; Paten, Benedict; Robinson, Peter N; Schiettecatte, François; Sobreira, Nara L; Swaminathan, Ganesh J; Taschner, Peter E; Terry, Sharon F; Washington, Nicole L; Züchner, Stephan; Boycott, Kym M; Rehm, Heidi L
There are few better examples of the need for data sharing than in the rare disease community, where patients, physicians, and researchers must search for "the needle in a haystack" to uncover rare, novel causes of disease within the genome. Impeding the pace of discovery has been the existence of many small siloed datasets within individual research or clinical laboratory databases and/or disease-specific organizations, hoping for serendipitous occasions when two distant investigators happen to learn they have a rare phenotype in common and can "match" these cases to build evidence for causality. However, serendipity has never proven to be a reliable or scalable approach in science. As such, the Matchmaker Exchange (MME) was launched to provide a robust and systematic approach to rare disease gene discovery through the creation of a federated network connecting databases of genotypes and rare phenotypes using a common application programming interface (API). The core building blocks of the MME have been defined and assembled. Three MME services have now been connected through the API and are available for community use. Additional databases that support internal matching are anticipated to join the MME network as it continues to grow. PMID:26295439
Gwinn, Katrina; Corriveau, Roderick A.; Mitsumoto, Hiroshi; Bednarz, Kate; Brown, Robert H.; Cudkowicz, Merit; Gordon, Paul H.; Hardy, John; Kasarskis, Edward J.; Kaufmann, Petra; Miller, Robert; Sorenson, Eric; Tandan, Rup; Traynor, Bryan J.; Nash, Josefina; Sherman, Alex; Mailman, Matthew D.; Ostell, James; Bruijn, Lucie; Cwik, Valerie; Rich, Stephen S.; Singleton, Andrew; Refolo, Larry; Andrews, Jaime; Zhang, Ran; Conwit, Robin; Keller, Margaret A.
Amyotrophic lateral sclerosis (ALS) is the most common form of motor neuron disease (MND). It is currently incurable and treatment is largely limited to supportive care. Family history is associated with an increased risk of ALS, and many Mendelian causes have been discovered. However, most forms of the disease are not obviously familial. Recent advances in human genetics have enabled genome-wide analyses of single nucleotide polymorphisms (SNPs) that make it possible to study complex genetic contributions to human disease. Genome-wide SNP analyses require a large sample size and thus depend upon collaborative efforts to collect and manage the biological samples and corresponding data. Public availability of biological samples (such as DNA), phenotypic and genotypic data further enhances research endeavors. Here we discuss a large collaboration among academic investigators, government, and non-government organizations which has created a public repository of human DNA, immortalized cell lines, and clinical data to further gene discovery in ALS. This resource currently maintains samples and associated phenotypic data from 2332 MND subjects and 4692 controls. This resource should facilitate genetic discoveries which we anticipate will ultimately provide a better understanding of the biological mechanisms of neurodegeneration in ALS. PMID:18060051
Full Text Available Abstract Background Rise of temperatures and shortening of available water as result of predicted climate change will impose significant pressure on long-lived forest tree species. Discovering allelic variation present in drought related genes of two Austrian oak species can be the key to understand mechanisms of natural selection and provide forestry with key tools to cope with future challenges. Results In the present study we have used Roche 454 sequencing and developed a bioinformatic pipeline to process multiplexed tagged amplicons in order to identify single nucleotide polymorphisms and allelic sequences of ten candidate genes related to drought/osmotic stress from sessile oak (Quercus robur and sessile oak (Q. petraea individuals. Out of these, eight genes of 336 oak individuals growing in Austria have been detected with a total number of 158 polymorphic sites. Allele numbers ranged from ten to 52 with observed heterozygosity ranging from 0.115 to 0.640. All loci deviated from Hardy-Weinberg equilibrium and linkage disequilibrium was found among six combinations of loci. Conclusions We have characterized 183 alleles of drought related genes from oak species and detected first evidences of natural selection. Beside the potential for marker development, we have created an expandable bioinformatic pipeline for the analysis of next generation sequencing data.
Kim, Minhyung; Hwang, Daehee
The advances in mass spectrometry-based proteomics technologies have enabled the generation of global proteome data from tissue or body fluid samples collected from a broad spectrum of human diseases. Comparative proteomic analysis of global proteome data identifies and prioritizes the proteins showing altered abundances, called differentially expressed proteins (DEPs), in disease samples, compared to control samples. Protein biomarker candidates that can serve as indicators of disease states are then selected as key molecules among these proteins. Recently, it has been addressed that cellular pathways can provide better indications of disease states than individual molecules and also network analysis of the DEPs enables effective identification of cellular pathways altered in disease conditions and key molecules representing the altered cellular pathways. Accordingly, a number of network-based approaches to identify disease-related pathways and representative molecules of such pathways have been developed. In this review, we summarize analytical platforms for network-based protein biomarker discovery and key components in the platforms. PMID:27103885
Wilkowski, Bartlomiej; Fiszman, Marcelo; Miller, Christopher M; Hristovski, Dimitar; Arabandi, Sivaram; Rosemblat, Graciela; Rindflesch, Thomas C
We present an extension to literature-based discovery that goes beyond making discoveries to a principled way of navigating through selected aspects of some biomedical domain. The method is a type of "discovery browsing" that guides the user through the research literature on a specified phenomen...... illustrated with depressive disorder and focuses on the interaction of inflammation, circadian phenomena, and the neurotransmitter norepinephrine. Insight provided may contribute to enhanced understanding of the pathophysiology, treatment, and prevention of this disorder....
Wilkowski, Bartlomiej; Fiszman, Marcelo; Miller, Christopher M.; Hristovski, Dimitar; Arabandi, Sivaram; Rosemblat, Graciela; Rindflesch, Thomas C.
We present an extension to literature-based discovery that goes beyond making discoveries to a principled way of navigating through selected aspects of some biomedical domain. The method is a type of “discovery browsing” that guides the user through the research literature on a specified phenomenon. Poorly understood relationships may be explored through novel points of view, and potentially interesting relationships need not be known ahead of time. In a process of “cooperative reciprocity” t...
Baker, Erich J; Jay, Jeremy J; Philip, Vivek M; Zhang, Yun; Li, Zuopan; Kirova, Roumyana; Langston, Michael A; Chesler, Elissa J
The wealth of genomic technologies has enabled biologists to rapidly ascribe phenotypic characters to biological substrates. Central to effective biological investigation is the operational definition of the process under investigation. We propose an elucidation of categories of biological characters, including disease relevant traits, based on natural endogenous processes and experimentally observed biological networks, pathways and systems rather than on externally manifested constructs and current semantics such as disease names and processes. The Ontological Discovery Environment (ODE) is an Internet accessible resource for the storage, sharing, retrieval and analysis of phenotype-centered genomic data sets across species and experimental model systems. Any type of data set representing gene-phenotype relationships, such quantitative trait loci (QTL) positional candidates, literature reviews, microarray experiments, ontological or even meta-data, may serve as inputs. To demonstrate a use case leveraging the homology capabilities of ODE and its ability to synthesize diverse data sets, we conducted an analysis of genomic studies related to alcoholism. The core of ODE's gene set similarity, distance and hierarchical analysis is the creation of a bipartite network of gene-phenotype relations, a unique discrete graph approach to analysis that enables set-set matching of non-referential data. Gene sets are annotated with several levels of metadata, including community ontologies, while gene set translations compare models across species. Computationally derived gene sets are integrated into hierarchical trees based on gene-derived phenotype interdependencies. Automated set identifications are augmented by statistical tools which enable users to interpret the confidence of modeled results. This approach allows data integration and hypothesis discovery across multiple experimental contexts, regardless of the face similarity and semantic annotation of the experimental
Full Text Available Abstract Background An important analysis performed on microarray gene-expression data is to discover biclusters, which denote groups of genes that are coherently expressed for a subset of conditions. Various biclustering algorithms have been proposed to find different types of biclusters from these real-valued gene-expression data sets. However, these algorithms suffer from several limitations such as inability to explicitly handle errors/noise in the data; difficulty in discovering small bicliusters due to their top-down approach; inability of some of the approaches to find overlapping biclusters, which is crucial as many genes participate in multiple biological processes. Association pattern mining also produce biclusters as their result and can naturally address some of these limitations. However, traditional association mining only finds exact biclusters, which limits its applicability in real-life data sets where the biclusters may be fragmented due to random noise/errors. Moreover, as they only work with binary or boolean attributes, their application on gene-expression data require transforming real-valued attributes to binary attributes, which often results in loss of information. Many past approaches have tried to address the issue of noise and handling real-valued attributes independently but there is no systematic approach that addresses both of these issues together. Results In this paper, we first propose a novel error-tolerant biclustering model, ‘ET-bicluster’, and then propose a bottom-up heuristic-based mining algorithm to sequentially discover error-tolerant biclusters directly from real-valued gene-expression data. The efficacy of our proposed approach is illustrated by comparing it with a recent approach RAP in the context of two biological problems: discovery of functional modules and discovery of biomarkers. For the first problem, two real-valued S.Cerevisiae microarray gene-expression data sets are used to demonstrate
Full Text Available Advantages of RNA-Seq over array based platforms are quantitative gene expression and discovery of expressed single nucleotide variants (eSNVs and fusion transcripts from a single platform, but the sensitivity for each of these characteristics is unknown. We measured gene expression in a set of manually degraded RNAs, nine pairs of matched fresh-frozen, and FFPE RNA isolated from breast tumor with the hybridization based, NanoString nCounter (226 gene panel and with whole transcriptome RNA-Seq using RiboZeroGold ScriptSeq V2 library preparation kits. We performed correlation analyses of gene expression between samples and across platforms. We then specifically assessed whole transcriptome expression of lincRNA and discovery of eSNVs and fusion transcripts in the FFPE RNA-Seq data. For gene expression in the manually degraded samples, we observed Pearson correlations of >0.94 and >0.80 with NanoString and ScriptSeq protocols, respectively. Gene expression data for matched fresh-frozen and FFPE samples yielded mean Pearson correlations of 0.874 and 0.783 for NanoString (226 genes and ScriptSeq whole transcriptome protocols respectively, p<2x10(-16. Specifically for lincRNAs, we observed superb Pearson correlation (0.988 between matched fresh-frozen and FFPE pairs. FFPE samples across NanoString and RNA-Seq platforms gave a mean Pearson correlation of 0.838. In FFPE libraries, we detected 53.4% of high confidence SNVs and 24% of high confidence fusion transcripts. Sensitivity of fusion transcript detection was not overcome by an increase in depth of sequencing up to 3-fold (increase from ~56 to ~159 million reads. Both NanoString and ScriptSeq RNA-Seq technologies yield reliable gene expression data for degraded and FFPE material. The high degree of correlation between NanoString and RNA-Seq platforms suggests discovery based whole transcriptome studies from FFPE material will produce reliable expression data. The RiboZeroGold ScriptSeq protocol
Gasser Robin B
Full Text Available Abstract Background Cholangiocarcinoma (CCA – cancer of the bile ducts – is associated with chronic infection with the liver fluke, Opisthorchis viverrini. Despite being the only eukaryote that is designated as a 'class I carcinogen' by the International Agency for Research on Cancer, little is known about its genome. Results Approximately 5,000 randomly selected cDNAs from the adult stage of O. viverrini were characterized and accounted for 1,932 contigs, representing ~14% of the entire transcriptome, and, presently, the largest sequence dataset for any species of liver fluke. Twenty percent of contigs were assigned GO classifications. Abundantly represented protein families included those involved in physiological functions that are essential to parasitism, such as anaerobic respiration, reproduction, detoxification, surface maintenance and feeding. GO assignments were well conserved in relation to other parasitic flukes, however, some categories were over-represented in O. viverrini, such as structural and motor proteins. An assessment of evolutionary relationships showed that O. viverrini was more similar to other parasitic (Clonorchis sinensis and Schistosoma japonicum than to free-living (Schmidtea mediterranea flatworms, and 105 sequences had close homologues in both parasitic species but not in S. mediterranea. A total of 164 O. viverrini contigs contained ORFs with signal sequences, many of which were platyhelminth-specific. Examples of convergent evolution between host and parasite secreted/membrane proteins were identified as were homologues of vaccine antigens from other helminths. Finally, ORFs representing secreted proteins with known roles in tumorigenesis were identified, and these might play roles in the pathogenesis of O. viverrini-induced CCA. Conclusion This gene discovery effort for O. viverrini should expedite molecular studies of cholangiocarcinogenesis and accelerate research focused on developing new interventions
Literature-based knowledge discovery method was introduced by Dr. Swanson in 1986. He hypothesized a connection between Raynaud's phenomenon and dietary fish oil, the field of literature-based discovery (LBD) was born from then on. During the subsequent two decades, LBD's research attracts some scientists including information science, computer science, and biomedical science, etc.. It has been a part of knowledge discovery and text mining. This paper summarizes the development of recent years about LBD and presents two parts, methodology research and applied research. Lastly, some problems are pointed as future research directions.
Brideau, Chelsea; Soloway, Paul
This chapter serves as an introduction to the collection of genome-wide sequence and epigenomic data, as well as the use of these data in training generalized linear models (glm) to predicted imprinted status. This is meant to be an introduction to the method, so only the most straightforward examples will be covered. For instance, the examples given below refer to 11 classes of genomic regions (the entire gene body, introns, exons, 5' UTR, 3' UTR, and 1, 10, and 100 kb upstream and downstream of each gene). One could also build models based on combinations of these regions. Likewise, models could be built on combinations of epigenetic features, or on combinations of both genomic regions and epigenetic features.This chapter relies heavily on computational methods, including basic programming. However, this chapter is not meant to be an introduction to programming. Throughout the chapter, the reader will be provided with example code in the Perl programming language. PMID:22907493
Full Text Available It is challenging to cluster cancer patients of a certain histopathological type into molecular subtypes of clinical importance and identify gene signatures directly relevant to the subtypes. Current clustering approaches have inherent limitations, which prevent them from gauging the subtle heterogeneity of the molecular subtypes. In this paper we present a new framework: SPARCoC (Sparse-CoClust, which is based on a novel Common-background and Sparse-foreground Decomposition (CSD model and the Maximum Block Improvement (MBI co-clustering technique. SPARCoC has clear advantages compared with widely-used alternative approaches: hierarchical clustering (Hclust and nonnegative matrix factorization (NMF. We apply SPARCoC to the study of lung adenocarcinoma (ADCA, an extremely heterogeneous histological type, and a significant challenge for molecular subtyping. For testing and verification, we use high quality gene expression profiling data of lung ADCA patients, and identify prognostic gene signatures which could cluster patients into subgroups that are significantly different in their overall survival (with p-values < 0.05. Our results are only based on gene expression profiling data analysis, without incorporating any other feature selection or clinical information; we are able to replicate our findings with completely independent datasets. SPARCoC is broadly applicable to large-scale genomic data to empower pattern discovery and cancer gene identification.
Manzanares, Chloe; Yates, Steven; Ruckle, Michael; Nay, Michelle; Studer, Bruno
Mutation breeding has a long-standing history and in some major crop species, many of the most important cultivars have their origin in germplasm generated by mutation induction. For almost two decades, methods for TILLING (Targeting Induced Local Lesions IN Genomes) have been established in model plant species such as Arabidopsis (Arabidopsis thaliana L.), enabling the functional analysis of genes. Recent advances in mutation detection by second generation sequencing technology have brought its utility to major crop species. However, it has remained difficult to apply similar approaches in forage and turf grasses, mainly due to their outbreeding nature maintained by an efficient self-incompatibility system. Starting with a description of the extent to which traditional mutagenesis methods have contributed to crop yield increase in the past, this review focuses on technological approaches to implement TILLING-based strategies for the improvement of forage grass breeding through forward and reverse genetics. We present first results from TILLING in allogamous forage grasses for traits such as stress tolerance and evaluate prospects for rapid implementation of beneficial alleles to forage grass breeding. In conclusion, large-scale induced mutation resources, used for forward genetic screens, constitute a valuable tool to increase the genetic diversity for breeding and can be generated with relatively small investments in forage grasses. Furthermore, large libraries of sequenced mutations can be readily established, providing enhanced opportunities to discover mutations in genes controlling traits of agricultural importance and to study gene functions by reverse genetics. PMID:26924175
Bucur, Doina; Bardram, Jakob
This paper proposes a service discovery protocol for sensor networks that is specifically tailored for use in humancentered pervasive environments. It uses the high-level concept of computational activities (as logical bundles of data and resources) to give sensors in Activity-Based Sensor Networks...... (ABSNs) knowledge about their usage even at the network layer. ABSN redesigns classical network-level service discovery protocols to include and use this logical structuring of the network for a more practically applicable service discovery scheme. Noting that in practical settings activity-based sensor...... patches are localized, ABSN designs a completely distributed, hybrid discovery protocol which is proactive in a neighbourhood zone and reactive outside, tailored so that any query among the sensors of one activity is routed through the network with minimum overhead, guided by the bounds of that activity...
Full Text Available Abstract Background Orchids are one of the most diversified angiosperms, but few genomic resources are available for these non-model plants. In addition to the ecological significance, Phalaenopsis has been considered as an economically important floriculture industry worldwide. We aimed to use massively parallel 454 pyrosequencing for a global characterization of the Phalaenopsis transcriptome. Results To maximize sequence diversity, we pooled RNA from 10 samples of different tissues, various developmental stages, and biotic- or abiotic-stressed plants. We obtained 206,960 expressed sequence tags (ESTs with an average read length of 228 bp. These reads were assembled into 8,233 contigs and 34,630 singletons. The unigenes were searched against the NCBI non-redundant (NR protein database. Based on sequence similarity with known proteins, these analyses identified 22,234 different genes (E-value cutoff, e-7. Assembled sequences were annotated with Gene Ontology, Gene Family and Kyoto Encyclopedia of Genes and Genomes (KEGG pathways. Among these annotations, over 780 unigenes encoding putative transcription factors were identified. Conclusion Pyrosequencing was effective in identifying a large set of unigenes from Phalaenopsis. The informative EST dataset we developed constitutes a much-needed resource for discovery of genes involved in various biological processes in Phalaenopsis and other orchid species. These transcribed sequences will narrow the gap between study of model organisms with many genomic resources and species that are important for ecological and evolutionary studies.
Kaytoue-Uberall, Mehdi; Duplessis, Sébastien; Napoli, Amedeo
This paper presents new syntactic constraints for itemset mining in gene expression matrices. Biologists are interested in identifying gene expression profiles which present similar quantitative variation features. A two dimensional gene expression profile representation is introduced and adapted to itemset mining allowing to control gene expression. Syntactic constraints introduce expert knowledge at the beginning of the Knowledge Discovery in Databases process and are used to discover items...
Zhao, X; Cai, H; Wang, X; Ma, L
Gene expression profiles of gastric cancer (GC) were analyzed with bioinformatics tools to identify signature genes associated with prognosis. Four gene expression data sets (accession number: GSE2685, GSE30727, GSE38932 and GSE26253) were downloaded from Gene Expression Omnibus. Differentially expressed genes (DEGs) were screened out using significance analysis of microarrays (SAM) algorithm. P-value 1 were set as the threshold. A co-expression network was constructed for the GC-related genes with package WGCNA of R. Modules were disclosed with WGCNA algorithm. Survival-related signature genes were screened out via COX single-variable regression.A total of 3210 GC-related genes were identified from the 3 data sets. Significantly enriched GO biological process terms included cell death, cell proliferation, apoptosis, response to hormone and phosphorylation. Pathways like viral carcinogenesis, metabolism, EBV viral infection, and PI3K-AKT signaling pathway were significantly over-represented in the DEGs. A gene co-expression network including 2414 genes was constructed, from which 7 modules were revealed. A total of 17 genes were identified as signature genes, such as DAB2, ALDH2, CD58, CITED2, BNIP3L, SLC43A2, FAU and COL5A1.Many signature genes associated with prognosis of GC were identified in present study, some of which have been implicated in the pathogenesis of GC. These findings could not only improve the knowledge about GC, but also provide clues for clinical treatments. PMID:26774142
Full text: Trypanosoma vivax is a hemoparasite affecting livestock industry in South America and Africa. According to Seidl et al more than 11 million cattle evaluated in more than 3 billion dollars are found in the Pantanal region of Brazil and other lowlands in Bolivia. According to the same authors, if the outbreak reported in Pocone-MT (Center-East of Brazil) had gone untreated, the estimated losses would have exceeded US$140,000 on the seven ranches, $200 million in the Pantanal and $700 million regionwide. Despite the high economic relevance of the disease caused by T. vivax, few researches on its molecular characterization has been made as compared with human trypanosomes as T. brucei spp and T. cruzi. The main reason is the difficulty to grow the parasite into laboratory rodents and 'in vitro'. Very few (West African) strains have been adapted to laboratory rodents. Furthermore, most field isolates cannot be characterized by tools as RAPD, since parasitemias are usually very low making difficult the separation of parasites from animal blood for posterior extraction of parasite DNA. These characteristics have limited the research on T. vivax during the last decades, consequently very few markers have been described for its molecular characterization. A search in Genbank showed that there are only 22 entries for T. vivax confronted with nearly 98289, 38577, 23507 available for T. brucei, T. cruzi and Leishmania, respectively. T. vivax (molecular) biology is also little understood, even considering major differences as mechanical transmission in South America and both cyclical and mechanical transmission in Africa. In a consultation with several experts on genomics, it was emphasized that T. vivax and T. congolense are underepresented species in the molecular parasitology and genomics age, then they should be considered to have their genome sequenced. In order to discovery new markers to be explored in the molecular characterization of T. vivax, we decided to
Full Text Available Abstract Background Viruses are obligate intracellular parasites that rely upon the host cell for different steps in their life cycles. The characterization of cellular genes required for virus infection and/or cell killing will be essential for understanding viral life cycles, and may provide cellular targets for new antiviral therapies. Results Candidate genes required for lytic reovirus infection were identified by tagged sequence mutagenesis, a process that permits rapid identification of genes disrupted by gene entrapment. One hundred fifty-one reovirus resistant clones were selected from cell libraries containing 2 × 105 independently disrupted genes, of which 111 contained mutations in previously characterized genes and functionally anonymous transcription units. Collectively, the genes associated with reovirus resistance differed from genes targeted by random gene entrapment in that known mutational hot spots were under represented, and a number of mutations appeared to cluster around specific cellular processes, including: IGF-II expression/signalling, vesicular transport/cytoskeletal trafficking and apoptosis. Notably, several of the genes have been directly implicated in the replication of reovirus and other viruses at different steps in the viral lifecycle. Conclusions Tagged sequence mutagenesis provides a rapid, genome-wide strategy to identify candidate cellular genes required for virus infection. The candidate genes provide a starting point for mechanistic studies of cellular processes that participate in the virus lifecycle and may provide targets for novel anti-viral therapies.
Tang, Suisheng; Tan, Sin Lam; Ramadoss, Suresh Kumar;
Estrogen has a profound impact on human physiology and affects numerous genes. The classical estrogen reaction is mediated by its receptors (ERs), which bind to the estrogen response elements (EREs) in target gene's promoter region. Due to tedious and expensive experiments, a limited number of...... human genes are functionally well characterized. It is still unclear how many and which human genes respond to estrogen treatment. We propose a simple, economic, yet effective computational method to predict a subclass of estrogen responsive genes. Our method relies on the similarity of ERE frames...... across different promoters in the human genome. Matching ERE frames of a test set of 60 known estrogen responsive genes to the collection of over 18,000 human promoters, we obtained 604 candidate genes. Evaluating our result by comparison with the published microarray data and literature, we found that...
Thomas, Samuel; Hao, Ling; Ricke, William A; Li, Lingjun
Urinary proteomics has become one of the most attractive topics in disease biomarker discovery. MS-based proteomic analysis has advanced continuously and emerged as a prominent tool in the field of clinical bioanalysis. However, only few protein biomarkers have made their way to validation and clinical practice. Biomarker discovery is challenged by many clinical and analytical factors including, but not limited to, the complexity of urine and the wide dynamic range of endogenous proteins in the sample. This article highlights promising technologies and strategies in the MS-based biomarker discovery process, including study design, sample preparation, protein quantification, instrumental platforms, and bioinformatics. Different proteomics approaches are discussed, and progresses in maximizing urinary proteome coverage and standardization are emphasized in this review. MS-based urinary proteomics has great potential in the development of noninvasive diagnostic assays in the future, which will require collaborative efforts between analytical scientists, systems biologists, and clinicians. PMID:26703953
Full Text Available In MANETs, Service discovery process is always considered to be crucial since they do not possess a centralized infrastructure for communication. Moreover, different services available through the network necessitate varying categories. Hence, a need arises for devising a secure probe based service discovery mechanism to reduce the complexity in providing the services to the network users. In this paper, we propose a Secure Service Discovery Based on Probe Packet Mechanism (SSDPPM for identifying the DoS attack in MANETs, which depicts a new approach for estimating the level of trust present in each and every routing path of a mobile ad hoc network by using probe packets. Probing based service discovery mechanisms mainly identifies a mobile node’s genuineness using a test packet called probe that travels the entire network for the sake of computing the degree of trust maintained between the mobile nodes and it’s attributed impact towards the network performance. The performance of SSDPPM is investigated through a wide range of network related parameters like packet delivery, throughput, Control overhead and total overhead using the version ns-2.26 network simulator. This mechanism SSDPPM, improves the performance of the network in an average by 23% and 19% in terms of packet delivery ratio and throughput than the existing service discovery mechanisms available in the literature.
To understand complex biological systems, the research community has produced huge corpus of gene expression data. A large number of clustering approaches have been proposed for the analysis of gene expression data. However, extracting important biological knowledge is still harder. To address this task, clustering techniques are used. In this paper, hybrid Hierarchical k-Means algorithm is used for clustering and biclustering gene expression data is used. To discover both local and global cl...
Rad, Roland; Rad, Lena; Wang, Wei; Cadinanos, Juan; Vassiliou, George; Rice, Stephen; Campos, Lia S.; Yusa, Kosuke; Banerjee, Ruby; Li, Meng Amy; de la Rosa, Jorge; Strong, Alexander; Lu, Dong; Ellis, Peter; Conte, Nathalie
Transposons are mobile DNA segments that can disrupt gene function by inserting in or near genes. Here we show that insertional mutagenesis by the PiggyBac transposon can be used for cancer gene discovery in mice. PiggyBac transposition in genetically engineered transposon/transposase mice induced cancers whose type (hematopoietic versus solid) and latency were dependent on the regulatory elements introduced into transposons. Analysis of 63 hematopoietic tumors revealed the unique qualities o...
Natural products provide a successful supply of new chemical entities (NCEs) for drug discovery to treat human diseases. Approximately half of the NCEs are based on natural products and their derivatives. Notably, marine natural products, a largely untapped resource, have contributed to drug discovery and development with eight drugs or cosmeceuticals approved by the U.S. Food and Drug Administration and European Medicines Agency, and ten candidates undergoing clinical trials. Collaborative efforts from drug developers, biologists, organic, medicinal, and natural product chemists have elevated drug discoveries to new levels. These efforts are expected to continue to improve the efficiency of natural product-based drugs. Marinopyrroles are examined here as a case study for potential anticancer and antibiotic agents. PMID:26332654
Wang, Yanjie; Dong, Chunlan; Xue, Zeyun; Jin, Qijiang; Xu, Yingchun
Paeonia ostii, an important ornamental and medicinal plant, grows normally on copper (Cu) mines with widespread Cu contamination of soils, and it has the ability to lower Cu contents in the Cu-contaminated soils. However, very little molecular information concerned with Cu resistance of P. ostii is available. In this study, high-throughput de novo transcriptome sequencing was carried out for P. ostii with and without Cu treatment using Illumina HiSeq 2000 platform. A total of 77,704 All-unigenes were obtained with a mean length of 710 bp. Of these unigenes, 47,461 were annotated with public databases based on sequence similarities. Comparative transcript profiling allowed the discovery of 4324 differentially expressed genes (DEGs), with 2207 up-regulated and 2117 down-regulated unigenes in Cu-treated library as compared to the control counterpart. Based on these DEGs, Gene Ontology (GO) enrichment analysis indicated Cu stress-relevant terms, such as 'membrane' and 'antioxidant activity'. Meanwhile, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis uncovered some important pathways, including 'biosynthesis of secondary metabolites' and 'metabolic pathways'. In addition, expression patterns of 12 selected DEGs derived from quantitative real-time polymerase chain reaction (qRT-PCR) were consistent with their transcript abundance changes obtained by transcriptomic analyses, suggesting that all the 12 genes were authentically involved in Cu tolerance in P. ostii. This is the first report to identify genes related to Cu stress responses in P. ostii, which could offer valuable information on the molecular mechanisms of Cu resistance, and provide a basis for further genomics research on this and related ornamental species for phytoremediation. PMID:26435192
Full Text Available Abstract Background In the analysis of microarray data one generally produces a vector of p-values that for each gene give the likelihood of obtaining equally strong evidence of change by pure chance. The distribution of these p-values is a mixture of two components corresponding to the changed genes and the unchanged ones. The focus of this article is how to estimate the proportion unchanged and the false discovery rate (FDR and how to make inferences based on these concepts. Six published methods for estimating the proportion unchanged genes are reviewed, two alternatives are presented, and all are tested on both simulated and real data. All estimates but one make do without any parametric assumptions concerning the distributions of the p-values. Furthermore, the estimation and use of the FDR and the closely related q-value is illustrated with examples. Five published estimates of the FDR and one new are presented and tested. Implementations in R code are available. Results A simulation model based on the distribution of real microarray data plus two real data sets were used to assess the methods. The proposed alternative methods for estimating the proportion unchanged fared very well, and gave evidence of low bias and very low variance. Different methods perform well depending upon whether there are few or many regulated genes. Furthermore, the methods for estimating FDR showed a varying performance, and were sometimes misleading. The new method had a very low error. Conclusion The concept of the q-value or false discovery rate is useful in practical research, despite some theoretical and practical shortcomings. However, it seems possible to challenge the performance of the published methods, and there is likely scope for further developing the estimates of the FDR. The new methods provide the scientist with more options to choose a suitable method for any particular experiment. The article advocates the use of the conjoint information
Vrzal, Lukáš; Dvořáková, H.; Veverka, Václav
Brno : Masaryk University Press, 2015 - (Sklenář, V.). s. 888 ISBN 978-80-210-7890-1. [EUROMAR 2015. 05.07.2015-10.07.2015, Praha] Institutional support: RVO:61388963 Keywords : fragment-based drug design * NMR experiments Subject RIV: CE - Biochemistry
Vrzal, Lukáš; Dvořáková, H.; Veverka, Václav
Brno : Stuare, 2015 - (Novotný, J.). C5 ISBN 978-80-86441-46-7. [NMR Valtice. Central European NMR Meeting /30./. 19.04.2015-22.04.2015, Valtice] Institutional support: RVO:61388963 Keywords : fragment-based drug design * NMR experiments Subject RIV: CE - Biochemistry
Atanasovska, Biljana; Kumar, Vinod; Fu, Jingyuan; Wijmenga, Cisca; Hofker, Marten H
Cardiometabolic diseases represent a common complex disorder with a strong genetic component. Currently, genome-wide association studies (GWAS) have yielded some 755 single-nucleotide polymorphisms (SNPs) encompassing 366 independent loci that may help to decipher the molecular basis of cardiometabolic diseases. Going from a disease SNP to the underlying disease mechanisms is a huge challenge because the associated SNPs rarely disrupt protein function. Many disease SNPs are located in noncoding regions, and therefore attention is now focused on linking genetic SNP variation to effects on gene expression levels. By integrating genetic information with large-scale gene expression data, and with data from epigenetic roadmaps revealing gene regulatory regions, we expect to be able to identify candidate disease genes and the regulatory potential of disease SNPs. PMID:26596674
DAVIS J M
Plants utilize carbon by partitioning the reduced carbon obtained through photosynthesis into different compartments and into different chemistries within a cell and subsequently allocating such carbon to sink tissues throughout the plant. Since the phytohormones auxin and cytokinin are known to influence sink strength in tissues such as roots (Skoog & Miller 1957, Nordstrom et al. 2004), we hypothesized that altering the expression of genes that regulate auxin-mediated (e.g., AUX/IAA or ARF transcription factors) or cytokinin-mediated (e.g., RR transcription factors) control of root growth and development would impact carbon allocation and partitioning belowground (Fig. 1 - Renewal Proposal). Specifically, the ARF, AUX/IAA and RR transcription factor gene families mediate the effects of the growth regulators auxin and cytokinin on cell expansion, cell division and differentiation into root primordia. Invertases (IVR), whose transcript abundance is enhanced by both auxin and cytokinin, are critical components of carbon movement and therefore of carbon allocation. Thus, we initiated comparative genomic studies to identify the AUX/IAA, ARF, RR and IVR gene families in the Populus genome that could impact carbon allocation and partitioning. Bioinformatics searches using Arabidopsis gene sequences as queries identified regions with high degrees of sequence similarities in the Populus genome. These Populus sequences formed the basis of our transgenic experiments. Transgenic modification of gene expression involving members of these gene families was hypothesized to have profound effects on carbon allocation and partitioning.
Cohen Barak A
Full Text Available Abstract Background Algorithms that locate evolutionarily conserved sequences have become powerful tools for finding functional DNA elements, including transcription factor binding sites; however, most methods do not take advantage of an explicit model for the constrained evolution of functional DNA sequences. Results We developed a probabilistic framework that combines an HKY85 model, which assigns probabilities to different base substitutions between species, and weight matrix models of transcription factor binding sites, which describe the probabilities of observing particular nucleotides at specific positions in the binding site. The method incorporates the phylogenies of the species under consideration and takes into account the position specific variation of transcription factor binding sites. Using our framework we assessed the suitability of alignments of genomic sequences from commonly used species as substrates for comparative genomic approaches to regulatory motif finding. We then applied this technique to Saccharomyces cerevisiae and related species by examining all possible six base pair DNA sequences (hexamers and identifying sequences that are conserved in a significant number of promoters. By combining similar conserved hexamers we reconstructed known cis-regulatory motifs and made predictions of previously unidentified motifs. We tested one prediction experimentally, finding it to be a regulatory element involved in the transcriptional response to glucose. Conclusion The experimental validation of a regulatory element prediction missed by other large-scale motif finding studies demonstrates that our approach is a useful addition to the current suite of tools for finding regulatory motifs.
To understand complex biological systems, the research community has produced huge corpus of gene expression data. A large number of clustering approaches have been proposed for the analysis of gene expression data. However, extracting important biological knowledge is still harder. To address this task, clustering techniques are used. In this paper, hybrid Hierarchical k-Means algorithm is used for clustering and biclustering gene expression data is used. To discover both local and global clustering structure biclustering and clustering algorithms are utilized. A validation technique, Figure of Merit is used to determine the quality of clustering results. Appropriate knowledge is mined from the clusters by embedding a BLAST similarity search program into the clustering and biclustering process. To discover both local and global clustering structure biclustering and clustering algorithms are utilized. To determine the quality of clustering results, a validation technique, Figure of Merit is used. Appropriate ...
Collier, Lara S.; Largaespada, David A
The use of Sleeping Beauty transposons as somatic mutagens to discover cancer genes in hematopoietic tumors and sarcomas has been documented. Here, we discuss the future of Sleeping Beauty for cancer genetic studies and the potential use of additional transposable elements for somatic mutagenesis.
Cai-Ping Feng; John Mundy
The present mini-review describes newer methods and strategies, including transposon and T-DNA insertions,TILLING, Deleteagene, and RNA interference, to functionally analyze genes of interest in the model plant Arabidopsis. The relative advantages and disadvantages of the systems are also discussed.
Feng, Cai-ping; Mundy, J.
The present mini-review describes newer methods and strategies, including transposon and T-DNA insertions, TILLING, Deleteagene, and RNA interference, to functionally analyze genes of interest in the model plant Arabidopsis. The relative advantages and disadvantages of the systems are also...
Coppe, Alessandro; Ferrari, Francesco; Bisognin, Andrea; Danieli, Gian Antonio; Ferrari, Sergio; Bicciato, Silvio; Bortoluzzi, Stefania
Genes co-expressed may be under similar promoter-based and/or position-based regulation. Although data on expression, position and function of human genes are available, their true integration still represents a challenge for computational biology, hampering the identification of regulatory mechanisms. We carried out an integrative analysis of genomic position, functional annotation and promoters of genes expressed in myeloid cells. Promoter analysis was conducted by a novel multi-step method for discovering putative regulatory elements, i.e. over-represented motifs, in a selected set of promoters, as compared with a background model. The combination of transcriptional, structural and functional data allowed the identification of sets of promoters pertaining to groups of genes co-expressed and co-localized in regions of the human genome. The application of motif discovery to 26 groups of genes co-expressed in myeloid cells differentiation and co-localized in the genome showed that there are more over-represented motifs in promoters of co-expressed and co-localized genes than in promoters of simply co-expressed genes (CEG). Motifs, which are similar to the binding sequences of known transcription factors, non-uniformly distributed along promoter sequences and/or occurring in highly co-expressed subset of genes were identified. Co-expressed and co-localized gene sets were grouped in two co-expressed genomic meta-regions, putatively representing functional domains of a high-level expression regulation. PMID:19059999
Recent advances in proteomic and transcriptomic technologies resulted in the accumulation of vast amount of high-throughput data that span multiple biological processes and characteristics in different organisms. Much of the data come in the form of interaction networks and mRNA expression arrays. An important task in systems biology is functional modules discovery where the goal is to uncover well-connected sub-networks (modules). These discovered modules help to unravel the underlying mechanisms of the observed biological processes. While most of the existing module discovery methods use only the interaction data, in this work we propose, CLARM, which discovers biological modules by incorporating gene profiles data with protein-protein interaction networks. We demonstrate the effectiveness of CLARM on Yeast and Human interaction datasets, and gene expression and molecular function profiles. Experiments on these real datasets show that the CLARM approach is competitive to well established functional module discovery methods.
González-Navarro, Félix F; Belanche-Muñoz, Lluís A; Gámez-Moreno, María G; Flores-Ríos, Brenda L; Ibarra-Esquer, Jorge E; López-Morteo, Gabriel A
Facioscapulohumeral muscular dystrophy (FSHD) is a neuromuscular disorder that shows a preference for the facial, shoulder and upper arm muscles. FSHD affects about one in 20-400,000 people, and no effective therapeutic strategies are known to halt disease progression or reverse muscle weakness or atrophy. Many genes may be incorrectly regulated in affected muscle tissue, but the mechanisms responsible for the progressive muscle weakness remain largely unknown. Although machine learning (ML) has made significant inroads in biomedical disciplines such as cancer research, no reports have yet addressed FSHD analysis using ML techniques. This study explores a specific FSHD data set from a ML perspective. We report results showing a very promising small group of genes that clearly separates FSHD samples from healthy samples. In addition to numerical prediction figures, we show data visualizations and biological evidence illustrating the potential usefulness of these results. PMID:26960968
Lozano, Aurélie C.; Abe, Naoki; Yan LIU; Rosset, Saharon
We consider the problem of discovering gene regulatory networks from time-series microarray data. Recently, graphical Granger modeling has gained considerable attention as a promising direction for addressing this problem. These methods apply graphical modeling methods on time-series data and invoke the notion of ‘Granger causality’ to make assertions on causality through inference on time-lagged effects. Existing algorithms, however, have neglected an important aspect of the problem—the grou...
Getchell Thomas V
Full Text Available Abstract Background Cluster analyses are used to analyze microarray time-course data for gene discovery and pattern recognition. However, in general, these methods do not take advantage of the fact that time is a continuous variable, and existing clustering methods often group biologically unrelated genes together. Results We propose a quadratic regression method for identification of differentially expressed genes and classification of genes based on their temporal expression profiles for non-cyclic short time-course microarray data. This method treats time as a continuous variable, therefore preserves actual time information. We applied this method to a microarray time-course study of gene expression at short time intervals following deafferentation of olfactory receptor neurons. Nine regression patterns have been identified and shown to fit gene expression profiles better than k-means clusters. EASE analysis identified over-represented functional groups in each regression pattern and each k-means cluster, which further demonstrated that the regression method provided more biologically meaningful classifications of gene expression profiles than the k-means clustering method. Comparison with Peddada et al.'s order-restricted inference method showed that our method provides a different perspective on the temporal gene profiles. Reliability study indicates that regression patterns have the highest reliabilities. Conclusion Our results demonstrate that the proposed quadratic regression method improves gene discovery and pattern recognition for non-cyclic short time-course microarray data. With a freely accessible Excel macro, investigators can readily apply this method to their microarray data.
Li, Xu; Mitton, Nathalie; Simplot-Ryl, David
Hello protocol is the basic technique for neighborhood discovery in wireless ad hoc networks. It requires nodes to claim their existence/aliveness by periodic `hello' messages. Central to any hello protocol is the determination of `hello' message transmission rate. No fixed optimal rate exists in the presence of node mobility. The rate should in fact adapt to it, high for high mobility and low for low mobility. In this paper, we propose a novel mobility prediction based hello protocol, named ...
Fragment based drug discovery (FBDD) has been applied to two protease drug targets, MMP-12 and HIV-1 protease. The primary screening and characterization of hit fragments were performed with surface plasmon resonance -technology. Further evaluation of the interaction was done by inhibition studies and in one case with X-ray crystallography. The focus of the two projects was different. Many MMP inhibitors contain a strong zinc chelating group, hydroxamate, interacting with the catalytic zinc a...
YANG Xuemei; XU Lizhen; DONG Yisheng; WANG Yongli
A novel semantic model of Web service description and discovery was proposed through an extension for profile model of Web ontology language for services (OWL-S) in this paper.Similarity matching of Web services was implemented through computing weighted summation of semantic similarity value based on specific domain ontology and dynamical satisfy extent evaluation for quality of service (QoS).Experiments show that the provided semantic matching model is efficient.
Fragment-based methods are successfully generating novel and selective drug-like inhibitors of protein targets, with a number of groups reporting compounds entering clinical trials. This paper summarizes the key features of the approach as one of the tools in structure-guided drug discovery. There has been considerable interest recently in what is known as 'fragment-based lead discovery'. The novel feature of the approach is to begin with small low-affinity compounds. The main advantage is that a larger potential chemical diversity can be sampled with fewer compounds, which is particularly important for new target classes. The approach relies on careful design of the fragment library, a method that can detect binding of the fragment to the protein target, determination of the structure of the fragment bound to the target, and the conventional use of structural information to guide compound optimization. In this article the methods are reviewed, and experiences in fragment-based discovery of lead series of compounds against kinases such as PDK1 and ATPases such as Hsp90 are discussed. The examples illustrate some of the key benefits and issues of the approach and also provide anecdotal examples of the patterns seen in selectivity and the binding mode of fragments across different protein targets
Full Text Available Abstract Background Technological leaps in genome sequencing have resulted in a surge in discovery of human disease genes. These discoveries have led to increased clarity on the molecular pathology of disease and have also demonstrated considerable overlap in the genetic roots of human diseases. In light of this large genetic overlap, we tested whether cross-disease research approaches lead to faster, more impactful discoveries. Methods We leveraged several gene-disease association databases to calculate a Mutual Citation Score (MCS for 10,853 pairs of genetically related diseases to measure the frequency of cross-citation between research fields. To assess the importance of cooperative research, we computed an Individual Disease Cooperation Score (ICS and the average publication rate for each disease. Results For all disease pairs with one gene in common, we found that the degree of genetic overlap was a poor predictor of cooperation (r2=0.3198 and that the vast majority of disease pairs (89.56% never cited previous discoveries of the same gene in a different disease, irrespective of the level of genetic similarity between the diseases. A fraction (0.25% of the pairs demonstrated cross-citation in greater than 5% of their published genetic discoveries and 0.037% cross-referenced discoveries more than 10% of the time. We found strong positive correlations between ICS and publication rate (r2=0.7931, and an even stronger correlation between the publication rate and the number of cross-referenced diseases (r2=0.8585. These results suggested that cross-disease research may have the potential to yield novel discoveries at a faster pace than singular disease research. Conclusions Our findings suggest that the frequency of cross-disease study is low despite the high level of genetic similarity among many human diseases, and that collaborative methods may accelerate and increase the impact of new genetic discoveries. Until we have a better
Integration of pines into the large scope of plant biology research depends on study of pines in parallel with study of annual plants, and on availability of research materials from pine to plant biologists interested in comparing pine with annual plant systems. The objectives of the Pine Gene Discovery Project were to obtain 10,000 partial DNA sequences of genes expressed in loblolly pine, to determine which of those pine genes were similar to known genes from other organisms, and to make the DNA sequences and isolated pine genes available to plant researchers to stimulate integration of pines into the wider scope of plant biology research. Those objectives have been completed, and the results are available to the public. Requests for pine genes have been received from a number of laboratories that would otherwise not have included pine in their research, indicating that progress is being made toward the goal of integrating pine research into the larger molecular biology research community
Whetten, R. W.; Sederoff, R. R.; Kinlaw, C.; Retzel, E.
Integration of pines into the large scope of plant biology research depends on study of pines in parallel with study of annual plants, and on availability of research materials from pine to plant biologists interested in comparing pine with annual plant systems. The objectives of the Pine Gene Discovery Project were to obtain 10,000 partial DNA sequences of genes expressed in loblolly pine, to determine which of those pine genes were similar to known genes from other organisms, and to make the DNA sequences and isolated pine genes available to plant researchers to stimulate integration of pines into the wider scope of plant biology research. Those objectives have been completed, and the results are available to the public. Requests for pine genes have been received from a number of laboratories that would otherwise not have included pine in their research, indicating that progress is being made toward the goal of integrating pine research into the larger molecular biology research community.
El-Sayed Najib M
Full Text Available Abstract Background Perkinsus marinus, a protozoan parasite of the eastern oyster Crassostrea virginica, has devastated natural and farmed oyster populations along the Atlantic and Gulf coasts of the United States. It is classified as a member of the Perkinsozoa, a recently established phylum considered close to the ancestor of ciliates, dinoflagellates, and apicomplexans, and a key taxon for understanding unique adaptations (e.g. parasitism within the Alveolata. Despite intense parasite pressure, no disease-resistant oysters have been identified and no effective therapies have been developed to date. Results To gain insight into the biological basis of the parasite's virulence and pathogenesis mechanisms, and to identify genes encoding potential targets for intervention, we generated >31,000 5' expressed sequence tags (ESTs derived from four trophozoite libraries generated from two P. marinus strains. Trimming and clustering of the sequence tags yielded 7,863 unique sequences, some of which carry a spliced leader. Similarity searches revealed that 55% of these had hits in protein sequence databases, of which 1,729 had their best hit with proteins from the chromalveolates (E-value ≤ 1e-5. Some sequences are similar to those proven to be targets for effective intervention in other protozoan parasites, and include not only proteases, antioxidant enzymes, and heat shock proteins, but also those associated with relict plastids, such as acetyl-CoA carboxylase and methyl erythrithol phosphate pathway components, and those involved in glycan assembly, protein folding/secretion, and parasite-host interactions. Conclusions Our transcriptome analysis of P. marinus, the first for any member of the Perkinsozoa, contributes new insight into its biology and taxonomic position. It provides a very informative, albeit preliminary, glimpse into the expression of genes encoding functionally relevant proteins as potential targets for chemotherapy, and evidence
Hristovski, D.; Friedman, C.; Rindflesch, T. C.; Peterlin, B.
Literature-based discovery (LBD) is an emerging methodology for uncovering nonovert relationships in the online research literature. Making such relationships explicit supports hypothesis generation and discovery. Currently LBD systems depend exclusively on co-occurrence of words or concepts in target documents, regardless of whether relations actually exist between the words or concepts. We describe a method to enhance LBD through capture of semantic relations from the literature via use of natural language processing (NLP). This paper reports on an application of LBD that combines two NLP systems: BioMedLEE and SemRep, which are coupled with an LBD system called BITOLA. The two NLP systems complement each other to increase the types of information utilized by BITOLA. We also discuss issues associated with combining heterogeneous systems. Initial experiments suggest this approach can uncover new associations that were not possible using previous methods.
Verstraeten, Aline; Alaerts, Maaike; Van Laer, Lut; Loeys, Bart
Marfan syndrome (MFS) is a rare, autosomal-dominant, multisystem disorder, presenting with skeletal, ocular, skin, and cardiovascular symptoms. Significant clinical overlap with other systemic connective tissue diseases, including Loeys-Dietz syndrome (LDS), Shprintzen-Goldberg syndrome (SGS), and the MASS phenotype, has been documented. In MFS and LDS, the cardiovascular manifestations account for the major cause of patient morbidity and mortality, rendering them the main target for therapeutic intervention. Over the past decades, gene identification studies confidently linked the aforementioned syndromes, as well as nonsyndromic aneurysmal disease, to genetic defects in proteins related to the transforming growth factor (TGF)-β pathway, greatly expanding our knowledge on the disease mechanisms and providing us with novel therapeutic targets. As a result, the focus of the developing pharmacological treatment strategies is shifting from hemodynamic stress management to TGF-β antagonism. In this review, we discuss the insights that have been gained in the molecular biology of MFS and related disorders over the past 25 years. PMID:26919284
There are many approaches to data mining and knowledge discovery (DM&KD), including neural networks, closest neighbor methods, and various statistical methods. This monograph, however, focuses on the development and use of a novel approach, based on mathematical logic, that the author and his research associates have worked on over the last 20 years. The methods presented in the book deal with key DM&KD issues in an intuitive manner and in a natural sequence. Compared to other DM&KD methods, those based on mathematical logic offer a direct and often intuitive approach for extracting easily int
Hao, Ge-Fei; Jiang, Wen; Ye, Yuan-Nong; Wu, Feng-Xu; Zhu, Xiao-Lei; Guo, Feng-Biao; Yang, Guang-Fu
In order to foster innovation and improve the effectiveness of drug discovery, there is a considerable interest in exploring unknown 'chemical space' to identify new bioactive compounds with novel and diverse scaffolds. Hence, fragment-based drug discovery (FBDD) was developed rapidly due to its advanced expansive search for 'chemical space', which can lead to a higher hit rate and ligand efficiency (LE). However, computational screening of fragments is always hampered by the promiscuous binding model. In this study, we developed a new web server Auto Core Fragment in silico Screening (ACFIS). It includes three computational modules, PARA_GEN, CORE_GEN and CAND_GEN. ACFIS can generate core fragment structure from the active molecule using fragment deconstruction analysis and perform in silico screening by growing fragments to the junction of core fragment structure. An integrated energy calculation rapidly identifies which fragments fit the binding site of a protein. We constructed a simple interface to enable users to view top-ranking molecules in 2D and the binding mode in 3D for further experimental exploration. This makes the ACFIS a highly valuable tool for drug discovery. The ACFIS web server is free and open to all users at http://chemyang.ccnu.edu.cn/ccb/server/ACFIS/. PMID:27150808
Huang, Angela Lilly
There are five basic taste modalities in mammals: bitter, sweet, sour, salty, and Umami (taste of MSG and L-amino acids). Receptors for bitter, sweet, and Umami were previously discovered. Identities of receptors for salty and sour taste modalities remained elusive. In this dissertation, I will present: 1) development of a novel bioinformatics screen to discover candidate receptors; 2) discovery of a novel gene, PKD2L1, in taste receptor cells; 3) evidence demonstrating PKD2L1-expressing tast...
Rybnik, Mariusz; Jastrzebska, Agnieszka
The paper is focused on automated knowledge discovery in musical pieces, based on transformations of digital musical notation. Usually a single musical piece is analyzed, to discover the structure as well as traits of separate voices. Melody and rhythm is processed with the use of three proposed operators, that serve as meta-data. In this work we focus on melody, so the processed data is labeled using fuzzy labels, created for detecting various voice characteristics. A comparative analysis of two musical pieces may be performed as well, that compares them in terms of various rhythmic or melodic traits (as a whole or with voice separation).
Full Text Available Abstract Background High throughput methods of the genome era produce vast amounts of data in the form of gene lists. These lists are large and difficult to interpret without advanced computational or bioinformatic tools. Most existing methods analyse a gene list as a single entity although it is comprised of multiple gene groups associated with separate biological functions. Therefore it is imperative to define and visualize gene groups with unique functionality within gene lists. Results In order to analyse the functional heterogeneity within a gene list, we have developed a method that clusters genes to groups with homogenous functionalities. The method uses Non-negative Matrix Factorization (NMF to create several clustering results with varying numbers of clusters. The obtained clustering results are combined into a simple graphical presentation showing the functional groups over-represented in the analyzed gene list. We demonstrate its performance on two data sets and show results that improve upon existing methods. The comparison also shows that our method creates a more simplified view that aids in discovery of biological themes within the list and discards less informative classes from the results. Conclusion The presented method and associated software are useful for the identification and interpretation of biological functions associated with gene lists and are especially useful for the analysis of large lists.
Opium poppy (Papaver somniferum) accumulates the analgesic benzyl-isoquinoline alkaloids morphine, codeine and thebaine, and remains one of the world's most important medicinal plants. The development of varieties that accumulate valuable compounds, such as thebaine and codeine, but not morphine precludes the illicit synthesis of heroin (O,O-diacetylmorphine) and has led to the establishment of alternative cash crops. Novel cDNAs encoding a growing number of biosynthetic enzymes have been isolated, and various -omics resources including EST databases and DNA microarray chips have been established. However, the full potential of functional genomics as a tool for gene discovery in opium poppy remains limited by the relative inefficiency of genetic transformation protocols, which also restricts the application of metabolic engineering for both experimental and commercial purposes. We are establishing an effective functional genomics initiative based on induced mutagenesis and recently developed reverse genetics methodology, such as TILLING (Targeting Induced Local Lesions IN Genomes), with the aim of identifying biosynthetic genes that can be used to engineer opium poppy for the production of copious levels of high-value pharmaceutical alkaloids. Mutagenesis involves the treatment of seeds with ethyl methane sulfonate (EMS) or by fast-neutron bombardment (FNB). In preliminary experiments with EMS-treated seeds, the screening of 1,250 independent M2 plants led to the isolation of four mutants that displayed two distinctly altered alkaloid profiles. Two lines accumulated the central pathway intermediate reticuline and relatively low levels of morphine, codeine and thebaine compared to wild-type plants. Two other lines showed the unusual accumulation in the latex of the antimicrobial alkaloid sanguinarine, which is the product of a branch pathway distinct from that leading to morphine. The present status of -omics resources and functional genomics platforms available to
Opium poppy (Papaver somniferum) accumulates the analgesic alkaloids morphine, codeine and thebaine, and remains one of the world's most important medicinal plants. The development of varieties that accumulate valuable compounds, such as thebaine and codeine, but not morphine precludes the illicit synthesis of heroin (O,O-diacetylmorphine) and has created opportunities to establish alternative cash crops. Novel cDNAs encoding more than a dozen biosynthetic enzymes have been isolated, and substantial EST databases and DNA microarray chips have been established. The full potential of functional genomics as a tool for gene discovery in opium poppy remains limited by the relative inefficiency of genetic transformation protocols, which also restricts the application of metabolic engineering for both experimental and commercial purposes. We are establishing an effective functional genomics initiative based on induced mutagenesis and TILLING (Targeting Induced Local Lesions IN Genomes) and with the aim of identifying biosynthetic genes that can be used to engineer opium poppy to produce copious levels of high-value pharmaceutical alkaloids. Mutagenesis involves the treatment of seeds by fast-neutron bombardment (FNB) or with ethyl methane sulfonate (EMS). Mutagenized opium poppy plants are cultivated in a secure underground growth facility in partnership with a Canadian biotechnology company. In preliminary experiments with EMS-treated seeds, the screening of 1,250 independent M2 plants led to the isolation of four mutants that displayed two distinctly altered alkaloid profiles. Two lines accumulated the central pathway intermediate (S)- reticuline and only low levels of morphine, codeine and thebaine. Two other lines showed the unusual accumulation of the antimicrobial alkaloid sanguinarine, which is the product of a branch pathway distinct from that leading to morphine, in the latex. The present status of -omics resources and functional genomics platforms available to
Vikash K. Singh
Full Text Available Flower development is one of the major developmental processes that governs seed setting in angiosperms. However, little is known about the molecular mechanisms underlying flower development in legumes. Employing RNA-seq for various stages of flower development and few vegetative tissues in chickpea, we identified differentially expressed genes in flower tissues/stages in comparison to vegetative tissues, which are related to various biological processes and molecular functions during flower development. Here, we provide details of experimental methods, RNA-seq data (available at Gene Expression Omnibus database under GSE42679 and analysis pipeline published by Singh and colleagues in the Plant Biotechnology Journal (Singh et al., 2013, along with additional analysis for discovery of genes involved in shoot apical meristem (SAM development. Our data provide a resource for exploring the complex molecular mechanisms underlying SAM and flower development and identification of gene targets for functional and applied genomics in legumes.
Velarde, Gissel; Meredith, David; Weyde, Tillman
We present a computational method for pattern discovery based on the application of the wavelet transform to symbolic representations of melodies or monophonic voices. We model the importance of a discovered pattern in terms of the compression ratio that can be achieved by using it to describe that...... part of the melody covered by its occurrences. The proposed method resembles that of paradigmatic analysis developed by Ruwet (1966) and Nattiez (1975). In our approach, melodies are represented either as ‘raw’ 1-dimensional pitch signals or as these signals filtered with the continuous wavelet...... transform (CWT) at a single scale using the Haar wavelet. These representations are segmented using various approaches and the segments are then concatenated based on their similarity. The concatenated segments are compared, clustered and ranked. The method was evaluated on two musicological tasks...
OhEigeartaigh, Sean S
Abstract Background In standard BLAST searches, no information other than the sequences of the query and the database entries is considered. However, in situations where two genes from different species have only borderline similarity in a BLAST search, the discovery that the genes are located within a region of conserved gene order (synteny) can provide additional evidence that they are orthologs. Thus, for interpreting borderline search results, it would be useful to know whether the syntenic context of a database hit is similar to that of the query. This principle has often been used in investigations of particular genes or genomic regions, but to our knowledge it has never been implemented systematically. Results We made use of the synteny information contained in the Yeast Gene Order Browser database for 11 yeast species to carry out a systematic search for protein-coding genes that were overlooked in the original annotations of one or more yeast genomes but which are syntenic with their orthologs. Such genes tend to have been overlooked because they are short, highly divergent, or contain introns. The key features of our software - called SearchDOGS - are that the database entries are classified into sets of genomic segments that are already known to be orthologous, and that very weak BLAST hits are retained for further analysis if their genomic location is similar to that of the query. Using SearchDOGS we identified 595 additional protein-coding genes among the 11 yeast species, including two new genes in Saccharomyces cerevisiae. We found additional genes for the mating pheromone a-factor in six species including Kluyveromyces lactis. Conclusions SearchDOGS has proven highly successful for identifying overlooked genes in the yeast genomes. We anticipate that our approach can be adapted for study of further groups of species, such as bacterial genomes. More generally, the concept of doing sequence similarity searches against databases to which external
Pankratius, V.; Gowanlock, M.; Blair, D. M.
Ionospheric total electron content maps derived from global networks of dual-frequency GPS receivers can reveal a plethora of ionospheric features in real-time and are key to space weather studies and natural hazard monitoring. However, growing data volumes from expanding sensor networks are making manual exploratory studies challenging. As the community is heading towards Big Data ionospheric science, automation and Computer-Aided Discovery become indispensable tools for scientists. One problem of machine learning methods is that they require domain-specific adaptations in order to be effective and useful for scientists. Addressing this problem, our Computer-Aided Discovery approach allows scientists to express various physical models as well as perturbation ranges for parameters. The search space is explored through an automated system and parallel processing of batched workloads, which finds corresponding matches and similarities in empirical data. We discuss density-based clustering as a particular method we employ in this process. Specifically, we adapt Density-Based Spatial Clustering of Applications with Noise (DBSCAN). This algorithm groups geospatial data points based on density. Clusters of points can be of arbitrary shape, and the number of clusters is not predetermined by the algorithm; only two input parameters need to be specified: (1) a distance threshold, (2) a minimum number of points within that threshold. We discuss an implementation of DBSCAN for batched workloads that is amenable to parallelization on manycore architectures such as Intel's Xeon Phi accelerator with 60+ general-purpose cores. This manycore parallelization can cluster large volumes of ionospheric total electronic content data quickly. Potential applications for cluster detection include the visualization, tracing, and examination of traveling ionospheric disturbances or other propagating phenomena. Acknowledgments. We acknowledge support from NSF ACI-1442997 (PI V. Pankratius).
Full Text Available Abstract Background The Signal-to-Noise-Ratio (SNR is often used for identification of biomarkers for two-class problems and no formal and useful generalization of SNR is available for multiclass problems. We propose innovative generalizations of SNR for multiclass cancer discrimination through introduction of two indices, Gene Dominant Index and Gene Dormant Index (GDIs. These two indices lead to the concepts of dominant and dormant genes with biological significance. We use these indices to develop methodologies for discovery of dominant and dormant biomarkers with interesting biological significance. The dominancy and dormancy of the identified biomarkers and their excellent discriminating power are also demonstrated pictorially using the scatterplot of individual gene and 2-D Sammon's projection of the selected set of genes. Using information from the literature we have shown that the GDI based method can identify dominant and dormant genes that play significant roles in cancer biology. These biomarkers are also used to design diagnostic prediction systems. Results and discussion To evaluate the effectiveness of the GDIs, we have used four multiclass cancer data sets (Small Round Blue Cell Tumors, Leukemia, Central Nervous System Tumors, and Lung Cancer. For each data set we demonstrate that the new indices can find biologically meaningful genes that can act as biomarkers. We then use six machine learning tools, Nearest Neighbor Classifier (NNC, Nearest Mean Classifier (NMC, Support Vector Machine (SVM classifier with linear kernel, and SVM classifier with Gaussian kernel, where both SVMs are used in conjunction with one-vs-all (OVA and one-vs-one (OVO strategies. We found GDIs to be very effective in identifying biomarkers with strong class specific signatures. With all six tools and for all data sets we could achieve better or comparable prediction accuracies usually with fewer marker genes than results reported in the literature using the
Full Text Available Genes involved in the same function tend to have similar evolutionary histories, in that their rates of evolution covary over time. This coevolutionary signature, termed Evolutionary Rate Covariation (ERC, is calculated using only gene sequences from a set of closely related species and has demonstrated potential as a computational tool for inferring functional relationships between genes. To further define applications of ERC, we first established that roughly 55% of genetic diseases posses an ERC signature between their contributing genes. At a false discovery rate of 5% we report 40 such diseases including cancers, developmental disorders and mitochondrial diseases. Given these coevolutionary signatures between disease genes, we then assessed ERC's ability to prioritize known disease genes out of a list of unrelated candidates. We found that in the presence of an ERC signature, the true disease gene is effectively prioritized to the top 6% of candidates on average. We then apply this strategy to a melanoma-associated region on chromosome 1 and identify MCL1 as a potential causative gene. Furthermore, to gain global insight into disease mechanisms, we used ERC to predict molecular connections between 310 nominally distinct diseases. The resulting "disease map" network associates several diseases with related pathogenic mechanisms and unveils many novel relationships between clinically distinct diseases, such as between Hirschsprung's disease and melanoma. Taken together, these results demonstrate the utility of molecular evolution as a gene discovery platform and show that evolutionary signatures can be used to build informative gene-based networks.
Tam, GHF; Hung, YS; Chang, C.
Evaluation of gene regulatory network (GRN) discovery methods relies heavily on synthetic time series. However, synthetic data generated by traditional method deviate a lot from real data, making such evaluation questionable. Guiding by decaying sinusoids, we propose a new method that generates synthetic data resembling human (HeLa) cell-cycle gene expression data. Using the new synthetic data, a simple comparison between four GRN discovery methods reveals that Granger causality (GC) methods ...
Lee H. Pratt
Full Text Available The rapidly increasing rate at which biological data is being produced requires a corresponding growth in relational databases and associated tools that can help laboratories contend with that data. With this need in mind, we describe here a Modular Approach to a Genomic, Integrated and Comprehensive (MAGIC Database. This Oracle 9i database derives from an initial focus in our laboratory on gene discovery via production and analysis of expressed sequence tags (ESTs, and subsequently on gene expression as assessed by both EST clustering and microarrays. The MAGIC Gene Discovery portion of the database focuses on information derived from DNA sequences and on its biological relevance. In addition to MAGIC SEQ-LIMS, which is designed to support activities in the laboratory, it contains several additional subschemas. The latter include MAGIC Admin for database administration, MAGIC Sequence for sequence processing as well as sequence and clone attributes, MAGIC Cluster for the results of EST clustering, MAGIC Polymorphism in support of microsatellite and single-nucleotide-polymorphism discovery, and MAGIC Annotation for electronic annotation by BLAST and BLAT. The MAGIC Microarray portion is a MIAME-compliant database with two components at present. These are MAGIC Array-LIMS, which makes possible remote entry of all information into the database, and MAGIC Array Analysis, which provides data mining and visualization. Because all aspects of interaction with the MAGIC Database are via a web browser, it is ideally suited not only for individual research laboratories but also for core facilities that serve clients at any distance.
Full Text Available BACKGROUND: Constructing coexpression networks and performing network analysis using large-scale gene expression data sets is an effective way to uncover new biological knowledge; however, the methods used for gene association in constructing these coexpression networks have not been thoroughly evaluated. Since different methods lead to structurally different coexpression networks and provide different information, selecting the optimal gene association method is critical. METHODS AND RESULTS: In this study, we compared eight gene association methods - Spearman rank correlation, Weighted Rank Correlation, Kendall, Hoeffding's D measure, Theil-Sen, Rank Theil-Sen, Distance Covariance, and Pearson - and focused on their true knowledge discovery rates in associating pathway genes and construction coordination networks of regulatory genes. We also examined the behaviors of different methods to microarray data with different properties, and whether the biological processes affect the efficiency of different methods. CONCLUSIONS: We found that the Spearman, Hoeffding and Kendall methods are effective in identifying coexpressed pathway genes, whereas the Theil-sen, Rank Theil-Sen, Spearman, and Weighted Rank methods perform well in identifying coordinated transcription factors that control the same biological processes and traits. Surprisingly, the widely used Pearson method is generally less efficient, and so is the Distance Covariance method that can find gene pairs of multiple relationships. Some analyses we did clearly show Pearson and Distance Covariance methods have distinct behaviors as compared to all other six methods. The efficiencies of different methods vary with the data properties to some degree and are largely contingent upon the biological processes, which necessitates the pre-analysis to identify the best performing method for gene association and coexpression network construction.
Yang Mark CK
Full Text Available Abstract Background A large number of genes usually show differential expressions in a microarray experiment with two types of tissues, and the p-values of a proper statistical test are often used to quantify the significance of these differences. The genes with small p-values are then picked as the genes responsible for the differences in the tissue RNA expressions. One key question is what should be the threshold to consider the p-values small. There is always a trade off between this threshold and the rate of false claims. Recent statistical literature shows that the false discovery rate (FDR criterion is a powerful and reasonable criterion to pick those genes with differential expression. Moreover, the power of detection can be increased by knowing the number of non-differential expression genes. While this number is unknown in practice, there are methods to estimate it from data. The purpose of this paper is to present a new method of estimating this number and use it for the FDR procedure construction. Results A combination of test functions is used to estimate the number of differentially expressed genes. Simulation study shows that the proposed method has a higher power to detect these genes than other existing methods, while still keeping the FDR under control. The improvement can be substantial if the proportion of true differentially expressed genes is large. This procedure has also been tested with good results using a real dataset. Conclusion For a given expected FDR, the method proposed in this paper has better power to pick genes that show differentiation in their expression than two other well known methods.
Full Text Available Biomarker discovery can identify molecular markers in various cancers that can be used for detection, screening, diagnosis, and monitoring of disease progression. Lectin-affinity is a technique that can be used for the enrichment of glycoproteins from a complex sample, facilitating the discovery of novel cancer biomarkers associated with a disease state.
Rad, Roland; Rad, Lena; Wang, Wei; Cadinanos, Juan; Vassiliou, George; Rice, Stephen; Campos, Lia S; Yusa, Kosuke; Banerjee, Ruby; Li, Meng Amy; de la Rosa, Jorge; Strong, Alexander; Lu, Dong; Ellis, Peter; Conte, Nathalie; Yang, Fang Tang; Liu, Pentao; Bradley, Allan
Transposons are mobile DNA segments that can disrupt gene function by inserting in or near genes. Here, we show that insertional mutagenesis by the PiggyBac transposon can be used for cancer gene discovery in mice. PiggyBac transposition in genetically engineered transposon-transposase mice induced cancers whose type (hematopoietic versus solid) and latency were dependent on the regulatory elements introduced into transposons. Analysis of 63 hematopoietic tumors revealed that PiggyBac is capable of genome-wide mutagenesis. The PiggyBac screen uncovered many cancer genes not identified in previous retroviral or Sleeping Beauty transposon screens, including Spic, which encodes a PU.1-related transcription factor, and Hdac7, a histone deacetylase gene. PiggyBac and Sleeping Beauty have different integration preferences. To maximize the utility of the tool, we engineered 21 mouse lines to be compatible with both transposon systems in constitutive, tissue- or temporal-specific mutagenesis. Mice with different transposon types, copy numbers, and chromosomal locations support wide applicability. PMID:20947725
Lee, Jihoon; Jeon, Seungwoo; Kim, Jaehoon
Multi-hop Wireless LAN-based mesh network (WMN) provides high capacity and self-configuring capabilities. Due to data forwarding and path selection based on MAC address, WMN requires additional operations to achieve global connectivity using IPv6 address. The neighbor discovery operation over WLAN mesh networks requires repeated all-node broadcasting and this gives rise to a big burden in the entire mesh networks. In this letter, we propose the proxy neighbor discovery scheme for optimized IPv6 communication over WMN to reduce network overhead and communication latency. Using simulation experiments, we show that the control overhead and communication setup latency can be significantly reduced using the proxy-based neighbor discovery mechanism.
Full Text Available Prioritising candidate genes for further experimental characterisation is a non-trivial challenge in drug discovery and biomedical research in general. An integrated approach that combines results from multiple data types is best suited for optimal target selection. We developed TargetMine, a data warehouse for efficient target prioritisation. TargetMine utilises the InterMine framework, with new data models such as protein-DNA interactions integrated in a novel way. It enables complicated searches that are difficult to perform with existing tools and it also offers integration of custom annotations and in-house experimental data. We proposed an objective protocol for target prioritisation using TargetMine and set up a benchmarking procedure to evaluate its performance. The results show that the protocol can identify known disease-associated genes with high precision and coverage. A demonstration version of TargetMine is available at http://targetmine.nibio.go.jp/.
Desbois, Andrew P
Drug discovery from marine organisms has been underway for > 60 years and there have been notable successes in discovering, developing and introducing clinical agents derived from marine sources. Such examples include: the analgesic ziconotide and the anti cancer compound trabectedin. However, in light of the pressing need for new drugs, particularly those with anti-infective and anticancer properties, there is strong justification for increased exploration of marine organisms as sources of novel compounds. This article considers approaches that might enhance our chances of delivering new medicines from marine-based drug discovery efforts. Consideration is given to the organisms and habitats deserving of more attention and how we might make best use of these marine genetic resources. In particular, the opportunities offered by synthetic biology are highlighted because these methods allow drug discoverers to explore pathways in 'non-culturable' species and turn on natural product biosynthesis genes that are difficult to activate under laboratory conditions (so-called 'silent' gene clusters). PMID:24909595
Nandini S Sidnal; Manvi, Sunilkumar S
Internet enabled auctions are one of the popular application which basically require a web service discovery mechanism that is efficient in all perspectives. This paper focuses on auction service discovery and building repository of services for the use of E-customers. The auction service directory (repository) is developed based on the customerâ€™s desires. Agent based Belief Desire Intention (BDI) architecture is used in this model, not only to support the service discovery process in spott...
Yetisgen-Yildiz, Meliha; Pratt, Wanda
While medical researchers formulate new hypotheses to test, they need to identify connections to their work from other parts of the medical literature. However, the current volume of information has become a great barrier for this task. Recently, many literature-based discovery (LBD) systems have been developed to help researchers identify new knowledge that bridges gaps across distinct sections of the medical literature. Each LBD system uses different methods for mining the connections from text and ranking the identified connections, but none of the currently available LBD evaluation approaches can be used to compare the effectiveness of these methods. In this paper, we present an evaluation methodology for LBD systems that allows comparisons across different systems. We demonstrate the abilities of our evaluation methodology by using it to compare the performance of different correlation-mining and ranking approaches used by existing LBD systems. This evaluation methodology should help other researchers compare approaches, make informed algorithm choices, and ultimately help to improve the performance of LBD systems overall. PMID:19124086
Sparks, Thomas C; Crouse, Gary D; Dripps, James E; Anzeveno, Peter; Martynow, Jacek; Deamicis, Carl V; Gifford, James
Improvements in the efficacy and spectrum of the spinosyns, novel fermentation derived insecticide, has long been a goal within Dow AgroSciences. As large and complex fermentation products identifying specific modifications to the spinosyns likely to result in improved activity was a difficult process, since most modifications decreased the activity. A variety of approaches were investigated to identify new synthetic directions for the spinosyn chemistry including several explorations of the quantitative structure activity relationships (QSAR) of spinosyns, which initially were unsuccessful. However, application of artificial neural networks (ANN) to the spinosyn QSAR problem identified new directions for improved activity in the chemistry, which subsequent synthesis and testing confirmed. The ANN-based analogs coupled with other information on substitution effects resulting from spinosyn structure activity relationships lead to the discovery of spinetoram (XDE-175). Launched in late 2007, spinetoram provides both improved efficacy and an expanded spectrum while maintaining the exceptional environmental and toxicological profile already established for the spinosyn chemistry. PMID:18344004
Judith R Denery
Full Text Available BACKGROUND: Development of robust, sensitive, and reproducible diagnostic tests for understanding the epidemiology of neglected tropical diseases is an integral aspect of the success of worldwide control and elimination programs. In the treatment of onchocerciasis, clinical diagnostics that can function in an elimination scenario are non-existent and desperately needed. Due to its sensitivity and quantitative reproducibility, liquid chromatography-mass spectrometry (LC-MS based metabolomics is a powerful approach to this problem. METHODOLOGY/PRINCIPAL FINDINGS: Analysis of an African sample set comprised of 73 serum and plasma samples revealed a set of 14 biomarkers that showed excellent discrimination between Onchocerca volvulus-positive and negative individuals by multivariate statistical analysis. Application of this biomarker set to an additional sample set from onchocerciasis endemic areas where long-term ivermectin treatment has been successful revealed that the biomarker set may also distinguish individuals with worms of compromised viability from those with active infection. Machine learning extended the utility of the biomarker set from a complex multivariate analysis to a binary format applicable for adaptation to a field-based diagnostic, validating the use of complex data mining tools applied to infectious disease biomarker discovery and diagnostic development. CONCLUSIONS/SIGNIFICANCE: An LC-MS metabolomics-based diagnostic has the potential to monitor the progression of onchocerciasis in both endemic and non-endemic geographic areas, as well as provide an essential tool to multinational programs in the ongoing fight against this neglected tropical disease. Ultimately this technology can be expanded for the diagnosis of other filarial and/or neglected tropical diseases.
Full Text Available Abstract Background Identification of drug-drug and drug-diseases interactions can pose a difficult problem to cope with, as the increasingly large number of available drugs coupled with the ongoing research activities in the pharmaceutical domain, make the task of discovering relevant information difficult. Although international standards, such as the ICD-10 classification and the UNII registration, have been developed in order to enable efficient knowledge sharing, medical staff needs to be constantly updated in order to effectively discover drug interactions before prescription. The use of Semantic Web technologies has been proposed in earlier works, in order to tackle this problem. Results This work presents a semantic-enabled online service, named GalenOWL, capable of offering real time drug-drug and drug-diseases interaction discovery. For enabling this kind of service, medical information and terminology had to be translated to ontological terms and be appropriately coupled with medical knowledge of the field. International standards such as the aforementioned ICD-10 and UNII, provide the backbone of the common representation of medical data, while the medical knowledge of drug interactions is represented by a rule base which makes use of the aforementioned standards. Details of the system architecture are presented while also giving an outline of the difficulties that had to be overcome. A comparison of the developed ontology-based system with a similar system developed using a traditional business logic rule engine is performed, giving insights on the advantages and drawbacks of both implementations. Conclusions The use of Semantic Web technologies has been found to be a good match for developing drug recommendation systems. Ontologies can effectively encapsulate medical knowledge and rule-based reasoning can capture and encode the drug interactions knowledge.
Full Text Available Web Services standard has been broadly acknowledged by industries and academic researches along with the progress of web technology and e-business. Increasing number of web applications have been bundled as web services that can be published, positioned and invoked across the web. The importance of the issues regarding their publication and innovation attains a maximum as web services multiply and become more advanced and mutually dependent. With the intension of determining the web services through effiective manner with in the minimum time period in this study proposes an UDDI with intelligent serach engine. In order to publishing and discovering web services initially, the web services are published in the UDDI registry subsequently the published web services are indexed. To improve the efficiency of discovery of web services, the indexed web services are saved as index database. The search query is compared with the index database for discovering of web services and the discovered web services are given to the service customer. The way of accessing the web services is stored in a log file, which is then utilized to provide personalized web services to the user. The finding of web service is enhanced significantly by means of an efficient exploring capability provided by the proposed system and it is accomplished of providing the maximum appropriate web service. Universal Description, Discovery and Integration (UDDI.
Meir, Yaa-Jyuhn J; Weirauch, Matthew T.; Yang, Herng-Shing; Chung, Pei-Cheng; Yu, Robert K.; Wu, Sareina C-Y
Background DNA transposons have emerged as indispensible tools for manipulating vertebrate genomes with applications ranging from insertional mutagenesis and transgenesis to gene therapy. To fully explore the potential of two highly active DNA transposons, piggyBac and Tol2, as mammalian genetic tools, we have conducted a side-by-side comparison of the two transposon systems in the same setting to evaluate their advantages and disadvantages for use in gene therapy and gene discovery. Results ...
Amrine, Katherine C H; Blanco-Ulate, Barbara; Cantu, Dario
Intricate signal networks and transcriptional regulators translate the recognition of pathogens into defense responses. In this study, we carried out a gene co-expression analysis of all currently publicly available microarray data, which were generated in experiments that studied the interaction of the model plant Arabidopsis thaliana with microbial pathogens. This work was conducted to identify (i) modules of functionally related co-expressed genes that are differentially expressed in response to multiple biotic stresses, and (ii) hub genes that may function as core regulators of disease responses. Using Weighted Gene Co-expression Network Analysis (WGCNA) we constructed an undirected network leveraging a rich curated expression dataset comprising 272 microarrays that involved microbial infections of Arabidopsis plants with a wide array of fungal and bacterial pathogens with biotrophic, hemibiotrophic, and necrotrophic lifestyles. WGCNA produced a network with scale-free and small-world properties composed of 205 distinct clusters of co-expressed genes. Modules of functionally related co-expressed genes that are differentially regulated in response to multiple pathogens were identified by integrating differential gene expression testing with functional enrichment analyses of gene ontology terms, known disease associated genes, transcriptional regulators, and cis-regulatory elements. The significance of functional enrichments was validated by comparisons with randomly generated networks. Network topology was then analyzed to identify intra- and inter-modular gene hubs. Based on high connectivity, and centrality in meta-modules that are clearly enriched in defense responses, we propose a list of 66 target genes for reverse genetic experiments to further dissect the Arabidopsis immune system. Our results show that statistical-based data trimming prior to network analysis allows the integration of expression datasets generated by different groups, under different
Katherine C H Amrine
Full Text Available Intricate signal networks and transcriptional regulators translate the recognition of pathogens into defense responses. In this study, we carried out a gene co-expression analysis of all currently publicly available microarray data, which were generated in experiments that studied the interaction of the model plant Arabidopsis thaliana with microbial pathogens. This work was conducted to identify (i modules of functionally related co-expressed genes that are differentially expressed in response to multiple biotic stresses, and (ii hub genes that may function as core regulators of disease responses. Using Weighted Gene Co-expression Network Analysis (WGCNA we constructed an undirected network leveraging a rich curated expression dataset comprising 272 microarrays that involved microbial infections of Arabidopsis plants with a wide array of fungal and bacterial pathogens with biotrophic, hemibiotrophic, and necrotrophic lifestyles. WGCNA produced a network with scale-free and small-world properties composed of 205 distinct clusters of co-expressed genes. Modules of functionally related co-expressed genes that are differentially regulated in response to multiple pathogens were identified by integrating differential gene expression testing with functional enrichment analyses of gene ontology terms, known disease associated genes, transcriptional regulators, and cis-regulatory elements. The significance of functional enrichments was validated by comparisons with randomly generated networks. Network topology was then analyzed to identify intra- and inter-modular gene hubs. Based on high connectivity, and centrality in meta-modules that are clearly enriched in defense responses, we propose a list of 66 target genes for reverse genetic experiments to further dissect the Arabidopsis immune system. Our results show that statistical-based data trimming prior to network analysis allows the integration of expression datasets generated by different groups
Cirulli, Elizabeth T.
In recent years, genome and exome sequencing studies have implicated a plethora of new disease genes with rare causal variants. Here, I review 150 exome sequencing studies that claim to have discovered that a disease can be caused by different rare variants in the same gene, and I determine whether their methods followed the current best-practice guidelines in the interpretation of their data. Specifically, I assess whether studies appropriately assess controls for rare variants throughout the entire gene or implicated region as opposed to only investigating the specific rare variants identified in the cases, and I assess whether studies present sufficient co-segregation data for statistically significant linkage. I find that the proportion of studies performing gene-based analyses has increased with time, but that even in 2015 fewer than 40% of the reviewed studies used this method, and only 10% presented statistically significant co-segregation data. Furthermore, I find that the genes reported in these papers are explaining a decreasing proportion of cases as the field moves past most of the low-hanging fruit, with 50% of the genes from studies in 2014 and 2015 having variants in fewer than 5% of cases. As more studies focus on genes explaining relatively few cases, the importance of performing appropriate gene-based analyses is increasing. It is becoming increasingly important for journal editors and reviewers to require stringent gene-based evidence to avoid an avalanche of misleading disease gene discovery papers. PMID:27055023
Full Text Available Big geospatial data are archived and made available through online web discovery and access. However, finding the right data for scientific research and application development is still a challenge. This paper aims to improve the data discovery by mining the user knowledge from log files. Specifically, user web session reconstruction is focused upon in this paper as a critical step for extracting usage patterns. However, reconstructing user sessions from raw web logs has always been difficult, as a session identifier tends to be missing in most data portals. To address this problem, we propose two session identification methods, including time-clustering-based and time-referrer-based methods. We also present the workflow of session reconstruction and discuss the approach of selecting appropriate thresholds for relevant steps in the workflow. The proposed session identification methods and workflow are proven to be able to extract data access patterns for further pattern analyses of user behavior and improvement of data discovery for more relevancy data ranking, suggestion, and navigation.
candidate genes and their expression profile showed predominance in specific stress-challenged libraries. Conclusion Generated set of chickpea ESTs serves as a resource of high quality transcripts for gene discovery and development of functional markers associated with abiotic stress tolerance that will be helpful to facilitate chickpea breeding. Mapping of gene-based markers in chickpea will also add more anchoring points to align genomes of chickpea and other legume species.
Cruz-Morales, Pablo; Kopp, Johannes Florian; Martínez-Guerrero, Christian; Yáñez-Guerra, Luis Alfonso; Selem-Mojica, Nelly; Ramos-Aboites, Hilda; Feldmann, Jörg; Barona-Gómez, Francisco
Natural products from microbes have provided humans with beneficial antibiotics for millennia. However, a decline in the pace of antibiotic discovery exerts pressure on human health as antibiotic resistance spreads, a challenge that may better faced by unveiling chemical diversity produced by microbes. Current microbial genome mining approaches have revitalized research into antibiotics, but the empirical nature of these methods limits the chemical space that is explored. Here, we address the problem of finding novel pathways by incorporating evolutionary principles into genome mining. We recapitulated the evolutionary history of twenty-three enzyme families previously uninvestigated in the context of natural product biosynthesis in Actinobacteria, the most proficient producers of natural products. Our genome evolutionary analyses where based on the assumption that expanded—repurposed enzyme families—from central metabolism, occur frequently and thus have the potential to catalyze new conversions in the context of natural products biosynthesis. Our analyses led to the discovery of biosynthetic gene clusters coding for hidden chemical diversity, as validated by comparing our predictions with those from state-of-the-art genome mining tools; as well as experimentally demonstrating the existence of a biosynthetic pathway for arseno-organic metabolites in Streptomyces coelicolor and Streptomyces lividans, Using a gene knockout and metabolite profile combined strategy. As our approach does not rely solely on sequence similarity searches of previously identified biosynthetic enzymes, these results establish the basis for the development of an evolutionary-driven genome mining tool termed EvoMining that complements current platforms. We anticipate that by doing so real ‘chemical dark matter’ will be unveiled. PMID:27289100
Cruz-Morales, Pablo; Kopp, Johannes Florian; Martínez-Guerrero, Christian; Yáñez-Guerra, Luis Alfonso; Selem-Mojica, Nelly; Ramos-Aboites, Hilda; Feldmann, Jörg; Barona-Gómez, Francisco
Natural products from microbes have provided humans with beneficial antibiotics for millennia. However, a decline in the pace of antibiotic discovery exerts pressure on human health as antibiotic resistance spreads, a challenge that may better faced by unveiling chemical diversity produced by microbes. Current microbial genome mining approaches have revitalized research into antibiotics, but the empirical nature of these methods limits the chemical space that is explored.Here, we address the problem of finding novel pathways by incorporating evolutionary principles into genome mining. We recapitulated the evolutionary history of twenty-three enzyme families previously uninvestigated in the context of natural product biosynthesis in Actinobacteria, the most proficient producers of natural products. Our genome evolutionary analyses where based on the assumption that expanded-repurposed enzyme families-from central metabolism, occur frequently and thus have the potential to catalyze new conversions in the context of natural products biosynthesis. Our analyses led to the discovery of biosynthetic gene clusters coding for hidden chemical diversity, as validated by comparing our predictions with those from state-of-the-art genome mining tools; as well as experimentally demonstrating the existence of a biosynthetic pathway for arseno-organic metabolites in Streptomyces coelicolor and Streptomyces lividans, Using a gene knockout and metabolite profile combined strategy.As our approach does not rely solely on sequence similarity searches of previously identified biosynthetic enzymes, these results establish the basis for the development of an evolutionary-driven genome mining tool termed EvoMining that complements current platforms. We anticipate that by doing so real 'chemical dark matter' will be unveiled. PMID:27289100
Khan Shafiq A
Full Text Available Abstract Background Complete genome annotation will likely be achieved through a combination of computer-based analysis of available genome sequences combined with direct experimental characterization of expressed regions of individual genomes. We have utilized a comparative genomics approach involving the sequencing of randomly selected hamster testis cDNAs to begin to identify genes not previously annotated on the human, mouse, rat and Fugu (pufferfish genomes. Results 735 distinct sequences were analyzed for their relatedness to known sequences in public databases. Eight of these sequences were derived from previously unidentified genes and expression of these genes in testis was confirmed by Northern blotting. The genomic locations of each sequence were mapped in human, mouse, rat and pufferfish, where applicable, and the structure of their cognate genes was derived using computer-based predictions, genomic comparisons and analysis of uncharacterized cDNA sequences from human and macaque. Conclusion The use of a comparative genomics approach resulted in the identification of eight cDNAs that correspond to previously uncharacterized genes in the human genome. The proteins encoded by these genes included a new member of the kinesin superfamily, a SET/MYND-domain protein, and six proteins for which no specific function could be predicted. Each gene was expressed primarily in testis, suggesting that they may play roles in the development and/or function of testicular cells.
Sabita Barik; Debahuti Mishra; Shruti Mishra; Sandeep Ku. Satapathy; Amiya Ku. Rath; Milu Acharya
Abstract- The goal of microarray experiments is to identify genes that are differentially transcribed with respect to different biological conditions of cell cultures and samples. Hence, method of data analysis needs to be carefully evaluated such as clustering, classification, prediction etc. In this paper, we have proposed an efficient frequent pattern based clustering to find the gene which forms frequent patterns showing similar phenotypes leading to specific symptoms for specific disease...
da Silva Felipe R
Full Text Available Abstract Background Due to its origin, peanut has a very narrow genetic background. Wild relatives can be a source of genetic variability for cultivated peanut. In this study, the transcriptome of the wild species Arachis stenosperma accession V10309 was analyzed. Results ESTs were produced from four cDNA libraries of RNAs extracted from leaves and roots of A. stenosperma. Randomly selected cDNA clones were sequenced to generate 8,785 ESTs, of which 6,264 (71.3% had high quality, with 3,500 clusters: 963 contigs and 2537 singlets. Only 55.9% matched homologous sequences of known genes. ESTs were classified into 23 different categories according to putative protein functions. Numerous sequences related to disease resistance, drought tolerance and human health were identified. Two hundred and six microsatellites were found and markers have been developed for 188 of these. The microsatellite profile was analyzed and compared to other transcribed and genomic sequence data. Conclusion This is, to date, the first report on the analysis of transcriptome of a wild relative of peanut. The ESTs produced in this study are a valuable resource for gene discovery, the characterization of new wild alleles, and for marker development. The ESTs were released in the [GenBank:EH041934 to EH048197].
Full Text Available To detect the wireless network equipment whether there is protocol vulnerability, using the method of modular design and implementation of a new suitable for Wi-Fi protocol vulnerability discovery fuzzy test framework. It can be independent of its transmission medium, produce deformity packet and implementation of the attack on the target system. The author firstly describes the wireless network protocol vulnerability discovery and fuzzy test in this paper，then focused on the test frame technical scheme, detailed technical realization and so on, and its application are analyzed. In the experimental stage the fuzzy test is applied to a wireless networks gateway, the test results show that the fuzzy test framework can be well applied to the wireless network equipment agreement loophole mining work.
Li, Xu; Mitton, Nathalie; Simplot-Ryl, David
International audience Hello protocol is the basic technique for neighborhood discovery in wireless ad hoc networks. It requires nodes to claim their existence/ aliveness by periodic 'hello' messages. Central to a hello protocol is the determination of 'hello' message transmission rate. No fixed optimal rate exists in the presence of node mobility. The rate should in fact adapt to it, high for high mobility and low for low mobility. In this paper, we propose a novel mobility prediction bas...
Le Sommer, Nicolas; Ben Sassi, Salma
Opportunistic networks are usually formed spontaneously by mobile devices equipped with short range wireless communication interfaces. Designing and implementing a routing protocol to support both service discovery and delivery in such kinds of networks is a challenging problem on account of frequent disconnections and topology changes. In these networks one of the most important issues relies on the selection of the best intermediate node(s) to forward the messages towards their destination(...
Tsai, Chen-An; Hsueh, Huey-miin; Chen, James J
Testing for significance with gene expression data from DNA microarray experiments involves simultaneous comparisons of hundreds or thousands of genes. If R denotes the number of rejections (declared significant genes) and V denotes the number of false rejections, then V/R, if R > 0, is the proportion of false rejected hypotheses. This paper proposes a model for the distribution of the number of rejections and the conditional distribution of V given R, V / R. Under the independence assumption, the distribution of R is a convolution of two binomials and the distribution of V / R has a noncentral hypergeometric distribution. Under an equicorrelated model, the distributions are more complex and are also derived. Five false discovery rate probability error measures are considered: FDR = E(V/R), pFDR = E(V/R / R > 0) (positive FDR), cFDR = E(V/R / R = r) (conditional FDR), mFDR = E(V)/E(R) (marginal FDR), and eFDR = E(V)/r (empirical FDR). The pFDR, cFDR, and mFDR are shown to be equivalent under the Bayesian framework, in which the number of true null hypotheses is modeled as a random variable. We present a parametric and a bootstrap procedure to estimate the FDRs. Monte Carlo simulations were conducted to evaluate the performance of these two methods. The bootstrap procedure appears to perform reasonably well, even when the alternative hypotheses are correlated (rho = .25). An example from a toxicogenomic microarray experiment is presented for illustration. PMID:14969487
Aggarwal, Amit; Guo, Dong Li; Hoshida, Yujin; Yuen, Siu Tsan; Chu, Kent-Man; So, Samuel; Boussioutas, Alex; Chen, Xin; Bowtell, David; Aburatani, Hiroyuki; Leung, Suet Yi; Tan, Patrick
Gastric cancer is a leading cause of global cancer mortality, but comparatively little is known about the cellular pathways regulating different aspects of the gastric cancer phenotype. To achieve a better understanding of gastric cancer at the levels of systems topology, functional modules, and constituent genes, we assembled and systematically analyzed a consensus gene coexpression meta-network of gastric cancer incorporating >300 tissue samples from four independent patient populations (the "gastrome"). We find that the gastrome exhibits a hierarchical scale-free architecture, with an internal structure comprising multiple deeply embedded modules associated with diverse cellular functions. Individual modules display distinct subtopologies, with some (cellular proliferation) being integrated within the primary network, and others (ribosomal biosynthesis) being relatively isolated. One module associated with intestinal differentiation exhibited a remarkably high degree of autonomy, raising the possibility that its specific topological features may contribute towards the frequent occurrence of intestinal metaplasia in gastric cancer. At the single-gene level, we discovered a novel conserved interaction between the PLA2G2A prognostic marker and the EphB2 receptor, and used tissue microarrays to validate the PLA2G2A/EphB2 association. Finally, because EphB2 is a known target of the Wnt signaling pathway, we tested and provide evidence that the Wnt pathway may also similarly regulate PLA2G2A. Many of these findings were not discernible by studying the single patient populations in isolation. Thus, besides enhancing our knowledge of gastric cancer, our results show the broad utility of applying meta-analytic approaches to genome-wide data for the purposes of biological discovery. PMID:16397236
Dang, Thu Thuy T; Onoyovwi, Akpevwe; Farrow, Scott C; Facchini, Peter J
Benzylisoquinoline alkaloids (BIAs) are a large, diverse group of ∼2500 specialized plant metabolites. Many BIAs display potent pharmacological activities, including the narcotic analgesics codeine and morphine, the vasodilator papaverine, the cough suppressant and potential anticancer drug noscapine, the antimicrobial agents sanguinarine and berberine, and the muscle relaxant (+)-tubocurarine. Opium poppy remains the sole commercial source for codeine, morphine, and a variety of semisynthetic drugs, including oxycodone and buprenorphine, derived primarily from the biosynthetic pathway intermediate thebaine. Recent advances in transcriptomics, proteomics, and metabolomics have created unprecedented opportunities for isolating and characterizing novel BIA biosynthetic genes. Here, we describe the application of next-generation sequencing and cDNA microarrays for selecting gene candidates based on comparative transcriptome analysis. We outline the basic mass spectrometric techniques to perform deep proteome and targeted metabolite analyses on BIA-producing plant tissues and provide methodologies for functionally characterizing biosynthetic gene candidates through in vitro enzyme assays and transient gene silencing in planta. PMID:22999177
Hawkridge, Adam M.; Muddiman, David C.
Biomarker discovery and proteomics have become synonymous with mass spectrometry in recent years. Although this conflation is an injustice to the many essential biomolecular techniques widely used in biomarker-discovery platforms, it underscores the power and potential of contemporary mass spectrometry. Numerous novel and powerful technologies have been developed around mass spectrometry, proteomics, and biomarker discovery over the past 20 years to globally study complex proteomes (e.g., plasma). However, very few large-scale longitudinal studies have been carried out using these platforms to establish the analytical variability relative to true biological variability. The purpose of this review is not to cover exhaustively the applications of mass spectrometry to biomarker discovery, but rather to discuss the analytical methods and strategies that have been developed for mass spectrometry-based biomarker-discovery platforms and to place them in the context of the many challenges and opportunities yet to be addressed.
Cassava landraces are the earliest form of the modern cultivars and represents the first step in cassava domestication. Our forward genetic analysis uses this resource to discover spontaneous mutations in the sucrose/starch and carotenoid synthesis/accumulation and to develop both evolutionary and breeding perspective of gene function related to those traits. Biochemical phenotype variants for the synthesis and accumulation of carotenoid, free sugar and starch were identified. Six subtractive cDNA libraries were prepared to construct a high quality (phred > 20) EST database with 1645 entries. Macroarray analysis was performed to identify differentially expressed gene aiming to identify candidate gene related to sugary phenotype. cDNA sequence for gene coding for specific enzymes in the two pathways were obtained. Gene expression analysis for coding specific enzymes was performed by RNA blot and Real Time PCR analysis. Chromoplastassociated proteins of yellow storage root were fractionated and a peptide sequence data base with 906 entries sequences (MASCOT validated) was constructed. For the sucrose/starch metabolism a sugary class of cassava was identified carrying mutation in the BEI and GBSS mutation. For the pigmented cassava a pink color phenotype showed absence of expression of the gene CasLYB while an intense yellow phenotype showed a down regulation of the gene CasHYb. Heat shock proteins were identified as the major proteins associated with chromoplast. Genetic diversity for the GBSS gene in the natural population identified 22 haplotype and a large nucleotide diversity in four subset of population. Single segregating population derived from F2, half sib and S1 population showed segregation for sugary phenotype (93% of the individuals), waxy phenotype (38% of the individuals) and glycogen like starch (2% of the individuals). Here we summarize our current results for the genetic analysis of this variants and recent progress in the direction of mapping of
Full Text Available Pattern genes are a group of genes that have a modularized expression behavior under serial physiological conditions. The identification of pattern genes will provide a path toward a global and dynamic understanding of gene functions and their roles in particular biological processes or events, such as development and pathogenesis. In this study, we present PaGenBase, a novel repository for the collection of tissue- and time-specific pattern genes, including specific genes, selective genes, housekeeping genes and repressed genes. The PaGenBase database is now freely accessible at http://bioinf.xmu.edu.cn/PaGenBase/. In the current version (PaGenBase 1.0, the database contains 906,599 pattern genes derived from the literature or from data mining of more than 1,145,277 gene expression profiles in 1,062 distinct samples collected from 11 model organisms. Four statistical parameters were used to quantitatively evaluate the pattern genes. Moreover, three methods (quick search, advanced search and browse were designed for rapid and customized data retrieval. The potential applications of PaGenBase are also briefly described. In summary, PaGenBase will serve as a resource for the global and dynamic understanding of gene function and will facilitate high-level investigations in a variety of fields, including the study of development, pathogenesis and novel drug discovery.
background correction is preferable, in particular if the gene selection is successful. However, this is an area that needs to be studied further in order to draw any general conclusions. Conclusions The choice of cluster analysis, and in particular gene selection, has a large impact on the ability to cluster individuals correctly based on expression profiles. Normalization has a positive effect, but the relative performance of different normalizations is an area that needs more research. In summary, although clustering, gene selection and normalization are considered standard methods in bioinformatics, our comprehensive analysis shows that selecting the right methods, and the right combinations of methods, is far from trivial and that much is still unexplored in what is considered to be the most basic analysis of genomic data.
Grau, Jan; Keilwagen, Jens; Gohr, André; Paponov, Ivan A; Posch, Stefan; Seifert, Michael; Strickert, Marc; Grosse, Ivo
DNA-binding proteins are a main component of gene regulation as they activate or repress gene expression by binding to specific binding sites in target regions of genomic DNA. However, de-novo discovery of these binding sites in target regions obtained by wet-lab experiments is a challenging problem in computational biology, which has not yet been solved satisfactorily. Here, we present a detailed description and analysis of the de-novo motif discovery tool Dispom, which has been developed for finding binding sites of DNA-binding proteins that are differentially abundant in a set of target regions compared to a set of control regions. Two additional features of Dispom are its capability of modeling positional preferences of binding sites and adjusting the length of the motif in the learning process. Dispom yields an increased prediction accuracy compared to existing tools for de-novo motif discovery, suggesting that the combination of searching for differentially abundant motifs, inferring their positional distributions, and adjusting the motif lengths is beneficial for de-novo motif discovery. When applying Dispom to promoters of auxin-responsive genes and those of ABI3 target genes from Arabidopsis thaliana, we identify relevant binding motifs with pronounced positional distributions. These results suggest that learning motifs, their positional distributions, and their lengths by a discriminative learning principle may aid motif discovery from ChIP-chip and gene expression data. We make Dispom freely available as part of Jstacs, an open-source Java library that is tailored to statistical sequence analysis. To facilitate extensions of Dispom, we describe its implementation using Jstacs in this manuscript. In addition, we provide a stand-alone application of Dispom at http://www.jstacs.de/index.php/Dispom for instant use. PMID:23427988
Bonds, Wesley D., Sr.; Paolella, Mary Jane
A single-semester elective combines Mendelian and molecular genetics in a problem-solving format. Students encounter a genetic disease scenario, construct a family pedigree, and try to confirm their medical diagnoses through laboratory experiences. Encouraged to generate ideas as they test their hypotheses, students realize the importance of data…
Full Text Available Clustering analysis has a growing role in the study of co-expressed genes for gene discovery. Conventional binary and fuzzy clustering do not embrace the biological reality that some genes may be irrelevant for a problem and not be assigned to a cluster, while other genes may participate in several biological functions and should simultaneously belong to multiple clusters. Also, these algorithms cannot generate tight clusters that focus on their cores or wide clusters that overlap and contain all possibly relevant genes. In this paper, a new clustering paradigm is proposed. In this paradigm, all three eventualities of a gene being exclusively assigned to a single cluster, being assigned to multiple clusters, and being not assigned to any cluster are possible. These possibilities are realised through the primary novelty of the introduction of tunable binarization techniques. Results from multiple clustering experiments are aggregated to generate one fuzzy consensus partition matrix (CoPaM, which is then binarized to obtain the final binary partitions. This is referred to as Binarization of Consensus Partition Matrices (Bi-CoPaM. The method has been tested with a set of synthetic datasets and a set of five real yeast cell-cycle datasets. The results demonstrate its validity in generating relevant tight, wide, and complementary clusters that can meet requirements of different gene discovery studies.
Monier, Adam; Sudek, Sebastian; Fast, Naomi M; Worden, Alexandra Z
Inteins are rare, translated genetic parasites mainly found in bacteria and archaea, while spliceosomal introns are distinctly eukaryotic features abundant in most nuclear genomes. Using targeted metagenomics, we discovered an intein in an Atlantic population of the photosynthetic eukaryote, Bathycoccus, harbored by the essential spliceosomal protein PRP8 (processing factor 8 protein). Although previously thought exclusive to fungi, we also identified PRP8 inteins in parasitic (Capsaspora) and predatory (Salpingoeca) protists. Most new PRP8 inteins were at novel insertion sites that, surprisingly, were not in the most conserved regions of the gene. Evolutionarily, Dikarya fungal inteins at PRP8 insertion site a appeared more related to the Bathycoccus intein at a unique insertion site, than to other fungal and opisthokont inteins. Strikingly, independent analyses of Pacific and Atlantic samples revealed an intron at the same codon as the Bathycoccus PRP8 intein. The two elements are mutually exclusive and neither was found in cultured Bathycoccus or other picoprasinophyte genomes. Thus, wild Bathycoccus contain one of few non-fungal eukaryotic inteins known and a rare polymorphic intron. Our data indicate at least two Bathycoccus ecotypes exist, associated respectively with oceanic or mesotrophic environments. We hypothesize that intein propagation is facilitated by marine viruses; and, while intron gain is still poorly understood, presence of a spliceosomal intron where a locus lacks an intein raises the possibility of new, intein-primed mechanisms for intron gain. The discovery of nucleus-encoded inteins and associated sequence polymorphisms in uncultivated marine eukaryotes highlights their diversity and reveals potential sexual boundaries between populations indistinguishable by common marker genes. PMID:23635865
Finlay, William J J; Almagro, Juan C
Antibodies are the fastest-growing segment of the biologics market. The success of antibody-based drugs resides in their exquisite specificity, high potency, stability, solubility, safety, and relatively inexpensive manufacturing process in comparison with other biologics. We outline here the structural studies and fundamental principles that define how antibodies interact with diverse targets. We also describe the antibody repertoires and affinity maturation mechanisms of humans, mice, and chickens, plus the use of novel single-domain antibodies in camelids and sharks. These species all utilize diverse evolutionary solutions to generate specific and high affinity antibodies and illustrate the plasticity of natural antibody repertoires. In addition, we discuss the multiple variations of man-made antibody repertoires designed and validated in the last two decades, which have served as tools to explore how the size, diversity, and composition of a repertoire impact the antibody discovery process. PMID:23162556
Bower, Neil I; Johnston, Ian A
A genomics approach was used to identify nutritionally regulated genes involved in growth of fast skeletal muscle in Atlantic salmon (Salmo salar L.). Forward and reverse subtractive cDNA libraries were prepared comparing fish with zero growth rates to fish growing rapidly. We produced 7,420 ESTs and assembled them into nonredundant clusters prior to annotation. Contigs representing 40 potentially unrecognized nutritionally responsive candidate genes were identified. Twenty-three of the subtractive library candidates were also differentially regulated by nutritional state in an independent fasting-refeeding experiment and their expression placed in the context of 26 genes with established roles in muscle growth regulation. The expression of these genes was also determined during the maturation of a primary myocyte culture, identifying 13 candidates from the subtractive cDNA libraries with putative roles in the myogenic program. During early stages of refeeding DNAJA4, HSPA1B, HSP90A, and CHAC1 expression increased, indicating activation of unfolded protein response pathways. Four genes were considered inhibitory to myogenesis based on their in vivo and in vitro expression profiles (CEBPD, ASB2, HSP30, novel transcript GE623928). Other genes showed increased expression with feeding and highest in vitro expression during the proliferative phase of the culture (FOXD1, DRG1) or as cells differentiated (SMYD1, RTN1, MID1IP1, HSP90A, novel transcript GE617747). The genes identified were associated with chromatin modification (SMYD1, RTN1), microtubule stabilization (MID1IP1), cell cycle regulation (FOXD1, CEBPD, DRG1), and negative regulation of signaling (ASB2) and may play a role in the stimulation of myogenesis during the transition from a catabolic to anabolic state in skeletal muscle. PMID:20663983
Full Text Available Location discovery with uncertainty using passive sensor networks in the nation's power grid is known to be challenging, due to the massive scale and inherent complexity. For bearings-only target localization in passive sensor networks, the approach of fuzzy geometry is introduced to investigate the fuzzy measurability for a moving target in R2 space. The fuzzy analytical bias expressions and the geometrical constraints are derived for bearings-only target localization. The interplay between fuzzy geometry of target localization and the fuzzy estimation bias for the case of fuzzy linear observer trajectory is analyzed in detail in sensor networks, which can realize the 3-dimensional localization including fuzzy estimate position and velocity of the target by measuring the fuzzy azimuth angles at intervals of fixed time. Simulation results show that the resulting estimate position outperforms the traditional least squares approach for localization with uncertainty.
Le Sommer, Nicolas; Ben Sassi, Salma; Guidec, Frédéric; Mahéo, Yves
Disconnected MANETs show a changing topology and a fragmentation in distinct communication islands. In such networks, service discovery and invocation rely preferably on protocols that can support connectivity disruptions such as opportunistic protocols based on the store, carry, and forward principle. In this paper we present a middleware platform for geolocated services in disconnected MANETs. We detail in particular the location methods we used, the facilities for service discovery, select...
Hur, Junguk; Özgür, Arzucan; Xiang, Zuoshuang; He, Yongqun
Background Fever is one of the most common adverse events of vaccines. The detailed mechanisms of fever and vaccine-associated gene interaction networks are not fully understood. In the present study, we employed a genome-wide, Centrality and Ontology-based Network Discovery using Literature data (CONDL) approach to analyse the genes and gene interaction networks associated with fever or vaccine-related fever responses. Results Over 170,000 fever-related articles from PubMed abstracts and tit...
Yue, Hong; Wang, Le; Liu, Hui; Yue, Wenjie; Du, Xianghong; Song, Weining; Nie, Xiaojun
Broomcorn millet (Panicum miliaceum L.) is one of the world's oldest cultivated cereals, which is well-adapted to extreme environments such as drought, heat, and salinity with an efficient C4 carbon fixation. Discovery and identification of genes involved in these processes will provide valuable information to improve the crop for meeting the challenge of global climate change. However, the lack of genetic resources and genomic information make gene discovery and molecular mechanism studies very difficult. Here, we sequenced and assembled the transcriptome of broomcorn millet using Illumina sequencing technology. After sequencing, a total of 45,406,730 and 51,160,820 clean paired-end reads were obtained for two genotypes Yumi No. 2 and Yumi No. 3. These reads were mixed and then assembled into 113,643 unigenes, with the length ranging from 351 to 15,691 bp, of which 62,543 contings could be assigned to 315 gene ontology (GO) categories. Cluster of orthologous groups and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses assigned could map 15,514 unigenes into 202 KEGG pathways and 51,020 unigenes to 25 COG categories, respectively. Furthermore, 35,216 simple sequence repeats (SSRs) were identified in 27,055 unigene sequences, of which trinucleotides were the most abundant repeat unit, accounting for 66.72% of SSRs. In addition, 292 differentially expressed genes were identified between the two genotypes, which were significantly enriched in 88 GO terms and 12 KEGG pathways. Finally, the expression patterns of four selected transcripts were validated through quantitative reverse transcription polymerase chain reaction analysis. Our study for the first time sequenced and assembled the transcriptome of broomcorn millet, which not only provided a rich sequence resource for gene discovery and marker development in this important crop, but will also facilitate the further investigation of the molecular mechanism of its favored agronomic traits and beyond. PMID
Bowman, Sharen; Hubert, Sophie; Higgins, Brent; Stone, Cynthia; Kimball, Jennifer; Borza, Tudor; Bussey, Jillian Tarrant; Simpson, Gary; Kozera, Catherine; Curtis, Bruce A.; Hall, Jennifer R.; Hori, Tiago S.; Feng, Charles Y.; Rise, Marlies; Booman, Marije
Atlantic cod is a species that has been overexploited by the capture fishery. Programs to domesticate this species are underway in several countries, including Canada, to provide an alternative route for production. Selective breeding programs have been successfully applied in the domestication of other species, with genomics-based approaches used to augment conventional methods of animal production in recent years. Genomics tools, such as gene sequences and sets of variable markers, also hav...
Full Text Available The Chinese goose is one of the most economically important poultry birds and is a natural reservoir for many avian viruses. However, the nature and regulation of the innate and adaptive immune systems of this waterfowl species are not completely understood due to limited information on the goose genome. Recently, transcriptome sequencing technology was applied in the genomic studies focused on novel gene discovery. Thus, this study described the transcriptome of the goose peripheral blood lymphocytes to identify immunity relevant genes.De novo transcriptome assembly of the goose peripheral blood lymphocytes was sequenced by Illumina-Solexa technology. In total, 211,198 unigenes were assembled from the 69.36 million cleaned reads. The average length, N50 size and the maximum length of the assembled unigenes were 687 bp, 1,298 bp and 18,992 bp, respectively. A total of 36,854 unigenes showed similarity by BLAST search against the NCBI non-redundant (Nr protein database. For functional classification, 163,161 unigenes were comprised of three Gene Ontology (Go categories and 67 subcategories. A total of 15,334 unigenes were annotated into 25 eukaryotic orthologous groups (KOGs categories. Kyoto Encyclopedia of Genes and Genomes (KEGG database annotated 39,585 unigenes into six biological functional groups and 308 pathways. Among the 2,757 unigenes that participated in the 15 immune system KEGG pathways, 125 of the most important immune relevant genes were summarized and analyzed by STRING analysis to identify gene interactions and relationships. Moreover, 10 genes were confirmed by PCR and analyzed. Of these 125 unigenes, 109 unigenes, approximately 87%, were not previously identified in the goose.This de novo transcriptome analysis could provide important Chinese goose sequence information and highlights the value of new gene discovery, pathways investigation and immune system gene identification, and comparison with other avian species as useful
Full Text Available The recent development of DNA microarray technology allows us to measure simultaneously the expression levels of thousands of genes and to identify truly correlated genes with anticancer drug response (differentially expressed genes from many candidate genes. Significance Analysis of Microarray (SAM is often used to estimate the false discovery rate (FDR, which is an index for optimizing the identifiability of differentially expressed genes, while the accuracy of the estimated FDR by SAM is not necessarily confirmed. We propose a new method for estimating the FDR assuming a mixed normal distribution on the test statistic and examine the performance of the proposed method and SAM using simulated data. The simulation results indicate that the accuracy of the estimated FDR by the proposed method and SAM, varied depending on the experimental conditions. We applied both methods to actual data comprised of expression levels of 12,625 genes of 10 responders and 14 non-responders to docetaxel for breast cancer. The proposed method identified 280 differentially expressed genes correlated with docetaxel response using a cut-off value for achieving FDR <0.01 to prevent false-positive genes, although 92 genes were previously thought to be correlated with docetaxel response ones.
Truax, Valarie M; Zhao, Huanyu; Katzman, Brooke M; Prosser, Anthony R; Alcaraz, Ana A; Saindane, Manohar T; Howard, Randy B; Culver, Deborah; Arrendale, Richard F; Gruddanti, Prahbakar R; Evers, Taylor J; Natchus, Michael G; Snyder, James P; Liotta, Dennis C; Wilson, Lawrence J
A de novo hit-to-lead effort involving the redesign of benzimidazole-containing antagonists of the CXCR4 receptor resulted in the discovery of a novel series of 1,2,3,4-tetrahydroisoquinoline (TIQ) analogues. In general, this series of compounds show good potencies (3-650 nM) in assays involving CXCR4 function, including both inhibition of attachment of X4 HIV-1IIIB virus in MAGI-CCR5/CXCR4 cells and inhibition of calcium release in Chem-1 cells. Series profiling permitted the identification of TIQ-(R)-stereoisomer 15 as a potent and selective CXCR4 antagonist lead candidate with a promising in vitro profile. The drug-like properties of 15 were determined in ADME in vitro studies, revealing low metabolic liability potential. Further in vivo evaluations included pharmacokinetic experiments in rats and mice, where 15 was shown to have oral bioavailability (F = 63%) and resulted in the mobilization of white blood cells (WBCs) in a dose-dependent manner. PMID:24936240
Adam Y Ye
Full Text Available Transporters are essential in homeostatic exchange of endogenous and exogenous substances at the systematic, organic, cellular, and subcellular levels. Gene mutations of transporters are often related to pharmacogenetics traits. Recent developments in high throughput technologies on genomics, transcriptomics and proteomics allow in depth studies of transporter genes in normal cellular processes and diverse disease conditions. The flood of high throughput data have resulted in urgent need for an updated knowledgebase with curated, organized, and annotated human transporters in an easily accessible way. Using a pipeline with the combination of automated keywords query, sequence similarity search and manual curation on transporters, we collected 1,555 human non-redundant transporter genes to develop the Human Transporter Database (HTD (http://htd.cbi.pku.edu.cn. Based on the extensive annotations, global properties of the transporter genes were illustrated, such as expression patterns and polymorphisms in relationships with their ligands. We noted that the human transporters were enriched in many fundamental biological processes such as oxidative phosphorylation and cardiac muscle contraction, and significantly associated with Mendelian and complex diseases such as epilepsy and sudden infant death syndrome. Overall, HTD provides a well-organized interface to facilitate research communities to search detailed molecular and genetic information of transporters for development of personalized medicine.
Full Text Available Abstract Background Since the inception of the GO annotation project, a variety of tools have been developed that support exploring and searching the GO database. In particular, a variety of tools that perform GO enrichment analysis are currently available. Most of these tools require as input a target set of genes and a background set and seek enrichment in the target set compared to the background set. A few tools also exist that support analyzing ranked lists. The latter typically rely on simulations or on union-bound correction for assigning statistical significance to the results. Results GOrilla is a web-based application that identifies enriched GO terms in ranked lists of genes, without requiring the user to provide explicit target and background sets. This is particularly useful in many typical cases where genomic data may be naturally represented as a ranked list of genes (e.g. by level of expression or of differential expression. GOrilla employs a flexible threshold statistical approach to discover GO terms that are significantly enriched at the top of a ranked gene list. Building on a complete theoretical characterization of the underlying distribution, called mHG, GOrilla computes an exact p-value for the observed enrichment, taking threshold multiple testing into account without the need for simulations. This enables rigorous statistical analysis of thousand of genes and thousands of GO terms in order of seconds. The output of the enrichment analysis is visualized as a hierarchical structure, providing a clear view of the relations between enriched GO terms. Conclusion GOrilla is an efficient GO analysis tool with unique features that make a useful addition to the existing repertoire of GO enrichment tools. GOrilla's unique features and advantages over other threshold free enrichment tools include rigorous statistics, fast running time and an effective graphical representation. GOrilla is publicly available at: http://cbl-gorilla.cs.technion.ac.il
Weeber, Marc; Klein, Henny; de Jong-van den Berg, Lolkje T. W.; Vos, Rein
Proposes a two-step model of discovery in which new scientific hypotheses can be generated and subsequently tested. Applying advanced natural language processing techniques to find biomedical concepts in text, the model is implemented in a versatile interactive discovery support tool. This tool is used to successfully simulate Don R. Swanson's…
Full Text Available Abstract Background The automation of many common molecular biology techniques has resulted in the accumulation of vast quantities of experimental data. One of the major challenges now facing researchers is how to process this data to yield useful information about a biological system (e.g. knowledge of genes and their products, and the biological roles of proteins, their molecular functions, localizations and interaction networks. We present a technique called Global Mapping of Unknown Proteins (GMUP which uses the Gene Ontology Index to relate diverse sources of experimental data by creation of an abstraction layer of evidence data. This abstraction layer is used as input to a neural network which, once trained, can be used to predict function from the evidence data of unannotated proteins. The method allows us to include almost any experimental data set related to protein function, which incorporates the Gene Ontology, to our evidence data in order to seek relationships between the different sets. Results We have demonstrated the capabilities of this method in two ways. We first collected various experimental datasets associated with yeast (Saccharomyces cerevisiae and applied the technique to a set of previously annotated open reading frames (ORFs. These ORFs were divided into training and test sets and were used to examine the accuracy of the predictions made by our method. Then we applied GMUP to previously un-annotated ORFs and made 1980, 836 and 1969 predictions corresponding to the GO Biological Process, Molecular Function and Cellular Component sub-categories respectively. We found that GMUP was particularly successful at predicting ORFs with functions associated with the ribonucleoprotein complex, protein metabolism and transportation. Conclusion This study presents a global and generic gene knowledge discovery approach based on evidence integration of various genome-scale data. It can be used to provide insight as to how certain
Kumar, Rakesh; Wehner, Todd C
The watermelon (Citrullus lanatus (Thunb.) Matsum. & Nakai var. lanatus) has high variability for fruit size, shape, rind pattern, and flesh color. This study was designed to measure the qualitative inheritance of rind phenotypes (solid dark green vs. light green). For each of the 2 families, "Mountain Hoosier" × "Minilee" and "Early Arizona" × "Minilee," 6 generations (P(a)S(1), P(b)S(1), F(1), F(2), BC(1)P(a), BC(1)P(b)) were developed. Each family was tested in summer 2008 in 3 environments in North Carolina. Phenotypic data were analyzed with the χ(2) method to test the segregation of Mendelian genes. Deviations from the expected segregation ratios based on hypothesized single dominant gene for solid dark green versus light green rind pattern were recorded, raising questions on the inheritance of this trait. Inheritance of solid dark green rind versus light (gray) rind showed duplicate dominant epistasis. Duplicate dominant epistasis gives rise to a 15:1 ratio (solid dark green:light rind pattern) in F(2) generation. When both the loci are homozygous recessive, we observe light rind pattern. The g-1 and g-2 genes were identified to control light green rind when in homozygous recessive form. PMID:21566001
@@ NS5A is one of the non-structural gene products encoded by Hepatitis C virus (HCV) and related viruses that are essential for viral replication. The amino acid sequence of NS5A is conserved between different HCV genotypes and the primary amino acid sequence of NS5A is unique to HCV and closely related viruses. Importantly, NS5A is unrelated to any human protein. This indicates that drugs designed to block the actions of NS5A could inhibit the replication of HCV without showing toxic side effects in human host cells, thus making NS5A inhibitors ideal anti-viral drugs. However, there are presently no functional assays for this essential viral protein. Therefore, conventional high throughput screening (HTS) approaches can not be used to discover antiviral drugs against NS5A.
Chen, I-Min; Chu, Ken; Ratner, Anna; Palaniappan, Krishna; Huang, Jinghua; Reddy, T. B.K.; Cimermancic, Peter; Fischbach, Michael; Ivanova, Natalia; Markowitz, Victor; Kyrpides, Nikos; Pati, Amrita
In the discovery of secondary metabolites (SMs), large-scale analysis of sequence data is a promising exploration path that remains largely underutilized due to the lack of relevant computational resources. We present IMG-ABC (https://img.jgi.doe.gov/abc/) -- An Atlas of Biosynthetic gene Clusters within the Integrated Microbial Genomes (IMG) system1. IMG-ABC is a rich repository of both validated and predicted biosynthetic clusters (BCs) in cultured isolates, single-cells and metagenomes linked with the SM chemicals they produce and enhanced with focused analysis tools within IMG. The underlying scalable framework enables traversal of phylogenetic dark matter and chemical structure space -- serving as a doorway to a new era in the discovery of novel molecules.
Sibley, Christopher R.; Seow, Yiqi; Wood, Matthew JA
The past decade has seen intense scientific interest in non-coding RNAs. In particular, the discovery and subsequent exploitation of gene silencing via RNA interference (RNAi) has revolutionized the way in which gene expression is now studied and understood. It is now well established that post-transcriptional gene silencing (PTGS) by the microRNA (miRNA) and other RNAi-associated pathways represents an essential layer of complexity to gene regulation. Gene silencing using RNAi additionally d...
YANG BingRu; SONG Wei; XU ZhangYan
Knowledge acquisition is the bottleneck of expert system. To solve this problem, KD (D&K), which is a comprehensive knowledge discovery process model cooperating both database and knowledge base, and related technology are proposed. Then based on KD (D&K) and related technology, the new construction of Expert System based on Knowledge Discovery (ESKD) is proposed. As the key knowledge acquisition component of ESKD, KD (D&K) is composed of KDD* and KDK*. KDD*-the new process model based on double bases cooperating mechanism; KDK*- the new process model based on double-basis fusion mechanism are introduced, respectively. The overall framework of ESKD is proposed. Some sub-systems and dynamic knowledge base system are discussed. Finally, the effectiveness and advantages of ESKD are tested in a real-world agriculture database. We hope that ESKD may be useful for the new generation of expert systems.
Sehested, Jens; Larsen, Kasper Emil; Kustov, Arkadii;
Methanation is a classical reaction in heterogeneous catalysis and significant effort has been put into improving the industrially preferred nickel-based catalysts. Recently, a computational screening study showed that nickel-iron alloys should be more active than the pure nickel catalyst and at ...
May Gregory D
Full Text Available Abstract Background Meiosis is a critical process in the reproduction and life cycle of flowering plants in which homologous chromosomes pair, synapse, recombine and segregate. Understanding meiosis will not only advance our knowledge of the mechanisms of genetic recombination, but also has substantial applications in crop improvement. Despite the tremendous progress in the past decade in other model organisms (e.g., Saccharomyces cerevisiae and Drosophila melanogaster, the global identification of meiotic genes in flowering plants has remained a challenge due to the lack of efficient methods to collect pure meiocytes for analyzing the temporal and spatial gene expression patterns during meiosis, and for the sensitive identification and quantitation of novel genes. Results A high-throughput approach to identify meiosis-specific genes by combining isolated meiocytes, RNA-Seq, bioinformatic and statistical analysis pipelines was developed. By analyzing the studied genes that have a meiosis function, a pipeline for identifying meiosis-specific genes has been defined. More than 1,000 genes that are specifically or preferentially expressed in meiocytes have been identified as candidate meiosis-specific genes. A group of 55 genes that have mitochondrial genome origins and a significant number of transposable element (TE genes (1,036 were also found to have up-regulated expression levels in meiocytes. Conclusion These findings advance our understanding of meiotic genes, gene expression and regulation, especially the transcript profiles of MGI genes and TE genes, and provide a framework for functional analysis of genes in meiosis.
Fernández Caballero, Antonio; Sokolova, Marina
Monitoring of negative effects of urban pollution and real-time decision making allow to clarify consequences upon human health. Large amounts of raw data information describe this situation, and to get knowledge from it, we apply intelligent agents. Further modeling and simulation gives the new knowledge about the tendencies of situation development and about its structure. Agent-based decision support system can help to foresee possible ways of situation development and contribute to effect...
Full Text Available Web service discovery is developing as a conspicuous technology both in technical and business domains. Due to the swift growth in the number of web service registries and repositories, discovering the web services based on user request is a major concern. Though progressive and cuttingedge techniques are proposed to address these challenges in conservative environment, still it is in its critical stage at the semantic discovery scenario, particularly when the quality attributes are included. The retrieved web services, at times may not be desirable for the user and the objective of the userdesirable service discovery fails. This paper addressed the issue of discovering user relevant services, which reduce the semantic gap between the services requested and the service provided using latticebased clustering technique. A service discovery model with novel similarity measure is proposed to discover user relevant services. Further to enhance the performance of the proposed discovery model, a 2- Tier User Preference Model (UPM is proposed to address the qualitative requirements of the service request. The Tier I of the UPM qualifies the QoS parameters, whereas the Tier II quantifies the qualified parameters. The test bed comprising of about a thousand services from different domains and a model UDDI are implemented to evaluate the proposed techniques in terms of precision, recall and f-measure. The merits of the proposed techniques are demonstrated by means of improved user desirability of the discovered services.
FLISSI, A; GRANSART, C; Merle, P.
International audience Software applications running on mobile devices are more and more needed. These applications have strong requirements to address: device heterogeneity, limited resources, networked communications, and security. Moreover it is required to have appropriate application design, discovery, deployment, and execution paradigms. These requirements are similar to those of any ubiquitous computing application. In this paper, we present a component-based software infrastructure...
Reilly, Maureen K.; King, Ryan P.; Wagner, Alexander J.; King, Susan M.
An undergraduate organic chemistry laboratory experiment has been developed that features a discovery-based microscale Fischer esterification utilizing a microwave reactor. Students individually synthesize a unique ester from known sets of alcohols and carboxylic acids. Each student identifies the best reaction conditions given their particular…
We present searches for generic SUSY models with R-parity conservation in the ATLAS detector at the LHC, based on signatures including missing transverse momentum from undetected neutralinos, multiple jets and leptons or b and tau jets. We show the corresponding discovery reach for early ATLAS data, including the effect of systematic uncertainties on the background estimate. (author)
Ritzén, Andreas; Sørensen, Morten D; Dack, Kevin N; Greve, Daniel R; Jerre, Anders; Carnerup, Martin A; Rytved, Klaus A; Bagger-Bahnsen, Jesper
Janus kinase (JAK) inhibitors are emerging as novel and efficacious drugs for treating psoriasis and other inflammatory skin disorders, but their full potential is hampered by systemic side effects. To overcome this limitation, we set out to discover soft drug JAK inhibitors for topical use. A fragment screen yielded an indazole hit that was elaborated into a potent JAK inhibitor using structure-based design. Growing the fragment by installing a phenol moiety in the 6-position afforded a greatly improved potency. Fine-tuning the substituents on the phenol and sulfonamide moieties afforded a set of compounds with lead-like properties, but they were found to be phototoxic and unstable in the presence of light. PMID:27326341
A.V. Senthil Kumar
Full Text Available Mining frequent patterns in large transactional databases is a highly researched area in the field of data mining. Existing frequent pattern discovering algorithms suffer from many problems regarding the high memory dependency when mining large amount of data, computational and I/O cost. Additionally, the recursive mining process to mine these structures is also too voracious in memory resources. In this paper, we describe a more efficient algorithm for mining complete frequent itemsets from transactional databases. The suggested algorithm is partially based on FP-tree hypothesis and extracts the frequent itemsets directly from the tree. Its memory requirement, which is independent from the number of processed transactions, is another benefit of the new method. We present performance comparisons for our algorithm against the Apriori algorithm and FP-growth.
Wain Hester M
Full Text Available Abstract Background Massive text mining of the biological literature holds great promise of relating disparate information and discovering new knowledge. However, disambiguation of gene symbols is a major bottleneck. Results We developed a simple thesaurus-based disambiguation algorithm that can operate with very little training data. The thesaurus comprises the information from five human genetic databases and MeSH. The extent of the homonym problem for human gene symbols is shown to be substantial (33% of the genes in our combined thesaurus had one or more ambiguous symbols, not only because one symbol can refer to multiple genes, but also because a gene symbol can have many non-gene meanings. A test set of 52,529 Medline abstracts, containing 690 ambiguous human gene symbols taken from OMIM, was automatically generated. Overall accuracy of the disambiguation algorithm was up to 92.7% on the test set. Conclusion The ambiguity of human gene symbols is substantial, not only because one symbol may denote multiple genes but particularly because many symbols have other, non-gene meanings. The proposed disambiguation approach resolves most ambiguities in our test set with high accuracy, including the important gene/not a gene decisions. The algorithm is fast and scalable, enabling gene-symbol disambiguation in massive text mining applications.
Kana, Bavesh Davandra; Karakousis, Petros C; Parish, Tanya; Dick, Thomas
New drugs that retain potency against multidrug/extensively drug-resistant strains of Mycobacterium tuberculosis, with the additional benefit of a shortened treatment duration and ease of administration, are urgently needed by tuberculosis (TB) control programs. Efforts to develop this new generation of treatment interventions have been plagued with numerous problems, the most significant being our insufficient understanding of mycobacterial metabolism during disease. This, combined with limited chemical diversity and poor entry of small molecules into the cell, has limited the number of new bioactive agents that result from drug screening efforts. The biochemical, target-driven approach to drug development has been largely abandoned in the TB field, to be replaced by whole-cell or target-based whole-cell screening approaches. In this context, the properties of a good drug target are unclear, since these are directly determined by the ability to find compounds, using current screening algorithms, which are able to kill M. tuberculosis. In this review, we discuss issues related to the identification and validation of drug targets and highlight some key properties for promising targets. Some of these include essentiality for growth, vulnerability, druggability, reduced propensity to evolve drug resistance and target location to facilitate ready access to drugs during chemotherapy. We present these in the context of recent drugs that have emerged through various approaches with the aim of consolidating the knowledge gained from these experiences to inform future efforts. PMID:25458615
Global climate change is one of the most important international problems of the 21st century. The overall rapid increase in the dynamics of cataclysms, which have been observed in recent decades, is particularly alarming. Howdo modern scientists predict the occurrence of certain events? In meteorology, unusually powerful cumulonimbus clouds are one of the main conditions for the emergence of a tornado. The former, in their turn, are formed during the invasion of cold air on the overheated land surface. The satellite captures the cloud front, and, based on these pictures, scientists make assumptions about the possibility of occurrence of the respective natural phenomena. In fact, mankind visually observes and draws conclusions about the consequences of the physical phenomena which have already taken place in the invisible world, so the conclusions of scientists are assumptions by their nature, rather than precise knowledge of the causes of theorigin of these phenomena in the physics of microcosm. The latest research in the field of the particle physics and neutrino astrophysics, which was conducted by a working team of scientists of ALLATRA International Public Movement (hereinafter ALLATRA SCIENCE group) allatra-science.org, last accessed 10 April 2016.
Full Text Available Global climate change is one of the most important international problems of the 21st century. The overall rapid increase in the dynamics of cataclysms, which have been observed in recent decades, is particularly alarming. Howdo modern scientists predict the occurrence of certain events? In meteorology, unusually powerful cumulonimbus clouds are one of the main conditions for the emergence of a tornado. The former, in their turn, are formed during the invasion of cold air on the overheated land surface. The satellite captures the cloud front, and, based on these pictures, scientists make assumptions about the possibility of occurrence of the respective natural phenomena. In fact, mankind visually observes and draws conclusions about the consequences of the physical phenomena which have already taken place in the invisible world, so the conclusions of scientists are assumptions by their nature, rather than precise knowledge of the causes of theorigin of these phenomena in the physics of microcosm. The latest research in the field of the particle physics and neutrino astrophysics, which was conducted by a working team of scientists of ALLATRA International Public Movement (hereinafter ALLATRA SCIENCE groupallatra-science.org, last accessed 10 April 2016., offers increased opportunities for advanced fundamental and applied research in climatic engineering.
Coppe, Alessandro; Ferrari, Francesco; Bisognin, Andrea; Danieli, Gian Antonio; Ferrari, Sergio; Bicciato, Silvio; Bortoluzzi, Stefania
Genes co-expressed may be under similar promoter-based and/or position-based regulation. Although data on expression, position and function of human genes are available, their true integration still represents a challenge for computational biology, hampering the identification of regulatory mechanisms. We carried out an integrative analysis of genomic position, functional annotation and promoters of genes expressed in myeloid cells. Promoter analysis was conducted by a novel multi-step method...
Ruiz Pestana, Luis A.
Today, there is a growing need for environmentally friendly synthetic membranes with selective transport capabilities to address some of society's most pressing issues, such as carbon dioxide pollution, or access to clean water. While conventional membranes cannot stand up to the challenge, thin nanocomposite membranes, where vertically aligned subnanometer pores (e.g. nanotubes) are embedded in a thin polymeric film, promise to overcome some of the current limitations, namely, achieving a monodisperse distribution of subnanometer size pores, vertical pore alignment across the membrane thickness, and tunability of the pore surface chemistry. Self-assembled cyclic peptide nanotubes (CPNs), are particularly promising as selective nanopores because the pore size can be controlled at the subnanometer level, exhibit high chemical design flexibility, and display remarkable mechanical stability. In addition, when conjugated with polymer chains, the cyclic peptides can co-assemble in block copolymer domains to form nanoporous thin films. CPNs are thus well positioned to tackle persistent challenges in molecular separation applications. However, our poor understanding of the physics underlying their remarkable properties prevents the rational design and implementation of CPNs in technologically relevant membranes. In this dissertation, we use a simulation-based approach, in particular molecular dynamics (MD) simulations, to investigate the critical knowledge gaps hindering the implementation of CPNs. Computational mechanical tests show that, despite the weak nature of the stabilizing hydrogen bonds and the small cross section, CPNs display a Young's modulus of approximately 20 GPa and a maximum strength of around 1 GPa, placing them among the strongest proteinaceous materials known. Simulations of the self-assembly process reveal that CPNs grow by self-similar coarsening, contrary to other low-dimensional peptide systems, such as amyloids, that are believed to grow through
Phase II attrition remains the most important challenge for drug discovery. Tackling the problem requires improved understanding of the complexity of disease biology. Systems biology approaches to this problem can, in principle, deliver this. This article reviews the reports of the application of mechanistic systems models to drug discovery questions and discusses the added value. Although we are on the journey to the virtual human, the length, path and rate of learning from this remain an open question. Success will be dependent on the will to invest and make the most of the insight generated along the way. PMID:26464089
Chen, Yi-An; Tripathi, Lokesh P; Mizuguchi, Kenji
Data analysis is one of the most critical and challenging steps in drug discovery and disease biology. A user-friendly resource to visualize and analyse high-throughput data provides a powerful medium for both experimental and computational biologists to understand vastly different biological data types and obtain a concise, simplified and meaningful output for better knowledge discovery. We have previously developed TargetMine, an integrated data warehouse optimized for target prioritization. Here we describe how upgraded and newly modelled data types in TargetMine can now survey the wider biological and chemical data space, relevant to drug discovery and development. To enhance the scope of TargetMine from target prioritization to broad-based knowledge discovery, we have also developed a new auxiliary toolkit to assist with data analysis and visualization in TargetMine. This toolkit features interactive data analysis tools to query and analyse the biological data compiled within the TargetMine data warehouse. The enhanced system enables users to discover new hypotheses interactively by performing complicated searches with no programming and obtaining the results in an easy to comprehend output format. Database URL: http://targetmine.mizuguchilab.org. PMID:26989145
Pappas, Nikolaos; Katsimpras, Georgios; Stamatatos, Efstathios
The discovery of web documents about certain topics is an important task for web-based applications including web document retrieval, opinion mining and knowledge extraction. In this paper, we propose an agent-based focused crawling framework able to retrieve topic- and genre-related web documents. Starting from a simple topic query, a set of focused crawler agents explore in parallel topic-specific web paths using dynamic seed URLs that belong to certain web genres and are collected from web...
Full Text Available We report the discovery of aurora kinase inhibitor using the fragment-based virtual screening by multi-docking strategy. Among a number of fragments collected from eMololecules, we found four fragment molecules showing potent activity (>50% at 100 μM against aurora kinase. Based on the explored fragment scaffold, we selected two compounds in our synthesized library and validated the biological activity against Aurora kinase.
Jun-Tae Kim; Seo Hee Jung; Sun Young Kang; Chung-Kyu Ryu; Nam Sook Kang
We report the discovery of aurora kinase inhibitor using the fragment-based virtual screening by multi-docking strategy. Among a number of fragments collected from eMololecules, we found four fragment molecules showing potent activity (>50% at 100 μM) against aurora kinase. Based on the explored fragment scaffold, we selected two compounds in our synthesized library and validated the biological activity against Aurora kinase.
Debasish Paul; Avinash Kumar; Akshada Gajbhiye; Santra, Manas K.; Rapole Srikanth
Accurate diagnosis and proper monitoring of cancer patients remain a key obstacle for successful cancer treatment and prevention. Therein comes the need for biomarker discovery, which is crucial to the current oncological and other clinical practices having the potential to impact the diagnosis and prognosis. In fact, most of the biomarkers have been discovered utilizing the proteomics-based approaches. Although high-throughput mass spectrometry-based proteomic approaches like SILAC, 2D-DIGE,...
McBride, Christopher; Cheruvallath, Zacharia; Komandla, Mallareddy; Tang, Mingnam; Farrell, Pamela; Lawson, J David; Vanderpool, Darin; Wu, Yiqin; Dougan, Douglas R; Plonowski, Artur; Holub, Corine; Larson, Chris
Methionine aminopeptidase-2 (MetAP2) is an enzyme that cleaves an N-terminal methionine residue from a number of newly synthesized proteins. This step is required before they will fold or function correctly. Pre-clinical and clinical studies with a MetAP2 inhibitor suggest that they could be used as a novel treatment for obesity. Herein we describe the discovery of a series of pyrazolo[4,3-b]indoles as reversible MetAP2 inhibitors. A fragment-based drug discovery (FBDD) approach was used, beginning with the screening of fragment libraries to generate hits with high ligand-efficiency (LE). An indazole core was selected for further elaboration, guided by structural information. SAR from the indazole series led to the design of a pyrazolo[4,3-b]indole core and accelerated knowledge-based fragment growth resulted in potent and efficient MetAP2 inhibitors, which have shown robust and sustainable body weight loss in DIO mice when dosed orally. PMID:27136719
杨斌; 田永青; 朱仲英
Extraction of interesting and general spatial association rules from large spatial databases is an important task in the development of spatial database systems. In this paper, we investigate the generalization-based knowledge discovery mechanism that integrates attribute-oriented induction on nonspatial data and spatial merging and generalization on spatial data. Furthermore, we present linguistic cloud models for knowledge representation and uncertainty handling to enhance current generalization-based method. With these models, spatial and nonspatial attribute values are well generalized at higher-concept levels, allowing discovery of strong spatial association rules. Combining the cloud model based generalization method with Apriori algorithm for mining association rules from a spatial database shows the benefits in effectiveness and flexibility.
Gupta Rohit; Rao Navneet; Kumar Vipin
Abstract Background An important analysis performed on microarray gene-expression data is to discover biclusters, which denote groups of genes that are coherently expressed for a subset of conditions. Various biclustering algorithms have been proposed to find different types of biclusters from these real-valued gene-expression data sets. However, these algorithms suffer from several limitations such as inability to explicitly handle errors/noise in the data; difficulty in discovering small bi...
Yao, Shun; Yoo, Shinjae; Yu, Dantong
Background Our study focuses on discovering gene regulatory networks from time series gene expression data using the Granger causality (GC) model. However, the number of available time points (T) usually is much smaller than the number of target genes (n) in biological datasets. The widely applied pairwise GC model (PGC) and other regularization strategies can lead to a significant number of false identifications when n>>T. Results In this study, we proposed a new method, viz., CGC-2SPR (CGC ...
Full Text Available Accurate diagnosis and proper monitoring of cancer patients remain a key obstacle for successful cancer treatment and prevention. Therein comes the need for biomarker discovery, which is crucial to the current oncological and other clinical practices having the potential to impact the diagnosis and prognosis. In fact, most of the biomarkers have been discovered utilizing the proteomics-based approaches. Although high-throughput mass spectrometry-based proteomic approaches like SILAC, 2D-DIGE, and iTRAQ are filling up the pitfalls of the conventional techniques, still serum proteomics importunately poses hurdle in overcoming a wide range of protein concentrations, and also the availability of patient tissue samples is a limitation for the biomarker discovery. Thus, researchers have looked for alternatives, and profiling of candidate biomarkers through tissue culture of tumor cell lines comes up as a promising option. It is a rich source of tumor cell-derived proteins, thereby, representing a wide array of potential biomarkers. Interestingly, most of the clinical biomarkers in use today (CA 125, CA 15.3, CA 19.9, and PSA were discovered through tissue culture-based system and tissue extracts. This paper tries to emphasize the tissue culture-based discovery of candidate biomarkers through various mass spectrometry-based proteomic approaches.
Chen, Chong-Jian; Zhou, Hui; Chen, Yue-Qin; Qu, Liang-Hu; Gautheret, Daniel
Plant genomes have undergone multiple rounds of duplications that contributed massively to the growth of gene families. The structure of resulting families has been studied in depth for protein-coding genes. However, little is known about the impact of duplications on noncoding RNA (ncRNA) genes. Here we perform a systematic analysis of duplicated regions in the rice genome in search of such ncRNA repeats. We observe that, just like their protein counterparts, most ncRNA genes have undergone ...
Wada, Youichiro; Li, Dan; Merley, Anne; Zukauskas, Andrew; Aird, William C.; Dvorak, Harold F.; Shih, Shou-Ching
A profile of transcript abundances from multiple genes constitutes a molecular signature if the expression pattern is unique to one cell type. Here we measure mRNA copy numbers per cell by normalizing per million copies of 18S rRNA and identify 6 genes (TIE1, KDR, CDH5, TIE2, EFNA1 and MYO5C) out of 79 genes tested as excellent molecular signature markers for endothelial cells (ECs) in vitro. The selected genes are uniformly expressed in ECs of 4 different origins but weakly or not expressed ...
Nadia Nicole Ono; Monica Therese Britton; Joseph Nathaniel Fass; Charles Meyer Nicolet; Dawei Lin; Li Tian
Pomegranate fruit peel is rich in bioactive plant natural products,such as hydrolyzable tannins and anthocyanins.Despite their documented roles in human nutrition and fruit quality,genes involved in natural product biosynthesis have not been cloned from pomegranate and very little sequence information is available on pomegranate in the public domain.Shotgun transcriptome sequencing of pomegranate fruit peel cDNA was performed using RNA-Seq on the Illumina Genome Analyzer platform.Over 100 million raw sequence reads were obtained and assembled into 9,839 transcriptome assemblies (TAs) (＞200 bp).Candidate genes for hydrolyzable tannin,anthocyanin,flavonoid,terpenoid and fatty acid biosynthesis and/or regulation were identified.Three lipid transfer proteins were obtained that may contribute to the previously reported IgE reactivity of pomegranate fruit extracts.In addition,115 SSR markers were identified from the pomegranate fruit peel transcriptome and primers were designed for 77 SSR markers.The pomegranate fruit peel transcriptome set provides a valuable platform for natural product biosynthetic gene and SSR marker discovery in pomegranate.This work also demonstrates that next-generation transcriptome sequencing is an economical and effective approach for investigating natural product biosynthesis,identifying genes controlling important agronomic traits,and discovering molecular markers in non-model specialty crop species.
Mei, Suyu; Zhang, Kun
Epstein-Barr virus (EBV) plays important roles in the origin and the progression of human carcinomas, e.g. diffuse large B cell tumors, T cell lymphomas, etc. Discovering EBV targeted human genes and signaling pathways is vital to understand EBV tumorigenesis. In this study we propose a noise-tolerant homolog knowledge transfer method to reconstruct functional protein-protein interactions (PPI) networks between Epstein-Barr virus and Homo sapiens. The training set is augmented via homolog instances and the homolog noise is counteracted by support vector machine (SVM). Additionally we propose two methods to define subcellular co-localization (i.e. stringent and relaxed), based on which to further derive physical PPI networks. Computational results show that the proposed method achieves sound performance of cross validation and independent test. In the space of 648,672 EBV-human protein pairs, we obtain 51,485 functional interactions (7.94%), 869 stringent physical PPIs and 46,050 relaxed physical PPIs. Fifty-eight evidences are found from the latest database and recent literature to validate the model. This study reveals that Epstein-Barr virus interferes with normal human cell life, such as cholesterol homeostasis, blood coagulation, EGFR binding, p53 binding, Notch signaling, Hedgehog signaling, etc. The proteome-wide predictions are provided in the supplementary file for further biomedical research. PMID:27470517
Bowman, Sharen; Hubert, Sophie; Higgins, Brent; Stone, Cynthia; Kimball, Jennifer; Borza, Tudor; Bussey, Jillian Tarrant; Simpson, Gary; Kozera, Catherine; Curtis, Bruce A; Hall, Jennifer R; Hori, Tiago S; Feng, Charles Y; Rise, Marlies; Booman, Marije; Gamperl, A Kurt; Trippel, Edward; Symonds, Jane; Johnson, Stewart C; Rise, Matthew L
Atlantic cod is a species that has been overexploited by the capture fishery. Programs to domesticate this species are underway in several countries, including Canada, to provide an alternative route for production. Selective breeding programs have been successfully applied in the domestication of other species, with genomics-based approaches used to augment conventional methods of animal production in recent years. Genomics tools, such as gene sequences and sets of variable markers, also have the potential to enhance and accelerate selective breeding programs in aquaculture, and to provide better monitoring tools to ensure that wild cod populations are well managed. We describe the generation of significant genomics resources for Atlantic cod through an integrated genomics/selective breeding approach. These include 158,877 expressed sequence tags (ESTs), a set of annotated putative transcripts and several thousand single nucleotide polymorphism markers that were developed from, and have been shown to be highly variable in, fish enrolled in two selective breeding programs. Our EST collection was generated from various tissues and life cycle stages. In some cases, tissues from which libraries were generated were isolated from fish exposed to stressors, including elevated temperature, or antigen stimulation (bacterial and viral) to enrich for transcripts that are involved in these response pathways. The genomics resources described here support the developing aquaculture industry, enabling the application of molecular markers within selective breeding programs. Marker sets should also find widespread application in fisheries management. PMID:20396923
Mei, Suyu; Zhang, Kun
Epstein-Barr virus (EBV) plays important roles in the origin and the progression of human carcinomas, e.g. diffuse large B cell tumors, T cell lymphomas, etc. Discovering EBV targeted human genes and signaling pathways is vital to understand EBV tumorigenesis. In this study we propose a noise-tolerant homolog knowledge transfer method to reconstruct functional protein-protein interactions (PPI) networks between Epstein-Barr virus and Homo sapiens. The training set is augmented via homolog instances and the homolog noise is counteracted by support vector machine (SVM). Additionally we propose two methods to define subcellular co-localization (i.e. stringent and relaxed), based on which to further derive physical PPI networks. Computational results show that the proposed method achieves sound performance of cross validation and independent test. In the space of 648,672 EBV-human protein pairs, we obtain 51,485 functional interactions (7.94%), 869 stringent physical PPIs and 46,050 relaxed physical PPIs. Fifty-eight evidences are found from the latest database and recent literature to validate the model. This study reveals that Epstein-Barr virus interferes with normal human cell life, such as cholesterol homeostasis, blood coagulation, EGFR binding, p53 binding, Notch signaling, Hedgehog signaling, etc. The proteome-wide predictions are provided in the supplementary file for further biomedical research. PMID:27470517
Full Text Available Neighbor discovery and the power of sensors play an important role in the formation of Wireless Sensor Networks (WSNs and mobile networks. Many asynchronous protocols based on wake-up time scheduling have been proposed to enable neighbor discovery among neighboring nodes for the energy saving, especially in the difficulty of clock synchronization. However, existing researches are divided two parts with the neighbor-discovery methods, one is the quorum-based protocols and the other is co-primality based protocols. Their distinction is on the arrangements of time slots, the former uses the quorums in the matrix, the latter adopts the numerical analysis. In our study, we propose the weighted heuristic quorum system (WQS, which is based on the quorum algorithm to eliminate redundant paths of active slots. We demonstrate the specification of our system: fewer active slots are required, the referring rate is balanced, and remaining power is considered particularly when a device maintains rendezvous with discovered neighbors. The evaluation results showed that our proposed method can effectively reschedule the active slots and save the computing time of the network system.
Caselle, M; Provero, P
Gene regulation in eukaryotes is mainly effected through transcription factors binding to rather short recognition motifs generally located upstream of the coding region. We present a novel computational method to identify regulatory elements in the upstream region of eukaryotic genes. The genes are grouped in sets sharing an overrepresented short motif in their upstream sequence. For each set, the average expression level from a microarray experiment is determined: If this level is significantly higher or lower than the average taken over the whole genome, then the overerpresented motif shared by the genes in the set is likely to play a role in their regulation. The method was tested by applying it to the genome of Saccharomyces cerevisiae, using the publicly available results of a DNA microarray experiment, in which expression levels for virtually all the genes were measured during the diauxic shift from fermentation to respiration. Several known motifs were correctly identified, and a new candidate regulat...
The mitochondrial genome libraries of HL-type sterile line(A) and maintainer line(B) have been constructed.Mitochondrial gene, atp6, was used to screen libraries, due to the different Southern and Northern blot results between sterile and maintainer line. Sequencing analysis of positive clones proved that there were two copies of atp6 gene in sterile line and only one in maintainer line. One copy of atpt6 in sterile line was same to that in maintainer line; the other showed different flanking sequence from the 49th nucleotide downstream of the termination codon of atp6 gene. A new chimeric gene, orfH79, was found in the region. OrfH79 had homology to mitochondrial gene coxⅡ and orfl07, and was special to HL-sterile cytoplasm.``
Cassava landraces are the earliest form of the modern cultivars and represent the first step in cassava domestication. Our forward genetic analysis uses this resource to discover spontaneous mutations in the sucrose/ starch and carotenoid synthesis/accumulation and to develop both an evolutionary and breeding perspective of gene function related to those traits. Biochemical phenotype variants for the synthesis and accumulation of carotenoid, free sugar and starch were identified. Six subtractive cDNA libraries were prepared to construct a high quality (phred > 20) EST database with 1,645 entries. Macroarray and micro-array analysis was performed to identify differentially expressed genes aiming to identify candidate genes related to sugary phenotype and carotenoid diversity. cDNA sequence for gene coding for specific enzymes in the two pathways was obtained. Gene expression analysis for coding specific enzymes was performed by RNA blot and Real Time PCR analysis. Chromoplast-associated proteins of yellow storage root were fractionated and a peptide sequence database with 906 entries sequences (MASCOT validated) was constructed. For the sucrose/starch, metabolism a sugary class of cassava was identified, carrying a mutation in the BEI and GBSS genes. For the pigmented cassava, a pink color phenotype showed absence of expression of the gene CasLYB, while an intense yellow phenotype showed a down regulation of the gene CasHYb. Heat shock proteins were identified as the major proteins associated with carotenoid. Genetic diversity for the GBSS gene in the natural population identified 22 haplotypes and a large nucleotide diversity in four subsets of population. Single segregating population derived from F2, half-sibling and S1 population showed segregation for sugary phenotype (93% of individuals), waxy phenotype (38% of individuals) and glycogen like starch (2% of individuals). Here we summarize our current results for the genetic analysis of these variants and recent
Simon, Arnaud; Napoli, Amedeo
In this paper, we present an approach to knowledge discovery in databases in the context of object-based representation systems. The goal of this approach is to extract viewpoints and association rules from data represented by objects. A viewpoint is a hierarchy of classes (a kind of partial lattice) and an association rule can be defined within a viewpoint or between two classes lying in different viewpoints. The viewpoints construction algorithm allows to manipulate objects which are indiff...
Kelly V Ruggles; Sonali Rajan
OBJECTIVE: To apply discovery-based computational methods to nationally representative data from the Centers for Disease Control and Preventions' Youth Risk Behavior Surveillance System to better understand and visualize the behavioral factors associated with gun possession among adolescent youth. RESULTS: Our study uncovered the multidimensional nature of gun possession across nearly five million unique data points over a ten year period (2001-2011). Specifically, we automated odds ratio cal...
Ruggles, Kelly V.; Rajan, Sonali
Objective To apply discovery-based computational methods to nationally representative data from the Centers for Disease Control and Preventions’ Youth Risk Behavior Surveillance System to better understand and visualize the behavioral factors associated with gun possession among adolescent youth. Results Our study uncovered the multidimensional nature of gun possession across nearly five million unique data points over a ten year period (2001–2011). Specifically, we automated odds ratio calcu...
This paper presents a framework for distributed service discovery based on VIRGO P2P technologies. The services are classified as multi-layer, hierarchical catalogue domains according to their contents. The service providers, which have their own service registries such as UDDIs, register the services they provide and establish a virtual tree in a VIRGO network according to the domain of their service. The service location done by the proposed strategy is effective and guaranteed. This paper ...
Venugopal Abhilash; Chaerkady Raghothama; Pandey Akhilesh
Mass spectrometry-based quantitative proteomics has emerged as a powerful approach that has the potential to accelerate biomarker discovery, both for diagnostic as well as therapeutic purposes. Proteomics has traditionally been synonymous with 2D gels but is increasingly shifting to the use of gel-free systems and liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS). Quantitative proteomic approaches have already been applied to investigate various neurological disorders, espe...
Wada, Youichiro; Li, Dan; Merley, Anne; Zukauskas, Andrew; Aird, William C; Dvorak, Harold F; Shih, Shou-Ching
A profile of transcript abundances from multiple genes constitutes a molecular signature if the expression pattern is unique to one cell type. Here we measure mRNA copy numbers per cell by normalizing per million copies of 18S rRNA and identify 6 genes (TIE1, KDR, CDH5, TIE2, EFNA1 and MYO5C) out of 79 genes tested as excellent molecular signature markers for endothelial cells (ECs) in vitro. The selected genes are uniformly expressed in ECs of 4 different origins but weakly or not expressed in 4 non-EC cell lines. A multi-gene transcriptional profile of these 6 genes clearly distinguishes ECs from non-ECs in vitro. We conclude that (i) a profile of mRNA copy numbers per cell from a well-chosen multi-gene panel can act as a sensitive and accurate cell type signature marker, and (ii) the method described here can be applied to in vivo cell fingerprinting and molecular diagnosis. PMID:20972619
Chen, S W; Liu, T; Gao, Y; Zhang, C; Peng, S D; Bai, M B; Li, S J; Xu, L; Zhou, X Y; Lin, L B
Clubroot significantly affects plants of the Brassicaceae family and is one of the main diseases causing serious losses in B. napus yield. Few studies have investigated the clubroot-resistance mechanism in B. napus. Identification of clubroot-resistant genes may be used in clubroot-resistant breeding, as well as to elucidate the molecular mechanism behind B. napus clubroot-resistance. We used three B. napus transcriptome samples to construct a transcriptome sequencing library by using Illumina HiSeq™ 2000 sequencing and bioinformatic analysis. In total, 171 million high-quality reads were obtained, containing 96,149 unigenes of N50-value. We aligned the obtained unigenes with the Nr, Swiss-Prot, clusters of orthologous groups, and gene ontology databases and annotated their functions. In the Kyoto encyclopedia of genes and genomes database, 25,033 unigenes (26.04%) were assigned to 124 pathways. Many genes, including broad-spectrum disease-resistance genes, specific clubroot-resistant genes, and genes related to indole-3-acetic acid (IAA) signal transduction, cytokinin synthesis, and myrosinase synthesis in the Huashuang 3 variety of B. napus were found to be related to clubroot-resistance. The effective clubroot-resistance observed in this variety may be due to the induced increased expression of these disease-resistant genes and strong inhibition of the IAA signal transduction, cytokinin synthesis, and myrosinase synthesis. The homology observed between unigenes 0048482, 0061770 and the Crr1 gene shared 94% nucleotide similarity. Furthermore, unigene 0061770 could have originated from an inversion of the Crr1 5'-end sequence. PMID:27525940
Argraves W Scott
Full Text Available Abstract Background An important objective of DNA microarray-based gene expression experimentation is determining inter-relationships that exist between differentially expressed genes and biological processes, molecular functions, cellular components, signaling pathways, physiologic processes and diseases. Results Here we describe GeneMesh, a web-based program that facilitates analysis of DNA microarray gene expression data. GeneMesh relates genes in a query set to categories available in the Medical Subject Headings (MeSH hierarchical index. The interface enables hypothesis driven relational analysis to a specific MeSH subcategory (e.g., Cardiovascular System, Genetic Processes, Immune System Diseases etc. or unbiased relational analysis to broader MeSH categories (e.g., Anatomy, Biological Sciences, Disease etc.. Genes found associated with a given MeSH category are dynamically linked to facilitate tabular and graphical depiction of Entrez Gene information, Gene Ontology information, KEGG metabolic pathway diagrams and intermolecular interaction information. Expression intensity values of groups of genes that cluster in relation to a given MeSH category, gene ontology or pathway can be displayed as heat maps of Z score-normalized values. GeneMesh operates on gene expression data derived from a number of commercial microarray platforms including Affymetrix, Agilent and Illumina. Conclusions GeneMesh is a versatile web-based tool for testing and developing new hypotheses through relating genes in a query set (e.g., differentially expressed genes from a DNA microarray experiment to descriptors making up the hierarchical structure of the National Library of Medicine controlled vocabulary thesaurus, MeSH. The system further enhances the discovery process by providing links between sets of genes associated with a given MeSH category to a rich set of html linked tabular and graphic information including Entrez Gene summaries, gene ontologies
Gruber Stephen B
Full Text Available Abstract Background A critical step in processing oligonucleotide microarray data is combining the information in multiple probes to produce a single number that best captures the expression level of a RNA transcript. Several systematic studies comparing multiple methods for array processing have used tightly controlled calibration data sets as the basis for comparison. Here we compare performances for seven processing methods using two data sets originally collected for disease profiling studies. An emphasis is placed on understanding sensitivity for detecting differentially expressed genes in terms of two key statistical determinants: test statistic variability for non-differentially expressed genes, and test statistic size for truly differentially expressed genes. Results In the two data sets considered here, up to seven-fold variation across the processing methods was found in the number of genes detected at a given false discovery rate (FDR. The best performing methods called up to 90% of the same genes differentially expressed, had less variable test statistics under randomization, and had a greater number of large test statistics in the experimental data. Poor performance of one method was directly tied to a tendency to produce highly variable test statistic values under randomization. Based on an overall measure of performance, two of the seven methods (Dchip and a trimmed mean approach are superior in the two data sets considered here. Two other methods (MAS5 and GCRMA-EB are inferior, while results for the other three methods are mixed. Conclusions Choice of processing method has a major impact on differential expression analysis of microarray data. Previously reported performance analyses using tightly controlled calibration data sets are not highly consistent with results reported here using data from human tissue samples. Performance of array processing methods in disease profiling and other realistic biological studies should be
Jacob Kruger Jensen
Full Text Available The exact biochemical steps of xylan backbone synthesis remain elusive. In Arabidopsis, three non-redundant genes from two glycosyltransferase (GT families, IRX9 and IRX14 from GT43 and IRX10 from GT47, are candidates for forming the xylan backbone. In other plants, evidence exists that different tissues express these three genes at widely different levels, which suggests that diversity in the makeup of the xylan synthase complex exists. Recently we have profiled the transcripts present in the developing mucilaginous tissue of psyllium (Plantago ovata Forsk. This tissue was found to have high expression levels of an IRX10 homolog, but very low levels of the two GT43 family members. This contrasts with recent wheat endosperm tissue profiling that found a relatively high abundance of the GT43 family members. We have performed an in-depth analysis of all GTs genes expressed in four developmental stages of the psyllium mucilagenous layer and in a single stage of the psyllium stem using RNA-Seq. This analysis revealed several IRX10 homologs, an expansion in GT61 (homologs of At3g18170/At3g18180, and several GTs from other GT families that are highly abundant and specifically expressed in the mucilaginous tissue. Our current hypothesis is that the four IRX10 genes present in the mucilagenous tissues have evolved to function without the GT43 genes. These four genes represent some of the most divergent IRX10 genes identified to date. Conversely, those present in the psyllium stem are very similar to those in other eudicots. This suggests these genes are under selective pressure, likely due to the synthesis of the various xylan structures present in mucilage that has a different biochemical role than that present in secondary walls. The numerous GT61 family members also show a wide sequence diversity and may be responsible for the larger number of side chain structures present in the psyllium mucilage.
Jensen, Jacob K; Johnson, Nathan; Wilkerson, Curtis G
The exact biochemical steps of xylan backbone synthesis remain elusive. In Arabidopsis, three non-redundant genes from two glycosyltransferase (GT) families, IRX9 and IRX14 from GT43 and IRX10 from GT47, are candidates for forming the xylan backbone. In other plants, evidence exists that different tissues express these three genes at widely different levels, which suggests that diversity in the makeup of the xylan synthase complex exists. Recently we have profiled the transcripts present in the developing mucilaginous tissue of psyllium (Plantago ovata Forsk). This tissue was found to have high expression levels of an IRX10 homolog, but very low levels of the two GT43 family members. This contrasts with recent wheat endosperm tissue profiling that found a relatively high abundance of the GT43 family members. We have performed an in-depth analysis of all GTs genes expressed in four developmental stages of the psyllium mucilagenous layer and in a single stage of the psyllium stem using RNA-Seq. This analysis revealed several IRX10 homologs, an expansion in GT61 (homologs of At3g18170/At3g18180), and several GTs from other GT families that are highly abundant and specifically expressed in the mucilaginous tissue. Our current hypothesis is that the four IRX10 genes present in the mucilagenous tissues have evolved to function without the GT43 genes. These four genes represent some of the most divergent IRX10 genes identified to date. Conversely, those present in the psyllium stem are very similar to those in other eudicots. This suggests these genes are under selective pressure, likely due to the synthesis of the various xylan structures present in mucilage that has a different biochemical role than that present in secondary walls. The numerous GT61 family members also show a wide sequence diversity and may be responsible for the larger number of side chain structures present in the psyllium mucilage. PMID:23761806
Cassava (Manihot esculenta) is the main source of calories for more than 1,000 millions of people around the world and has been consolidated as the fourth most important crop after rice, corn and wheat. Cassava is considered tolerant to abiotic and biotic stress conditions; nevertheless these characteristics are mainly present in non-commercial varieties. Genetic breeding strategies represent an alternative to introduce the desirable characteristics into commercial varieties. A fundamental step for accelerating the genetic breeding process in cassava requires the identification of genes associated to these characteristics. One rapid strategy for the identification of genes is the possibility to have a large collection of ESTs (expressed sequence tag). In this study, a complete analysis of cassava ESTs was done. The cassava ESTs represent 80,459 sequences which were assembled in a set of 29,231 unique genes (unigen), comprising 10,945 contigs and 18,286 singletones. These 29,231 unique genes represent about 80% of the genes of the cassava's genome. Between 5% and 10% of the unigenes of cassava not show similarity to any sequences present in the NCBI database and could be consider as cassava specific genes. a functional category was assigned to a group of sequences of the unigen set (29%) following the Gene Ontology Vocabulary. the molecular function component was the best represented with 43% of the sequences, followed by the biological process component (38%) and finally the cellular component with 19%. in the cassava ESTs collection, 3,709 microsatellites were identified and they could be used as molecular markers. this study represents an important contribution to the knowledge of the functional genomic structure of cassava and constitutes an important tool for the identification of genes associated to agricultural characteristics of interest that could be employed in cassava breeding programs.
Full Text Available MOTIVATION: Array-CGH can be used to determine DNA copy number, imbalances in which are a fundamental factor in the genesis and progression of tumors. The discovery of classes with similar patterns of array-CGH profiles therefore adds to our understanding of cancer and the treatment of patients. Various input data representations for array-CGH, dissimilarity measures between tumor samples and clustering algorithms may be used for this purpose. The choice between procedures is often difficult. An evaluation procedure is therefore required to select the best class discovery method (combination of one input data representation, one dissimilarity measure and one clustering algorithm for array-CGH. Robustness of the resulting classes is a common requirement, but no stability-based comparison of class discovery methods for array-CGH profiles has ever been reported. RESULTS: We applied several class discovery methods and evaluated the stability of their solutions, with a modified version of Bertoni's [Formula: see text]-based test . Our version relaxes the assumption of independency required by original Bertoni's [Formula: see text]-based test. We conclude that Minimal Regions of alteration (a concept introduced by  for input data representation, sim  or agree  for dissimilarity measure and the use of average group distance in the clustering algorithm produce the most robust classes of array-CGH profiles. AVAILABILITY: The software is available from http://bioinfo.curie.fr/projects/cgh-clustering. It has also been partly integrated into "Visualization and analysis of array-CGH"(VAMP. The data sets used are publicly available from ACTuDB .
Gustavo A Gomez
Full Text Available The transcription factor Etsrp is required for vasculogenesis and primitive myelopoiesis in zebrafish. When ectopically expressed, etsrp is sufficient to induce the expression of many vascular and myeloid genes in zebrafish. The mammalian homolog of etsrp, ER71/Etv2, is also essential for vascular and hematopoietic development. To identify genes downstream of etsrp, gain-of-function experiments were performed for etsrp in zebrafish embryos followed by transcription profile analysis by microarray. Subsequent in vivo expression studies resulted in the identification of fourteen genes with blood and/or vascular expression, six of these being completely novel. Regulation of these genes by etsrp was confirmed by ectopic induction in etsrp overexpressing embryos and decreased expression in etsrp deficient embryos. Additional functional analysis of two newly discovered genes, hapln1b and sh3gl3, demonstrates their importance in embryonic vascular development. The results described here identify a group of genes downstream of etsrp likely to be critical for vascular and/or myeloid development.
Full Text Available Puccinellia tenuiflora is a monocotyledonous halophyte that is able to survive in extreme saline soil environments at an alkaline pH range of 9–10. In this study, we transformed full-length cDNAs of P. tenuiflora into Saccharomyces cerevisiae by using the full-length cDNA over-expressing gene-hunting system to identify novel salt-tolerance genes. In all, 32 yeast clones overexpressing P. tenuiflora cDNA were obtained by screening under NaCl stress conditions; of these, 31 clones showed stronger tolerance to NaCl and were amplified using polymerase chain reaction (PCR and sequenced. Four novel genes encoding proteins with unknown function were identified; these genes had no homology with genes from higher plants. Of the four isolated genes, two that encoded proteins with two transmembrane domains showed the strongest resistance to 1.3 M NaCl. RT-PCR and northern blot analysis of P. tenuiflora cultured cells confirmed the endogenous NaCl-induced expression of the two proteins. Both of the proteins conferred better tolerance in yeasts to high salt, alkaline and osmotic conditions, some heavy metals and H2O2 stress. Thus, we inferred that the two novel proteins might alleviate oxidative and other stresses in P. tenuiflora.
Full Text Available BACKGROUND: The Chinese mitten crab Eriocheir sinensis is an important economic crustacean and has been seriously attacked by various diseases, which requires more and more information for immune relevant genes on genome background. Recently, high-throughput RNA sequencing (RNA-seq technology provides a powerful and efficient method for transcript analysis and immune gene discovery. METHODS/PRINCIPAL FINDINGS: A cDNA library from hepatopancreas of E. sinensis challenged by a mixture of three pathogen strains (Gram-positive bacteria Micrococcus luteus, Gram-negative bacteria Vibrio alginolyticus and fungi Pichia pastoris; 10(8 cfu·mL(-1 was constructed and randomly sequenced using Illumina technique. Totally 39.76 million clean reads were assembled to 70,300 unigenes. After ruling out short-length and low-quality sequences, 52,074 non-redundant unigenes were compared to public databases for homology searching and 17,617 of them showed high similarity to sequences in NCBI non-redundant protein (Nr database. For function classification and pathway assignment, 18,734 (36.00% unigenes were categorized to three Gene Ontology (GO categories, 12,243 (23.51% were classified to 25 Clusters of Orthologous Groups (COG, and 8,983 (17.25% were assigned to six Kyoto Encyclopedia of Genes and Genomes (KEGG pathways. Potentially, 24, 14, 47 and 132 unigenes were characterized to be involved in Toll, IMD, JAK-STAT and MAPK pathways, respectively. CONCLUSIONS/SIGNIFICANCE: This is the first systematical transcriptome analysis of components relating to innate immune pathways in E. sinensis. Functional genes and putative pathways identified here will contribute to better understand immune system and prevent various diseases in crab.
Kothiwale, Sandeepkumar; Borza, Corina M; Lowe, Edward W; Pozzi, Ambra; Meiler, Jens
Discoidin domain receptor (DDR) 1 and 2 are transmembrane receptors that belong to the family of receptor tyrosine kinases (RTK). Upon collagen binding, DDRs transduce cellular signaling involved in various cell functions, including cell adhesion, proliferation, differentiation, migration, and matrix homeostasis. Altered DDR function resulting from either mutations or overexpression has been implicated in several types of disease, including atherosclerosis, inflammation, cancer, and tissue fibrosis. Several established inhibitors, such as imatinib, dasatinib, and nilotinib, originally developed as Abelson murine leukemia (Abl) kinase inhibitors, have been found to inhibit DDR kinase activity. As we review here, recent discoveries of novel inhibitors and their co-crystal structure with the DDR1 kinase domain have made structure-based drug discovery for DDR1 amenable. PMID:25284748
Full Text Available How to discover hotspot in the Internet public opinions effectively is a hot research field for the researchers related which plays a key role for governments and corporations to find useful information from mass data in the Internet. An improved -means algorithm for hotspot discovery in internet public opinions is presented based on the analysis of existing defects and calculation principle of original -means algorithm. First, some new methods are designed to preprocess website texts, select and express the characteristics of website texts, and define the similarity between two website texts, respectively. Second, clustering principle and the method of initial classification centers selection are analyzed and improved in order to overcome the limitations of original -means algorithm. Finally, the experimental results verify that the improved algorithm can improve the clustering stability and classification accuracy of hotspot discovery in internet public opinions when used in practice.
Mungall, Christopher J; Washington, Nicole L; Nguyen-Xuan, Jeremy; Condit, Christopher; Smedley, Damian; Köhler, Sebastian; Groza, Tudor; Shefchek, Kent; Hochheiser, Harry; Robinson, Peter N; Lewis, Suzanna E; Haendel, Melissa A
The Matchmaker Exchange application programming interface (API) allows searching a patient's genotypic or phenotypic profiles across clinical sites, for the purposes of cohort discovery and variant disease causal validation. This API can be used not only to search for matching patients, but also to match against public disease and model organism data. This public disease data enable matching known diseases and variant-phenotype associations using phenotype semantic similarity algorithms developed by the Monarch Initiative. The model data can provide additional evidence to aid diagnosis, suggest relevant models for disease mechanism and treatment exploration, and identify collaborators across the translational divide. The Monarch Initiative provides an implementation of this API for searching multiple integrated sources of data that contextualize the knowledge about any given patient or patient family into the greater biomedical knowledge landscape. While this corpus of data can aid diagnosis, it is also the beginning of research to improve understanding of rare human diseases. PMID:26269093
Lam Alex C
Full Text Available Abstract Background The aetiology of the autoimmune disease type 1 diabetes (T1D involves many genetic and environmental factors. Evidence suggests that innate immune responses, including the action of interferons, may also play a role in the initiation and/or pathogenic process of autoimmunity. In the present report, we have adopted a linkage disequilibrium (LD mapping approach to test for an association between T1D and three regions encompassing 13 interferon alpha (IFNA genes, interferon omega-1 (IFNW1, interferon beta-1 (IFNB1, interferon gamma (IFNG and the interferon consensus-sequence binding protein 1 (ICSBP1. Results We identified 238 variants, most, single nucleotide polymorphisms (SNPs, by sequencing IFNA, IFNB1, IFNW1 and ICSBP1, 98 of which where novel when compared to dbSNP build 124. We used polymorphisms identified in the SeattleSNP database for INFG. A set of tag SNPs was selected for each of the interferon and interferon-related genes to test for an association between T1D and this complex gene family. A total of 45 tag SNPs were selected and genotyped in a collection of 472 multiplex families. Conclusion We have developed informative sets of SNPs for the interferon and interferon related genes. No statistical evidence of a major association between T1D and any of the interferon and interferon related genes tested was found.
Full Text Available To pursue a systematic approach to discovery of mechanisms of action of traditional Chinese medicine (TCM, we used microarrays, bioinformatics and the "Connectivity Map" (CMAP to examine TCM-induced changes in gene expression. We demonstrated that this approach can be used to elucidate new molecular targets using a model TCM herbal formula Si-Wu-Tang (SWT which is widely used for women's health. The human breast cancer MCF-7 cells treated with 0.1 µM estradiol or 2.56 mg/ml of SWT showed dramatic gene expression changes, while no significant change was detected for ferulic acid, a known bioactive compound of SWT. Pathway analysis using differentially expressed genes related to the treatment effect identified that expression of genes in the nuclear factor erythroid 2-related factor 2 (Nrf2 cytoprotective pathway was most significantly affected by SWT, but not by estradiol or ferulic acid. The Nrf2-regulated genes HMOX1, GCLC, GCLM, SLC7A11 and NQO1 were upregulated by SWT in a dose-dependent manner, which was validated by real-time RT-PCR. Consistently, treatment with SWT and its four herbal ingredients resulted in an increased antioxidant response element (ARE-luciferase reporter activity in MCF-7 and HEK293 cells. Furthermore, the gene expression profile of differentially expressed genes related to SWT treatment was used to compare with those of 1,309 compounds in the CMAP database. The CMAP profiles of estradiol-treated MCF-7 cells showed an excellent match with SWT treatment, consistent with SWT's widely claimed use for women's diseases and indicating a phytoestrogenic effect. The CMAP profiles of chemopreventive agents withaferin A and resveratrol also showed high similarity to the profiles of SWT. This study identified SWT as an Nrf2 activator and phytoestrogen, suggesting its use as a nontoxic chemopreventive agent, and demonstrated the feasibility of combining microarray gene expression profiling with CMAP mining to discover mechanisms
Montiel, Daniel; Kang, Hahk-Soo; Chang, Fang-Yuan; Charlop-Powers, Zachary; Brady, Sean F
Large-scale sequencing of prokaryotic (meta)genomic DNA suggests that most bacterial natural product gene clusters are not expressed under common laboratory culture conditions. Silent gene clusters represent a promising resource for natural product discovery and the development of a new generation of therapeutics. Unfortunately, the characterization of molecules encoded by these clusters is hampered owing to our inability to express these gene clusters in the laboratory. To address this bottleneck, we have developed a promoter-engineering platform to transcriptionally activate silent gene clusters in a model heterologous host. Our approach uses yeast homologous recombination, an auxotrophy complementation-based yeast selection system and sequence orthogonal promoter cassettes to exchange all native promoters in silent gene clusters with constitutively active promoters. As part of this platform, we constructed and validated a set of bidirectional promoter cassettes consisting of orthogonal promoter sequences, Streptomyces ribosome binding sites, and yeast selectable marker genes. Using these tools we demonstrate the ability to simultaneously insert multiple promoter cassettes into a gene cluster, thereby expediting the reengineering process. We apply this method to model active and silent gene clusters (rebeccamycin and tetarimycin) and to the silent, cryptic pseudogene-containing, environmental DNA-derived Lzr gene cluster. Complete promoter refactoring and targeted gene exchange in this "dead" cluster led to the discovery of potent indolotryptoline antiproliferative agents, lazarimides A and B. This potentially scalable and cost-effective promoter reengineering platform should streamline the discovery of natural products from silent natural product biosynthetic gene clusters. PMID:26150486
Balint, Eva; Lapointe, David; Drissi, Hicham; van der Meijden, Caroline; Young, Daniel W; van Wijnen, Andre J; Stein, Janet L; Stein, Gary S; Lian, Jane B
osteogenic phenotype is recognized by 8 h, reflected by downregulation of most myogenic-related genes and induction of a spectrum of signaling proteins and enzymes facilitating synthesis and assembly of an extracellular skeletal environment. These genes included collagens Type I and VI and the small leucine rich repeat family of proteoglycans (e.g., decorin, biglycan, osteomodulin, fibromodulin, and osteoadherin/osteoglycin) that reached peak expression at 24 h. With extracellular matrix development, the bone phenotype was further established from 16 to 24 h by induction of genes for cell adhesion and communication and enzymes that organize the bone ECM. Our microarray analysis resulted in the discovery of a class of genes, initially described in relation to differentiation of astrocytes and oligodendrocytes that are functionally coupled to signals for cellular extensions. They include nexin, neuropilin, latexin, neuroglian, neuron specific gene 1, and Ulip; suggesting novel roles for these genes in the bone microenvironment. This global analysis identified a multistage molecular and cellular cascade that supports BMP-2-mediated osteoblast differentiation. PMID:12704803
Birnbaum, Mark J.; Picco, Jenna; Clements, Meghan; Witwicka, Hanna; Yang, Meiheng; Hoey, Margaret T.; Odgren, Paul R.
A key goal of molecular/cell biology/biotechnology is to identify essential genes in virtually every physiological process to uncover basic mechanisms of cell function and to establish potential targets of drug therapy combating human disease. This article describes a semester-long, project-oriented molecular/cellular/biotechnology laboratory…
Gligorijevic, Djordje; Stojanovic, Jelena; Djuric, Nemanja; Radosavljevic, Vladan; Grbovic, Mihajlo; Kulathinal, Rob J; Obradovic, Zoran
Data-driven phenotype analyses on Electronic Health Record (EHR) data have recently drawn benefits across many areas of clinical practice, uncovering new links in the medical sciences that can potentially affect the well-being of millions of patients. In this paper, EHR data is used to discover novel relationships between diseases by studying their comorbidities (co-occurrences in patients). A novel embedding model is designed to extract knowledge from disease comorbidities by learning from a large-scale EHR database comprising more than 35 million inpatient cases spanning nearly a decade, revealing significant improvements on disease phenotyping over current computational approaches. In addition, the use of the proposed methodology is extended to discover novel disease-gene associations by including valuable domain knowledge from genome-wide association studies. To evaluate our approach, its effectiveness is compared against a held-out set where, again, it revealed very compelling results. For selected diseases, we further identify candidate gene lists for which disease-gene associations were not studied previously. Thus, our approach provides biomedical researchers with new tools to filter genes of interest, thus, reducing costly lab studies. PMID:27578529
Verma, Mohit; Ghangal, Rajesh; Sharma, Raghvendra; Sinha, Alok K; Jain, Mukesh
The medicinal plant, Catharanthus roseus, accumulates wide range of terpenoid indole alkaloids, which are well documented therapeutic agents. In this study, deep transcriptome sequencing of C. roseus was carried out to identify the pathways and enzymes (genes) involved in biosynthesis of these compounds. About 343 million reads were generated from different tissues (leaf, flower and root) of C. roseus using Illumina platform. Optimization of de novo assembly involving a two-step process resulted in a total of 59,220 unique transcripts with an average length of 1284 bp. Comprehensive functional annotation and gene ontology (GO) analysis revealed the representation of many genes involved in different biological processes and molecular functions. In total, 65% of C. roseus transcripts showed homology with sequences available in various public repositories, while remaining 35% unigenes may be considered as C. roseus specific. In silico analysis revealed presence of 11,620 genic simple sequence repeats (excluding mono-nucleotide repeats) and 1820 transcription factor encoding genes in C. roseus transcriptome. Expression analysis showed roots and leaves to be actively participating in bisindole alkaloid production with clear indication that enzymes involved in pathway of vindoline and vinblastine biosynthesis are restricted to aerial tissues. Such large-scale transcriptome study provides a rich source for understanding plant-specialized metabolism, and is expected to promote research towards production of plant-derived pharmaceuticals. PMID:25072156
Full Text Available The medicinal plant, Catharanthus roseus, accumulates wide range of terpenoid indole alkaloids, which are well documented therapeutic agents. In this study, deep transcriptome sequencing of C. roseus was carried out to identify the pathways and enzymes (genes involved in biosynthesis of these compounds. About 343 million reads were generated from different tissues (leaf, flower and root of C. roseus using Illumina platform. Optimization of de novo assembly involving a two-step process resulted in a total of 59,220 unique transcripts with an average length of 1284 bp. Comprehensive functional annotation and gene ontology (GO analysis revealed the representation of many genes involved in different biological processes and molecular functions. In total, 65% of C. roseus transcripts showed homology with sequences available in various public repositories, while remaining 35% unigenes may be considered as C. roseus specific. In silico analysis revealed presence of 11,620 genic simple sequence repeats (excluding mono-nucleotide repeats and 1820 transcription factor encoding genes in C. roseus transcriptome. Expression analysis showed roots and leaves to be actively participating in bisindole alkaloid production with clear indication that enzymes involved in pathway of vindoline and vinblastine biosynthesis are restricted to aerial tissues. Such large-scale transcriptome study provides a rich source for understanding plant-specialized metabolism, and is expected to promote research towards production of plant-derived pharmaceuticals.
Jiao, Yinping; Burke, John; Chopra, Ratan; Burow, Gloria; Chen, Junping; Wang, Bo; Hayes, Chad; Emendack, Yves; Ware, Doreen; Xin, Zhanguo
Sorghum (Sorghum bicolor) is a versatile C4 crop and a model for research in family Poaceae. High-quality genome sequence is available for the elite inbred line BTx623, but functional validation of genes remains challenging due to the limited genomic and germplasm resources available for comprehensive analysis of induced mutations. In this study, we generated 6400 pedigreed M4 mutant pools from EMS-mutagenized BTx623 seeds through single-seed descent. Whole-genome sequencing of 256 phenotyped mutant lines revealed >1.8 million canonical EMS-induced mutations, affecting >95% of genes in the sorghum genome. The vast majority (97.5%) of the induced mutations were distinct from natural variations. To demonstrate the utility of the sequenced sorghum mutant resource, we performed reverse genetics to identify eight genes potentially affecting drought tolerance, three of which had allelic mutations and two of which exhibited exact cosegregation with the phenotype of interest. Our results establish that a large-scale resource of sequenced pedigreed mutants provides an efficient platform for functional validation of genes in sorghum, thereby accelerating sorghum breeding. Moreover, findings made in sorghum could be readily translated to other members of the Poaceae via integrated genomics approaches. PMID:27354556
Smink Luc J
Full Text Available Abstract Background Type 1 diabetes (T1D is a common autoimmune disease resulting from T-cell mediated destruction of pancreatic beta cells. Decay accelerating factor (DAF, CD55, a glycosylphosphatidylinositol-anchored membrane protein, is a candidate for autoimmune disease susceptibility based on its role in restricting complement activation and evidence that DAF expression modulates the phenotype of mice models for autoimmune disease. In this study, we adopt a linkage disequilibrium (LD mapping approach to test for an association between the DAF gene and T1D. Results Initially, we used HapMap II genotype data to examine LD across the DAF region. Additional resequencing was required, identifying 16 novel polymorphisms. Combining both datasets, a LD mapping approach was adopted to test for association with T1D. Seven tag SNPs were selected and genotyped in case-control (3,523 cases and 3,817 controls and family (725 families collections. Conclusion We obtained no evidence of association between T1D and the DAF region in two independent collections. In addition, we assessed the impact of using only HapMap II genotypes for the selection of tag SNPs and, based on this study, found that HapMap II genotypes may require additional SNP discovery for comprehensive LD mapping of some genes in common disease.
Perry, Antoinette S
Aberrant activation of Wnts is common in human cancers, including prostate. Hypermethylation associated transcriptional silencing of Wnt antagonist genes SFRPs (Secreted Frizzled-Related Proteins) is a frequent oncogenic event. The significance of this is not known in prostate cancer. The objectives of our study were to (i) profile Wnt signaling related gene expression and (ii) investigate methylation of Wnt antagonist genes in prostate cancer. Using TaqMan Low Density Arrays, we identified 15 Wnt signaling related genes with significantly altered expression in prostate cancer; the majority of which were upregulated in tumors. Notably, histologically benign tissue from men with prostate cancer appeared more similar to tumor (r = 0.76) than to benign prostatic hyperplasia (BPH; r = 0.57, p < 0.001). Overall, the expression profile was highly similar between tumors of high (≥ 7) and low (≤ 6) Gleason scores. Pharmacological demethylation of PC-3 cells with 5-Aza-CdR reactivated 39 genes (≥ 2-fold); 40% of which inhibit Wnt signaling. Methylation frequencies in prostate cancer were 10% (2\\/20) (SFRP1), 64.86% (48\\/74) (SFRP2), 0% (0\\/20) (SFRP4) and 60% (12\\/20) (SFRP5). SFRP2 methylation was detected at significantly lower frequencies in high-grade prostatic intraepithelial neoplasia (HGPIN; 30%, (6\\/20), p = 0.0096), tumor adjacent benign areas (8.82%, (7\\/69), p < 0.0001) and BPH (11.43% (4\\/35), p < 0.0001). The quantitative level of SFRP2 methylation (normalized index of methylation) was also significantly higher in tumors (116) than in the other samples (HGPIN = 7.45, HB = 0.47, and BPH = 0.12). We show that SFRP2 hypermethylation is a common event in prostate cancer. SFRP2 methylation in combination with other epigenetic markers may be a useful biomarker of prostate cancer.
Nicholas R Polato
Full Text Available BACKGROUND: Cnidarians, including corals and anemones, offer unique insights into metazoan evolution because they harbor genetic similarities with vertebrates beyond that found in model invertebrates and retain genes known only from non-metazoans. Cataloging genes expressed in Acropora palmata, a foundation-species of reefs in the Caribbean and western Atlantic, will advance our understanding of the genetic basis of ecologically important traits in corals and comes at a time when sequencing efforts in other cnidarians allow for multi-species comparisons. RESULTS: A cDNA library from a sample enriched for symbiont free larval tissue was sequenced on the 454 GS-FLX platform. Over 960,000 reads were obtained and assembled into 42,630 contigs. Annotation data was acquired for 57% of the assembled sequences. Analysis of the assembled sequences indicated that 83-100% of all A. palmata transcripts were tagged, and provided a rough estimate of the total number genes expressed in our samples (~18,000-20,000. The coral annotation data contained many of the same molecular components as in the Bilateria, particularly in pathways associated with oxidative stress and DNA damage repair, and provided evidence that homologs of p53, a key player in DNA repair pathways, has experienced selection along the branch separating Cnidaria and Bilateria. Transcriptome wide screens of paralog groups and transition/transversion ratios highlighted genes including: green fluorescent proteins, carbonic anhydrase, and oxidative stress proteins; and functional groups involved in protein and nucleic acid metabolism, and the formation of structural molecules. These results provide a starting point for study of adaptive evolution in corals. CONCLUSIONS: Currently available transcriptome data now make comparative studies of the mechanisms underlying coral's evolutionary success possible. Here we identified candidate genes that enable corals to maintain genomic integrity despite
Demian Antony D’Mello
Full Text Available E-learning is currently taking the shape of a Web Service in various applications i.e. learners can search for suitable content, book it, pay for it and consume it. This paper shows how the search aspects for e-learning content can technically be combined with the recent standardization efforts that aim at content exchangeability and efficient reuse. A repository for learning object publication and search is proposed that essentially adapts the UDDI framework used in commercial Web Services to the e-learning context. To adopt Web Services technology towards the reusability and aggregation of e-learning services, the conceptual Web Services architecture and its building blocks need to be augmented. The objective of this research is to design broker based registry architecture for e- Web services which facilitates effective elearning content/service discovery for the consumption or composition. The implementation followed by experimentation showed that, the proposed e-learning discovery architecture facilitates effective discovery with moderate performance in terms of overall response.
Hung-Chih Hsueh; Jung-Yi Jiang; Jen-Sheng Tsai; Wen-Hao Tsai; Kuan-Rong Lee; Yau-Hwang Kuo
Abstract⎯In this paper, we propose a knowledge discovery method based on the fuzzy set theory to help elders with plant cultivation. Initially, the fuzzy sets are constructed by using the feature selection and statistical interval estimation. The min-max inference and the center of gravity defuzzification method are then used to output a candidate pattern set. Finally, a pattern discovery is adopted to obtain the patterns from the candidate set for the cultivation suggestions by considering the frequency weight and user’s experience. In order to demonstrate the performance of our method in planting systems, we conduct a clicks-and-mortar cultivation platform, namely Eden Garden, for the elderly lifestyles of health and sustainability (LOHAS). The experimental results show that the accuracy rate of our knowledge discovery method can reach up to 85%. Moreover, the results of the LOHAS index scale table present that the happiness of the elders is increasing while the elders are using our proposed method.
TANG Shan-cheng; HOU Yi-bin
Reducing cost of service is an important goal for resource discovery and interaction technologies. The shortcomings of transhipment-method and hibernation-method are to increase holistic cost of service and to slower resource discovery respectively. To overcome these shortcomings, a context-aware computing-based method is developed. This method, firstly,analyzes the courses of devices using resource discovery and interaction technologies to identify some types of context related to reducing cost of service, then, chooses effective methods such as stopping broadcast and hibernation to reduce cost of service according to information supplied by the context but not the transhipment-method's simple hibernations. The results of experiments indicate that under the worst condition this method overcomes the shortcomings of transhipment-method, makes the "poor" devices hibernate longer than hibernation-method to reduce cost of service more effectively, and discovers resources faster than hibernation-method; under the best condition it is far better than hibernation-method in all aspects.
Loots, Gabriela G.; Ovcharenko, Ivan; Pachter, Lior; Dubchak, Inna; Rubin, Edward M.
Identifying transcriptional regulatory elements represents a significant challenge in annotating the genomes of higher vertebrates. We have developed a computational tool, rVISTA, for high-throughput discovery of cis-regulatory elements that combines transcription factor binding site prediction and the analysis of inter-species sequence conservation. Here, we illustrate the ability of rVISTA to identify true transcription factor binding sites through the analysis of AP-1 and NFAT binding sites in the 1 Mb well-annotated cytokine gene cluster1 (Hs5q31; Mm11). The exploitation of orthologous human-mouse data set resulted in the elimination of 95 percent of the 38,000 binding sites predicted upon analysis of the human sequence alone, while it identified 87 percent of the experimentally verified binding sites in this region.
DeAngelis, K.M.; Gladden, J.G.; Allgaier, M.; D' haeseleer, P.; Fortney, J.L.; Reddy, A.; Hugenholtz, P.; Singer, S.W.; Vander Gheynst, J.; Silver, W.L.; Simmons, B.; Hazen, T.C.
Producing cellulosic biofuels from plant material has recently emerged as a key U.S. Department of Energy goal. For this technology to be commercially viable on a large scale, it is critical to make production cost efficient by streamlining both the deconstruction of lignocellulosic biomass and fuel production. Many natural ecosystems efficiently degrade lignocellulosic biomass and harbor enzymes that, when identified, could be used to increase the efficiency of commercial biomass deconstruction. However, ecosystems most likely to yield relevant enzymes, such as tropical rain forest soil in Puerto Rico, are often too complex for enzyme discovery using current metagenomic sequencing technologies. One potential strategy to overcome this problem is to selectively cultivate the microbial communities from these complex ecosystems on biomass under defined conditions, generating less complex biomass-degrading microbial populations. To test this premise, we cultivated microbes from Puerto Rican soil or green waste compost under precisely defined conditions in the presence dried ground switchgrass (Panicum virgatum L.) or lignin, respectively, as the sole carbon source. Phylogenetic profiling of the two feedstock-adapted communities using SSU rRNA gene amplicon pyrosequencing or phylogenetic microarray analysis revealed that the adapted communities were significantly simplified compared to the natural communities from which they were derived. Several members of the lignin-adapted and switchgrass-adapted consortia are related to organisms previously characterized as biomass degraders, while others were from less well-characterized phyla. The decrease in complexity of these communities make them good candidates for metagenomic sequencing and will likely enable the reconstruction of a greater number of full length genes, leading to the discovery of novel lignocellulose-degrading enzymes adapted to feedstocks and conditions of interest.
Steinfeld Israel; Navon Roy; Eden Eran; Lipson Doron; Yakhini Zohar
Abstract Background Since the inception of the GO annotation project, a variety of tools have been developed that support exploring and searching the GO database. In particular, a variety of tools that perform GO enrichment analysis are currently available. Most of these tools require as input a target set of genes and a background set and seek enrichment in the target set compared to the background set. A few tools also exist that support analyzing ranked lists. The latter typically rely on ...
Singh, Vikash K.; Mukesh Jain
Flower development is one of the major developmental processes that governs seed setting in angiosperms. However, little is known about the molecular mechanisms underlying flower development in legumes. Employing RNA-seq for various stages of flower development and few vegetative tissues in chickpea, we identified differentially expressed genes in flower tissues/stages in comparison to vegetative tissues, which are related to various biological processes and molecular functions during flower ...
Truong, Hoa T; Ramos, A Marcos; Yalcin, Feyruz; de Ruiter, Marjo; van der Poel, Hein J A; Huvenaars, Koen H J; Hogers, René C J; van Enckevort, Leonora J G; Janssen, Antoine; van Orsouw, Nathalie J; van Eijk, Michiel J T
Conventional marker-based genotyping platforms are widely available, but not without their limitations. In this context, we developed Sequence-Based Genotyping (SBG), a technology for simultaneous marker discovery and co-dominant scoring, using next-generation sequencing. SBG offers users several advantages including a generic sample preparation method, a highly robust genome complexity reduction strategy to facilitate de novo marker discovery across entire genomes, and a uniform bioinformatics workflow strategy to achieve genotyping goals tailored to individual species, regardless of the availability of a reference sequence. The most distinguishing features of this technology are the ability to genotype any population structure, regardless whether parental data is included, and the ability to co-dominantly score SNP markers segregating in populations. To demonstrate the capabilities of SBG, we performed marker discovery and genotyping in Arabidopsis thaliana and lettuce, two plant species of diverse genetic complexity and backgrounds. Initially we obtained 1,409 SNPs for arabidopsis, and 5,583 SNPs for lettuce. Further filtering of the SNP dataset produced over 1,000 high quality SNP markers for each species. We obtained a genotyping rate of 201.2 genotypes/SNP and 58.3 genotypes/SNP for arabidopsis (n = 222 samples) and lettuce (n = 87 samples), respectively. Linkage mapping using these SNPs resulted in stable map configurations. We have therefore shown that the SBG approach presented provides users with the utmost flexibility in garnering high quality markers that can be directly used for genotyping and downstream applications. Until advances and costs will allow for routine whole-genome sequencing of populations, we expect that sequence-based genotyping technologies such as SBG will be essential for genotyping of model and non-model genomes alike. PMID:22662172
Zhang, Kun; Janzing, Dominik; Schoelkopf, Bernhard
Conditional independence testing is an important problem, especially in Bayesian network learning and causal discovery. Due to the curse of dimensionality, testing for conditional independence of continuous variables is particularly challenging. We propose a Kernel-based Conditional Independence test (KCI-test), by constructing an appropriate test statistic and deriving its asymptotic distribution under the null hypothesis of conditional independence. The proposed method is computationally efficient and easy to implement. Experimental results show that it outperforms other methods, especially when the conditioning set is large or the sample size is not very large, in which case other methods encounter difficulties.
Jung Me Hwang
Full Text Available We have developed a simple but highly efficient in-cell protein-protein interaction (PPI discovery system based on the translocation properties of protein kinase C- and its C1a domain in live cells. This system allows the visual detection of trimeric and dimeric protein interactions including cytosolic, nuclear, and/or membrane proteins with their cognate ligands. In addition, this system can be used to identify pharmacological small compounds that inhibit specific PPIs. These properties make this PPI system an attractive tool for screening drug candidates and mapping the protein interactome.
New software has been developed for automating the experimental and data-processing stages of fragment-based drug discovery at a macromolecular crystallography beamline. A new workflow-automation framework orchestrates beamline-control and data-analysis software while organizing results from multiple samples. AutoDrug is software based upon the scientific workflow paradigm that integrates the Stanford Synchrotron Radiation Lightsource macromolecular crystallography beamlines and third-party processing software to automate the crystallography steps of the fragment-based drug-discovery process. AutoDrug screens a cassette of fragment-soaked crystals, selects crystals for data collection based on screening results and user-specified criteria and determines optimal data-collection strategies. It then collects and processes diffraction data, performs molecular replacement using provided models and detects electron density that is likely to arise from bound fragments. All processes are fully automated, i.e. are performed without user interaction or supervision. Samples can be screened in groups corresponding to particular proteins, crystal forms and/or soaking conditions. A single AutoDrug run is only limited by the capacity of the sample-storage dewar at the beamline: currently 288 samples. AutoDrug was developed in conjunction with RestFlow, a new scientific workflow-automation framework. RestFlow simplifies the design of AutoDrug by managing the flow of data and the organization of results and by orchestrating the execution of computational pipeline steps. It also simplifies the execution and interaction of third-party programs and the beamline-control system. Modeling AutoDrug as a scientific workflow enables multiple variants that meet the requirements of different user groups to be developed and supported. A workflow tailored to mimic the crystallography stages comprising the drug-discovery pipeline of CoCrystal Discovery Inc. has been deployed and successfully
Bilal Çakir; Onur Dağliyan; Ezgi Dağyildiz; İbrahim Bariş; Ibrahim Halil Kavakli; Seda Kizilel; Metin Türkay
Structure Based Discovery of Small Molecules to Regulate the Activity of Human Insulin Degrading Enzyme Bilal C¸ akir1, Onur Dag˘ liyan1, Ezgi Dag˘ yildiz1, I˙brahim Baris¸1, Ibrahim Halil Kavakli1,2*, Seda Kizilel1*, Metin Tu¨ rkay3* 1 Department of Chemical and Biological Engineering, Koc¸ University, Sariyer, Istanbul, Turkey, 2 Department of Molecular Biology and Genetics, Koc¸ University, Sariyer, Istanbul, Turkey, 3 Department of Industrial Engineering, Koc¸ University...