WorldWideScience

Sample records for big genomes facilitate

  1. Big Data Analytics for Genomic Medicine.

    Science.gov (United States)

    He, Karen Y; Ge, Dongliang; He, Max M

    2017-02-15

    Genomic medicine attempts to build individualized strategies for diagnostic and therapeutic decision-making by utilizing patients' genomic information. Big Data analytics uncovers hidden patterns, unknown correlations, and other insights by examining diverse, large-scale data sets. While the integration and manipulation of diverse genomic data and comprehensive electronic health records (EHRs) on a Big Data infrastructure pose challenges, they also provide a feasible opportunity to develop an efficient and effective approach to identifying clinically actionable genetic variants for individualized diagnosis and therapy. In this paper, we review the challenges of manipulating large-scale next-generation sequencing (NGS) data and diverse clinical data derived from EHRs for genomic medicine. We introduce possible solutions to the different challenges in manipulating, managing, and analyzing genomic and clinical data for implementing genomic medicine. Additionally, we present a practical Big Data toolset for identifying clinically actionable genetic variants using high-throughput NGS data and EHRs.

  2. Big Data Analytics for Genomic Medicine

    Science.gov (United States)

    He, Karen Y.; Ge, Dongliang; He, Max M.

    2017-01-01

    Genomic medicine attempts to build individualized strategies for diagnostic and therapeutic decision-making by utilizing patients’ genomic information. Big Data analytics uncovers hidden patterns, unknown correlations, and other insights by examining diverse, large-scale data sets. While the integration and manipulation of diverse genomic data and comprehensive electronic health records (EHRs) on a Big Data infrastructure pose challenges, they also provide a feasible opportunity to develop an efficient and effective approach to identifying clinically actionable genetic variants for individualized diagnosis and therapy. In this paper, we review the challenges of manipulating large-scale next-generation sequencing (NGS) data and diverse clinical data derived from EHRs for genomic medicine. We introduce possible solutions to the different challenges in manipulating, managing, and analyzing genomic and clinical data for implementing genomic medicine. Additionally, we present a practical Big Data toolset for identifying clinically actionable genetic variants using high-throughput NGS data and EHRs. PMID:28212287

  3. Privacy Challenges of Genomic Big Data.

    Science.gov (United States)

    Shen, Hong; Ma, Jian

    2017-01-01

    With the rapid advancement of high-throughput DNA sequencing technologies, genomics has become a big data discipline in which large-scale genetic information on human individuals can be obtained efficiently and at low cost. However, such a massive amount of personal genomic data creates tremendous challenges for privacy, especially given the emergence of the direct-to-consumer (DTC) industry that provides genetic testing services. Here we review recent developments in genomic big data and their implications for privacy. We also discuss the current dilemmas and future challenges of genomic privacy.

  4. Genome Variation Map: a data repository of genome variations in BIG Data Center

    OpenAIRE

    Song, Shuhui; Tian, Dongmei; Li, Cuiping; Tang, Bixia; Dong, Lili; Xiao, Jingfa; Bao, Yiming; Zhao, Wenming; He, Hang; Zhang, Zhang

    2017-01-01

    The Genome Variation Map (GVM; http://bigd.big.ac.cn/gvm/) is a public data repository of genome variations. As a core resource in the BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, GVM is dedicated to collecting, integrating and visualizing genome variations for a wide range of species; it accepts submissions of different types of genome variations from all over the world and provides free open access to all publicly available data in support of worldwide research activities.

  5. Big Data Analysis of Human Genome Variations

    KAUST Repository

    Gojobori, Takashi

    2016-01-25

    Since the human genome draft sequence was first made public in 2000, genomic analyses have been intensively extended to the population level. The following three international projects are good examples of large-scale studies of human genome variation: 1) HapMap data (1,417 individuals) (http://hapmap.ncbi.nlm.nih.gov/downloads/genotypes/2010-08_phaseII+III/forward/), 2) HGDP (Human Genome Diversity Project) data (940 individuals) (http://www.hagsc.org/hgdp/files.html), and 3) 1000 Genomes data (2,504 individuals) (http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/). If we can integrate all three data sets into a single volume of data, we should be able to conduct a more detailed analysis of human genome variation for a total of 4,861 individuals (= 1,417 + 940 + 2,504). In fact, we successfully integrated these three data sets by using information on the reference human genome sequence, and we conducted a big data analysis. In particular, we constructed a phylogenetic tree of about 5,000 human individuals at the genome level. As a result, we were able to identify clusters of ethnic groups, with detectable admixture, that could not be found by analyzing any of the three data sets alone. Here, we report the outcome of this kind of big data analysis and discuss the evolutionary significance of human genomic variation. Note that the present study was conducted in collaboration with Katsuhiko Mineta and Kosuke Goto at KAUST.
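
    A minimal sketch of the integration step described above, assuming each project's genotypes have been exported to a tab-separated table keyed by reference-genome coordinates (file names and column layout are hypothetical, not the authors' actual pipeline):

    ```python
    import pandas as pd

    # Hypothetical per-dataset variant tables, each keyed by reference coordinates.
    # Columns: chrom, pos, ref, alt, then one genotype column per individual.
    hapmap = pd.read_csv("hapmap_genotypes.tsv", sep="\t")
    hgdp = pd.read_csv("hgdp_genotypes.tsv", sep="\t")
    kg = pd.read_csv("1000genomes_genotypes.tsv", sep="\t")

    keys = ["chrom", "pos", "ref", "alt"]

    # Inner-join on shared reference coordinates so that only sites typed in all
    # three projects survive; genotype columns from the three sources sit side by side.
    merged = hapmap.merge(hgdp, on=keys, suffixes=("_hapmap", "_hgdp"))
    merged = merged.merge(kg, on=keys)

    n_individuals = merged.shape[1] - len(keys)
    print(f"{len(merged)} shared sites across {n_individuals} genotype columns")
    ```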

  6. The Human Genome Project: big science transforms biology and medicine

    OpenAIRE

    Hood, Leroy; Rowen, Lee

    2013-01-01

    The Human Genome Project has transformed biology through its integrated big science approach to deciphering a reference human genome sequence along with the complete sequences of key model organisms. The project exemplifies the power, necessity and success of large, integrated, cross-disciplinary efforts - so-called ‘big science’ - directed towards complex major objectives. In this article, we discuss the ways in which this ambitious endeavor led to the development of novel technologies and analytical tools.

  7. Genome Variation Map: a data repository of genome variations in BIG Data Center.

    Science.gov (United States)

    Song, Shuhui; Tian, Dongmei; Li, Cuiping; Tang, Bixia; Dong, Lili; Xiao, Jingfa; Bao, Yiming; Zhao, Wenming; He, Hang; Zhang, Zhang

    2018-01-04

    The Genome Variation Map (GVM; http://bigd.big.ac.cn/gvm/) is a public data repository of genome variations. As a core resource in the BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, GVM is dedicated to collecting, integrating and visualizing genome variations for a wide range of species; it accepts submissions of different types of genome variations from all over the world and provides free open access to all publicly available data in support of worldwide research activities. Unlike existing related databases, GVM features integration of a large number of genome variations for a broad diversity of species including human, cultivated plants and domesticated animals. Specifically, the current implementation of GVM not only houses a total of ∼4.9 billion variants for 19 species including chicken, dog, goat, human, poplar, rice and tomato, but also incorporates 8669 individual genotypes and 13 262 manually curated high-quality genotype-to-phenotype associations for non-human species. In addition, GVM provides friendly, intuitive web interfaces for data submission, browsing, search and visualization. Collectively, GVM serves as an important resource for archiving genomic variation data, helpful for better understanding population genetic diversity and deciphering complex mechanisms associated with different phenotypes. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  8. Genome Variation Map: a data repository of genome variations in BIG Data Center

    Science.gov (United States)

    Tian, Dongmei; Li, Cuiping; Tang, Bixia; Dong, Lili; Xiao, Jingfa; Bao, Yiming; Zhao, Wenming; He, Hang

    2018-01-01

    The Genome Variation Map (GVM; http://bigd.big.ac.cn/gvm/) is a public data repository of genome variations. As a core resource in the BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, GVM is dedicated to collecting, integrating and visualizing genome variations for a wide range of species; it accepts submissions of different types of genome variations from all over the world and provides free open access to all publicly available data in support of worldwide research activities. Unlike existing related databases, GVM features integration of a large number of genome variations for a broad diversity of species including human, cultivated plants and domesticated animals. Specifically, the current implementation of GVM not only houses a total of ∼4.9 billion variants for 19 species including chicken, dog, goat, human, poplar, rice and tomato, but also incorporates 8669 individual genotypes and 13 262 manually curated high-quality genotype-to-phenotype associations for non-human species. In addition, GVM provides friendly, intuitive web interfaces for data submission, browsing, search and visualization. Collectively, GVM serves as an important resource for archiving genomic variation data, helpful for better understanding population genetic diversity and deciphering complex mechanisms associated with different phenotypes. PMID:29069473

  9. The Human Genome Project: big science transforms biology and medicine.

    Science.gov (United States)

    Hood, Leroy; Rowen, Lee

    2013-01-01

    The Human Genome Project has transformed biology through its integrated big science approach to deciphering a reference human genome sequence along with the complete sequences of key model organisms. The project exemplifies the power, necessity and success of large, integrated, cross-disciplinary efforts - so-called 'big science' - directed towards complex major objectives. In this article, we discuss the ways in which this ambitious endeavor led to the development of novel technologies and analytical tools, and how it brought the expertise of engineers, computer scientists and mathematicians together with biologists. It established an open approach to data sharing and open-source software, thereby making the data resulting from the project accessible to all. The genome sequences of microbes, plants and animals have revolutionized many fields of science, including microbiology, virology, infectious disease and plant biology. Moreover, deeper knowledge of human sequence variation has begun to alter the practice of medicine. The Human Genome Project has inspired subsequent large-scale data acquisition initiatives such as the International HapMap Project, 1000 Genomes, and The Cancer Genome Atlas, as well as the recently announced Human Brain Project and the emerging Human Proteome Project.

  10. Genome-wide signatures of complex introgression and adaptive evolution in the big cats

    Science.gov (United States)

    Figueiró, Henrique V.; Li, Gang; Trindade, Fernanda J.; Assis, Juliana; Pais, Fabiano; Fernandes, Gabriel; Santos, Sarah H. D.; Hughes, Graham M.; Komissarov, Aleksey; Antunes, Agostinho; Trinca, Cristine S.; Rodrigues, Maíra R.; Linderoth, Tyler; Bi, Ke; Silveira, Leandro; Azevedo, Fernando C. C.; Kantek, Daniel; Ramalho, Emiliano; Brassaloti, Ricardo A.; Villela, Priscilla M. S.; Nunes, Adauto L. V.; Teixeira, Rodrigo H. F.; Morato, Ronaldo G.; Loska, Damian; Saragüeta, Patricia; Gabaldón, Toni; Teeling, Emma C.; O’Brien, Stephen J.; Nielsen, Rasmus; Coutinho, Luiz L.; Oliveira, Guilherme; Murphy, William J.; Eizirik, Eduardo

    2017-01-01

    The great cats of the genus Panthera comprise a recent radiation whose evolutionary history is poorly understood. Their rapid diversification poses challenges to resolving their phylogeny while offering opportunities to investigate the historical dynamics of adaptive divergence. We report the sequence, de novo assembly, and annotation of the jaguar (Panthera onca) genome, a novel genome sequence for the leopard (Panthera pardus), and comparative analyses encompassing all living Panthera species. Demographic reconstructions indicated that all of these species have experienced variable episodes of population decline during the Pleistocene, ultimately leading to small effective sizes in present-day genomes. We observed pervasive genealogical discordance across Panthera genomes, caused by both incomplete lineage sorting and complex patterns of historical interspecific hybridization. We identified multiple signatures of species-specific positive selection, affecting genes involved in craniofacial and limb development, protein metabolism, hypoxia, reproduction, pigmentation, and sensory perception. There was remarkable concordance in pathways enriched in genomic segments implicated in interspecies introgression and in positive selection, suggesting that these processes were connected. We tested this hypothesis by developing exome capture probes targeting ~19,000 Panthera genes and applying them to 30 wild-caught jaguars. We found at least two genes (DOCK3 and COL4A5, both related to optic nerve development) bearing significant signatures of interspecies introgression and within-species positive selection. These findings indicate that post-speciation admixture has contributed genetic material that facilitated the adaptive evolution of big cat lineages. PMID:28776029

  11. A novel bioinformatics method for efficient knowledge discovery by BLSOM from big genomic sequence data.

    Science.gov (United States)

    Bai, Yu; Iwasaki, Yuki; Kanaya, Shigehiko; Zhao, Yue; Ikemura, Toshimichi

    2014-01-01

    With the remarkable increase in genomic sequence data from a wide range of species, novel tools are needed for comprehensive analyses of these big sequence data. The Self-Organizing Map (SOM) is an effective tool for clustering and visualizing high-dimensional data, such as oligonucleotide composition, on a single map. By modifying the conventional SOM, we previously developed the Batch-Learning SOM (BLSOM), which allows classification of sequence fragments according to species based solely on oligonucleotide composition. In the present study, we introduce the oligonucleotide BLSOM used for characterization of vertebrate genome sequences. We first analyzed pentanucleotide compositions in 100 kb sequences derived from a wide range of vertebrate genomes, and then the compositions in the human and mouse genomes, in order to investigate an efficient method for detecting differences between closely related genomes. BLSOM can recognize the species-specific key combination of oligonucleotide frequencies in each genome, which is called a "genome signature," as well as regions specifically enriched in transcription-factor-binding sequences. Because its classification and visualization power is very high, BLSOM is an efficient and powerful tool for extracting a wide range of information from massive amounts of genomic sequences (i.e., big sequence data).
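
    As a rough illustration of the inputs BLSOM clusters, the sketch below converts each 100-kb window of a genome sequence into a 1,024-dimensional pentanucleotide-frequency vector; the windowing and normalization details are assumptions, not the authors' exact procedure, and the resulting vectors would then be fed to a batch-learning SOM implementation.

    ```python
    from itertools import product

    BASES = "ACGT"
    PENTAMERS = ["".join(p) for p in product(BASES, repeat=5)]  # 1,024 pentanucleotides
    INDEX = {kmer: i for i, kmer in enumerate(PENTAMERS)}

    def pentamer_frequencies(seq, window=100_000):
        """Yield one normalized 1,024-dimensional frequency vector per window."""
        seq = seq.upper()
        for start in range(0, len(seq) - window + 1, window):
            counts = [0] * len(PENTAMERS)
            chunk = seq[start:start + window]
            for i in range(len(chunk) - 4):
                idx = INDEX.get(chunk[i:i + 5])  # skip k-mers containing N or other ambiguity codes
                if idx is not None:
                    counts[idx] += 1
            total = sum(counts)
            if total:
                yield [c / total for c in counts]

    # Example: vectors = list(pentamer_frequencies(genome_string)) would then be
    # clustered by a SOM to recover the species-specific "genome signature".
    ```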

  12. Machine learning for Big Data analytics in plants.

    Science.gov (United States)

    Ma, Chuang; Zhang, Hao Helen; Wang, Xiangfeng

    2014-12-01

    Rapid advances in high-throughput genomic technology have enabled biology to enter the era of 'Big Data' (large datasets). The plant science community needs not only to build its own Big-Data-compatible parallel computing and data management infrastructures, but also to seek novel analytical paradigms to extract information from the overwhelming amounts of data. Machine learning offers promising computational and analytical solutions for the integrative analysis of large, heterogeneous and unstructured datasets on the Big-Data scale, and is gradually gaining popularity in biology. This review introduces the basic concepts and procedures of machine-learning applications and envisages how machine learning could interface with Big Data technology to facilitate basic research and biotechnology in the plant sciences. Copyright © 2014 Elsevier Ltd. All rights reserved.

  13. Harnessing Omics Big Data in Nine Vertebrate Species by Genome-Wide Prioritization of Sequence Variants with the Highest Predicted Deleterious Effect on Protein Function.

    Science.gov (United States)

    Rozman, Vita; Kunej, Tanja

    2018-05-10

    Harnessing genomics big data requires innovation in how we extract and interpret biologically relevant variants. Currently, there is no established catalog of prioritized missense variants associated with deleterious protein function phenotypes. We report in this study, to the best of our knowledge, the first genome-wide prioritization of sequence variants with the most deleterious effect on protein function (potentially deleterious variants [pDelVars]) in nine vertebrate species: human, cattle, horse, sheep, pig, dog, rat, mouse, and zebrafish. The analysis was conducted using the Ensembl/BioMart tool. Genes comprising pDelVars in the highest number of examined species were identified using a Python script. Multiple genomic alignments of the selected genes were built to identify interspecies orthologous potentially deleterious variants, which we defined as "ortho-pDelVars." Genome-wide prioritization revealed that in humans, 0.12% of the known variants are predicted to be deleterious. In seven out of nine examined vertebrate species, the genes encoding the multiple PDZ domain crumbs cell polarity complex component (MPDZ) and the transforming acidic coiled-coil containing protein 2 (TACC2) comprise pDelVars. Five interspecies ortho-pDelVars were identified in three genes. These findings offer new ways to harness genomics big data by facilitating the identification of functional polymorphisms in humans and animal models, and thus provide a basis for future optimization of protocols for whole-genome prioritization of pDelVars and screening of orthologous sequence variants. The approach presented here can inform various postgenomic applications such as personalized medicine and multiomics study of health interventions (iatromics).
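
    The abstract notes that genes carrying pDelVars in the largest number of species were identified with a Python script; a hedged sketch of such a gene-level tally over a BioMart-style export might look as follows (file name and column names are hypothetical):

    ```python
    import pandas as pd

    # Hypothetical BioMart export: one row per variant, with species, gene symbol,
    # and a deleteriousness flag derived from the consequence/prediction columns.
    variants = pd.read_csv("biomart_variants.tsv", sep="\t")  # columns: species, gene, is_deleterious

    pdelvars = variants[variants["is_deleterious"]]

    # For each gene, count the number of distinct species carrying at least one pDelVar,
    # then rank genes by that count (the paper reports MPDZ and TACC2 at the top).
    species_per_gene = (
        pdelvars.groupby("gene")["species"]
        .nunique()
        .sort_values(ascending=False)
    )
    print(species_per_gene.head(10))
    ```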

  14. BigWig and BigBed: enabling browsing of large distributed datasets.

    Science.gov (United States)

    Kent, W J; Zweig, A S; Barber, G; Hinrichs, A S; Karolchik, D

    2010-09-01

    BigWig and BigBed files are compressed, binary, indexed files containing data at several resolutions that allow the high-performance display of next-generation sequencing experiment results in the UCSC Genome Browser. The visualization is implemented using a multi-layered software approach that takes advantage of specific capabilities of web-based protocols, Linux and UNIX operating system files, R trees, and various indexing and compression techniques. As a result, only the data needed to support the current browser view is transmitted rather than the entire file, enabling fast remote access to large distributed data sets. Binaries for the BigWig and BigBed creation and parsing utilities may be downloaded at http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/. Source code for the creation and visualization software is freely available for non-commercial use at http://hgdownload.cse.ucsc.edu/admin/jksrc.zip, implemented in C and supported on Linux. The UCSC Genome Browser is available at http://genome.ucsc.edu.
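
    For programmatic rather than browser access, the community pyBigWig package (a separate library, not one of the UCSC utilities named above) exposes the same indexed, multi-resolution access pattern, so only the requested blocks are fetched; the URL below is a placeholder.

    ```python
    import pyBigWig

    # Open a (placeholder) BigWig file; remote URLs work when the library is built
    # with remote-access support, because only the index and requested blocks are fetched.
    bw = pyBigWig.open("https://example.org/coverage.bw")

    print(bw.chroms())                                   # chromosome sizes
    print(bw.stats("chr1", 0, 1_000_000, type="mean"))   # summary drawn from zoom levels
    print(bw.values("chr1", 10_000, 10_010))             # base-resolution values

    bw.close()
    ```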

  15. Big Data in Medicine is Driving Big Changes

    Science.gov (United States)

    Verspoor, K.

    2014-01-01

    Objectives: To summarise current research that takes advantage of “Big Data” in health and biomedical informatics applications. Methods: Survey of trends in this work, and exploration of literature describing how large-scale structured and unstructured data sources are being used to support applications from clinical decision making and health policy, to drug design and pharmacovigilance, and further to systems biology and genetics. Results: The survey highlights ongoing development of powerful new methods for turning that large-scale, and often complex, data into information that provides new insights into human health, in a range of different areas. Consideration of this body of work identifies several important paradigm shifts that are facilitated by Big Data resources and methods: in clinical and translational research, from hypothesis-driven research to data-driven research, and in medicine, from evidence-based practice to practice-based evidence. Conclusions: The increasing scale and availability of large quantities of health data require strategies for data management, data linkage, and data integration beyond the limits of many existing information systems, and substantial effort is underway to meet those needs. As our ability to make sense of that data improves, the value of the data will continue to increase. Health systems, genetics and genomics, population and public health: all areas of biomedicine stand to benefit from Big Data and the associated technologies. PMID:25123716

  16. Crowd-funded micro-grants for genomics and "big data": an actionable idea connecting small (artisan) science, infrastructure science, and citizen philanthropy.

    Science.gov (United States)

    Özdemir, Vural; Badr, Kamal F; Dove, Edward S; Endrenyi, Laszlo; Geraci, Christy Jo; Hotez, Peter J; Milius, Djims; Neves-Pereira, Maria; Pang, Tikki; Rotimi, Charles N; Sabra, Ramzi; Sarkissian, Christineh N; Srivastava, Sanjeeva; Tims, Hesther; Zgheib, Nathalie K; Kickbusch, Ilona

    2013-04-01

    Biomedical science in the 21st century is embedded in, and draws from, a digital commons and "Big Data" created by high-throughput Omics technologies such as genomics. Classic Edisonian metaphors of science and scientists (i.e., "the lone genius" or other narrow definitions of expertise) are ill equipped to harness the vast promises of the 21st century digital commons. Moreover, in medicine and life sciences, experts often under-appreciate the important contributions made by citizen scholars and lead users of innovations to design innovative products and co-create new knowledge. We believe there are a large number of users waiting to be mobilized so as to engage with Big Data as citizen scientists - only if some funding were available. Yet many of these scholars may not meet the meta-criteria used to judge expertise, such as a track record in obtaining large research grants or a traditional academic curriculum vitae. This innovation research article describes a novel idea and action framework: micro-grants, each worth $1000, for genomics and Big Data. Though a relatively small amount at first glance, this far exceeds the annual income of the "bottom one billion" - the 1.4 billion people living below the extreme poverty level defined by the World Bank ($1.25/day). We describe two types of micro-grants. Type 1 micro-grants can be awarded through established funding agencies and philanthropies that create micro-granting programs to fund a broad and highly diverse array of small artisan labs and citizen scholars to connect genomics and Big Data with new models of discovery such as open user innovation. Type 2 micro-grants can be funded by existing or new science observatories and citizen think tanks through crowd-funding mechanisms described herein. Type 2 micro-grants would also facilitate global health diplomacy by co-creating crowd-funded micro-granting programs across nation-states in regions facing political and financial instability, while sharing similar disease

  17. Big data, open science and the brain: lessons learned from genomics

    Directory of Open Access Journals (Sweden)

    Suparna Choudhury

    2014-05-01

    The BRAIN Initiative aims to break new ground in the scale and speed of data collection in neuroscience, requiring tools to handle data on the order of yottabytes (10²⁴ bytes). Its scale, investment and organization are being compared to the Human Genome Project (HGP), which has exemplified ‘big science’ for biology. In line with the trend towards Big Data in genomic research, the promise of the BRAIN Initiative, as well as the European Human Brain Project, rests on the possibility to amass vast quantities of data to model the complex interactions between the brain and behaviour and inform the diagnosis and prevention of neurological disorders and psychiatric disease. Advocates of this ‘data driven’ paradigm in neuroscience argue that harnessing the large quantities of data generated across laboratories worldwide has numerous methodological, ethical and economic advantages, but it requires the neuroscience community to adopt a culture of data sharing and open access to benefit from them. In this article, we examine the rationale for data sharing among advocates and briefly exemplify these in terms of new ‘open neuroscience’ projects. Then, drawing on the frequently invoked model of data sharing in genomics, we go on to demonstrate the complexities of data sharing, shedding light on the sociological and ethical challenges within the realms of institutions, researchers and participants, namely dilemmas around public/private interests in data, (lack of) motivation to share in the academic community, and potential loss of participant anonymity. Our paper serves to highlight some foreseeable tensions around data sharing relevant to the emergent ‘open neuroscience’ movement.

  18. Complete mitochondrial genome of the big-eared horseshoe bat Rhinolophus macrotis (Chiroptera, Rhinolophidae).

    Science.gov (United States)

    Zhang, Lin; Sun, Keping; Feng, Jiang

    2016-11-01

    We sequenced and characterized the complete mitochondrial genome of the big-eared horseshoe bat, Rhinolophus macrotis. The total length of the mitogenome is 16,848 bp, with a base composition of 31.2% A, 25.3% T, 28.8% C and 14.7% G. The mitogenome consists of 13 protein-coding genes, 2 rRNA genes (12S and 16S rRNA), 22 tRNA genes and 1 control region. It has the same gene arrangement pattern as that of a typical vertebrate mitochondrial genome. The results will contribute to our understanding of the taxonomic status and evolution of bats in the genus Rhinolophus.
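
    Base-composition figures like those quoted above can be recomputed directly from the assembled sequence; a minimal sketch (the FASTA path is hypothetical):

    ```python
    from collections import Counter

    def base_composition(fasta_path):
        """Return the percentage of A, T, C and G in the first sequence of a FASTA file."""
        seq = []
        with open(fasta_path) as handle:
            for line in handle:
                if not line.startswith(">"):
                    seq.append(line.strip().upper())
        counts = Counter("".join(seq))
        total = sum(counts[b] for b in "ATCG")
        return {b: round(100 * counts[b] / total, 1) for b in "ATCG"}

    # e.g. base_composition("rhinolophus_macrotis_mtdna.fasta") would be expected to
    # return the values reported above: {'A': 31.2, 'T': 25.3, 'C': 28.8, 'G': 14.7}
    ```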

  19. Opportunities and challenges of big data for the social sciences: The case of genomic data.

    Science.gov (United States)

    Liu, Hexuan; Guo, Guang

    2016-09-01

    In this paper, we draw attention to one unique and valuable source of big data, genomic data, by demonstrating the opportunities they provide to social scientists. We discuss different types of large-scale genomic data and recent advances in statistical methods and computational infrastructure used to address challenges in managing and analyzing such data. We highlight how these data and methods can be used to benefit social science research. Copyright © 2016 Elsevier Inc. All rights reserved.

  20. Crowd-Funded Micro-Grants for Genomics and “Big Data”: An Actionable Idea Connecting Small (Artisan) Science, Infrastructure Science, and Citizen Philanthropy

    Science.gov (United States)

    Badr, Kamal F.; Dove, Edward S.; Endrenyi, Laszlo; Geraci, Christy Jo; Hotez, Peter J.; Milius, Djims; Neves-Pereira, Maria; Pang, Tikki; Rotimi, Charles N.; Sabra, Ramzi; Sarkissian, Christineh N.; Srivastava, Sanjeeva; Tims, Hesther; Zgheib, Nathalie K.; Kickbusch, Ilona

    2013-01-01

    Biomedical science in the 21st century is embedded in, and draws from, a digital commons and “Big Data” created by high-throughput Omics technologies such as genomics. Classic Edisonian metaphors of science and scientists (i.e., “the lone genius” or other narrow definitions of expertise) are ill equipped to harness the vast promises of the 21st century digital commons. Moreover, in medicine and life sciences, experts often under-appreciate the important contributions made by citizen scholars and lead users of innovations to design innovative products and co-create new knowledge. We believe there are a large number of users waiting to be mobilized so as to engage with Big Data as citizen scientists—only if some funding were available. Yet many of these scholars may not meet the meta-criteria used to judge expertise, such as a track record in obtaining large research grants or a traditional academic curriculum vitae. This innovation research article describes a novel idea and action framework: micro-grants, each worth $1000, for genomics and Big Data. Though a relatively small amount at first glance, this far exceeds the annual income of the “bottom one billion”—the 1.4 billion people living below the extreme poverty level defined by the World Bank ($1.25/day). We describe two types of micro-grants. Type 1 micro-grants can be awarded through established funding agencies and philanthropies that create micro-granting programs to fund a broad and highly diverse array of small artisan labs and citizen scholars to connect genomics and Big Data with new models of discovery such as open user innovation. Type 2 micro-grants can be funded by existing or new science observatories and citizen think tanks through crowd-funding mechanisms described herein. Type 2 micro-grants would also facilitate global health diplomacy by co-creating crowd-funded micro-granting programs across nation-states in regions facing political and financial instability, while

  1. The BIG Data Center: from deposition to integration to translation.

    Science.gov (United States)

    2017-01-04

    Biological data are being generated at unprecedented, exponentially increasing rates, posing considerable challenges for big data deposition, integration and translation. The BIG Data Center, established at Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, provides a suite of database resources, including (i) Genome Sequence Archive, a data repository specialized for archiving raw sequence reads, (ii) Gene Expression Nebulas, a data portal of gene expression profiles based entirely on RNA-Seq data, (iii) Genome Variation Map, a comprehensive collection of genome variations for featured species, (iv) Genome Warehouse, a centralized resource housing genome-scale data with particular focus on economically important animals and plants, (v) Methylation Bank, an integrated database of whole-genome single-base resolution methylomes and (vi) Science Wikis, a central access point for biological wikis developed for community annotations. The BIG Data Center is dedicated to constructing and maintaining biological databases through big data integration and value-added curation, conducting basic research to translate big data into big knowledge and providing freely open access to a variety of data resources in support of worldwide research activities in both academia and industry. All of these resources are publicly available and can be found at http://bigd.big.ac.cn. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  2. Personalized medicine beyond genomics: alternative futures in big data-proteomics, environtome and the social proteome.

    Science.gov (United States)

    Özdemir, Vural; Dove, Edward S; Gürsoy, Ulvi K; Şardaş, Semra; Yıldırım, Arif; Yılmaz, Şenay Görücü; Ömer Barlas, I; Güngör, Kıvanç; Mete, Alper; Srivastava, Sanjeeva

    2017-01-01

    No field in science and medicine today remains untouched by Big Data, and psychiatry is no exception. Proteomics is a Big Data technology and a next-generation biomarker, supporting novel system diagnostics and therapeutics in psychiatry. Proteomics technology is, in fact, much older than genomics and dates to the 1970s, well before the launch of the international Human Genome Project. While the genome has long been framed as the master or "elite" executive molecule in cell biology, the proteome by contrast is humble. Yet the proteome is critical for life - it ensures the daily functioning of cells and whole organisms. In short, proteins are the blue-collar workers of biology, the down-to-earth molecules that we cannot live without. Since 2010, proteomics has found renewed meaning and international attention with the launch of the Human Proteome Project and the growing interest in Big Data technologies such as proteomics. This article presents an interdisciplinary technology foresight analysis and conceptualizes the terms "environtome" and "social proteome". We define "environtome" as the entire complement of elements external to the human host, from microbiome, ambient temperature and weather conditions to government innovation policies, stock market dynamics, human values, political power and social norms that collectively shape the human host spatially and temporally. The "social proteome" is the subset of the environtome that influences the transition of proteomics technology to innovative applications in society. The social proteome encompasses, for example, new reimbursement schemes and business innovation models for proteomics diagnostics that depart from the "once-a-life-time" genotypic tests and the anticipated hype attendant to context and time sensitive proteomics tests. Building on the "nesting principle" for governance of complex systems as discussed by Elinor Ostrom, we propose here a 3-tiered organizational architecture for Big Data science such as

  3. Big data or bust: realizing the microbial genomics revolution.

    Science.gov (United States)

    Raza, Sobia; Luheshi, Leila

    2016-02-01

    Pathogen genomics has the potential to transform the clinical and public health management of infectious diseases through improved diagnosis, detection and tracking of antimicrobial resistance and outbreak control. However, the wide-ranging benefits of this technology can only fully be realized through the timely collation, integration and sharing of genomic and clinical/epidemiological metadata by all those involved in the delivery of genomic-informed services. As part of our review on bringing pathogen genomics into 'health-service' practice, we undertook extensive stakeholder consultation to examine the factors integral to achieving effective data sharing and integration. Infrastructure tailored to the needs of clinical users, as well as practical support and policies to facilitate the timely and responsible sharing of data with relevant health authorities and beyond, are all essential. We propose a tiered data sharing and integration model to maximize the immediate and longer term utility of microbial genomics in healthcare. Realizing this model at the scale and sophistication necessary to support national and international infection management services is not uncomplicated. Yet the establishment of a clear data strategy is paramount if failures in containing disease spread due to inadequate knowledge sharing are to be averted, and substantial progress made in tackling the dangers posed by infectious diseases.

  4. Database Resources of the BIG Data Center in 2018.

    Science.gov (United States)

    2018-01-04

    The BIG Data Center at Beijing Institute of Genomics (BIG) of the Chinese Academy of Sciences provides freely open access to a suite of database resources in support of worldwide research activities in both academia and industry. With the vast amounts of omics data generated at ever-greater scales and rates, the BIG Data Center is continually expanding, updating and enriching its core database resources through big-data integration and value-added curation, including BioCode (a repository archiving bioinformatics tool codes), BioProject (a biological project library), BioSample (a biological sample library), Genome Sequence Archive (GSA, a data repository for archiving raw sequence reads), Genome Warehouse (GWH, a centralized resource housing genome-scale data), Genome Variation Map (GVM, a public repository of genome variations), Gene Expression Nebulas (GEN, a database of gene expression profiles based on RNA-Seq data), Methylation Bank (MethBank, an integrated databank of DNA methylomes), and Science Wikis (a series of biological knowledge wikis for community annotations). In addition, three featured web services are provided, viz., BIG Search (search as a service; a scalable inter-domain text search engine), BIG SSO (single sign-on as a service; a user access control system to gain access to multiple independent systems with a single ID and password) and Gsub (submission as a service; a unified submission service for all relevant resources). All of these resources are publicly accessible through the home page of the BIG Data Center at http://bigd.big.ac.cn. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  5. Considering the Role of Personality in the Work-Family Experience: Relationships of the Big Five to Work-Family Conflict and Facilitation

    Science.gov (United States)

    Wayne, Julie Holliday; Musisca, Nicholas; Fleeson, William

    2004-01-01

    Using a national, random sample (N=2130), we investigated the relationship between each of the Big Five personality traits and conflict and facilitation between work and family roles. Extraversion was related to greater facilitation between roles but was not related to conflict, whereas neuroticism was related to greater conflict but only weakly…

  6. Clinical research of traditional Chinese medicine in big data era.

    Science.gov (United States)

    Zhang, Junhua; Zhang, Boli

    2014-09-01

    With the advent of the big data era, our thinking, technology and methodology are being transformed. Data-intensive scientific discovery based on big data, named "The Fourth Paradigm," has become a new paradigm of scientific research. Along with the development and application of Internet information technology in the field of healthcare, individual health records, clinical data of diagnosis and treatment, and genomic data have accumulated dramatically, generating big data in the medical field for clinical research and assessment. With the support of big data, the defects and weaknesses of the conventional sampling-based methodology of clinical evaluation may be overcome. Our research target shifts from "causality inference" to "correlation analysis." This not only facilitates the evaluation of individualized treatment, disease prediction, prevention and prognosis, but also is suitable for the practice of preventive healthcare and symptom pattern differentiation for treatment in terms of traditional Chinese medicine (TCM), and for the post-marketing evaluation of Chinese patent medicines. To conduct clinical studies involving big data in the TCM domain, top-level design is needed and should be performed in an orderly manner. Fundamental construction and innovation studies should be strengthened in the areas of data platform creation, data analysis technology, and the fostering and training of big-data professionals.

  7. The spotted gar genome illuminates vertebrate evolution and facilitates human-teleost comparisons.

    Science.gov (United States)

    Braasch, Ingo; Gehrke, Andrew R; Smith, Jeramiah J; Kawasaki, Kazuhiko; Manousaki, Tereza; Pasquier, Jeremy; Amores, Angel; Desvignes, Thomas; Batzel, Peter; Catchen, Julian; Berlin, Aaron M; Campbell, Michael S; Barrell, Daniel; Martin, Kyle J; Mulley, John F; Ravi, Vydianathan; Lee, Alison P; Nakamura, Tetsuya; Chalopin, Domitille; Fan, Shaohua; Wcisel, Dustin; Cañestro, Cristian; Sydes, Jason; Beaudry, Felix E G; Sun, Yi; Hertel, Jana; Beam, Michael J; Fasold, Mario; Ishiyama, Mikio; Johnson, Jeremy; Kehr, Steffi; Lara, Marcia; Letaw, John H; Litman, Gary W; Litman, Ronda T; Mikami, Masato; Ota, Tatsuya; Saha, Nil Ratan; Williams, Louise; Stadler, Peter F; Wang, Han; Taylor, John S; Fontenot, Quenton; Ferrara, Allyse; Searle, Stephen M J; Aken, Bronwen; Yandell, Mark; Schneider, Igor; Yoder, Jeffrey A; Volff, Jean-Nicolas; Meyer, Axel; Amemiya, Chris T; Venkatesh, Byrappa; Holland, Peter W H; Guiguen, Yann; Bobe, Julien; Shubin, Neil H; Di Palma, Federica; Alföldi, Jessica; Lindblad-Toh, Kerstin; Postlethwait, John H

    2016-04-01

    To connect human biology to fish biomedical models, we sequenced the genome of spotted gar (Lepisosteus oculatus), whose lineage diverged from teleosts before teleost genome duplication (TGD). The slowly evolving gar genome has conserved in content and size many entire chromosomes from bony vertebrate ancestors. Gar bridges teleosts to tetrapods by illuminating the evolution of immunity, mineralization and development (mediated, for example, by Hox, ParaHox and microRNA genes). Numerous conserved noncoding elements (CNEs; often cis regulatory) undetectable in direct human-teleost comparisons become apparent using gar: functional studies uncovered conserved roles for such cryptic CNEs, facilitating annotation of sequences identified in human genome-wide association studies. Transcriptomic analyses showed that the sums of expression domains and expression levels for duplicated teleost genes often approximate the patterns and levels of expression for gar genes, consistent with subfunctionalization. The gar genome provides a resource for understanding evolution after genome duplication, the origin of vertebrate genomes and the function of human regulatory sequences.

  8. Characterizing Big Data Management

    OpenAIRE

    Rogério Rossi; Kechi Hirama

    2015-01-01

    Big data management is a reality for an increasing number of organizations in many areas and represents a set of challenges involving big data modeling, storage and retrieval, analysis and visualization. However, technological resources, people and processes are crucial to facilitate the management of big data in any kind of organization, allowing information and knowledge from a large volume of data to support decision-making. Big data management can be supported by these three dimensions: technology, people and processes.

  9. Private and Efficient Query Processing on Outsourced Genomic Databases.

    Science.gov (United States)

    Ghasemi, Reza; Al Aziz, Md Momin; Mohammed, Noman; Dehkordi, Massoud Hadian; Jiang, Xiaoqian

    2017-09-01

    Applications of genomic studies are spreading rapidly in many domains of science and technology such as healthcare, biomedical research, direct-to-consumer services, and legal and forensic investigations. However, there are a number of obstacles that make it hard to access and process a big genomic database for these applications. First, sequencing a genome is a time-consuming and expensive process. Second, it requires large-scale computation and storage systems to process genomic sequences. Third, genomic databases are often owned by different organizations and thus not available for public usage. The cloud computing paradigm can be leveraged to facilitate the creation and sharing of big genomic databases for these applications. Genomic data owners can outsource their databases to a centralized cloud server to ease access to their databases. However, data owners are reluctant to adopt this model, as it requires outsourcing the data to an untrusted cloud service provider that may cause data breaches. In this paper, we propose a privacy-preserving model for outsourcing genomic data to a cloud. The proposed model enables query processing while providing privacy protection of genomic databases. Privacy of the individuals is guaranteed by permuting and adding fake genomic records in the database. These techniques allow the cloud to evaluate count and top-k queries securely and efficiently. Experimental results demonstrate that a count and a top-k query over 40 Single Nucleotide Polymorphisms (SNPs) in a database of 20 000 records take around 100 and 150 s, respectively.
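
    The paper's actual protection mechanism is cryptographic, but the underlying idea of masking true counts behind a permutation and injected fake records can be illustrated with a deliberately simplified, non-secure toy sketch (all names and structures are illustrative only, not the authors' protocol):

    ```python
    import random

    def outsource(records, n_fake, make_fake):
        """Permute real records and mix in fakes; keep a private set of fake IDs."""
        fakes = [make_fake(i) for i in range(n_fake)]
        fake_ids = {f["id"] for f in fakes}
        mixed = records + fakes
        random.shuffle(mixed)            # permutation hides the original ordering
        return mixed, fake_ids

    def count_query(mixed, fake_ids, predicate):
        """Cloud-side count over the mixed data, corrected by the data owner."""
        raw = sum(1 for r in mixed if predicate(r))
        correction = sum(1 for r in mixed if r["id"] in fake_ids and predicate(r))
        return raw - correction          # the owner removes the contribution of fakes

    # Toy usage with SNP-like records:
    real = [{"id": i, "rs123": random.choice("ACGT")} for i in range(1000)]
    mixed, fake_ids = outsource(real, 200, lambda i: {"id": 10_000 + i, "rs123": "A"})
    print(count_query(mixed, fake_ids, lambda r: r["rs123"] == "A"))
    ```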

  10. The spotted gar genome illuminates vertebrate evolution and facilitates human-to-teleost comparisons

    Science.gov (United States)

    Braasch, Ingo; Gehrke, Andrew R.; Smith, Jeramiah J.; Kawasaki, Kazuhiko; Manousaki, Tereza; Pasquier, Jeremy; Amores, Angel; Desvignes, Thomas; Batzel, Peter; Catchen, Julian; Berlin, Aaron M.; Campbell, Michael S.; Barrell, Daniel; Martin, Kyle J.; Mulley, John F.; Ravi, Vydianathan; Lee, Alison P.; Nakamura, Tetsuya; Chalopin, Domitille; Fan, Shaohua; Wcisel, Dustin; Cañestro, Cristian; Sydes, Jason; Beaudry, Felix E. G.; Sun, Yi; Hertel, Jana; Beam, Michael J.; Fasold, Mario; Ishiyama, Mikio; Johnson, Jeremy; Kehr, Steffi; Lara, Marcia; Letaw, John H.; Litman, Gary W.; Litman, Ronda T.; Mikami, Masato; Ota, Tatsuya; Saha, Nil Ratan; Williams, Louise; Stadler, Peter F.; Wang, Han; Taylor, John S.; Fontenot, Quenton; Ferrara, Allyse; Searle, Stephen M. J.; Aken, Bronwen; Yandell, Mark; Schneider, Igor; Yoder, Jeffrey A.; Volff, Jean-Nicolas; Meyer, Axel; Amemiya, Chris T.; Venkatesh, Byrappa; Holland, Peter W. H.; Guiguen, Yann; Bobe, Julien; Shubin, Neil H.; Di Palma, Federica; Alföldi, Jessica; Lindblad-Toh, Kerstin; Postlethwait, John H.

    2016-01-01

    To connect human biology to fish biomedical models, we sequenced the genome of spotted gar (Lepisosteus oculatus), whose lineage diverged from teleosts before the teleost genome duplication (TGD). The slowly evolving gar genome conserved in content and size many entire chromosomes from bony vertebrate ancestors. Gar bridges teleosts to tetrapods by illuminating the evolution of immunity, mineralization, and development (e.g., Hox, ParaHox, and miRNA genes). Numerous conserved non-coding elements (CNEs, often cis-regulatory) undetectable in direct human-teleost comparisons become apparent using gar: functional studies uncovered conserved roles of such cryptic CNEs, facilitating annotation of sequences identified in human genome-wide association studies. Transcriptomic analyses revealed that the sum of expression domains and levels from duplicated teleost genes often approximate patterns and levels of gar genes, consistent with subfunctionalization. The gar genome provides a resource for understanding evolution after genome duplication, the origin of vertebrate genomes, and the function of human regulatory sequences. PMID:26950095

  11. Genomic sequencing: assessing the health care system, policy, and big-data implications.

    Science.gov (United States)

    Phillips, Kathryn A; Trosman, Julia R; Kelley, Robin K; Pletcher, Mark J; Douglas, Michael P; Weldon, Christine B

    2014-07-01

    New genomic sequencing technologies enable the high-speed analysis of multiple genes simultaneously, including all of those in a person's genome. Sequencing is a prominent example of a "big data" technology because of the massive amount of information it produces and its complexity, diversity, and timeliness. Our objective in this article is to provide a policy primer on sequencing and illustrate how it can affect health care system and policy issues. Toward this end, we developed an easily applied classification of sequencing based on inputs, methods, and outputs. We used it to examine the implications of sequencing for three health care system and policy issues: making care more patient-centered, developing coverage and reimbursement policies, and assessing economic value. We conclude that sequencing has great promise but that policy challenges include how to optimize patient engagement as well as privacy, develop coverage policies that distinguish research from clinical uses and account for bioinformatics costs, and determine the economic value of sequencing through complex economic models that take into account multiple findings and downstream costs. Project HOPE—The People-to-People Health Foundation, Inc.

  12. Modeling and interoperability of heterogeneous genomic big data for integrative processing and querying.

    Science.gov (United States)

    Masseroli, Marco; Kaitoua, Abdulrahman; Pinoli, Pietro; Ceri, Stefano

    2016-12-01

    While a huge amount of (epi)genomic data of multiple types is becoming available through Next Generation Sequencing (NGS) technologies, the most important emerging problem is so-called tertiary analysis, concerned with sense making, e.g., discovering how different (epi)genomic regions and their products interact and cooperate with each other. We propose a paradigm shift in tertiary analysis, based on the use of the Genomic Data Model (GDM), a simple data model which links genomic feature data to their associated experimental, biological and clinical metadata. GDM encompasses all the data formats which have been produced for feature extraction from (epi)genomic datasets. We specifically describe the mapping to GDM of SAM (Sequence Alignment/Map), VCF (Variant Call Format), NARROWPEAK (for called peaks produced by NGS ChIP-seq or DNase-seq methods), and BED (Browser Extensible Data) formats, but GDM supports as well all the formats describing experimental datasets (e.g., including copy number variations, DNA somatic mutations, or gene expressions) and annotations (e.g., regarding transcription start sites, genes, enhancers or CpG islands). We downloaded and integrated samples of all the above-mentioned data types and formats from multiple sources. The GDM is able to homogeneously describe semantically heterogeneous data and lays the groundwork for data interoperability, e.g., achieved through the GenoMetric Query Language (GMQL), a high-level, declarative query language for genomic big data. The combined use of the data model and the query language allows comprehensive processing of multiple heterogeneous data, and supports the development of domain-specific data-driven computations and bio-molecular knowledge discovery. Copyright © 2016 Elsevier Inc. All rights reserved.
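
    A schematic of how region data and metadata might be paired in such a model, together with a simple region-overlap filter; this is an illustrative sketch, not the actual GDM or GMQL implementation.

    ```python
    from dataclasses import dataclass, field

    @dataclass
    class GdmLikeSample:
        """Schematic pairing of region-level feature data with experimental metadata."""
        metadata: dict                               # e.g. {"assay": "ChIP-seq", "tissue": "liver"}
        regions: list = field(default_factory=list)  # (chrom, start, stop, strand, extra_attrs)

    def select_regions(sample, chrom, start, stop):
        """Return regions overlapping the half-open interval [start, stop) on chrom."""
        return [
            r for r in sample.regions
            if r[0] == chrom and r[1] < stop and r[2] > start
        ]

    sample = GdmLikeSample(
        metadata={"assay": "ChIP-seq", "antibody": "CTCF"},
        regions=[("chr1", 100, 250, "+", {"score": 42.0}),
                 ("chr1", 900, 1200, "+", {"score": 17.5})],
    )
    print(select_regions(sample, "chr1", 200, 1000))   # both regions overlap this interval
    ```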

  13. Characterizing Big Data Management

    Directory of Open Access Journals (Sweden)

    Rogério Rossi

    2015-06-01

    Big data management is a reality for an increasing number of organizations in many areas and represents a set of challenges involving big data modeling, storage and retrieval, analysis and visualization. However, technological resources, people and processes are crucial to facilitate the management of big data in any kind of organization, allowing information and knowledge from a large volume of data to support decision-making. Big data management can be supported by these three dimensions: technology, people and processes. Hence, this article discusses these dimensions: the technological dimension, which relates to the storage, analytics and visualization of big data; the human aspects of big data; and the process management dimension, which addresses big data management from both technological and business perspectives.

  14. The Widening Gulf between Genomics Data Generation and Consumption: A Practical Guide to Big Data Transfer Technology

    Science.gov (United States)

    Feltus, Frank A.; Breen, Joseph R.; Deng, Juan; Izard, Ryan S.; Konger, Christopher A.; Ligon, Walter B.; Preuss, Don; Wang, Kuang-Ching

    2015-01-01

    In the last decade, high-throughput DNA sequencing has become a disruptive technology and pushed the life sciences into a distributed ecosystem of sequence data producers and consumers. Given the power of genomics and declining sequencing costs, biology is an emerging “Big Data” discipline that will soon enter the exabyte data range when all subdisciplines are combined. These datasets must be transferred across commercial and research networks in creative ways since sending data without thought can have serious consequences on data processing time frames. Thus, it is imperative that biologists, bioinformaticians, and information technology engineers recalibrate data processing paradigms to fit this emerging reality. This review attempts to provide a snapshot of Big Data transfer across networks, which is often overlooked by many biologists. Specifically, we discuss four key areas: 1) data transfer networks, protocols, and applications; 2) data transfer security including encryption, access, firewalls, and the Science DMZ; 3) data flow control with software-defined networking; and 4) data storage, staging, archiving and access. A primary intention of this article is to orient the biologist in key aspects of the data transfer process in order to frame their genomics-oriented needs to enterprise IT professionals. PMID:26568680

  15. Databases and web tools for cancer genomics study.

    Science.gov (United States)

    Yang, Yadong; Dong, Xunong; Xie, Bingbing; Ding, Nan; Chen, Juan; Li, Yongjun; Zhang, Qian; Qu, Hongzhu; Fang, Xiangdong

    2015-02-01

    Publicly accessible resources have promoted the advance of scientific discovery. The era of genomics and big data has brought the need for collaboration and data sharing in order to make effective use of this new knowledge. Here, we describe the web resources for cancer genomics research and rate them on the basis of the diversity of cancer types, sample size, omics data comprehensiveness, and user experience. The resources reviewed include data repositories and analysis tools, and we hope this introduction will promote awareness and facilitate the usage of these resources in the cancer research community. Copyright © 2015 The Authors. Production and hosting by Elsevier Ltd. All rights reserved.

  16. Big Data Analytics in Medicine and Healthcare.

    Science.gov (United States)

    Ristevski, Blagoj; Chen, Ming

    2018-05-10

    This paper surveys big data, highlighting big data analytics in medicine and healthcare. The big data characteristics - value, volume, velocity, variety, veracity and variability - are described. Big data analytics in medicine and healthcare covers the integration and analysis of large amounts of complex heterogeneous data, such as various omics data (genomics, epigenomics, transcriptomics, proteomics, metabolomics, interactomics, pharmacogenomics, diseasomics), biomedical data and electronic health record data. We underline the challenging issues of big data privacy and security. Regarding these characteristics, some directions on the use of suitable and promising open-source distributed data processing software platforms are given.

  17. MSeqDR: A Centralized Knowledge Repository and Bioinformatics Web Resource to Facilitate Genomic Investigations in Mitochondrial Disease

    NARCIS (Netherlands)

    L. Shen (Lishuang); M.A. Diroma (Maria Angela); M. Gonzalez (Michael); D. Navarro-Gomez (Daniel); J. Leipzig (Jeremy); M.T. Lott (Marie T.); M. van Oven (Mannis); D.C. Wallace; C.C. Muraresku (Colleen Clarke); Z. Zolkipli-Cunningham (Zarazuela); P.F. Chinnery (Patrick); M. Attimonelli (Marcella); S. Zuchner (Stephan); M.J. Falk (Marni J.); X. Gai (Xiaowu)

    2016-01-01

    MSeqDR is the Mitochondrial Disease Sequence Data Resource, a centralized and comprehensive genome and phenome bioinformatics resource built by the mitochondrial disease community to facilitate clinical diagnosis and research investigations of individual patient phenotypes, genomes, genes, and variants.

  18. MSeqDR: A Centralized Knowledge Repository and Bioinformatics Web Resource to Facilitate Genomic Investigations in Mitochondrial Disease

    OpenAIRE

    Shen, Lishuang; Diroma, Maria Angela; Gonzalez, Michael; Navarro-Gomez, Daniel; Leipzig, Jeremy; Lott, Marie T.; van Oven, Mannis; Wallace, D.C.; Muraresku, Colleen Clarke; Zolkipli-Cunningham, Zarazuela; Chinnery, Patrick; Attimonelli, Marcella; Zuchner, Stephan; Falk, Marni J.; Gai, Xiaowu

    2016-01-01

    MSeqDR is the Mitochondrial Disease Sequence Data Resource, a centralized and comprehensive genome and phenome bioinformatics resource built by the mitochondrial disease community to facilitate clinical diagnosis and research investigations of individual patient phenotypes, genomes, genes, and variants. A central Web portal (https://mseqdr.org) integrates community knowledge from expert-curated databases with genomic and phenotype data shared by clinicians and researchers. MSeqDR ...

  19. Big data for health.

    Science.gov (United States)

    Andreu-Perez, Javier; Poon, Carmen C Y; Merrifield, Robert D; Wong, Stephen T C; Yang, Guang-Zhong

    2015-07-01

    This paper provides an overview of recent developments in big data in the context of biomedical and health informatics. It outlines the key characteristics of big data and how medical and health informatics, translational bioinformatics, sensor informatics, and imaging informatics will benefit from an integrated approach of piecing together different aspects of personalized information from a diverse range of data sources, both structured and unstructured, covering genomics, proteomics, metabolomics, as well as imaging, clinical diagnosis, and long-term continuous physiological sensing of an individual. It is expected that recent advances in big data will expand our knowledge for testing new hypotheses about disease management from diagnosis to prevention to personalized treatment. The rise of big data, however, also raises challenges in terms of privacy, security, data ownership, data stewardship, and governance. This paper discusses some of the existing activities and future opportunities related to big data for health, outlining some of the key underlying issues that need to be tackled.

  20. BigQ: a NoSQL based framework to handle genomic variants in i2b2.

    Science.gov (United States)

    Gabetta, Matteo; Limongelli, Ivan; Rizzo, Ettore; Riva, Alberto; Segagni, Daniele; Bellazzi, Riccardo

    2015-12-29

    Precision medicine requires the tight integration of clinical and molecular data. To this end, it is mandatory to define proper technological solutions able to manage the overwhelming amount of high throughput genomic data needed to test associations between genomic signatures and human phenotypes. The i2b2 Center (Informatics for Integrating Biology and the Bedside) has developed a widely internationally adopted framework to use existing clinical data for discovery research that can help the definition of precision medicine interventions when coupled with genetic data. i2b2 can be significantly advanced by designing efficient management solutions of Next Generation Sequencing data. We developed BigQ, an extension of the i2b2 framework, which integrates patient clinical phenotypes with genomic variant profiles generated by Next Generation Sequencing. A visual programming i2b2 plugin allows retrieving variants belonging to the patients in a cohort by applying filters on genomic variant annotations. We report an evaluation of the query performance of our system on more than 11 million variants, showing that the implemented solution scales linearly in terms of query time and disk space with the number of variants. In this paper we describe a new i2b2 web service composed of an efficient and scalable document-based database that manages annotations of genomic variants and of a visual programming plug-in designed to dynamically perform queries on clinical and genetic data. The system therefore allows managing the fast growing volume of genomic variants and can be used to integrate heterogeneous genomic annotations.
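
    As an illustration of the document-oriented variant store idea described above, the following minimal sketch (not the BigQ implementation itself) loads a few annotated variants into MongoDB and filters them by cohort and annotation fields; the collection name, field names and local MongoDB instance are assumptions made for the example.

      # Minimal sketch of a document-based variant store, in the spirit of the
      # approach described above; collection and field names are hypothetical,
      # not BigQ's actual schema.
      from pymongo import MongoClient, ASCENDING

      client = MongoClient("mongodb://localhost:27017")
      variants = client["genomics"]["variant_annotations"]

      # Index the fields used for cohort/annotation filtering so query time
      # scales with the number of matching documents, not the collection size.
      variants.create_index([("patient_id", ASCENDING), ("gene", ASCENDING)])

      variants.insert_many([
          {"patient_id": "P001", "chrom": "chr17", "pos": 41276045,
           "ref": "T", "alt": "C", "gene": "BRCA1",
           "consequence": "missense_variant", "allele_frequency": 0.0004},
          {"patient_id": "P002", "chrom": "chr7", "pos": 117559590,
           "ref": "G", "alt": "A", "gene": "CFTR",
           "consequence": "synonymous_variant", "allele_frequency": 0.02},
      ])

      # Retrieve rare, protein-altering variants for the patients in a cohort,
      # mirroring the "filter a cohort's variants by annotation" query pattern.
      cohort = ["P001", "P002"]
      for doc in variants.find({
              "patient_id": {"$in": cohort},
              "consequence": "missense_variant",
              "allele_frequency": {"$lt": 0.001}}):
          print(doc["patient_id"], doc["gene"], doc["pos"])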

  1. LLNL's Big Science Capabilities Help Spur Over $796 Billion in U.S. Economic Activity Sequencing the Human Genome

    Energy Technology Data Exchange (ETDEWEB)

    Stewart, Jeffrey S. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2015-07-28

    LLNL’s successful history of taking on big science projects spans beyond national security and has helped create billions of dollars per year in new economic activity. One example is LLNL’s role in helping sequence the human genome. Over $796 billion in new economic activity in over half a dozen fields has been documented since LLNL successfully completed this Grand Challenge.

  2. Big data analytics methods and applications

    CERN Document Server

    Rao, BLS; Rao, SB

    2016-01-01

    This book has a collection of articles written by Big Data experts to describe some of the cutting-edge methods and applications from their respective areas of interest, and provides the reader with a detailed overview of the field of Big Data Analytics as it is practiced today. The chapters cover technical aspects of key areas that generate and use Big Data such as management and finance; medicine and healthcare; genome, cytome and microbiome; graphs and networks; Internet of Things; Big Data standards; bench-marking of systems; and others. In addition to different applications, key algorithmic approaches such as graph partitioning, clustering and finite mixture modelling of high-dimensional data are also covered. The varied collection of themes in this volume introduces the reader to the richness of the emerging field of Big Data Analytics.

  3. The Naked Mole Rat Genome Resource: facilitating analyses of cancer and longevity-related adaptations.

    Science.gov (United States)

    Keane, Michael; Craig, Thomas; Alföldi, Jessica; Berlin, Aaron M; Johnson, Jeremy; Seluanov, Andrei; Gorbunova, Vera; Di Palma, Federica; Lindblad-Toh, Kerstin; Church, George M; de Magalhães, João Pedro

    2014-12-15

    The naked mole rat (Heterocephalus glaber) is an exceptionally long-lived and cancer-resistant rodent native to East Africa. Although its genome was previously sequenced, here we report a new assembly sequenced by us with substantially higher N50 values for scaffolds and contigs. We analyzed the annotation of this new improved assembly and identified candidate genomic adaptations which may have contributed to the evolution of the naked mole rat's extraordinary traits, including in regions of p53, and the hyaluronan receptors CD44 and HMMR (RHAMM). Furthermore, we developed a freely available web portal, the Naked Mole Rat Genome Resource (http://www.naked-mole-rat.org), featuring the data and results of our analysis, to assist researchers interested in the genome and genes of the naked mole rat, and also to facilitate further studies on this fascinating species. © The Author 2014. Published by Oxford University Press.

  4. [Big Data Revolution or Data Hubris? : On the Data Positivism of Molecular Biology].

    Science.gov (United States)

    Gramelsberger, Gabriele

    2017-12-01

    Genome data, the core of the big data revolution in biology proclaimed in 2008, are automatically generated and analyzed. The transition from the manual laboratory practice of electrophoresis sequencing to automated DNA-sequencing machines and software-based analysis programs was completed between 1982 and 1992. This transition facilitated the first data deluge, which was considerably increased by the second and third generations of DNA sequencers during the 2000s. However, the strategies for evaluating sequence data were also transformed along with this transition. The paper explores both the computational strategies of automation and the data evaluation culture connected with them, in order to provide a complete picture of the complexity of today's data generation and its intrinsic data positivism. The paper is thereby guided by the question of whether this data positivism is the basis of the big data revolution of molecular biology announced today, or whether it marks the beginning of its data hubris.

  5. Big Data Application in Biomedical Research and Health Care: A Literature Review.

    Science.gov (United States)

    Luo, Jake; Wu, Min; Gopukumar, Deepika; Zhao, Yiqing

    2016-01-01

    Big data technologies are increasingly used for biomedical and health-care informatics research. Large amounts of biological and clinical data have been generated and collected at an unprecedented speed and scale. For example, the new generation of sequencing technologies enables the processing of billions of DNA sequence data per day, and the application of electronic health records (EHRs) is documenting large amounts of patient data. The cost of acquiring and analyzing biomedical data is expected to decrease dramatically with the help of technology upgrades, such as the emergence of new sequencing machines, the development of novel hardware and software for parallel computing, and the extensive expansion of EHRs. Big data applications present new opportunities to discover new knowledge and create novel methods to improve the quality of health care. The application of big data in health care is a fast-growing field, with many new discoveries and methodologies published in the last five years. In this paper, we review and discuss big data application in four major biomedical subdisciplines: (1) bioinformatics, (2) clinical informatics, (3) imaging informatics, and (4) public health informatics. Specifically, in bioinformatics, high-throughput experiments facilitate the research of new genome-wide association studies of diseases, and with clinical informatics, the clinical field benefits from the vast amount of collected patient data for making intelligent decisions. Imaging informatics is now more rapidly integrated with cloud platforms to share medical image data and workflows, and public health informatics leverages big data techniques for predicting and monitoring infectious disease outbreaks, such as Ebola. In this paper, we review the recent progress and breakthroughs of big data applications in these health-care domains and summarize the challenges, gaps, and opportunities to improve and advance big data applications in health care.

  6. Big Data in industry

    Science.gov (United States)

    Latinović, T. S.; Preradović, D. M.; Barz, C. R.; Latinović, M. T.; Petrica, P. P.; Pop-Vadean, A.

    2016-08-01

    The amount of data at the global level has grown exponentially. Along with this phenomenon comes the need for new units of measure, such as the exabyte, zettabyte, and yottabyte, the last of which measures the amount of data. The growth of data creates a situation where the classic systems for the collection, storage, processing, and visualization of data are losing the battle against the volume, velocity, and variety of data that is generated continuously. Much of this data is created by the Internet of Things, IoT (cameras, satellites, cars, GPS navigation, etc.). It is our challenge to come up with new technologies and tools for the management and exploitation of these large amounts of data. Big Data has been a hot topic in IT circles in recent years. However, Big Data is also recognized in the business world, and increasingly in public administration. This paper proposes an ontology of big data analytics and examines how to enhance business intelligence through big data analytics as a service by presenting a big data analytics service-oriented architecture. This paper also discusses the interrelationship between business intelligence and big data analytics. The proposed approach might facilitate the research and development of business analytics, big data analytics, and business intelligence as well as intelligent agents.

  7. The tiger genome and comparative analysis with lion and snow leopard genomes.

    Science.gov (United States)

    Cho, Yun Sung; Hu, Li; Hou, Haolong; Lee, Hang; Xu, Jiaohui; Kwon, Soowhan; Oh, Sukhun; Kim, Hak-Min; Jho, Sungwoong; Kim, Sangsoo; Shin, Young-Ah; Kim, Byung Chul; Kim, Hyunmin; Kim, Chang-Uk; Luo, Shu-Jin; Johnson, Warren E; Koepfli, Klaus-Peter; Schmidt-Küntzel, Anne; Turner, Jason A; Marker, Laurie; Harper, Cindy; Miller, Susan M; Jacobs, Wilhelm; Bertola, Laura D; Kim, Tae Hyung; Lee, Sunghoon; Zhou, Qian; Jung, Hyun-Ju; Xu, Xiao; Gadhvi, Priyvrat; Xu, Pengwei; Xiong, Yingqi; Luo, Yadan; Pan, Shengkai; Gou, Caiyun; Chu, Xiuhui; Zhang, Jilin; Liu, Sanyang; He, Jing; Chen, Ying; Yang, Linfeng; Yang, Yulan; He, Jiaju; Liu, Sha; Wang, Junyi; Kim, Chul Hong; Kwak, Hwanjong; Kim, Jong-Soo; Hwang, Seungwoo; Ko, Junsu; Kim, Chang-Bae; Kim, Sangtae; Bayarlkhagva, Damdin; Paek, Woon Kee; Kim, Seong-Jin; O'Brien, Stephen J; Wang, Jun; Bhak, Jong

    2013-01-01

    Tigers and their close relatives (Panthera) are some of the world's most endangered species. Here we report the de novo assembly of an Amur tiger whole-genome sequence as well as the genomic sequences of a white Bengal tiger, African lion, white African lion and snow leopard. Through comparative genetic analyses of these genomes, we find genetic signatures that may reflect molecular adaptations consistent with the big cats' hypercarnivorous diet and muscle strength. We report a snow leopard-specific genetic determinant in EGLN1 (Met39>Lys39), which is likely to be associated with adaptation to high altitude. We also detect a TYR260G>A mutation likely responsible for the white lion coat colour. Tiger and cat genomes show similar repeat composition and an appreciably conserved synteny. Genomic data from the five big cats provide an invaluable resource for resolving easily identifiable phenotypes evident in very close, but distinct, species.

  8. The tiger genome and comparative analysis with lion and snow leopard genomes

    Science.gov (United States)

    Cho, Yun Sung; Hu, Li; Hou, Haolong; Lee, Hang; Xu, Jiaohui; Kwon, Soowhan; Oh, Sukhun; Kim, Hak-Min; Jho, Sungwoong; Kim, Sangsoo; Shin, Young-Ah; Kim, Byung Chul; Kim, Hyunmin; Kim, Chang-uk; Luo, Shu-Jin; Johnson, Warren E.; Koepfli, Klaus-Peter; Schmidt-Küntzel, Anne; Turner, Jason A.; Marker, Laurie; Harper, Cindy; Miller, Susan M.; Jacobs, Wilhelm; Bertola, Laura D.; Kim, Tae Hyung; Lee, Sunghoon; Zhou, Qian; Jung, Hyun-Ju; Xu, Xiao; Gadhvi, Priyvrat; Xu, Pengwei; Xiong, Yingqi; Luo, Yadan; Pan, Shengkai; Gou, Caiyun; Chu, Xiuhui; Zhang, Jilin; Liu, Sanyang; He, Jing; Chen, Ying; Yang, Linfeng; Yang, Yulan; He, Jiaju; Liu, Sha; Wang, Junyi; Kim, Chul Hong; Kwak, Hwanjong; Kim, Jong-Soo; Hwang, Seungwoo; Ko, Junsu; Kim, Chang-Bae; Kim, Sangtae; Bayarlkhagva, Damdin; Paek, Woon Kee; Kim, Seong-Jin; O’Brien, Stephen J.; Wang, Jun; Bhak, Jong

    2013-01-01

    Tigers and their close relatives (Panthera) are some of the world’s most endangered species. Here we report the de novo assembly of an Amur tiger whole-genome sequence as well as the genomic sequences of a white Bengal tiger, African lion, white African lion and snow leopard. Through comparative genetic analyses of these genomes, we find genetic signatures that may reflect molecular adaptations consistent with the big cats’ hypercarnivorous diet and muscle strength. We report a snow leopard-specific genetic determinant in EGLN1 (Met39>Lys39), which is likely to be associated with adaptation to high altitude. We also detect a TYR260G>A mutation likely responsible for the white lion coat colour. Tiger and cat genomes show similar repeat composition and an appreciably conserved synteny. Genomic data from the five big cats provide an invaluable resource for resolving easily identifiable phenotypes evident in very close, but distinct, species. PMID:24045858

  9. Some experiences and opportunities for big data in translational research.

    Science.gov (United States)

    Chute, Christopher G; Ullman-Cullere, Mollie; Wood, Grant M; Lin, Simon M; He, Min; Pathak, Jyotishman

    2013-10-01

    Health care has become increasingly information intensive. The advent of genomic data, integrated into patient care, significantly accelerates the complexity and amount of clinical data. Translational research in the present day increasingly embraces new biomedical discovery in this data-intensive world, thus entering the domain of "big data." The Electronic Medical Records and Genomics consortium has taught us many lessons, while simultaneously advances in commodity computing methods enable the academic community to affordably manage and process big data. Although great promise can emerge from the adoption of big data methods and philosophy, the heterogeneity and complexity of clinical data, in particular, pose additional challenges for big data inferencing and clinical application. However, the ultimate comparability and consistency of heterogeneous clinical information sources can be enhanced by existing and emerging data standards, which promise to bring order to clinical data chaos. Meaningful Use data standards in particular have already simplified the task of identifying clinical phenotyping patterns in electronic health records.

  10. Spaces of genomics : exploring the innovation journey of genomics in research on common disease

    NARCIS (Netherlands)

    Bitsch, L.

    2013-01-01

    Genomics was introduced with big promises and expectations of its future contribution to our society. Medical genomics was introduced as that which would lay the foundation for a revolution in our management of common diseases. Genomics would lead the way towards a future of personalised medicine.

  11. Baculoviral delivery of CRISPR/Cas9 facilitates efficient genome editing in human cells.

    Directory of Open Access Journals (Sweden)

    Sanne Hindriksen

    The CRISPR/Cas9 system is a highly effective tool for genome editing. Key to robust genome editing is the efficient delivery of the CRISPR/Cas9 machinery. Viral delivery systems are efficient vehicles for the transduction of foreign genes but commonly used viral vectors suffer from a limited capacity in the genetic information they can carry. Baculovirus however is capable of carrying large exogenous DNA fragments. Here we investigate the use of baculoviral vectors as a delivery vehicle for CRISPR/Cas9 based genome-editing tools. We demonstrate transduction of a panel of cell lines with Cas9 and an sgRNA sequence, which results in efficient knockout of all four targeted subunits of the chromosomal passenger complex (CPC). We further show that introduction of a homology directed repair template into the same CRISPR/Cas9 baculovirus facilitates introduction of specific point mutations and endogenous gene tags. Tagging of the CPC recruitment factor Haspin with the fluorescent reporter YFP allowed us to study its native localization as well as recruitment to the cohesin subunit Pds5B.

  12. Baculoviral delivery of CRISPR/Cas9 facilitates efficient genome editing in human cells.

    Science.gov (United States)

    Hindriksen, Sanne; Bramer, Arne J; Truong, My Anh; Vromans, Martijn J M; Post, Jasmin B; Verlaan-Klink, Ingrid; Snippert, Hugo J; Lens, Susanne M A; Hadders, Michael A

    2017-01-01

    The CRISPR/Cas9 system is a highly effective tool for genome editing. Key to robust genome editing is the efficient delivery of the CRISPR/Cas9 machinery. Viral delivery systems are efficient vehicles for the transduction of foreign genes but commonly used viral vectors suffer from a limited capacity in the genetic information they can carry. Baculovirus however is capable of carrying large exogenous DNA fragments. Here we investigate the use of baculoviral vectors as a delivery vehicle for CRISPR/Cas9 based genome-editing tools. We demonstrate transduction of a panel of cell lines with Cas9 and an sgRNA sequence, which results in efficient knockout of all four targeted subunits of the chromosomal passenger complex (CPC). We further show that introduction of a homology directed repair template into the same CRISPR/Cas9 baculovirus facilitates introduction of specific point mutations and endogenous gene tags. Tagging of the CPC recruitment factor Haspin with the fluorescent reporter YFP allowed us to study its native localization as well as recruitment to the cohesin subunit Pds5B.

  13. Big data integration: scalability and sustainability

    KAUST Repository

    Zhang, Zhang

    2016-01-26

    Integration of various types of omics data is critically indispensable for addressing most important and complex biological questions. In the era of big data, however, data integration becomes increasingly tedious, time-consuming and expensive, posing a significant obstacle to fully exploiting the wealth of big biological data. Here we propose a scalable and sustainable architecture that integrates big omics data through community-contributed modules. Community modules are contributed and maintained by different committed groups; each module corresponds to a specific data type, deals with data collection, processing and visualization, and delivers data on demand via web services. Based on this community-based architecture, we build Information Commons for Rice (IC4R; http://ic4r.org), a rice knowledgebase that integrates a variety of rice omics data from multiple community modules, including genome-wide expression profiles derived entirely from RNA-Seq data, genomic variations obtained from resequencing data of thousands of rice varieties, plant homologous genes covering multiple diverse plant species, post-translational modifications, rice-related literature, and community annotations. Taken together, such an architecture achieves integration of different types of data from multiple community-contributed modules and accordingly features scalable, sustainable and collaborative integration of big data as well as low costs for database update and maintenance, thus helping to build IC4R into a comprehensive knowledgebase covering all aspects of rice data, beneficial for both basic and translational research.

  14. TIA-1 and TIAR interact with 5'-UTR of enterovirus 71 genome and facilitate viral replication.

    Science.gov (United States)

    Wang, Xiaohui; Wang, Huanru; Li, Yixuan; Jin, Yu; Chu, Ying; Su, Airong; Wu, Zhiwei

    2015-10-16

    Enterovirus 71 is one of the major causative pathogens of HFMD in children. Upon infection, the viral RNA is translated in an IRES-dependent manner and requires several host factors for effective replication. Here, we found that T-cell-restricted intracellular antigen 1 (TIA-1) and TIA-1 related protein (TIAR) were translocated from the nucleus to the cytoplasm after EV71 infection and localized to the sites of viral replication. We found that TIA-1 and TIAR can facilitate EV71 replication by enhancing viral genome synthesis in host cells. We demonstrated that both proteins bound to the stem-loop I of the 5'-UTR of the viral genome and improved the stability of viral genomic RNA. Our results suggest that TIA-1 and TIAR are two new host factors that interact with the 5'-UTR of the EV71 genome and positively regulate viral replication. Copyright © 2015 Elsevier Inc. All rights reserved.

  15. Big Data Analysis of Human Genome Variations

    KAUST Repository

    Gojobori, Takashi

    2016-01-01

    Since the human genome draft sequence was first made public in 2000, genomic analyses have been intensively extended to the population level. The following three international projects are good examples for large-scale studies of human

  16. Big data and visual analytics in anaesthesia and health care.

    Science.gov (United States)

    Simpao, A F; Ahumada, L M; Rehman, M A

    2015-09-01

    Advances in computer technology, patient monitoring systems, and electronic health record systems have enabled rapid accumulation of patient data in electronic form (i.e. big data). Organizations such as the Anesthesia Quality Institute and Multicenter Perioperative Outcomes Group have spearheaded large-scale efforts to collect anaesthesia big data for outcomes research and quality improvement. Analytics--the systematic use of data combined with quantitative and qualitative analysis to make decisions--can be applied to big data for quality and performance improvements, such as predictive risk assessment, clinical decision support, and resource management. Visual analytics is the science of analytical reasoning facilitated by interactive visual interfaces, and it can facilitate performance of cognitive activities involving big data. Ongoing integration of big data and analytics within anaesthesia and health care will increase demand for anaesthesia professionals who are well versed in both the medical and the information sciences. © The Author 2015. Published by Oxford University Press on behalf of the British Journal of Anaesthesia. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  17. Big Data and medicine: a big deal?

    Science.gov (United States)

    Mayer-Schönberger, V; Ingelsson, E

    2018-05-01

    Big Data promises huge benefits for medical research. Looking beyond superficial increases in the amount of data collected, we identify three key areas where Big Data differs from conventional analyses of data samples: (i) data are captured more comprehensively relative to the phenomenon under study; this reduces some bias but surfaces important trade-offs, such as between data quantity and data quality; (ii) data are often analysed using machine learning tools, such as neural networks rather than conventional statistical methods resulting in systems that over time capture insights implicit in data, but remain black boxes, rarely revealing causal connections; and (iii) the purpose of the analyses of data is no longer simply answering existing questions, but hinting at novel ones and generating promising new hypotheses. As a consequence, when performed right, Big Data analyses can accelerate research. Because Big Data approaches differ so fundamentally from small data ones, research structures, processes and mindsets need to adjust. The latent value of data is being reaped through repeated reuse of data, which runs counter to existing practices not only regarding data privacy, but data management more generally. Consequently, we suggest a number of adjustments such as boards reviewing responsible data use, and incentives to facilitate comprehensive data sharing. As data's role changes to a resource of insight, we also need to acknowledge the importance of collecting and making data available as a crucial part of our research endeavours, and reassess our formal processes from career advancement to treatment approval. © 2017 The Association for the Publication of the Journal of Internal Medicine.

  18. Facilitating comparative effectiveness research in cancer genomics: evaluating stakeholder perceptions of the engagement process.

    Science.gov (United States)

    Deverka, Patricia A; Lavallee, Danielle C; Desai, Priyanka J; Armstrong, Joanne; Gorman, Mark; Hole-Curry, Leah; O'Leary, James; Ruffner, B W; Watkins, John; Veenstra, David L; Baker, Laurence H; Unger, Joseph M; Ramsey, Scott D

    2012-07-01

    The Center for Comparative Effectiveness Research in Cancer Genomics completed a 2-year stakeholder-guided process for the prioritization of genomic tests for comparative effectiveness research studies. We sought to evaluate the effectiveness of engagement procedures in achieving project goals and to identify opportunities for future improvements. The evaluation included an online questionnaire, one-on-one telephone interviews and facilitated discussion. Responses to the online questionnaire were tabulated for descriptive purposes, while transcripts from key informant interviews were analyzed using a directed content analysis approach. A total of 11 out of 13 stakeholders completed both the online questionnaire and interview process, while nine participated in the facilitated discussion. Eighty-nine percent of questionnaire items received overall ratings of agree or strongly agree; 11% of responses were rated as neutral with the exception of a single rating of disagreement with an item regarding the clarity of how stakeholder input was incorporated into project decisions. Recommendations for future improvement included developing standard recruitment practices, role descriptions and processes for improved communication with clinical and comparative effectiveness research investigators. Evaluation of the stakeholder engagement process provided constructive feedback for future improvements and should be routinely conducted to ensure maximal effectiveness of stakeholder involvement.

  19. [Three applications and the challenge of the big data in otology].

    Science.gov (United States)

    Lei, Guanxiong; Li, Jianan; Shen, Weidong; Yang, Shiming

    2016-03-01

    With the expansion of human practical activities, more and more areas have been confronted with big data problems. The emergence of big data requires people to update the research paradigm and develop new technical methods. This review discussed how big data might bring opportunities and challenges in the areas of auditory implantation, the deafness genome, and auditory pathophysiology, and pointed out that we need to find appropriate theories and methods to turn this kind of expectation into reality.

  20. Big Data Analytics in Healthcare.

    Science.gov (United States)

    Belle, Ashwin; Thiagarajan, Raghuram; Soroushmehr, S M Reza; Navidi, Fatemeh; Beard, Daniel A; Najarian, Kayvan

    2015-01-01

    The rapidly expanding field of big data analytics has started to play a pivotal role in the evolution of healthcare practices and research. It has provided tools to accumulate, manage, analyze, and assimilate large volumes of disparate, structured, and unstructured data produced by current healthcare systems. Big data analytics has been recently applied towards aiding the process of care delivery and disease exploration. However, the adoption rate and research development in this space is still hindered by some fundamental problems inherent within the big data paradigm. In this paper, we discuss some of these major challenges with a focus on three upcoming and promising areas of medical research: image, signal, and genomics based analytics. Recent research which targets utilization of large volumes of medical data while combining multimodal data from disparate sources is discussed. Potential areas of research within this field which have the ability to provide meaningful impact on healthcare delivery are also examined.

  1. The Study of “big data” to support internal business strategists

    Science.gov (United States)

    Ge, Mei

    2018-01-01

    How is big data different from previous data analysis systems? The primary purpose behind traditional small data analytics, which all managers are more or less familiar with, is to support internal business strategies. But big data also offers a promising new dimension: discovering new opportunities to offer customers high-value products and services. This study focuses on introducing some of the strategies that big data supports. Business decisions using big data can also involve several areas of analytics, including customer satisfaction, customer journeys, supply chains, risk management, competitive intelligence, pricing, and discovery and experimentation, as well as facilitating big data discovery.

  2. The Mouse Genome Database (MGD): facilitating mouse as a model for human biology and disease.

    Science.gov (United States)

    Eppig, Janan T; Blake, Judith A; Bult, Carol J; Kadin, James A; Richardson, Joel E

    2015-01-01

    The Mouse Genome Database (MGD, http://www.informatics.jax.org) serves the international biomedical research community as the central resource for integrated genomic, genetic and biological data on the laboratory mouse. To facilitate use of mouse as a model in translational studies, MGD maintains a core of high-quality curated data and integrates experimentally and computationally generated data sets. MGD maintains a unified catalog of genes and genome features, including functional RNAs, QTL and phenotypic loci. MGD curates and provides functional and phenotype annotations for mouse genes using the Gene Ontology and Mammalian Phenotype Ontology. MGD integrates phenotype data and associates mouse genotypes to human diseases, providing critical mouse-human relationships and access to repositories holding mouse models. MGD is the authoritative source of nomenclature for genes, genome features, alleles and strains following guidelines of the International Committee on Standardized Genetic Nomenclature for Mice. A new addition to MGD, the Human-Mouse: Disease Connection, allows users to explore gene-phenotype-disease relationships between human and mouse. MGD has also updated search paradigms for phenotypic allele attributes, incorporated incidental mutation data, added a module for display and exploration of genes and microRNA interactions and adopted the JBrowse genome browser. MGD resources are freely available to the scientific community. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  3. Big Data access and infrastructure for modern biology: case studies in data repository utility.

    Science.gov (United States)

    Boles, Nathan C; Stone, Tyler; Bergeron, Charles; Kiehl, Thomas R

    2017-01-01

    Big Data is no longer solely the purview of big organizations with big resources. Today's routine tools and experimental methods can generate large slices of data. For example, high-throughput sequencing can quickly interrogate biological systems for the expression levels of thousands of different RNAs, examine epigenetic marks throughout the genome, and detect differences in the genomes of individuals. Multichannel electrophysiology platforms produce gigabytes of data in just a few minutes of recording. Imaging systems generate videos capturing biological behaviors over the course of days. Thus, any researcher now has access to a veritable wealth of data. However, the ability of any given researcher to utilize that data is limited by her/his own resources and skills for downloading, storing, and analyzing the data. In this paper, we examine the necessary resources required to engage Big Data, survey the state of modern data analysis pipelines, present a few data repository case studies, and touch on current institutions and programs supporting the work that relies on Big Data. © 2016 New York Academy of Sciences.

  4. TOWARDS EFFECTIVE CUSTOMER RELATIONSHIP MANAGEMENT IN OMAN: ROLE OF BIG DATA

    OpenAIRE

    Tarek Khalil; Mohammad Al-Refai; Amer Nizar Fayez; Mohammed Sharaf Qudah

    2017-01-01

    We established a framework to explore the feasibility of enabling big data within customer relationship management (CRM) strategies in Oman for creating sustainable business profit nationwide. A qualitative evaluation was made based on predictive analytics convergence and big data facilitated CRM. It was found that big data analytics can meticulously alter the competitive industrial setting and thereby proffer notable benefits to a business organization in terms of operation, str...

  5. BIG: a large-scale data integration tool for renal physiology.

    Science.gov (United States)

    Zhao, Yue; Yang, Chin-Rang; Raghuram, Viswanathan; Parulekar, Jaya; Knepper, Mark A

    2016-10-01

    Due to recent advances in high-throughput techniques, we and others have generated multiple proteomic and transcriptomic databases to describe and quantify gene expression, protein abundance, or cellular signaling on the scale of the whole genome/proteome in kidney cells. The existence of so much data from diverse sources raises the following question: "How can researchers find information efficiently for a given gene product over all of these data sets without searching each data set individually?" This is the type of problem that has motivated the "Big-Data" revolution in Data Science, which has driven progress in fields such as marketing. Here we present an online Big-Data tool called BIG (Biological Information Gatherer) that allows users to submit a single online query to obtain all relevant information from all indexed databases. BIG is accessible at http://big.nhlbi.nih.gov/.
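
    The single-query-across-all-indexed-datasets idea can be illustrated with a minimal sketch that indexes several local datasets by gene symbol and gathers every record for one query; the dataset names, file layout and column names below are assumptions made for illustration, not the actual BIG backend.

      # Minimal sketch of gathering all records for one gene product across
      # several locally indexed datasets via a single query. Dataset names
      # and columns are hypothetical.
      import csv
      from collections import defaultdict

      DATASETS = {
          "proteome_cortex": "proteome_cortex.csv",
          "transcriptome_imcd": "transcriptome_imcd.csv",
      }

      def build_index(path, key_column="gene_symbol"):
          """Index one dataset by gene symbol for constant-time lookups."""
          index = defaultdict(list)
          with open(path, newline="") as handle:
              for row in csv.DictReader(handle):
                  index[row[key_column].upper()].append(row)
          return index

      INDEXES = {name: build_index(path) for name, path in DATASETS.items()}

      def gather(gene_symbol):
          """Return every record for a gene across all indexed datasets."""
          return {name: idx.get(gene_symbol.upper(), [])
                  for name, idx in INDEXES.items()}

      if __name__ == "__main__":
          for dataset, rows in gather("AQP2").items():
              print(dataset, len(rows), "records")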

  6. MSeqDR: A Centralized Knowledge Repository and Bioinformatics Web Resource to Facilitate Genomic Investigations in Mitochondrial Disease.

    Science.gov (United States)

    Shen, Lishuang; Diroma, Maria Angela; Gonzalez, Michael; Navarro-Gomez, Daniel; Leipzig, Jeremy; Lott, Marie T; van Oven, Mannis; Wallace, Douglas C; Muraresku, Colleen Clarke; Zolkipli-Cunningham, Zarazuela; Chinnery, Patrick F; Attimonelli, Marcella; Zuchner, Stephan; Falk, Marni J; Gai, Xiaowu

    2016-06-01

    MSeqDR is the Mitochondrial Disease Sequence Data Resource, a centralized and comprehensive genome and phenome bioinformatics resource built by the mitochondrial disease community to facilitate clinical diagnosis and research investigations of individual patient phenotypes, genomes, genes, and variants. A central Web portal (https://mseqdr.org) integrates community knowledge from expert-curated databases with genomic and phenotype data shared by clinicians and researchers. MSeqDR also functions as a centralized application server for Web-based tools to analyze data across both mitochondrial and nuclear DNA, including investigator-driven whole exome or genome dataset analyses through MSeqDR-Genesis. MSeqDR-GBrowse genome browser supports interactive genomic data exploration and visualization with custom tracks relevant to mtDNA variation and mitochondrial disease. MSeqDR-LSDB is a locus-specific database that currently manages 178 mitochondrial diseases, 1,363 genes associated with mitochondrial biology or disease, and 3,711 pathogenic variants in those genes. MSeqDR Disease Portal allows hierarchical tree-style disease exploration to evaluate their unique descriptions, phenotypes, and causative variants. Automated genomic data submission tools are provided that capture ClinVar compliant variant annotations. PhenoTips will be used for phenotypic data submission on deidentified patients using human phenotype ontology terminology. The development of a dynamic informed patient consent process to guide data access is underway to realize the full potential of these resources. © 2016 WILEY PERIODICALS, INC.

  7. Big data: the next frontier for innovation in therapeutics and healthcare.

    Science.gov (United States)

    Issa, Naiem T; Byers, Stephen W; Dakshanamurthy, Sivanesan

    2014-05-01

    Advancements in genomics and personalized medicine not only affect healthcare delivery from patient and provider standpoints, but also reshape biomedical discovery. We are in the era of the '-omics', wherein an individual's genome, transcriptome, proteome and metabolome can be scrutinized to the finest resolution to paint a personalized biochemical fingerprint that enables tailored treatments, prognoses, risk factors, etc. Digitization of this information parlays into 'big data' informatics-driven evidence-based medical practice. While individualized patient management is a key beneficiary of next-generation medical informatics, this data also harbors a wealth of novel therapeutic discoveries waiting to be uncovered. 'Big data' informatics allows for networks-driven systems pharmacodynamics whereby drug information can be coupled to cellular- and organ-level physiology for determining whole-body outcomes. Patient '-omics' data can be integrated for ontology-based data-mining for the discovery of new biological associations and drug targets. Here we highlight the potential of 'big data' informatics for clinical pharmacology.

  8. GRAFISK FACILITERING - En magtanalyse af styringen i konsulentværktøjet grafisk facilitering [Graphic facilitation: a power analysis of the governance embedded in the consultancy tool of graphic facilitation]

    OpenAIRE

    Munch, Anna; Boholt, Marianne

    2012-01-01

    The topic of this thesis is the relatively new consultancy tool of graphic facilitation (GF). GF is a method that combines hand-drawn images and big picture thinking. A graphic facilitator leads a group through a process that results in visual output such as a poster or pamphlet. Our thesis analyses this management tool from a power perspective in an attempt to determine the power relations inherent in its practice. Our theoretical basis is French philosopher Michel Foucault’s theory...

  9. Indian microchip for Big Bang research in Geneva

    CERN Multimedia

    Bhabani, Soudhriti

    2007-01-01

    "A premier nuclear physics institute here has come up with India's first indigenously designed microchip that will facilitate research on the Big Bang theory in Geneva's CERN, the world's largest particle physics laboratory." (1 page)

  10. Molecular evolution of colorectal cancer: from multistep carcinogenesis to the big bang.

    Science.gov (United States)

    Amaro, Adriana; Chiara, Silvana; Pfeffer, Ulrich

    2016-03-01

    Colorectal cancer is characterized by exquisite genomic instability, either in the form of microsatellite instability or chromosomal instability. Microsatellite instability is the result of mutation of mismatch repair genes or their silencing through promoter methylation as a consequence of the CpG island methylator phenotype. The molecular causes of chromosomal instability are less well characterized. Genomic instability and field cancerization lead to a high degree of intratumoral heterogeneity and determine the formation of cancer stem cells and epithelial-mesenchymal transition mediated by the TGF-β and APC pathways. Recent analyses using integrated genomics reveal different phases of colorectal cancer evolution. An initial phase of genomic instability that yields many clones with different mutations (the big bang) is followed by an important, previously undetected phase of cancer evolution that consists of the stabilization of several clones and a relatively flat outgrowth. The big bang model can best explain the coexistence of several stable clones and is compatible with the fact that analysis of the bulk of the primary tumor yields prognostic information.

  11. Big Data - Smart Health Strategies

    Science.gov (United States)

    2014-01-01

    Objectives: To select best papers published in 2013 in the field of big data and smart health strategies, and summarize outstanding research efforts. Methods: A systematic search was performed using two major bibliographic databases for relevant journal papers. The references obtained were reviewed in a two-stage process, starting with a blinded review performed by the two section editors, and followed by a peer review process operated by external reviewers recognized as experts in the field. Results: The complete review process selected four best papers, illustrating various aspects of the special theme, among them: (a) using large volumes of unstructured data and, specifically, clinical notes from Electronic Health Records (EHRs) for pharmacovigilance; (b) knowledge discovery via querying large volumes of complex (both structured and unstructured) biological data using big data technologies and relevant tools; (c) methodologies for applying cloud computing and big data technologies in the field of genomics, and (d) system architectures enabling high-performance access to and processing of large datasets extracted from EHRs. Conclusions: The potential of big data in biomedicine has been pinpointed in various viewpoint papers and editorials. The review of current scientific literature illustrated a variety of interesting methods and applications in the field, but still the promises exceed the current outcomes. As we are getting closer towards a solid foundation with respect to common understanding of relevant concepts and technical aspects, and the use of standardized technologies and tools, we can anticipate to reach the potential that big data offer for personalized medicine and smart health strategies in the near future. PMID:25123721

  12. GeNemo: a search engine for web-based functional genomic data.

    Science.gov (United States)

    Zhang, Yongqing; Cao, Xiaoyi; Zhong, Sheng

    2016-07-08

    A set of new data types emerged from functional genomic assays, including ChIP-seq, DNase-seq, FAIRE-seq and others. The results are typically stored as genome-wide intensities (WIG/bigWig files) or functional genomic regions (peak/BED files). These data types present new challenges to big data science. Here, we present GeNemo, a web-based search engine for functional genomic data. GeNemo searches user-input data against online functional genomic datasets, including the entire collection of ENCODE and mouse ENCODE datasets. Unlike text-based search engines, GeNemo's searches are based on pattern matching of functional genomic regions. This distinguishes GeNemo from text or DNA sequence searches. The user can input any complete or partial functional genomic dataset, for example, a binding intensity file (bigWig) or a peak file. GeNemo reports any genomic regions, ranging from hundred bases to hundred thousand bases, from any of the online ENCODE datasets that share similar functional (binding, modification, accessibility) patterns. This is enabled by a Markov Chain Monte Carlo-based maximization process, executed on up to 24 parallel computing threads. By clicking on a search result, the user can visually compare her/his data with the found datasets and navigate the identified genomic regions. GeNemo is available at www.genemo.org. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
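
    The region-based (rather than text-based) matching that such a search engine relies on can be illustrated by scoring the base-pair overlap between two BED-like interval sets; the toy Jaccard comparison below is only an illustrative sketch on one chromosome, not GeNemo's Markov Chain Monte Carlo procedure.

      # Illustrative sketch of region-based matching: score how similar two
      # BED-like interval sets are on one chromosome by base-pair Jaccard
      # overlap. This is not GeNemo's MCMC-based maximization.
      def merge(intervals):
          """Merge (start, end) intervals into sorted, disjoint intervals."""
          merged = []
          for start, end in sorted(intervals):
              if merged and start <= merged[-1][1]:
                  merged[-1] = (merged[-1][0], max(merged[-1][1], end))
              else:
                  merged.append((start, end))
          return merged

      def covered(intervals):
          return sum(end - start for start, end in intervals)

      def overlap(a, b):
          """Total base pairs shared by two disjoint, sorted interval lists."""
          shared, i, j = 0, 0, 0
          while i < len(a) and j < len(b):
              lo = max(a[i][0], b[j][0])
              hi = min(a[i][1], b[j][1])
              shared += max(0, hi - lo)
              if a[i][1] < b[j][1]:
                  i += 1
              else:
                  j += 1
          return shared

      def jaccard(query_peaks, dataset_peaks):
          qa, qb = merge(query_peaks), merge(dataset_peaks)
          inter = overlap(qa, qb)
          union = covered(qa) + covered(qb) - inter
          return inter / union if union else 0.0

      # Example: a user-supplied peak file versus one ENCODE-like track.
      query = [(1000, 1500), (4000, 4600)]
      track = [(1100, 1550), (8000, 8200)]
      print(f"Jaccard similarity: {jaccard(query, track):.3f}")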

  13. [Relevance of big data for molecular diagnostics].

    Science.gov (United States)

    Bonin-Andresen, M; Smiljanovic, B; Stuhlmüller, B; Sörensen, T; Grützkau, A; Häupl, T

    2018-04-01

    Big data analysis raises the expectation that computerized algorithms may extract new knowledge from otherwise unmanageable vast data sets. What are the algorithms behind the big data discussion? In principle, high-throughput technologies in molecular research already introduced big data, and the development and application of analysis tools, into the field of rheumatology some 15 years ago. This includes especially omics technologies, such as genomics, transcriptomics and cytomics. Some basic methods of data analysis are provided along with the technology; however, functional analysis and interpretation require adaptation of existing or development of new software tools. For these steps, structuring and evaluating according to the biological context is extremely important and not only a mathematical problem. This aspect has to be considered much more for molecular big data than for data analyzed in health economics or epidemiology. Molecular data are structured in a first order determined by the applied technology and present quantitative characteristics that follow the principles of their biological nature. These biological dependencies have to be integrated into software solutions, which may require networks of molecular big data of the same or even different technologies in order to achieve cross-technology confirmation. Increasingly extensive recording of molecular processes, also in individual patients, is generating personal big data and requires new strategies for management in order to develop data-driven individualized interpretation concepts. With this perspective in mind, translation of information derived from molecular big data will also require new specifications for education and professional competence.

  14. A Survey of Scholarly Data: From Big Data Perspective

    DEFF Research Database (Denmark)

    Khan, Samiya; Liu, Xiufeng; Shakil, Kashish A.

    2017-01-01

    Recently, there has been a shifting focus of organizations and governments towards digitization of academic and technical documents, adding a new facet to the concept of digital libraries. The volume, variety and velocity of this generated data satisfies the big data definition, as a result of which this scholarly reserve is popularly referred to as big scholarly data. In order to facilitate data analytics for big scholarly data, architectures and services for the same need to be developed. The evolving nature of research problems has made them essentially interdisciplinary. As a result, there is a growing demand for scholarly applications like collaborator discovery, expert finding and research recommendation systems, in addition to several others. This research paper investigates the current trends and identifies the existing challenges in development of a big scholarly data platform.

  15. Big data analytics turning big data into big money

    CERN Document Server

    Ohlhorst, Frank J

    2012-01-01

    Unique insights to implement big data analytics and reap big returns to your bottom line. Focusing on the business and financial value of big data analytics, respected technology journalist Frank J. Ohlhorst shares his insights on the newly emerging field of big data analytics in Big Data Analytics. This breakthrough book demonstrates the importance of analytics, defines the processes, highlights the tangible and intangible values and discusses how you can turn a business liability into actionable material that can be used to redefine markets, improve profits and identify new business opportuni

  16. An Effective Big Data Supervised Imbalanced Classification Approach for Ortholog Detection in Related Yeast Species

    Directory of Open Access Journals (Sweden)

    Deborah Galpert

    2015-01-01

    Orthology detection requires more effective scaling algorithms. In this paper, a set of gene pair features based on similarity measures (alignment scores, sequence length, gene membership to conserved regions, and physicochemical profiles) are combined in a supervised pairwise ortholog detection approach to improve effectiveness, considering low ortholog ratios in relation to the possible pairwise comparisons between two genomes. In this scenario, big data supervised classifiers managing imbalance between ortholog and nonortholog pair classes allow for an effective scaling solution built from two genomes and extended to other genome pairs. The supervised approach was compared with the RBH, RSD, and OMA algorithms by using the following yeast genome pairs as benchmark datasets: Saccharomyces cerevisiae-Kluyveromyces lactis, Saccharomyces cerevisiae-Candida glabrata, and Saccharomyces cerevisiae-Schizosaccharomyces pombe. Because of the large amount of imbalanced data, the building and testing of the supervised model were only possible by using big data supervised classifiers managing imbalance. Evaluation metrics taking low ortholog ratios into account were applied. From the effectiveness perspective, MapReduce Random Oversampling combined with Spark SVM outperformed RBH, RSD, and OMA, probably because of the consideration of gene pair features beyond alignment similarities combined with the advances in big data supervised classification.
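
    The reported combination of minority-class random oversampling with a linear SVM in Spark can be sketched roughly as follows; the input file, column names and oversampling ratio are assumptions, and this is an illustrative sketch of the general technique rather than the authors' MapReduce pipeline.

      # Rough sketch of minority-class random oversampling followed by a
      # linear SVM in Spark, in the spirit of the imbalance-aware approach
      # above. Column names and the input path are hypothetical.
      from pyspark.sql import SparkSession
      from pyspark.ml.feature import VectorAssembler
      from pyspark.ml.classification import LinearSVC

      spark = SparkSession.builder.appName("ortholog-imbalance-sketch").getOrCreate()

      # Gene-pair features: alignment score, length ratio, conserved-region
      # membership, physicochemical distance; label 1 marks an ortholog pair.
      pairs = spark.read.parquet("gene_pair_features.parquet")

      majority = pairs.filter("label = 0")
      minority = pairs.filter("label = 1")
      ratio = majority.count() / max(minority.count(), 1)

      # Random oversampling: replicate minority pairs (with replacement)
      # until the two classes are roughly balanced.
      balanced = majority.unionByName(
          minority.sample(withReplacement=True, fraction=ratio, seed=42)
      )

      assembler = VectorAssembler(
          inputCols=["alignment_score", "length_ratio",
                     "conserved_region", "physchem_distance"],
          outputCol="features",
      )
      train = assembler.transform(balanced)

      svm = LinearSVC(featuresCol="features", labelCol="label", maxIter=50)
      model = svm.fit(train)
      print("Trained on", train.count(), "gene pairs after oversampling")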

  17. BigData as a Driver for Capacity Building in Astrophysics

    Science.gov (United States)

    Shastri, Prajval

    2015-08-01

    Exciting public interest in astrophysics acquires new significance in the era of Big Data. Since Big Data involves advanced technologies of both software and hardware, astrophysics with Big Data has the potential to inspire young minds with diverse inclinations - i.e., not just those attracted to physics but also those pursuing engineering careers. Digital technologies have become steadily cheaper, which can enable expansion of the Big Data user pool considerably, especially to communities that may not yet be in the astrophysics mainstream, but have high potential because of access to these technologies. For success, however, capacity building at the early stages becomes key. The development of on-line pedagogical resources in astrophysics, astrostatistics, data-mining and data visualisation that are designed around the big facilities of the future can be an important effort that drives such capacity building, especially if facilitated by the IAU.

  18. A Thousand Fly Genomes: An Expanded Drosophila Genome Nexus.

    Science.gov (United States)

    Lack, Justin B; Lange, Jeremy D; Tang, Alison D; Corbett-Detig, Russell B; Pool, John E

    2016-12-01

    The Drosophila Genome Nexus is a population genomic resource that provides D. melanogaster genomes from multiple sources. To facilitate comparisons across data sets, genomes are aligned using a common reference alignment pipeline which involves two rounds of mapping. Regions of residual heterozygosity, identity-by-descent, and recent population admixture are annotated to enable data filtering based on the user's needs. Here, we present a significant expansion of the Drosophila Genome Nexus, which brings the current data object to a total of 1,121 wild-derived genomes. New additions include 305 previously unpublished genomes from inbred lines representing six population samples in Egypt, Ethiopia, France, and South Africa, along with another 193 genomes added from recently-published data sets. We also provide an aligned D. simulans genome to facilitate divergence comparisons. This improved resource will broaden the range of population genomic questions that can be addressed from multi-population allele frequencies and haplotypes in this model species. The larger set of genomes will also enhance the discovery of functionally relevant natural variation that exists within and between populations. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  19. An Interface for Biomedical Big Data Processing on the Tianhe-2 Supercomputer.

    Science.gov (United States)

    Yang, Xi; Wu, Chengkun; Lu, Kai; Fang, Lin; Zhang, Yong; Li, Shengkang; Guo, Guixin; Du, YunFei

    2017-12-01

    Big data, cloud computing, and high-performance computing (HPC) are on the verge of convergence. Cloud computing is already playing an active part in big data processing with the help of big data frameworks like Hadoop and Spark. The recent upsurge of high-performance computing in China provides extra possibilities and capacity to address the challenges associated with big data. In this paper, we propose Orion, a big data interface on the Tianhe-2 supercomputer, to enable big data applications to run on Tianhe-2 via a single command or a shell script. Orion supports multiple users, and each user can launch multiple tasks. It minimizes the effort needed to initiate big data applications on the Tianhe-2 supercomputer via automated configuration. Orion follows the "allocate-when-needed" paradigm, and it avoids the idle occupation of computational resources. We tested the utility and performance of Orion using a big genomic dataset and achieved a satisfactory performance on Tianhe-2 with very few modifications to existing applications that were implemented in Hadoop/Spark. In summary, Orion provides a practical and economical interface for big data processing on Tianhe-2.

  20. Identification of genetic loci shared between schizophrenia and the Big Five personality traits.

    Science.gov (United States)

    Smeland, Olav B; Wang, Yunpeng; Lo, Min-Tzu; Li, Wen; Frei, Oleksandr; Witoelar, Aree; Tesli, Martin; Hinds, David A; Tung, Joyce Y; Djurovic, Srdjan; Chen, Chi-Hua; Dale, Anders M; Andreassen, Ole A

    2017-05-22

    Schizophrenia is associated with differences in personality traits, and recent studies suggest that personality traits and schizophrenia share a genetic basis. Here we aimed to identify specific genetic loci shared between schizophrenia and the Big Five personality traits using a Bayesian statistical framework. Using summary statistics from genome-wide association studies (GWAS) on personality traits in the 23andMe cohort (n = 59,225) and schizophrenia in the Psychiatric Genomics Consortium cohort (n = 82,315), we evaluated overlap in common genetic variants. The Big Five personality traits neuroticism, extraversion, openness, agreeableness and conscientiousness were measured using a web implementation of the Big Five Inventory. Applying the conditional false discovery rate approach, we increased discovery of genetic loci and identified two loci shared between neuroticism and schizophrenia and six loci shared between openness and schizophrenia. The study provides new insights into the relationship between personality traits and schizophrenia by highlighting genetic loci involved in their common genetic etiology.

  1. The Rise of Big Data in Oncology.

    Science.gov (United States)

    Fessele, Kristen L

    2018-05-01

    Objectives: To describe big data and data science in the context of oncology nursing care. Data Sources: Peer-reviewed and lay publications. Conclusion: The rapid expansion of real-world evidence from sources such as the electronic health record, genomic sequencing, administrative claims and other data sources has outstripped the ability of clinicians and researchers to manually review and analyze it. To promote high-quality, high-value cancer care, big data platforms must be constructed from standardized data sources to support extraction of meaningful, comparable insights. Implications for Nursing: Nurses must advocate for the use of standardized vocabularies and common data elements that represent terms and concepts that are meaningful to patient care. Copyright © 2018 Elsevier Inc. All rights reserved.

  2. Big Opportunities and Big Concerns of Big Data in Education

    Science.gov (United States)

    Wang, Yinying

    2016-01-01

    Against the backdrop of the ever-increasing influx of big data, this article examines the opportunities and concerns over big data in education. Specifically, this article first introduces big data, followed by delineating the potential opportunities of using big data in education in two areas: learning analytics and educational policy. Then, the…

  3. ClimateSpark: An in-memory distributed computing framework for big climate data analytics

    Science.gov (United States)

    Hu, Fei; Yang, Chaowei; Schnase, John L.; Duffy, Daniel Q.; Xu, Mengchao; Bowen, Michael K.; Lee, Tsengdar; Song, Weiwei

    2018-06-01

    The unprecedented growth of climate data creates new opportunities for climate studies, and yet big climate data pose a grand challenge to climatologists to efficiently manage and analyze big data. The complexity of climate data content and analytical algorithms increases the difficulty of implementing algorithms on high performance computing systems. This paper proposes an in-memory, distributed computing framework, ClimateSpark, to facilitate complex big data analytics and time-consuming computational tasks. Chunking data structure improves parallel I/O efficiency, while a spatiotemporal index is built for the chunks to avoid unnecessary data reading and preprocessing. An integrated, multi-dimensional, array-based data model (ClimateRDD) and ETL operations are developed to address big climate data variety by integrating the processing components of the climate data lifecycle. ClimateSpark utilizes Spark SQL and Apache Zeppelin to develop a web portal to facilitate the interaction among climatologists, climate data, analytic operations and computing resources (e.g., using SQL query and Scala/Python notebook). Experimental results show that ClimateSpark conducts different spatiotemporal data queries/analytics with high efficiency and data locality. ClimateSpark is easily adaptable to other big multiple-dimensional, array-based datasets in various geoscience domains.
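
    The chunking-plus-spatiotemporal-index idea can be illustrated with a small sketch that prunes chunks whose bounding boxes do not intersect the query window, so only relevant chunks are read; the chunk metadata fields are assumptions for illustration, not the ClimateSpark implementation.

      # Illustrative sketch of chunked, spatiotemporally indexed reads: only
      # the chunks whose bounding boxes intersect the query window are touched.
      # Chunk metadata fields are hypothetical; this is not ClimateSpark itself.
      from dataclasses import dataclass
      from typing import List

      @dataclass
      class ChunkMeta:
          path: str          # location of the serialized array chunk
          lat: tuple         # (min_lat, max_lat)
          lon: tuple         # (min_lon, max_lon)
          time: tuple        # (start_step, end_step)

      def intersects(a, b):
          return a[0] <= b[1] and b[0] <= a[1]

      def select_chunks(index: List[ChunkMeta], lat, lon, time):
          """Spatiotemporal index lookup: prune chunks outside the window."""
          return [c for c in index
                  if intersects(c.lat, lat)
                  and intersects(c.lon, lon)
                  and intersects(c.time, time)]

      index = [
          ChunkMeta("chunk_000.nc", (0, 30), (-120, -90), (0, 365)),
          ChunkMeta("chunk_001.nc", (30, 60), (-120, -90), (0, 365)),
          ChunkMeta("chunk_002.nc", (30, 60), (-90, -60), (366, 730)),
      ]

      # A query over part of North America in the first year touches only
      # two of the three chunks; the third is skipped without any I/O.
      hits = select_chunks(index, lat=(25, 50), lon=(-110, -80), time=(100, 200))
      print([c.path for c in hits])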

  4. The Sequences of 1504 Mutants in the Model Rice Variety Kitaake Facilitate Rapid Functional Genomic Studies.

    Science.gov (United States)

    Li, Guotian; Jain, Rashmi; Chern, Mawsheng; Pham, Nikki T; Martin, Joel A; Wei, Tong; Schackwitz, Wendy S; Lipzen, Anna M; Duong, Phat Q; Jones, Kyle C; Jiang, Liangrong; Ruan, Deling; Bauer, Diane; Peng, Yi; Barry, Kerrie W; Schmutz, Jeremy; Ronald, Pamela C

    2017-06-01

    The availability of a whole-genome sequenced mutant population and the cataloging of mutations of each line at a single-nucleotide resolution facilitate functional genomic analysis. To this end, we generated and sequenced a fast-neutron-induced mutant population in the model rice cultivar Kitaake ( Oryza sativa ssp japonica ), which completes its life cycle in 9 weeks. We sequenced 1504 mutant lines at 45-fold coverage and identified 91,513 mutations affecting 32,307 genes, i.e., 58% of all rice genes. We detected an average of 61 mutations per line. Mutation types include single-base substitutions, deletions, insertions, inversions, translocations, and tandem duplications. We observed a high proportion of loss-of-function mutations. We identified an inversion affecting a single gene as the causative mutation for the short-grain phenotype in one mutant line. This result reveals the usefulness of the resource for efficient, cost-effective identification of genes conferring specific phenotypes. To facilitate public access to this genetic resource, we established an open access database called KitBase that provides access to sequence data and seed stocks. This population complements other available mutant collections and gene-editing technologies. This work demonstrates how inexpensive next-generation sequencing can be applied to generate a high-density catalog of mutations. © 2017 American Society of Plant Biologists. All rights reserved.

  5. Will Big Data Close the Missing Heritability Gap?

    Science.gov (United States)

    Kim, Hwasoon; Grueneberg, Alexander; Vazquez, Ana I; Hsu, Stephen; de Los Campos, Gustavo

    2017-11-01

    Despite the important discoveries reported by genome-wide association (GWA) studies, for most traits and diseases the prediction R-squared (R-sq.) achieved with genetic scores remains considerably lower than the trait heritability. Modern biobanks will soon deliver unprecedentedly large biomedical data sets: Will the advent of big data close the gap between the trait heritability and the proportion of variance that can be explained by a genomic predictor? We addressed this question using Bayesian methods and a data analysis approach that produces a surface response relating prediction R-sq. with sample size and model complexity ( e.g. , number of SNPs). We applied the methodology to data from the interim release of the UK Biobank. Focusing on human height as a model trait and using 80,000 records for model training, we achieved a prediction R-sq. in testing ( n = 22,221) of 0.24 (95% C.I.: 0.23-0.25). Our estimates show that prediction R-sq. increases with sample size, reaching an estimated plateau at values that ranged from 0.1 to 0.37 for models using 500 and 50,000 (GWA-selected) SNPs, respectively. Soon much larger data sets will become available. Using the estimated surface response, we forecast that larger sample sizes will lead to further improvements in prediction R-sq. We conclude that big data will lead to a substantial reduction of the gap between trait heritability and the proportion of interindividual differences that can be explained with a genomic predictor. However, even with the power of big data, for complex traits we anticipate that the gap between prediction R-sq. and trait heritability will not be fully closed. Copyright © 2017 by the Genetics Society of America.
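    The plateauing relationship between prediction R-sq. and sample size can be pictured with a simple saturating curve, R-sq.(n) = R-sq._max * n / (n + n_half). The fit below uses made-up points and plain least squares rather than the Bayesian surface-response analysis of the paper; it only shows how a plateau estimate and a large-n forecast fall out of such a curve.

```python
import numpy as np
from scipy.optimize import curve_fit

def r2_curve(n, r2_max, n_half):
    """Saturating response: prediction R^2 approaches r2_max as n grows."""
    return r2_max * n / (n + n_half)

# Hypothetical training sizes and observed prediction R^2 values (illustrative only).
n_obs  = np.array([10_000, 20_000, 40_000, 60_000, 80_000])
r2_obs = np.array([0.08,   0.13,   0.18,   0.21,   0.24])

(r2_max, n_half), _ = curve_fit(r2_curve, n_obs, r2_obs, p0=[0.4, 50_000])
print(f"estimated plateau R^2 ~ {r2_max:.2f}, half-saturation n ~ {n_half:,.0f}")
print(f"forecast R^2 at n = 500,000: {r2_curve(500_000, r2_max, n_half):.2f}")
```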

  6. Big data: survey, technologies, opportunities, and challenges.

    Science.gov (United States)

    Khan, Nawsher; Yaqoob, Ibrar; Hashem, Ibrahim Abaker Targio; Inayat, Zakira; Ali, Waleed Kamaleldin Mahmoud; Alam, Muhammad; Shiraz, Muhammad; Gani, Abdullah

    2014-01-01

    Big Data has gained much attention from the academia and the IT industry. In the digital and computing world, information is generated and collected at a rate that rapidly exceeds the boundary range. Currently, over 2 billion people worldwide are connected to the Internet, and over 5 billion individuals own mobile phones. By 2020, 50 billion devices are expected to be connected to the Internet. At this point, predicted data production will be 44 times greater than that in 2009. As information is transferred and shared at light speed on optic fiber and wireless networks, the volume of data and the speed of market growth increase. However, the fast growth rate of such large data generates numerous challenges, such as the rapid growth of data, transfer speed, diverse data, and security. Nonetheless, Big Data is still in its infancy stage, and the domain has not been reviewed in general. Hence, this study comprehensively surveys and classifies the various attributes of Big Data, including its nature, definitions, rapid growth rate, volume, management, analysis, and security. This study also proposes a data life cycle that uses the technologies and terminologies of Big Data. Future research directions in this field are determined based on opportunities and several open issues in Big Data domination. These research directions facilitate the exploration of the domain and the development of optimal techniques to address Big Data.

  7. Big Data: Survey, Technologies, Opportunities, and Challenges

    Science.gov (United States)

    Khan, Nawsher; Yaqoob, Ibrar; Hashem, Ibrahim Abaker Targio; Inayat, Zakira; Mahmoud Ali, Waleed Kamaleldin; Alam, Muhammad; Shiraz, Muhammad; Gani, Abdullah

    2014-01-01

    Big Data has gained much attention from the academia and the IT industry. In the digital and computing world, information is generated and collected at a rate that rapidly exceeds the boundary range. Currently, over 2 billion people worldwide are connected to the Internet, and over 5 billion individuals own mobile phones. By 2020, 50 billion devices are expected to be connected to the Internet. At this point, predicted data production will be 44 times greater than that in 2009. As information is transferred and shared at light speed on optic fiber and wireless networks, the volume of data and the speed of market growth increase. However, the fast growth rate of such large data generates numerous challenges, such as the rapid growth of data, transfer speed, diverse data, and security. Nonetheless, Big Data is still in its infancy stage, and the domain has not been reviewed in general. Hence, this study comprehensively surveys and classifies the various attributes of Big Data, including its nature, definitions, rapid growth rate, volume, management, analysis, and security. This study also proposes a data life cycle that uses the technologies and terminologies of Big Data. Future research directions in this field are determined based on opportunities and several open issues in Big Data domination. These research directions facilitate the exploration of the domain and the development of optimal techniques to address Big Data. PMID:25136682

  8. Harnessing Big Data for Systems Pharmacology.

    Science.gov (United States)

    Xie, Lei; Draizen, Eli J; Bourne, Philip E

    2017-01-06

    Systems pharmacology aims to holistically understand mechanisms of drug actions to support drug discovery and clinical practice. Systems pharmacology modeling (SPM) is data driven. It integrates an exponentially growing amount of data at multiple scales (genetic, molecular, cellular, organismal, and environmental). The goal of SPM is to develop mechanistic or predictive multiscale models that are interpretable and actionable. The current explosions in genomics and other omics data, as well as the tremendous advances in big data technologies, have already enabled biologists to generate novel hypotheses and gain new knowledge through computational models of genome-wide, heterogeneous, and dynamic data sets. More work is needed to interpret and predict a drug response phenotype, which is dependent on many known and unknown factors. To gain a comprehensive understanding of drug actions, SPM requires close collaborations between domain experts from diverse fields and integration of heterogeneous models from biophysics, mathematics, statistics, machine learning, and semantic webs. This creates challenges in model management, model integration, model translation, and knowledge integration. In this review, we discuss several emergent issues in SPM and potential solutions using big data technology and analytics. The concurrent development of high-throughput techniques, cloud computing, data science, and the semantic web will likely allow SPM to be findable, accessible, interoperable, reusable, reliable, interpretable, and actionable.

  9. Reducing Racial Disparities in Breast Cancer Care: The Role of 'Big Data'.

    Science.gov (United States)

    Reeder-Hayes, Katherine E; Troester, Melissa A; Meyer, Anne-Marie

    2017-10-15

    Advances in a wide array of scientific technologies have brought data of unprecedented volume and complexity into the oncology research space. These novel big data resources are applied across a variety of contexts-from health services research using data from insurance claims, cancer registries, and electronic health records, to deeper and broader genomic characterizations of disease. Several forms of big data show promise for improving our understanding of racial disparities in breast cancer, and for powering more intelligent and far-reaching interventions to close the racial gap in breast cancer survival. In this article we introduce several major types of big data used in breast cancer disparities research, highlight important findings to date, and discuss how big data may transform breast cancer disparities research in ways that lead to meaningful, lifesaving changes in breast cancer screening and treatment. We also discuss key challenges that may hinder progress in using big data for cancer disparities research and quality improvement.

  10. Big Data in Caenorhabditis elegans: quo vadis?

    Science.gov (United States)

    Hutter, Harald; Moerman, Donald

    2015-11-05

    A clear definition of what constitutes "Big Data" is difficult to identify, but we find it most useful to define Big Data as a data collection that is complete. By this criterion, researchers on Caenorhabditis elegans have a long history of collecting Big Data, since the organism was selected with the idea of obtaining a complete biological description and understanding of development. The complete wiring diagram of the nervous system, the complete cell lineage, and the complete genome sequence provide a framework to phrase and test hypotheses. Given this history, it might be surprising that the number of "complete" data sets for this organism is actually rather small--not because of lack of effort, but because most types of biological experiments are not currently amenable to complete large-scale data collection. Many are also not inherently limited, so that it becomes difficult to even define completeness. At present, we only have partial data on mutated genes and their phenotypes, gene expression, and protein-protein interaction--important data for many biological questions. Big Data can point toward unexpected correlations, and these unexpected correlations can lead to novel investigations; however, Big Data cannot establish causation. As a result, there is much excitement about Big Data, but there is also a discussion on just what Big Data contributes to solving a biological problem. Because of its relative simplicity, C. elegans is an ideal test bed to explore this issue and at the same time determine what is necessary to build a multicellular organism from a single cell. © 2015 Hutter and Moerman. This article is distributed by The American Society for Cell Biology under license from the author(s). Two months after publication it is available to the public under an Attribution–Noncommercial–Share Alike 3.0 Unported Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/3.0).

  11. Heritability estimates of the Big Five personality traits based on common genetic variants.

    Science.gov (United States)

    Power, R A; Pluess, M

    2015-07-14

    According to twin studies, the Big Five personality traits have substantial heritable components explaining 40-60% of the variance, but identification of associated genetic variants has remained elusive. Consequently, knowledge regarding the molecular genetic architecture of personality and to what extent it is shared across the different personality traits is limited. Using genomic-relatedness-matrix residual maximum likelihood analysis (GREML), we here estimated the heritability of the Big Five personality factors (extraversion, agreeableness, conscientiousness, neuroticism and openness to experience) in a sample of 5011 European adults from 527,469 single-nucleotide polymorphisms across the genome. We tested for the heritability of each personality trait, as well as for the genetic overlap between the personality factors. We found significant and substantial heritability estimates for neuroticism (15%, s.e. = 0.08, P = 0.04) and openness (21%, s.e. = 0.08), but not for the remaining Big Five personality traits estimated with the GREML approach. Findings should be considered exploratory and suggest that the heritability detectable from common variants is shared between neuroticism and openness to experience.
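    GREML estimates SNP heritability by fitting variance components against a genomic relationship matrix (GRM). As a hedged, moment-based stand-in for that analysis, the sketch below simulates genotypes and a phenotype, builds the GRM, and recovers heritability with a Haseman-Elston regression; it is not the GREML procedure used in the study.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ind, n_snp, h2_true = 1000, 2000, 0.3

# Simulate standardized genotypes and a trait with SNP heritability h2_true.
geno = rng.binomial(2, 0.5, size=(n_ind, n_snp)).astype(float)
geno = (geno - geno.mean(axis=0)) / geno.std(axis=0)
beta = rng.normal(0.0, np.sqrt(h2_true / n_snp), size=n_snp)
y = geno @ beta + rng.normal(0.0, np.sqrt(1.0 - h2_true), size=n_ind)
y = (y - y.mean()) / y.std()

# Genomic relationship matrix from the standardized genotypes.
grm = geno @ geno.T / n_snp

# Haseman-Elston regression: the slope of phenotype cross-products on
# off-diagonal GRM entries is a moment estimate of SNP heritability.
iu = np.triu_indices(n_ind, k=1)
slope = np.polyfit(grm[iu], np.outer(y, y)[iu], 1)[0]
print(f"simulated h2 = {h2_true:.2f}, Haseman-Elston estimate = {slope:.2f}")
```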

  12. Big Data: Survey, Technologies, Opportunities, and Challenges

    Directory of Open Access Journals (Sweden)

    Nawsher Khan

    2014-01-01

    Full Text Available Big Data has gained much attention from the academia and the IT industry. In the digital and computing world, information is generated and collected at a rate that rapidly exceeds the boundary range. Currently, over 2 billion people worldwide are connected to the Internet, and over 5 billion individuals own mobile phones. By 2020, 50 billion devices are expected to be connected to the Internet. At this point, predicted data production will be 44 times greater than that in 2009. As information is transferred and shared at light speed on optic fiber and wireless networks, the volume of data and the speed of market growth increase. However, the fast growth rate of such large data generates numerous challenges, such as the rapid growth of data, transfer speed, diverse data, and security. Nonetheless, Big Data is still in its infancy stage, and the domain has not been reviewed in general. Hence, this study comprehensively surveys and classifies the various attributes of Big Data, including its nature, definitions, rapid growth rate, volume, management, analysis, and security. This study also proposes a data life cycle that uses the technologies and terminologies of Big Data. Future research directions in this field are determined based on opportunities and several open issues in Big Data domination. These research directions facilitate the exploration of the domain and the development of optimal techniques to address Big Data.

  13. Grappling with the Future Use of Big Data for Translational Medicine and Clinical Care.

    Science.gov (United States)

    Murphy, S; Castro, V; Mandl, K

    2017-08-01

    Objectives: Although patients may have a wealth of imaging, genomic, monitoring, and personal device data, it has yet to be fully integrated into clinical care. Methods: We identify three reasons for the lack of integration. The first is that "Big Data" is poorly managed by most Electronic Medical Record Systems (EMRS). The data is mostly available on "cloud-native" platforms that are outside the scope of most EMRs, and even checking if such data is available on a patient often must be done outside the EMRS. The second reason is that extracting features from the Big Data that are relevant to healthcare often requires complex machine learning algorithms, such as determining if a genomic variant is protein-altering. The third reason is that applications that present Big Data need to be modified constantly to reflect the current state of knowledge, such as instructing when to order a new set of genomic tests. In some cases, applications need to be updated nightly. Results: A new architecture for EMRS is evolving which could unite Big Data, machine learning, and clinical care through a microservice-based architecture which can host applications focused on quite specific aspects of clinical care, such as managing cancer immunotherapy. Conclusion: Informatics innovation, medical research, and clinical care go hand in hand as we look to infuse science-based practice into healthcare. Innovative methods will lead to a new ecosystem of applications (Apps) interacting with healthcare providers to fulfill a promise that is still to be determined. Georg Thieme Verlag KG Stuttgart.

  14. Implementing genomics and pharmacogenomics in the clinic: The National Human Genome Research Institute's genomic medicine portfolio.

    Science.gov (United States)

    Manolio, Teri A

    2016-10-01

    Increasing knowledge about the influence of genetic variation on human health and growing availability of reliable, cost-effective genetic testing have spurred the implementation of genomic medicine in the clinic. As defined by the National Human Genome Research Institute (NHGRI), genomic medicine uses an individual's genetic information in his or her clinical care, and has begun to be applied effectively in areas such as cancer genomics, pharmacogenomics, and rare and undiagnosed diseases. In 2011 NHGRI published its strategic vision for the future of genomic research, including an ambitious research agenda to facilitate and promote the implementation of genomic medicine. To realize this agenda, NHGRI is consulting and facilitating collaborations with the external research community through a series of "Genomic Medicine Meetings," under the guidance and leadership of the National Advisory Council on Human Genome Research. These meetings have identified and begun to address significant obstacles to implementation, such as lack of evidence of efficacy, limited availability of genomics expertise and testing, lack of standards, and difficulties in integrating genomic results into electronic medical records. The six research and dissemination initiatives comprising NHGRI's genomic research portfolio are designed to speed the evaluation and incorporation, where appropriate, of genomic technologies and findings into routine clinical care. Actual adoption of successful approaches in clinical care will depend upon the willingness, interest, and energy of professional societies, practitioners, patients, and payers to promote their responsible use and share their experiences in doing so. Published by Elsevier Ireland Ltd.

  15. Human CST Facilitates Genome-wide RAD51 Recruitment to GC-Rich Repetitive Sequences in Response to Replication Stress.

    Science.gov (United States)

    Chastain, Megan; Zhou, Qing; Shiva, Olga; Fadri-Moskwik, Maria; Whitmore, Leanne; Jia, Pingping; Dai, Xueyu; Huang, Chenhui; Ye, Ping; Chai, Weihang

    2016-08-02

    The telomeric CTC1/STN1/TEN1 (CST) complex has been implicated in promoting replication recovery under replication stress at genomic regions, yet its precise role is unclear. Here, we report that STN1 is enriched at GC-rich repetitive sequences genome-wide in response to hydroxyurea (HU)-induced replication stress. STN1 deficiency exacerbates the fragility of these sequences under replication stress, resulting in chromosome fragmentation. We find that upon fork stalling, CST proteins form distinct nuclear foci that colocalize with RAD51. Furthermore, replication stress induces physical association of CST with RAD51 in an ATR-dependent manner. Strikingly, CST deficiency diminishes HU-induced RAD51 foci formation and reduces RAD51 recruitment to telomeres and non-telomeric GC-rich fragile sequences. Collectively, our findings establish that CST promotes RAD51 recruitment to GC-rich repetitive sequences in response to replication stress to facilitate replication restart, thereby providing insights into the mechanism underlying genome stability maintenance. Copyright © 2016 The Author(s). Published by Elsevier Inc. All rights reserved.

  16. Re-evaluation of the immunological Big Bang.

    Science.gov (United States)

    Flajnik, Martin F

    2014-11-03

    Classically the immunological 'Big Bang' of adaptive immunity was believed to have resulted from the insertion of a transposon into an immunoglobulin superfamily gene member, initiating antigen receptor gene rearrangement via the RAG recombinase in an ancestor of jawed vertebrates. However, the discovery of a second, convergent adaptive immune system in jawless fish, focused on the so-called variable lymphocyte receptors (VLRs), was arguably the most exciting finding of the past decade in immunology and has drastically changed the view of immune origins. The recent report of a new lymphocyte lineage in lampreys, defined by the antigen receptor VLRC, suggests that there were three lymphocyte lineages in the common ancestor of jawless and jawed vertebrates that co-opted different antigen receptor supertypes. The transcriptional control of these lineages during development is predicted to be remarkably similar in both the jawless (agnathan) and jawed (gnathostome) vertebrates, suggesting that an early 'division of labor' among lymphocytes was a driving force in the emergence of adaptive immunity. The recent cartilaginous fish genome project suggests that most effector cytokines and chemokines were also present in these fish, and further studies of the lamprey and hagfish genomes will determine just how explosive the Big Bang actually was. Copyright © 2014 Elsevier Ltd. All rights reserved.

  17. Building a semantic web-based metadata repository for facilitating detailed clinical modeling in cancer genome studies.

    Science.gov (United States)

    Sharma, Deepak K; Solbrig, Harold R; Tao, Cui; Weng, Chunhua; Chute, Christopher G; Jiang, Guoqian

    2017-06-05

    Detailed Clinical Models (DCMs) have been regarded as the basis for retaining computable meaning when data are exchanged between heterogeneous computer systems. To better support clinical cancer data capturing and reporting, there is an emerging need to develop informatics solutions for standards-based clinical models in cancer study domains. The objective of the study is to develop and evaluate a cancer genome study metadata management system that serves as a key infrastructure in supporting clinical information modeling in cancer genome study domains. We leveraged a Semantic Web-based metadata repository enhanced with both ISO11179 metadata standard and Clinical Information Modeling Initiative (CIMI) Reference Model. We used the common data elements (CDEs) defined in The Cancer Genome Atlas (TCGA) data dictionary, and extracted the metadata of the CDEs using the NCI Cancer Data Standards Repository (caDSR) CDE dataset rendered in the Resource Description Framework (RDF). The ITEM/ITEM_GROUP pattern defined in the latest CIMI Reference Model is used to represent reusable model elements (mini-Archetypes). We produced a metadata repository with 38 clinical cancer genome study domains, comprising a rich collection of mini-Archetype pattern instances. We performed a case study of the domain "clinical pharmaceutical" in the TCGA data dictionary and demonstrated enriched data elements in the metadata repository are very useful in support of building detailed clinical models. Our informatics approach leveraging Semantic Web technologies provides an effective way to build a CIMI-compliant metadata repository that would facilitate the detailed clinical modeling to support use cases beyond TCGA in clinical cancer study domains.
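    To make the RDF rendering of common data elements concrete, here is a toy rdflib example with an invented namespace and properties (not the actual caDSR, ISO 11179, or CIMI vocabularies) showing how CDE metadata stored as triples can be pulled back with a SPARQL query.

```python
from rdflib import Graph, Literal, Namespace, RDF

# Hypothetical namespace standing in for an ISO 11179-style metadata model.
MDR = Namespace("http://example.org/mdr#")

g = Graph()
cde = MDR["CDE_0001"]                         # illustrative CDE identifier
g.add((cde, RDF.type, MDR.DataElement))
g.add((cde, MDR.longName, Literal("Patient Karnofsky Performance Status")))
g.add((cde, MDR.valueDomain, MDR.KarnofskyScore))
g.add((cde, MDR.usedInDomain, Literal("clinical pharmaceutical")))

# SPARQL query: find all data elements used in a given study domain.
results = g.query("""
    PREFIX mdr: <http://example.org/mdr#>
    SELECT ?cde ?name WHERE {
        ?cde a mdr:DataElement ;
             mdr:longName ?name ;
             mdr:usedInDomain "clinical pharmaceutical" .
    }
""")
for cde_uri, name in results:
    print(cde_uri, name)
```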

  18. Comparative Genome Viewer

    International Nuclear Information System (INIS)

    Molineris, I.; Sales, G.

    2009-01-01

    The amount of information about genomes, both in the form of complete sequences and annotations, has been exponentially increasing in the last few years. As a result there is the need for tools providing a graphical representation of such information that should be comprehensive and intuitive. Visual representation is especially important in the comparative genomics field since it should provide a combined view of data belonging to different genomes. We believe that existing tools are limited in this respect as they focus on a single genome at a time (conservation histograms) or compress alignment representation to a single dimension. We have therefore developed a web-based tool called Comparative Genome Viewer (Cgv): it integrates a bidimensional representation of alignments between two regions, both at small and big scales, with the richness of annotations present in other genome browsers. We give access to our system through a web-based interface that provides the user with an interactive representation that can be updated in real time using the mouse to move from region to region and to zoom in on interesting details.

  19. Big Data in Plant Science: Resources and Data Mining Tools for Plant Genomics and Proteomics.

    Science.gov (United States)

    Popescu, George V; Noutsos, Christos; Popescu, Sorina C

    2016-01-01

    In modern plant biology, progress is increasingly defined by the scientists' ability to gather and analyze data sets of high volume and complexity, otherwise known as "big data". Arguably, the largest increase in the volume of plant data sets over the last decade is a consequence of the application of the next-generation sequencing and mass-spectrometry technologies to the study of experimental model and crop plants. The increase in quantity and complexity of biological data brings challenges, mostly associated with data acquisition, processing, and sharing within the scientific community. Nonetheless, big data in plant science create unique opportunities in advancing our understanding of complex biological processes at a level of accuracy without precedence, and establish a base for the plant systems biology. In this chapter, we summarize the major drivers of big data in plant science and big data initiatives in life sciences with a focus on the scope and impact of iPlant, a representative cyberinfrastructure platform for plant science.

  20. Implementing genomics and pharmacogenomics in the clinic: The National Human Genome Research Institute’s genomic medicine portfolio

    Science.gov (United States)

    Manolio, Teri A.

    2016-01-01

    Increasing knowledge about the influence of genetic variation on human health and growing availability of reliable, cost-effective genetic testing have spurred the implementation of genomic medicine in the clinic. As defined by the National Human Genome Research Institute (NHGRI), genomic medicine uses an individual’s genetic information in his or her clinical care, and has begun to be applied effectively in areas such as cancer genomics, pharmacogenomics, and rare and undiagnosed diseases. In 2011 NHGRI published its strategic vision for the future of genomic research, including an ambitious research agenda to facilitate and promote the implementation of genomic medicine. To realize this agenda, NHGRI is consulting and facilitating collaborations with the external research community through a series of “Genomic Medicine Meetings,” under the guidance and leadership of the National Advisory Council on Human Genome Research. These meetings have identified and begun to address significant obstacles to implementation, such as lack of evidence of efficacy, limited availability of genomics expertise and testing, lack of standards, and difficulties in integrating genomic results into electronic medical records. The six research and dissemination initiatives comprising NHGRI’s genomic research portfolio are designed to speed the evaluation and incorporation, where appropriate, of genomic technologies and findings into routine clinical care. Actual adoption of successful approaches in clinical care will depend upon the willingness, interest, and energy of professional societies, practitioners, patients, and payers to promote their responsible use and share their experiences in doing so. PMID:27612677

  1. Investigating core genetic-and-epigenetic cell cycle networks for stemness and carcinogenic mechanisms, and cancer drug design using big database mining and genome-wide next-generation sequencing data.

    Science.gov (United States)

    Li, Cheng-Wei; Chen, Bor-Sen

    2016-10-01

    Recent studies have demonstrated that cell cycle plays a central role in development and carcinogenesis. Thus, the use of big databases and genome-wide high-throughput data to unravel the genetic and epigenetic mechanisms underlying cell cycle progression in stem cells and cancer cells is a matter of considerable interest. Real genetic-and-epigenetic cell cycle networks (GECNs) of embryonic stem cells (ESCs) and HeLa cancer cells were constructed by applying system modeling, system identification, and big database mining to genome-wide next-generation sequencing data. Real GECNs were then reduced to core GECNs of HeLa cells and ESCs by applying principal genome-wide network projection. In this study, we investigated potential carcinogenic and stemness mechanisms for systems cancer drug design by identifying common core and specific GECNs between HeLa cells and ESCs. Integrating drug database information with the specific GECNs of HeLa cells could lead to identification of multiple drugs for cervical cancer treatment with minimal side-effects on the genes in the common core. We found that dysregulation of miR-29C, miR-34A, miR-98, and miR-215; and methylation of ANKRD1, ARID5B, CDCA2, PIF1, STAMBPL1, TROAP, ZNF165, and HIST1H2AJ in HeLa cells could result in cell proliferation and anti-apoptosis through NFκB, TGF-β, and PI3K pathways. We also identified 3 drugs, methotrexate, quercetin, and mimosine, which repressed the activated cell cycle genes, ARID5B, STK17B, and CCL2, in HeLa cells with minimal side-effects.

  2. Lecture 10: The European Bioinformatics Institute - "Big data" for biomedical sciences

    CERN Multimedia

    CERN. Geneva; Dana, Jose

    2013-01-01

    Part 1: Big data for biomedical sciences (Tom Hancocks) Ten years ago saw the completion of the first international 'Big Biology' project, which sequenced the human genome. In the years since, the biological sciences have seen a vast growth in data. In the coming years, advances will come from the integration of experimental approaches and their translation into applied technologies in the hospital, the clinic and even at home. This talk will examine the development of infrastructure, physical and virtual, that will allow millions of life scientists across Europe better access to biological data. Tom studied Human Genetics at the University of Leeds and McMaster University, before completing an MSc in Analytical Genomics at the University of Birmingham. He has worked for the UK National Health Service in diagnostic genetics and in training healthcare scientists and clinicians in bioinformatics. Tom joined the EBI in 2012 and is responsible for the scientific development and delivery of training for the BioMedBridges pr...

  3. A Grey Theory Based Approach to Big Data Risk Management Using FMEA

    Directory of Open Access Journals (Sweden)

    Maisa Mendonça Silva

    2016-01-01

    Full Text Available Big data is the term used to denote enormous sets of data that differ from other classic databases in four main ways: (1) huge volume, (2) high velocity, (3) much greater variety, and (4) big value. In general, data are stored in a distributed fashion and on computing nodes as a result of which big data may be more susceptible to attacks by hackers. This paper presents a risk model for big data, which comprises Failure Mode and Effects Analysis (FMEA) and Grey Theory, more precisely grey relational analysis. This approach has several advantages: it provides a structured approach in order to incorporate the impact of big data risk factors; it facilitates the assessment of risk by breaking down the overall risk to big data; and finally its efficient evaluation criteria can help enterprises reduce the risks associated with big data. In order to illustrate the applicability of our proposal in practice, a numerical example, with realistic data based on expert knowledge, was developed. The numerical example analyzes four dimensions, that is, managing identification and access, registering the device and application, managing the infrastructure, and data governance, and 20 failure modes concerning the vulnerabilities of big data. The results show that the most important aspect of risk to big data relates to data governance.
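    For readers unfamiliar with grey relational analysis, the fragment below sketches the core calculation on an invented FMEA-style score matrix (the failure modes, factor weights, and scores are illustrative, not taken from the paper): distances to a worst-case reference series are turned into grey relational coefficients and then into a weighted grade per failure mode.

```python
import numpy as np

# Rows: failure modes; columns: FMEA risk factors (e.g., severity, occurrence,
# detection), scored on a 1-10 scale. All values are illustrative only.
scores = np.array([
    [8, 6, 4],   # weak access management
    [5, 7, 6],   # unregistered devices and applications
    [9, 4, 7],   # infrastructure misconfiguration
    [6, 8, 5],   # poor data governance
], dtype=float)
weights = np.array([0.4, 0.35, 0.25])   # assumed relative importance of the factors

# Reference series: the worst (highest-risk) value of each factor.
reference = scores.max(axis=0)

# Grey relational coefficients with distinguishing coefficient zeta = 0.5.
delta = np.abs(scores - reference)
zeta = 0.5
coeff = (delta.min() + zeta * delta.max()) / (delta + zeta * delta.max())

# Grey relational grade: weighted average per failure mode (higher = closer to worst case).
grade = coeff @ weights
for i, g in enumerate(grade):
    print(f"failure mode {i + 1}: grey relational grade = {g:.3f}")
```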

  4. Reframed Genome-Scale Metabolic Model to Facilitate Genetic Design and Integration with Expression Data.

    Science.gov (United States)

    Gu, Deqing; Jian, Xingxing; Zhang, Cheng; Hua, Qiang

    2017-01-01

    Genome-scale metabolic network models (GEMs) have played important roles in the design of genetically engineered strains and helped biologists to decipher metabolism. However, due to the complex gene-reaction relationships that exist in model systems, most algorithms have limited capabilities with respect to directly predicting accurate genetic design for metabolic engineering. In particular, methods that predict reaction knockout strategies leading to overproduction are often impractical in terms of gene manipulations. Recently, we proposed a method named logical transformation of model (LTM) to simplify the gene-reaction associations by introducing intermediate pseudo reactions, which makes it possible to generate genetic design. Here, we propose an alternative method to relieve researchers from deciphering complex gene-reactions by adding pseudo gene controlling reactions. In comparison to LTM, this new method introduces fewer pseudo reactions and generates a much smaller model system named as gModel. We showed that gModel allows two seldom reported applications: identification of minimal genomes and design of minimal cell factories within a modified OptKnock framework. In addition, gModel could be used to integrate expression data directly and improve the performance of the E-Fmin method for predicting fluxes. In conclusion, the model transformation procedure will facilitate genetic research based on GEMs, extending their applications.

  5. How Big Are "Martin's Big Words"? Thinking Big about the Future.

    Science.gov (United States)

    Gardner, Traci

    "Martin's Big Words: The Life of Dr. Martin Luther King, Jr." tells of King's childhood determination to use "big words" through biographical information and quotations. In this lesson, students in grades 3 to 5 explore information on Dr. King to think about his "big" words, then they write about their own…

  6. Asymmetric author-topic model for knowledge discovering of big data in toxicogenomics.

    Science.gov (United States)

    Chung, Ming-Hua; Wang, Yuping; Tang, Hailin; Zou, Wen; Basinger, John; Xu, Xiaowei; Tong, Weida

    2015-01-01

    The advancement of high-throughput screening technologies facilitates the generation of massive amounts of biological data, a big data phenomenon in biomedical science. Yet, researchers still heavily rely on keyword search and/or literature review to navigate the databases, and analyses are often done at a rather small scale. As a result, the rich information of a database has not been fully utilized, particularly the information embedded in the interactive nature between data points, which is largely ignored and buried. For the past 10 years, probabilistic topic modeling has been recognized as an effective machine learning algorithm to annotate the hidden thematic structure of massive collections of documents. The analogy between text corpus and large-scale genomic data enables the application of text mining tools, like probabilistic topic models, to explore hidden patterns of genomic data and to the extension of altered biological functions. In this paper, we developed a generalized probabilistic topic model to analyze a toxicogenomics dataset that consists of a large number of gene expression data from rat livers treated with drugs at multiple doses and time-points. We discovered the hidden patterns in gene expression associated with the effect of doses and time-points of treatment. Finally, we illustrated the ability of our model to identify the evidence of potential reduction of animal use.
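    The study's asymmetric author-topic model is more elaborate than standard LDA, but the text-corpus analogy can be sketched with an off-the-shelf topic model: treat each treatment condition as a document and each gene as a word. The toy example below uses simulated counts and scikit-learn's LatentDirichletAllocation purely as a stand-in for the generalized model.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(1)

# Toy corpus: each "document" is one treatment condition (drug x dose x time point),
# each "word" is a gene, and counts stand in for discretized expression levels.
n_conditions, n_genes = 30, 200
counts = rng.poisson(lam=2.0, size=(n_conditions, n_genes))

lda = LatentDirichletAllocation(n_components=5, random_state=0)
condition_topics = lda.fit_transform(counts)      # topic mixture per treatment condition
gene_topics = lda.components_                     # gene loadings per hidden "topic"

# Top genes per topic ~ candidate co-regulated gene sets.
for k, topic in enumerate(gene_topics):
    top = np.argsort(topic)[::-1][:5]
    print(f"topic {k}: top genes {top.tolist()}")
```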

  7. Asymmetric author-topic model for knowledge discovering of big data in toxicogenomics

    Directory of Open Access Journals (Sweden)

    Ming-Hua eChung

    2015-04-01

    Full Text Available The advancement of high-throughput screening technologies facilitates the generation of massive amount of biological data, a big data phenomena in biomedical science. Yet, researchers still heavily rely on keyword search and/or literature review to navigate the databases and analyses are often done in rather small-scale. As a result, the rich information of a database has not been fully utilized, particularly for the information embedded in the interactive nature between data points that are largely ignored and buried. For the past ten years, probabilistic topic modeling has been recognized as an effective machine learning algorithm to annotate the hidden thematic structure of massive collection of documents. The analogy between text corpus and large-scale genomic data enables the application of text mining tools, like probabilistic topic models, to explore hidden patterns of genomic data and to the extension of altered biological functions. In this paper, we developed a generalized probabilistic topic model to analyze a toxicogenomics dataset that consists of a large number of gene expression data from the rat livers treated with drugs in multiple dose and time-points. We discovered the hidden patterns in gene expression associated with the effect of doses and time-points of treatment. Finally, we illustrated the ability of our model to identify the evidence of potential reduction of animal use.

  8. The emerging role of Big Data in key development issues: Opportunities, challenges, and concerns

    Directory of Open Access Journals (Sweden)

    Nir Kshetri

    2014-12-01

    Full Text Available This paper presents a review of academic literature, policy documents from government organizations and international agencies, and reports from industries and popular media on the trends in Big Data utilization in key development issues and its worthwhileness, usefulness, and relevance. By looking at Big Data deployment in a number of key economic sectors, it seeks to provide a better understanding of the opportunities and challenges of using it for addressing key issues facing the developing world. It reviews the uses of Big Data in agriculture and farming activities in developing countries to assess the capabilities required at various levels to benefit from Big Data. It also provides insights into how the current digital divide is associated with and facilitated by the pattern of Big Data diffusion and its effective use in key development areas. It also discusses the lessons that developing countries can learn from the utilization of Big Data in big corporations as well as in other activities in industrialized countries.

  9. Using Globus to Transfer and Share Big Data | Poster

    Science.gov (United States)

    By Ashley DeVine, Staff Writer, and Mark Wance, Guest Writer; photo by Richard Frederickson, Staff Photographer Editor's note: This article was updated April 30, 2018. Transferring big data, such as the genomics data delivered to customers from the Center for Cancer Research Sequencing Facility (CCR SF), has been difficult in the past because the transfer systems have not kept

  10. The GEP: Crowd-Sourcing Big Data Analysis with Undergraduates.

    Science.gov (United States)

    Elgin, Sarah C R; Hauser, Charles; Holzen, Teresa M; Jones, Christopher; Kleinschmit, Adam; Leatherman, Judith

    2017-02-01

    The era of 'big data' is also the era of abundant data, creating new opportunities for student-scientist research partnerships. By coordinating undergraduate efforts, the Genomics Education Partnership produces high-quality annotated data sets and analyses that could not be generated otherwise, leading to scientific publications while providing many students with research experience. Copyright © 2016 Elsevier Ltd. All rights reserved.

  11. Big defensins, a diverse family of antimicrobial peptides that follows different patterns of expression in hemocytes of the oyster Crassostrea gigas.

    Science.gov (United States)

    Rosa, Rafael D; Santini, Adrien; Fievet, Julie; Bulet, Philippe; Destoumieux-Garzón, Delphine; Bachère, Evelyne

    2011-01-01

    Big defensin is an antimicrobial peptide composed of a highly hydrophobic N-terminal region and a cationic C-terminal region containing six cysteine residues involved in three internal disulfide bridges. While big defensin sequences have been reported in various mollusk species, few studies have been devoted to their sequence diversity, gene organization and their expression in response to microbial infections. Using the high-throughput Digital Gene Expression approach, we have identified in Crassostrea gigas oysters several sequences coding for big defensins induced in response to a Vibrio infection. We showed that the oyster big defensin family is composed of three members (named Cg-BigDef1, Cg-BigDef2 and Cg-BigDef3) that are encoded by distinct genomic sequences. All Cg-BigDefs contain a hydrophobic N-terminal domain and a cationic C-terminal domain that resembles vertebrate β-defensins. Both domains are encoded by separate exons. We found that big defensins form a group predominantly present in mollusks and closer to vertebrate defensins than to invertebrate and fungi CSαβ-containing defensins. Moreover, we showed that Cg-BigDefs are expressed in oyster hemocytes only and follow different patterns of gene expression. While Cg-BigDef3 is non-regulated, both Cg-BigDef1 and Cg-BigDef2 transcripts are strongly induced in response to bacterial challenge. Induction was dependent on pathogen associated molecular patterns but not damage-dependent. The inducibility of Cg-BigDef1 was confirmed by HPLC and mass spectrometry, since ions with a molecular mass compatible with mature Cg-BigDef1 (10.7 kDa) were present in immune-challenged oysters only. From our biochemical data, native Cg-BigDef1 would result from the elimination of a prepropeptide sequence and the cyclization of the resulting N-terminal glutamine residue into a pyroglutamic acid. We provide here the first report showing that big defensins form a family of antimicrobial peptides diverse not only in terms

  12. The Next Generation Precision Medical Record - A Framework for Integrating Genomes and Wearable Sensors with Medical Records

    OpenAIRE

    Batra, Prag; Singh, Enakshi; Bog, Anja; Wright, Mark; Ashley, Euan; Waggott, Daryl

    2016-01-01

    Current medical records are rigid with regard to emerging big biomedical data. Examples of poorly integrated big data that already exist in clinical practice include whole genome sequencing and wearable sensors for real-time monitoring. Genome sequencing enables conventional diagnostic interrogation and forms the fundamental baseline for precision health throughout a patient's lifetime. Mobile sensors enable tailored monitoring regimes for both reducing risk through precision health intervent...

  13. Big Data, Big Problems: A Healthcare Perspective.

    Science.gov (United States)

    Househ, Mowafa S; Aldosari, Bakheet; Alanazi, Abdullah; Kushniruk, Andre W; Borycki, Elizabeth M

    2017-01-01

    Much has been written on the benefits of big data for healthcare, such as improving patient outcomes, public health surveillance, and healthcare policy decisions. Over the past five years, Big Data, and the data sciences field in general, has been hyped as the "Holy Grail" for the healthcare industry, promising a more efficient healthcare system and improved healthcare outcomes. However, more recently, healthcare researchers are exposing the potentially harmful effects Big Data can have on patient care, associating it with increased medical costs, patient mortality, and misguided decision making by clinicians and healthcare policy makers. In this paper, we review the current Big Data trends with a specific focus on the inadvertent negative impacts that Big Data could have on healthcare, in general, and specifically, as it relates to patient and clinical care. Our study results show that although Big Data is built up to be the "Holy Grail" for healthcare, small data techniques using traditional statistical methods are, in many cases, more accurate and can lead to better healthcare outcomes than Big Data methods. In sum, Big Data for healthcare may cause more problems for the healthcare industry than solutions, and in short, when it comes to the use of data in healthcare, "size isn't everything."

  14. Big-Data Based Decision-Support Systems to Improve Clinicians' Cognition.

    Science.gov (United States)

    Roosan, Don; Samore, Matthew; Jones, Makoto; Livnat, Yarden; Clutter, Justin

    2016-01-01

    Complex clinical decision-making could be facilitated by using population health data to inform clinicians. In two previous studies, we interviewed 16 infectious disease experts to understand complex clinical reasoning. For this study, we focused on answers from the experts on how clinical reasoning can be supported by population-based Big-Data. We found that cognitive strategies such as trajectory tracking, perspective taking, and metacognition have the potential to improve clinicians' cognition to deal with complex problems. These cognitive strategies could be supported by population health data, and all have important implications for the design of Big-Data based decision-support tools that could be embedded in electronic health records. Our findings provide directions for task allocation and for the design of Big Data-based decision-support applications in the healthcare industry.

  15. Big Data, data integrity, and the fracturing of the control zone

    Directory of Open Access Journals (Sweden)

    Carl Lagoze

    2014-11-01

    Full Text Available Despite all the attention to Big Data and the claims that it represents a “paradigm shift” in science, we lack understanding of which qualities of Big Data may contribute to this revolutionary impact. In this paper, we look beyond the quantitative aspects of Big Data (i.e., lots of data) and examine it from a sociotechnical perspective. We argue that a key factor that distinguishes “Big Data” from “lots of data” lies in changes to the traditional, well-established “control zones” that facilitated clear provenance of scientific data, thereby ensuring data integrity and providing the foundation for credible science. The breakdown of these control zones is a consequence of the manner in which our network technology and culture enable and encourage open, anonymous sharing of information, participation regardless of expertise, and collaboration across geographic, disciplinary, and institutional barriers. We are left with a conundrum: how to reap the benefits of Big Data while re-creating a trust fabric and an accountable chain of responsibility that make credible science possible.

  16. The complete mitochondrial genome of the big-belly seahorse, Hippocampus abdominalis (Lesson 1827).

    Science.gov (United States)

    Wang, Lei; Chen, Zaizhong; Leng, Xiangjun; Gao, Jianzhong; Chen, Xiaowu; Li, Zhongpu; Sun, Peiying; Zhao, Yuming

    2016-11-01

    In this study, the complete mitogenome of the big-belly seahorse, Hippocampus abdominalis (Lesson, 1827) (Syngnathiformes: Syngnathidae), was determined by next-generation sequencing. The assembled mitogenome is 16,521 bp in length and includes 13 protein-coding genes, 22 transfer RNA genes, and 2 ribosomal RNA genes. The overall base composition is 31.1% A, 23.6% C, 16.0% G, and 29.3% T, and the sequence shows 87% identity to that of the tiger tail seahorse, Hippocampus comes. The complete mitogenome of the big-belly seahorse provides essential DNA molecular data for further phylogeographic and evolutionary analyses of the seahorse family.

  17. Big Surveys, Big Data Centres

    Science.gov (United States)

    Schade, D.

    2016-06-01

    Well-designed astronomical surveys are powerful and have consistently been keystones of scientific progress. The Byurakan Surveys using a Schmidt telescope with an objective prism produced a list of about 3000 UV-excess Markarian galaxies but these objects have stimulated an enormous amount of further study and appear in over 16,000 publications. The CFHT Legacy Surveys used a wide-field imager to cover thousands of square degrees and those surveys are mentioned in over 1100 publications since 2002. Both ground and space-based astronomy have been increasing their investments in survey work. Survey instrumentation strives toward fair samples and large sky coverage and therefore strives to produce massive datasets. Thus we are faced with the "big data" problem in astronomy. Survey datasets require specialized approaches to data management. Big data places additional challenging requirements for data management. If the term "big data" is defined as data collections that are too large to move then there are profound implications for the infrastructure that supports big data science. The current model of data centres is obsolete. In the era of big data the central problem is how to create architectures that effectively manage the relationship between data collections, networks, processing capabilities, and software, given the science requirements of the projects that need to be executed. A stand alone data silo cannot support big data science. I'll describe the current efforts of the Canadian community to deal with this situation and our successes and failures. I'll talk about how we are planning in the next decade to try to create a workable and adaptable solution to support big data science.

  18. Recht voor big data, big data voor recht

    NARCIS (Netherlands)

    Lafarre, Anne

    Big data is a phenomenon that can no longer be ignored in our society. It is past the hype cycle, and the first implementations of big data techniques are being carried out. But what exactly is big data? What do the five V's, so often mentioned in relation to big data, entail? By way of introduction to

  19. Perspectives on making big data analytics work for oncology.

    Science.gov (United States)

    El Naqa, Issam

    2016-12-01

    Oncology, with its unique combination of clinical, physical, technological, and biological data, provides an ideal case study for applying big data analytics to improve cancer treatment safety and outcomes. An oncology treatment course such as chemoradiotherapy can generate a large pool of information carrying the 5Vs hallmarks of big data. These data comprise a heterogeneous mixture of patient demographics, radiation/chemo dosimetry, multimodality imaging features, and biological markers generated over a treatment period that can span a few days to several weeks. Efforts using commercial and in-house tools are underway to facilitate data aggregation, ontology creation, sharing, visualization and varying analytics in a secure environment. However, open questions related to proper data structure representation and effective analytics tools to support oncology decision-making need to be addressed. It is recognized that oncology data constitutes a mix of structured (tabulated) and unstructured (electronic documents) data that need to be processed to facilitate searching and subsequent knowledge discovery from relational or NoSQL databases. In this context, methods based on advanced analytics and image feature extraction for oncology applications will be discussed. On the other hand, the classical p (variables)≫n (samples) inference problem of statistical learning is challenged in the Big Data realm, and this is particularly true for oncology applications where p-omics is witnessing exponential growth while the number of cancer incidences has generally plateaued over the past 5 years, leading to a quasi-linear growth in samples per patient. Within the Big Data paradigm, this kind of phenomenon may yield undesirable effects such as echo chamber anomalies, Yule-Simpson reversal paradox, or misleading ghost analytics. In this work, we will present these effects as they pertain to oncology and engage small thinking methodologies to counter these effects ranging from
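    The Yule-Simpson reversal mentioned above is easy to reproduce with a few made-up counts: pooled response rates can favor one treatment while every stratum favors the other, which is exactly the trap that naively aggregated oncology data can set.

```python
# Made-up counts illustrating a Yule-Simpson reversal: treatment B looks better pooled,
# yet treatment A has the higher response rate within every stratum.
strata = {
    # stage: ((responders_A, treated_A), (responders_B, treated_B))
    "early stage": ((81, 87), (234, 270)),
    "late stage": ((192, 263), (55, 80)),
}

tot_a, n_a, tot_b, n_b = 0, 0, 0, 0
for stage, ((ra, na), (rb, nb)) in strata.items():
    print(f"{stage:11s}: A = {ra / na:.0%}, B = {rb / nb:.0%}")
    tot_a, n_a = tot_a + ra, n_a + na
    tot_b, n_b = tot_b + rb, n_b + nb

print(f"{'pooled':11s}: A = {tot_a / n_a:.0%}, B = {tot_b / n_b:.0%}")
```

    Stratified reporting, in line with the "small thinking" framing of the abstract, is the usual guard against this effect.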

  20. Cloud-based interactive analytics for terabytes of genomic variants data.

    Science.gov (United States)

    Pan, Cuiping; McInnes, Gregory; Deflaux, Nicole; Snyder, Michael; Bingham, Jonathan; Datta, Somalee; Tsao, Philip S

    2017-12-01

    Large scale genomic sequencing is now widely used to decipher questions in diverse realms such as biological function, human diseases, evolution, ecosystems, and agriculture. With the quantity and diversity these data harbor, a robust and scalable data handling and analysis solution is desired. We present interactive analytics using a cloud-based columnar database built on Dremel to perform information compression, comprehensive quality controls, and biological information retrieval in large volumes of genomic data. We demonstrate such Big Data computing paradigms can provide orders of magnitude faster turnaround for common genomic analyses, transforming long-running batch jobs submitted via a Linux shell into questions that can be asked from a web browser in seconds. Using this method, we assessed a study population of 475 deeply sequenced human genomes for genomic call rate, genotype and allele frequency distribution, variant density across the genome, and pharmacogenomic information. Our analysis framework is implemented in Google Cloud Platform and BigQuery. Codes are available at https://github.com/StanfordBioinformatics/mvp_aaa_codelabs. cuiping@stanford.edu or ptsao@stanford.edu. Supplementary data are available at Bioinformatics online. Published by Oxford University Press 2017. This work is written by US Government employees and are in the public domain in the US.
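    The interactive, SQL-over-variants pattern described here can be sketched with the google-cloud-bigquery client. The project, dataset, table, and column names below are hypothetical (the study's own tables are not public in this form); the point is the shape of the query, an allele-frequency aggregation issued from a few lines of Python.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")   # project id is illustrative

# Hypothetical variants table with per-variant allele counts.
query = """
    SELECT
      reference_name,
      start_position,
      reference_bases,
      alternate_bases,
      SUM(allele_count) / SUM(total_allele_count) AS allele_frequency
    FROM `my-gcp-project.genomics.variant_calls`
    WHERE reference_name = 'chr17'
    GROUP BY 1, 2, 3, 4
    HAVING allele_frequency > 0.01
    ORDER BY start_position
    LIMIT 20
"""
for row in client.query(query).result():
    print(row.reference_name, row.start_position, round(row.allele_frequency, 4))
```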

  1. Big-Data Based Decision-Support Systems to Improve Clinicians’ Cognition

    Science.gov (United States)

    Roosan, Don; Samore, Matthew; Jones, Makoto; Livnat, Yarden; Clutter, Justin

    2016-01-01

    Complex clinical decision-making could be facilitated by using population health data to inform clinicians. In two previous studies, we interviewed 16 infectious disease experts to understand complex clinical reasoning. For this study, we focused on answers from the experts on how clinical reasoning can be supported by population-based Big-Data. We found that cognitive strategies such as trajectory tracking, perspective taking, and metacognition have the potential to improve clinicians’ cognition to deal with complex problems. These cognitive strategies could be supported by population health data, and all have important implications for the design of Big-Data based decision-support tools that could be embedded in electronic health records. Our findings provide directions for task allocation and for the design of Big Data-based decision-support applications in the healthcare industry. PMID:27990498

  2. Big Defensins, a Diverse Family of Antimicrobial Peptides That Follows Different Patterns of Expression in Hemocytes of the Oyster Crassostrea gigas

    Science.gov (United States)

    Rosa, Rafael D.; Santini, Adrien; Fievet, Julie; Bulet, Philippe; Destoumieux-Garzón, Delphine; Bachère, Evelyne

    2011-01-01

    Background Big defensin is an antimicrobial peptide composed of a highly hydrophobic N-terminal region and a cationic C-terminal region containing six cysteine residues involved in three internal disulfide bridges. While big defensin sequences have been reported in various mollusk species, few studies have been devoted to their sequence diversity, gene organization and their expression in response to microbial infections. Findings Using the high-throughput Digital Gene Expression approach, we have identified in Crassostrea gigas oysters several sequences coding for big defensins induced in response to a Vibrio infection. We showed that the oyster big defensin family is composed of three members (named Cg-BigDef1, Cg-BigDef2 and Cg-BigDef3) that are encoded by distinct genomic sequences. All Cg-BigDefs contain a hydrophobic N-terminal domain and a cationic C-terminal domain that resembles vertebrate β-defensins. Both domains are encoded by separate exons. We found that big defensins form a group predominantly present in mollusks and closer to vertebrate defensins than to invertebrate and fungi CSαβ-containing defensins. Moreover, we showed that Cg-BigDefs are expressed in oyster hemocytes only and follow different patterns of gene expression. While Cg-BigDef3 is non-regulated, both Cg-BigDef1 and Cg-BigDef2 transcripts are strongly induced in response to bacterial challenge. Induction was dependent on pathogen associated molecular patterns but not damage-dependent. The inducibility of Cg-BigDef1 was confirmed by HPLC and mass spectrometry, since ions with a molecular mass compatible with mature Cg-BigDef1 (10.7 kDa) were present in immune-challenged oysters only. From our biochemical data, native Cg-BigDef1 would result from the elimination of a prepropeptide sequence and the cyclization of the resulting N-terminal glutamine residue into a pyroglutamic acid. Conclusions We provide here the first report showing that big defensins form a family of antimicrobial

  3. Big defensins, a diverse family of antimicrobial peptides that follows different patterns of expression in hemocytes of the oyster Crassostrea gigas.

    Directory of Open Access Journals (Sweden)

    Rafael D Rosa

    Full Text Available BACKGROUND: Big defensin is an antimicrobial peptide composed of a highly hydrophobic N-terminal region and a cationic C-terminal region containing six cysteine residues involved in three internal disulfide bridges. While big defensin sequences have been reported in various mollusk species, few studies have been devoted to their sequence diversity, gene organization and their expression in response to microbial infections. FINDINGS: Using the high-throughput Digital Gene Expression approach, we have identified in Crassostrea gigas oysters several sequences coding for big defensins induced in response to a Vibrio infection. We showed that the oyster big defensin family is composed of three members (named Cg-BigDef1, Cg-BigDef2 and Cg-BigDef3) that are encoded by distinct genomic sequences. All Cg-BigDefs contain a hydrophobic N-terminal domain and a cationic C-terminal domain that resembles vertebrate β-defensins. Both domains are encoded by separate exons. We found that big defensins form a group predominantly present in mollusks and closer to vertebrate defensins than to invertebrate and fungi CSαβ-containing defensins. Moreover, we showed that Cg-BigDefs are expressed in oyster hemocytes only and follow different patterns of gene expression. While Cg-BigDef3 is non-regulated, both Cg-BigDef1 and Cg-BigDef2 transcripts are strongly induced in response to bacterial challenge. Induction was dependent on pathogen associated molecular patterns but not damage-dependent. The inducibility of Cg-BigDef1 was confirmed by HPLC and mass spectrometry, since ions with a molecular mass compatible with mature Cg-BigDef1 (10.7 kDa) were present in immune-challenged oysters only. From our biochemical data, native Cg-BigDef1 would result from the elimination of a prepropeptide sequence and the cyclization of the resulting N-terminal glutamine residue into a pyroglutamic acid. CONCLUSIONS: We provide here the first report showing that big defensins form a family

  4. Mapping our genes: The genome projects: How big, how fast

    Energy Technology Data Exchange (ETDEWEB)

    none,

    1988-04-01

    For the past 2 years, scientific and technical journals in biology and medicine have extensively covered a debate about whether and how to determine the function and order of human genes on human chromosomes and when to determine the sequence of molecular building blocks that comprise DNA in those chromosomes. In 1987, these issues rose to become part of the public agenda. The debate involves science, technology, and politics. Congress is responsible for "writing the rules" of what various federal agencies do and for funding their work. This report surveys the points made so far in the debate, focusing on those that most directly influence the policy options facing the US Congress. Congressional interest focused on how to assess the rationales for conducting human genome projects, how to fund human genome projects (at what level and through which mechanisms), how to coordinate the scientific and technical programs of the several federal agencies and private interests already supporting various genome projects, and how to strike a balance regarding the impact of genome projects on international scientific cooperation and international economic competition in biotechnology. OTA prepared this report with the assistance of several hundred experts throughout the world. 342 refs., 26 figs., 11 tabs.

  5. Mapping Our Genes: The Genome Projects: How Big, How Fast

    Science.gov (United States)

    1988-04-01

    For the past 2 years, scientific and technical journals in biology and medicine have extensively covered a debate about whether and how to determine the function and order of human genes on human chromosomes and when to determine the sequence of molecular building blocks that comprise DNA in those chromosomes. In 1987, these issues rose to become part of the public agenda. The debate involves science, technology, and politics. Congress is responsible for "writing the rules" of what various federal agencies do and for funding their work. This report surveys the points made so far in the debate, focusing on those that most directly influence the policy options facing the US Congress. Congressional interest focused on how to assess the rationales for conducting human genome projects, how to fund human genome projects (at what level and through which mechanisms), how to coordinate the scientific and technical programs of the several federal agencies and private interests already supporting various genome projects, and how to strike a balance regarding the impact of genome projects on international scientific cooperation and international economic competition in biotechnology. The Office of Technology Assessment (OTA) prepared this report with the assistance of several hundred experts throughout the world.

  6. MLBCD: a machine learning tool for big clinical data.

    Science.gov (United States)

    Luo, Gang

    2015-01-01

    Predictive modeling is fundamental for extracting value from large clinical data sets, or "big clinical data," advancing clinical research, and improving healthcare. Machine learning is a powerful approach to predictive modeling. Two factors make machine learning challenging for healthcare researchers. First, before training a machine learning model, the values of one or more model parameters called hyper-parameters must typically be specified. Due to their inexperience with machine learning, it is hard for healthcare researchers to choose an appropriate algorithm and hyper-parameter values. Second, many clinical data are stored in a special format. These data must be iteratively transformed into the relational table format before conducting predictive modeling. This transformation is time-consuming and requires computing expertise. This paper presents our vision for and design of MLBCD (Machine Learning for Big Clinical Data), a new software system aiming to address these challenges and facilitate building machine learning predictive models using big clinical data. The paper describes MLBCD's design in detail. By making machine learning accessible to healthcare researchers, MLBCD will open the use of big clinical data and increase the ability to foster biomedical discovery and improve care.
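
    MLBCD itself is not shown in this record; as a rough, hedged illustration of the kind of automated algorithm and hyper-parameter selection the abstract says MLBCD aims to hide from the user, a scikit-learn grid search on synthetic data might look like this:

        # Illustrative only: automated hyper-parameter selection of the kind the
        # abstract attributes to MLBCD; the data here are synthetic, not clinical.
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import GridSearchCV, train_test_split

        X, y = make_classification(n_samples=500, n_features=20, random_state=0)
        X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

        # Candidate hyper-parameter values are searched automatically by cross-validation.
        param_grid = {"n_estimators": [100, 300], "max_depth": [None, 5, 10]}
        search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
        search.fit(X_train, y_train)

        print("best hyper-parameters:", search.best_params_)
        print("held-out accuracy:", search.score(X_test, y_test))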

  7. SNUGB: a versatile genome browser supporting comparative and functional fungal genomics

    Directory of Open Access Journals (Sweden)

    Kim Seungill

    2008-12-01

    Full Text Available Abstract Background Since the full genome sequences of Saccharomyces cerevisiae were released in 1996, genome sequences of over 90 fungal species have become publicly available. The heterogeneous formats of genome sequences archived in different sequencing centers hampered the integration of the data for efficient and comprehensive comparative analyses. The Comparative Fungal Genomics Platform (CFGP) was developed to archive these data via a single standardized format that can support multifaceted and integrated analyses of the data. To facilitate efficient data visualization and utilization within and across species based on the architecture of CFGP and associated databases, a new genome browser was needed. Results The Seoul National University Genome Browser (SNUGB) integrates various types of genomic information derived from 98 fungal/oomycete (137 datasets) and 34 plant and animal (38 datasets) species, graphically presents germane features and properties of each genome, and supports comparison between genomes. The SNUGB provides three different forms of the data presentation interface, including diagram, table, and text, and six different display options to support visualization and utilization of the stored information. Information for individual species can be quickly accessed via a new tool named the taxonomy browser. In addition, SNUGB offers four useful data annotation/analysis functions, including 'BLAST annotation.' The modular design of SNUGB makes its adoption to support other comparative genomic platforms easy and facilitates continuous expansion. Conclusion The SNUGB serves as a powerful platform supporting comparative and functional genomics within the fungal kingdom and also across other kingdoms. All data and functions are available at the web site http://genomebrowser.snu.ac.kr/.

  8. Vertical landscraping, a big regionalism for Dubai.

    Science.gov (United States)

    Wilson, Matthew

    2010-01-01

    Dubai's ecologic and economic complications are exacerbated by six years of accelerated expansion, a fixed top-down approach to urbanism and the construction of iconic single-phase mega-projects. With recent construction delays, project cancellations and growing landscape issues, Dubai's tower typologies have been unresponsive to changing environmental, socio-cultural and economic patterns (BBC, 2009; Gillet, 2009; Lewis, 2009). In this essay, a theory of "Big Regionalism" guides an argument for an economically and ecologically linked tower typology called the Condenser. This phased "box-to-tower" typology is part of a greater Landscape Urbanist strategy called Vertical Landscraping. Within this strategy, the Condenser's role is to densify the city, facilitating the creation of ecologic voids that order the urban region. Delineating "Big Regional" principles, the Condenser provides a time-based, global-local urban growth approach that weaves Bigness into a series of urban-regional, economic and ecological relationships, builds upon the environmental performance of the city's regional architecture and planning, promotes a continuity of Dubai's urban history, and responds to its landscape issues while condensing development. These speculations permit consideration of the overlooked opportunities embedded within Dubai's mega-projects and their long-term impact on the urban morphology.

  9. The Problem with Big Data: Operating on Smaller Datasets to Bridge the Implementation Gap.

    Science.gov (United States)

    Mann, Richard P; Mushtaq, Faisal; White, Alan D; Mata-Cervantes, Gabriel; Pike, Tom; Coker, Dalton; Murdoch, Stuart; Hiles, Tim; Smith, Clare; Berridge, David; Hinchliffe, Suzanne; Hall, Geoff; Smye, Stephen; Wilkie, Richard M; Lodge, J Peter A; Mon-Williams, Mark

    2016-01-01

    Big datasets have the potential to revolutionize public health. However, there is a mismatch between the political and scientific optimism surrounding big data and the public's perception of its benefit. We suggest a systematic and concerted emphasis on developing models derived from smaller datasets to illustrate to the public how big data can produce tangible benefits in the long term. In order to highlight the immediate value of a small data approach, we produced a proof-of-concept model predicting hospital length of stay. The results demonstrate that existing small datasets can be used to create models that generate a reasonable prediction, facilitating health-care delivery. We propose that greater attention (and funding) needs to be directed toward the utilization of existing information resources in parallel with current efforts to create and exploit "big data."
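
    The abstract does not give the model details; as a minimal sketch of the "small data" argument, a proof-of-concept length-of-stay regression on a small synthetic cohort (all feature names and coefficients invented for illustration) could be as simple as:

        # Sketch of a proof-of-concept length-of-stay predictor trained on a small,
        # synthetic dataset; features and effect sizes are invented for illustration.
        import numpy as np
        from sklearn.linear_model import LinearRegression
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        n = 200  # deliberately small cohort
        age = rng.integers(18, 90, n)
        n_comorbidities = rng.integers(0, 6, n)
        emergency_admission = rng.integers(0, 2, n)
        length_of_stay = (1.5 + 0.05 * age + 1.2 * n_comorbidities
                          + 2.0 * emergency_admission + rng.normal(0, 1.5, n))

        X = np.column_stack([age, n_comorbidities, emergency_admission])
        scores = cross_val_score(LinearRegression(), X, length_of_stay, cv=5, scoring="r2")
        print("cross-validated R^2:", scores.mean())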

  10. Big Bang Tumor Growth and Clonal Evolution.

    Science.gov (United States)

    Sun, Ruping; Hu, Zheng; Curtis, Christina

    2018-05-01

    The advent and application of next-generation sequencing (NGS) technologies to tumor genomes has reinvigorated efforts to understand clonal evolution. Although tumor progression has traditionally been viewed as a gradual stepwise process, recent studies suggest that evolutionary rates in tumors can be variable with periods of punctuated mutational bursts and relative stasis. For example, Big Bang dynamics have been reported, wherein after transformation, growth occurs in the absence of stringent selection, consistent with effectively neutral evolution. Although first noted in colorectal tumors, effective neutrality may be relatively common. Additionally, punctuated evolution resulting from mutational bursts and cataclysmic genomic alterations have been described. In this review, we contrast these findings with the conventional gradualist view of clonal evolution and describe potential clinical and therapeutic implications of different evolutionary modes and tempos. Copyright © 2018 Cold Spring Harbor Laboratory Press; all rights reserved.

  11. BigOP: Generating Comprehensive Big Data Workloads as a Benchmarking Framework

    OpenAIRE

    Zhu, Yuqing; Zhan, Jianfeng; Weng, Chuliang; Nambiar, Raghunath; Zhang, Jinchao; Chen, Xingzhen; Wang, Lei

    2014-01-01

    Big Data is considered proprietary asset of companies, organizations, and even nations. Turning big data into real treasure requires the support of big data systems. A variety of commercial and open source products have been unleashed for big data storage and processing. While big data users are facing the choice of which system best suits their needs, big data system developers are facing the question of how to evaluate their systems with regard to general big data processing needs. System b...

  12. Big data in oncologic imaging.

    Science.gov (United States)

    Regge, Daniele; Mazzetti, Simone; Giannini, Valentina; Bracco, Christian; Stasi, Michele

    2017-06-01

    Cancer is a complex disease and unfortunately understanding how the components of the cancer system work does not help understand the behavior of the system as a whole. In the words of the Greek philosopher Aristotle "the whole is greater than the sum of parts." To date, thanks to improved information technology infrastructures, it is possible to store data from each single cancer patient, including clinical data, medical images, laboratory tests, and pathological and genomic information. Indeed, medical archive storage constitutes approximately one-third of total global storage demand and a large part of the data are in the form of medical images. The opportunity is now to draw insight on the whole to the benefit of each individual patient. In the oncologic patient, big data analysis is at the beginning but several useful applications can be envisaged including development of imaging biomarkers to predict disease outcome, assessing the risk of X-ray dose exposure or of renal damage following the administration of contrast agents, and tracking and optimizing patient workflow. The aim of this review is to present current evidence of how big data derived from medical images may impact on the diagnostic pathway of the oncologic patient.

  13. A public resource facilitating clinical use of genomes.

    NARCIS (Netherlands)

    Ball, M.P.; Thakuria, J.V.; Zaranek, A.W.; Clegg, T.; Rosenbaum, A.M.; Wu, X.; Angrist, M.; Bhak, J.; Bobe, J.; Callow, M.J.; Cano, C.; Chou, M.F.; Chung, W.K.; Douglas, S.M.; Estep, P.W.; Gore, A.; Hulick, P.; Labarga, A.; Lee, J.-H.; Lunshof, J.E.; Kim, B.C.; Kim, J.L.; Li, Z.; Murray, M.F.; Nilsen, G.B.; Peters, B.A.; Raman, A.M.; Rienhoff, H.Y.; Robasky, K.; Wheeler, M.T.; Vandewege, W.; Vorhaus, D.B.; Yang, Y.L.; Yang, L.; Aach, J.; Ashley, E.A.; Drmanac, R.; Kim, S.-J.; Li, J.B.; Peshkin, L.; Seidman, S.E.; Seo, J.-S.; Zhang, K.; Rehm, H.L.; Church, G.M.

    2012-01-01

    Rapid advances in DNA sequencing promise to enable new diagnostics and individualized therapies. Achieving personalized medicine, however, will require extensive research on highly reidentifiable, integrated datasets of genomic and health information. To assist with this, participants in the

  14. Transcription facilitated genome-wide recruitment of topoisomerase I and DNA gyrase.

    Science.gov (United States)

    Ahmed, Wareed; Sala, Claudia; Hegde, Shubhada R; Jha, Rajiv Kumar; Cole, Stewart T; Nagaraja, Valakunja

    2017-05-01

    Movement of the transcription machinery along a template alters DNA topology resulting in the accumulation of supercoils in DNA. The positive supercoils generated ahead of transcribing RNA polymerase (RNAP) and the negative supercoils accumulating behind impose severe topological constraints impeding transcription process. Previous studies have implied the role of topoisomerases in the removal of torsional stress and the maintenance of template topology but the in vivo interaction of functionally distinct topoisomerases with heterogeneous chromosomal territories is not deciphered. Moreover, how the transcription-induced supercoils influence the genome-wide recruitment of DNA topoisomerases remains to be explored in bacteria. Using ChIP-Seq, we show the genome-wide occupancy profile of both topoisomerase I and DNA gyrase in conjunction with RNAP in Mycobacterium tuberculosis taking advantage of minimal topoisomerase representation in the organism. The study unveils the first in vivo genome-wide interaction of both the topoisomerases with the genomic regions and establishes that transcription-induced supercoils govern their recruitment at genomic sites. Distribution profiles revealed co-localization of RNAP and the two topoisomerases on the active transcriptional units (TUs). At a given locus, topoisomerase I and DNA gyrase were localized behind and ahead of RNAP, respectively, correlating with the twin-supercoiled domains generated. The recruitment of topoisomerases was higher at the genomic loci with higher transcriptional activity and/or at regions under high torsional stress compared to silent genomic loci. Importantly, the occupancy of DNA gyrase, sole type II topoisomerase in Mtb, near the Ter domain of the Mtb chromosome validates its function as a decatenase.

  15. How Big Is Too Big?

    Science.gov (United States)

    Cibes, Margaret; Greenwood, James

    2016-01-01

    Media Clips appears in every issue of Mathematics Teacher, offering readers contemporary, authentic applications of quantitative reasoning based on print or electronic media. This issue features "How Big is Too Big?" (Margaret Cibes and James Greenwood) in which students are asked to analyze the data and tables provided and answer a…

  16. Figure 4 from Integrative Genomics Viewer: Visualizing Big Data | Office of Cancer Genomics

    Science.gov (United States)

    Gene-list view of genomic data. The gene-list view allows users to compare data across a set of loci. The data in this figure includes copy number, mutation, and clinical data from 202 glioblastoma samples from TCGA. Adapted from Figure 7; Thorvaldsdottir H et al. 2012

  17. Using predictive analytics and big data to optimize pharmaceutical outcomes.

    Science.gov (United States)

    Hernandez, Inmaculada; Zhang, Yuting

    2017-09-15

    The steps involved, the resources needed, and the challenges associated with applying predictive analytics in healthcare are described, with a review of successful applications of predictive analytics in implementing population health management interventions that target medication-related patient outcomes. In healthcare, the term big data typically refers to large quantities of electronic health record, administrative claims, and clinical trial data as well as data collected from smartphone applications, wearable devices, social media, and personal genomics services; predictive analytics refers to innovative methods of analysis developed to overcome challenges associated with big data, including a variety of statistical techniques ranging from predictive modeling to machine learning to data mining. Predictive analytics using big data have been applied successfully in several areas of medication management, such as in the identification of complex patients or those at highest risk for medication noncompliance or adverse effects. Because predictive analytics can be used in predicting different outcomes, they can provide pharmacists with a better understanding of the risks for specific medication-related problems that each patient faces. This information will enable pharmacists to deliver interventions tailored to patients' needs. In order to take full advantage of these benefits, however, clinicians will have to understand the basics of big data and predictive analytics. Predictive analytics that leverage big data will become an indispensable tool for clinicians in mapping interventions and improving patient outcomes. Copyright © 2017 by the American Society of Health-System Pharmacists, Inc. All rights reserved.

  18. The NCI Genomic Data Commons as an engine for precision medicine.

    Science.gov (United States)

    Jensen, Mark A; Ferretti, Vincent; Grossman, Robert L; Staudt, Louis M

    2017-07-27

    The National Cancer Institute Genomic Data Commons (GDC) is an information system for storing, analyzing, and sharing genomic and clinical data from patients with cancer. The recent high-throughput sequencing of cancer genomes and transcriptomes has produced a big data problem that precludes many cancer biologists and oncologists from gleaning knowledge from these data regarding the nature of malignant processes and the relationship between tumor genomic profiles and treatment response. The GDC aims to democratize access to cancer genomic data and to foster the sharing of these data to promote precision medicine approaches to the diagnosis and treatment of cancer.

  19. Figure 5 from Integrative Genomics Viewer: Visualizing Big Data | Office of Cancer Genomics

    Science.gov (United States)

    Split-Screen View. The split-screen view is useful for exploring relationships of genomic features that are independent of chromosomal location. Color is used here to indicate mate pairs that map to different chromosomes, chromosomes 1 and 6, suggesting a translocation event. Adapted from Figure 8; Thorvaldsdottir H et al. 2012

  20. Figure 2 from Integrative Genomics Viewer: Visualizing Big Data | Office of Cancer Genomics

    Science.gov (United States)

    Grouping and sorting genomic data in IGV. The IGV user interface displaying 202 glioblastoma samples from TCGA. Samples are grouped by tumor subtype (second annotation column) and data type (first annotation column) and sorted by copy number of the EGFR locus (middle column). Adapted from Figure 1; Robinson et al. 2011

  1. An optimal big data workflow for biomedical image analysis

    Directory of Open Access Journals (Sweden)

    Aurelle Tchagna Kouanou

    Full Text Available Background and objective: In the medical field, data volume is increasingly growing, and traditional methods cannot manage it efficiently. In biomedical computation, the continuous challenges are: management, analysis, and storage of the biomedical data. Nowadays, big data technology plays a significant role in the management, organization, and analysis of data, using machine learning and artificial intelligence techniques. It also allows a quick access to data using the NoSQL database. Thus, big data technologies include new frameworks to process medical data in a manner similar to biomedical images. It becomes very important to develop methods and/or architectures based on big data technologies, for a complete processing of biomedical image data. Method: This paper describes big data analytics for biomedical images, shows examples reported in the literature, briefly discusses new methods used in processing, and offers conclusions. We argue for adapting and extending related work methods in the field of big data software, using Hadoop and Spark frameworks. These provide an optimal and efficient architecture for biomedical image analysis. This paper thus gives a broad overview of big data analytics to automate biomedical image diagnosis. A workflow with optimal methods and algorithm for each step is proposed. Results: Two architectures for image classification are suggested. We use the Hadoop framework to design the first, and the Spark framework for the second. The proposed Spark architecture allows us to develop appropriate and efficient methods to leverage a large number of images for classification, which can be customized with respect to each other. Conclusions: The proposed architectures are more complete, easier, and are adaptable in all of the steps from conception. The obtained Spark architecture is the most complete, because it facilitates the implementation of algorithms with its embedded libraries. Keywords: Biomedical images, Big
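
    The proposed Hadoop and Spark architectures are only summarized above; as a rough, hedged sketch of the ingestion step such a Spark workflow might begin with (HDFS paths and the per-image "feature" are placeholders, not the paper's actual pipeline), raw images can be read in parallel and reduced to a columnar feature table:

        # Sketch of the first stage of a Spark-based biomedical image workflow:
        # read raw image bytes in parallel and persist simple per-image features.
        # HDFS paths and the feature computed here are placeholders.
        from pyspark.sql import SparkSession

        spark = SparkSession.builder.appName("biomedical-image-ingest").getOrCreate()

        images = spark.sparkContext.binaryFiles("hdfs:///data/biomedical_images/*.png")

        def extract_features(record):
            path, raw_bytes = record
            return (path, len(raw_bytes))  # placeholder feature: file size in bytes

        features_df = images.map(extract_features).toDF(["path", "size_bytes"])
        features_df.write.mode("overwrite").parquet("hdfs:///data/image_features")

        spark.stop()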

  2. Nursing Needs Big Data and Big Data Needs Nursing.

    Science.gov (United States)

    Brennan, Patricia Flatley; Bakken, Suzanne

    2015-09-01

    Contemporary big data initiatives in health care will benefit from greater integration with nursing science and nursing practice; in turn, nursing science and nursing practice have much to gain from the data science initiatives. Big data arises secondary to scholarly inquiry (e.g., -omics) and everyday observations like cardiac flow sensors or Twitter feeds. Data science methods that are emerging ensure that these data can be leveraged to improve patient care. Big data encompasses data that exceed human comprehension, that exist at a volume unmanageable by standard computer systems, that arrive at a velocity not under the control of the investigator and possess a level of imprecision not found in traditional inquiry. Data science methods are emerging to manage and gain insights from big data. The primary methods included investigation of emerging federal big data initiatives, and exploration of exemplars from nursing informatics research to benchmark where nursing is already poised to participate in the big data revolution. We provide observations and reflections on experiences in the emerging big data initiatives. Existing approaches to large data set analysis provide a necessary but not sufficient foundation for nursing to participate in the big data revolution. Nursing's Social Policy Statement guides a principled, ethical perspective on big data and data science. There are implications for basic and advanced practice clinical nurses in practice, for the nurse scientist who collaborates with data scientists, and for the nurse data scientist. Big data and data science have the potential to provide greater richness in understanding patient phenomena and in tailoring interventional strategies that are personalized to the patient. © 2015 Sigma Theta Tau International.

  3. BIG Data - BIG Gains? Understanding the Link Between Big Data Analytics and Innovation

    OpenAIRE

    Niebel, Thomas; Rasel, Fabienne; Viete, Steffen

    2017-01-01

    This paper analyzes the relationship between firms’ use of big data analytics and their innovative performance for product innovations. Since big data technologies provide new data information practices, they create new decision-making possibilities, which firms can use to realize innovations. Applying German firm-level data we find suggestive evidence that big data analytics matters for the likelihood of becoming a product innovator as well as the market success of the firms’ product innovat...

  4. Networking for big data

    CERN Document Server

    Yu, Shui; Misic, Jelena; Shen, Xuemin (Sherman)

    2015-01-01

    Networking for Big Data supplies an unprecedented look at cutting-edge research on the networking and communication aspects of Big Data. Starting with a comprehensive introduction to Big Data and its networking issues, it offers deep technical coverage of both theory and applications.The book is divided into four sections: introduction to Big Data, networking theory and design for Big Data, networking security for Big Data, and platforms and systems for Big Data applications. Focusing on key networking issues in Big Data, the book explains network design and implementation for Big Data. It exa

  5. Managing Variant Calling Files the Big Data Way: Using HDFS and Apache Parquet

    NARCIS (Netherlands)

    Boufea, Aikaterini; Finkers, H.J.; Kaauwen, van M.P.W.; Kramer, M.R.; Athanasiadis, I.N.

    2017-01-01

    Big Data has been seen as a remedy for the efficient management of the ever-increasing genomic data. In this paper, we investigate the use of Apache Spark to store and process Variant Calling Files (VCF) on a Hadoop cluster. We demonstrate Tomatula, a software tool for converting VCF files to Apache Parquet.
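
    Tomatula's own implementation is not reproduced in this record; a minimal PySpark sketch of the VCF-to-Parquet conversion it describes, assuming a sites-only VCF with only the eight fixed columns and placeholder HDFS paths, might look like this:

        # Sketch: load a sites-only VCF (eight fixed columns, no sample columns)
        # and write it as Apache Parquet on HDFS. Paths are placeholders.
        from pyspark.sql import SparkSession

        spark = SparkSession.builder.appName("vcf-to-parquet").getOrCreate()

        vcf = (spark.read
               .option("sep", "\t")
               .option("comment", "#")  # drop '##' metadata and the '#CHROM' header line
               .csv("hdfs:///data/variants.vcf")
               .toDF("CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"))

        vcf = vcf.withColumn("POS", vcf["POS"].cast("int"))
        vcf.write.mode("overwrite").parquet("hdfs:///data/variants.parquet")

        spark.stop()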

  6. Global fluctuation spectra in big-crunch-big-bang string vacua

    International Nuclear Information System (INIS)

    Craps, Ben; Ovrut, Burt A.

    2004-01-01

    We study big-crunch-big-bang cosmologies that correspond to exact world-sheet superconformal field theories of type II strings. The string theory spacetime contains a big crunch and a big bang cosmology, as well as additional 'whisker' asymptotic and intermediate regions. Within the context of free string theory, we compute, unambiguously, the scalar fluctuation spectrum in all regions of spacetime. Generically, the big crunch fluctuation spectrum is altered while passing through the bounce singularity. The change in the spectrum is characterized by a function Δ, which is momentum and time dependent. We compute Δ explicitly and demonstrate that it arises from the whisker regions. The whiskers are also shown to lead to 'entanglement' entropy in the big bang region. Finally, in the Milne orbifold limit of our superconformal vacua, we show that Δ→1 and, hence, the fluctuation spectrum is unaltered by the big-crunch-big-bang singularity. We comment on, but do not attempt to resolve, subtleties related to gravitational back reaction and light winding modes when interactions are taken into account

  7. Accurate Dna Assembly And Direct Genome Integration With Optimized Uracil Excision Cloning To Facilitate Engineering Of Escherichia Coli As A Cell Factory

    DEFF Research Database (Denmark)

    Cavaleiro, Mafalda; Kim, Se Hyeuk; Nørholm, Morten

    2015-01-01

    Plants produce a vast diversity of valuable compounds with medical properties, but these are often difficult to purify from the natural source or produce by organic synthesis. An alternative is to transfer the biosynthetic pathways to an efficient production host like the bacterium Escherichia coli. Cloning and heterologous gene expression are major bottlenecks in the metabolic engineering field. We are working on standardizing DNA vector design processes to promote automation and collaborations in early phase metabolic engineering projects. Here, we focus on optimizing the already established uracil-excision-based cloning and combining it with a genome-engineering approach to allow direct integration of whole metabolic pathways into the genome of E. coli, to facilitate the advanced engineering of cell factories.

  8. KnowEnG: a knowledge engine for genomics.

    Science.gov (United States)

    Sinha, Saurabh; Song, Jun; Weinshilboum, Richard; Jongeneel, Victor; Han, Jiawei

    2015-11-01

    We describe here the vision, motivations, and research plans of the National Institutes of Health Center for Excellence in Big Data Computing at the University of Illinois, Urbana-Champaign. The Center is organized around the construction of "Knowledge Engine for Genomics" (KnowEnG), an E-science framework for genomics where biomedical scientists will have access to powerful methods of data mining, network mining, and machine learning to extract knowledge out of genomics data. The scientist will come to KnowEnG with their own data sets in the form of spreadsheets and ask KnowEnG to analyze those data sets in the light of a massive knowledge base of community data sets called the "Knowledge Network" that will be at the heart of the system. The Center is undertaking discovery projects aimed at testing the utility of KnowEnG for transforming big data to knowledge. These projects span a broad range of biological enquiry, from pharmacogenomics (in collaboration with Mayo Clinic) to transcriptomics of human behavior. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  9. Principales parámetros para el estudio de la colaboración científica en Big Science

    Directory of Open Access Journals (Sweden)

    Ortoll, Eva

    2014-12-01

    Full Text Available In several scientific disciplines research has shifted from experiments of a reduced scale to large and complex collaborations. Many recent scientific achievements like the human genome sequencing or the discovery of the Higgs boson have taken place within the "big science" paradigm. The study of scientific collaboration needs to take into account all the diverse factors that have an influence on it. In the case of big science experiments, some of those aspects are particularly important: number of institutions involved, cultural differences, diversity of spaces and infrastructures or the conceptualization of research problems. By considering these specific factors we present a set of parameters for the analysis of scientific collaboration in big science projects. The utility of these parameters is illustrated through a comparative study of two large big science projects: the ATLAS experiment and the Human Genome Project.

  10. Pillow seal system at the BigRIPS separator

    Energy Technology Data Exchange (ETDEWEB)

    Tanaka, K., E-mail: ktanaka@riken.jp; Inabe, N.; Yoshida, K.; Kusaka, K.; Kubo, T.

    2013-12-15

    Highlights: • Pillow seal system has been installed for a high-intensity RI-beam facility at RIKEN. • It is aimed at facilitating remote maintenance under high residual radiation. • Local radiation shields are integrated with one of the pillow seals. • Pillow seals have been aligned to the beam axis within 1 mm accuracy. • A leakage rate of 10⁻⁹ Pa m³/s has been achieved with our pillow seal system. -- Abstract: We have designed and installed a pillow seal system for the BigRIPS fragment separator at the RIKEN Radioactive Isotope Beam Factory (RIBF) to facilitate remote maintenance in a radioactive environment. The pillow seal system is a device to connect a vacuum chamber and a beam tube. It allows quick attachment and detachment of vacuum connections in the BigRIPS separator and consists of a double diaphragm with a differential pumping system. The leakage rate achieved with this system is as low as 10⁻⁹ Pa m³/s. We have also designed and installed a local radiation-shielding system, integrated with the pillow seal system, to protect the superconducting magnets and to reduce the heat load on the cryogenic system. We present an overview of the pillow seal and the local shielding systems.

  11. State Responsibility and Accountability in Managing Big Data in Biobank Research

    DEFF Research Database (Denmark)

    Tupasela, Aaro Mikael; Liede, Sandra

    2016-01-01

    Managing data, research results and incidental findings in biobanks is becoming an increasingly significant challenge for all biobanks and for the countries which are in the process of drafting policy and regulatory frameworks for the management and governance of big data, public health genomics and personalised medicine. This raises dilemmas in the interpretations we uphold with regard to the processing of personal data. This chapter examines the challenges associated with the data subject's right of access to data in Finnish biobanking. The Data Protection Directive provides provisions for individuals to confirm whether data relating to them are being processed. The Finnish case highlights the challenges that many states are increasingly facing across Europe and elsewhere in terms of how to govern and coordinate the management of biomedical big data.

  12. The genome of Chenopodium quinoa

    KAUST Repository

    Jarvis, David Erwin; Ho, Yung Shwen; Lightfoot, Damien; Schmöckel, Sandra M.; Li, Bo; Borm, Theo J. A.; Ohyanagi, Hajime; Mineta, Katsuhiko; Michell, Craig; Saber, Noha; Kharbatia, Najeh M.; Rupper, Ryan R.; Sharp, Aaron R.; Dally, Nadine; Boughton, Berin A.; Woo, Yong; Gao, Ge; Schijlen, Elio G. W. M.; Guo, Xiujie; Momin, Afaque Ahmad Imtiyaz; Negrão, Sónia; Al-Babili, Salim; Gehring, Christoph A; Roessner, Ute; Jung, Christian; Murphy, Kevin; Arold, Stefan T.; Gojobori, Takashi; Linden, C. Gerard van der; Loo, Eibertus N. van; Jellen, Eric N.; Maughan, Peter J.; Tester, Mark A.

    2017-01-01

    Chenopodium quinoa (quinoa) is a highly nutritious grain identified as an important crop to improve world food security. Unfortunately, few resources are available to facilitate its genetic improvement. Here we report the assembly of a high-quality, chromosome-scale reference genome sequence for quinoa, which was produced using single-molecule real-time sequencing in combination with optical, chromosome-contact and genetic maps. We also report the sequencing of two diploids from the ancestral gene pools of quinoa, which enables the identification of sub-genomes in quinoa, and reduced-coverage genome sequences for 22 other samples of the allotetraploid goosefoot complex. The genome sequence facilitated the identification of the transcription factor likely to control the production of anti-nutritional triterpenoid saponins found in quinoa seeds, including a mutation that appears to cause alternative splicing and a premature stop codon in sweet quinoa strains. These genomic resources are an important first step towards the genetic improvement of quinoa.

  13. The genome of Chenopodium quinoa

    KAUST Repository

    Jarvis, David Erwin

    2017-02-08

    Chenopodium quinoa (quinoa) is a highly nutritious grain identified as an important crop to improve world food security. Unfortunately, few resources are available to facilitate its genetic improvement. Here we report the assembly of a high-quality, chromosome-scale reference genome sequence for quinoa, which was produced using single-molecule real-time sequencing in combination with optical, chromosome-contact and genetic maps. We also report the sequencing of two diploids from the ancestral gene pools of quinoa, which enables the identification of sub-genomes in quinoa, and reduced-coverage genome sequences for 22 other samples of the allotetraploid goosefoot complex. The genome sequence facilitated the identification of the transcription factor likely to control the production of anti-nutritional triterpenoid saponins found in quinoa seeds, including a mutation that appears to cause alternative splicing and a premature stop codon in sweet quinoa strains. These genomic resources are an important first step towards the genetic improvement of quinoa.

  14. The genome of Chenopodium quinoa.

    Science.gov (United States)

    Jarvis, David E; Ho, Yung Shwen; Lightfoot, Damien J; Schmöckel, Sandra M; Li, Bo; Borm, Theo J A; Ohyanagi, Hajime; Mineta, Katsuhiko; Michell, Craig T; Saber, Noha; Kharbatia, Najeh M; Rupper, Ryan R; Sharp, Aaron R; Dally, Nadine; Boughton, Berin A; Woo, Yong H; Gao, Ge; Schijlen, Elio G W M; Guo, Xiujie; Momin, Afaque A; Negrão, Sónia; Al-Babili, Salim; Gehring, Christoph; Roessner, Ute; Jung, Christian; Murphy, Kevin; Arold, Stefan T; Gojobori, Takashi; Linden, C Gerard van der; van Loo, Eibertus N; Jellen, Eric N; Maughan, Peter J; Tester, Mark

    2017-02-16

    Chenopodium quinoa (quinoa) is a highly nutritious grain identified as an important crop to improve world food security. Unfortunately, few resources are available to facilitate its genetic improvement. Here we report the assembly of a high-quality, chromosome-scale reference genome sequence for quinoa, which was produced using single-molecule real-time sequencing in combination with optical, chromosome-contact and genetic maps. We also report the sequencing of two diploids from the ancestral gene pools of quinoa, which enables the identification of sub-genomes in quinoa, and reduced-coverage genome sequences for 22 other samples of the allotetraploid goosefoot complex. The genome sequence facilitated the identification of the transcription factor likely to control the production of anti-nutritional triterpenoid saponins found in quinoa seeds, including a mutation that appears to cause alternative splicing and a premature stop codon in sweet quinoa strains. These genomic resources are an important first step towards the genetic improvement of quinoa.

  15. Big Argumentation?

    Directory of Open Access Journals (Sweden)

    Daniel Faltesek

    2013-08-01

    Full Text Available Big Data is nothing new. Public concern regarding the mass diffusion of data has appeared repeatedly with computing innovations; in the formulation before Big Data, it was most recently referred to as the information explosion. In this essay, I argue that the appeal of Big Data is not a function of computational power, but of a synergistic relationship between aesthetic order and a politics evacuated of a meaningful public deliberation. Understanding, and challenging, Big Data requires an attention to the aesthetics of data visualization and the ways in which those aesthetics would seem to depoliticize information. The conclusion proposes an alternative argumentative aesthetic as the appropriate response to the depoliticization posed by the popular imaginary of Big Data.

  16. Characterization of lettuce big-vein associated virus and Mirafiori lettuce big-vein virus infecting lettuce in Saudi Arabia.

    Science.gov (United States)

    Umar, M; Amer, M A; Al-Saleh, M A; Al-Shahwan, I M; Shakeel, M T; Zakri, A M; Katis, N I

    2017-07-01

    During 2014 and 2015, 97 lettuce plants that showed big-vein-disease-like symptoms and seven weed plants were collected from the Riyadh region. DAS-ELISA revealed that 25% and 9% of the lettuce plants were singly infected with LBVaV and MiLBVV, respectively, whereas 63% had a mixed infection with both viruses. The results were confirmed by multiplex reverse transcription polymerase chain reaction using primers specific for LBVaV and MiLBVV. LBVaV and MiLBVV were also detected in Sonchus oleraceus and Eruca sativa, respectively. Nucleotide sequence identity among the Saudi LBVaV and MiLBVV isolates ranged from 94.3 to 100%, and their similarities to isolates with sequences in the GenBank database ranged from 93.9 to 99.6% and from 93.8 to 99.3%, respectively. Olpidium sp. was present in the roots of lettuce plants with big-vein disease, and it was shown to facilitate transmission of both viruses.

  17. Artificial intelligence, physiological genomics, and precision medicine.

    Science.gov (United States)

    Williams, Anna Marie; Liu, Yong; Regner, Kevin R; Jotterand, Fabrice; Liu, Pengyuan; Liang, Mingyu

    2018-04-01

    Big data are a major driver in the development of precision medicine. Efficient analysis methods are needed to transform big data into clinically-actionable knowledge. To accomplish this, many researchers are turning toward machine learning (ML), an approach of artificial intelligence (AI) that utilizes modern algorithms to give computers the ability to learn. Much of the effort to advance ML for precision medicine has been focused on the development and implementation of algorithms and the generation of ever larger quantities of genomic sequence data and electronic health records. However, relevance and accuracy of the data are as important as quantity of data in the advancement of ML for precision medicine. For common diseases, physiological genomic readouts in disease-applicable tissues may be an effective surrogate to measure the effect of genetic and environmental factors and their interactions that underlie disease development and progression. Disease-applicable tissue may be difficult to obtain, but there are important exceptions such as kidney needle biopsy specimens. As AI continues to advance, new analytical approaches, including those that go beyond data correlation, need to be developed and ethical issues of AI need to be addressed. Physiological genomic readouts in disease-relevant tissues, combined with advanced AI, can be a powerful approach for precision medicine for common diseases.

  18. Developing a Big Game for Financial Education Using Service Design Approach

    Science.gov (United States)

    Kang, Myunghee; Yoon, Seonghye; Kang, Minjeng; Jang, JeeEun; Lee, Yujung

    2018-01-01

    The purpose of this study was to design and develop an educational game which facilitates building adolescents' knowledge and attitudes in financial principles of a daily life. To achieve this purpose, the authors designed a learner-centered big game for financial education by applying an experience-based triple-diamond instructional design model…

  19. Comparative genomics and prediction of conditionally dispensable sequences in legume-infecting Fusarium oxysporum formae speciales facilitates identification of candidate effectors.

    Science.gov (United States)

    Williams, Angela H; Sharma, Mamta; Thatcher, Louise F; Azam, Sarwar; Hane, James K; Sperschneider, Jana; Kidd, Brendan N; Anderson, Jonathan P; Ghosh, Raju; Garg, Gagan; Lichtenzveig, Judith; Kistler, H Corby; Shea, Terrance; Young, Sarah; Buck, Sally-Anne G; Kamphuis, Lars G; Saxena, Rachit; Pande, Suresh; Ma, Li-Jun; Varshney, Rajeev K; Singh, Karam B

    2016-03-05

    Soil-borne fungi of the Fusarium oxysporum species complex cause devastating wilt disease on many crops including legumes that supply human dietary protein needs across many parts of the globe. We present and compare draft genome assemblies for three legume-infecting formae speciales (ff. spp.): F. oxysporum f. sp. ciceris (Foc-38-1) and f. sp. pisi (Fop-37622), significant pathogens of chickpea and pea respectively, the world's second and third most important grain legumes, and lastly f. sp. medicaginis (Fom-5190a) for which we developed a model legume pathosystem utilising Medicago truncatula. Focusing on the identification of pathogenicity gene content, we leveraged the reference genomes of Fusarium pathogens F. oxysporum f. sp. lycopersici (tomato-infecting) and F. solani (pea-infecting) and their well-characterised core and dispensable chromosomes to predict genomic organisation in the newly sequenced legume-infecting isolates. Dispensable chromosomes are not essential for growth and in Fusarium species are known to be enriched in host-specificity and pathogenicity-associated genes. Comparative genomics of the publicly available Fusarium species revealed differential patterns of sequence conservation across F. oxysporum formae speciales, with legume-pathogenic formae speciales not exhibiting greater sequence conservation between them relative to non-legume-infecting formae speciales, possibly indicating the lack of a common ancestral source for legume pathogenicity. Combining predicted dispensable gene content with in planta expression in the model legume-infecting isolate, we identified small conserved regions and candidate effectors, four of which shared greatest similarity to proteins from another legume-infecting ff. spp. We demonstrate that distinction of core and potential dispensable genomic regions of novel F. oxysporum genomes is an effective tool to facilitate effector discovery and the identification of gene content possibly linked to host

  20. Enhanced annotations and features for comparing thousands of Pseudomonas genomes in the Pseudomonas genome database.

    Science.gov (United States)

    Winsor, Geoffrey L; Griffiths, Emma J; Lo, Raymond; Dhillon, Bhavjinder K; Shay, Julie A; Brinkman, Fiona S L

    2016-01-04

    The Pseudomonas Genome Database (http://www.pseudomonas.com) is well known for the application of community-based annotation approaches for producing a high-quality Pseudomonas aeruginosa PAO1 genome annotation, and facilitating whole-genome comparative analyses with other Pseudomonas strains. To aid analysis of potentially thousands of complete and draft genome assemblies, this database and analysis platform was upgraded to integrate curated genome annotations and isolate metadata with enhanced tools for larger scale comparative analysis and visualization. Manually curated gene annotations are supplemented with improved computational analyses that help identify putative drug targets and vaccine candidates or assist with evolutionary studies by identifying orthologs, pathogen-associated genes and genomic islands. The database schema has been updated to integrate isolate metadata that will facilitate more powerful analysis of genomes across datasets in the future. We continue to place an emphasis on providing high-quality updates to gene annotations through regular review of the scientific literature and using community-based approaches including a major new Pseudomonas community initiative for the assignment of high-quality gene ontology terms to genes. As we further expand from thousands of genomes, we plan to provide enhancements that will aid data visualization and analysis arising from whole-genome comparative studies including more pan-genome and population-based approaches. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  1. Personalized medicine: genome, e-health and intelligent systems. Part 1. Genomics and monitoring of clinical data

    Directory of Open Access Journals (Sweden)

    B. A. Kobrinskii

    2017-01-01

    Full Text Available In practical terms, the transition to personalized medicine should combine the solution of genomics problems, as the basis of possible diseases, with attention to the phenotypic manifestations that are markers and early signs of emerging pathological changes. Most diseases have their origins in childhood. Therefore, in all age groups, it is necessary to monitor minimal deviations and their dynamics, to use mobile devices for this purpose, and to accumulate the resulting data. Processing big data (Big Data) will provide more informative results. On this basis, it will be possible to identify analogues for targeted therapy among similar disease variants in large databases of publications on the problem of interest.

  2. HGVA: the Human Genome Variation Archive

    OpenAIRE

    Lopez, Javier; Coll, Jacobo; Haimel, Matthias; Kandasamy, Swaathi; Tarraga, Joaquin; Furio-Tari, Pedro; Bari, Wasim; Bleda, Marta; Rueda, Antonio; Gräf, Stefan; Rendon, Augusto; Dopazo, Joaquin; Medina, Ignacio

    2017-01-01

    Abstract High-profile genomic variation projects like the 1000 Genomes project or the Exome Aggregation Consortium, are generating a wealth of human genomic variation knowledge which can be used as an essential reference for identifying disease-causing genotypes. However, accessing these data, contrasting the various studies and integrating those data in downstream analyses remains cumbersome. The Human Genome Variation Archive (HGVA) tackles these challenges and facilitates access to genomic...

  3. An overview of big data and data science education at South African universities

    Directory of Open Access Journals (Sweden)

    Eduan Kotzé

    2016-02-01

    Full Text Available Man and machine are generating data electronically at an astronomical speed and in such a way that society is experiencing cognitive challenges to analyse this data meaningfully. Big data firms, such as Google and Facebook, identified this problem several years ago and are continuously developing new technologies or improving existing technologies in order to facilitate the cognitive analysis process of these large data sets. The purpose of this article is to contribute to our theoretical understanding of the role that big data might play in creating new training opportunities for South African universities. The article investigates emerging literature on the characteristics and main components of big data, together with the Hadoop application stack as an example of big data technology. Due to the rapid development of big data technology, a paradigm shift of human resources is required to analyse these data sets; therefore, this study examines the state of big data teaching at South African universities. This article also provides an overview of possible big data sources for South African universities, as well as relevant big data skills that data scientists need. The study also investigates existing academic programs in South Africa, where the focus is on teaching advanced database systems. The study found that big data and data science topics are introduced to students on a postgraduate level, but that the scope is very limited. This article contributes by proposing important theoretical topics that could be introduced as part of the existing academic programs. More research is required, however, to expand these programs in order to meet the growing demand for data scientists with big data skills.

  4. BIG GEO DATA MANAGEMENT: AN EXPLORATION WITH SOCIAL MEDIA AND TELECOMMUNICATIONS OPEN DATA

    Directory of Open Access Journals (Sweden)

    C. Arias Munoz

    2016-06-01

    Full Text Available The term Big Data has been recently used to define big, highly varied, complex data sets, which are created and updated at a high speed and require faster processing, namely, a reduced time to filter and analyse relevant data. These data are also increasingly becoming Open Data (data that can be freely distributed) made public by the government, agencies, private enterprises, and others. There are at least two issues that can obstruct the availability and use of Open Big Datasets: Firstly, the gathering and geoprocessing of these datasets are very computationally intensive; hence, it is necessary to integrate high-performance solutions, preferably internet based, to achieve the goals. Secondly, the problems of heterogeneity and inconsistency in geospatial data are well known and affect the data integration process, but this is particularly problematic for Big Geo Data. Therefore, Big Geo Data integration will be one of the most challenging issues to solve. With these applications, we demonstrate that it is possible to provide processed Big Geo Data to common users, using open geospatial standards and technologies. NoSQL databases like MongoDB and frameworks like RASDAMAN could offer different functionalities that facilitate working with larger volumes and more heterogeneous geospatial data sources.

  5. Big Geo Data Management: AN Exploration with Social Media and Telecommunications Open Data

    Science.gov (United States)

    Arias Munoz, C.; Brovelli, M. A.; Corti, S.; Zamboni, G.

    2016-06-01

    The term Big Data has been recently used to define big, highly varied, complex data sets, which are created and updated at a high speed and require faster processing, namely, a reduced time to filter and analyse relevant data. These data are also increasingly becoming Open Data (data that can be freely distributed) made public by the government, agencies, private enterprises, and others. There are at least two issues that can obstruct the availability and use of Open Big Datasets: Firstly, the gathering and geoprocessing of these datasets are very computationally intensive; hence, it is necessary to integrate high-performance solutions, preferably internet based, to achieve the goals. Secondly, the problems of heterogeneity and inconsistency in geospatial data are well known and affect the data integration process, but this is particularly problematic for Big Geo Data. Therefore, Big Geo Data integration will be one of the most challenging issues to solve. With these applications, we demonstrate that it is possible to provide processed Big Geo Data to common users, using open geospatial standards and technologies. NoSQL databases like MongoDB and frameworks like RASDAMAN could offer different functionalities that facilitate working with larger volumes and more heterogeneous geospatial data sources.
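
    The concrete schema used in these studies is not given in the abstracts; as a hedged sketch of the geospatial indexing and proximity querying MongoDB offers for such social-media data (database, collection, and field names are assumptions), a pymongo session could look like this:

        # Sketch: store geotagged social-media records in MongoDB and run a
        # proximity query. Database, collection, and field names are assumptions.
        from pymongo import MongoClient, GEOSPHERE

        client = MongoClient("mongodb://localhost:27017")
        tweets = client["big_geo_data"]["tweets"]

        tweets.create_index([("location", GEOSPHERE)])  # 2dsphere index on GeoJSON points

        tweets.insert_one({
            "text": "example geotagged post",
            "location": {"type": "Point", "coordinates": [9.19, 45.46]},  # lon, lat
        })

        near_point = tweets.find({
            "location": {
                "$near": {
                    "$geometry": {"type": "Point", "coordinates": [9.1919, 45.4642]},
                    "$maxDistance": 1000,  # metres
                }
            }
        })
        for doc in near_point:
            print(doc["text"])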

  6. The Global Invertebrate Genomics Alliance (GIGA): Developing Community Resources to Study Diverse Invertebrate Genomes

    KAUST Repository

    Bracken-Grissom, Heather

    2013-12-12

    Over 95% of all metazoan (animal) species comprise the invertebrates, but very few genomes from these organisms have been sequenced. We have, therefore, formed a Global Invertebrate Genomics Alliance (GIGA). Our intent is to build a collaborative network of diverse scientists to tackle major challenges (e.g., species selection, sample collection and storage, sequence assembly, annotation, analytical tools) associated with genome/transcriptome sequencing across a large taxonomic spectrum. We aim to promote standards that will facilitate comparative approaches to invertebrate genomics and collaborations across the international scientific community. Candidate study taxa include species from Porifera, Ctenophora, Cnidaria, Placozoa, Mollusca, Arthropoda, Echinodermata, Annelida, Bryozoa, and Platyhelminthes, among others. GIGA will target 7000 noninsect/nonnematode species, with an emphasis on marine taxa because of the unrivaled phyletic diversity in the oceans. Priorities for selecting invertebrates for sequencing will include, but are not restricted to, their phylogenetic placement; relevance to organismal, ecological, and conservation research; and their importance to fisheries and human health. We highlight benefits of sequencing both whole genomes (DNA) and transcriptomes and also suggest policies for genomic-level data access and sharing based on transparency and inclusiveness. The GIGA Web site () has been launched to facilitate this collaborative venture.

  7. Using the Pareto principle in genome-wide breeding value estimation.

    Science.gov (United States)

    Yu, Xijiang; Meuwissen, Theo H E

    2011-11-01

    Genome-wide breeding value (GWEBV) estimation methods can be classified based on the prior distribution assumptions of marker effects. Genome-wide BLUP methods assume a normal prior distribution for all markers with a constant variance, and are computationally fast. In Bayesian methods, more flexible prior distributions of SNP effects are applied that allow for very large SNP effects although most are small or even zero, but these methods are also computationally demanding because they rely on Markov chain Monte Carlo sampling. In this study, we adopted the Pareto principle to weight available marker loci, i.e., we consider that x% of the loci explain (100 - x)% of the total genetic variance. Assuming this principle, it is also possible to define the variances of the prior distribution of the 'big' and 'small' SNP. The relatively few large SNP explain a large proportion of the genetic variance, while the majority of the SNP show small effects and explain a minor proportion of the genetic variance. We name this method MixP, where the prior distribution is a mixture of two normal distributions, i.e. one with a big variance and one with a small variance. Simulation results, using a real Norwegian Red cattle pedigree, show that MixP is at least as accurate as the other methods in all studied cases. This method also reduces the hyper-parameters of the prior distribution from two (proportion and variance of SNP with big effects) to one (proportion of SNP with big effects), assuming the overall genetic variance is known. The mixture-of-normals prior made it possible to solve the equations iteratively, which reduced the computational load by two orders of magnitude. In the era of marker densities reaching the millions and whole-genome sequence data, MixP provides a computationally feasible Bayesian method of analysis.
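    The Pareto-style prior lends itself to a small worked example. The sketch below assumes x = 10 (10% of loci explaining 90% of the genetic variance) and a known total genetic variance; the locus count and all numerical values are illustrative assumptions, not parameters from the study.

```python
# Minimal sketch of a two-component normal mixture prior for SNP effects.
import numpy as np

rng = np.random.default_rng(1)
m = 10_000        # number of SNP loci (assumed)
Vg = 1.0          # total additive genetic variance (assumed known)
pi_big = 0.10     # Pareto fraction x/100: the single remaining hyper-parameter

# Per-locus prior variances implied by "x% of loci explain (100 - x)% of Vg".
var_big = (1 - pi_big) * Vg / (pi_big * m)      # few loci, large effects
var_small = pi_big * Vg / ((1 - pi_big) * m)    # many loci, small effects

# Draw SNP effects from the mixture prior.
is_big = rng.random(m) < pi_big
effects = np.where(is_big,
                   rng.normal(0.0, np.sqrt(var_big), m),
                   rng.normal(0.0, np.sqrt(var_small), m))

# The 'big' component should account for roughly (100 - x)% of the variance.
share_big = (effects[is_big] ** 2).sum() / (effects ** 2).sum()
print(f"share of genetic variance from 'big' SNPs: {share_big:.2f}")
```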

  8. Trends in IT Innovation to Build a Next Generation Bioinformatics Solution to Manage and Analyse Biological Big Data Produced by NGS Technologies.

    Science.gov (United States)

    de Brevern, Alexandre G; Meyniel, Jean-Philippe; Fairhead, Cécile; Neuvéglise, Cécile; Malpertuy, Alain

    2015-01-01

    Sequencing the human genome began in 1994, and 10 years of work were necessary in order to provide a nearly complete sequence. Nowadays, NGS technologies allow sequencing of a whole human genome in a few days. This deluge of data challenges scientists in many ways, as they are faced with data management issues and analysis and visualization drawbacks due to the limitations of current bioinformatics tools. In this paper, we describe how the NGS Big Data revolution changes the way of managing and analysing data. We present how biologists are confronted with an abundance of methods, tools, and data formats. To overcome these problems, we focus on Big Data Information Technology innovations from the web and business intelligence. We underline the interest of NoSQL databases, which are much more efficient than relational databases. Since Big Data leads to the loss of interactivity with data during analysis due to high processing time, we describe solutions from Business Intelligence that allow one to regain interactivity whatever the volume of data is. We illustrate this point with a focus on the Amadea platform. Finally, we discuss visualization challenges posed by Big Data and present the latest innovations with JavaScript graphic libraries.

  9. Big data

    DEFF Research Database (Denmark)

    Madsen, Anders Koed; Flyverbom, Mikkel; Hilbert, Martin

    2016-01-01

    The claim that big data can revolutionize strategy and governance in the context of international relations is increasingly hard to ignore. Scholars of international political sociology have mainly discussed this development through the themes of security and surveillance. The aim of this paper is to outline a research agenda that can be used to raise a broader set of sociological and practice-oriented questions about the increasing datafication of international relations and politics. First, it proposes a way of conceptualizing big data that is broad enough to open fruitful investigations into the emerging use of big data in these contexts. This conceptualization includes the identification of three moments contained in any big data practice. Second, it suggests a research agenda built around a set of subthemes that each deserve dedicated scrutiny when studying the interplay between big data...

  10. Genomic signal processing

    CERN Document Server

    Shmulevich, Ilya

    2007-01-01

    Genomic signal processing (GSP) can be defined as the analysis, processing, and use of genomic signals to gain biological knowledge, and the translation of that knowledge into systems-based applications that can be used to diagnose and treat genetic diseases. Situated at the crossroads of engineering, biology, mathematics, statistics, and computer science, GSP requires the development of both nonlinear dynamical models that adequately represent genomic regulation, and diagnostic and therapeutic tools based on these models. This book facilitates these developments by providing rigorous mathema

  11. The Sequenced Angiosperm Genomes and Genome Databases.

    Science.gov (United States)

    Chen, Fei; Dong, Wei; Zhang, Jiawei; Guo, Xinyue; Chen, Junhao; Wang, Zhengjia; Lin, Zhenguo; Tang, Haibao; Zhang, Liangsheng

    2018-01-01

    Angiosperms, the flowering plants, provide the essential resources for human life, such as food, energy, oxygen, and materials. They have also shaped the evolution of humans, animals, and the planet Earth. Despite the numerous advances in genome reports and sequencing technologies, no review covers all the released angiosperm genomes and the genome databases for data sharing. Based on the rapid advances and innovations in database reconstruction in the last few years, here we provide a comprehensive review of three major types of angiosperm genome databases, including databases for a single species, for a specific angiosperm clade, and for multiple angiosperm species. The scope, tools, and data of each type of database and their features are concisely discussed. Genome databases for a single species or a clade of species are especially popular with specific groups of researchers, while a timely updated comprehensive database is more powerful for addressing major scientific questions at the genome scale. Considering the low coverage of flowering plants in any available database, we propose the construction of a comprehensive database to facilitate large-scale comparative studies of angiosperm genomes and to promote collaborative studies of important questions in plant biology.

  12. Big data computing

    CERN Document Server

    Akerkar, Rajendra

    2013-01-01

    Due to market forces and technological evolution, Big Data computing is developing at an increasing rate. A wide variety of novel approaches and tools have emerged to tackle the challenges of Big Data, creating both more opportunities and more challenges for students and professionals in the field of data computation and analysis. Presenting a mix of industry cases and theory, Big Data Computing discusses the technical and practical issues related to Big Data in intelligent information management. Emphasizing the adoption and diffusion of Big Data tools and technologies in industry, the book i

  13. Genome-wide comparison of paired fresh frozen and formalin-fixed paraffin-embedded gliomas by custom BAC and oligonucleotide array comparative genomic hybridization: facilitating analysis of archival gliomas.

    Science.gov (United States)

    Mohapatra, Gayatry; Engler, David A; Starbuck, Kristen D; Kim, James C; Bernay, Derek C; Scangas, George A; Rousseau, Audrey; Batchelor, Tracy T; Betensky, Rebecca A; Louis, David N

    2011-04-01

    Array comparative genomic hybridization (aCGH) is a powerful tool for detecting DNA copy number alterations (CNA). Because diffuse malignant gliomas are often sampled by small biopsies, formalin-fixed paraffin-embedded (FFPE) blocks are often the only tissue available for genetic analysis; FFPE tissues are also needed to study the intratumoral heterogeneity that characterizes these neoplasms. In this paper, we present a combination of evaluations and technical advances that provide strong support for the ready use of oligonucleotide aCGH on FFPE diffuse gliomas. We first compared aCGH using bacterial artificial chromosome (BAC) arrays in 45 paired frozen and FFPE gliomas, and demonstrate a high concordance rate between FFPE and frozen DNA in an individual clone-level analysis of sensitivity and specificity, assuring that under certain array conditions, frozen and FFPE DNA can perform nearly identically. However, because oligonucleotide arrays offer advantages over BAC arrays in genomic coverage and practical availability, we next developed a method of labeling DNA from FFPE tissue that allows efficient hybridization to oligonucleotide arrays. To demonstrate utility in FFPE tissues, we applied this approach to biphasic anaplastic oligoastrocytomas and demonstrate CNA differences between DNA obtained from the two components. Therefore, BAC and oligonucleotide aCGH can be sensitive and specific tools for detecting CNAs in FFPE DNA, and novel labeling techniques enable the routine use of oligonucleotide arrays for FFPE DNA. In combination, these advances should facilitate genome-wide analysis of rare, small and/or histologically heterogeneous gliomas from FFPE tissues.

  14. From big bang to big crunch and beyond

    International Nuclear Information System (INIS)

    Elitzur, Shmuel; Rabinovici, Eliezer; Giveon, Amit; Kutasov, David

    2002-01-01

    We study a quotient Conformal Field Theory, which describes a 3+1 dimensional cosmological spacetime. Part of this spacetime is the Nappi-Witten (NW) universe, which starts at a 'big bang' singularity, expands and then contracts to a 'big crunch' singularity at a finite time. The gauged WZW model contains a number of copies of the NW spacetime, with each copy connected to the preceding one and to the next one at the respective big bang/big crunch singularities. The sequence of NW spacetimes is further connected at the singularities to a series of non-compact static regions with closed timelike curves. These regions contain boundaries, on which the observables of the theory live. This suggests a holographic interpretation of the physics. (author)

  15. BIG data - BIG gains? Empirical evidence on the link between big data analytics and innovation

    OpenAIRE

    Niebel, Thomas; Rasel, Fabienne; Viete, Steffen

    2017-01-01

    This paper analyzes the relationship between firms’ use of big data analytics and their innovative performance in terms of product innovations. Since big data technologies provide new data information practices, they create novel decision-making possibilities, which are widely believed to support firms’ innovation process. Applying German firm-level data within a knowledge production function framework we find suggestive evidence that big data analytics is a relevant determinant for the likel...

  16. Neuroblastoma, a Paradigm for Big Data Science in Pediatric Oncology.

    Science.gov (United States)

    Salazar, Brittany M; Balczewski, Emily A; Ung, Choong Yong; Zhu, Shizhen

    2016-12-27

    Pediatric cancers rarely exhibit recurrent mutational events when compared to most adult cancers. This poses a challenge in understanding how cancers initiate, progress, and metastasize in early childhood. Also, due to limited detected driver mutations, it is difficult to benchmark key genes for drug development. In this review, we use neuroblastoma, a pediatric solid tumor of neural crest origin, as a paradigm for exploring "big data" applications in pediatric oncology. Computational strategies derived from big data science (network- and machine learning-based modeling and drug repositioning) hold the promise of shedding new light on the molecular mechanisms driving neuroblastoma pathogenesis and identifying potential therapeutics to combat this devastating disease. These strategies integrate robust data input, from genomic and transcriptomic studies, clinical data, and in vivo and in vitro experimental models specific to neuroblastoma and other types of cancers that closely mimic its biological characteristics. We discuss contexts in which "big data" and computational approaches, especially network-based modeling, may advance neuroblastoma research, describe currently available data and resources, and propose future models of strategic data collection and analyses for neuroblastoma and other related diseases.

  17. Neuroblastoma, a Paradigm for Big Data Science in Pediatric Oncology

    Directory of Open Access Journals (Sweden)

    Brittany M. Salazar

    2016-12-01

    Full Text Available Pediatric cancers rarely exhibit recurrent mutational events when compared to most adult cancers. This poses a challenge in understanding how cancers initiate, progress, and metastasize in early childhood. Also, due to limited detected driver mutations, it is difficult to benchmark key genes for drug development. In this review, we use neuroblastoma, a pediatric solid tumor of neural crest origin, as a paradigm for exploring “big data” applications in pediatric oncology. Computational strategies derived from big data science (network- and machine learning-based modeling and drug repositioning) hold the promise of shedding new light on the molecular mechanisms driving neuroblastoma pathogenesis and identifying potential therapeutics to combat this devastating disease. These strategies integrate robust data input, from genomic and transcriptomic studies, clinical data, and in vivo and in vitro experimental models specific to neuroblastoma and other types of cancers that closely mimic its biological characteristics. We discuss contexts in which “big data” and computational approaches, especially network-based modeling, may advance neuroblastoma research, describe currently available data and resources, and propose future models of strategic data collection and analyses for neuroblastoma and other related diseases.

  18. Genome Maps, a new generation genome browser.

    Science.gov (United States)

    Medina, Ignacio; Salavert, Francisco; Sanchez, Rubén; de Maria, Alejandro; Alonso, Roberto; Escobar, Pablo; Bleda, Marta; Dopazo, Joaquín

    2013-07-01

    Genome browsers have gained importance as more genomes and related genomic information become available. However, the increase of information brought about by new generation sequencing technologies is, at the same time, causing a subtle but continuous decrease in the efficiency of conventional genome browsers. Here, we present Genome Maps, a genome browser that implements an innovative model of data transfer and management. The program uses highly efficient technologies from the new HTML5 standard, such as scalable vector graphics, that optimize workloads at both server and client sides and ensure future scalability. Thus, data management and representation are entirely carried out by the browser, without the need of any Java Applet, Flash or other plug-in technology installation. Relevant biological data on genes, transcripts, exons, regulatory features, single-nucleotide polymorphisms, karyotype and so forth, are imported from web services and are available as tracks. In addition, several DAS servers are already included in Genome Maps. As a novelty, this web-based genome browser allows the local upload of huge genomic data files (e.g. VCF or BAM) that can be dynamically visualized in real time at the client side, thus facilitating the management of medical data affected by privacy restrictions. Finally, Genome Maps can easily be integrated in any web application by including only a few lines of code. Genome Maps is an open source collaborative initiative available in the GitHub repository (https://github.com/compbio-bigdata-viz/genome-maps). Genome Maps is available at: http://www.genomemaps.org.

  19. Approaches for in silico finishing of microbial genome sequences

    Directory of Open Access Journals (Sweden)

    Frederico Schmitt Kremer

    Full Text Available Abstract The introduction of next-generation sequencing (NGS) had a significant effect on the availability of genomic information, leading to an increase in the number of sequenced genomes from a large spectrum of organisms. Unfortunately, due to the limitations implied by the short-read sequencing platforms, most of these newly sequenced genomes remained as “drafts”, incomplete representations of the whole genetic content. The previous genome sequencing studies indicated that finishing a genome sequenced by NGS, even bacteria, may require additional sequencing to fill the gaps, making the entire process very expensive. As such, several in silico approaches have been developed to optimize the genome assemblies and facilitate the finishing process. The present review aims to explore some free (open source, in many cases) tools that are available to facilitate genome finishing.

  20. Approaches for in silico finishing of microbial genome sequences.

    Science.gov (United States)

    Kremer, Frederico Schmitt; McBride, Alan John Alexander; Pinto, Luciano da Silva

    The introduction of next-generation sequencing (NGS) had a significant effect on the availability of genomic information, leading to an increase in the number of sequenced genomes from a large spectrum of organisms. Unfortunately, due to the limitations implied by the short-read sequencing platforms, most of these newly sequenced genomes remained as "drafts", incomplete representations of the whole genetic content. The previous genome sequencing studies indicated that finishing a genome sequenced by NGS, even bacteria, may require additional sequencing to fill the gaps, making the entire process very expensive. As such, several in silico approaches have been developed to optimize the genome assemblies and facilitate the finishing process. The present review aims to explore some free (open source, in many cases) tools that are available to facilitate genome finishing.

  1. Contributions to In Silico Genome Annotation

    KAUST Repository

    Kalkatawi, Manal M.

    2017-11-30

    Genome annotation is an important topic since it provides information for the foundation of downstream genomic and biological research. It is considered a way of summarizing part of the existing knowledge about the genomic characteristics of an organism. Annotating different regions of a genome sequence is known as structural annotation, while identifying the functions of these regions is considered functional annotation. In silico approaches can facilitate both tasks that otherwise would be difficult and time-consuming. This study contributes to genome annotation by introducing several novel bioinformatics methods, some based on machine learning (ML) approaches. First, we present Dragon PolyA Spotter (DPS), a method for accurate identification of the polyadenylation signals (PAS) within human genomic DNA sequences. For this, we derived a novel feature-set able to characterize properties of the genomic region surrounding the PAS, enabling development of highly accurate optimized ML predictive models. DPS considerably outperformed the state-of-the-art results. The second contribution concerns developing generic models for structural annotation, i.e., the recognition of different genomic signals and regions (GSR) within eukaryotic DNA. We developed DeepGSR, a systematic framework that facilitates generating ML models to predict GSR with high accuracy. To the best of our knowledge, no available generic and automated method exists for such a task that could facilitate the studies of newly sequenced organisms. The prediction module of DeepGSR uses deep learning algorithms to derive highly abstract features that depend mainly on proper data representation and hyperparameter calibration. DeepGSR, which was evaluated on recognition of PAS and translation initiation sites (TIS) in different organisms, yields a simpler and more precise representation of the problem under study, compared to some other hand-tailored models, while producing high accuracy prediction results. Finally
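    To make the data-representation point above concrete, the sketch below one-hot encodes fixed-length DNA windows around a candidate signal so that a classifier can be trained on them; the window length and toy sequences are assumptions for illustration, not the encoding used by DeepGSR itself.

```python
# Minimal sketch: one-hot encoding of DNA windows for an ML classifier.
import numpy as np

BASES = {"A": 0, "C": 1, "G": 2, "T": 3}

def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as a (len(seq), 4) binary matrix."""
    mat = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        if base in BASES:          # unknown bases (e.g. 'N') stay all-zero
            mat[i, BASES[base]] = 1.0
    return mat

# Two toy windows centred on a canonical polyadenylation signal (AATAAA).
windows = ["CCGAATAAAGTC", "TTGAATAAACAG"]
X = np.stack([one_hot(w) for w in windows])   # shape: (n_samples, window, 4)
print(X.shape)                                # (2, 12, 4)
```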

  2. High-precision, whole-genome sequencing of laboratory strains facilitates genetic studies.

    Directory of Open Access Journals (Sweden)

    Anjana Srivatsan

    2008-08-01

    Full Text Available Whole-genome sequencing is a powerful technique for obtaining the reference sequence information of multiple organisms. Its use can be dramatically expanded to rapidly identify genomic variations, which can be linked with phenotypes to obtain biological insights. We explored these potential applications using the emerging next-generation sequencing platform Solexa Genome Analyzer, and the well-characterized model bacterium Bacillus subtilis. Combining sequencing with experimental verification, we first improved the accuracy of the published sequence of the B. subtilis reference strain 168, then obtained sequences of multiple related laboratory strains and different isolates of each strain. This provides a framework for comparing the divergence between different laboratory strains and between their individual isolates. We also demonstrated the power of Solexa sequencing by using its results to predict a defect in the citrate signal transduction pathway of a common laboratory strain, which we verified experimentally. Finally, we examined the molecular nature of spontaneously generated mutations that suppress the growth defect caused by deletion of the stringent response mediator relA. Using whole-genome sequencing, we rapidly mapped these suppressor mutations to two small homologs of relA. Interestingly, stable suppressor strains had mutations in both genes, with each mutation alone partially relieving the relA growth defect. This supports an intriguing three-locus interaction module that is not easily identifiable through traditional suppressor mapping. We conclude that whole-genome sequencing can drastically accelerate the identification of suppressor mutations and complex genetic interactions, and it can be applied as a standard tool to investigate the genetic traits of model organisms.

  3. The perennial ryegrass GenomeZipper: targeted use of genome resources for comparative grass genomics.

    Science.gov (United States)

    Pfeifer, Matthias; Martis, Mihaela; Asp, Torben; Mayer, Klaus F X; Lübberstedt, Thomas; Byrne, Stephen; Frei, Ursula; Studer, Bruno

    2013-02-01

    Whole-genome sequences established for model and major crop species constitute a key resource for advanced genomic research. For outbreeding forage and turf grass species like ryegrasses (Lolium spp.), such resources have yet to be developed. Here, we present a model of the perennial ryegrass (Lolium perenne) genome on the basis of conserved synteny to barley (Hordeum vulgare) and the model grass genome Brachypodium (Brachypodium distachyon) as well as rice (Oryza sativa) and sorghum (Sorghum bicolor). A transcriptome-based genetic linkage map of perennial ryegrass served as a scaffold to establish the chromosomal arrangement of syntenic genes from model grass species. This scaffold revealed a high degree of synteny and macrocollinearity and was then utilized to anchor a collection of perennial ryegrass genes in silico to their predicted genome positions. This resulted in the unambiguous assignment of 3,315 out of 8,876 previously unmapped genes to the respective chromosomes. In total, the GenomeZipper incorporates 4,035 conserved grass gene loci, which were used for the first genome-wide sequence divergence analysis between perennial ryegrass, barley, Brachypodium, rice, and sorghum. The perennial ryegrass GenomeZipper is an ordered, information-rich genome scaffold, facilitating map-based cloning and genome assembly in perennial ryegrass and closely related Poaceae species. It also represents a milestone in describing synteny between perennial ryegrass and fully sequenced model grass genomes, thereby increasing our understanding of genome organization and evolution in the most important temperate forage and turf grass species.

  4. GenColors-based comparative genome databases for small eukaryotic genomes.

    Science.gov (United States)

    Felder, Marius; Romualdi, Alessandro; Petzold, Andreas; Platzer, Matthias; Sühnel, Jürgen; Glöckner, Gernot

    2013-01-01

    Many sequence data repositories can give a quick and easily accessible overview on genomes and their annotations. Less widespread is the possibility to compare related genomes with each other in a common database environment. We have previously described the GenColors database system (http://gencolors.fli-leibniz.de) and its applications to a number of bacterial genomes such as Borrelia, Legionella, Leptospira and Treponema. This system has an emphasis on genome comparison. It combines data from related genomes and provides the user with an extensive set of visualization and analysis tools. Eukaryote genomes are normally larger than prokaryote genomes and thus pose additional challenges for such a system. We have, therefore, adapted GenColors to also handle larger datasets of small eukaryotic genomes and to display eukaryotic gene structures. Further recent developments include whole genome views, genome list options and, for bacterial genome browsers, the display of horizontal gene transfer predictions. Two new GenColors-based databases for two fungal species (http://fgb.fli-leibniz.de) and for four social amoebas (http://sacgb.fli-leibniz.de) were set up. Both new resources open up a single entry point for related genomes for the amoebozoa and fungal research communities and other interested users. Comparative genomics approaches are greatly facilitated by these resources.

  5. Using Globus GridFTP to Transfer and Share Big Data | Poster

    Science.gov (United States)

    By Ashley DeVine, Staff Writer, and Mark Wance, Guest Writer; photo by Richard Frederickson, Staff Photographer Transferring big data, such as the genomics data delivered to customers from the Center for Cancer Research Sequencing Facility (CCR SF), has been difficult in the past because the transfer systems have not kept pace with the size of the data. However, the situation is changing as a result of the Globus GridFTP project.

  6. Principales parámetros para el estudio de la colaboración científica en Big Science

    OpenAIRE

    Ortoll, Eva; Canals, Agustí; Garcia, Montserrat; Cobarsí, Josep

    2014-01-01

    In several scientific disciplines, research has shifted from small-scale experiments to large and complex collaborations. Many recent scientific achievements, like the sequencing of the human genome or the discovery of the Higgs boson, have taken place within the “big science” paradigm. The study of scientific collaboration needs to take into account all the diverse factors that have an influence on it. In the case of big science experiments, some of those aspects are particularly important: num...

  7. [Big data, medical language and biomedical terminology systems].

    Science.gov (United States)

    Schulz, Stefan; López-García, Pablo

    2015-08-01

    A variety of rich terminology systems, such as thesauri, classifications, nomenclatures and ontologies support information and knowledge processing in health care and biomedical research. Nevertheless, human language, manifested as individually written texts, persists as the primary carrier of information, in the description of disease courses or treatment episodes in electronic medical records, and in the description of biomedical research in scientific publications. In the context of the discussion about big data in biomedicine, we hypothesize that the abstraction of the individuality of natural language utterances into structured and semantically normalized information facilitates the use of statistical data analytics to distil new knowledge out of textual data from biomedical research and clinical routine. Computerized human language technologies are constantly evolving and are increasingly ready to annotate narratives with codes from biomedical terminology. However, this depends heavily on linguistic and terminological resources. The creation and maintenance of such resources is labor-intensive. Nevertheless, it is sensible to assume that big data methods can be used to support this process. Examples include the learning of hierarchical relationships, the grouping of synonymous terms into concepts and the disambiguation of homonyms. Although clear evidence is still lacking, the combination of natural language technologies, semantic resources, and big data analytics is promising.

  8. Benchmarking Big Data Systems and the BigData Top100 List.

    Science.gov (United States)

    Baru, Chaitanya; Bhandarkar, Milind; Nambiar, Raghunath; Poess, Meikel; Rabl, Tilmann

    2013-03-01

    "Big data" has become a major force of innovation across enterprises of all sizes. New platforms with increasingly more features for managing big datasets are being announced almost on a weekly basis. Yet, there is currently a lack of any means of comparability among such platforms. While the performance of traditional database systems is well understood and measured by long-established institutions such as the Transaction Processing Performance Council (TCP), there is neither a clear definition of the performance of big data systems nor a generally agreed upon metric for comparing these systems. In this article, we describe a community-based effort for defining a big data benchmark. Over the past year, a Big Data Benchmarking Community has become established in order to fill this void. The effort focuses on defining an end-to-end application-layer benchmark for measuring the performance of big data applications, with the ability to easily adapt the benchmark specification to evolving challenges in the big data space. This article describes the efforts that have been undertaken thus far toward the definition of a BigData Top100 List. While highlighting the major technical as well as organizational challenges, through this article, we also solicit community input into this process.

  9. Chromosome-wise dissection of the genome of the extremely big mouse line DU6i.

    Science.gov (United States)

    Bevova, Marianna R; Aulchenko, Yurii S; Aksu, Soner; Renne, Ulla; Brockmann, Gudrun A

    2006-01-01

    The extreme high-body-weight-selected mouse line DU6i is a polygenic model for growth research, harboring many small-effect QTL. We dissected the genome of this line into 19 autosomes and the Y chromosome by the construction of a new panel of chromosome substitution strains (CSS). The DU6i chromosomes were transferred to a DBA/2 mice genetic background by marker-assisted recurrent backcrossing. Mitochondria and the X chromosome were of DBA/2 origin in the backcross. During the construction of these novel strains, >4000 animals were generated, phenotyped, and genotyped. Using these data, we studied the genetic control of variation in body weight and weight gain at 21, 42, and 63 days. The unique data set facilitated the analysis of chromosomal interaction with sex and parent-of-origin effects. All analyzed chromosomes affected body weight and weight gain either directly or in interaction with sex or parent of origin. The effects were age specific, with some chromosomes showing opposite effects at different stages of development.

  10. Big data, big knowledge: big data for personalized healthcare.

    Science.gov (United States)

    Viceconti, Marco; Hunter, Peter; Hose, Rod

    2015-07-01

    The idea that the purely phenomenological knowledge that we can extract by analyzing large amounts of data can be useful in healthcare seems to contradict the desire of VPH researchers to build detailed mechanistic models for individual patients. But in practice no model is ever entirely phenomenological or entirely mechanistic. We propose in this position paper that big data analytics can be successfully combined with VPH technologies to produce robust and effective in silico medicine solutions. In order to do this, big data technologies must be further developed to cope with some specific requirements that emerge from this application. Such requirements are: working with sensitive data; analytics of complex and heterogeneous data spaces, including nontextual information; distributed data management under security and performance constraints; specialized analytics to integrate bioinformatics and systems biology information with clinical observations at tissue, organ and organisms scales; and specialized analytics to define the "physiological envelope" during the daily life of each patient. These domain-specific requirements suggest a need for targeted funding, in which big data technologies for in silico medicine becomes the research priority.

  11. Fungal Genomics Program

    Energy Technology Data Exchange (ETDEWEB)

    Grigoriev, Igor

    2012-03-12

    The JGI Fungal Genomics Program aims to scale up sequencing and analysis of fungal genomes to explore the diversity of fungi important for energy and the environment, and to promote functional studies on a system level. Combining new sequencing technologies and comparative genomics tools, JGI is now leading the world in fungal genome sequencing and analysis. Over 120 sequenced fungal genomes with analytical tools are available via MycoCosm (www.jgi.doe.gov/fungi), a web-portal for fungal biologists. Our model of interacting with user communities, unique among other sequencing centers, helps organize these communities, improves genome annotation and analysis work, and facilitates new larger-scale genomic projects. This resulted in 20 high-profile papers published in 2011 alone, contributing to the Genomics Encyclopedia of Fungi, which targets fungi related to plant health (symbionts, pathogens, and biocontrol agents) and biorefinery processes (cellulose degradation, sugar fermentation, industrial hosts). Our next grand challenges include larger scale exploration of fungal diversity (1000 fungal genomes), developing molecular tools for DOE-relevant model organisms, and analysis of complex systems and metagenomes.

  12. BigDataBench: a Big Data Benchmark Suite from Internet Services

    OpenAIRE

    Wang, Lei; Zhan, Jianfeng; Luo, Chunjie; Zhu, Yuqing; Yang, Qiang; He, Yongqiang; Gao, Wanling; Jia, Zhen; Shi, Yingjie; Zhang, Shujie; Zheng, Chen; Lu, Gang; Zhan, Kent; Li, Xiaona; Qiu, Bizhu

    2014-01-01

    As architecture, systems, and data management communities pay greater attention to innovative big data systems and architectures, the pressure of benchmarking and evaluating these systems rises. Considering the broad use of big data systems, big data benchmarks must include diversity of data and workloads. Most of the state-of-the-art big data benchmarking efforts target evaluating specific types of applications or system software stacks, and hence they are not qualified for serving the purpo...

  13. Intelligent Decisional Assistant that Facilitates the Choice of a Proper Computer System Applied in Business

    Directory of Open Access Journals (Sweden)

    Nicolae MARGINEAN

    2009-01-01

    Full Text Available The choice of a proper computer system is not an easy task for a decision maker. One reason is the current state of the market for computer systems applied in business: the large number of players on the Romanian market results in a large number of software products with a multitude of different properties. Our proposal tries to optimize and facilitate this decision process within an e-shop that sells IT packages applied in business, by building an online decisional assistant, a special component conceived to facilitate the decision making needed to select the IT package that best fits the requirements of a particular business, as described by the decision maker. The user interacts with the system as an online buyer visiting an e-shop where IT packages applied in business are sold.

  14. Conociendo Big Data

    Directory of Open Access Journals (Sweden)

    Juan José Camargo-Vega

    2014-12-01

    Full Text Available Given the importance that the term Big Data has acquired, this study set out to examine and analyse the state of the art of Big Data exhaustively; in addition, as a second objective, it analysed the characteristics, tools, technologies, models and standards related to Big Data; and finally it sought to identify the most relevant characteristics of Big Data management, so that everything concerning the central topic of the research can be understood. The methodology consisted of reviewing the state of the art of Big Data and presenting its current situation; describing Big Data technologies; introducing some of the NoSQL databases, which are the ones that make it possible to process data in unstructured formats; and showing the data models and the technologies for analysing them, ending with some of the benefits of Big Data. The methodological design used for the research was non-experimental, since no variables were manipulated, and exploratory, because this research is a first step towards understanding the Big Data environment.

  15. BigDansing

    KAUST Repository

    Khayyat, Zuhair

    2015-06-02

    Data cleansing approaches have usually focused on detecting and fixing errors with little attention to scaling to big datasets. This presents a serious impediment since data cleansing often involves costly computations such as enumerating pairs of tuples, handling inequality joins, and dealing with user-defined functions. In this paper, we present BigDansing, a Big Data Cleansing system to tackle efficiency, scalability, and ease-of-use issues in data cleansing. The system can run on top of most common general purpose data processing platforms, ranging from DBMSs to MapReduce-like frameworks. A user-friendly programming interface allows users to express data quality rules both declaratively and procedurally, with no requirement of being aware of the underlying distributed platform. BigDansing translates these rules into a series of transformations that enable distributed computations and several optimizations, such as shared scans and specialized join operators. Experimental results on both synthetic and real datasets show that BigDansing outperforms existing baseline systems up to more than two orders of magnitude without sacrificing the quality provided by the repair algorithms.
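    As a hedged illustration of the kind of declarative quality rule such a system evaluates (not BigDansing's actual interface), the sketch below checks a functional dependency "zip determines city" by enumerating pairs of tuples that agree on zip but disagree on city; the table and column names are invented for the example.

```python
# Minimal sketch: detecting violations of a functional dependency (zip -> city).
from itertools import combinations

rows = [
    {"id": 1, "zip": "10001", "city": "New York"},
    {"id": 2, "zip": "10001", "city": "NYC"},      # conflicts with row 1
    {"id": 3, "zip": "60601", "city": "Chicago"},
]

def fd_violations(rows, lhs, rhs):
    """Return id pairs of rows that agree on `lhs` but differ on `rhs`."""
    return [(a["id"], b["id"])
            for a, b in combinations(rows, 2)
            if a[lhs] == b[lhs] and a[rhs] != b[rhs]]

print(fd_violations(rows, "zip", "city"))  # [(1, 2)]
```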

  16. Trends in IT Innovation to Build a Next Generation Bioinformatics Solution to Manage and Analyse Biological Big Data Produced by NGS Technologies

    Directory of Open Access Journals (Sweden)

    Alexandre G. de Brevern

    2015-01-01

    Full Text Available Sequencing the human genome began in 1994, and 10 years of work were necessary in order to provide a nearly complete sequence. Nowadays, NGS technologies allow sequencing of a whole human genome in a few days. This deluge of data challenges scientists in many ways, as they are faced with data management issues and analysis and visualization drawbacks due to the limitations of current bioinformatics tools. In this paper, we describe how the NGS Big Data revolution changes the way of managing and analysing data. We present how biologists are confronted with an abundance of methods, tools, and data formats. To overcome these problems, we focus on Big Data Information Technology innovations from the web and business intelligence. We underline the interest of NoSQL databases, which are much more efficient than relational databases. Since Big Data leads to the loss of interactivity with data during analysis due to high processing time, we describe solutions from Business Intelligence that allow one to regain interactivity whatever the volume of data is. We illustrate this point with a focus on the Amadea platform. Finally, we discuss visualization challenges posed by Big Data and present the latest innovations with JavaScript graphic libraries.

  17. Trends in IT Innovation to Build a Next Generation Bioinformatics Solution to Manage and Analyse Biological Big Data Produced by NGS Technologies

    Science.gov (United States)

    de Brevern, Alexandre G.; Meyniel, Jean-Philippe; Fairhead, Cécile; Neuvéglise, Cécile; Malpertuy, Alain

    2015-01-01

    Sequencing the human genome began in 1994, and 10 years of work were necessary in order to provide a nearly complete sequence. Nowadays, NGS technologies allow sequencing of a whole human genome in a few days. This deluge of data challenges scientists in many ways, as they are faced with data management issues and analysis and visualization drawbacks due to the limitations of current bioinformatics tools. In this paper, we describe how the NGS Big Data revolution changes the way of managing and analysing data. We present how biologists are confronted with an abundance of methods, tools, and data formats. To overcome these problems, we focus on Big Data Information Technology innovations from the web and business intelligence. We underline the interest of NoSQL databases, which are much more efficient than relational databases. Since Big Data leads to the loss of interactivity with data during analysis due to high processing time, we describe solutions from Business Intelligence that allow one to regain interactivity whatever the volume of data is. We illustrate this point with a focus on the Amadea platform. Finally, we discuss visualization challenges posed by Big Data and present the latest innovations with JavaScript graphic libraries. PMID:26125026

  18. Integrative Analysis of Omics Big Data.

    Science.gov (United States)

    Yu, Xiang-Tian; Zeng, Tao

    2018-01-01

    The diversity and sheer volume of omics data are taking biological and biomedical research and applications into a big data era, much as happened across society a decade ago. They open a new challenge, from horizontal data ensembles (e.g., similar types of data collected from different labs or companies) to vertical data ensembles (e.g., different types of data collected for a group of people with matched information), which requires integrative analysis in biology and biomedicine and calls for the development of data integration methods to address the shift from population-guided to individual-guided investigations. Data integration is an effective concept for solving complex problems and understanding complicated systems. Several benchmark studies have revealed the heterogeneity and trade-offs that exist in the analysis of omics data. Integrative analysis can combine and investigate many datasets in a cost-effective, reproducible way. Current integration approaches for biological data have two modes: one is a "bottom-up integration" mode with follow-up manual integration, and the other is a "top-down integration" mode with follow-up in silico integration. This paper first summarizes combinatory analysis approaches to give a candidate protocol for designing biological experiments for effective integrative studies of genomics, and then surveys data fusion approaches to give helpful guidance on developing computational models for detecting biological significance; these approaches have also provided new data resources and analysis tools to support precision medicine, which depends on big biomedical data. Finally, the problems and future directions are highlighted for integrative analysis of omics big data.

  19. Intelligent Decisional Assistant that Facilitates the Choice of a Proper Computer System Applied in Business

    OpenAIRE

    Nicolae MARGINEAN

    2009-01-01

    The choice of a proper computer system is not an easy task for a decision maker. One reason is the current state of the market for computer systems applied in business: the large number of players on the Romanian market results in a large number of software products with a multitude of different properties. Our proposal tries to optimize and facilitate this decision process within an e-shop that sells IT packages applied in business, by building an online decisional assistant, a special component ...

  20. Family genome browser: visualizing genomes with pedigree information.

    Science.gov (United States)

    Juan, Liran; Liu, Yongzhuang; Wang, Yongtian; Teng, Mingxiang; Zang, Tianyi; Wang, Yadong

    2015-07-15

    Families with inherited diseases are widely used in Mendelian/complex disease studies. Owing to the advances in high-throughput sequencing technologies, family genome sequencing is becoming more and more prevalent. Visualizing family genomes can greatly facilitate human genetics studies and personalized medicine. However, due to the complex genetic relationships and high similarities among genomes of consanguineous family members, family genomes are difficult to visualize in traditional genome visualization frameworks. How to visualize the family genome variants and their functions with integrated pedigree information remains a critical challenge. We developed the Family Genome Browser (FGB) to provide comprehensive analysis and visualization for family genomes. The FGB can visualize family genomes effectively at both the individual level and the variant level, through integrating genome data with pedigree information. Family genome analysis, including determination of the parental origin of variants, detection of de novo mutations, identification of potential recombination events and identical-by-descent segments, etc., can be performed flexibly. Diverse annotations for the family genome variants, such as dbSNP memberships, linkage disequilibrium, genes, variant effects, potential phenotypes, etc., are illustrated as well. Moreover, the FGB can automatically search de novo mutations and compound heterozygous variants for a selected individual, and guide investigators to find high-risk genes with flexible navigation options. These features enable users to investigate and understand family genomes intuitively and systematically. The FGB is available at http://mlg.hit.edu.cn/FGB/. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
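    As an illustration of one of the family-aware checks mentioned above, the sketch below flags candidate de novo mutations as variants where the child carries an allele seen in neither parent; the genotypes and variant identifiers are toy assumptions, not the FGB implementation.

```python
# Minimal sketch: naive de novo candidate detection from trio genotypes.
def de_novo_candidates(trio_genotypes):
    """trio_genotypes maps variant id -> (child, father, mother), each genotype
    being a tuple of two alleles such as ('A', 'G')."""
    candidates = []
    for var_id, (child, father, mother) in trio_genotypes.items():
        parental_alleles = set(father) | set(mother)
        novel = [allele for allele in child if allele not in parental_alleles]
        if novel:
            candidates.append((var_id, novel))
    return candidates

trio = {
    "chr1:12345": (("A", "G"), ("A", "A"), ("A", "A")),  # 'G' absent in parents
    "chr2:67890": (("C", "T"), ("C", "T"), ("C", "C")),  # 'T' inherited
}
print(de_novo_candidates(trio))  # [('chr1:12345', ['G'])]
```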

  1. Isolation, characterization and prevalence of a novel Gammaherpesvirus in Eptesicus fuscus, the North American big brown bat.

    Science.gov (United States)

    Subudhi, Sonu; Rapin, Noreen; Dorville, Nicole; Hill, Janet E; Town, Jennifer; Willis, Craig K R; Bollinger, Trent K; Misra, Vikram

    2018-03-01

    Little is known about the relationship of Gammaherpesviruses with their bat hosts. Gammaherpesviruses are of interest because of their long-term infection of lymphoid cells and their potential to cause cancer. Here, we report the characterization of a novel bat herpesvirus isolated from a big brown bat (Eptesicus fuscus) in Canada. The genome of the virus, tentatively named Eptesicus fuscus herpesvirus (EfHV), is 166,748 base pairs. Phylogenetically EfHV is a member of Gammaherpesvirinae, in which it belongs to the Genus Rhadinovirus and is closely related to other bat Gammaherpesviruses. In contrast to other known Gammaherpesviruses, the EfHV genome contains coding sequences similar to those of class I and II host major histocompatibility antigens. The virus is capable of infecting and replicating in human, monkey, cat and pig cell lines. Although we detected EfHV in 20 of 28 big brown bats tested, these bats lacked neutralizing antibodies against the virus. Copyright © 2018 Elsevier Inc. All rights reserved.

  2. Act-Frequency Signatures of the Big Five.

    Science.gov (United States)

    Chapman, Benjamin P; Goldberg, Lewis R

    2017-10-01

    The traditional focus of work on personality and behavior has tended toward "major outcomes" such as health or antisocial behavior, or small sets of behaviors observable over short periods in laboratories or in convenience samples. In a community sample, we examined a wide set (400) of mundane, incidental or "every day" behavioral acts, the frequencies of which were reported over the past year. An exploratory methodology similar to genomic approaches (relying on the False Discovery Rate) revealed 26 prototypical acts for Intellect, 24 acts for Extraversion, 13 for Emotional Stability, nine for Conscientiousness, and six for Agreeableness. Many links were consistent with general intuition: for instance, low Conscientiousness with work and procrastination. Some of the most robust associations, however, were for acts too specific for a priori hypotheses. For instance, Extraversion was strongly associated with telling dirty jokes, Intellect with "loung[ing] around [the] house without clothes on", and Agreeableness with singing in the shower. Frequency categories for these acts changed with marked non-linearity across Big Five Z-scores. Findings may help ground trait scores in emblematic acts, and enrich understanding of mundane or common behavioral signatures of the Big Five.
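    A minimal sketch of the kind of False Discovery Rate screen described above is given below: a Benjamini-Hochberg pass over p-values from many act-trait associations. The p-values are simulated placeholders and the code shows the generic procedure, not the authors' exact pipeline.

```python
# Minimal sketch: Benjamini-Hochberg FDR control over many association tests.
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean mask of p-values declared significant at FDR level alpha."""
    pvals = np.asarray(pvals)
    m = pvals.size
    order = np.argsort(pvals)
    ranked = pvals[order]
    # Largest k with p_(k) <= (k/m) * alpha; reject the k smallest p-values.
    below = ranked <= (np.arange(1, m + 1) / m) * alpha
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    mask = np.zeros(m, dtype=bool)
    mask[order[:k]] = True
    return mask

rng = np.random.default_rng(0)
pvals = np.concatenate([rng.uniform(0, 0.001, 10),   # a few real associations
                        rng.uniform(0, 1, 390)])     # mostly null acts
print("acts retained at FDR 0.05:", int(benjamini_hochberg(pvals).sum()))
```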

  3. Using Big Data to Discover Diagnostics and Therapeutics for Gastrointestinal and Liver Diseases

    Science.gov (United States)

    Wooden, Benjamin; Goossens, Nicolas; Hoshida, Yujin; Friedman, Scott L.

    2016-01-01

    Technologies such as genome sequencing, gene expression profiling, proteomic and metabolomic analyses, electronic medical records, and patient-reported health information have produced large amounts of data, from various populations, cell types, and disorders (big data). However, these data must be integrated and analyzed if they are to produce models or concepts about physiologic function or mechanisms of pathogenesis. Many of these data are available to the public, allowing researchers anywhere to search for markers of specific biologic processes or therapeutic targets for specific diseases or patient types. We review recent advances in the fields of computational and systems biology, and highlight opportunities for researchers to use big data sets in the fields of gastroenterology and hepatology, to complement traditional means of diagnostic and therapeutic discovery. PMID:27773806

  4. Facilitating functional annotation of chicken microarray data

    Directory of Open Access Journals (Sweden)

    Gresham Cathy R

    2009-10-01

    Full Text Available Abstract Background Modeling results from chicken microarray studies is challenging for researchers due to the little functional annotation associated with these arrays. The Affymetrix GeneChip chicken genome array, one of the biggest arrays that serve as a key research tool for the study of chicken functional genomics, is among the few arrays that link gene products to Gene Ontology (GO). However, the GO annotation data presented by Affymetrix are incomplete; for example, they do not show references linked to manually annotated functions. In addition, there is no tool that helps microarray researchers directly retrieve functional annotations for their datasets from the annotated arrays. This costs researchers a considerable amount of time searching multiple GO databases for functional information. Results We have improved the breadth of functional annotations of the gene products associated with probesets on the Affymetrix chicken genome array by 45% and the quality of annotation by 14%. We have also identified the most significant diseases and disorders, different types of genes, and known drug targets represented on the Affymetrix chicken genome array. To facilitate functional annotation of other arrays and microarray experimental datasets, we developed an Array GO Mapper (AGOM) tool to help researchers quickly retrieve corresponding functional information for their datasets. Conclusion Results from this study will directly facilitate annotation of other chicken arrays and microarray experimental datasets. Researchers will be able to quickly model their microarray dataset into more reliable biological functional information by using the AGOM tool. The diseases, disorders, gene types and drug targets revealed in the study will allow researchers to learn more about how genes function in complex biological systems and may lead to new drug discovery and the development of therapies. The GO annotation data generated will be available for public use via the AgBase website and

  5. How Big Data, Comparative Effectiveness Research, and Rapid-Learning Health-Care Systems Can Transform Patient Care in Radiation Oncology.

    Science.gov (United States)

    Sanders, Jason C; Showalter, Timothy N

    2018-01-01

    Big data and comparative effectiveness research methodologies can be applied within the framework of a rapid-learning health-care system (RLHCS) to accelerate discovery and to help turn the dream of fully personalized medicine into a reality. We synthesize recent advances in genomics with trends in big data to provide a forward-looking perspective on the potential of new advances to usher in an era of personalized radiation therapy, with emphases on the power of RLHCS to accelerate discovery and the future of individualized radiation treatment planning.

  6. Big science

    CERN Multimedia

    Nadis, S

    2003-01-01

    " "Big science" is moving into astronomy, bringing large experimental teams, multi-year research projects, and big budgets. If this is the wave of the future, why are some astronomers bucking the trend?" (2 pages).

  7. Entomological Collections in the Age of Big Data.

    Science.gov (United States)

    Short, Andrew Edward Z; Dikow, Torsten; Moreau, Corrie S

    2018-01-07

    With a million described species and more than half a billion preserved specimens, the large scale of insect collections is unequaled by those of any other group. Advances in genomics, collection digitization, and imaging have begun to more fully harness the power that such large data stores can provide. These new approaches and technologies have transformed how entomological collections are managed and utilized. While genomic research has fundamentally changed the way many specimens are collected and curated, advances in technology have shown promise for extracting sequence data from the vast holdings already in museums. Efforts to mainstream specimen digitization have taken root and have accelerated traditional taxonomic studies as well as distribution modeling and global change research. Emerging imaging technologies such as microcomputed tomography and confocal laser scanning microscopy are changing how morphology can be investigated. This review provides an overview of how the realization of big data has transformed our field and what may lie in store.

  8. Big bang and big crunch in matrix string theory

    OpenAIRE

    Bedford, J; Papageorgakis, C; Rodríguez-Gómez, D; Ward, J

    2007-01-01

    Following the holographic description of linear dilaton null Cosmologies with a Big Bang in terms of Matrix String Theory put forward by Craps, Sethi and Verlinde, we propose an extended background describing a Universe including both Big Bang and Big Crunch singularities. This belongs to a class of exact string backgrounds and is perturbative in the string coupling far away from the singularities, both of which can be resolved using Matrix String Theory. We provide a simple theory capable of...

  9. Large inserts for big data: artificial chromosomes in the genomic era.

    Science.gov (United States)

    Tocchetti, Arianna; Donadio, Stefano; Sosio, Margherita

    2018-05-01

    The exponential increase in available microbial genome sequences coupled with predictive bioinformatic tools is underscoring the genetic capacity of bacteria to produce an unexpectedly large number of specialized bioactive compounds. Since most of the biosynthetic gene clusters (BGCs) present in microbial genomes are cryptic, i.e. not expressed under laboratory conditions, a variety of cloning systems and vectors have been devised to harbor DNA fragments large enough to carry entire BGCs and to allow their transfer into suitable heterologous hosts. This minireview provides an overview of the vectors and approaches that have been developed for cloning large BGCs, and successful examples of heterologous expression.

  10. Bliver big data til big business?

    DEFF Research Database (Denmark)

    Ritter, Thomas

    2015-01-01

    Denmark has a digital infrastructure, a culture of registration, and IT-competent employees and customers, which make a leading position possible, but only if companies get ready for the next big data wave.

  11. Big data uncertainties.

    Science.gov (United States)

    Maugis, Pierre-André G

    2018-07-01

    Big data, the idea that an ever-larger volume of information is being constantly recorded, suggests that new problems can now be subjected to scientific scrutiny. However, can classical statistical methods be used directly on big data? We analyze the problem by looking at two known pitfalls of big datasets. First, that they are biased, in the sense that they do not offer a complete view of the populations under consideration. Second, that they present a weak but pervasive level of dependence between all their components. In both cases we observe that the uncertainty of the conclusions obtained by statistical methods is increased when they are used on big data, either because of a systematic error (bias), or because of a larger degree of randomness (increased variance). We argue that the key challenge raised by big data is not only how to use big data to tackle new problems, but also how to develop tools and methods able to rigorously articulate the new risks therein. Copyright © 2016. Published by Elsevier Ltd.
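
    The bias pitfall can be made concrete with a small simulation: a very large but non-representative sample produces a tight confidence interval around the wrong value, so more data increases apparent certainty without reducing the systematic error. This is an illustrative sketch only, not an analysis from the paper.

        import numpy as np

        rng = np.random.default_rng(0)

        # "True" population: mean 0.
        population = rng.normal(loc=0.0, scale=1.0, size=1_000_000)

        # A big but biased sample: observations are more likely to be recorded
        # when their value is large (a selection effect, as in many found datasets).
        keep_prob = 1 / (1 + np.exp(-population))        # higher values kept more often
        biased_sample = population[rng.random(population.size) < keep_prob]

        m = biased_sample.mean()
        se = biased_sample.std(ddof=1) / np.sqrt(biased_sample.size)
        print(f"n = {biased_sample.size}")
        print(f"estimate = {m:.3f} +/- {1.96 * se:.3f}   (true mean is 0)")
        # The interval is extremely narrow yet excludes the true value:
        # more data shrinks the variance but does nothing about the bias.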

  12. An Overview on “Cloud Control” Photogrammetry in Big Data Era

    OpenAIRE

    ZHANG Zuxun; TAO Pengjie

    2017-01-01

    In the present era of big data, photogrammetric image acquisition is becoming increasingly diverse, efficient and convenient, producing very large sets of photogrammetric image data. This in turn creates demand for advanced processing with a higher level of efficiency, automation and intelligence. However, the efficiency of fundamental photogrammetric processing, known as geometric positioning, is still largely restricted to control points acquired thro...

  13. FACS-Assisted CRISPR-Cas9 Genome Editing Facilitates Parkinson's Disease Modeling

    Directory of Open Access Journals (Sweden)

    Jonathan Arias-Fuenzalida

    2017-11-01

    Genome editing and human induced pluripotent stem cells hold great promise for the development of isogenic disease models and the correction of disease-associated mutations for isogenic tissue therapy. CRISPR-Cas9 has emerged as a versatile and simple tool for engineering human cells for such purposes. However, the current protocols to derive genome-edited lines require the screening of a great number of clones to obtain one free of random integration or of on-locus non-homologous end joining (NHEJ)-containing alleles. Here, we describe an efficient method to derive biallelic genome-edited populations by the use of fluorescent markers. We call this technique FACS-assisted CRISPR-Cas9 editing (FACE). FACE allows the derivation of correctly edited polyclones carrying a positive-selection fluorescent module and the exclusion of non-edited cells, random integrations, and on-target NHEJ-containing alleles. We derived a set of isogenic lines containing Parkinson's-disease-associated mutations in α-synuclein and present their comparative phenotypes.

  14. The life cycle of a genome project: perspectives and guidelines inspired by insect genome projects.

    Science.gov (United States)

    Papanicolaou, Alexie

    2016-01-01

    Many research programs on non-model species biology have been empowered by genomics. In turn, genomics is underpinned by a reference sequence and ancillary information created by so-called "genome projects". The most reliable genome projects are the ones created as part of an active research program and designed to address specific questions but their life extends past publication. In this opinion paper I outline four key insights that have facilitated maintaining genomic communities: the key role of computational capability, the iterative process of building genomic resources, the value of community participation and the importance of manual curation. Taken together, these ideas can and do ensure the longevity of genome projects and the growing non-model species community can use them to focus a discussion with regards to its future genomic infrastructure.

  15. Targeted viral-mediated plant genome editing using crispr/cas9

    KAUST Repository

    Mahfouz, Magdy M.; Ali, Zahir

    2015-01-01

    The present disclosure provides a viral-mediated genome-editing platform that facilitates multiplexing, obviates stable transformation, and is applicable across plant species. The RNA2 genome of the tobacco rattle virus (TRV) was engineered to carry and systemically deliver guide RNA molecules into plants overexpressing the Cas9 endonuclease. High genomic modification frequencies were observed in inoculated as well as systemic leaves, including the plant growing points. This system facilitates multiplexing and can lead to germinal transmission of the genomic modifications in the progeny, thereby obviating the requirement for repeated transformations and tissue culture. The editing platform of the disclosure is useful in plant genome engineering and applicable across plant species amenable to viral infection for agricultural biotechnology applications.

  16. Targeted viral-mediated plant genome editing using crispr/cas9

    KAUST Repository

    Mahfouz, Magdy M.

    2015-12-17

    The present disclosure provides a viral-mediated genome-editing platform that facilitates multiplexing, obviates stable transformation, and is applicable across plant species. The RNA2 genome of the tobacco rattle virus (TRV) was engineered to carry and systemically deliver guide RNA molecules into plants overexpressing the Cas9 endonuclease. High genomic modification frequencies were observed in inoculated as well as systemic leaves, including the plant growing points. This system facilitates multiplexing and can lead to germinal transmission of the genomic modifications in the progeny, thereby obviating the requirement for repeated transformations and tissue culture. The editing platform of the disclosure is useful in plant genome engineering and applicable across plant species amenable to viral infection for agricultural biotechnology applications.
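
    Guide selection for a Cas9 platform like the one described above starts with locating candidate target sites: 20-nt protospacers lying immediately 5' of an NGG PAM. The following is only an illustrative sketch of that scan on a made-up sequence (real guide design additionally scores on-target efficiency and genome-wide off-targets); it is not part of the disclosed platform.

        import re

        def find_cas9_sites(seq, protospacer_len=20):
            """Return (start, protospacer, PAM) for every NGG PAM on the + strand."""
            seq = seq.upper()
            sites = []
            # Any base followed by GG, with enough upstream sequence for the protospacer.
            for m in re.finditer(r"(?=([ACGT]GG))", seq):
                pam_start = m.start()
                if pam_start >= protospacer_len:
                    protospacer = seq[pam_start - protospacer_len:pam_start]
                    sites.append((pam_start - protospacer_len, protospacer, m.group(1)))
            return sites

        demo = "ATGGCTAGCTAGGATCCGATCGATCGTACGTAGCTAGCTAGGGATCGATCGATCGTTGG"
        for start, spacer, pam in find_cas9_sites(demo):
            print(start, spacer, pam)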

  17. Chromatin dynamics in genome stability

    DEFF Research Database (Denmark)

    Nair, Nidhi; Shoaib, Muhammad; Sørensen, Claus Storgaard

    2017-01-01

    Genomic DNA is compacted into chromatin through packaging with histone and non-histone proteins. Importantly, DNA accessibility is dynamically regulated to ensure genome stability. This is exemplified in the response to DNA damage, where chromatin relaxation near genomic lesions serves to promote access of relevant enzymes to specific DNA regions for signaling and repair. Furthermore, recent data highlight genome maintenance roles of chromatin through the regulation of endogenous DNA-templated processes including transcription and replication. Here, we review research that shows the importance of chromatin structure regulation in maintaining genome integrity by multiple mechanisms, including facilitating DNA repair and directly suppressing endogenous DNA damage.

  18. Big data - smart health strategies. Findings from the yearbook 2014 special theme.

    Science.gov (United States)

    Koutkias, V; Thiessard, F

    2014-08-15

    To select the best papers published in 2013 in the field of big data and smart health strategies, and to summarize outstanding research efforts. A systematic search was performed using two major bibliographic databases for relevant journal papers. The references obtained were reviewed in a two-stage process, starting with a blinded review performed by the two section editors, followed by a peer review process operated by external reviewers recognized as experts in the field. The complete review process selected four best papers illustrating various aspects of the special theme, among them: (a) using large volumes of unstructured data and, specifically, clinical notes from Electronic Health Records (EHRs) for pharmacovigilance; (b) knowledge discovery via querying large volumes of complex (both structured and unstructured) biological data using big data technologies and relevant tools; (c) methodologies for applying cloud computing and big data technologies in the field of genomics; and (d) system architectures enabling high-performance access to and processing of large datasets extracted from EHRs. The potential of big data in biomedicine has been pinpointed in various viewpoint papers and editorials. The review of the current scientific literature illustrated a variety of interesting methods and applications in the field, but the promises still exceed the current outcomes. As we get closer to a solid foundation with respect to a common understanding of the relevant concepts and technical aspects, and the use of standardized technologies and tools, we can anticipate reaching the potential that big data offer for personalized medicine and smart health strategies in the near future.

  19. HARNESSING BIG DATA VOLUMES

    Directory of Open Access Journals (Sweden)

    Bogdan DINU

    2014-04-01

    Big Data can revolutionize humanity. Hidden within the huge amounts and variety of the data we are creating, we may find information, facts, social insights and benchmarks that were once virtually impossible to find or simply did not exist. Large volumes of data allow organizations to tap in real time the full potential of all the internal or external information they possess. Big data calls for quick decisions and innovative ways to assist customers and society as a whole. Big data platforms and product portfolios will help customers harness the full value of big data volumes. This paper deals with technical and technological issues related to handling big data volumes in the Big Data environment.

  20. Big bang and big crunch in matrix string theory

    International Nuclear Information System (INIS)

    Bedford, J.; Ward, J.; Papageorgakis, C.; Rodriguez-Gomez, D.

    2007-01-01

    Following the holographic description of linear dilaton null cosmologies with a big bang in terms of matrix string theory put forward by Craps, Sethi, and Verlinde, we propose an extended background describing a universe including both big bang and big crunch singularities. This belongs to a class of exact string backgrounds and is perturbative in the string coupling far away from the singularities, both of which can be resolved using matrix string theory. We provide a simple theory capable of describing the complete evolution of this closed universe

  1. Big data a primer

    CERN Document Server

    Bhuyan, Prachet; Chenthati, Deepak

    2015-01-01

    This book is a collection of chapters written by experts on various aspects of big data. The book aims to explain what big data is and how it is stored and used. The book starts from  the fundamentals and builds up from there. It is intended to serve as a review of the state-of-the-practice in the field of big data handling. The traditional framework of relational databases can no longer provide appropriate solutions for handling big data and making it available and useful to users scattered around the globe. The study of big data covers a wide range of issues including management of heterogeneous data, big data frameworks, change management, finding patterns in data usage and evolution, data as a service, service-generated data, service management, privacy and security. All of these aspects are touched upon in this book. It also discusses big data applications in different domains. The book will prove useful to students, researchers, and practicing database and networking engineers.

  2. Transforming business models through big data in the textile industry

    DEFF Research Database (Denmark)

    Aagaard, Annabeth

    The extensive stream of work on business models (BM) and business model innovation (BMI) has generated many important insights (Amit & Zott, 2001; Osterwalder, 2004; Markides, 2008, 2013; Chesbrough 2010; Teece, 2010; Zott et al, 2011). Yet, our understanding of business models remains fragmented, as stressed by Zott et al. (2011), Weill et al. (2011) and David J. Teece (2010: 174), who states that "the concept of a business model lacks theoretical grounding in economics or in business studies". With the acceleration of digitization and the use of big data analytics, quality data have become accessible in established industries such as textile, and these developments have led to the disruption of established business models (Westerman et al., 2014; Weill and Woerner, 2015). Yet, little is known of the managerial process and facilitation of the digital transformation of business models through big data (McAfee and Brynjolfsson, 2012; Markus and Loebbecke, 2013).

  3. Finding cancer driver mutations in the era of big data research.

    Science.gov (United States)

    Poulos, Rebecca C; Wong, Jason W H

    2018-04-02

    In the last decade, the costs of genome sequencing have decreased considerably. The commencement of large-scale cancer sequencing projects has enabled cancer genomics to join the big data revolution. One of the challenges still facing cancer genomics research is determining which are the driver mutations in an individual cancer, as these contribute only a small subset of the overall mutation profile of a tumour. Focusing primarily on somatic single nucleotide mutations in this review, we consider both coding and non-coding driver mutations, and discuss how such mutations might be identified from cancer sequencing datasets. We describe some of the tools and databases that are available for the annotation of somatic variants and the identification of cancer driver genes. We also address the use of genome-wide variation in mutation load to establish background mutation rates from which to identify driver mutations under positive selection. Finally, we describe the ways in which mutational signatures can act as clues for the identification of cancer drivers, as these mutations may cause, or arise from, certain mutational processes. By defining the molecular changes responsible for driving cancer development, new cancer treatment strategies may be developed or novel preventative measures proposed.
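
    One of the strategies mentioned above, comparing a gene's observed mutation count against an expectation derived from the background mutation rate, reduces in its simplest form to a one-sided Poisson test per gene. The sketch below assumes a uniform background rate and uses invented counts; production tools additionally model trinucleotide context, expression, and replication timing.

        import math

        def poisson_sf(k, mu):
            """P(X >= k) for X ~ Poisson(mu)."""
            return 1.0 - sum(math.exp(-mu) * mu**i / math.factorial(i) for i in range(k))

        def driver_pvalue(observed, gene_bp, n_tumors, background_rate):
            """background_rate is mutations per bp per tumor (hypothetical value below)."""
            expected = background_rate * gene_bp * n_tumors
            return poisson_sf(observed, expected)

        # Hypothetical cohort: 500 tumors, background of 3 mutations per Mb per tumor.
        rate = 3 / 1e6
        genes = {"GENE_A": (2_000, 4), "GENE_B": (5_000, 41)}   # (coding bp, observed mutations)
        for name, (bp, obs) in genes.items():
            p = driver_pvalue(obs, bp, 500, rate)
            print(f"{name}: expected {rate * bp * 500:.1f}, observed {obs}, P = {p:.3g}")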

  4. Big data from electronic health records for early and late translational cardiovascular research: challenges and potential.

    Science.gov (United States)

    Hemingway, Harry; Asselbergs, Folkert W; Danesh, John; Dobson, Richard; Maniadakis, Nikolaos; Maggioni, Aldo; van Thiel, Ghislaine J M; Cronin, Maureen; Brobert, Gunnar; Vardas, Panos; Anker, Stefan D; Grobbee, Diederick E; Denaxas, Spiros

    2018-04-21

    Cohorts of millions of people's health records, whole genome sequencing, imaging, sensor, societal and publicly available data present a rapidly expanding digital trace of health. We aimed to critically review, for the first time, the challenges and potential of big data across early and late stages of translational cardiovascular disease research. We sought exemplars based on literature reviews and expertise across the BigData@Heart Consortium. We identified formidable challenges including: data quality, knowing what data exist, the legal and ethical framework for their use, data sharing, building and maintaining public trust, developing standards for defining disease, developing tools for scalable, replicable science and equipping the clinical and scientific work force with new inter-disciplinary skills. Opportunities claimed for big health record data include: richer profiles of health and disease from birth to death and from the molecular to the societal scale; accelerated understanding of disease causation and progression, discovery of new mechanisms and treatment-relevant disease sub-phenotypes, understanding health and diseases in whole populations and whole health systems and returning actionable feedback loops to improve (and potentially disrupt) existing models of research and care, with greater efficiency. In early translational research we identified exemplars including: discovery of fundamental biological processes e.g. linking exome sequences to lifelong electronic health records (EHR) (e.g. human knockout experiments); drug development: genomic approaches to drug target validation; precision medicine: e.g. DNA integrated into hospital EHR for pre-emptive pharmacogenomics. In late translational research we identified exemplars including: learning health systems with outcome trials integrated into clinical care; citizen driven health with 24/7 multi-parameter patient monitoring to improve outcomes and population-based linkages of multiple EHR sources

  5. Microsoft big data solutions

    CERN Document Server

    Jorgensen, Adam; Welch, John; Clark, Dan; Price, Christopher; Mitchell, Brian

    2014-01-01

    Tap the power of Big Data with Microsoft technologies Big Data is here, and Microsoft's new Big Data platform is a valuable tool to help your company get the very most out of it. This timely book shows you how to use HDInsight along with HortonWorks Data Platform for Windows to store, manage, analyze, and share Big Data throughout the enterprise. Focusing primarily on Microsoft and HortonWorks technologies but also covering open source tools, Microsoft Big Data Solutions explains best practices, covers on-premises and cloud-based solutions, and features valuable case studies. Best of all,

  6. An artificial intelligence approach fit for tRNA gene studies in the era of big sequence data.

    Science.gov (United States)

    Iwasaki, Yuki; Abe, Takashi; Wada, Kennosuke; Wada, Yoshiko; Ikemura, Toshimichi

    2017-09-12

    Unsupervised data mining capable of extracting a wide range of knowledge from big data without prior knowledge or particular models is a timely application in the era of big sequence data accumulation in genome research. By handling oligonucleotide compositions as high-dimensional data, we have previously modified the conventional self-organizing map (SOM) for genome informatics and established BLSOM, which can analyze more than ten million sequences simultaneously. Here, we develop BLSOM specialized for tRNA genes (tDNAs) that can cluster (self-organize) more than one million microbial tDNAs according to their cognate amino acid solely depending on tetra- and pentanucleotide compositions. This unsupervised clustering can reveal combinatorial oligonucleotide motifs that are responsible for the amino acid-dependent clustering, as well as other functionally and structurally important consensus motifs, which have been evolutionarily conserved. BLSOM is also useful for identifying tDNAs as phylogenetic markers for special phylotypes. When we constructed BLSOM with 'species-unknown' tDNAs from metagenomic sequences plus 'species-known' microbial tDNAs, a large portion of metagenomic tDNAs self-organized with species-known tDNAs, yielding information on microbial communities in environmental samples. BLSOM can also enhance accuracy in the tDNA database obtained from big sequence data. This unsupervised data mining should become important for studying numerous functionally unclear RNAs obtained from a wide range of organisms.
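
    The input to a BLSOM is simply each sequence's relative tetra- or pentanucleotide composition treated as a high-dimensional vector. A minimal sketch of building such composition vectors is shown below with toy sequences; the actual study feeds more than a million tDNA sequences into the self-organizing map.

        from itertools import product

        def kmer_vector(seq, k=4):
            """Relative frequencies of all 4**k k-mers, in a fixed order."""
            seq = seq.upper()
            kmers = ["".join(p) for p in product("ACGT", repeat=k)]
            counts = dict.fromkeys(kmers, 0)
            total = 0
            for i in range(len(seq) - k + 1):
                kmer = seq[i:i + k]
                if kmer in counts:          # skip windows containing N or other symbols
                    counts[kmer] += 1
                    total += 1
            return [counts[kmer] / total if total else 0.0 for kmer in kmers]

        # Toy tDNA-like sequences; each becomes a 256-dimensional feature vector.
        seqs = {"tDNA_1": "GGGCCCGTAGCTCAGTTGGTAGAGCA" * 3,
                "tDNA_2": "GCGGATTTAGCTCAGTTGGGAGAGCG" * 3}
        vectors = {name: kmer_vector(s) for name, s in seqs.items()}
        print(len(vectors["tDNA_1"]))   # 256 tetranucleotide frequencies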

  7. Summary big data

    CERN Document Server

    2014-01-01

    This work offers a summary of the book "Big Data: A Revolution That Will Transform How We Live, Work, and Think" by Viktor Mayer-Schönberger and Kenneth Cukier. The summary explains that big data is where we use huge quantities of data to make better predictions, based on the fact that we identify patterns in the data rather than trying to understand the underlying causes in more detail. This summary highlights that big data will be a source of new economic value and innovation in the future. Moreover, it shows that it will

  8. The Perennial Ryegrass GenomeZipper: Targeted Use of Genome Resources for Comparative Grass Genomics

    Science.gov (United States)

    Pfeifer, Matthias; Martis, Mihaela; Asp, Torben; Mayer, Klaus F.X.; Lübberstedt, Thomas; Byrne, Stephen; Frei, Ursula; Studer, Bruno

    2013-01-01

    Whole-genome sequences established for model and major crop species constitute a key resource for advanced genomic research. For outbreeding forage and turf grass species like ryegrasses (Lolium spp.), such resources have yet to be developed. Here, we present a model of the perennial ryegrass (Lolium perenne) genome on the basis of conserved synteny to barley (Hordeum vulgare) and the model grass genome Brachypodium (Brachypodium distachyon) as well as rice (Oryza sativa) and sorghum (Sorghum bicolor). A transcriptome-based genetic linkage map of perennial ryegrass served as a scaffold to establish the chromosomal arrangement of syntenic genes from model grass species. This scaffold revealed a high degree of synteny and macrocollinearity and was then utilized to anchor a collection of perennial ryegrass genes in silico to their predicted genome positions. This resulted in the unambiguous assignment of 3,315 out of 8,876 previously unmapped genes to the respective chromosomes. In total, the GenomeZipper incorporates 4,035 conserved grass gene loci, which were used for the first genome-wide sequence divergence analysis between perennial ryegrass, barley, Brachypodium, rice, and sorghum. The perennial ryegrass GenomeZipper is an ordered, information-rich genome scaffold, facilitating map-based cloning and genome assembly in perennial ryegrass and closely related Poaceae species. It also represents a milestone in describing synteny between perennial ryegrass and fully sequenced model grass genomes, thereby increasing our understanding of genome organization and evolution in the most important temperate forage and turf grass species. PMID:23184232
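
    The in-silico anchoring step can be pictured as placing an unmapped gene between the two mapped markers whose model-genome orthologs flank its own ortholog, and interpolating a position. The sketch below uses invented marker and ortholog coordinates purely to illustrate the idea; it is not the GenomeZipper algorithm itself.

        # Hypothetical scaffold: mapped ryegrass markers with (linkage group, cM) positions
        # and the position (in Mb) of their ortholog on a barley chromosome.
        scaffold = [  # (marker, linkage group, cM, barley_Mb)
            ("M1", "LG3", 10.0, 5.0),
            ("M2", "LG3", 22.0, 18.0),
            ("M3", "LG3", 35.0, 40.0),
        ]

        def anchor(gene_barley_mb, scaffold):
            """Place an unmapped gene between the two mapped markers flanking its barley ortholog."""
            scaffold = sorted(scaffold, key=lambda m: m[3])
            for (n1, lg, cm1, mb1), (n2, _, cm2, mb2) in zip(scaffold, scaffold[1:]):
                if mb1 <= gene_barley_mb <= mb2:
                    frac = (gene_barley_mb - mb1) / (mb2 - mb1)
                    return lg, cm1 + frac * (cm2 - cm1), (n1, n2)
            return None

        # An unmapped ryegrass gene whose barley ortholog sits at 30 Mb:
        print(anchor(30.0, scaffold))   # ('LG3', ~29.1 cM, flanked by M2 and M3)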

  9. Mouse Genome Informatics (MGI)

    Data.gov (United States)

    U.S. Department of Health & Human Services — MGI is the international database resource for the laboratory mouse, providing integrated genetic, genomic, and biological data to facilitate the study of human...

  10. Between Two Fern Genomes

    Science.gov (United States)

    2014-01-01

    Ferns are the only major lineage of vascular plants not represented by a sequenced nuclear genome. This lack of genome sequence information significantly impedes our ability to understand and reconstruct genome evolution not only in ferns, but across all land plants. Azolla and Ceratopteris are ideal and complementary candidates to be the first ferns to have their nuclear genomes sequenced. They differ dramatically in genome size, life history, and habit, and thus represent the immense diversity of extant ferns. Together, this pair of genomes will facilitate myriad large-scale comparative analyses across ferns and all land plants. Here we review the unique biological characteristics of ferns and describe a number of outstanding questions in plant biology that will benefit from the addition of ferns to the set of taxa with sequenced nuclear genomes. We explain why the fern clade is pivotal for understanding genome evolution across land plants, and we provide a rationale for how knowledge of fern genomes will enable progress in research beyond the ferns themselves. PMID:25324969

  11. The Scope of Big Data in One Medicine: Unprecedented Opportunities and Challenges

    Directory of Open Access Journals (Sweden)

    Molly E. McCue

    2017-11-01

    Advances in high-throughput molecular biology and electronic health records (EHR), coupled with increasing computer capabilities, have resulted in an increased interest in the use of big data in health care. Big data require collection and analysis of data at an unprecedented scale and represent a paradigm shift in health care, offering (1) the capacity to generate new knowledge more quickly than traditional scientific approaches; (2) unbiased collection and analysis of data; and (3) a holistic understanding of biology and pathophysiology. Big data promises more personalized and precision medicine for patients with improved accuracy and earlier diagnosis, and therapy tailored to an individual’s unique combination of genes, environmental risk, and precise disease phenotype. This promise comes from data collected from numerous sources, ranging from molecules to cells, to tissues, to individuals and populations, and from the integration of these data into networks that improve understanding of health and disease. Big data-driven science should play a role in propelling comparative medicine and “one medicine” (i.e., the shared physiology, pathophysiology, and disease risk factors across species) forward. Merging of data from EHR across institutions will give access to patient data on a scale previously unimaginable, allowing for precise phenotype definition and objective evaluation of risk factors and response to therapy. High-throughput molecular data will give insight into previously unexplored molecular pathophysiology and disease etiology. Investigation and integration of big data from a variety of sources will result in stronger parallels drawn at the molecular level between human and animal disease, allow for predictive modeling of infectious disease and identification of key areas of intervention, and facilitate step-changes in our understanding of disease that can make a substantial impact on animal and human health. However, the use of big data

  12. The Scope of Big Data in One Medicine: Unprecedented Opportunities and Challenges.

    Science.gov (United States)

    McCue, Molly E; McCoy, Annette M

    2017-01-01

    Advances in high-throughput molecular biology and electronic health records (EHR), coupled with increasing computer capabilities, have resulted in an increased interest in the use of big data in health care. Big data require collection and analysis of data at an unprecedented scale and represent a paradigm shift in health care, offering (1) the capacity to generate new knowledge more quickly than traditional scientific approaches; (2) unbiased collection and analysis of data; and (3) a holistic understanding of biology and pathophysiology. Big data promises more personalized and precision medicine for patients with improved accuracy and earlier diagnosis, and therapy tailored to an individual's unique combination of genes, environmental risk, and precise disease phenotype. This promise comes from data collected from numerous sources, ranging from molecules to cells, to tissues, to individuals and populations, and from the integration of these data into networks that improve understanding of health and disease. Big data-driven science should play a role in propelling comparative medicine and "one medicine" (i.e., the shared physiology, pathophysiology, and disease risk factors across species) forward. Merging of data from EHR across institutions will give access to patient data on a scale previously unimaginable, allowing for precise phenotype definition and objective evaluation of risk factors and response to therapy. High-throughput molecular data will give insight into previously unexplored molecular pathophysiology and disease etiology. Investigation and integration of big data from a variety of sources will result in stronger parallels drawn at the molecular level between human and animal disease, allow for predictive modeling of infectious disease and identification of key areas of intervention, and facilitate step-changes in our understanding of disease that can make a substantial impact on animal and human health. However, the use of big data comes with significant

  13. Perspectives of Integrative Cancer Genomics in Next Generation Sequencing Era

    Directory of Open Access Journals (Sweden)

    So Mee Kwon

    2012-06-01

    The explosive development of genomics technologies, including microarrays and next-generation sequencing (NGS), has provided comprehensive maps of cancer genomes, including the expression of mRNAs and microRNAs, DNA copy numbers, sequence variations, and epigenetic changes. These genome-wide profiles of genetic aberrations could reveal candidates for diagnostic and/or prognostic biomarkers as well as mechanistic insights into tumor development and progression. Recent efforts to establish the huge cancer genome compendium and integrative omics analyses, so-called "integromics", have extended our understanding of the cancer genome, showing its daunting complexity and heterogeneity. However, the challenges of the structured integration, sharing, and interpretation of the big omics data still remain to be resolved. Here, we review several issues raised in cancer omics data analysis, including NGS, focusing particularly on study design and analysis strategies. This might be helpful in understanding the current trends and strategies of the rapidly evolving cancer genomics research.

  14. Synergy between Medical Informatics and Bioinformatics: Facilitating Genomic Medicine for Future Health Care

    Czech Academy of Sciences Publication Activity Database

    Martin-Sanchez, F.; Iakovidis, I.; Norager, S.; Maojo, V.; de Groen, P.; Van der Lei, J.; Jones, T.; Abraham-Fuchs, K.; Apweiler, R.; Babic, A.; Baud, R.; Breton, V.; Cinquin, P.; Doupi, P.; Dugas, M.; Eils, R.; Engelbrecht, R.; Ghazal, P.; Jehenson, P.; Kulikowski, C.; Lampe, K.; De Moor, G.; Orphanoudakis, S.; Rossing, N.; Sarachan, B.; Sousa, A.; Spekowius, G.; Thireos, G.; Zahlmann, G.; Zvárová, Jana; Hermosilla, I.; Vicente, F. J.

    2004-01-01

    Vol. 37 (2004), pp. 30-42. ISSN 1532-0464. Institutional research plan: CEZ:AV0Z1030915. Keywords: bioinformatics; medical informatics; genomics; genomic medicine; biomedical informatics. Subject RIV: BD - Theory of Information. Impact factor: 1.013, year: 2004

  15. Big Data en surveillance, deel 1 : Definities en discussies omtrent Big Data

    NARCIS (Netherlands)

    Timan, Tjerk

    2016-01-01

    Following a (fairly short) lecture on surveillance and Big Data, I was asked to go deeper into the theme, the definitions, and the various questions that relate to big data. In this first part I will try to set out a number of these matters concerning Big Data theory and

  16. Comparative Genomic Analysis of Rapid Evolution of an Extreme-Drug-Resistant Acinetobacter baumannii Clone

    Science.gov (United States)

    Tan, Sean Yang-Yi; Chua, Song Lin; Liu, Yang; Høiby, Niels; Andersen, Leif Percival; Givskov, Michael; Song, Zhijun; Yang, Liang

    2013-01-01

    The emergence of extreme-drug-resistant (EDR) bacterial strains in hospital and nonhospital clinical settings is a big and growing public health threat. Understanding the antibiotic resistance mechanisms at the genomic levels can facilitate the development of next-generation agents. Here, comparative genomics has been employed to analyze the rapid evolution of an EDR Acinetobacter baumannii clone from the intensive care unit (ICU) of Rigshospitalet at Copenhagen. Two resistant A. baumannii strains, 48055 and 53264, were sequentially isolated from two individuals who had been admitted to ICU within a 1-month interval. Multilocus sequence typing indicates that these two isolates belonged to ST208. The A. baumannii 53264 strain gained colistin resistance compared with the 48055 strain and became an EDR strain. Genome sequencing indicates that A. baumannii 53264 and 48055 have almost identical genomes—61 single-nucleotide polymorphisms (SNPs) were found between them. The A. baumannii 53264 strain was assembled into 130 contigs, with a total length of 3,976,592 bp with 38.93% GC content. The A. baumannii 48055 strain was assembled into 135 contigs, with a total length of 4,049,562 bp with 39.00% GC content. Genome comparisons showed that this A. baumannii clone is classified as an International clone II strain and has 94% synteny with the A. baumannii ACICU strain. The ResFinder server identified a total of 14 antibiotic resistance genes in the A. baumannii clone. Proteomic analyses revealed that a putative porin protein was down-regulated when A. baumannii 53264 was exposed to antimicrobials, which may reduce the entry of antibiotics into the bacterial cell. PMID:23538992
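
    The headline figures in this record, 61 SNPs between the two isolates and GC contents near 39%, come from simple per-base operations once the genomes are assembled and aligned. A toy sketch of both calculations on short made-up fragments (real comparisons operate on whole-genome alignments, not strings like these):

        def gc_content(seq):
            seq = seq.upper()
            return 100.0 * sum(base in "GC" for base in seq) / len(seq)

        def count_snps(aligned_a, aligned_b):
            """Count substitutions between two equal-length aligned sequences (gaps ignored)."""
            return sum(a != b and a != "-" and b != "-"
                       for a, b in zip(aligned_a.upper(), aligned_b.upper()))

        # Invented aligned fragments standing in for the 48055 and 53264 assemblies.
        frag_48055 = "ATGGCTAGCTAGGATCCGATCGATCGTACG"
        frag_53264 = "ATGGCTAGCTAGGATCCGATCAATCGTACG"   # one substitution
        print(f"GC content of fragment: {gc_content(frag_48055):.1f}%")
        print(f"SNPs in fragment: {count_snps(frag_48055, frag_53264)}")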

  17. Sequence based polymorphic (SBP marker technology for targeted genomic regions: its application in generating a molecular map of the Arabidopsis thaliana genome

    Directory of Open Access Journals (Sweden)

    Sahu Binod B

    2012-01-01

    Background: Molecular markers facilitate both genotype identification, essential for modern animal and plant breeding, and the isolation of genes based on their map positions. Advancements in sequencing technology have made possible the identification of single nucleotide polymorphisms (SNPs) for any genomic region. Here, a sequence based polymorphic (SBP) marker technology for generating molecular markers for targeted genomic regions in Arabidopsis is described. Results: A ~3X genome coverage sequence of the Arabidopsis thaliana ecotype Niederzenz (Nd-0) was obtained by applying Illumina's sequencing by synthesis (Solexa) technology. Comparison of the Nd-0 genome sequence with the assembled Columbia-0 (Col-0) genome sequence identified putative single nucleotide polymorphisms (SNPs) throughout the entire genome. Multiple 75 base pair Nd-0 sequence reads containing SNPs and originating from individual genomic DNA molecules were the basis for developing co-dominant SBP markers. SNP-containing Col-0 sequences, supported by transcript sequences or sequences from multiple BAC clones, were compared to the respective Nd-0 sequences to identify possible restriction endonuclease enzyme site variations. Small amplicons, PCR amplified from both ecotypes, were digested with suitable restriction enzymes and resolved on a gel to reveal the sequence based polymorphisms. By applying this technology, 21 SBP markers representing polymorphisms between the Col-0 and Nd-0 ecotypes were generated for the marker-poor regions of the Arabidopsis map. Conclusions: The SBP marker technology described here allowed the development of molecular markers for targeted genomic regions of Arabidopsis. It should facilitate the isolation of co-dominant molecular markers for targeted genomic regions of any animal or plant species whose genomic sequences have been assembled. This technology will particularly facilitate the development of high density molecular marker maps, essential for
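
    Turning a SNP into a co-dominant SBP marker hinges on whether the SNP creates or destroys a restriction enzyme recognition site in one ecotype but not the other; small amplicons are then digested and resolved on a gel, as described above. A hedged sketch of that in-silico screening step, using an invented flanking sequence and a few real recognition motifs:

        # Hypothetical flanking sequence around a Col-0/Nd-0 SNP (C vs T at the middle
        # position); the enzyme motifs are real, but the sequences are made up.
        enzymes = {"EcoRI": "GAATTC", "HindIII": "AAGCTT", "TaqI": "TCGA"}

        flank_5p, flank_3p = "GGTACCGAATT", "GGATCCAAGCT"
        alleles = {"Col-0": flank_5p + "C" + flank_3p,
                   "Nd-0":  flank_5p + "T" + flank_3p}

        for enzyme, motif in enzymes.items():
            cut = {eco: motif in seq for eco, seq in alleles.items()}
            if cut["Col-0"] != cut["Nd-0"]:
                print(f"{enzyme} site polymorphic: cuts only "
                      f"{'Col-0' if cut['Col-0'] else 'Nd-0'} -> candidate SBP marker")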

  18. Big Biomedical data as the key resource for discovery science

    Energy Technology Data Exchange (ETDEWEB)

    Toga, Arthur W.; Foster, Ian; Kesselman, Carl; Madduri, Ravi; Chard, Kyle; Deutsch, Eric W.; Price, Nathan D.; Glusman, Gustavo; Heavner, Benjamin D.; Dinov, Ivo D.; Ames, Joseph; Van Horn, John; Kramer, Roger; Hood, Leroy

    2015-07-21

    Modern biomedical data collection is generating exponentially more data in a multitude of formats. This flood of complex data poses significant opportunities to discover and understand the critical interplay among such diverse domains as genomics, proteomics, metabolomics, and phenomics, including imaging, biometrics, and clinical data. The Big Data for Discovery Science Center is taking an “-ome to home” approach to discover linkages between these disparate data sources by mining existing databases of proteomic and genomic data, brain images, and clinical assessments. In support of this work, the authors developed new technological capabilities that make it easy for researchers to manage, aggregate, manipulate, integrate, and model large amounts of distributed data. Guided by biological domain expertise, the Center’s computational resources and software will reveal relationships and patterns, aiding researchers in identifying biomarkers for the most confounding conditions and diseases, such as Parkinson’s and Alzheimer’s.

  19. KGCAK: a K-mer based database for genome-wide phylogeny and complexity evaluation.

    Science.gov (United States)

    Wang, Dapeng; Xu, Jiayue; Yu, Jun

    2015-09-16

    The K-mer approach, treating genomic sequences as simple characters and counting the relative abundance of each string upon a fixed K, has been extensively applied to phylogeny inference for genome assembly, annotation, and comparison. To meet increasing demands for comparing large genome sequences and to promote the use of the K-mer approach, we develop a versatile database, KGCAK ( http://kgcak.big.ac.cn/KGCAK/ ), containing ~8,000 genomes that include genome sequences of diverse life forms (viruses, prokaryotes, protists, animals, and plants) and cellular organelles of eukaryotic lineages. It builds phylogeny based on genomic elements in an alignment-free fashion and provides in-depth data processing enabling users to compare the complexity of genome sequences based on K-mer distribution. We hope that KGCAK becomes a powerful tool for exploring relationship within and among groups of species in a tree of life based on genomic data.
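
    The alignment-free comparison underlying such a K-mer database reduces each genome to a K-mer frequency vector and derives a distance matrix from those vectors, from which a tree can be built. A minimal sketch of the pairwise-distance step on toy sequences (Euclidean distance is used here for illustration; the resource's actual metric may differ):

        from itertools import product
        import math

        def kmer_freqs(seq, k=4):
            """Normalized counts of all 4**k k-mers, in a fixed order."""
            kmers = ["".join(p) for p in product("ACGT", repeat=k)]
            counts = dict.fromkeys(kmers, 0)
            for i in range(len(seq) - k + 1):
                if seq[i:i + k] in counts:
                    counts[seq[i:i + k]] += 1
            total = sum(counts.values()) or 1
            return [c / total for c in counts.values()]

        def euclidean(u, v):
            return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

        # Toy "genomes"; real inputs are whole genome or organelle sequences.
        genomes = {"org_A": "ATGCGC" * 200,
                   "org_B": "ATGCGC" * 180 + "TTTAAA" * 20,
                   "org_C": "TTTAAA" * 200}
        vecs = {name: kmer_freqs(seq) for name, seq in genomes.items()}
        names = list(genomes)
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                print(f"d({a}, {b}) = {euclidean(vecs[a], vecs[b]):.4f}")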

  20. Big Data in der Cloud

    DEFF Research Database (Denmark)

    Leimbach, Timo; Bachlechner, Daniel

    2014-01-01

    Technology assessment of big data, in particular cloud based big data services, for the Office for Technology Assessment at the German federal parliament (Bundestag).

  1. An analysis of cross-sectional differences in big and non-big public accounting firms' audit programs

    NARCIS (Netherlands)

    Blokdijk, J.H. (Hans); Drieenhuizen, F.; Stein, M.T.; Simunic, D.A.

    2006-01-01

    A significant body of prior research has shown that audits by the Big 5 (now Big 4) public accounting firms are quality differentiated relative to non-Big 5 audits. This result can be derived analytically by assuming that Big 5 and non-Big 5 firms face different loss functions for "audit failures"

  2. Big Data is invading big places as CERN

    CERN Multimedia

    CERN. Geneva

    2017-01-01

    Big Data technologies are becoming more popular with the constant growth of data generation in different fields such as social networks, the internet of things and laboratories like CERN. How is CERN making use of such technologies? How is machine learning applied at CERN with Big Data technologies? How much data do we move and how is it analyzed? All these questions will be answered during the talk.

  3. The big bang

    International Nuclear Information System (INIS)

    Chown, Marcus.

    1987-01-01

    The paper concerns the 'Big Bang' theory of the creation of the Universe 15 thousand million years ago, and traces events which physicists predict occurred soon after the creation. The unified theory of the moment of creation, evidence of an expanding Universe, the X-boson (the particle produced very soon after the big bang, which vanished from the Universe one-hundredth of a second later), and the fate of the Universe are all discussed. (U.K.)

  4. Performance Analysis Tool for HPC and Big Data Applications on Scientific Clusters

    Energy Technology Data Exchange (ETDEWEB)

    Yoo, Wucherl [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Koo, Michelle [Univ. of California, Berkeley, CA (United States); Cao, Yu [California Inst. of Technology (CalTech), Pasadena, CA (United States); Sim, Alex [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Nugent, Peter [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Univ. of California, Berkeley, CA (United States); Wu, Kesheng [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

    2016-09-17

    Big data is prevalent in HPC computing. Many HPC projects rely on complex workflows to analyze terabytes or petabytes of data. These workflows often require running over thousands of CPU cores and performing simultaneous data accesses, data movements, and computation. It is challenging to analyze the performance involving terabytes or petabytes of workflow data or measurement data of the executions, from complex workflows over a large number of nodes and multiple parallel task executions. To help identify performance bottlenecks or debug the performance issues in large-scale scientific applications and scientific clusters, we have developed a performance analysis framework, using state-of-the-art open-source big data processing tools. Our tool can ingest system logs and application performance measurements to extract key performance features, and apply the most sophisticated statistical tools and data mining methods on the performance data. It utilizes an efficient data processing engine to allow users to interactively analyze a large amount of different types of logs and measurements. To illustrate the functionality of the big data analysis framework, we conduct case studies on the workflows from an astronomy project known as the Palomar Transient Factory (PTF) and the job logs from a genome analysis scientific cluster. Our study processed many terabytes of system logs and application performance measurements collected on the HPC systems at NERSC. The implementation of our tool is generic enough to be used for analyzing the performance of other HPC systems and big data workflows.
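
    Purely as an illustration of the kind of feature extraction described, the sketch below parses per-task timing records out of job-log lines and summarizes them. The log format, field names, and values are invented, and the real framework runs on distributed big-data engines rather than a single Python process.

        import re
        from collections import defaultdict
        from statistics import mean

        # Invented log format: "<timestamp> task=<name> node=<id> duration=<seconds>"
        log_lines = [
            "2016-09-01T02:11:04 task=photometry node=n017 duration=182.4",
            "2016-09-01T02:13:22 task=photometry node=n021 duration=240.9",
            "2016-09-01T02:14:05 task=alignment  node=n017 duration=35.1",
        ]

        pattern = re.compile(r"task=(\S+)\s+node=(\S+)\s+duration=([\d.]+)")
        durations = defaultdict(list)
        for line in log_lines:
            m = pattern.search(line)
            if m:
                task, _, secs = m.groups()
                durations[task].append(float(secs))

        for task, values in sorted(durations.items()):
            print(f"{task}: n={len(values)} mean={mean(values):.1f}s max={max(values):.1f}s")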

  5. The proactive historian: Methodological opportunities presented by the new archives documenting genomics.

    Science.gov (United States)

    García-Sancho, Miguel

    2016-02-01

    In this paper, I propose a strategy for navigating newly available archives in the study of late-twentieth century genomics. I demonstrate that the alleged 'explosion of data' characteristic of genomics, and of contemporary science in general, is not a new problem and that historians of earlier periods have dealt with information overload by relying on the 'perspective of time': the filtering effect the passage of time naturally exerts on both sources and memories. I argue that this reliance on the selective capacity of time results in inheriting archives curated by others and, consequently, poses the risk of reifying ahistorical scientific discourses. Through a preliminary examination of archives documenting early attempts at mapping and sequencing the human genome, I propose an alternative approach in which historians proactively problematize and improve available sources. This approach provides historians with a voice in the socio-political management of scientific heritage and advances methodological innovations in the use of oral histories. It also provides a narrative framework in which to address big science initiatives by following second-order administrators, rather than individual scientists. The new genomic archives thus represent an opportunity for historians to take an active role in current debates concerning 'big data' and critically embed the humanities in pressing global problems. Copyright © 2015 Elsevier Ltd. All rights reserved.

  6. Small Big Data Congress 2017

    NARCIS (Netherlands)

    Doorn, J.

    2017-01-01

    TNO, in collaboration with the Big Data Value Center, presents the fourth Small Big Data Congress! Our congress aims at providing an overview of practical and innovative applications based on big data. Do you want to know what is happening in applied research with big data? And what can already be

  7. Big data opportunities and challenges

    CERN Document Server

    2014-01-01

    This ebook aims to give practical guidance for all those who want to understand big data better and learn how to make the most of it. Topics range from big data analysis, mobile big data and managing unstructured data to technologies, governance and intellectual property and security issues surrounding big data.

  8. Big Data and Neuroimaging.

    Science.gov (United States)

    Webb-Vargas, Yenny; Chen, Shaojie; Fisher, Aaron; Mejia, Amanda; Xu, Yuting; Crainiceanu, Ciprian; Caffo, Brian; Lindquist, Martin A

    2017-12-01

    Big Data are of increasing importance in a variety of areas, especially in the biosciences. There is an emerging critical need for Big Data tools and methods, because of the potential impact of advancements in these areas. Importantly, statisticians and statistical thinking have a major role to play in creating meaningful progress in this arena. We would like to emphasize this point in this special issue, as it highlights both the dramatic need for statistical input for Big Data analysis and for a greater number of statisticians working on Big Data problems. We use the field of statistical neuroimaging to demonstrate these points. As such, this paper covers several applications and novel methodological developments of Big Data tools applied to neuroimaging data.

  9. Big Data; A Management Revolution : The emerging role of big data in businesses

    OpenAIRE

    Blasiak, Kevin

    2014-01-01

    Big data is a term that was coined in 2012 and has since emerged as one of the top trends in business and technology. Big data is an agglomeration of different technologies resulting in data processing capabilities that were unreached before. Big data is generally characterized by three factors: volume, velocity and variety. These three factors distinguish it from traditional data use. The possibilities to utilize this technology are vast. Big data technology has touch points in differ...

  10. Baculoviral delivery of CRISPR/Cas9 facilitates efficient genome editing in human cells

    NARCIS (Netherlands)

    Hindriksen, Sanne; Bramer, Arne J; Truong, My Anh; Vromans, Martijn J M; Post, Jasmin B; Verlaan-Klink, Ingrid; Snippert, Hugo J; Lens, Susanne M A; Hadders, Michael A

    2017-01-01

    The CRISPR/Cas9 system is a highly effective tool for genome editing. Key to robust genome editing is the efficient delivery of the CRISPR/Cas9 machinery. Viral delivery systems are efficient vehicles for the transduction of foreign genes but commonly used viral vectors suffer from a limited

  11. Personal Spaces in Public Repositories as a Facilitator for Open Educational Resource Usage

    Science.gov (United States)

    Cohen, Anat; Reisman, Sorel; Sperling, Barbra Bied

    2015-01-01

    Learning object repositories are a shared, open and public space; however, the possibility of personal expression in an open, global, public space is crucial. The aim of this study is to explore personal spaces in a big learning object repository as a facilitator for the adoption of Open Educational Resources (OER) into teaching practices…

  12. A Call to Investigate the Relationship Between Education and Health Outcomes Using Big Data.

    Science.gov (United States)

    Chahine, Saad; Kulasegaram, Kulamakan Mahan; Wright, Sarah; Monteiro, Sandra; Grierson, Lawrence E M; Barber, Cassandra; Sebok-Syer, Stefanie S; McConnell, Meghan; Yen, Wendy; De Champlain, Andre; Touchie, Claire

    2018-06-01

    There exists an assumption that improving medical education will improve patient care. While seemingly logical, this premise has rarely been investigated. In this Invited Commentary, the authors propose the use of big data to test this assumption. The authors present a few example research studies linking education and patient care outcomes and argue that using big data may more easily facilitate the process needed to investigate this assumption. The authors also propose that collaboration is needed to link educational and health care data. They then introduce a grassroots initiative, inclusive of universities in one Canadian province and national licensing organizations that are working together to collect, organize, link, and analyze big data to study the relationship between pedagogical approaches to medical training and patient care outcomes. While the authors acknowledge the possible challenges and issues associated with harnessing big data, they believe that the benefits supersede these. There is a need for medical education research to go beyond the outcomes of training to study practice and clinical outcomes as well. Without a coordinated effort to harness big data, policy makers, regulators, medical educators, and researchers are left with sometimes costly guesses and assumptions about what works and what does not. As the social, time, and financial investments in medical education continue to increase, it is imperative to understand the relationship between education and health outcomes.

  13. Social big data mining

    CERN Document Server

    Ishikawa, Hiroshi

    2015-01-01

    Social Media. Big Data and Social Data. Hypotheses in the Era of Big Data. Social Big Data Applications. Basic Concepts in Data Mining. Association Rule Mining. Clustering. Classification. Prediction. Web Structure Mining. Web Content Mining. Web Access Log Mining, Information Extraction and Deep Web Mining. Media Mining. Scalability and Outlier Detection.

  14. Development and Validation of Big Four Personality Scales for the Schedule for Nonadaptive and Adaptive Personality-2nd Edition (SNAP-2)

    Science.gov (United States)

    Calabrese, William R.; Rudick, Monica M.; Simms, Leonard J.; Clark, Lee Anna

    2012-01-01

    Recently, integrative, hierarchical models of personality and personality disorder (PD)—such as the Big Three, Big Four and Big Five trait models—have gained support as a unifying dimensional framework for describing PD. However, no measures to date can simultaneously represent each of these potentially interesting levels of the personality hierarchy. To unify these measurement models psychometrically, we sought to develop Big Five trait scales within the Schedule for Nonadaptive and Adaptive Personality–2nd Edition (SNAP-2). Through structural and content analyses, we examined relations between the SNAP-2, Big Five Inventory (BFI), and NEO-Five Factor Inventory (NEO-FFI) ratings in a large data set (N = 8,690), including clinical, military, college, and community participants. Results yielded scales consistent with the Big Four model of personality (i.e., Neuroticism, Conscientiousness, Introversion, and Antagonism) and not the Big Five, as there were insufficient items related to Openness. The resulting scale scores demonstrated strong internal consistency and temporal stability. Structural and external validity were supported by strong convergent and discriminant validity patterns between Big Four scale scores and other personality trait scores and by expectable patterns of self-peer agreement. Descriptive statistics and community-based norms are provided. The SNAP-2 Big Four Scales enable researchers and clinicians to assess personality at multiple levels of the trait hierarchy and facilitate comparisons among competing "Big Trait" models. PMID:22250598
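
    The internal consistency reported for such scales is conventionally summarized with Cronbach's alpha, which depends only on the item variances and the variance of the scale total. A minimal sketch of that computation on made-up item responses (not SNAP-2 data):

        import numpy as np

        def cronbach_alpha(items):
            """items: respondents x items matrix of scores."""
            items = np.asarray(items, dtype=float)
            k = items.shape[1]
            item_vars = items.var(axis=0, ddof=1)
            total_var = items.sum(axis=1).var(ddof=1)
            return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

        # Hypothetical responses: 6 respondents x 4 items on a 1-5 scale.
        responses = [[4, 5, 4, 5],
                     [2, 2, 3, 2],
                     [5, 4, 5, 5],
                     [1, 2, 1, 2],
                     [3, 3, 4, 3],
                     [4, 4, 4, 5]]
        print(f"alpha = {cronbach_alpha(responses):.2f}")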

  15. Gene network inherent in genomic big data improves the accuracy of prognostic prediction for cancer patients.

    Science.gov (United States)

    Kim, Yun Hak; Jeong, Dae Cheon; Pak, Kyoungjune; Goh, Tae Sik; Lee, Chi-Seung; Han, Myoung-Eun; Kim, Ji-Young; Liangwen, Liu; Kim, Chi Dae; Jang, Jeon Yeob; Cha, Wonjae; Oh, Sae-Ock

    2017-09-29

    Accurate prediction of prognosis is critical for therapeutic decisions regarding cancer patients. Many previously developed prognostic scoring systems have limitations in reflecting recent progress in the field of cancer biology such as microarray, next-generation sequencing, and signaling pathways. To develop a new prognostic scoring system for cancer patients, we used mRNA expression and clinical data in various independent breast cancer cohorts (n=1214) from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) and Gene Expression Omnibus (GEO). A new prognostic score that reflects gene network inherent in genomic big data was calculated using Network-Regularized high-dimensional Cox-regression (Net-score). We compared its discriminatory power with those of two previously used statistical methods: stepwise variable selection via univariate Cox regression (Uni-score) and Cox regression via Elastic net (Enet-score). The Net scoring system showed better discriminatory power in prediction of disease-specific survival (DSS) than other statistical methods (p=0 in METABRIC training cohort, p=0.000331, 4.58e-06 in two METABRIC validation cohorts) when accuracy was examined by log-rank test. Notably, comparison of C-index and AUC values in receiver operating characteristic analysis at 5 years showed fewer differences between training and validation cohorts with the Net scoring system than other statistical methods, suggesting minimal overfitting. The Net-based scoring system also successfully predicted prognosis in various independent GEO cohorts with high discriminatory power. In conclusion, the Net-based scoring system showed better discriminative power than previous statistical methods in prognostic prediction for breast cancer patients. This new system will mark a new era in prognosis prediction for cancer patients.
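
    The discrimination statistic cited above, the concordance index (C-index), is the fraction of usable patient pairs in which the model assigns the higher risk to the patient whose event occurs first. A self-contained sketch of Harrell's C on toy data; this is not the study's Net-score implementation.

        def concordance_index(times, events, risks):
            """Harrell's C: fraction of usable pairs ordered correctly by predicted risk.
            times  - observed follow-up times
            events - 1 if the event (e.g. death) was observed, 0 if censored
            risks  - predicted risk scores (higher = worse prognosis)
            """
            concordant, usable = 0.0, 0
            n = len(times)
            for i in range(n):
                for j in range(n):
                    # Pair is usable if i has an observed event before j's follow-up ends.
                    if events[i] == 1 and times[i] < times[j]:
                        usable += 1
                        if risks[i] > risks[j]:
                            concordant += 1
                        elif risks[i] == risks[j]:
                            concordant += 0.5
            return concordant / usable if usable else float("nan")

        # Toy cohort: survival months, event indicator, predicted risk score.
        times  = [5, 12, 20, 26, 30]
        events = [1,  1,  0,  1,  0]
        risks  = [0.9, 0.7, 0.4, 0.1, 0.2]
        print(f"C-index = {concordance_index(times, events, risks):.2f}")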

  16. Cryptography for Big Data Security

    Science.gov (United States)

    2015-07-13

    Book chapter prepared for Big Data: Storage, Sharing, and Security (3S). Distribution A: Public Release. Authors include Ariel Hamlin and Nabil... (contact: arkady@ll.mit.edu). Chapter 1, Cryptography for Big Data Security, opens its introduction: With the amount

  17. Acceptability of Big Books as Mother Tongue-based Reading Materials in Bulusan Dialect

    Directory of Open Access Journals (Sweden)

    Magdalena M. Ocbian

    2015-11-01

    Several studies have proven the superiority of using the mother tongue in improving pupils' performance. Research results revealed that using a language familiar to the pupils facilitates reading, writing and learning new concepts. However, at present, teachers are confronted with the insufficiency of instructional materials written in the local dialect and accepted by the end-users as possessing the qualities that could produce the desired learning outcomes. This descriptive evaluative research was conducted to address this problem. It determined the level of acceptability of six researcher-made big books as mother tongue-based reading materials in the Bulusan dialect for Grade 1 pupils. The big books were utilized by 11 Grade 1 teachers of Bulusan District with their pupils and were evaluated along the dimensions of suitability and appropriateness of the materials, visual appeal, and quality of the story, using a checklist and an open-ended questionnaire. The same materials were assessed by eight expert jurors. Findings showed that the big books possessed the desired qualities, making them very much acceptable to the Grade 1 teachers and much acceptable to the expert jurors. The comments and suggestions of the respondents served as inputs in the enhancement and revision of the six big books.

  18. Data: Big and Small.

    Science.gov (United States)

    Jones-Schenk, Jan

    2017-02-01

    Big data is a big topic in all leadership circles. Leaders in professional development must develop an understanding of what data are available across the organization that can inform effective planning for forecasting. Collaborating with others to integrate data sets can increase the power of prediction. Big data alone is insufficient to make big decisions. Leaders must find ways to access small data and triangulate multiple types of data to ensure the best decision making. J Contin Educ Nurs. 2017;48(2):60-61. Copyright 2017, SLACK Incorporated.

  19. Challenges and opportunities for genomic developmental neuropsychology: Examples from the Penn-Drexel collaborative battery

    Science.gov (United States)

    Gur, Ruben C.; Irani, Farzin; Seligman, Sarah; Calkins, Monica E.; Richard, Jan; Gur, Raquel E.

    2014-01-01

    Genomics has been revolutionizing medicine over the past decade by offering mechanistic insights into disease processes and heralding the age of “individualized medicine.” Because of the sheer number of measures generated by gene sequencing methods, genomics requires “Big Science” where large datasets on genes are analyzed in reference to electronic medical record data. This revolution has largely bypassed the behavioral neurosciences, mainly because of the paucity of behavioral data in medical records and the labor intensity of available neuropsychological assessment methods. We describe the development and implementation of an efficient neuroscience-based computerized battery, coupled with a computerized clinical assessment procedure. This assessment package has been applied to a genomic study of 10,000 children aged 8-21, of whom 1000 also undergo neuroimaging. Results from the first 3000 participants indicate sensitivity to neurodevelopmental trajectories. Sex differences were evident, with females outperforming males in memory and social cognition domains, while for spatial processing males were more accurate and faster, and they were faster on simple motor tasks. The study illustrates what will hopefully become a major component of the work of clinical and research neuropsychologists as invaluable participants in the dawning age of Big Science neuropsychological genomics. PMID:21902564

  20. Big Data Revisited

    DEFF Research Database (Denmark)

    Kallinikos, Jannis; Constantiou, Ioanna

    2015-01-01

    We elaborate on key issues of our paper New games, new rules: big data and the changing context of strategy as a means of addressing some of the concerns raised by the paper’s commentators. We initially deal with the issue of social data and the role it plays in the current data revolution and the technological recording of facts. We further discuss the significance of the very mechanisms by which big data is produced, as distinct from the attributes of big data often discussed in the literature. In the final section of the paper, we qualify the alleged importance of algorithms and claim that the structures of data capture and the architectures in which data generation is embedded are fundamental to the phenomenon of big data.

  1. [Contribution and challenges of Big Data in oncology].

    Science.gov (United States)

    Saintigny, Pierre; Foy, Jean-Philippe; Ferrari, Anthony; Cassier, Philippe; Viari, Alain; Puisieux, Alain

    2017-03-01

    Since the first draft of the human genome sequence published in 2001, the cost of sequencing has dramatically decreased. The development of new technologies such as next generation sequencing led to a comprehensive characterization of a large number of tumors of various types as well as to significant advances in precision medicine. Despite the valuable information this technological revolution has allowed to produce, the vast amount of data generated resulted in the emergence of new challenges for the biomedical community, such as data storage, processing and mining. Here, we describe the contribution and challenges of Big Data in oncology. Copyright © 2016 Société Française du Cancer. Published by Elsevier Masson SAS. All rights reserved.

  2. Big Data Analytics An Overview

    Directory of Open Access Journals (Sweden)

    Jayshree Dwivedi

    2015-08-01

    Full Text Available Big data refers to data that exceeds conventional storage capacity and processing power. The term is used for data sets so large or complex that traditional data-processing tools cannot handle them. The size that qualifies as big data is a constantly moving target, ranging year by year from a few dozen terabytes to many petabytes; sources such as social networking sites show how rapidly the amount of data produced by people grows every year. Big data is not only data: it has become a complete subject that includes various tools, techniques, and frameworks, and it encompasses the rapid growth and evolution of both structured and unstructured data. Big data is a set of techniques and technologies that require new forms of integration to uncover large hidden values from datasets that are diverse, complex, and of massive scale. Such data is difficult to work with using most relational database management systems and desktop statistics and visualization packages, and instead demands massively parallel software running on tens, hundreds, or even thousands of servers. A big data environment is used to capture, organize, and resolve these various types of data. In this paper we describe applications, problems, and tools of big data and give an overview of the field.

  3. Urbanising Big

    DEFF Research Database (Denmark)

    Ljungwall, Christer

    2013-01-01

    Development in China raises the question of how big a city can become, and at the same time be sustainable, writes Christer Ljungwall of the Swedish Agency for Growth Policy Analysis.

  4. Big bang nucleosynthesis

    International Nuclear Information System (INIS)

    Boyd, Richard N.

    2001-01-01

    The precision of measurements in modern cosmology has made huge strides in recent years, with measurements of the cosmic microwave background and the determination of the Hubble constant now rivaling the level of precision of the predictions of big bang nucleosynthesis. However, these results are not necessarily consistent with the predictions of the Standard Model of big bang nucleosynthesis. Reconciling these discrepancies may require extensions of the basic tenets of the model, and possibly of the reaction rates that determine the big bang abundances

  5. Development and validation of Big Four personality scales for the Schedule for Nonadaptive and Adaptive Personality--Second Edition (SNAP-2).

    Science.gov (United States)

    Calabrese, William R; Rudick, Monica M; Simms, Leonard J; Clark, Lee Anna

    2012-09-01

    Recently, integrative, hierarchical models of personality and personality disorder (PD)--such as the Big Three, Big Four, and Big Five trait models--have gained support as a unifying dimensional framework for describing PD. However, no measures to date can simultaneously represent each of these potentially interesting levels of the personality hierarchy. To unify these measurement models psychometrically, we sought to develop Big Five trait scales within the Schedule for Nonadaptive and Adaptive Personality--Second Edition (SNAP-2). Through structural and content analyses, we examined relations between the SNAP-2, the Big Five Inventory (BFI), and the NEO Five-Factor Inventory (NEO-FFI) ratings in a large data set (N = 8,690), including clinical, military, college, and community participants. Results yielded scales consistent with the Big Four model of personality (i.e., Neuroticism, Conscientiousness, Introversion, and Antagonism) rather than the Big Five, as there were insufficient items related to Openness. The resulting scale scores demonstrated strong internal consistency and temporal stability. Structural and external validity were supported by strong convergent and discriminant validity patterns between Big Four scale scores and other personality trait scores, and by expectable patterns of self-peer agreement. Descriptive statistics and community-based norms are provided. The SNAP-2 Big Four Scales enable researchers and clinicians to assess personality at multiple levels of the trait hierarchy and facilitate comparisons among competing big-trait models. PsycINFO Database Record (c) 2012 APA, all rights reserved.
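
    Since the resulting scale scores are described as showing strong internal consistency, a brief hedged sketch of how one such statistic (Cronbach's alpha) is computed may help; the item responses below are simulated and are not SNAP-2 data.

        # Minimal sketch of Cronbach's alpha on simulated item responses.
        import numpy as np

        def cronbach_alpha(items: np.ndarray) -> float:
            # items: respondents x items matrix of scores
            k = items.shape[1]
            item_vars = items.var(axis=0, ddof=1).sum()
            total_var = items.sum(axis=1).var(ddof=1)
            return (k / (k - 1)) * (1 - item_vars / total_var)

        rng = np.random.default_rng(3)
        latent = rng.normal(size=(300, 1))                          # shared trait
        responses = latent + rng.normal(scale=0.8, size=(300, 8))   # 8 correlated items
        print("alpha:", round(cronbach_alpha(responses), 2))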

  6. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea

    Energy Technology Data Exchange (ETDEWEB)

    Bowers, Robert M.; Kyrpides, Nikos C.; Stepanauskas, Ramunas; Harmon-Smith, Miranda; Doud, Devin; Reddy, T. B. K.; Schulz, Frederik; Jarett, Jessica; Rivers, Adam R.; Eloe-Fadrosh, Emiley A.; Tringe, Susannah G.; Ivanova, Natalia N.; Copeland, Alex; Clum, Alicia; Becraft, Eric D.; Malmstrom, Rex R.; Birren, Bruce; Podar, Mircea; Bork, Peer; Weinstock, George M.; Garrity, George M.; Dodsworth, Jeremy A.; Yooseph, Shibu; Sutton, Granger; Glöckner, Frank O.; Gilbert, Jack A.; Nelson, William C.; Hallam, Steven J.; Jungbluth, Sean P.; Ettema, Thijs J. G.; Tighe, Scott; Konstantinidis, Konstantinos T.; Liu, Wen-Tso; Baker, Brett J.; Rattei, Thomas; Eisen, Jonathan A.; Hedlund, Brian; McMahon, Katherine D.; Fierer, Noah; Knight, Rob; Finn, Rob; Cochrane, Guy; Karsch-Mizrachi, Ilene; Tyson, Gene W.; Rinke, Christian; Kyrpides, Nikos C.; Schriml, Lynn; Garrity, George M.; Hugenholtz, Philip; Sutton, Granger; Yilmaz, Pelin; Meyer, Folker; Glöckner, Frank O.; Gilbert, Jack A.; Knight, Rob; Finn, Rob; Cochrane, Guy; Karsch-Mizrachi, Ilene; Lapidus, Alla; Meyer, Folker; Yilmaz, Pelin; Parks, Donovan H.; Eren, A. M.; Schriml, Lynn; Banfield, Jillian F.; Hugenholtz, Philip; Woyke, Tanja

    2017-08-08

    We present two standards developed by the Genomic Standards Consortium (GSC) for reporting bacterial and archaeal genome sequences. Both are extensions of the Minimum Information about Any (x) Sequence (MIxS). The standards are the Minimum Information about a Single Amplified Genome (MISAG) and the Minimum Information about a Metagenome-Assembled Genome (MIMAG); both specify reporting fields including, but not limited to, assembly quality and estimates of genome completeness and contamination. These standards can be used in combination with other GSC checklists, including the Minimum Information about a Genome Sequence (MIGS), Minimum Information about a Metagenomic Sequence (MIMS), and Minimum Information about a Marker Gene Sequence (MIMARKS). Community-wide adoption of MISAG and MIMAG will facilitate more robust comparative genomic analyses of bacterial and archaeal diversity.
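
    As a hedged sketch of how such completeness and contamination estimates are commonly applied, the snippet below assigns a draft-quality tier using widely cited MIMAG thresholds; the full standard also requires rRNA and tRNA evidence for the high-quality tier, which this simplified check omits, and the input values are hypothetical.

        # Minimal sketch; consult the MIMAG standard itself for the complete criteria.
        def mimag_tier(completeness: float, contamination: float) -> str:
            if completeness > 90 and contamination < 5:
                return "high-quality draft (rRNA/tRNA checks still required)"
            if completeness >= 50 and contamination < 10:
                return "medium-quality draft"
            if contamination < 10:
                return "low-quality draft"
            return "does not meet MIMAG draft criteria"

        print(mimag_tier(93.4, 2.1))   # hypothetical CheckM-style estimates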

  7. The ethics of big data in big agriculture

    OpenAIRE

    Carbonell (Isabelle M.)

    2016-01-01

    This paper examines the ethics of big data in agriculture, focusing on the power asymmetry between farmers and large agribusinesses like Monsanto. Following the recent purchase of Climate Corp., Monsanto is currently the most prominent biotech agribusiness to buy into big data. With wireless sensors on tractors monitoring or dictating every decision a farmer makes, Monsanto can now aggregate large quantities of previously proprietary farming data, enabling a privileged position with unique in...

  8. Improving public transport decision making, planning and operations by using big data : Cases from Sweden and the Netherlands

    NARCIS (Netherlands)

    Van Oort, N.; Cats, O.

    2015-01-01

    New big data sources in the public transport industry make it possible to deal with major challenges such as elevating efficiency, increasing passenger ridership and satisfaction, and facilitating the information flow between service providers and service users. This paper presents two actual cases from the

  9. The big data-big model (BDBM) challenges in ecological research

    Science.gov (United States)

    Luo, Y.

    2015-12-01

    The field of ecology has become a big-data science in the past decades due to the development of new sensors used in numerous studies in the ecological community. Many sensor networks have been established to collect data. For example, satellites, such as Terra and OCO-2 among others, have collected data relevant to the global carbon cycle. Thousands of field manipulative experiments have been conducted to examine the feedback of the terrestrial carbon cycle to global changes. Networks of observations, such as FLUXNET, have measured land processes. In particular, the implementation of the National Ecological Observatory Network (NEON), which is designed to network different kinds of sensors at many locations over the nation, will generate large volumes of ecological data every day. The raw data from sensors from those networks offer an unprecedented opportunity for accelerating advances in our knowledge of ecological processes, educating teachers and students, supporting decision-making, testing ecological theory, and forecasting changes in ecosystem services. Currently, ecologists do not have the infrastructure in place to synthesize massive yet heterogeneous data into resources for decision support. It is urgent to develop an ecological forecasting system that can make the best use of multiple sources of data to assess long-term biosphere change and anticipate future states of ecosystem services at regional and continental scales. Forecasting relies on big models that describe major processes that underlie complex system dynamics. Ecological system models, despite great simplification of the real systems, are still complex in order to address real-world problems. For example, the Community Land Model (CLM) incorporates thousands of processes related to energy balance, hydrology, and biogeochemistry. Integration of massive data from multiple big data sources with complex models has to tackle Big Data-Big Model (BDBM) challenges. Those challenges include interoperability of multiple

  10. A Big Video Manifesto

    DEFF Research Database (Denmark)

    Mcilvenny, Paul Bruce; Davidsen, Jacob

    2017-01-01

    For the last few years, we have witnessed a hype about the potential results and insights that quantitative big data can bring to the social sciences. The wonder of big data has moved into education, traffic planning, and disease control with a promise of making things better with big numbers and beautiful visualisations. However, we also need to ask what the tools of big data can do both for the Humanities and for more interpretative approaches and methods. Thus, we prefer to explore how the power of computation, new sensor technologies and massive storage can also help with video-based qualitative research.

  11. Identifying Dwarfs Workloads in Big Data Analytics

    OpenAIRE

    Gao, Wanling; Luo, Chunjie; Zhan, Jianfeng; Ye, Hainan; He, Xiwen; Wang, Lei; Zhu, Yuqing; Tian, Xinhui

    2015-01-01

    Big data benchmarking is particularly important and provides applicable yardsticks for evaluating booming big data systems. However, wide coverage and great complexity of big data computing impose big challenges on big data benchmarking. How can we construct a benchmark suite using a minimum set of units of computation to represent diversity of big data analytics workloads? Big data dwarfs are abstractions of extracting frequently appearing operations in big data computing. One dwarf represen...

  12. Applications of Big Data in Education

    OpenAIRE

    Faisal Kalota

    2015-01-01

    Big Data and analytics have gained a huge momentum in recent years. Big Data feeds into the field of Learning Analytics (LA) that may allow academic institutions to better understand the learners' needs and proactively address them. Hence, it is important to have an understanding of Big Data and its applications. The purpose of this descriptive paper is to provide an overview of Big Data, the technologies used in Big Data, and some of the applications of Big Data in educa...

  13. Advancing biomarker research: utilizing 'Big Data' approaches for the characterization and prevention of bipolar disorder.

    Science.gov (United States)

    McIntyre, Roger S; Cha, Danielle S; Jerrell, Jeanette M; Swardfager, Walter; Kim, Rachael D; Costa, Leonardo G; Baskaran, Anusha; Soczynska, Joanna K; Woldeyohannes, Hanna O; Mansur, Rodrigo B; Brietzke, Elisa; Powell, Alissa M; Gallaugher, Ashley; Kudlow, Paul; Kaidanovich-Beilin, Oksana; Alsuwaidan, Mohammad

    2014-08-01

    To provide a strategic framework for the prevention of bipolar disorder (BD) that incorporates a 'Big Data' approach to risk assessment for BD. Computerized databases (e.g., Pubmed, PsychInfo, and MedlinePlus) were used to access English-language articles published between 1966 and 2012 with the search terms bipolar disorder, prodrome, 'Big Data', and biomarkers cross-referenced with genomics/genetics, transcriptomics, proteomics, metabolomics, inflammation, oxidative stress, neurotrophic factors, cytokines, cognition, neurocognition, and neuroimaging. Papers were selected from the initial search if the primary outcome(s) of interest was (were) categorized in any of the following domains: (i) 'omics' (e.g., genomics), (ii) molecular, (iii) neuroimaging, and (iv) neurocognitive. The current strategic approach to identifying individuals at risk for BD, with an emphasis on phenotypic information and family history, has insufficient predictive validity and is clinically inadequate. The heterogeneous clinical presentation of BD, as well as its pathoetiological complexity, suggests that it is unlikely that a single biomarker (or an exclusive biomarker approach) will sufficiently augment currently inadequate phenotypic-centric prediction models. We propose a 'Big Data'- bioinformatics approach that integrates vast and complex phenotypic, anamnestic, behavioral, family, and personal 'omics' profiling. Bioinformatic processing approaches, utilizing cloud- and grid-enabled computing, are now capable of analyzing data on the order of tera-, peta-, and exabytes, providing hitherto unheard of opportunities to fundamentally revolutionize how psychiatric disorders are predicted, prevented, and treated. High-throughput networks dedicated to research on, and the treatment of, BD, integrating both adult and younger populations, will be essential to sufficiently enroll adequate samples of individuals across the neurodevelopmental trajectory in studies to enable the characterization

  14. Big Data Semantics

    NARCIS (Netherlands)

    Ceravolo, Paolo; Azzini, Antonia; Angelini, Marco; Catarci, Tiziana; Cudré-Mauroux, Philippe; Damiani, Ernesto; Mazak, Alexandra; van Keulen, Maurice; Jarrar, Mustafa; Santucci, Giuseppe; Sattler, Kai-Uwe; Scannapieco, Monica; Wimmer, Manuel; Wrembel, Robert; Zaraket, Fadi

    2018-01-01

    Big Data technology has discarded traditional data modeling approaches as no longer applicable to distributed data processing. It is, however, largely recognized that Big Data impose novel challenges in data and infrastructure management. Indeed, multiple components and procedures must be

  15. Application of Genomic Tools in Plant Breeding

    OpenAIRE

    Pérez-de-Castro, A.M.; Vilanova, S.; Cañizares, J.; Pascual, L.; Blanca, J.M.; Díez, M.J.; Prohens, J.; Picó, B.

    2012-01-01

    Plant breeding has been very successful in developing improved varieties using conventional tools and methodologies. Nowadays, the availability of genomic tools and resources is leading to a new revolution of plant breeding, as they facilitate the study of the genotype and its relationship with the phenotype, in particular for complex traits. Next Generation Sequencing (NGS) technologies are allowing the mass sequencing of genomes and transcriptomes, which is producing a vast array of genomic...

  16. Comparative validity of brief to medium-length Big Five and Big Six personality questionnaires

    NARCIS (Netherlands)

    Thalmayer, A.G.; Saucier, G.; Eigenhuis, A.

    2011-01-01

    A general consensus on the Big Five model of personality attributes has been highly generative for the field of personality psychology. Many important psychological and life outcome correlates with Big Five trait dimensions have been established. But researchers must choose between multiple Big Five

  17. Building a model: developing genomic resources for common milkweed (Asclepias syriaca) with low coverage genome sequencing.

    Science.gov (United States)

    Straub, Shannon C K; Fishbein, Mark; Livshultz, Tatyana; Foster, Zachary; Parks, Matthew; Weitemier, Kevin; Cronn, Richard C; Liston, Aaron

    2011-05-04

    Milkweeds (Asclepias L.) have been extensively investigated in diverse areas of evolutionary biology and ecology; however, there are few genetic resources available to facilitate and complement these studies. This study explored how low coverage genome sequencing of the common milkweed (Asclepias syriaca L.) could be useful in characterizing the genome of a plant without prior genomic information and for development of genomic resources as a step toward further developing A. syriaca as a model in ecology and evolution. A 0.5× genome of A. syriaca was produced using Illumina sequencing. A virtually complete chloroplast genome of 158,598 bp was assembled, revealing few repeats and loss of three genes: accD, clpP, and ycf1. A nearly complete rDNA cistron (18S-5.8S-26S; 7,541 bp) and 5S rDNA (120 bp) sequence were obtained. Assessment of polymorphism revealed that the rDNA cistron and 5S rDNA had 0.3% and 26.7% polymorphic sites, respectively. A partial mitochondrial genome sequence (130,764 bp), with identical gene content to tobacco, was also assembled. An initial characterization of repeat content indicated that Ty1/copia-like retroelements are the most common repeat type in the milkweed genome. At least one A. syriaca microread hit 88% of Catharanthus roseus (Apocynaceae) unigenes (median coverage of 0.29×) and 66% of single copy orthologs (COSII) in asterids (median coverage of 0.14×). From this partial characterization of the A. syriaca genome, markers for population genetics (microsatellites) and phylogenetics (low-copy nuclear genes) studies were developed. The results highlight the promise of next generation sequencing for development of genomic resources for any organism. Low coverage genome sequencing allows characterization of the high copy fraction of the genome and exploration of the low copy fraction of the genome, which facilitate the development of molecular tools for further study of a target species and its relatives. This study represents a first
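
    As a hedged illustration of the polymorphism figures quoted above, the sketch below computes the percentage of polymorphic sites across a set of aligned reads; the sequences are toy data, not A. syriaca reads.

        # Minimal sketch: percent polymorphic sites in a toy alignment.
        def percent_polymorphic(aligned_seqs):
            length = len(aligned_seqs[0])
            polymorphic = sum(
                1 for i in range(length)
                if len({s[i] for s in aligned_seqs} - {"-", "N"}) > 1
            )
            return 100.0 * polymorphic / length

        reads = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAT"]   # toy aligned reads
        print(f"{percent_polymorphic(reads):.1f}% polymorphic sites")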

  18. From darwin to the census of marine life: marine biology as big science.

    Science.gov (United States)

    Vermeulen, Niki

    2013-01-01

    With the development of the Human Genome Project, a heated debate emerged on biology becoming 'big science'. However, biology already has a long tradition of collaboration, as natural historians were part of the first collective scientific efforts: exploring the variety of life on earth. Such mappings of life still continue today, and if field biology is gradually becoming an important subject of studies into big science, research into life in the world's oceans is not taken into account yet. This paper therefore explores marine biology as big science, presenting the historical development of marine research towards the international 'Census of Marine Life' (CoML) making an inventory of life in the world's oceans. Discussing various aspects of collaboration--including size, internationalisation, research practice, technological developments, application, and public communication--I will ask if CoML still resembles traditional collaborations to collect life. While showing both continuity and change, I will argue that marine biology is a form of natural history: a specific way of working together in biology that has transformed substantially in interaction with recent developments in the life sciences and society. As a result, the paper does not only give an overview of transformations towards large scale research in marine biology, but also shines a new light on big biology, suggesting new ways to deepen the understanding of collaboration in the life sciences by distinguishing between different 'collective ways of knowing'.

  19. From darwin to the census of marine life: marine biology as big science.

    Directory of Open Access Journals (Sweden)

    Niki Vermeulen

    Full Text Available With the development of the Human Genome Project, a heated debate emerged on biology becoming 'big science'. However, biology already has a long tradition of collaboration, as natural historians were part of the first collective scientific efforts: exploring the variety of life on earth. Such mappings of life still continue today, and if field biology is gradually becoming an important subject of studies into big science, research into life in the world's oceans is not taken into account yet. This paper therefore explores marine biology as big science, presenting the historical development of marine research towards the international 'Census of Marine Life' (CoML) making an inventory of life in the world's oceans. Discussing various aspects of collaboration--including size, internationalisation, research practice, technological developments, application, and public communication--I will ask if CoML still resembles traditional collaborations to collect life. While showing both continuity and change, I will argue that marine biology is a form of natural history: a specific way of working together in biology that has transformed substantially in interaction with recent developments in the life sciences and society. As a result, the paper does not only give an overview of transformations towards large scale research in marine biology, but also shines a new light on big biology, suggesting new ways to deepen the understanding of collaboration in the life sciences by distinguishing between different 'collective ways of knowing'.

  20. Genomic Enzymology: Web Tools for Leveraging Protein Family Sequence-Function Space and Genome Context to Discover Novel Functions.

    Science.gov (United States)

    Gerlt, John A

    2017-08-22

    The exponentially increasing number of protein and nucleic acid sequences provides opportunities to discover novel enzymes, metabolic pathways, and metabolites/natural products, thereby adding to our knowledge of biochemistry and biology. The challenge has evolved from generating sequence information to mining the databases to integrating and leveraging the available information, i.e., the availability of "genomic enzymology" web tools. Web tools that allow identification of biosynthetic gene clusters are widely used by the natural products/synthetic biology community, thereby facilitating the discovery of novel natural products and the enzymes responsible for their biosynthesis. However, many novel enzymes with interesting mechanisms participate in uncharacterized small-molecule metabolic pathways; their discovery and functional characterization also can be accomplished by leveraging information in protein and nucleic acid databases. This Perspective focuses on two genomic enzymology web tools that assist the discovery of novel metabolic pathways: (1) Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST) for generating sequence similarity networks to visualize and analyze sequence-function space in protein families and (2) Enzyme Function Initiative-Genome Neighborhood Tool (EFI-GNT) for generating genome neighborhood networks to visualize and analyze the genome context in microbial and fungal genomes. Both tools have been adapted to other applications to facilitate target selection for enzyme discovery and functional characterization. As the natural products community has demonstrated, the enzymology community needs to embrace the essential role of web tools that allow the protein and genome sequence databases to be leveraged for novel insights into enzymological problems.
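
    To make the idea of a sequence similarity network concrete, the hedged sketch below builds a tiny network with networkx: proteins are nodes, an edge is added when an all-vs-all alignment score passes a threshold, and connected components approximate putative isofunctional clusters. The pairwise scores and the threshold are invented; EFI-EST itself derives such networks from real all-vs-all sequence comparisons.

        # Minimal sketch of a sequence similarity network (SSN).
        import networkx as nx

        # (query, subject, alignment score) from a hypothetical all-vs-all comparison
        pairs = [("protA", "protB", 85.0), ("protA", "protC", 12.0),
                 ("protB", "protC", 9.0),  ("protC", "protD", 70.0)]
        threshold = 40.0                       # score cutoff defining an edge

        ssn = nx.Graph()
        ssn.add_nodes_from({p for q, s, _ in pairs for p in (q, s)})
        ssn.add_edges_from((q, s, {"score": sc}) for q, s, sc in pairs if sc >= threshold)

        for cluster in nx.connected_components(ssn):   # putative isofunctional clusters
            print(sorted(cluster))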

  1. Wireless Transmission of Big Data: A Transmission Time Analysis over Fading Channel

    KAUST Repository

    Wang, Wen-Jing; Yang, Hong-Chuan; Alouini, Mohamed-Slim

    2018-01-01

    In this paper, we investigate the transmission time of a large amount of data over fading wireless channel with adaptive modulation and coding (AMC). Unlike traditional transmission systems, where the transmission time of a fixed amount of data is typically regarded as a constant, the transmission time with AMC becomes a random variable, as the transmission rate varies with the fading channel condition. To facilitate the design and optimization of wireless transmission schemes for big data applications, we present an analytical framework to determine statistical characterizations for the transmission time of big data with AMC. In particular, we derive the exact statistics of transmission time over block fading channels. The probability mass function (PMF) and cumulative distribution function (CDF) of transmission time are obtained for both slow and fast fading scenarios. We further extend our analysis to Markov channel, where transmission time becomes the sum of a sequence of exponentially distributed time slots. Analytical expression for the probability density function (PDF) of transmission time is derived for both fast fading and slow fading scenarios. These analytical results are essential to the optimal design and performance analysis of future wireless transmission systems for big data applications.
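
    As a hedged companion to the analysis described above, the sketch below uses a Monte Carlo simulation to estimate the distribution of the transmission time of a fixed payload over a block-fading channel with idealized adaptive modulation and coding; the SNR thresholds, rates, and channel parameters are illustrative and are not taken from the paper.

        # Minimal sketch: empirical transmission-time statistics under AMC.
        import numpy as np

        rng = np.random.default_rng(1)
        payload_bits = 1e6
        block_s = 2e-3                          # coherence-block duration (s)
        bandwidth = 1e6                         # Hz
        avg_snr = 10.0                          # mean SNR, linear scale
        # (SNR threshold, spectral efficiency in bit/s/Hz) per AMC mode
        modes = [(0.0, 0.0), (1.0, 1.0), (3.0, 2.0), (7.0, 3.0), (15.0, 4.0)]

        def one_trial():
            sent, blocks = 0.0, 0
            while sent < payload_bits:
                snr = rng.exponential(avg_snr)                 # Rayleigh block fading
                rate = max(r for th, r in modes if snr >= th)  # AMC picks highest feasible mode
                sent += rate * bandwidth * block_s
                blocks += 1
            return blocks * block_s

        times = np.array([one_trial() for _ in range(2000)])
        print("mean transmission time: %.3f s" % times.mean())
        print("empirical P(T <= 0.6 s): %.3f" % (times <= 0.6).mean())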

  2. Wireless Transmission of Big Data: A Transmission Time Analysis over Fading Channel

    KAUST Repository

    Wang, Wen-Jing

    2018-04-10

    In this paper, we investigate the transmission time of a large amount of data over fading wireless channel with adaptive modulation and coding (AMC). Unlike traditional transmission systems, where the transmission time of a fixed amount of data is typically regarded as a constant, the transmission time with AMC becomes a random variable, as the transmission rate varies with the fading channel condition. To facilitate the design and optimization of wireless transmission schemes for big data applications, we present an analytical framework to determine statistical characterizations for the transmission time of big data with AMC. In particular, we derive the exact statistics of transmission time over block fading channels. The probability mass function (PMF) and cumulative distribution function (CDF) of transmission time are obtained for both slow and fast fading scenarios. We further extend our analysis to Markov channel, where transmission time becomes the sum of a sequence of exponentially distributed time slots. Analytical expression for the probability density function (PDF) of transmission time is derived for both fast fading and slow fading scenarios. These analytical results are essential to the optimal design and performance analysis of future wireless transmission systems for big data applications.

  3. Big data need big theory too.

    Science.gov (United States)

    Coveney, Peter V; Dougherty, Edward R; Highfield, Roger R

    2016-11-13

    The current interest in big data, machine learning and data analytics has generated the widespread impression that such methods are capable of solving most problems without the need for conventional scientific methods of inquiry. Interest in these methods is intensifying, accelerated by the ease with which digitized data can be acquired in virtually all fields of endeavour, from science, healthcare and cybersecurity to economics, social sciences and the humanities. In multiscale modelling, machine learning appears to provide a shortcut to reveal correlations of arbitrary complexity between processes at the atomic, molecular, meso- and macroscales. Here, we point out the weaknesses of pure big data approaches with particular focus on biology and medicine, which fail to provide conceptual accounts for the processes to which they are applied. No matter their 'depth' and the sophistication of data-driven methods, such as artificial neural nets, in the end they merely fit curves to existing data. Not only do these methods invariably require far larger quantities of data than anticipated by big data aficionados in order to produce statistically reliable results, but they can also fail in circumstances beyond the range of the data used to train them because they are not designed to model the structural characteristics of the underlying system. We argue that it is vital to use theory as a guide to experimental design for maximal efficiency of data collection and to produce reliable predictive models and conceptual knowledge. Rather than continuing to fund, pursue and promote 'blind' big data projects with massive budgets, we call for more funding to be allocated to the elucidation of the multiscale and stochastic processes controlling the behaviour of complex systems, including those of life, medicine and healthcare.This article is part of the themed issue 'Multiscale modelling at the physics-chemistry-biology interface'. © 2015 The Authors.

  4. Assessing Big Data

    DEFF Research Database (Denmark)

    Leimbach, Timo; Bachlechner, Daniel

    2015-01-01

    In recent years, big data has been one of the most controversially discussed technologies in terms of its possible positive and negative impact. Therefore, the need for technology assessments is obvious. This paper first provides, based on the results of a technology assessment study, an overview of the potential and challenges associated with big data and then describes the problems experienced during the study as well as methods found helpful to address them. The paper concludes with reflections on how the insights from the technology assessment study may have an impact on the future governance of big data.

  5. From big data to diagnosis and prognosis: gene expression signatures in liver hepatocellular carcinoma

    Directory of Open Access Journals (Sweden)

    Hong Yang

    2017-03-01

    Full Text Available Background Liver hepatocellular carcinoma accounts for the overwhelming majority of primary liver cancers, and its belated diagnosis and poor prognosis call for novel biomarkers to be discovered, a task in which innovative bioinformatics and computational techniques can prove highly helpful in the era of big data. Methods Big data aggregated from The Cancer Genome Atlas and Natural Language Processing were integrated to generate differentially expressed genes. Relevant signaling pathways of differentially expressed genes went through Gene Ontology enrichment analysis, Kyoto Encyclopedia of Genes and Genomes and Panther pathway enrichment analysis and protein-protein interaction network analysis. The pathway ranked high in the enrichment analysis was further investigated, and selected genes with top priority were evaluated and assessed in terms of their diagnostic and prognostic values. Results A list of 389 genes was generated by overlapping genes from The Cancer Genome Atlas and Natural Language Processing. Three pathways demonstrated top priorities, and the one with specific associations with cancers, ‘pathways in cancer,’ was analyzed with its four highlighted genes, namely, BIRC5, E2F1, CCNE1, and CDKN2A, which were validated using Oncomine. The detection pool composed of the four genes presented satisfactory diagnostic power with an outstanding integrated AUC of 0.990 (95% CI [0.982–0.998], P < 0.001, sensitivity: 96.0%, specificity: 96.5%). BIRC5 (P = 0.021) and CCNE1 (P = 0.027) were associated with poor prognosis, while CDKN2A (P = 0.066) and E2F1 (P = 0.088) demonstrated no statistically significant differences. Discussion The study illustrates liver hepatocellular carcinoma gene signatures, related pathways and networks from the perspective of big data, featuring the cancer-specific pathway with priority, ‘pathways in cancer.’ The detection pool of the four highlighted genes, namely BIRC5, E2F1, CCNE1 and CDKN2A, should be
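
    As a hedged illustration of how a multi-gene detection pool is scored, the sketch below combines four simulated gene-expression features into a single score with logistic regression and reports a ROC AUC using scikit-learn; the gene names are taken from the abstract, but the expression values and effect sizes are simulated, not TCGA or Oncomine data.

        # Minimal sketch: AUC of a four-gene panel on simulated expression data.
        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score

        rng = np.random.default_rng(42)
        n_tumor, n_normal = 120, 80
        genes = ["BIRC5", "E2F1", "CCNE1", "CDKN2A"]
        X = np.vstack([rng.normal(1.0, 1.0, size=(n_tumor, len(genes))),    # tumours shifted up
                       rng.normal(0.0, 1.0, size=(n_normal, len(genes)))])
        y = np.array([1] * n_tumor + [0] * n_normal)

        model = LogisticRegression().fit(X, y)        # combines the panel into one score
        scores = model.predict_proba(X)[:, 1]
        print("in-sample AUC:", round(roc_auc_score(y, scores), 3))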

  6. Genome sequence analysis with MonetDB - A case study on Ebola virus diversity

    NARCIS (Netherlands)

    Cijvat, R.; Manegold, S.; Kersten, M.; Klau, G.W.; Schönhuth, A.; Marschall, T.; Zhang, Y.

    2015-01-01

    Next-generation sequencing (NGS) technology has led the life sciences into the big data era. Today, sequencing genomes takes little time and cost, but yields terabytes of data to be stored and analyzed. Biologists are often exposed to excessively time consuming and error-prone data management and

  7. Big data, big responsibilities

    Directory of Open Access Journals (Sweden)

    Primavera De Filippi

    2014-01-01

    Full Text Available Big data refers to the collection and aggregation of large quantities of data produced by and about people, things or the interactions between them. With the advent of cloud computing, specialised data centres with powerful computational hardware and software resources can be used for processing and analysing a humongous amount of aggregated data coming from a variety of different sources. The analysis of such data is all the more valuable to the extent that it allows for specific patterns to be found and new correlations to be made between different datasets, so as to eventually deduce or infer new information, as well as to potentially predict behaviours or assess the likelihood for a certain event to occur. This article will focus specifically on the legal and moral obligations of online operators collecting and processing large amounts of data, to investigate the potential implications of big data analysis on the privacy of individual users and on society as a whole.

  8. Comparative validity of brief to medium-length Big Five and Big Six Personality Questionnaires.

    Science.gov (United States)

    Thalmayer, Amber Gayle; Saucier, Gerard; Eigenhuis, Annemarie

    2011-12-01

    A general consensus on the Big Five model of personality attributes has been highly generative for the field of personality psychology. Many important psychological and life outcome correlates with Big Five trait dimensions have been established. But researchers must choose between multiple Big Five inventories when conducting a study and are faced with a variety of options as to inventory length. Furthermore, a 6-factor model has been proposed to extend and update the Big Five model, in part by adding a dimension of Honesty/Humility or Honesty/Propriety. In this study, 3 popular brief to medium-length Big Five measures (NEO Five Factor Inventory, Big Five Inventory [BFI], and International Personality Item Pool), and 3 six-factor measures (HEXACO Personality Inventory, Questionnaire Big Six Scales, and a 6-factor version of the BFI) were placed in competition to best predict important student life outcomes. The effect of test length was investigated by comparing brief versions of most measures (subsets of items) with original versions. Personality questionnaires were administered to undergraduate students (N = 227). Participants' college transcripts and student conduct records were obtained 6-9 months after data was collected. Six-factor inventories demonstrated better predictive ability for life outcomes than did some Big Five inventories. Additional behavioral observations made on participants, including their Facebook profiles and cell-phone text usage, were predicted similarly by Big Five and 6-factor measures. A brief version of the BFI performed surprisingly well; across inventory platforms, increasing test length had little effect on predictive validity. Comparative validity of the models and measures in terms of outcome prediction and parsimony is discussed.

  9. Whole Genome and Tandem Duplicate Retention facilitated Glucosinolate Pathway Diversification in the Mustard Family.

    NARCIS (Netherlands)

    Hofberger, J.A.; Lyons, E.; Edger, P.P.; Pires, J.C.; Schranz, M.E.

    2013-01-01

    Plants share a common history of successive whole genome duplication (WGD) events retaining genomic patterns of duplicate gene copies (ohnologs) organized in conserved syntenic blocks. Duplication was often proposed to affect the origin of novel traits during evolution. However, genetic evidence

  10. Big Machines and Big Science: 80 Years of Accelerators at Stanford

    Energy Technology Data Exchange (ETDEWEB)

    Loew, Gregory

    2008-12-16

    Longtime SLAC physicist Greg Loew will present a trip through SLAC's origins, highlighting its scientific achievements, and provide a glimpse of the lab's future in 'Big Machines and Big Science: 80 Years of Accelerators at Stanford.'

  11. GenoSets: visual analytic methods for comparative genomics.

    Directory of Open Access Journals (Sweden)

    Aurora A Cain

    Full Text Available Many important questions in biology are, fundamentally, comparative, and this extends to our analysis of a growing number of sequenced genomes. Existing genomic analysis tools are often organized around literal views of genomes as linear strings. Even when information is highly condensed, these views grow cumbersome as larger numbers of genomes are added. Data aggregation and summarization methods from the field of visual analytics can provide abstracted comparative views, suitable for sifting large multi-genome datasets to identify critical similarities and differences. We introduce a software system for visual analysis of comparative genomics data. The system automates the process of data integration, and provides the analysis platform to identify and explore features of interest within these large datasets. GenoSets borrows techniques from business intelligence and visual analytics to provide a rich interface of interactive visualizations supported by a multi-dimensional data warehouse. In GenoSets, visual analytic approaches are used to enable querying based on orthology, functional assignment, and taxonomic or user-defined groupings of genomes. GenoSets links this information together with coordinated, interactive visualizations for both detailed and high-level categorical analysis of summarized data. GenoSets has been designed to simplify the exploration of multiple genome datasets and to facilitate reasoning about genomic comparisons. Case examples are included showing the use of this system in the analysis of 12 Brucella genomes. GenoSets software and the case study dataset are freely available at http://genosets.uncc.edu. We demonstrate that the integration of genomic data using a coordinated multiple view approach can simplify the exploration of large comparative genomic data sets, and facilitate reasoning about comparisons and features of interest.

  12. Dual of big bang and big crunch

    International Nuclear Information System (INIS)

    Bak, Dongsu

    2007-01-01

    Starting from the Janus solution and its gauge theory dual, we obtain the dual gauge theory description of the cosmological solution by the procedure of double analytic continuation. The coupling is driven either to zero or to infinity at the big-bang and big-crunch singularities, which are shown to be related by the S-duality symmetry. In the dual Yang-Mills theory description, these are nonsingular as the coupling goes to zero in the N=4 super Yang-Mills theory. The cosmological singularities simply signal the failure of the supergravity description of the full type IIB superstring theory

  13. Comparative Validity of Brief to Medium-Length Big Five and Big Six Personality Questionnaires

    Science.gov (United States)

    Thalmayer, Amber Gayle; Saucier, Gerard; Eigenhuis, Annemarie

    2011-01-01

    A general consensus on the Big Five model of personality attributes has been highly generative for the field of personality psychology. Many important psychological and life outcome correlates with Big Five trait dimensions have been established. But researchers must choose between multiple Big Five inventories when conducting a study and are…

  14. Genome sequences and comparative genomics of two Lactobacillus ruminis strains from the bovine and human intestinal tracts

    LENUS (Irish Health Repository)

    2011-08-30

    Abstract Background The genus Lactobacillus is characterized by an extraordinary degree of phenotypic and genotypic diversity, which recent genomic analyses have further highlighted. However, the choice of species for sequencing has been non-random and unequal in distribution, with only a single representative genome from the L. salivarius clade available to date. Furthermore, there is no data to facilitate a functional genomic analysis of motility in the lactobacilli, a trait that is restricted to the L. salivarius clade. Results The 2.06 Mb genome of the bovine isolate Lactobacillus ruminis ATCC 27782 comprises a single circular chromosome, and has a G+C content of 44.4%. In silico analysis identified 1901 coding sequences, including genes for a pediocin-like bacteriocin, a single large exopolysaccharide-related cluster, two sortase enzymes, two CRISPR loci and numerous IS elements and pseudogenes. A cluster of genes related to a putative pilin was identified, and shown to be transcribed in vitro. A high quality draft assembly of the genome of a second L. ruminis strain, ATCC 25644 isolated from humans, suggested a slightly larger genome of 2.138 Mb, that exhibited a high degree of synteny with the ATCC 27782 genome. In contrast, comparative analysis of L. ruminis and L. salivarius identified a lack of long-range synteny between these closely related species. Comparison of the L. salivarius clade core proteins with those of nine other Lactobacillus species distributed across 4 major phylogenetic groups identified the set of shared proteins, and proteins unique to each group. Conclusions The genome of L. ruminis provides a comparative tool for directing functional analyses of other members of the L. salivarius clade, and it increases understanding of the divergence of this distinct Lactobacillus lineage from other commensal lactobacilli. The genome sequence provides a definitive resource to facilitate investigation of the genetics, biochemistry and host
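
    As a hedged sketch of the kind of summary statistic reported above (genome size and G+C content), the snippet below computes both from a FASTA assembly; the file name is hypothetical.

        # Minimal sketch: assembly length and G+C content from a FASTA file.
        def genome_stats(fasta_path):
            length = gc = 0
            with open(fasta_path) as fh:
                for line in fh:
                    if line.startswith(">"):
                        continue
                    seq = line.strip().upper()
                    length += len(seq)
                    gc += seq.count("G") + seq.count("C")
            return length, 100.0 * gc / length

        size, gc_percent = genome_stats("L_ruminis_assembly.fasta")   # hypothetical path
        print(f"{size} bp, {gc_percent:.1f}% G+C")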

  15. Implementing Operational Analytics using Big Data Technologies to Detect and Predict Sensor Anomalies

    Science.gov (United States)

    Coughlin, J.; Mital, R.; Nittur, S.; SanNicolas, B.; Wolf, C.; Jusufi, R.

    2016-09-01

    Operational analytics when combined with Big Data technologies and predictive techniques have been shown to be valuable in detecting mission critical sensor anomalies that might be missed by conventional analytical techniques. Our approach helps analysts and leaders make informed and rapid decisions by analyzing large volumes of complex data in near real-time and presenting it in a manner that facilitates decision making. It provides cost savings by being able to alert and predict when sensor degradations pass a critical threshold and impact mission operations. Operational analytics, which uses Big Data tools and technologies, can process very large data sets containing a variety of data types to uncover hidden patterns, unknown correlations, and other relevant information. When combined with predictive techniques, it provides a mechanism to monitor and visualize these data sets and provide insight into degradations encountered in large sensor systems such as the space surveillance network. In this study, data from a notional sensor is simulated and we use big data technologies, predictive algorithms and operational analytics to process the data and predict sensor degradations. This study uses data products that would commonly be analyzed at a site. This study builds on a big data architecture that has previously been proven valuable in detecting anomalies. This paper outlines our methodology of implementing an operational analytic solution through data discovery, learning and training of data modeling and predictive techniques, and deployment. Through this methodology, we implement a functional architecture focused on exploring available big data sets and determine practical analytic, visualization, and predictive technologies.
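
    As a hedged, in-memory stand-in for the predictive monitoring described above, the sketch below flags sensor readings whose rolling z-score crosses a threshold; the window size, threshold, and injected degradation are illustrative, and a production system of the kind described would run on a distributed Big Data stack rather than on a single pandas Series.

        # Minimal sketch: rolling z-score anomaly flagging on simulated sensor data.
        import numpy as np
        import pandas as pd

        rng = np.random.default_rng(7)
        readings = pd.Series(rng.normal(0.0, 1.0, 500))
        readings.iloc[400:] += 4.0                         # injected degradation

        window = 50
        zscore = (readings - readings.rolling(window).mean()) / readings.rolling(window).std()
        anomalies = zscore.abs() > 3.0

        print("first flagged index:", int(anomalies.idxmax()))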

  16. Secondary Use and Analysis of Big Data Collected for Patient Care.

    Science.gov (United States)

    Martin-Sanchez, F J; Aguiar-Pulido, V; Lopez-Campos, G H; Peek, N; Sacchi, L

    2017-08-01

    Objectives: To identify common methodological challenges and review relevant initiatives related to the re-use of patient data collected in routine clinical care, as well as to analyze the economic benefits derived from the secondary use of this data. Through the use of several examples, this article aims to provide a glimpse into the different areas of application, namely clinical research, genomic research, study of environmental factors, and population and health services research. This paper describes some of the informatics methods and Big Data resources developed in this context, such as electronic phenotyping, clinical research networks, biorepositories, screening data banks, and wide association studies. Lastly, some of the potential limitations of these approaches are discussed, focusing on confounding factors and data quality. Methods: A series of literature searches in main bibliographic databases have been conducted in order to assess the extent to which existing patient data has been repurposed for research. This contribution from the IMIA working group on "Data mining and Big Data analytics" focuses on the literature published during the last two years, covering the timeframe since the working group's last survey. Results and Conclusions: Although most of the examples of secondary use of patient data lie in the arena of clinical and health services research, we have started to witness other important applications, particularly in the area of genomic research and the study of health effects of environmental factors. Further research is needed to characterize the economic impact of secondary use across the broad spectrum of translational research. Georg Thieme Verlag KG Stuttgart.

  17. The challenge of big data in public health: an opportunity for visual analytics.

    Science.gov (United States)

    Ola, Oluwakemi; Sedig, Kamran

    2014-01-01

    Public health (PH) data can generally be characterized as big data. The efficient and effective use of this data determines the extent to which PH stakeholders can sufficiently address societal health concerns as they engage in a variety of work activities. As stakeholders interact with data, they engage in various cognitive activities such as analytical reasoning, decision-making, interpreting, and problem solving. Performing these activities with big data is a challenge for the unaided mind as stakeholders encounter obstacles relating to the data's volume, variety, velocity, and veracity. Such being the case, computer-based information tools are needed to support PH stakeholders. Unfortunately, while existing computational tools are beneficial in addressing certain work activities, they fall short in supporting cognitive activities that involve working with large, heterogeneous, and complex bodies of data. This paper presents visual analytics (VA) tools, a nascent category of computational tools that integrate data analytics with interactive visualizations, to facilitate the performance of cognitive activities involving big data. Historically, PH has lagged behind other sectors in embracing new computational technology. In this paper, we discuss the role that VA tools can play in addressing the challenges presented by big data. In doing so, we demonstrate the potential benefit of incorporating VA tools into PH practice, in addition to highlighting the need for further systematic and focused research.

  18. Earth Science Data Analysis in the Era of Big Data

    Science.gov (United States)

    Kuo, K.-S.; Clune, T. L.; Ramachandran, R.

    2014-01-01

    Anyone with even a cursory interest in information technology cannot help but recognize that "Big Data" is one of the most fashionable catchphrases of late. From accurate voice and facial recognition, language translation, and airfare prediction and comparison, to monitoring the real-time spread of flu, Big Data techniques have been applied to many seemingly intractable problems with spectacular successes. They appear to be a rewarding way to approach many currently unsolved problems. Few fields of research can claim a longer history with problems involving voluminous data than Earth science. The problems we are facing today with our Earth's future are more complex and carry potentially graver consequences than the examples given above. How has our climate changed? Beside natural variations, what is causing these changes? What are the processes involved and through what mechanisms are these connected? How will they impact life as we know it? In attempts to answer these questions, we have resorted to observations and numerical simulations with ever-finer resolutions, which continue to feed the "data deluge." Plausibly, many Earth scientists are wondering: How will Big Data technologies benefit Earth science research? As an example from the global water cycle, one subdomain among many in Earth science, how would these technologies accelerate the analysis of decades of global precipitation to ascertain the changes in its characteristics, to validate these changes in predictive climate models, and to infer the implications of these changes to ecosystems, economies, and public health? Earth science researchers need a viable way to harness the power of Big Data technologies to analyze large volumes and varieties of data with velocity and veracity. Beyond providing speedy data analysis capabilities, Big Data technologies can also play a crucial, albeit indirect, role in boosting scientific productivity by facilitating effective collaboration within an analysis environment

  19. Big Data: Implications for Health System Pharmacy.

    Science.gov (United States)

    Stokes, Laura B; Rogers, Joseph W; Hertig, John B; Weber, Robert J

    2016-07-01

    Big Data refers to datasets that are so large and complex that traditional methods and hardware for collecting, sharing, and analyzing them are not possible. Big Data that is accurate leads to more confident decision making, improved operational efficiency, and reduced costs. The rapid growth of health care information results in Big Data around health services, treatments, and outcomes, and Big Data can be used to analyze the benefit of health system pharmacy services. The goal of this article is to provide a perspective on how Big Data can be applied to health system pharmacy. It will define Big Data, describe the impact of Big Data on population health, review specific implications of Big Data in health system pharmacy, and describe an approach for pharmacy leaders to effectively use Big Data. A few strategies involved in managing Big Data in health system pharmacy include identifying potential opportunities for Big Data, prioritizing those opportunities, protecting privacy concerns, promoting data transparency, and communicating outcomes. As health care information expands in its content and becomes more integrated, Big Data can enhance the development of patient-centered pharmacy services.

  20. Parallel computing in genomic research: advances and applications.

    Science.gov (United States)

    Ocaña, Kary; de Oliveira, Daniel

    2015-01-01

    Today's genomic experiments have to process the so-called "biological big data" that is now reaching the size of Terabytes and Petabytes. To process this huge amount of data, scientists may require weeks or months if they use their own workstations. Parallelism techniques and high-performance computing (HPC) environments can be applied to reduce the total processing time and to ease the management, treatment, and analyses of this data. However, running bioinformatics experiments in HPC environments such as clouds, grids, clusters, and graphics processing units requires expertise from scientists to integrate computational, biological, and mathematical techniques and technologies. Several solutions have already been proposed to allow scientists to process their genomic experiments using HPC capabilities and parallelism techniques. This article presents a systematic review of the literature that surveys the most recently published research involving genomics and parallel computing. Our objective is to gather the main characteristics, benefits, and challenges that can be considered by scientists when running their genomic experiments to benefit from parallelism techniques and HPC capabilities.
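
    As a hedged sketch of the parallelism techniques surveyed above, the snippet below distributes a per-sequence k-mer count across worker processes with Python's multiprocessing module, a stand-in for the embarrassingly parallel steps common in genomic pipelines; the sequences are invented for illustration.

        # Minimal sketch: parallel k-mer counting with multiprocessing.
        from collections import Counter
        from functools import reduce
        from multiprocessing import Pool

        K = 3

        def count_kmers(seq: str) -> Counter:
            seq = seq.upper()
            return Counter(seq[i:i + K] for i in range(len(seq) - K + 1))

        if __name__ == "__main__":
            sequences = ["ACGTGGCCAT", "ATATATCGGC", "GGGCCCATTA", "TTGACGTACC"]
            with Pool(processes=4) as pool:
                per_seq = pool.map(count_kmers, sequences)
            total = reduce(lambda a, b: a + b, per_seq, Counter())
            print(total.most_common(5))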

  1. Generalized formal model of Big Data

    OpenAIRE

    Shakhovska, N.; Veres, O.; Hirnyak, M.

    2016-01-01

    This article dwells on the basic characteristic features of Big Data technologies. It analyzes the existing definitions of the “big data” term, proposes and describes the elements of a generalized formal model of big data, examines the peculiarities of applying the proposed model components, and describes the fundamental differences between Big Data technology and business analytics. Big Data is supported by the distributed file system Google File System ...

  2. Application of genomic tools in plant breeding.

    Science.gov (United States)

    Pérez-de-Castro, A M; Vilanova, S; Cañizares, J; Pascual, L; Blanca, J M; Díez, M J; Prohens, J; Picó, B

    2012-05-01

    Plant breeding has been very successful in developing improved varieties using conventional tools and methodologies. Nowadays, the availability of genomic tools and resources is leading to a new revolution in plant breeding, as they facilitate the study of the genotype and its relationship with the phenotype, in particular for complex traits. Next Generation Sequencing (NGS) technologies are allowing the mass sequencing of genomes and transcriptomes, which is producing a vast array of genomic information. The analysis of NGS data by means of bioinformatics developments allows the discovery of new genes and regulatory sequences and their positions, and makes available large collections of molecular markers. Genome-wide expression studies provide breeders with an understanding of the molecular basis of complex traits. Genomic approaches include TILLING and EcoTILLING, which make it possible to screen mutant and germplasm collections for allelic variants in target genes. Re-sequencing of genomes is very useful for the genome-wide discovery of markers amenable to high-throughput genotyping platforms, like SSRs and SNPs, or the construction of high density genetic maps. All these tools and resources facilitate the study of genetic diversity, which is important for germplasm management, enhancement and use. Also, they allow the identification of markers linked to genes and QTLs, using a diversity of techniques like bulked segregant analysis (BSA), fine genetic mapping, or association mapping. These new markers are used for marker assisted selection, including marker assisted backcross selection, 'breeding by design', or new strategies, like genomic selection. In conclusion, advances in genomics are providing breeders with new tools and methodologies that allow a great leap forward in plant breeding, including the 'superdomestication' of crops and the genetic dissection and breeding for complex traits.
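
    The genomic selection strategy mentioned above can be sketched with a small, hypothetical example: a ridge-regression (GBLUP-like) model that predicts a phenotype from simulated SNP genotypes. The marker effects, population sizes, and shrinkage parameter below are arbitrary choices for illustration, not values from any real breeding program.

    # Illustrative marker-based prediction in the spirit of genomic selection.
    import numpy as np

    rng = np.random.default_rng(42)
    n_train, n_test, n_markers = 200, 50, 1_000

    # Simulated genotypes coded as 0/1/2 allele counts, and a polygenic phenotype.
    X = rng.integers(0, 3, size=(n_train + n_test, n_markers)).astype(float)
    true_effects = rng.normal(0.0, 0.05, size=n_markers)   # unknown in a real program
    y = X @ true_effects + rng.normal(0.0, 1.0, size=n_train + n_test)

    X_train, X_test = X[:n_train], X[n_train:]
    y_train, y_test = y[:n_train], y[n_train:]

    # Center on training means, then solve the closed-form ridge system
    # (X'X + lambda*I) beta = X'y; lambda would normally be tuned by cross-validation.
    mu, y_mean, lam = X_train.mean(axis=0), y_train.mean(), 10.0
    Xc_train, Xc_test = X_train - mu, X_test - mu
    beta = np.linalg.solve(Xc_train.T @ Xc_train + lam * np.eye(n_markers),
                           Xc_train.T @ (y_train - y_mean))

    pred = y_mean + Xc_test @ beta
    print(f"predictive correlation on held-out lines: {np.corrcoef(pred, y_test)[0, 1]:.2f}")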

  3. Exploration of plant genomes in the FLAGdb++ environment

    Directory of Open Access Journals (Sweden)

    Leplé Jean-Charles

    2011-03-01

    Full Text Available Abstract Background In the contexts of genomics, post-genomics and systems biology approaches, data integration presents a major concern. Databases provide crucial solutions: they store, organize and allow information to be queried, they enhance the visibility of newly produced data by comparing them with previously published results, and facilitate the exploration and development of both existing hypotheses and new ideas. Results The FLAGdb++ information system was developed with the aim of using whole plant genomes as physical references in order to gather and merge available genomic data from in silico or experimental approaches. Available through a JAVA application, original interfaces and tools assist the functional study of plant genes by considering them in their specific context: chromosome, gene family, orthology group, co-expression cluster and functional network. FLAGdb++ is mainly dedicated to the exploration of large gene groups in order to decipher functional connections, to highlight shared or specific structural or functional features, and to facilitate translational tasks between plant species (Arabidopsis thaliana, Oryza sativa, Populus trichocarpa and Vitis vinifera). Conclusion Combining original data with the output of experts and graphical displays that differ from classical plant genome browsers, FLAGdb++ presents a powerful complementary tool for exploring plant genomes and exploiting structural and functional resources, without the need for computer programming knowledge. First launched in 2002, a 15th version of FLAGdb++ is now available and comprises four model plant genomes and over eight million genomic features.

  4. The Drosophila genome nexus: a population genomic resource of 623 Drosophila melanogaster genomes, including 197 from a single ancestral range population.

    Science.gov (United States)

    Lack, Justin B; Cardeno, Charis M; Crepeau, Marc W; Taylor, William; Corbett-Detig, Russell B; Stevens, Kristian A; Langley, Charles H; Pool, John E

    2015-04-01

    Hundreds of wild-derived Drosophila melanogaster genomes have been published, but rigorous comparisons across data sets are precluded by differences in alignment methodology. The most common approach to reference-based genome assembly is a single round of alignment followed by quality filtering and variant detection. We evaluated variations and extensions of this approach and settled on an assembly strategy that utilizes two alignment programs and incorporates both substitutions and short indels to construct an updated reference for a second round of mapping prior to final variant detection. Utilizing this approach, we reassembled published D. melanogaster population genomic data sets and added unpublished genomes from several sub-Saharan populations. Most notably, we present aligned data from phase 3 of the Drosophila Population Genomics Project (DPGP3), which provides 197 genomes from a single ancestral range population of D. melanogaster (from Zambia). The large sample size, high genetic diversity, and potentially simpler demographic history of the DPGP3 sample will make this a highly valuable resource for fundamental population genetic research. The complete set of assemblies described here, termed the Drosophila Genome Nexus, presently comprises 623 consistently aligned genomes and is publicly available in multiple formats with supporting documentation and bioinformatic tools. This resource will greatly facilitate population genomic analysis in this model species by reducing the methodological differences between data sets. Copyright © 2015 by the Genetics Society of America.

  5. Big data-driven business how to use big data to win customers, beat competitors, and boost profits

    CERN Document Server

    Glass, Russell

    2014-01-01

    Get the expert perspective and practical advice on big data The Big Data-Driven Business: How to Use Big Data to Win Customers, Beat Competitors, and Boost Profits makes the case that big data is for real, and more than just big hype. The book uses real-life examples-from Nate Silver to Copernicus, and Apple to Blackberry-to demonstrate how the winners of the future will use big data to seek the truth. Written by a marketing journalist and the CEO of a multi-million-dollar B2B marketing platform that reaches more than 90% of the U.S. business population, this book is a comprehens

  6. Big Game Reporting Stations

    Data.gov (United States)

    Vermont Center for Geographic Information — Point locations of big game reporting stations. Big game reporting stations are places where hunters can legally report harvested deer, bear, or turkey. These are...

  7. Facilitating genome navigation : survey sequencing and dense radiation-hybrid gene mapping

    NARCIS (Netherlands)

    Hitte, C; Madeoy, J; Kirkness, EF; Priat, C; Lorentzen, TD; Senger, F; Thomas, D; Derrien, T; Ramirez, C; Scott, C; Evanno, G; Pullar, B; Cadieu, E; Oza, [No Value; Lourgant, K; Jaffe, DB; Tacher, S; Dreano, S; Berkova, N; Andre, C; Deloukas, P; Fraser, C; Lindblad-Toh, K; Ostrander, EA; Galibert, F

    Accurate and comprehensive sequence coverage for large genomes has been restricted to only a few species of specific interest. Lower sequence coverage (survey sequencing) of related species can yield a wealth of information about gene content and putative regulatory elements. But survey sequences

  8. Structural features in the HIV-1 repeat region facilitate strand transfer during reverse transcription

    NARCIS (Netherlands)

    Berkhout, B.; Vastenhouw, N. L.; Klasens, B. I.; Huthoff, H.

    2001-01-01

    Two obligatory DNA strand transfers take place during reverse transcription of a retroviral RNA genome. The first strand transfer is facilitated by terminal repeat (R) elements in the viral genome. This strand-transfer reaction depends on base pairing between the cDNA of the 5'R and the 3'R. There

  9. Stalin's Big Fleet Program

    National Research Council Canada - National Science Library

    Mauner, Milan

    2002-01-01

    Although Dr. Milan Hauner's study 'Stalin's Big Fleet program' has focused primarily on the formation of Big Fleets during the Tsarist and Soviet periods of Russia's naval history, there are important lessons...

  10. Recurrent DNA inversion rearrangements in the human genome

    DEFF Research Database (Denmark)

    Flores, Margarita; Morales, Lucía; Gonzaga-Jauregui, Claudia

    2007-01-01

    Several lines of evidence suggest that reiterated sequences in the human genome are targets for nonallelic homologous recombination (NAHR), which facilitates genomic rearrangements. We have used a PCR-based approach to identify breakpoint regions of rearranged structures in the human genome. In particular, we have identified intrachromosomal identical repeats that are located in reverse orientation, which may lead to chromosomal inversions. A bioinformatic workflow pathway to select appropriate regions for analysis was developed. Three such regions overlapping with known human genes, located ... ... to human genomic variation is discussed.

  11. Five Big, Big Five Issues : Rationale, Content, Structure, Status, and Crosscultural Assessment

    NARCIS (Netherlands)

    De Raad, Boele

    1998-01-01

    This article discusses the rationale, content, structure, status, and crosscultural assessment of the Big Five trait factors, focusing on topics of dispute and misunderstanding. Taxonomic restrictions of the original Big Five forerunner, the "Norman Five," are discussed, and criticisms regarding the

  12. D3GB: An Interactive Genome Browser for R, Python, and WordPress.

    Science.gov (United States)

    Barrios, David; Prieto, Carlos

    2017-05-01

    Genome browsers are useful not only for showing final results but also for improving analysis protocols, testing data quality, and generating result drafts. Its integration in analysis pipelines allows the optimization of parameters, which leads to better results. New developments that facilitate the creation and utilization of genome browsers could contribute to improving analysis results and supporting the quick visualization of genomic data. D3 Genome Browser is an interactive genome browser that can be easily integrated in analysis protocols and shared on the Web. It is distributed as an R package, a Python module, and a WordPress plugin to facilitate its integration in pipelines and the utilization of platform capabilities. It is compatible with popular data formats such as GenBank, GFF, BED, FASTA, and VCF, and enables the exploration of genomic data with a Web browser.
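
    To make concrete the kind of track data such a browser consumes, the sketch below parses a few BED records and answers a simple region query. This is generic, hypothetical plain Python for illustration only; it is not the D3GB API, whose R, Python, and WordPress interfaces should be taken from the package documentation.

    # Generic sketch: parse BED intervals and query a genomic window.
    from io import StringIO

    BED = StringIO("""\
    chr1\t100\t500\tgeneA
    chr1\t450\t900\tgeneB
    chr2\t0\t300\tgeneC
    """)

    def parse_bed(handle):
        for line in handle:
            fields = line.strip().split("\t")
            if len(fields) < 4:
                continue
            chrom, start, end, name = fields[:4]
            yield chrom, int(start), int(end), name  # BED uses 0-based, half-open intervals

    def overlapping(features, chrom, start, end):
        # A feature overlaps the window if it starts before the window ends and ends after it starts.
        return [f for f in features if f[0] == chrom and f[1] < end and f[2] > start]

    features = list(parse_bed(BED))
    print(overlapping(features, "chr1", 400, 600))  # geneA and geneB overlap this window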

  13. Big data challenges

    DEFF Research Database (Denmark)

    Bachlechner, Daniel; Leimbach, Timo

    2016-01-01

    Although reports on big data success stories have been accumulating in the media, most organizations dealing with high-volume, high-velocity and high-variety information assets still face challenges. Only a thorough understanding of these challenges puts organizations into a position in which they can make an informed decision for or against big data, and, if the decision is positive, overcome the challenges smoothly. The combination of a series of interviews with leading experts from enterprises, associations and research institutions, and focused literature reviews allowed not only ... framework are also relevant. For large enterprises and startups specialized in big data, it is typically easier to overcome the challenges than it is for other enterprises and public administration bodies.

  14. Building a model: developing genomic resources for common milkweed (Asclepias syriaca with low coverage genome sequencing

    Directory of Open Access Journals (Sweden)

    Weitemier Kevin

    2011-05-01

    Full Text Available Abstract Background Milkweeds (Asclepias L.) have been extensively investigated in diverse areas of evolutionary biology and ecology; however, there are few genetic resources available to facilitate and complement these studies. This study explored how low coverage genome sequencing of the common milkweed (Asclepias syriaca L.) could be useful in characterizing the genome of a plant without prior genomic information and for development of genomic resources as a step toward further developing A. syriaca as a model in ecology and evolution. Results A 0.5× genome of A. syriaca was produced using Illumina sequencing. A virtually complete chloroplast genome of 158,598 bp was assembled, revealing few repeats and loss of three genes: accD, clpP, and ycf1. A nearly complete rDNA cistron (18S-5.8S-26S; 7,541 bp) and 5S rDNA (120 bp) sequence were obtained. Assessment of polymorphism revealed that the rDNA cistron and 5S rDNA had 0.3% and 26.7% polymorphic sites, respectively. A partial mitochondrial genome sequence (130,764 bp), with identical gene content to tobacco, was also assembled. An initial characterization of repeat content indicated that Ty1/copia-like retroelements are the most common repeat type in the milkweed genome. At least one A. syriaca microread hit 88% of Catharanthus roseus (Apocynaceae) unigenes (median coverage of 0.29×) and 66% of single copy orthologs (COSII) in asterids (median coverage of 0.14×). From this partial characterization of the A. syriaca genome, markers for population genetics (microsatellites) and phylogenetics (low-copy nuclear genes) studies were developed. Conclusions The results highlight the promise of next generation sequencing for development of genomic resources for any organism. Low coverage genome sequencing allows characterization of the high copy fraction of the genome and exploration of the low copy fraction of the genome, which facilitate the development of molecular tools for further study of a target species.

  15. Big Data and HPC collocation: Using HPC idle resources for Big Data Analytics

    OpenAIRE

    MERCIER , Michael; Glesser , David; Georgiou , Yiannis; Richard , Olivier

    2017-01-01

    International audience; Executing Big Data workloads upon High Performance Computing (HPC) infrastructures has become an attractive way to improve their performance. However, the collocation of HPC and Big Data workloads is not an easy task, mainly because of their core concepts' differences. This paper focuses on the challenges related to the scheduling of both Big Data and HPC workloads on the same computing platform. In classic HPC workloads, the rigidity of jobs tends to create holes in ...

  16. Big Data as Governmentality

    DEFF Research Database (Denmark)

    Flyverbom, Mikkel; Madsen, Anders Koed; Rasche, Andreas

    This paper conceptualizes how large-scale data and algorithms condition and reshape knowledge production when addressing international development challenges. The concept of governmentality and four dimensions of an analytics of government are proposed as a theoretical framework to examine how big data is constituted as an aspiration to improve the data and knowledge underpinning development efforts. Based on this framework, we argue that big data’s impact on how relevant problems are governed is enabled by (1) new techniques of visualizing development issues, (2) linking aspects ... shows that big data problematizes selected aspects of traditional ways to collect and analyze data for development (e.g. via household surveys). We also demonstrate that using big data analyses to address development challenges raises a number of questions that can deteriorate its impact...

  17. Boarding to Big data

    Directory of Open Access Journals (Sweden)

    Oana Claudia BRATOSIN

    2016-05-01

    Full Text Available Today, Big data is an emerging topic, as the quantity of information grows exponentially, laying the foundation for its main challenge, the value of the information. The information value is not only defined by the value extraction from huge data sets, as quickly and optimally as possible, but also by the value extraction from uncertain and inaccurate data, in an innovative manner using Big data analytics. At this point, the main challenge of the businesses that use Big data tools is to clearly define the scope and the necessary output of the business so that the real value can be gained. This article aims to explain the Big data concept, its various classification criteria and architecture, as well as its impact on worldwide processes.

  18. Big data - a 21st century science Maginot Line? No-boundary thinking: shifting from the big data paradigm.

    Science.gov (United States)

    Huang, Xiuzhen; Jennings, Steven F; Bruce, Barry; Buchan, Alison; Cai, Liming; Chen, Pengyin; Cramer, Carole L; Guan, Weihua; Hilgert, Uwe Kk; Jiang, Hongmei; Li, Zenglu; McClure, Gail; McMullen, Donald F; Nanduri, Bindu; Perkins, Andy; Rekepalli, Bhanu; Salem, Saeed; Specker, Jennifer; Walker, Karl; Wunsch, Donald; Xiong, Donghai; Zhang, Shuzhong; Zhang, Yu; Zhao, Zhongming; Moore, Jason H

    2015-01-01

    Whether your interests lie in scientific arenas, the corporate world, or in government, you have certainly heard the praises of big data: Big data will give you new insights, allow you to become more efficient, and/or will solve your problems. While big data has had some outstanding successes, many are now beginning to see that it is not the Silver Bullet that it has been touted to be. Here our main concern is the overall impact of big data; the current manifestation of big data is constructing a Maginot Line in science in the 21st century. Big data is not "lots of data" as a phenomenon anymore; the big data paradigm is putting the spirit of the Maginot Line into lots of data. Big data overall is disconnecting researchers from science challenges. We propose No-Boundary Thinking (NBT), applying no-boundary thinking in problem definition to address science challenges.

  19. Big Egos in Big Science

    DEFF Research Database (Denmark)

    Andersen, Kristina Vaarst; Jeppesen, Jacob

    In this paper we investigate the micro-mechanisms governing the structural evolution and performance of scientific collaboration. Scientific discovery tends not to be led by so-called lone "stars", or big egos, but instead by collaboration among groups of researchers, from a multitude of institutions...

  20. Big Data and Big Science

    OpenAIRE

    Di Meglio, Alberto

    2014-01-01

    Brief introduction to the challenges of big data in scientific research based on the work done by the HEP community at CERN and how the CERN openlab promotes collaboration among research institutes and industrial IT companies. Presented at the FutureGov 2014 conference in Singapore.

  1. Quality control and conduct of genome-wide association meta-analyses

    DEFF Research Database (Denmark)

    Winkler, Thomas W; Day, Felix R; Croteau-Chonka, Damien C

    2014-01-01

    Rigorous organization and quality control (QC) are necessary to facilitate successful genome-wide association meta-analyses (GWAMAs) of statistics aggregated across multiple genome-wide association studies. This protocol provides guidelines for (i) organizational aspects of GWAMAs, and for (ii) QC...

  2. Challenges of Big Data Analysis.

    Science.gov (United States)

    Fan, Jianqing; Han, Fang; Liu, Han

    2014-06-01

    Big Data bring new opportunities to modern society and challenges to data scientists. On one hand, Big Data hold great promise for discovering subtle population patterns and heterogeneities that are not possible with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottlenecks, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. These challenges are distinctive and require a new computational and statistical paradigm. This article gives an overview of the salient features of Big Data and how these features drive a paradigm change in statistical and computational methods as well as computing architectures. We also provide various new perspectives on Big Data analysis and computation. In particular, we emphasize the viability of the sparsest solution in a high-confidence set and point out that exogenous assumptions in most statistical methods for Big Data cannot be validated due to incidental endogeneity; they can lead to wrong statistical inferences and, consequently, wrong scientific conclusions.
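
    One of the challenges named above, spurious correlation, can be illustrated numerically with a small, self-contained sketch: when many unrelated features are screened against an outcome, some appear strongly correlated purely by chance. The sample size and feature counts below are arbitrary choices for illustration.

    # Spurious correlation in high dimensions: pure noise still yields "strong" correlations.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 100  # observations

    for p in (10, 1_000, 10_000):  # number of independent noise features
        y = rng.normal(size=n)
        X = rng.normal(size=(n, p))
        # Pearson correlation of each feature with y; all true correlations are zero.
        Xc = X - X.mean(axis=0)
        yc = y - y.mean()
        corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
        print(f"p = {p:>6}: max |corr| with pure noise = {np.abs(corr).max():.2f}")
    # The maximum absolute correlation grows with p even though no feature carries any signal.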

  3. A universal genomic coordinate translator for comparative genomics.

    Science.gov (United States)

    Zamani, Neda; Sundström, Görel; Meadows, Jennifer R S; Höppner, Marc P; Dainat, Jacques; Lantz, Henrik; Haas, Brian J; Grabherr, Manfred G

    2014-06-30

    species. Kraken is a computational genome coordinate translator that facilitates cross-species comparisons, distinguishes orthologs from paralogs, and does not require costly all-to-all whole genome mappings. Kraken is freely available under LGPL from http://github.com/nedaz/kraken.
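
    To make the idea of a genome coordinate translator concrete, the sketch below maps positions from one genome to another through a small table of aligned blocks, handling both strands. The block table, chromosome names, and coordinates are invented for illustration; this is not Kraken's actual input format or API.

    # Generic sketch of cross-genome coordinate translation via aligned blocks:
    # (source chrom, source start, target chrom, target start, length, strand).
    BLOCKS = [
        ("chrA1", 1_000, "chrB2", 5_000, 2_000, "+"),
        ("chrA1", 4_000, "chrB2", 9_500, 1_500, "-"),
    ]

    def translate(chrom, pos):
        """Map a 0-based position in genome A to genome B, or return None if unaligned."""
        for s_chrom, s_start, t_chrom, t_start, length, strand in BLOCKS:
            if chrom == s_chrom and s_start <= pos < s_start + length:
                offset = pos - s_start
                if strand == "+":
                    return t_chrom, t_start + offset
                return t_chrom, t_start + length - 1 - offset  # reverse-strand block
        return None

    print(translate("chrA1", 1_250))  # ('chrB2', 5250)
    print(translate("chrA1", 4_100))  # ('chrB2', 10899), via the reverse-strand block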

  4. Big data is not a monolith

    CERN Document Server

    Ekbia, Hamid R; Mattioli, Michael

    2016-01-01

    Big data is ubiquitous but heterogeneous. Big data can be used to tally clicks and traffic on web pages, find patterns in stock trades, track consumer preferences, identify linguistic correlations in large corpuses of texts. This book examines big data not as an undifferentiated whole but contextually, investigating the varied challenges posed by big data for health, science, law, commerce, and politics. Taken together, the chapters reveal a complex set of problems, practices, and policies. The advent of big data methodologies has challenged the theory-driven approach to scientific knowledge in favor of a data-driven one. Social media platforms and self-tracking tools change the way we see ourselves and others. The collection of data by corporations and government threatens privacy while promoting transparency. Meanwhile, politicians, policy makers, and ethicists are ill-prepared to deal with big data's ramifications. The contributors look at big data's effect on individuals as it exerts social control throu...

  5. Big universe, big data

    DEFF Research Database (Denmark)

    Kremer, Jan; Stensbo-Smidt, Kristoffer; Gieseke, Fabian Cristian

    2017-01-01

    ... modern astronomy requires big data know-how; in particular, it demands highly efficient machine learning and image analysis algorithms. But scalability is not the only challenge: astronomy applications touch several current machine learning research questions, such as learning from biased data and dealing ... We also highlight some recent methodological advancements in machine learning and image analysis triggered by astronomical applications.

  6. Poker Player Behavior After Big Wins and Big Losses

    OpenAIRE

    Gary Smith; Michael Levere; Robert Kurtzman

    2009-01-01

    We find that experienced poker players typically change their style of play after winning or losing a big pot--most notably, playing less cautiously after a big loss, evidently hoping for lucky cards that will erase their loss. This finding is consistent with Kahneman and Tversky's (Kahneman, D., A. Tversky. 1979. Prospect theory: An analysis of decision under risk. Econometrica 47(2) 263-292) break-even hypothesis and suggests that when investors incur a large loss, it might be time to take ...

  7. Big Data and Chemical Education

    Science.gov (United States)

    Pence, Harry E.; Williams, Antony J.

    2016-01-01

    The amount of computerized information that organizations collect and process is growing so large that the term Big Data is commonly being used to describe the situation. Accordingly, Big Data is defined by a combination of the Volume, Variety, Velocity, and Veracity of the data being processed. Big Data tools are already having an impact in…

  8. SIGMA: A System for Integrative Genomic Microarray Analysis of Cancer Genomes

    Directory of Open Access Journals (Sweden)

    Davies Jonathan J

    2006-12-01

    Full Text Available Abstract Background The prevalence of high resolution profiling of genomes has created a need for the integrative analysis of information generated from multiple methodologies and platforms. Although the majority of data in the public domain are gene expression profiles, and expression analysis software is available, the increase of array CGH studies has enabled integration of high throughput genomic and gene expression datasets. However, tools for direct mining and analysis of array CGH data are limited. Hence, there is a great need for analytical and display software tailored to cross platform integrative analysis of cancer genomes. Results We have created a user-friendly java application to facilitate sophisticated visualization and analysis such as cross-tumor and cross-platform comparisons. To demonstrate the utility of this software, we assembled array CGH data representing Affymetrix SNP chip, Stanford cDNA arrays and whole genome tiling path array platforms for cross comparison. This cancer genome database contains 267 profiles from commonly used cancer cell lines representing 14 different tissue types. Conclusion In this study we have developed an application for the visualization and analysis of data from high resolution array CGH platforms that can be adapted for analysis of multiple types of high throughput genomic datasets. Furthermore, we invite researchers using array CGH technology to deposit both their raw and processed data, as this will be a continually expanding database of cancer genomes. This publicly available resource, the System for Integrative Genomic Microarray Analysis (SIGMA) of cancer genomes, can be accessed at http://sigma.bccrc.ca.

  9. Big data in Finnish financial services

    OpenAIRE

    Laurila, M. (Mikko)

    2017-01-01

    Abstract This thesis aims to explore the concept of big data, and create understanding of big data maturity in the Finnish financial services industry. The research questions of this thesis are “What kind of big data solutions are being implemented in the Finnish financial services sector?” and “Which factors impede faster implementation of big data solutions in the Finnish financial services sector?”. ...

  10. Big data in fashion industry

    Science.gov (United States)

    Jain, S.; Bruniaux, J.; Zeng, X.; Bruniaux, P.

    2017-10-01

    Significant work has been done in the field of big data in the last decade. The concept of big data includes analysing voluminous data to extract valuable information. In the fashion world, big data is increasingly playing a part in trend forecasting, analysing consumer behaviour, preferences and emotions. The purpose of this paper is to introduce the term fashion data and why it can be considered as big data. It also gives a broad classification of the types of fashion data and briefly defines them. Also, the methodology and working of a system that will use this data is briefly described.

  11. Interaction between the cellular protein eEF1A and the 3'-terminal stem-loop of West Nile virus genomic RNA facilitates viral minus-strand RNA synthesis.

    Science.gov (United States)

    Davis, William G; Blackwell, Jerry L; Shi, Pei-Yong; Brinton, Margo A

    2007-09-01

    RNase footprinting and nitrocellulose filter binding assays were previously used to map one major and two minor binding sites for the cell protein eEF1A on the 3'(+) stem-loop (SL) RNA of West Nile virus (WNV) (3). Base substitutions in the major eEF1A binding site or adjacent areas of the 3'(+) SL were engineered into a WNV infectious clone. Mutations that decreased, as well as ones that increased, eEF1A binding in in vitro assays had a negative effect on viral growth. None of these mutations affected the efficiency of translation of the viral polyprotein from the genomic RNA, but all of the mutations that decreased in vitro eEF1A binding to the 3' SL RNA also decreased viral minus-strand RNA synthesis in transfected cells. Also, a mutation that increased the efficiency of eEF1A binding to the 3' SL RNA increased minus-strand RNA synthesis in transfected cells, which resulted in decreased synthesis of genomic RNA. These results strongly suggest that the interaction between eEF1A and the WNV 3' SL facilitates viral minus-strand synthesis. eEF1A colocalized with viral replication complexes (RC) in infected cells and antibody to eEF1A coimmunoprecipitated viral RC proteins, suggesting that eEF1A facilitates an interaction between the 3' end of the genome and the RC. eEF1A bound with similar efficiencies to the 3'-terminal SL RNAs of four divergent flaviviruses, including a tick-borne flavivirus, and colocalized with dengue virus RC in infected cells. These results suggest that eEF1A plays a similar role in RNA replication for all flaviviruses.

  12. Big biomedical data as the key resource for discovery science.

    Science.gov (United States)

    Toga, Arthur W; Foster, Ian; Kesselman, Carl; Madduri, Ravi; Chard, Kyle; Deutsch, Eric W; Price, Nathan D; Glusman, Gustavo; Heavner, Benjamin D; Dinov, Ivo D; Ames, Joseph; Van Horn, John; Kramer, Roger; Hood, Leroy

    2015-11-01

    Modern biomedical data collection is generating exponentially more data in a multitude of formats. This flood of complex data poses significant opportunities to discover and understand the critical interplay among such diverse domains as genomics, proteomics, metabolomics, and phenomics, including imaging, biometrics, and clinical data. The Big Data for Discovery Science Center is taking an "-ome to home" approach to discover linkages between these disparate data sources by mining existing databases of proteomic and genomic data, brain images, and clinical assessments. In support of this work, the authors developed new technological capabilities that make it easy for researchers to manage, aggregate, manipulate, integrate, and model large amounts of distributed data. Guided by biological domain expertise, the Center's computational resources and software will reveal relationships and patterns, aiding researchers in identifying biomarkers for the most confounding conditions and diseases, such as Parkinson's and Alzheimer's. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  13. Big data bioinformatics.

    Science.gov (United States)

    Greene, Casey S; Tan, Jie; Ung, Matthew; Moore, Jason H; Cheng, Chao

    2014-12-01

    Recent technological advances allow for high throughput profiling of biological systems in a cost-efficient manner. The low cost of data generation is leading us to the "big data" era. The availability of big data provides unprecedented opportunities but also raises new challenges for data mining and analysis. In this review, we introduce key concepts in the analysis of big data, including both "machine learning" algorithms as well as "unsupervised" and "supervised" examples of each. We note packages for the R programming language that are available to perform machine learning analyses. In addition to programming based solutions, we review webservers that allow users with limited or no programming background to perform these analyses on large data compendia. © 2014 Wiley Periodicals, Inc.
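
    As a hedged illustration of the "unsupervised" and "supervised" analyses mentioned above, the sketch below applies both to the same simulated samples-by-genes matrix. The group structure, effect size, and choice of scikit-learn models are arbitrary illustrations, not a recommended analysis protocol for real profiling data.

    # Toy contrast between unsupervised clustering and supervised classification.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n_per_group, n_genes = 50, 200
    healthy = rng.normal(0.0, 1.0, (n_per_group, n_genes))
    disease = rng.normal(0.0, 1.0, (n_per_group, n_genes))
    disease[:, :20] += 2.0                      # 20 "genes" shifted in the disease group

    X = np.vstack([healthy, disease])
    labels = np.array([0] * n_per_group + [1] * n_per_group)

    # Unsupervised: do the samples fall into groups without using the labels?
    clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    agreement = max((clusters == labels).mean(), (clusters != labels).mean())
    print(f"k-means / label agreement: {agreement:.2f}")

    # Supervised: how well can the labels be predicted from the profiles?
    acc = cross_val_score(RandomForestClassifier(random_state=0), X, labels, cv=5).mean()
    print(f"cross-validated classification accuracy: {acc:.2f}")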

  14. Changing the personality of a face: Perceived Big Two and Big Five personality factors modeled in real photographs.

    Science.gov (United States)

    Walker, Mirella; Vetter, Thomas

    2016-04-01

    General, spontaneous evaluations of strangers based on their faces have been shown to reflect judgments of these persons' intention and ability to harm. These evaluations can be mapped onto a 2D space defined by the dimensions trustworthiness (intention) and dominance (ability). Here we go beyond general evaluations and focus on more specific personality judgments derived from the Big Two and Big Five personality concepts. In particular, we investigate whether Big Two/Big Five personality judgments can be mapped onto the 2D space defined by the dimensions trustworthiness and dominance. Results indicate that judgments of the Big Two personality dimensions almost perfectly map onto the 2D space. In contrast, at least 3 of the Big Five dimensions (i.e., neuroticism, extraversion, and conscientiousness) go beyond the 2D space, indicating that additional dimensions are necessary to describe more specific face-based personality judgments accurately. Building on this evidence, we model the Big Two/Big Five personality dimensions in real facial photographs. Results from 2 validation studies show that the Big Two/Big Five are perceived reliably across different samples of faces and participants. Moreover, results reveal that participants differentiate reliably between the different Big Two/Big Five dimensions. Importantly, this high level of agreement and differentiation in personality judgments from faces likely creates a subjective reality which may have serious consequences for those being perceived-notably, these consequences ensue because the subjective reality is socially shared, irrespective of the judgments' validity. The methodological approach introduced here might prove useful in various psychological disciplines. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  15. The BigBOSS Experiment

    Energy Technology Data Exchange (ETDEWEB)

    Schelgel, D.; Abdalla, F.; Abraham, T.; Ahn, C.; Allende Prieto, C.; Annis, J.; Aubourg, E.; Azzaro, M.; Bailey, S.; Baltay, C.; Baugh, C.; /APC, Paris /Brookhaven /IRFU, Saclay /Marseille, CPPM /Marseille, CPT /Durham U. / /IEU, Seoul /Fermilab /IAA, Granada /IAC, La Laguna

    2011-01-01

    BigBOSS will obtain observational constraints that will bear on three of the four 'science frontier' questions identified by the Astro2010 Cosmology and Fundamental Physics Panel of the Decadal Survey: Why is the universe accelerating; what is dark matter and what are the properties of neutrinos? Indeed, the BigBOSS project was recommended for substantial immediate R and D support in the PASAG report. The second highest ground-based priority from the Astro2010 Decadal Survey was the creation of a funding line within the NSF to support a 'Mid-Scale Innovations' program, and it used BigBOSS as a 'compelling' example for support. This choice was the result of the Decadal Survey's Program Prioritization panels reviewing 29 mid-scale projects and recommending BigBOSS 'very highly'.

  16. Big game hunting practices, meanings, motivations and constraints: a survey of Oregon big game hunters

    Science.gov (United States)

    Suresh K. Shrestha; Robert C. Burns

    2012-01-01

    We conducted a self-administered mail survey in September 2009 with randomly selected Oregon hunters who had purchased big game hunting licenses/tags for the 2008 hunting season. Survey questions explored hunting practices, the meanings of and motivations for big game hunting, the constraints to big game hunting participation, and the effects of age, years of hunting...

  17. Google BigQuery analytics

    CERN Document Server

    Tigani, Jordan

    2014-01-01

    How to effectively use BigQuery, avoid common mistakes, and execute sophisticated queries against large datasets Google BigQuery Analytics is the perfect guide for business and data analysts who want the latest tips on running complex queries and writing code to communicate with the BigQuery API. The book uses real-world examples to demonstrate current best practices and techniques, and also explains and demonstrates streaming ingestion, transformation via Hadoop in Google Compute engine, AppEngine datastore integration, and using GViz with Tableau to generate charts of query results. In addit

  18. Big data for dummies

    CERN Document Server

    Hurwitz, Judith; Halper, Fern; Kaufman, Marcia

    2013-01-01

    Find the right big data solution for your business or organization Big data management is one of the major challenges facing business, industry, and not-for-profit organizations. Data sets such as customer transactions for a mega-retailer, weather patterns monitored by meteorologists, or social network activity can quickly outpace the capacity of traditional data management tools. If you need to develop or manage big data solutions, you'll appreciate how these four experts define, explain, and guide you through this new and often confusing concept. You'll learn what it is, why it m

  19. Exploring complex and big data

    Directory of Open Access Journals (Sweden)

    Stefanowski Jerzy

    2017-12-01

    Full Text Available This paper shows how big data analysis opens a range of research and technological problems and calls for new approaches. We start with defining the essential properties of big data and discussing the main types of data involved. We then survey the dedicated solutions for storing and processing big data, including a data lake, virtual integration, and a polystore architecture. Difficulties in managing data quality and provenance are also highlighted. The characteristics of big data also imply specific requirements and challenges for data mining algorithms, which we address as well. The links with related areas, including data streams and deep learning, are discussed. The common theme that naturally emerges from this characterization is complexity. All in all, we consider it to be the truly defining feature of big data (posing particular research and technological challenges), which ultimately seems to be of greater importance than the sheer data volume.

  20. The genome of Chenopodium quinoa

    NARCIS (Netherlands)

    Jarvis, D.E.; Shwen Ho, Yung; Lightfoot, Damien J.; Schmöckel, Sandra M.; Li, Bo; Borm, T.J.A.; Ohyanagi, Hajime; Mineta, Katsuhiko; Mitchell, Craig T.; Saber, Noha; Kharbatia, Najeh M.; Rupper, Ryan R.; Sharp, Aaron R.; Dally, Nadine; Boughton, Berin A.; Woo, Yong H.; Gao, Ge; Schijlen, E.G.W.M.; Guo, Xiujie; Momin, Afaque A.; Negräo, Sónia; Al-Babili, Salim; Gehring, Christoph; Roessner, Ute; Jung, Christian; Murphy, Kevin; Arold, Stefan T.; Gojobori, Takashi; Linden, van der C.G.; Loo, van E.N.; Jellen, Eric N.; Maughan, Peter J.; Tester, Mark

    2017-01-01

    Chenopodium quinoa (quinoa) is a highly nutritious grain identified as an important crop to improve world food security. Unfortunately, few resources are available to facilitate its genetic improvement. Here we report the assembly of a high-quality, chromosome-scale reference genome sequence for

  1. Genomic resources for gene discovery, functional genome annotation, and evolutionary studies of maize and its close relatives.

    Science.gov (United States)

    Wang, Chao; Shi, Xue; Liu, Lin; Li, Haiyan; Ammiraju, Jetty S S; Kudrna, David A; Xiong, Wentao; Wang, Hao; Dai, Zhaozhao; Zheng, Yonglian; Lai, Jinsheng; Jin, Weiwei; Messing, Joachim; Bennetzen, Jeffrey L; Wing, Rod A; Luo, Meizhong

    2013-11-01

    Maize is one of the most important food crops and a key model for genetics and developmental biology. A genetically anchored and high-quality draft genome sequence of maize inbred B73 has been obtained to serve as a reference sequence. To facilitate evolutionary studies in maize and its close relatives, much like the Oryza Map Alignment Project (OMAP) (www.OMAP.org) bacterial artificial chromosome (BAC) resource did for the rice community, we constructed BAC libraries for maize inbred lines Zheng58, Chang7-2, and Mo17 and maize wild relatives Zea mays ssp. parviglumis and Tripsacum dactyloides. Furthermore, to extend functional genomic studies to maize and sorghum, we also constructed binary BAC (BIBAC) libraries for the maize inbred B73 and the sorghum landrace Nengsi-1. The BAC/BIBAC vectors facilitate transfer of large intact DNA inserts from BAC clones to the BIBAC vector and functional complementation of large DNA fragments. These seven Zea Map Alignment Project (ZMAP) BAC/BIBAC libraries have average insert sizes ranging from 92 to 148 kb, organellar DNA from 0.17 to 2.3%, empty vector rates between 0.35 and 5.56%, and genome equivalents of 4.7- to 8.4-fold. The usefulness of the Parviglumis and Tripsacum BAC libraries was demonstrated by mapping clones to the reference genome. Novel genes and alleles present in these ZMAP libraries can now be used for functional complementation studies and positional or homology-based cloning of genes for translational genomics.
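
    The "genome equivalents" quoted for these libraries follow from a simple calculation: coverage equals the number of clones times the average insert size, divided by the genome size. The sketch below works through that arithmetic with hypothetical numbers; they are not the actual ZMAP library statistics.

    # Illustrative "genome equivalents" (coverage) calculation for a BAC library.
    n_clones = 110_000
    avg_insert_bp = 120_000          # within the 92-148 kb range cited above
    genome_size_bp = 2_300_000_000   # roughly a maize-sized genome (assumed)

    coverage = n_clones * avg_insert_bp / genome_size_bp
    print(f"library coverage ~ {coverage:.1f} genome equivalents")  # ~5.7x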

  2. Was there a big bang

    International Nuclear Information System (INIS)

    Narlikar, J.

    1981-01-01

    In discussing the viability of the big-bang model of the Universe, relevant evidence is examined, including the discrepancies in the age of the big-bang Universe, the red shifts of quasars, the microwave background radiation, general theory of relativity aspects such as the change of the gravitational constant with time, and quantum theory considerations. It is felt that the arguments considered show that the big-bang picture is not as soundly established, either theoretically or observationally, as it is usually claimed to be, that the cosmological problem is still wide open and alternatives to the standard big-bang picture should be seriously investigated. (U.K.)

  3. BIG DATA-DRIVEN MARKETING: AN ABSTRACT

    OpenAIRE

    Suoniemi, Samppa; Meyer-Waarden, Lars; Munzel, Andreas

    2017-01-01

    Customer information plays a key role in managing successful relationships with valuable customers. Big data customer analytics use (BD use), i.e., the extent to which customer information derived from big data analytics guides marketing decisions, helps firms better meet customer needs for competitive advantage. This study addresses three research questions: What are the key antecedents of big data customer analytics use? How, and to what extent, does big data customer an...

  4. The trashing of Big Green

    International Nuclear Information System (INIS)

    Felten, E.

    1990-01-01

    The Big Green initiative on California's ballot lost by a margin of 2-to-1. Green measures lost in five other states, shocking ecology-minded groups. According to the postmortem by environmentalists, Big Green was a victim of poor timing and big spending by the opposition. Now its supporters plan to break up the bill and try to pass some provisions in the Legislature

  5. Integrity, standards, and QC-related issues with big data in pre-clinical drug discovery.

    Science.gov (United States)

    Brothers, John F; Ung, Matthew; Escalante-Chong, Renan; Ross, Jermaine; Zhang, Jenny; Cha, Yoonjeong; Lysaght, Andrew; Funt, Jason; Kusko, Rebecca

    2018-06-01

    The tremendous expansion of data analytics and public and private big datasets presents an important opportunity for pre-clinical drug discovery and development. In the field of life sciences, the growth of genetic, genomic, transcriptomic and proteomic data is partly driven by a rapid decline in experimental costs as biotechnology improves throughput, scalability, and speed. Yet far too many researchers tend to underestimate the challenges and consequences involving data integrity and quality standards. Given the effect of data integrity on scientific interpretation, these issues have significant implications during preclinical drug development. We describe standardized approaches for maximizing the utility of publicly available or privately generated biological data and address some of the common pitfalls. We also discuss the increasing interest to integrate and interpret cross-platform data. Principles outlined here should serve as a useful broad guide for existing analytical practices and pipelines and as a tool for developing additional insights into therapeutics using big data. Copyright © 2018 Elsevier Inc. All rights reserved.

  6. Genome engineering in human cells.

    Science.gov (United States)

    Song, Minjung; Kim, Young-Hoon; Kim, Jin-Soo; Kim, Hyongbum

    2014-01-01

    Genome editing in human cells is of great value in research, medicine, and biotechnology. Programmable nucleases including zinc-finger nucleases, transcription activator-like effector nucleases, and RNA-guided engineered nucleases recognize a specific target sequence and make a double-strand break at that site, which can result in gene disruption, gene insertion, gene correction, or chromosomal rearrangements. The target sequence complexities of these programmable nucleases are higher than 3.2 mega base pairs, the size of the haploid human genome. Here, we briefly introduce the structure of the human genome and the characteristics of each programmable nuclease, and review their applications in human cells including pluripotent stem cells. In addition, we discuss various delivery methods for nucleases, programmable nickases, and enrichment of gene-edited human cells, all of which facilitate efficient and precise genome editing in human cells.
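
    The "target sequence complexity" figure cited above can be unpacked with a quick back-of-the-envelope calculation: a recognition site of n base pairs has 4^n possible sequences, which should exceed the roughly 3.2 Gbp haploid genome for targets to be unique on average. The site lengths below are illustrative round numbers, not exact specifications of any particular nuclease platform.

    # Back-of-the-envelope comparison of recognition-site complexity with genome size.
    haploid_genome_bp = 3.2e9

    for site_len in (18, 20, 23):               # illustrative recognition-site lengths
        complexity = float(4 ** site_len)       # possible DNA sequences of that length
        print(f"{site_len}-bp site: 4^{site_len} = {complexity:.2e} "
              f"({complexity / haploid_genome_bp:.0f}x the haploid genome)")
    # A 20-bp site already allows ~1.1e12 distinct sequences, several hundred times more
    # than the number of positions in the genome, which is what makes unique targeting feasible.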

  7. The Big Bang Singularity

    Science.gov (United States)

    Ling, Eric

    The big bang theory is a model of the universe which makes the striking prediction that the universe began a finite amount of time in the past at the so called "Big Bang singularity." We explore the physical and mathematical justification of this surprising result. After laying down the framework of the universe as a spacetime manifold, we combine physical observations with global symmetrical assumptions to deduce the FRW cosmological models which predict a big bang singularity. Next we prove a couple theorems due to Stephen Hawking which show that the big bang singularity exists even if one removes the global symmetrical assumptions. Lastly, we investigate the conditions one needs to impose on a spacetime if one wishes to avoid a singularity. The ideas and concepts used here to study spacetimes are similar to those used to study Riemannian manifolds, therefore we compare and contrast the two geometries throughout.

  8. Reframing Open Big Data

    DEFF Research Database (Denmark)

    Marton, Attila; Avital, Michel; Jensen, Tina Blegind

    2013-01-01

    Recent developments in the techniques and technologies of collecting, sharing and analysing data are challenging the field of information systems (IS) research, let alone the boundaries of organizations and the established practices of decision-making. Coined ‘open data’ and ‘big data’, these developments introduce an unprecedented level of societal and organizational engagement with the potential of computational data to generate new insights and information. Based on the commonalities shared by open data and big data, we develop a research framework that we refer to as open big data (OBD) by employing the dimensions of ‘order’ and ‘relationality’. We argue that these dimensions offer a viable approach for IS research on open and big data because they address one of the core value propositions of IS; i.e. how to support organizing with computational data. We contrast these dimensions with two...

  9. Genome Evolution of Plant-Parasitic Nematodes.

    Science.gov (United States)

    Kikuchi, Taisei; Eves-van den Akker, Sebastian; Jones, John T

    2017-08-04

    Plant parasitism has evolved independently on at least four separate occasions in the phylum Nematoda. The application of next-generation sequencing (NGS) to plant-parasitic nematodes has allowed a wide range of genome- or transcriptome-level comparisons, and these have identified genome adaptations that enable parasitism of plants. Current genome data suggest that horizontal gene transfer, gene family expansions, evolution of new genes that mediate interactions with the host, and parasitism-specific gene regulation are important adaptations that allow nematodes to parasitize plants. Sequencing of a larger number of nematode genomes, including plant parasites that show different modes of parasitism or that have evolved in currently unsampled clades, and using free-living taxa as comparators would allow more detailed analysis and a better understanding of the organization of key genes within the genomes. This would facilitate a more complete understanding of the way in which parasitism has shaped the genomes of plant-parasitic nematodes.

  10. Analysis of cis-elements that facilitate extrachromosomal persistence of human papillomavirus genomes

    International Nuclear Information System (INIS)

    Pittayakhajonwut, Daraporn; Angeletti, Peter C.

    2008-01-01

    Human papillomaviruses (HPVs) are maintained latently in dividing epithelial cells as nuclear plasmids. Two virally encoded proteins, E1, a helicase, and E2, a transcription factor, are important players in replication and stable plasmid maintenance in host cells. Recent experiments in yeast have demonstrated that viral genomes retain replication and maintenance function independently of E1 and E2 [Angeletti, P.C., Kim, K., Fernandes, F.J., and Lambert, P.F. (2002). Stable replication of papillomavirus genomes in Saccharomyces cerevisiae. J. Virol. 76(7), 3350-8; Kim, K., Angeletti, P.C., Hassebroek, E.C., and Lambert, P.F. (2005). Identification of cis-acting elements that mediate the replication and maintenance of human papillomavirus type 16 genomes in Saccharomyces cerevisiae. J. Virol. 79(10), 5933-42]. Flow cytometry studies of EGFP-reporter vectors containing subgenomic HPV fragments with or without a human ARS (hARS), revealed that six fragments located in E6-E7, E1-E2, L1, and L2 regions showed a capacity for plasmid stabilization in the absence of E1 and E2 proteins. Interestingly, four fragments within E7, the 3' end of L2, and the 5' end of L1 exhibited stability in plasmids that lacked an hARS, indicating that they possess both replication and maintenance functions. Two fragments lying in E1-E2 and the 3' region of L1 were stable only in the presence of hARS, that they contained only maintenance function. Mutational analyses of HPV16-GFP reporter constructs provided evidence that genomes lacking E1 and E2 could replicate to an extent similar to wild type HPV16. Together these results support the concept that cellular factors influence HPV replication and maintenance, independently, and perhaps in conjunction with E1 and E2, suggesting a role in the persistent phase of the viral lifecycle

  11. Ginseng Genome Database: an open-access platform for genomics of Panax ginseng.

    Science.gov (United States)

    Jayakodi, Murukarthick; Choi, Beom-Soon; Lee, Sang-Choon; Kim, Nam-Hoon; Park, Jee Young; Jang, Woojong; Lakshmanan, Meiyappan; Mohan, Shobhana V G; Lee, Dong-Yup; Yang, Tae-Jin

    2018-04-12

    The ginseng (Panax ginseng C.A. Meyer) is a perennial herbaceous plant that has been used in traditional oriental medicine for thousands of years. Ginsenosides, which have significant pharmacological effects on human health, are the foremost bioactive constituents in this plant. Having realized the importance of this plant to humans, an integrated omics resource becomes indispensable to facilitate genomic research, molecular breeding and pharmacological study of this herb. The first draft genome sequences of P. ginseng cultivar "Chunpoong" were reported recently. Here, using the draft genome, transcriptome, and functional annotation datasets of P. ginseng, we have constructed the Ginseng Genome Database http://ginsengdb.snu.ac.kr/, the first open-access platform to provide comprehensive genomic resources of P. ginseng. The current version of this database provides the most up-to-date draft genome sequence (of approximately 3000 Mbp of scaffold sequences) along with the structural and functional annotations for 59,352 genes and digital expression of genes based on transcriptome data from different tissues, growth stages and treatments. In addition, tools for visualization and the genomic data from various analyses are provided. All data in the database were manually curated and integrated within a user-friendly query page. This database provides valuable resources for a range of research fields related to P. ginseng and other species belonging to the Apiales order as well as for plant research communities in general. Ginseng genome database can be accessed at http://ginsengdb.snu.ac.kr/.

  12. Medical big data: promise and challenges.

    Science.gov (United States)

    Lee, Choong Ho; Yoon, Hyung-Jin

    2017-03-01

    The concept of big data, commonly characterized by volume, variety, velocity, and veracity, goes far beyond the data type and includes the aspects of data analysis, such as hypothesis-generating, rather than hypothesis-testing. Big data focuses on temporal stability of the association, rather than on causal relationship and underlying probability distribution assumptions are frequently not required. Medical big data as material to be analyzed has various features that are not only distinct from big data of other disciplines, but also distinct from traditional clinical epidemiology. Big data technology has many areas of application in healthcare, such as predictive modeling and clinical decision support, disease or safety surveillance, public health, and research. Big data analytics frequently exploits analytic methods developed in data mining, including classification, clustering, and regression. Medical big data analyses are complicated by many technical issues, such as missing values, curse of dimensionality, and bias control, and share the inherent limitations of observation study, namely the inability to test causality resulting from residual confounding and reverse causation. Recently, propensity score analysis and instrumental variable analysis have been introduced to overcome these limitations, and they have accomplished a great deal. Many challenges, such as the absence of evidence of practical benefits of big data, methodological issues including legal and ethical issues, and clinical integration and utility issues, must be overcome to realize the promise of medical big data as the fuel of a continuous learning healthcare system that will improve patient outcome and reduce waste in areas including nephrology.
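
    As a minimal sketch of the propensity score analysis mentioned above, the code below simulates confounded observational data and compares a naive treated-versus-untreated difference with an inverse-probability-of-treatment-weighted (IPTW) estimate. The data-generating process, covariates, and effect sizes are invented purely for illustration, and scikit-learn is assumed to be available for the logistic propensity model.

    # Minimal IPTW sketch on simulated observational data with confounding by indication.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    n = 20_000
    age = rng.normal(60, 10, n)
    severity = rng.normal(0, 1, n)

    # Sicker and older patients are more likely to be treated.
    logit = 0.04 * (age - 60) + 1.0 * severity
    treated = rng.random(n) < 1 / (1 + np.exp(-logit))

    # Outcome worsens with age and severity; the true treatment effect is -1.0.
    outcome = 0.05 * (age - 60) + 1.5 * severity - 1.0 * treated + rng.normal(0, 1, n)

    naive = outcome[treated].mean() - outcome[~treated].mean()

    # Propensity model and inverse-probability-of-treatment weights.
    X = np.column_stack([age, severity])
    ps = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]
    w = np.where(treated, 1 / ps, 1 / (1 - ps))

    weighted_treated = np.average(outcome[treated], weights=w[treated])
    weighted_control = np.average(outcome[~treated], weights=w[~treated])
    print(f"naive difference: {naive:+.2f}")  # biased toward zero or positive by confounding
    print(f"IPTW estimate:    {weighted_treated - weighted_control:+.2f}  (true effect -1.0)")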

  13. Medical big data: promise and challenges

    Directory of Open Access Journals (Sweden)

    Choong Ho Lee

    2017-03-01

    Full Text Available The concept of big data, commonly characterized by volume, variety, velocity, and veracity, goes far beyond the data type and includes the aspects of data analysis, such as hypothesis-generating, rather than hypothesis-testing. Big data focuses on temporal stability of the association, rather than on causal relationship and underlying probability distribution assumptions are frequently not required. Medical big data as material to be analyzed has various features that are not only distinct from big data of other disciplines, but also distinct from traditional clinical epidemiology. Big data technology has many areas of application in healthcare, such as predictive modeling and clinical decision support, disease or safety surveillance, public health, and research. Big data analytics frequently exploits analytic methods developed in data mining, including classification, clustering, and regression. Medical big data analyses are complicated by many technical issues, such as missing values, curse of dimensionality, and bias control, and share the inherent limitations of observation study, namely the inability to test causality resulting from residual confounding and reverse causation. Recently, propensity score analysis and instrumental variable analysis have been introduced to overcome these limitations, and they have accomplished a great deal. Many challenges, such as the absence of evidence of practical benefits of big data, methodological issues including legal and ethical issues, and clinical integration and utility issues, must be overcome to realize the promise of medical big data as the fuel of a continuous learning healthcare system that will improve patient outcome and reduce waste in areas including nephrology.

  14. Molecular cytogenetic and genomic analyses reveal new insights into the origin of the wheat B genome.

    Science.gov (United States)

    Zhang, Wei; Zhang, Mingyi; Zhu, Xianwen; Cao, Yaping; Sun, Qing; Ma, Guojia; Chao, Shiaoman; Yan, Changhui; Xu, Steven S; Cai, Xiwen

    2018-02-01

    This work pinpointed the goatgrass chromosomal segment in the wheat B genome using modern cytogenetic and genomic technologies, and provided novel insights into the origin of the wheat B genome. Wheat is a typical allopolyploid with three homoeologous subgenomes (A, B, and D). The donors of subgenomes A and D had been identified, but the donor of subgenome B had not. The goatgrass Aegilops speltoides (genome SS) has been controversially considered a possible candidate for the donor of the wheat B genome. However, the relationship of the Ae. speltoides S genome with the wheat B genome remains largely obscure. The present study assessed the homology of the B and S genomes using an integrative cytogenetic and genomic approach, and revealed the contribution of Ae. speltoides to the origin of the wheat B genome. We discovered noticeable homology between wheat chromosome 1B and Ae. speltoides chromosome 1S, but not between other chromosomes in the B and S genomes. An Ae. speltoides-originated segment spanning a genomic region of approximately 10.46 Mb was detected on the long arm of wheat chromosome 1B (1BL). The Ae. speltoides-originated segment on 1BL was found to co-evolve with the rest of the B genome. Evidently, Ae. speltoides had been involved in the origin of the wheat B genome, but should not be considered an exclusive donor of this genome. The wheat B genome might have a polyphyletic origin with multiple ancestors involved, including Ae. speltoides. These novel findings will facilitate genome studies in wheat and other polyploids.

  15. What is beyond the big five?

    Science.gov (United States)

    Saucier, G; Goldberg, L R

    1998-08-01

    Previous investigators have proposed that various kinds of person-descriptive content--such as differences in attitudes or values, in sheer evaluation, in attractiveness, or in height and girth--are not adequately captured by the Big Five Model. We report on a rather exhaustive search for reliable sources of Big Five-independent variation in data from person-descriptive adjectives. Fifty-three candidate clusters were developed in a college sample using diverse approaches and sources. In a nonstudent adult sample, clusters were evaluated with respect to a minimax criterion: minimum multiple correlation with factors from Big Five markers and maximum reliability. The most clearly Big Five-independent clusters referred to Height, Girth, Religiousness, Employment Status, Youthfulness and Negative Valence (or low-base-rate attributes). Clusters referring to Fashionableness, Sensuality/Seductiveness, Beauty, Masculinity, Frugality, Humor, Wealth, Prejudice, Folksiness, Cunning, and Luck appeared to be potentially beyond the Big Five, although each of these clusters demonstrated Big Five multiple correlations of .30 to .45, and at least one correlation of .20 and over with a Big Five factor. Of all these content areas, Religiousness, Negative Valence, and the various aspects of Attractiveness were found to be represented by a substantial number of distinct, common adjectives. Results suggest directions for supplementing the Big Five when one wishes to extend variable selection outside the domain of personality traits as conventionally defined.

  16. Big Data Analytics and Its Applications

    Directory of Open Access Journals (Sweden)

    Mashooque A. Memon

    2017-10-01

    Full Text Available The term Big Data refers to the extensive volume of data that cannot be managed by traditional data-handling methods or techniques. Big Data plays an indispensable role in many fields, such as agriculture, banking, data mining, education, chemistry, finance, cloud computing, marketing, health care and stocks. Big data analytics is the process of examining big data to reveal hidden patterns, unknown correlations and other useful information that can be used to make better decisions. There has been a perpetually expanding interest in big data because of its rapid growth and because it covers many areas of application. The open-source Apache Hadoop technology, written in Java and running on the Linux operating system, was used. The primary contribution of this work is to present an effective and free solution for big data applications in a distributed environment, together with its advantages and an indication of its ease of use. There also appears to be a need for an analytical review of new developments in big data technology. Healthcare is one of the foremost concerns of the world. Big data in healthcare refers to electronic health data sets related to patient health and well-being. Data in the healthcare domain is growing beyond the management capacity of healthcare organizations and is expected to increase significantly in the coming years.
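
    The record above names Apache Hadoop as the processing framework. As a hedged, generic illustration of how work is expressed for Hadoop Streaming (not code from the article), the sketch below implements the classic word count as a mapper and reducer that read stdin and write tab-separated key/value pairs; Hadoop sorts the mapper output by key before the reducer sees it.

        # Generic Hadoop Streaming word-count sketch (illustrative only).
        # mapper() and reducer() follow the stdin/stdout contract used by
        # `hadoop jar hadoop-streaming.jar -mapper ... -reducer ...`.
        import sys
        from itertools import groupby

        def mapper(stream=sys.stdin):
            # Emit one "word<TAB>1" line per token.
            for line in stream:
                for word in line.split():
                    print(f"{word.lower()}\t1")

        def reducer(stream=sys.stdin):
            # Input arrives sorted by key, so consecutive lines share the same word.
            parsed = (line.rstrip("\n").split("\t", 1) for line in stream)
            for word, group in groupby(parsed, key=lambda kv: kv[0]):
                print(f"{word}\t{sum(int(count) for _, count in group)}")

        if __name__ == "__main__":
            # Hypothetical convention: run as `python wc.py map` or `python wc.py reduce`.
            mapper() if sys.argv[1:] == ["map"] else reducer()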

  17. Alternatives to relational databases in precision medicine: Comparison of NoSQL approaches for big data storage using supercomputers

    Science.gov (United States)

    Velazquez, Enrique Israel

    Improvements in medical and genomic technologies have dramatically increased the production of electronic data over the last decade. As a result, data management is rapidly becoming a major determinant, and urgent challenge, for the development of Precision Medicine. Although successful data management is achievable using Relational Database Management Systems (RDBMS), exponential data growth is a significant contributor to failure scenarios. Growing amounts of data can also be observed in other sectors, such as economics and business, which, together with the previous facts, suggests that alternate database approaches (NoSQL) may soon be required for efficient storage and management of big databases. However, this hypothesis has been difficult to test in the Precision Medicine field since alternate database architectures are complex to assess and means to integrate heterogeneous electronic health records (EHR) with dynamic genomic data are not easily available. In this dissertation, we present a novel set of experiments for identifying NoSQL database approaches that enable effective data storage and management in Precision Medicine using patients' clinical and genomic information from the cancer genome atlas (TCGA). The first experiment draws on performance and scalability from biologically meaningful queries with differing complexity and database sizes. The second experiment measures performance and scalability in database updates without schema changes. The third experiment assesses performance and scalability in database updates with schema modifications due to dynamic data. We have identified two NoSQL approaches, based on Cassandra and Redis, which seem to be ideal database management systems for our precision medicine queries in terms of performance and scalability. We present NoSQL approaches and show how they can be used to manage clinical and genomic big data. Our research is relevant to public health since we are focusing on one of the main
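
    As a minimal sketch of the key-value style of storage the dissertation evaluates, the snippet below stores and retrieves a small clinical-plus-genomic record as a Redis hash using the redis-py client. It is illustrative only: the key layout, field names and patient identifier are hypothetical, and a real TCGA-scale deployment would need a far richer schema.

        # Minimal Redis key-value sketch (illustrative; not the dissertation's code).
        # Requires the redis-py package and a Redis server on localhost:6379.
        # The key layout ('patient:<id>') and field names are hypothetical.
        import redis

        r = redis.Redis(host="localhost", port=6379, decode_responses=True)

        # Store a small clinical + genomic summary as a hash.
        r.hset("patient:TCGA-0001", mapping={
            "diagnosis": "glioblastoma",
            "age_at_diagnosis": "57",
            "tp53_variant": "c.743G>A (p.R248Q)",
        })

        # Retrieve the whole record, or a single field.
        record = r.hgetall("patient:TCGA-0001")
        variant = r.hget("patient:TCGA-0001", "tp53_variant")
        print(record, variant)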

  18. Measuring the Promise of Big Data Syllabi

    Science.gov (United States)

    Friedman, Alon

    2018-01-01

    Growing interest in Big Data is leading industries, academics and governments to accelerate Big Data research. However, how teachers should teach Big Data has not been fully examined. This article suggests criteria for redesigning Big Data syllabi in public and private degree-awarding higher education establishments. The author conducted a survey…

  19. 77 FR 27245 - Big Stone National Wildlife Refuge, Big Stone and Lac Qui Parle Counties, MN

    Science.gov (United States)

    2012-05-09

    ... DEPARTMENT OF THE INTERIOR Fish and Wildlife Service [FWS-R3-R-2012-N069; FXRS1265030000S3-123-FF03R06000] Big Stone National Wildlife Refuge, Big Stone and Lac Qui Parle Counties, MN AGENCY: Fish and... plan (CCP) and environmental assessment (EA) for Big Stone National Wildlife Refuge (Refuge, NWR) for...

  20. The Genome 10K Project: a way forward.

    Science.gov (United States)

    Koepfli, Klaus-Peter; Paten, Benedict; O'Brien, Stephen J

    2015-01-01

    The Genome 10K Project was established in 2009 by a consortium of biologists and genome scientists determined to facilitate the sequencing and analysis of the complete genomes of 10,000 vertebrate species. Since then the number of selected and initiated species has risen from ∼26 to 277 sequenced or ongoing with funding, an approximately tenfold increase in five years. Here we summarize the advances and commitments that have occurred by mid-2014 and outline the achievements and present challenges of reaching the 10,000-species goal. We summarize the status of known vertebrate genome projects, recommend standards for pronouncing a genome as sequenced or completed, and provide our present and future vision of the landscape of Genome 10K. The endeavor is ambitious, bold, expensive, and uncertain, but together the Genome 10K Consortium of Scientists and the worldwide genomics community are moving toward their goal of delivering to the coming generation the gift of genome empowerment for many vertebrate species.

  1. The BigBoss Experiment

    Energy Technology Data Exchange (ETDEWEB)

    Schelgel, D.; Abdalla, F.; Abraham, T.; Ahn, C.; Allende Prieto, C.; Annis, J.; Aubourg, E.; Azzaro, M.; Bailey, S.; Baltay, C.; Baugh, C.; Bebek, C.; Becerril, S.; Blanton, M.; Bolton, A.; Bromley, B.; Cahn, R.; Carton, P.-H.; Cervanted-Cota, J.L.; Chu, Y.; Cortes, M.; /APC, Paris /Brookhaven /IRFU, Saclay /Marseille, CPPM /Marseille, CPT /Durham U. / /IEU, Seoul /Fermilab /IAA, Granada /IAC, La Laguna / /IAC, Mexico / / /Madrid, IFT /Marseille, Lab. Astrophys. / / /New York U. /Valencia U.

    2012-06-07

    BigBOSS is a Stage IV ground-based dark energy experiment to study baryon acoustic oscillations (BAO) and the growth of structure with a wide-area galaxy and quasar redshift survey over 14,000 square degrees. It has been conditionally accepted by NOAO in response to a call for major new instrumentation and a high-impact science program for the 4-m Mayall telescope at Kitt Peak. The BigBOSS instrument is a robotically-actuated, fiber-fed spectrograph capable of taking 5000 simultaneous spectra over a wavelength range from 340 nm to 1060 nm, with a resolution R = λ/Δλ = 3000-4800. Using data from imaging surveys that are already underway, spectroscopic targets are selected that trace the underlying dark matter distribution. In particular, targets include luminous red galaxies (LRGs) up to z = 1.0, extending the BOSS LRG survey in both redshift and survey area. To probe the universe out to even higher redshift, BigBOSS will target bright [OII] emission line galaxies (ELGs) up to z = 1.7. In total, 20 million galaxy redshifts are obtained to measure the BAO feature, trace the matter power spectrum at smaller scales, and detect redshift space distortions. BigBOSS will provide additional constraints on early dark energy and on the curvature of the universe by measuring the Ly-alpha forest in the spectra of over 600,000 2.2 < z < 3.5 quasars. BigBOSS galaxy BAO measurements combined with an analysis of the broadband power, including the Ly-alpha forest in BigBOSS quasar spectra, achieves a FOM of 395 with Planck plus Stage III priors. This FOM is based on conservative assumptions for the analysis of broad band power (k_max = 0.15), and could grow to over 600 if current work allows us to push the analysis to higher wave numbers (k_max = 0.3). BigBOSS will also place constraints on theories of modified gravity and inflation, and will measure the sum of neutrino masses to 0.024 eV accuracy.

  2. Bridging the gap between Big Genome Data Analysis and Database Management Systems

    NARCIS (Netherlands)

    C.P. Cijvat (Robin)

    2014-01-01

    The bioinformatics field has encountered a data deluge over the last years, due to increasing speed and decreasing cost of DNA sequencing technology. Today, sequencing the DNA of a single genome only takes about a week, and it can result in up to a terabyte of data. The sequencing

  3. The Naked Mole Rat Genome Resource : facilitating analyses of cancer and longevity-related adaptations

    OpenAIRE

    Keane, Michael; Craig, Thomas; Alfoldi, Jessica; Berlin, Aaron M; Johnson, Jeremy; Seluanov, Andrei; Gorbunova, Vera; Di Palma, Federica; Lindblad-Toh, Kerstin; Church, George M; de Magalhaes, Joao Pedro

    2014-01-01

    MOTIVATION: The naked mole rat (Heterocephalus glaber) is an exceptionally long-lived and cancer-resistant rodent native to East Africa. Although its genome was previously sequenced, here we report a new assembly sequenced by us with substantially higher N50 values for scaffolds and contigs. RESULTS: We analyzed the annotation of this new improved assembly and identified candidate genomic adaptations which may have contributed to the evolution of the naked mole rat's extraordinary traits, inc...

  4. Big data and educational research

    OpenAIRE

    Beneito-Montagut, Roser

    2017-01-01

    Big data and data analytics offer the promise to enhance teaching and learning, improve educational research and advance education governance. This chapter aims to contribute to the conceptual and methodological understanding of big data and analytics within educational research. It describes the opportunities and challenges that big data and analytics bring to education, as well as critically exploring the perils of applying a data-driven approach to education. Despite the claimed value of the

  5. Thick-Big Descriptions

    DEFF Research Database (Denmark)

    Lai, Signe Sophus

    The paper discusses the rewards and challenges of employing commercial audience measurements data – gathered by media industries for profitmaking purposes – in ethnographic research on the Internet in everyday life. It questions claims to the objectivity of big data (Anderson 2008), the assumption...... communication systems, language and behavior appear as texts, outputs, and discourses (data to be ‘found’) – big data then documents things that in earlier research required interviews and observations (data to be ‘made’) (Jensen 2014). However, web-measurement enterprises build audiences according...... to a commercial logic (boyd & Crawford 2011) and is as such directed by motives that call for specific types of sellable user data and specific segmentation strategies. In combining big data and ‘thick descriptions’ (Geertz 1973) scholars need to question how ethnographic fieldwork might map the ‘data not seen...

  6. GIGGLE: a search engine for large-scale integrated genome analysis.

    Science.gov (United States)

    Layer, Ryan M; Pedersen, Brent S; DiSera, Tonya; Marth, Gabor T; Gertz, Jason; Quinlan, Aaron R

    2018-02-01

    GIGGLE is a genomics search engine that identifies and ranks the significance of genomic loci shared between query features and thousands of genome interval files. GIGGLE (https://github.com/ryanlayer/giggle) scales to billions of intervals and is over three orders of magnitude faster than existing methods. Its speed extends the accessibility and utility of resources such as ENCODE, Roadmap Epigenomics, and GTEx by facilitating data integration and hypothesis generation.
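
    Conceptually, the core operation GIGGLE performs is counting and ranking overlaps between query intervals and many annotation files. The toy sketch below shows only that overlap-counting idea in plain Python; it is not GIGGLE's implementation, which relies on purpose-built indexing to reach the reported speed and scale.

        # Conceptual sketch: for each annotation track, count how many query intervals
        # overlap it. Toy illustration of the idea behind an interval search engine.
        def overlaps(a, b):
            # Half-open intervals (chrom, start, end) overlap when they share a
            # chromosome and their ranges intersect.
            return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

        def count_overlaps(query_intervals, annotation_sets):
            # annotation_sets: dict mapping a file/track name to a list of intervals.
            return {name: sum(any(overlaps(q, iv) for iv in ivs) for q in query_intervals)
                    for name, ivs in annotation_sets.items()}

        queries = [("chr1", 100, 200), ("chr2", 5000, 5100)]
        tracks = {
            "enhancers.bed": [("chr1", 150, 400)],
            "promoters.bed": [("chr2", 4900, 5050), ("chr3", 10, 20)],
        }
        print(count_overlaps(queries, tracks))  # {'enhancers.bed': 1, 'promoters.bed': 1}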

  7. Resources for Functional Genomics Studies in Drosophila melanogaster

    Science.gov (United States)

    Mohr, Stephanie E.; Hu, Yanhui; Kim, Kevin; Housden, Benjamin E.; Perrimon, Norbert

    2014-01-01

    Drosophila melanogaster has become a system of choice for functional genomic studies. Many resources, including online databases and software tools, are now available to support design or identification of relevant fly stocks and reagents or analysis and mining of existing functional genomic, transcriptomic, proteomic, etc. datasets. These include large community collections of fly stocks and plasmid clones, “meta” information sites like FlyBase and FlyMine, and an increasing number of more specialized reagents, databases, and online tools. Here, we introduce key resources useful to plan large-scale functional genomics studies in Drosophila and to analyze, integrate, and mine the results of those studies in ways that facilitate identification of highest-confidence results and generation of new hypotheses. We also discuss ways in which existing resources can be used and might be improved and suggest a few areas of future development that would further support large- and small-scale studies in Drosophila and facilitate use of Drosophila information by the research community more generally. PMID:24653003

  8. MobilomeFINDER: web-based tools for in silico and experimental discovery of bacterial genomic islands

    Science.gov (United States)

    Ou, Hong-Yu; He, Xinyi; Harrison, Ewan M.; Kulasekara, Bridget R.; Thani, Ali Bin; Kadioglu, Aras; Lory, Stephen; Hinton, Jay C. D.; Barer, Michael R.; Rajakumar, Kumar

    2007-01-01

    MobilomeFINDER (http://mml.sjtu.edu.cn/MobilomeFINDER) is an interactive online tool that facilitates bacterial genomic island or ‘mobile genome’ (mobilome) discovery; it integrates the ArrayOme and tRNAcc software packages. ArrayOme utilizes a microarray-derived comparative genomic hybridization input data set to generate ‘inferred contigs’ produced by merging adjacent genes classified as ‘present’. Collectively these ‘fragments’ represent a hypothetical ‘microarray-visualized genome (MVG)’. ArrayOme permits recognition of discordances between physical genome and MVG sizes, thereby enabling identification of strains rich in microarray-elusive novel genes. Individual tRNAcc tools facilitate automated identification of genomic islands by comparative analysis of the contents and contexts of tRNA sites and other integration hotspots in closely related sequenced genomes. Accessory tools facilitate design of hotspot-flanking primers for in silico and/or wet-science-based interrogation of cognate loci in unsequenced strains and analysis of islands for features suggestive of foreign origins; island-specific and genome-contextual features are tabulated and represented in schematic and graphical forms. To date we have used MobilomeFINDER to analyse several Enterobacteriaceae, Pseudomonas aeruginosa and Streptococcus suis genomes. MobilomeFINDER enables high-throughput island identification and characterization through increased exploitation of emerging sequence data and PCR-based profiling of unsequenced test strains; subsequent targeted yeast recombination-based capture permits full-length sequencing and detailed functional studies of novel genomic islands. PMID:17537813
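
    The 'inferred contig' step described for ArrayOme can be pictured with a small sketch: walk genes in genome order and merge runs of adjacent genes classified as 'present' into fragments. The code below is a hedged illustration of that merging idea only; the input representation (an ordered list of gene-call pairs) is an assumption, not ArrayOme's actual data format.

        # Sketch of the 'inferred contig' idea: merge runs of adjacent genes called
        # 'present' in a comparative genomic hybridization result into fragments.
        from itertools import groupby

        def inferred_contigs(ordered_calls):
            """ordered_calls: genes in genome order, each ('geneID', 'present'|'absent')."""
            contigs = []
            for call, run in groupby(ordered_calls, key=lambda gc: gc[1]):
                if call == "present":
                    contigs.append([gene for gene, _ in run])
            return contigs

        calls = [("g1", "present"), ("g2", "present"), ("g3", "absent"),
                 ("g4", "present"), ("g5", "absent")]
        print(inferred_contigs(calls))  # [['g1', 'g2'], ['g4']]

    Comparing the summed length of such fragments against the physical genome size is what lets the approach flag strains rich in microarray-elusive novel genes.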

  9. Whole-genome sequencing of 234 bulls facilitates mapping of monogenic and complex traits in cattle

    DEFF Research Database (Denmark)

    Daetwyler, Hans D; Capitan, Aurélien; Pausch, Hubert

    2014-01-01

    The 1000 bull genomes project supports the goal of accelerating the rates of genetic gain in domestic cattle while at the same time considering animal health and welfare by providing the annotated sequence variants and genotypes of key ancestor bulls. In the first phase of the 1000 bull genomes p...

  10. Big Data's Role in Precision Public Health.

    Science.gov (United States)

    Dolley, Shawn

    2018-01-01

    Precision public health is an emerging practice to more granularly predict and understand public health risks and customize treatments for more specific and homogeneous subpopulations, often using new data, technologies, and methods. Big data is one element that has consistently helped to achieve these goals, through its ability to deliver to practitioners a volume and variety of structured or unstructured data not previously possible. Big data has enabled more widespread and specific research and trials of stratifying and segmenting populations at risk for a variety of health problems. Examples of success using big data are surveyed in surveillance and signal detection, predicting future risk, targeted interventions, and understanding disease. Using novel big data or big data approaches has risks that remain to be resolved. The continued growth in volume and variety of available data, decreased costs of data capture, and emerging computational methods mean big data success will likely be a required pillar of precision public health into the future. This review article aims to identify the precision public health use cases where big data has added value, identify classes of value that big data may bring, and outline the risks inherent in using big data in precision public health efforts.

  11. Big Data, indispensable today

    Directory of Open Access Journals (Sweden)

    Radu-Ioan ENACHE

    2015-10-01

    Full Text Available Big data is and will be used more in the future as a tool for everything that happens both online and offline. Of course, being online is a real habit; Big Data is found in this medium, offering many advantages and being a real help for all consumers. In this paper we discuss Big Data as a plus in developing new applications, by gathering useful information about users and their behaviour. We have also presented the key aspects of real-time monitoring and the architecture principles of this technology. The most important benefit brought by this paper is presented in the cloud section.

  12. BGD: a database of bat genomes.

    Science.gov (United States)

    Fang, Jianfei; Wang, Xuan; Mu, Shuo; Zhang, Shuyi; Dong, Dong

    2015-01-01

    Bats account for ~20% of mammalian species and are the only mammals with true powered flight. Because of their specialized phenotypic traits, much research has been devoted to examining the evolution of bats. Until now, some whole-genome sequences of bats have been assembled and annotated; however, a uniform resource for the annotated bat genomes is still unavailable. To make the extensive data associated with the bat genomes accessible to the general biological community, we established a Bat Genome Database (BGD). BGD is an open-access, web-available portal that integrates available data on bat genomes and genes. It hosts data from six bat species, including two megabats and four microbats. Users can query the gene annotations using an efficient search engine, and it offers browsable tracks of bat genomes. Furthermore, an easy-to-use phylogenetic analysis tool is also provided to facilitate online phylogeny studies of genes. To the best of our knowledge, BGD is the first database of bat genomes. It will extend our understanding of bat evolution and be advantageous to bat sequence analysis. BGD is freely available at: http://donglab.ecnu.edu.cn/databases/BatGenome/.

  13. BGD: a database of bat genomes.

    Directory of Open Access Journals (Sweden)

    Jianfei Fang

    Full Text Available Bats account for ~20% of mammalian species and are the only mammals with true powered flight. Because of their specialized phenotypic traits, much research has been devoted to examining the evolution of bats. Until now, some whole-genome sequences of bats have been assembled and annotated; however, a uniform resource for the annotated bat genomes is still unavailable. To make the extensive data associated with the bat genomes accessible to the general biological community, we established a Bat Genome Database (BGD). BGD is an open-access, web-available portal that integrates available data on bat genomes and genes. It hosts data from six bat species, including two megabats and four microbats. Users can query the gene annotations using an efficient search engine, and it offers browsable tracks of bat genomes. Furthermore, an easy-to-use phylogenetic analysis tool is also provided to facilitate online phylogeny studies of genes. To the best of our knowledge, BGD is the first database of bat genomes. It will extend our understanding of bat evolution and be advantageous to bat sequence analysis. BGD is freely available at: http://donglab.ecnu.edu.cn/databases/BatGenome/.

  14. Antigravity and the big crunch/big bang transition

    Science.gov (United States)

    Bars, Itzhak; Chen, Shih-Hung; Steinhardt, Paul J.; Turok, Neil

    2012-08-01

    We point out a new phenomenon which seems to be generic in 4d effective theories of scalar fields coupled to Einstein gravity, when applied to cosmology. A lift of such theories to a Weyl-invariant extension allows one to define classical evolution through cosmological singularities unambiguously, and hence construct geodesically complete background spacetimes. An attractor mechanism ensures that, at the level of the effective theory, generic solutions undergo a big crunch/big bang transition by contracting to zero size, passing through a brief antigravity phase, shrinking to zero size again, and re-emerging into an expanding normal gravity phase. The result may be useful for the construction of complete bouncing cosmologies like the cyclic model.

  15. Antigravity and the big crunch/big bang transition

    Energy Technology Data Exchange (ETDEWEB)

    Bars, Itzhak [Department of Physics and Astronomy, University of Southern California, Los Angeles, CA 90089-2535 (United States); Chen, Shih-Hung [Perimeter Institute for Theoretical Physics, Waterloo, ON N2L 2Y5 (Canada); Department of Physics and School of Earth and Space Exploration, Arizona State University, Tempe, AZ 85287-1404 (United States); Steinhardt, Paul J., E-mail: steinh@princeton.edu [Department of Physics and Princeton Center for Theoretical Physics, Princeton University, Princeton, NJ 08544 (United States); Turok, Neil [Perimeter Institute for Theoretical Physics, Waterloo, ON N2L 2Y5 (Canada)

    2012-08-29

    We point out a new phenomenon which seems to be generic in 4d effective theories of scalar fields coupled to Einstein gravity, when applied to cosmology. A lift of such theories to a Weyl-invariant extension allows one to define classical evolution through cosmological singularities unambiguously, and hence construct geodesically complete background spacetimes. An attractor mechanism ensures that, at the level of the effective theory, generic solutions undergo a big crunch/big bang transition by contracting to zero size, passing through a brief antigravity phase, shrinking to zero size again, and re-emerging into an expanding normal gravity phase. The result may be useful for the construction of complete bouncing cosmologies like the cyclic model.

  16. Antigravity and the big crunch/big bang transition

    International Nuclear Information System (INIS)

    Bars, Itzhak; Chen, Shih-Hung; Steinhardt, Paul J.; Turok, Neil

    2012-01-01

    We point out a new phenomenon which seems to be generic in 4d effective theories of scalar fields coupled to Einstein gravity, when applied to cosmology. A lift of such theories to a Weyl-invariant extension allows one to define classical evolution through cosmological singularities unambiguously, and hence construct geodesically complete background spacetimes. An attractor mechanism ensures that, at the level of the effective theory, generic solutions undergo a big crunch/big bang transition by contracting to zero size, passing through a brief antigravity phase, shrinking to zero size again, and re-emerging into an expanding normal gravity phase. The result may be useful for the construction of complete bouncing cosmologies like the cyclic model.

  17. Big data: een zoektocht naar instituties

    NARCIS (Netherlands)

    van der Voort, H.G.; Crompvoets, J

    2016-01-01

    Big data is a well-known phenomenon, even a buzzword nowadays. It refers to an abundance of data and new possibilities to process and use them. Big data is the subject of many publications. Some pay attention to the many possibilities of big data, others warn us of their consequences. This special

  18. Data, Data, Data : Big, Linked & Open

    NARCIS (Netherlands)

    Folmer, E.J.A.; Krukkert, D.; Eckartz, S.M.

    2013-01-01

    The entire business and IT world is currently talking about Big Data, a trend that overtook Cloud Computing in mid-2013 (based on Google Trends). Policymakers are also actively engaged with Big Data. Neelie Kroes, vice-president of the European Commission, speaks of the 'Big Data

  19. Soil biogeochemistry in the age of big data

    Science.gov (United States)

    Cécillon, Lauric; Barré, Pierre; Coissac, Eric; Plante, Alain; Rasse, Daniel

    2015-04-01

    Data is becoming one of the key resources of the 21st century. Soil biogeochemistry is not spared by this new movement. The conservation of soils and their services has recently come onto the political agenda. However, clear knowledge of the links between soil characteristics and the various processes ensuring the provision of soil services is rare at the molecular or plot scale, and does not exist at the landscape scale. This split between society's expectations of its natural capital and scientific knowledge of the most complex material on earth has led to an increasing number of studies on soils, using an increasing number of techniques of increasing complexity, with increasing spatial and temporal coverage. From data scarcity with a basic data management system, soil biogeochemistry is now facing a proliferation of data, with few quality controls from data collection to publication and few skills to deal with them. Based on this observation, here we (1) address how big data could help in making sense of all these soil biogeochemical data, and (2) point out several shortcomings of big data that most biogeochemists will experience in their future careers. Massive storage of data is now common, and recent opportunities for cloud storage enable data sharing among researchers all over the world. The need for integrative and collaborative computational databases in soil biogeochemistry is emerging through pioneering initiatives in this direction (molTERdb; earthcube), following soil microbiologists (GenBank). We expect that a series of data storage and management systems will rapidly revolutionize the way of accessing raw biogeochemical data, published or not. Data mining techniques combined with cluster or cloud computing hold significant promise for facilitating the use of complex analytical methods, and for revealing new insights previously hidden in complex data on soil mineralogy, organic matter and biodiversity. Indeed, important scientific advances have

  20. Methods and tools for big data visualization

    OpenAIRE

    Zubova, Jelena; Kurasova, Olga

    2015-01-01

    In this paper, methods and tools for big data visualization have been investigated. Challenges faced by the big data analysis and visualization have been identified. Technologies for big data analysis have been discussed. A review of methods and tools for big data visualization has been done. Functionalities of the tools have been demonstrated by examples in order to highlight their advantages and disadvantages.

  1. Sunflower Hybrid Breeding: From Markers to Genomic Selection.

    Science.gov (United States)

    Dimitrijevic, Aleksandra; Horn, Renate

    2017-01-01

    In sunflower, molecular markers for simple traits as, e.g., fertility restoration, high oleic acid content, herbicide tolerance or resistances to Plasmopara halstedii, Puccinia helianthi , or Orobanche cumana have been successfully used in marker-assisted breeding programs for years. However, agronomically important complex quantitative traits like yield, heterosis, drought tolerance, oil content or selection for disease resistance, e.g., against Sclerotinia sclerotiorum have been challenging and will require genome-wide approaches. Plant genetic resources for sunflower are being collected and conserved worldwide that represent valuable resources to study complex traits. Sunflower association panels provide the basis for genome-wide association studies, overcoming disadvantages of biparental populations. Advances in technologies and the availability of the sunflower genome sequence made novel approaches on the whole genome level possible. Genotype-by-sequencing, and whole genome sequencing based on next generation sequencing technologies facilitated the production of large amounts of SNP markers for high density maps as well as SNP arrays and allowed genome-wide association studies and genomic selection in sunflower. Genome wide or candidate gene based association studies have been performed for traits like branching, flowering time, resistance to Sclerotinia head and stalk rot. First steps in genomic selection with regard to hybrid performance and hybrid oil content have shown that genomic selection can successfully address complex quantitative traits in sunflower and will help to speed up sunflower breeding programs in the future. To make sunflower more competitive toward other oil crops higher levels of resistance against pathogens and better yield performance are required. In addition, optimizing plant architecture toward a more complex growth type for higher plant densities has the potential to considerably increase yields per hectare. Integrative approaches
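
    Genomic selection, as discussed above, predicts breeding values for complex traits from genome-wide SNP genotypes. The sketch below is a generic ridge-regression (RR-BLUP-like) illustration on synthetic allele-dosage data, not the pipeline used in any sunflower study; marker counts, the simulated trait and the shrinkage parameter are all arbitrary.

        # Minimal genomic-selection sketch: ridge regression of a quantitative trait on
        # genome-wide SNP genotypes (an RR-BLUP-like model). Synthetic, illustrative data.
        import numpy as np
        from sklearn.linear_model import Ridge
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        n_lines, n_snps = 200, 1000
        X = rng.integers(0, 3, size=(n_lines, n_snps)).astype(float)  # 0/1/2 allele dosages
        true_effects = rng.normal(0, 0.05, size=n_snps)
        y = X @ true_effects + rng.normal(0, 1.0, size=n_lines)       # trait, e.g. oil content

        model = Ridge(alpha=100.0)          # shrinkage over many small marker effects
        accuracy = cross_val_score(model, X, y, cv=5, scoring="r2")
        print("mean cross-validated R^2:", accuracy.mean())

        # Genotyped-but-unphenotyped candidates would then be ranked for selection via
        # model.fit(X, y).predict(X_new), where X_new holds their SNP dosages.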

  2. Sunflower Hybrid Breeding: From Markers to Genomic Selection

    Directory of Open Access Journals (Sweden)

    Aleksandra Dimitrijevic

    2018-01-01

    Full Text Available In sunflower, molecular markers for simple traits as, e.g., fertility restoration, high oleic acid content, herbicide tolerance or resistances to Plasmopara halstedii, Puccinia helianthi, or Orobanche cumana have been successfully used in marker-assisted breeding programs for years. However, agronomically important complex quantitative traits like yield, heterosis, drought tolerance, oil content or selection for disease resistance, e.g., against Sclerotinia sclerotiorum have been challenging and will require genome-wide approaches. Plant genetic resources for sunflower are being collected and conserved worldwide that represent valuable resources to study complex traits. Sunflower association panels provide the basis for genome-wide association studies, overcoming disadvantages of biparental populations. Advances in technologies and the availability of the sunflower genome sequence made novel approaches on the whole genome level possible. Genotype-by-sequencing, and whole genome sequencing based on next generation sequencing technologies facilitated the production of large amounts of SNP markers for high density maps as well as SNP arrays and allowed genome-wide association studies and genomic selection in sunflower. Genome wide or candidate gene based association studies have been performed for traits like branching, flowering time, resistance to Sclerotinia head and stalk rot. First steps in genomic selection with regard to hybrid performance and hybrid oil content have shown that genomic selection can successfully address complex quantitative traits in sunflower and will help to speed up sunflower breeding programs in the future. To make sunflower more competitive toward other oil crops higher levels of resistance against pathogens and better yield performance are required. In addition, optimizing plant architecture toward a more complex growth type for higher plant densities has the potential to considerably increase yields per hectare

  3. Sunflower Hybrid Breeding: From Markers to Genomic Selection

    Science.gov (United States)

    Dimitrijevic, Aleksandra; Horn, Renate

    2018-01-01

    In sunflower, molecular markers for simple traits as, e.g., fertility restoration, high oleic acid content, herbicide tolerance or resistances to Plasmopara halstedii, Puccinia helianthi, or Orobanche cumana have been successfully used in marker-assisted breeding programs for years. However, agronomically important complex quantitative traits like yield, heterosis, drought tolerance, oil content or selection for disease resistance, e.g., against Sclerotinia sclerotiorum have been challenging and will require genome-wide approaches. Plant genetic resources for sunflower are being collected and conserved worldwide that represent valuable resources to study complex traits. Sunflower association panels provide the basis for genome-wide association studies, overcoming disadvantages of biparental populations. Advances in technologies and the availability of the sunflower genome sequence made novel approaches on the whole genome level possible. Genotype-by-sequencing, and whole genome sequencing based on next generation sequencing technologies facilitated the production of large amounts of SNP markers for high density maps as well as SNP arrays and allowed genome-wide association studies and genomic selection in sunflower. Genome wide or candidate gene based association studies have been performed for traits like branching, flowering time, resistance to Sclerotinia head and stalk rot. First steps in genomic selection with regard to hybrid performance and hybrid oil content have shown that genomic selection can successfully address complex quantitative traits in sunflower and will help to speed up sunflower breeding programs in the future. To make sunflower more competitive toward other oil crops higher levels of resistance against pathogens and better yield performance are required. In addition, optimizing plant architecture toward a more complex growth type for higher plant densities has the potential to considerably increase yields per hectare. Integrative approaches

  4. The Big bang and the Quantum

    Science.gov (United States)

    Ashtekar, Abhay

    2010-06-01

    General relativity predicts that space-time comes to an end and physics comes to a halt at the big-bang. Recent developments in loop quantum cosmology have shown that these predictions cannot be trusted. Quantum geometry effects can resolve singularities, thereby opening new vistas. Examples are: The big bang is replaced by a quantum bounce; the 'horizon problem' disappears; immediately after the big bounce, there is a super-inflationary phase with its own phenomenological ramifications; and, in presence of a standard inflation potential, initial conditions are naturally set for a long, slow roll inflation independently of what happens in the pre-big bang branch. As in my talk at the conference, I will first discuss the foundational issues and then the implications of the new Planck scale physics near the Big Bang.

  5. Discovery of Nigri/nox and Panto/pox site-specific recombinase systems facilitates advanced genome engineering.

    Science.gov (United States)

    Karimova, Madina; Splith, Victoria; Karpinski, Janet; Pisabarro, M Teresa; Buchholz, Frank

    2016-07-22

    Precise genome engineering is instrumental for biomedical research and holds great promise for future therapeutic applications. Site-specific recombinases (SSRs) are valuable tools for genome engineering due to their exceptional ability to mediate precise excision, integration and inversion of genomic DNA in living systems. The ever-increasing complexity of genome manipulations and the desire to understand the DNA-binding specificity of these enzymes are driving efforts to identify novel SSR systems with unique properties. Here, we describe two novel tyrosine site-specific recombination systems designated Nigri/nox and Panto/pox. Nigri originates from Vibrio nigripulchritudo (plasmid VIBNI_pA) and recombines its target site nox with high efficiency and high target-site selectivity, without recombining target sites of the well established SSRs Cre, Dre, Vika and VCre. Panto, derived from Pantoea sp. aB, is less specific and in addition to its native target site, pox also recombines the target site for Dre recombinase, called rox. This relaxed specificity allowed the identification of residues that are involved in target site selectivity, thereby advancing our understanding of how SSRs recognize their respective DNA targets.

  6. Sequencing of a new target genome: the Pediculus humanus humanus (Phthiraptera: Pediculidae) genome project.

    Science.gov (United States)

    Pittendrigh, B R; Clark, J M; Johnston, J S; Lee, S H; Romero-Severson, J; Dasch, G A

    2006-11-01

    The human body louse, Pediculus humanus humanus (L.), and the human head louse, Pediculus humanus capitis, belong to the hemimetabolous order Phthiraptera. The body louse is the primary vector that transmits the bacterial agents of louse-borne relapsing fever, trench fever, and epidemic typhus. The genomes of the bacterial causative agents of several of these aforementioned diseases have been sequenced. Thus, determining the body louse genome will enhance studies of host-vector-pathogen interactions. Although not important as a major disease vector, head lice are of major social concern. Resistance to traditional pesticides used to control head and body lice have developed. It is imperative that new molecular targets be discovered for the development of novel compounds to control these insects. No complete genome sequence exists for a hemimetabolous insect species primarily because hemimetabolous insects often have large (2000 Mb) to very large (up to 16,300 Mb) genomes. Fortuitously, we determined that the human body louse has one of the smallest genome sizes known in insects, suggesting it may be a suitable choice as a minimal hemimetabolous genome in which many genes have been eliminated during its adaptation to human parasitism. Because many louse species infest birds and mammals, the body louse genome-sequencing project will facilitate studies of their comparative genomics. A 6-8X coverage of the body louse genome, plus sequenced expressed sequence tags, should provide the entomological, evolutionary biology, medical, and public health communities with useful genetic information.

  7. [Evolution of genomic imprinting in mammals: what a zoo!].

    Science.gov (United States)

    Proudhon, Charlotte; Bourc'his, Déborah

    2010-05-01

    Genomic imprinting imposes an obligate mode of biparental reproduction in mammals. This phenomenon results from the monoparental expression of a subset of genes. This specific gene regulation mechanism affects viviparous mammals, especially eutherians, but also marsupials to a lesser extent. Oviparous mammals, or monotremes, do not seem to demonstrate monoparental allele expression. This phylogenic confinement suggests that the evolution of the placenta imposed a selective pressure for the emergence of genomic imprinting. This physiological argument is now complemented by recent genomic evidence facilitated by the sequencing of the platypus genome, a rare modern day case of a monotreme. Analysis of the platypus genome in comparison to eutherian genomes shows a chronological and functional coincidence between the appearance of genomic imprinting and transposable element accumulation. The systematic comparative analyses of genomic sequences in different species is essential for the further understanding of genomic imprinting emergence and divergent evolution along mammalian speciation.

  8. Big Bang baryosynthesis

    International Nuclear Information System (INIS)

    Turner, M.S.; Chicago Univ., IL

    1983-01-01

    In these lectures I briefly review Big Bang baryosynthesis. In the first lecture I discuss the evidence which exists for the BAU, the failure of non-GUT symmetrical cosmologies, the qualitative picture of baryosynthesis, and numerical results of detailed baryosynthesis calculations. In the second lecture I discuss the requisite CP violation in some detail, further the statistical mechanics of baryosynthesis, possible complications to the simplest scenario, and one cosmological implication of Big Bang baryosynthesis. (orig./HSI)

  9. Exploiting big data for critical care research.

    Science.gov (United States)

    Docherty, Annemarie B; Lone, Nazir I

    2015-10-01

    Over recent years the digitalization, collection and storage of vast quantities of data, in combination with advances in data science, has opened up a new era of big data. In this review, we define big data, identify examples of critical care research using big data, discuss the limitations and ethical concerns of using these large datasets and finally consider scope for future research. Big data refers to datasets whose size, complexity and dynamic nature are beyond the scope of traditional data collection and analysis methods. The potential benefits to critical care are significant, with faster progress in improving health and better value for money. Although not replacing clinical trials, big data can improve their design and advance the field of precision medicine. However, there are limitations to analysing big data using observational methods. In addition, there are ethical concerns regarding maintaining confidentiality of patients who contribute to these datasets. Big data have the potential to improve medical care and reduce costs, both by individualizing medicine, and bringing together multiple sources of data about individual patients. As big data become increasingly mainstream, it will be important to maintain public confidence by safeguarding data security, governance and confidentiality.

  10. Empathy and the Big Five

    OpenAIRE

    Paulus, Christoph

    2016-01-01

    More than 10 years ago, Del Barrio et al. (2004) attempted to establish a direct relationship between empathy and the Big Five. On average, women in their sample had higher scores for empathy and on the Big Five factors, with the exception of the Neuroticism factor. They found associations between empathy and the domains of Openness, Agreeableness, Conscientiousness, and Extraversion. In our data, women have significantly higher scores both in empathy and on the Big Five...

  11. Big data management challenges in health research-a literature review.

    Science.gov (United States)

    Wang, Xiaoming; Williams, Carolyn; Liu, Zhen Hua; Croghan, Joe

    2017-08-07

    Big data management for information centralization (i.e. making data of interest findable) and integration (i.e. making related data connectable) in health research is a defining challenge in biomedical informatics. While essential to create a foundation for knowledge discovery, optimized solutions to deliver high-quality and easy-to-use information resources are not thoroughly explored. In this review, we identify the gaps between current data management approaches and the need for new capacity to manage big data generated in advanced health research. Focusing on these unmet needs and well-recognized problems, we introduce state-of-the-art concepts, approaches and technologies for data management from computing academia and industry to explore improvement solutions. We explain the potential and significance of these advances for biomedical informatics. In addition, we discuss specific issues that have a great impact on technical solutions for developing the next generation of digital products (tools and data) to facilitate the raw-data-to-knowledge process in health research. Published by Oxford University Press 2017. This work is written by US Government employees and is in the public domain in the US.

  12. Single virus genomics: a new tool for virus discovery.

    Directory of Open Access Journals (Sweden)

    Lisa Zeigler Allen

    Full Text Available Whole genome amplification and sequencing of single microbial cells has significantly influenced genomics and microbial ecology by facilitating direct recovery of reference genome data. However, viral genomics continues to suffer due to difficulties related to the isolation and characterization of uncultivated viruses. We report here on a new approach called 'Single Virus Genomics', which enabled the isolation and complete genome sequencing of the first single virus particle. A mixed assemblage comprising two known viruses, E. coli bacteriophages lambda and T4, was sorted using flow cytometric methods and subsequently immobilized in an agarose matrix. Genome amplification was then achieved in situ via multiple displacement amplification (MDA). The complete lambda phage genome was recovered with an average depth of coverage of approximately 437X. The isolation and genome sequencing of uncultivated viruses using Single Virus Genomics approaches will enable researchers to address questions about viral diversity, evolution, adaptation and ecology that were previously unattainable.

  13. The Role of Distributed Computing in Big Data Science: Case Studies in Forensics and Bioinformatics

    OpenAIRE

    Roscigno, Gianluca

    2016-01-01

    2014 - 2015 The era of Big Data is leading to the generation of large amounts of data, which require storage and analysis capabilities that can only be addressed by distributed computing systems. To facilitate large-scale distributed computing, many programming paradigms and frameworks have been proposed, such as MapReduce and Apache Hadoop, which transparently address some issues of distributed systems and hide most of their technical details. Hadoop is curren...

  14. Next Generation DNA Sequencing and the Future of Genomic Medicine

    OpenAIRE

    Anderson, Matthew W.; Schrijver, Iris

    2010-01-01

    In the years since the first complete human genome sequence was reported, there has been a rapid development of technologies to facilitate high-throughput sequence analysis of DNA (termed “next-generation” sequencing). These novel approaches to DNA sequencing offer the promise of complete genomic analysis at a cost feasible for routine clinical diagnostics. However, the ability to more thoroughly interrogate genomic sequence raises a number of important issues with regard to result interpreta...

  15. GIGGLE: a search engine for large-scale integrated genome analysis

    Science.gov (United States)

    Layer, Ryan M; Pedersen, Brent S; DiSera, Tonya; Marth, Gabor T; Gertz, Jason; Quinlan, Aaron R

    2018-01-01

    GIGGLE is a genomics search engine that identifies and ranks the significance of genomic loci shared between query features and thousands of genome interval files. GIGGLE (https://github.com/ryanlayer/giggle) scales to billions of intervals and is over three orders of magnitude faster than existing methods. Its speed extends the accessibility and utility of resources such as ENCODE, Roadmap Epigenomics, and GTEx by facilitating data integration and hypothesis generation. PMID:29309061

  16. DFAST: a flexible prokaryotic genome annotation pipeline for faster genome publication.

    Science.gov (United States)

    Tanizawa, Yasuhiro; Fujisawa, Takatomo; Nakamura, Yasukazu

    2018-03-15

    We developed a prokaryotic genome annotation pipeline, DFAST, that also supports genome submission to public sequence databases. DFAST was originally started as an on-line annotation server, and to date, over 7000 jobs have been processed since its first launch in 2016. Here, we present a newly implemented background annotation engine for DFAST, which is also available as a standalone command-line program. The new engine can annotate a typical-sized bacterial genome within 10 min, with rich information such as pseudogenes, translation exceptions and orthologous gene assignment between given reference genomes. In addition, the modular framework of DFAST allows users to customize the annotation workflow easily and will also facilitate extensions for new functions and incorporation of new tools in the future. The software is implemented in Python 3 and runs in both Python 2.7 and 3.4, on Macintosh and Linux systems. It is freely available at https://github.com/nigyta/dfast_core/ under the GPLv3 license with external binaries bundled in the software distribution. An on-line version is also available at https://dfast.nig.ac.jp/. Contact: yn@nig.ac.jp. Supplementary data are available at Bioinformatics online.

  17. Making sense of big data in health research: Towards an EU action plan

    CERN Document Server

    Auffray, Charles; Barroso, Inês; Bencze, László; Benson, Mikael; Bergeron, Jay; Bernal-Delgado, Enrique; Blomberg, Niklas; Bock, Christoph; Conesa, Ana; Del Signore, Susanna; Delogne, Christophe; Devilee, Peter; Di Meglio, Alberto; Eijkemans, Marinus; Flicek, Paul; Graf, Norbert; Grimm, Vera; Guchelaar, Henk-Jan; Guo, Yi-Ke; Gut, Ivo Glynne; Hanbury, Allan; Hanif, Shahid; Hilgers, Ralf-Dieter; Honrado, Ángel; Hose, D. Rod; Houwing-Duistermaat, Jeanine; Hubbard, Tim; Janacek, Sophie Helen; Karanikas, Haralampos; Kievits, Tim; Kohler, Manfred; Kremer, Andreas; Lanfear, Jerry; Lengauer, Thomas; Maes, Edith; Meert, Theo; Müller, Werner; Nickel, Dörthe; Oledzki, Peter; Pedersen, Bertrand; Petkovic, Milan; Pliakos, Konstantinos; Rattray, Magnus; i Màs, Josep Redón; Schneider, Reinhard; Sengstag, Thierry; Serra-Picamal, Xavier; Spek, Wouter; Vaas, Lea A. I.; van Batenburg, Okker; Vandelaer, Marc; Varnai, Peter; Villoslada, Pablo; Vizcaíno, Juan Antonio; Wubbe, John Peter Mary; Zanetti, Gianluigi

    2016-01-01

    Medicine and healthcare are undergoing profound changes. Whole-genome sequencing and high-resolution imaging technologies are key drivers of this rapid and crucial transformation. Technological innovation combined with automation and miniaturization has triggered an explosion in data production that will soon reach exabyte proportions. How are we going to deal with this exponential increase in data production? The potential of “big data” for improving health is enormous but, at the same time, we face a wide range of challenges to overcome urgently. Europe is very proud of its cultural diversity; however, exploitation of the data made available through advances in genomic medicine, imaging, and a wide range of mobile health applications or connected devices is hampered by numerous historical, technical, legal, and political barriers. European health systems and databases are diverse and fragmented. There is a lack of harmonization of data formats, processing, analysis, and data transfer, which leads to inc...

  18. gEVE: a genome-based endogenous viral element database provides comprehensive viral protein-coding sequences in mammalian genomes.

    Science.gov (United States)

    Nakagawa, So; Takahashi, Mahoko Ueda

    2016-01-01

    In mammals, approximately 10% of genome sequences correspond to endogenous viral elements (EVEs), which are derived from ancient viral infections of germ cells. Although most EVEs have been inactivated, some open reading frames (ORFs) of EVEs obtained functions in the hosts. However, EVE ORFs usually remain unannotated in the genomes, and no databases are available for EVE ORFs. To investigate the function and evolution of EVEs in mammalian genomes, we developed EVE ORF databases for 20 genomes of 19 mammalian species. A total of 736,771 non-overlapping EVE ORFs were identified and archived in a database named gEVE (http://geve.med.u-tokai.ac.jp). The gEVE database provides nucleotide and amino acid sequences, genomic loci and functional annotations of EVE ORFs for all 20 genomes. In analyzing RNA-seq data with the gEVE database, we successfully identified the expressed EVE genes, suggesting that the gEVE database facilitates studies of the genomic analyses of various mammalian species. Database URL: http://geve.med.u-tokai.ac.jp. © The Author(s) 2016. Published by Oxford University Press.

  19. Big domains are novel Ca²+-binding modules: evidences from big domains of Leptospira immunoglobulin-like (Lig) proteins.

    Science.gov (United States)

    Raman, Rajeev; Rajanikanth, V; Palaniappan, Raghavan U M; Lin, Yi-Pin; He, Hongxuan; McDonough, Sean P; Sharma, Yogendra; Chang, Yung-Fu

    2010-12-29

    Many bacterial surface exposed proteins mediate the host-pathogen interaction more effectively in the presence of Ca²+. Leptospiral immunoglobulin-like (Lig) proteins, LigA and LigB, are surface exposed proteins containing Bacterial immunoglobulin-like (Big) domains. The function of proteins which contain the Big fold is not known. Based on the possible similarities of immunoglobulin and βγ-crystallin folds, we here explore the important question whether Ca²+ binds to a Big domain, which would provide a novel functional role for proteins containing the Big fold. We selected six individual Big domains for this study (three from the conserved part of LigA and LigB, denoted as Lig A3, Lig A4, and LigBCon5; two from the variable region of LigA, i.e., the 9th (Lig A9) and 10th (Lig A10) repeats; and one from the variable region of LigB, i.e., LigBCen2). We have also studied the conserved region covering the three and six repeats (LigBCon1-3 and LigCon). All these proteins bind the calcium-mimic dye Stains-all. All the selected four domains bind Ca²+ with dissociation constants of 2-4 µM. Lig A9 and Lig A10 domains fold well with moderate thermal stability, have β-sheet conformation and form homodimers. Fluorescence spectra of Big domains show a specific doublet (at 317 and 330 nm), probably due to Trp interaction with a Phe residue. Equilibrium unfolding of selected Big domains is similar and follows a two-state model, suggesting the similarity in their fold. We demonstrate that the Lig proteins are Ca²+-binding proteins, with Big domains harbouring the binding motif. We conclude that despite differences in sequence, a Big motif binds Ca²+. This work thus sets up a strong possibility for classifying the proteins containing Big domains as a novel family of Ca²+-binding proteins. Since the Big domain is a part of many proteins in the bacterial kingdom, we suggest a possible function for these proteins via Ca²+ binding.
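
    The reported dissociation constants of 2-4 µM can be put in perspective with the simple single-site binding isotherm, fraction bound = [Ca²+]/(Kd + [Ca²+]). The sketch below evaluates that expression over a few illustrative free-Ca²+ concentrations; it assumes one independent binding site and is not an analysis from the paper.

        # Fractional occupancy of a single independent Ca2+ site under the simple binding
        # isotherm theta = [Ca2+] / (Kd + [Ca2+]). Uses the 2-4 uM Kd range from the
        # abstract; the free-Ca2+ concentrations below are arbitrary illustration points.
        def fraction_bound(ca_uM, kd_uM):
            return ca_uM / (kd_uM + ca_uM)

        for kd in (2.0, 4.0):                      # reported Kd range, in micromolar
            for ca in (0.1, 1.0, 10.0, 100.0):     # hypothetical free Ca2+ concentrations
                print(f"Kd={kd} uM, [Ca2+]={ca} uM -> {fraction_bound(ca, kd):.2f} bound")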

  20. Parallel computing in genomic research: advances and applications

    Directory of Open Access Journals (Sweden)

    Ocaña K

    2015-11-01

    Full Text Available Kary Ocaña (National Laboratory of Scientific Computing, Petrópolis, Rio de Janeiro) and Daniel de Oliveira (Institute of Computing, Fluminense Federal University, Niterói, Brazil). Abstract: Today's genomic experiments have to process the so-called "biological big data", which is now reaching the size of terabytes and petabytes. To process this huge amount of data, scientists may require weeks or months if they use their own workstations. Parallelism techniques and high-performance computing (HPC) environments can be applied to reduce the total processing time and to ease the management, treatment, and analysis of this data. However, running bioinformatics experiments in HPC environments such as clouds, grids, clusters, and graphics processing units requires expertise from scientists to integrate computational, biological, and mathematical techniques and technologies. Several solutions have already been proposed to allow scientists to process their genomic experiments using HPC capabilities and parallelism techniques. This article presents a systematic review of the literature, surveying the most recently published research involving genomics and parallel computing. Our objective is to gather the main characteristics, benefits, and challenges that can be considered by scientists when running their genomic experiments to benefit from parallelism techniques and HPC capabilities. Keywords: high-performance computing, genomic research, cloud computing, grid computing, cluster computing, parallel computing
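
    As a loose illustration of the kind of parallelism such reviews cover, the sketch below fans a per-chromosome analysis out over worker processes using only the Python standard library; the analysis function and the chromosome list are placeholders rather than anything drawn from the surveyed tools.

      from multiprocessing import Pool

      def analyze(chrom):
          """Placeholder for a per-chromosome task (alignment, variant calling, etc.)."""
          return chrom, sum(ord(c) for c in chrom)  # pretend statistic

      if __name__ == "__main__":
          chromosomes = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]
          with Pool(processes=8) as pool:  # 8 workers; tune to the available cores
              for chrom, stat in pool.map(analyze, chromosomes):
                  print(chrom, stat)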

  1. Harnessing CRISPR-Cas systems for bacterial genome editing.

    Science.gov (United States)

    Selle, Kurt; Barrangou, Rodolphe

    2015-04-01

    Manipulation of genomic sequences facilitates the identification and characterization of key genetic determinants in the investigation of biological processes. Genome editing via clustered regularly interspaced short palindromic repeats (CRISPR)-CRISPR-associated (Cas) constitutes a next-generation method for programmable and high-throughput functional genomics. CRISPR-Cas systems are readily reprogrammed to induce sequence-specific DNA breaks at target loci, resulting in fixed mutations via host-dependent DNA repair mechanisms. Although bacterial genome editing is a relatively unexplored and underrepresented application of CRISPR-Cas systems, recent studies provide valuable insights for the widespread future implementation of this technology. This review summarizes recent progress in bacterial genome editing and identifies fundamental genetic and phenotypic outcomes of CRISPR targeting in bacteria, in the context of tool development, genome homeostasis, and DNA repair. Copyright © 2015 Elsevier Ltd. All rights reserved.

  2. Semantic Web Technologies and Big Data Infrastructures: SPARQL Federated Querying of Heterogeneous Big Data Stores

    OpenAIRE

    Konstantopoulos, Stasinos; Charalambidis, Angelos; Mouchakis, Giannis; Troumpoukis, Antonis; Jakobitsch, Jürgen; Karkaletsis, Vangelis

    2016-01-01

    The ability to cross-link large scale data with each other and with structured Semantic Web data, and the ability to uniformly process Semantic Web and other data adds value to both the Semantic Web and to the Big Data community. This paper presents work in progress towards integrating Big Data infrastructures with Semantic Web technologies, allowing for the cross-linking and uniform retrieval of data stored in both Big Data infrastructures and Semantic Web data. The technical challenges invo...
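
    By way of illustration, a federated SPARQL 1.1 query can be issued from Python with the SPARQLWrapper package; the endpoint URLs and triple patterns below are placeholders for whatever Semantic Web and Big Data stores are being cross-linked, not the services used in the paper.

      from SPARQLWrapper import SPARQLWrapper, JSON

      # Hypothetical "home" endpoint; the SERVICE clause pulls matching data
      # from a second, remote endpoint at query time (SPARQL 1.1 federation).
      sparql = SPARQLWrapper("http://localhost:8890/sparql")
      sparql.setQuery("""
          PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
          SELECT ?item ?label WHERE {
              ?item a <http://example.org/ontology/Sensor> .
              SERVICE <http://example.org/remote/sparql> {
                  ?item rdfs:label ?label .
              }
          } LIMIT 10
      """)
      sparql.setReturnFormat(JSON)
      results = sparql.query().convert()
      for row in results["results"]["bindings"]:
          print(row["item"]["value"], row["label"]["value"])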

  3. Quantum fields in a big-crunch-big-bang spacetime

    International Nuclear Information System (INIS)

    Tolley, Andrew J.; Turok, Neil

    2002-01-01

    We consider quantum field theory on a spacetime representing the big-crunch-big-bang transition postulated in ekpyrotic or cyclic cosmologies. We show via several independent methods that an essentially unique matching rule holds connecting the incoming state, in which a single extra dimension shrinks to zero, to the outgoing state in which it reexpands at the same rate. For free fields in our construction there is no particle production from the incoming adiabatic vacuum. When interactions are included the particle production for fixed external momentum is finite at the tree level. We discuss a formal correspondence between our construction and quantum field theory on de Sitter spacetime

  4. Turning big bang into big bounce: II. Quantum dynamics

    Energy Technology Data Exchange (ETDEWEB)

    Malkiewicz, Przemyslaw; Piechocki, Wlodzimierz, E-mail: pmalk@fuw.edu.p, E-mail: piech@fuw.edu.p [Theoretical Physics Department, Institute for Nuclear Studies, Hoza 69, 00-681 Warsaw (Poland)

    2010-11-21

    We analyze the big bounce transition of the quantum Friedmann-Robertson-Walker model in the setting of the nonstandard loop quantum cosmology (LQC). Elementary observables are used to quantize composite observables. The spectrum of the energy density operator is bounded and continuous. The spectrum of the volume operator is bounded from below and discrete. It has equally distant levels defining a quantum of the volume. The discreteness may imply a foamy structure of spacetime at a semiclassical level which may be detected in astro-cosmo observations. The nonstandard LQC method has a free parameter that should be fixed in some way to specify the big bounce transition.

  5. Scaling Big Data Cleansing

    KAUST Repository

    Khayyat, Zuhair

    2017-07-31

    Data cleansing approaches have usually focused on detecting and fixing errors with little attention to big data scaling. This presents a serious impediment since identifying and repairing dirty data often involves processing huge input datasets, handling sophisticated error discovery approaches and managing huge arbitrary errors. With large datasets, error detection becomes overly expensive and complicated especially when considering user-defined functions. Furthermore, a distinctive algorithm is desired to optimize inequality joins in sophisticated error discovery rather than naïvely parallelizing them. Also, when repairing large errors, their skewed distribution may obstruct effective error repairs. In this dissertation, I present solutions to overcome the above three problems in scaling data cleansing. First, I present BigDansing as a general system to tackle efficiency, scalability, and ease-of-use issues in data cleansing for Big Data. It automatically parallelizes the user’s code on top of general-purpose distributed platforms. Its programming interface allows users to express data quality rules independently from the requirements of parallel and distributed environments. Without sacrificing their quality, BigDansing also enables parallel execution of serial repair algorithms by exploiting the graph representation of discovered errors. The experimental results show that BigDansing outperforms existing baselines up to more than two orders of magnitude. Although BigDansing scales cleansing jobs, it still lacks the ability to handle sophisticated error discovery requiring inequality joins. Therefore, I developed IEJoin as an algorithm for fast inequality joins. It is based on sorted arrays and space efficient bit-arrays to reduce the problem’s search space. By comparing IEJoin against well-known optimizations, I show that it is more scalable, and several orders of magnitude faster. BigDansing depends on vertex-centric graph systems, i.e., Pregel
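
    IEJoin relies on sorted arrays and space-efficient bit-arrays; the simplified Python sketch below only shows the basic idea of exploiting sort order for a single inequality predicate (l.end < r.start), using binary search instead of a full cross product. It illustrates the principle, not the IEJoin algorithm itself.

      from bisect import bisect_right

      def inequality_join(left, right):
          """Return pairs (l, r) with l['end'] < r['start'] without a cross product."""
          right_sorted = sorted(right, key=lambda r: r["start"])
          starts = [r["start"] for r in right_sorted]
          out = []
          for l in left:
              # First position whose start strictly exceeds l['end'];
              # every record from there on satisfies the predicate.
              i = bisect_right(starts, l["end"])
              out.extend((l, r) for r in right_sorted[i:])
          return out

      left = [{"id": 1, "end": 5}, {"id": 2, "end": 9}]
      right = [{"id": "a", "start": 3}, {"id": "b", "start": 7}, {"id": "c", "start": 12}]
      print(len(inequality_join(left, right)))  # 1 matches {b, c}, 2 matches {c} -> 3 pairs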

  6. The ethics of big data in big agriculture

    Directory of Open Access Journals (Sweden)

    Isabelle M. Carbonell

    2016-03-01

    Full Text Available This paper examines the ethics of big data in agriculture, focusing on the power asymmetry between farmers and large agribusinesses like Monsanto. Following the recent purchase of Climate Corp., Monsanto is currently the most prominent biotech agribusiness to buy into big data. With wireless sensors on tractors monitoring or dictating every decision a farmer makes, Monsanto can now aggregate large quantities of previously proprietary farming data, enabling a privileged position with unique insights on a field-by-field basis into a third or more of the US farmland. This power asymmetry may be rebalanced through open-sourced data, and publicly-funded data analytic tools which rival Climate Corp. in complexity and innovation for use in the public domain.

  7. Leveraging Big Data Tools and Technologies: Addressing the Challenges of the Water Quality Sector

    Directory of Open Access Journals (Sweden)

    Juan Manuel Ponce Romero

    2017-11-01

    Full Text Available The water utility sector is subject to stringent legislation, seeking to address both the evolution of practices within the chemical/pharmaceutical industry, and the safeguarding of environmental protection, and which is informed by stakeholder views. Growing public environmental awareness is balanced by fair apportionment of liability within-sector. This highly complex and dynamic context poses challenges for water utilities seeking to manage the diverse chemicals arising from disparate sources reaching Wastewater Treatment Plants, including residential, commercial, and industrial points of origin, and diffuse sources including agricultural and hard surface water run-off. Effluents contain broad ranges of organic and inorganic compounds, herbicides, pesticides, phosphorus, pharmaceuticals, and chemicals of emerging concern. These potential pollutants can be in dissolved form, or arise in association with organic matter, the associated risks posing significant environmental challenges. This paper examines how the adoption of new Big Data tools and computational technologies can offer great advantage to the water utility sector in addressing this challenge. Big Data approaches facilitate improved understanding and insight of these challenges, by industry, regulator, and public alike. We discuss how Big Data approaches can be used to improve the outputs of tools currently in use by the water industry, such as SAGIS (Source Apportionment GIS system), helping to reveal new relationships between chemicals, the environment, and human health, and in turn provide better understanding of contaminants in wastewater (origin, pathways, and persistence). We highlight how the sector can draw upon Big Data tools to add value to legacy datasets, such as the Chemicals Investigation Programme in the UK, combined with contemporary data sources, extending the lifespan of data, focusing monitoring strategies, and helping users adapt and plan more efficiently. Despite

  8. Homogeneous and isotropic big rips?

    CERN Document Server

    Giovannini, Massimo

    2005-01-01

    We investigate the way big rips are approached in a fully inhomogeneous description of the space-time geometry. If the pressure and energy densities are connected by a (supernegative) barotropic index, the spatial gradients and the anisotropic expansion decay as the big rip is approached. This behaviour is contrasted with the usual big-bang singularities. A similar analysis is performed in the case of sudden (quiescent) singularities and it is argued that the spatial gradients may well be non-negligible in the vicinity of pressure singularities.

  9. Rate Change Big Bang Theory

    Science.gov (United States)

    Strickland, Ken

    2013-04-01

    The Rate Change Big Bang Theory redefines the birth of the universe with a dramatic shift in energy direction and a new vision of the first moments. With rate change graph technology (RCGT) we can look back 13.7 billion years and experience every step of the big bang through geometrical intersection technology. The analysis of the Big Bang includes a visualization of the first objects, their properties, the astounding event that created space and time as well as a solution to the mystery of anti-matter.

  10. Intelligent Test Mechanism Design of Worn Big Gear

    Directory of Open Access Journals (Sweden)

    Hong-Yu LIU

    2014-10-01

    Full Text Available With the continuous development of the national economy, big gears are widely applied in the metallurgy and mining domains, where they play an important role. In practical production, big-gear abrasion and breakage often occur, affecting normal production and causing unnecessary economic loss. An intelligent test method for worn big gears is put forward, mainly aimed at the constraints of high production cost, long production cycles and high-intensity manual repair welding. The measurement equations of the involute spur gear were transformed from their original polar-coordinate form into rectangular-coordinate form. The measurement principle for big-gear abrasion is introduced, a detection principle diagram is given, and the realization of the detection route is described. An OADM12 laser sensor was selected, and detection of the big-gear abrasion area was realized by the detection mechanism. Measured data from unworn and worn gears were loaded into a calculation program written in Visual Basic, from which the big-gear abrasion quantity can be obtained. This provides a feasible method for intelligent testing and intelligent repair welding of worn big gears.
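
    The polar-to-rectangular transformation mentioned above presumably concerns the standard involute tooth profile; the relations below are the textbook involute equations for a base circle of radius r_b (with pressure angle α and roll angle t), given as a hedged illustration rather than as the equations from the paper.

      r = \frac{r_b}{\cos\alpha}, \qquad \operatorname{inv}\alpha = \tan\alpha - \alpha \quad \text{(polar form)}

      x = r_b\,(\cos t + t\sin t), \qquad y = r_b\,(\sin t - t\cos t) \quad \text{(rectangular parametric form)}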

  11. [Big data in medicine and healthcare].

    Science.gov (United States)

    Rüping, Stefan

    2015-08-01

    Healthcare is one of the business fields with the highest Big Data potential. According to the prevailing definition, Big Data refers to the fact that data today is often too large and heterogeneous and changes too quickly to be stored, processed, and transformed into value by previous technologies. The technological trends drive Big Data: business processes are more and more executed electronically, consumers produce more and more data themselves - e.g. in social networks - and finally ever increasing digitalization. Currently, several new trends towards new data sources and innovative data analysis appear in medicine and healthcare. From the research perspective, omics-research is one clear Big Data topic. In practice, the electronic health records, free open data and the "quantified self" offer new perspectives for data analytics. Regarding analytics, significant advances have been made in the information extraction from text data, which unlocks a lot of data from clinical documentation for analytics purposes. At the same time, medicine and healthcare is lagging behind in the adoption of Big Data approaches. This can be traced to particular problems regarding data complexity and organizational, legal, and ethical challenges. The growing uptake of Big Data in general and first best-practice examples in medicine and healthcare in particular, indicate that innovative solutions will be coming. This paper gives an overview of the potentials of Big Data in medicine and healthcare.

  12. From Big Data to Big Business

    DEFF Research Database (Denmark)

    Lund Pedersen, Carsten

    2017-01-01

    Idea in Brief: Problem: There is an enormous profit potential for manufacturing firms in big data, but one of the key barriers to obtaining data-driven growth is the lack of knowledge about which capabilities are needed to extract value and profit from data. Solution: We (BDBB research group at C...

  13. Precision Nutrition 4.0: A Big Data and Ethics Foresight Analysis--Convergence of Agrigenomics, Nutrigenomics, Nutriproteomics, and Nutrimetabolomics.

    Science.gov (United States)

    Özdemir, Vural; Kolker, Eugene

    2016-02-01

    Nutrition is central to sustenance of good health, not to mention its role as a cultural object that brings together or draws lines among societies. Undoubtedly, understanding the future paths of nutrition science in the current era of Big Data remains firmly on science, technology, and innovation strategy agendas around the world. Nutrigenomics, the confluence of nutrition science with genomics, brought about a new focus on and legitimacy for "variability science" (i.e., the study of mechanisms of person-to-person and population differences in response to food, and the ways in which food variably impacts the host, for example, nutrient-related disease outcomes). Societal expectations, both public and private, and claims over genomics-guided and individually-tailored precision diets continue to proliferate. While the prospects of nutrition science, and nutrigenomics in particular, are established, there is a need to integrate the efforts in four Big Data domains that are naturally allied--agrigenomics, nutrigenomics, nutriproteomics, and nutrimetabolomics--that address complementary variability questions pertaining to individual differences in response to food-related environmental exposures. The joint use of these four omics knowledge domains, coined as Precision Nutrition 4.0 here, has sadly not been realized to date, but the potentials for such integrated knowledge innovation are enormous. Future personalized nutrition practices would benefit from a seamless planning of life sciences funding, research, and practice agendas from "farm to clinic to supermarket to society," and from "genome to proteome to metabolome." Hence, this innovation foresight analysis explains the already existing potentials waiting to be realized, and suggests ways forward for innovation in both technology and ethics foresight frames on precision nutrition. We propose the creation of a new Precision Nutrition Evidence Barometer for periodic, independent, and ongoing retrieval, screening

  14. «Sochi Goes International» – Required Actions to be Taken to Facilitate the Stay for International Tourists

    Directory of Open Access Journals (Sweden)

    Eduard Besel

    2012-09-01

    Full Text Available As the host city of many big international sport events, Sochi is gaining publicity, and people from all over the world will discover it as a new travel destination. To secure and increase the attractiveness and popularity of the city, some essential improvements that would facilitate the stay of international tourists are recommended.

  15. Supersize me: how whole-genome sequencing and big data are transforming epidemiology.

    Science.gov (United States)

    Kao, Rowland R; Haydon, Daniel T; Lycett, Samantha J; Murcia, Pablo R

    2014-05-01

    In epidemiology, the identification of 'who infected whom' allows us to quantify key characteristics such as incubation periods, heterogeneity in transmission rates, duration of infectiousness, and the existence of high-risk groups. Although invaluable, the existence of many plausible infection pathways makes this difficult, and epidemiological contact tracing either uncertain, logistically prohibitive, or both. The recent advent of next-generation sequencing technology allows the identification of traceable differences in the pathogen genome that are transforming our ability to understand high-resolution disease transmission, sometimes even down to the host-to-host scale. We review recent examples of the use of pathogen whole-genome sequencing for the purpose of forensic tracing of transmission pathways, focusing on the particular problems where evolutionary dynamics must be supplemented by epidemiological information on the most likely timing of events as well as possible transmission pathways. We also discuss potential pitfalls in the over-interpretation of these data, and highlight the manner in which a confluence of this technology with sophisticated mathematical and statistical approaches has the potential to produce a paradigm shift in our understanding of infectious disease transmission and control. Copyright © 2014 Elsevier Ltd. All rights reserved.
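
    As a toy illustration of the genomic side of such forensic tracing, the sketch below links cases whose consensus genomes differ by at most a few SNPs and orders each linked pair by sampling date; the threshold, sequences and dates are invented, and real analyses combine this with the epidemiological and evolutionary modelling the review emphasizes.

      from datetime import date
      from itertools import combinations

      cases = {
          "A": {"seq": "ACGTACGTAC", "sampled": date(2014, 1, 3)},
          "B": {"seq": "ACGTACGTTC", "sampled": date(2014, 1, 9)},
          "C": {"seq": "ACGAACGTTC", "sampled": date(2014, 1, 20)},
          "D": {"seq": "TCGAACCTTC", "sampled": date(2014, 2, 1)},
      }

      def snp_distance(a, b):
          """Number of mismatching sites between two aligned consensus sequences."""
          return sum(x != y for x, y in zip(a, b))

      MAX_SNPS = 2  # plausible within-outbreak divergence (assumed, pathogen-specific)
      links = []
      for u, v in combinations(cases, 2):
          d = snp_distance(cases[u]["seq"], cases[v]["seq"])
          if d <= MAX_SNPS:
              earlier, later = sorted((u, v), key=lambda k: cases[k]["sampled"])
              links.append((earlier, later, d))

      print(links)  # candidate 'who possibly infected whom' pairs, e.g. A->B, B->C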

  16. Genome Sequence of the Palaeopolyploid soybean

    Energy Technology Data Exchange (ETDEWEB)

    Schmutz, Jeremy; Cannon, Steven B.; Schlueter, Jessica; Ma, Jianxin; Mitros, Therese; Nelson, William; Hyten, David L.; Song, Qijian; Thelen, Jay J.; Cheng, Jianlin; Xu, Dong; Hellsten, Uffe; May, Gregory D.; Yu, Yeisoo; Sakura, Tetsuya; Umezawa, Taishi; Bhattacharyya, Madan K.; Sandhu, Devinder; Valliyodan, Babu; Lindquist, Erika; Peto, Myron; Grant, David; Shu, Shengqiang; Goodstein, David; Barry, Kerrie; Futrell-Griggs, Montona; Abernathy, Brian; Du, Jianchang; Tian, Zhixi; Zhu, Liucun; Gill, Navdeep; Joshi, Trupti; Libault, Marc; Sethuraman, Anand; Zhang, Xue-Cheng; Shinozaki, Kazuo; Nguyen, Henry T.; Wing, Rod A.; Cregan, Perry; Specht, James; Grimwood, Jane; Rokhsar, Dan; Stacey, Gary; Shoemaker, Randy C.; Jackson, Scott A.

    2009-08-03

    Soybean (Glycine max) is one of the most important crop plants for seed protein and oil content, and for its capacity to fix atmospheric nitrogen through symbioses with soil-borne microorganisms. We sequenced the 1.1-gigabase genome by a whole-genome shotgun approach and integrated it with physical and high-density genetic maps to create a chromosome-scale draft sequence assembly. We predict 46,430 protein-coding genes, 70% more than Arabidopsis and similar to the poplar genome which, like soybean, is an ancient polyploid (palaeopolyploid). About 78% of the predicted genes occur in chromosome ends, which comprise less than one-half of the genome but account for nearly all of the genetic recombination. Genome duplications occurred at approximately 59 and 13 million years ago, resulting in a highly duplicated genome with nearly 75% of the genes present in multiple copies. The two duplication events were followed by gene diversification and loss, and numerous chromosome rearrangements. An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.

  17. Making big sense from big data in toxicology by read-across.

    Science.gov (United States)

    Hartung, Thomas

    2016-01-01

    Modern information technologies have made big data available in safety sciences, i.e., extremely large data sets that may be analyzed only computationally to reveal patterns, trends and associations. This happens by (1) compilation of large sets of existing data, e.g., as a result of the European REACH regulation, (2) the use of omics technologies and (3) systematic robotized testing in a high-throughput manner. All three approaches and some other high-content technologies leave us with big data--the challenge is now to make big sense of these data. Read-across, i.e., the local similarity-based intrapolation of properties, is gaining momentum with increasing data availability and consensus on how to process and report it. It is predominantly applied to in vivo test data as a gap-filling approach, but can similarly complement other incomplete datasets. Big data are first of all repositories for finding similar substances and ensure that the available data is fully exploited. High-content and high-throughput approaches similarly require focusing on clusters, in this case formed by underlying mechanisms such as pathways of toxicity. The closely connected properties, i.e., structural and biological similarity, create the confidence needed for predictions of toxic properties. Here, a new web-based tool under development called REACH-across, which aims to support and automate structure-based read-across, is presented among others.
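
    Read-across, as described above, predicts a property of a query substance from its most similar, data-rich neighbours. The sketch below does this with Tanimoto similarity over pre-computed binary fingerprints (represented as Python sets of "on" bit indices); the fingerprints and toxicity labels are invented for illustration and are not taken from REACH-across or any real dataset.

      def tanimoto(a, b):
          """Tanimoto similarity between two sets of 'on' fingerprint bits."""
          return len(a & b) / len(a | b) if (a | b) else 0.0

      def read_across(query_fp, neighbours, k=3):
          """Predict a property as the similarity-weighted vote of the k nearest neighbours."""
          ranked = sorted(neighbours, key=lambda n: tanimoto(query_fp, n["fp"]), reverse=True)[:k]
          weights = [tanimoto(query_fp, n["fp"]) for n in ranked]
          score = sum(w * n["toxic"] for w, n in zip(weights, ranked)) / (sum(weights) or 1.0)
          return score, ranked

      neighbours = [
          {"name": "analog_1", "fp": {1, 4, 7, 9}, "toxic": 1},
          {"name": "analog_2", "fp": {1, 4, 8},    "toxic": 1},
          {"name": "analog_3", "fp": {2, 3, 11},   "toxic": 0},
      ]
      score, used = read_across({1, 4, 7}, neighbours, k=2)
      print(round(score, 2), [n["name"] for n in used])  # 1.0 ['analog_1', 'analog_2']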

  18. Comparative Genomics of Methanopyrus sp. SNP6 and KOL6 Revealing Genomic Regions of Plasticity Implicated in Extremely Thermophilic Profiles

    Directory of Open Access Journals (Sweden)

    Zhiliang Yu

    2017-07-01

    Full Text Available Methanopyrus spp. are usually isolated from harsh niches, such as high osmotic pressure and extreme temperature. However, the molecular mechanisms of their environmental adaptation are poorly understood. Archaeal species are commonly considered primitive organisms, and the evolutionary placement of archaea is a fundamental and intriguing scientific question. We sequenced the genomes of Methanopyrus strains SNP6 and KOL6, isolated from the Atlantic and Iceland, respectively. Comparative genomic analysis revealed genetic diversity and instability implicated in niche adaptation, including a number of transporter- and integrase/transposase-related genes. Pan-genome analysis also defined the gene pool of Methanopyrus spp., in addition to a ~120-kb genomic region of plasticity impacting the cognate genomic architecture. We believe that Methanopyrus genomics could facilitate efficient investigation and recognition of archaeal phylogenetic diversity patterns, as well as improve understanding of the biological roles and significance of these versatile microbes.

  19. [Big data in official statistics].

    Science.gov (United States)

    Zwick, Markus

    2015-08-01

    The concept of "big data" stands to change the face of official statistics over the coming years, having an impact on almost all aspects of data production. The tasks of future statisticians will not necessarily be to produce new data, but rather to identify and make use of existing data to adequately describe social and economic phenomena. Until big data can be used correctly in official statistics, a lot of questions need to be answered and problems solved: the quality of data, data protection, privacy, and the sustainable availability are some of the more pressing issues to be addressed. The essential skills of official statisticians will undoubtedly change, and this implies a number of challenges to be faced by statistical education systems, in universities, and inside the statistical offices. The national statistical offices of the European Union have concluded a concrete strategy for exploring the possibilities of big data for official statistics, by means of the Big Data Roadmap and Action Plan 1.0. This is an important first step and will have a significant influence on implementing the concept of big data inside the statistical offices of Germany.

  20. Big-Leaf Mahogany on CITES Appendix II: Big Challenge, Big Opportunity

    Science.gov (United States)

    JAMES GROGAN; PAULO BARRETO

    2005-01-01

    On 15 November 2003, big-leaf mahogany (Swietenia macrophylla King, Meliaceae), the most valuable widely traded Neotropical timber tree, gained strengthened regulatory protection from its listing on Appendix II of the Convention on International Trade in Endangered Species ofWild Fauna and Flora (CITES). CITES is a United Nations-chartered agreement signed by 164...

  1. Surveying alignment-free features for Ortholog detection in related yeast proteomes by using supervised big data classifiers.

    Science.gov (United States)

    Galpert, Deborah; Fernández, Alberto; Herrera, Francisco; Antunes, Agostinho; Molina-Ruiz, Reinaldo; Agüero-Chapin, Guillermin

    2018-05-03

    The development of new ortholog detection algorithms and the improvement of existing ones are of major importance in functional genomics. We have previously introduced a successful supervised pairwise ortholog classification approach implemented in a big data platform that considered several pairwise protein features and the low ortholog pair ratios found between two annotated proteomes (Galpert, D et al., BioMed Research International, 2015). The supervised models were built and tested using a Saccharomycete yeast benchmark dataset proposed by Salichos and Rokas (2011). Although several pairwise protein features were combined in that supervised big data approach, they were all, to some extent, alignment-based, and the proposed algorithms were evaluated on a single test set. Here, we aim to evaluate the impact of alignment-free features on the performance of supervised models implemented in the Spark big data platform for pairwise ortholog detection in several related yeast proteomes. Spark Random Forest and Decision Tree classifiers with oversampling and undersampling techniques, built either with only alignment-based similarity measures or with those measures combined with several alignment-free pairwise protein features, showed the highest classification performance for ortholog detection in three yeast proteome pairs. Although such supervised approaches outperformed traditional methods, there were no significant differences between the exclusive use of alignment-based similarity measures and their combination with alignment-free features, even within the twilight zone of the studied proteomes. Only when alignment-based and alignment-free features were combined in Spark Decision Trees with imbalance management could a higher success rate (98.71%) within the twilight zone be achieved for a yeast proteome pair that underwent a whole genome duplication. The feature selection study showed that alignment-based features were top-ranked for the best classifiers while the runners-up were
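
    A minimal PySpark sketch of the kind of classifier the study describes is given below; the input file, feature column names and hyperparameters are placeholders (and the paper's oversampling/undersampling steps are omitted), so this shows only the general Spark ML workflow, not the authors' pipeline.

      from pyspark.sql import SparkSession
      from pyspark.ml.feature import VectorAssembler
      from pyspark.ml.classification import RandomForestClassifier
      from pyspark.ml.evaluation import BinaryClassificationEvaluator

      spark = SparkSession.builder.appName("ortholog-detection").getOrCreate()

      # Hypothetical table: one row per protein pair, pairwise features plus a 0/1 ortholog label.
      pairs = spark.read.csv("protein_pairs.csv", header=True, inferSchema=True)
      features = ["global_alignment_score", "local_alignment_score", "kmer_distance", "length_ratio"]
      data = VectorAssembler(inputCols=features, outputCol="features").transform(pairs)

      train, test = data.randomSplit([0.8, 0.2], seed=42)
      rf = RandomForestClassifier(labelCol="label", featuresCol="features", numTrees=100)
      model = rf.fit(train)

      auc = BinaryClassificationEvaluator(labelCol="label").evaluate(model.transform(test))
      print("Area under ROC:", auc)
      spark.stop()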

  2. Clinical genomics, big data, and electronic medical records: reconciling patient rights with research when privacy and science collide.

    Science.gov (United States)

    Kulynych, Jennifer; Greely, Henry T

    2017-04-01

    Widespread use of medical records for research, without consent, attracts little scrutiny compared to biospecimen research, where concerns about genomic privacy prompted recent federal proposals to mandate consent. This paper explores an important consequence of the proliferation of electronic health records (EHRs) in this permissive atmosphere: with the advent of clinical gene sequencing, EHR-based secondary research poses genetic privacy risks akin to those of biospecimen research, yet regulators still permit researchers to call gene sequence data 'de-identified', removing such data from the protection of the federal Privacy Rule and federal human subjects regulations. Medical centers and other providers seeking to offer genomic 'personalized medicine' now confront the problem of governing the secondary use of clinical genomic data as privacy risks escalate. We argue that regulators should no longer permit HIPAA-covered entities to treat dense genomic data as de-identified health information. Even with this step, the Privacy Rule would still permit disclosure of clinical genomic data for research, without consent, under a data use agreement, so we also urge that providers give patients specific notice before disclosing clinical genomic data for research, permitting (where possible) some degree of choice and control. To aid providers who offer clinical gene sequencing, we suggest both general approaches and specific actions to reconcile patients' rights and interests with genomic research.

  3. Clinical genomics, big data, and electronic medical records: reconciling patient rights with research when privacy and science collide

    Science.gov (United States)

    Greely, Henry T.

    2017-01-01

    Abstract Widespread use of medical records for research, without consent, attracts little scrutiny compared to biospecimen research, where concerns about genomic privacy prompted recent federal proposals to mandate consent. This paper explores an important consequence of the proliferation of electronic health records (EHRs) in this permissive atmosphere: with the advent of clinical gene sequencing, EHR-based secondary research poses genetic privacy risks akin to those of biospecimen research, yet regulators still permit researchers to call gene sequence data ‘de-identified’, removing such data from the protection of the federal Privacy Rule and federal human subjects regulations. Medical centers and other providers seeking to offer genomic ‘personalized medicine’ now confront the problem of governing the secondary use of clinical genomic data as privacy risks escalate. We argue that regulators should no longer permit HIPAA-covered entities to treat dense genomic data as de-identified health information. Even with this step, the Privacy Rule would still permit disclosure of clinical genomic data for research, without consent, under a data use agreement, so we also urge that providers give patients specific notice before disclosing clinical genomic data for research, permitting (where possible) some degree of choice and control. To aid providers who offer clinical gene sequencing, we suggest both general approaches and specific actions to reconcile patients’ rights and interests with genomic research. PMID:28852559

  4. Big Data as Information Barrier

    Directory of Open Access Journals (Sweden)

    Victor Ya. Tsvetkov

    2014-07-01

    Full Text Available The article analyzes ‘Big Data’, which has been discussed over the last 10 years. The reasons and factors behind the issue are identified. It is shown that the factors creating the ‘Big Data’ issue have existed for quite a long time and have, from time to time, caused informational barriers. Such barriers were successfully overcome through science and technology. The analysis presented characterizes the ‘Big Data’ issue as a form of information barrier, one that can be resolved and that encourages the development of scientific and computational methods.

  5. Big Data in Space Science

    OpenAIRE

    Barmby, Pauline

    2018-01-01

    It seems like “big data” is everywhere these days. In planetary science and astronomy, we’ve been dealing with large datasets for a long time. So how “big” is our data? How does it compare to the big data that a bank or an airline might have? What new tools do we need to analyze big datasets, and how can we make better use of existing tools? What kinds of science problems can we address with these? I’ll address these questions with examples including ESA’s Gaia mission, ...

  6. Main Issues in Big Data Security

    Directory of Open Access Journals (Sweden)

    Julio Moreno

    2016-09-01

    Full Text Available Data is currently one of the most important assets for companies in every field. The continuous growth in the importance and volume of data has created a new problem: it cannot be handled by traditional analysis techniques. This problem was, therefore, solved through the creation of a new paradigm: Big Data. However, Big Data originated new issues related not only to the volume or the variety of the data, but also to data security and privacy. In order to obtain a full perspective of the problem, we decided to carry out an investigation with the objective of highlighting the main issues regarding Big Data security, and also the solutions proposed by the scientific community to solve them. In this paper, we explain the results obtained after applying a systematic mapping study to security in the Big Data ecosystem. It is almost impossible to carry out detailed research into the entire topic of security, and the outcome of this research is, therefore, a big picture of the main problems related to security in a Big Data system, along with the principal solutions to them proposed by the research community.

  7. SAMHD1 Promotes DNA End Resection to Facilitate DNA Repair by Homologous Recombination

    Directory of Open Access Journals (Sweden)

    Waaqo Daddacha

    2017-08-01

    Full Text Available DNA double-strand break (DSB repair by homologous recombination (HR is initiated by CtIP/MRN-mediated DNA end resection to maintain genome integrity. SAMHD1 is a dNTP triphosphohydrolase, which restricts HIV-1 infection, and mutations are associated with Aicardi-Goutières syndrome and cancer. We show that SAMHD1 has a dNTPase-independent function in promoting DNA end resection to facilitate DSB repair by HR. SAMHD1 deficiency or Vpx-mediated degradation causes hypersensitivity to DSB-inducing agents, and SAMHD1 is recruited to DSBs. SAMHD1 complexes with CtIP via a conserved C-terminal domain and recruits CtIP to DSBs to facilitate end resection and HR. Significantly, a cancer-associated mutant with impaired CtIP interaction, but not dNTPase-inactive SAMHD1, fails to rescue the end resection impairment of SAMHD1 depletion. Our findings define a dNTPase-independent function for SAMHD1 in HR-mediated DSB repair by facilitating CtIP accrual to promote DNA end resection, providing insight into how SAMHD1 promotes genome integrity.

  8. Harnessing NGS and Big Data Optimally: Comparison of miRNA Prediction from Assembled versus Non-assembled Sequencing Data--The Case of the Grass Aegilops tauschii Complex Genome.

    Science.gov (United States)

    Budak, Hikmet; Kantar, Melda

    2015-07-01

    MicroRNAs (miRNAs) are small, endogenous, non-coding RNA molecules that regulate gene expression at the post-transcriptional level. As high-throughput next generation sequencing (NGS) and Big Data rapidly accumulate for various species, efforts for in silico identification of miRNAs intensify. Surprisingly, the effect of the input genomic sequence on the robustness of miRNA prediction has not been evaluated in detail to date. In the present study, we performed a homology-based miRNA and isomiRNA prediction of the 5D chromosome of the bread wheat progenitor, Aegilops tauschii, using two distinct sequence data sets as input: (1) raw sequence reads obtained from the 454-GS FLX Titanium sequencing platform and (2) an assembly constructed from these reads. We also compared this method with a number of available plant sequence datasets. We report here the identification of 62 and 22 miRNAs from the raw reads and the assembly, respectively, of which 16 were predicted with high confidence from both datasets. While raw reads promoted sensitivity, with the higher number of miRNAs predicted, 55% (12 out of 22) of the assembly-based predictions were supported by previous observations, bringing specificity forward compared to the read-based predictions, of which only 37% were supported. Importantly, raw reads could identify several repeat-related miRNAs that could not be detected with the assembly. However, raw reads could not capture 6 miRNAs, for which the stem-loops could only be covered by the relatively longer sequences from the assembly. In summary, the comparison of miRNA datasets obtained by these two strategies revealed that the utilization of raw reads, as well as assemblies, for in silico prediction has distinct advantages and disadvantages. Consideration of these important nuances can benefit future miRNA identification efforts in the current age of NGS and Big Data driven life sciences innovation.

  9. Harnessing the Power of Big Data to Improve Graduate Medical Education: Big Idea or Bust?

    Science.gov (United States)

    Arora, Vineet M

    2018-06-01

    With the advent of electronic medical records (EMRs) fueling the rise of big data, the use of predictive analytics, machine learning, and artificial intelligence are touted as transformational tools to improve clinical care. While major investments are being made in using big data to transform health care delivery, little effort has been directed toward exploiting big data to improve graduate medical education (GME). Because our current system relies on faculty observations of competence, it is not unreasonable to ask whether big data in the form of clinical EMRs and other novel data sources can answer questions of importance in GME such as when is a resident ready for independent practice.The timing is ripe for such a transformation. A recent National Academy of Medicine report called for reforms to how GME is delivered and financed. While many agree on the need to ensure that GME meets our nation's health needs, there is little consensus on how to measure the performance of GME in meeting this goal. During a recent workshop at the National Academy of Medicine on GME outcomes and metrics in October 2017, a key theme emerged: Big data holds great promise to inform GME performance at individual, institutional, and national levels. In this Invited Commentary, several examples are presented, such as using big data to inform clinical experience and provide clinically meaningful data to trainees, and using novel data sources, including ambient data, to better measure the quality of GME training.

  10. A SWOT Analysis of Big Data

    Science.gov (United States)

    Ahmadi, Mohammad; Dileepan, Parthasarati; Wheatley, Kathleen K.

    2016-01-01

    This is the decade of data analytics and big data, but not everyone agrees with the definition of big data. Some researchers see it as the future of data analysis, while others consider it as hype and foresee its demise in the near future. No matter how it is defined, big data for the time being is having its glory moment. The most important…

  11. A survey of big data research

    Science.gov (United States)

    Fang, Hua; Zhang, Zhaoyang; Wang, Chanpaul Jin; Daneshmand, Mahmoud; Wang, Chonggang; Wang, Honggang

    2015-01-01

    Big data create values for business and research, but pose significant challenges in terms of networking, storage, management, analytics and ethics. Multidisciplinary collaborations from engineers, computer scientists, statisticians and social scientists are needed to tackle, discover and understand big data. This survey presents an overview of big data initiatives, technologies and research in industries and academia, and discusses challenges and potential solutions. PMID:26504265

  12. Big Data in Action for Government : Big Data Innovation in Public Services, Policy, and Engagement

    OpenAIRE

    World Bank

    2017-01-01

    Governments have an opportunity to harness big data solutions to improve productivity, performance and innovation in service delivery and policymaking processes. In developing countries, governments have an opportunity to adopt big data solutions and leapfrog traditional administrative approaches

  13. HGVA: the Human Genome Variation Archive.

    Science.gov (United States)

    Lopez, Javier; Coll, Jacobo; Haimel, Matthias; Kandasamy, Swaathi; Tarraga, Joaquin; Furio-Tari, Pedro; Bari, Wasim; Bleda, Marta; Rueda, Antonio; Gräf, Stefan; Rendon, Augusto; Dopazo, Joaquin; Medina, Ignacio

    2017-07-03

    High-profile genomic variation projects like the 1000 Genomes project or the Exome Aggregation Consortium are generating a wealth of human genomic variation knowledge which can be used as an essential reference for identifying disease-causing genotypes. However, accessing these data, contrasting the various studies and integrating those data in downstream analyses remains cumbersome. The Human Genome Variation Archive (HGVA) tackles these challenges and facilitates access to genomic data for key reference projects in a clean, fast and integrated fashion. HGVA provides an efficient and intuitive web-interface for easy data mining, a comprehensive RESTful API and client libraries in Python, Java and JavaScript for fast programmatic access to its knowledge base. HGVA calculates population frequencies for these projects and enriches their data with variant annotation provided by CellBase, a rich and fast annotation solution. HGVA serves as a proof-of-concept of the genome analysis developments being carried out by the University of Cambridge together with the UK's 100,000 Genomes Project and the National Institute for Health Research BioResource Rare-Diseases, in particular, deploying the open-source Computational Biology (OpenCB) software platform for storing and analyzing massive genomic datasets. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
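
    A rough sketch of programmatic access to such a RESTful service is shown below using Python's requests library; the base URL, resource path and query parameters are assumptions for illustration only and are not taken from the HGVA documentation (the project also ships Python, Java and JavaScript client libraries, which would normally be preferred).

      import requests

      # Assumed endpoint layout; consult the HGVA/OpenCB documentation for the real paths.
      BASE_URL = "http://hgva.opencb.org/webservices/rest"                          # hypothetical
      params = {"region": "13:32315474-32400266", "study": "1000GENOMES_phase_3"}   # hypothetical

      resp = requests.get(f"{BASE_URL}/variants/search", params=params, timeout=30)
      resp.raise_for_status()
      payload = resp.json()

      # Print a variant identifier and its population frequencies, if present.
      for variant in payload.get("results", []):
          print(variant.get("id"), variant.get("populationFrequencies"))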

  14. Single-Cell-Genomics-Facilitated Read Binning of Candidate Phylum EM19 Genomes from Geothermal Spring Metagenomes.

    Science.gov (United States)

    Becraft, Eric D; Dodsworth, Jeremy A; Murugapiran, Senthil K; Ohlsson, J Ingemar; Briggs, Brandon R; Kanbar, Jad; De Vlaminck, Iwijn; Quake, Stephen R; Dong, Hailiang; Hedlund, Brian P; Swingley, Wesley D

    2016-02-15

    The vast majority of microbial life remains uncatalogued due to the inability to cultivate these organisms in the laboratory. This "microbial dark matter" represents a substantial portion of the tree of life and of the populations that contribute to chemical cycling in many ecosystems. In this work, we leveraged an existing single-cell genomic data set representing the candidate bacterial phylum "Calescamantes" (EM19) to calibrate machine learning algorithms and define metagenomic bins directly from pyrosequencing reads derived from Great Boiling Spring in the U.S. Great Basin. Compared to other assembly-based methods, taxonomic binning with a read-based machine learning approach yielded final assemblies with the highest predicted genome completeness of any method tested. Read-first binning subsequently was used to extract Calescamantes bins from all metagenomes with abundant Calescamantes populations, including metagenomes from Octopus Spring and Bison Pool in Yellowstone National Park and Gongxiaoshe Spring in Yunnan Province, China. Metabolic reconstruction suggests that Calescamantes are heterotrophic, facultative anaerobes, which can utilize oxidized nitrogen sources as terminal electron acceptors for respiration in the absence of oxygen and use proteins as their primary carbon source. Despite their phylogenetic divergence, the geographically separate Calescamantes populations were highly similar in their predicted metabolic capabilities and core gene content, respiring O2, or oxidized nitrogen species for energy conservation in distant but chemically similar hot springs. Copyright © 2016, American Society for Microbiology. All Rights Reserved.
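
    A toy version of read-first taxonomic binning is sketched below: tetranucleotide-frequency vectors are computed for labelled reads (for example, reads drawn from single-cell genome assemblies) and a random forest then assigns unlabelled metagenomic reads to bins. The training data, features and classifier choice here are illustrative assumptions, not the algorithms calibrated in the study.

      from itertools import product
      from sklearn.ensemble import RandomForestClassifier

      KMERS = ["".join(p) for p in product("ACGT", repeat=4)]

      def kmer_freqs(read):
          """Normalized tetranucleotide frequencies of a read."""
          counts = {k: 0 for k in KMERS}
          for i in range(len(read) - 3):
              kmer = read[i:i + 4]
              if kmer in counts:
                  counts[kmer] += 1
          total = sum(counts.values()) or 1
          return [counts[k] / total for k in KMERS]

      # Toy labelled reads (in practice: reads from single-cell genomes vs. other taxa).
      train_reads = ["ACGTACGTACGTACGT", "ACGTTACGTTACGTTA", "GGGCCCGGGCCCGGGC", "GGCCGGCCGGCCGGCC"]
      train_labels = ["Calescamantes", "Calescamantes", "other", "other"]

      clf = RandomForestClassifier(n_estimators=50, random_state=0)
      clf.fit([kmer_freqs(r) for r in train_reads], train_labels)

      metagenome_reads = ["ACGTACGTTACGTACG", "GGCCGGGCCCGGCCGG"]
      print(clf.predict([kmer_freqs(r) for r in metagenome_reads]))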

  15. 78 FR 3911 - Big Stone National Wildlife Refuge, Big Stone and Lac Qui Parle Counties, MN; Final Comprehensive...

    Science.gov (United States)

    2013-01-17

    ... DEPARTMENT OF THE INTERIOR Fish and Wildlife Service [FWS-R3-R-2012-N259; FXRS1265030000-134-FF03R06000] Big Stone National Wildlife Refuge, Big Stone and Lac Qui Parle Counties, MN; Final Comprehensive... significant impact (FONSI) for the environmental assessment (EA) for Big Stone National Wildlife Refuge...

  16. Big domains are novel Ca²+-binding modules: evidences from big domains of Leptospira immunoglobulin-like (Lig) proteins.

    Directory of Open Access Journals (Sweden)

    Rajeev Raman

    Full Text Available BACKGROUND: Many bacterial surface exposed proteins mediate the host-pathogen interaction more effectively in the presence of Ca²+. Leptospiral immunoglobulin-like (Lig) proteins, LigA and LigB, are surface exposed proteins containing Bacterial immunoglobulin like (Big) domains. The function of proteins which contain the Big fold is not known. Based on the possible similarities of the immunoglobulin and βγ-crystallin folds, we here explore the important question whether Ca²+ binds to Big domains, which would provide a novel functional role of the proteins containing the Big fold. PRINCIPAL FINDINGS: We selected six individual Big domains for this study (three from the conserved part of LigA and LigB, denoted as Lig A3, Lig A4, and LigBCon5; two from the variable region of LigA, i.e., the 9th (Lig A9) and 10th (Lig A10) repeats; and one from the variable region of LigB, i.e., LigBCen2). We have also studied the conserved region covering the three and six repeats (LigBCon1-3 and LigCon). All these proteins bind the calcium-mimic dye Stains-all. All the selected four domains bind Ca²+ with dissociation constants of 2-4 µM. Lig A9 and Lig A10 domains fold well with moderate thermal stability, have β-sheet conformation and form homodimers. Fluorescence spectra of Big domains show a specific doublet (at 317 and 330 nm), probably due to Trp interaction with a Phe residue. Equilibrium unfolding of the selected Big domains is similar and follows a two-state model, suggesting the similarity in their fold. CONCLUSIONS: We demonstrate that the Lig proteins are Ca²+-binding proteins, with Big domains harbouring the binding motif. We conclude that despite differences in sequence, a Big motif binds Ca²+. This work thus sets up a strong possibility for classifying the proteins containing Big domains as a novel family of Ca²+-binding proteins. Since the Big domain is a part of many proteins in the bacterial kingdom, we suggest a possible function for these proteins via Ca²+ binding.

  17. Big and complex data analysis methodologies and applications

    CERN Document Server

    2017-01-01

    This volume conveys some of the surprises, puzzles and success stories in high-dimensional and complex data analysis and related fields. Its peer-reviewed contributions showcase recent advances in variable selection, estimation and prediction strategies for a host of useful models, as well as essential new developments in the field. The continued and rapid advancement of modern technology now allows scientists to collect data of increasingly unprecedented size and complexity. Examples include epigenomic data, genomic data, proteomic data, high-resolution image data, high-frequency financial data, functional and longitudinal data, and network data. Simultaneous variable selection and estimation is one of the key statistical problems involved in analyzing such big and complex data. The purpose of this book is to stimulate research and foster interaction between researchers in the area of high-dimensional data analysis. More concretely, its goals are to: 1) highlight and expand the breadth of existing methods in...

  18. Odysseus/DFS: Integration of DBMS and Distributed File System for Transaction Processing of Big Data

    OpenAIRE

    Kim, Jun-Sung; Whang, Kyu-Young; Kwon, Hyuk-Yoon; Song, Il-Yeol

    2014-01-01

    The relational DBMS (RDBMS) has been widely used since it supports various high-level functionalities such as SQL, schemas, indexes, and transactions that do not exist in the O/S file system. But, a recent advent of big data technology facilitates development of new systems that sacrifice the DBMS functionality in order to efficiently manage large-scale data. Those so-called NoSQL systems use a distributed file system, which support scalability and reliability. They support scalability of the...

  19. New 'bigs' in cosmology

    International Nuclear Information System (INIS)

    Yurov, Artyom V.; Martin-Moruno, Prado; Gonzalez-Diaz, Pedro F.

    2006-01-01

    This paper contains a detailed discussion of new cosmic solutions describing the early and late evolution of a universe filled with a kind of dark energy that may or may not satisfy the energy conditions. The main distinctive property of the resulting space-times is that the single singular events predicted by the corresponding quintessential (phantom) models appear twice, in a manner which can be made symmetric with respect to the origin of cosmic time. Thus, the big bang and big rip singularities are shown to take place twice, once on the positive branch of time and once on the negative one. We have also considered dark energy and phantom energy accretion onto black holes and wormholes in the context of these new cosmic solutions. It is seen that the space-times of these holes would then undergo swelling processes leading to big trip and big hole events taking place at distinct epochs along the evolution of the universe. In this way, the possibility is considered that the past and future be connected in a non-paradoxical manner in the universes described by means of the new symmetric solutions.

  20. 2nd INNS Conference on Big Data

    CERN Document Server

    Manolopoulos, Yannis; Iliadis, Lazaros; Roy, Asim; Vellasco, Marley

    2017-01-01

    The book offers a timely snapshot of neural network technologies as a significant component of big data analytics platforms. It promotes new advances and research directions in efficient and innovative algorithmic approaches to analyzing big data (e.g. deep networks, nature-inspired and brain-inspired algorithms); implementations on different computing platforms (e.g. neuromorphic, graphics processing units (GPUs), clouds, clusters); and big data analytics applications to solve real-world problems (e.g. weather prediction, transportation, energy management). The book, which reports on the second edition of the INNS Conference on Big Data, held on October 23–25, 2016, in Thessaloniki, Greece, depicts an interesting collaborative adventure of neural networks with big data and other learning technologies.

  1. The ethics of biomedical big data

    CERN Document Server

    Mittelstadt, Brent Daniel

    2016-01-01

    This book presents cutting edge research on the new ethical challenges posed by biomedical Big Data technologies and practices. ‘Biomedical Big Data’ refers to the analysis of aggregated, very large datasets to improve medical knowledge and clinical care. The book describes the ethical problems posed by aggregation of biomedical datasets and re-use/re-purposing of data, in areas such as privacy, consent, professionalism, power relationships, and ethical governance of Big Data platforms. Approaches and methods are discussed that can be used to address these problems to achieve the appropriate balance between the social goods of biomedical Big Data research and the safety and privacy of individuals. Seventeen original contributions analyse the ethical, social and related policy implications of the analysis and curation of biomedical Big Data, written by leading experts in the areas of biomedical research, medical and technology ethics, privacy, governance and data protection. The book advances our understan...

  2. Scalable privacy-preserving big data aggregation mechanism

    Directory of Open Access Journals (Sweden)

    Dapeng Wu

    2016-08-01

    Full Text Available As the massive sensor data generated by large-scale Wireless Sensor Networks (WSNs) recently become an indispensable part of ‘Big Data’, the collection, storage, transmission and analysis of the big sensor data attract considerable attention from researchers. Targeting the privacy requirements of large-scale WSNs and focusing on the energy-efficient collection of big sensor data, a Scalable Privacy-preserving Big Data Aggregation (Sca-PBDA) method is proposed in this paper. Firstly, according to the pre-established gradient topology structure, sensor nodes in the network are divided into clusters. Secondly, sensor data is modified by each node according to the privacy-preserving configuration message received from the sink. Subsequently, intra- and inter-cluster data aggregation is employed during the big sensor data reporting phase to reduce energy consumption. Lastly, aggregated results are recovered by the sink to complete the privacy-preserving big data aggregation. Simulation results validate the efficacy and scalability of Sca-PBDA and show that the big sensor data generated by large-scale WSNs is efficiently aggregated to reduce network resource consumption and the sensor data privacy is effectively protected to meet the ever-growing application requirements.
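
    As a loose illustration of the sink-coordinated masking idea (not the actual Sca-PBDA protocol), the sketch below has each sensor report its reading plus a pseudorandom mask derived from a seed shared with the sink; cluster heads only ever see and forward masked sums, and the sink subtracts the aggregate mask to recover the true total. Node names, values and the masking scheme are invented for demonstration.

      import random

      def mask_for(node_id, epoch, shared_secret="demo-secret"):
          """Pseudorandom mask reproducible by the sink from the shared seed (illustrative only)."""
          rng = random.Random(f"{shared_secret}:{node_id}:{epoch}")
          return rng.randint(0, 10_000)

      epoch = 7
      readings = {"n1": 21, "n2": 35, "n3": 18, "n4": 40}    # true sensor values
      clusters = {"head_a": ["n1", "n2"], "head_b": ["n3", "n4"]}

      # Intra-cluster aggregation: each node reports reading + mask, the head only sums reports.
      masked_cluster_sums = {
          head: sum(readings[n] + mask_for(n, epoch) for n in members)
          for head, members in clusters.items()
      }

      # Inter-cluster aggregation, then mask removal at the sink.
      masked_total = sum(masked_cluster_sums.values())
      total_mask = sum(mask_for(n, epoch) for n in readings)
      print("recovered sum:", masked_total - total_mask)     # 114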

  3. Ethische aspecten van big data

    NARCIS (Netherlands)

    N. (Niek) van Antwerpen; Klaas Jan Mollema

    2017-01-01

    Big data has not only led to challenging technical questions; it also comes with all kinds of new ethical and moral issues. To handle big data responsibly, these issues must be considered as well, because poor use of data can have adverse consequences for

  4. Advancing stroke genomic research in the age of Trans-Omics big data science: Emerging priorities and opportunities.

    Science.gov (United States)

    Owolabi, Mayowa; Peprah, Emmanuel; Xu, Huichun; Akinyemi, Rufus; Tiwari, Hemant K; Irvin, Marguerite R; Wahab, Kolawole Wasiu; Arnett, Donna K; Ovbiagele, Bruce

    2017-11-15

    We systematically reviewed the genetic variants associated with stroke in genome-wide association studies (GWAS) and examined the emerging priorities and opportunities for rapidly advancing stroke research in the era of Trans-Omics science. Using the PRISMA guideline, we searched PubMed and the NHGRI-EBI GWAS catalog for stroke studies from 2007 till May 2017. We included 31 studies. The major challenge is that the few validated variants could not account for the full genetic risk of stroke and have not been translated for clinical use. None of the studies included continental Africans. Genomic study of stroke among Africans presents a unique opportunity for the discovery, validation, functional annotation, Trans-Omics study and translation of genomic determinants of stroke with implications for global populations. This is because all humans originated from Africa, a continent with a unique genomic architecture and a distinctive epidemiology of stroke; as well as substantially higher heritability and resolution of fine mapping of stroke genes. Understanding the genomic determinants of stroke and the corresponding molecular mechanisms will revolutionize the development of a new set of precise biomarkers for stroke prediction, diagnosis and prognostic estimates as well as personalized interventions for reducing the global burden of stroke. Copyright © 2017 Elsevier B.V. All rights reserved.

  5. Epidemiology in wonderland: Big Data and precision medicine.

    Science.gov (United States)

    Saracci, Rodolfo

    2018-03-01

    Big Data and precision medicine, two major contemporary challenges for epidemiology, are critically examined from two different angles. In Part 1, Big Data collected for research purposes (Big research Data) and Big Data used for research although collected for other primary purposes (Big secondary Data) are discussed in the light of the fundamental common requirement of data validity, prevailing over "bigness". Precision medicine is treated by developing the key point that high relative risks are as a rule required to make a variable or combination of variables suitable for prediction of disease occurrence, outcome or response to treatment; the commercial proliferation of allegedly predictive tests of unknown or poor validity is commented upon. Part 2 proposes a "wise epidemiology" approach to: (a) choosing, in a context imprinted by Big Data and precision medicine, epidemiological research projects actually relevant to population health, (b) training epidemiologists, (c) investigating the impact on clinical practices and the doctor-patient relation of the influx of Big Data and computerized medicine and (d) clarifying whether today "health" may be redefined, as some maintain, in purely technological terms.

  6. Big Data and Analytics in Healthcare.

    Science.gov (United States)

    Tan, S S-L; Gao, G; Koch, S

    2015-01-01

    This editorial is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". The amount of data being generated in the healthcare industry is growing at a rapid rate. This has generated immense interest in leveraging the availability of healthcare data (and "big data") to improve health outcomes and reduce costs. However, the nature of healthcare data, and especially big data, presents unique challenges in processing and analyzing big data in healthcare. This Focus Theme aims to disseminate some novel approaches to address these challenges. More specifically, approaches ranging from efficient methods of processing large clinical data to predictive models that could generate better predictions from healthcare data are presented.

  7. Big Data for Business Ecosystem Players

    Directory of Open Access Journals (Sweden)

    Perko Igor

    2016-06-01

    Full Text Available In the provided research, some of the most promising Big Data usage domains are connected with distinguished player groups found in the business ecosystem. Literature analysis is used to identify the state of the art of Big Data related research in the major domains of its use, namely individual marketing, health treatment, work opportunities, financial services, and security enforcement. System theory was used to identify the business ecosystem's major player types disrupted by Big Data: individuals, small and mid-sized enterprises, large organizations, information providers, and regulators. Relationships between the domains and players were explained through new Big Data opportunities and threats and by players' responsive strategies. System dynamics was used to visualize relationships in the provided model.

  8. "Big data" in economic history.

    Science.gov (United States)

    Gutmann, Myron P; Merchant, Emily Klancher; Roberts, Evan

    2018-03-01

    Big data is an exciting prospect for the field of economic history, which has long depended on the acquisition, keying, and cleaning of scarce numerical information about the past. This article examines two areas in which economic historians are already using big data - population and environment - discussing ways in which increased frequency of observation, denser samples, and smaller geographic units allow us to analyze the past with greater precision and often to track individuals, places, and phenomena across time. We also explore promising new sources of big data: organically created economic data, high resolution images, and textual corpora.

  9. Big Data Knowledge in Global Health Education.

    Science.gov (United States)

    Olayinka, Olaniyi; Kekeh, Michele; Sheth-Chandra, Manasi; Akpinar-Elci, Muge

    The ability to synthesize and analyze massive amounts of data is critical to the success of organizations, including those that involve global health. As countries become highly interconnected, increasing the risk for pandemics and outbreaks, the demand for big data is likely to increase. This requires a global health workforce that is trained in the effective use of big data. To assess implementation of big data training in global health, we conducted a pilot survey of members of the Consortium of Universities of Global Health. More than half the respondents did not have a big data training program at their institution. Additionally, the majority agreed that big data training programs will improve global health deliverables, among other favorable outcomes. Given the observed gap and benefits, global health educators may consider investing in big data training for students seeking a career in global health. Copyright © 2017 Icahn School of Medicine at Mount Sinai. Published by Elsevier Inc. All rights reserved.

  10. GEOSS: Addressing Big Data Challenges

    Science.gov (United States)

    Nativi, S.; Craglia, M.; Ochiai, O.

    2014-12-01

    In the sector of Earth Observation, the explosion of data is due to many factors including: new satellite constellations, the increased capabilities of sensor technologies, social media, crowdsourcing, and the need for multidisciplinary and collaborative research to face Global Changes. In this area, there are many expectations and concerns about Big Data. Vendors have attempted to use this term for their commercial purposes. It is necessary to understand whether Big Data is a radical shift or an incremental change for the existing digital infrastructures. This presentation tries to explore and discuss the impact of Big Data challenges and new capabilities on the Global Earth Observation System of Systems (GEOSS) and particularly on its common digital infrastructure called GCI. GEOSS is a global and flexible network of content providers allowing decision makers to access an extraordinary range of data and information at their desk. The impact of the Big Data dimensionalities (commonly known as 'V' axes: volume, variety, velocity, veracity, visualization) on GEOSS is discussed. The main solutions and experimentation developed by GEOSS along these axes are introduced and analyzed. GEOSS is a pioneering framework for global and multidisciplinary data sharing in the Earth Observation realm; its experience on Big Data is valuable for the many lessons learned.

  11. Big data for bipolar disorder.

    Science.gov (United States)

    Monteith, Scott; Glenn, Tasha; Geddes, John; Whybrow, Peter C; Bauer, Michael

    2016-12-01

    The delivery of psychiatric care is changing with a new emphasis on integrated care, preventative measures, population health, and the biological basis of disease. Fundamental to this transformation are big data and advances in the ability to analyze these data. The impact of big data on the routine treatment of bipolar disorder today and in the near future is discussed, with examples that relate to health policy, the discovery of new associations, and the study of rare events. The primary sources of big data today are electronic medical records (EMR), claims, and registry data from providers and payers. In the near future, data created by patients from active monitoring, passive monitoring of Internet and smartphone activities, and from sensors may be integrated with the EMR. Diverse data sources from outside of medicine, such as government financial data, will be linked for research. Over the long term, genetic and imaging data will be integrated with the EMR, and there will be more emphasis on predictive models. Many technical challenges remain when analyzing big data that relates to size, heterogeneity, complexity, and unstructured text data in the EMR. Human judgement and subject matter expertise are critical parts of big data analysis, and the active participation of psychiatrists is needed throughout the analytical process.

  12. BIG DATA IN TAMIL: OPPORTUNITIES, BENEFITS AND CHALLENGES

    OpenAIRE

    R.S. Vignesh Raj; Babak Khazaei; Ashik Ali

    2015-01-01

    This paper gives an overall introduction to big data and attempts to introduce Big Data in Tamil. It discusses the potential opportunities, benefits and likely challenges from a distinctly Tamil and Tamil Nadu perspective. The paper also makes an original contribution by proposing terminology for 'big data' in Tamil. The paper further suggests a few areas to explore using big data in Tamil on the lines of the Tamil Nadu Government 'vision 2023'. Whilst big data has something to offer everyone, it ...

  13. Big data in biomedicine.

    Science.gov (United States)

    Costa, Fabricio F

    2014-04-01

    The increasing availability and growth rate of biomedical information, also known as 'big data', provides an opportunity for future personalized medicine programs that will significantly improve patient care. Recent advances in information technology (IT) applied to biomedicine are changing the landscape of privacy and personal information, with patients getting more control of their health information. Conceivably, big data analytics is already impacting health decisions and patient care; however, specific challenges need to be addressed to integrate current discoveries into medical practice. In this article, I will discuss the major breakthroughs achieved in combining omics and clinical health data in terms of their application to personalized medicine. I will also review the challenges associated with using big data in biomedicine and translational science. Copyright © 2013 Elsevier Ltd. All rights reserved.

  14. Big Data’s Role in Precision Public Health

    Science.gov (United States)

    Dolley, Shawn

    2018-01-01

    Precision public health is an emerging practice to more granularly predict and understand public health risks and customize treatments for more specific and homogeneous subpopulations, often using new data, technologies, and methods. Big data is one element that has consistently helped to achieve these goals, through its ability to deliver to practitioners a volume and variety of structured or unstructured data not previously possible. Big data has enabled more widespread and specific research and trials of stratifying and segmenting populations at risk for a variety of health problems. Examples of success using big data are surveyed in surveillance and signal detection, predicting future risk, targeted interventions, and understanding disease. Using novel big data or big data approaches has risks that remain to be resolved. The continued growth in volume and variety of available data, decreased costs of data capture, and emerging computational methods mean big data success will likely be a required pillar of precision public health into the future. This review article aims to identify the precision public health use cases where big data has added value, identify classes of value that big data may bring, and outline the risks inherent in using big data in precision public health efforts. PMID:29594091

  15. Using Big Data Analytics to Advance Precision Radiation Oncology.

    Science.gov (United States)

    McNutt, Todd R; Benedict, Stanley H; Low, Daniel A; Moore, Kevin; Shpitser, Ilya; Jiang, Wei; Lakshminarayanan, Pranav; Cheng, Zhi; Han, Peijin; Hui, Xuan; Nakatsugawa, Minoru; Lee, Junghoon; Moore, Joseph A; Robertson, Scott P; Shah, Veeraj; Taylor, Russ; Quon, Harry; Wong, John; DeWeese, Theodore

    2018-06-01

    Big clinical data analytics as a primary component of precision medicine is discussed, identifying where these emerging tools fit in the spectrum of genomics and radiomics research. A learning health system (LHS) is conceptualized that uses clinically acquired data with machine learning to advance the initiatives of precision medicine. The LHS is comprehensive and can be used for clinical decision support, discovery, and hypothesis derivation. These developing uses can positively impact the ultimate management and therapeutic course for patients. The conceptual model for each use of clinical data, however, is different, and an overview of the implications is discussed. With advancements in technologies and culture to improve the efficiency, accuracy, and breadth of measurements of the patient condition, the concept of an LHS may be realized in precision radiation therapy. Copyright © 2018 Elsevier Inc. All rights reserved.

  16. Big inquiry

    Energy Technology Data Exchange (ETDEWEB)

    Wynne, B [Lancaster Univ. (UK)]

    1979-06-28

    The recently published report entitled 'The Big Public Inquiry' from the Council for Science and Society and the Outer Circle Policy Unit is considered, with especial reference to any future enquiry which may take place into the first commercial fast breeder reactor. Proposals embodied in the report include stronger rights for objectors and an attempt is made to tackle the problem that participation in a public inquiry is far too late to be objective. It is felt by the author that the CSS/OCPU report is a constructive contribution to the debate about big technology inquiries but that it fails to understand the deeper currents in the economic and political structure of technology which so influence the consequences of whatever formal procedures are evolved.

  17. Exploring the potential of open big data from ticketing websites to characterize travel patterns within the Chinese high-speed rail system.

    Directory of Open Access Journals (Sweden)

    Sheng Wei

    Full Text Available Big data have contributed to deepening our understanding of many human systems, particularly human mobility patterns and the structure and functioning of transportation systems. Resonating the recent call for 'open big data,' big data from various sources on a range of scales have become increasingly accessible to the public. However, open big data relevant to travelers within public transit tools remain scarce, hindering any further in-depth study on human mobility patterns. Here, we explore ticketing-website derived data that are publicly available but have been largely neglected. We demonstrate the power, potential and limitations of this open big data, using the Chinese high-speed rail (HSR) system as an example. Using an application programming interface, we automatically collected the data on the remaining tickets (RTD) for scheduled trains at the last second before departure in order to retrieve information on unused transit capacity, occupancy rate of trains, and passenger flux at stations. We show that this information is highly useful in characterizing the spatiotemporal patterns of traveling behaviors on the Chinese HSR, such as weekend traveling behavior, imbalanced commuting behavior, and station functionality. Our work facilitates the understanding of human traveling patterns along the Chinese HSR, and the functionality of the largest HSR system in the world. We expect our work to attract attention regarding this unique open big data source for the study of analogous transportation systems.
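
    A minimal sketch of the kind of collection and occupancy computation described above is shown below. The API endpoint, JSON field names and train capacity are hypothetical placeholders; the actual ticketing-website interface used by the authors is not documented here.

```python
import json
from urllib.request import urlopen

# Hypothetical endpoint and field names, used purely for illustration.
API = "https://example-rail-api.cn/remaining?train={train}&date={date}"

def remaining_tickets(train_id: str, date: str) -> int:
    """Poll the (hypothetical) ticketing API once and return unsold seats."""
    with urlopen(API.format(train=train_id, date=date), timeout=10) as resp:
        payload = json.load(resp)
    return int(payload["remaining_seats"])

def occupancy_rate(capacity: int, remaining: int) -> float:
    """Occupancy just before departure: sold seats divided by total capacity."""
    return (capacity - remaining) / capacity

if __name__ == "__main__":
    # Sample the train shortly before its scheduled departure, as in the study.
    remaining = remaining_tickets("G101", "2016-05-01")
    print(f"occupancy: {occupancy_rate(capacity=1200, remaining=remaining):.1%}")
```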

  18. Big data analytics with R and Hadoop

    CERN Document Server

    Prajapati, Vignesh

    2013-01-01

    Big Data Analytics with R and Hadoop is a tutorial style book that focuses on all the powerful big data tasks that can be achieved by integrating R and Hadoop.This book is ideal for R developers who are looking for a way to perform big data analytics with Hadoop. This book is also aimed at those who know Hadoop and want to build some intelligent applications over Big data with R packages. It would be helpful if readers have basic knowledge of R.

  19. BGI-RIS: an integrated information resource and comparative analysis workbench for rice genomics

    DEFF Research Database (Denmark)

    Zhao, Wenming; Wang, Jing; He, Ximiao

    2004-01-01

    Rice is a major food staple for the world's population and serves as a model species in cereal genome research. The Beijing Genomics Institute (BGI) has long been devoting itself to sequencing, information analysis and biological research of the rice and other crop genomes. In order to facilitate ... Designed as a basic platform, BGI-RIS presents the sequenced genomes and related information in systematic and graphical ways for the convenience of in-depth comparative studies (http://rise.genomics.org.cn/). Publication date: 2004-Jan-1

  20. Big data in forensic science and medicine.

    Science.gov (United States)

    Lefèvre, Thomas

    2018-07-01

    In less than a decade, big data in medicine has become quite a phenomenon and many biomedical disciplines have got their own tribune on the topic. Perspectives and debates are flourishing while there is a lack of a consensual definition for big data. The 3Vs paradigm is frequently evoked to define the big data principles and stands for Volume, Variety and Velocity. Even according to this paradigm, genuine big data studies are still scarce in medicine and may not meet all expectations. On one hand, techniques usually presented as specific to big data, such as machine learning techniques, are supposed to support the ambition of personalized, predictive and preventive medicine. These techniques are mostly far from being new and are more than 50 years old for the most ancient. On the other hand, several issues closely related to the properties of big data and inherited from other scientific fields such as artificial intelligence are often underestimated if not ignored. Besides, a few papers temper the almost unanimous big data enthusiasm and are worth attention since they delineate what is at stake. In this context, forensic science is still awaiting its position papers as well as a comprehensive outline of what kind of contribution big data could bring to the field. The present situation calls for definitions and actions to rationally guide research and practice in big data. It is an opportunity for grounding a true interdisciplinary approach in forensic science and medicine that is mainly based on evidence. Copyright © 2017 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.

  1. A Targeted Enrichment Strategy for Massively Parallel Sequencing of Angiosperm Plastid Genomes

    Directory of Open Access Journals (Sweden)

    Gregory W. Stull

    2013-02-01

    Full Text Available Premise of the study: We explored a targeted enrichment strategy to facilitate rapid and low-cost next-generation sequencing (NGS) of numerous complete plastid genomes from across the phylogenetic breadth of angiosperms. Methods and Results: A custom RNA probe set including the complete sequences of 22 previously sequenced eudicot plastomes was designed to facilitate hybridization-based targeted enrichment of eudicot plastid genomes. Using this probe set and an Agilent SureSelect targeted enrichment kit, we conducted an enrichment experiment including 24 angiosperms (22 eudicots, two monocots), which were subsequently sequenced on a single lane of the Illumina GAIIx with single-end, 100-bp reads. This approach yielded nearly complete to complete plastid genomes with exceptionally high coverage (mean coverage: 717×), even for the two monocots. Conclusions: Our enrichment experiment was highly successful even though many aspects of the capture process employed were suboptimal. Hence, significant improvements to this methodology are feasible. With this general approach and probe set, it should be possible to sequence more than 300 essentially complete plastid genomes in a single Illumina GAIIx lane (achieving 50× mean coverage). However, given the complications of pooling numerous samples for multiplex sequencing and the limited number of barcodes (e.g., 96) available in commercial kits, we recommend 96 samples as a current practical maximum for multiplex plastome sequencing. This high-throughput approach should facilitate large-scale plastid genome sequencing at any level of phylogenetic diversity in angiosperms.
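
    A back-of-envelope check of the multiplexing claim can be written in a few lines. The lane yield and plastome size below are assumed round figures (roughly a 3 Gb single-end GAIIx lane and a ~160 kb plastid genome), not numbers taken from the study.

```python
# Back-of-envelope check of how many plastomes fit on one lane at a target depth.
lane_yield_bp = 3e9          # assumed single-end 100-bp GAIIx lane, ~30 M reads
plastome_size_bp = 160e3     # typical angiosperm plastid genome, ~160 kb
target_coverage = 50         # desired mean depth per sample

max_samples = lane_yield_bp / (plastome_size_bp * target_coverage)
print(f"~{max_samples:.0f} plastomes per lane at {target_coverage}x")   # roughly 375
```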

  2. BIG DATA ANALYTICS AND PRECISION ANIMAL AGRICULTURE SYMPOSIUM: Data to decisions.

    Science.gov (United States)

    White, B J; Amrine, D E; Larson, R L

    2018-04-14

    Big data are frequently used in many facets of business and agronomy to enhance knowledge needed to improve operational decisions. Livestock operations collect data of sufficient quantity to perform predictive analytics. Predictive analytics can be defined as a methodology and suite of data evaluation techniques to generate a prediction for specific target outcomes. The objective of this manuscript is to describe the process of using big data and the predictive analytic framework to create tools to drive decisions in livestock production, health, and welfare. The predictive analytic process involves selecting a target variable, managing the data, partitioning the data, then creating algorithms, refining algorithms, and finally comparing accuracy of the created classifiers. The partitioning of the datasets allows model building and refining to occur prior to testing the predictive accuracy of the model with naive data to evaluate overall accuracy. Many different classification algorithms are available for predictive use and testing multiple algorithms can lead to optimal results. Application of a systematic process for predictive analytics using data that is currently collected or that could be collected on livestock operations will facilitate precision animal management through enhanced livestock operational decisions.
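
    The partition-train-compare workflow described above can be sketched with standard tooling; the snippet below uses synthetic data and two off-the-shelf classifiers purely as placeholders for livestock records and the authors' algorithms.

```python
# Minimal sketch of the partition -> train -> refine -> compare workflow, on
# synthetic data (features and target are placeholders, not livestock records).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=15, random_state=0)

# Partition: hold out naive data so accuracy reflects truly unseen records.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Build several candidate classifiers and compare their accuracy on the naive set.
candidates = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy on held-out data = {acc:.3f}")
```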

  3. NASA's Big Data Task Force

    Science.gov (United States)

    Holmes, C. P.; Kinter, J. L.; Beebe, R. F.; Feigelson, E.; Hurlburt, N. E.; Mentzel, C.; Smith, G.; Tino, C.; Walker, R. J.

    2017-12-01

    Two years ago NASA established the Ad Hoc Big Data Task Force (BDTF - https://science.nasa.gov/science-committee/subcommittees/big-data-task-force), an advisory working group within the NASA Advisory Council system. The scope of the Task Force included all NASA Big Data programs, projects, missions, and activities. The Task Force focused on such topics as exploring the existing and planned evolution of NASA's science data cyber-infrastructure that supports broad access to data repositories for NASA Science Mission Directorate missions; best practices within NASA, other Federal agencies, private industry and research institutions; and Federal initiatives related to big data and data access. The BDTF has completed its two-year term and produced several recommendations plus four white papers for NASA's Science Mission Directorate. This presentation will discuss the activities and results of the Task Force including summaries of key points from its focused study topics. The paper serves as an introduction to the papers following in this ESSI session.

  4. Big Data Technologies

    Science.gov (United States)

    Bellazzi, Riccardo; Dagliati, Arianna; Sacchi, Lucia; Segagni, Daniele

    2015-01-01

    The so-called big data revolution provides substantial opportunities to diabetes management. At least 3 important directions are currently of great interest. First, the integration of different sources of information, from primary and secondary care to administrative information, may allow depicting a novel view of patient’s care processes and of single patient’s behaviors, taking into account the multifaceted nature of chronic care. Second, the availability of novel diabetes technologies, able to gather large amounts of real-time data, requires the implementation of distributed platforms for data analysis and decision support. Finally, the inclusion of geographical and environmental information into such complex IT systems may further increase the capability of interpreting the data gathered and extract new knowledge from them. This article reviews the main concepts and definitions related to big data, it presents some efforts in health care, and discusses the potential role of big data in diabetes care. Finally, as an example, it describes the research efforts carried on in the MOSAIC project, funded by the European Commission. PMID:25910540

  5. Genome - wide variation and demographic history of small cats with a focus on Felis species

    Directory of Open Access Journals (Sweden)

    Anubhab Khan

    2017-10-01

    Full Text Available The majority of the 38 known cat species are classified as small and they inhabit five of the seven continents. They survive in a vast range of habitats but still 12 out of the 18 threatened felids are small cats. However, there has not been enough progress in the field of small cat research as they generally get overshadowed by the charismatic big cats. Here we attempt to create a resource for small cat research, especially of the genus Felis, which has six species, out of which two are classified as vulnerable by IUCN and at least one more is at risk. We collected tissue samples of four Felis chaus (Jungle cat) from central India and used available whole genome sequences of nine individuals from four other Felis species, two individuals of Prionailurus bengalensis and an Otocolobus manul. These whole genome sequences were filtered and aligned with the already published domestic cat (Felis catus) genome assembly. Felids are closely related species and reads from all species in our study aligned with the domestic cat genome with a rate of at least 93%. We estimated the existing genomic variation by calculating the heterozygous SNP encounter rate. So far, it seems that all wild cats have more genetic variation than the Felis catus species. This can be attributed to the inbreeding in these cats. Among the wild cats, Felis silvestris seems to have the highest level of genetic variation. To understand the reasons behind the distribution of genetic variation in small cats, we estimated the demographic histories of each of the species using PSMC. This method can only detect demographic changes more than 1000 generations ago. We observe that roughly all species share a parallel history in terms of population increase. The most interesting and important feature might be that all wild small cat population sizes increased exponentially around twenty thousand years ago as opposed to domestic cat and big cats which declined around this time. Another interesting feature of

  6. The Berlin Inventory of Gambling behavior - Screening (BIG-S): Validation using a clinical sample.

    Science.gov (United States)

    Wejbera, Martin; Müller, Kai W; Becker, Jan; Beutel, Manfred E

    2017-05-18

    Published diagnostic questionnaires for gambling disorder in German are either based on DSM-III criteria or focus on aspects other than life time prevalence. This study was designed to assess the usability of the DSM-IV criteria based Berlin Inventory of Gambling Behavior Screening tool in a clinical sample and adapt it to DSM-5 criteria. In a sample of 432 patients presenting for behavioral addiction assessment at the University Medical Center Mainz, we checked the screening tool's results against clinical diagnosis and compared a subsample of n=300 clinically diagnosed gambling disorder patients with a comparison group of n=132. The BIG-S produced a sensitivity of 99.7% and a specificity of 96.2%. The instrument's unidimensionality and the diagnostic improvements of DSM-5 criteria were verified by exploratory and confirmatory factor analysis as well as receiver operating characteristic analysis. The BIG-S is a reliable and valid screening tool for gambling disorder and demonstrated its concise and comprehensible operationalization of current DSM-5 criteria in a clinical setting.

  7. LifeStyle-Specific-Islands (LiSSI): Integrated Bioinformatics Platform for Genomic Island Analysis

    DEFF Research Database (Denmark)

    Barbosa, Eudes; Rottger, Richard; Hauschild, Anne-Christin

    2017-01-01

    Distinct bacteria are able to cope with highly diverse lifestyles; for instance, they can be free living or host-associated. Thus, these organisms must possess a large and varied genomic arsenal to withstand different environmental conditions. To facilitate the identification of genomic features ...

  8. Traffic information computing platform for big data

    Energy Technology Data Exchange (ETDEWEB)

    Duan, Zongtao, E-mail: ztduan@chd.edu.cn; Li, Ying, E-mail: ztduan@chd.edu.cn; Zheng, Xibin, E-mail: ztduan@chd.edu.cn; Liu, Yan, E-mail: ztduan@chd.edu.cn; Dai, Jiting, E-mail: ztduan@chd.edu.cn; Kang, Jun, E-mail: ztduan@chd.edu.cn [Chang'an University School of Information Engineering, Xi'an, China and Shaanxi Engineering and Technical Research Center for Road and Traffic Detection, Xi'an (China)

    2014-10-06

    The big data environment creates the data conditions needed to improve the quality of traffic information services. The goal of this article is to construct a traffic information computing platform for the big data environment. Through in-depth analysis of the characteristics of big data and traffic information services, a distributed traffic atomic information computing platform architecture is proposed. Under the big data environment, this type of traffic atomic information computing architecture helps to guarantee traffic safety and efficient operation, and enables more intelligent and personalized traffic information services for traffic information users.

  9. Traffic information computing platform for big data

    International Nuclear Information System (INIS)

    Duan, Zongtao; Li, Ying; Zheng, Xibin; Liu, Yan; Dai, Jiting; Kang, Jun

    2014-01-01

    The big data environment creates the data conditions needed to improve the quality of traffic information services. The goal of this article is to construct a traffic information computing platform for the big data environment. Through in-depth analysis of the characteristics of big data and traffic information services, a distributed traffic atomic information computing platform architecture is proposed. Under the big data environment, this type of traffic atomic information computing architecture helps to guarantee traffic safety and efficient operation, and enables more intelligent and personalized traffic information services for traffic information users.

  10. Fremtidens landbrug bliver big business

    DEFF Research Database (Denmark)

    Hansen, Henning Otte

    2016-01-01

    The agricultural sector's external conditions and competitive environment are changing, and this will necessitate a development in the direction of "big business", in which farms become even larger, more industrialized and more concentrated. Big business will become a dominant development in Danish agriculture - but not the only one...

  11. Quantum nature of the big bang.

    Science.gov (United States)

    Ashtekar, Abhay; Pawlowski, Tomasz; Singh, Parampreet

    2006-04-14

    Some long-standing issues concerning the quantum nature of the big bang are resolved in the context of homogeneous isotropic models with a scalar field. Specifically, the known results on the resolution of the big-bang singularity in loop quantum cosmology are significantly extended as follows: (i) the scalar field is shown to serve as an internal clock, thereby providing a detailed realization of the "emergent time" idea; (ii) the physical Hilbert space, Dirac observables, and semiclassical states are constructed rigorously; (iii) the Hamiltonian constraint is solved numerically to show that the big bang is replaced by a big bounce. Thanks to the nonperturbative, background independent methods, unlike in other approaches the quantum evolution is deterministic across the deep Planck regime.
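
    For background, the effective dynamics of loop quantum cosmology behind results of this kind are commonly summarized by a quantum-corrected Friedmann equation of the form below, with the bounce occurring when the energy density reaches a critical value of the order of the Planck density; this standard expression is quoted for orientation and is not stated in the abstract itself.

```latex
% Effective Friedmann equation of loop quantum cosmology (standard background,
% not taken from the abstract above): the expansion rate vanishes and the
% bounce occurs when the energy density \rho reaches the critical density \rho_c.
H^{2} \;=\; \left(\frac{\dot a}{a}\right)^{2}
      \;=\; \frac{8\pi G}{3}\,\rho\left(1 - \frac{\rho}{\rho_{c}}\right),
\qquad \text{bounce at } \rho = \rho_{c}.
```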

  12. Mentoring in Schools: An Impact Study of Big Brothers Big Sisters School-Based Mentoring

    Science.gov (United States)

    Herrera, Carla; Grossman, Jean Baldwin; Kauh, Tina J.; McMaken, Jennifer

    2011-01-01

    This random assignment impact study of Big Brothers Big Sisters School-Based Mentoring involved 1,139 9- to 16-year-old students in 10 cities nationwide. Youth were randomly assigned to either a treatment group (receiving mentoring) or a control group (receiving no mentoring) and were followed for 1.5 school years. At the end of the first school…

  13. Dcode.org anthology of comparative genomic tools.

    Science.gov (United States)

    Loots, Gabriela G; Ovcharenko, Ivan

    2005-07-01

    Comparative genomics provides the means to demarcate functional regions in anonymous DNA sequences. The successful application of this method to identifying novel genes is currently shifting to deciphering the non-coding encryption of gene regulation across genomes. To facilitate the practical application of comparative sequence analysis to genetics and genomics, we have developed several analytical and visualization tools for the analysis of arbitrary sequences and whole genomes. These tools include two alignment tools, zPicture and Mulan; a phylogenetic shadowing tool, eShadow for identifying lineage- and species-specific functional elements; two evolutionary conserved transcription factor analysis tools, rVista and multiTF; a tool for extracting cis-regulatory modules governing the expression of co-regulated genes, Creme 2.0; and a dynamic portal to multiple vertebrate and invertebrate genome alignments, the ECR Browser. Here, we briefly describe each one of these tools and provide specific examples on their practical applications. All the tools are publicly available at the http://www.dcode.org/ website.

  14. CRISPR-Cpf1: A New Tool for Plant Genome Editing

    KAUST Repository

    Zaidi, Syed Shan-e-Ali; Mahfouz, Magdy M.; Mansoor, Shahid

    2017-01-01

    Clustered regularly interspaced palindromic repeats (CRISPR)-CRISPR-associated proteins (CRISPR-Cas), a groundbreaking genome-engineering tool, has facilitated targeted trait improvement in plants. Recently, CRISPR-CRISPR from Prevotella and Francisella 1 (Cpf1) has emerged as a new tool for efficient genome editing, including DNA-free editing in plants, with higher efficiency, specificity, and potentially wider applications than CRISPR-Cas9.

  15. CRISPR-Cpf1: A New Tool for Plant Genome Editing

    KAUST Repository

    Zaidi, Syed Shan-e-Ali

    2017-05-19

    Clustered regularly interspaced palindromic repeats (CRISPR)-CRISPR-associated proteins (CRISPR-Cas), a groundbreaking genome-engineering tool, has facilitated targeted trait improvement in plants. Recently, CRISPR-CRISPR from Prevotella and Francisella 1 (Cpf1) has emerged as a new tool for efficient genome editing, including DNA-free editing in plants, with higher efficiency, specificity, and potentially wider applications than CRISPR-Cas9.

  16. Big data processing in the cloud - Challenges and platforms

    Science.gov (United States)

    Zhelev, Svetoslav; Rozeva, Anna

    2017-12-01

    Choosing the appropriate architecture and technologies for a big data project is a difficult task, which requires extensive knowledge in both the problem domain and in the big data landscape. The paper analyzes the main big data architectures and the most widely implemented technologies used for processing and persisting big data. Clouds provide for dynamic resource scaling, which makes them a natural fit for big data applications. Basic cloud computing service models are presented. Two architectures for processing big data are discussed, the Lambda and Kappa architectures. Technologies for big data persistence are presented and analyzed. Stream processing, as the most important and most difficult to manage, is outlined. The paper highlights the main advantages of cloud and potential problems.
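
    The Lambda architecture mentioned above can be illustrated with a toy in-memory example: a batch layer periodically recomputes a complete view from the master dataset, a speed layer keeps counts for events not yet covered by a batch run, and the serving layer merges both at query time. Class and method names are illustrative only.

```python
from collections import Counter

class LambdaCounter:
    """Toy Lambda-style pipeline for event counts: a batch view rebuilt from the
    master dataset plus a speed layer for events not yet covered by a batch run."""

    def __init__(self):
        self.master_dataset = []          # immutable, append-only raw events
        self.batch_view = Counter()       # recomputed periodically by the batch layer
        self.speed_view = Counter()       # incremental, real-time layer

    def ingest(self, event_key):
        self.master_dataset.append(event_key)
        self.speed_view[event_key] += 1   # speed layer updates immediately

    def run_batch_job(self):
        """Batch layer: recompute the complete view, then reset the speed layer."""
        self.batch_view = Counter(self.master_dataset)
        self.speed_view.clear()

    def query(self, event_key):
        """Serving layer: merge the batch and real-time views at query time."""
        return self.batch_view[event_key] + self.speed_view[event_key]

if __name__ == "__main__":
    lc = LambdaCounter()
    for e in ["click", "click", "view"]:
        lc.ingest(e)
    lc.run_batch_job()            # batch view now covers the first three events
    lc.ingest("click")            # arrives after the batch run
    print(lc.query("click"))      # 3 = 2 from the batch view + 1 from the speed layer
```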

  17. Ethics and Epistemology in Big Data Research.

    Science.gov (United States)

    Lipworth, Wendy; Mason, Paul H; Kerridge, Ian; Ioannidis, John P A

    2017-12-01

    Biomedical innovation and translation are increasingly emphasizing research using "big data." The hope is that big data methods will both speed up research and make its results more applicable to "real-world" patients and health services. While big data research has been embraced by scientists, politicians, industry, and the public, numerous ethical, organizational, and technical/methodological concerns have also been raised. With respect to technical and methodological concerns, there is a view that these will be resolved through sophisticated information technologies, predictive algorithms, and data analysis techniques. While such advances will likely go some way towards resolving technical and methodological issues, we believe that the epistemological issues raised by big data research have important ethical implications and raise questions about the very possibility of big data research achieving its goals.

  18. Victoria Stodden: Scholarly Communication in the Era of Big Data and Big Computation

    OpenAIRE

    Stodden, Victoria

    2015-01-01

    Victoria Stodden gave the keynote address for Open Access Week 2015. "Scholarly communication in the era of big data and big computation" was sponsored by the University Libraries, Computational Modeling and Data Analytics, the Department of Computer Science, the Department of Statistics, the Laboratory for Interdisciplinary Statistical Analysis (LISA), and the Virginia Bioinformatics Institute. Victoria Stodden is an associate professor in the Graduate School of Library and Information Scien...

  19. Big Data: Concept, Potentialities and Vulnerabilities

    Directory of Open Access Journals (Sweden)

    Fernando Almeida

    2018-03-01

    Full Text Available The evolution of information systems and the growth in the use of the Internet and social networks has caused an explosion in the amount of available data relevant to the activities of the companies. Therefore, the treatment of these available data is vital to support operational, tactical and strategic decisions. This paper aims to present the concept of big data and the main technologies that support the analysis of large data volumes. The potential of big data is explored considering nine sectors of activity, such as financial, retail, healthcare, transport, agriculture, energy, manufacturing, public, and media and entertainment. In addition, the main current opportunities, vulnerabilities and privacy challenges of big data are discussed. It was possible to conclude that despite the potential for the use of big data to grow in the previously identified areas, there are still some challenges that need to be considered and mitigated, namely the privacy of information, the existence of qualified human resources to work with Big Data and the promotion of a data-driven organizational culture.

  20. Big data analytics a management perspective

    CERN Document Server

    Corea, Francesco

    2016-01-01

    This book is about innovation, big data, and data science seen from a business perspective. Big data is a buzzword nowadays, and there is a growing necessity within practitioners to understand better the phenomenon, starting from a clear stated definition. This book aims to be a starting reading for executives who want (and need) to keep the pace with the technological breakthrough introduced by new analytical techniques and piles of data. Common myths about big data will be explained, and a series of different strategic approaches will be provided. By browsing the book, it will be possible to learn how to implement a big data strategy and how to use a maturity framework to monitor the progress of the data science team, as well as how to move forward from one stage to the next. Crucial challenges related to big data will be discussed, where some of them are more general - such as ethics, privacy, and ownership – while others concern more specific business situations (e.g., initial public offering, growth st...

  1. Managing the genomic revolution in cancer diagnostics.

    Science.gov (United States)

    Nguyen, Doreen; Gocke, Christopher D

    2017-08-01

    Molecular tumor profiling is now a routine part of patient care, revealing targetable genomic alterations and molecularly distinct tumor subtypes with therapeutic and prognostic implications. The widespread adoption of next-generation sequencing technologies has greatly facilitated clinical implementation of genomic data and opened the door for high-throughput multigene-targeted sequencing. Herein, we discuss the variability of cancer genetic profiling currently offered by clinical laboratories, the challenges of applying rapidly evolving medical knowledge to individual patients, and the need for more standardized population-based molecular profiling.

  2. Human factors in Big Data

    NARCIS (Netherlands)

    Boer, J. de

    2016-01-01

    Since 2014 I have been involved in various (research) projects that try to make the hype around Big Data more concrete and tangible for industry and government. Big Data is about multiple sources of (real-time) data that can be analysed, transformed into information and used to make 'smart' decisions.

  3. Slaves to Big Data. Or Are We?

    Directory of Open Access Journals (Sweden)

    Mireille Hildebrandt

    2013-10-01

    Full Text Available

    In this contribution, the notion of Big Data is discussed in relation to the monetisation of personal data. The claim of some proponents, as well as adversaries, that Big Data implies that ‘n = all’, meaning that we no longer need to rely on samples because we have all the data, is scrutinised and found to be both overly optimistic and unnecessarily pessimistic. A set of epistemological and ethical issues is presented, focusing on the implications of Big Data for our perception, cognition, fairness, privacy and due process. The article then looks into the idea of user-centric personal data management to investigate to what extent it provides solutions for some of the problems triggered by the Big Data conundrum. Special attention is paid to the core principle of data protection legislation, namely purpose binding. Finally, this contribution seeks to inquire into the influence of Big Data politics on self, mind and society, and asks how we can prevent ourselves from becoming slaves to Big Data.

  4. Will Organization Design Be Affected By Big Data?

    Directory of Open Access Journals (Sweden)

    Giles Slinger

    2014-12-01

    Full Text Available Computing power and analytical methods allow us to create, collate, and analyze more data than ever before. When datasets are unusually large in volume, velocity, and variety, they are referred to as “big data.” Some observers have suggested that in order to cope with big data (a) organizational structures will need to change and (b) the processes used to design organizations will be different. In this article, we differentiate big data from relatively slow-moving, linked people data. We argue that big data will change organizational structures as organizations pursue the opportunities presented by big data. The processes by which organizations are designed, however, will be relatively unaffected by big data. Instead, organization design processes will be more affected by the complex links found in people data.

  5. Official statistics and Big Data

    Directory of Open Access Journals (Sweden)

    Peter Struijs

    2014-07-01

    Full Text Available The rise of Big Data changes the context in which organisations producing official statistics operate. Big Data provides opportunities, but in order to make optimal use of Big Data, a number of challenges have to be addressed. This stimulates increased collaboration between National Statistical Institutes, Big Data holders, businesses and universities. In time, this may lead to a shift in the role of statistical institutes in the provision of high-quality and impartial statistical information to society. In this paper, the changes in context, the opportunities, the challenges and the way to collaborate are addressed. The collaboration between the various stakeholders will involve each partner building on and contributing different strengths. For national statistical offices, traditional strengths include, on the one hand, the ability to collect data and combine data sources with statistical products and, on the other hand, their focus on quality, transparency and sound methodology. In the Big Data era of competing and multiplying data sources, they continue to have a unique knowledge of official statistical production methods. And their impartiality and respect for privacy as enshrined in law uniquely position them as a trusted third party. Based on this, they may advise on the quality and validity of information of various sources. By thus positioning themselves, they will be able to play their role as key information providers in a changing society.

  6. Big Data

    OpenAIRE

    Bútora, Matúš

    2017-01-01

    The aim of this bachelor's thesis is to describe the Big Data domain and the OLAP aggregation operations for decision support that are applied to it using the Apache Hadoop technology. The greater part of the thesis is devoted to the description of this technology. The last chapter deals with the way the aggregation operations are applied and with the issues involved in their implementation. This is followed by an overall evaluation of the work and the possibilities for future use of the resulting system. The aim of the bachelor thesis is to describe the Big Data issue and ...

  7. Recombination and evolution of duplicate control regions in the mitochondrial genome of the Asian big-headed turtle, Platysternon megacephalum.

    Directory of Open Access Journals (Sweden)

    Chenfei Zheng

    Full Text Available Complete mitochondrial (mt) genome sequences with duplicate control regions (CRs) have been detected in various animal species. In Testudines, duplicate mtCRs have been reported in the mtDNA of the Asian big-headed turtle, Platysternon megacephalum, which has three living subspecies. However, the evolutionary pattern of these CRs remains unclear. In this study, we report the completed sequences of duplicate CRs from 20 individuals belonging to three subspecies of this turtle and discuss the micro-evolutionary analysis of the evolution of duplicate CRs. Genetic distances calculated with MEGA 4.1 using the complete duplicate CR sequences revealed that within turtle subspecies, genetic distances between orthologous copies from different individuals were 0.63% for CR1 and 1.2% for CR2, respectively, and the average distance between paralogous copies of CR1 and CR2 was 4.8%. Phylogenetic relationships were reconstructed from the CR sequences, excluding the variable number of tandem repeats (VNTRs) at the 3' end, using three methods: neighbor-joining, maximum likelihood algorithm, and Bayesian inference. These data show that any two CRs within an individual were more genetically distant from each other than were orthologous copies in different individuals within the same subspecies. This suggests independent evolution of the two mtCRs within each P. megacephalum subspecies. Reconstruction of separate phylogenetic trees using different CR components (TAS, CD, CSB, and VNTRs) suggested the role of recombination in the evolution of duplicate CRs. Consequently, recombination events were detected using RDP software with break points at ≈290 bp and ≈1,080 bp. Based on these results, we hypothesize that duplicate CRs in P. megacephalum originated from heterological ancestral recombination of mtDNA. Subsequent recombination could have resulted in homogenization during independent evolutionary events, thus maintaining the functions of duplicate CRs in the mtDNA of P
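
    The distances quoted above were computed with MEGA 4.1; as a language-neutral illustration of the simplest such measure, the sketch below computes an uncorrected pairwise p-distance over two aligned sequences (toy data, not real Platysternon control regions, and not MEGA's substitution models).

```python
def p_distance(seq1: str, seq2: str) -> float:
    """Uncorrected pairwise distance between two aligned sequences: the
    proportion of compared sites at which the bases differ. Alignment gaps
    ('-') and ambiguous sites ('N') are skipped."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differences += 1
    return differences / compared if compared else float("nan")

# Toy control-region fragments (illustrative only).
cr1 = "ATGCTACCGTTA-ACGT"
cr2 = "ATGTTACCGTTAGACGT"
print(f"p-distance: {p_distance(cr1, cr2):.3f}")
```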

  8. Application of Genomic In Situ Hybridization in Horticultural Science

    Directory of Open Access Journals (Sweden)

    Fahad Ramzan

    2017-01-01

    Full Text Available Molecular cytogenetic techniques, such as in situ hybridization methods, are admirable tools to analyze genomic structure and function, chromosome constituents, recombination patterns, alien gene introgression, genome evolution, aneuploidy, and polyploidy, and also to visualize genome constitution and discriminate chromosomes from different genomes in allopolyploids of various horticultural crops. Advances in GISH, such as multicolor detection, provide a significant approach to analyzing the small and numerous chromosomes in fruit species, for example, Diospyros hybrids. This analytical technique has proved to be the most exact and effective way to confirm hybrid status and helps remarkably to distinguish donor parental genomes in hybrids such as Clivia, Rhododendron, and Lycoris ornamental hybrids. Genome characterization facilitates the selection of hybrids with potentially desirable characteristics during early hybridization breeding, as this technique expedites the detection of introgressed chromosome sequences. This review epitomizes the applications and advancements of genomic in situ hybridization (GISH) techniques in horticultural plants.

  9. Introns: The Functional Benefits of Introns in Genomes

    Directory of Open Access Journals (Sweden)

    Bong-Seok Jo

    2015-12-01

    Full Text Available The intron has been a big biological mystery in several respects since it was first discovered. First, all of the completely sequenced eukaryotes harbor introns in their genomic structure, whereas no prokaryotes identified so far carry introns. Second, the amount of total introns varies in different species. Third, the length and number of introns vary in different genes, even within the same species' genome. Fourth, all introns are copied into RNAs by transcription and into DNAs by replication, but intron sequences do not participate in protein-coding sequences. The existence of introns in the genome should be a burden to some cells, because cells have to consume a great deal of energy to copy and excise them exactly at the correct positions with the help of complicated spliceosomal machineries. Their persistence throughout the long evolutionary history can be explained only if carrying introns is assumed to give cells selective advantages that overcome the negative effects of introns. In that regard, we summarize previous research about the functional roles or benefits of introns. Additionally, several other studies strongly suggesting that introns should not be regarded as junk will be introduced.

  10. Introns: The Functional Benefits of Introns in Genomes.

    Science.gov (United States)

    Jo, Bong-Seok; Choi, Sun Shim

    2015-12-01

    The intron has been a big biological mystery in several respects since it was first discovered. First, all of the completely sequenced eukaryotes harbor introns in their genomic structure, whereas no prokaryotes identified so far carry introns. Second, the amount of total introns varies in different species. Third, the length and number of introns vary in different genes, even within the same species' genome. Fourth, all introns are copied into RNAs by transcription and into DNAs by replication, but intron sequences do not participate in protein-coding sequences. The existence of introns in the genome should be a burden to some cells, because cells have to consume a great deal of energy to copy and excise them exactly at the correct positions with the help of complicated spliceosomal machineries. Their persistence throughout the long evolutionary history can be explained only if carrying introns is assumed to give cells selective advantages that overcome the negative effects of introns. In that regard, we summarize previous research about the functional roles or benefits of introns. Additionally, several other studies strongly suggesting that introns should not be regarded as junk will be introduced.

  11. BigDansing

    KAUST Repository

    Khayyat, Zuhair; Ilyas, Ihab F.; Jindal, Alekh; Madden, Samuel; Ouzzani, Mourad; Papotti, Paolo; Quiané -Ruiz, Jorge-Arnulfo; Tang, Nan; Yin, Si

    2015-01-01

    of the underlying distributed platform. BigDansing takes these rules into a series of transformations that enable distributed computations and several optimizations, such as shared scans and specialized join operators. Experimental results on both synthetic

  12. Leveraging Mobile Network Big Data for Developmental Policy ...

    International Development Research Centre (IDRC) Digital Library (Canada)

    Some argue that big data and big data users offer advantages to generate evidence. ... Supported by IDRC, this research focused on transportation planning in urban ... Using mobile network big data for land use classification CPRsouth 2015.

  13. dartr: An r package to facilitate analysis of SNP data generated from reduced representation genome sequencing.

    Science.gov (United States)

    Gruber, Bernd; Unmack, Peter J; Berry, Oliver F; Georges, Arthur

    2018-05-01

    Although vast technological advances have been made and genetic software packages are growing in number, it is not a trivial task to analyse SNP data. We announce a new r package, dartr, enabling the analysis of single nucleotide polymorphism data for population genomic and phylogenomic applications. dartr provides user-friendly functions for data quality control and marker selection, and permits rigorous evaluations of conformation to Hardy-Weinberg equilibrium, gametic-phase disequilibrium and neutrality. The package reports standard descriptive statistics, permits exploration of patterns in the data through principal components analysis and conducts standard F-statistics, as well as basic phylogenetic analyses, population assignment, isolation by distance and exports data to a variety of commonly used downstream applications (e.g., newhybrids, faststructure and phylogeny applications) outside of the r environment. The package serves two main purposes: First, a user-friendly approach to lower the hurdle to analyse such data; therefore, the package comes with a detailed tutorial targeted to the r beginner to allow data analysis without requiring deep knowledge of r. Second, we use a single, well-established format, genlight from the adegenet package, as input for all our functions to avoid data reformatting. By strictly using the genlight format, we hope to facilitate this format as the de facto standard of future software developments and hence reduce the format jungle of genetic data sets. The dartr package is available via the r CRAN network and GitHub. © 2017 John Wiley & Sons Ltd.
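
    dartr itself is an R package, so the snippet below is only a language-neutral illustration of one of the checks it automates: a Hardy-Weinberg chi-square test for a single biallelic SNP computed from genotype counts. The counts and the 3.84 critical value (5%, one degree of freedom) are illustrative.

```python
def hwe_chi_square(n_AA: int, n_Aa: int, n_aa: int) -> float:
    """Chi-square statistic (1 d.f.) for departure from Hardy-Weinberg
    equilibrium at one biallelic locus, from observed genotype counts."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)          # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

# Example: a marker would typically be flagged if the statistic exceeds the
# 5% critical value of ~3.84 for a chi-square with one degree of freedom.
stat = hwe_chi_square(n_AA=298, n_Aa=489, n_aa=213)
print(f"chi-square = {stat:.2f}, flagged = {stat > 3.84}")
```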

  14. Practice Variation in Big-4 Transparency Reports

    DEFF Research Database (Denmark)

    Girdhar, Sakshi; Klarskov Jeppesen, Kim

    2018-01-01

    Purpose: The purpose of this paper is to examine the transparency reports published by the Big-4 public accounting firms in the UK, Germany and Denmark to understand the determinants of their content within the networks of big accounting firms. Design/methodology/approach: The study draws...... on a qualitative research approach, in which the content of transparency reports is analyzed and semi-structured interviews are conducted with key people from the Big-4 firms who are responsible for developing the transparency reports. Findings: The findings show that the content of transparency reports...... is inconsistent and the transparency reporting practice is not uniform within the Big-4 networks. Differences were found in the way in which the transparency reporting practices are coordinated globally by the respective central governing bodies of the Big-4. The content of the transparency reports...

  15. Alignment of whole genomes.

    Science.gov (United States)

    Delcher, A L; Kasif, S; Fleischmann, R D; Peterson, J; White, O; Salzberg, S L

    1999-01-01

    A new system for aligning whole genome sequences is described. Using an efficient data structure called a suffix tree, the system is able to rapidly align sequences containing millions of nucleotides. Its use is demonstrated on two strains of Mycobacterium tuberculosis, on two less similar species of Mycoplasma bacteria and on two syntenic sequences from human chromosome 12 and mouse chromosome 6. In each case it found an alignment of the input sequences, using between 30 s and 2 min of computation time. From the system output, information on single nucleotide changes, translocations and homologous genes can easily be extracted. Use of the algorithm should facilitate analysis of syntenic chromosomal regions, strain-to-strain comparisons, evolutionary comparisons and genomic duplications. PMID:10325427
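
    The published system relies on a suffix tree for speed; the sketch below conveys only the core idea of anchoring an alignment on exact matches that occur exactly once in each sequence, using a naive k-mer index on toy strings, and makes no attempt at the real tool's scale or its maximal-match extension.

```python
from collections import defaultdict

def unique_kmer_index(seq: str, k: int) -> dict:
    """Map each k-mer that occurs exactly once in seq to its position."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos[0] for kmer, pos in positions.items() if len(pos) == 1}

def anchor_matches(genome_a: str, genome_b: str, k: int = 20):
    """Exact matches of length k that are unique in both sequences, usable as
    alignment anchors (the published system instead finds maximal unique
    matches with a suffix tree, which is far faster and length-unrestricted)."""
    index_a = unique_kmer_index(genome_a, k)
    index_b = unique_kmer_index(genome_b, k)
    shared = set(index_a) & set(index_b)
    return sorted((index_a[kmer], index_b[kmer], kmer) for kmer in shared)

if __name__ == "__main__":
    a = "ACGTACGTTTGACCAGTACGGA"
    b = "TTGACCAGTACGGACGTACGTT"
    for pos_a, pos_b, kmer in anchor_matches(a, b, k=8):
        print(pos_a, pos_b, kmer)
```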

  16. Summarizing an Ontology: A "Big Knowledge" Coverage Approach.

    Science.gov (United States)

    Zheng, Ling; Perl, Yehoshua; Elhanan, Gai; Ochs, Christopher; Geller, James; Halper, Michael

    2017-01-01

    Maintenance and use of a large ontology, consisting of thousands of knowledge assertions, are hampered by its scope and complexity. It is important to provide tools for summarization of ontology content in order to facilitate user "big picture" comprehension. We present a parameterized methodology for the semi-automatic summarization of major topics in an ontology, based on a compact summary of the ontology, called an "aggregate partial-area taxonomy", followed by manual enhancement. An experiment is presented to test the effectiveness of such summarization, measured by coverage of a given list of major topics in the corresponding application domain. SNOMED CT's Specimen hierarchy is the test-bed. A domain expert provided a list of topics that serves as a gold standard. The enhanced results show that the aggregate taxonomy covers most of the domain's main topics.
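
    As a rough, hypothetical illustration of how an "area" in a partial-area taxonomy groups concepts, the sketch below clusters concepts by the set of relationship types they exhibit; the concept names and relationship types are invented for illustration and are not actual SNOMED CT content.

```python
# Group ontology concepts by their relationship-type signature; each group of
# concepts with an identical signature plays the role of an "area" whose roots
# can label a summary. Toy data only -- not the paper's full methodology.
from collections import defaultdict

concept_relationships = {
    "Specimen":                   set(),
    "Tissue specimen":            {"specimen substance"},
    "Fluid specimen":             {"specimen substance"},
    "Specimen from organ":        {"specimen substance", "specimen source topography"},
    "Biopsy specimen from organ": {"specimen substance", "specimen source topography",
                                   "specimen procedure"},
}

areas = defaultdict(list)
for concept, rel_types in concept_relationships.items():
    areas[frozenset(rel_types)].append(concept)

for rel_types, concepts in areas.items():
    label = ", ".join(sorted(rel_types)) or "(no relationships)"
    print(f"Area {{{label}}}: {concepts}")
```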

  17. Big data and biomedical informatics: a challenging opportunity.

    Science.gov (United States)

    Bellazzi, R

    2014-05-22

    Big data are receiving increasing attention in biomedicine and healthcare, so it is important to understand why big data are assuming a crucial role for the biomedical informatics community. The capability of handling big data is becoming an enabler for carrying out unprecedented research studies and implementing new models of healthcare delivery. It is therefore first necessary to understand deeply the four elements that constitute big data, namely Volume, Variety, Velocity, and Veracity, and their meaning in practice. It is then necessary to understand where big data are present, and where they can be beneficially collected. There are research fields, such as translational bioinformatics, which need to rely on big data technologies to withstand the shock wave of data that is generated every day. Other areas, ranging from epidemiology to clinical care, can benefit from exploiting the large amounts of data that are nowadays available, from personal monitoring to primary care. However, building big data-enabled systems has important implications for the reproducibility of research studies and the management of privacy and data access, and proper actions should be taken to deal with these issues. An interesting consequence of the big data scenario is the availability of new software, methods, and tools, such as map-reduce, cloud computing, and concept-drift machine learning algorithms, which will not only contribute to big data research but may also be beneficial in many biomedical informatics applications. The way forward with the big data opportunity will require properly applied engineering principles to design studies and applications, to avoid preconceptions or over-enthusiasm, to fully exploit the available technologies, and to improve data processing and data management regulations.
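
    Of the tools named above, map-reduce is the most concrete; the toy sketch below shows the two-phase pattern (independent per-shard summaries, then a merge) on made-up variant identifiers, using only the Python standard library rather than a real cluster framework.

```python
# Toy map-reduce pattern: count variant identifiers across data shards.
# The shards and IDs are invented; a real deployment would run the same two
# phases on a distributed framework instead of a single process.
from collections import Counter
from functools import reduce

shards = [
    ["rs123", "rs456", "rs123"],   # pretend shard 1 of a large call set
    ["rs456", "rs789"],            # pretend shard 2
    ["rs123", "rs789", "rs789"],   # pretend shard 3
]

# map phase: each shard is summarised independently (parallelisable)
partial_counts = [Counter(shard) for shard in shards]

# reduce phase: partial summaries are merged into one result
total = reduce(lambda a, b: a + b, partial_counts, Counter())
print(total.most_common())
```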

  18. Was the big bang hot

    International Nuclear Information System (INIS)

    Wright, E.L.

    1983-01-01

    The author considers experiments to confirm the substantial deviations from a Planck curve in the Woody and Richards spectrum of the microwave background, and a search for conducting needles in our galaxy. Spectral deviations and needle-shaped grains are expected for a cold Big Bang, but are not required by a hot Big Bang. (Auth.)

  19. Passport to the Big Bang

    CERN Multimedia

    De Melis, Cinzia

    2013-01-01

    On 2 June 2013, CERN inaugurates the Passport to the Big Bang project at a major public event. Poster and programme. On 2 June 2013 CERN launches a scientific tourist trail through the Pays de Gex and the Canton of Geneva known as the Passport to the Big Bang. Poster and Programme.

  20. Biomedical Big Data Training Collaborative (BBDTC): An effort to bridge the talent gap in biomedical science and research.

    Science.gov (United States)

    Purawat, Shweta; Cowart, Charles; Amaro, Rommie E; Altintas, Ilkay

    2016-06-01

    The BBDTC (https://biobigdata.ucsd.edu) is a community-oriented platform to encourage high-quality knowledge dissemination with the aim of growing a well-informed biomedical big data community through collaborative efforts on training and education. The BBDTC collaborative is an e-learning platform that supports the biomedical community in accessing, developing and deploying open training materials. The BBDTC supports Big Data skill training for biomedical scientists at all levels and from varied backgrounds. The natural hierarchy of courses allows them to be broken into and handled as modules. Modules can be reused in the context of multiple courses and reshuffled, producing a new and different, dynamic course called a playlist. Users may create playlists to suit their learning requirements and share them with individual users or the wider public. BBDTC leverages the maturity and design of the HUBzero content-management platform for delivering educational content. To facilitate the migration of existing content, the BBDTC supports importing and exporting course material from the edX platform. Migration tools will be extended in the future to support other platforms. Hands-on training software packages, i.e., toolboxes, are supported through Amazon EC2 and VirtualBox virtualization technologies, and they are available as: (i) downloadable lightweight VirtualBox images providing a standardized software tool environment with software packages and test data on users' personal machines, and (ii) remotely accessible Amazon EC2 virtual machines for accessing biomedical big data tools and scalable big data experiments. At the moment, the BBDTC site contains three open biomedical big data training courses with lecture content, videos and hands-on training utilizing VM toolboxes, covering diverse topics. The courses have enhanced the hands-on learning environment by providing structured content that users can use at their own pace. A four-course biomedical big data series is...
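
    The course/module/playlist hierarchy described above is essentially a small data model; the sketch below expresses it with Python dataclasses, using invented class and field names that are not the BBDTC's actual schema or API.

```python
# Hedged sketch of a course/module/playlist hierarchy: modules are the reusable
# unit, courses and user-built playlists both reference them. Names are
# hypothetical and chosen only to mirror the description above.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Module:
    title: str
    materials: List[str] = field(default_factory=list)   # e.g. videos, notebooks

@dataclass
class Course:
    title: str
    modules: List[Module] = field(default_factory=list)

@dataclass
class Playlist:
    """A user-assembled sequence of modules, possibly drawn from several courses."""
    owner: str
    modules: List[Module] = field(default_factory=list)
    shared_with: List[str] = field(default_factory=list)  # user IDs or "public"

ngs = Module("Intro to NGS data", ["lecture.mp4"])
ehr = Module("Working with EHRs", ["slides.pdf"])
course = Course("Biomedical Big Data 101", [ngs, ehr])
playlist = Playlist(owner="student42", modules=[ehr, ngs], shared_with=["public"])
print(course, playlist, sep="\n")
```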