WorldWideScience

Sample records for big genomes facilitate

  1. Big Data Analytics for Genomic Medicine.

    Science.gov (United States)

    He, Karen Y; Ge, Dongliang; He, Max M

    2017-02-15

    Genomic medicine attempts to build individualized strategies for diagnostic or therapeutic decision-making by utilizing patients' genomic information. Big Data analytics uncovers hidden patterns, unknown correlations, and other insights by examining diverse, large-scale data sets. While integrating and manipulating diverse genomic data and comprehensive electronic health records (EHRs) on a Big Data infrastructure presents challenges, it also provides a feasible opportunity to develop an efficient and effective approach to identify clinically actionable genetic variants for individualized diagnosis and therapy. In this paper, we review the challenges of manipulating large-scale next-generation sequencing (NGS) data and diverse clinical data derived from EHRs for genomic medicine. We introduce possible solutions for the different challenges in manipulating, managing, and analyzing genomic and clinical data to implement genomic medicine. Additionally, we present a practical Big Data toolset for identifying clinically actionable genetic variants using high-throughput NGS data and EHRs.

  2. Big Data Analytics for Genomic Medicine

    Science.gov (United States)

    He, Karen Y.; Ge, Dongliang; He, Max M.

    2017-01-01

    Genomic medicine attempts to build individualized strategies for diagnostic or therapeutic decision-making by utilizing patients’ genomic information. Big Data analytics uncovers hidden patterns, unknown correlations, and other insights by examining diverse, large-scale data sets. While integrating and manipulating diverse genomic data and comprehensive electronic health records (EHRs) on a Big Data infrastructure presents challenges, it also provides a feasible opportunity to develop an efficient and effective approach to identify clinically actionable genetic variants for individualized diagnosis and therapy. In this paper, we review the challenges of manipulating large-scale next-generation sequencing (NGS) data and diverse clinical data derived from EHRs for genomic medicine. We introduce possible solutions for the different challenges in manipulating, managing, and analyzing genomic and clinical data to implement genomic medicine. Additionally, we present a practical Big Data toolset for identifying clinically actionable genetic variants using high-throughput NGS data and EHRs. PMID:28212287

  3. Privacy Challenges of Genomic Big Data.

    Science.gov (United States)

    Shen, Hong; Ma, Jian

    2017-01-01

    With the rapid advancement of high-throughput DNA sequencing technologies, genomics has become a big data discipline in which large-scale genetic information on human individuals can be obtained efficiently and at low cost. However, such a massive amount of personal genomic data creates tremendous challenges for privacy, especially given the emergence of the direct-to-consumer (DTC) industry that provides genetic testing services. Here we review recent developments in genomic big data and their implications for privacy. We also discuss the current dilemmas and future challenges of genomic privacy.

  4. Big Data Analysis of Human Genome Variations

    KAUST Repository

    Gojobori, Takashi

    2016-01-25

    Since the draft human genome sequence was first made public in 2000, genomic analyses have been extended intensively to the population level. The following three international projects are good examples of large-scale studies of human genome variation: 1) HapMap data (1,417 individuals) (http://hapmap.ncbi.nlm.nih.gov/downloads/genotypes/2010-08_phaseII+III/forward/), 2) HGDP (Human Genome Diversity Project) data (940 individuals) (http://www.hagsc.org/hgdp/files.html), and 3) 1000 Genomes data (2,504 individuals) (http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/). If we can integrate all three into a single data set, we should be able to conduct a more detailed analysis of human genome variation for a total of 4,861 individuals (= 1,417 + 940 + 2,504). In fact, we successfully integrated these three data sets by using information on the reference human genome sequence, and we conducted a big data analysis. In particular, we constructed a phylogenetic tree of about 5,000 human individuals at the genome level. As a result, we were able to identify clusters of ethnic groups, with detectable admixture, that could not be detected by analyzing each of the three data sets separately. Here, we report the outcome of this kind of big data analysis and discuss the evolutionary significance of human genomic variations. Note that the present study was conducted in collaboration with Katsuhiko Mineta and Kosuke Goto at KAUST.
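
    The integration step described above can be illustrated with a hedged sketch (not the authors' actual pipeline; file names, column layouts and formats are hypothetical placeholders for the HapMap, HGDP and 1000 Genomes downloads): genotype tables are aligned on reference-genome coordinates, and the individual counts combine to 1,417 + 940 + 2,504 = 4,861.

        # Illustrative sketch only: merge per-cohort genotype tables keyed by
        # reference-genome position. File names and columns are hypothetical,
        # not the actual HapMap/HGDP/1000 Genomes formats.
        import pandas as pd

        cohorts = {
            "hapmap": "hapmap_genotypes.tsv",    # 1,417 individuals (assumed layout)
            "hgdp": "hgdp_genotypes.tsv",        # 940 individuals
            "kg": "1000genomes_genotypes.tsv",   # 2,504 individuals
        }

        tables = []
        for name, path in cohorts.items():
            df = pd.read_csv(path, sep="\t")     # assumed columns: chrom, pos, ref, alt, <sample IDs...>
            tables.append(df.set_index(["chrom", "pos", "ref", "alt"]))

        # Keep only variant sites shared by all three cohorts, so every individual
        # has a genotype at every retained site (inner join on reference coordinates).
        merged = pd.concat(tables, axis=1, join="inner")

        n_individuals = sum(len(t.columns) for t in tables)   # expected: 1417 + 940 + 2504 = 4861
        print(f"{len(merged)} shared variant sites across {n_individuals} individuals")

    A distance matrix computed over such a merged genotype table is the kind of input from which a genome-level phylogenetic tree of the ~5,000 individuals could then be built.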

  5. Genome Variation Map: a data repository of genome variations in BIG Data Center

    OpenAIRE

    Song, Shuhui; Tian, Dongmei; Li, Cuiping; Tang, Bixia; Dong, Lili; Xiao, Jingfa; Bao, Yiming; Zhao, Wenming; He, Hang; Zhang, Zhang

    2017-01-01

    The Genome Variation Map (GVM; http://bigd.big.ac.cn/gvm/) is a public data repository of genome variations. As a core resource in the BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, GVM is dedicated to collecting, integrating and visualizing genome variations for a wide range of species; it accepts submissions of different types of genome variations from all over the world and provides free open access to all publicly available data in support of worldwide research a...

  6. The Human Genome Project: big science transforms biology and medicine

    OpenAIRE

    Hood, Leroy; Rowen, Lee

    2013-01-01

    The Human Genome Project has transformed biology through its integrated big science approach to deciphering a reference human genome sequence along with the complete sequences of key model organisms. The project exemplifies the power, necessity and success of large, integrated, cross-disciplinary efforts - so-called ‘big science’ - directed towards complex major objectives. In this article, we discuss the ways in which this ambitious endeavor led to the development of novel technologies and a...

  7. Big Data Analysis of Human Genome Variations

    KAUST Repository

    Gojobori, Takashi

    2016-01-01

    Since the human genome draft sequence was in public for the first time in 2000, genomic analyses have been intensively extended to the population level. The following three international projects are good examples for large-scale studies of human

  8. The Human Genome Project: big science transforms biology and medicine.

    Science.gov (United States)

    Hood, Leroy; Rowen, Lee

    2013-01-01

    The Human Genome Project has transformed biology through its integrated big science approach to deciphering a reference human genome sequence along with the complete sequences of key model organisms. The project exemplifies the power, necessity and success of large, integrated, cross-disciplinary efforts - so-called 'big science' - directed towards complex major objectives. In this article, we discuss the ways in which this ambitious endeavor led to the development of novel technologies and analytical tools, and how it brought the expertise of engineers, computer scientists and mathematicians together with biologists. It established an open approach to data sharing and open-source software, thereby making the data resulting from the project accessible to all. The genome sequences of microbes, plants and animals have revolutionized many fields of science, including microbiology, virology, infectious disease and plant biology. Moreover, deeper knowledge of human sequence variation has begun to alter the practice of medicine. The Human Genome Project has inspired subsequent large-scale data acquisition initiatives such as the International HapMap Project, 1000 Genomes, and The Cancer Genome Atlas, as well as the recently announced Human Brain Project and the emerging Human Proteome Project.

  9. Genome Variation Map: a data repository of genome variations in BIG Data Center.

    Science.gov (United States)

    Song, Shuhui; Tian, Dongmei; Li, Cuiping; Tang, Bixia; Dong, Lili; Xiao, Jingfa; Bao, Yiming; Zhao, Wenming; He, Hang; Zhang, Zhang

    2018-01-04

    The Genome Variation Map (GVM; http://bigd.big.ac.cn/gvm/) is a public data repository of genome variations. As a core resource in the BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, GVM is dedicated to collecting, integrating and visualizing genome variations for a wide range of species; it accepts submissions of different types of genome variations from all over the world and provides free open access to all publicly available data in support of worldwide research activities. Unlike existing related databases, GVM features integration of a large number of genome variations for a broad diversity of species including human, cultivated plants and domesticated animals. Specifically, the current implementation of GVM not only houses a total of ∼4.9 billion variants for 19 species including chicken, dog, goat, human, poplar, rice and tomato, but also incorporates 8,669 individual genotypes and 13,262 manually curated high-quality genotype-to-phenotype associations for non-human species. In addition, GVM provides friendly, intuitive web interfaces for data submission, browsing, search and visualization. Collectively, GVM serves as an important resource for archiving genomic variation data and is helpful for better understanding population genetic diversity and deciphering complex mechanisms associated with different phenotypes. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  10. Genome Variation Map: a data repository of genome variations in BIG Data Center

    Science.gov (United States)

    Tian, Dongmei; Li, Cuiping; Tang, Bixia; Dong, Lili; Xiao, Jingfa; Bao, Yiming; Zhao, Wenming; He, Hang

    2018-01-01

    The Genome Variation Map (GVM; http://bigd.big.ac.cn/gvm/) is a public data repository of genome variations. As a core resource in the BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, GVM is dedicated to collecting, integrating and visualizing genome variations for a wide range of species; it accepts submissions of different types of genome variations from all over the world and provides free open access to all publicly available data in support of worldwide research activities. Unlike existing related databases, GVM features integration of a large number of genome variations for a broad diversity of species including human, cultivated plants and domesticated animals. Specifically, the current implementation of GVM not only houses a total of ∼4.9 billion variants for 19 species including chicken, dog, goat, human, poplar, rice and tomato, but also incorporates 8,669 individual genotypes and 13,262 manually curated high-quality genotype-to-phenotype associations for non-human species. In addition, GVM provides friendly, intuitive web interfaces for data submission, browsing, search and visualization. Collectively, GVM serves as an important resource for archiving genomic variation data and is helpful for better understanding population genetic diversity and deciphering complex mechanisms associated with different phenotypes. PMID:29069473

  11. Genome-wide signatures of complex introgression and adaptive evolution in the big cats

    Science.gov (United States)

    Figueiró, Henrique V.; Li, Gang; Trindade, Fernanda J.; Assis, Juliana; Pais, Fabiano; Fernandes, Gabriel; Santos, Sarah H. D.; Hughes, Graham M.; Komissarov, Aleksey; Antunes, Agostinho; Trinca, Cristine S.; Rodrigues, Maíra R.; Linderoth, Tyler; Bi, Ke; Silveira, Leandro; Azevedo, Fernando C. C.; Kantek, Daniel; Ramalho, Emiliano; Brassaloti, Ricardo A.; Villela, Priscilla M. S.; Nunes, Adauto L. V.; Teixeira, Rodrigo H. F.; Morato, Ronaldo G.; Loska, Damian; Saragüeta, Patricia; Gabaldón, Toni; Teeling, Emma C.; O’Brien, Stephen J.; Nielsen, Rasmus; Coutinho, Luiz L.; Oliveira, Guilherme; Murphy, William J.; Eizirik, Eduardo

    2017-01-01

    The great cats of the genus Panthera comprise a recent radiation whose evolutionary history is poorly understood. Their rapid diversification poses challenges to resolving their phylogeny while offering opportunities to investigate the historical dynamics of adaptive divergence. We report the sequence, de novo assembly, and annotation of the jaguar (Panthera onca) genome, a novel genome sequence for the leopard (Panthera pardus), and comparative analyses encompassing all living Panthera species. Demographic reconstructions indicated that all of these species have experienced variable episodes of population decline during the Pleistocene, ultimately leading to small effective sizes in present-day genomes. We observed pervasive genealogical discordance across Panthera genomes, caused by both incomplete lineage sorting and complex patterns of historical interspecific hybridization. We identified multiple signatures of species-specific positive selection, affecting genes involved in craniofacial and limb development, protein metabolism, hypoxia, reproduction, pigmentation, and sensory perception. There was remarkable concordance in pathways enriched in genomic segments implicated in interspecies introgression and in positive selection, suggesting that these processes were connected. We tested this hypothesis by developing exome capture probes targeting ~19,000 Panthera genes and applying them to 30 wild-caught jaguars. We found at least two genes (DOCK3 and COL4A5, both related to optic nerve development) bearing significant signatures of interspecies introgression and within-species positive selection. These findings indicate that post-speciation admixture has contributed genetic material that facilitated the adaptive evolution of big cat lineages. PMID:28776029

  12. Big data or bust: realizing the microbial genomics revolution.

    Science.gov (United States)

    Raza, Sobia; Luheshi, Leila

    2016-02-01

    Pathogen genomics has the potential to transform the clinical and public health management of infectious diseases through improved diagnosis, detection and tracking of antimicrobial resistance and outbreak control. However, the wide-ranging benefits of this technology can only fully be realized through the timely collation, integration and sharing of genomic and clinical/epidemiological metadata by all those involved in the delivery of genomic-informed services. As part of our review on bringing pathogen genomics into 'health-service' practice, we undertook extensive stakeholder consultation to examine the factors integral to achieving effective data sharing and integration. Infrastructure tailored to the needs of clinical users, as well as practical support and policies to facilitate the timely and responsible sharing of data with relevant health authorities and beyond, are all essential. We propose a tiered data sharing and integration model to maximize the immediate and longer term utility of microbial genomics in healthcare. Realizing this model at the scale and sophistication necessary to support national and international infection management services is not uncomplicated. Yet the establishment of a clear data strategy is paramount if failures in containing disease spread due to inadequate knowledge sharing are to be averted, and substantial progress made in tackling the dangers posed by infectious diseases.

  13. Opportunities and challenges of big data for the social sciences: The case of genomic data.

    Science.gov (United States)

    Liu, Hexuan; Guo, Guang

    2016-09-01

    In this paper, we draw attention to one unique and valuable source of big data, genomic data, by demonstrating the opportunities they provide to social scientists. We discuss different types of large-scale genomic data and recent advances in statistical methods and computational infrastructure used to address challenges in managing and analyzing such data. We highlight how these data and methods can be used to benefit social science research. Copyright © 2016 Elsevier Inc. All rights reserved.

  14. A novel bioinformatics method for efficient knowledge discovery by BLSOM from big genomic sequence data.

    Science.gov (United States)

    Bai, Yu; Iwasaki, Yuki; Kanaya, Shigehiko; Zhao, Yue; Ikemura, Toshimichi

    2014-01-01

    With the remarkable increase in genomic sequence data for a wide range of species, novel tools are needed for comprehensive analyses of these big sequence data. The Self-Organizing Map (SOM) is an effective tool for clustering and visualizing high-dimensional data, such as oligonucleotide composition, on a single map. By modifying the conventional SOM, we previously developed the Batch-Learning SOM (BLSOM), which allows classification of sequence fragments according to species based solely on oligonucleotide composition. In the present study, we introduce the oligonucleotide BLSOM used for characterization of vertebrate genome sequences. We first analyzed pentanucleotide compositions in 100 kb sequences derived from a wide range of vertebrate genomes, and then the compositions in the human and mouse genomes, in order to investigate an efficient method for detecting differences between closely related genomes. BLSOM can recognize the species-specific key combination of oligonucleotide frequencies in each genome, which is called a "genome signature," as well as regions specifically enriched in transcription-factor-binding sequences. Because its classification and visualization power is very high, BLSOM is an efficient and powerful tool for extracting a wide range of information from massive amounts of genomic sequences (i.e., big sequence data).
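
    As a hedged illustration of the input representation described above (not the authors' BLSOM code), the following sketch splits a genome sequence into 100-kb fragments and computes the pentanucleotide (5-mer) frequency vector of each fragment; vectors of this kind are what a SOM/BLSOM would then cluster and visualize.

        # Illustrative sketch: pentanucleotide (5-mer) composition vectors for
        # 100-kb genome fragments, the kind of input a BLSOM would cluster.
        from itertools import product
        from collections import Counter

        K = 5
        FRAGMENT = 100_000
        ALL_KMERS = ["".join(p) for p in product("ACGT", repeat=K)]   # 1024 pentanucleotides

        def kmer_vector(seq):
            """Return normalized 5-mer frequencies for one sequence fragment."""
            seq = seq.upper()
            counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
            total = sum(counts[k] for k in ALL_KMERS) or 1   # ignores k-mers containing N
            return [counts[k] / total for k in ALL_KMERS]

        def fragment_vectors(genome_seq):
            """Split a genome sequence into 100-kb fragments and vectorize each."""
            for start in range(0, len(genome_seq) - FRAGMENT + 1, FRAGMENT):
                yield kmer_vector(genome_seq[start:start + FRAGMENT])

        # Example: vectors = list(fragment_vectors(chromosome_sequence))
        # Each vector has 1024 dimensions and can be fed to any (BL)SOM implementation.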

  15. Mapping our genes: The genome projects: How big, how fast

    Energy Technology Data Exchange (ETDEWEB)

    none,

    1988-04-01

    For the past 2 years, scientific and technical journals in biology and medicine have extensively covered a debate about whether and how to determine the function and order of human genes on human chromosomes and when to determine the sequence of molecular building blocks that comprise DNA in those chromosomes. In 1987, these issues rose to become part of the public agenda. The debate involves science, technology, and politics. Congress is responsible for "writing the rules" of what various federal agencies do and for funding their work. This report surveys the points made so far in the debate, focusing on those that most directly influence the policy options facing the US Congress. Congressional interest focused on how to assess the rationales for conducting human genome projects, how to fund human genome projects (at what level and through which mechanisms), how to coordinate the scientific and technical programs of the several federal agencies and private interests already supporting various genome projects, and how to strike a balance regarding the impact of genome projects on international scientific cooperation and international economic competition in biotechnology. OTA prepared this report with the assistance of several hundred experts throughout the world. 342 refs., 26 figs., 11 tabs.

  16. Mapping Our Genes: The Genome Projects: How Big, How Fast

    Science.gov (United States)

    1988-04-01

    For the past 2 years, scientific and technical journals in biology and medicine have extensively covered a debate about whether and how to determine the function and order of human genes on human chromosomes and when to determine the sequence of molecular building blocks that comprise DNA in those chromosomes. In 1987, these issues rose to become part of the public agenda. The debate involves science, technology, and politics. Congress is responsible for "writing the rules" of what various federal agencies do and for funding their work. This report surveys the points made so far in the debate, focusing on those that most directly influence the policy options facing the US Congress. Congressional interest focused on how to assess the rationales for conducting human genome projects, how to fund human genome projects (at what level and through which mechanisms), how to coordinate the scientific and technical programs of the several federal agencies and private interests already supporting various genome projects, and how to strike a balance regarding the impact of genome projects on international scientific cooperation and international economic competition in biotechnology. The Office of Technology Assessment (OTA) prepared this report with the assistance of several hundred experts throughout the world.

  17. A public resource facilitating clinical use of genomes.

    NARCIS (Netherlands)

    Ball, M.P.; Thakuria, J.V.; Zaranek, A.W.; Clegg, T.; Rosenbaum, A.M.; Wu, X.; Angrist, M.; Bhak, J.; Bobe, J.; Callow, M.J.; Cano, C.; Chou, M.F.; Chung, W.K.; Douglas, S.M.; Estep, P.W.; Gore, A.; Hulick, P.; Labarga, A.; Lee, J.-H.; Lunshof, J.E.; Kim, B.C.; Kim, J.L.; Li, Z.; Murray, M.F.; Nilsen, G.B.; Peters, B.A.; Raman, A.M.; Rienhoff, H.Y.; Robasky, K.; Wheeler, M.T.; Vandewege, W.; Vorhaus, D.B.; Yang, Y.L.; Yang, L.; Aach, J.; Ashley, E.A.; Drmanac, R.; Kim, S.-J.; Li, J.B.; Peshkin, L.; Seidman, S.E.; Seo, J.-S.; Zhang, K.; Rehm, H.L.; Church, G.M.

    2012-01-01

    Rapid advances in DNA sequencing promise to enable new diagnostics and individualized therapies. Achieving personalized medicine, however, will require extensive research on highly reidentifiable, integrated datasets of genomic and health information. To assist with this, participants in the

  18. Personalized medicine beyond genomics: alternative futures in big data-proteomics, environtome and the social proteome.

    Science.gov (United States)

    Özdemir, Vural; Dove, Edward S; Gürsoy, Ulvi K; Şardaş, Semra; Yıldırım, Arif; Yılmaz, Şenay Görücü; Ömer Barlas, I; Güngör, Kıvanç; Mete, Alper; Srivastava, Sanjeeva

    2017-01-01

    No field in science and medicine today remains untouched by Big Data, and psychiatry is no exception. Proteomics is a Big Data technology and a next-generation biomarker, supporting novel system diagnostics and therapeutics in psychiatry. Proteomics technology is, in fact, much older than genomics and dates to the 1970s, well before the launch of the international Human Genome Project. While the genome has long been framed as the master or "elite" executive molecule in cell biology, the proteome by contrast is humble. Yet the proteome is critical for life: it ensures the daily functioning of cells and whole organisms. In short, proteins are the blue-collar workers of biology, the down-to-earth molecules that we cannot live without. Since 2010, proteomics has found renewed meaning and international attention with the launch of the Human Proteome Project and the growing interest in Big Data technologies such as proteomics. This article presents an interdisciplinary technology foresight analysis and conceptualizes the terms "environtome" and "social proteome". We define "environtome" as the entire complement of elements external to the human host, from microbiome, ambient temperature and weather conditions to government innovation policies, stock market dynamics, human values, political power and social norms that collectively shape the human host spatially and temporally. The "social proteome" is the subset of the environtome that influences the transition of proteomics technology to innovative applications in society. The social proteome encompasses, for example, new reimbursement schemes and business innovation models for proteomics diagnostics that depart from the "once-a-life-time" genotypic tests and the anticipated hype attendant to context- and time-sensitive proteomics tests. Building on the "nesting principle" for governance of complex systems as discussed by Elinor Ostrom, we propose here a 3-tiered organizational architecture for Big Data science such as

  19. Complete mitochondrial genome of the big-eared horseshoe bat Rhinolophus macrotis (Chiroptera, Rhinolophidae).

    Science.gov (United States)

    Zhang, Lin; Sun, Keping; Feng, Jiang

    2016-11-01

    We sequenced and characterized the complete mitochondrial genome of the big-eared horseshoe bat, Rhinolophus macrotis. The total length of the mitogenome is 16,848 bp, with a base composition of 31.2% A, 25.3% T, 28.8% C and 14.7% G. The mitogenome consists of 13 protein-coding genes, 2 rRNA genes (12S and 16S rRNA), 22 tRNA genes and 1 control region. It has the same gene arrangement pattern as that of a typical vertebrate mitochondrial genome. These results will contribute to our understanding of the taxonomic status and evolution of bats in the genus Rhinolophus.
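
    The base-composition figures quoted above correspond to a trivial per-base tally; a minimal sketch follows (the placeholder sequence is illustrative; the actual 16,848-bp R. macrotis mitogenome would be read from its FASTA/GenBank record).

        # Illustrative sketch: per-base composition of a mitochondrial genome sequence.
        def base_composition(seq):
            seq = seq.upper()
            n = len(seq)
            return {base: round(100 * seq.count(base) / n, 1) for base in "ATCG"}

        # With the real R. macrotis sequence this would give ~31.2% A, 25.3% T, 28.8% C, 14.7% G.
        print(base_composition("ATCGATTACA"))   # placeholder input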

  20. Figure 4 from Integrative Genomics Viewer: Visualizing Big Data | Office of Cancer Genomics

    Science.gov (United States)

    Gene-list view of genomic data. The gene-list view allows users to compare data across a set of loci. The data in this figure includes copy number, mutation, and clinical data from 202 glioblastoma samples from TCGA. Adapted from Figure 7; Thorvaldsdottir H et al. 2012

  1. Figure 2 from Integrative Genomics Viewer: Visualizing Big Data | Office of Cancer Genomics

    Science.gov (United States)

    Grouping and sorting genomic data in IGV. The IGV user interface displaying 202 glioblastoma samples from TCGA. Samples are grouped by tumor subtype (second annotation column) and data type (first annotation column) and sorted by copy number of the EGFR locus (middle column). Adapted from Figure 1; Robinson et al. 2011

  2. Figure 5 from Integrative Genomics Viewer: Visualizing Big Data | Office of Cancer Genomics

    Science.gov (United States)

    Split-Screen View. The split-screen view is useful for exploring relationships of genomic features that are independent of chromosomal location. Color is used here to indicate mate pairs that map to different chromosomes, chromosomes 1 and 6, suggesting a translocation event. Adapted from Figure 8; Thorvaldsdottir H et al. 2012

  3. Modeling and interoperability of heterogeneous genomic big data for integrative processing and querying.

    Science.gov (United States)

    Masseroli, Marco; Kaitoua, Abdulrahman; Pinoli, Pietro; Ceri, Stefano

    2016-12-01

    While a huge amount of (epi)genomic data of multiple types is becoming available through Next Generation Sequencing (NGS) technologies, the most important emerging problem is the so-called tertiary analysis, concerned with sense making, e.g., discovering how different (epi)genomic regions and their products interact and cooperate with each other. We propose a paradigm shift in tertiary analysis, based on the use of the Genomic Data Model (GDM), a simple data model which links genomic feature data to their associated experimental, biological and clinical metadata. GDM encompasses all the data formats which have been produced for feature extraction from (epi)genomic datasets. We specifically describe the mapping to GDM of SAM (Sequence Alignment/Map), VCF (Variant Call Format), NARROWPEAK (for called peaks produced by NGS ChIP-seq or DNase-seq methods), and BED (Browser Extensible Data) formats, but GDM also supports all the formats describing experimental datasets (e.g., including copy number variations, DNA somatic mutations, or gene expressions) and annotations (e.g., regarding transcription start sites, genes, enhancers or CpG islands). We downloaded and integrated samples of all the above-mentioned data types and formats from multiple sources. The GDM is able to homogeneously describe semantically heterogeneous data and lays the groundwork for data interoperability, achieved, for example, through the GenoMetric Query Language (GMQL), a high-level, declarative query language for genomic big data. The combined use of the data model and the query language allows comprehensive processing of multiple heterogeneous data, and supports the development of domain-specific data-driven computations and bio-molecular knowledge discovery. Copyright © 2016 Elsevier Inc. All rights reserved.
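
    A hedged sketch of the idea behind GDM follows: genomic regions coupled to key-value metadata, with a toy overlap join that mimics the flavor of a genometric operation. The types and field names are illustrative assumptions, not the GDM schema or the GMQL engine.

        # Illustrative sketch of a GDM-like sample: region data plus key-value metadata.
        from dataclasses import dataclass, field

        @dataclass
        class Region:
            chrom: str
            start: int
            stop: int
            values: dict = field(default_factory=dict)   # e.g. {"score": 9.1}

        @dataclass
        class Sample:
            regions: list    # list of Region objects (the feature data)
            metadata: dict   # e.g. {"assay": "ChIP-seq", "cell": "K562"}

        def overlapping(a, b):
            """Yield pairs of regions from two samples that overlap on the same chromosome."""
            for ra in a.regions:
                for rb in b.regions:
                    if ra.chrom == rb.chrom and ra.start < rb.stop and rb.start < ra.stop:
                        yield ra, rb

        peaks = Sample([Region("chr1", 100, 200, {"score": 9.1})], {"format": "NARROWPEAK"})
        genes = Sample([Region("chr1", 150, 5000, {"name": "GENE_X"})], {"format": "BED"})
        print(list(overlapping(peaks, genes)))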

  4. Harnessing Omics Big Data in Nine Vertebrate Species by Genome-Wide Prioritization of Sequence Variants with the Highest Predicted Deleterious Effect on Protein Function.

    Science.gov (United States)

    Rozman, Vita; Kunej, Tanja

    2018-05-10

    Harnessing the genomics big data requires innovation in how we extract and interpret biologically relevant variants. Currently, there is no established catalog of prioritized missense variants associated with deleterious protein function phenotypes. We report in this study, to the best of our knowledge, the first genome-wide prioritization of sequence variants with the most deleterious effect on protein function (potentially deleterious variants [pDelVars]) in nine vertebrate species: human, cattle, horse, sheep, pig, dog, rat, mouse, and zebrafish. The analysis was conducted using the Ensembl/BioMart tool. Genes comprising pDelVars in the highest number of examined species were identified using a Python script. Multiple genomic alignments of the selected genes were built to identify interspecies orthologous potentially deleterious variants, which we defined as the "ortho-pDelVars." Genome-wide prioritization revealed that in humans, 0.12% of the known variants are predicted to be deleterious. In seven out of nine examined vertebrate species, the genes encoding the multiple PDZ domain crumbs cell polarity complex component (MPDZ) and the transforming acidic coiled-coil containing protein 2 (TACC2) comprise pDelVars. Five interspecies ortho-pDelVars were identified in three genes. These findings offer new ways to harness genomics big data by facilitating the identification of functional polymorphisms in humans and animal models and thus provide a future basis for optimization of protocols for whole genome prioritization of pDelVars and screening of orthologous sequence variants. The approach presented here can inform various postgenomic applications such as personalized medicine and multiomics study of health interventions (iatromics).
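
    The cross-species prioritization step lends itself to a simple sketch: given a table of missense variants annotated with gene symbol, species and a deleteriousness prediction (column names here are hypothetical placeholders for fields available in Ensembl/BioMart exports; this is not the authors' script), genes hit in the largest number of species can be ranked as follows.

        # Illustrative sketch: rank genes by the number of distinct species in which
        # they carry a predicted-deleterious variant. Column names are assumptions.
        import pandas as pd

        variants = pd.read_csv("biomart_missense_variants.tsv", sep="\t")

        # Keep only variants predicted deleterious (pDelVars in the paper's terminology).
        pdelvars = variants[variants["sift_prediction"] == "deleterious"]

        species_per_gene = (
            pdelvars.groupby("gene")["species"]
            .nunique()
            .sort_values(ascending=False)
        )

        # Genes with pDelVars in the most species (the study reports MPDZ and TACC2 in 7 of 9).
        print(species_per_gene.head(10))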

  5. Considering the Role of Personality in the Work-Family Experience: Relationships of the Big Five to Work-Family Conflict and Facilitation

    Science.gov (United States)

    Wayne, Julie Holliday; Musisca, Nicholas; Fleeson, William

    2004-01-01

    Using a national, random sample (N=2130), we investigated the relationship between each of the Big Five personality traits and conflict and facilitation between work and family roles. Extraversion was related to greater facilitation between roles but was not related to conflict, whereas neuroticism was related to greater conflict but only weakly…

  6. Big data, open science and the brain: lessons learned from genomics

    Directory of Open Access Journals (Sweden)

    Suparna eChoudhury

    2014-05-01

    The BRAIN Initiative aims to break new ground in the scale and speed of data collection in neuroscience, requiring tools to handle data on the order of yottabytes (10^24 bytes). Its scale, investment and organization are being compared to the Human Genome Project (HGP), which has exemplified 'big science' for biology. In line with the trend towards Big Data in genomic research, the promise of the BRAIN Initiative, as well as the European Human Brain Project, rests on the possibility of amassing vast quantities of data to model the complex interactions between the brain and behaviour and inform the diagnosis and prevention of neurological disorders and psychiatric disease. Advocates of this 'data driven' paradigm in neuroscience argue that harnessing the large quantities of data generated across laboratories worldwide has numerous methodological, ethical and economic advantages, but it requires the neuroscience community to adopt a culture of data sharing and open access to benefit from them. In this article, we examine the rationale for data sharing among advocates and briefly exemplify it in terms of new 'open neuroscience' projects. Then, drawing on the frequently invoked model of data sharing in genomics, we go on to demonstrate the complexities of data sharing, shedding light on the sociological and ethical challenges within the realms of institutions, researchers and participants, namely dilemmas around public/private interests in data, (lack of) motivation to share in the academic community, and potential loss of participant anonymity. Our paper serves to highlight some foreseeable tensions around data sharing relevant to the emergent 'open neuroscience' movement.

  7. MSeqDR: A Centralized Knowledge Repository and Bioinformatics Web Resource to Facilitate Genomic Investigations in Mitochondrial Disease

    NARCIS (Netherlands)

    L. Shen (Lishuang); M.A. Diroma (Maria Angela); M. Gonzalez (Michael); D. Navarro-Gomez (Daniel); J. Leipzig (Jeremy); M.T. Lott (Marie T.); M. van Oven (Mannis); D.C. Wallace; C.C. Muraresku (Colleen Clarke); Z. Zolkipli-Cunningham (Zarazuela); P.F. Chinnery (Patrick); M. Attimonelli (Marcella); S. Zuchner (Stephan); M.J. Falk (Marni J.); X. Gai (Xiaowu)

    2016-01-01

    textabstractMSeqDR is the Mitochondrial Disease Sequence Data Resource, a centralized and comprehensive genome and phenome bioinformatics resource built by the mitochondrial disease community to facilitate clinical diagnosis and research investigations of individual patient phenotypes, genomes,

  8. TIA-1 and TIAR interact with 5'-UTR of enterovirus 71 genome and facilitate viral replication.

    Science.gov (United States)

    Wang, Xiaohui; Wang, Huanru; Li, Yixuan; Jin, Yu; Chu, Ying; Su, Airong; Wu, Zhiwei

    2015-10-16

    Enterovirus 71 (EV71) is one of the major causative pathogens of hand, foot, and mouth disease (HFMD) in children. Upon infection, the viral RNA is translated in an IRES-dependent manner and requires several host factors for effective replication. Here, we found that T-cell-restricted intracellular antigen 1 (TIA-1) and TIA-1-related protein (TIAR) were translocated from the nucleus to the cytoplasm after EV71 infection and localized to the sites of viral replication. We found that TIA-1 and TIAR facilitate EV71 replication by enhancing viral genome synthesis in host cells. We demonstrated that both proteins bound to stem-loop I of the 5'-UTR of the viral genome and improved the stability of the viral genomic RNA. Our results suggest that TIA-1 and TIAR are two new host factors that interact with the 5'-UTR of the EV71 genome and positively regulate viral replication. Copyright © 2015 Elsevier Inc. All rights reserved.

  9. The Naked Mole Rat Genome Resource: facilitating analyses of cancer and longevity-related adaptations.

    Science.gov (United States)

    Keane, Michael; Craig, Thomas; Alföldi, Jessica; Berlin, Aaron M; Johnson, Jeremy; Seluanov, Andrei; Gorbunova, Vera; Di Palma, Federica; Lindblad-Toh, Kerstin; Church, George M; de Magalhães, João Pedro

    2014-12-15

    The naked mole rat (Heterocephalus glaber) is an exceptionally long-lived and cancer-resistant rodent native to East Africa. Although its genome was previously sequenced, here we report a new assembly sequenced by us with substantially higher N50 values for scaffolds and contigs. We analyzed the annotation of this new improved assembly and identified candidate genomic adaptations which may have contributed to the evolution of the naked mole rat's extraordinary traits, including in regions of p53, and the hyaluronan receptors CD44 and HMMR (RHAMM). Furthermore, we developed a freely available web portal, the Naked Mole Rat Genome Resource (http://www.naked-mole-rat.org), featuring the data and results of our analysis, to assist researchers interested in the genome and genes of the naked mole rat, and also to facilitate further studies on this fascinating species. © The Author 2014. Published by Oxford University Press.

  10. Genomic sequencing: assessing the health care system, policy, and big-data implications.

    Science.gov (United States)

    Phillips, Kathryn A; Trosman, Julia R; Kelley, Robin K; Pletcher, Mark J; Douglas, Michael P; Weldon, Christine B

    2014-07-01

    New genomic sequencing technologies enable the high-speed analysis of multiple genes simultaneously, including all of those in a person's genome. Sequencing is a prominent example of a "big data" technology because of the massive amount of information it produces and its complexity, diversity, and timeliness. Our objective in this article is to provide a policy primer on sequencing and illustrate how it can affect health care system and policy issues. Toward this end, we developed an easily applied classification of sequencing based on inputs, methods, and outputs. We used it to examine the implications of sequencing for three health care system and policy issues: making care more patient-centered, developing coverage and reimbursement policies, and assessing economic value. We conclude that sequencing has great promise but that policy challenges include how to optimize patient engagement as well as privacy, develop coverage policies that distinguish research from clinical uses and account for bioinformatics costs, and determine the economic value of sequencing through complex economic models that take into account multiple findings and downstream costs. Project HOPE—The People-to-People Health Foundation, Inc.

  11. Case-based learning facilitates critical thinking in undergraduate nutrition education: students describe the big picture.

    Science.gov (United States)

    Harman, Tara; Bertrand, Brenda; Greer, Annette; Pettus, Arianna; Jennings, Jill; Wall-Bassett, Elizabeth; Babatunde, Oyinlola Toyin

    2015-03-01

    The vision of dietetics professions is based on interdependent education, credentialing, and practice. Case-based learning is a method of problem-based learning that is designed to heighten higher-order thinking. Case-based learning can assist students to connect education and specialized practice while developing professional skills for entry-level practice in nutrition and dietetics. This study examined student perspectives of their learning after immersion into case-based learning in nutrition courses. The theoretical frameworks of phenomenology and Bloom's Taxonomy of Educational Objectives triangulated the design of this qualitative study. Data were drawn from 426 written responses and three focus group discussions among 85 students from three upper-level undergraduate nutrition courses. Coding served to deconstruct the essence of respondent meaning given to case-based learning as a learning method. The analysis of the coding was the constructive stage that led to configuration of themes and theoretical practice pathways about student learning. Four leading themes emerged. Story or Scenario represents the ways that students described case-based learning, changes in student thought processes to accommodate case-based learning are illustrated in Method of Learning, higher cognitive learning that was achieved from case-based learning is represented in Problem Solving, and Future Practice details how students explained perceived professional competency gains from case-based learning. The skills that students acquired are consistent with those identified as essential to professional practice. In addition, the common concept of Big Picture was iterated throughout the themes and demonstrated that case-based learning prepares students for multifaceted problems that they are likely to encounter in professional practice. Copyright © 2015 Academy of Nutrition and Dietetics. Published by Elsevier Inc. All rights reserved.

  12. The spotted gar genome illuminates vertebrate evolution and facilitates human-to-teleost comparisons

    Science.gov (United States)

    Braasch, Ingo; Gehrke, Andrew R.; Smith, Jeramiah J.; Kawasaki, Kazuhiko; Manousaki, Tereza; Pasquier, Jeremy; Amores, Angel; Desvignes, Thomas; Batzel, Peter; Catchen, Julian; Berlin, Aaron M.; Campbell, Michael S.; Barrell, Daniel; Martin, Kyle J.; Mulley, John F.; Ravi, Vydianathan; Lee, Alison P.; Nakamura, Tetsuya; Chalopin, Domitille; Fan, Shaohua; Wcisel, Dustin; Cañestro, Cristian; Sydes, Jason; Beaudry, Felix E. G.; Sun, Yi; Hertel, Jana; Beam, Michael J.; Fasold, Mario; Ishiyama, Mikio; Johnson, Jeremy; Kehr, Steffi; Lara, Marcia; Letaw, John H.; Litman, Gary W.; Litman, Ronda T.; Mikami, Masato; Ota, Tatsuya; Saha, Nil Ratan; Williams, Louise; Stadler, Peter F.; Wang, Han; Taylor, John S.; Fontenot, Quenton; Ferrara, Allyse; Searle, Stephen M. J.; Aken, Bronwen; Yandell, Mark; Schneider, Igor; Yoder, Jeffrey A.; Volff, Jean-Nicolas; Meyer, Axel; Amemiya, Chris T.; Venkatesh, Byrappa; Holland, Peter W. H.; Guiguen, Yann; Bobe, Julien; Shubin, Neil H.; Di Palma, Federica; Alföldi, Jessica; Lindblad-Toh, Kerstin; Postlethwait, John H.

    2016-01-01

    To connect human biology to fish biomedical models, we sequenced the genome of spotted gar (Lepisosteus oculatus), whose lineage diverged from teleosts before the teleost genome duplication (TGD). The slowly evolving gar genome conserved in content and size many entire chromosomes from bony vertebrate ancestors. Gar bridges teleosts to tetrapods by illuminating the evolution of immunity, mineralization, and development (e.g., Hox, ParaHox, and miRNA genes). Numerous conserved non-coding elements (CNEs, often cis-regulatory) undetectable in direct human-teleost comparisons become apparent using gar: functional studies uncovered conserved roles of such cryptic CNEs, facilitating annotation of sequences identified in human genome-wide association studies. Transcriptomic analyses revealed that the sum of expression domains and levels from duplicated teleost genes often approximate patterns and levels of gar genes, consistent with subfunctionalization. The gar genome provides a resource for understanding evolution after genome duplication, the origin of vertebrate genomes, and the function of human regulatory sequences. PMID:26950095

  13. The spotted gar genome illuminates vertebrate evolution and facilitates human-teleost comparisons.

    Science.gov (United States)

    Braasch, Ingo; Gehrke, Andrew R; Smith, Jeramiah J; Kawasaki, Kazuhiko; Manousaki, Tereza; Pasquier, Jeremy; Amores, Angel; Desvignes, Thomas; Batzel, Peter; Catchen, Julian; Berlin, Aaron M; Campbell, Michael S; Barrell, Daniel; Martin, Kyle J; Mulley, John F; Ravi, Vydianathan; Lee, Alison P; Nakamura, Tetsuya; Chalopin, Domitille; Fan, Shaohua; Wcisel, Dustin; Cañestro, Cristian; Sydes, Jason; Beaudry, Felix E G; Sun, Yi; Hertel, Jana; Beam, Michael J; Fasold, Mario; Ishiyama, Mikio; Johnson, Jeremy; Kehr, Steffi; Lara, Marcia; Letaw, John H; Litman, Gary W; Litman, Ronda T; Mikami, Masato; Ota, Tatsuya; Saha, Nil Ratan; Williams, Louise; Stadler, Peter F; Wang, Han; Taylor, John S; Fontenot, Quenton; Ferrara, Allyse; Searle, Stephen M J; Aken, Bronwen; Yandell, Mark; Schneider, Igor; Yoder, Jeffrey A; Volff, Jean-Nicolas; Meyer, Axel; Amemiya, Chris T; Venkatesh, Byrappa; Holland, Peter W H; Guiguen, Yann; Bobe, Julien; Shubin, Neil H; Di Palma, Federica; Alföldi, Jessica; Lindblad-Toh, Kerstin; Postlethwait, John H

    2016-04-01

    To connect human biology to fish biomedical models, we sequenced the genome of spotted gar (Lepisosteus oculatus), whose lineage diverged from teleosts before teleost genome duplication (TGD). The slowly evolving gar genome has conserved in content and size many entire chromosomes from bony vertebrate ancestors. Gar bridges teleosts to tetrapods by illuminating the evolution of immunity, mineralization and development (mediated, for example, by Hox, ParaHox and microRNA genes). Numerous conserved noncoding elements (CNEs; often cis regulatory) undetectable in direct human-teleost comparisons become apparent using gar: functional studies uncovered conserved roles for such cryptic CNEs, facilitating annotation of sequences identified in human genome-wide association studies. Transcriptomic analyses showed that the sums of expression domains and expression levels for duplicated teleost genes often approximate the patterns and levels of expression for gar genes, consistent with subfunctionalization. The gar genome provides a resource for understanding evolution after genome duplication, the origin of vertebrate genomes and the function of human regulatory sequences.

  14. Crowd-funded micro-grants for genomics and "big data": an actionable idea connecting small (artisan) science, infrastructure science, and citizen philanthropy.

    Science.gov (United States)

    Özdemir, Vural; Badr, Kamal F; Dove, Edward S; Endrenyi, Laszlo; Geraci, Christy Jo; Hotez, Peter J; Milius, Djims; Neves-Pereira, Maria; Pang, Tikki; Rotimi, Charles N; Sabra, Ramzi; Sarkissian, Christineh N; Srivastava, Sanjeeva; Tims, Hesther; Zgheib, Nathalie K; Kickbusch, Ilona

    2013-04-01

    Biomedical science in the 21st century is embedded in, and draws from, a digital commons and "Big Data" created by high-throughput Omics technologies such as genomics. Classic Edisonian metaphors of science and scientists (i.e., "the lone genius" or other narrow definitions of expertise) are ill equipped to harness the vast promises of the 21st-century digital commons. Moreover, in medicine and life sciences, experts often under-appreciate the important contributions made by citizen scholars and lead users of innovations to design innovative products and co-create new knowledge. We believe there are a large number of users waiting to be mobilized so as to engage with Big Data as citizen scientists, if only some funding were available. Yet many of these scholars may not meet the meta-criteria used to judge expertise, such as a track record in obtaining large research grants or a traditional academic curriculum vitae. This innovation research article describes a novel idea and action framework: micro-grants, each worth $1000, for genomics and Big Data. Though a relatively small amount at first glance, this far exceeds the annual income of the "bottom one billion" - the 1.4 billion people living below the extreme poverty level defined by the World Bank ($1.25/day). We describe two types of micro-grants. Type 1 micro-grants can be awarded through established funding agencies and philanthropies that create micro-granting programs to fund a broad and highly diverse array of small artisan labs and citizen scholars to connect genomics and Big Data with new models of discovery such as open user innovation. Type 2 micro-grants can be funded by existing or new science observatories and citizen think tanks through crowd-funding mechanisms described herein. Type 2 micro-grants would also facilitate global health diplomacy by co-creating crowd-funded micro-granting programs across nation-states in regions facing political and financial instability, while sharing similar disease

  15. The Mouse Genome Database (MGD): facilitating mouse as a model for human biology and disease.

    Science.gov (United States)

    Eppig, Janan T; Blake, Judith A; Bult, Carol J; Kadin, James A; Richardson, Joel E

    2015-01-01

    The Mouse Genome Database (MGD, http://www.informatics.jax.org) serves the international biomedical research community as the central resource for integrated genomic, genetic and biological data on the laboratory mouse. To facilitate use of mouse as a model in translational studies, MGD maintains a core of high-quality curated data and integrates experimentally and computationally generated data sets. MGD maintains a unified catalog of genes and genome features, including functional RNAs, QTL and phenotypic loci. MGD curates and provides functional and phenotype annotations for mouse genes using the Gene Ontology and Mammalian Phenotype Ontology. MGD integrates phenotype data and associates mouse genotypes to human diseases, providing critical mouse-human relationships and access to repositories holding mouse models. MGD is the authoritative source of nomenclature for genes, genome features, alleles and strains following guidelines of the International Committee on Standardized Genetic Nomenclature for Mice. A new addition to MGD, the Human-Mouse: Disease Connection, allows users to explore gene-phenotype-disease relationships between human and mouse. MGD has also updated search paradigms for phenotypic allele attributes, incorporated incidental mutation data, added a module for display and exploration of genes and microRNA interactions and adopted the JBrowse genome browser. MGD resources are freely available to the scientific community. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  16. Gene network inherent in genomic big data improves the accuracy of prognostic prediction for cancer patients.

    Science.gov (United States)

    Kim, Yun Hak; Jeong, Dae Cheon; Pak, Kyoungjune; Goh, Tae Sik; Lee, Chi-Seung; Han, Myoung-Eun; Kim, Ji-Young; Liangwen, Liu; Kim, Chi Dae; Jang, Jeon Yeob; Cha, Wonjae; Oh, Sae-Ock

    2017-09-29

    Accurate prediction of prognosis is critical for therapeutic decisions regarding cancer patients. Many previously developed prognostic scoring systems have limitations in reflecting recent progress in the field of cancer biology such as microarray, next-generation sequencing, and signaling pathways. To develop a new prognostic scoring system for cancer patients, we used mRNA expression and clinical data in various independent breast cancer cohorts (n=1214) from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) and Gene Expression Omnibus (GEO). A new prognostic score that reflects gene network inherent in genomic big data was calculated using Network-Regularized high-dimensional Cox-regression (Net-score). We compared its discriminatory power with those of two previously used statistical methods: stepwise variable selection via univariate Cox regression (Uni-score) and Cox regression via Elastic net (Enet-score). The Net scoring system showed better discriminatory power in prediction of disease-specific survival (DSS) than other statistical methods (p=0 in METABRIC training cohort, p=0.000331, 4.58e-06 in two METABRIC validation cohorts) when accuracy was examined by log-rank test. Notably, comparison of C-index and AUC values in receiver operating characteristic analysis at 5 years showed fewer differences between training and validation cohorts with the Net scoring system than other statistical methods, suggesting minimal overfitting. The Net-based scoring system also successfully predicted prognosis in various independent GEO cohorts with high discriminatory power. In conclusion, the Net-based scoring system showed better discriminative power than previous statistical methods in prognostic prediction for breast cancer patients. This new system will mark a new era in prognosis prediction for cancer patients.
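
    Network-regularized Cox regression (the Net-score above) is not available off the shelf, but the Elastic-net Cox baseline (Enet-score) that the authors compare against can be sketched with the lifelines package. The data frame layout and column names below are assumptions, not the METABRIC schema; this is a minimal illustration of the comparison method, not the authors' pipeline.

        # Illustrative sketch of an Elastic-net-penalized Cox model (the Enet-score
        # baseline described in the abstract). Column names are hypothetical.
        import pandas as pd
        from lifelines import CoxPHFitter

        # Assumed layout: gene-expression feature columns plus survival columns
        # "dss_time" (disease-specific survival time) and "dss_event" (event indicator).
        df = pd.read_csv("metabric_training.csv")

        cph = CoxPHFitter(penalizer=0.1, l1_ratio=0.5)   # elastic-net penalty
        cph.fit(df, duration_col="dss_time", event_col="dss_event")

        # Discriminatory power on the training data; the paper additionally reports
        # C-index and 5-year AUC on independent validation cohorts.
        print("C-index:", cph.concordance_index_)

        # Per-patient risk score from the fitted linear predictor.
        risk_scores = cph.predict_partial_hazard(df)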

  17. MSeqDR: A Centralized Knowledge Repository and Bioinformatics Web Resource to Facilitate Genomic Investigations in Mitochondrial Disease

    OpenAIRE

    Shen, Lishuang; Diroma, Maria Angela; Gonzalez, Michael; Navarro-Gomez, Daniel; Leipzig, Jeremy; Lott, Marie T.; Oven, Mannis; Wallace, D.C.; Muraresku, Colleen Clarke; Zolkipli-Cunningham, Zarazuela; Chinnery, Patrick; Attimonelli, Marcella; Zuchner, Stephan; Falk, Marni J.; Gai, Xiaowu

    2016-01-01

    textabstractMSeqDR is the Mitochondrial Disease Sequence Data Resource, a centralized and comprehensive genome and phenome bioinformatics resource built by the mitochondrial disease community to facilitate clinical diagnosis and research investigations of individual patient phenotypes, genomes, genes, and variants. A central Web portal (https://mseqdr.org) integrates community knowledge from expert-curated databases with genomic and phenotype data shared by clinicians and researchers. MSeqDR ...

  18. The Sequences of 1504 Mutants in the Model Rice Variety Kitaake Facilitate Rapid Functional Genomic Studies.

    Science.gov (United States)

    Li, Guotian; Jain, Rashmi; Chern, Mawsheng; Pham, Nikki T; Martin, Joel A; Wei, Tong; Schackwitz, Wendy S; Lipzen, Anna M; Duong, Phat Q; Jones, Kyle C; Jiang, Liangrong; Ruan, Deling; Bauer, Diane; Peng, Yi; Barry, Kerrie W; Schmutz, Jeremy; Ronald, Pamela C

    2017-06-01

    The availability of a whole-genome sequenced mutant population and the cataloging of mutations of each line at a single-nucleotide resolution facilitate functional genomic analysis. To this end, we generated and sequenced a fast-neutron-induced mutant population in the model rice cultivar Kitaake ( Oryza sativa ssp japonica ), which completes its life cycle in 9 weeks. We sequenced 1504 mutant lines at 45-fold coverage and identified 91,513 mutations affecting 32,307 genes, i.e., 58% of all rice genes. We detected an average of 61 mutations per line. Mutation types include single-base substitutions, deletions, insertions, inversions, translocations, and tandem duplications. We observed a high proportion of loss-of-function mutations. We identified an inversion affecting a single gene as the causative mutation for the short-grain phenotype in one mutant line. This result reveals the usefulness of the resource for efficient, cost-effective identification of genes conferring specific phenotypes. To facilitate public access to this genetic resource, we established an open access database called KitBase that provides access to sequence data and seed stocks. This population complements other available mutant collections and gene-editing technologies. This work demonstrates how inexpensive next-generation sequencing can be applied to generate a high-density catalog of mutations. © 2017 American Society of Plant Biologists. All rights reserved.
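
    The headline statistics above (mutations per line, genes affected, mutation-type spectrum) correspond to simple aggregations over the mutation catalog. The sketch below is illustrative only, with hypothetical column names rather than the actual KitBase schema.

        # Illustrative sketch: summarize a fast-neutron mutation catalog by mutant
        # line and by gene. Column names are assumptions, not the KitBase format.
        import pandas as pd

        catalog = pd.read_csv("kitaake_mutations.tsv", sep="\t")
        # assumed columns: line_id, gene_id, mutation_type (SNV, deletion, insertion, ...)

        per_line = catalog.groupby("line_id").size()
        print("mean mutations per line:", round(per_line.mean(), 1))   # paper reports ~61

        genes_hit = catalog["gene_id"].dropna().nunique()
        print("genes affected:", genes_hit)                            # paper reports 32,307 (58%)

        print(catalog["mutation_type"].value_counts())                 # mutation-type spectrum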

  19. Facilitating comparative effectiveness research in cancer genomics: evaluating stakeholder perceptions of the engagement process.

    Science.gov (United States)

    Deverka, Patricia A; Lavallee, Danielle C; Desai, Priyanka J; Armstrong, Joanne; Gorman, Mark; Hole-Curry, Leah; O'Leary, James; Ruffner, B W; Watkins, John; Veenstra, David L; Baker, Laurence H; Unger, Joseph M; Ramsey, Scott D

    2012-07-01

    The Center for Comparative Effectiveness Research in Cancer Genomics completed a 2-year stakeholder-guided process for the prioritization of genomic tests for comparative effectiveness research studies. We sought to evaluate the effectiveness of engagement procedures in achieving project goals and to identify opportunities for future improvements. The evaluation included an online questionnaire, one-on-one telephone interviews and facilitated discussion. Responses to the online questionnaire were tabulated for descriptive purposes, while transcripts from key informant interviews were analyzed using a directed content analysis approach. A total of 11 out of 13 stakeholders completed both the online questionnaire and interview process, while nine participated in the facilitated discussion. Eighty-nine percent of questionnaire items received overall ratings of agree or strongly agree; 11% of responses were rated as neutral with the exception of a single rating of disagreement with an item regarding the clarity of how stakeholder input was incorporated into project decisions. Recommendations for future improvement included developing standard recruitment practices, role descriptions and processes for improved communication with clinical and comparative effectiveness research investigators. Evaluation of the stakeholder engagement process provided constructive feedback for future improvements and should be routinely conducted to ensure maximal effectiveness of stakeholder involvement.

  20. Baculoviral delivery of CRISPR/Cas9 facilitates efficient genome editing in human cells.

    Directory of Open Access Journals (Sweden)

    Sanne Hindriksen

    The CRISPR/Cas9 system is a highly effective tool for genome editing. Key to robust genome editing is the efficient delivery of the CRISPR/Cas9 machinery. Viral delivery systems are efficient vehicles for the transduction of foreign genes, but commonly used viral vectors suffer from a limited capacity in the genetic information they can carry. Baculovirus, however, is capable of carrying large exogenous DNA fragments. Here we investigate the use of baculoviral vectors as a delivery vehicle for CRISPR/Cas9-based genome-editing tools. We demonstrate transduction of a panel of cell lines with Cas9 and an sgRNA sequence, which results in efficient knockout of all four targeted subunits of the chromosomal passenger complex (CPC). We further show that introduction of a homology directed repair template into the same CRISPR/Cas9 baculovirus facilitates introduction of specific point mutations and endogenous gene tags. Tagging of the CPC recruitment factor Haspin with the fluorescent reporter YFP allowed us to study its native localization as well as recruitment to the cohesin subunit Pds5B.

  1. Baculoviral delivery of CRISPR/Cas9 facilitates efficient genome editing in human cells.

    Science.gov (United States)

    Hindriksen, Sanne; Bramer, Arne J; Truong, My Anh; Vromans, Martijn J M; Post, Jasmin B; Verlaan-Klink, Ingrid; Snippert, Hugo J; Lens, Susanne M A; Hadders, Michael A

    2017-01-01

    The CRISPR/Cas9 system is a highly effective tool for genome editing. Key to robust genome editing is the efficient delivery of the CRISPR/Cas9 machinery. Viral delivery systems are efficient vehicles for the transduction of foreign genes but commonly used viral vectors suffer from a limited capacity in the genetic information they can carry. Baculovirus however is capable of carrying large exogenous DNA fragments. Here we investigate the use of baculoviral vectors as a delivery vehicle for CRISPR/Cas9 based genome-editing tools. We demonstrate transduction of a panel of cell lines with Cas9 and an sgRNA sequence, which results in efficient knockout of all four targeted subunits of the chromosomal passenger complex (CPC). We further show that introduction of a homology directed repair template into the same CRISPR/Cas9 baculovirus facilitates introduction of specific point mutations and endogenous gene tags. Tagging of the CPC recruitment factor Haspin with the fluorescent reporter YFP allowed us to study its native localization as well as recruitment to the cohesin subunit Pds5B.

  2. Coastal Thematic Exploitation Platform (C-TEP): An innovative and collaborative platform to facilitate Big Data coastal research

    Science.gov (United States)

    Tuohy, Eimear; Clerc, Sebastien; Politi, Eirini; Mangin, Antoine; Datcu, Mihai; Vignudelli, Stefano; Illuzzi, Diomede; Craciunescu, Vasile; Aspetsberger, Michael

    2017-04-01

    The Coastal Thematic Exploitation Platform (C-TEP) is an on-going European Space Agency (ESA) funded project to develop a web service dedicated to the observation of the coastal environment and to support coastal management and monitoring. For over 20 years ESA satellites have provided a wealth of environmental data. The availability of an ever increasing volume of environmental data from satellite remote sensing provides a unique opportunity for exploratory science and the development of coastal applications. However, the diversity and complexity of EO data available, the need for efficient data access, information extraction, data management and high spec processing tools pose major challenges to achieving its full potential in terms of Big Data exploitation. C-TEP will provide a new means to handle the technical challenges of the observation of coastal areas and contribute to improved understanding and decision-making with respect to coastal resources and environments. C-TEP will unlock coastal knowledge and innovation as a collaborative, virtual work environment providing access to a comprehensive database of coastal Earth Observation (EO) data, in-situ data, model data and the tools and processors necessary to fully exploit these vast and heterogeneous datasets. The cloud processing capabilities provided, allow users to perform heavy processing tasks through a user-friendly Graphical User Interface (GUI). A connection to the PEPS (Plateforme pour l'Exploitation des Produits Sentinel) archive will provide data from Sentinel missions 1, 2 and 3. Automatic comparison tools will be provided to exploit the in-situ datasets in synergy with EO data. In addition, users may develop, test and share their own advanced algorithms for the extraction of coastal information. Algorithm validation will be facilitated by the capabilities to compute statistics over long time-series. Finally, C-TEP subscription services will allow users to perform automatic monitoring of some key

  3. Chromosome-wise dissection of the genome of the extremely big mouse line DU6i.

    Science.gov (United States)

    Bevova, Marianna R; Aulchenko, Yurii S; Aksu, Soner; Renne, Ulla; Brockmann, Gudrun A

    2006-01-01

    The extreme high-body-weight-selected mouse line DU6i is a polygenic model for growth research, harboring many small-effect QTL. We dissected the genome of this line into 19 autosomes and the Y chromosome by the construction of a new panel of chromosome substitution strains (CSS). The DU6i chromosomes were transferred to a DBA/2 mice genetic background by marker-assisted recurrent backcrossing. Mitochondria and the X chromosome were of DBA/2 origin in the backcross. During the construction of these novel strains, >4000 animals were generated, phenotyped, and genotyped. Using these data, we studied the genetic control of variation in body weight and weight gain at 21, 42, and 63 days. The unique data set facilitated the analysis of chromosomal interaction with sex and parent-of-origin effects. All analyzed chromosomes affected body weight and weight gain either directly or in interaction with sex or parent of origin. The effects were age specific, with some chromosomes showing opposite effects at different stages of development.

  4. Crowd-Funded Micro-Grants for Genomics and “Big Data”: An Actionable Idea Connecting Small (Artisan) Science, Infrastructure Science, and Citizen Philanthropy

    Science.gov (United States)

    Badr, Kamal F.; Dove, Edward S.; Endrenyi, Laszlo; Geraci, Christy Jo; Hotez, Peter J.; Milius, Djims; Neves-Pereira, Maria; Pang, Tikki; Rotimi, Charles N.; Sabra, Ramzi; Sarkissian, Christineh N.; Srivastava, Sanjeeva; Tims, Hesther; Zgheib, Nathalie K.; Kickbusch, Ilona

    2013-01-01

    Abstract Biomedical science in the 21st century is embedded in, and draws from, a digital commons and “Big Data” created by high-throughput Omics technologies such as genomics. Classic Edisonian metaphors of science and scientists (i.e., “the lone genius” or other narrow definitions of expertise) are ill equipped to harness the vast promises of the 21st century digital commons. Moreover, in medicine and life sciences, experts often under-appreciate the important contributions made by citizen scholars and lead users of innovations to design innovative products and co-create new knowledge. We believe there are a large number of users waiting to be mobilized so as to engage with Big Data as citizen scientists—only if some funding were available. Yet many of these scholars may not meet the meta-criteria used to judge expertise, such as a track record in obtaining large research grants or a traditional academic curriculum vitae. This innovation research article describes a novel idea and action framework: micro-grants, each worth $1000, for genomics and Big Data. Though a relatively small amount at first glance, this far exceeds the annual income of the “bottom one billion”—the 1.4 billion people living below the extreme poverty level defined by the World Bank ($1.25/day). We describe two types of micro-grants. Type 1 micro-grants can be awarded through established funding agencies and philanthropies that create micro-granting programs to fund a broad and highly diverse array of small artisan labs and citizen scholars to connect genomics and Big Data with new models of discovery such as open user innovation. Type 2 micro-grants can be funded by existing or new science observatories and citizen think tanks through crowd-funding mechanisms described herein. Type 2 micro-grants would also facilitate global health diplomacy by co-creating crowd-funded micro-granting programs across nation-states in regions facing political and financial instability, while

  5. BigQ: a NoSQL based framework to handle genomic variants in i2b2.

    Science.gov (United States)

    Gabetta, Matteo; Limongelli, Ivan; Rizzo, Ettore; Riva, Alberto; Segagni, Daniele; Bellazzi, Riccardo

    2015-12-29

    Precision medicine requires the tight integration of clinical and molecular data. To this end, it is mandatory to define proper technological solutions able to manage the overwhelming amount of high throughput genomic data needed to test associations between genomic signatures and human phenotypes. The i2b2 Center (Informatics for Integrating Biology and the Bedside) has developed a widely internationally adopted framework to use existing clinical data for discovery research that can help the definition of precision medicine interventions when coupled with genetic data. i2b2 can be significantly advanced by designing efficient management solutions of Next Generation Sequencing data. We developed BigQ, an extension of the i2b2 framework, which integrates patient clinical phenotypes with genomic variant profiles generated by Next Generation Sequencing. A visual programming i2b2 plugin allows retrieving variants belonging to the patients in a cohort by applying filters on genomic variant annotations. We report an evaluation of the query performance of our system on more than 11 million variants, showing that the implemented solution scales linearly in terms of query time and disk space with the number of variants. In this paper we describe a new i2b2 web service composed of an efficient and scalable document-based database that manages annotations of genomic variants and of a visual programming plug-in designed to dynamically perform queries on clinical and genetic data. The system therefore allows managing the fast growing volume of genomic variants and can be used to integrate heterogeneous genomic annotations.
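
    For illustration only (none of the field names, document structure, or query engine below are taken from BigQ itself): a minimal Python sketch of the kind of document-oriented variant query an i2b2-coupled NoSQL store would answer, filtering a cohort's variant documents by annotation.

    ```python
    # Hypothetical variant "documents" and an i2b2-style patient cohort.
    cohort = {"patient_17", "patient_42"}
    variants = [
        {"patient": "patient_17", "gene": "TP53",  "impact": "HIGH",     "af": 0.0001},
        {"patient": "patient_42", "gene": "BRCA2", "impact": "MODERATE", "af": 0.0030},
        {"patient": "patient_99", "gene": "TP53",  "impact": "HIGH",     "af": 0.0001},
    ]

    def query(docs, cohort, max_af=0.001, impact="HIGH"):
        """Return the cohort's variants passing simple annotation filters."""
        return [d for d in docs
                if d["patient"] in cohort and d["af"] <= max_af and d["impact"] == impact]

    print(query(variants, cohort))   # only patient_17's TP53 variant survives the filters
    ```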

  6. The Widening Gulf between Genomics Data Generation and Consumption: A Practical Guide to Big Data Transfer Technology

    Science.gov (United States)

    Feltus, Frank A.; Breen, Joseph R.; Deng, Juan; Izard, Ryan S.; Konger, Christopher A.; Ligon, Walter B.; Preuss, Don; Wang, Kuang-Ching

    2015-01-01

    In the last decade, high-throughput DNA sequencing has become a disruptive technology and pushed the life sciences into a distributed ecosystem of sequence data producers and consumers. Given the power of genomics and declining sequencing costs, biology is an emerging “Big Data” discipline that will soon enter the exabyte data range when all subdisciplines are combined. These datasets must be transferred across commercial and research networks in creative ways since sending data without thought can have serious consequences on data processing time frames. Thus, it is imperative that biologists, bioinformaticians, and information technology engineers recalibrate data processing paradigms to fit this emerging reality. This review attempts to provide a snapshot of Big Data transfer across networks, which is often overlooked by many biologists. Specifically, we discuss four key areas: 1) data transfer networks, protocols, and applications; 2) data transfer security including encryption, access, firewalls, and the Science DMZ; 3) data flow control with software-defined networking; and 4) data storage, staging, archiving and access. A primary intention of this article is to orient the biologist in key aspects of the data transfer process in order to frame their genomics-oriented needs to enterprise IT professionals. PMID:26568680
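
    A back-of-the-envelope sketch of why transfer planning matters; the data-set sizes, link rates, and the 80% efficiency factor below are assumptions for illustration, not figures from the review.

    ```python
    # Rough transfer-time estimates for genomic data sets at different sustained rates.
    def transfer_hours(size_tb, rate_gbps, efficiency=0.8):
        bits = size_tb * 8e12                        # terabytes -> bits (decimal units)
        return bits / (rate_gbps * 1e9 * efficiency) / 3600

    for size in (1, 100):                            # e.g. one sequencing batch vs. a project archive
        for rate in (1, 10, 100):                    # link rate in Gbps
            print(f"{size:>4} TB at {rate:>3} Gbps: {transfer_hours(size, rate):7.1f} h")
    ```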

  7. Reframed Genome-Scale Metabolic Model to Facilitate Genetic Design and Integration with Expression Data.

    Science.gov (United States)

    Gu, Deqing; Jian, Xingxing; Zhang, Cheng; Hua, Qiang

    2017-01-01

    Genome-scale metabolic network models (GEMs) have played important roles in the design of genetically engineered strains and helped biologists to decipher metabolism. However, due to the complex gene-reaction relationships that exist in model systems, most algorithms have limited capabilities with respect to directly predicting accurate genetic design for metabolic engineering. In particular, methods that predict reaction knockout strategies leading to overproduction are often impractical in terms of gene manipulations. Recently, we proposed a method named logical transformation of model (LTM) to simplify the gene-reaction associations by introducing intermediate pseudo reactions, which makes it possible to generate genetic design. Here, we propose an alternative method to relieve researchers from deciphering complex gene-reactions by adding pseudo gene controlling reactions. In comparison to LTM, this new method introduces fewer pseudo reactions and generates a much smaller model system named as gModel. We showed that gModel allows two seldom reported applications: identification of minimal genomes and design of minimal cell factories within a modified OptKnock framework. In addition, gModel could be used to integrate expression data directly and improve the performance of the E-Fmin method for predicting fluxes. In conclusion, the model transformation procedure will facilitate genetic research based on GEMs, extending their applications.
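
    As a hedged illustration of the gene-reaction association problem the paper is simplifying (not the published LTM or gModel algorithm), the sketch below evaluates a boolean gene-protein-reaction rule to decide whether a reaction survives a set of gene deletions; the gene and reaction names are hypothetical.

    ```python
    # Toy gene-protein-reaction (GPR) rules; g3 encodes an isoenzyme that can rescue R1.
    GPR_RULES = {
        "R1": "(g1 and g2) or g3",
        "R2": "g4",
    }

    def reaction_active(rule, deleted):
        """Evaluate a GPR rule given a set of deleted genes."""
        env = {g: (g not in deleted) for g in ("g1", "g2", "g3", "g4")}
        return eval(rule, {"__builtins__": {}}, env)

    print(reaction_active(GPR_RULES["R1"], deleted={"g1"}))          # True: g3 still present
    print(reaction_active(GPR_RULES["R1"], deleted={"g1", "g3"}))    # False: reaction is lost
    ```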

  8. Big Data in Plant Science: Resources and Data Mining Tools for Plant Genomics and Proteomics.

    Science.gov (United States)

    Popescu, George V; Noutsos, Christos; Popescu, Sorina C

    2016-01-01

    In modern plant biology, progress is increasingly defined by the scientists' ability to gather and analyze data sets of high volume and complexity, otherwise known as "big data". Arguably, the largest increase in the volume of plant data sets over the last decade is a consequence of the application of the next-generation sequencing and mass-spectrometry technologies to the study of experimental model and crop plants. The increase in quantity and complexity of biological data brings challenges, mostly associated with data acquisition, processing, and sharing within the scientific community. Nonetheless, big data in plant science create unique opportunities in advancing our understanding of complex biological processes at a level of accuracy without precedence, and establish a base for the plant systems biology. In this chapter, we summarize the major drivers of big data in plant science and big data initiatives in life sciences with a focus on the scope and impact of iPlant, a representative cyberinfrastructure platform for plant science.

  9. The complete mitochondrial genome of the big-belly seahorse, Hippocampus abdominalis (Lesson 1827).

    Science.gov (United States)

    Wang, Lei; Chen, Zaizhong; Leng, Xiangjun; Gao, Jianzhong; Chen, Xiaowu; Li, Zhongpu; Sun, Peiying; Zhao, Yuming

    2016-11-01

    In this study, the complete mitogenome sequence of the big-belly seahorse, Hippocampus abdominalis (Lesson, 1827) (Syngnathiformes: Syngnathidae), has been sequenced by the next-generation sequencing method. The assembled mitogenome is 16,521 bp in length, which includes 13 protein-coding genes, 22 transfer RNA genes, and 2 ribosomal RNA genes. The overall base composition of the seahorse is 31.1% for A, 23.6% for C, 16.0% for G, 29.3% for T, and the sequence shows 87% identity with the tiger tail seahorse, Hippocampus comes. The complete mitogenome of the big-belly seahorse provides essential and important DNA molecular data for further phylogeographic and evolutionary analysis of the seahorse family.
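
    A minimal sketch of how a base-composition summary like the one above can be computed from an assembled sequence; the sequence used here is a placeholder, not the H. abdominalis mitogenome.

    ```python
    # Compute base composition from a sequence string (toy sequence).
    from collections import Counter

    seq = "ATGCATTACCGGTA" * 10            # placeholder assembly
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ACGT")
    for base in "ACGT":
        print(f"{base}: {100 * counts[base] / total:.1f}%")
    ```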

  10. LLNL's Big Science Capabilities Help Spur Over $796 Billion in U.S. Economic Activity Sequencing the Human Genome

    Energy Technology Data Exchange (ETDEWEB)

    Stewart, Jeffrey S. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2015-07-28

    LLNL’s successful history of taking on big science projects spans beyond national security and has helped create billions of dollars per year in new economic activity. One example is LLNL’s role in helping sequence the human genome. Over $796 billion in new economic activity in over half a dozen fields has been documented since LLNL successfully completed this Grand Challenge.

  11. dartr: An r package to facilitate analysis of SNP data generated from reduced representation genome sequencing.

    Science.gov (United States)

    Gruber, Bernd; Unmack, Peter J; Berry, Oliver F; Georges, Arthur

    2018-05-01

    Although vast technological advances have been made and genetic software packages are growing in number, it is not a trivial task to analyse SNP data. We announce a new r package, dartr, enabling the analysis of single nucleotide polymorphism data for population genomic and phylogenomic applications. dartr provides user-friendly functions for data quality control and marker selection, and permits rigorous evaluations of conformation to Hardy-Weinberg equilibrium, gametic-phase disequilibrium and neutrality. The package reports standard descriptive statistics, permits exploration of patterns in the data through principal components analysis and conducts standard F-statistics, as well as basic phylogenetic analyses, population assignment, isolation by distance and exports data to a variety of commonly used downstream applications (e.g., newhybrids, faststructure and phylogeny applications) outside of the r environment. The package serves two main purposes: first, a user-friendly approach to lower the hurdle to analyse such data; therefore, the package comes with a detailed tutorial targeted to the r beginner to allow data analysis without requiring deep knowledge of r. Second, we use a single, well-established format, genlight from the adegenet package, as input for all our functions to avoid data reformatting. By strictly using the genlight format, we hope to facilitate this format as the de facto standard of future software developments and hence reduce the format jungle of genetic data sets. The dartr package is available via the r CRAN network and GitHub. © 2017 John Wiley & Sons Ltd.
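
    dartr itself is an R package operating on adegenet genlight objects; purely as a language-agnostic illustration of one analysis it wraps (principal components analysis on a 0/1/2 genotype matrix), here is a small NumPy sketch on simulated data.

    ```python
    # PCA on a toy SNP genotype matrix coded 0/1/2 (20 individuals x 500 SNPs).
    import numpy as np

    rng = np.random.default_rng(0)
    genotypes = rng.integers(0, 3, size=(20, 500)).astype(float)

    centered = genotypes - genotypes.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    pcs = u * s                                  # individual scores on the principal components
    explained = s**2 / np.sum(s**2)
    print("PC1/PC2 variance explained:", explained[:2])
    ```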

  12. MSeqDR: A Centralized Knowledge Repository and Bioinformatics Web Resource to Facilitate Genomic Investigations in Mitochondrial Disease.

    Science.gov (United States)

    Shen, Lishuang; Diroma, Maria Angela; Gonzalez, Michael; Navarro-Gomez, Daniel; Leipzig, Jeremy; Lott, Marie T; van Oven, Mannis; Wallace, Douglas C; Muraresku, Colleen Clarke; Zolkipli-Cunningham, Zarazuela; Chinnery, Patrick F; Attimonelli, Marcella; Zuchner, Stephan; Falk, Marni J; Gai, Xiaowu

    2016-06-01

    MSeqDR is the Mitochondrial Disease Sequence Data Resource, a centralized and comprehensive genome and phenome bioinformatics resource built by the mitochondrial disease community to facilitate clinical diagnosis and research investigations of individual patient phenotypes, genomes, genes, and variants. A central Web portal (https://mseqdr.org) integrates community knowledge from expert-curated databases with genomic and phenotype data shared by clinicians and researchers. MSeqDR also functions as a centralized application server for Web-based tools to analyze data across both mitochondrial and nuclear DNA, including investigator-driven whole exome or genome dataset analyses through MSeqDR-Genesis. MSeqDR-GBrowse genome browser supports interactive genomic data exploration and visualization with custom tracks relevant to mtDNA variation and mitochondrial disease. MSeqDR-LSDB is a locus-specific database that currently manages 178 mitochondrial diseases, 1,363 genes associated with mitochondrial biology or disease, and 3,711 pathogenic variants in those genes. MSeqDR Disease Portal allows hierarchical tree-style disease exploration to evaluate their unique descriptions, phenotypes, and causative variants. Automated genomic data submission tools are provided that capture ClinVar compliant variant annotations. PhenoTips will be used for phenotypic data submission on deidentified patients using human phenotype ontology terminology. The development of a dynamic informed patient consent process to guide data access is underway to realize the full potential of these resources. © 2016 WILEY PERIODICALS, INC.

  13. Large inserts for big data: artificial chromosomes in the genomic era.

    Science.gov (United States)

    Tocchetti, Arianna; Donadio, Stefano; Sosio, Margherita

    2018-05-01

    The exponential increase in available microbial genome sequences coupled with predictive bioinformatic tools is underscoring the genetic capacity of bacteria to produce an unexpected large number of specialized bioactive compounds. Since most of the biosynthetic gene clusters (BGCs) present in microbial genomes are cryptic, i.e. not expressed under laboratory conditions, a variety of cloning systems and vectors have been devised to harbor DNA fragments large enough to carry entire BGCs and to allow their transfer in suitable heterologous hosts. This minireview provides an overview of the vectors and approaches that have been developed for cloning large BGCs, and successful examples of heterologous expression.

  14. Chromosome-wise dissection of the genome of the extremely big mouse line DU6i

    NARCIS (Netherlands)

    M.R. Bevova (Marianna); Y.S. Aulchenko (Yurii); G. Aksu (Guzide); U. Renne (Ulla); K. Brockmann

    2006-01-01

    The extreme high-body-weight-selected mouse line DU6i is a polygenic model for growth research, harboring many small-effect QTL. We dissected the genome of this line into 19 autosomes and the Y chromosome by the construction of a new panel of chromosome substitution strains (CSS). The

  15. Bridging the gap between Big Genome Data Analysis and Database Management Systems

    NARCIS (Netherlands)

    C.P. Cijvat (Robin)

    2014-01-01

    The bioinformatics field has encountered a data deluge over the last years, due to increasing speed and decreasing cost of DNA sequencing technology. Today, sequencing the DNA of a single genome only takes about a week, and it can result in up to a terabyte of data. The sequencing

  16. The Influence of Big (Clinical) Data and Genomics on Precision Medicine and Drug Development.

    Science.gov (United States)

    Denny, Joshua C; Van Driest, Sara L; Wei, Wei-Qi; Roden, Dan M

    2018-03-01

    Drug development continues to be costly and slow, with medications failing due to lack of efficacy or presence of toxicity. The promise of pharmacogenomic discovery includes tailoring therapeutics based on an individual's genetic makeup, rational drug development, and repurposing medications. Rapid growth of large research cohorts, linked to electronic health record (EHR) data, fuels discovery of new genetic variants predicting drug action, supports Mendelian randomization experiments to show drug efficacy, and suggests new indications for existing medications. New biomedical informatics and machine-learning approaches advance the ability to interpret clinical information, enabling identification of complex phenotypes and subpopulations of patients. We review the recent history of use of "big data" from EHR-based cohorts and biobanks supporting these activities. Future studies using EHR data, other information sources, and new methods will promote a foundation for discovery to more rapidly advance precision medicine. © 2017 American Society for Clinical Pharmacology and Therapeutics.

  17. Baculoviral delivery of CRISPR/Cas9 facilitates efficient genome editing in human cells

    NARCIS (Netherlands)

    Hindriksen, Sanne; Bramer, Arne J; Truong, My Anh; Vromans, Martijn J M; Post, Jasmin B; Verlaan-Klink, Ingrid; Snippert, Hugo J; Lens, Susanne M A; Hadders, Michael A

    2017-01-01

    The CRISPR/Cas9 system is a highly effective tool for genome editing. Key to robust genome editing is the efficient delivery of the CRISPR/Cas9 machinery. Viral delivery systems are efficient vehicles for the transduction of foreign genes but commonly used viral vectors suffer from a limited

  18. Whole Genome and Tandem Duplicate Retention facilitated Glucosinolate Pathway Diversification in the Mustard Family.

    NARCIS (Netherlands)

    Hofberger, J.A.; Lyons, E.; Edger, P.P.; Pires, J.C.; Schranz, M.E.

    2013-01-01

    Plants share a common history of successive whole genome duplication (WGD) events retaining genomic patterns of duplicate gene copies (ohnologs) organized in conserved syntenic blocks. Duplication was often proposed to affect the origin of novel traits during evolution. However, genetic evidence

  19. Synergy between Medical Informatics and Bioinformatics: Facilitating Genomic Medicine for Future Health Care

    Czech Academy of Sciences Publication Activity Database

    Martin-Sanchez, F.; Iakovidis, I.; Norager, S.; Maojo, V.; de Groen, P.; Van der Lei, J.; Jones, T.; Abraham-Fuchs, K.; Apweiler, R.; Babic, A.; Baud, R.; Breton, V.; Cinquin, P.; Doupi, P.; Dugas, M.; Eils, R.; Engelbrecht, R.; Ghazal, P.; Jehenson, P.; Kulikowski, C.; Lampe, K.; De Moor, G.; Orphanoudakis, S.; Rossing, N.; Sarachan, B.; Sousa, A.; Spekowius, G.; Thireos, G.; Zahlmann, G.; Zvárová, Jana; Hermosilla, I.; Vicente, F. J.

    2004-01-01

    Vol. 37 (2004), pp. 30-42, ISSN 1532-0464. Institutional research plan: CEZ:AV0Z1030915. Keywords: bioinformatics * medical informatics * genomics * genomic medicine * biomedical informatics. Subject RIV: BD - Theory of Information. Impact factor: 1.013, year: 2004

  20. Whole-genome sequencing of 234 bulls facilitates mapping of monogenic and complex traits in cattle

    DEFF Research Database (Denmark)

    Daetwyler, Hans D; Capitan, Aurélien; Pausch, Hubert

    2014-01-01

    The 1000 bull genomes project supports the goal of accelerating the rates of genetic gain in domestic cattle while at the same time considering animal health and welfare by providing the annotated sequence variants and genotypes of key ancestor bulls. In the first phase of the 1000 bull genomes p...

  1. Characterizing Big Data Management

    OpenAIRE

    Rogério Rossi; Kechi Hirama

    2015-01-01

    Big data management is a reality for an increasing number of organizations in many areas and represents a set of challenges involving big data modeling, storage and retrieval, analysis and visualization. However, technological resources, people and processes are crucial to facilitate the management of big data in any kind of organization, allowing information and knowledge from a large volume of data to support decision-making. Big data management can be supported by these three dimensions: t...

  2. Supersize me: how whole-genome sequencing and big data are transforming epidemiology.

    Science.gov (United States)

    Kao, Rowland R; Haydon, Daniel T; Lycett, Samantha J; Murcia, Pablo R

    2014-05-01

    In epidemiology, the identification of 'who infected whom' allows us to quantify key characteristics such as incubation periods, heterogeneity in transmission rates, duration of infectiousness, and the existence of high-risk groups. Although invaluable, the existence of many plausible infection pathways makes this difficult, and epidemiological contact tracing either uncertain, logistically prohibitive, or both. The recent advent of next-generation sequencing technology allows the identification of traceable differences in the pathogen genome that are transforming our ability to understand high-resolution disease transmission, sometimes even down to the host-to-host scale. We review recent examples of the use of pathogen whole-genome sequencing for the purpose of forensic tracing of transmission pathways, focusing on the particular problems where evolutionary dynamics must be supplemented by epidemiological information on the most likely timing of events as well as possible transmission pathways. We also discuss potential pitfalls in the over-interpretation of these data, and highlight the manner in which a confluence of this technology with sophisticated mathematical and statistical approaches has the potential to produce a paradigm shift in our understanding of infectious disease transmission and control. Copyright © 2014 Elsevier Ltd. All rights reserved.
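
    A minimal sketch of the raw signal behind 'who infected whom' analyses: pairwise SNP distances between pathogen consensus sequences. The toy genomes below are illustrative; real analyses also weigh sampling dates and epidemiological links, as the review stresses.

    ```python
    # Pairwise SNP distances between toy pathogen consensus sequences.
    from itertools import combinations

    genomes = {
        "case_A": "ACGTACGTAC",
        "case_B": "ACGTACGTAT",   # 1 SNP from case_A
        "case_C": "ACGAACGTAT",   # 2 SNPs from case_A
    }

    def snp_distance(a, b):
        return sum(x != y for x, y in zip(a, b))

    for (n1, s1), (n2, s2) in combinations(genomes.items(), 2):
        print(n1, n2, snp_distance(s1, s2))
    ```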

  3. The Naked Mole Rat Genome Resource : facilitating analyses of cancer and longevity-related adaptations

    OpenAIRE

    Keane, Michael; Craig, Thomas; Alfoldi, Jessica; Berlin, Aaron M; Johnson, Jeremy; Seluanov, Andrei; Gorbunova, Vera; Di Palma, Federica; Lindblad-Toh, Kerstin; Church, George M; de Magalhaes, Joao Pedro

    2014-01-01

    MOTIVATION: The naked mole rat (Heterocephalus glaber) is an exceptionally long-lived and cancer-resistant rodent native to East Africa. Although its genome was previously sequenced, here we report a new assembly sequenced by us with substantially higher N50 values for scaffolds and contigs. RESULTS: We analyzed the annotation of this new improved assembly and identified candidate genomic adaptations which may have contributed to the evolution of the naked mole rat's extraordinary traits, inc...

  4. Transcription facilitated genome-wide recruitment of topoisomerase I and DNA gyrase.

    Science.gov (United States)

    Ahmed, Wareed; Sala, Claudia; Hegde, Shubhada R; Jha, Rajiv Kumar; Cole, Stewart T; Nagaraja, Valakunja

    2017-05-01

    Movement of the transcription machinery along a template alters DNA topology resulting in the accumulation of supercoils in DNA. The positive supercoils generated ahead of transcribing RNA polymerase (RNAP) and the negative supercoils accumulating behind impose severe topological constraints impeding transcription process. Previous studies have implied the role of topoisomerases in the removal of torsional stress and the maintenance of template topology but the in vivo interaction of functionally distinct topoisomerases with heterogeneous chromosomal territories is not deciphered. Moreover, how the transcription-induced supercoils influence the genome-wide recruitment of DNA topoisomerases remains to be explored in bacteria. Using ChIP-Seq, we show the genome-wide occupancy profile of both topoisomerase I and DNA gyrase in conjunction with RNAP in Mycobacterium tuberculosis taking advantage of minimal topoisomerase representation in the organism. The study unveils the first in vivo genome-wide interaction of both the topoisomerases with the genomic regions and establishes that transcription-induced supercoils govern their recruitment at genomic sites. Distribution profiles revealed co-localization of RNAP and the two topoisomerases on the active transcriptional units (TUs). At a given locus, topoisomerase I and DNA gyrase were localized behind and ahead of RNAP, respectively, correlating with the twin-supercoiled domains generated. The recruitment of topoisomerases was higher at the genomic loci with higher transcriptional activity and/or at regions under high torsional stress compared to silent genomic loci. Importantly, the occupancy of DNA gyrase, sole type II topoisomerase in Mtb, near the Ter domain of the Mtb chromosome validates its function as a decatenase.

  5. Single-Cell-Genomics-Facilitated Read Binning of Candidate Phylum EM19 Genomes from Geothermal Spring Metagenomes.

    Science.gov (United States)

    Becraft, Eric D; Dodsworth, Jeremy A; Murugapiran, Senthil K; Ohlsson, J Ingemar; Briggs, Brandon R; Kanbar, Jad; De Vlaminck, Iwijn; Quake, Stephen R; Dong, Hailiang; Hedlund, Brian P; Swingley, Wesley D

    2016-02-15

    The vast majority of microbial life remains uncatalogued due to the inability to cultivate these organisms in the laboratory. This "microbial dark matter" represents a substantial portion of the tree of life and of the populations that contribute to chemical cycling in many ecosystems. In this work, we leveraged an existing single-cell genomic data set representing the candidate bacterial phylum "Calescamantes" (EM19) to calibrate machine learning algorithms and define metagenomic bins directly from pyrosequencing reads derived from Great Boiling Spring in the U.S. Great Basin. Compared to other assembly-based methods, taxonomic binning with a read-based machine learning approach yielded final assemblies with the highest predicted genome completeness of any method tested. Read-first binning subsequently was used to extract Calescamantes bins from all metagenomes with abundant Calescamantes populations, including metagenomes from Octopus Spring and Bison Pool in Yellowstone National Park and Gongxiaoshe Spring in Yunnan Province, China. Metabolic reconstruction suggests that Calescamantes are heterotrophic, facultative anaerobes, which can utilize oxidized nitrogen sources as terminal electron acceptors for respiration in the absence of oxygen and use proteins as their primary carbon source. Despite their phylogenetic divergence, the geographically separate Calescamantes populations were highly similar in their predicted metabolic capabilities and core gene content, respiring O2, or oxidized nitrogen species for energy conservation in distant but chemically similar hot springs. Copyright © 2016, American Society for Microbiology. All Rights Reserved.
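
    A hedged sketch of read-based taxonomic binning in the spirit described above (not the authors' pipeline): k-mer composition features from labelled reads train a classifier that assigns new reads to bins. The reads are simulated, the bin names are hypothetical, and scikit-learn's RandomForestClassifier stands in for whatever learner a real pipeline would use.

    ```python
    # Toy read-first binning: dinucleotide composition features + a classifier.
    from itertools import product
    import random

    from sklearn.ensemble import RandomForestClassifier

    KMERS = ["".join(p) for p in product("ACGT", repeat=2)]

    def kmer_profile(read, k=2):
        counts = {kmer: 0 for kmer in KMERS}
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
        total = max(sum(counts.values()), 1)
        return [counts[kmer] / total for kmer in KMERS]

    def simulate_read(gc=0.5, length=200):
        # Draw each base from G/C with probability gc, otherwise from A/T.
        return "".join(random.choice("GC" if random.random() < gc else "AT")
                       for _ in range(length))

    random.seed(1)
    X, y = [], []
    for label, gc in [("bin_A", 0.65), ("bin_B", 0.35)]:   # hypothetical bins
        for _ in range(100):
            X.append(kmer_profile(simulate_read(gc)))
            y.append(label)

    clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
    print(clf.predict([kmer_profile(simulate_read(0.7))]))  # expected to fall in bin_A
    ```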

  6. Facilitating genome navigation : survey sequencing and dense radiation-hybrid gene mapping

    NARCIS (Netherlands)

    Hitte, C; Madeoy, J; Kirkness, EF; Priat, C; Lorentzen, TD; Senger, F; Thomas, D; Derrien, T; Ramirez, C; Scott, C; Evanno, G; Pullar, B; Cadieu, E; Oza; Lourgant, K; Jaffe, DB; Tacher, S; Dreano, S; Berkova, N; Andre, C; Deloukas, P; Fraser, C; Lindblad-Toh, K; Ostrander, EA; Galibert, F

    Accurate and comprehensive sequence coverage for large genomes has been restricted to only a few species of specific interest. Lower sequence coverage (survey sequencing) of related species can yield a wealth of information about gene content and putative regulatory elements. But survey sequences

  7. High-precision, whole-genome sequencing of laboratory strains facilitates genetic studies.

    Directory of Open Access Journals (Sweden)

    Anjana Srivatsan

    2008-08-01

    Full Text Available Whole-genome sequencing is a powerful technique for obtaining the reference sequence information of multiple organisms. Its use can be dramatically expanded to rapidly identify genomic variations, which can be linked with phenotypes to obtain biological insights. We explored these potential applications using the emerging next-generation sequencing platform Solexa Genome Analyzer, and the well-characterized model bacterium Bacillus subtilis. Combining sequencing with experimental verification, we first improved the accuracy of the published sequence of the B. subtilis reference strain 168, then obtained sequences of multiple related laboratory strains and different isolates of each strain. This provides a framework for comparing the divergence between different laboratory strains and between their individual isolates. We also demonstrated the power of Solexa sequencing by using its results to predict a defect in the citrate signal transduction pathway of a common laboratory strain, which we verified experimentally. Finally, we examined the molecular nature of spontaneously generated mutations that suppress the growth defect caused by deletion of the stringent response mediator relA. Using whole-genome sequencing, we rapidly mapped these suppressor mutations to two small homologs of relA. Interestingly, stable suppressor strains had mutations in both genes, with each mutation alone partially relieving the relA growth defect. This supports an intriguing three-locus interaction module that is not easily identifiable through traditional suppressor mapping. We conclude that whole-genome sequencing can drastically accelerate the identification of suppressor mutations and complex genetic interactions, and it can be applied as a standard tool to investigate the genetic traits of model organisms.

  8. Recombination and evolution of duplicate control regions in the mitochondrial genome of the Asian big-headed turtle, Platysternon megacephalum.

    Directory of Open Access Journals (Sweden)

    Chenfei Zheng

    Full Text Available Complete mitochondrial (mt) genome sequences with duplicate control regions (CRs) have been detected in various animal species. In Testudines, duplicate mtCRs have been reported in the mtDNA of the Asian big-headed turtle, Platysternon megacephalum, which has three living subspecies. However, the evolutionary pattern of these CRs remains unclear. In this study, we report the completed sequences of duplicate CRs from 20 individuals belonging to three subspecies of this turtle and discuss the micro-evolutionary analysis of the evolution of duplicate CRs. Genetic distances calculated with MEGA 4.1 using the complete duplicate CR sequences revealed that within turtle subspecies, genetic distances between orthologous copies from different individuals were 0.63% for CR1 and 1.2% for CR2, respectively, and the average distance between paralogous copies of CR1 and CR2 was 4.8%. Phylogenetic relationships were reconstructed from the CR sequences, excluding the variable number of tandem repeats (VNTRs) at the 3' end, using three methods: neighbor-joining, maximum likelihood algorithm, and Bayesian inference. These data show that any two CRs within individuals were more genetically distant from orthologous genes in different individuals within the same subspecies. This suggests independent evolution of the two mtCRs within each P. megacephalum subspecies. Reconstruction of separate phylogenetic trees using different CR components (TAS, CD, CSB, and VNTRs) suggested the role of recombination in the evolution of duplicate CRs. Consequently, recombination events were detected using RDP software with break points at ≈290 bp and ≈1,080 bp. Based on these results, we hypothesize that duplicate CRs in P. megacephalum originated from heterological ancestral recombination of mtDNA. Subsequent recombination could have resulted in homogenization during independent evolutionary events, thus maintaining the functions of duplicate CRs in the mtDNA of P

  9. The genome sequence of Barbarea vulgaris facilitates the study of ecological biochemistry

    DEFF Research Database (Denmark)

    Byrne, Stephen L.; Erthmann, Pernille Østerbye; Agerbirk, Niels

    2017-01-01

    The genus Barbarea has emerged as a model for evolution and ecology of plant defense compounds, due to its unusual glucosinolate profile and production of saponins, unique to the Brassicaceae. One species, B. vulgaris, includes two ‘types’, G-type and P-type that differ in trichome density, and t...... deter larvae to the extent that they die. The B. vulgaris genome will promote the study of mechanisms in ecological biochemistry to benefit crop resistance breeding....

  10. Analysis of cis-elements that facilitate extrachromosomal persistence of human papillomavirus genomes

    International Nuclear Information System (INIS)

    Pittayakhajonwut, Daraporn; Angeletti, Peter C.

    2008-01-01

    Human papillomaviruses (HPVs) are maintained latently in dividing epithelial cells as nuclear plasmids. Two virally encoded proteins, E1, a helicase, and E2, a transcription factor, are important players in replication and stable plasmid maintenance in host cells. Recent experiments in yeast have demonstrated that viral genomes retain replication and maintenance function independently of E1 and E2 [Angeletti, P.C., Kim, K., Fernandes, F.J., and Lambert, P.F. (2002). Stable replication of papillomavirus genomes in Saccharomyces cerevisiae. J. Virol. 76(7), 3350-8; Kim, K., Angeletti, P.C., Hassebroek, E.C., and Lambert, P.F. (2005). Identification of cis-acting elements that mediate the replication and maintenance of human papillomavirus type 16 genomes in Saccharomyces cerevisiae. J. Virol. 79(10), 5933-42]. Flow cytometry studies of EGFP-reporter vectors containing subgenomic HPV fragments with or without a human ARS (hARS) revealed that six fragments located in E6-E7, E1-E2, L1, and L2 regions showed a capacity for plasmid stabilization in the absence of E1 and E2 proteins. Interestingly, four fragments within E7, the 3' end of L2, and the 5' end of L1 exhibited stability in plasmids that lacked an hARS, indicating that they possess both replication and maintenance functions. Two fragments lying in E1-E2 and the 3' region of L1 were stable only in the presence of hARS, indicating that they contained only maintenance function. Mutational analyses of HPV16-GFP reporter constructs provided evidence that genomes lacking E1 and E2 could replicate to an extent similar to wild type HPV16. Together these results support the concept that cellular factors influence HPV replication and maintenance, independently, and perhaps in conjunction with E1 and E2, suggesting a role in the persistent phase of the viral lifecycle

  11. FACS-Assisted CRISPR-Cas9 Genome Editing Facilitates Parkinson's Disease Modeling

    Directory of Open Access Journals (Sweden)

    Jonathan Arias-Fuenzalida

    2017-11-01

    Full Text Available Genome editing and human induced pluripotent stem cells hold great promise for the development of isogenic disease models and the correction of disease-associated mutations for isogenic tissue therapy. CRISPR-Cas9 has emerged as a versatile and simple tool for engineering human cells for such purposes. However, the current protocols to derive genome-edited lines require the screening of a great number of clones to obtain one free of random integration or on-locus non-homologous end joining (NHEJ)-containing alleles. Here, we describe an efficient method to derive biallelic genome-edited populations by the use of fluorescent markers. We call this technique FACS-assisted CRISPR-Cas9 editing (FACE). FACE allows the derivation of correctly edited polyclones carrying a positive selection fluorescent module and the exclusion of non-edited, random integrations and on-target allele NHEJ-containing cells. We derived a set of isogenic lines containing Parkinson's-disease-associated mutations in α-synuclein and present their comparative phenotypes.

  12. Human CST Facilitates Genome-wide RAD51 Recruitment to GC-Rich Repetitive Sequences in Response to Replication Stress.

    Science.gov (United States)

    Chastain, Megan; Zhou, Qing; Shiva, Olga; Fadri-Moskwik, Maria; Whitmore, Leanne; Jia, Pingping; Dai, Xueyu; Huang, Chenhui; Ye, Ping; Chai, Weihang

    2016-08-02

    The telomeric CTC1/STN1/TEN1 (CST) complex has been implicated in promoting replication recovery under replication stress at genomic regions, yet its precise role is unclear. Here, we report that STN1 is enriched at GC-rich repetitive sequences genome-wide in response to hydroxyurea (HU)-induced replication stress. STN1 deficiency exacerbates the fragility of these sequences under replication stress, resulting in chromosome fragmentation. We find that upon fork stalling, CST proteins form distinct nuclear foci that colocalize with RAD51. Furthermore, replication stress induces physical association of CST with RAD51 in an ATR-dependent manner. Strikingly, CST deficiency diminishes HU-induced RAD51 foci formation and reduces RAD51 recruitment to telomeres and non-telomeric GC-rich fragile sequences. Collectively, our findings establish that CST promotes RAD51 recruitment to GC-rich repetitive sequences in response to replication stress to facilitate replication restart, thereby providing insights into the mechanism underlying genome stability maintenance. Copyright © 2016 The Author(s). Published by Elsevier Inc. All rights reserved.

  13. Fungal genome and mating system transitions facilitated by chromosomal translocations involving intercentromeric recombination.

    Directory of Open Access Journals (Sweden)

    Sheng Sun

    2017-08-01

    Full Text Available Species within the human pathogenic Cryptococcus species complex are major threats to public health, causing approximately 1 million annual infections globally. Cryptococcus amylolentus is the most closely related known species of the pathogenic Cryptococcus species complex, and it is non-pathogenic. Additionally, while pathogenic Cryptococcus species have bipolar mating systems with a single large mating type (MAT) locus that represents a derived state in Basidiomycetes, C. amylolentus has a tetrapolar mating system with 2 MAT loci (P/R and HD) located on different chromosomes. Thus, studying C. amylolentus will shed light on the transition from tetrapolar to bipolar mating systems in the pathogenic Cryptococcus species, as well as its possible link with the origin and evolution of pathogenesis. In this study, we sequenced, assembled, and annotated the genomes of 2 C. amylolentus isolates, CBS6039 and CBS6273, which are sexual and interfertile. Genome comparison between the 2 C. amylolentus isolates identified the boundaries and the complete gene contents of the P/R and HD MAT loci. Bioinformatic and chromatin immunoprecipitation sequencing (ChIP-seq) analyses revealed that, similar to those of the pathogenic Cryptococcus species, C. amylolentus has regional centromeres (CENs) that are enriched with species-specific transposable and repetitive DNA elements. Additionally, we found that while neither the P/R nor the HD locus is physically closely linked to its centromere in C. amylolentus, and the regions between the MAT loci and their respective centromeres show overall synteny between the 2 genomes, both MAT loci exhibit genetic linkage to their respective centromere during meiosis, suggesting the presence of recombinational suppressors and/or epistatic gene interactions in the MAT-CEN intervening regions. Furthermore, genomic comparisons between C. amylolentus and related pathogenic Cryptococcus species provide evidence that multiple chromosomal

  14. Building a semantic web-based metadata repository for facilitating detailed clinical modeling in cancer genome studies.

    Science.gov (United States)

    Sharma, Deepak K; Solbrig, Harold R; Tao, Cui; Weng, Chunhua; Chute, Christopher G; Jiang, Guoqian

    2017-06-05

    Detailed Clinical Models (DCMs) have been regarded as the basis for retaining computable meaning when data are exchanged between heterogeneous computer systems. To better support clinical cancer data capturing and reporting, there is an emerging need to develop informatics solutions for standards-based clinical models in cancer study domains. The objective of the study is to develop and evaluate a cancer genome study metadata management system that serves as a key infrastructure in supporting clinical information modeling in cancer genome study domains. We leveraged a Semantic Web-based metadata repository enhanced with both ISO11179 metadata standard and Clinical Information Modeling Initiative (CIMI) Reference Model. We used the common data elements (CDEs) defined in The Cancer Genome Atlas (TCGA) data dictionary, and extracted the metadata of the CDEs using the NCI Cancer Data Standards Repository (caDSR) CDE dataset rendered in the Resource Description Framework (RDF). The ITEM/ITEM_GROUP pattern defined in the latest CIMI Reference Model is used to represent reusable model elements (mini-Archetypes). We produced a metadata repository with 38 clinical cancer genome study domains, comprising a rich collection of mini-Archetype pattern instances. We performed a case study of the domain "clinical pharmaceutical" in the TCGA data dictionary and demonstrated enriched data elements in the metadata repository are very useful in support of building detailed clinical models. Our informatics approach leveraging Semantic Web technologies provides an effective way to build a CIMI-compliant metadata repository that would facilitate the detailed clinical modeling to support use cases beyond TCGA in clinical cancer study domains.
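
    Purely illustrative sketch of representing CDE metadata as RDF and querying it with SPARQL using rdflib; the namespace, property names, and CDE identifier below are hypothetical and are not the actual caDSR, ISO 11179, or CIMI vocabularies.

    ```python
    # Toy metadata repository: one common data element (CDE) as RDF triples plus a SPARQL query.
    from rdflib import Graph, Literal, Namespace, URIRef

    EX = Namespace("http://example.org/cde/")       # hypothetical namespace
    g = Graph()
    cde = URIRef(EX["2192199"])                     # hypothetical CDE identifier
    g.add((cde, EX.preferredName, Literal("Patient Age at Diagnosis")))
    g.add((cde, EX.valueDomain, Literal("Integer (years)")))
    g.add((cde, EX.usedInDomain, Literal("clinical pharmaceutical")))

    results = g.query("""
        PREFIX ex: <http://example.org/cde/>
        SELECT ?name WHERE { ?cde ex:usedInDomain "clinical pharmaceutical" ;
                                  ex:preferredName ?name . }
    """)
    for (name,) in results:
        print(name)
    ```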

  15. DECIDE: a Decision Support Tool to Facilitate Parents' Choices Regarding Genome-Wide Sequencing.

    Science.gov (United States)

    Birch, Patricia; Adam, S; Bansback, N; Coe, R R; Hicklin, J; Lehman, A; Li, K C; Friedman, J M

    2016-12-01

    We describe the rationale, development, and usability testing for an integrated e-learning tool and decision aid for parents facing decisions about genome-wide sequencing (GWS) for their children with a suspected genetic condition. The online tool, DECIDE, is designed to provide decision-support and to promote high quality decisions about undergoing GWS with or without return of optional incidental finding results. DECIDE works by integrating educational material with decision aids. Users may tailor their learning by controlling both the amount of information and its format - text and diagrams and/or short videos. The decision aid guides users to weigh the importance of various relevant factors in their own lives and circumstances. After considering the pros and cons of GWS and return of incidental findings, DECIDE summarizes the user's responses and apparent preferred choices. In a usability study of 16 parents who had already chosen GWS after conventional genetic counselling, all participants found DECIDE to be helpful. Many would have been satisfied to use it alone to guide their GWS decisions, but most would prefer to have the option of consulting a health care professional as well to aid their decision. Further testing is necessary to establish the effectiveness of using DECIDE as an adjunct to or instead of conventional pre-test genetic counselling for clinical genome-wide sequencing.

  16. Moleculo Long-Read Sequencing Facilitates Assembly and Genomic Binning from Complex Soil Metagenomes

    Energy Technology Data Exchange (ETDEWEB)

    White, Richard Allen; Bottos, Eric M.; Roy Chowdhury, Taniya; Zucker, Jeremy D.; Brislawn, Colin J.; Nicora, Carrie D.; Fansler, Sarah J.; Glaesemann, Kurt R.; Glass, Kevin; Jansson, Janet K.; Langille, Morgan

    2016-06-28

    ABSTRACT

    Soil metagenomics has been touted as the “grand challenge” for metagenomics, as the high microbial diversity and spatial heterogeneity of soils make them unamenable to current assembly platforms. Here, we aimed to improve soil metagenomic sequence assembly by applying the Moleculo synthetic long-read sequencing technology. In total, we obtained 267 Gbp of raw sequence data from a native prairie soil; these data included 109.7 Gbp of short-read data (~100 bp) from the Joint Genome Institute (JGI), an additional 87.7 Gbp of rapid-mode read data (~250 bp), plus 69.6 Gbp (>1.5 kbp) from Moleculo sequencing. The Moleculo data alone yielded over 5,600 reads of >10 kbp in length, and over 95% of the unassembled reads mapped to contigs of >1.5 kbp. Hybrid assembly of all data resulted in more than 10,000 contigs over 10 kbp in length. We mapped three replicate metatranscriptomes derived from the same parent soil to the Moleculo subassembly and found that 95% of the predicted genes, based on their assignments to Enzyme Commission (EC) numbers, were expressed. The Moleculo subassembly also enabled binning of >100 microbial genome bins. We obtained via direct binning the first complete genome, that of “Candidatus Pseudomonas sp. strain JKJ-1” from a native soil metagenome. By mapping metatranscriptome sequence reads back to the bins, we found that several bins corresponding to low-relative-abundance Acidobacteria were highly transcriptionally active, whereas bins corresponding to high-relative-abundance Verrucomicrobia were not. These results demonstrate that Moleculo sequencing provides a significant advance for resolving complex soil microbial communities.

    IMPORTANCE: Soil microorganisms carry out key processes for life on our planet, including cycling of carbon and other nutrients and supporting growth of plants. However, there is poor molecular-level understanding of their
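
    As a small aside on the assembly statistics quoted in this record, a minimal sketch of computing N50 from a list of contig lengths; the lengths below are toy values, not the article's assemblies.

    ```python
    # N50: the contig length L such that contigs of length >= L cover at least half the assembly.
    def n50(contig_lengths):
        total = sum(contig_lengths)
        running = 0
        for length in sorted(contig_lengths, reverse=True):
            running += length
            if running * 2 >= total:
                return length

    contigs = [15_000, 12_000, 8_000, 3_000, 1_500, 900]     # toy contig lengths in bp
    print("assembly size:", sum(contigs), "bp;  N50:", n50(contigs), "bp")
    ```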

  17. Characterizing Big Data Management

    Directory of Open Access Journals (Sweden)

    Rogério Rossi

    2015-06-01

    Full Text Available Big data management is a reality for an increasing number of organizations in many areas and represents a set of challenges involving big data modeling, storage and retrieval, analysis and visualization. However, technological resources, people and processes are crucial to facilitate the management of big data in any kind of organization, allowing information and knowledge from a large volume of data to support decision-making. Big data management can be supported by these three dimensions: technology, people and processes. Hence, this article discusses these dimensions: the technological dimension that is related to storage, analytics and visualization of big data; the human aspects of big data; and, in addition, the process management dimension that involves in a technological and business approach the aspects of big data management.

  18. Ebola virus VP24 interacts with NP to facilitate nucleocapsid assembly and genome packaging.

    Science.gov (United States)

    Banadyga, Logan; Hoenen, Thomas; Ambroggio, Xavier; Dunham, Eric; Groseth, Allison; Ebihara, Hideki

    2017-08-09

    Ebola virus causes devastating hemorrhagic fever outbreaks for which no approved therapeutic exists. The viral nucleocapsid, which is minimally composed of the proteins NP, VP35, and VP24, represents an attractive target for drug development; however, the molecular determinants that govern the interactions and functions of these three proteins are still unknown. Through a series of mutational analyses, in combination with biochemical and bioinformatics approaches, we identified a region on VP24 that was critical for its interaction with NP. Importantly, we demonstrated that the interaction between VP24 and NP was required for both nucleocapsid assembly and genome packaging. Not only does this study underscore the critical role that these proteins play in the viral replication cycle, but it also identifies a key interaction interface on VP24 that may serve as a novel target for antiviral therapeutic intervention.

  19. The genome sequence of Barbarea vulgaris facilitates the study of ecological biochemistry

    DEFF Research Database (Denmark)

    Byrne, Stephen L.; Erthmann, Pernille Østerbye; Agerbirk, Niels

    2017-01-01

    The genus Barbarea has emerged as a model for evolution and ecology of plant defense compounds, due to its unusual glucosinolate profile and production of saponins, unique to the Brassicaceae. One species, B. vulgaris, includes two ‘types’, G-type and P-type that differ in trichome density......, and their glucosinolate and saponin profiles. A key difference is the stereochemistry of hydroxylation of their common phenethylglucosinolate backbone, leading to epimeric glucobarbarins. Here we report a draft genome sequence of the G-type, and re-sequencing of the P-type for comparison. This enables us to identify...... candidate genes underlying glucosinolate diversity, trichome density, and study the genetics of biochemical variation for glucosinolate and saponins. B. vulgaris is resistant to the diamondback moth, and may be exploited for “dead-end” trap cropping where glucosinolates stimulate oviposition and saponins...

  20. The draft genome sequence of the ferret (Mustela putorius furo) facilitates study of human respiratory disease.

    Science.gov (United States)

    Peng, Xinxia; Alföldi, Jessica; Gori, Kevin; Eisfeld, Amie J; Tyler, Scott R; Tisoncik-Go, Jennifer; Brawand, David; Law, G Lynn; Skunca, Nives; Hatta, Masato; Gasper, David J; Kelly, Sara M; Chang, Jean; Thomas, Matthew J; Johnson, Jeremy; Berlin, Aaron M; Lara, Marcia; Russell, Pamela; Swofford, Ross; Turner-Maier, Jason; Young, Sarah; Hourlier, Thibaut; Aken, Bronwen; Searle, Steve; Sun, Xingshen; Yi, Yaling; Suresh, M; Tumpey, Terrence M; Siepel, Adam; Wisely, Samantha M; Dessimoz, Christophe; Kawaoka, Yoshihiro; Birren, Bruce W; Lindblad-Toh, Kerstin; Di Palma, Federica; Engelhardt, John F; Palermo, Robert E; Katze, Michael G

    2014-12-01

    The domestic ferret (Mustela putorius furo) is an important animal model for multiple human respiratory diseases. It is considered the 'gold standard' for modeling human influenza virus infection and transmission. Here we describe the 2.41 Gb draft genome assembly of the domestic ferret, constituting 2.28 Gb of sequence plus gaps. We annotated 19,910 protein-coding genes on this assembly using RNA-seq data from 21 ferret tissues. We characterized the ferret host response to two influenza virus infections by RNA-seq analysis of 42 ferret samples from influenza time-course data and showed distinct signatures in ferret trachea and lung tissues specific to 1918 or 2009 human pandemic influenza virus infections. Using microarray data from 16 ferret samples reflecting cystic fibrosis disease progression, we showed that transcriptional changes in the CFTR-knockout ferret lung reflect pathways of early disease that cannot be readily studied in human infants with cystic fibrosis disease.

  1. Genomic evidence for role of inversion 3RP of Drosophila melanogaster in facilitating climate change adaptation.

    Science.gov (United States)

    Rane, Rahul V; Rako, Lea; Kapun, Martin; Lee, Siu F; Hoffmann, Ary A

    2015-05-01

    Chromosomal inversion polymorphisms are common in animals and plants, and recent models suggest that alternative arrangements spread by capturing different combinations of alleles acting additively or epistatically to favour local adaptation. It is also thought that inversions typically maintain favoured combinations for a long time by suppressing recombination between alternative chromosomal arrangements. Here, we consider patterns of linkage disequilibrium and genetic divergence in an old inversion polymorphism in Drosophila melanogaster (In(3R)Payne) known to be associated with climate change adaptation and a recent invasion event into Australia. We extracted, karyotyped and sequenced whole chromosomes from two Australian populations, so that changes in the arrangement of the alleles between geographically separated tropical and temperate areas could be compared. Chromosome-wide linkage disequilibrium (LD) analysis revealed strong LD within the region spanned by In(3R)Payne. This genomic region also showed strong differentiation between the tropical and the temperate populations, but no differentiation between different karyotypes from the same population, after controlling for chromosomal arrangement. Patterns of differentiation across the chromosome arm and in gene ontologies were enhanced by the presence of the inversion. These data support the notion that inversions are strongly selected by bringing together combinations of genes, but it is still not clear if such combinations act additively or epistatically. Our data suggest that climatic adaptation through inversions can be dynamic, reflecting changes in the relative abundance of different forms of an inversion and ongoing evolution of allelic content within an inversion. © 2015 John Wiley & Sons Ltd.
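
    A minimal sketch of the pairwise linkage disequilibrium statistic (r^2) underlying the chromosome-wide LD analysis described above, computed from phased haplotypes; the haplotypes are toy data, not the In(3R)Payne chromosomes.

    ```python
    # r^2 between two biallelic sites from phased haplotypes (1 = derived allele).
    def r_squared(hap_a, hap_b):
        n = len(hap_a)
        p_a = sum(hap_a) / n
        p_b = sum(hap_b) / n
        p_ab = sum(1 for x, y in zip(hap_a, hap_b) if x and y) / n
        d = p_ab - p_a * p_b
        return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

    site1 = [1, 1, 1, 0, 0, 0, 1, 0]
    site2 = [1, 1, 1, 0, 0, 0, 0, 0]
    print(f"r^2 = {r_squared(site1, site2):.2f}")
    ```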

  2. Machine learning for Big Data analytics in plants.

    Science.gov (United States)

    Ma, Chuang; Zhang, Hao Helen; Wang, Xiangfeng

    2014-12-01

    Rapid advances in high-throughput genomic technology have enabled biology to enter the era of 'Big Data' (large datasets). The plant science community not only needs to build its own Big-Data-compatible parallel computing and data management infrastructures, but also to seek novel analytical paradigms to extract information from the overwhelming amounts of data. Machine learning offers promising computational and analytical solutions for the integrative analysis of large, heterogeneous and unstructured datasets on the Big-Data scale, and is gradually gaining popularity in biology. This review introduces the basic concepts and procedures of machine-learning applications and envisages how machine learning could interface with Big Data technology to facilitate basic research and biotechnology in the plant sciences. Copyright © 2014 Elsevier Ltd. All rights reserved.
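
    As a minimal, self-contained sketch of the kind of machine-learning procedure this review surveys (the data below are synthetic stand-ins; in practice the feature matrix would hold normalized expression or variant features and the labels a phenotype of interest), a cross-validated classifier on a samples-by-genes matrix might look like:

        # Minimal sketch: cross-validated classification on a samples-by-genes matrix.
        # Synthetic data only; real inputs would come from omics experiments.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        n_samples, n_genes = 200, 5000                  # hypothetical dataset size
        X = rng.normal(size=(n_samples, n_genes))       # feature matrix (samples x genes)
        y = rng.integers(0, 2, size=n_samples)          # e.g. stress vs. control phenotype

        clf = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=0)
        scores = cross_val_score(clf, X, y, cv=5)
        print(f"mean cross-validated accuracy: {scores.mean():.2f}")

    On random labels the accuracy hovers around 0.5, which is exactly the kind of sanity check such pipelines rely on before being scaled up to Big-Data-sized inputs.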

  3. Big Data in Medicine is Driving Big Changes

    Science.gov (United States)

    Verspoor, K.

    2014-01-01

    Summary Objectives To summarise current research that takes advantage of “Big Data” in health and biomedical informatics applications. Methods Survey of trends in this work, and exploration of literature describing how large-scale structured and unstructured data sources are being used to support applications from clinical decision making and health policy, to drug design and pharmacovigilance, and further to systems biology and genetics. Results The survey highlights ongoing development of powerful new methods for turning that large-scale, and often complex, data into information that provides new insights into human health, in a range of different areas. Consideration of this body of work identifies several important paradigm shifts that are facilitated by Big Data resources and methods: in clinical and translational research, from hypothesis-driven research to data-driven research, and in medicine, from evidence-based practice to practice-based evidence. Conclusions The increasing scale and availability of large quantities of health data require strategies for data management, data linkage, and data integration beyond the limits of many existing information systems, and substantial effort is underway to meet those needs. As our ability to make sense of that data improves, the value of the data will continue to increase. Health systems, genetics and genomics, population and public health; all areas of biomedicine stand to benefit from Big Data and the associated technologies. PMID:25123716

  4. Clinical genomics, big data, and electronic medical records: reconciling patient rights with research when privacy and science collide.

    Science.gov (United States)

    Kulynych, Jennifer; Greely, Henry T

    2017-04-01

    Widespread use of medical records for research, without consent, attracts little scrutiny compared to biospecimen research, where concerns about genomic privacy prompted recent federal proposals to mandate consent. This paper explores an important consequence of the proliferation of electronic health records (EHRs) in this permissive atmosphere: with the advent of clinical gene sequencing, EHR-based secondary research poses genetic privacy risks akin to those of biospecimen research, yet regulators still permit researchers to call gene sequence data 'de-identified', removing such data from the protection of the federal Privacy Rule and federal human subjects regulations. Medical centers and other providers seeking to offer genomic 'personalized medicine' now confront the problem of governing the secondary use of clinical genomic data as privacy risks escalate. We argue that regulators should no longer permit HIPAA-covered entities to treat dense genomic data as de-identified health information. Even with this step, the Privacy Rule would still permit disclosure of clinical genomic data for research, without consent, under a data use agreement, so we also urge that providers give patients specific notice before disclosing clinical genomic data for research, permitting (where possible) some degree of choice and control. To aid providers who offer clinical gene sequencing, we suggest both general approaches and specific actions to reconcile patients' rights and interests with genomic research.

  5. Clinical genomics, big data, and electronic medical records: reconciling patient rights with research when privacy and science collide

    Science.gov (United States)

    Greely, Henry T.

    2017-01-01

    Abstract Widespread use of medical records for research, without consent, attracts little scrutiny compared to biospecimen research, where concerns about genomic privacy prompted recent federal proposals to mandate consent. This paper explores an important consequence of the proliferation of electronic health records (EHRs) in this permissive atmosphere: with the advent of clinical gene sequencing, EHR-based secondary research poses genetic privacy risks akin to those of biospecimen research, yet regulators still permit researchers to call gene sequence data ‘de-identified’, removing such data from the protection of the federal Privacy Rule and federal human subjects regulations. Medical centers and other providers seeking to offer genomic ‘personalized medicine’ now confront the problem of governing the secondary use of clinical genomic data as privacy risks escalate. We argue that regulators should no longer permit HIPAA-covered entities to treat dense genomic data as de-identified health information. Even with this step, the Privacy Rule would still permit disclosure of clinical genomic data for research, without consent, under a data use agreement, so we also urge that providers give patients specific notice before disclosing clinical genomic data for research, permitting (where possible) some degree of choice and control. To aid providers who offer clinical gene sequencing, we suggest both general approaches and specific actions to reconcile patients’ rights and interests with genomic research. PMID:28852559

  6. Big data for health.

    Science.gov (United States)

    Andreu-Perez, Javier; Poon, Carmen C Y; Merrifield, Robert D; Wong, Stephen T C; Yang, Guang-Zhong

    2015-07-01

    This paper provides an overview of recent developments in big data in the context of biomedical and health informatics. It outlines the key characteristics of big data and how medical and health informatics, translational bioinformatics, sensor informatics, and imaging informatics will benefit from an integrated approach of piecing together different aspects of personalized information from a diverse range of data sources, both structured and unstructured, covering genomics, proteomics, metabolomics, as well as imaging, clinical diagnosis, and long-term continuous physiological sensing of an individual. It is expected that recent advances in big data will expand our knowledge for testing new hypotheses about disease management from diagnosis to prevention to personalized treatment. The rise of big data, however, also raises challenges in terms of privacy, security, data ownership, data stewardship, and governance. This paper discusses some of the existing activities and future opportunities related to big data for health, outlining some of the key underlying issues that need to be tackled.

  7. Advancing stroke genomic research in the age of Trans-Omics big data science: Emerging priorities and opportunities.

    Science.gov (United States)

    Owolabi, Mayowa; Peprah, Emmanuel; Xu, Huichun; Akinyemi, Rufus; Tiwari, Hemant K; Irvin, Marguerite R; Wahab, Kolawole Wasiu; Arnett, Donna K; Ovbiagele, Bruce

    2017-11-15

    We systematically reviewed the genetic variants associated with stroke in genome-wide association studies (GWAS) and examined the emerging priorities and opportunities for rapidly advancing stroke research in the era of Trans-Omics science. Using the PRISMA guideline, we searched PubMed and NHGRI- EBI GWAS catalog for stroke studies from 2007 till May 2017. We included 31 studies. The major challenge is that the few validated variants could not account for the full genetic risk of stroke and have not been translated for clinical use. None of the studies included continental Africans. Genomic study of stroke among Africans presents a unique opportunity for the discovery, validation, functional annotation, Trans-Omics study and translation of genomic determinants of stroke with implications for global populations. This is because all humans originated from Africa, a continent with a unique genomic architecture and a distinctive epidemiology of stroke; as well as substantially higher heritability and resolution of fine mapping of stroke genes. Understanding the genomic determinants of stroke and the corresponding molecular mechanisms will revolutionize the development of a new set of precise biomarkers for stroke prediction, diagnosis and prognostic estimates as well as personalized interventions for reducing the global burden of stroke. Copyright © 2017 Elsevier B.V. All rights reserved.

  8. Big universe, big data

    DEFF Research Database (Denmark)

    Kremer, Jan; Stensbo-Smidt, Kristoffer; Gieseke, Fabian Cristian

    2017-01-01

    , modern astronomy requires big data know-how, in particular it demands highly efficient machine learning and image analysis algorithms. But scalability is not the only challenge: Astronomy applications touch several current machine learning research questions, such as learning from biased data and dealing......, and highlight some recent methodological advancements in machine learning and image analysis triggered by astronomical applications....

  9. Big data, big responsibilities

    Directory of Open Access Journals (Sweden)

    Primavera De Filippi

    2014-01-01

    Full Text Available Big data refers to the collection and aggregation of large quantities of data produced by and about people, things or the interactions between them. With the advent of cloud computing, specialised data centres with powerful computational hardware and software resources can be used for processing and analysing a humongous amount of aggregated data coming from a variety of different sources. The analysis of such data is all the more valuable to the extent that it allows for specific patterns to be found and new correlations to be made between different datasets, so as to eventually deduce or infer new information, as well as to potentially predict behaviours or assess the likelihood for a certain event to occur. This article will focus specifically on the legal and moral obligations of online operators collecting and processing large amounts of data, to investigate the potential implications of big data analysis on the privacy of individual users and on society as a whole.

  10. Clinical research of traditional Chinese medicine in big data era.

    Science.gov (United States)

    Zhang, Junhua; Zhang, Boli

    2014-09-01

    With the advent of the big data era, our thinking, technology and methodology are being transformed. Data-intensive scientific discovery based on big data, named "The Fourth Paradigm," has become a new paradigm of scientific research. Along with the development and application of Internet information technology in healthcare, individual health records, clinical data on diagnosis and treatment, and genomic data have accumulated dramatically, generating big data in the medical field for clinical research and assessment. With the support of big data, the defects and weaknesses in the methodology of conventional, sampling-based clinical evaluation may be overcome, and the research target shifts from "causality inference" to "correlativity analysis." This not only facilitates the evaluation of individualized treatment, disease prediction, prevention and prognosis, but is also suitable for the practice of preventive healthcare and symptom pattern differentiation for treatment in traditional Chinese medicine (TCM), and for the post-marketing evaluation of Chinese patent medicines. To conduct clinical studies involving big data in the TCM domain, top-level design is needed and should be carried out in an orderly manner. Fundamental construction and innovation studies should be strengthened in the areas of data platform creation, data analysis technology, and the fostering and training of big-data professionals.

  11. Big science

    CERN Multimedia

    Nadis, S

    2003-01-01

    " "Big science" is moving into astronomy, bringing large experimental teams, multi-year research projects, and big budgets. If this is the wave of the future, why are some astronomers bucking the trend?" (2 pages).

  12. Big Data in industry

    Science.gov (United States)

    Latinović, T. S.; Preradović, D. M.; Barz, C. R.; Latinović, M. T.; Petrica, P. P.; Pop-Vadean, A.

    2016-08-01

    The amount of data at the global level has grown exponentially. Along with this phenomenon, we need new units of measure such as the exabyte, zettabyte, and yottabyte to describe the largest data volumes. This growth has created a situation in which classic systems for the collection, storage, processing, and visualization of data are losing the battle with the sheer amount, speed, and variety of data that is generated continuously. Much of this data is created by the Internet of Things (IoT): cameras, satellites, cars, GPS navigation, and so on. The challenge is to develop new technologies and tools for managing and exploiting these large amounts of data. Big Data has been a hot topic in IT circles in recent years, and it is increasingly recognized in the business world and in public administration. This paper proposes an ontology of big data analytics and examines how to enhance business intelligence through big data analytics as a service by presenting a big data analytics service-oriented architecture. It also discusses the interrelationship between business intelligence and big data analytics. The proposed approach might facilitate research and development in business analytics, big data analytics, and business intelligence, as well as intelligent agents.

  13. Big Data Analytics in Medicine and Healthcare.

    Science.gov (United States)

    Ristevski, Blagoj; Chen, Ming

    2018-05-10

    This paper surveys big data, highlighting big data analytics in medicine and healthcare. The big data characteristics of value, volume, velocity, variety, veracity and variability are described. Big data analytics in medicine and healthcare covers the integration and analysis of large amounts of complex heterogeneous data such as various -omics data (genomics, epigenomics, transcriptomics, proteomics, metabolomics, interactomics, pharmacogenomics, diseasomics), biomedical data and electronic health record data. We underline the challenging issues of big data privacy and security. Finally, with regard to these big data characteristics, some directions on suitable and promising open-source distributed data-processing software platforms are given.

  14. Accurate DNA Assembly and Direct Genome Integration with Optimized Uracil Excision Cloning to Facilitate Engineering of Escherichia coli as a Cell Factory

    DEFF Research Database (Denmark)

    Cavaleiro, Mafalda; Kim, Se Hyeuk; Nørholm, Morten

    2015-01-01

    Plants produce a vast diversity of valuable compounds with medical properties, but these are often difficult to purify from the natural source or produce by organic synthesis. An alternative is to transfer the biosynthetic pathways to an efficient production host like the bacterium Escherichia coli. Cloning and heterologous gene expression are major bottlenecks in the metabolic engineering field. We are working on standardizing DNA vector design processes to promote automation and collaborations in early phase metabolic engineering projects. Here, we focus on optimizing the already established uracil-excision-based cloning and combining it with a genome-engineering approach to allow direct integration of whole metabolic pathways into the genome of E. coli, to facilitate the advanced engineering of cell factories.

  15. Discovery of Nigri/nox and Panto/pox site-specific recombinase systems facilitates advanced genome engineering.

    Science.gov (United States)

    Karimova, Madina; Splith, Victoria; Karpinski, Janet; Pisabarro, M Teresa; Buchholz, Frank

    2016-07-22

    Precise genome engineering is instrumental for biomedical research and holds great promise for future therapeutic applications. Site-specific recombinases (SSRs) are valuable tools for genome engineering due to their exceptional ability to mediate precise excision, integration and inversion of genomic DNA in living systems. The ever-increasing complexity of genome manipulations and the desire to understand the DNA-binding specificity of these enzymes are driving efforts to identify novel SSR systems with unique properties. Here, we describe two novel tyrosine site-specific recombination systems designated Nigri/nox and Panto/pox. Nigri originates from Vibrio nigripulchritudo (plasmid VIBNI_pA) and recombines its target site nox with high efficiency and high target-site selectivity, without recombining target sites of the well established SSRs Cre, Dre, Vika and VCre. Panto, derived from Pantoea sp. aB, is less specific and in addition to its native target site, pox also recombines the target site for Dre recombinase, called rox. This relaxed specificity allowed the identification of residues that are involved in target site selectivity, thereby advancing our understanding of how SSRs recognize their respective DNA targets.

  16. Robust and rapid algorithms facilitate large-scale whole genome sequencing downstream analysis in an integrative framework.

    Science.gov (United States)

    Li, Miaoxin; Li, Jiang; Li, Mulin Jun; Pan, Zhicheng; Hsu, Jacob Shujui; Liu, Dajiang J; Zhan, Xiaowei; Wang, Junwen; Song, Youqiang; Sham, Pak Chung

    2017-05-19

    Whole genome sequencing (WGS) is a promising strategy to unravel variants or genes responsible for human diseases and traits. However, there is a lack of robust platforms for comprehensive downstream analysis. In the present study, we first proposed three novel algorithms - sequence gap-filled gene feature annotation, bit-block encoded genotypes and sectional fast access to text lines - to address three fundamental problems. The three algorithms then formed the infrastructure of a robust parallel computing framework, KGGSeq, for integrating downstream analysis functions for whole genome sequencing data. KGGSeq is equipped with a comprehensive set of analysis functions for quality control, filtering, annotation, pathogenicity prediction and statistical tests. In tests with whole genome sequencing data from the 1000 Genomes Project, KGGSeq annotated several thousand more reliable non-synonymous variants than other widely used tools (e.g. ANNOVAR and SNPEff). It took only around half an hour on a small server with 10 CPUs to access genotypes of ∼60 million variants from 2504 subjects, whereas a popular alternative tool required around one day. KGGSeq's bit-block genotype format used 1.5% or less of the space to flexibly represent phased or unphased genotypes with multiple alleles, and calculated genotypic correlations over 1000 times faster. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
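
    The bit-block encoding named above is not described in detail in this record. As a rough illustration of the general idea only (not KGGSeq's actual format, which also handles phased and multi-allelic genotypes), unphased biallelic genotypes can be packed two bits each, four samples per byte:

        # Illustration of bit-packed genotype storage: 0/0, 0/1, 1/1 and missing
        # each fit in 2 bits, so one byte holds four samples. Not KGGSeq's format.
        import numpy as np

        CODES = {"0/0": 0, "0/1": 1, "1/1": 2, "./.": 3}

        def pack(genotypes):
            """Pack genotype strings into one byte per four samples."""
            codes = np.array([CODES[g] for g in genotypes], dtype=np.uint8)
            n_pad = (-len(codes)) % 4
            padded = np.concatenate([codes, np.full(n_pad, 3, dtype=np.uint8)])
            quads = padded.reshape(-1, 4)
            return (quads[:, 0]
                    | (quads[:, 1] << 2)
                    | (quads[:, 2] << 4)
                    | (quads[:, 3] << 6)).astype(np.uint8)

        def unpack(packed, n):
            """Recover the first n genotype codes from the packed bytes."""
            quads = np.stack([(packed >> s) & 0b11 for s in (0, 2, 4, 6)], axis=1)
            return quads.reshape(-1)[:n]

        gts = ["0/0", "0/1", "1/1", "./.", "0/1"]
        packed = pack(gts)
        assert list(unpack(packed, len(gts))) == [CODES[g] for g in gts]
        print(f"{len(gts)} genotypes stored in {packed.nbytes} bytes")  # 2 bytes

    Packing of this kind is what makes it cheap to keep tens of millions of variants for thousands of subjects in memory and to compute genotypic correlations with bitwise operations, which is the spirit of the speed-ups reported above.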

  17. Facilitating the indirect detection of genomic DNA in an electrochemical DNA biosensor using magnetic nanoparticles and DNA ligase

    Directory of Open Access Journals (Sweden)

    Roozbeh Hushiarian

    2015-12-01

    This technique was found to be reliably repeatable. Indirect detection of genomic DNA using this method is significantly improved and showed high efficiency with small amounts of sample, with a detection limit of 5.37 × 10⁻¹⁴ M.

  18. All the World's a Stage: Facilitating Discovery Science and Improved Cancer Care through the Global Alliance for Genomics and Health.

    Science.gov (United States)

    Lawler, Mark; Siu, Lillian L; Rehm, Heidi L; Chanock, Stephen J; Alterovitz, Gil; Burn, John; Calvo, Fabien; Lacombe, Denis; Teh, Bin Tean; North, Kathryn N; Sawyers, Charles L

    2015-11-01

    The recent explosion of genetic and clinical data generated from tumor genome analysis presents an unparalleled opportunity to enhance our understanding of cancer, but this opportunity is compromised by the reluctance of many in the scientific community to share datasets and the lack of interoperability between different data platforms. The Global Alliance for Genomics and Health is addressing these barriers and challenges through a cooperative framework that encourages "team science" and responsible data sharing, complemented by the development of a series of application program interfaces that link different data platforms, thus breaking down traditional silos and liberating the data to enable new discoveries and ultimately benefit patients. ©2015 American Association for Cancer Research.

  19. A rigorous approach to facilitate and guarantee the correctness of the genetic testing management in human genome information systems.

    Science.gov (United States)

    Araújo, Luciano V; Malkowski, Simon; Braghetto, Kelly R; Passos-Bueno, Maria R; Zatz, Mayana; Pu, Calton; Ferreira, João E

    2011-12-22

    Recent medical and biological technology advances have stimulated the development of new testing systems that have been providing huge, varied amounts of molecular and clinical data. Growing data volumes pose significant challenges for information processing systems in research centers. Additionally, the routines of a genomics laboratory are typically characterized by high parallelism in testing and constant procedure changes. This paper describes a formal approach to address this challenge through the implementation of a genetic testing management system applied to a human genome laboratory. We introduce the Human Genome Research Center Information System (CEGH) in Brazil, a system that is able to support constant changes in human genome testing and can provide patients updated results based on the most recent and validated genetic knowledge. Our approach uses a common repository for process planning to ensure reusability, specification, instantiation, monitoring, and execution of processes, which are defined using a relational database and rigorous control flow specifications based on process algebra (ACP). The main difference between our approach and related works is that we were able to join two important aspects: 1) process scalability achieved through a relational database implementation, and 2) correctness of processes ensured through process algebra. Furthermore, the software allows end users to define genetic testing without requiring any knowledge of business process notation or process algebra. This paper presents the CEGH information system, a Laboratory Information Management System (LIMS) based on a formal framework to support genetic testing management for Mendelian disorder studies. We have proved the feasibility and shown the usability benefits of a rigorous approach that is able to specify, validate, and perform genetic testing using easy end-user interfaces.

  20. The Drosophila melanogaster PeptideAtlas facilitates the use of peptide data for improved fly proteomics and genome annotation

    Directory of Open Access Journals (Sweden)

    King Nichole L

    2009-02-01

    Full Text Available Background: Crucial foundations of any quantitative systems biology experiment are correct genome and proteome annotations. Protein databases compiled from high quality empirical protein identifications that are in turn based on correct gene models increase the correctness, sensitivity, and quantitative accuracy of systems biology genome-scale experiments. Results: In this manuscript, we present the Drosophila melanogaster PeptideAtlas, a fly proteomics and genomics resource of unsurpassed depth. Based on peptide mass spectrometry data collected in our laboratory, the portal http://www.drosophila-peptideatlas.org allows querying fly protein data observed with respect to gene model confirmation and splice site verification as well as for the identification of proteotypic peptides suited for targeted proteomics studies. Additionally, the database provides consensus mass spectra for observed peptides along with qualitative and quantitative information about the number of observations of a particular peptide and the sample(s) in which it was observed. Conclusion: PeptideAtlas is an open access database for the Drosophila community that has several features and applications that support (1) reduction of the complexity inherently associated with performing targeted proteomic studies, (2) designing and accelerating shotgun proteomics experiments, (3) confirming or questioning gene models, and (4) adjusting gene models such that they are in line with observed Drosophila peptides. While the database consists of proteomic data, it is not required that the user is a proteomics expert.

  1. Transient disruption of non-homologous end-joining facilitates targeted genome manipulations in the filamentous fungus Aspergillus nidulans

    DEFF Research Database (Denmark)

    Nielsen, Jakob Blæsbjerg; Nielsen, Michael Lynge; Mortensen, Uffe Hasbro

    2008-01-01

    influences subsequent analyses of the manipulated strain. Our system will facilitate construction of large numbers of defined mutations in A. nidulans. Moreover, as the system can likely be adapted to other filamentous fungi, we expect it will be particularly beneficial in species where NHEJ cannot...... be restored by sexual crossing. (c) 2007 Elsevier Inc. All rights reserved....

  2. Comparative genomics and prediction of conditionally dispensable sequences in legume-infecting Fusarium oxysporum formae speciales facilitates identification of candidate effectors.

    Science.gov (United States)

    Williams, Angela H; Sharma, Mamta; Thatcher, Louise F; Azam, Sarwar; Hane, James K; Sperschneider, Jana; Kidd, Brendan N; Anderson, Jonathan P; Ghosh, Raju; Garg, Gagan; Lichtenzveig, Judith; Kistler, H Corby; Shea, Terrance; Young, Sarah; Buck, Sally-Anne G; Kamphuis, Lars G; Saxena, Rachit; Pande, Suresh; Ma, Li-Jun; Varshney, Rajeev K; Singh, Karam B

    2016-03-05

    Soil-borne fungi of the Fusarium oxysporum species complex cause devastating wilt disease on many crops including legumes that supply human dietary protein needs across many parts of the globe. We present and compare draft genome assemblies for three legume-infecting formae speciales (ff. spp.): F. oxysporum f. sp. ciceris (Foc-38-1) and f. sp. pisi (Fop-37622), significant pathogens of chickpea and pea respectively, the world's second and third most important grain legumes, and lastly f. sp. medicaginis (Fom-5190a) for which we developed a model legume pathosystem utilising Medicago truncatula. Focusing on the identification of pathogenicity gene content, we leveraged the reference genomes of Fusarium pathogens F. oxysporum f. sp. lycopersici (tomato-infecting) and F. solani (pea-infecting) and their well-characterised core and dispensable chromosomes to predict genomic organisation in the newly sequenced legume-infecting isolates. Dispensable chromosomes are not essential for growth and in Fusarium species are known to be enriched in host-specificity and pathogenicity-associated genes. Comparative genomics of the publicly available Fusarium species revealed differential patterns of sequence conservation across F. oxysporum formae speciales, with legume-pathogenic formae speciales not exhibiting greater sequence conservation between them relative to non-legume-infecting formae speciales, possibly indicating the lack of a common ancestral source for legume pathogenicity. Combining predicted dispensable gene content with in planta expression in the model legume-infecting isolate, we identified small conserved regions and candidate effectors, four of which shared greatest similarity to proteins from another legume-infecting ff. spp. We demonstrate that distinction of core and potential dispensable genomic regions of novel F. oxysporum genomes is an effective tool to facilitate effector discovery and the identification of gene content possibly linked to host

  3. Genomes

    National Research Council Canada - National Science Library

    Brown, T. A. (Terence A.)

    2002-01-01

    ... of genome expression and replication processes, and transcriptomics and proteomics. This text is richly illustrated with clear, easy-to-follow, full color diagrams, which are downloadable from the book's website...

  4. Local exome sequences facilitate imputation of less common variants and increase power of genome wide association studies.

    Directory of Open Access Journals (Sweden)

    Peter K Joshi

    Full Text Available The analysis of less common variants in genome-wide association studies promises to elucidate complex trait genetics but is hampered by low power to reliably detect association. We show that addition of population-specific exome sequence data to global reference data allows more accurate imputation, particularly of less common SNPs (minor allele frequency 1-10%), in two very different European populations. The imputation improvement corresponds to an increase in effective sample size of 28-38%, for SNPs with a minor allele frequency in the range 1-3%.
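
    As a point of reference for the "effective sample size" phrasing (a standard relationship from the imputation literature, not a formula stated in this record): the power of an association test at an imputed SNP behaves roughly as if the sample size were scaled by the imputation quality, N_eff ≈ r² × N, where r² is the INFO-type measure of the squared correlation between imputed dosages and the unobserved true genotypes and N is the number of genotyped individuals. Under this approximation, raising the mean r² of less common SNPs from, say, 0.70 to 0.95 yields an N_eff gain of about 0.95/0.70 − 1 ≈ 36%, the same order as the 28-38% quoted above.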

  5. Urbanising Big

    DEFF Research Database (Denmark)

    Ljungwall, Christer

    2013-01-01

    Development in China raises the question of how big a city can become, and at the same time be sustainable, writes Christer Ljungwall of the Swedish Agency for Growth Policy Analysis.

  6. Big Argumentation?

    Directory of Open Access Journals (Sweden)

    Daniel Faltesek

    2013-08-01

    Full Text Available Big Data is nothing new. Public concern regarding the mass diffusion of data has appeared repeatedly with computing innovations, in the formation before Big Data it was most recently referred to as the information explosion. In this essay, I argue that the appeal of Big Data is not a function of computational power, but of a synergistic relationship between aesthetic order and a politics evacuated of a meaningful public deliberation. Understanding, and challenging, Big Data requires an attention to the aesthetics of data visualization and the ways in which those aesthetics would seem to depoliticize information. The conclusion proposes an alternative argumentative aesthetic as the appropriate response to the depoliticization posed by the popular imaginary of Big Data.

  7. Big data

    DEFF Research Database (Denmark)

    Madsen, Anders Koed; Flyverbom, Mikkel; Hilbert, Martin

    2016-01-01

    The claim that big data can revolutionize strategy and governance in the context of international relations is increasingly hard to ignore. Scholars of international political sociology have mainly discussed this development through the themes of security and surveillance. The aim of this paper is to outline a research agenda that can be used to raise a broader set of sociological and practice-oriented questions about the increasing datafication of international relations and politics. First, it proposes a way of conceptualizing big data that is broad enough to open fruitful investigations into the emerging use of big data in these contexts. This conceptualization includes the identification of three moments contained in any big data practice. Second, it suggests a research agenda built around a set of subthemes that each deserve dedicated scrutiny when studying the interplay between big data...

  8. EuGI: a novel resource for studying genomic islands to facilitate horizontal gene transfer detection in eukaryotes.

    Science.gov (United States)

    Clasen, Frederick Johannes; Pierneef, Rian Ewald; Slippers, Bernard; Reva, Oleg

    2018-05-03

    Genomic islands (GIs) are inserts of foreign DNA that have potentially arisen through horizontal gene transfer (HGT). There is evidence that GIs can contribute significantly to the evolution of prokaryotes. The acquisition of GIs through HGT in eukaryotes has, however, been largely unexplored. In this study, the previously developed GI prediction tool, SeqWord Gene Island Sniffer (SWGIS), is modified to predict GIs in eukaryotic chromosomes. Artificial simulations are used to estimate the rates of false positive and false negative GI predictions by inserting GIs into different test chromosomes and running the SWGIS v2.0 algorithm. Using SWGIS v2.0, GIs are then identified in 36 fungal, 22 protozoan and 8 invertebrate genomes. SWGIS v2.0 predicts GIs in large eukaryotic chromosomes based on the atypical nucleotide composition of these regions. The average false negative and false positive rates were 20.1% and 11.01%, respectively. A total of 10,550 GIs were identified in 66 eukaryotic species, with 5299 of these GIs coding for at least one functional protein. The EuGI web resource, freely accessible at http://eugi.bi.up.ac.za, was developed to allow browsing of the database created from the identified GIs and the genes within them through an interactive and visual interface. SWGIS v2.0, along with the EuGI database, which houses GIs identified in 66 different eukaryotic species, and the EuGI web resource, provides the first comprehensive resource for studying HGT in eukaryotes.
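
    The record states that SWGIS v2.0 flags islands by their atypical nucleotide composition. As a rough sketch of that general principle only (SWGIS itself relies on richer oligonucleotide-usage statistics, and the window and threshold values here are arbitrary), a sliding-window screen for compositional outliers might look like:

        # Rough sketch of compositional screening for genomic islands: flag windows
        # whose GC content deviates strongly from the chromosome-wide average.
        def gc_fraction(seq):
            return (seq.count("G") + seq.count("C")) / max(len(seq), 1)

        def atypical_windows(chrom, window=10_000, step=5_000, z_cut=3.0):
            starts = range(0, max(len(chrom) - window, 0) + 1, step)
            vals = [(i, gc_fraction(chrom[i:i + window])) for i in starts]
            mean = sum(v for _, v in vals) / len(vals)
            sd = (sum((v - mean) ** 2 for _, v in vals) / len(vals)) ** 0.5 or 1e-9
            return [(i, i + window, v) for i, v in vals if abs(v - mean) / sd > z_cut]

        # Usage on a hypothetical chromosome file holding a plain A/C/G/T string:
        # chrom = open("chromosome_1.txt").read().strip().upper()
        # for start, end, gc in atypical_windows(chrom):
        #     print(f"candidate island {start}-{end}: GC = {gc:.2f}")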

  9. How Big Is Too Big?

    Science.gov (United States)

    Cibes, Margaret; Greenwood, James

    2016-01-01

    Media Clips appears in every issue of Mathematics Teacher, offering readers contemporary, authentic applications of quantitative reasoning based on print or electronic media. This issue features "How Big is Too Big?" (Margaret Cibes and James Greenwood) in which students are asked to analyze the data and tables provided and answer a…

  10. Genome-wide comparison of paired fresh frozen and formalin-fixed paraffin-embedded gliomas by custom BAC and oligonucleotide array comparative genomic hybridization: facilitating analysis of archival gliomas.

    Science.gov (United States)

    Mohapatra, Gayatry; Engler, David A; Starbuck, Kristen D; Kim, James C; Bernay, Derek C; Scangas, George A; Rousseau, Audrey; Batchelor, Tracy T; Betensky, Rebecca A; Louis, David N

    2011-04-01

    Array comparative genomic hybridization (aCGH) is a powerful tool for detecting DNA copy number alterations (CNA). Because diffuse malignant gliomas are often sampled by small biopsies, formalin-fixed paraffin-embedded (FFPE) blocks are often the only tissue available for genetic analysis; FFPE tissues are also needed to study the intratumoral heterogeneity that characterizes these neoplasms. In this paper, we present a combination of evaluations and technical advances that provide strong support for the ready use of oligonucleotide aCGH on FFPE diffuse gliomas. We first compared aCGH using bacterial artificial chromosome (BAC) arrays in 45 paired frozen and FFPE gliomas, and demonstrate a high concordance rate between FFPE and frozen DNA in an individual clone-level analysis of sensitivity and specificity, assuring that under certain array conditions, frozen and FFPE DNA can perform nearly identically. However, because oligonucleotide arrays offer advantages to BAC arrays in genomic coverage and practical availability, we next developed a method of labeling DNA from FFPE tissue that allows efficient hybridization to oligonucleotide arrays. To demonstrate utility in FFPE tissues, we applied this approach to biphasic anaplastic oligoastrocytomas and demonstrate CNA differences between DNA obtained from the two components. Therefore, BAC and oligonucleotide aCGH can be sensitive and specific tools for detecting CNAs in FFPE DNA, and novel labeling techniques enable the routine use of oligonucleotide arrays for FFPE DNA. In combination, these advances should facilitate genome-wide analysis of rare, small and/or histologically heterogeneous gliomas from FFPE tissues.

  11. MARRVEL: Integration of Human and Model Organism Genetic Resources to Facilitate Functional Annotation of the Human Genome.

    Science.gov (United States)

    Wang, Julia; Al-Ouran, Rami; Hu, Yanhui; Kim, Seon-Young; Wan, Ying-Wooi; Wangler, Michael F; Yamamoto, Shinya; Chao, Hsiao-Tuan; Comjean, Aram; Mohr, Stephanie E; Perrimon, Norbert; Liu, Zhandong; Bellen, Hugo J

    2017-06-01

    One major challenge encountered with interpreting human genetic variants is the limited understanding of the functional impact of genetic alterations on biological processes. Furthermore, there remains an unmet demand for an efficient survey of the wealth of information on human homologs in model organisms across numerous databases. To efficiently assess the large volume of publically available information, it is important to provide a concise summary of the most relevant information in a rapid user-friendly format. To this end, we created MARRVEL (model organism aggregated resources for rare variant exploration). MARRVEL is a publicly available website that integrates information from six human genetic databases and seven model organism databases. For any given variant or gene, MARRVEL displays information from OMIM, ExAC, ClinVar, Geno2MP, DGV, and DECIPHER. Importantly, it curates model organism-specific databases to concurrently display a concise summary regarding the human gene homologs in budding and fission yeast, worm, fly, fish, mouse, and rat on a single webpage. Experiment-based information on tissue expression, protein subcellular localization, biological process, and molecular function for the human gene and homologs in the seven model organisms are arranged into a concise output. Hence, rather than visiting multiple separate databases for variant and gene analysis, users can obtain important information by searching once through MARRVEL. Altogether, MARRVEL dramatically improves efficiency and accessibility to data collection and facilitates analysis of human genes and variants by cross-disciplinary integration of 18 million records available in public databases to facilitate clinical diagnosis and basic research. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  12. Private and Efficient Query Processing on Outsourced Genomic Databases.

    Science.gov (United States)

    Ghasemi, Reza; Al Aziz, Md Momin; Mohammed, Noman; Dehkordi, Massoud Hadian; Jiang, Xiaoqian

    2017-09-01

    Applications of genomic studies are spreading rapidly in many domains of science and technology, such as healthcare, biomedical research, direct-to-consumer services, and legal and forensic work. However, a number of obstacles make it hard to access and process a big genomic database for these applications. First, sequencing a genome is a time-consuming and expensive process. Second, processing genomic sequences requires large-scale computation and storage systems. Third, genomic databases are often owned by different organizations and are thus not available for public use. The cloud computing paradigm can be leveraged to facilitate the creation and sharing of big genomic databases for these applications. Genomic data owners can outsource their databases to a centralized cloud server to ease access to their databases. However, data owners are reluctant to adopt this model, as it requires outsourcing the data to an untrusted cloud service provider and may lead to data breaches. In this paper, we propose a privacy-preserving model for outsourcing genomic data to a cloud. The proposed model enables query processing while providing privacy protection of genomic databases. Privacy of the individuals is guaranteed by permuting the database and adding fake genomic records to it. These techniques allow the cloud to evaluate count and top-k queries securely and efficiently. Experimental results demonstrate that a count and a top-k query over 40 Single Nucleotide Polymorphisms (SNPs) in a database of 20,000 records take around 100 and 150 s, respectively.
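
    As a toy illustration only of the permute-and-pad idea mentioned in this record (the paper's actual protocol adds further cryptographic protections that are not reproduced here, and all names and record formats below are hypothetical), a count query could be answered as follows: the owner appends fake records and shuffles before outsourcing, the untrusted cloud counts matches, and the owner locally subtracts the contribution of the fakes it injected.

        # Toy illustration of permuting and padding a genomic database before
        # outsourcing a count query. Not the paper's actual protocol.
        import random

        def prepare(real_db, n_fake, n_snps=5, seed=0):
            """Data owner: append fake SNP records and shuffle before outsourcing."""
            rng = random.Random(seed)
            fakes = ["".join(rng.choice("ACGT") for _ in range(n_snps))
                     for _ in range(n_fake)]
            outsourced = list(real_db) + fakes
            rng.shuffle(outsourced)
            return outsourced, fakes

        def cloud_count(outsourced, predicate):
            """Untrusted cloud: count outsourced records matching a query predicate."""
            return sum(1 for record in outsourced if predicate(record))

        real_db = ["ACGTA", "ACGTT", "TTGCA"]        # hypothetical 5-SNP records
        outsourced, fakes = prepare(real_db, n_fake=4)
        query = lambda record: record[0] == "A"      # e.g. allele A at the first SNP

        # The owner keeps the fakes, so the true count is recovered locally:
        true_count = cloud_count(outsourced, query) - sum(query(f) for f in fakes)
        print(true_count)                            # prints 2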

  13. Identifying Health Information Technology Needs of Oncologists to Facilitate the Adoption of Genomic Medicine: Recommendations From the 2016 American Society of Clinical Oncology Omics and Precision Oncology Workshop.

    Science.gov (United States)

    Hughes, Kevin S; Ambinder, Edward P; Hess, Gregory P; Yu, Peter Paul; Bernstam, Elmer V; Routbort, Mark J; Clemenceau, Jean Rene; Hamm, John T; Febbo, Phillip G; Domchek, Susan M; Chen, James L; Warner, Jeremy L

    2017-09-20

    At the ASCO Data Standards and Interoperability Summit held in May 2016, it was unanimously decided that four areas of current oncology clinical practice have serious, unmet health information technology needs. The following areas of need were identified: 1) omics and precision oncology, 2) advancing interoperability, 3) patient engagement, and 4) value-based oncology. To begin to address these issues, ASCO convened two complementary workshops: the Omics and Precision Oncology Workshop in October 2016 and the Advancing Interoperability Workshop in December 2016. A common goal was to address the complexity, enormity, and rapidly changing nature of genomic information, which existing electronic health records are ill equipped to manage. The subject matter experts invited to the Omics and Precision Oncology Workgroup were tasked with the responsibility of determining a specific, limited need that could be addressed by a software application (app) in the short-term future, using currently available genomic knowledge bases. Hence, the scope of this workshop was to determine the basic functionality of one app that could serve as a test case for app development. The goal of the second workshop, described separately, was to identify the specifications for such an app. This approach was chosen both to facilitate the development of a useful app and to help ASCO and oncologists better understand the mechanics, difficulties, and gaps in genomic clinical decision support tool development. In this article, we discuss the key challenges and recommendations identified by the workshop participants. Our hope is to narrow the gap between the practicing oncologist and ongoing national efforts to provide precision oncology and value-based care to cancer patients.

  14. Big Data and medicine: a big deal?

    Science.gov (United States)

    Mayer-Schönberger, V; Ingelsson, E

    2018-05-01

    Big Data promises huge benefits for medical research. Looking beyond superficial increases in the amount of data collected, we identify three key areas where Big Data differs from conventional analyses of data samples: (i) data are captured more comprehensively relative to the phenomenon under study; this reduces some bias but surfaces important trade-offs, such as between data quantity and data quality; (ii) data are often analysed using machine learning tools, such as neural networks rather than conventional statistical methods resulting in systems that over time capture insights implicit in data, but remain black boxes, rarely revealing causal connections; and (iii) the purpose of the analyses of data is no longer simply answering existing questions, but hinting at novel ones and generating promising new hypotheses. As a consequence, when performed right, Big Data analyses can accelerate research. Because Big Data approaches differ so fundamentally from small data ones, research structures, processes and mindsets need to adjust. The latent value of data is being reaped through repeated reuse of data, which runs counter to existing practices not only regarding data privacy, but data management more generally. Consequently, we suggest a number of adjustments such as boards reviewing responsible data use, and incentives to facilitate comprehensive data sharing. As data's role changes to a resource of insight, we also need to acknowledge the importance of collecting and making data available as a crucial part of our research endeavours, and reassess our formal processes from career advancement to treatment approval. © 2017 The Association for the Publication of the Journal of Internal Medicine.

  15. Big Surveys, Big Data Centres

    Science.gov (United States)

    Schade, D.

    2016-06-01

    Well-designed astronomical surveys are powerful and have consistently been keystones of scientific progress. The Byurakan Surveys using a Schmidt telescope with an objective prism produced a list of about 3000 UV-excess Markarian galaxies but these objects have stimulated an enormous amount of further study and appear in over 16,000 publications. The CFHT Legacy Surveys used a wide-field imager to cover thousands of square degrees and those surveys are mentioned in over 1100 publications since 2002. Both ground and space-based astronomy have been increasing their investments in survey work. Survey instrumentation strives toward fair samples and large sky coverage and therefore strives to produce massive datasets. Thus we are faced with the "big data" problem in astronomy. Survey datasets require specialized approaches to data management. Big data places additional challenging requirements for data management. If the term "big data" is defined as data collections that are too large to move then there are profound implications for the infrastructure that supports big data science. The current model of data centres is obsolete. In the era of big data the central problem is how to create architectures that effectively manage the relationship between data collections, networks, processing capabilities, and software, given the science requirements of the projects that need to be executed. A stand alone data silo cannot support big data science. I'll describe the current efforts of the Canadian community to deal with this situation and our successes and failures. I'll talk about how we are planning in the next decade to try to create a workable and adaptable solution to support big data science.

  16. Investigating core genetic-and-epigenetic cell cycle networks for stemness and carcinogenic mechanisms, and cancer drug design using big database mining and genome-wide next-generation sequencing data.

    Science.gov (United States)

    Li, Cheng-Wei; Chen, Bor-Sen

    2016-10-01

    Recent studies have demonstrated that cell cycle plays a central role in development and carcinogenesis. Thus, the use of big databases and genome-wide high-throughput data to unravel the genetic and epigenetic mechanisms underlying cell cycle progression in stem cells and cancer cells is a matter of considerable interest. Real genetic-and-epigenetic cell cycle networks (GECNs) of embryonic stem cells (ESCs) and HeLa cancer cells were constructed by applying system modeling, system identification, and big database mining to genome-wide next-generation sequencing data. Real GECNs were then reduced to core GECNs of HeLa cells and ESCs by applying principal genome-wide network projection. In this study, we investigated potential carcinogenic and stemness mechanisms for systems cancer drug design by identifying common core and specific GECNs between HeLa cells and ESCs. Integrating drug database information with the specific GECNs of HeLa cells could lead to identification of multiple drugs for cervical cancer treatment with minimal side-effects on the genes in the common core. We found that dysregulation of miR-29C, miR-34A, miR-98, and miR-215; and methylation of ANKRD1, ARID5B, CDCA2, PIF1, STAMBPL1, TROAP, ZNF165, and HIST1H2AJ in HeLa cells could result in cell proliferation and anti-apoptosis through NFκB, TGF-β, and PI3K pathways. We also identified 3 drugs, methotrexate, quercetin, and mimosine, which repressed the activated cell cycle genes, ARID5B, STK17B, and CCL2, in HeLa cells with minimal side-effects.

  17. Big Opportunities and Big Concerns of Big Data in Education

    Science.gov (United States)

    Wang, Yinying

    2016-01-01

    Against the backdrop of the ever-increasing influx of big data, this article examines the opportunities and concerns over big data in education. Specifically, this article first introduces big data, followed by delineating the potential opportunities of using big data in education in two areas: learning analytics and educational policy. Then, the…

  18. Big Dreams

    Science.gov (United States)

    Benson, Michael T.

    2015-01-01

    The Keen Johnson Building is symbolic of Eastern Kentucky University's historic role as a School of Opportunity. It is a place that has inspired generations of students, many from disadvantaged backgrounds, to dream big dreams. The construction of the Keen Johnson Building was inspired by a desire to create a student union facility that would not…

  19. Big Science

    Energy Technology Data Exchange (ETDEWEB)

    Anon.

    1986-05-15

    Astronomy, like particle physics, has become Big Science where the demands of front line research can outstrip the science budgets of whole nations. Thus came into being the European Southern Observatory (ESO), founded in 1962 to provide European scientists with a major modern observatory to study the southern sky under optimal conditions.

  20. The complete mitochondrial genome of the cryptic "lineage B" big-fin reef squid, Sepioteuthis lessoniana (Cephalopoda: Loliginidae) in Indo-West Pacific.

    Science.gov (United States)

    Shen, Kang-Ning; Yen, Ta-Chi; Chen, Ching-Hung; Ye, Jeng-Jia; Hsiao, Chung-Der

    2016-05-01

    In this study, the complete mitogenome sequence of the cryptic "lineage B" big-fin reef squid, Sepioteuthis lessoniana (Cephalopoda: Loliginidae), has been sequenced by a next-generation sequencing method. The assembled mitogenome, consisting of 16,694 bp, includes 13 protein-coding genes, 25 transfer RNA genes and 2 ribosomal RNA genes. The overall base composition of "lineage B" S. lessoniana is 36.7% for A, 18.9% for C, 34.5% for T and 9.8% for G, and the sequence shows 90% identity to "lineage C" S. lessoniana. It also exhibits a high T + A content (71.2%) and two non-coding regions with TA tandem repeats. The complete mitogenome of the cryptic "lineage B" S. lessoniana provides essential and important DNA molecular data for further phylogeographic and evolutionary analysis of the big-fin reef squid species complex.

  1. The complete mitochondrial genome of the cryptic "lineage A" big-fin reef squid, Sepioteuthis lessoniana (Cephalopoda: Loliginidae) in Indo-West Pacific.

    Science.gov (United States)

    Hsiao, Chung-Der; Shen, Kang-Ning; Ching, Tzu-Yun; Wang, Ya-Hsien; Ye, Jeng-Jia; Tsai, Shiou-Yi; Wu, Shan-Chun; Chen, Ching-Hung; Wang, Chia-Hui

    2016-07-01

    In this study, the complete mitogenome sequence of the cryptic "lineage A" big-fin reef squid, Sepioteuthis lessoniana (Cephalopoda: Loliginidae), has been sequenced by the next-generation sequencing method. The assembled mitogenome consists of 16,605 bp and includes 13 protein-coding genes, 22 transfer RNA genes, and 2 ribosomal RNA genes. The overall base composition of "lineage A" S. lessoniana is 37.5% for A, 17.4% for C, 9.1% for G, and 35.9% for T, and the sequence shows 87% identity to "lineage C" S. lessoniana. It is also notable for its high T + A content (73.4%) and two non-coding regions with TA tandem repeats. The complete mitogenome of the cryptic "lineage A" S. lessoniana provides essential and important DNA molecular data for further phylogeographic and evolutionary analysis of the big-fin reef squid species complex.
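
    Both mitogenome records above report base composition and T + A content. As a minimal sketch of how such figures are derived from an assembled sequence (the FASTA file name below is hypothetical):

        # Minimal sketch: base composition and T + A content from a mitogenome FASTA.
        from collections import Counter

        with open("S_lessoniana_mitogenome.fasta") as fh:
            seq = "".join(line.strip() for line in fh if not line.startswith(">")).upper()

        counts = Counter(seq)
        total = sum(counts[b] for b in "ACGT")
        for base in "ACGT":
            print(f"{base}: {100 * counts[base] / total:.1f}%")
        print(f"T + A content: {100 * (counts['T'] + counts['A']) / total:.1f}%")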

  2. Transforming Big Data into cancer-relevant insight: An initial, multi-tier approach to assess reproducibility and relevance* | Office of Cancer Genomics

    Science.gov (United States)

    The Cancer Target Discovery and Development (CTD^2) Network was established to accelerate the transformation of "Big Data" into novel pharmacological targets, lead compounds, and biomarkers for rapid translation into improved patient outcomes. It rapidly became clear in this collaborative network that a key central issue was to define what constitutes sufficient computational or experimental evidence to support a biologically or clinically relevant finding.

  3. Big Egos in Big Science

    DEFF Research Database (Denmark)

    Andersen, Kristina Vaarst; Jeppesen, Jacob

    In this paper we investigate the micro-mechanisms governing structural evolution and performance of scientific collaboration. Scientific discovery tends not to be led by so-called lone 'stars', or big egos, but instead by collaboration among groups of researchers, from a multitude of institutions...

  4. Big Data and Big Science

    OpenAIRE

    Di Meglio, Alberto

    2014-01-01

    Brief introduction to the challenges of big data in scientific research based on the work done by the HEP community at CERN and how the CERN openlab promotes collaboration among research institutes and industrial IT companies. Presented at the FutureGov 2014 conference in Singapore.

  5. The BIG Data Center: from deposition to integration to translation.

    Science.gov (United States)

    2017-01-04

    Biological data are generated at unprecedentedly exponential rates, posing considerable challenges in big data deposition, integration and translation. The BIG Data Center, established at Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, provides a suite of database resources, including (i) Genome Sequence Archive, a data repository specialized for archiving raw sequence reads, (ii) Gene Expression Nebulas, a data portal of gene expression profiles based entirely on RNA-Seq data, (iii) Genome Variation Map, a comprehensive collection of genome variations for featured species, (iv) Genome Warehouse, a centralized resource housing genome-scale data with particular focus on economically important animals and plants, (v) Methylation Bank, an integrated database of whole-genome single-base resolution methylomes and (vi) Science Wikis, a central access point for biological wikis developed for community annotations. The BIG Data Center is dedicated to constructing and maintaining biological databases through big data integration and value-added curation, conducting basic research to translate big data into big knowledge and providing freely open access to a variety of data resources in support of worldwide research activities in both academia and industry. All of these resources are publicly available and can be found at http://bigd.big.ac.cn. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  6. Big inquiry

    Energy Technology Data Exchange (ETDEWEB)

    Wynne, B [Lancaster Univ. (UK)

    1979-06-28

    The recently published report entitled 'The Big Public Inquiry' from the Council for Science and Society and the Outer Circle Policy Unit is considered, with especial reference to any future enquiry which may take place into the first commercial fast breeder reactor. Proposals embodied in the report include stronger rights for objectors and an attempt is made to tackle the problem that participation in a public inquiry is far too late to be objective. It is felt by the author that the CSS/OCPU report is a constructive contribution to the debate about big technology inquiries but that it fails to understand the deeper currents in the economic and political structure of technology which so influence the consequences of whatever formal procedures are evolved.

  7. Big Data

    OpenAIRE

    Bútora, Matúš

    2017-01-01

    The aim of this bachelor thesis is to describe the Big Data domain and the OLAP aggregation operations for decision support that are applied to it using the Apache Hadoop technology. The majority of the thesis is devoted to describing this technology. The final chapter deals with how the aggregation operations are applied and the issues involved in implementing them. An overall evaluation of the work and possible future uses of the resulting system follow.

  8. GrabBlur--a framework to facilitate the secure exchange of whole-exome and -genome SNV data using VCF files.

    Science.gov (United States)

    Stade, Björn; Seelow, Dominik; Thomsen, Ingo; Krawczak, Michael; Franke, Andre

    2014-01-01

    Next Generation Sequencing (NGS) of whole exomes or genomes is increasingly being used in human genetic research and diagnostics. Sharing NGS data with third parties can help physicians and researchers to identify causative or predisposing mutations for a specific sample of interest more efficiently. In many cases, however, the exchange of such data may collide with data privacy regulations. GrabBlur is a newly developed tool to aggregate and share NGS-derived single nucleotide variant (SNV) data in a public database, keeping individual samples unidentifiable. In contrast to other currently existing SNV databases, GrabBlur includes phenotypic information and contact details of the submitter of a given database entry. By means of GrabBlur human geneticists can securely and easily share SNV data from resequencing projects. GrabBlur can ease the interpretation of SNV data by offering basic annotations, genotype frequencies and in particular phenotypic information - given that this information was shared - for the SNV of interest. GrabBlur facilitates the combination of phenotypic and NGS data (VCF files) via a local interface or command line operations. Data submissions may include HPO (Human Phenotype Ontology) terms, other trait descriptions, NGS technology information and the identity of the submitter. Most of this information is optional and its provision at the discretion of the submitter. Upon initial intake, GrabBlur merges and aggregates all sample-specific data. If a certain SNV is rare, the sample-specific information is replaced with the submitter identity. Generally, all data in GrabBlur are highly aggregated so that they can be shared with others while ensuring maximum privacy. Thus, it is impossible to reconstruct complete exomes or genomes from the database or to re-identify single individuals. After the individual information has been sufficiently "blurred", the data can be uploaded into a publicly accessible domain where aggregated genotypes are
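
    As a rough illustration of the aggregate-and-blur idea described in this abstract (pooling per-sample VCF genotypes and, for rare variants, replacing sample-level detail with only the submitter's identity), a minimal Python sketch follows. It is not the actual GrabBlur implementation; the rarity threshold, field handling and sample names are invented for the example.

```python
# Minimal sketch of the aggregate-and-blur idea described above -- NOT the
# actual GrabBlur implementation. Genotypes are pooled per SNV from a
# simplified VCF body; for rare variants, sample-level detail is replaced
# by the submitter identity only. Threshold and field names are invented.
from collections import defaultdict

RARITY_THRESHOLD = 5  # assumed cut-off; the real tool's criterion may differ

def aggregate_snvs(vcf_body_lines, submitter):
    """Pool carrier counts per variant from tab-separated VCF body lines."""
    table = defaultdict(lambda: {"carriers": 0, "samples": []})
    for line in vcf_body_lines:
        if line.startswith("#"):
            continue  # skip header lines
        chrom, pos, _vid, ref, alt, *rest = line.rstrip("\n").split("\t")
        genotypes = rest[4:]  # sample columns follow QUAL, FILTER, INFO, FORMAT
        key = (chrom, pos, ref, alt)
        for i, gt in enumerate(genotypes):
            if gt.split(":")[0] not in ("0/0", "0|0", "./."):
                table[key]["carriers"] += 1
                table[key]["samples"].append(f"sample_{i}")
    # "Blur": rare variants keep only aggregate counts plus the submitter.
    for record in table.values():
        if record["carriers"] < RARITY_THRESHOLD:
            record["samples"] = [submitter]
    return table

# Two fake VCF body lines with two samples each, for illustration only:
demo = [
    "1\t12345\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t0/0",
    "1\t67890\t.\tC\tT\t99\tPASS\t.\tGT\t1/1\t0/1",
]
print(dict(aggregate_snvs(demo, submitter="lab_xyz@example.org")))
```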

  9. BIG DATA

    OpenAIRE

    Abhishek Dubey

    2018-01-01

    The term 'Big Data' describes innovative methods and technologies to capture, store, distribute, manage and analyze petabyte-scale or larger data sets with high velocity and diverse structures. Big data can be structured, semi-structured or unstructured, rendering conventional data management techniques inadequate. Data are generated from many different sources and can arrive in the system at varying rates. In order to handle this...

  10. Keynote: Big Data, Big Opportunities

    OpenAIRE

    Borgman, Christine L.

    2014-01-01

    The enthusiasm for big data is obscuring the complexity and diversity of data in scholarship and the challenges for stewardship. Inside the black box of data are a plethora of research, technology, and policy issues. Data are not shiny objects that are easily exchanged. Rather, data are representations of observations, objects, or other entities used as evidence of phenomena for the purposes of research or scholarship. Data practices are local, varying from field to field, individual to indiv...

  11. Big Data Analytics in Healthcare.

    Science.gov (United States)

    Belle, Ashwin; Thiagarajan, Raghuram; Soroushmehr, S M Reza; Navidi, Fatemeh; Beard, Daniel A; Najarian, Kayvan

    2015-01-01

    The rapidly expanding field of big data analytics has started to play a pivotal role in the evolution of healthcare practices and research. It has provided tools to accumulate, manage, analyze, and assimilate large volumes of disparate, structured, and unstructured data produced by current healthcare systems. Big data analytics has been recently applied towards aiding the process of care delivery and disease exploration. However, the adoption rate and research development in this space are still hindered by some fundamental problems inherent within the big data paradigm. In this paper, we discuss some of these major challenges with a focus on three upcoming and promising areas of medical research: image, signal, and genomics based analytics. Recent research which targets utilization of large volumes of medical data while combining multimodal data from disparate sources is discussed. Potential areas of research within this field which have the ability to provide meaningful impact on healthcare delivery are also examined.

  12. Databases and web tools for cancer genomics study.

    Science.gov (United States)

    Yang, Yadong; Dong, Xunong; Xie, Bingbing; Ding, Nan; Chen, Juan; Li, Yongjun; Zhang, Qian; Qu, Hongzhu; Fang, Xiangdong

    2015-02-01

    Publicly-accessible resources have promoted the advance of scientific discovery. The era of genomics and big data has brought the need for collaboration and data sharing in order to make effective use of this new knowledge. Here, we describe the web resources for cancer genomics research and rate them on the basis of the diversity of cancer types, sample size, omics data comprehensiveness, and user experience. The resources reviewed include data repositories and analysis tools, and we hope this introduction will promote awareness and facilitate the usage of these resources in the cancer research community. Copyright © 2015 The Authors. Production and hosting by Elsevier Ltd. All rights reserved.

  13. Networking for big data

    CERN Document Server

    Yu, Shui; Misic, Jelena; Shen, Xuemin (Sherman)

    2015-01-01

    Networking for Big Data supplies an unprecedented look at cutting-edge research on the networking and communication aspects of Big Data. Starting with a comprehensive introduction to Big Data and its networking issues, it offers deep technical coverage of both theory and applications.The book is divided into four sections: introduction to Big Data, networking theory and design for Big Data, networking security for Big Data, and platforms and systems for Big Data applications. Focusing on key networking issues in Big Data, the book explains network design and implementation for Big Data. It exa

  14. Big Data

    DEFF Research Database (Denmark)

    Aaen, Jon; Nielsen, Jeppe Agger

    2016-01-01

    Big Data presents itself as one of the most hyped technological innovations of our time, proclaimed to contain the seeds of new, valuable operational insights for private companies and public organizations. While the optimistic announcements are many, research on Big Data in the public sector has so far been limited. This article examines how the public health sector can reuse and exploit an ever-growing amount of data while taking public values into account. The article builds on a case study of the use of large amounts of health data in the Danish General Practice Database (DAMD). The analysis shows that (re)use of data in new contexts is a multifaceted trade-off not only between economic rationales and quality considerations, but also control over sensitive personal data and ethical implications for the citizen. In the DAMD case, data are used on the one hand 'in the service of a good cause' to...

  15. Big data analytics turning big data into big money

    CERN Document Server

    Ohlhorst, Frank J

    2012-01-01

    Unique insights to implement big data analytics and reap big returns to your bottom line Focusing on the business and financial value of big data analytics, respected technology journalist Frank J. Ohlhorst shares his insights on the newly emerging field of big data analytics in Big Data Analytics. This breakthrough book demonstrates the importance of analytics, defines the processes, highlights the tangible and intangible values and discusses how you can turn a business liability into actionable material that can be used to redefine markets, improve profits and identify new business opportuni

  16. Big data analytics methods and applications

    CERN Document Server

    Rao, BLS; Rao, SB

    2016-01-01

    This book has a collection of articles written by Big Data experts to describe some of the cutting-edge methods and applications from their respective areas of interest, and provides the reader with a detailed overview of the field of Big Data Analytics as it is practiced today. The chapters cover technical aspects of key areas that generate and use Big Data such as management and finance; medicine and healthcare; genome, cytome and microbiome; graphs and networks; Internet of Things; Big Data standards; bench-marking of systems; and others. In addition to different applications, key algorithmic approaches such as graph partitioning, clustering and finite mixture modelling of high-dimensional data are also covered. The varied collection of themes in this volume introduces the reader to the richness of the emerging field of Big Data Analytics.

  17. Big Data Application in Biomedical Research and Health Care: A Literature Review.

    Science.gov (United States)

    Luo, Jake; Wu, Min; Gopukumar, Deepika; Zhao, Yiqing

    2016-01-01

    Big data technologies are increasingly used for biomedical and health-care informatics research. Large amounts of biological and clinical data have been generated and collected at an unprecedented speed and scale. For example, the new generation of sequencing technologies enables the processing of billions of DNA sequence data per day, and the application of electronic health records (EHRs) is documenting large amounts of patient data. The cost of acquiring and analyzing biomedical data is expected to decrease dramatically with the help of technology upgrades, such as the emergence of new sequencing machines, the development of novel hardware and software for parallel computing, and the extensive expansion of EHRs. Big data applications present new opportunities to discover new knowledge and create novel methods to improve the quality of health care. The application of big data in health care is a fast-growing field, with many new discoveries and methodologies published in the last five years. In this paper, we review and discuss big data application in four major biomedical subdisciplines: (1) bioinformatics, (2) clinical informatics, (3) imaging informatics, and (4) public health informatics. Specifically, in bioinformatics, high-throughput experiments facilitate the research of new genome-wide association studies of diseases, and with clinical informatics, the clinical field benefits from the vast amount of collected patient data for making intelligent decisions. Imaging informatics is now more rapidly integrated with cloud platforms to share medical image data and workflows, and public health informatics leverages big data techniques for predicting and monitoring infectious disease outbreaks, such as Ebola. In this paper, we review the recent progress and breakthroughs of big data applications in these health-care domains and summarize the challenges, gaps, and opportunities to improve and advance big data applications in health care.

  18. The big bang of genome editing technology: development and application of the CRISPR/Cas9 system in disease animal models

    Science.gov (United States)

    SHAO, Ming; XU, Tian-Rui; CHEN, Ce-Shi

    2016-01-01

    Targeted genome editing technology has been widely used in biomedical studies. The CRISPR-associated RNA-guided endonuclease Cas9 has become a versatile genome editing tool. The CRISPR/Cas9 system is useful for studying gene function through efficient knock-out, knock-in or chromatin modification of the targeted gene loci in various cell types and organisms. It can be applied in a number of fields, such as genetic breeding, disease treatment and gene functional investigation. In this review, we introduce the most recent developments and applications, the challenges, and future directions of Cas9 in generating disease animal models. Derived from the CRISPR adaptive immune system of bacteria, the development trend of Cas9 will inevitably fuel the vital applications from basic research to biotechnology and biomedicine. PMID:27469250

  19. The big bang of genome editing technology: development and application of the CRISPR/Cas9 system in disease animal models.

    Science.gov (United States)

    Shao, Ming; Xu, Tian-Rui; Chen, Ce-Shi

    2016-07-18

    Targeted genome editing technology has been widely used in biomedical studies. The CRISPR-associated RNA-guided endonuclease Cas9 has become a versatile genome editing tool. The CRISPR/Cas9 system is useful for studying gene function through efficient knock-out, knock-in or chromatin modification of the targeted gene loci in various cell types and organisms. It can be applied in a number of fields, such as genetic breeding, disease treatment and gene functional investigation. In this review, we introduce the most recent developments and applications, the challenges, and future directions of Cas9 in generating disease animal models. Derived from the CRISPR adaptive immune system of bacteria, the development trend of Cas9 will inevitably fuel the vital applications from basic research to biotechnology and biomedicine.

  20. Harnessing NGS and Big Data Optimally: Comparison of miRNA Prediction from Assembled versus Non-assembled Sequencing Data--The Case of the Grass Aegilops tauschii Complex Genome.

    Science.gov (United States)

    Budak, Hikmet; Kantar, Melda

    2015-07-01

    MicroRNAs (miRNAs) are small, endogenous, non-coding RNA molecules that regulate gene expression at the post-transcriptional level. As high-throughput next generation sequencing (NGS) and Big Data rapidly accumulate for various species, efforts for in silico identification of miRNAs intensify. Surprisingly, the effect of the input genomic sequence on the robustness of miRNA prediction has not been evaluated in detail to date. In the present study, we performed a homology-based miRNA and isomiRNA prediction of the 5D chromosome of bread wheat progenitor, Aegilops tauschii, using two distinct sequence data sets as input: (1) raw sequence reads obtained from the 454-GS FLX Titanium sequencing platform and (2) an assembly constructed from these reads. We also compared this method with a number of available plant sequence datasets. We report here the identification of 62 and 22 miRNAs from raw reads and the assembly, respectively, of which 16 were predicted with high confidence from both datasets. While raw reads promoted sensitivity with the high number of miRNAs predicted, 55% (12 out of 22) of the assembly-based predictions were supported by previous observations, bringing specificity forward compared to the read-based predictions, of which only 37% were supported. Importantly, raw reads could identify several repeat-related miRNAs that could not be detected with the assembly. However, raw reads could not capture 6 miRNAs, for which the stem-loops could only be covered by the relatively longer sequences from the assembly. In summary, the comparison of miRNA datasets obtained by these two strategies revealed that utilization of raw reads, as well as assemblies, for in silico prediction has distinct advantages and disadvantages. Consideration of these important nuances can benefit future miRNA identification efforts in the current age of NGS and Big Data driven life sciences innovation.
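
    The sensitivity/specificity comparison reported above reduces to simple set arithmetic over the two prediction lists. The toy Python sketch below mimics the quoted counts (62 read-based and 22 assembly-based predictions, 16 shared, roughly 37% versus 55% support); the miRNA identifiers and the 'previously observed' reference set are entirely made up for illustration.

```python
# Toy set arithmetic mirroring the counts quoted above (62 read-based and
# 22 assembly-based predictions, 16 shared, ~37% vs ~55% support). The
# miRNA identifiers and the reference set are entirely made up.
read_based     = {f"miR-{i}" for i in range(1, 63)}   # 62 predictions
assembly_based = {f"miR-{i}" for i in range(47, 69)}  # 22 predictions
previously_observed = {f"miR-{i}" for i in list(range(1, 18)) + list(range(57, 69))}

def support_rate(predictions, reference):
    """Fraction of predictions corroborated by the reference set."""
    return len(predictions & reference) / len(predictions)

print(f"shared predictions:     {len(read_based & assembly_based)}")
print(f"read-based support:     {support_rate(read_based, previously_observed):.0%}")
print(f"assembly-based support: {support_rate(assembly_based, previously_observed):.0%}")
```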

  1. Low-pass shotgun sequencing of the barley genome facilitates rapid identification of genes, conserved non-coding sequences and novel repeats

    Directory of Open Access Journals (Sweden)

    Graner Andreas

    2008-10-01

    Full Text Available Abstract Background: Barley has one of the largest and most complex genomes of all economically important food crops. The rise of new short read sequencing technologies such as Illumina/Solexa permits such large genomes to be effectively sampled at relatively low cost. Based on the corresponding sequence reads a Mathematically Defined Repeat (MDR) index can be generated to map repetitive regions in genomic sequences. Results: We have generated 574 Mbp of Illumina/Solexa sequences from barley total genomic DNA, representing about 10% of a genome equivalent. From these sequences we generated an MDR index which was then used to identify and mark repetitive regions in the barley genome. Comparison of the MDR plots with expert repeat annotation drawing on the information already available for known repetitive elements revealed a significant correspondence between the two methods. MDR-based annotation, though, allowed for the identification of dozens of novel repeat sequences which were not recognised by hand-annotation. The MDR data were also used to identify gene-containing regions by masking of repetitive sequences in eight de-novo sequenced bacterial artificial chromosome (BAC) clones. For half of the identified candidate gene islands, gene sequences could indeed be identified. MDR data were only of limited use when mapped onto genomic sequences from the closely related species Triticum monococcum, as only a fraction of the repetitive sequences was recognised. Conclusion: An MDR index for barley, which was obtained by whole-genome Illumina/Solexa sequencing, proved as efficient in repeat identification as manual expert annotation. Circumventing the labour-intensive step of producing a specific repeat library for expert annotation, an MDR index provides an elegant and efficient resource for the identification of repetitive and low-copy (i.e. potentially gene-containing) sequence regions in uncharacterised genomic sequences. The restriction that a particular
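
    The core of an MDR-style index is a k-mer frequency table built from the read sample, which can then be used to flag high-copy stretches of a target sequence. The Python sketch below illustrates only that general idea under assumed parameters (k-mer size, depth cut-off, toy sequences); it is not the published MDR algorithm or its actual thresholds.

```python
# A k-mer-frequency sketch of the repeat-flagging idea behind an MDR-style
# index -- not the published MDR algorithm or its thresholds. k-mer counts
# from the read sample approximate copy number; stretches of a target
# sequence covered by unusually frequent k-mers are lower-cased (masked).
from collections import Counter

K = 20              # assumed k-mer size
REPEAT_CUTOFF = 50  # hypothetical depth above which a k-mer looks repetitive

def count_kmers(reads, k=K):
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def mask_repeats(sequence, kmer_counts, k=K, cutoff=REPEAT_CUTOFF):
    """Lower-case every base covered by a high-frequency k-mer."""
    masked = list(sequence)
    for i in range(len(sequence) - k + 1):
        if kmer_counts.get(sequence[i:i + k], 0) >= cutoff:
            for j in range(i, i + k):
                masked[j] = masked[j].lower()
    return "".join(masked)

# Toy demonstration (real input would be millions of Illumina reads and a
# BAC or whole-genome sequence):
toy_reads = ["ACGTACGTACGTACGTACGTACGT"] * 60  # a heavily over-represented motif
bac_like = "TTGACCATTAGCATGCAAGTCCTAG" + "ACGTACGTACGTACGTACGTACGT" + "TTCAGGATCC"
print(mask_repeats(bac_like, count_kmers(toy_reads)))
```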

  2. [Big Data Revolution or Data Hubris? : On the Data Positivism of Molecular Biology].

    Science.gov (United States)

    Gramelsberger, Gabriele

    2017-12-01

    Genome data, the core of the big data revolution in biology proclaimed in 2008, are automatically generated and analyzed. The transition from the manual laboratory practice of electrophoresis sequencing to automated DNA-sequencing machines and software-based analysis programs was completed between 1982 and 1992. This transition facilitated the first data deluge, which was considerably increased by the second and third generations of DNA sequencers during the 2000s. However, the strategies for evaluating sequence data were also transformed along with this transition. The paper explores both the computational strategies of automation and the data evaluation culture connected with them, in order to provide a complete picture of the complexity of today's data generation and its intrinsic data positivism. The paper is thereby guided by the question of whether this data positivism is the basis of the big data revolution of molecular biology announced today, or whether it marks the beginning of its data hubris.

  3. [Relevance of big data for molecular diagnostics].

    Science.gov (United States)

    Bonin-Andresen, M; Smiljanovic, B; Stuhlmüller, B; Sörensen, T; Grützkau, A; Häupl, T

    2018-04-01

    Big data analysis raises the expectation that computerized algorithms may extract new knowledge from otherwise unmanageably vast data sets. What are the algorithms behind the big data discussion? In principle, high throughput technologies in molecular research already introduced big data and the development and application of analysis tools into the field of rheumatology some 15 years ago. This includes especially omics technologies, such as genomics, transcriptomics and cytomics. Some basic methods of data analysis are provided along with the technology; however, functional analysis and interpretation require adaptation of existing software tools or the development of new ones. For these steps, structuring and evaluating according to the biological context is extremely important and not only a mathematical problem. This aspect has to be considered much more for molecular big data than for data analyzed in health economics or epidemiology. Molecular data are structured in a first order determined by the applied technology and present quantitative characteristics that follow the principles of their biological nature. These biological dependencies have to be integrated into software solutions, which may require networks of molecular big data of the same or even different technologies in order to achieve cross-technology confirmation. Ever more extensive recording of molecular processes, including in individual patients, generates personal big data and requires new strategies for data management in order to develop data-driven, individualized interpretation concepts. With this perspective in mind, translation of information derived from molecular big data will also require new specifications for education and professional competence.

  4. Engineering Deinococcus radiodurans R1 for bioremediation of non radioactive and radioactive wastes facilitated by comparative genomics with Cupriavidus metallidurans CH34

    International Nuclear Information System (INIS)

    Badri, Hanene; Sghaier, Haitham; Barkallah, Insaf; Ben Salem, Issam; Wafa; Essouiss, Imen; Saied, Nadia; Saidi, M.; Gatri, Faten; Gatri, Maher; Boadabous, Abdellatifs; Leys, Natalie

    2009-01-01

    Deinococcus radiodurans R1 is a poly-extremophile for which a system of genetic transformation and manipulation has been developed, and it is being engineered for in situ bioremediation of wastes, particularly for cleanup of radioactive waste sites. In this study, additional attempts have been made to evaluate "bioremediation determinants" in the genome of D. radiodurans using a comparative-genomic approach with Cupriavidus metallidurans CH34, a multiple metal resistant bacterium. This resulted in the delineation of a set of ORFs that are common or peculiar to C. metallidurans and D. radiodurans. We identified 12 ORFs related to multidrug resistance efflux pumps as a special feature of C. metallidurans compared to D. radiodurans, which is the subject of further experimental work.

  5. Big Data - Smart Health Strategies

    Science.gov (United States)

    2014-01-01

    Summary Objectives To select best papers published in 2013 in the field of big data and smart health strategies, and summarize outstanding research efforts. Methods A systematic search was performed using two major bibliographic databases for relevant journal papers. The references obtained were reviewed in a two-stage process, starting with a blinded review performed by the two section editors, and followed by a peer review process operated by external reviewers recognized as experts in the field. Results The complete review process selected four best papers, illustrating various aspects of the special theme, among them: (a) using large volumes of unstructured data and, specifically, clinical notes from Electronic Health Records (EHRs) for pharmacovigilance; (b) knowledge discovery via querying large volumes of complex (both structured and unstructured) biological data using big data technologies and relevant tools; (c) methodologies for applying cloud computing and big data technologies in the field of genomics, and (d) system architectures enabling high-performance access to and processing of large datasets extracted from EHRs. Conclusions The potential of big data in biomedicine has been pinpointed in various viewpoint papers and editorials. The review of current scientific literature illustrated a variety of interesting methods and applications in the field, but still the promises exceed the current outcomes. As we are getting closer towards a solid foundation with respect to common understanding of relevant concepts and technical aspects, and the use of standardized technologies and tools, we can anticipate to reach the potential that big data offer for personalized medicine and smart health strategies in the near future. PMID:25123721

  6. The connection domain in reverse transcriptase facilitates the in vivo annealing of tRNALys3 to HIV-1 genomic RNA

    Directory of Open Access Journals (Sweden)

    Niu Meijuan

    2004-10-01

    Full Text Available Abstract The primer tRNA for reverse transcription in HIV-1, tRNALys3, is selectively packaged into the virus during its assembly, and annealed to the viral genomic RNA. The ribonucleoprotein complex that is involved in the packaging and annealing of tRNALys into HIV-1 consists of Gag, GagPol, tRNALys, lysyl-tRNA synthetase (LysRS), and viral genomic RNA. Gag targets tRNALys for viral packaging through Gag's interaction with LysRS, a tRNALys-binding protein, while reverse transcriptase (RT) sequences within GagPol (the thumb domain) bind to tRNALys. The further annealing of tRNALys3 to viral RNA requires nucleocapsid (NC) sequences in Gag, but not the NC sequences in GagPol. In this report, we further show that while the RT connection domain in GagPol is not required for tRNALys3 packaging into the virus, it is required for tRNALys3 annealing to the viral RNA genome.

  7. Big Data, Big Problems: A Healthcare Perspective.

    Science.gov (United States)

    Househ, Mowafa S; Aldosari, Bakheet; Alanazi, Abdullah; Kushniruk, Andre W; Borycki, Elizabeth M

    2017-01-01

    Much has been written on the benefits of big data for healthcare such as improving patient outcomes, public health surveillance, and healthcare policy decisions. Over the past five years, Big Data, and the data sciences field in general, has been hyped as the "Holy Grail" for the healthcare industry, promising a more efficient healthcare system with the promise of improved healthcare outcomes. However, more recently, healthcare researchers are exposing the potentially harmful effects Big Data can have on patient care, associating it with increased medical costs, patient mortality, and misguided decision making by clinicians and healthcare policy makers. In this paper, we review the current Big Data trends with a specific focus on the inadvertent negative impacts that Big Data could have on healthcare, in general, and specifically, as it relates to patient and clinical care. Our study results show that although Big Data is built up to be the "Holy Grail" for healthcare, small data techniques using traditional statistical methods are, in many cases, more accurate and can lead to better healthcare outcomes than Big Data methods. In sum, Big Data for healthcare may cause more problems for the healthcare industry than solutions, and in short, when it comes to the use of data in healthcare, "size isn't everything."

  8. Big Game Reporting Stations

    Data.gov (United States)

    Vermont Center for Geographic Information — Point locations of big game reporting stations. Big game reporting stations are places where hunters can legally report harvested deer, bear, or turkey. These are...

  9. Stalin's Big Fleet Program

    National Research Council Canada - National Science Library

    Hauner, Milan

    2002-01-01

    Although Dr. Milan Hauner's study 'Stalin's Big Fleet program' has focused primarily on the formation of Big Fleets during the Tsarist and Soviet periods of Russia's naval history, there are important lessons...

  10. Big Data Semantics

    NARCIS (Netherlands)

    Ceravolo, Paolo; Azzini, Antonia; Angelini, Marco; Catarci, Tiziana; Cudré-Mauroux, Philippe; Damiani, Ernesto; Mazak, Alexandra; van Keulen, Maurice; Jarrar, Mustafa; Santucci, Giuseppe; Sattler, Kai-Uwe; Scannapieco, Monica; Wimmer, Manuel; Wrembel, Robert; Zaraket, Fadi

    2018-01-01

    Big Data technology has discarded traditional data modeling approaches as no longer applicable to distributed data processing. It is, however, largely recognized that Big Data impose novel challenges in data and infrastructure management. Indeed, multiple components and procedures must be

  11. Interaction between the cellular protein eEF1A and the 3'-terminal stem-loop of West Nile virus genomic RNA facilitates viral minus-strand RNA synthesis.

    Science.gov (United States)

    Davis, William G; Blackwell, Jerry L; Shi, Pei-Yong; Brinton, Margo A

    2007-09-01

    RNase footprinting and nitrocellulose filter binding assays were previously used to map one major and two minor binding sites for the cell protein eEF1A on the 3'(+) stem-loop (SL) RNA of West Nile virus (WNV) (3). Base substitutions in the major eEF1A binding site or adjacent areas of the 3'(+) SL were engineered into a WNV infectious clone. Mutations that decreased, as well as ones that increased, eEF1A binding in in vitro assays had a negative effect on viral growth. None of these mutations affected the efficiency of translation of the viral polyprotein from the genomic RNA, but all of the mutations that decreased in vitro eEF1A binding to the 3' SL RNA also decreased viral minus-strand RNA synthesis in transfected cells. Also, a mutation that increased the efficiency of eEF1A binding to the 3' SL RNA increased minus-strand RNA synthesis in transfected cells, which resulted in decreased synthesis of genomic RNA. These results strongly suggest that the interaction between eEF1A and the WNV 3' SL facilitates viral minus-strand synthesis. eEF1A colocalized with viral replication complexes (RC) in infected cells and antibody to eEF1A coimmunoprecipitated viral RC proteins, suggesting that eEF1A facilitates an interaction between the 3' end of the genome and the RC. eEF1A bound with similar efficiencies to the 3'-terminal SL RNAs of four divergent flaviviruses, including a tick-borne flavivirus, and colocalized with dengue virus RC in infected cells. These results suggest that eEF1A plays a similar role in RNA replication for all flaviviruses.

  12. Social big data mining

    CERN Document Server

    Ishikawa, Hiroshi

    2015-01-01

    Social Media. Big Data and Social Data. Hypotheses in the Era of Big Data. Social Big Data Applications. Basic Concepts in Data Mining. Association Rule Mining. Clustering. Classification. Prediction. Web Structure Mining. Web Content Mining. Web Access Log Mining, Information Extraction and Deep Web Mining. Media Mining. Scalability and Outlier Detection.

  13. Indian microchip for Big Bang research in Geneva

    CERN Multimedia

    Bhabani, Soudhriti

    2007-01-01

    "A premier nuclear physics institute here has come up with India's first indigenously designed microchip that will facilitate research on the Big Bang theory in Geneva's CERN, the world's largest particle physics laboratory." (1 page)

  14. Evaluation of a Phylogenetic Marker Based on Genomic Segment B of Infectious Bursal Disease Virus: Facilitating a Feasible Incorporation of this Segment to the Molecular Epidemiology Studies for this Viral Agent.

    Directory of Open Access Journals (Sweden)

    Abdulahi Alfonso-Morales

    Full Text Available Infectious bursal disease (IBD) is a highly contagious and acute viral disease, which has caused high mortality rates in birds and considerable economic losses in different parts of the world for more than two decades and it still represents a considerable threat to poultry. The current study was designed to rigorously measure the reliability of a phylogenetic marker included into segment B. This marker can facilitate molecular epidemiology studies, incorporating this segment of the viral genome, to better explain the links between emergence, spreading and maintenance of the very virulent IBD virus (vvIBDV) strains worldwide. Sequences of the segment B gene from IBDV strains isolated from diverse geographic locations were obtained from the GenBank Database; Cuban sequences were obtained in the current work. A phylogenetic marker named B-marker was assessed by different phylogenetic principles such as saturation of substitution, phylogenetic noise and high consistency. This last parameter is based on the ability of B-marker to reconstruct the same topology as the complete segment B of the viral genome. From the results obtained from B-marker, demographic history for both main lineages of IBDV regarding segment B was performed by Bayesian skyline plot analysis. Phylogenetic analysis for both segments of IBDV genome was also performed, revealing the presence of a natural reassortant strain with segment A from vvIBDV strains and segment B from non-vvIBDV strains within Cuban IBDV population. This study contributes to a better understanding of the emergence of vvIBDV strains, describing molecular epidemiology of IBDV using the state-of-the-art methodology concerning phylogenetic reconstruction. This study also revealed the presence of a novel natural reassorted strain as possible manifest of change in the genetic structure and stability of the vvIBDV strains. Therefore, it highlights the need to obtain information about both genome segments of IBDV for

  15. Evaluation of a Phylogenetic Marker Based on Genomic Segment B of Infectious Bursal Disease Virus: Facilitating a Feasible Incorporation of this Segment to the Molecular Epidemiology Studies for this Viral Agent.

    Science.gov (United States)

    Alfonso-Morales, Abdulahi; Rios, Liliam; Martínez-Pérez, Orlando; Dolz, Roser; Valle, Rosa; Perera, Carmen L; Bertran, Kateri; Frías, Maria T; Ganges, Llilianne; Díaz de Arce, Heidy; Majó, Natàlia; Núñez, José I; Pérez, Lester J

    2015-01-01

    Infectious bursal disease (IBD) is a highly contagious and acute viral disease, which has caused high mortality rates in birds and considerable economic losses in different parts of the world for more than two decades and it still represents a considerable threat to poultry. The current study was designed to rigorously measure the reliability of a phylogenetic marker included into segment B. This marker can facilitate molecular epidemiology studies, incorporating this segment of the viral genome, to better explain the links between emergence, spreading and maintenance of the very virulent IBD virus (vvIBDV) strains worldwide. Sequences of the segment B gene from IBDV strains isolated from diverse geographic locations were obtained from the GenBank Database; Cuban sequences were obtained in the current work. A phylogenetic marker named B-marker was assessed by different phylogenetic principles such as saturation of substitution, phylogenetic noise and high consistency. This last parameter is based on the ability of B-marker to reconstruct the same topology as the complete segment B of the viral genome. From the results obtained from B-marker, demographic history for both main lineages of IBDV regarding segment B was performed by Bayesian skyline plot analysis. Phylogenetic analysis for both segments of IBDV genome was also performed, revealing the presence of a natural reassortant strain with segment A from vvIBDV strains and segment B from non-vvIBDV strains within Cuban IBDV population. This study contributes to a better understanding of the emergence of vvIBDV strains, describing molecular epidemiology of IBDV using the state-of-the-art methodology concerning phylogenetic reconstruction. This study also revealed the presence of a novel natural reassorted strain as possible manifest of change in the genetic structure and stability of the vvIBDV strains. Therefore, it highlights the need to obtain information about both genome segments of IBDV for molecular

  16. Database Resources of the BIG Data Center in 2018.

    Science.gov (United States)

    2018-01-04

    The BIG Data Center at Beijing Institute of Genomics (BIG) of the Chinese Academy of Sciences provides freely open access to a suite of database resources in support of worldwide research activities in both academia and industry. With the vast amounts of omics data generated at ever-greater scales and rates, the BIG Data Center is continually expanding, updating and enriching its core database resources through big-data integration and value-added curation, including BioCode (a repository archiving bioinformatics tool codes), BioProject (a biological project library), BioSample (a biological sample library), Genome Sequence Archive (GSA, a data repository for archiving raw sequence reads), Genome Warehouse (GWH, a centralized resource housing genome-scale data), Genome Variation Map (GVM, a public repository of genome variations), Gene Expression Nebulas (GEN, a database of gene expression profiles based on RNA-Seq data), Methylation Bank (MethBank, an integrated databank of DNA methylomes), and Science Wikis (a series of biological knowledge wikis for community annotations). In addition, three featured web services are provided, viz., BIG Search (search as a service; a scalable inter-domain text search engine), BIG SSO (single sign-on as a service; a user access control system to gain access to multiple independent systems with a single ID and password) and Gsub (submission as a service; a unified submission service for all relevant resources). All of these resources are publicly accessible through the home page of the BIG Data Center at http://bigd.big.ac.cn. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  17. Big data computing

    CERN Document Server

    Akerkar, Rajendra

    2013-01-01

    Due to market forces and technological evolution, Big Data computing is developing at an increasing rate. A wide variety of novel approaches and tools have emerged to tackle the challenges of Big Data, creating both more opportunities and more challenges for students and professionals in the field of data computation and analysis. Presenting a mix of industry cases and theory, Big Data Computing discusses the technical and practical issues related to Big Data in intelligent information management. Emphasizing the adoption and diffusion of Big Data tools and technologies in industry, the book i

  18. Microsoft big data solutions

    CERN Document Server

    Jorgensen, Adam; Welch, John; Clark, Dan; Price, Christopher; Mitchell, Brian

    2014-01-01

    Tap the power of Big Data with Microsoft technologies Big Data is here, and Microsoft's new Big Data platform is a valuable tool to help your company get the very most out of it. This timely book shows you how to use HDInsight along with HortonWorks Data Platform for Windows to store, manage, analyze, and share Big Data throughout the enterprise. Focusing primarily on Microsoft and HortonWorks technologies but also covering open source tools, Microsoft Big Data Solutions explains best practices, covers on-premises and cloud-based solutions, and features valuable case studies. Best of all,

  19. Big Data, Small Sample.

    Science.gov (United States)

    Gerlovina, Inna; van der Laan, Mark J; Hubbard, Alan

    2017-05-20

    Multiple comparisons and small sample size, common characteristics of many types of "Big Data" including those that are produced by genomic studies, present specific challenges that affect reliability of inference. Use of multiple testing procedures necessitates calculation of very small tail probabilities of a test statistic distribution. Results based on large deviation theory provide a formal condition that is necessary to guarantee error rate control given practical sample sizes, linking the number of tests and the sample size; this condition, however, is rarely satisfied. Using methods that are based on Edgeworth expansions (relying especially on the work of Peter Hall), we explore the impact of departures of sampling distributions from typical assumptions on actual error rates. Our investigation illustrates how far the actual error rates can be from the declared nominal levels, suggesting potentially wide-spread problems with error rate control, specifically excessive false positives. This is an important factor that contributes to "reproducibility crisis". We also review some other commonly used methods (such as permutation and methods based on finite sampling inequalities) in their application to multiple testing/small sample data. We point out that Edgeworth expansions, providing higher order approximations to the sampling distribution, offer a promising direction for data analysis that could improve reliability of studies relying on large numbers of comparisons with modest sample sizes.
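
    To make the point about very small tail probabilities concrete, the sketch below contrasts the plain normal upper-tail probability with a textbook one-term Edgeworth correction for the standardized mean of a skewed variable. The sample size and skewness are arbitrary examples, and this is a generic illustration rather than the authors' exact procedure.

```python
# Generic illustration (not the authors' exact procedure): a one-term
# Edgeworth correction to the upper-tail probability of a standardized
# sample mean, showing how skewness inflates very small tails relative to
# the plain normal approximation. Sample size and skewness are arbitrary.
import numpy as np
from scipy.stats import norm

def edgeworth_upper_tail(x, n, skewness):
    """P(sqrt(n)*(mean - mu)/sigma > x), first-order Edgeworth expansion."""
    return norm.sf(x) + norm.pdf(x) * skewness * (x**2 - 1) / (6 * np.sqrt(n))

n, skew = 30, 1.5  # small sample drawn from a moderately skewed population
for x in (2.0, 3.0, 4.0):
    normal_tail = norm.sf(x)
    corrected_tail = edgeworth_upper_tail(x, n, skew)
    print(f"x={x}: normal {normal_tail:.2e}  edgeworth {corrected_tail:.2e}  "
          f"ratio {corrected_tail / normal_tail:.1f}")
```

    The growing ratio at larger thresholds illustrates why nominal error rates based on the normal approximation can badly understate the true false-positive rate when many tests meet a modest sample size.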

  20. Reversible dual inhibitor against G9a and DNMT1 improves human iPSC derivation enhancing MET and facilitating transcription factor engagement to the genome.

    Directory of Open Access Journals (Sweden)

    Juan Roberto Rodriguez-Madoz

    Full Text Available The combination of defined factors with small molecules targeting epigenetic factors is a strategy that has been shown to enhance optimal derivation of iPSCs and could be used for disease modelling, high throughput screenings and/or regenerative medicine applications. In this study, we showed that a new first-in-class reversible dual G9a/DNMT1 inhibitor compound (CM272) improves the efficiency of human cell reprogramming and iPSC generation from primary cells of healthy donors and patient samples, using both integrative and non-integrative methods. Moreover, CM272 facilitates the generation of human iPSC with only two factors allowing the removal of the most potent oncogenic factor cMYC. Furthermore, we demonstrated that mechanistically, treatment with CM272 induces heterochromatin relaxation, facilitates the engagement of OCT4 and SOX2 transcription factors to OSKM refractory binding regions that are required for iPSC establishment, and enhances mesenchymal to epithelial transition during the early phase of cell reprogramming. Thus, the use of this new G9a/DNMT reversible dual inhibitor compound may represent an interesting alternative for improving cell reprogramming and human iPSC derivation for many different applications while providing interesting insights into reprogramming mechanisms.

  1. Reversible dual inhibitor against G9a and DNMT1 improves human iPSC derivation enhancing MET and facilitating transcription factor engagement to the genome.

    Science.gov (United States)

    Rodriguez-Madoz, Juan Roberto; San Jose-Eneriz, Edurne; Rabal, Obdulia; Zapata-Linares, Natalia; Miranda, Estibaliz; Rodriguez, Saray; Porciuncula, Angelo; Vilas-Zornoza, Amaia; Garate, Leire; Segura, Victor; Guruceaga, Elizabeth; Agirre, Xabier; Oyarzabal, Julen; Prosper, Felipe

    2017-01-01

    The combination of defined factors with small molecules targeting epigenetic factors is a strategy that has been shown to enhance optimal derivation of iPSCs and could be used for disease modelling, high throughput screenings and/or regenerative medicine applications. In this study, we showed that a new first-in-class reversible dual G9a/DNMT1 inhibitor compound (CM272) improves the efficiency of human cell reprogramming and iPSC generation from primary cells of healthy donors and patient samples, using both integrative and non-integrative methods. Moreover, CM272 facilitates the generation of human iPSC with only two factors allowing the removal of the most potent oncogenic factor cMYC. Furthermore, we demonstrated that mechanistically, treatment with CM272 induces heterochromatin relaxation, facilitates the engagement of OCT4 and SOX2 transcription factors to OSKM refractory binding regions that are required for iPSC establishment, and enhances mesenchymal to epithelial transition during the early phase of cell reprogramming. Thus, the use of this new G9a/DNMT reversible dual inhibitor compound may represent an interesting alternative for improving cell reprogramming and human iPSC derivation for many different applications while providing interesting insights into reprogramming mechanisms.

  2. Comprehensive Genomic Profiling Facilitates Implementation of the National Comprehensive Cancer Network Guidelines for Lung Cancer Biomarker Testing and Identifies Patients Who May Benefit From Enrollment in Mechanism-Driven Clinical Trials.

    Science.gov (United States)

    Suh, James H; Johnson, Adrienne; Albacker, Lee; Wang, Kai; Chmielecki, Juliann; Frampton, Garrett; Gay, Laurie; Elvin, Julia A; Vergilio, Jo-Anne; Ali, Siraj; Miller, Vincent A; Stephens, Philip J; Ross, Jeffrey S

    2016-06-01

    The National Comprehensive Cancer Network (NCCN) guidelines for patients with metastatic non-small cell lung cancer (NSCLC) recommend testing for EGFR, BRAF, ERBB2, and MET mutations; ALK, ROS1, and RET rearrangements; and MET amplification. We investigated the feasibility and utility of comprehensive genomic profiling (CGP), a hybrid capture-based next-generation sequencing (NGS) test, in clinical practice. CGP was performed to a mean coverage depth of 576× on 6,832 consecutive cases of NSCLC (2012-2015). Genomic alterations (GAs) (point mutations, small indels, copy number changes, and rearrangements) involving EGFR, ALK, BRAF, ERBB2, MET, ROS1, RET, and KRAS were recorded. We also evaluated lung adenocarcinoma (AD) cases without GAs, involving these eight genes. The median age of the patients was 64 years (range: 13-88 years) and 53% were female. Among the patients studied, 4,876 (71%) harbored at least one GA involving EGFR (20%), ALK (4.1%), BRAF (5.7%), ERBB2 (6.0%), MET (5.6%), ROS1 (1.5%), RET (2.4%), or KRAS (32%). In the remaining cohort of lung AD without these known drivers, 273 cancer-related genes were altered in at least 0.1% of cases, including STK11 (21%), NF1 (13%), MYC (9.8%), RICTOR (6.4%), PIK3CA (5.4%), CDK4 (4.3%), CCND1 (4.0%), BRCA2 (2.5%), NRAS (2.3%), BRCA1 (1.7%), MAP2K1 (1.2%), HRAS (0.7%), NTRK1 (0.7%), and NTRK3 (0.2%). CGP is practical and facilitates implementation of the NCCN guidelines for NSCLC by enabling simultaneous detection of GAs involving all seven driver oncogenes and KRAS. Furthermore, without additional tissue use or cost, CGP identifies patients with "pan-negative" lung AD who may benefit from enrollment in mechanism-driven clinical trials. National Comprehensive Cancer Network guidelines for patients with metastatic non-small cell lung cancer (NSCLC) recommend testing for several genomic alterations (GAs). The feasibility and utility of comprehensive genomic profiling were studied in NSCLC and in lung adenocarcinoma

  3. Genome-wide comparison of paired fresh frozen and formalin-fixed paraffin-embedded gliomas by custom BAC and oligonucleotide array comparative genomic hybridization: facilitating analysis of archival gliomas

    Science.gov (United States)

    Mohapatra, Gayatry; Engler, David A.; Starbuck, Kristen D.; Kim, James C.; Bernay, Derek C.; Scangas, George A.; Rousseau, Audrey; Batchelor, Tracy T.; Betensky, Rebecca A.; Louis, David N.

    2010-01-01

    Molecular genetic analysis of cancer is rapidly evolving as a result of improvement in genomic technologies and the growing applicability of such analyses to clinical oncology. Array based comparative genomic hybridization (aCGH) is a powerful tool for detecting DNA copy number alterations (CNA), particularly in solid tumors, and has been applied to the study of malignant gliomas. In the clinical setting, however, gliomas are often sampled by small biopsies and thus formalin-fixed paraffin-embedded (FFPE) blocks are often the only tissue available for genetic analysis, especially for rare types of gliomas. Moreover, the biological basis for the marked intratumoral heterogeneity in gliomas is most readily addressed in FFPE material. Therefore, for gliomas, the ability to use DNA from FFPE tissue is essential for both clinical and research applications. In this study, we have constructed a custom bacterial artificial chromosome (BAC) array and show excellent sensitivity and specificity for detecting CNAs in a panel of paired frozen and FFPE glioma samples. Our study demonstrates a high concordance rate between CNAs detected in FFPE compared to frozen DNA. We have also developed a method of labeling DNA from FFPE tissue that allows efficient hybridization to oligonucleotide arrays. This labeling technique was applied to a panel of biphasic anaplastic oligoastrocytomas (AOA) to identify genetic changes unique to each component. Together, results from these studies suggest that BAC and oligonucleotide aCGH are sensitive tools for detecting CNAs in FFPE DNA, and can enable genome-wide analysis of rare, small and/or histologically heterogeneous gliomas. PMID:21080181
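
    At its simplest, aCGH copy-number calling reduces to per-probe log2(tumor/reference) ratios that are smoothed and thresholded. The sketch below shows only that bare-bones logic with invented probe intensities and thresholds; it is not the custom BAC/oligonucleotide pipeline described in the study.

```python
# Bare-bones sketch of the log2-ratio logic behind aCGH copy-number calls --
# not the custom BAC/oligonucleotide pipeline used in the study. Probe
# intensities, window size and thresholds are invented for illustration.
import numpy as np

def call_cnas(tumor, reference, window=5, gain=0.3, loss=-0.3):
    """Smooth per-probe log2(tumor/reference) ratios and flag gains/losses."""
    log2_ratio = np.log2(np.asarray(tumor) / np.asarray(reference))
    smoothed = np.convolve(log2_ratio, np.ones(window) / window, mode="same")
    calls = np.where(smoothed >= gain, "gain",
                     np.where(smoothed <= loss, "loss", "neutral"))
    return smoothed, calls

# Ten toy probes with a simulated gain in the middle of the region:
tumor_signal = [1.0, 1.1, 0.9, 1.6, 1.5, 1.6, 1.5, 1.0, 0.9, 1.1]
reference_signal = [1.0] * 10
ratios, calls = call_cnas(tumor_signal, reference_signal)
for ratio, call in zip(ratios, calls):
    print(f"{ratio:+.2f}  {call}")
```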

  4. Big Data: Survey, Technologies, Opportunities, and Challenges

    Directory of Open Access Journals (Sweden)

    Nawsher Khan

    2014-01-01

    Full Text Available Big Data has gained much attention from the academia and the IT industry. In the digital and computing world, information is generated and collected at a rate that rapidly exceeds the boundary range. Currently, over 2 billion people worldwide are connected to the Internet, and over 5 billion individuals own mobile phones. By 2020, 50 billion devices are expected to be connected to the Internet. At this point, predicted data production will be 44 times greater than that in 2009. As information is transferred and shared at light speed on optic fiber and wireless networks, the volume of data and the speed of market growth increase. However, the fast growth rate of such large data generates numerous challenges, such as the rapid growth of data, transfer speed, diverse data, and security. Nonetheless, Big Data is still in its infancy stage, and the domain has not been reviewed in general. Hence, this study comprehensively surveys and classifies the various attributes of Big Data, including its nature, definitions, rapid growth rate, volume, management, analysis, and security. This study also proposes a data life cycle that uses the technologies and terminologies of Big Data. Future research directions in this field are determined based on opportunities and several open issues in Big Data domination. These research directions facilitate the exploration of the domain and the development of optimal techniques to address Big Data.

  5. Big data: survey, technologies, opportunities, and challenges.

    Science.gov (United States)

    Khan, Nawsher; Yaqoob, Ibrar; Hashem, Ibrahim Abaker Targio; Inayat, Zakira; Ali, Waleed Kamaleldin Mahmoud; Alam, Muhammad; Shiraz, Muhammad; Gani, Abdullah

    2014-01-01

    Big Data has gained much attention from the academia and the IT industry. In the digital and computing world, information is generated and collected at a rate that rapidly exceeds the boundary range. Currently, over 2 billion people worldwide are connected to the Internet, and over 5 billion individuals own mobile phones. By 2020, 50 billion devices are expected to be connected to the Internet. At this point, predicted data production will be 44 times greater than that in 2009. As information is transferred and shared at light speed on optic fiber and wireless networks, the volume of data and the speed of market growth increase. However, the fast growth rate of such large data generates numerous challenges, such as the rapid growth of data, transfer speed, diverse data, and security. Nonetheless, Big Data is still in its infancy stage, and the domain has not been reviewed in general. Hence, this study comprehensively surveys and classifies the various attributes of Big Data, including its nature, definitions, rapid growth rate, volume, management, analysis, and security. This study also proposes a data life cycle that uses the technologies and terminologies of Big Data. Future research directions in this field are determined based on opportunities and several open issues in Big Data domination. These research directions facilitate the exploration of the domain and the development of optimal techniques to address Big Data.

  6. Big Data: Survey, Technologies, Opportunities, and Challenges

    Science.gov (United States)

    Khan, Nawsher; Yaqoob, Ibrar; Hashem, Ibrahim Abaker Targio; Inayat, Zakira; Mahmoud Ali, Waleed Kamaleldin; Alam, Muhammad; Shiraz, Muhammad; Gani, Abdullah

    2014-01-01

    Big Data has gained much attention from the academia and the IT industry. In the digital and computing world, information is generated and collected at a rate that rapidly exceeds the boundary range. Currently, over 2 billion people worldwide are connected to the Internet, and over 5 billion individuals own mobile phones. By 2020, 50 billion devices are expected to be connected to the Internet. At this point, predicted data production will be 44 times greater than that in 2009. As information is transferred and shared at light speed on optic fiber and wireless networks, the volume of data and the speed of market growth increase. However, the fast growth rate of such large data generates numerous challenges, such as the rapid growth of data, transfer speed, diverse data, and security. Nonetheless, Big Data is still in its infancy stage, and the domain has not been reviewed in general. Hence, this study comprehensively surveys and classifies the various attributes of Big Data, including its nature, definitions, rapid growth rate, volume, management, analysis, and security. This study also proposes a data life cycle that uses the technologies and terminologies of Big Data. Future research directions in this field are determined based on opportunities and several open issues in Big Data domination. These research directions facilitate the exploration of the domain and the development of optimal techniques to address Big Data. PMID:25136682

  7. HARNESSING BIG DATA VOLUMES

    Directory of Open Access Journals (Sweden)

    Bogdan DINU

    2014-04-01

    Full Text Available Big Data can revolutionize humanity. Hidden within the huge amounts and variety of the data we are creating we may find information, facts, social insights and benchmarks that were once virtually impossible to find or were simply inexistent. Large volumes of data allow organizations to tap in real time the full potential of all the internal or external information they possess. Big data calls for quick decisions and innovative ways to assist customers and the society as a whole. Big data platforms and product portfolio will help customers harness to the full the value of big data volumes. This paper deals with technical and technological issues related to handling big data volumes in the Big Data environment.

  8. The big bang

    International Nuclear Information System (INIS)

    Chown, Marcus.

    1987-01-01

    The paper concerns the 'Big Bang' theory of the creation of the Universe 15 thousand million years ago, and traces events which physicists predict occurred soon after the creation. Unified theory of the moment of creation, evidence of an expanding Universe, the X-boson (the particle produced very soon after the big bang, which vanished from the Universe one-hundredth of a second after the big bang), and the fate of the Universe are all discussed. (U.K.)

  9. Molecular mimicry of human tRNALys anti-codon domain by HIV-1 RNA genome facilitates tRNA primer annealing.

    Science.gov (United States)

    Jones, Christopher P; Saadatmand, Jenan; Kleiman, Lawrence; Musier-Forsyth, Karin

    2013-02-01

    The primer for initiating reverse transcription in human immunodeficiency virus type 1 (HIV-1) is tRNA(Lys3). Host cell tRNA(Lys) is selectively packaged into HIV-1 through a specific interaction between the major tRNA(Lys)-binding protein, human lysyl-tRNA synthetase (hLysRS), and the viral proteins Gag and GagPol. Annealing of the tRNA primer onto the complementary primer-binding site (PBS) in viral RNA is mediated by the nucleocapsid domain of Gag. The mechanism by which tRNA(Lys3) is targeted to the PBS and released from hLysRS prior to annealing is unknown. Here, we show that hLysRS specifically binds to a tRNA anti-codon-like element (TLE) in the HIV-1 genome, which mimics the anti-codon loop of tRNA(Lys) and is located proximal to the PBS. Mutation of the U-rich sequence within the TLE attenuates binding of hLysRS in vitro and reduces the amount of annealed tRNA(Lys3) in virions. Thus, LysRS binds specifically to the TLE, which is part of a larger LysRS binding domain in the viral RNA that includes elements of the Psi packaging signal. Our results suggest that HIV-1 uses molecular mimicry of the anti-codon of tRNA(Lys) to increase the efficiency of tRNA(Lys3) annealing to viral RNA.

  10. Summary big data

    CERN Document Server

    2014-01-01

    This work offers a summary of the book "Big Data: A Revolution That Will Transform How We Live, Work, and Think" by Viktor Mayer-Schönberger and Kenneth Cukier. The summary explains that big data is where we use huge quantities of data to make better predictions by identifying patterns in the data, rather than trying to understand the underlying causes in more detail. This summary highlights that big data will be a source of new economic value and innovation in the future. Moreover, it shows that it will

  11. Data: Big and Small.

    Science.gov (United States)

    Jones-Schenk, Jan

    2017-02-01

    Big data is a big topic in all leadership circles. Leaders in professional development must develop an understanding of what data are available across the organization that can inform effective planning for forecasting. Collaborating with others to integrate data sets can increase the power of prediction. Big data alone is insufficient to make big decisions. Leaders must find ways to access small data and triangulate multiple types of data to ensure the best decision making. J Contin Educ Nurs. 2017;48(2):60-61. Copyright 2017, SLACK Incorporated.

  12. A Big Video Manifesto

    DEFF Research Database (Denmark)

    Mcilvenny, Paul Bruce; Davidsen, Jacob

    2017-01-01

    For the last few years, we have witnessed a hype about the potential results and insights that quantitative big data can bring to the social sciences. The wonder of big data has moved into education, traffic planning, and disease control with a promise of making things better with big numbers and beautiful visualisations. However, we also need to ask what the tools of big data can do both for the Humanities and for more interpretative approaches and methods. Thus, we prefer to explore how the power of computation, new sensor technologies and massive storage can also help with video-based qualitative...

  13. Big data integration: scalability and sustainability

    KAUST Repository

    Zhang, Zhang

    2016-01-26

    Integration of various types of omics data is critically indispensable for addressing most important and complex biological questions. In the era of big data, however, data integration becomes increasingly tedious, time-consuming and expensive, posing a significant obstacle to fully exploit the wealth of big biological data. Here we propose a scalable and sustainable architecture that integrates big omics data through community-contributed modules. Community modules are contributed and maintained by different committed groups and each module corresponds to a specific data type, deals with data collection, processing and visualization, and delivers data on-demand via web services. Based on this community-based architecture, we build Information Commons for Rice (IC4R; http://ic4r.org), a rice knowledgebase that integrates a variety of rice omics data from multiple community modules, including genome-wide expression profiles derived entirely from RNA-Seq data, resequencing-based genomic variations obtained from re-sequencing data of thousands of rice varieties, plant homologous genes covering multiple diverse plant species, post-translational modifications, rice-related literatures, and community annotations. Taken together, such architecture achieves integration of different types of data from multiple community-contributed modules and accordingly features scalable, sustainable and collaborative integration of big data as well as low costs for database update and maintenance, thus helpful for building IC4R into a comprehensive knowledgebase covering all aspects of rice data and beneficial for both basic and translational researches.
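
    The community-module architecture sketched above (one module per data type, delivering data on demand via web services) can be illustrated with a minimal, hypothetical sketch in Python using Flask. The endpoint, port, gene identifiers and payload below are illustrative assumptions, not the actual IC4R API.

        # Minimal sketch of a community-contributed module serving one data type
        # (RNA-Seq expression profiles) on demand over HTTP. All names and values
        # are hypothetical; the real IC4R module interfaces are not specified here.
        from flask import Flask, jsonify, abort

        app = Flask(__name__)

        # Toy stand-in for the module's local store of expression profiles.
        EXPRESSION = {
            "LOC_Os01g01010": {"leaf": 12.4, "root": 3.1},
            "LOC_Os01g01020": {"leaf": 0.8, "root": 5.6},
        }

        @app.route("/expression/<gene_id>")
        def get_expression(gene_id):
            """Deliver the expression profile for one gene, or 404 if unknown."""
            profile = EXPRESSION.get(gene_id)
            if profile is None:
                abort(404)
            return jsonify({"gene": gene_id, "fpkm": profile})

        if __name__ == "__main__":
            # Each module runs and is maintained independently; a central
            # knowledgebase such as IC4R federates many such modules.
            app.run(port=5001)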

  14. Some experiences and opportunities for big data in translational research.

    Science.gov (United States)

    Chute, Christopher G; Ullman-Cullere, Mollie; Wood, Grant M; Lin, Simon M; He, Min; Pathak, Jyotishman

    2013-10-01

    Health care has become increasingly information intensive. The advent of genomic data, integrated into patient care, significantly accelerates the complexity and amount of clinical data. Translational research in the present day increasingly embraces new biomedical discovery in this data-intensive world, thus entering the domain of "big data." The Electronic Medical Records and Genomics consortium has taught us many lessons, while simultaneously advances in commodity computing methods enable the academic community to affordably manage and process big data. Although great promise can emerge from the adoption of big data methods and philosophy, the heterogeneity and complexity of clinical data, in particular, pose additional challenges for big data inferencing and clinical application. However, the ultimate comparability and consistency of heterogeneous clinical information sources can be enhanced by existing and emerging data standards, which promise to bring order to clinical data chaos. Meaningful Use data standards in particular have already simplified the task of identifying clinical phenotyping patterns in electronic health records.

  15. BigWig and BigBed: enabling browsing of large distributed datasets.

    Science.gov (United States)

    Kent, W J; Zweig, A S; Barber, G; Hinrichs, A S; Karolchik, D

    2010-09-01

    BigWig and BigBed files are compressed binary indexed files containing data at several resolutions that allow the high-performance display of next-generation sequencing experiment results in the UCSC Genome Browser. The visualization is implemented using a multi-layered software approach that takes advantage of specific capabilities of web-based protocols and Linux and UNIX operating systems files, R trees and various indexing and compression tricks. As a result, only the data needed to support the current browser view is transmitted rather than the entire file, enabling fast remote access to large distributed data sets. Binaries for the BigWig and BigBed creation and parsing utilities may be downloaded at http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/. Source code for the creation and visualization software is freely available for non-commercial use at http://hgdownload.cse.ucsc.edu/admin/jksrc.zip, implemented in C and supported on Linux. The UCSC Genome Browser is available at http://genome.ucsc.edu.
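
    The access pattern described above (only the data needed for the current view is transmitted) can also be exercised programmatically. A minimal sketch, assuming the third-party pyBigWig package and a placeholder URL; the UCSC creation and parsing utilities themselves are command-line binaries and are not shown.

        # Random access into a (possibly remote) BigWig file; only the index and
        # the requested region are fetched, not the whole file. The URL is a
        # placeholder, not a real dataset.
        import pyBigWig

        bw = pyBigWig.open("http://example.org/data/sample.bw")

        print(bw.chroms())                                    # chromosome names and sizes
        print(bw.stats("chr1", 0, 1_000_000))                 # mean signal over chr1:0-1,000,000
        print(bw.stats("chr1", 0, 1_000_000, type="max", nBins=10))  # coarse 10-bin summary

        bw.close()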

  16. Big Data in Caenorhabditis elegans: quo vadis?

    Science.gov (United States)

    Hutter, Harald; Moerman, Donald

    2015-11-05

    A clear definition of what constitutes "Big Data" is difficult to identify, but we find it most useful to define Big Data as a data collection that is complete. By this criterion, researchers on Caenorhabditis elegans have a long history of collecting Big Data, since the organism was selected with the idea of obtaining a complete biological description and understanding of development. The complete wiring diagram of the nervous system, the complete cell lineage, and the complete genome sequence provide a framework to phrase and test hypotheses. Given this history, it might be surprising that the number of "complete" data sets for this organism is actually rather small--not because of lack of effort, but because most types of biological experiments are not currently amenable to complete large-scale data collection. Many are also not inherently limited, so that it becomes difficult to even define completeness. At present, we only have partial data on mutated genes and their phenotypes, gene expression, and protein-protein interaction--important data for many biological questions. Big Data can point toward unexpected correlations, and these unexpected correlations can lead to novel investigations; however, Big Data cannot establish causation. As a result, there is much excitement about Big Data, but there is also a discussion on just what Big Data contributes to solving a biological problem. Because of its relative simplicity, C. elegans is an ideal test bed to explore this issue and at the same time determine what is necessary to build a multicellular organism from a single cell. © 2015 Hutter and Moerman. This article is distributed by The American Society for Cell Biology under license from the author(s). Two months after publication it is available to the public under an Attribution–Noncommercial–Share Alike 3.0 Unported Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/3.0).

  17. Bliver big data til big business?

    DEFF Research Database (Denmark)

    Ritter, Thomas

    2015-01-01

    Denmark has a digital infrastructure, a culture of record-keeping, and IT-competent employees and customers, all of which make a leading position possible, but only if companies get ready for the next big data wave.

  18. Dual of big bang and big crunch

    International Nuclear Information System (INIS)

    Bak, Dongsu

    2007-01-01

    Starting from the Janus solution and its gauge theory dual, we obtain the dual gauge theory description of the cosmological solution by the procedure of double analytic continuation. The coupling is driven either to zero or to infinity at the big-bang and big-crunch singularities, which are shown to be related by the S-duality symmetry. In the dual Yang-Mills theory description, these are nonsingular as the coupling goes to zero in the N=4 super Yang-Mills theory. The cosmological singularities simply signal the failure of the supergravity description of the full type IIB superstring theory

  19. Big Data and Neuroimaging.

    Science.gov (United States)

    Webb-Vargas, Yenny; Chen, Shaojie; Fisher, Aaron; Mejia, Amanda; Xu, Yuting; Crainiceanu, Ciprian; Caffo, Brian; Lindquist, Martin A

    2017-12-01

    Big Data are of increasing importance in a variety of areas, especially in the biosciences. There is an emerging critical need for Big Data tools and methods, because of the potential impact of advancements in these areas. Importantly, statisticians and statistical thinking have a major role to play in creating meaningful progress in this arena. We would like to emphasize this point in this special issue, as it highlights both the dramatic need for statistical input for Big Data analysis and for a greater number of statisticians working on Big Data problems. We use the field of statistical neuroimaging to demonstrate these points. As such, this paper covers several applications and novel methodological developments of Big Data tools applied to neuroimaging data.

  20. Facilitating Transfers

    DEFF Research Database (Denmark)

    Kjær, Poul F.

    ...the paper argues that the essential functional and normative purpose of regulatory governance is to facilitate, stabilise and justify the transfer of condensed social components (such as economic capital and products, political decisions, legal judgements, religious beliefs and scientific knowledge) from one social context to another. Such transfers are tied to specific logics of temporalisation and spatial expansion of a diverse set of social processes in relation to, for example, the economy, politics, science and the mass media. On this background, the paper will more concretely develop a conceptual framework for classifying different contextual orders...

  1. From Big Data to Big Business

    DEFF Research Database (Denmark)

    Lund Pedersen, Carsten

    2017-01-01

    Idea in Brief: Problem: There is an enormous profit potential for manufacturing firms in big data, but one of the key barriers to obtaining data-driven growth is the lack of knowledge about which capabilities are needed to extract value and profit from data. Solution: We (BDBB research group at C...

  2. Big data, big knowledge: big data for personalized healthcare.

    Science.gov (United States)

    Viceconti, Marco; Hunter, Peter; Hose, Rod

    2015-07-01

    The idea that the purely phenomenological knowledge that we can extract by analyzing large amounts of data can be useful in healthcare seems to contradict the desire of VPH researchers to build detailed mechanistic models for individual patients. But in practice no model is ever entirely phenomenological or entirely mechanistic. We propose in this position paper that big data analytics can be successfully combined with VPH technologies to produce robust and effective in silico medicine solutions. In order to do this, big data technologies must be further developed to cope with some specific requirements that emerge from this application. Such requirements are: working with sensitive data; analytics of complex and heterogeneous data spaces, including nontextual information; distributed data management under security and performance constraints; specialized analytics to integrate bioinformatics and systems biology information with clinical observations at tissue, organ and organisms scales; and specialized analytics to define the "physiological envelope" during the daily life of each patient. These domain-specific requirements suggest a need for targeted funding, in which big data technologies for in silico medicine becomes the research priority.

  3. Big data in oncologic imaging.

    Science.gov (United States)

    Regge, Daniele; Mazzetti, Simone; Giannini, Valentina; Bracco, Christian; Stasi, Michele

    2017-06-01

    Cancer is a complex disease and unfortunately understanding how the components of the cancer system work does not help understand the behavior of the system as a whole. In the words of the Greek philosopher Aristotle "the whole is greater than the sum of parts." To date, thanks to improved information technology infrastructures, it is possible to store data from each single cancer patient, including clinical data, medical images, laboratory tests, and pathological and genomic information. Indeed, medical archive storage constitutes approximately one-third of total global storage demand and a large part of the data are in the form of medical images. The opportunity is now to draw insight on the whole to the benefit of each individual patient. In the oncologic patient, big data analysis is at the beginning but several useful applications can be envisaged including development of imaging biomarkers to predict disease outcome, assessing the risk of X-ray dose exposure or of renal damage following the administration of contrast agents, and tracking and optimizing patient workflow. The aim of this review is to present current evidence of how big data derived from medical images may impact on the diagnostic pathway of the oncologic patient.

  4. Big data a primer

    CERN Document Server

    Bhuyan, Prachet; Chenthati, Deepak

    2015-01-01

    This book is a collection of chapters written by experts on various aspects of big data. The book aims to explain what big data is and how it is stored and used. The book starts from  the fundamentals and builds up from there. It is intended to serve as a review of the state-of-the-practice in the field of big data handling. The traditional framework of relational databases can no longer provide appropriate solutions for handling big data and making it available and useful to users scattered around the globe. The study of big data covers a wide range of issues including management of heterogeneous data, big data frameworks, change management, finding patterns in data usage and evolution, data as a service, service-generated data, service management, privacy and security. All of these aspects are touched upon in this book. It also discusses big data applications in different domains. The book will prove useful to students, researchers, and practicing database and networking engineers.

  5. Recht voor big data, big data voor recht

    NARCIS (Netherlands)

    Lafarre, Anne

    Big data is a phenomenon that can no longer be ignored in our society. It is past the hype cycle, and the first implementations of big data techniques are being carried out. But what exactly is big data? What do the five V's that are so often mentioned in relation to big data entail? As an introduction to

  6. Facilitating participation

    DEFF Research Database (Denmark)

    Skøtt, Bo

    2018-01-01

    ...the resulting need for a redefinition of library competence. In doing this, I primarily address the first two questions from Chapter 1 and how they relate to the public’s informal, leisure-time activities in a networked society. In particular, I focus on the skills of reflexive self-perception and informed opinion formation. Further, I point out the significance which these informal leisure-time activities have for public library staff’s cultural dissemination skills. In this way, I take on the question of the skills required for facilitating the learning of a participatory public (cf. Chapter 1), exemplifying with the competence required of library staff. My discussion will proceed by way of a literature review. In the next section, I shall explain how and what sources were chosen; sections three and four present the theoretical framework and how the applied theories are related. In the fifth section...

  7. Facilitating Transfers

    DEFF Research Database (Denmark)

    Kjær, Poul F.

    2018-01-01

    Departing from the paradox that globalisation has implied an increase, rather than a decrease, in contextual diversity, this paper re-assesses the function, normative purpose and location of Regulatory Governance Frameworks in world society. Drawing on insights from sociology of law and world...... society studies, the argument advanced is that Regulatory Governance Frameworks are oriented towards facilitating transfers of condensed social components, such as economic capital and products, legal acts, political decisions and scientific knowledge, from one legally-constituted normative order, i.......e. contextual setting, to another. Against this background, it is suggested that Regulatory Governance Frameworks can be understood as schemes which act as ‘rites of passage’ aimed at providing legal stabilisation to social processes characterised by liminality, i.e ambiguity, hybridity and in-betweenness....

  8. Assessing Big Data

    DEFF Research Database (Denmark)

    Leimbach, Timo; Bachlechner, Daniel

    2015-01-01

    In recent years, big data has been one of the most controversially discussed technologies in terms of its possible positive and negative impact. Therefore, the need for technology assessments is obvious. This paper first provides, based on the results of a technology assessment study, an overview of the potential and challenges associated with big data and then describes the problems experienced during the study as well as methods found helpful to address them. The paper concludes with reflections on how the insights from the technology assessment study may have an impact on the future governance of big data.

  9. Big bang nucleosynthesis

    International Nuclear Information System (INIS)

    Boyd, Richard N.

    2001-01-01

    The precision of measurements in modern cosmology has made huge strides in recent years, with measurements of the cosmic microwave background and the determination of the Hubble constant now rivaling the level of precision of the predictions of big bang nucleosynthesis. However, these results are not necessarily consistent with the predictions of the Standard Model of big bang nucleosynthesis. Reconciling these discrepancies may require extensions of the basic tenets of the model, and possibly of the reaction rates that determine the big bang abundances

  10. Big data for dummies

    CERN Document Server

    Hurwitz, Judith; Halper, Fern; Kaufman, Marcia

    2013-01-01

    Find the right big data solution for your business or organization Big data management is one of the major challenges facing business, industry, and not-for-profit organizations. Data sets such as customer transactions for a mega-retailer, weather patterns monitored by meteorologists, or social network activity can quickly outpace the capacity of traditional data management tools. If you need to develop or manage big data solutions, you'll appreciate how these four experts define, explain, and guide you through this new and often confusing concept. You'll learn what it is, why it m

  11. Big Data, indispensable today

    Directory of Open Access Journals (Sweden)

    Radu-Ioan ENACHE

    2015-10-01

    Big data is and will be used more in the future as a tool for everything that happens both online and offline. Of course, online is a real habit, and Big Data is found in this medium, offering many advantages and being a real help for all consumers. In this paper we talked about Big Data as being a plus in developing new applications, by gathering useful information about the users and their behaviour. We've also presented the key aspects of real-time monitoring and the architecture principles of this technology. The most important benefit brought to this paper is presented in the cloud section.

  12. Big Data in der Cloud

    DEFF Research Database (Denmark)

    Leimbach, Timo; Bachlechner, Daniel

    2014-01-01

    Technology assessment of big data, in particular cloud based big data services, for the Office for Technology Assessment at the German federal parliament (Bundestag).

  13. Small Big Data Congress 2017

    NARCIS (Netherlands)

    Doorn, J.

    2017-01-01

    TNO, in collaboration with the Big Data Value Center, presents the fourth Small Big Data Congress! Our congress aims at providing an overview of practical and innovative applications based on big data. Do you want to know what is happening in applied research with big data? And what can already be

  14. Cryptography for Big Data Security

    Science.gov (United States)

    2015-07-13

    Book chapter for Big Data: Storage, Sharing, and Security (3S). Distribution A: Public Release. Ariel Hamlin, Nabil ... (contact: arkady@ll.mit.edu). Chapter 1, Cryptography for Big Data Security, 1.1 Introduction: With the amount

  15. Big data opportunities and challenges

    CERN Document Server

    2014-01-01

    This ebook aims to give practical guidance for all those who want to understand big data better and learn how to make the most of it. Topics range from big data analysis, mobile big data and managing unstructured data to technologies, governance and intellectual property and security issues surrounding big data.

  16. The Study of “big data” to support internal business strategists

    Science.gov (United States)

    Ge, Mei

    2018-01-01

    How is big data different from previous data analysis systems? The primary purpose behind the traditional small-data analytics that all managers are more or less familiar with is to support internal business strategies. But big data also offers a promising new dimension: discovering new opportunities to offer customers high-value products and services. This study introduces some of the strategies that big data supports. Business decisions using big data can also involve several areas for analytics, including customer satisfaction, customer journeys, supply chains, risk management, competitive intelligence, pricing, discovery and experimentation, and facilitating big data discovery.

  17. Big Data as Governmentality

    DEFF Research Database (Denmark)

    Flyverbom, Mikkel; Madsen, Anders Koed; Rasche, Andreas

    This paper conceptualizes how large-scale data and algorithms condition and reshape knowledge production when addressing international development challenges. The concept of governmentality and four dimensions of an analytics of government are proposed as a theoretical framework to examine how big data is constituted as an aspiration to improve the data and knowledge underpinning development efforts. Based on this framework, we argue that big data’s impact on how relevant problems are governed is enabled by (1) new techniques of visualizing development issues, (2) linking aspects... The paper shows that big data problematizes selected aspects of traditional ways to collect and analyze data for development (e.g. via household surveys). We also demonstrate that using big data analyses to address development challenges raises a number of questions that can deteriorate its impact.

  18. Big Data Revisited

    DEFF Research Database (Denmark)

    Kallinikos, Jannis; Constantiou, Ioanna

    2015-01-01

    We elaborate on key issues of our paper New games, new rules: big data and the changing context of strategy as a means of addressing some of the concerns raised by the paper’s commentators. We initially deal with the issue of social data and the role it plays in the current data revolution and the technological recording of facts. We further discuss the significance of the very mechanisms by which big data is produced as distinct from the very attributes of big data, often discussed in the literature. In the final section of the paper, we qualify the alleged importance of algorithms and claim that the structures of data capture and the architectures in which data generation is embedded are fundamental to the phenomenon of big data.

  19. The Big Bang Singularity

    Science.gov (United States)

    Ling, Eric

    The big bang theory is a model of the universe which makes the striking prediction that the universe began a finite amount of time in the past at the so-called "Big Bang singularity." We explore the physical and mathematical justification of this surprising result. After laying down the framework of the universe as a spacetime manifold, we combine physical observations with global symmetry assumptions to deduce the FRW cosmological models which predict a big bang singularity. Next we prove a couple of theorems due to Stephen Hawking which show that the big bang singularity exists even if one removes the global symmetry assumptions. Lastly, we investigate the conditions one needs to impose on a spacetime if one wishes to avoid a singularity. The ideas and concepts used here to study spacetimes are similar to those used to study Riemannian manifolds; therefore we compare and contrast the two geometries throughout.

  20. BigDansing

    KAUST Repository

    Khayyat, Zuhair; Ilyas, Ihab F.; Jindal, Alekh; Madden, Samuel; Ouzzani, Mourad; Papotti, Paolo; Quiané -Ruiz, Jorge-Arnulfo; Tang, Nan; Yin, Si

    2015-01-01

    of the underlying distributed platform. BigDansing translates these rules into a series of transformations that enable distributed computations and several optimizations, such as shared scans and specialized join operators. Experimental results on both synthetic

  1. Boarding to Big data

    Directory of Open Access Journals (Sweden)

    Oana Claudia BRATOSIN

    2016-05-01

    Today Big Data is an emerging topic, as the quantity of information grows exponentially, laying the foundation for its main challenge: the value of the information. The information value is not only defined by the value extraction from huge data sets, as fast and optimal as possible, but also by the value extraction from uncertain and inaccurate data, in an innovative manner using Big Data analytics. At this point, the main challenge for businesses that use Big Data tools is to clearly define the scope and the necessary output of the business so that the real value can be gained. This article aims to explain the Big Data concept, its various classification criteria and architecture, as well as its impact on processes worldwide.

  2. Big Creek Pit Tags

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The BCPITTAGS database is used to store data from an Oncorhynchus mykiss (steelhead/rainbow trout) population dynamics study in Big Creek, a coastal stream along the...

  3. Scaling Big Data Cleansing

    KAUST Repository

    Khayyat, Zuhair

    2017-01-01

    on top of general-purpose distributed platforms. Its programming interface allows users to express data quality rules independently from the requirements of parallel and distributed environments. Without sacrificing their quality, BigDansing also

  4. Reframing Open Big Data

    DEFF Research Database (Denmark)

    Marton, Attila; Avital, Michel; Jensen, Tina Blegind

    2013-01-01

    Recent developments in the techniques and technologies of collecting, sharing and analysing data are challenging the field of information systems (IS) research let alone the boundaries of organizations and the established practices of decision-making. Coined ‘open data’ and ‘big data’, these developments introduce an unprecedented level of societal and organizational engagement with the potential of computational data to generate new insights and information. Based on the commonalities shared by open data and big data, we develop a research framework that we refer to as open big data (OBD) by employing the dimensions of ‘order’ and ‘relationality’. We argue that these dimensions offer a viable approach for IS research on open and big data because they address one of the core value propositions of IS; i.e. how to support organizing with computational data. We contrast these dimensions with two

  5. [Three applications and the challenge of the big data in otology].

    Science.gov (United States)

    Lei, Guanxiong; Li, Jianan; Shen, Weidong; Yang, Shiming

    2016-03-01

    With the expansion of human practical activities, more and more areas have run into big data problems. The emergence of big data requires people to update their research paradigms and develop new technical methods. This review discusses how big data may bring both opportunities and challenges in the areas of auditory implantation, the deafness genome, and auditory pathophysiology, and points out that we need to find appropriate theories and methods to turn this kind of expectation into reality.

  6. Spaces of genomics : exploring the innovation journey of genomics in research on common disease

    NARCIS (Netherlands)

    Bitsch, L.

    2013-01-01

    Genomics was introduced with big promises and expectations of its future contribution to our society. Medical genomics was introduced as that which would lay the foundation for a revolution in our management of common diseases. Genomics would lead the way towards a future of personalised medicine.

  7. Conociendo Big Data

    Directory of Open Access Journals (Sweden)

    Juan José Camargo-Vega

    2014-12-01

    Given the importance that the term Big Data has acquired, this research sought to study and analyze exhaustively the state of the art of Big Data; in addition, as a second objective, it analyzed the characteristics, tools, technologies, models and standards related to Big Data; and finally it sought to identify the most relevant characteristics in the management of Big Data, so that everything concerning the central topic of the research can be known. The methodology used included reviewing the state of the art of Big Data and presenting its current situation; becoming familiar with Big Data technologies; presenting some of the NoSQL databases, which are the ones that allow data in unstructured formats to be processed; and showing data models and the technologies for analyzing them, concluding with some benefits of Big Data. The methodological design used for the research was non-experimental, since no variables were manipulated, and exploratory, because this research constitutes a first approach to the Big Data environment.

  8. Big Bang baryosynthesis

    International Nuclear Information System (INIS)

    Turner, M.S.; Chicago Univ., IL

    1983-01-01

    In these lectures I briefly review Big Bang baryosynthesis. In the first lecture I discuss the evidence which exists for the BAU, the failure of non-GUT symmetrical cosmologies, the qualitative picture of baryosynthesis, and numerical results of detailed baryosynthesis calculations. In the second lecture I discuss the requisite CP violation in some detail, further the statistical mechanics of baryosynthesis, possible complications to the simplest scenario, and one cosmological implication of Big Bang baryosynthesis. (orig./HSI)

  9. Minsky on "Big Government"

    Directory of Open Access Journals (Sweden)

    Daniel de Santana Vasconcelos

    2014-03-01

    The objective of this paper is to assess, in light of Minsky's main works, his view and analysis of what he called "Big Government": the huge institution which, in parallel with the "Big Bank", was capable of ensuring stability in the capitalist system and regulating its inherently unstable financial system in the mid-20th century. In this work, we analyze how Minsky proposes an active role for the government in a complex economic system flawed by financial instability.

  10. Big data need big theory too.

    Science.gov (United States)

    Coveney, Peter V; Dougherty, Edward R; Highfield, Roger R

    2016-11-13

    The current interest in big data, machine learning and data analytics has generated the widespread impression that such methods are capable of solving most problems without the need for conventional scientific methods of inquiry. Interest in these methods is intensifying, accelerated by the ease with which digitized data can be acquired in virtually all fields of endeavour, from science, healthcare and cybersecurity to economics, social sciences and the humanities. In multiscale modelling, machine learning appears to provide a shortcut to reveal correlations of arbitrary complexity between processes at the atomic, molecular, meso- and macroscales. Here, we point out the weaknesses of pure big data approaches with particular focus on biology and medicine, which fail to provide conceptual accounts for the processes to which they are applied. No matter their 'depth' and the sophistication of data-driven methods, such as artificial neural nets, in the end they merely fit curves to existing data. Not only do these methods invariably require far larger quantities of data than anticipated by big data aficionados in order to produce statistically reliable results, but they can also fail in circumstances beyond the range of the data used to train them because they are not designed to model the structural characteristics of the underlying system. We argue that it is vital to use theory as a guide to experimental design for maximal efficiency of data collection and to produce reliable predictive models and conceptual knowledge. Rather than continuing to fund, pursue and promote 'blind' big data projects with massive budgets, we call for more funding to be allocated to the elucidation of the multiscale and stochastic processes controlling the behaviour of complex systems, including those of life, medicine and healthcare. This article is part of the themed issue 'Multiscale modelling at the physics-chemistry-biology interface'. © 2015 The Authors.
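
    The claim that purely data-driven fits can fail in circumstances beyond the range of the data used to train them is easy to demonstrate numerically. A minimal sketch with NumPy, using an arbitrary toy generating process chosen only for illustration: a flexible polynomial fit tracks the training range well but diverges once it is asked to extrapolate.

        # Toy illustration: a flexible curve fit interpolates well but extrapolates
        # poorly, because it does not model the mechanism that generated the data.
        import numpy as np

        rng = np.random.default_rng(0)
        x_train = np.linspace(0, 5, 30)
        y_train = np.exp(-x_train) + 0.01 * rng.standard_normal(30)  # toy "mechanism"

        coeffs = np.polyfit(x_train, y_train, deg=9)     # purely data-driven fit

        x_new = np.array([2.5, 6.0, 8.0])                # inside vs. outside the data
        for xv in x_new:
            print(f"x={xv:4.1f}  true={np.exp(-xv):8.4f}  fitted={np.polyval(coeffs, xv):12.4f}")
        # Inside the training range the fit is close; beyond x=5 it diverges wildly.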

  11. Big data uncertainties.

    Science.gov (United States)

    Maugis, Pierre-André G

    2018-07-01

    Big data-the idea that an always-larger volume of information is being constantly recorded-suggests that new problems can now be subjected to scientific scrutiny. However, can classical statistical methods be used directly on big data? We analyze the problem by looking at two known pitfalls of big datasets. First, that they are biased, in the sense that they do not offer a complete view of the populations under consideration. Second, that they present a weak but pervasive level of dependence between all their components. In both cases we observe that the uncertainty of the conclusion obtained by statistical methods is increased when used on big data, either because of a systematic error (bias), or because of a larger degree of randomness (increased variance). We argue that the key challenge raised by big data is not only how to use big data to tackle new problems, but to develop tools and methods able to rigorously articulate the new risks therein. Copyright © 2016. Published by Elsevier Ltd.
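
    The first pitfall above (bias: the dataset does not offer a complete view of the population) can be made concrete with a small simulation; the numbers below are arbitrary and only mirror the argument, not the paper's analysis.

        # Toy simulation of selection bias: a huge but non-representative sample
        # yields a precise estimate of the wrong quantity, while a small random
        # sample is noisier but centred on the truth.
        import numpy as np

        rng = np.random.default_rng(1)
        population = rng.normal(loc=50.0, scale=10.0, size=1_000_000)

        # "Big data" collection that over-represents high values (e.g. heavy users).
        inclusion_prob = 1.0 / (1.0 + np.exp(-(population - 55.0) / 5.0))
        big_sample = population[rng.random(population.size) < inclusion_prob]

        small_sample = rng.choice(population, size=500, replace=False)

        print(f"population mean   : {population.mean():.2f}")
        print(f"big biased sample : n={big_sample.size}, mean={big_sample.mean():.2f}")
        print(f"small random      : n={small_sample.size}, mean={small_sample.mean():.2f}")
        # The biased sample's error does not shrink with its size; it is systematic.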

  12. Mouse Genome Informatics (MGI)

    Data.gov (United States)

    U.S. Department of Health & Human Services — MGI is the international database resource for the laboratory mouse, providing integrated genetic, genomic, and biological data to facilitate the study of human...

  13. GRAFISK FACILITERING - En magtanalyse af styringen i konsulentværktøjet grafisk facilitering

    OpenAIRE

    Munch, Anna; Boholt, Marianne

    2012-01-01

    Abstract The topic of this thesis is the relatively new consultancy tool of graphic facilitation (GF). GF is a method that combines hand-drawn images and big picture thinking. A graphic facilitator leads a group through a process that results in visual output such as a poster or pamphlet. Our thesis analyses this management tool from a power perspective in an attempt to determine the power relations inherent in its practice. Our theoretical basis is French philosopher Michel Foucault’s theory...

  14. The Rise of Big Data in Oncology.

    Science.gov (United States)

    Fessele, Kristen L

    2018-05-01

    To describe big data and data science in the context of oncology nursing care. Peer-reviewed and lay publications. The rapid expansion of real-world evidence from sources such as the electronic health record, genomic sequencing, administrative claims and other data sources has outstripped the ability of clinicians and researchers to manually review and analyze it. To promote high-quality, high-value cancer care, big data platforms must be constructed from standardized data sources to support extraction of meaningful, comparable insights. Nurses must advocate for the use of standardized vocabularies and common data elements that represent terms and concepts that are meaningful to patient care. Copyright © 2018 Elsevier Inc. All rights reserved.

  15. Using Globus to Transfer and Share Big Data | Poster

    Science.gov (United States)

    By Ashley DeVine, Staff Writer, and Mark Wance, Guest Writer; photo by Richard Frederickson, Staff Photographer Editor's note: This article was updated April 30, 2018. Transferring big data, such as the genomics data delivered to customers from the Center for Cancer Research Sequencing Facility (CCR SF), has been difficult in the past because the transfer systems have not kept

  16. The GEP: Crowd-Sourcing Big Data Analysis with Undergraduates.

    Science.gov (United States)

    Elgin, Sarah C R; Hauser, Charles; Holzen, Teresa M; Jones, Christopher; Kleinschmit, Adam; Leatherman, Judith

    2017-02-01

    The era of 'big data' is also the era of abundant data, creating new opportunities for student-scientist research partnerships. By coordinating undergraduate efforts, the Genomics Education Partnership produces high-quality annotated data sets and analyses that could not be generated otherwise, leading to scientific publications while providing many students with research experience. Copyright © 2016 Elsevier Ltd. All rights reserved.

  17. BigDansing

    KAUST Repository

    Khayyat, Zuhair

    2015-06-02

    Data cleansing approaches have usually focused on detecting and fixing errors with little attention to scaling to big datasets. This presents a serious impediment since data cleansing often involves costly computations such as enumerating pairs of tuples, handling inequality joins, and dealing with user-defined functions. In this paper, we present BigDansing, a Big Data Cleansing system to tackle efficiency, scalability, and ease-of-use issues in data cleansing. The system can run on top of most common general purpose data processing platforms, ranging from DBMSs to MapReduce-like frameworks. A user-friendly programming interface allows users to express data quality rules both declaratively and procedurally, with no requirement of being aware of the underlying distributed platform. BigDansing translates these rules into a series of transformations that enable distributed computations and several optimizations, such as shared scans and specialized join operators. Experimental results on both synthetic and real datasets show that BigDansing outperforms existing baseline systems up to more than two orders of magnitude without sacrificing the quality provided by the repair algorithms.
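
    To make the notion of a declaratively expressed data quality rule concrete, the sketch below checks one classic rule (a functional dependency: equal zip codes must imply equal cities) by naively enumerating tuple pairs, which is exactly the kind of costly computation the abstract says BigDansing distributes and optimises. The rule, column names and records are hypothetical, and this is plain single-machine Python, not the BigDansing API.

        # A single data-quality rule ("same zip => same city") evaluated naively
        # over tuple pairs; systems like BigDansing parallelise and optimise such
        # checks. All column names and rows are made up for illustration.
        from itertools import combinations

        rows = [
            {"id": 1, "zip": "10001", "city": "New York"},
            {"id": 2, "zip": "10001", "city": "New York"},
            {"id": 3, "zip": "10001", "city": "Newark"},   # violates the rule
            {"id": 4, "zip": "60601", "city": "Chicago"},
        ]

        def violations(records):
            """Return id pairs violating the functional dependency zip -> city."""
            out = []
            for a, b in combinations(records, 2):          # O(n^2) pair enumeration
                if a["zip"] == b["zip"] and a["city"] != b["city"]:
                    out.append((a["id"], b["id"]))
            return out

        print(violations(rows))   # [(1, 3), (2, 3)]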

  18. Epstein-Barr virus nuclear antigen EBNA-LP is essential for transforming naïve B cells, and facilitates recruitment of transcription factors to the viral genome.

    Science.gov (United States)

    Szymula, Agnieszka; Palermo, Richard D; Bayoumy, Amr; Groves, Ian J; Ba Abdullah, Mohammed; Holder, Beth; White, Robert E

    2018-02-01

    The Epstein-Barr virus (EBV) nuclear antigen leader protein (EBNA-LP) is the first viral latency-associated protein produced after EBV infection of resting B cells. Its role in B cell transformation is poorly defined, but it has been reported to enhance gene activation by the EBV protein EBNA2 in vitro. We generated EBNA-LP knockout (LPKO) EBVs containing a STOP codon within each repeat unit of internal repeat 1 (IR1). EBNA-LP-mutant EBVs established lymphoblastoid cell lines (LCLs) from adult B cells at reduced efficiency, but not from umbilical cord B cells, which died approximately two weeks after infection. Adult B cells only established EBNA-LP-null LCLs with a memory (CD27+) phenotype. Quantitative PCR analysis of virus gene expression after infection identified both an altered ratio of the EBNA genes, and a dramatic reduction in transcript levels of both EBNA2-regulated virus genes (LMP1 and LMP2) and the EBNA2-independent EBER genes in the first 2 weeks. By 30 days post infection, LPKO transcription was the same as wild-type EBV. In contrast, EBNA2-regulated cellular genes were induced efficiently by LPKO viruses. Chromatin immunoprecipitation revealed that EBNA2 and the host transcription factors EBF1 and RBPJ were delayed in their recruitment to all viral latency promoters tested, whereas these same factors were recruited efficiently to several host genes, which exhibited increased EBNA2 recruitment. We conclude that EBNA-LP does not simply co-operate with EBNA2 in activating gene transcription, but rather facilitates the recruitment of several transcription factors to the viral genome, to enable transcription of virus latency genes. Additionally, our findings suggest that EBNA-LP is essential for the survival of EBV-infected naïve B cells.

  19. Epstein-Barr virus nuclear antigen EBNA-LP is essential for transforming naïve B cells, and facilitates recruitment of transcription factors to the viral genome

    Science.gov (United States)

    Szymula, Agnieszka; Palermo, Richard D.; Bayoumy, Amr; Groves, Ian J.

    2018-01-01

    The Epstein-Barr virus (EBV) nuclear antigen leader protein (EBNA-LP) is the first viral latency-associated protein produced after EBV infection of resting B cells. Its role in B cell transformation is poorly defined, but it has been reported to enhance gene activation by the EBV protein EBNA2 in vitro. We generated EBNA-LP knockout (LPKO) EBVs containing a STOP codon within each repeat unit of internal repeat 1 (IR1). EBNA-LP-mutant EBVs established lymphoblastoid cell lines (LCLs) from adult B cells at reduced efficiency, but not from umbilical cord B cells, which died approximately two weeks after infection. Adult B cells only established EBNA-LP-null LCLs with a memory (CD27+) phenotype. Quantitative PCR analysis of virus gene expression after infection identified both an altered ratio of the EBNA genes, and a dramatic reduction in transcript levels of both EBNA2-regulated virus genes (LMP1 and LMP2) and the EBNA2-independent EBER genes in the first 2 weeks. By 30 days post infection, LPKO transcription was the same as wild-type EBV. In contrast, EBNA2-regulated cellular genes were induced efficiently by LPKO viruses. Chromatin immunoprecipitation revealed that EBNA2 and the host transcription factors EBF1 and RBPJ were delayed in their recruitment to all viral latency promoters tested, whereas these same factors were recruited efficiently to several host genes, which exhibited increased EBNA2 recruitment. We conclude that EBNA-LP does not simply co-operate with EBNA2 in activating gene transcription, but rather facilitates the recruitment of several transcription factors to the viral genome, to enable transcription of virus latency genes. Additionally, our findings suggest that EBNA-LP is essential for the survival of EBV-infected naïve B cells. PMID:29462212

  20. Big data challenges

    DEFF Research Database (Denmark)

    Bachlechner, Daniel; Leimbach, Timo

    2016-01-01

    Although reports on big data success stories have been accumulating in the media, most organizations dealing with high-volume, high-velocity and high-variety information assets still face challenges. Only a thorough understanding of these challenges puts organizations into a position in which they can make an informed decision for or against big data, and, if the decision is positive, overcome the challenges smoothly. The combination of a series of interviews with leading experts from enterprises, associations and research institutions, and focused literature reviews allowed not only... framework are also relevant. For large enterprises and startups specialized in big data, it is typically easier to overcome the challenges than it is for other enterprises and public administration bodies.

  1. Thick-Big Descriptions

    DEFF Research Database (Denmark)

    Lai, Signe Sophus

    The paper discusses the rewards and challenges of employing commercial audience measurement data – gathered by media industries for profitmaking purposes – in ethnographic research on the Internet in everyday life. It questions claims to the objectivity of big data (Anderson 2008), the assumption that... communication systems, language and behavior appear as texts, outputs, and discourses (data to be ‘found’) – big data then documents things that in earlier research required interviews and observations (data to be ‘made’) (Jensen 2014). However, web-measurement enterprises build audiences according to a commercial logic (boyd & Crawford 2011) and are as such directed by motives that call for specific types of sellable user data and specific segmentation strategies. In combining big data and ‘thick descriptions’ (Geertz 1973) scholars need to question how ethnographic fieldwork might map the ‘data not seen...

  2. Big data in biomedicine.

    Science.gov (United States)

    Costa, Fabricio F

    2014-04-01

    The increasing availability and growth rate of biomedical information, also known as 'big data', provides an opportunity for future personalized medicine programs that will significantly improve patient care. Recent advances in information technology (IT) applied to biomedicine are changing the landscape of privacy and personal information, with patients getting more control of their health information. Conceivably, big data analytics is already impacting health decisions and patient care; however, specific challenges need to be addressed to integrate current discoveries into medical practice. In this article, I will discuss the major breakthroughs achieved in combining omics and clinical health data in terms of their application to personalized medicine. I will also review the challenges associated with using big data in biomedicine and translational science. Copyright © 2013 Elsevier Ltd. All rights reserved.

  3. Big data bioinformatics.

    Science.gov (United States)

    Greene, Casey S; Tan, Jie; Ung, Matthew; Moore, Jason H; Cheng, Chao

    2014-12-01

    Recent technological advances allow for high throughput profiling of biological systems in a cost-efficient manner. The low cost of data generation is leading us to the "big data" era. The availability of big data provides unprecedented opportunities but also raises new challenges for data mining and analysis. In this review, we introduce key concepts in the analysis of big data, including both "machine learning" algorithms as well as "unsupervised" and "supervised" examples of each. We note packages for the R programming language that are available to perform machine learning analyses. In addition to programming based solutions, we review webservers that allow users with limited or no programming background to perform these analyses on large data compendia. © 2014 Wiley Periodicals, Inc.
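
    As a minimal illustration of the supervised/unsupervised distinction drawn above (in Python with scikit-learn rather than the R packages the review lists, and with a small built-in dataset standing in for a high-throughput compendium):

        # Supervised vs. unsupervised learning on a small toy dataset.
        from sklearn.datasets import load_iris
        from sklearn.model_selection import train_test_split
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.cluster import KMeans

        X, y = load_iris(return_X_y=True)

        # Supervised: labels guide the model; accuracy is measured on held-out data.
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
        clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
        print("supervised accuracy:", clf.score(X_te, y_te))

        # Unsupervised: no labels; the algorithm only looks for structure.
        labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
        print("cluster sizes:", [int((labels == k).sum()) for k in range(3)])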

  4. TOWARDS EFFECTIVE CUSTOMER RELATIONSHIP MANAGEMENT IN OMAN: ROLE OF BIG DATA

    OpenAIRE

    Tarek Khalil; Mohammad Al-Refai; Amer Nizar Fayez; Mohammed Sharaf Qudah

    2017-01-01

    We established a framework to explore the feasibility of enabling big data within customer relationship management (CRM) strategies in Oman for creating sustainable business profit nationwide. A qualitative evaluation was made based on predictive analytics convergence and big-data-facilitated CRM. It was found that big data analytics can meticulously alter the competitive industrial setting, thereby proffering notable benefits to the business organization in terms of operation, str

  5. Big Java late objects

    CERN Document Server

    Horstmann, Cay S

    2012-01-01

    Big Java: Late Objects is a comprehensive introduction to Java and computer programming, which focuses on the principles of programming, software engineering, and effective learning. It is designed for a two-semester first course in programming for computer science students.

  6. Big ideas: innovation policy

    OpenAIRE

    John Van Reenen

    2011-01-01

    In the last CentrePiece, John Van Reenen stressed the importance of competition and labour market flexibility for productivity growth. His latest in CEP's 'big ideas' series describes the impact of research on how policy-makers can influence innovation more directly - through tax credits for business spending on research and development.

  7. Big Data ethics

    NARCIS (Netherlands)

    Zwitter, Andrej

    2014-01-01

    The speed of development in Big Data and associated phenomena, such as social media, has surpassed the capacity of the average consumer to understand his or her actions and their knock-on effects. We are moving towards changes in how ethics has to be perceived: away from individual decisions with

  8. Big data in history

    CERN Document Server

    Manning, Patrick

    2013-01-01

    Big Data in History introduces the project to create a world-historical archive, tracing the last four centuries of historical dynamics and change. Chapters address the archive's overall plan, how to interpret the past through a global archive, the missions of gathering records, linking local data into global patterns, and exploring the results.

  9. The Big Sky inside

    Science.gov (United States)

    Adams, Earle; Ward, Tony J.; Vanek, Diana; Marra, Nancy; Hester, Carolyn; Knuth, Randy; Spangler, Todd; Jones, David; Henthorn, Melissa; Hammill, Brock; Smith, Paul; Salisbury, Rob; Reckin, Gene; Boulafentis, Johna

    2009-01-01

    The University of Montana (UM)-Missoula has implemented a problem-based program in which students perform scientific research focused on indoor air pollution. The Air Toxics Under the Big Sky program (Jones et al. 2007; Adams et al. 2008; Ward et al. 2008) provides a community-based framework for understanding the complex relationship between poor…

  10. Moving Another Big Desk.

    Science.gov (United States)

    Fawcett, Gay

    1996-01-01

    New ways of thinking about leadership require that leaders move their big desks and establish environments that encourage trust and open communication. Educational leaders must trust their colleagues to make wise choices. When teachers are treated democratically as leaders, classrooms will also become democratic learning organizations. (SM)

  11. A Big Bang Lab

    Science.gov (United States)

    Scheider, Walter

    2005-01-01

    The February 2005 issue of The Science Teacher (TST) reminded everyone that by learning how scientists study stars, students gain an understanding of how science measures things that cannot be set up in a lab, either because they are too big, too far away, or happened in the very distant past. The authors of "How Far are the Stars?" show how the…

  12. New 'bigs' in cosmology

    International Nuclear Information System (INIS)

    Yurov, Artyom V.; Martin-Moruno, Prado; Gonzalez-Diaz, Pedro F.

    2006-01-01

    This paper contains a detailed discussion of new cosmic solutions describing the early and late evolution of a universe that is filled with a kind of dark energy that may or may not satisfy the energy conditions. The main distinctive property of the resulting space-times is that the single singular events predicted by the corresponding quintessential (phantom) models appear twice, in a manner which can be made symmetric with respect to the origin of cosmic time. Thus, the big bang and big rip singularities are shown to take place twice, one on the positive branch of time and the other on the negative one. We have also considered dark energy and phantom energy accretion onto black holes and wormholes in the context of these new cosmic solutions. It is seen that the space-times of these holes would then undergo swelling processes leading to big trip and big hole events taking place at distinct epochs along the evolution of the universe. In this way, the possibility is considered that the past and future be connected in a non-paradoxical manner in the universes described by means of the new symmetric solutions

  13. The Big Bang

    CERN Multimedia

    Moods, Patrick

    2006-01-01

    How did the Universe begin? The favoured theory is that everything - space, time, matter - came into existence at the same moment, around 13.7 thousand million years ago. This event was scornfully referred to as the "Big Bang" by Sir Fred Hoyle, who did not believe in it and maintained that the Universe had always existed.

  14. Big Data Analytics

    Indian Academy of Sciences (India)

    The volume and variety of data being generated using computers is doubling every two years. It is estimated that in 2015, 8 Zettabytes (Zetta = 10^21) were generated, which consisted mostly of unstructured data such as emails, blogs, Twitter, Facebook posts, images, and videos. This is called big data. It is possible to analyse ...

  15. Identifying Dwarfs Workloads in Big Data Analytics

    OpenAIRE

    Gao, Wanling; Luo, Chunjie; Zhan, Jianfeng; Ye, Hainan; He, Xiwen; Wang, Lei; Zhu, Yuqing; Tian, Xinhui

    2015-01-01

    Big data benchmarking is particularly important and provides applicable yardsticks for evaluating booming big data systems. However, wide coverage and great complexity of big data computing impose big challenges on big data benchmarking. How can we construct a benchmark suite using a minimum set of units of computation to represent diversity of big data analytics workloads? Big data dwarfs are abstractions of extracting frequently appearing operations in big data computing. One dwarf represen...

  16. A Survey of Scholarly Data: From Big Data Perspective

    DEFF Research Database (Denmark)

    Khan, Samiya; Liu, Xiufeng; Shakil, Kashish A.

    2017-01-01

    Recently, there has been a shifting focus of organizations and governments towards digitization of academic and technical documents, adding a new facet to the concept of digital libraries. The volume, variety and velocity of this generated data satisfy the big data definition, as a result of which this scholarly reserve is popularly referred to as big scholarly data. In order to facilitate data analytics for big scholarly data, architectures and services for the same need to be developed. The evolving nature of research problems has made them essentially interdisciplinary. As a result, there is a growing demand for scholarly applications like collaborator discovery, expert finding and research recommendation systems, in addition to several others. This research paper investigates the current trends and identifies the existing challenges in development of a big scholarly data platform...

  17. Big Data and Chemical Education

    Science.gov (United States)

    Pence, Harry E.; Williams, Antony J.

    2016-01-01

    The amount of computerized information that organizations collect and process is growing so large that the term Big Data is commonly being used to describe the situation. Accordingly, Big Data is defined by a combination of the Volume, Variety, Velocity, and Veracity of the data being processed. Big Data tools are already having an impact in…

  18. Scaling Big Data Cleansing

    KAUST Repository

    Khayyat, Zuhair

    2017-07-31

    Data cleansing approaches have usually focused on detecting and fixing errors with little attention to big data scaling. This presents a serious impediment since identifying and repairing dirty data often involves processing huge input datasets, handling sophisticated error discovery approaches and managing huge arbitrary errors. With large datasets, error detection becomes overly expensive and complicated especially when considering user-defined functions. Furthermore, a distinctive algorithm is desired to optimize inequality joins in sophisticated error discovery rather than naïvely parallelizing them. Also, when repairing large errors, their skewed distribution may obstruct effective error repairs. In this dissertation, I present solutions to overcome the above three problems in scaling data cleansing. First, I present BigDansing as a general system to tackle efficiency, scalability, and ease-of-use issues in data cleansing for Big Data. It automatically parallelizes the user’s code on top of general-purpose distributed platforms. Its programming interface allows users to express data quality rules independently from the requirements of parallel and distributed environments. Without sacrificing their quality, BigDansing also enables parallel execution of serial repair algorithms by exploiting the graph representation of discovered errors. The experimental results show that BigDansing outperforms existing baselines up to more than two orders of magnitude. Although BigDansing scales cleansing jobs, it still lacks the ability to handle sophisticated error discovery requiring inequality joins. Therefore, I developed IEJoin as an algorithm for fast inequality joins. It is based on sorted arrays and space efficient bit-arrays to reduce the problem’s search space. By comparing IEJoin against well-known optimizations, I show that it is more scalable, and several orders of magnitude faster. BigDansing depends on vertex-centric graph systems, i.e., Pregel
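
    A minimal sketch of the idea behind fast inequality joins: sorting one relation lets each probe be answered with a binary search over a sorted array instead of a full scan. This is a simplification of IEJoin (which handles two inequality predicates using permutation and bit arrays); the relations below are made up for illustration.

        # Simplified single-predicate inequality join (r.salary < s.salary) using a
        # sorted array plus binary search; IEJoin generalises the idea to pairs of
        # inequality predicates with permutation arrays and space-efficient bit-arrays.
        import bisect

        r = [("r1", 90), ("r2", 50), ("r3", 120)]          # (id, salary)
        s = [("s1", 60), ("s2", 100), ("s3", 40)]

        s_sorted = sorted(s, key=lambda t: t[1])           # sort the probed side once
        s_keys = [salary for _, salary in s_sorted]

        pairs = []
        for r_id, r_salary in r:
            start = bisect.bisect_right(s_keys, r_salary)  # first s strictly greater
            pairs.extend((r_id, s_id) for s_id, _ in s_sorted[start:])

        print(sorted(pairs))
        # [('r1', 's2'), ('r2', 's1'), ('r2', 's2')]  -- r3 (120) matches nothing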

  19. Real-Time Pathogen Detection in the Era of Whole-Genome Sequencing and Big Data: Comparison of k-mer and Site-Based Methods for Inferring the Genetic Distances among Tens of Thousands of Salmonella Samples.

    Directory of Open Access Journals (Sweden)

    James B Pettengill

    Full Text Available The adoption of whole-genome sequencing within the public health realm for molecular characterization of bacterial pathogens has been followed by an increased emphasis on real-time detection of emerging outbreaks (e.g., food-borne Salmonellosis). In turn, large databases of whole-genome sequence data are being populated. These databases currently contain tens of thousands of samples and are expected to grow to hundreds of thousands within a few years. For these databases to be of optimal use one must be able to quickly interrogate them to accurately determine the genetic distances among a set of samples. Being able to do so is challenging due to both biological (evolutionary diverse samples) and computational (petabytes of sequence data) issues. We evaluated seven measures of genetic distance, which were estimated from either k-mer profiles (Jaccard, Euclidean, Manhattan, Mash Jaccard, and Mash distances) or nucleotide sites (NUCmer and an extended multi-locus sequence typing (MLST) scheme). When analyzing empirical data (whole-genome sequence data from 18,997 Salmonella isolates) there are features (e.g., genomic, assembly, and contamination) that cause distances inferred from k-mer profiles, which treat absent data as informative, to fail to accurately capture the distance between samples when compared to distances inferred from differences in nucleotide sites. Thus, site-based distances, like NUCmer and extended MLST, are superior in performance, but accessing the computing resources necessary to perform them may be challenging when analyzing large databases.

  20. Real-Time Pathogen Detection in the Era of Whole-Genome Sequencing and Big Data: Comparison of k-mer and Site-Based Methods for Inferring the Genetic Distances among Tens of Thousands of Salmonella Samples.

    Science.gov (United States)

    Pettengill, James B; Pightling, Arthur W; Baugher, Joseph D; Rand, Hugh; Strain, Errol

    2016-01-01

    The adoption of whole-genome sequencing within the public health realm for molecular characterization of bacterial pathogens has been followed by an increased emphasis on real-time detection of emerging outbreaks (e.g., food-borne Salmonellosis). In turn, large databases of whole-genome sequence data are being populated. These databases currently contain tens of thousands of samples and are expected to grow to hundreds of thousands within a few years. For these databases to be of optimal use one must be able to quickly interrogate them to accurately determine the genetic distances among a set of samples. Being able to do so is challenging due to both biological (evolutionary diverse samples) and computational (petabytes of sequence data) issues. We evaluated seven measures of genetic distance, which were estimated from either k-mer profiles (Jaccard, Euclidean, Manhattan, Mash Jaccard, and Mash distances) or nucleotide sites (NUCmer and an extended multi-locus sequence typing (MLST) scheme). When analyzing empirical data (whole-genome sequence data from 18,997 Salmonella isolates) there are features (e.g., genomic, assembly, and contamination) that cause distances inferred from k-mer profiles, which treat absent data as informative, to fail to accurately capture the distance between samples when compared to distances inferred from differences in nucleotide sites. Thus, site-based distances, like NUCmer and extended MLST, are superior in performance, but accessing the computing resources necessary to perform them may be challenging when analyzing large databases.
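
    To make the k-mer-profile side of this comparison concrete, here is a small, hypothetical sketch of one of the simplest distances evaluated above: a Jaccard distance computed from shared k-mers. It is illustrative only; the study itself used tools such as Mash and NUCmer over assembled Salmonella genomes rather than toy code like this.

    ```python
    def kmer_set(sequence: str, k: int = 21) -> set:
        """Return the set of k-mers (substrings of length k) present in a sequence."""
        sequence = sequence.upper()
        return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

    def jaccard_distance(seq_a: str, seq_b: str, k: int = 21) -> float:
        """Jaccard distance between two sequences based on shared k-mers.

        Distance = 1 - |A ∩ B| / |A ∪ B|, where A and B are the k-mer sets.
        Absent k-mers implicitly count as informative, which is the behaviour
        the study found problematic for incomplete or contaminated assemblies.
        """
        a, b = kmer_set(seq_a, k), kmer_set(seq_b, k)
        union = a | b
        if not union:
            return 0.0
        return 1.0 - len(a & b) / len(union)

    # Hypothetical toy sequences (real analyses use k around 21 on megabase assemblies).
    g1 = "ACGTACGTGGTCAACGT"
    g2 = "ACGTACGTGGTCATCGT"
    print(round(jaccard_distance(g1, g2, k=5), 3))
    ```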

  1. Big Bang Tumor Growth and Clonal Evolution.

    Science.gov (United States)

    Sun, Ruping; Hu, Zheng; Curtis, Christina

    2018-05-01

    The advent and application of next-generation sequencing (NGS) technologies to tumor genomes has reinvigorated efforts to understand clonal evolution. Although tumor progression has traditionally been viewed as a gradual stepwise process, recent studies suggest that evolutionary rates in tumors can be variable with periods of punctuated mutational bursts and relative stasis. For example, Big Bang dynamics have been reported, wherein after transformation, growth occurs in the absence of stringent selection, consistent with effectively neutral evolution. Although first noted in colorectal tumors, effective neutrality may be relatively common. Additionally, punctuated evolution resulting from mutational bursts and cataclysmic genomic alterations have been described. In this review, we contrast these findings with the conventional gradualist view of clonal evolution and describe potential clinical and therapeutic implications of different evolutionary modes and tempos. Copyright © 2018 Cold Spring Harbor Laboratory Press; all rights reserved.

  2. Genome Maps, a new generation genome browser.

    Science.gov (United States)

    Medina, Ignacio; Salavert, Francisco; Sanchez, Rubén; de Maria, Alejandro; Alonso, Roberto; Escobar, Pablo; Bleda, Marta; Dopazo, Joaquín

    2013-07-01

    Genome browsers have gained importance as more genomes and related genomic information become available. However, the increase of information brought about by new generation sequencing technologies is, at the same time, causing a subtle but continuous decrease in the efficiency of conventional genome browsers. Here, we present Genome Maps, a genome browser that implements an innovative model of data transfer and management. The program uses highly efficient technologies from the new HTML5 standard, such as scalable vector graphics, that optimize workloads at both server and client sides and ensure future scalability. Thus, data management and representation are entirely carried out by the browser, without the need of any Java Applet, Flash or other plug-in technology installation. Relevant biological data on genes, transcripts, exons, regulatory features, single-nucleotide polymorphisms, karyotype and so forth, are imported from web services and are available as tracks. In addition, several DAS servers are already included in Genome Maps. As a novelty, this web-based genome browser allows the local upload of huge genomic data files (e.g. VCF or BAM) that can be dynamically visualized in real time at the client side, thus facilitating the management of medical data affected by privacy restrictions. Finally, Genome Maps can easily be integrated in any web application by including only a few lines of code. Genome Maps is an open source collaborative initiative available in the GitHub repository (https://github.com/compbio-bigdata-viz/genome-maps). Genome Maps is available at: http://www.genomemaps.org.

  3. How Big Are "Martin's Big Words"? Thinking Big about the Future.

    Science.gov (United States)

    Gardner, Traci

    "Martin's Big Words: The Life of Dr. Martin Luther King, Jr." tells of King's childhood determination to use "big words" through biographical information and quotations. In this lesson, students in grades 3 to 5 explore information on Dr. King to think about his "big" words, then they write about their own…

  4. Big Data-Survey

    Directory of Open Access Journals (Sweden)

    P.S.G. Aruna Sri

    2016-03-01

    Full Text Available Big data is the term for any collection of data sets so large and complex that it becomes hard to process using conventional data-handling applications. The difficulties include analysis, capture, curation, search, sharing, storage, transfer, visualization, and privacy violations. To spot business trends, anticipate diseases, detect conflict and so on, we require bigger data sets compared with smaller ones. Big data is hard to work with using most relational database management systems and desktop statistics and visualization packages, needing instead massively parallel software running on tens, hundreds, or even thousands of servers. In this paper we review the Hadoop architecture, different tools used for big data, and its security issues.

  5. Finding the big bang

    CERN Document Server

    Page, Lyman A; Partridge, R Bruce

    2009-01-01

    Cosmology, the study of the universe as a whole, has become a precise physical science, the foundation of which is our understanding of the cosmic microwave background radiation (CMBR) left from the big bang. The story of the discovery and exploration of the CMBR in the 1960s is recalled for the first time in this collection of 44 essays by eminent scientists who pioneered the work. Two introductory chapters put the essays in context, explaining the general ideas behind the expanding universe and fossil remnants from the early stages of the expanding universe. The last chapter describes how the confusion of ideas and measurements in the 1960s grew into the present tight network of tests that demonstrate the accuracy of the big bang theory. This book is valuable to anyone interested in how science is done, and what it has taught us about the large-scale nature of the physical universe.

  6. Big Data as Governmentality

    DEFF Research Database (Denmark)

    Flyverbom, Mikkel; Klinkby Madsen, Anders; Rasche, Andreas

    This paper conceptualizes how large-scale data and algorithms condition and reshape knowledge production when addressing international development challenges. The concept of governmentality and four dimensions of an analytics of government are proposed as a theoretical framework to examine how big data is constituted as an aspiration to improve the data and knowledge underpinning development efforts. Based on this framework, we argue that big data’s impact on how relevant problems are governed is enabled by (1) new techniques of visualizing development issues, (2) linking aspects of international development agendas to algorithms that synthesize large-scale data, (3) novel ways of rationalizing knowledge claims that underlie development efforts, and (4) shifts in professional and organizational identities of those concerned with producing and processing data for development. Our discussion...

  7. Big nuclear accidents

    International Nuclear Information System (INIS)

    Marshall, W.; Billingon, D.E.; Cameron, R.F.; Curl, S.J.

    1983-09-01

    Much of the debate on the safety of nuclear power focuses on the large number of fatalities that could, in theory, be caused by extremely unlikely but just imaginable reactor accidents. This, along with the nuclear industry's inappropriate use of vocabulary during public debate, has given the general public a distorted impression of the risks of nuclear power. The paper reviews the way in which the probability and consequences of big nuclear accidents have been presented in the past and makes recommendations for the future, including the presentation of the long-term consequences of such accidents in terms of 'loss of life expectancy', 'increased chance of fatal cancer' and 'equivalent pattern of compulsory cigarette smoking'. The paper presents mathematical arguments, which show the derivation and validity of the proposed methods of presenting the consequences of imaginable big nuclear accidents. (author)

  8. Big Bounce and inhomogeneities

    International Nuclear Information System (INIS)

    Brizuela, David; Mena Marugan, Guillermo A; Pawlowski, Tomasz

    2010-01-01

    The dynamics of an inhomogeneous universe is studied with the methods of loop quantum cosmology, via a so-called hybrid quantization, as an example of the quantization of vacuum cosmological spacetimes containing gravitational waves (Gowdy spacetimes). The analysis of this model with an infinite number of degrees of freedom, performed at the effective level, shows that (i) the initial Big Bang singularity is replaced (as in the case of homogeneous cosmological models) by a Big Bounce, joining deterministically two large universes, (ii) the universe size at the bounce is at least of the same order of magnitude as that of the background homogeneous universe and (iii) for each gravitational wave mode, the difference in amplitude at very early and very late times has a vanishing statistical average when the bounce dynamics is strongly dominated by the inhomogeneities, whereas this average is positive when the dynamics is in a near-vacuum regime, so that statistically the inhomogeneities are amplified. (fast track communication)

  9. Big Data and reality

    Directory of Open Access Journals (Sweden)

    Ryan Shaw

    2015-11-01

    Full Text Available DNA sequencers, Twitter, MRIs, Facebook, particle accelerators, Google Books, radio telescopes, Tumblr: what do these things have in common? According to the evangelists of “data science,” all of these are instruments for observing reality at unprecedentedly large scales and fine granularities. This perspective ignores the social reality of these very different technological systems, ignoring how they are made, how they work, and what they mean in favor of an exclusive focus on what they generate: Big Data. But no data, big or small, can be interpreted without an understanding of the process that generated them. Statistical data science is applicable to systems that have been designed as scientific instruments, but is likely to lead to confusion when applied to systems that have not. In those cases, a historical inquiry is preferable.

  10. Really big numbers

    CERN Document Server

    Schwartz, Richard Evan

    2014-01-01

    In the American Mathematical Society's first-ever book for kids (and kids at heart), mathematician and author Richard Evan Schwartz leads math lovers of all ages on an innovative and strikingly illustrated journey through the infinite number system. By means of engaging, imaginative visuals and endearing narration, Schwartz manages the monumental task of presenting the complex concept of Big Numbers in fresh and relatable ways. The book begins with small, easily observable numbers before building up to truly gigantic ones, like a nonillion, a tredecillion, a googol, and even ones too huge for names! Any person, regardless of age, can benefit from reading this book. Readers will find themselves returning to its pages for a very long time, perpetually learning from and growing with the narrative as their knowledge deepens. Really Big Numbers is a wonderful enrichment for any math education program and is enthusiastically recommended to every teacher, parent and grandparent, student, child, or other individual i...

  11. Harnessing Big Data for Systems Pharmacology.

    Science.gov (United States)

    Xie, Lei; Draizen, Eli J; Bourne, Philip E

    2017-01-06

    Systems pharmacology aims to holistically understand mechanisms of drug actions to support drug discovery and clinical practice. Systems pharmacology modeling (SPM) is data driven. It integrates an exponentially growing amount of data at multiple scales (genetic, molecular, cellular, organismal, and environmental). The goal of SPM is to develop mechanistic or predictive multiscale models that are interpretable and actionable. The current explosions in genomics and other omics data, as well as the tremendous advances in big data technologies, have already enabled biologists to generate novel hypotheses and gain new knowledge through computational models of genome-wide, heterogeneous, and dynamic data sets. More work is needed to interpret and predict a drug response phenotype, which is dependent on many known and unknown factors. To gain a comprehensive understanding of drug actions, SPM requires close collaborations between domain experts from diverse fields and integration of heterogeneous models from biophysics, mathematics, statistics, machine learning, and semantic webs. This creates challenges in model management, model integration, model translation, and knowledge integration. In this review, we discuss several emergent issues in SPM and potential solutions using big data technology and analytics. The concurrent development of high-throughput techniques, cloud computing, data science, and the semantic web will likely allow SPM to be findable, accessible, interoperable, reusable, reliable, interpretable, and actionable.

  12. Big Bang Circus

    Science.gov (United States)

    Ambrosini, C.

    2011-06-01

    Big Bang Circus is an opera I composed in 2001 and which was premiered at the Venice Biennale Contemporary Music Festival in 2002. A chamber group, four singers and a ringmaster stage the story of the Universe confronting and interweaving two threads: how early man imagined it and how scientists described it. Surprisingly enough fancy, myths and scientific explanations often end up using the same images, metaphors and sometimes even words: a strong tension, a drumskin starting to vibrate, a shout…

  13. Big Bang 5

    CERN Document Server

    Apolin, Martin

    2007-01-01

    Physics should be understandable and fun! That is why every chapter in Big Bang begins with a motivating overview and guiding questions before moving from the fundamentals to the applications, from the simple to the complex. Throughout, the language remains simple, everyday-oriented and narrative in style. Volume 5 RG covers the fundamentals (system of units, orders of magnitude) and mechanics (translation, rotation, force, conservation laws).

  14. Big Bang 8

    CERN Document Server

    Apolin, Martin

    2008-01-01

    Physics should be understandable and fun! That is why every chapter in Big Bang begins with a motivating overview and guiding questions before moving from the fundamentals to the applications, from the simple to the complex. Throughout, the language remains simple, everyday-oriented and narrative in style. Volume 8 gives an accessible treatment of relativity, nuclear and particle physics (and their applications in cosmology and astrophysics), nanotechnology and bionics.

  15. Big Bang 6

    CERN Document Server

    Apolin, Martin

    2008-01-01

    Physics should be understandable and fun! That is why every chapter in Big Bang begins with a motivating overview and guiding questions before moving from the fundamentals to the applications, from the simple to the complex. Throughout, the language remains simple, everyday-oriented and narrative in style. Volume 6 RG covers gravitation, oscillations and waves, thermodynamics and an introduction to electricity, using everyday examples and cross-links to other disciplines.

  16. Big Bang 7

    CERN Document Server

    Apolin, Martin

    2008-01-01

    Physics should be understandable and fun! That is why every chapter in Big Bang begins with a motivating overview and guiding questions before moving from the fundamentals to the applications, from the simple to the complex. Throughout, the language remains simple, everyday-oriented and narrative in style. In addition to an introduction, Volume 7 covers many current aspects of quantum mechanics (e.g. quantum teleportation) and electrodynamics (e.g. electrosmog), as well as climate issues and chaos theory.

  17. Big Bang Darkleosynthesis

    OpenAIRE

    Krnjaic, Gordan; Sigurdson, Kris

    2014-01-01

    In a popular class of models, dark matter comprises an asymmetric population of composite particles with short range interactions arising from a confined nonabelian gauge group. We show that coupling this sector to a well-motivated light mediator particle yields efficient darkleosynthesis, a dark-sector version of big-bang nucleosynthesis (BBN), in generic regions of parameter space. Dark matter self-interaction bounds typically require the confinement scale to be above ΛQCD, which generica...

  18. Big³. Editorial.

    Science.gov (United States)

    Lehmann, C U; Séroussi, B; Jaulent, M-C

    2014-05-22

    To provide an editorial introduction to the 2014 IMIA Yearbook of Medical Informatics with an overview of the content, the new publishing scheme, and the upcoming 25th anniversary. A brief overview of the 2014 special topic, Big Data - Smart Health Strategies, and an outline of the novel publishing model is provided in conjunction with a call for proposals to celebrate the 25th anniversary of the Yearbook. 'Big Data' has become the latest buzzword in informatics and promises new approaches and interventions that can improve health, well-being, and quality of life. This edition of the Yearbook acknowledges that we have just started to explore the opportunities that 'Big Data' will bring. However, it will become apparent to the reader that its pervasive nature has invaded all aspects of biomedical informatics - some to a higher degree than others. It was our goal to provide a comprehensive view of the state of 'Big Data' today, explore its strengths, weaknesses, and risks, discuss emerging trends, tools, and applications, and stimulate the development of the field through the aggregation of excellent survey papers and working group contributions to the topic. For the first time in its history, the IMIA Yearbook will be published in an open access online format, allowing a broader readership, especially in resource-poor countries. Also for the first time, thanks to the online format, the Yearbook will be published twice in the year, with two different tracks of papers. We anticipate that the important role of the IMIA Yearbook will further increase with these changes, just in time for its 25th anniversary in 2016.

  19. Recent big flare

    International Nuclear Information System (INIS)

    Moriyama, Fumio; Miyazawa, Masahide; Yamaguchi, Yoshisuke

    1978-01-01

    The features of three big solar flares observed at Tokyo Observatory are described in this paper. The active region, McMath 14943, caused a big flare on September 16, 1977. The flare appeared on both sides of a long dark line which runs along the boundary of the magnetic field. Two-ribbon structure was seen. The electron density of the flare observed at Norikura Corona Observatory was 3 × 10¹²/cc. Several arc lines which connect both bright regions of different magnetic polarity were seen in H-α monochrome image. The active region, McMath 15056, caused a big flare on December 10, 1977. At the beginning, several bright spots were observed in the region between two main solar spots. Then, the area and the brightness increased, and the bright spots became two ribbon-shaped bands. A solar flare was observed on April 8, 1978. At first, several bright spots were seen around the solar spot in the active region, McMath 15221. Then, these bright spots developed to a large bright region. On both sides of a dark line along the magnetic neutral line, bright regions were generated. These developed to a two-ribbon flare. The time required for growth was more than one hour. A bright arc which connects two ribbons was seen, and this arc may be a loop prominence system. (Kato, T.)

  20. Big Data Technologies

    Science.gov (United States)

    Bellazzi, Riccardo; Dagliati, Arianna; Sacchi, Lucia; Segagni, Daniele

    2015-01-01

    The so-called big data revolution provides substantial opportunities to diabetes management. At least 3 important directions are currently of great interest. First, the integration of different sources of information, from primary and secondary care to administrative information, may allow depicting a novel view of patient’s care processes and of single patient’s behaviors, taking into account the multifaceted nature of chronic care. Second, the availability of novel diabetes technologies, able to gather large amounts of real-time data, requires the implementation of distributed platforms for data analysis and decision support. Finally, the inclusion of geographical and environmental information into such complex IT systems may further increase the capability of interpreting the data gathered and extract new knowledge from them. This article reviews the main concepts and definitions related to big data, it presents some efforts in health care, and discusses the potential role of big data in diabetes care. Finally, as an example, it describes the research efforts carried on in the MOSAIC project, funded by the European Commission. PMID:25910540

  1. Mitochondrial Disease Sequence Data Resource (MSeqDR): A global grass-roots consortium to facilitate deposition, curation, annotation, and integrated analysis of genomic data for the mitochondrial disease clinical and research communities

    NARCIS (Netherlands)

    M.J. Falk (Marni J.); L. Shen (Lishuang); M. Gonzalez (Michael); J. Leipzig (Jeremy); M.T. Lott (Marie T.); A.P.M. Stassen (Alphons P.M.); M.A. Diroma (Maria Angela); D. Navarro-Gomez (Daniel); P. Yeske (Philip); R. Bai (Renkui); R.G. Boles (Richard G.); V. Brilhante (Virginia); D. Ralph (David); J.T. DaRe (Jeana T.); R. Shelton (Robert); S.F. Terry (Sharon); Z. Zhang (Zhe); W.C. Copeland (William C.); M. van Oven (Mannis); H. Prokisch (Holger); D.C. Wallace; M. Attimonelli (Marcella); D. Krotoski (Danuta); S. Zuchner (Stephan); X. Gai (Xiaowu); S. Bale (Sherri); J. Bedoyan (Jirair); D.M. Behar (Doron); P. Bonnen (Penelope); L. Brooks (Lisa); C. Calabrese (Claudia); S. Calvo (Sarah); P.F. Chinnery (Patrick); J. Christodoulou (John); D. Church (Deanna); R. Clima (Rosanna); B.H. Cohen (Bruce H.); R.G.H. Cotton (Richard); I.F.M. de Coo (René); O. Derbenevoa (Olga); J.T. den Dunnen (Johan); D. Dimmock (David); G. Enns (Gregory); G. Gasparre (Giuseppe); A. Goldstein (Amy); I. Gonzalez (Iris); K. Gwinn (Katrina); S. Hahn (Sihoun); R.H. Haas (Richard H.); H. Hakonarson (Hakon); M. Hirano (Michio); D. Kerr (Douglas); D. Li (Dong); M. Lvova (Maria); F. Macrae (Finley); D. Maglott (Donna); E. McCormick (Elizabeth); G. Mitchell (Grant); V.K. Mootha (Vamsi K.); Y. Okazaki (Yasushi); A. Pujol (Aurora); M. Parisi (Melissa); J.C. Perin (Juan Carlos); E.A. Pierce (Eric A.); V. Procaccio (Vincent); S. Rahman (Shamima); H. Reddi (Honey); H. Rehm (Heidi); E. Riggs (Erin); R.J.T. Rodenburg (Richard); Y. Rubinstein (Yaffa); R. Saneto (Russell); M. Santorsola (Mariangela); C. Scharfe (Curt); C. Sheldon (Claire); E.A. Shoubridge (Eric); D. Simone (Domenico); B. Smeets (Bert); J.A.M. Smeitink (Jan); C. Stanley (Christine); A. Suomalainen (Anu); M.A. Tarnopolsky (Mark); I. Thiffault (Isabelle); D.R. Thorburn (David R.); J.V. Hove (Johan Van); L. Wolfe (Lynne); L.-J. Wong (Lee-Jun)

    2015-01-01

    Success rates for genomic analyses of highly heterogeneous disorders can be greatly improved if a large cohort of patient data is assembled to enhance collective capabilities for accurate sequence variant annotation, analysis, and interpretation. Indeed, molecular diagnostics requires

  2. The NOAA Big Data Project

    Science.gov (United States)

    de la Beaujardiere, J.

    2015-12-01

    The US National Oceanic and Atmospheric Administration (NOAA) is a Big Data producer, generating tens of terabytes per day from hundreds of sensors on satellites, radars, aircraft, ships, and buoys, and from numerical models. These data are of critical importance and value for NOAA's mission to understand and predict changes in climate, weather, oceans, and coasts. In order to facilitate extracting additional value from this information, NOAA has established Cooperative Research and Development Agreements (CRADAs) with five Infrastructure-as-a-Service (IaaS) providers — Amazon, Google, IBM, Microsoft, Open Cloud Consortium — to determine whether hosting NOAA data in publicly-accessible Clouds alongside on-demand computational capability stimulates the creation of new value-added products and services and lines of business based on the data, and if the revenue generated by these new applications can support the costs of data transmission and hosting. Each IaaS provider is the anchor of a "Data Alliance" which organizations or entrepreneurs can join to develop and test new business or research avenues. This presentation will report on progress and lessons learned during the first 6 months of the 3-year CRADAs.

  3. Big bang and big crunch in matrix string theory

    OpenAIRE

    Bedford, J; Papageorgakis, C; Rodríguez-Gómez, D; Ward, J

    2007-01-01

    Following the holographic description of linear dilaton null Cosmologies with a Big Bang in terms of Matrix String Theory put forward by Craps, Sethi and Verlinde, we propose an extended background describing a Universe including both Big Bang and Big Crunch singularities. This belongs to a class of exact string backgrounds and is perturbative in the string coupling far away from the singularities, both of which can be resolved using Matrix String Theory. We provide a simple theory capable of...

  4. Big data and visual analytics in anaesthesia and health care.

    Science.gov (United States)

    Simpao, A F; Ahumada, L M; Rehman, M A

    2015-09-01

    Advances in computer technology, patient monitoring systems, and electronic health record systems have enabled rapid accumulation of patient data in electronic form (i.e. big data). Organizations such as the Anesthesia Quality Institute and Multicenter Perioperative Outcomes Group have spearheaded large-scale efforts to collect anaesthesia big data for outcomes research and quality improvement. Analytics--the systematic use of data combined with quantitative and qualitative analysis to make decisions--can be applied to big data for quality and performance improvements, such as predictive risk assessment, clinical decision support, and resource management. Visual analytics is the science of analytical reasoning facilitated by interactive visual interfaces, and it can facilitate performance of cognitive activities involving big data. Ongoing integration of big data and analytics within anaesthesia and health care will increase demand for anaesthesia professionals who are well versed in both the medical and the information sciences. © The Author 2015. Published by Oxford University Press on behalf of the British Journal of Anaesthesia. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  5. Vertical landscraping, a big regionalism for Dubai.

    Science.gov (United States)

    Wilson, Matthew

    2010-01-01

    Dubai's ecologic and economic complications are exacerbated by six years of accelerated expansion, a fixed top-down approach to urbanism and the construction of iconic single-phase mega-projects. With recent construction delays, project cancellations and growing landscape issues, Dubai's tower typologies have been unresponsive to changing environmental, socio-cultural and economic patterns (BBC, 2009; Gillet, 2009; Lewis, 2009). In this essay, a theory of "Big Regionalism" guides an argument for an economically and ecologically linked tower typology called the Condenser. This phased "box-to-tower" typology is part of a greater Landscape Urbanist strategy called Vertical Landscraping. Within this strategy, the Condenser's role is to densify the city, facilitating the creation of ecologic voids that order the urban region. Delineating "Big Regional" principles, the Condenser provides a time-based, global-local urban growth approach that weaves Bigness into a series of urban-regional, economic and ecological relationships, builds upon the environmental performance of the city's regional architecture and planning, promotes a continuity of Dubai's urban history, and responds to its landscape issues while condensing development. These speculations permit consideration of the overlooked opportunities embedded within Dubai's mega-projects and their long-term impact on the urban morphology.

  6. Asymmetric author-topic model for knowledge discovering of big data in toxicogenomics.

    Science.gov (United States)

    Chung, Ming-Hua; Wang, Yuping; Tang, Hailin; Zou, Wen; Basinger, John; Xu, Xiaowei; Tong, Weida

    2015-01-01

    The advancement of high-throughput screening technologies facilitates the generation of massive amounts of biological data, a big data phenomenon in biomedical science. Yet researchers still rely heavily on keyword search and/or literature review to navigate the databases, and analyses are often done at a rather small scale. As a result, the rich information of a database has not been fully utilized, particularly the information embedded in the interactions between data points, which is largely ignored and buried. For the past 10 years, probabilistic topic modeling has been recognized as an effective machine learning algorithm for annotating the hidden thematic structure of massive collections of documents. The analogy between a text corpus and large-scale genomic data enables the application of text mining tools, like probabilistic topic models, to explore hidden patterns in genomic data and, by extension, altered biological functions. In this paper, we developed a generalized probabilistic topic model to analyze a toxicogenomics dataset consisting of a large number of gene expression profiles from rat livers treated with drugs at multiple doses and time points. We discovered hidden patterns in gene expression associated with the dose and time point of treatment. Finally, we illustrated the ability of our model to identify evidence supporting a potential reduction in animal use.

  7. Asymmetric author-topic model for knowledge discovering of big data in toxicogenomics

    Directory of Open Access Journals (Sweden)

    Ming-Hua eChung

    2015-04-01

    Full Text Available The advancement of high-throughput screening technologies facilitates the generation of massive amounts of biological data, a big data phenomenon in biomedical science. Yet researchers still rely heavily on keyword search and/or literature review to navigate the databases, and analyses are often done at a rather small scale. As a result, the rich information of a database has not been fully utilized, particularly the information embedded in the interactions between data points, which is largely ignored and buried. For the past ten years, probabilistic topic modeling has been recognized as an effective machine learning algorithm for annotating the hidden thematic structure of massive collections of documents. The analogy between a text corpus and large-scale genomic data enables the application of text mining tools, like probabilistic topic models, to explore hidden patterns in genomic data and, by extension, altered biological functions. In this paper, we developed a generalized probabilistic topic model to analyze a toxicogenomics dataset consisting of a large number of gene expression profiles from rat livers treated with drugs at multiple doses and time points. We discovered hidden patterns in gene expression associated with the dose and time point of treatment. Finally, we illustrated the ability of our model to identify evidence supporting a potential reduction in animal use.
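
    For readers unfamiliar with the text-corpus analogy, the sketch below runs an off-the-shelf latent Dirichlet allocation model over a synthetic "samples x genes" count matrix, treating treatment conditions as documents and genes as words. It is a generic, hypothetical illustration of probabilistic topic modeling on expression-like data, not the asymmetric author-topic model developed in the paper.

    ```python
    import numpy as np
    from sklearn.decomposition import LatentDirichletAllocation

    rng = np.random.default_rng(0)

    # Hypothetical data: 20 treatment conditions (documents) x 50 genes (vocabulary),
    # with non-negative integer "expression counts" playing the role of word counts.
    n_samples, n_genes, n_topics = 20, 50, 3
    counts = rng.poisson(lam=2.0, size=(n_samples, n_genes))

    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    sample_topics = lda.fit_transform(counts)   # per-condition topic proportions

    # Top genes per hidden "topic" (a co-varying expression pattern).
    for t, weights in enumerate(lda.components_):
        top_genes = np.argsort(weights)[::-1][:5]
        print(f"topic {t}: genes {top_genes.tolist()}")
    ```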

  8. Disaggregating asthma: Big investigation versus big data.

    Science.gov (United States)

    Belgrave, Danielle; Henderson, John; Simpson, Angela; Buchan, Iain; Bishop, Christopher; Custovic, Adnan

    2017-02-01

    We are facing a major challenge in bridging the gap between identifying subtypes of asthma to understand causal mechanisms and translating this knowledge into personalized prevention and management strategies. In recent years, "big data" has been sold as a panacea for generating hypotheses and driving new frontiers of health care; the idea that the data must and will speak for themselves is fast becoming a new dogma. One of the dangers of ready accessibility of health care data and computational tools for data analysis is that the process of data mining can become uncoupled from the scientific process of clinical interpretation, understanding the provenance of the data, and external validation. Although advances in computational methods can be valuable for using unexpected structure in data to generate hypotheses, there remains a need for testing hypotheses and interpreting results with scientific rigor. We argue for combining data- and hypothesis-driven methods in a careful synergy, and the importance of carefully characterized birth and patient cohorts with genetic, phenotypic, biological, and molecular data in this process cannot be overemphasized. The main challenge on the road ahead is to harness bigger health care data in ways that produce meaningful clinical interpretation and to translate this into better diagnoses and properly personalized prevention and treatment plans. There is a pressing need for cross-disciplinary research with an integrative approach to data science, whereby basic scientists, clinicians, data analysts, and epidemiologists work together to understand the heterogeneity of asthma. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

  9. Big data and educational research

    OpenAIRE

    Beneito-Montagut, Roser

    2017-01-01

    Big data and data analytics promise to enhance teaching and learning, improve educational research and advance education governance. This chapter aims to contribute to the conceptual and methodological understanding of big data and analytics within educational research. It describes the opportunities and challenges that big data and analytics bring to education and critically explores the perils of applying a data-driven approach to education. Despite the claimed value of the...

  10. The trashing of Big Green

    International Nuclear Information System (INIS)

    Felten, E.

    1990-01-01

    The Big Green initiative on California's ballot lost by a margin of 2-to-1. Green measures lost in five other states, shocking ecology-minded groups. According to the postmortem by environmentalists, Big Green was a victim of poor timing and big spending by the opposition. Now its supporters plan to break up the bill and try to pass some provisions in the Legislature

  11. Big data in fashion industry

    Science.gov (United States)

    Jain, S.; Bruniaux, J.; Zeng, X.; Bruniaux, P.

    2017-10-01

    Significant work has been done in the field of big data in the last decade. The concept of big data involves analysing voluminous data to extract valuable information. In the fashion world, big data is increasingly playing a part in trend forecasting and in analysing consumer behaviour, preferences and emotions. The purpose of this paper is to introduce the term fashion data and explain why it can be considered big data. It also gives a broad classification of the types of fashion data and briefly defines them. Also, the methodology and working of a system that will use this data are briefly described.

  12. Big Data Analytics An Overview

    Directory of Open Access Journals (Sweden)

    Jayshree Dwivedi

    2015-08-01

    Full Text Available Big data refers to data that exceeds the storage capacity and processing power of conventional systems; the term is used for data sets so large or complex that traditional data-processing applications cannot handle them. Big data size is a constantly moving target, ranging year by year from a few dozen terabytes to many petabytes, as the amount of data produced by people, for example on social networking sites, grows rapidly every year. Big data is not only data; it has become a complete subject that includes various tools, techniques and frameworks, covering the growth and availability of both structured and unstructured data. Big data is a set of techniques and technologies that require new forms of integration to uncover large hidden values from datasets that are diverse, complex and of massive scale. Such data are difficult to work with using most relational database management systems and desktop statistics and visualization packages, requiring instead massively parallel software running on tens, hundreds, or even thousands of servers. A big data environment is used to acquire, organize and analyze these various types of data. In this paper we describe applications, problems and tools of big data and give an overview of the field.

  13. Was there a big bang

    International Nuclear Information System (INIS)

    Narlikar, J.

    1981-01-01

    In discussing the viability of the big-bang model of the Universe, the relevant evidence is examined, including the discrepancies in the age of the big-bang Universe, the red shifts of quasars, the microwave background radiation, general-relativity aspects such as the change of the gravitational constant with time, and quantum theory considerations. It is felt that the arguments considered show that the big-bang picture is not as soundly established, either theoretically or observationally, as is usually claimed, that the cosmological problem is still wide open and that alternatives to the standard big-bang picture should be seriously investigated. (U.K.)

  14. How Big is Earth?

    Science.gov (United States)

    Thurber, Bonnie B.

    2015-08-01

    How Big is Earth celebrates the Year of Light. Using only the sunlight striking the Earth and a wooden dowel, students meet each other and then measure the circumference of the earth. Eratosthenes did it over 2,000 years ago. In Cosmos, Carl Sagan shared the process by which Eratosthenes measured the angle of the shadow cast at local noon when sunlight strikes a stick positioned perpendicular to the ground. By comparing his measurement to another made a distance away, Eratosthenes was able to calculate the circumference of the earth. How Big is Earth provides an online learning environment where students do science the same way Eratosthenes did. A notable project in which this was done was The Eratosthenes Project, conducted in 2005 as part of the World Year of Physics; in fact, we will be drawing on the teacher's guide developed by that project. How Big Is Earth? expands on the Eratosthenes project by providing an online learning environment, hosted by the iCollaboratory (www.icollaboratory.org), where teachers and students from Sweden, China, Nepal, Russia, Morocco, and the United States collaborate, share data, and reflect on their learning of science and astronomy. They share their information and discuss and brainstorm their ideas in a discussion forum. There is an ongoing database of student measurements and another database to collect data on both teacher and student learning from surveys, discussions, and self-reflection done online. We will share our research about the kinds of learning that take place only in global collaborations. The entrance address for the iCollaboratory is http://www.icollaboratory.org.
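
    The calculation students carry out reduces to a single proportion: if two sticks separated by a known north-south distance cast noon shadows whose sun angles differ by an angle θ, the Earth's circumference is that distance times 360/θ. A minimal sketch of the arithmetic, with made-up measurements, follows.

    ```python
    import math

    def sun_angle_deg(stick_height_m: float, shadow_length_m: float) -> float:
        """Angle of the sun from vertical, from a stick and its noon shadow."""
        return math.degrees(math.atan2(shadow_length_m, stick_height_m))

    def earth_circumference_km(angle_a_deg: float, angle_b_deg: float,
                               north_south_distance_km: float) -> float:
        """Eratosthenes' proportion: the distance corresponds to the angle difference."""
        delta = abs(angle_a_deg - angle_b_deg)
        return north_south_distance_km * 360.0 / delta

    # Hypothetical measurements from two collaborating classrooms.
    angle_north = sun_angle_deg(1.0, 0.1266)   # roughly 7.2 degrees
    angle_south = sun_angle_deg(1.0, 0.0)      # sun directly overhead, 0 degrees
    print(earth_circumference_km(angle_north, angle_south, 800.0))
    # about 40,000 km, close to the accepted 40,075 km equatorial circumference
    ```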

  15. Privacy and Big Data

    CERN Document Server

    Craig, Terence

    2011-01-01

    Much of what constitutes Big Data is information about us. Through our online activities, we leave an easy-to-follow trail of digital footprints that reveal who we are, what we buy, where we go, and much more. This eye-opening book explores the raging privacy debate over the use of personal data, with one undeniable conclusion: once data's been collected, we have absolutely no control over who uses it or how it is used. Personal data is the hottest commodity on the market today-truly more valuable than gold. We are the asset that every company, industry, non-profit, and government wants. Pri

  16. Visualizing big energy data

    DEFF Research Database (Denmark)

    Hyndman, Rob J.; Liu, Xueqin Amy; Pinson, Pierre

    2018-01-01

    Visualization is a crucial component of data analysis. It is always a good idea to plot the data before fitting models, making predictions, or drawing conclusions. As sensors of the electric grid are collecting large volumes of data from various sources, power industry professionals are facing the challenge of visualizing such data in a timely fashion. In this article, we demonstrate several data-visualization solutions for big energy data through three case studies involving smart-meter data, phasor measurement unit (PMU) data, and probabilistic forecasts, respectively.
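
    As one illustration of the kind of plot this involves, the sketch below renders a year of hypothetical hourly smart-meter readings as a day-by-hour heatmap, a common way to make large load datasets scannable at a glance. It is a generic matplotlib example over assumed synthetic data, not a figure from the article.

    ```python
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)

    # Hypothetical smart-meter data: 365 days x 24 hourly readings (kWh),
    # with a mild evening peak added on top of noise.
    days, hours = 365, 24
    base = 0.3 + 0.1 * rng.random((days, hours))
    evening_peak = np.exp(-0.5 * ((np.arange(hours) - 19) / 2.0) ** 2)
    load = base + 0.8 * evening_peak

    fig, ax = plt.subplots(figsize=(8, 4))
    im = ax.imshow(load, aspect="auto", origin="lower", cmap="viridis")
    ax.set_xlabel("Hour of day")
    ax.set_ylabel("Day of year")
    fig.colorbar(im, ax=ax, label="Consumption (kWh)")
    plt.show()
    ```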

  17. Big Data Challenges

    Directory of Open Access Journals (Sweden)

    Alexandru Adrian TOLE

    2013-10-01

    Full Text Available The amount of data traveling across the internet today is not only large but complex as well. Companies, institutions, the healthcare system and others all use piles of data, which are further used for creating reports in order to ensure continuity of the services they offer. The process behind the results that these entities request represents a challenge for software developers and companies that provide IT infrastructure. The challenge is how to manipulate an impressive volume of data that has to be securely delivered through the internet and reach its destination intact. This paper treats the challenges that Big Data creates.

  18. Big data naturally rescaled

    International Nuclear Information System (INIS)

    Stoop, Ruedi; Kanders, Karlis; Lorimer, Tom; Held, Jenny; Albert, Carlo

    2016-01-01

    We propose that a handle could be put on big data by looking at the systems that actually generate the data, rather than the data itself, realizing that there may be only few generic processes involved in this, each one imprinting its very specific structures in the space of systems, the traces of which translate into feature space. From this, we propose a practical computational clustering approach, optimized for coping with such data, inspired by how the human cortex is known to approach the problem.

  19. A Matrix Big Bang

    OpenAIRE

    Craps, Ben; Sethi, Savdeep; Verlinde, Erik

    2005-01-01

    The light-like linear dilaton background represents a particularly simple time-dependent 1/2 BPS solution of critical type IIA superstring theory in ten dimensions. Its lift to M-theory, as well as its Einstein frame metric, are singular in the sense that the geometry is geodesically incomplete and the Riemann tensor diverges along a light-like subspace of codimension one. We study this background as a model for a big bang type singularity in string theory/M-theory. We construct the dual Matr...

  20. Comparative Genome Viewer

    International Nuclear Information System (INIS)

    Molineris, I.; Sales, G.

    2009-01-01

    The amount of information about genomes, both in the form of complete sequences and annotations, has been exponentially increasing in the last few years. As a result there is the need for tools providing a graphical representation of such information that should be comprehensive and intuitive. Visual representation is especially important in the comparative genomics field since it should provide a combined view of data belonging to different genomes. We believe that existing tools are limited in this respect as they focus on a single genome at a time (conservation histograms) or compress alignment representation to a single dimension. We have therefore developed a web-based tool called Comparative Genome Viewer (Cgv): it integrates a bidimensional representation of alignments between two regions, both at small and big scales, with the richness of annotations present in other genome browsers. We give access to our system through a web-based interface that provides the user with an interactive representation that can be updated in real time using the mouse to move from region to region and to zoom in on interesting details.

  1. BIG Data - BIG Gains? Understanding the Link Between Big Data Analytics and Innovation

    OpenAIRE

    Niebel, Thomas; Rasel, Fabienne; Viete, Steffen

    2017-01-01

    This paper analyzes the relationship between firms’ use of big data analytics and their innovative performance for product innovations. Since big data technologies provide new data information practices, they create new decision-making possibilities, which firms can use to realize innovations. Applying German firm-level data we find suggestive evidence that big data analytics matters for the likelihood of becoming a product innovator as well as the market success of the firms’ product innovat...

  2. BIG data - BIG gains? Empirical evidence on the link between big data analytics and innovation

    OpenAIRE

    Niebel, Thomas; Rasel, Fabienne; Viete, Steffen

    2017-01-01

    This paper analyzes the relationship between firms’ use of big data analytics and their innovative performance in terms of product innovations. Since big data technologies provide new data information practices, they create novel decision-making possibilities, which are widely believed to support firms’ innovation process. Applying German firm-level data within a knowledge production function framework we find suggestive evidence that big data analytics is a relevant determinant for the likel...

  3. An Interface for Biomedical Big Data Processing on the Tianhe-2 Supercomputer.

    Science.gov (United States)

    Yang, Xi; Wu, Chengkun; Lu, Kai; Fang, Lin; Zhang, Yong; Li, Shengkang; Guo, Guixin; Du, YunFei

    2017-12-01

    Big data, cloud computing, and high-performance computing (HPC) are at the verge of convergence. Cloud computing is already playing an active part in big data processing with the help of big data frameworks like Hadoop and Spark. The recent upsurge of high-performance computing in China provides extra possibilities and capacity to address the challenges associated with big data. In this paper, we propose Orion, a big data interface on the Tianhe-2 supercomputer, to enable big data applications to run on Tianhe-2 via a single command or a shell script. Orion supports multiple users, and each user can launch multiple tasks. It minimizes the effort needed to initiate big data applications on the Tianhe-2 supercomputer via automated configuration. Orion follows the "allocate-when-needed" paradigm, and it avoids the idle occupation of computational resources. We tested the utility and performance of Orion using a big genomic dataset and achieved a satisfactory performance on Tianhe-2 with very few modifications to existing applications that were implemented in Hadoop/Spark. In summary, Orion provides a practical and economical interface for big data processing on Tianhe-2.

  4. [Big data in imaging].

    Science.gov (United States)

    Sewerin, Philipp; Ostendorf, Benedikt; Hueber, Axel J; Kleyer, Arnd

    2018-04-01

    Until now, most major medical advancements have been achieved through hypothesis-driven research within the scope of clinical trials. However, due to a multitude of variables, only a certain number of research questions could be addressed during a single study, thus rendering these studies expensive and time consuming. Big data acquisition enables a new data-based approach in which large volumes of data can be used to investigate all variables, thus opening new horizons. Due to universal digitalization of the data as well as ever-improving hard- and software solutions, imaging would appear to be predestined for such analyses. Several small studies have already demonstrated that automated analysis algorithms and artificial intelligence can identify pathologies with high precision. Such automated systems would also seem well suited for rheumatology imaging, since a method for individualized risk stratification has long been sought for these patients. However, despite all the promising options, the heterogeneity of the data and highly complex regulations covering data protection in Germany would still render a big data solution for imaging difficult today. Overcoming these boundaries is challenging, but the enormous potential advances in clinical management and science render pursuit of this goal worthwhile.

  5. Big bang nucleosynthesis

    International Nuclear Information System (INIS)

    Fields, Brian D.; Olive, Keith A.

    2006-01-01

    We present an overview of the standard model of big bang nucleosynthesis (BBN), which describes the production of the light elements in the early universe. The theoretical prediction for the abundances of D, ³He, ⁴He, and ⁷Li is discussed. We emphasize the role of key nuclear reactions and the methods by which experimental cross section uncertainties are propagated into uncertainties in the predicted abundances. The observational determination of the light nuclides is also discussed. Particular attention is given to the comparison between the predicted and observed abundances, which yields a measurement of the cosmic baryon content. The spectrum of anisotropies in the cosmic microwave background (CMB) now independently measures the baryon density to high precision; we show how the CMB data test BBN, and find that the CMB and the D and ⁴He observations paint a consistent picture. This concordance stands as a major success of the hot big bang. On the other hand, ⁷Li remains discrepant with the CMB-preferred baryon density; possible explanations are reviewed. Finally, moving beyond the standard model, primordial nucleosynthesis constraints on early universe and particle physics are also briefly discussed.

  6. HGVA: the Human Genome Variation Archive

    OpenAIRE

    Lopez, Javier; Coll, Jacobo; Haimel, Matthias; Kandasamy, Swaathi; Tarraga, Joaquin; Furio-Tari, Pedro; Bari, Wasim; Bleda, Marta; Rueda, Antonio; Gräf, Stefan; Rendon, Augusto; Dopazo, Joaquin; Medina, Ignacio

    2017-01-01

    High-profile genomic variation projects like the 1000 Genomes project or the Exome Aggregation Consortium are generating a wealth of human genomic variation knowledge which can be used as an essential reference for identifying disease-causing genotypes. However, accessing these data, contrasting the various studies and integrating those data in downstream analyses remains cumbersome. The Human Genome Variation Archive (HGVA) tackles these challenges and facilitates access to genomic...

  7. Nursing Needs Big Data and Big Data Needs Nursing.

    Science.gov (United States)

    Brennan, Patricia Flatley; Bakken, Suzanne

    2015-09-01

    Contemporary big data initiatives in health care will benefit from greater integration with nursing science and nursing practice; in turn, nursing science and nursing practice has much to gain from the data science initiatives. Big data arises secondary to scholarly inquiry (e.g., -omics) and everyday observations like cardiac flow sensors or Twitter feeds. Data science methods that are emerging ensure that these data be leveraged to improve patient care. Big data encompasses data that exceed human comprehension, that exist at a volume unmanageable by standard computer systems, that arrive at a velocity not under the control of the investigator and possess a level of imprecision not found in traditional inquiry. Data science methods are emerging to manage and gain insights from big data. The primary methods included investigation of emerging federal big data initiatives, and exploration of exemplars from nursing informatics research to benchmark where nursing is already poised to participate in the big data revolution. We provide observations and reflections on experiences in the emerging big data initiatives. Existing approaches to large data set analysis provide a necessary but not sufficient foundation for nursing to participate in the big data revolution. Nursing's Social Policy Statement guides a principled, ethical perspective on big data and data science. There are implications for basic and advanced practice clinical nurses in practice, for the nurse scientist who collaborates with data scientists, and for the nurse data scientist. Big data and data science has the potential to provide greater richness in understanding patient phenomena and in tailoring interventional strategies that are personalized to the patient. © 2015 Sigma Theta Tau International.

  8. Chromatin dynamics in genome stability

    DEFF Research Database (Denmark)

    Nair, Nidhi; Shoaib, Muhammad; Sørensen, Claus Storgaard

    2017-01-01

    Genomic DNA is compacted into chromatin through packaging with histone and non-histone proteins. Importantly, DNA accessibility is dynamically regulated to ensure genome stability. This is exemplified in the response to DNA damage where chromatin relaxation near genomic lesions serves to promote access of relevant enzymes to specific DNA regions for signaling and repair. Furthermore, recent data highlight genome maintenance roles of chromatin through the regulation of endogenous DNA-templated processes including transcription and replication. Here, we review research that shows the importance of chromatin structure regulation in maintaining genome integrity by multiple mechanisms including facilitating DNA repair and directly suppressing endogenous DNA damage.

  9. Using predictive analytics and big data to optimize pharmaceutical outcomes.

    Science.gov (United States)

    Hernandez, Inmaculada; Zhang, Yuting

    2017-09-15

    The steps involved, the resources needed, and the challenges associated with applying predictive analytics in healthcare are described, with a review of successful applications of predictive analytics in implementing population health management interventions that target medication-related patient outcomes. In healthcare, the term big data typically refers to large quantities of electronic health record, administrative claims, and clinical trial data as well as data collected from smartphone applications, wearable devices, social media, and personal genomics services; predictive analytics refers to innovative methods of analysis developed to overcome challenges associated with big data, including a variety of statistical techniques ranging from predictive modeling to machine learning to data mining. Predictive analytics using big data have been applied successfully in several areas of medication management, such as in the identification of complex patients or those at highest risk for medication noncompliance or adverse effects. Because predictive analytics can be used in predicting different outcomes, they can provide pharmacists with a better understanding of the risks for specific medication-related problems that each patient faces. This information will enable pharmacists to deliver interventions tailored to patients' needs. In order to take full advantage of these benefits, however, clinicians will have to understand the basics of big data and predictive analytics. Predictive analytics that leverage big data will become an indispensable tool for clinicians in mapping interventions and improving patient outcomes. Copyright © 2017 by the American Society of Health-System Pharmacists, Inc. All rights reserved.

  10. Genomic signal processing

    CERN Document Server

    Shmulevich, Ilya

    2007-01-01

    Genomic signal processing (GSP) can be defined as the analysis, processing, and use of genomic signals to gain biological knowledge, and the translation of that knowledge into systems-based applications that can be used to diagnose and treat genetic diseases. Situated at the crossroads of engineering, biology, mathematics, statistics, and computer science, GSP requires the development of both nonlinear dynamical models that adequately represent genomic regulation, and diagnostic and therapeutic tools based on these models. This book facilitates these developments by providing rigorous mathema

  11. Implementing genomics and pharmacogenomics in the clinic: The National Human Genome Research Institute's genomic medicine portfolio.

    Science.gov (United States)

    Manolio, Teri A

    2016-10-01

    Increasing knowledge about the influence of genetic variation on human health and growing availability of reliable, cost-effective genetic testing have spurred the implementation of genomic medicine in the clinic. As defined by the National Human Genome Research Institute (NHGRI), genomic medicine uses an individual's genetic information in his or her clinical care, and has begun to be applied effectively in areas such as cancer genomics, pharmacogenomics, and rare and undiagnosed diseases. In 2011 NHGRI published its strategic vision for the future of genomic research, including an ambitious research agenda to facilitate and promote the implementation of genomic medicine. To realize this agenda, NHGRI is consulting and facilitating collaborations with the external research community through a series of "Genomic Medicine Meetings," under the guidance and leadership of the National Advisory Council on Human Genome Research. These meetings have identified and begun to address significant obstacles to implementation, such as lack of evidence of efficacy, limited availability of genomics expertise and testing, lack of standards, and difficulties in integrating genomic results into electronic medical records. The six research and dissemination initiatives comprising NHGRI's genomic research portfolio are designed to speed the evaluation and incorporation, where appropriate, of genomic technologies and findings into routine clinical care. Actual adoption of successful approaches in clinical care will depend upon the willingness, interest, and energy of professional societies, practitioners, patients, and payers to promote their responsible use and share their experiences in doing so. Published by Elsevier Ireland Ltd.

  12. The Sequenced Angiosperm Genomes and Genome Databases.

    Science.gov (United States)

    Chen, Fei; Dong, Wei; Zhang, Jiawei; Guo, Xinyue; Chen, Junhao; Wang, Zhengjia; Lin, Zhenguo; Tang, Haibao; Zhang, Liangsheng

    2018-01-01

    Angiosperms, the flowering plants, provide essential resources for human life, such as food, energy, oxygen, and materials. They also promoted the evolution of humans, animals, and the planet Earth. Despite the numerous advances in genome reports and sequencing technologies, no review covers all the released angiosperm genomes and the genome databases for data sharing. Based on the rapid advances and innovations in database reconstruction in the last few years, here we provide a comprehensive review of three major types of angiosperm genome databases: databases for a single species, for a specific angiosperm clade, and for multiple angiosperm species. The scope, tools, and data of each type of database and their features are concisely discussed. Genome databases for a single species or a clade of species are especially popular with specific groups of researchers, while a regularly updated comprehensive database is more powerful for addressing major scientific questions at the genome scale. Considering the low coverage of flowering plants in any available database, we propose the construction of a comprehensive database to facilitate large-scale comparative studies of angiosperm genomes and to promote collaborative studies of important questions in plant biology.

  13. Reducing Racial Disparities in Breast Cancer Care: The Role of 'Big Data'.

    Science.gov (United States)

    Reeder-Hayes, Katherine E; Troester, Melissa A; Meyer, Anne-Marie

    2017-10-15

    Advances in a wide array of scientific technologies have brought data of unprecedented volume and complexity into the oncology research space. These novel big data resources are applied across a variety of contexts-from health services research using data from insurance claims, cancer registries, and electronic health records, to deeper and broader genomic characterizations of disease. Several forms of big data show promise for improving our understanding of racial disparities in breast cancer, and for powering more intelligent and far-reaching interventions to close the racial gap in breast cancer survival. In this article we introduce several major types of big data used in breast cancer disparities research, highlight important findings to date, and discuss how big data may transform breast cancer disparities research in ways that lead to meaningful, lifesaving changes in breast cancer screening and treatment. We also discuss key challenges that may hinder progress in using big data for cancer disparities research and quality improvement.

  14. BIG: a large-scale data integration tool for renal physiology.

    Science.gov (United States)

    Zhao, Yue; Yang, Chin-Rang; Raghuram, Viswanathan; Parulekar, Jaya; Knepper, Mark A

    2016-10-01

    Due to recent advances in high-throughput techniques, we and others have generated multiple proteomic and transcriptomic databases to describe and quantify gene expression, protein abundance, or cellular signaling on the scale of the whole genome/proteome in kidney cells. The existence of so much data from diverse sources raises the following question: "How can researchers find information efficiently for a given gene product over all of these data sets without searching each data set individually?" This is the type of problem that has motivated the "Big-Data" revolution in Data Science, which has driven progress in fields such as marketing. Here we present an online Big-Data tool called BIG (Biological Information Gatherer) that allows users to submit a single online query to obtain all relevant information from all indexed databases. BIG is accessible at http://big.nhlbi.nih.gov/.
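
    To make the idea of a single programmatic query concrete, here is a minimal Python sketch of fetching results from a web tool like BIG. Only the base URL comes from the abstract; the query-string layout and the "gene" parameter name are hypothetical illustrations, not documented parts of the BIG interface, so the actual site should be consulted for its real query mechanism.

    ```python
    # Minimal sketch: submit one gene query to a web tool such as BIG.
    # The "gene" parameter name is a hypothetical placeholder.
    import requests

    def query_big(gene_symbol: str) -> str:
        """Fetch the results page for one gene symbol (assumed query-string interface)."""
        response = requests.get(
            "http://big.nhlbi.nih.gov/",      # base URL given in the abstract
            params={"gene": gene_symbol},      # hypothetical parameter name
            timeout=30,
        )
        response.raise_for_status()
        return response.text

    if __name__ == "__main__":
        html = query_big("Aqp2")  # aquaporin-2, a gene of interest in renal physiology
        print(len(html), "bytes of results returned")
    ```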

  15. Was the big bang hot

    International Nuclear Information System (INIS)

    Wright, E.L.

    1983-01-01

    The author considers experiments to confirm the substantial deviations from a Planck curve in the Woody and Richards spectrum of the microwave background, and a search for conducting needles in our galaxy. Spectral deviations and needle-shaped grains are expected for a cold Big Bang, but are not required by a hot Big Bang. (Auth.)

  16. Ethische aspecten van big data

    NARCIS (Netherlands)

    N. (Niek) van Antwerpen; Klaas Jan Mollema

    2017-01-01

    Big data has not only led to challenging technical questions; it is also accompanied by all kinds of new ethical and moral issues. To handle big data responsibly, these issues must be considered as well, because poor use of data can have adverse consequences for

  17. Fremtidens landbrug bliver big business

    DEFF Research Database (Denmark)

    Hansen, Henning Otte

    2016-01-01

    Agriculture's external conditions and competitive environment are changing, and this will necessitate a development towards "big business", in which farms become even larger, more industrialized and more concentrated. Big business will become a dominant development in Danish agriculture - but not the only one...

  18. Human factors in Big Data

    NARCIS (Netherlands)

    Boer, J. de

    2016-01-01

    Since 2014 I have been involved in various (research) projects that try to make the hype around Big Data more concrete and tangible for industry and government. Big Data is about multiple sources of (real-time) data that can be analysed, transformed into information and used to make 'smart' decisions.

  19. Passport to the Big Bang

    CERN Multimedia

    De Melis, Cinzia

    2013-01-01

    On 2 June 2013, CERN inaugurates the Passport to the Big Bang project at a major public event. Poster and programme. On 2 June 2013 CERN launches a scientific tourist trail through the Pays de Gex and the Canton of Geneva known as the Passport to the Big Bang. Poster and Programme.

  20. Between Two Fern Genomes

    Science.gov (United States)

    2014-01-01

    Ferns are the only major lineage of vascular plants not represented by a sequenced nuclear genome. This lack of genome sequence information significantly impedes our ability to understand and reconstruct genome evolution not only in ferns, but across all land plants. Azolla and Ceratopteris are ideal and complementary candidates to be the first ferns to have their nuclear genomes sequenced. They differ dramatically in genome size, life history, and habit, and thus represent the immense diversity of extant ferns. Together, this pair of genomes will facilitate myriad large-scale comparative analyses across ferns and all land plants. Here we review the unique biological characteristics of ferns and describe a number of outstanding questions in plant biology that will benefit from the addition of ferns to the set of taxa with sequenced nuclear genomes. We explain why the fern clade is pivotal for understanding genome evolution across land plants, and we provide a rationale for how knowledge of fern genomes will enable progress in research beyond the ferns themselves. PMID:25324969

  1. Fungal Genomics Program

    Energy Technology Data Exchange (ETDEWEB)

    Grigoriev, Igor

    2012-03-12

    The JGI Fungal Genomics Program aims to scale up sequencing and analysis of fungal genomes to explore the diversity of fungi important for energy and the environment, and to promote functional studies on a system level. Combining new sequencing technologies and comparative genomics tools, JGI is now leading the world in fungal genome sequencing and analysis. Over 120 sequenced fungal genomes with analytical tools are available via MycoCosm (www.jgi.doe.gov/fungi), a web-portal for fungal biologists. Our model of interacting with user communities, unique among other sequencing centers, helps organize these communities, improves genome annotation and analysis work, and facilitates new larger-scale genomic projects. This resulted in 20 high-profile papers published in 2011 alone, contributing to the Genomics Encyclopedia of Fungi, which targets fungi related to plant health (symbionts, pathogens, and biocontrol agents) and biorefinery processes (cellulose degradation, sugar fermentation, industrial hosts). Our next grand challenges include larger scale exploration of fungal diversity (1000 fungal genomes), developing molecular tools for DOE-relevant model organisms, and analysis of complex systems and metagenomes.

  2. Applications of Big Data in Education

    OpenAIRE

    Faisal Kalota

    2015-01-01

    Big Data and analytics have gained a huge momentum in recent years. Big Data feeds into the field of Learning Analytics (LA) that may allow academic institutions to better understand the learners' needs and proactively address them. Hence, it is important to have an understanding of Big Data and its applications. The purpose of this descriptive paper is to provide an overview of Big Data, the technologies used in Big Data, and some of the applications of Big Data in educa...

  3. Exploring complex and big data

    Directory of Open Access Journals (Sweden)

    Stefanowski Jerzy

    2017-12-01

    This paper shows how big data analysis opens a range of research and technological problems and calls for new approaches. We start by defining the essential properties of big data and discussing the main types of data involved. We then survey the dedicated solutions for storing and processing big data, including a data lake, virtual integration, and a polystore architecture. Difficulties in managing data quality and provenance are also highlighted. The characteristics of big data also imply specific requirements and challenges for data mining algorithms, which we address as well. The links with related areas, including data streams and deep learning, are discussed. The common theme that naturally emerges from this characterization is complexity. All in all, we consider it to be the truly defining feature of big data (posing particular research and technological challenges), which ultimately seems to be of greater importance than the sheer data volume.

  4. A Thousand Fly Genomes: An Expanded Drosophila Genome Nexus.

    Science.gov (United States)

    Lack, Justin B; Lange, Jeremy D; Tang, Alison D; Corbett-Detig, Russell B; Pool, John E

    2016-12-01

    The Drosophila Genome Nexus is a population genomic resource that provides D. melanogaster genomes from multiple sources. To facilitate comparisons across data sets, genomes are aligned using a common reference alignment pipeline which involves two rounds of mapping. Regions of residual heterozygosity, identity-by-descent, and recent population admixture are annotated to enable data filtering based on the user's needs. Here, we present a significant expansion of the Drosophila Genome Nexus, which brings the current data object to a total of 1,121 wild-derived genomes. New additions include 305 previously unpublished genomes from inbred lines representing six population samples in Egypt, Ethiopia, France, and South Africa, along with another 193 genomes added from recently-published data sets. We also provide an aligned D. simulans genome to facilitate divergence comparisons. This improved resource will broaden the range of population genomic questions that can be addressed from multi-population allele frequencies and haplotypes in this model species. The larger set of genomes will also enhance the discovery of functionally relevant natural variation that exists within and between populations. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
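
    As a rough illustration of the "two rounds of mapping" idea, the sketch below wraps standard read-mapping tools from Python. It is not the actual Drosophila Genome Nexus pipeline; the file names are placeholders, and it assumes bwa and samtools are installed and that each reference has already been indexed with `bwa index`.

    ```python
    # Minimal two-round mapping sketch (illustrative only, not the Nexus pipeline).
    import subprocess

    REF = "dmel_reference.fa"        # placeholder common reference FASTA (bwa-indexed)
    READS = "line_reads.fastq"       # placeholder read set for one wild-derived line

    def map_reads(reference: str, reads: str, out_bam: str) -> None:
        """Map reads with bwa mem, then coordinate-sort to BAM with samtools."""
        sam_path = out_bam + ".sam"
        with open(sam_path, "w") as sam:
            subprocess.run(["bwa", "mem", reference, reads], stdout=sam, check=True)
        subprocess.run(["samtools", "sort", "-o", out_bam, sam_path], check=True)

    # Round 1: map reads against the common reference.
    map_reads(REF, READS, "round1.bam")

    # ...between rounds, variants called from round1.bam would be used to build an
    # updated, line-specific reference (details omitted in this sketch)...

    # Round 2: remap the same reads against the updated reference.
    map_reads("updated_reference.fa", READS, "round2.bam")
    ```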

  5. The big data telescope

    International Nuclear Information System (INIS)

    Finkel, Elizabeth

    2017-01-01

    On a flat, red mulga plain in the outback of Western Australia, preparations are under way to build the most audacious telescope astronomers have ever dreamed of - the Square Kilometre Array (SKA). Next-generation telescopes usually aim to double the performance of their predecessors. The Australian arm of SKA will deliver a 168-fold leap on the best technology available today, to show us the universe as never before. It will tune into signals emitted just a million years after the Big Bang, when the universe was a sea of hydrogen gas, slowly percolating with the first galaxies. Their starlight illuminated the fledgling universe in what is referred to as the “cosmic dawn”.

  6. The Big Optical Array

    International Nuclear Information System (INIS)

    Mozurkewich, D.; Johnston, K.J.; Simon, R.S.

    1990-01-01

    This paper describes the design and the capabilities of the Naval Research Laboratory Big Optical Array (BOA), an interferometric optical array for high-resolution imaging of stars, stellar systems, and other celestial objects. There are four important differences between the BOA design and the design of Mark III Optical Interferometer on Mount Wilson (California). These include a long passive delay line which will be used in BOA to do most of the delay compensation, so that the fast delay line will have a very short travel; the beam combination in BOA will be done in triplets, to allow measurement of closure phase; the same light will be used for both star and fringe tracking; and the fringe tracker will use several wavelength channels

  7. Big nuclear accidents

    International Nuclear Information System (INIS)

    Marshall, W.

    1983-01-01

    Much of the debate on the safety of nuclear power focuses on the large number of fatalities that could, in theory, be caused by extremely unlikely but imaginable reactor accidents. This, along with the nuclear industry's inappropriate use of vocabulary during public debate, has given the general public a distorted impression of the safety of nuclear power. The way in which the probability and consequences of big nuclear accidents have been presented in the past is reviewed and recommendations for the future are made including the presentation of the long-term consequences of such accidents in terms of 'reduction in life expectancy', 'increased chance of fatal cancer' and the equivalent pattern of compulsory cigarette smoking. (author)

  8. Nonstandard big bang models

    International Nuclear Information System (INIS)

    Calvao, M.O.; Lima, J.A.S.

    1989-01-01

    The usual FRW hot big-bang cosmologies have been generalized by considering the equation of state ρ = Anm + (γ-1)⁻¹p, where m is the rest mass of the fluid particles and A is a dimensionless constant. Explicit analytic solutions are given for the flat case (ε=0). For large cosmological times these extended models behave as the standard Einstein-de Sitter universes regardless of the values of A and γ. Unlike the usual FRW flat case the deceleration parameter q is a time-dependent function and its present value, q ≅ 1, obtained from the luminosity distance versus redshift relation, may be fitted by taking, for instance, A=1 and γ = 5/3 (monatomic relativistic gas with ≫ k_BT). In all cases the universe cools obeying the same temperature law of the FRW models and it is shown that the age of the universe is only slightly modified. (author)

  9. The Last Big Bang

    Energy Technology Data Exchange (ETDEWEB)

    McGuire, Austin D. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Meade, Roger Allen [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2016-09-13

    As one of the very few people in the world to give the “go/no go” decision to detonate a nuclear device, Austin “Mac” McGuire holds a very special place in the history of both the Los Alamos National Laboratory and the world. As Commander of Joint Task Force Unit 8.1.1, on Christmas Island in the spring and summer of 1962, Mac directed the Los Alamos data collection efforts for twelve of the last atmospheric nuclear detonations conducted by the United States. Since data collection was at the heart of nuclear weapon testing, it fell to Mac to make the ultimate decision to detonate each test device. He calls his experience THE LAST BIG BANG, since these tests, part of Operation Dominic, were characterized by the dramatic displays of the heat, light, and sounds unique to atmospheric nuclear detonations – never, perhaps, to be witnessed again.

  10. A matrix big bang

    International Nuclear Information System (INIS)

    Craps, Ben; Sethi, Savdeep; Verlinde, Erik

    2005-01-01

    The light-like linear dilaton background represents a particularly simple time-dependent 1/2 BPS solution of critical type-IIA superstring theory in ten dimensions. Its lift to M-theory, as well as its Einstein frame metric, are singular in the sense that the geometry is geodesically incomplete and the Riemann tensor diverges along a light-like subspace of codimension one. We study this background as a model for a big bang type singularity in string theory/M-theory. We construct the dual Matrix theory description in terms of a (1+1)-d supersymmetric Yang-Mills theory on a time-dependent world-sheet given by the Milne orbifold of (1+1)-d Minkowski space. Our model provides a framework in which the physics of the singularity appears to be under control

  11. A matrix big bang

    Energy Technology Data Exchange (ETDEWEB)

    Craps, Ben [Instituut voor Theoretische Fysica, Universiteit van Amsterdam, Valckenierstraat 65, 1018 XE Amsterdam (Netherlands); Sethi, Savdeep [Enrico Fermi Institute, University of Chicago, Chicago, IL 60637 (United States); Verlinde, Erik [Instituut voor Theoretische Fysica, Universiteit van Amsterdam, Valckenierstraat 65, 1018 XE Amsterdam (Netherlands)

    2005-10-15

    The light-like linear dilaton background represents a particularly simple time-dependent 1/2 BPS solution of critical type-IIA superstring theory in ten dimensions. Its lift to M-theory, as well as its Einstein frame metric, are singular in the sense that the geometry is geodesically incomplete and the Riemann tensor diverges along a light-like subspace of codimension one. We study this background as a model for a big bang type singularity in string theory/M-theory. We construct the dual Matrix theory description in terms of a (1+1)-d supersymmetric Yang-Mills theory on a time-dependent world-sheet given by the Milne orbifold of (1+1)-d Minkowski space. Our model provides a framework in which the physics of the singularity appears to be under control.

  12. Integrative Analysis of Omics Big Data.

    Science.gov (United States)

    Yu, Xiang-Tian; Zeng, Tao

    2018-01-01

    The diversity and sheer volume of omics data have taken biology and biomedical research and application into a big data era, much as happened across human society a decade ago. They pose a new challenge, moving from horizontal data ensembles (e.g., similar types of data collected from different labs or companies) to vertical data ensembles (e.g., different types of data collected for a group of people with matched information), which requires integrative analysis in biology and biomedicine and calls for the rapid development of data integration to address the shift from population-guided to individual-guided investigations. Data integration is an effective concept for solving complex problems and understanding complicated systems. Several benchmark studies have revealed the heterogeneity and trade-offs that exist in the analysis of omics data. Integrative analysis can combine and investigate many datasets in a cost-effective, reproducible way. Current integration approaches for biological data have two modes: one is a "bottom-up integration" mode with follow-up manual integration, and the other is a "top-down integration" mode with follow-up in silico integration. This paper first summarizes combinatory analysis approaches to give a candidate protocol for biological experiment design for effective integrative studies in genomics, and then surveys data fusion approaches to give helpful instruction on computational model development for detecting biological significance; together these provide new data resources and analysis tools to support precision medicine that depends on big biomedical data. Finally, problems and future directions for the integrative analysis of omics big data are highlighted.
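
    As a small, concrete illustration of the "vertical data ensemble" idea (different data types matched on the same individuals), the following pandas sketch joins hypothetical transcriptomic and proteomic tables on shared sample and gene identifiers. The file names and column names are illustrative assumptions, not taken from the paper.

    ```python
    import pandas as pd

    # Hypothetical input tables with columns: sample_id, gene, rna_level / protein_level.
    rna = pd.read_csv("rna_expression.csv")
    protein = pd.read_csv("protein_abundance.csv")

    # Vertical integration: match records by sample and gene so each row carries
    # both measurement types for the same individual.
    merged = rna.merge(protein, on=["sample_id", "gene"], how="inner")

    # A simple integrative summary: RNA-protein correlation per gene.
    per_gene_corr = (
        merged.groupby("gene")[["rna_level", "protein_level"]]
              .corr()
              .xs("rna_level", level=1)["protein_level"]
    )
    print(per_gene_corr.sort_values(ascending=False).head())
    ```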

  13. DPF Big One

    International Nuclear Information System (INIS)

    Anon.

    1993-01-01

    At its latest venue at Fermilab from 10-14 November, the American Physical Society's Division of Particles and Fields meeting entered a new dimension. These regular meetings, which allow younger researchers to communicate with their peers, have been gaining popularity over the years (this was the seventh in the series), but nobody had expected almost a thousand participants and nearly 500 requests to give talks. Thus Fermilab's 800-seat auditorium had to be supplemented with another room with a video hookup, while the parallel sessions were organized into nine bewildering streams covering fourteen major physics topics. With the conventionality of the Standard Model virtually unchallenged, physics does not move fast these days. While most of the physics results had already been covered in principle at the International Conference on High Energy Physics held in Dallas in August (October, page 1), the Fermilab DPF meeting had a very different atmosphere. Major international meetings like Dallas attract big names from far and wide, and it is difficult in such an august atmosphere for young researchers to find a receptive audience. This was not the case at the DPF parallel sessions. The meeting also adopted a novel approach, with the parallels sandwiched between an initial day of plenaries to set the scene, and a final day of summaries. With the whole world waiting for the sixth ('top') quark to be discovered at Fermilab's Tevatron proton-antiproton collider, the meeting began with updates from Avi Yagil and Ronald Madaras from the big detectors, CDF and D0 respectively. Although rumours flew thick and fast, the Tevatron has not yet reached the top, although Yagil could show one intriguing event of a type expected from the heaviest quark.

  14. DPF Big One

    Energy Technology Data Exchange (ETDEWEB)

    Anon.

    1993-01-15

    At its latest venue at Fermilab from 10-14 November, the American Physical Society's Division of Particles and Fields meeting entered a new dimension. These regular meetings, which allow younger researchers to communicate with their peers, have been gaining popularity over the years (this was the seventh in the series), but nobody had expected almost a thousand participants and nearly 500 requests to give talks. Thus Fermilab's 800-seat auditorium had to be supplemented with another room with a video hookup, while the parallel sessions were organized into nine bewildering streams covering fourteen major physics topics. With the conventionality of the Standard Model virtually unchallenged, physics does not move fast these days. While most of the physics results had already been covered in principle at the International Conference on High Energy Physics held in Dallas in August (October, page 1), the Fermilab DPF meeting had a very different atmosphere. Major international meetings like Dallas attract big names from far and wide, and it is difficult in such an august atmosphere for young researchers to find a receptive audience. This was not the case at the DPF parallel sessions. The meeting also adopted a novel approach, with the parallels sandwiched between an initial day of plenaries to set the scene, and a final day of summaries. With the whole world waiting for the sixth ('top') quark to be discovered at Fermilab's Tevatron proton-antiproton collider, the meeting began with updates from Avi Yagil and Ronald Madaras from the big detectors, CDF and D0 respectively. Although rumours flew thick and fast, the Tevatron has not yet reached the top, although Yagil could show one intriguing event of a type expected from the heaviest quark.

  15. A Grey Theory Based Approach to Big Data Risk Management Using FMEA

    Directory of Open Access Journals (Sweden)

    Maisa Mendonça Silva

    2016-01-01

    Big data is the term used to denote enormous sets of data that differ from classic databases in four main ways: huge volume, high velocity, much greater variety, and big value. In general, data are stored in a distributed fashion and on computing nodes, as a result of which big data may be more susceptible to attacks by hackers. This paper presents a risk model for big data, which comprises Failure Mode and Effects Analysis (FMEA) and Grey Theory, more precisely grey relational analysis. This approach has several advantages: it provides a structured approach in order to incorporate the impact of big data risk factors; it facilitates the assessment of risk by breaking down the overall risk to big data; and finally its efficient evaluation criteria can help enterprises reduce the risks associated with big data. In order to illustrate the applicability of our proposal in practice, a numerical example, with realistic data based on expert knowledge, was developed. The numerical example analyzes four dimensions, that is, managing identification and access, registering the device and application, managing the infrastructure, and data governance, and 20 failure modes concerning the vulnerabilities of big data. The results show that the most important aspect of risk to big data relates to data governance.
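
    To give a feel for how grey relational analysis can rank FMEA-style failure modes, here is a minimal numpy sketch in the spirit of the paper. The failure modes, the scores, and the distinguishing coefficient rho = 0.5 are illustrative assumptions, not the paper's actual numerical example.

    ```python
    import numpy as np

    # Rows = failure modes, columns = FMEA factors (severity, occurrence, detection), scale 1-10.
    scores = np.array([
        [8, 6, 4],   # e.g. weak access control
        [5, 7, 6],   # e.g. unregistered devices
        [9, 3, 5],   # e.g. infrastructure misconfiguration
        [7, 8, 7],   # e.g. poor data governance
    ], dtype=float)

    reference = scores.min(axis=0)      # ideal (lowest-risk) sequence used as reference
    delta = np.abs(scores - reference)  # absolute differences from the reference
    rho = 0.5                           # distinguishing coefficient, commonly set to 0.5

    # Grey relational coefficients and per-failure-mode grades (higher = closer to ideal).
    coeff = (delta.min() + rho * delta.max()) / (delta + rho * delta.max())
    grades = coeff.mean(axis=1)

    # Rank failure modes: the lowest grade is farthest from the ideal, i.e. highest risk.
    print(np.argsort(grades))
    ```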

  16. Big bang and big crunch in matrix string theory

    International Nuclear Information System (INIS)

    Bedford, J.; Ward, J.; Papageorgakis, C.; Rodriguez-Gomez, D.

    2007-01-01

    Following the holographic description of linear dilaton null cosmologies with a big bang in terms of matrix string theory put forward by Craps, Sethi, and Verlinde, we propose an extended background describing a universe including both big bang and big crunch singularities. This belongs to a class of exact string backgrounds and is perturbative in the string coupling far away from the singularities, both of which can be resolved using matrix string theory. We provide a simple theory capable of describing the complete evolution of this closed universe

  17. An optimal big data workflow for biomedical image analysis

    Directory of Open Access Journals (Sweden)

    Aurelle Tchagna Kouanou

    Background and objective: In the medical field, data volume is growing rapidly, and traditional methods cannot manage it efficiently. In biomedical computation, the continuing challenges are the management, analysis, and storage of biomedical data. Nowadays, big data technology plays a significant role in the management, organization, and analysis of data, using machine learning and artificial intelligence techniques. It also allows quick access to data using NoSQL databases. Thus, big data technologies include new frameworks to process medical data such as biomedical images. It has therefore become very important to develop methods and/or architectures based on big data technologies for the complete processing of biomedical image data. Method: This paper describes big data analytics for biomedical images, shows examples reported in the literature, briefly discusses new methods used in processing, and offers conclusions. We argue for adapting and extending related work methods in the field of big data software, using the Hadoop and Spark frameworks. These provide an optimal and efficient architecture for biomedical image analysis. This paper thus gives a broad overview of big data analytics to automate biomedical image diagnosis. A workflow with optimal methods and algorithms for each step is proposed. Results: Two architectures for image classification are suggested. We use the Hadoop framework to design the first, and the Spark framework for the second. The proposed Spark architecture allows us to develop appropriate and efficient methods to leverage a large number of images for classification, which can be customized with respect to each other. Conclusions: The proposed architectures are more complete, easier to use, and adaptable in all of the steps from conception. The Spark architecture is the most complete, because it facilitates the implementation of algorithms with its embedded libraries. Keywords: Biomedical images, Big
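
    As a minimal illustration of the kind of image-loading step such a Spark-based workflow might start with, the PySpark sketch below reads a directory of images into a DataFrame using Spark's built-in image data source. The directory path is a placeholder, and none of the paper's specific pipeline details are reproduced here.

    ```python
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("biomedical-image-workflow").getOrCreate()

    # Spark's built-in "image" data source reads files into a DataFrame with an
    # "image" struct column (origin, height, width, nChannels, mode, data).
    images = spark.read.format("image").load("/data/biomedical_images/")  # placeholder path

    # Simple sanity checks before any feature extraction / classification steps.
    images.printSchema()
    print("number of images:", images.count())
    spark.stop()
    ```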

  18. BigOP: Generating Comprehensive Big Data Workloads as a Benchmarking Framework

    OpenAIRE

    Zhu, Yuqing; Zhan, Jianfeng; Weng, Chuliang; Nambiar, Raghunath; Zhang, Jinchao; Chen, Xingzhen; Wang, Lei

    2014-01-01

    Big Data is considered proprietary asset of companies, organizations, and even nations. Turning big data into real treasure requires the support of big data systems. A variety of commercial and open source products have been unleashed for big data storage and processing. While big data users are facing the choice of which system best suits their needs, big data system developers are facing the question of how to evaluate their systems with regard to general big data processing needs. System b...

  19. The Storyboard's Big Picture

    Science.gov (United States)

    Malloy, Cheryl A.; Cooley, William

    2003-01-01

    At Science Applications International Corporation (SAIC), Cape Canaveral Office, we're using a project management tool that facilitates team communication, keeps our project team focused, streamlines work and identifies potential issues. What did it cost us to install the tool? Almost nothing.

  20. From big bang to big crunch and beyond

    International Nuclear Information System (INIS)

    Elitzur, Shmuel; Rabinovici, Eliezer; Giveon, Amit; Kutasov, David

    2002-01-01

    We study a quotient Conformal Field Theory, which describes a 3+1 dimensional cosmological spacetime. Part of this spacetime is the Nappi-Witten (NW) universe, which starts at a 'big bang' singularity, expands and then contracts to a 'big crunch' singularity at a finite time. The gauged WZW model contains a number of copies of the NW spacetime, with each copy connected to the preceding one and to the next one at the respective big bang/big crunch singularities. The sequence of NW spacetimes is further connected at the singularities to a series of non-compact static regions with closed timelike curves. These regions contain boundaries, on which the observables of the theory live. This suggests a holographic interpretation of the physics. (author)

  1. Boosting Big National Lab Data

    Energy Technology Data Exchange (ETDEWEB)

    Kleese van Dam, Kerstin [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

    2013-02-21

    Introduction: Big data. Love it or hate it, solving the world’s most intractable problems requires the ability to make sense of huge and complex sets of data and do it quickly. Speeding up the process – from hours to minutes or from weeks to days – is key to our success. One major source of such big data is physical experiments. As many will know, these physical experiments are commonly used to solve challenges in fields such as energy security, manufacturing, medicine, pharmacology, environmental protection and national security. Experiments use different instruments and sensor types to research, for example, the validity of new drugs, the base cause for diseases, more efficient energy sources, new materials for everyday goods, effective methods for environmental cleanup, the optimal ingredient composition for chocolate, or to determine how to preserve valuable antiques. This is done by experimentally determining the structure, properties and processes that govern biological systems, chemical processes and materials. The speed and quality at which we can acquire new insights from experiments directly influences the rate of scientific progress, industrial innovation and competitiveness. And gaining new groundbreaking insights, faster, is key to the economic success of our nations. Recent years have seen incredible advances in sensor technologies, from house-size detector systems in large experiments such as the Large Hadron Collider and the ‘Eye of Gaia’ billion-pixel camera detector to high-throughput genome sequencing. These developments have led to an exponential increase in data volumes, rates and variety produced by instruments used for experimental work. This increase is coinciding with a need to analyze the experimental results at the time they are collected. This speed is required to optimize the data taking and quality, and also to enable new adaptive experiments, where the sample is manipulated as it is observed, e.g. a substance is injected into a

  2. Google BigQuery analytics

    CERN Document Server

    Tigani, Jordan

    2014-01-01

    How to effectively use BigQuery, avoid common mistakes, and execute sophisticated queries against large datasets Google BigQuery Analytics is the perfect guide for business and data analysts who want the latest tips on running complex queries and writing code to communicate with the BigQuery API. The book uses real-world examples to demonstrate current best practices and techniques, and also explains and demonstrates streaming ingestion, transformation via Hadoop in Google Compute engine, AppEngine datastore integration, and using GViz with Tableau to generate charts of query results. In addit
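
    For readers who have not used BigQuery programmatically, the short Python sketch below runs a query with the official google-cloud-bigquery client against a well-known public sample dataset. It is not an example from the book itself, and it assumes Google Cloud credentials are already configured in the environment.

    ```python
    from google.cloud import bigquery

    client = bigquery.Client()

    # Query a public sample dataset: word counts across Shakespeare's works.
    query = """
        SELECT word, SUM(word_count) AS total
        FROM `bigquery-public-data.samples.shakespeare`
        GROUP BY word
        ORDER BY total DESC
        LIMIT 10
    """

    for row in client.query(query).result():
        print(row.word, row.total)
    ```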

  3. Empathy and the Big Five

    OpenAIRE

    Paulus, Christoph

    2016-01-01

    More than 10 years ago, Del Barrio et al. (2004) attempted to establish a direct relationship between empathy and the Big Five. On average, women in their sample had higher scores on empathy and on the Big Five factors, with the exception of the neuroticism factor. They found associations with empathy in the domains of openness, agreeableness, conscientiousness and extraversion. In our data, women have significantly higher values in empathy as well as on the Big Five...

  4. "Big data" in economic history.

    Science.gov (United States)

    Gutmann, Myron P; Merchant, Emily Klancher; Roberts, Evan

    2018-03-01

    Big data is an exciting prospect for the field of economic history, which has long depended on the acquisition, keying, and cleaning of scarce numerical information about the past. This article examines two areas in which economic historians are already using big data - population and environment - discussing ways in which increased frequency of observation, denser samples, and smaller geographic units allow us to analyze the past with greater precision and often to track individuals, places, and phenomena across time. We also explore promising new sources of big data: organically created economic data, high resolution images, and textual corpora.

  5. Big Data as Information Barrier

    Directory of Open Access Journals (Sweden)

    Victor Ya. Tsvetkov

    2014-07-01

    The article analyzes 'Big Data', which has been discussed over the last 10 years. The reasons and factors behind the issue are identified. It is shown that the factors creating the 'Big Data' issue have existed for quite a long time and, from time to time, have caused informational barriers. Such barriers were successfully overcome through science and technology. The analysis classifies the 'Big Data' issue as a form of information barrier. This issue can be solved, and it encourages the development of scientific and computational methods.

  6. Homogeneous and isotropic big rips?

    CERN Document Server

    Giovannini, Massimo

    2005-01-01

    We investigate the way big rips are approached in a fully inhomogeneous description of the space-time geometry. If the pressure and energy densities are connected by a (supernegative) barotropic index, the spatial gradients and the anisotropic expansion decay as the big rip is approached. This behaviour is contrasted with the usual big-bang singularities. A similar analysis is performed in the case of sudden (quiescent) singularities and it is argued that the spatial gradients may well be non-negligible in the vicinity of pressure singularities.

  7. Big Data in Space Science

    OpenAIRE

    Barmby, Pauline

    2018-01-01

    It seems like “big data” is everywhere these days. In planetary science and astronomy, we’ve been dealing with large datasets for a long time. So how “big” is our data? How does it compare to the big data that a bank or an airline might have? What new tools do we need to analyze big datasets, and how can we make better use of existing tools? What kinds of science problems can we address with these? I’ll address these questions with examples including ESA’s Gaia mission, ...

  8. Rate Change Big Bang Theory

    Science.gov (United States)

    Strickland, Ken

    2013-04-01

    The Rate Change Big Bang Theory redefines the birth of the universe with a dramatic shift in energy direction and a new vision of the first moments. With rate change graph technology (RCGT) we can look back 13.7 billion years and experience every step of the big bang through geometrical intersection technology. The analysis of the Big Bang includes a visualization of the first objects, their properties, the astounding event that created space and time as well as a solution to the mystery of anti-matter.

  9. The tiger genome and comparative analysis with lion and snow leopard genomes.

    Science.gov (United States)

    Cho, Yun Sung; Hu, Li; Hou, Haolong; Lee, Hang; Xu, Jiaohui; Kwon, Soowhan; Oh, Sukhun; Kim, Hak-Min; Jho, Sungwoong; Kim, Sangsoo; Shin, Young-Ah; Kim, Byung Chul; Kim, Hyunmin; Kim, Chang-Uk; Luo, Shu-Jin; Johnson, Warren E; Koepfli, Klaus-Peter; Schmidt-Küntzel, Anne; Turner, Jason A; Marker, Laurie; Harper, Cindy; Miller, Susan M; Jacobs, Wilhelm; Bertola, Laura D; Kim, Tae Hyung; Lee, Sunghoon; Zhou, Qian; Jung, Hyun-Ju; Xu, Xiao; Gadhvi, Priyvrat; Xu, Pengwei; Xiong, Yingqi; Luo, Yadan; Pan, Shengkai; Gou, Caiyun; Chu, Xiuhui; Zhang, Jilin; Liu, Sanyang; He, Jing; Chen, Ying; Yang, Linfeng; Yang, Yulan; He, Jiaju; Liu, Sha; Wang, Junyi; Kim, Chul Hong; Kwak, Hwanjong; Kim, Jong-Soo; Hwang, Seungwoo; Ko, Junsu; Kim, Chang-Bae; Kim, Sangtae; Bayarlkhagva, Damdin; Paek, Woon Kee; Kim, Seong-Jin; O'Brien, Stephen J; Wang, Jun; Bhak, Jong

    2013-01-01

    Tigers and their close relatives (Panthera) are some of the world's most endangered species. Here we report the de novo assembly of an Amur tiger whole-genome sequence as well as the genomic sequences of a white Bengal tiger, African lion, white African lion and snow leopard. Through comparative genetic analyses of these genomes, we find genetic signatures that may reflect molecular adaptations consistent with the big cats' hypercarnivorous diet and muscle strength. We report a snow leopard-specific genetic determinant in EGLN1 (Met39>Lys39), which is likely to be associated with adaptation to high altitude. We also detect a TYR260G>A mutation likely responsible for the white lion coat colour. Tiger and cat genomes show similar repeat composition and an appreciably conserved synteny. Genomic data from the five big cats provide an invaluable resource for resolving easily identifiable phenotypes evident in very close, but distinct, species.

  10. The tiger genome and comparative analysis with lion and snow leopard genomes

    Science.gov (United States)

    Cho, Yun Sung; Hu, Li; Hou, Haolong; Lee, Hang; Xu, Jiaohui; Kwon, Soowhan; Oh, Sukhun; Kim, Hak-Min; Jho, Sungwoong; Kim, Sangsoo; Shin, Young-Ah; Kim, Byung Chul; Kim, Hyunmin; Kim, Chang-uk; Luo, Shu-Jin; Johnson, Warren E.; Koepfli, Klaus-Peter; Schmidt-Küntzel, Anne; Turner, Jason A.; Marker, Laurie; Harper, Cindy; Miller, Susan M.; Jacobs, Wilhelm; Bertola, Laura D.; Kim, Tae Hyung; Lee, Sunghoon; Zhou, Qian; Jung, Hyun-Ju; Xu, Xiao; Gadhvi, Priyvrat; Xu, Pengwei; Xiong, Yingqi; Luo, Yadan; Pan, Shengkai; Gou, Caiyun; Chu, Xiuhui; Zhang, Jilin; Liu, Sanyang; He, Jing; Chen, Ying; Yang, Linfeng; Yang, Yulan; He, Jiaju; Liu, Sha; Wang, Junyi; Kim, Chul Hong; Kwak, Hwanjong; Kim, Jong-Soo; Hwang, Seungwoo; Ko, Junsu; Kim, Chang-Bae; Kim, Sangtae; Bayarlkhagva, Damdin; Paek, Woon Kee; Kim, Seong-Jin; O’Brien, Stephen J.; Wang, Jun; Bhak, Jong

    2013-01-01

    Tigers and their close relatives (Panthera) are some of the world’s most endangered species. Here we report the de novo assembly of an Amur tiger whole-genome sequence as well as the genomic sequences of a white Bengal tiger, African lion, white African lion and snow leopard. Through comparative genetic analyses of these genomes, we find genetic signatures that may reflect molecular adaptations consistent with the big cats’ hypercarnivorous diet and muscle strength. We report a snow leopard-specific genetic determinant in EGLN1 (Met39>Lys39), which is likely to be associated with adaptation to high altitude. We also detect a TYR260G>A mutation likely responsible for the white lion coat colour. Tiger and cat genomes show similar repeat composition and an appreciably conserved synteny. Genomic data from the five big cats provide an invaluable resource for resolving easily identifiable phenotypes evident in very close, but distinct, species. PMID:24045858

  11. BigData as a Driver for Capacity Building in Astrophysics

    Science.gov (United States)

    Shastri, Prajval

    2015-08-01

    Exciting public interest in astrophysics acquires new significance in the era of Big Data. Since Big Data involves advanced technologies of both software and hardware, astrophysics with Big Data has the potential to inspire young minds with diverse inclinations - i.e., not just those attracted to physics but also those pursuing engineering careers. Digital technologies have become steadily cheaper, which can enable expansion of the Big Data user pool considerably, especially to communities that may not yet be in the astrophysics mainstream, but have high potential because of access to these technologies. For success, however, capacity building at the early stages becomes key. The development of on-line pedagogical resources in astrophysics, astrostatistics, data-mining and data visualisation that are designed around the big facilities of the future can be an important effort that drives such capacity building, especially if facilitated by the IAU.

  12. Big Data is invading big places as CERN

    CERN Multimedia

    CERN. Geneva

    2017-01-01

    Big Data technologies are becoming more popular with the constant growth of data generation in different fields such as social networks, the internet of things and laboratories like CERN. How is CERN making use of such technologies? How is machine learning applied at CERN with Big Data technologies? How much data do we move and how is it analyzed? All these questions will be answered during the talk.

  13. The ethics of big data in big agriculture

    OpenAIRE

    Carbonell (Isabelle M.)

    2016-01-01

    This paper examines the ethics of big data in agriculture, focusing on the power asymmetry between farmers and large agribusinesses like Monsanto. Following the recent purchase of Climate Corp., Monsanto is currently the most prominent biotech agribusiness to buy into big data. With wireless sensors on tractors monitoring or dictating every decision a farmer makes, Monsanto can now aggregate large quantities of previously proprietary farming data, enabling a privileged position with unique in...

  14. Big climate data analysis

    Science.gov (United States)

    Mudelsee, Manfred

    2015-04-01

    The Big Data era has begun also in the climate sciences, not only in economics or molecular biology. We measure climate at increasing spatial resolution by means of satellites and look farther back in time at increasing temporal resolution by means of natural archives and proxy data. We use powerful supercomputers to run climate models. The model output of the calculations made for the IPCC's Fifth Assessment Report amounts to ~650 TB. The 'scientific evolution' of grid computing has started, and the 'scientific revolution' of quantum computing is being prepared. This will increase computing power, and data amount, by several orders of magnitude in the future. However, more data does not automatically mean more knowledge. We need statisticians, who are at the core of transforming data into knowledge. Statisticians notably also explore the limits of our knowledge (uncertainties, that is, confidence intervals and P-values). Mudelsee (2014 Climate Time Series Analysis: Classical Statistical and Bootstrap Methods. Second edition. Springer, Cham, xxxii + 454 pp.) coined the term 'optimal estimation'. Consider the hyperspace of climate estimation. It has many, but not infinite, dimensions. It consists of the three subspaces Monte Carlo design, method and measure. The Monte Carlo design describes the data generating process. The method subspace describes the estimation and confidence interval construction. The measure subspace describes how to detect the optimal estimation method for the Monte Carlo experiment. The envisaged large increase in computing power may bring the following idea of optimal climate estimation into existence. Given a data sample, some prior information (e.g. measurement standard errors) and a set of questions (parameters to be estimated), the first task is simple: perform an initial estimation on basis of existing knowledge and experience with such types of estimation problems. The second task requires the computing power: explore the hyperspace to
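
    To illustrate the kind of uncertainty quantification (confidence intervals) the abstract emphasizes, here is a minimal bootstrap sketch for the mean of a climate-like series. The synthetic data and the 95% level are illustrative only, and for real, autocorrelated climate series a block bootstrap would be more appropriate than the simple resampling shown here.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    series = rng.normal(loc=14.0, scale=0.5, size=200)   # synthetic "temperature" series

    # Ordinary (i.i.d.) bootstrap of the sample mean.
    n_boot = 10_000
    boot_means = np.empty(n_boot)
    for b in range(n_boot):
        resample = rng.choice(series, size=series.size, replace=True)
        boot_means[b] = resample.mean()

    lower, upper = np.percentile(boot_means, [2.5, 97.5])
    print(f"mean = {series.mean():.3f}, 95% bootstrap CI = [{lower:.3f}, {upper:.3f}]")
    ```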

  15. Hey, big spender

    Energy Technology Data Exchange (ETDEWEB)

    Cope, G.

    2000-04-01

    Business to business electronic commerce is looming large in the future of the oil industry. It is estimated that by adopting e-commerce the industry could achieve bottom line savings of between $1.8 to $ 3.4 billion a year on annual gross revenues in excess of $ 30 billion. At present there are several teething problems to overcome such as inter-operability standards, which are at least two or three years away. Tying in electronically with specific suppliers is also an expensive proposition, although the big benefits are in fact in doing business with the same suppliers on a continuing basis. Despite these problems, 14 of the world's largest energy and petrochemical companies joined forces in mid-April to create a single Internet procurement marketplace for the industry's complex supply chain. The exchange was designed by B2B (business-to-business) software provider, Commerce One Inc., ; it will leverage the buying clout of these industry giants (BP Amoco, Royal Dutch Shell Group, Conoco, Occidental Petroleum, Phillips Petroleum, Unocal Corporation and Statoil among them), currently about $ 125 billion on procurement per year; they hope to save between 5 to 30 per cent depending on the product and the region involved. Other similar schemes such as Chevron and partners' Petrocosm Marketplace, Network Oil, a Houston-based Internet portal aimed at smaller petroleum companies, are also doing business in the $ 10 billion per annum range. e-Energy, a cooperative project between IBM Ericson and Telus Advertising is another neutral, virtual marketplace targeted at the oil and gas sector. PetroTRAX, a Calgary-based website plans to take online procurement and auction sales a big step forward by establishing a portal to handle any oil company's asset management needs. There are also a number of websites targeting specific needs: IndigoPool.com (acquisitions and divestitures) and WellBid.com (products related to upstream oil and gas operators) are just

  16. Hey, big spender

    International Nuclear Information System (INIS)

    Cope, G.

    2000-01-01

    Business to business electronic commerce is looming large in the future of the oil industry. It is estimated that by adopting e-commerce the industry could achieve bottom line savings of between $1.8 to $ 3.4 billion a year on annual gross revenues in excess of $ 30 billion. At present there are several teething problems to overcome such as inter-operability standards, which are at least two or three years away. Tying in electronically with specific suppliers is also an expensive proposition, although the big benefits are in fact in doing business with the same suppliers on a continuing basis. Despite these problems, 14 of the world's largest energy and petrochemical companies joined forces in mid-April to create a single Internet procurement marketplace for the industry's complex supply chain. The exchange was designed by B2B (business-to-business) software provider, Commerce One Inc., ; it will leverage the buying clout of these industry giants (BP Amoco, Royal Dutch Shell Group, Conoco, Occidental Petroleum, Phillips Petroleum, Unocal Corporation and Statoil among them), currently about $ 125 billion on procurement per year; they hope to save between 5 to 30 per cent depending on the product and the region involved. Other similar schemes such as Chevron and partners' Petrocosm Marketplace, Network Oil, a Houston-based Internet portal aimed at smaller petroleum companies, are also doing business in the $ 10 billion per annum range. e-Energy, a cooperative project between IBM Ericson and Telus Advertising is another neutral, virtual marketplace targeted at the oil and gas sector. PetroTRAX, a Calgary-based website plans to take online procurement and auction sales a big step forward by establishing a portal to handle any oil company's asset management needs. There are also a number of websites targeting specific needs: IndigoPool.com (acquisitions and divestitures) and WellBid.com (products related to upstream oil and gas operators) are just two examples. All in

  17. Methods and tools for big data visualization

    OpenAIRE

    Zubova, Jelena; Kurasova, Olga

    2015-01-01

    In this paper, methods and tools for big data visualization have been investigated. Challenges faced by the big data analysis and visualization have been identified. Technologies for big data analysis have been discussed. A review of methods and tools for big data visualization has been done. Functionalities of the tools have been demonstrated by examples in order to highlight their advantages and disadvantages.

  18. Measuring the Promise of Big Data Syllabi

    Science.gov (United States)

    Friedman, Alon

    2018-01-01

    Growing interest in Big Data is leading industries, academics and governments to accelerate Big Data research. However, how teachers should teach Big Data has not been fully examined. This article suggests criteria for redesigning Big Data syllabi in public and private degree-awarding higher education establishments. The author conducted a survey…

  19. The BigBOSS Experiment

    Energy Technology Data Exchange (ETDEWEB)

    Schelgel, D.; Abdalla, F.; Abraham, T.; Ahn, C.; Allende Prieto, C.; Annis, J.; Aubourg, E.; Azzaro, M.; Bailey, S.; Baltay, C.; Baugh, C.; /APC, Paris /Brookhaven /IRFU, Saclay /Marseille, CPPM /Marseille, CPT /Durham U. / /IEU, Seoul /Fermilab /IAA, Granada /IAC, La Laguna

    2011-01-01

    BigBOSS will obtain observational constraints that will bear on three of the four 'science frontier' questions identified by the Astro2010 Cosmology and Fundamental Physics Panel of the Decadal Survey: Why is the universe accelerating? What is dark matter, and what are the properties of neutrinos? Indeed, the BigBOSS project was recommended for substantial immediate R and D support by the PASAG report. The second highest ground-based priority from the Astro2010 Decadal Survey was the creation of a funding line within the NSF to support a 'Mid-Scale Innovations' program, and it used BigBOSS as a 'compelling' example for support. This choice was the result of the Decadal Survey's Program Prioritization panels reviewing 29 mid-scale projects and recommending BigBOSS 'very highly'.

  20. Biophotonics: the big picture

    Science.gov (United States)

    Marcu, Laura; Boppart, Stephen A.; Hutchinson, Mark R.; Popp, Jürgen; Wilson, Brian C.

    2018-02-01

    The 5th International Conference on Biophotonics (ICOB) held April 30 to May 1, 2017, in Fremantle, Western Australia, brought together opinion leaders to discuss future directions for the field and opportunities to consider. The first session of the conference, "How to Set a Big Picture Biophotonics Agenda," was focused on setting the stage for developing a vision and strategies for translation and impact on society of biophotonic technologies. The invited speakers, panelists, and attendees engaged in discussions that focused on opportunities and promising applications for biophotonic techniques, challenges when working at the confluence of the physical and biological sciences, driving factors for advances of biophotonic technologies, and educational opportunities. We share a summary of the presentations and discussions. Three main themes from the conference are presented in this position paper that capture the current status, opportunities, challenges, and future directions of biophotonics research and key areas of applications: (1) biophotonics at the nano- to microscale level; (2) biophotonics at meso- to macroscale level; and (3) biophotonics and the clinical translation conundrum.

  1. Big bang darkleosynthesis

    Directory of Open Access Journals (Sweden)

    Gordan Krnjaic

    2015-12-01

    Full Text Available In a popular class of models, dark matter comprises an asymmetric population of composite particles with short range interactions arising from a confined nonabelian gauge group. We show that coupling this sector to a well-motivated light mediator particle yields efficient darkleosynthesis, a dark-sector version of big-bang nucleosynthesis (BBN), in generic regions of parameter space. Dark matter self-interaction bounds typically require the confinement scale to be above ΛQCD, which generically yields large (≫MeV/dark-nucleon) binding energies. These bounds further suggest the mediator is relatively weakly coupled, so repulsive forces between dark-sector nuclei are much weaker than Coulomb repulsion between standard-model nuclei, which results in an exponential barrier-tunneling enhancement over standard BBN. Thus, darklei are easier to make and harder to break than visible species with comparable mass numbers. This process can efficiently yield a dominant population of states with masses significantly greater than the confinement scale and, in contrast to dark matter that is a fundamental particle, may allow the dominant form of dark matter to have high spin (S≫3/2), whose discovery would be smoking gun evidence for dark nuclei.

  2. Predicting big bang deuterium

    Energy Technology Data Exchange (ETDEWEB)

    Hata, N.; Scherrer, R.J.; Steigman, G.; Thomas, D.; Walker, T.P. [Department of Physics, Ohio State University, Columbus, Ohio 43210 (United States)

    1996-02-01

    We present new upper and lower bounds to the primordial abundances of deuterium and ³He based on observational data from the solar system and the interstellar medium. Independent of any model for the primordial production of the elements we find (at the 95% C.L.): 1.5×10⁻⁵ ≤ (D/H)_P ≤ 10.0×10⁻⁵ and (³He/H)_P ≤ 2.6×10⁻⁵. When combined with the predictions of standard big bang nucleosynthesis, these constraints lead to a 95% C.L. bound on the primordial abundance of deuterium: (D/H)_best = (3.5 +2.7/−1.8)×10⁻⁵. Measurements of deuterium absorption in the spectra of high-redshift QSOs will directly test this prediction. The implications of this prediction for the primordial abundances of ⁴He and ⁷Li are discussed, as well as those for the universal density of baryons. © 1996 The American Astronomical Society.

  3. Big bang darkleosynthesis

    Science.gov (United States)

    Krnjaic, Gordan; Sigurdson, Kris

    2015-12-01

    In a popular class of models, dark matter comprises an asymmetric population of composite particles with short range interactions arising from a confined nonabelian gauge group. We show that coupling this sector to a well-motivated light mediator particle yields efficient darkleosynthesis, a dark-sector version of big-bang nucleosynthesis (BBN), in generic regions of parameter space. Dark matter self-interaction bounds typically require the confinement scale to be above ΛQCD, which generically yields large (≫MeV /dark-nucleon) binding energies. These bounds further suggest the mediator is relatively weakly coupled, so repulsive forces between dark-sector nuclei are much weaker than Coulomb repulsion between standard-model nuclei, which results in an exponential barrier-tunneling enhancement over standard BBN. Thus, darklei are easier to make and harder to break than visible species with comparable mass numbers. This process can efficiently yield a dominant population of states with masses significantly greater than the confinement scale and, in contrast to dark matter that is a fundamental particle, may allow the dominant form of dark matter to have high spin (S ≫ 3 / 2), whose discovery would be smoking gun evidence for dark nuclei.

  4. The role of big laboratories

    CERN Document Server

    Heuer, Rolf-Dieter

    2013-01-01

    This paper presents the role of big laboratories in their function as research infrastructures. Starting from the general definition and features of big laboratories, the paper goes on to present the key ingredients and issues, based on scientific excellence, for the successful realization of large-scale science projects at such facilities. The paper concludes by taking the example of scientific research in the field of particle physics and describing the structures and methods required to be implemented for the way forward.

  5. The role of big laboratories

    International Nuclear Information System (INIS)

    Heuer, R-D

    2013-01-01

    This paper presents the role of big laboratories in their function as research infrastructures. Starting from the general definition and features of big laboratories, the paper goes on to present the key ingredients and issues, based on scientific excellence, for the successful realization of large-scale science projects at such facilities. The paper concludes by taking the example of scientific research in the field of particle physics and describing the structures and methods required to be implemented for the way forward. (paper)

  6. Challenges of Big Data Analysis.

    Science.gov (United States)

    Fan, Jianqing; Han, Fang; Liu, Han

    2014-06-01

    Big Data bring new opportunities to modern society and challenges to data scientists. On the one hand, Big Data hold great promise for discovering subtle population patterns and heterogeneities that are not possible with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottlenecks, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. These challenges are distinctive and require a new computational and statistical paradigm. This article gives an overview of the salient features of Big Data and of how these features drive a change of paradigm in statistical and computational methods as well as computing architectures. We also provide various new perspectives on Big Data analysis and computation. In particular, we emphasize the viability of the sparsest solution in high-confidence sets and point out that exogenous assumptions in most statistical methods for Big Data cannot be validated due to incidental endogeneity. They can lead to wrong statistical inferences and consequently wrong scientific conclusions.

  7. Generalized formal model of Big Data

    OpenAIRE

    Shakhovska, N.; Veres, O.; Hirnyak, M.

    2016-01-01

    This article dwells on the basic characteristic features of Big Data technologies. The existing definitions of the term "big data" are analyzed. The article proposes and describes the elements of a generalized formal model of big data, and analyzes the peculiarities of applying the proposed model components. It describes the fundamental differences between Big Data technology and business analytics. Big Data is supported by the distributed file system Google File System ...

  8. Visual explorer facilitator's guide

    CERN Document Server

    Palus, Charles J

    2010-01-01

    Grounded in research and practice, the Visual Explorer™ Facilitator's Guide provides a method for supporting collaborative, creative conversations about complex issues through the power of images. The guide is available as a component in the Visual Explorer Facilitator's Letter-sized Set, Visual Explorer Facilitator's Post card-sized Set, Visual Explorer Playing Card-sized Set, and is also available as a stand-alone title for purchase to assist multiple tool users in an organization.

  9. Learning facilitating leadership

    DEFF Research Database (Denmark)

    Rasmussen, Lauge Baungaard; Hansen, Mette Sanne

    2016-01-01

    This paper explains how engineering students at a Danish university acquired the necessary skills to become emergent facilitators of organisational development. The implications of this approach are discussed and related to relevant viewpoints and findings in the literature. The methodology deplo....... By connecting the literature, the authors’ and engineering students’ reflections on facilitator skills, this paper adds value to existing academic and practical discussions on learning facilitating leadership....

  10. The perennial ryegrass GenomeZipper: targeted use of genome resources for comparative grass genomics.

    Science.gov (United States)

    Pfeifer, Matthias; Martis, Mihaela; Asp, Torben; Mayer, Klaus F X; Lübberstedt, Thomas; Byrne, Stephen; Frei, Ursula; Studer, Bruno

    2013-02-01

    Whole-genome sequences established for model and major crop species constitute a key resource for advanced genomic research. For outbreeding forage and turf grass species like ryegrasses (Lolium spp.), such resources have yet to be developed. Here, we present a model of the perennial ryegrass (Lolium perenne) genome on the basis of conserved synteny to barley (Hordeum vulgare) and the model grass genome Brachypodium (Brachypodium distachyon) as well as rice (Oryza sativa) and sorghum (Sorghum bicolor). A transcriptome-based genetic linkage map of perennial ryegrass served as a scaffold to establish the chromosomal arrangement of syntenic genes from model grass species. This scaffold revealed a high degree of synteny and macrocollinearity and was then utilized to anchor a collection of perennial ryegrass genes in silico to their predicted genome positions. This resulted in the unambiguous assignment of 3,315 out of 8,876 previously unmapped genes to the respective chromosomes. In total, the GenomeZipper incorporates 4,035 conserved grass gene loci, which were used for the first genome-wide sequence divergence analysis between perennial ryegrass, barley, Brachypodium, rice, and sorghum. The perennial ryegrass GenomeZipper is an ordered, information-rich genome scaffold, facilitating map-based cloning and genome assembly in perennial ryegrass and closely related Poaceae species. It also represents a milestone in describing synteny between perennial ryegrass and fully sequenced model grass genomes, thereby increasing our understanding of genome organization and evolution in the most important temperate forage and turf grass species.
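
    As a toy illustration of the anchoring idea described above, the following Python sketch places unmapped genes at the linkage-map position of their best hit among model-grass genes already anchored by synteny. All gene identifiers, positions and the best-hit table are hypothetical; this is not the actual GenomeZipper pipeline, only a minimal sketch of the zipping step.

      # Hypothetical input: model-grass gene -> (ryegrass linkage group, cM position)
      synteny_anchor = {
          "Bradi1g00200": ("LG1", 3.5),
          "Bradi1g00900": ("LG1", 12.0),
          "Os03g0100100": ("LG4", 47.2),
      }

      # Hypothetical input: unmapped ryegrass gene -> best cross-species hit
      best_hit = {
          "LpGene_0001": "Bradi1g00200",
          "LpGene_0002": "Os03g0100100",
          "LpGene_0003": "BradiXg99999",   # hit outside the anchored set -> stays unplaced
      }

      def zip_genes(best_hit, synteny_anchor):
          """Assign each unmapped gene the linkage group and cM position of its anchored best hit."""
          placed, unplaced = {}, []
          for gene, hit in best_hit.items():
              if hit in synteny_anchor:
                  placed[gene] = synteny_anchor[hit]
              else:
                  unplaced.append(gene)
          return placed, unplaced

      placed, unplaced = zip_genes(best_hit, synteny_anchor)
      print(placed)    # {'LpGene_0001': ('LG1', 3.5), 'LpGene_0002': ('LG4', 47.2)}
      print(unplaced)  # ['LpGene_0003']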

  11. Containers, facilitators, innovators?

    DEFF Research Database (Denmark)

    Makkonen, Teemu; Merisalo, Maria; Inkinen, Tommi

    2018-01-01

    Are they containers, facilitators or innovators? This is investigated here through empirical material derived from 27 interviews with top departmental management in three Finnish cities (Helsinki, Espoo and Vantaa). The results show that local city governments (LCGs) consider cities as facilitators of innovation...

  12. Training facilitators and supervisors

    DEFF Research Database (Denmark)

    Kjær, Louise Binow; O Connor, Maja; Krogh, Kristian

    At the Master’s program in Medicine at Aarhus University, Denmark, we have developed a faculty development program for facilitators and supervisors in 4 progressing student modules in communication, cooperation, and leadership. 1) A course for module 1 and 3 facilitators inspired by the apprentic...

  13. Managing Variant Calling Files the Big Data Way: Using HDFS and Apache Parquet

    NARCIS (Netherlands)

    Boufea, Aikaterini; Finkers, H.J.; Kaauwen, van M.P.W.; Kramer, M.R.; Athanasiadis, I.N.

    2017-01-01

    Big Data has been seen as a remedy for the efficient management of ever-increasing genomic data. In this paper, we investigate the use of Apache Spark to store and process Variant Calling Files (VCF) on a Hadoop cluster. We demonstrate Tomatula, a software tool for converting VCF files to Apache Parquet.
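
    The record above describes converting VCF files to a columnar format with Apache Spark on Hadoop. The sketch below is not Tomatula itself; it is a minimal, assumed workflow using only standard PySpark calls and hypothetical HDFS paths: it reads the tab-separated VCF body (skipping '#' header lines) and writes it out as Parquet partitioned by chromosome.

      from pyspark.sql import SparkSession

      spark = SparkSession.builder.appName("vcf_to_parquet").getOrCreate()

      # The eight fixed VCF columns (per-sample genotype columns are ignored in this sketch).
      vcf_columns = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]

      # Read the VCF body as tab-separated text; lines starting with '#' are treated as comments.
      raw = (spark.read
             .option("sep", "\t")
             .option("comment", "#")
             .csv("hdfs:///data/variants.vcf"))          # hypothetical input path

      # Keep the fixed columns and give them their VCF names
      # (assumes the file has at least these eight columns).
      fixed = raw.select([raw.columns[i] for i in range(len(vcf_columns))]).toDF(*vcf_columns)

      # Columnar output, partitioned by chromosome so per-region queries touch fewer files.
      fixed.write.partitionBy("CHROM").parquet("hdfs:///data/variants_parquet")  # hypothetical output path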

  14. [Big data in official statistics].

    Science.gov (United States)

    Zwick, Markus

    2015-08-01

    The concept of "big data" stands to change the face of official statistics over the coming years, having an impact on almost all aspects of data production. The tasks of future statisticians will not necessarily be to produce new data, but rather to identify and make use of existing data to adequately describe social and economic phenomena. Until big data can be used correctly in official statistics, a lot of questions need to be answered and problems solved: the quality of data, data protection, privacy, and the sustainable availability are some of the more pressing issues to be addressed. The essential skills of official statisticians will undoubtedly change, and this implies a number of challenges to be faced by statistical education systems, in universities, and inside the statistical offices. The national statistical offices of the European Union have concluded a concrete strategy for exploring the possibilities of big data for official statistics, by means of the Big Data Roadmap and Action Plan 1.0. This is an important first step and will have a significant influence on implementing the concept of big data inside the statistical offices of Germany.

  15. GEOSS: Addressing Big Data Challenges

    Science.gov (United States)

    Nativi, S.; Craglia, M.; Ochiai, O.

    2014-12-01

    In the sector of Earth Observation, the explosion of data is due to many factors including: new satellite constellations, the increased capabilities of sensor technologies, social media, crowdsourcing, and the need for multidisciplinary and collaborative research to face Global Changes. In this area, there are many expectations and concerns about Big Data. Vendors have attempted to use this term for their commercial purposes. It is necessary to understand whether Big Data is a radical shift or an incremental change for the existing digital infrastructures. This presentation tries to explore and discuss the impact of Big Data challenges and new capabilities on the Global Earth Observation System of Systems (GEOSS) and particularly on its common digital infrastructure called GCI. GEOSS is a global and flexible network of content providers allowing decision makers to access an extraordinary range of data and information at their desk. The impact of the Big Data dimensionalities (commonly known as 'V' axes: volume, variety, velocity, veracity, visualization) on GEOSS is discussed. The main solutions and experimentation developed by GEOSS along these axes are introduced and analyzed. GEOSS is a pioneering framework for global and multidisciplinary data sharing in the Earth Observation realm; its experience on Big Data is valuable for the many lessons learned.

  16. Official statistics and Big Data

    Directory of Open Access Journals (Sweden)

    Peter Struijs

    2014-07-01

    Full Text Available The rise of Big Data changes the context in which organisations producing official statistics operate. Big Data provides opportunities, but in order to make optimal use of Big Data, a number of challenges have to be addressed. This stimulates increased collaboration between National Statistical Institutes, Big Data holders, businesses and universities. In time, this may lead to a shift in the role of statistical institutes in the provision of high-quality and impartial statistical information to society. In this paper, the changes in context, the opportunities, the challenges and the way to collaborate are addressed. The collaboration between the various stakeholders will involve each partner building on and contributing different strengths. For national statistical offices, traditional strengths include, on the one hand, the ability to collect data and combine data sources with statistical products and, on the other hand, their focus on quality, transparency and sound methodology. In the Big Data era of competing and multiplying data sources, they continue to have a unique knowledge of official statistical production methods. And their impartiality and respect for privacy as enshrined in law uniquely position them as a trusted third party. Based on this, they may advise on the quality and validity of information of various sources. By thus positioning themselves, they will be able to play their role as key information providers in a changing society.

  17. Big data for bipolar disorder.

    Science.gov (United States)

    Monteith, Scott; Glenn, Tasha; Geddes, John; Whybrow, Peter C; Bauer, Michael

    2016-12-01

    The delivery of psychiatric care is changing with a new emphasis on integrated care, preventative measures, population health, and the biological basis of disease. Fundamental to this transformation are big data and advances in the ability to analyze these data. The impact of big data on the routine treatment of bipolar disorder today and in the near future is discussed, with examples that relate to health policy, the discovery of new associations, and the study of rare events. The primary sources of big data today are electronic medical records (EMR), claims, and registry data from providers and payers. In the near future, data created by patients from active monitoring, passive monitoring of Internet and smartphone activities, and from sensors may be integrated with the EMR. Diverse data sources from outside of medicine, such as government financial data, will be linked for research. Over the long term, genetic and imaging data will be integrated with the EMR, and there will be more emphasis on predictive models. Many technical challenges remain when analyzing big data that relates to size, heterogeneity, complexity, and unstructured text data in the EMR. Human judgement and subject matter expertise are critical parts of big data analysis, and the active participation of psychiatrists is needed throughout the analytical process.

  18. Quantum fields in a big-crunch-big-bang spacetime

    International Nuclear Information System (INIS)

    Tolley, Andrew J.; Turok, Neil

    2002-01-01

    We consider quantum field theory on a spacetime representing the big-crunch-big-bang transition postulated in ekpyrotic or cyclic cosmologies. We show via several independent methods that an essentially unique matching rule holds connecting the incoming state, in which a single extra dimension shrinks to zero, to the outgoing state in which it reexpands at the same rate. For free fields in our construction there is no particle production from the incoming adiabatic vacuum. When interactions are included the particle production for fixed external momentum is finite at the tree level. We discuss a formal correspondence between our construction and quantum field theory on de Sitter spacetime

  19. Poker Player Behavior After Big Wins and Big Losses

    OpenAIRE

    Gary Smith; Michael Levere; Robert Kurtzman

    2009-01-01

    We find that experienced poker players typically change their style of play after winning or losing a big pot--most notably, playing less cautiously after a big loss, evidently hoping for lucky cards that will erase their loss. This finding is consistent with Kahneman and Tversky's (Kahneman, D., A. Tversky. 1979. Prospect theory: An analysis of decision under risk. Econometrica 47(2) 263-292) break-even hypothesis and suggests that when investors incur a large loss, it might be time to take ...

  20. Turning big bang into big bounce: II. Quantum dynamics

    Energy Technology Data Exchange (ETDEWEB)

    Malkiewicz, Przemyslaw; Piechocki, Wlodzimierz, E-mail: pmalk@fuw.edu.p, E-mail: piech@fuw.edu.p [Theoretical Physics Department, Institute for Nuclear Studies, Hoza 69, 00-681 Warsaw (Poland)

    2010-11-21

    We analyze the big bounce transition of the quantum Friedmann-Robertson-Walker model in the setting of the nonstandard loop quantum cosmology (LQC). Elementary observables are used to quantize composite observables. The spectrum of the energy density operator is bounded and continuous. The spectrum of the volume operator is bounded from below and discrete. It has equally distant levels defining a quantum of the volume. The discreteness may imply a foamy structure of spacetime at a semiclassical level which may be detected in astro-cosmo observations. The nonstandard LQC method has a free parameter that should be fixed in some way to specify the big bounce transition.

  1. MLBCD: a machine learning tool for big clinical data.

    Science.gov (United States)

    Luo, Gang

    2015-01-01

    Predictive modeling is fundamental for extracting value from large clinical data sets, or "big clinical data," advancing clinical research, and improving healthcare. Machine learning is a powerful approach to predictive modeling. Two factors make machine learning challenging for healthcare researchers. First, before training a machine learning model, the values of one or more model parameters called hyper-parameters must typically be specified. Due to their inexperience with machine learning, it is hard for healthcare researchers to choose an appropriate algorithm and hyper-parameter values. Second, many clinical data are stored in a special format. These data must be iteratively transformed into the relational table format before conducting predictive modeling. This transformation is time-consuming and requires computing expertise. This paper presents our vision for and design of MLBCD (Machine Learning for Big Clinical Data), a new software system aiming to address these challenges and facilitate building machine learning predictive models using big clinical data. The paper describes MLBCD's design in detail. By making machine learning accessible to healthcare researchers, MLBCD will open the use of big clinical data and increase the ability to foster biomedical discovery and improve care.
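
    One of the obstacles mentioned above is that researchers must choose an algorithm and hyper-parameter values before training. A common workaround, sketched below with scikit-learn on synthetic data (an illustration of automated hyper-parameter search in general, not MLBCD's own method), is to let a cross-validated search pick the values.

      from sklearn.datasets import make_classification
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import RandomizedSearchCV

      # Synthetic stand-in for a de-identified clinical table already in relational form.
      X, y = make_classification(n_samples=2000, n_features=30, random_state=0)

      # Search a small hyper-parameter space instead of asking the researcher to pick values by hand.
      search = RandomizedSearchCV(
          RandomForestClassifier(random_state=0),
          param_distributions={
              "n_estimators": [100, 300, 500],
              "max_depth": [None, 5, 10],
              "min_samples_leaf": [1, 5, 10],
          },
          n_iter=10,
          cv=5,
          scoring="roc_auc",
          random_state=0,
      )
      search.fit(X, y)
      print(search.best_params_, round(search.best_score_, 3))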

  2. The challenges of big data.

    Science.gov (United States)

    Mardis, Elaine R

    2016-05-01

    The largely untapped potential of big data analytics is a feeding frenzy that has been fueled by the production of many next-generation-sequencing-based data sets that are seeking to answer long-held questions about the biology of human diseases. Although these approaches are likely to be a powerful means of revealing new biological insights, there are a number of substantial challenges that currently hamper efforts to harness the power of big data. This Editorial outlines several such challenges as a means of illustrating that the path to big data revelations is paved with perils that the scientific community must overcome to pursue this important quest. © 2016. Published by The Company of Biologists Ltd.

  3. Big Book of Windows Hacks

    CERN Document Server

    Gralla, Preston

    2008-01-01

    Bigger, better, and broader in scope, the Big Book of Windows Hacks gives you everything you need to get the most out of your Windows Vista or XP system, including its related applications and the hardware it runs on or connects to. Whether you want to tweak Vista's Aero interface, build customized sidebar gadgets and run them from a USB key, or hack the "unhackable" screensavers, you'll find quick and ingenious ways to bend these recalcitrant operating systems to your will. The Big Book of Windows Hacks focuses on Vista, the new bad boy on Microsoft's block, with hacks and workarounds that

  4. Sosiaalinen asiakassuhdejohtaminen ja big data

    OpenAIRE

    Toivonen, Topi-Antti

    2015-01-01

    This thesis examines social customer relationship management and the benefits that big data can bring to it. Social customer relationship management is a new term and unfamiliar to many. The study is motivated by the limited amount of research on the topic, the complete absence of Finnish-language research, and the potentially essential role of social customer relationship management in business operations in the future. Studies dealing with big data often concentrate on its technical side, and not on the applications...

  5. Do big gods cause anything?

    DEFF Research Database (Denmark)

    Geertz, Armin W.

    2014-01-01

    This is a contribution to a review symposium on Ara Norenzayan's book Big Gods: How Religion Transformed Cooperation and Conflict (Princeton University Press, 2013). The book is fascinating but problematic with respect to causality, atheism and stereotypes about hunter-gatherers.

  6. Big Data and Social Media

    CERN Multimedia

    CERN. Geneva

    2018-01-01

    A critical analysis of the "keep everything" Big Data era and of the impact on our lives of the information, at first glance "convenient for future use", that we make known about ourselves on the network. Lecturer's biography: Father of the Internet, see https://internethalloffame.org/inductees/vint-cerf or https://en.wikipedia.org/wiki/Vint_Cerf. Keywords: Big Data, Internet, History, Applications, tools, privacy, technology, preservation, surveillance, google, Arpanet, CERN, Web

  7. Baryon symmetric big bang cosmology

    International Nuclear Information System (INIS)

    Stecker, F.W.

    1978-01-01

    It is stated that the framework of baryon symmetric big bang (BSBB) cosmology offers our greatest potential for deducting the evolution of the Universe because its physical laws and processes have the minimum number of arbitrary assumptions about initial conditions in the big-bang. In addition, it offers the possibility of explaining the photon-baryon ratio in the Universe and how galaxies and galaxy clusters are formed. BSBB cosmology also provides the only acceptable explanation at present for the origin of the cosmic γ-ray background radiation. (author)

  8. Release plan for Big Pete

    International Nuclear Information System (INIS)

    Edwards, T.A.

    1996-11-01

    This release plan is to provide instructions for the Radiological Control Technician (RCT) to conduct surveys for the unconditional release of "Big Pete," which was used in the removal of "Spacers" from the N-Reactor. Prior to performing surveys on the rear end portion of "Big Pete," it shall be cleaned (i.e., free of oil, grease, caked soil, heavy dust). If no contamination is found, the vehicle may be released with the permission of the area RCT Supervisor. If contamination is found by any of the surveys, contact the cognizant Radiological Engineer for decontamination instructions.

  9. Small quarks make big nuggets

    International Nuclear Information System (INIS)

    Deligeorges, S.

    1985-01-01

    After a brief recall of the classification of subatomic particles, this paper deals with quark nuggets: particles with more than three quarks in one big bag, called "nuclearites". Neutron stars, in fact, are big sacks of quarks, gigantic nuggets. Physicists are now trying to calculate which type of nugget of strange quark matter is stable, and what influence quark nuggets may have had on primordial nucleosynthesis. At present, it is argued that if these "nuggets" exist, and in large proportion, they may be candidates for the missing mass. [fr]

  10. [Big Data- challenges and risks].

    Science.gov (United States)

    Krauß, Manuela; Tóth, Tamás; Hanika, Heinrich; Kozlovszky, Miklós; Dinya, Elek

    2015-12-06

    The term "Big Data" is commonly used to describe the growing mass of information being created recently. New conclusions can be drawn and new services can be developed by the connection, processing and analysis of these information. This affects all aspects of life, including health and medicine. The authors review the application areas of Big Data, and present examples from health and other areas. However, there are several preconditions of the effective use of the opportunities: proper infrastructure, well defined regulatory environment with particular emphasis on data protection and privacy. These issues and the current actions for solution are also presented.

  11. Towards a big crunch dual

    Energy Technology Data Exchange (ETDEWEB)

    Hertog, Thomas E-mail: hertog@vulcan2.physics.ucsb.edu; Horowitz, Gary T

    2004-07-01

    We show there exist smooth asymptotically anti-de Sitter initial data which evolve to a big crunch singularity in a low energy supergravity limit of string theory. This opens up the possibility of using the dual conformal field theory to obtain a fully quantum description of the cosmological singularity. A preliminary study of this dual theory suggests that the big crunch is an endpoint of evolution even in the full string theory. We also show that any theory with scalar solitons must have negative energy solutions. The results presented here clarify our earlier work on cosmic censorship violation in N=8 supergravity. (author)

  12. The Inverted Big-Bang

    OpenAIRE

    Vaas, Ruediger

    2004-01-01

    Our universe appears to have been created not out of nothing but from a strange space-time dust. Quantum geometry (loop quantum gravity) makes it possible to avoid the ominous beginning of our universe with its physically unrealistic (i.e. infinite) curvature, extreme temperature, and energy density. This could be the long sought after explanation of the big-bang and perhaps even opens a window into a time before the big-bang: Space itself may have come from an earlier collapsing universe tha...

  13. Big Cities, Big Problems: Reason for the Elderly to Move?

    NARCIS (Netherlands)

    Fokkema, T.; de Jong-Gierveld, J.; Nijkamp, P.

    1996-01-01

    In many European countries, data on geographical patterns of internal elderly migration show that the elderly (55+) are more likely to leave than to move to the big cities. Besides emphasising the attractive features of the destination areas (pull factors), it is often assumed that this negative

  14. Big-Eyed Bugs Have Big Appetite for Pests

    Science.gov (United States)

    Many kinds of arthropod natural enemies (predators and parasitoids) inhabit crop fields in Arizona and can have a large negative impact on several pest insect species that also infest these crops. Geocoris spp., commonly known as big-eyed bugs, are among the most abundant insect predators in field c...

  15. An overview of big data and data science education at South African universities

    Directory of Open Access Journals (Sweden)

    Eduan Kotzé

    2016-02-01

    Full Text Available Man and machine are generating data electronically at an astronomical speed and in such a way that society is experiencing cognitive challenges to analyse this data meaningfully. Big data firms, such as Google and Facebook, identified this problem several years ago and are continuously developing new technologies or improving existing technologies in order to facilitate the cognitive analysis process of these large data sets. The purpose of this article is to contribute to our theoretical understanding of the role that big data might play in creating new training opportunities for South African universities. The article investigates emerging literature on the characteristics and main components of big data, together with the Hadoop application stack as an example of big data technology. Due to the rapid development of big data technology, a paradigm shift of human resources is required to analyse these data sets; therefore, this study examines the state of big data teaching at South African universities. This article also provides an overview of possible big data sources for South African universities, as well as relevant big data skills that data scientists need. The study also investigates existing academic programs in South Africa, where the focus is on teaching advanced database systems. The study found that big data and data science topics are introduced to students on a postgraduate level, but that the scope is very limited. This article contributes by proposing important theoretical topics that could be introduced as part of the existing academic programs. More research is required, however, to expand these programs in order to meet the growing demand for data scientists with big data skills.

  16. Intelligent Decisional Assistant that Facilitate the Choice of a Proper Computer System Applied in Business

    OpenAIRE

    Nicolae MARGINEAN

    2009-01-01

    The choice of a proper computer system is not an easy task for a decision maker. One reason could be the present development of the market for computer systems applied in business. The big number of players on the Romanian market entails a big number of computerized products, with a multitude of various properties. Our proposal tries to optimize and facilitate this decisional process within an e-shop where IT packages applied in business are sold, by building an online decisional assistant, a special component ...

  17. Will Big Data Close the Missing Heritability Gap?

    Science.gov (United States)

    Kim, Hwasoon; Grueneberg, Alexander; Vazquez, Ana I; Hsu, Stephen; de Los Campos, Gustavo

    2017-11-01

    Despite the important discoveries reported by genome-wide association (GWA) studies, for most traits and diseases the prediction R-squared (R-sq.) achieved with genetic scores remains considerably lower than the trait heritability. Modern biobanks will soon deliver unprecedentedly large biomedical data sets: Will the advent of big data close the gap between the trait heritability and the proportion of variance that can be explained by a genomic predictor? We addressed this question using Bayesian methods and a data analysis approach that produces a surface response relating prediction R-sq. with sample size and model complexity (e.g., number of SNPs). We applied the methodology to data from the interim release of the UK Biobank. Focusing on human height as a model trait and using 80,000 records for model training, we achieved a prediction R-sq. in testing (n = 22,221) of 0.24 (95% C.I.: 0.23-0.25). Our estimates show that prediction R-sq. increases with sample size, reaching an estimated plateau at values that ranged from 0.1 to 0.37 for models using 500 and 50,000 (GWA-selected) SNPs, respectively. Soon much larger data sets will become available. Using the estimated surface response, we forecast that larger sample sizes will lead to further improvements in prediction R-sq. We conclude that big data will lead to a substantial reduction of the gap between trait heritability and the proportion of interindividual differences that can be explained with a genomic predictor. However, even with the power of big data, for complex traits we anticipate that the gap between prediction R-sq. and trait heritability will not be fully closed. Copyright © 2017 by the Genetics Society of America.
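
    The abstract's central point, that prediction R-squared rises with training-set size but plateaus below the heritability, can be reproduced qualitatively with a toy simulation. The sketch below is an assumption-laden illustration, not the authors' Bayesian surface-response analysis: it simulates an additive trait with heritability 0.5 from 1,000 hypothetical SNPs and fits a ridge-regression genomic predictor at increasing training sizes.

      import numpy as np
      from sklearn.linear_model import Ridge
      from sklearn.metrics import r2_score

      rng = np.random.default_rng(0)
      n_total, n_snps, h2 = 12000, 1000, 0.5   # toy sizes; real analyses use far more SNPs and people

      # Simulate genotypes (0/1/2), standardize, and build an additive trait with heritability h2.
      maf = rng.uniform(0.05, 0.5, n_snps)
      G = rng.binomial(2, maf, size=(n_total, n_snps)).astype(float)
      G = (G - G.mean(0)) / G.std(0)
      beta = rng.normal(0, np.sqrt(h2 / n_snps), n_snps)
      y = G @ beta + rng.normal(0, np.sqrt(1 - h2), n_total)

      test = slice(10000, 12000)                # hold out the last 2,000 individuals
      for n_train in (1000, 3000, 6000, 10000):
          model = Ridge(alpha=n_snps).fit(G[:n_train], y[:n_train])
          r2 = r2_score(y[test], model.predict(G[test]))
          print(f"n_train={n_train:>6}  prediction R-sq. ~ {r2:.2f}  (heritability = {h2})")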

  18. Clinical Trial Data as Public Goods: Fair Trade and the Virtual Knowledge Bank as a Solution to the Free Rider Problem - A Framework for the Promotion of Innovation by Facilitation of Clinical Trial Data Sharing among Biopharmaceutical Companies in the Era of Omics and Big Data.

    Science.gov (United States)

    Evangelatos, Nikolaos; Reumann, Matthias; Lehrach, Hans; Brand, Angela

    2016-01-01

    Knowledge in the era of Omics and Big Data has been increasingly conceptualized as a public good. Sharing of de-identified patient data has been advocated as a means to increase confidence and public trust in the results of clinical trials. On the other hand, research has shown that the current research and development model of the biopharmaceutical industry has reached its innovation capacity. In response to that, the biopharmaceutical industry has adopted open innovation practices, with sharing of clinical trial data being among the most interesting ones. However, due to the free rider problem, clinical trial data sharing among biopharmaceutical companies could undermine their innovativeness. Based on the theory of public goods, we have developed a commons arrangement and devised a model, which enables secure and fair clinical trial data sharing over a Virtual Knowledge Bank based on a web platform. Our model uses data as a virtual currency and treats knowledge as a club good. Fair sharing of clinical trial data over the Virtual Knowledge Bank has positive effects on the innovation capacity of the biopharmaceutical industry without compromising the intellectual rights, proprietary interests and competitiveness of the latter. The Virtual Knowledge Bank is a sustainable and self-expanding model for secure and fair clinical trial data sharing that allows for sharing of clinical trial data, while at the same time it increases the innovation capacity of the biopharmaceutical industry. © 2016 S. Karger AG, Basel.

  19. A survey on Big Data Stream Mining

    African Journals Online (AJOL)

    pc

    2018-03-05

    Big Data can be static on one machine or distributed ... decision making, and process automation. Big data ... Concept drifting: concept drifting means the classifier ... transactions generated by a prefix tree structure. EstDec ...

  20. Extreme genomes

    OpenAIRE

    DeLong, Edward F

    2000-01-01

    The complete genome sequence of Thermoplasma acidophilum, an acid- and heat-loving archaeon, has recently been reported. Comparative genomic analysis of this 'extremophile' is providing new insights into the metabolic machinery, ecology and evolution of thermophilic archaea.

  1. BIG DATA-DRIVEN MARKETING: AN ABSTRACT

    OpenAIRE

    Suoniemi, Samppa; Meyer-Waarden, Lars; Munzel, Andreas

    2017-01-01

    Customer information plays a key role in managing successful relationships with valuable customers. Big data customer analytics use (BD use), i.e., the extent to which customer information derived from big data analytics guides marketing decisions, helps firms better meet customer needs for competitive advantage. This study addresses three research questions: What are the key antecedents of big data customer analytics use? How, and to what extent, does big data customer an...

  2. Big data in Finnish financial services

    OpenAIRE

    Laurila, M. (Mikko)

    2017-01-01

    This thesis aims to explore the concept of big data and to create an understanding of big data maturity in the Finnish financial services industry. The research questions of this thesis are “What kind of big data solutions are being implemented in the Finnish financial services sector?” and “Which factors impede faster implementation of big data solutions in the Finnish financial services sector?”. ...

  3. Grass genomes

    OpenAIRE

    Bennetzen, Jeffrey L.; SanMiguel, Phillip; Chen, Mingsheng; Tikhonov, Alexander; Francki, Michael; Avramova, Zoya

    1998-01-01

    For the most part, studies of grass genome structure have been limited to the generation of whole-genome genetic maps or the fine structure and sequence analysis of single genes or gene clusters. We have investigated large contiguous segments of the genomes of maize, sorghum, and rice, primarily focusing on intergenic spaces. Our data indicate that much (>50%) of the maize genome is composed of interspersed repetitive DNAs, primarily nested retrotransposons that in...

  4. Transforming business models through big data in the textile industry

    DEFF Research Database (Denmark)

    Aagaard, Annabeth

    The extensive stream of work on business models (BM) and business model innovation (BMI) has generated many important insights (Amit & Zott, 2001; Osterwalder, 2004; Markides, 2008, 2013; Chesbrough, 2010; Teece, 2010; Zott et al., 2011). Yet, our understanding of business models remains fragmented, as stressed by Zott et al. (2011), Weill et al. (2011) and David J. Teece (2010: 174), who states that "the concept of a business model lacks theoretical grounding in economics or in business studies". With the acceleration of digitization and the use of big data analytics, quality data are accessible ... , such as textile, and have led to disruption of established business models (Westerman et al., 2014; Weill and Woerner, 2015). Yet, little is known of the managerial process and facilitation of the digital transformation of business models through big data (McAfee and Brynjolfsson, 2012; Markus and Loebbecke, 2013).

  5. The emerging role of Big Data in key development issues: Opportunities, challenges, and concerns

    Directory of Open Access Journals (Sweden)

    Nir Kshetri

    2014-12-01

    Full Text Available This paper presents a review of academic literature, policy documents from government organizations and international agencies, and reports from industries and popular media on the trends in Big Data utilization in key development issues and its worthwhileness, usefulness, and relevance. By looking at Big Data deployment in a number of key economic sectors, it seeks to provide a better understanding of the opportunities and challenges of using it for addressing key issues facing the developing world. It reviews the uses of Big Data in agriculture and farming activities in developing countries to assess the capabilities required at various levels to benefit from Big Data. It also provides insights into how the current digital divide is associated with and facilitated by the pattern of Big Data diffusion and its effective use in key development areas. It also discusses the lessons that developing countries can learn from the utilization of Big Data in big corporations as well as in other activities in industrialized countries.

  6. China: Big Changes Coming Soon

    Science.gov (United States)

    Rowen, Henry S.

    2011-01-01

    Big changes are ahead for China, probably abrupt ones. The economy has grown so rapidly for many years, over 30 years at an average of nine percent a year, that its size makes it a major player in trade and finance and increasingly in political and military matters. This growth is not only of great importance internationally, it is already having…

  7. Big data and urban governance

    NARCIS (Netherlands)

    Taylor, L.; Richter, C.; Gupta, J.; Pfeffer, K.; Verrest, H.; Ros-Tonen, M.

    2015-01-01

    This chapter examines the ways in which big data is involved in the rise of smart cities. Mobile phones, sensors and online applications produce streams of data which are used to regulate and plan the city, often in real time, but which presents challenges as to how the city’s functions are seen and

  8. Big Data for personalized healthcare

    NARCIS (Netherlands)

    Siemons, Liseth; Sieverink, Floor; Vollenbroek, Wouter; van de Wijngaert, Lidwien; Braakman-Jansen, Annemarie; van Gemert-Pijnen, Lisette

    2016-01-01

    Big Data, often defined according to the 5V model (volume, velocity, variety, veracity and value), is seen as the key towards personalized healthcare. However, it also confronts us with new technological and ethical challenges that require more sophisticated data management tools and data analysis

  9. Big data en gelijke behandeling

    NARCIS (Netherlands)

    Lammerant, Hans; de Hert, Paul; Blok, P.H.; Blok, P.H.

    2017-01-01

    In this chapter we first consider the main basic concepts of equal treatment and discrimination (Section 6.2). We then look at the Dutch and European legal frameworks on non-discrimination (Sections 6.3-6.5) and at how those rules should be applied to big data.

  10. Research Ethics in Big Data.

    Science.gov (United States)

    Hammer, Marilyn J

    2017-05-01

    The ethical conduct of research includes, in part, patient agreement to participate in studies and the protection of health information. In the evolving world of data science and the accessibility of large quantities of web-based data created by millions of individuals, novel methodologic approaches to answering research questions are emerging. This article explores research ethics in the context of big data.

  11. Big data e data science

    OpenAIRE

    Cavique, Luís

    2014-01-01

    This article presents the basic concepts of Big Data and of the new area to which it has given rise, Data Science. Within Data Science, the notion of dimensionality reduction of data is discussed and illustrated with examples.

  12. The Case for "Big History."

    Science.gov (United States)

    Christian, David

    1991-01-01

    Urges an approach to the teaching of history that takes the largest possible perspective, crossing time as well as space. Discusses the problems and advantages of such an approach. Describes a course on "big" history that begins with time, creation myths, and astronomy, and moves on to paleontology and evolution. (DK)

  13. Finding errors in big data

    NARCIS (Netherlands)

    Puts, Marco; Daas, Piet; de Waal, A.G.

    No data source is perfect. Mistakes inevitably creep in. Spotting errors is hard enough when dealing with survey responses from several thousand people, but the difficulty is multiplied hugely when that mysterious beast Big Data comes into play. Statistics Netherlands is about to publish its first

  14. Sampling Operations on Big Data

    Science.gov (United States)

    2015-11-29

    ...categories. These include edge sampling methods, where edges are selected by a predetermined criterion, and snowball sampling methods, where algorithms start... ...process and disseminate information for discovery and exploration under real-time constraints. Common signal processing operations such as sampling and
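
    To make the graph-sampling idea concrete, the snippet below is a small, self-contained snowball sampler over an adjacency-list graph (a generic textbook-style illustration, not the authors' implementation): starting from seed vertices, it repeatedly adds up to k randomly chosen unvisited neighbours of each sampled vertex until a size budget is reached.

      import random
      from collections import deque

      def snowball_sample(adj, seeds, k_per_node=3, max_nodes=100, seed=0):
          """Breadth-first snowball sample: grow from the seeds, taking at most
          k_per_node new neighbours per visited vertex, up to max_nodes vertices."""
          rng = random.Random(seed)
          sampled, frontier = set(seeds), deque(seeds)
          while frontier and len(sampled) < max_nodes:
              v = frontier.popleft()
              candidates = [u for u in adj.get(v, []) if u not in sampled]
              for u in rng.sample(candidates, min(k_per_node, len(candidates))):
                  sampled.add(u)
                  frontier.append(u)
          return sampled

      # Tiny toy graph as an adjacency list.
      adj = {0: [1, 2, 3], 1: [0, 4], 2: [0, 4, 5], 3: [0], 4: [1, 2], 5: [2]}
      print(sorted(snowball_sample(adj, seeds=[0], k_per_node=2, max_nodes=4)))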

  15. The International Big History Association

    Science.gov (United States)

    Duffy, Michael; Duffy, D'Neil

    2013-01-01

    IBHA, the International Big History Association, was organized in 2010 and "promotes the unified, interdisciplinary study and teaching of history of the Cosmos, Earth, Life, and Humanity." This is the vision that Montessori embraced long before the discoveries of modern science fleshed out the story of the evolving universe. "Big…

  16. NASA's Big Data Task Force

    Science.gov (United States)

    Holmes, C. P.; Kinter, J. L.; Beebe, R. F.; Feigelson, E.; Hurlburt, N. E.; Mentzel, C.; Smith, G.; Tino, C.; Walker, R. J.

    2017-12-01

    Two years ago NASA established the Ad Hoc Big Data Task Force (BDTF - https://science.nasa.gov/science-committee/subcommittees/big-data-task-force), an advisory working group with the NASA Advisory Council system. The scope of the Task Force included all NASA Big Data programs, projects, missions, and activities. The Task Force focused on such topics as exploring the existing and planned evolution of NASA's science data cyber-infrastructure that supports broad access to data repositories for NASA Science Mission Directorate missions; best practices within NASA, other Federal agencies, private industry and research institutions; and Federal initiatives related to big data and data access. The BDTF has completed its two-year term and produced several recommendations plus four white papers for NASA's Science Mission Directorate. This presentation will discuss the activities and results of the TF including summaries of key points from its focused study topics. The paper serves as an introduction to the papers following in this ESSI session.

  17. Big Math for Little Kids

    Science.gov (United States)

    Greenes, Carole; Ginsburg, Herbert P.; Balfanz, Robert

    2004-01-01

    "Big Math for Little Kids," a comprehensive program for 4- and 5-year-olds, develops and expands on the mathematics that children know and are capable of doing. The program uses activities and stories to develop ideas about number, shape, pattern, logical reasoning, measurement, operations on numbers, and space. The activities introduce the…

  18. BIG DATA IN BUSINESS ENVIRONMENT

    Directory of Open Access Journals (Sweden)

    Logica BANICA

    2015-06-01

    Full Text Available In recent years, dealing with a lot of data originating from social media sites and mobile communications, alongside data from business environments and institutions, has led to the definition of a new concept, known as Big Data. The economic impact of the sheer amount of data produced in the last two years has increased rapidly. It is necessary to aggregate all types of data (structured and unstructured) in order to improve current transactions, to develop new business models, to provide a real image of supply and demand and, thereby, to generate market advantages. So the companies that turn to Big Data have a competitive advantage over other firms. From the perspective of IT organizations, they must accommodate the storage and processing of Big Data, and provide analysis tools that are easily integrated into business processes. This paper aims to discuss aspects of the Big Data concept and the principles for building, organizing and analysing huge datasets in the business environment, offering a three-layer architecture based on current software solutions. The article also covers graphical tools for exploring and representing unstructured data, Gephi and NodeXL.

  19. From Big Bang to Eternity?

    Indian Academy of Sciences (India)

    at different distances (that is, at different epochs in the past) to come to this ... that the expansion started billions of years ago from an explosive Big Bang. Recent research sheds new light on the key cosmological question about the distant ...

  20. Banking Wyoming big sagebrush seeds

    Science.gov (United States)

    Robert P. Karrfalt; Nancy Shaw

    2013-01-01

    Five commercially produced seed lots of Wyoming big sagebrush (Artemisia tridentata Nutt. var. wyomingensis (Beetle & Young) S.L. Welsh [Asteraceae]) were stored under various conditions for 5 y. Purity, moisture content as measured by equilibrium relative humidity, and storage temperature were all important factors to successful seed storage. Our results indicate...

  1. The Problem with Big Data: Operating on Smaller Datasets to Bridge the Implementation Gap.

    Science.gov (United States)

    Mann, Richard P; Mushtaq, Faisal; White, Alan D; Mata-Cervantes, Gabriel; Pike, Tom; Coker, Dalton; Murdoch, Stuart; Hiles, Tim; Smith, Clare; Berridge, David; Hinchliffe, Suzanne; Hall, Geoff; Smye, Stephen; Wilkie, Richard M; Lodge, J Peter A; Mon-Williams, Mark

    2016-01-01

    Big datasets have the potential to revolutionize public health. However, there is a mismatch between the political and scientific optimism surrounding big data and the public's perception of its benefit. We suggest a systematic and concerted emphasis on developing models derived from smaller datasets to illustrate to the public how big data can produce tangible benefits in the long term. In order to highlight the immediate value of a small data approach, we produced a proof-of-concept model predicting hospital length of stay. The results demonstrate that existing small datasets can be used to create models that generate a reasonable prediction, facilitating health-care delivery. We propose that greater attention (and funding) needs to be directed toward the utilization of existing information resources in parallel with current efforts to create and exploit "big data."
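
    In the spirit of the proof-of-concept mentioned above, the sketch below trains a length-of-stay regressor on a small synthetic admissions table. The variables, their effects and the data are entirely made up for illustration; this is not the authors' model, only an example of what a "small data" predictor can look like.

      import numpy as np
      import pandas as pd
      from sklearn.ensemble import GradientBoostingRegressor
      from sklearn.metrics import mean_absolute_error
      from sklearn.model_selection import train_test_split

      rng = np.random.default_rng(1)
      n = 5000  # a "small data" sample, e.g. one hospital's historical admissions

      # Synthetic stand-in for routinely collected admission variables.
      df = pd.DataFrame({
          "age": rng.integers(18, 95, n),
          "emergency": rng.integers(0, 2, n),
          "n_comorbidities": rng.poisson(1.5, n),
          "prior_admissions": rng.poisson(0.8, n),
      })
      los = (2 + 0.04 * df["age"] + 1.5 * df["emergency"]
             + 0.9 * df["n_comorbidities"] + rng.exponential(1.5, n))

      X_tr, X_te, y_tr, y_te = train_test_split(df, los, random_state=0)
      model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
      print("MAE (days):", round(mean_absolute_error(y_te, model.predict(X_te)), 2))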

  2. The Uses of Big Data in Cities.

    Science.gov (United States)

    Bettencourt, Luís M A

    2014-03-01

    There is much enthusiasm currently about the possibilities created by new and more extensive sources of data to better understand and manage cities. Here, I explore how big data can be useful in urban planning by formalizing the planning process as a general computational problem. I show that, under general conditions, new sources of data coordinated with urban policy can be applied following fundamental principles of engineering to achieve new solutions to important age-old urban problems. I also show that comprehensive urban planning is computationally intractable (i.e., practically impossible) in large cities, regardless of the amounts of data available. This dilemma between the need for planning and coordination and its impossibility in detail is resolved by the recognition that cities are first and foremost self-organizing social networks embedded in space and enabled by urban infrastructure and services. As such, the primary role of big data in cities is to facilitate information flows and mechanisms of learning and coordination by heterogeneous individuals. However, processes of self-organization in cities, as well as of service improvement and expansion, must rely on general principles that enforce necessary conditions for cities to operate and evolve. Such ideas are the core of a developing scientific theory of cities, which is itself enabled by the growing availability of quantitative data on thousands of cities worldwide, across different geographies and levels of development. These three uses of data and information technologies in cities constitute then the necessary pillars for more successful urban policy and management that encourages, and does not stifle, the fundamental role of cities as engines of development and innovation in human societies.

  3. The Natural Science Underlying Big History

    Directory of Open Access Journals (Sweden)

    Eric J. Chaisson

    2014-01-01

    Full Text Available Nature’s many varied complex systems—including galaxies, stars, planets, life, and society—are islands of order within the increasingly disordered Universe. All organized systems are subject to physical, biological, or cultural evolution, which together comprise the grander interdisciplinary subject of cosmic evolution. A wealth of observational data supports the hypothesis that increasingly complex systems evolve unceasingly, uncaringly, and unpredictably from big bang to humankind. These are global history greatly extended, big history with a scientific basis, and natural history broadly portrayed across ∼14 billion years of time. Human beings and our cultural inventions are not special, unique, or apart from Nature; rather, we are an integral part of a universal evolutionary process connecting all such complex systems throughout space and time. Such evolution writ large has significant potential to unify the natural sciences into a holistic understanding of who we are and whence we came. No new science (beyond frontier, nonequilibrium thermodynamics) is needed to describe cosmic evolution’s major milestones at a deep and empirical level. Quantitative models and experimental tests imply that a remarkable simplicity underlies the emergence and growth of complexity for a wide spectrum of known and diverse systems. Energy is a principal facilitator of the rising complexity of ordered systems within the expanding Universe; energy flows are as central to life and society as they are to stars and galaxies. In particular, energy rate density—contrasting with information content or entropy production—is an objective metric suitable to gauge relative degrees of complexity among a hierarchy of widely assorted systems observed throughout the material Universe. Operationally, those systems capable of utilizing optimum amounts of energy tend to survive, and those that cannot are nonrandomly eliminated.
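
    Energy rate density, the metric highlighted above, is simply the rate of energy flow through a system divided by the system's mass. The back-of-the-envelope sketch below computes it for two familiar systems using round, order-of-magnitude values assumed here, not figures taken from the article.

      # Rough, order-of-magnitude inputs: (power in watts, mass in kilograms).
      systems = {
          "Sun":        (3.8e26, 2.0e30),
          "human body": (1.0e2,  7.0e1),
      }

      ERG_PER_S_PER_G = 1.0e4  # 1 W/kg = 10^4 erg s^-1 g^-1

      for name, (power_w, mass_kg) in systems.items():
          phi = power_w / mass_kg  # energy rate density in W/kg
          print(f"{name:>10}: {phi:.2e} W/kg  ~ {phi * ERG_PER_S_PER_G:.1f} erg/s/g")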

  4. Cancer genomics

    DEFF Research Database (Denmark)

    Norrild, Bodil; Guldberg, Per; Ralfkiær, Elisabeth Methner

    2007-01-01

    Almost all cells in the human body contain a complete copy of the genome with an estimated number of 25,000 genes. The sequences of these genes make up about three percent of the genome and comprise the inherited set of genetic information. The genome also contains information that determines whe...

  5. Big Data: Implications for Health System Pharmacy.

    Science.gov (United States)

    Stokes, Laura B; Rogers, Joseph W; Hertig, John B; Weber, Robert J

    2016-07-01

    Big Data refers to datasets that are so large and complex that traditional methods and hardware for collecting, sharing, and analyzing them are not possible. Big Data that is accurate leads to more confident decision making, improved operational efficiency, and reduced costs. The rapid growth of health care information results in Big Data around health services, treatments, and outcomes, and Big Data can be used to analyze the benefit of health system pharmacy services. The goal of this article is to provide a perspective on how Big Data can be applied to health system pharmacy. It will define Big Data, describe the impact of Big Data on population health, review specific implications of Big Data in health system pharmacy, and describe an approach for pharmacy leaders to effectively use Big Data. A few strategies involved in managing Big Data in health system pharmacy include identifying potential opportunities for Big Data, prioritizing those opportunities, protecting privacy concerns, promoting data transparency, and communicating outcomes. As health care information expands in its content and becomes more integrated, Big Data can enhance the development of patient-centered pharmacy services.

  6. Big data: een zoektocht naar instituties

    NARCIS (Netherlands)

    van der Voort, H.G.; Crompvoets, J

    2016-01-01

    Big data is a well-known phenomenon, even a buzzword nowadays. It refers to an abundance of data and new possibilities to process and use them. Big data is the subject of many publications. Some pay attention to the many possibilities of big data; others warn us about their consequences. This special

  7. A SWOT Analysis of Big Data

    Science.gov (United States)

    Ahmadi, Mohammad; Dileepan, Parthasarati; Wheatley, Kathleen K.

    2016-01-01

    This is the decade of data analytics and big data, but not everyone agrees with the definition of big data. Some researchers see it as the future of data analysis, while others consider it as hype and foresee its demise in the near future. No matter how it is defined, big data for the time being is having its glory moment. The most important…

  8. Data, Data, Data : Big, Linked & Open

    NARCIS (Netherlands)

    Folmer, E.J.A.; Krukkert, D.; Eckartz, S.M.

    2013-01-01

    The entire business and IT world is currently talking about Big Data, a trend that overtook Cloud Computing in mid-2013 (based on Google Trends). Policymakers are also actively engaged with Big Data. Neelie Kroes, vice-president of the European Commission, speaks of the ‘Big Data

  9. A survey of big data research

    Science.gov (United States)

    Fang, Hua; Zhang, Zhaoyang; Wang, Chanpaul Jin; Daneshmand, Mahmoud; Wang, Chonggang; Wang, Honggang

    2015-01-01

    Big data create values for business and research, but pose significant challenges in terms of networking, storage, management, analytics and ethics. Multidisciplinary collaborations from engineers, computer scientists, statisticians and social scientists are needed to tackle, discover and understand big data. This survey presents an overview of big data initiatives, technologies and research in industries and academia, and discusses challenges and potential solutions. PMID:26504265

  10. The BigBoss Experiment

    Energy Technology Data Exchange (ETDEWEB)

    Schelgel, D.; Abdalla, F.; Abraham, T.; Ahn, C.; Allende Prieto, C.; Annis, J.; Aubourg, E.; Azzaro, M.; Bailey, S.; Baltay, C.; Baugh, C.; Bebek, C.; Becerril, S.; Blanton, M.; Bolton, A.; Bromley, B.; Cahn, R.; Carton, P.-H.; Cervanted-Cota, J.L.; Chu, Y.; Cortes, M.; /APC, Paris /Brookhaven /IRFU, Saclay /Marseille, CPPM /Marseille, CPT /Durham U. / /IEU, Seoul /Fermilab /IAA, Granada /IAC, La Laguna / /IAC, Mexico / / /Madrid, IFT /Marseille, Lab. Astrophys. / / /New York U. /Valencia U.

    2012-06-07

    BigBOSS is a Stage IV ground-based dark energy experiment to study baryon acoustic oscillations (BAO) and the growth of structure with a wide-area galaxy and quasar redshift survey over 14,000 square degrees. It has been conditionally accepted by NOAO in response to a call for major new instrumentation and a high-impact science program for the 4-m Mayall telescope at Kitt Peak. The BigBOSS instrument is a robotically-actuated, fiber-fed spectrograph capable of taking 5000 simultaneous spectra over a wavelength range from 340 nm to 1060 nm, with a resolution R = λ/Δλ = 3000-4800. Using data from imaging surveys that are already underway, spectroscopic targets are selected that trace the underlying dark matter distribution. In particular, targets include luminous red galaxies (LRGs) up to z = 1.0, extending the BOSS LRG survey in both redshift and survey area. To probe the universe out to even higher redshift, BigBOSS will target bright [OII] emission line galaxies (ELGs) up to z = 1.7. In total, 20 million galaxy redshifts are obtained to measure the BAO feature, trace the matter power spectrum at smaller scales, and detect redshift space distortions. BigBOSS will provide additional constraints on early dark energy and on the curvature of the universe by measuring the Ly-alpha forest in the spectra of over 600,000 2.2 < z < 3.5 quasars. BigBOSS galaxy BAO measurements combined with an analysis of the broadband power, including the Ly-alpha forest in BigBOSS quasar spectra, achieves a FOM of 395 with Planck plus Stage III priors. This FOM is based on conservative assumptions for the analysis of broad band power (k_max = 0.15), and could grow to over 600 if current work allows us to push the analysis to higher wave numbers (k_max = 0.3). BigBOSS will also place constraints on theories of modified gravity and inflation, and will measure the sum of neutrino masses to 0.024 eV accuracy.
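
    As a quick worked example of the quoted resolving power (the wavelength chosen here is illustrative only), the resolution definition gives the smallest separable wavelength interval directly:

        R = \frac{\lambda}{\Delta\lambda} \;\Rightarrow\; \Delta\lambda = \frac{\lambda}{R} = \frac{600\ \mathrm{nm}}{4000} \approx 0.15\ \mathrm{nm},

    so near the middle of the 340-1060 nm band the spectrograph separates features roughly 0.1-0.2 nm apart.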

  11. Quest for Value in Big Earth Data

    Science.gov (United States)

    Kuo, Kwo-Sen; Oloso, Amidu O.; Rilee, Mike L.; Doan, Khoa; Clune, Thomas L.; Yu, Hongfeng

    2017-04-01

    and time, rather than their array index dimensions to achieve co-location for spatiotemporal coincidence. This leads further to the insight that, in terms of optimizing Value, achieving good scalability in Variety is more crucial than good scalability in Volume. Therefore, we will discuss our innovative approach to improving productivity by homogenizing the daunting varieties in Earth Science data to enable data co-location systematically. In addition, a Big Data system incorporating the capabilities described above has the potential to drastically shorten the data preparation period of machine learning, better facilitate automated machine learning operations, and further boost scientific productivity.
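
    The record argues for co-locating heterogeneous Earth Science datasets by space and time rather than by array index. The following is a minimal, generic sketch of that idea; the grid resolution, field names and toy datasets are assumptions made for illustration and are not the system described in the record.

        from collections import defaultdict

        def st_key(lat, lon, t, dlat=1.0, dlon=1.0, dt=3600):
            """Map a (lat, lon, time) observation to a discrete space-time cell."""
            return (int(lat // dlat), int(lon // dlon), int(t // dt))

        def colocate(obs_a, obs_b):
            """Pair records from two datasets that fall in the same space-time cell.

            obs_a, obs_b: iterables of dicts with 'lat', 'lon', 't' and a 'value' field.
            Returns a list of (record_a, record_b) spatiotemporal coincidences.
            """
            index = defaultdict(list)
            for rec in obs_a:
                index[st_key(rec["lat"], rec["lon"], rec["t"])].append(rec)

            pairs = []
            for rec in obs_b:
                for match in index.get(st_key(rec["lat"], rec["lon"], rec["t"]), []):
                    pairs.append((match, rec))
            return pairs

        # Tiny hypothetical satellite and ground observations, co-located by cell.
        sat = [{"lat": 40.2, "lon": -105.3, "t": 7200, "value": 0.81}]
        ground = [{"lat": 40.7, "lon": -105.8, "t": 7800, "value": 0.78}]
        print(colocate(sat, ground))  # both fall in cell (40, -106, 2) -> one pair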

  12. Implementing genomics and pharmacogenomics in the clinic: The National Human Genome Research Institute’s genomic medicine portfolio

    Science.gov (United States)

    Manolio, Teri A.

    2016-01-01

    Increasing knowledge about the influence of genetic variation on human health and growing availability of reliable, cost-effective genetic testing have spurred the implementation of genomic medicine in the clinic. As defined by the National Human Genome Research Institute (NHGRI), genomic medicine uses an individual’s genetic information in his or her clinical care, and has begun to be applied effectively in areas such as cancer genomics, pharmacogenomics, and rare and undiagnosed diseases. In 2011 NHGRI published its strategic vision for the future of genomic research, including an ambitious research agenda to facilitate and promote the implementation of genomic medicine. To realize this agenda, NHGRI is consulting and facilitating collaborations with the external research community through a series of “Genomic Medicine Meetings,” under the guidance and leadership of the National Advisory Council on Human Genome Research. These meetings have identified and begun to address significant obstacles to implementation, such as lack of evidence of efficacy, limited availability of genomics expertise and testing, lack of standards, and difficulties in integrating genomic results into electronic medical records. The six research and dissemination initiatives comprising NHGRI’s genomic research portfolio are designed to speed the evaluation and incorporation, where appropriate, of genomic technologies and findings into routine clinical care. Actual adoption of successful approaches in clinical care will depend upon the willingness, interest, and energy of professional societies, practitioners, patients, and payers to promote their responsible use and share their experiences in doing so. PMID:27612677

  13. Facilitating functional annotation of chicken microarray data

    Directory of Open Access Journals (Sweden)

    Gresham Cathy R

    2009-10-01

    Full Text Available Abstract Background Modeling results from chicken microarray studies is challenging for researchers due to little functional annotation associated with these arrays. The Affymetrix GeneChip chicken genome array, one of the biggest arrays serving as a key research tool for the study of chicken functional genomics, is among the few arrays that link gene products to Gene Ontology (GO). However, the GO annotation data presented by Affymetrix are incomplete; for example, they do not show references linked to manually annotated functions. In addition, there is no tool that enables microarray researchers to directly retrieve functional annotations for their datasets from the annotated arrays. This costs researchers a considerable amount of time searching multiple GO databases for functional information. Results We have improved the breadth of functional annotations of the gene products associated with probesets on the Affymetrix chicken genome array by 45% and the quality of annotation by 14%. We have also identified the most significant diseases and disorders, different types of genes, and known drug targets represented on the Affymetrix chicken genome array. To facilitate functional annotation of other arrays and microarray experimental datasets we developed an Array GO Mapper (AGOM) tool to help researchers quickly retrieve corresponding functional information for their datasets. Conclusion Results from this study will directly facilitate annotation of other chicken arrays and microarray experimental datasets. Researchers will be able to quickly model their microarray datasets into more reliable biological functional information by using the AGOM tool. The diseases, disorders, gene types and drug targets revealed in the study will allow researchers to learn more about how genes function in complex biological systems and may lead to new drug discovery and development of therapies. The GO annotation data generated will be available for public use via AgBase website and
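
    The tool described above essentially maps array identifiers to GO annotations in bulk. The sketch below is a generic, hypothetical illustration of that kind of lookup; the probeset IDs, column names and file layout are invented for the example and are not AGOM's actual data or interface.

        import csv
        from collections import defaultdict

        def load_go_annotations(path):
            """Read a tab-separated table of probeset_id, go_id, go_name, evidence."""
            table = defaultdict(list)
            with open(path, newline="") as handle:
                for row in csv.DictReader(handle, delimiter="\t"):
                    table[row["probeset_id"]].append(
                        (row["go_id"], row["go_name"], row["evidence"])
                    )
            return table

        def annotate(probesets, table):
            """Return the GO annotations for each probeset in an experimental dataset."""
            return {p: table.get(p, []) for p in probesets}

        # Usage sketch with invented identifiers:
        # go_table = load_go_annotations("chicken_array_go.tsv")
        # print(annotate(["Gga.1234.1.S1_at"], go_table))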

  14. Family genome browser: visualizing genomes with pedigree information.

    Science.gov (United States)

    Juan, Liran; Liu, Yongzhuang; Wang, Yongtian; Teng, Mingxiang; Zang, Tianyi; Wang, Yadong

    2015-07-15

    Families with inherited diseases are widely used in Mendelian/complex disease studies. Owing to the advances in high-throughput sequencing technologies, family genome sequencing is becoming more and more prevalent. Visualizing family genomes can greatly facilitate human genetics studies and personalized medicine. However, due to the complex genetic relationships and high similarities among genomes of consanguineous family members, family genomes are difficult to visualize in traditional genome visualization frameworks. How to visualize the family genome variants and their functions with integrated pedigree information remains a critical challenge. We developed the Family Genome Browser (FGB) to provide comprehensive analysis and visualization for family genomes. The FGB can visualize family genomes effectively at both the individual level and the variant level, by integrating genome data with pedigree information. Family genome analysis, including determination of the parental origin of variants, detection of de novo mutations, identification of potential recombination events and identical-by-descent segments, etc., can be performed flexibly. Diverse annotations for the family genome variants, such as dbSNP memberships, linkage disequilibrium, genes, variant effects, potential phenotypes, etc., are illustrated as well. Moreover, the FGB can automatically search de novo mutations and compound heterozygous variants for a selected individual, and guide investigators to find high-risk genes with flexible navigation options. These features enable users to investigate and understand family genomes intuitively and systematically. The FGB is available at http://mlg.hit.edu.cn/FGB/. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
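
    One of the analyses mentioned above, detection of candidate de novo mutations, reduces in its simplest form to comparing a child's genotype with both parents'. The sketch below is a naive illustration of that rule on biallelic genotype calls; it is not the FGB's algorithm and it ignores genotype quality, phasing and sequencing error.

        def candidate_de_novo(child_gt, mother_gt, father_gt):
            """Flag alleles in the child that appear in neither parent.

            Genotypes are given as pairs of allele strings, e.g. ("A", "G").
            """
            parental_alleles = set(mother_gt) | set(father_gt)
            return [allele for allele in child_gt if allele not in parental_alleles]

        trio = {  # hypothetical variant calls at a single site
            "child": ("A", "G"),
            "mother": ("A", "A"),
            "father": ("A", "A"),
        }
        novel = candidate_de_novo(trio["child"], trio["mother"], trio["father"])
        print(novel)  # ['G'] -> candidate de novo allele, pending quality checks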

  15. Big Data access and infrastructure for modern biology: case studies in data repository utility.

    Science.gov (United States)

    Boles, Nathan C; Stone, Tyler; Bergeron, Charles; Kiehl, Thomas R

    2017-01-01

    Big Data is no longer solely the purview of big organizations with big resources. Today's routine tools and experimental methods can generate large slices of data. For example, high-throughput sequencing can quickly interrogate biological systems for the expression levels of thousands of different RNAs, examine epigenetic marks throughout the genome, and detect differences in the genomes of individuals. Multichannel electrophysiology platforms produce gigabytes of data in just a few minutes of recording. Imaging systems generate videos capturing biological behaviors over the course of days. Thus, any researcher now has access to a veritable wealth of data. However, the ability of any given researcher to utilize that data is limited by her/his own resources and skills for downloading, storing, and analyzing the data. In this paper, we examine the necessary resources required to engage Big Data, survey the state of modern data analysis pipelines, present a few data repository case studies, and touch on current institutions and programs supporting the work that relies on Big Data. © 2016 New York Academy of Sciences.

  16. Entomological Collections in the Age of Big Data.

    Science.gov (United States)

    Short, Andrew Edward Z; Dikow, Torsten; Moreau, Corrie S

    2018-01-07

    With a million described species and more than half a billion preserved specimens, the large scale of insect collections is unequaled by those of any other group. Advances in genomics, collection digitization, and imaging have begun to more fully harness the power that such large data stores can provide. These new approaches and technologies have transformed how entomological collections are managed and utilized. While genomic research has fundamentally changed the way many specimens are collected and curated, advances in technology have shown promise for extracting sequence data from the vast holdings already in museums. Efforts to mainstream specimen digitization have taken root and have accelerated traditional taxonomic studies as well as distribution modeling and global change research. Emerging imaging technologies such as microcomputed tomography and confocal laser scanning microscopy are changing how morphology can be investigated. This review provides an overview of how the realization of big data has transformed our field and what may lie in store.

  17. Big Biomedical data as the key resource for discovery science

    Energy Technology Data Exchange (ETDEWEB)

    Toga, Arthur W.; Foster, Ian; Kesselman, Carl; Madduri, Ravi; Chard, Kyle; Deutsch, Eric W.; Price, Nathan D.; Glusman, Gustavo; Heavner, Benjamin D.; Dinov, Ivo D.; Ames, Joseph; Van Horn, John; Kramer, Roger; Hood, Leroy

    2015-07-21

    Modern biomedical data collection is generating exponentially more data in a multitude of formats. This flood of complex data poses significant opportunities to discover and understand the critical interplay among such diverse domains as genomics, proteomics, metabolomics, and phenomics, including imaging, biometrics, and clinical data. The Big Data for Discovery Science Center is taking an “-ome to home” approach to discover linkages between these disparate data sources by mining existing databases of proteomic and genomic data, brain images, and clinical assessments. In support of this work, the authors developed new technological capabilities that make it easy for researchers to manage, aggregate, manipulate, integrate, and model large amounts of distributed data. Guided by biological domain expertise, the Center’s computational resources and software will reveal relationships and patterns, aiding researchers in identifying biomarkers for the most confounding conditions and diseases, such as Parkinson’s and Alzheimer’s.

  18. Big-Leaf Mahogany on CITES Appendix II: Big Challenge, Big Opportunity

    Science.gov (United States)

    JAMES GROGAN; PAULO BARRETO

    2005-01-01

    On 15 November 2003, big-leaf mahogany (Swietenia macrophylla King, Meliaceae), the most valuable widely traded Neotropical timber tree, gained strengthened regulatory protection from its listing on Appendix II of the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES). CITES is a United Nations-chartered agreement signed by 164...

  19. Big Bang Day : The Great Big Particle Adventure - 3. Origins

    CERN Multimedia

    2008-01-01

    In this series, comedian and physicist Ben Miller asks the CERN scientists what they hope to find. If the LHC is successful, it will explain the nature of the Universe around us in terms of a few simple ingredients and a few simple rules. But the Universe now was forged in a Big Bang where conditions were very different, and the rules were very different, and those early moments were crucial to determining how things turned out later. At the LHC they can recreate conditions as they were billionths of a second after the Big Bang, before atoms and nuclei existed. They can find out why matter and antimatter didn't mutually annihilate each other to leave behind a Universe of pure, brilliant light. And they can look into the very structure of space and time - the fabric of the Universe

  20. Antigravity and the big crunch/big bang transition

    Science.gov (United States)

    Bars, Itzhak; Chen, Shih-Hung; Steinhardt, Paul J.; Turok, Neil

    2012-08-01

    We point out a new phenomenon which seems to be generic in 4d effective theories of scalar fields coupled to Einstein gravity, when applied to cosmology. A lift of such theories to a Weyl-invariant extension allows one to define classical evolution through cosmological singularities unambiguously, and hence construct geodesically complete background spacetimes. An attractor mechanism ensures that, at the level of the effective theory, generic solutions undergo a big crunch/big bang transition by contracting to zero size, passing through a brief antigravity phase, shrinking to zero size again, and re-emerging into an expanding normal gravity phase. The result may be useful for the construction of complete bouncing cosmologies like the cyclic model.

  1. Antigravity and the big crunch/big bang transition

    Energy Technology Data Exchange (ETDEWEB)

    Bars, Itzhak [Department of Physics and Astronomy, University of Southern California, Los Angeles, CA 90089-2535 (United States); Chen, Shih-Hung [Perimeter Institute for Theoretical Physics, Waterloo, ON N2L 2Y5 (Canada); Department of Physics and School of Earth and Space Exploration, Arizona State University, Tempe, AZ 85287-1404 (United States); Steinhardt, Paul J., E-mail: steinh@princeton.edu [Department of Physics and Princeton Center for Theoretical Physics, Princeton University, Princeton, NJ 08544 (United States); Turok, Neil [Perimeter Institute for Theoretical Physics, Waterloo, ON N2L 2Y5 (Canada)

    2012-08-29

    We point out a new phenomenon which seems to be generic in 4d effective theories of scalar fields coupled to Einstein gravity, when applied to cosmology. A lift of such theories to a Weyl-invariant extension allows one to define classical evolution through cosmological singularities unambiguously, and hence construct geodesically complete background spacetimes. An attractor mechanism ensures that, at the level of the effective theory, generic solutions undergo a big crunch/big bang transition by contracting to zero size, passing through a brief antigravity phase, shrinking to zero size again, and re-emerging into an expanding normal gravity phase. The result may be useful for the construction of complete bouncing cosmologies like the cyclic model.

  2. Antigravity and the big crunch/big bang transition

    International Nuclear Information System (INIS)

    Bars, Itzhak; Chen, Shih-Hung; Steinhardt, Paul J.; Turok, Neil

    2012-01-01

    We point out a new phenomenon which seems to be generic in 4d effective theories of scalar fields coupled to Einstein gravity, when applied to cosmology. A lift of such theories to a Weyl-invariant extension allows one to define classical evolution through cosmological singularities unambiguously, and hence construct geodesically complete background spacetimes. An attractor mechanism ensures that, at the level of the effective theory, generic solutions undergo a big crunch/big bang transition by contracting to zero size, passing through a brief antigravity phase, shrinking to zero size again, and re-emerging into an expanding normal gravity phase. The result may be useful for the construction of complete bouncing cosmologies like the cyclic model.

  3. Solution of a braneworld big crunch/big bang cosmology

    International Nuclear Information System (INIS)

    McFadden, Paul L.; Turok, Neil; Steinhardt, Paul J.

    2007-01-01

    We solve for the cosmological perturbations in a five-dimensional background consisting of two separating or colliding boundary branes, as an expansion in the collision speed V divided by the speed of light c. Our solution permits a detailed check of the validity of four-dimensional effective theory in the vicinity of the event corresponding to the big crunch/big bang singularity. We show that the four-dimensional description fails at the first nontrivial order in (V/c)^2. At this order, there is nontrivial mixing of the two relevant four-dimensional perturbation modes (the growing and decaying modes) as the boundary branes move from the narrowly separated limit described by Kaluza-Klein theory to the well-separated limit where gravity is confined to the positive-tension brane. We comment on the cosmological significance of the result and compute other quantities of interest in five-dimensional cosmological scenarios

  4. The ethics of big data in big agriculture

    Directory of Open Access Journals (Sweden)

    Isabelle M. Carbonell

    2016-03-01

    Full Text Available This paper examines the ethics of big data in agriculture, focusing on the power asymmetry between farmers and large agribusinesses like Monsanto. Following the recent purchase of Climate Corp., Monsanto is currently the most prominent biotech agribusiness to buy into big data. With wireless sensors on tractors monitoring or dictating every decision a farmer makes, Monsanto can now aggregate large quantities of previously proprietary farming data, enabling a privileged position with unique insights on a field-by-field basis into a third or more of the US farmland. This power asymmetry may be rebalanced through open-sourced data, and publicly-funded data analytic tools which rival Climate Corp. in complexity and innovation for use in the public domain.

  5. Coal export facilitation

    International Nuclear Information System (INIS)

    Eeles, L.

    1998-01-01

    There is a wide range of trade barriers, particularly tariffs, in current and potential coal markets. Commonwealth departments in Australia play a crucial role in supporting government industry policies. This article summarises some of the more recent activities of the Department of Primary Industries and Energy (DPIE) in facilitating the export of Australian coals. Coal export facilitation activities are designed to assist the Australian coal industry by directing Commonwealth Government resources towards issues which would be inappropriate or difficult for the industry to address itself

  6. Big Data – Big Deal for Organization Design?

    OpenAIRE

    Janne J. Korhonen

    2014-01-01

    Analytics is an increasingly important source of competitive advantage. It has even been posited that big data will be the next strategic emphasis of organizations and that analytics capability will be manifested in organizational structure. In this article, I explore how analytics capability might be reflected in organizational structure using the notion of  “requisite organization” developed by Jaques (1998). Requisite organization argues that a new strategic emphasis requires the addition ...

  7. Nowcasting using news topics Big Data versus big bank

    OpenAIRE

    Thorsrud, Leif Anders

    2016-01-01

    The agents in the economy use a plethora of high frequency information, including news media, to guide their actions and thereby shape aggregate economic fluctuations. Traditional nowcasting approaches have made relatively little use of such information. In this paper, I show how unstructured textual information in a business newspaper can be decomposed into daily news topics and used to nowcast quarterly GDP growth. Compared with a big bank of experts, here represented by official c...

  8. Developing a Big Game for Financial Education Using Service Design Approach

    Science.gov (United States)

    Kang, Myunghee; Yoon, Seonghye; Kang, Minjeng; Jang, JeeEun; Lee, Yujung

    2018-01-01

    The purpose of this study was to design and develop an educational game which facilitates building adolescents' knowledge and attitudes in financial principles of a daily life. To achieve this purpose, the authors designed a learner-centered big game for financial education by applying an experience-based triple-diamond instructional design model…

  9. Neuroblastoma, a Paradigm for Big Data Science in Pediatric Oncology.

    Science.gov (United States)

    Salazar, Brittany M; Balczewski, Emily A; Ung, Choong Yong; Zhu, Shizhen

    2016-12-27

    Pediatric cancers rarely exhibit recurrent mutational events when compared to most adult cancers. This poses a challenge in understanding how cancers initiate, progress, and metastasize in early childhood. Also, due to limited detected driver mutations, it is difficult to benchmark key genes for drug development. In this review, we use neuroblastoma, a pediatric solid tumor of neural crest origin, as a paradigm for exploring "big data" applications in pediatric oncology. Computational strategies derived from big data science (network- and machine learning-based modeling and drug repositioning) hold the promise of shedding new light on the molecular mechanisms driving neuroblastoma pathogenesis and identifying potential therapeutics to combat this devastating disease. These strategies integrate robust data input, from genomic and transcriptomic studies, clinical data, and in vivo and in vitro experimental models specific to neuroblastoma and other types of cancers that closely mimic its biological characteristics. We discuss contexts in which "big data" and computational approaches, especially network-based modeling, may advance neuroblastoma research, describe currently available data and resources, and propose future models of strategic data collection and analyses for neuroblastoma and other related diseases.

  10. Neuroblastoma, a Paradigm for Big Data Science in Pediatric Oncology

    Directory of Open Access Journals (Sweden)

    Brittany M. Salazar

    2016-12-01

    Full Text Available Pediatric cancers rarely exhibit recurrent mutational events when compared to most adult cancers. This poses a challenge in understanding how cancers initiate, progress, and metastasize in early childhood. Also, due to limited detected driver mutations, it is difficult to benchmark key genes for drug development. In this review, we use neuroblastoma, a pediatric solid tumor of neural crest origin, as a paradigm for exploring “big data” applications in pediatric oncology. Computational strategies derived from big data science–network- and machine learning-based modeling and drug repositioning—hold the promise of shedding new light on the molecular mechanisms driving neuroblastoma pathogenesis and identifying potential therapeutics to combat this devastating disease. These strategies integrate robust data input, from genomic and transcriptomic studies, clinical data, and in vivo and in vitro experimental models specific to neuroblastoma and other types of cancers that closely mimic its biological characteristics. We discuss contexts in which “big data” and computational approaches, especially network-based modeling, may advance neuroblastoma research, describe currently available data and resources, and propose future models of strategic data collection and analyses for neuroblastoma and other related diseases.

  11. Big data is not a monolith

    CERN Document Server

    Ekbia, Hamid R; Mattioli, Michael

    2016-01-01

    Big data is ubiquitous but heterogeneous. Big data can be used to tally clicks and traffic on web pages, find patterns in stock trades, track consumer preferences, identify linguistic correlations in large corpuses of texts. This book examines big data not as an undifferentiated whole but contextually, investigating the varied challenges posed by big data for health, science, law, commerce, and politics. Taken together, the chapters reveal a complex set of problems, practices, and policies. The advent of big data methodologies has challenged the theory-driven approach to scientific knowledge in favor of a data-driven one. Social media platforms and self-tracking tools change the way we see ourselves and others. The collection of data by corporations and government threatens privacy while promoting transparency. Meanwhile, politicians, policy makers, and ethicists are ill-prepared to deal with big data's ramifications. The contributors look at big data's effect on individuals as it exerts social control throu...

  12. Asexual sporulation facilitates adaptation

    NARCIS (Netherlands)

    Zhang, Jianhua; Debets, A.J.M.; Verweij, P.E.; Melchers, W.J.G.; Zwaan, B.J.; Schoustra, S.E.

    2015-01-01

    Understanding the occurrence and spread of azole resistance in Aspergillus fumigatus is crucial for public health. It has been hypothesized that asexual sporulation, which is abundant in nature, is essential for phenotypic expression of azole resistance mutations in A. fumigatus facilitating

  13. Facilitators in Ambivalence

    Science.gov (United States)

    Karlsson, Mikael R.; Erlandson, Peter

    2018-01-01

    This is part of a larger ethnographical study concerning how school development in a local educational context sets cultural and social life in motion. The main data "in this article" consists of semi-structured interviews with teachers (facilitators) who have the responsibility of carrying out a project about formative assessment in…

  14. Facilitation of Adult Development

    Science.gov (United States)

    Boydell, Tom

    2016-01-01

    Taking an autobiographical approach, I tell the story of my experiences facilitating adult development, in a polytechnic and as a management consultant. I relate these to a developmental framework of Modes of Being and Learning that I created and elaborated with colleagues. I connect this picture with a number of related models, theories,…

  15. From Teaching to Facilitation

    DEFF Research Database (Denmark)

    de Graaff, Erik

    2013-01-01

    A shift from teaching to learning is characteristic of the introduction of Problem Based Learning (PBL) in an existing school. As a consequence the teaching staff has to be trained in skills like facilitating group work and writing cases. Most importantly a change in thinking about teaching...

  16. Trade Facilitation in Ethiopia:

    African Journals Online (AJOL)

    Tilahun_EK

    In so doing, it attempts to examine how Ethiopia's WTO Accession and trade facilitation ... the more expensive imports, exports and production becomes, rendering Ethiopian ... can reserve the right to refuse requests of importers for the fifth valuation method to ... units may find it easier to deal with post clearance audit. In the ...

  17. Big Data for Precision Medicine

    Directory of Open Access Journals (Sweden)

    Daniel Richard Leff

    2015-09-01

    Full Text Available This article focuses on the potential impact of big data analysis to improve health, prevent and detect disease at an earlier stage, and personalize interventions. The role that big data analytics may have in interrogating the patient electronic health record toward improved clinical decision support is discussed. We examine developments in pharmacogenetics that have increased our appreciation of the reasons why patients respond differently to chemotherapy. We also assess the expansion of online health communications and the way in which this data may be capitalized on in order to detect public health threats and control or contain epidemics. Finally, we describe how a new generation of wearable and implantable body sensors may improve wellbeing, streamline management of chronic diseases, and improve the quality of surgical implants.

  18. Big Data hvor N=1

    DEFF Research Database (Denmark)

    Bardram, Jakob Eyvind

    2017-01-01

    Research on the use of 'big data' in health care has only just begun and may, in time, become a great help in organizing more personalized and holistic health care for people with multiple chronic conditions. Personal health technology, briefly presented in this chapter, holds great potential for carrying out 'big data' analyses for the individual person, that is, where N=1. There are major technological challenges in building the technologies and methods needed to collect and handle personal data that can be shared across settings in a standardized, responsible, robust, secure and not...

  19. George and the big bang

    CERN Document Server

    Hawking, Lucy; Parsons, Gary

    2012-01-01

    George has problems. He has twin baby sisters at home who demand his parents’ attention. His beloved pig Freddy has been exiled to a farm, where he’s miserable. And worst of all, his best friend, Annie, has made a new friend whom she seems to like more than George. So George jumps at the chance to help Eric with his plans to run a big experiment in Switzerland that seeks to explore the earliest moment of the universe. But there is a conspiracy afoot, and a group of evildoers is planning to sabotage the experiment. Can George repair his friendship with Annie and piece together the clues before Eric’s experiment is destroyed forever? This engaging adventure features essays by Professor Stephen Hawking and other eminent physicists about the origins of the universe and ends with a twenty-page graphic novel that explains how the Big Bang happened—in reverse!

  20. Did the Big Bang begin?

    International Nuclear Information System (INIS)

    Levy-Leblond, J.

    1990-01-01

    It is argued that the age of the universe may well be numerically finite (20 billion years or so) and conceptually infinite. A new and natural time scale is defined on a physical basis using group-theoretical arguments. An additive notion of time is obtained according to which the age of the universe is indeed infinite. In other words, never did the Big Bang begin. This new time scale is not supposed to replace the ordinary cosmic time scale, but to supplement it (in the same way as rapidity has taken a place by the side of velocity in Einsteinian relativity). The question is discussed within the framework of conventional (big-bang) and classical (nonquantum) cosmology, but could easily be extended to more elaborate views, as the purpose is not so much to modify present theories as to reach a deeper understanding of their meaning

  1. Big Data in Drug Discovery.

    Science.gov (United States)

    Brown, Nathan; Cambruzzi, Jean; Cox, Peter J; Davies, Mark; Dunbar, James; Plumbley, Dean; Sellwood, Matthew A; Sim, Aaron; Williams-Jones, Bryn I; Zwierzyna, Magdalena; Sheppard, David W

    2018-01-01

    Interpretation of Big Data in the drug discovery community should enhance project timelines and reduce clinical attrition through improved early decision making. The issues we encounter start with the sheer volume of data: how we first ingest it, and how we build an infrastructure to house it so that the data can be used efficiently and productively. There are many problems associated with the data itself, including general reproducibility, but often it is the context surrounding an experiment that is critical to success. Help, in the form of artificial intelligence (AI), is required to understand and translate the context. On the back of natural language processing pipelines, AI is also used to prospectively generate new hypotheses by linking data together. We explain Big Data in the context of biology, chemistry and clinical trials, showcasing some of the impressive public domain sources and initiatives now available for interrogation. © 2018 Elsevier B.V. All rights reserved.

  2. Re-evaluation of the immunological Big Bang.

    Science.gov (United States)

    Flajnik, Martin F

    2014-11-03

    Classically the immunological 'Big Bang' of adaptive immunity was believed to have resulted from the insertion of a transposon into an immunoglobulin superfamily gene member, initiating antigen receptor gene rearrangement via the RAG recombinase in an ancestor of jawed vertebrates. However, the discovery of a second, convergent adaptive immune system in jawless fish, focused on the so-called variable lymphocyte receptors (VLRs), was arguably the most exciting finding of the past decade in immunology and has drastically changed the view of immune origins. The recent report of a new lymphocyte lineage in lampreys, defined by the antigen receptor VLRC, suggests that there were three lymphocyte lineages in the common ancestor of jawless and jawed vertebrates that co-opted different antigen receptor supertypes. The transcriptional control of these lineages during development is predicted to be remarkably similar in both the jawless (agnathan) and jawed (gnathostome) vertebrates, suggesting that an early 'division of labor' among lymphocytes was a driving force in the emergence of adaptive immunity. The recent cartilaginous fish genome project suggests that most effector cytokines and chemokines were also present in these fish, and further studies of the lamprey and hagfish genomes will determine just how explosive the Big Bang actually was. Copyright © 2014 Elsevier Ltd. All rights reserved.

  3. Big Data and central banks

    OpenAIRE

    David Bholat

    2015-01-01

    This commentary recaps a Centre for Central Banking Studies event held at the Bank of England on 2–3 July 2014. The article covers three main points. First, it situates the Centre for Central Banking Studies event within the context of the Bank’s Strategic Plan and initiatives. Second, it summarises and reflects on major themes from the event. Third, the article links central banks’ emerging interest in Big Data approaches with their broader uptake by other economic agents.

  4. Big Bang or vacuum fluctuation

    International Nuclear Information System (INIS)

    Zel'dovich, Ya.B.

    1980-01-01

    Some general properties of vacuum fluctuations in quantum field theory are described. The connection between the ''energy dominance'' of the energy density of vacuum fluctuations in curved space-time and the presence of singularity is discussed. It is pointed out that a de-Sitter space-time (with the energy density of the vacuum fluctuations in the Einstein equations) that matches the expanding Friedman solution may describe the history of the Universe before the Big Bang. (P.L.)

  5. Big bang is not needed

    Energy Technology Data Exchange (ETDEWEB)

    Allen, A.D.

    1976-02-01

    Recent computer simulations indicate that a system of n gravitating masses breaks up, even when the total energy is negative. As a result, almost any initial phase-space distribution results in a universe that eventually expands under the Hubble law. Hence Hubble expansion implies little regarding an initial cosmic state. Especially it does not imply the singularly dense superpositioned state used in the big bang model.

  6. Big data: the management revolution.

    Science.gov (United States)

    McAfee, Andrew; Brynjolfsson, Erik

    2012-10-01

    Big data, the authors write, is far more powerful than the analytics of the past. Executives can measure and therefore manage more precisely than ever before. They can make better predictions and smarter decisions. They can target more-effective interventions in areas that so far have been dominated by gut and intuition rather than by data and rigor. The differences between big data and analytics are a matter of volume, velocity, and variety: More data now cross the internet every second than were stored in the entire internet 20 years ago. Nearly real-time information makes it possible for a company to be much more agile than its competitors. And that information can come from social networks, images, sensors, the web, or other unstructured sources. The managerial challenges, however, are very real. Senior decision makers have to learn to ask the right questions and embrace evidence-based decision making. Organizations must hire scientists who can find patterns in very large data sets and translate them into useful business information. IT departments have to work hard to integrate all the relevant internal and external sources of data. The authors offer two success stories to illustrate how companies are using big data: PASSUR Aerospace enables airlines to match their actual and estimated arrival times. Sears Holdings directly analyzes its incoming store data to make promotions much more precise and faster.

  7. Big Data Comes to School

    Directory of Open Access Journals (Sweden)

    Bill Cope

    2016-03-01

    Full Text Available The prospect of “big data” at once evokes optimistic views of an information-rich future and concerns about surveillance that adversely impacts our personal and private lives. This overview article explores the implications of big data in education, focusing by way of example on data generated by student writing. We have chosen writing because it presents particular complexities, highlighting the range of processes for collecting and interpreting evidence of learning in the era of computer-mediated instruction and assessment as well as the challenges. Writing is significant not only because it is central to the core subject area of literacy; it is also an ideal medium for the representation of deep disciplinary knowledge across a number of subject areas. After defining what big data entails in education, we map emerging sources of evidence of learning that separately and together have the potential to generate unprecedented amounts of data: machine assessments, structured data embedded in learning, and unstructured data collected incidental to learning activity. Our case is that these emerging sources of evidence of learning have significant implications for the traditional relationships between assessment and instruction. Moreover, for educational researchers, these data are in some senses quite different from traditional evidentiary sources, and this raises a number of methodological questions. The final part of the article discusses implications for practice in an emerging field of education data science, including publication of data, data standards, and research ethics.

  8. Application of Genomic Tools in Plant Breeding

    OpenAIRE

    Pérez-de-Castro, A.M.; Vilanova, S.; Cañizares, J.; Pascual, L.; Blanca, J.M.; Díez, M.J.; Prohens, J.; Picó, B.

    2012-01-01

    Plant breeding has been very successful in developing improved varieties using conventional tools and methodologies. Nowadays, the availability of genomic tools and resources is leading to a new revolution of plant breeding, as they facilitate the study of the genotype and its relationship with the phenotype, in particular for complex traits. Next Generation Sequencing (NGS) technologies are allowing the mass sequencing of genomes and transcriptomes, which is producing a vast array of genomic...

  9. An Effective Big Data Supervised Imbalanced Classification Approach for Ortholog Detection in Related Yeast Species

    Directory of Open Access Journals (Sweden)

    Deborah Galpert

    2015-01-01

    Full Text Available Orthology detection requires more effective scaling algorithms. In this paper, a set of gene pair features based on similarity measures (alignment scores, sequence length, gene membership to conserved regions, and physicochemical profiles) are combined in a supervised pairwise ortholog detection approach to improve effectiveness considering low ortholog ratios in relation to the possible pairwise comparison between two genomes. In this scenario, big data supervised classifiers managing imbalance between ortholog and nonortholog pair classes allow for an effective scaling solution built from two genomes and extended to other genome pairs. The supervised approach was compared with RBH, RSD, and OMA algorithms by using the following yeast genome pairs: Saccharomyces cerevisiae-Kluyveromyces lactis, Saccharomyces cerevisiae-Candida glabrata, and Saccharomyces cerevisiae-Schizosaccharomyces pombe as benchmark datasets. Because of the large amount of imbalanced data, the building and testing of the supervised model were only possible by using big data supervised classifiers managing imbalance. Evaluation metrics taking low ortholog ratios into account were applied. From the effectiveness perspective, MapReduce Random Oversampling combined with Spark SVM outperformed RBH, RSD, and OMA, probably because of the consideration of gene pair features beyond alignment similarities combined with the advances in big data supervised classification.
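
    The approach above pairs random oversampling of the rare ortholog class with an SVM trained in Spark. The sketch below illustrates that general pattern in PySpark; the column names, toy feature values and oversampling ratio are placeholders, and this is not the authors' pipeline, only a minimal example of balancing classes before fitting a linear SVM.

        from pyspark.sql import SparkSession
        from pyspark.ml.feature import VectorAssembler
        from pyspark.ml.classification import LinearSVC

        spark = SparkSession.builder.appName("ortholog-imbalance-sketch").getOrCreate()

        # Hypothetical gene-pair features: alignment score, length ratio, conserved-region flag.
        df = spark.createDataFrame(
            [(310.0, 0.95, 1.0, 1.0), (25.0, 0.40, 0.0, 0.0), (18.0, 0.35, 0.0, 0.0)],
            ["aln_score", "len_ratio", "conserved", "label"],  # label 1.0 = ortholog
        )

        # Random oversampling of the minority (ortholog) class to balance the training set.
        majority = df.filter(df.label == 0.0)
        minority = df.filter(df.label == 1.0)
        ratio = majority.count() / max(minority.count(), 1)
        balanced = majority.union(
            minority.sample(withReplacement=True, fraction=float(ratio), seed=42)
        )

        # Assemble features and fit a linear SVM on the balanced data.
        assembler = VectorAssembler(
            inputCols=["aln_score", "len_ratio", "conserved"], outputCol="features"
        )
        train = assembler.transform(balanced)
        model = LinearSVC(featuresCol="features", labelCol="label", maxIter=50).fit(train)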

  10. Big defensins, a diverse family of antimicrobial peptides that follows different patterns of expression in hemocytes of the oyster Crassostrea gigas.

    Science.gov (United States)

    Rosa, Rafael D; Santini, Adrien; Fievet, Julie; Bulet, Philippe; Destoumieux-Garzón, Delphine; Bachère, Evelyne

    2011-01-01

    Big defensin is an antimicrobial peptide composed of a highly hydrophobic N-terminal region and a cationic C-terminal region containing six cysteine residues involved in three internal disulfide bridges. While big defensin sequences have been reported in various mollusk species, few studies have been devoted to their sequence diversity, gene organization and their expression in response to microbial infections. Using the high-throughput Digital Gene Expression approach, we have identified in Crassostrea gigas oysters several sequences coding for big defensins induced in response to a Vibrio infection. We showed that the oyster big defensin family is composed of three members (named Cg-BigDef1, Cg-BigDef2 and Cg-BigDef3) that are encoded by distinct genomic sequences. All Cg-BigDefs contain a hydrophobic N-terminal domain and a cationic C-terminal domain that resembles vertebrate β-defensins. Both domains are encoded by separate exons. We found that big defensins form a group predominantly present in mollusks and closer to vertebrate defensins than to invertebrate and fungi CSαβ-containing defensins. Moreover, we showed that Cg-BigDefs are expressed in oyster hemocytes only and follow different patterns of gene expression. While Cg-BigDef3 is non-regulated, both Cg-BigDef1 and Cg-BigDef2 transcripts are strongly induced in response to bacterial challenge. Induction was dependent on pathogen associated molecular patterns but not damage-dependent. The inducibility of Cg-BigDef1 was confirmed by HPLC and mass spectrometry, since ions with a molecular mass compatible with mature Cg-BigDef1 (10.7 kDa) were present in immune-challenged oysters only. From our biochemical data, native Cg-BigDef1 would result from the elimination of a prepropeptide sequence and the cyclization of the resulting N-terminal glutamine residue into a pyroglutamic acid. We provide here the first report showing that big defensins form a family of antimicrobial peptides diverse not only in terms

  11. Big Defensins, a Diverse Family of Antimicrobial Peptides That Follows Different Patterns of Expression in Hemocytes of the Oyster Crassostrea gigas

    Science.gov (United States)

    Rosa, Rafael D.; Santini, Adrien; Fievet, Julie; Bulet, Philippe; Destoumieux-Garzón, Delphine; Bachère, Evelyne

    2011-01-01

    Background Big defensin is an antimicrobial peptide composed of a highly hydrophobic N-terminal region and a cationic C-terminal region containing six cysteine residues involved in three internal disulfide bridges. While big defensin sequences have been reported in various mollusk species, few studies have been devoted to their sequence diversity, gene organization and their expression in response to microbial infections. Findings Using the high-throughput Digital Gene Expression approach, we have identified in Crassostrea gigas oysters several sequences coding for big defensins induced in response to a Vibrio infection. We showed that the oyster big defensin family is composed of three members (named Cg-BigDef1, Cg-BigDef2 and Cg-BigDef3) that are encoded by distinct genomic sequences. All Cg-BigDefs contain a hydrophobic N-terminal domain and a cationic C-terminal domain that resembles vertebrate β-defensins. Both domains are encoded by separate exons. We found that big defensins form a group predominantly present in mollusks and closer to vertebrate defensins than to invertebrate and fungi CSαβ-containing defensins. Moreover, we showed that Cg-BigDefs are expressed in oyster hemocytes only and follow different patterns of gene expression. While Cg-BigDef3 is non-regulated, both Cg-BigDef1 and Cg-BigDef2 transcripts are strongly induced in response to bacterial challenge. Induction was dependent on pathogen associated molecular patterns but not damage-dependent. The inducibility of Cg-BigDef1 was confirmed by HPLC and mass spectrometry, since ions with a molecular mass compatible with mature Cg-BigDef1 (10.7 kDa) were present in immune-challenged oysters only. From our biochemical data, native Cg-BigDef1 would result from the elimination of a prepropeptide sequence and the cyclization of the resulting N-terminal glutamine residue into a pyroglutamic acid. Conclusions We provide here the first report showing that big defensins form a family of antimicrobial

  12. Big defensins, a diverse family of antimicrobial peptides that follows different patterns of expression in hemocytes of the oyster Crassostrea gigas.

    Directory of Open Access Journals (Sweden)

    Rafael D Rosa

    Full Text Available BACKGROUND: Big defensin is an antimicrobial peptide composed of a highly hydrophobic N-terminal region and a cationic C-terminal region containing six cysteine residues involved in three internal disulfide bridges. While big defensin sequences have been reported in various mollusk species, few studies have been devoted to their sequence diversity, gene organization and their expression in response to microbial infections. FINDINGS: Using the high-throughput Digital Gene Expression approach, we have identified in Crassostrea gigas oysters several sequences coding for big defensins induced in response to a Vibrio infection. We showed that the oyster big defensin family is composed of three members (named Cg-BigDef1, Cg-BigDef2 and Cg-BigDef3) that are encoded by distinct genomic sequences. All Cg-BigDefs contain a hydrophobic N-terminal domain and a cationic C-terminal domain that resembles vertebrate β-defensins. Both domains are encoded by separate exons. We found that big defensins form a group predominantly present in mollusks and closer to vertebrate defensins than to invertebrate and fungi CSαβ-containing defensins. Moreover, we showed that Cg-BigDefs are expressed in oyster hemocytes only and follow different patterns of gene expression. While Cg-BigDef3 is non-regulated, both Cg-BigDef1 and Cg-BigDef2 transcripts are strongly induced in response to bacterial challenge. Induction was dependent on pathogen associated molecular patterns but not damage-dependent. The inducibility of Cg-BigDef1 was confirmed by HPLC and mass spectrometry, since ions with a molecular mass compatible with mature Cg-BigDef1 (10.7 kDa) were present in immune-challenged oysters only. From our biochemical data, native Cg-BigDef1 would result from the elimination of a prepropeptide sequence and the cyclization of the resulting N-terminal glutamine residue into a pyroglutamic acid. CONCLUSIONS: We provide here the first report showing that big defensins form a family

  13. Big Pharma: a former insider's view.

    Science.gov (United States)

    Badcott, David

    2013-05-01

    There is no lack of criticisms frequently levelled against the international pharmaceutical industry (Big Pharma): excessive profits, dubious or even dishonest practices, exploiting the sick and selective use of research data. Neither is there a shortage of examples used to support such opinions. A recent book by Brody (Hooked: Ethics, the Medical Profession and the Pharmaceutical Industry, 2008) provides a précis of the main areas of criticism, adopting a twofold strategy: (1) An assumption that the special nature and human need for pharmaceutical medicines requires that such products should not be treated like other commodities and (2) A multilevel descriptive approach that facilitates an ethical analysis of relationships and practices. At the same time, Brody is fully aware of the nature of the fundamental dilemma: the apparent addiction to (and denial of) the widespread availability of gifts and financial support for conferences etc., but recognises that 'Remove the industry and its products, and a considerable portion of scientific medicine's power to help the patient vanishes' (Brody 2008, p. 5). The paper explores some of the relevant issues, and argues that despite the identified shortcomings and a need for rigorous and perhaps enhanced regulation, and realistic price control, the commercially competitive pharmaceutical industry remains the best option for developing safer and more effective medicinal treatments. At the same time, adoption of a broader ethical basis for the industry's activities, such as a triple bottom line policy, would register an important move in the right direction and go some way toward answering critics.

  14. ClimateSpark: An in-memory distributed computing framework for big climate data analytics

    Science.gov (United States)

    Hu, Fei; Yang, Chaowei; Schnase, John L.; Duffy, Daniel Q.; Xu, Mengchao; Bowen, Michael K.; Lee, Tsengdar; Song, Weiwei

    2018-06-01

    The unprecedented growth of climate data creates new opportunities for climate studies, and yet big climate data pose a grand challenge to climatologists to efficiently manage and analyze big data. The complexity of climate data content and analytical algorithms increases the difficulty of implementing algorithms on high performance computing systems. This paper proposes an in-memory, distributed computing framework, ClimateSpark, to facilitate complex big data analytics and time-consuming computational tasks. A chunked data structure improves parallel I/O efficiency, while a spatiotemporal index is built for the chunks to avoid unnecessary data reading and preprocessing. An integrated, multi-dimensional, array-based data model (ClimateRDD) and ETL operations are developed to address big climate data variety by integrating the processing components of the climate data lifecycle. ClimateSpark utilizes Spark SQL and Apache Zeppelin to develop a web portal to facilitate the interaction among climatologists, climate data, analytic operations and computing resources (e.g., using SQL query and Scala/Python notebook). Experimental results show that ClimateSpark conducts different spatiotemporal data queries/analytics with high efficiency and data locality. ClimateSpark is easily adaptable to other big multiple-dimensional, array-based datasets in various geoscience domains.
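
    The combination of chunked storage and a spatiotemporal index described above can be illustrated with a very small, generic sketch: an in-memory index mapping space-time tiles to chunk identifiers, so that a query reads only the chunks it needs. Tile sizes and chunk names below are invented for the example and do not reflect ClimateSpark's actual data model.

        from collections import defaultdict

        class SpatioTemporalIndex:
            """Map (time, lat, lon) tiles to the data chunks that cover them."""

            def __init__(self, tile_deg=10.0, tile_hours=24):
                self.tile_deg = tile_deg
                self.tile_hours = tile_hours
                self._tiles = defaultdict(set)

            def _tile(self, t_hours, lat, lon):
                return (int(t_hours // self.tile_hours),
                        int(lat // self.tile_deg),
                        int(lon // self.tile_deg))

            def register_chunk(self, chunk_id, t_hours, lat, lon):
                self._tiles[self._tile(t_hours, lat, lon)].add(chunk_id)

            def chunks_for(self, t_hours, lat, lon):
                """Return only the chunks relevant to a point query, skipping the rest."""
                return self._tiles.get(self._tile(t_hours, lat, lon), set())

        index = SpatioTemporalIndex()
        index.register_chunk("chunk-000", t_hours=12, lat=45.0, lon=-100.0)
        index.register_chunk("chunk-041", t_hours=36, lat=45.0, lon=-100.0)
        print(index.chunks_for(t_hours=10, lat=47.0, lon=-95.0))  # {'chunk-000'}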

  15. Laser facilitates vaccination

    Directory of Open Access Journals (Sweden)

    Ji Wang

    2016-01-01

    Full Text Available Development of novel vaccine deliveries and vaccine adjuvants is of great importance to address the dilemma that the vaccine field faces: to improve vaccine efficacy without compromising safety. Harnessing the specific effects of laser on biological systems, a number of novel concepts have been proposed and proved in recent years to facilitate vaccination in a safer and more efficient way. The key advantage of using laser technology in vaccine delivery and adjuvantation is that all processes are initiated by physical effects with no foreign chemicals administered into the body. Here, we review the recent advances in using laser technology to facilitate vaccine delivery and augment vaccine efficacy as well as the underlying mechanisms.

  16. Facilitating Learning at Conferences

    DEFF Research Database (Denmark)

    Ravn, Ib; Elsborg, Steen

    2011-01-01

    The typical conference consists of a series of PowerPoint presentations that tend to render participants passive. Students of learning have long abandoned the transfer model that underlies such one-way communication. We propose an alternative theory of conferences that sees them as a forum for learning, mutual inspiration and human flourishing. We offer five design principles that specify how conferences may engage participants more and hence increase their learning. In the research-and-development effort reported here, our team collaborated with conference organizers in Denmark to introduce and facilitate a variety of simple learning techniques at thirty one- and two-day conferences of up to 300 participants each. We present ten of these techniques and data evaluating them. We conclude that if conference organizers allocate a fraction of the total conference time to facilitated processes...

  17. Mindfulness for group facilitation

    DEFF Research Database (Denmark)

    Adriansen, Hanne Kirstine; Krohn, Simon

    2014-01-01

    In this paper, we argue that mindfulness techniques can be used for enhancing the outcome of group performance. The word mindfulness has different connotations in the academic literature. Broadly speaking there is ‘mindfulness without meditation’ or ‘Western’ mindfulness, which involves active thinking, and ‘Eastern’ mindfulness, which refers to an open, accepting state of mind, as intended with Buddhist-inspired techniques such as meditation. In this paper, we are interested in the latter type of mindfulness and demonstrate how Eastern mindfulness techniques can be used as a tool for facilitation. A brief introduction to the physiology and philosophy of Eastern mindfulness constitutes the basis for the arguments of the effect of mindfulness techniques. The use of mindfulness techniques for group facilitation is novel as it changes the focus from individuals’ mindfulness practice...

  18. [Big data, medical language and biomedical terminology systems].

    Science.gov (United States)

    Schulz, Stefan; López-García, Pablo

    2015-08-01

    A variety of rich terminology systems, such as thesauri, classifications, nomenclatures and ontologies support information and knowledge processing in health care and biomedical research. Nevertheless, human language, manifested as individually written texts, persists as the primary carrier of information, in the description of disease courses or treatment episodes in electronic medical records, and in the description of biomedical research in scientific publications. In the context of the discussion about big data in biomedicine, we hypothesize that the abstraction of the individuality of natural language utterances into structured and semantically normalized information facilitates the use of statistical data analytics to distil new knowledge out of textual data from biomedical research and clinical routine. Computerized human language technologies are constantly evolving and are increasingly ready to annotate narratives with codes from biomedical terminology. However, this depends heavily on linguistic and terminological resources. The creation and maintenance of such resources is labor-intensive. Nevertheless, it is sensible to assume that big data methods can be used to support this process. Examples include the learning of hierarchical relationships, the grouping of synonymous terms into concepts and the disambiguation of homonyms. Although clear evidence is still lacking, the combination of natural language technologies, semantic resources, and big data analytics is promising.
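
    To illustrate the abstraction step described above — mapping individually worded narratives onto semantically normalized concept codes — the toy annotator below groups synonymous surface terms under a shared concept identifier. The mini-terminology is invented for illustration (the codes are UMLS-style identifiers used only as examples); real pipelines rely on resources such as SNOMED CT or MeSH and on far more sophisticated language technology.

        # Toy annotator: map synonymous surface terms in clinical text to concept codes.
        # The mini-terminology is illustrative only, not a real terminology resource.
        import re

        terminology = {
            "heart attack": "C0027051",
            "myocardial infarction": "C0027051",   # synonym grouped under the same concept
            "high blood pressure": "C0020538",
            "hypertension": "C0020538",
        }

        def annotate(text: str):
            """Return (offset, matched term, concept code) for every term found."""
            found = []
            lowered = text.lower()
            for term, code in terminology.items():
                for m in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
                    found.append((m.start(), term, code))
            return sorted(found)

        print(annotate("Patient with hypertension admitted after a myocardial infarction."))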

  19. The genome of Chenopodium quinoa

    NARCIS (Netherlands)

    Jarvis, D.E.; Shwen Ho, Yung; Lightfoot, Damien J.; Schmöckel, Sandra M.; Li, Bo; Borm, T.J.A.; Ohyanagi, Hajime; Mineta, Katsuhiko; Mitchell, Craig T.; Saber, Noha; Kharbatia, Najeh M.; Rupper, Ryan R.; Sharp, Aaron R.; Dally, Nadine; Boughton, Berin A.; Woo, Yong H.; Gao, Ge; Schijlen, E.G.W.M.; Guo, Xiujie; Momin, Afaque A.; Negräo, Sónia; Al-Babili, Salim; Gehring, Christoph; Roessner, Ute; Jung, Christian; Murphy, Kevin; Arold, Stefan T.; Gojobori, Takashi; Linden, van der C.G.; Loo, van E.N.; Jellen, Eric N.; Maughan, Peter J.; Tester, Mark

    2017-01-01

    Chenopodium quinoa (quinoa) is a highly nutritious grain identified as an important crop to improve world food security. Unfortunately, few resources are available to facilitate its genetic improvement. Here we report the assembly of a high-quality, chromosome-scale reference genome sequence for

  20. BIG GEO DATA MANAGEMENT: AN EXPLORATION WITH SOCIAL MEDIA AND TELECOMMUNICATIONS OPEN DATA

    Directory of Open Access Journals (Sweden)

    C. Arias Munoz

    2016-06-01

    Full Text Available The term Big Data has been recently used to define big, highly varied, complex data sets, which are created and updated at a high speed and require faster processing, namely, a reduced time to filter and analyse relevant data. These data are also increasingly becoming Open Data (data that can be freely distributed) made public by the government, agencies, private enterprises, among others. There are at least two issues that can obstruct the availability and use of Open Big Datasets: Firstly, the gathering and geoprocessing of these datasets are very computationally intensive; hence, it is necessary to integrate high-performance solutions, preferably internet based, to achieve the goals. Secondly, the problems of heterogeneity and inconsistency in geospatial data are well known and affect the data integration process, but are particularly problematic for Big Geo Data. Therefore, Big Geo Data integration will be one of the most challenging issues to solve. With these applications, we demonstrate that it is possible to provide processed Big Geo Data to common users, using open geospatial standards and technologies. NoSQL databases like MongoDB and frameworks like RASDAMAN could offer different functionalities that facilitate working with larger volumes and more heterogeneous geospatial data sources.
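
    As one concrete illustration of the NoSQL functionality mentioned above, the sketch below stores point-based social-media records in MongoDB and runs a spherical proximity query via a 2dsphere index. The connection string, database and collection names are hypothetical, and a running MongoDB instance is assumed; the geospatial operators ($near, $geometry, $maxDistance) are standard MongoDB syntax.

        # Sketch: store and query point-based records in MongoDB (assumes a local server).
        from pymongo import MongoClient

        client = MongoClient("mongodb://localhost:27017")
        tweets = client["bigGeoData"]["tweets"]          # hypothetical database/collection

        # GeoJSON points plus a 2dsphere index enable spherical-geometry queries.
        tweets.insert_one({
            "text": "Expo opening today!",
            "location": {"type": "Point", "coordinates": [9.19, 45.46]},  # lon, lat
        })
        tweets.create_index([("location", "2dsphere")])

        # Find records within 5 km of a point of interest.
        nearby = tweets.find({
            "location": {
                "$near": {
                    "$geometry": {"type": "Point", "coordinates": [9.18, 45.47]},
                    "$maxDistance": 5000,   # metres
                }
            }
        })
        for doc in nearby:
            print(doc["text"])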

  1. Big Geo Data Management: AN Exploration with Social Media and Telecommunications Open Data

    Science.gov (United States)

    Arias Munoz, C.; Brovelli, M. A.; Corti, S.; Zamboni, G.

    2016-06-01

    The term Big Data has been recently used to define big, highly varied, complex data sets, which are created and updated at a high speed and require faster processing, namely, a reduced time to filter and analyse relevant data. These data are also increasingly becoming Open Data (data that can be freely distributed) made public by the government, agencies, private enterprises, among others. There are at least two issues that can obstruct the availability and use of Open Big Datasets: Firstly, the gathering and geoprocessing of these datasets are very computationally intensive; hence, it is necessary to integrate high-performance solutions, preferably internet based, to achieve the goals. Secondly, the problems of heterogeneity and inconsistency in geospatial data are well known and affect the data integration process, but are particularly problematic for Big Geo Data. Therefore, Big Geo Data integration will be one of the most challenging issues to solve. With these applications, we demonstrate that it is possible to provide processed Big Geo Data to common users, using open geospatial standards and technologies. NoSQL databases like MongoDB and frameworks like RASDAMAN could offer different functionalities that facilitate working with larger volumes and more heterogeneous geospatial data sources.

  2. Perspectives on making big data analytics work for oncology.

    Science.gov (United States)

    El Naqa, Issam

    2016-12-01

    Oncology, with its unique combination of clinical, physical, technological, and biological data, provides an ideal case study for applying big data analytics to improve cancer treatment safety and outcomes. An oncology treatment course such as chemoradiotherapy can generate a large pool of information carrying the 5Vs hallmarks of big data. These data comprise a heterogeneous mixture of patient demographics, radiation/chemo dosimetry, multimodality imaging features, and biological markers generated over a treatment period that can span a few days to several weeks. Efforts using commercial and in-house tools are underway to facilitate data aggregation, ontology creation, sharing, visualization and varying analytics in a secure environment. However, open questions related to proper data structure representation and effective analytics tools to support oncology decision-making need to be addressed. It is recognized that oncology data constitute a mix of structured (tabulated) and unstructured (electronic documents) data that need to be processed to facilitate searching and subsequent knowledge discovery from relational or NoSQL databases. In this context, methods based on advanced analytics and image feature extraction for oncology applications will be discussed. On the other hand, the classical p (variables)≫n (samples) inference problem of statistical learning is challenged in the Big data realm, and this is particularly true for oncology applications where p-omics is witnessing exponential growth while the number of cancer incidences has generally plateaued over the past 5 years, leading to a quasi-linear growth in samples per patient. Within the Big data paradigm, this kind of phenomenon may yield undesirable effects such as echo chamber anomalies, Yule-Simpson reversal paradox, or misleading ghost analytics. In this work, we will present these effects as they pertain to oncology and engage small thinking methodologies to counter these effects ranging from
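
    The p ≫ n problem flagged above is commonly tamed with penalized estimation. The sketch below, on synthetic data only (not the oncology datasets discussed in the abstract), fits an L1-penalized (lasso) regression with cross-validation and recovers a sparse set of informative features; it is offered as a generic illustration of the issue, not as the authors' method.

        # Illustrative only: with far more features (p) than samples (n), an unpenalized
        # fit is ill-posed; an L1 (lasso) penalty recovers a sparse signal.
        import numpy as np
        from sklearn.linear_model import LassoCV

        rng = np.random.default_rng(0)
        n, p = 80, 2000                                # n samples, p "omics" features (p >> n)
        X = rng.normal(size=(n, p))
        true_coef = np.zeros(p)
        true_coef[:5] = [2.0, -1.5, 1.0, 0.8, -0.5]    # only 5 features truly matter
        y = X @ true_coef + rng.normal(scale=0.5, size=n)

        model = LassoCV(cv=5).fit(X, y)                # cross-validated penalty strength
        selected = np.flatnonzero(model.coef_)
        print("non-zero coefficients:", selected[:10], "... total:", selected.size)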

  3. Turning big bang into big bounce. I. Classical dynamics

    Science.gov (United States)

    Dzierżak, Piotr; Małkiewicz, Przemysław; Piechocki, Włodzimierz

    2009-11-01

    The big bounce (BB) transition within a flat Friedmann-Robertson-Walker model is analyzed in the setting of loop geometry underlying the loop cosmology. We solve the constraint of the theory at the classical level to identify physical phase space and find the Lie algebra of the Dirac observables. We express energy density of matter and geometrical functions in terms of the observables. It is the modification of classical theory by the loop geometry that is responsible for BB. The classical energy scale specific to BB depends on a parameter that should be fixed either by cosmological data or determined theoretically at quantum level, otherwise the energy scale stays unknown.
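
    For orientation only: in mainstream effective loop quantum cosmology the loop modification is often summarized by the corrected Friedmann equation below, in which the expansion rate vanishes and the bounce occurs when the energy density ρ reaches a critical value ρ_c. This is the generic textbook form, not necessarily the specific parametrization derived in the paper above, where the bounce energy scale is left as a free parameter.

        H^2 = \left(\frac{\dot a}{a}\right)^2
            = \frac{8\pi G}{3}\,\rho\left(1 - \frac{\rho}{\rho_c}\right),
        \qquad \dot a = 0 \ \text{at}\ \rho = \rho_c .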

  4. Big Data Knowledge in Global Health Education.

    Science.gov (United States)

    Olayinka, Olaniyi; Kekeh, Michele; Sheth-Chandra, Manasi; Akpinar-Elci, Muge

    The ability to synthesize and analyze massive amounts of data is critical to the success of organizations, including those that involve global health. As countries become highly interconnected, increasing the risk for pandemics and outbreaks, the demand for big data is likely to increase. This requires a global health workforce that is trained in the effective use of big data. To assess implementation of big data training in global health, we conducted a pilot survey of members of the Consortium of Universities of Global Health. More than half the respondents did not have a big data training program at their institution. Additionally, the majority agreed that big data training programs will improve global health deliverables, among other favorable outcomes. Given the observed gap and benefits, global health educators may consider investing in big data training for students seeking a career in global health. Copyright © 2017 Icahn School of Medicine at Mount Sinai. Published by Elsevier Inc. All rights reserved.

  5. Earth Science Data Analysis in the Era of Big Data

    Science.gov (United States)

    Kuo, K.-S.; Clune, T. L.; Ramachandran, R.

    2014-01-01

    Anyone with even a cursory interest in information technology cannot help but recognize that "Big Data" is one of the most fashionable catchphrases of late. From accurate voice and facial recognition, language translation, and airfare prediction and comparison, to monitoring the real-time spread of flu, Big Data techniques have been applied to many seemingly intractable problems with spectacular successes. They appear to be a rewarding way to approach many currently unsolved problems. Few fields of research can claim a longer history with problems involving voluminous data than Earth science. The problems we are facing today with our Earth's future are more complex and carry potentially graver consequences than the examples given above. How has our climate changed? Beside natural variations, what is causing these changes? What are the processes involved and through what mechanisms are these connected? How will they impact life as we know it? In attempts to answer these questions, we have resorted to observations and numerical simulations with ever-finer resolutions, which continue to feed the "data deluge." Plausibly, many Earth scientists are wondering: How will Big Data technologies benefit Earth science research? As an example from the global water cycle, one subdomain among many in Earth science, how would these technologies accelerate the analysis of decades of global precipitation to ascertain the changes in its characteristics, to validate these changes in predictive climate models, and to infer the implications of these changes to ecosystems, economies, and public health? Earth science researchers need a viable way to harness the power of Big Data technologies to analyze large volumes and varieties of data with velocity and veracity. Beyond providing speedy data analysis capabilities, Big Data technologies can also play a crucial, albeit indirect, role in boosting scientific productivity by facilitating effective collaboration within an analysis environment

  6. Genome Imprinting

    Indian Academy of Sciences (India)

    the cell nucleus (mitochondrial and chloroplast genomes), and. (3) traits governed ... tively good embryonic development but very poor development of membranes and ... Human homologies for the type of situation described above are naturally ..... imprint; (b) New modifications of the paternal genome in germ cells of each ...

  7. Baculovirus Genomics

    NARCIS (Netherlands)

    Oers, van M.M.; Vlak, J.M.

    2007-01-01

    Baculovirus genomes are covalently closed circles of double-stranded DNA varying in size between 80 and 180 kilobase pairs. The genomes of more than forty-one baculoviruses have been sequenced to date. The majority of these (37) are pathogenic to lepidopteran hosts; three infect sawflies

  8. Genomic Testing

    Science.gov (United States)

    ... this database. Evaluation of Genomic Applications in Practice and Prevention (EGAPP™): In 2004, the Centers for Disease Control and Prevention launched the EGAPP initiative to establish and test a ... and other applications of genomic technology that are in transition from ...

  9. Ancient genomes

    OpenAIRE

    Hoelzel, A Rus

    2005-01-01

    Ever since its invention, the polymerase chain reaction has been the method of choice for work with ancient DNA. In an application of modern genomic methods to material from the Pleistocene, a recent study has instead undertaken to clone and sequence a portion of the ancient genome of the cave bear.

  10. Development and validation of Big Four personality scales for the Schedule for Nonadaptive and Adaptive Personality--Second Edition (SNAP-2).

    Science.gov (United States)

    Calabrese, William R; Rudick, Monica M; Simms, Leonard J; Clark, Lee Anna

    2012-09-01

    Recently, integrative, hierarchical models of personality and personality disorder (PD)--such as the Big Three, Big Four, and Big Five trait models--have gained support as a unifying dimensional framework for describing PD. However, no measures to date can simultaneously represent each of these potentially interesting levels of the personality hierarchy. To unify these measurement models psychometrically, we sought to develop Big Five trait scales within the Schedule for Nonadaptive and Adaptive Personality--Second Edition (SNAP-2). Through structural and content analyses, we examined relations between the SNAP-2, the Big Five Inventory (BFI), and the NEO Five-Factor Inventory (NEO-FFI) ratings in a large data set (N = 8,690), including clinical, military, college, and community participants. Results yielded scales consistent with the Big Four model of personality (i.e., Neuroticism, Conscientiousness, Introversion, and Antagonism) and not the Big Five, as there were insufficient items related to Openness. Resulting scale scores demonstrated strong internal consistency and temporal stability. Structural validity and external validity were supported by strong convergent and discriminant validity patterns between Big Four scale scores and other personality trait scores and expectable patterns of self-peer agreement. Descriptive statistics and community-based norms are provided. The SNAP-2 Big Four Scales enable researchers and clinicians to assess personality at multiple levels of the trait hierarchy and facilitate comparisons among competing big-trait models. PsycINFO Database Record (c) 2012 APA, all rights reserved.
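
    The internal-consistency figures reported above are conventionally summarized with coefficient (Cronbach's) alpha. The following is a minimal numpy implementation of that statistic; the simulated response matrix is random and purely illustrative, not SNAP-2 data.

        # Coefficient (Cronbach's) alpha for a set of item responses: a minimal sketch.
        # The response matrix below is simulated only so the function is runnable.
        import numpy as np

        def cronbach_alpha(items: np.ndarray) -> float:
            """items: (n_respondents, k_items) matrix of item scores."""
            k = items.shape[1]
            item_vars = items.var(axis=0, ddof=1).sum()
            total_var = items.sum(axis=1).var(ddof=1)
            return (k / (k - 1)) * (1 - item_vars / total_var)

        rng = np.random.default_rng(1)
        latent = rng.normal(size=(500, 1))                       # one underlying trait
        items = latent + rng.normal(scale=0.8, size=(500, 8))    # 8 noisy indicators of it
        print(round(cronbach_alpha(items), 3))                   # roughly 0.9 for this simulation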

  11. Development and Validation of Big Four Personality Scales for the Schedule for Nonadaptive and Adaptive Personality-2nd Edition (SNAP-2)

    Science.gov (United States)

    Calabrese, William R.; Rudick, Monica M.; Simms, Leonard J.; Clark, Lee Anna

    2012-01-01

    Recently, integrative, hierarchical models of personality and personality disorder (PD)—such as the Big Three, Big Four and Big Five trait models—have gained support as a unifying dimensional framework for describing PD. However, no measures to date can simultaneously represent each of these potentially interesting levels of the personality hierarchy. To unify these measurement models psychometrically, we sought to develop Big Five trait scales within the Schedule for Nonadaptive and Adaptive Personality–2nd Edition (SNAP-2). Through structural and content analyses, we examined relations between the SNAP-2, Big Five Inventory (BFI), and NEO-Five Factor Inventory (NEO-FFI) ratings in a large data set (N = 8,690), including clinical, military, college, and community participants. Results yielded scales consistent with the Big Four model of personality (i.e., Neuroticism, Conscientiousness, Introversion, and Antagonism) and not the Big Five, as there were insufficient items related to Openness. Resulting scale scores demonstrated strong internal consistency and temporal stability. Structural and external validity were supported by strong convergent and discriminant validity patterns between Big Four scale scores and other personality trait scores and expectable patterns of self-peer agreement. Descriptive statistics and community-based norms are provided. The SNAP-2 Big Four Scales enable researchers and clinicians to assess personality at multiple levels of the trait hierarchy and facilitate comparisons among competing “Big Trait” models. PMID:22250598

  12. Big Data Strategy for Telco: Network Transformation

    OpenAIRE

    F. Amin; S. Feizi

    2014-01-01

    Big data has the potential to improve the quality of services; enable infrastructure that businesses depend on to adapt continually and efficiently; improve the performance of employees; help organizations better understand customers; and reduce liability risks. Analytics and marketing models of fixed and mobile operators are falling short in combating churn and declining revenue per user. Big Data presents new methods to reverse this trend and improve profitability. The benefits of Big Data and ...

  13. Big Data in Shipping - Challenges and Opportunities

    OpenAIRE

    Rødseth, Ørnulf Jan; Perera, Lokukaluge Prasad; Mo, Brage

    2016-01-01

    Big Data is getting popular in shipping, where large amounts of information are collected to better understand and improve logistics, emissions, energy consumption and maintenance. Constraints to the use of big data include cost and quality of on-board sensors and data acquisition systems, satellite communication, data ownership and technical obstacles to effective collection and use of big data. New protocol standards may simplify the process of collecting and organizing the data, including in...

  14. Big Data in Action for Government : Big Data Innovation in Public Services, Policy, and Engagement

    OpenAIRE

    World Bank

    2017-01-01

    Governments have an opportunity to harness big data solutions to improve productivity, performance and innovation in service delivery and policymaking processes. In developing countries, governments have an opportunity to adopt big data solutions and leapfrog traditional administrative approaches

  15. Big data optimization recent developments and challenges

    CERN Document Server

    2016-01-01

    The main objective of this book is to provide the necessary background to work with big data by introducing some novel optimization algorithms and codes capable of working in the big data setting as well as introducing some applications in big data optimization for both interested academics and practitioners, and to benefit society, industry, academia, and government. Presenting applications in a variety of industries, this book will be useful for researchers aiming to analyse large-scale data. Several optimization algorithms for big data, including convergent parallel algorithms, the limited memory bundle algorithm, the diagonal bundle method, network analytics, and many more, have been explored in this book.

  16. Medical big data: promise and challenges

    Directory of Open Access Journals (Sweden)

    Choong Ho Lee

    2017-03-01

    Full Text Available The concept of big data, commonly characterized by volume, variety, velocity, and veracity, goes far beyond the data type and includes the aspects of data analysis, such as hypothesis-generating, rather than hypothesis-testing. Big data focuses on the temporal stability of associations rather than on causal relationships, and underlying probability distribution assumptions are frequently not required. Medical big data as material to be analyzed has various features that are not only distinct from big data of other disciplines, but also distinct from traditional clinical epidemiology. Big data technology has many areas of application in healthcare, such as predictive modeling and clinical decision support, disease or safety surveillance, public health, and research. Big data analytics frequently exploits analytic methods developed in data mining, including classification, clustering, and regression. Medical big data analyses are complicated by many technical issues, such as missing values, curse of dimensionality, and bias control, and share the inherent limitations of observational studies, namely the inability to test causality resulting from residual confounding and reverse causation. Recently, propensity score analysis and instrumental variable analysis have been introduced to overcome these limitations, and they have accomplished a great deal. Many challenges, such as the absence of evidence of practical benefits of big data, methodological issues including legal and ethical issues, and clinical integration and utility issues, must be overcome to realize the promise of medical big data as the fuel of a continuous learning healthcare system that will improve patient outcome and reduce waste in areas including nephrology.
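
    As a generic illustration of the propensity score analysis named above, the sketch below estimates P(treated | confounders) with logistic regression on synthetic data and pairs each treated subject with the nearest-score control. It is not drawn from a real cohort, and real analyses require covariate balance diagnostics and sensitivity analyses.

        # Sketch of 1:1 propensity-score matching on synthetic data (with replacement,
        # for simplicity). Purely illustrative; not a substitute for a full analysis.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(2)
        n = 1000
        X = rng.normal(size=(n, 3))                      # confounders (e.g. age, severity, ...)
        treated = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 1]))))

        ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]   # propensity scores

        treated_idx = np.flatnonzero(treated == 1)
        control_idx = np.flatnonzero(treated == 0)
        pairs = []
        for i in treated_idx:
            j = control_idx[np.argmin(np.abs(ps[control_idx] - ps[i]))]    # nearest control
            pairs.append((i, j))
        print(f"matched {len(pairs)} treated/control pairs")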

  17. Traffic information computing platform for big data

    Energy Technology Data Exchange (ETDEWEB)

    Duan, Zongtao, E-mail: ztduan@chd.edu.cn; Li, Ying, E-mail: ztduan@chd.edu.cn; Zheng, Xibin, E-mail: ztduan@chd.edu.cn; Liu, Yan, E-mail: ztduan@chd.edu.cn; Dai, Jiting, E-mail: ztduan@chd.edu.cn; Kang, Jun, E-mail: ztduan@chd.edu.cn [Chang' an University School of Information Engineering, Xi' an, China and Shaanxi Engineering and Technical Research Center for Road and Traffic Detection, Xi' an (China)

    2014-10-06

    The big data environment creates the data conditions for improving the quality of traffic information services. The goal of this article is to construct a traffic information computing platform for the big data environment. Through in-depth analysis of the connotation and technological characteristics of big data and traffic information services, a distributed traffic atomic information computing platform architecture is proposed. Under the big data environment, this type of traffic atomic information computing architecture helps to guarantee traffic safety and efficient operation, and enables more intelligent and personalized traffic information services for traffic information users.

  18. Medical big data: promise and challenges.

    Science.gov (United States)

    Lee, Choong Ho; Yoon, Hyung-Jin

    2017-03-01

    The concept of big data, commonly characterized by volume, variety, velocity, and veracity, goes far beyond the data type and includes the aspects of data analysis, such as hypothesis-generating, rather than hypothesis-testing. Big data focuses on the temporal stability of associations rather than on causal relationships, and underlying probability distribution assumptions are frequently not required. Medical big data as material to be analyzed has various features that are not only distinct from big data of other disciplines, but also distinct from traditional clinical epidemiology. Big data technology has many areas of application in healthcare, such as predictive modeling and clinical decision support, disease or safety surveillance, public health, and research. Big data analytics frequently exploits analytic methods developed in data mining, including classification, clustering, and regression. Medical big data analyses are complicated by many technical issues, such as missing values, curse of dimensionality, and bias control, and share the inherent limitations of observational studies, namely the inability to test causality resulting from residual confounding and reverse causation. Recently, propensity score analysis and instrumental variable analysis have been introduced to overcome these limitations, and they have accomplished a great deal. Many challenges, such as the absence of evidence of practical benefits of big data, methodological issues including legal and ethical issues, and clinical integration and utility issues, must be overcome to realize the promise of medical big data as the fuel of a continuous learning healthcare system that will improve patient outcome and reduce waste in areas including nephrology.

  19. Traffic information computing platform for big data

    International Nuclear Information System (INIS)

    Duan, Zongtao; Li, Ying; Zheng, Xibin; Liu, Yan; Dai, Jiting; Kang, Jun

    2014-01-01

    The big data environment creates the data conditions for improving the quality of traffic information services. The goal of this article is to construct a traffic information computing platform for the big data environment. Through in-depth analysis of the connotation and technological characteristics of big data and traffic information services, a distributed traffic atomic information computing platform architecture is proposed. Under the big data environment, this type of traffic atomic information computing architecture helps to guarantee traffic safety and efficient operation, and enables more intelligent and personalized traffic information services for traffic information users

  20. Big Data's Role in Precision Public Health.

    Science.gov (United States)

    Dolley, Shawn

    2018-01-01

    Precision public health is an emerging practice to more granularly predict and understand public health risks and customize treatments for more specific and homogeneous subpopulations, often using new data, technologies, and methods. Big data is one element that has consistently helped to achieve these goals, through its ability to deliver to practitioners a volume and variety of structured or unstructured data not previously possible. Big data has enabled more widespread and specific research and trials of stratifying and segmenting populations at risk for a variety of health problems. Examples of success using big data are surveyed in surveillance and signal detection, predicting future risk, targeted interventions, and understanding disease. Using novel big data or big data approaches has risks that remain to be resolved. The continued growth in volume and variety of available data, decreased costs of data capture, and emerging computational methods mean big data success will likely be a required pillar of precision public health into the future. This review article aims to identify the precision public health use cases where big data has added value, identify classes of value that big data may bring, and outline the risks inherent in using big data in precision public health efforts.

  1. Big data analytics with R and Hadoop

    CERN Document Server

    Prajapati, Vignesh

    2013-01-01

    Big Data Analytics with R and Hadoop is a tutorial-style book that focuses on all the powerful big data tasks that can be achieved by integrating R and Hadoop. This book is ideal for R developers who are looking for a way to perform big data analytics with Hadoop. This book is also aimed at those who know Hadoop and want to build some intelligent applications over Big data with R packages. It would be helpful if readers have basic knowledge of R.

  2. The Scope of Big Data in One Medicine: Unprecedented Opportunities and Challenges.

    Science.gov (United States)

    McCue, Molly E; McCoy, Annette M

    2017-01-01

    Advances in high-throughput molecular biology and electronic health records (EHR), coupled with increasing computer capabilities have resulted in an increased interest in the use of big data in health care. Big data require collection and analysis of data at an unprecedented scale and represents a paradigm shift in health care, offering (1) the capacity to generate new knowledge more quickly than traditional scientific approaches; (2) unbiased collection and analysis of data; and (3) a holistic understanding of biology and pathophysiology. Big data promises more personalized and precision medicine for patients with improved accuracy and earlier diagnosis, and therapy tailored to an individual's unique combination of genes, environmental risk, and precise disease phenotype. This promise comes from data collected from numerous sources, ranging from molecules to cells, to tissues, to individuals and populations-and the integration of these data into networks that improve understanding of heath and disease. Big data-driven science should play a role in propelling comparative medicine and "one medicine" (i.e., the shared physiology, pathophysiology, and disease risk factors across species) forward. Merging of data from EHR across institutions will give access to patient data on a scale previously unimaginable, allowing for precise phenotype definition and objective evaluation of risk factors and response to therapy. High-throughput molecular data will give insight into previously unexplored molecular pathophysiology and disease etiology. Investigation and integration of big data from a variety of sources will result in stronger parallels drawn at the molecular level between human and animal disease, allow for predictive modeling of infectious disease and identification of key areas of intervention, and facilitate step-changes in our understanding of disease that can make a substantial impact on animal and human health. However, the use of big data comes with significant

  3. Big Data; A Management Revolution : The emerging role of big data in businesses

    OpenAIRE

    Blasiak, Kevin

    2014-01-01

    Big data is a term that was coined in 2012 and has since emerged as one of the top trends in business and technology. Big data is an agglomeration of different technologies resulting in data processing capabilities that were previously unattainable. Big data is generally characterized by factors such as volume, velocity and variety, which distinguish it from traditional data use. The possibilities to utilize this technology are vast. Big data technology has touch points in differ...

  4. BigDataBench: a Big Data Benchmark Suite from Internet Services

    OpenAIRE

    Wang, Lei; Zhan, Jianfeng; Luo, Chunjie; Zhu, Yuqing; Yang, Qiang; He, Yongqiang; Gao, Wanling; Jia, Zhen; Shi, Yingjie; Zhang, Shujie; Zheng, Chen; Lu, Gang; Zhan, Kent; Li, Xiaona; Qiu, Bizhu

    2014-01-01

    As architecture, systems, and data management communities pay greater attention to innovative big data systems and architectures, the pressure of benchmarking and evaluating these systems rises. Considering the broad use of big data systems, big data benchmarks must include diversity of data and workloads. Most of the state-of-the-art big data benchmarking efforts target evaluating specific types of applications or system software stacks, and hence they are not qualified for serving the purpo...

  5. Facilitation as a teaching strategy : experiences of facilitators

    Directory of Open Access Journals (Sweden)

    E Lekalakala-Mokgele

    2006-09-01

    Full Text Available Changes in nursing education involve the move from traditional teaching approaches that are teacher-centred to facilitation, a student-centred approach. The student-centred approach is based on a philosophy of teaching and learning that puts the learner on centre-stage. The aim of this study was to identify the challenges of facilitators of learning using facilitation as a teaching method and to recommend strategies for their (facilitators') development and support. A qualitative, explorative and contextual design was used. Four (4) universities in South Africa which utilize facilitation as a teaching/learning process were identified, and the facilitators were selected to be the sample of the study. The main question posed during in-depth group interviews was: "How do you experience facilitation as a teaching/learning method?" Facilitators indicated different experiences and emotions when they first had to facilitate learning. All of them indicated that it was difficult to facilitate at the beginning as they were trained to lecture and that no format for facilitation was available. They experienced frustrations and anxieties as a result. The lack of knowledge of facilitation instilled fear in them. However, they indicated that facilitation had many benefits for them and for the students. Amongst the ones mentioned were personal and professional growth. Challenges mentioned were the fear that they waste time and that they do not cover the content. It is therefore important that facilitation be included in the training of nurse educators.

  6. Essence: Facilitating Software Innovation

    DEFF Research Database (Denmark)

    Aaen, Ivan

    2008-01-01

    This paper suggests ways to facilitate creativity and innovation in software development. The paper applies four perspectives – Product, Project, Process, and People – to identify an outlook for software innovation. The paper then describes a new facility – the Software Innovation Research Lab (SIRL) – and a new method concept for software innovation – Essence – based on views, modes, and team roles. Finally, the paper reports from an early experiment using SIRL and Essence and identifies further research.

  7. The faces of Big Science.

    Science.gov (United States)

    Schatz, Gottfried

    2014-06-01

    Fifty years ago, academic science was a calling with few regulations or financial rewards. Today, it is a huge enterprise confronted by a plethora of bureaucratic and political controls. This change was not triggered by specific events or decisions but reflects the explosive 'knee' in the exponential growth that science has sustained during the past three-and-a-half centuries. Coming to terms with the demands and benefits of 'Big Science' is a major challenge for today's scientific generation. Since its foundation 50 years ago, the European Molecular Biology Organization (EMBO) has been of invaluable help in meeting this challenge.

  8. Big Data and central banks

    Directory of Open Access Journals (Sweden)

    David Bholat

    2015-04-01

    Full Text Available This commentary recaps a Centre for Central Banking Studies event held at the Bank of England on 2–3 July 2014. The article covers three main points. First, it situates the Centre for Central Banking Studies event within the context of the Bank’s Strategic Plan and initiatives. Second, it summarises and reflects on major themes from the event. Third, the article links central banks’ emerging interest in Big Data approaches with their broader uptake by other economic agents.

  9. Inhomogeneous Big Bang Nucleosynthesis Revisited

    OpenAIRE

    Lara, J. F.; Kajino, T.; Mathews, G. J.

    2006-01-01

    We reanalyze the allowed parameters for inhomogeneous big bang nucleosynthesis in light of the WMAP constraints on the baryon-to-photon ratio and a recent measurement which has set the neutron lifetime to be 878.5 +/- 0.7 +/- 0.3 seconds. For a set baryon-to-photon ratio the new lifetime reduces the mass fraction of He4 by 0.0015 but does not significantly change the abundances of other isotopes. This enlarges the region of concordance between He4 and deuterium in the parameter space of the b...

  10. The Perennial Ryegrass GenomeZipper: Targeted Use of Genome Resources for Comparative Grass Genomics

    Science.gov (United States)

    Pfeifer, Matthias; Martis, Mihaela; Asp, Torben; Mayer, Klaus F.X.; Lübberstedt, Thomas; Byrne, Stephen; Frei, Ursula; Studer, Bruno

    2013-01-01

    Whole-genome sequences established for model and major crop species constitute a key resource for advanced genomic research. For outbreeding forage and turf grass species like ryegrasses (Lolium spp.), such resources have yet to be developed. Here, we present a model of the perennial ryegrass (Lolium perenne) genome on the basis of conserved synteny to barley (Hordeum vulgare) and the model grass genome Brachypodium (Brachypodium distachyon) as well as rice (Oryza sativa) and sorghum (Sorghum bicolor). A transcriptome-based genetic linkage map of perennial ryegrass served as a scaffold to establish the chromosomal arrangement of syntenic genes from model grass species. This scaffold revealed a high degree of synteny and macrocollinearity and was then utilized to anchor a collection of perennial ryegrass genes in silico to their predicted genome positions. This resulted in the unambiguous assignment of 3,315 out of 8,876 previously unmapped genes to the respective chromosomes. In total, the GenomeZipper incorporates 4,035 conserved grass gene loci, which were used for the first genome-wide sequence divergence analysis between perennial ryegrass, barley, Brachypodium, rice, and sorghum. The perennial ryegrass GenomeZipper is an ordered, information-rich genome scaffold, facilitating map-based cloning and genome assembly in perennial ryegrass and closely related Poaceae species. It also represents a milestone in describing synteny between perennial ryegrass and fully sequenced model grass genomes, thereby increasing our understanding of genome organization and evolution in the most important temperate forage and turf grass species. PMID:23184232
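
    The synteny-based anchoring described above can be illustrated with a toy example: unmapped ryegrass genes receive predicted positions from the ordered coordinates of their orthologs in a sequenced model grass. The gene identifiers and coordinates below are invented for illustration and do not come from the GenomeZipper data themselves.

        # Toy version of synteny-based anchoring: place unmapped genes on a chromosome
        # scaffold using the positions of their orthologs in a model grass genome.
        ortholog_position = {                  # model-grass ortholog -> (chromosome, position)
            "BRADI_1g10000": ("chr1", 1_200_000),
            "BRADI_1g10050": ("chr1", 1_450_000),
            "BRADI_2g20000": ("chr2",   300_000),
        }
        ryegrass_to_ortholog = {               # unmapped ryegrass gene -> best model-grass hit
            "LpGene_A": "BRADI_1g10050",
            "LpGene_B": "BRADI_2g20000",
            "LpGene_C": "BRADI_1g10000",
        }

        anchored = sorted(
            (ortholog_position[o], gene)
            for gene, o in ryegrass_to_ortholog.items()
            if o in ortholog_position
        )
        for (chrom, pos), gene in anchored:
            print(f"{gene} anchored to {chrom}:{pos}")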

  11. Pillow seal system at the BigRIPS separator

    Energy Technology Data Exchange (ETDEWEB)

    Tanaka, K., E-mail: ktanaka@riken.jp; Inabe, N.; Yoshida, K.; Kusaka, K.; Kubo, T.

    2013-12-15

    Highlights: • Pillow seal system has been installed for a high-intensity RI-beam facility at RIKEN. • It is aimed at facilitating remote maintenance under high residual radiation. • Local radiation shields are integrated with one of the pillow seals. • Pillow seals have been aligned to the beam axis within 1 mm accuracy. • A leakage rate of 10⁻⁹ Pa m³/s has been achieved with our pillow seal system. -- Abstract: We have designed and installed a pillow seal system for the BigRIPS fragment separator at the RIKEN Radioactive Isotope Beam Factory (RIBF) to facilitate remote maintenance in a radioactive environment. The pillow seal system is a device to connect a vacuum chamber and a beam tube. It allows quick attachment and detachment of vacuum connections in the BigRIPS separator and consists of a double diaphragm with a differential pumping system. The leakage rate achieved with this system is as low as 10⁻⁹ Pa m³/s. We have also designed and installed a local radiation-shielding system, integrated with the pillow seal system, to protect the superconducting magnets and to reduce the heat load on the cryogenic system. We present an overview of the pillow seal and the local shielding systems.

  12. Comparative validity of brief to medium-length Big Five and Big Six personality questionnaires

    NARCIS (Netherlands)

    Thalmayer, A.G.; Saucier, G.; Eigenhuis, A.

    2011-01-01

    A general consensus on the Big Five model of personality attributes has been highly generative for the field of personality psychology. Many important psychological and life outcome correlates with Big Five trait dimensions have been established. But researchers must choose between multiple Big Five

  13. Comparative Validity of Brief to Medium-Length Big Five and Big Six Personality Questionnaires

    Science.gov (United States)

    Thalmayer, Amber Gayle; Saucier, Gerard; Eigenhuis, Annemarie

    2011-01-01

    A general consensus on the Big Five model of personality attributes has been highly generative for the field of personality psychology. Many important psychological and life outcome correlates with Big Five trait dimensions have been established. But researchers must choose between multiple Big Five inventories when conducting a study and are…

  14. Big Data en surveillance, deel 1 : Definities en discussies omtrent Big Data

    NARCIS (Netherlands)

    Timan, Tjerk

    2016-01-01

    Following a (fairly short) lecture on surveillance and Big Data, I was asked to go somewhat deeper into the theme, the definitions and the various questions associated with big data. In this first part I will try to set out the essentials concerning Big Data theory and

  15. 77 FR 27245 - Big Stone National Wildlife Refuge, Big Stone and Lac Qui Parle Counties, MN

    Science.gov (United States)

    2012-05-09

    ... DEPARTMENT OF THE INTERIOR Fish and Wildlife Service [FWS-R3-R-2012-N069; FXRS1265030000S3-123-FF03R06000] Big Stone National Wildlife Refuge, Big Stone and Lac Qui Parle Counties, MN AGENCY: Fish and... plan (CCP) and environmental assessment (EA) for Big Stone National Wildlife Refuge (Refuge, NWR) for...

  16. Big game hunting practices, meanings, motivations and constraints: a survey of Oregon big game hunters

    Science.gov (United States)

    Suresh K. Shrestha; Robert C. Burns

    2012-01-01

    We conducted a self-administered mail survey in September 2009 with randomly selected Oregon hunters who had purchased big game hunting licenses/tags for the 2008 hunting season. Survey questions explored hunting practices, the meanings of and motivations for big game hunting, the constraints to big game hunting participation, and the effects of age, years of hunting...

  17. The life cycle of a genome project: perspectives and guidelines inspired by insect genome projects.

    Science.gov (United States)

    Papanicolaou, Alexie

    2016-01-01

    Many research programs on non-model species biology have been empowered by genomics. In turn, genomics is underpinned by a reference sequence and ancillary information created by so-called "genome projects". The most reliable genome projects are the ones created as part of an active research program and designed to address specific questions but their life extends past publication. In this opinion paper I outline four key insights that have facilitated maintaining genomic communities: the key role of computational capability, the iterative process of building genomic resources, the value of community participation and the importance of manual curation. Taken together, these ideas can and do ensure the longevity of genome projects and the growing non-model species community can use them to focus a discussion with regards to its future genomic infrastructure.

  18. Lecture 10: The European Bioinformatics Institute - "Big data" for biomedical sciences

    CERN Multimedia

    CERN. Geneva; Dana, Jose

    2013-01-01

    Part 1: Big data for biomedical sciences (Tom Hancocks) Ten years ago saw the completion of the first international 'Big Biology' project, which sequenced the human genome. In the years since, the biological sciences have seen a vast growth in data. In the coming years advances will come from the integration of experimental approaches and their translation into applied technologies in the hospital, the clinic and even at home. This talk will examine the development of infrastructure, physical and virtual, that will allow millions of life scientists across Europe better access to biological data. Tom studied Human Genetics at the University of Leeds and McMaster University, before completing an MSc in Analytical Genomics at the University of Birmingham. He has worked for the UK National Health Service in diagnostic genetics and in training healthcare scientists and clinicians in bioinformatics. Tom joined the EBI in 2012 and is responsible for the scientific development and delivery of training for the BioMedBridges pr...

  19. Alignment of whole genomes.

    Science.gov (United States)

    Delcher, A L; Kasif, S; Fleischmann, R D; Peterson, J; White, O; Salzberg, S L

    1999-01-01

    A new system for aligning whole genome sequences is described. Using an efficient data structure called a suffix tree, the system is able to rapidly align sequences containing millions of nucleotides. Its use is demonstrated on two strains of Mycobacterium tuberculosis, on two less similar species of Mycoplasma bacteria and on two syntenic sequences from human chromosome 12 and mouse chromosome 6. In each case it found an alignment of the input sequences, using between 30 s and 2 min of computation time. From the system output, information on single nucleotide changes, translocations and homologous genes can easily be extracted. Use of the algorithm should facilitate analysis of syntenic chromosomal regions, strain-to-strain comparisons, evolutionary comparisons and genomic duplications. PMID:10325427
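
    The published system anchors its alignments on maximal unique matches (MUMs) found with a suffix tree. The sketch below is a deliberately simplified stand-in for that anchoring step: it finds k-mers that occur exactly once in each of two sequences and are shared between them, using plain dictionaries. It does not guarantee maximality and scales far worse than a suffix tree; the short sequences are made up for the example.

        # Simplified stand-in for MUM anchoring: find k-mers occurring exactly once in
        # each sequence and shared between them. The real system uses a suffix tree.
        from collections import Counter

        def unique_kmers(seq: str, k: int):
            """Map each k-mer that occurs exactly once in seq to its position."""
            counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
            return {kmer: seq.index(kmer) for kmer, c in counts.items() if c == 1}

        def shared_unique_anchors(a: str, b: str, k: int = 8):
            ua, ub = unique_kmers(a, k), unique_kmers(b, k)
            return sorted((ua[m], ub[m], m) for m in ua.keys() & ub.keys())

        ref = "ACGTACGTTTGACCGATAGGCTTACGGATCCA"
        qry = "ACGTTTGACCGATCGGCTTACGGATCCATTGA"
        for pos_a, pos_b, kmer in shared_unique_anchors(ref, qry):
            print(f"anchor {kmer} at ref:{pos_a} qry:{pos_b}")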

  20. An analysis of cross-sectional differences in big and non-big public accounting firms' audit programs

    NARCIS (Netherlands)

    Blokdijk, J.H. (Hans); Drieenhuizen, F.; Stein, M.T.; Simunic, D.A.

    2006-01-01

    A significant body of prior research has shown that audits by the Big 5 (now Big 4) public accounting firms are quality differentiated relative to non-Big 5 audits. This result can be derived analytically by assuming that Big 5 and non-Big 5 firms face different loss functions for "audit failures"