WorldWideScience

Sample records for data sharing

  1. BBSRC Data Sharing Policy

    OpenAIRE

    Amanda Collis; David McAllister; Michael Ball

    2011-01-01

    BBSRC recognizes the importance of contributing to the growing international efforts in data sharing. BBSRC is committed to getting the best value for the funds we invest and believes that making research data more readily available will reinforce open scientific inquiry and stimulate new investigations and analyses. BBSRC supports the view that data sharing should be led by the scientific community and driven by scientific need. It should also be cost effective and the data shared should be ...

  2. A Data Sharing Story

    Directory of Open Access Journals (Sweden)

    Mercè Crosas

    2012-01-01

From the early days of modern science through this century of Big Data, data sharing has enabled some of the greatest advances in science. In the digital age, technology can facilitate more effective and efficient data sharing and preservation practices, and provide incentives for making data easily accessible among researchers. At the Institute for Quantitative Social Science at Harvard University, we have developed open-source software to share, cite, preserve, discover and analyze data, named the Dataverse Network. We share here the project's motivation, its growth and successes, and likely evolution.

  3. Report endorses data sharing

    Science.gov (United States)

    The potential benefits of sharing data so outweigh its costs that investigators should be required to include plans for sharing data as part of their grant proposals, according to recommendations issued recently by the Committee on National Statistics (CNSTAT) of the National Research Council (NRC).In their report Sharing Research Data, CNSTAT also recommended that “Journals should give more emphasis to reports of secondary analyses and to replications,” provided that the original collections of data receive full credit. In addition, “Journal editors should require authors to provide access to data during the peer review process.”

  4. Sharing data increases citations

    DEFF Research Database (Denmark)

    Drachen, Thea Marie; Ellegaard, Ole; Larsen, Asger Væring

    2016-01-01

This paper presents some indications of the existence of a citation advantage related to sharing data, using astrophysics as a case. Through bibliometric analyses we find a citation advantage for astrophysical papers in core journals. The advantage arises as indexed papers are associated with data by bibliographical links, and consists of papers receiving on average significantly more citations per paper per year than papers not associated with links to data.

  5. VHA Data Sharing Agreement Repository

    Data.gov (United States)

    Department of Veterans Affairs — The VHA Data Sharing Agreement Repository serves as a centralized location to collect and report on agreements that share VHA data with entities outside of VA. It...

  6. Information partnerships--shared data, shared scale.

    Science.gov (United States)

    Konsynski, B R; McFarlan, F W

    1990-01-01

How can one company gain access to another's resources or customers without merging ownership, management, or plotting a takeover? The answer is found in new information partnerships, enabling diverse companies to develop strategic coalitions through the sharing of data. The key to cooperation is a quantum improvement in the hardware and software supporting relational databases: new computer speeds, cheaper mass-storage devices, the proliferation of fiber-optic networks, and networking architectures. Information partnerships mean that companies can distribute the technological and financial exposure that comes with huge investments. For the customer's part, partnerships inevitably lead to greater simplification on the desktop and more common standards around which vendors have to compete. The most common types of partnership are: joint marketing partnerships, such as American Airlines' award of frequent flyer miles to customers who use Citibank's credit card; intraindustry partnerships, such as the insurance value-added network service (which links insurance and casualty companies to independent agents); customer-supplier partnerships, such as Baxter Healthcare's electronic channel to hospitals for medical and other equipment; and IT vendor-driven partnerships, exemplified by ESAB (a European welding supplies and equipment company), whose expansion strategy was premised on a technology platform offered by an IT vendor. Partnerships that succeed have shared vision at the top, reciprocal skills in information technology, concrete plans for an early success, persistence in the development of usable information for all partners, coordination on business policy, and a new and imaginative business architecture.

  7. Fast transfer of shared data

    International Nuclear Information System (INIS)

Timmer, C.; Abbott, D.J.; Heyes, W.G.; Jastrzembski, E.; MacLeod, R.W.; Wolin, E.

    2000-01-01

The Event Transfer (ET) system enables its users to produce events (data) and share them with other users by utilizing shared memory on either Solaris or Linux-based computers. Its design emphasizes speed, reliability, ease of use, and recoverability from crashes. In addition to fast local operation, the ET system allows network transfer of events. Using multi-threaded code based on POSIX threads and mutexes, a successful implementation was developed which allowed passing events at rates of over 500 kHz on a 4-CPU Sun workstation and 150 kHz on a dual-CPU PC.
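
The mutex-protected producer/consumer design described in this record can be illustrated with a minimal sketch. This is a hypothetical illustration only, not the ET API: it uses Python threads and a condition variable in place of POSIX shared memory and mutexes, so it works within one process rather than across processes.

```python
import threading
from collections import deque

class EventBuffer:
    """Minimal sketch of a mutex-protected event buffer, loosely
    inspired by the producer/consumer design described for the ET
    system. Illustrative only: the real system uses shared memory
    so that separate processes can exchange events."""

    def __init__(self):
        self._events = deque()
        self._lock = threading.Lock()                # plays the role of a mutex
        self._available = threading.Condition(self._lock)

    def put(self, event):
        with self._available:
            self._events.append(event)
            self._available.notify()                 # wake one waiting consumer

    def get(self):
        with self._available:
            while not self._events:
                self._available.wait()               # block until an event arrives
            return self._events.popleft()

# One producer (the main thread) and one consumer exchanging 1000 events.
buf = EventBuffer()
received = []

def consume():
    for _ in range(1000):
        received.append(buf.get())

t = threading.Thread(target=consume)
t.start()
for i in range(1000):
    buf.put(i)
t.join()
print(len(received))  # 1000
```

With a single producer and a single consumer, the deque preserves FIFO order; the condition variable is what keeps the consumer from spinning while the buffer is empty.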

  8. DataShare: Empowering Researcher Data Curation

    Directory of Open Access Journals (Sweden)

    Stephen Abrams

    2014-07-01

Researchers are increasingly being asked to ensure that all products of research activity – not just traditional publications – are preserved and made widely available for study and reuse as a precondition for publication or grant funding, or to conform to disciplinary best practices. In order to meet these requirements, scholars need effective, easy-to-use tools and services for the long-term curation of their research data. The DataShare service, developed at the University of California, is being used by researchers to: (1) prepare for curation by reviewing best practice recommendations for the acquisition or creation of digital research data; (2) select datasets using intuitive file browsing and drag-and-drop interfaces; (3) describe their data for enhanced discoverability in terms of the DataCite metadata schema; (4) preserve their data by uploading to a public access collection in the UC3 Merritt curation repository; (5) cite their data in terms of persistent and globally resolvable DOI identifiers; (6) expose their data through registration with well-known abstracting and indexing services and major internet search engines; (7) control the dissemination of their data through enforceable data use agreements; and (8) discover and retrieve datasets of interest through a faceted search and browse environment. Since the widespread adoption of effective data management practices is highly dependent on ease of use and integration into existing individual, institutional, and disciplinary workflows, the emphasis throughout the design and implementation of DataShare is to provide the highest level of curation service with the lowest possible technical barriers to entry by individual researchers. By enabling intuitive, self-service access to data curation functions, DataShare helps contribute to more widespread adoption of good data curation practices that are critical to open scientific inquiry, discourse, and advancement.

  9. Qualitative Data Sharing Practices in Social Sciences

    Science.gov (United States)

    Jeng, Wei

    2017-01-01

    Social scientists have been sharing data for a long time. Sharing qualitative data, however, has not become a common practice, despite the context of e-Research, information growth, and funding agencies' mandates on research data archiving and sharing. Since most systematic and comprehensive studies are based on quantitative data practices, little…

  10. Sharing data is a shared responsibility: Commentary on: "The essential nature of sharing in science".

    Science.gov (United States)

    Giffels, Joe

    2010-12-01

    Research data should be made readily available. A robust data-sharing plan, led by the principal investigator of the research project, requires considerable administrative and operational resources. Because external support for data sharing is minimal, principal investigators should consider engaging existing institutional information experts, such as librarians and information systems personnel, to participate in data-sharing efforts.

  11. 77 FR 4277 - Proposed Data Sharing Activity

    Science.gov (United States)

    2012-01-27

    ... proposes to share business data for statistical purposes. More specifically, the Census Bureau will share selected business data of multi-location businesses with the U.S. Bureau of Labor Statistics (BLS) of the....S.C. 402 allow the Census Bureau to share business data for statistical purposes with the BLS...

  12. DataSync - sharing data via filesystem

    Science.gov (United States)

    Ulbricht, Damian; Klump, Jens

    2014-05-01

Research work is usually a cycle: hypothesize, collect data, corroborate the hypothesis, and finally publish the results. At several points in this sequence it is possible to base one's own work on the work of others. Perhaps suitable physical samples are already listed in the IGSN registry and there is no need to go on an excursion to acquire new ones. Perhaps the DataCite catalogue already lists metadata of datasets that meet the constraints of the hypothesis and are open for reappraisal. After all, working with the measured data to corroborate the hypothesis involves new methods as well as proven methods and different software tools. A cohort of intermediate data is created that can be shared with colleagues to discuss the research progress and receive a first evaluation. Consequently, the intermediate data should be versioned, so that one can easily get back to valid intermediate data after noticing a wrong track. Things are different for project managers: they want to know what is currently being done, what has been done, and what the last valid data is if somebody has to continue the work. To make life easier for members of small science projects we developed DataSync [1], a software tool for sharing and versioning data. DataSync is designed to synchronize directory trees between different computers of a research team over the internet. The software is developed as a Java application and watches a local directory tree for changes, which are replicated as eSciDoc objects into an eSciDoc infrastructure [2] using the eSciDoc REST API. Modifications to the local filesystem automatically create a new version of an eSciDoc object inside the eSciDoc infrastructure. This way individual folders can be shared between team members, while project managers can get a general idea of the current status by synchronizing whole project inventories. Additionally, XML metadata from separate files can be managed together with data files inside the eSciDoc objects.
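
The core step of watching a directory tree and classifying changes for versioning can be sketched as follows. This is a simplified stand-in, not DataSync itself (which is a Java application replicating changes as eSciDoc objects): it compares two content-hash snapshots of a tree.

```python
import hashlib
import tempfile
from pathlib import Path

def snapshot(root):
    """Map each file under root (by relative path) to a content hash."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in Path(root).rglob("*") if p.is_file()
    }

def detect_changes(old, new):
    """Classify files as added, removed, or modified between two snapshots.
    A sync tool in the spirit of DataSync would create a new object
    version for each change it detects."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    modified = sorted(k for k in set(old) & set(new) if old[k] != new[k])
    return added, removed, modified

with tempfile.TemporaryDirectory() as d:
    Path(d, "a.txt").write_text("v1")
    before = snapshot(d)
    Path(d, "a.txt").write_text("v2")      # modify an existing file
    Path(d, "b.txt").write_text("new")     # add a new file
    after = snapshot(d)
    changes = detect_changes(before, after)
print(changes)  # (['b.txt'], [], ['a.txt'])
```

Hashing file contents rather than comparing timestamps makes the comparison robust to clock differences between the team's computers, at the cost of reading every file.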

  13. Facilitating Data Sharing in the Behavioural Sciences

    Directory of Open Access Journals (Sweden)

    R de la Sablonnière

    2012-03-01

In most scientific fields, significant improvements have been made in terms of data sharing among scientists and researchers. Although there are clear benefits to data sharing, there is at least one field where this norm has yet to be developed: the behavioural sciences. In this paper, we propose an innovative methodology as a means to change existing norms within the behavioural sciences and move towards increased data sharing. Based on recent advances in social psychology, we theorize that a Survey Research Instrument that takes into account basic psychological processes can be effective in promoting data sharing norms.

  14. Data sharing by scientists: Practices and perceptions

    Science.gov (United States)

    Tenopir, C.; Allard, S.; Douglass, K.; Aydinoglu, A.U.; Wu, L.; Read, E.; Manoff, M.; Frame, M.

    2011-01-01

Background: Scientific research in the 21st century is more data intensive and collaborative than in the past. It is important to study the data practices of researchers - data accessibility, discovery, re-use, preservation and, particularly, data sharing. Data sharing is a valuable part of the scientific method, allowing for verification of results and extending research from prior results. Methodology/Principal Findings: A total of 1329 scientists participated in this survey exploring current data sharing practices and perceptions of the barriers and enablers of data sharing. Scientists do not make their data electronically available to others for various reasons, including insufficient time and lack of funding. Most respondents are satisfied with their current processes for the initial and short-term parts of the data or research lifecycle (collecting their research data; searching for, describing or cataloging, analyzing, and short-term storage of their data) but are not satisfied with long-term data preservation. Many organizations do not provide support to their researchers for data management in either the short or the long term. If certain conditions are met (such as formal citation and sharing reprints), respondents agree they are willing to share their data. There are also significant differences in data management practices and approaches based on primary funding agency, subject discipline, age, work focus, and world region. Conclusions/Significance: Barriers to effective data sharing and preservation are deeply rooted in the practices and culture of the research process as well as the researchers themselves. New mandates for data management plans from NSF and other federal agencies and world-wide attention to the need to share and preserve data could lead to changes. Large scale programs, such as the NSF-sponsored DataNET (including projects like DataONE), will both bring attention and resources to the issue and make it easier for scientists to apply sound …

  15. Data Sharing and Cardiology: Platforms and Possibilities.

    Science.gov (United States)

    Dey, Pranammya; Ross, Joseph S; Ritchie, Jessica D; Desai, Nihar R; Bhavnani, Sanjeev P; Krumholz, Harlan M

    2017-12-19

Sharing deidentified patient-level research data presents immense opportunities to all stakeholders involved in cardiology research and practice. Sharing data encourages the use of existing data for knowledge generation to improve practice, while also allowing for validation of disseminated research. In this review, we discuss key initiatives and platforms that have helped to accelerate progress toward greater sharing of data. These efforts are being prompted by government, universities, philanthropic sponsors of research, major industry players, and collaborations among some of these entities. As data sharing becomes a more common expectation, policy changes will be required to encourage and assist data generators with the process of sharing the data they create. Patients also will need access to their own data and to be empowered to share those data with researchers. Although medicine still lags behind other fields in achieving data sharing's full potential, cardiology research has the potential to lead the way.

  16. Data Sharing: Convert Challenges into Opportunities

    Directory of Open Access Journals (Sweden)

    Ana Sofia Figueiredo

    2017-12-01

Initiatives for sharing research data are opportunities to increase the pace of knowledge discovery and scientific progress. The reuse of research data has the potential to avoid the duplication of data sets and to bring new views from multiple analyses of the same data set. For example, the study of genomic variations associated with cancer profits from the universal collection of such data and helps in selecting the most appropriate therapy for a specific patient. However, data sharing poses challenges to the scientific community. These challenges are of an ethical, cultural, legal, financial, or technical nature. This article reviews the impact that data sharing has on science and society and presents guidelines to improve the efficient sharing of research data.

  17. Data Sharing: Convert Challenges into Opportunities.

    Science.gov (United States)

    Figueiredo, Ana Sofia

    2017-01-01

Initiatives for sharing research data are opportunities to increase the pace of knowledge discovery and scientific progress. The reuse of research data has the potential to avoid the duplication of data sets and to bring new views from multiple analyses of the same data set. For example, the study of genomic variations associated with cancer profits from the universal collection of such data and helps in selecting the most appropriate therapy for a specific patient. However, data sharing poses challenges to the scientific community. These challenges are of an ethical, cultural, legal, financial, or technical nature. This article reviews the impact that data sharing has on science and society and presents guidelines to improve the efficient sharing of research data.

  18. From shared data to sharing workflow: Merging PACS and teleradiology

    International Nuclear Information System (INIS)

    Benjamin, Menashe; Aradi, Yinon; Shreiber, Reuven

    2010-01-01

Due to a host of technological, interface, operational and workflow limitations, teleradiology and PACS/RIS were historically developed as separate systems serving different purposes. PACS/RIS handled local radiology storage and workflow management while teleradiology addressed remote access to images. Today advanced PACS/RIS support complete site radiology workflow for attending physicians, whether on-site or remote. In parallel, teleradiology has emerged as a service providing remote, off-hours coverage for emergency radiology and, to a lesser extent, subspecialty reading to subscribing sites and radiology groups. When attending radiologists use teleradiology for remote access to a site, they may share all relevant patient data and participate in the site's workflow like their on-site peers. The operation gets cumbersome and time consuming when these radiologists serve multiple sites, each requiring a different remote access, or when the sites do not employ the same PACS/RIS/reporting systems and do not share the same ownership. The least efficient operation is that of teleradiology companies engaged in reading for multiple facilities. As these services typically employ non-local radiologists, they are allowed to share some of the available patient data necessary to provide an emergency report but, by and large, they do not share the workflow of the sites they serve. Radiology stakeholders usually prefer to have their own radiologists perform all radiology tasks, including interpretation of off-hour examinations. It is possible with current technology to create a system that combines the benefits of local radiology services to multiple sites with the advantages offered by adding subspecialty and off-hours emergency services through teleradiology. Such a system increases efficiency for the radiology groups by enabling all users, regardless of location, to work 'local' and fully participate in the workflow of every site. We refer to such a system as SuperPACS.

  19. Sharing casting technological data on web site

    Directory of Open Access Journals (Sweden)

    Li Hailan

    2008-11-01

Based on database and ASP.NET technologies, a web platform for scientific data in the casting technology field has been developed. This paper presents the relevant data system structure, the approaches to data collection, the methods and policies applied in data sharing, and describes the data collected and shared to date. Statistics show that about 20,000 visitors in China view the related data through the web every day, indicating that many engineers and other practitioners are interested in the data.

  20. Data sharing, small science and institutional repositories.

    Science.gov (United States)

    Cragin, Melissa H; Palmer, Carole L; Carlson, Jacob R; Witt, Michael

    2010-09-13

Results are presented from the Data Curation Profiles project research on who is willing to share what data, with whom, and when. Emerging from scientists' discussions on sharing are several dimensions suggestive of the variation in both what it means 'to share' and how these processes are carried out. This research indicates that data curation services will need to accommodate a wide range of subdisciplinary data characteristics and sharing practices. As part of a larger set of strategies emerging across academic institutions, institutional repositories (IRs) will contribute to the stewardship and mobilization of scientific research data for e-Research and learning. There will be particular types of data that can be managed well in an IR context when characteristics and practices are well understood. Findings from this study elucidate scientists' views on 'sharable' forms of data, the particular representation that they view as most valued for reuse by others within their own research areas, and the anticipated duration for such reuse. Reported sharing incidents that provide insights into barriers to sharing and related concerns on data misuse are included.

  1. A simple tool for neuroimaging data sharing

    Directory of Open Access Journals (Sweden)

Christian Haselgrove

    2014-05-01

Data sharing is becoming increasingly common, but despite encouragement and facilitation by funding agencies, journals, and some research efforts, most neuroimaging data acquired today is still not shared, due to political, financial, social, and technical barriers that remain. In particular, technical solutions are few for researchers that are not part of larger efforts with dedicated sharing infrastructures, and social barriers such as the time commitment required to share can keep data from becoming publicly available. We present a system for sharing neuroimaging data, designed to be simple to use and to provide benefit to the data provider. The system consists of a server at the International Neuroinformatics Coordinating Facility (INCF) and user tools for uploading data to the server. The primary design principle for the user tools is ease of use: the user identifies a directory containing DICOM data, provides their INCF Portal authentication, and provides identifiers for the subject and imaging session. The user tool anonymizes the data and sends it to the server. The server then runs quality control routines on the data, and the data and the quality control reports are made public. The user retains control of the data and may change the sharing policy as they need. The result is that in a few minutes of the user's time, DICOM data can be anonymized and made publicly available, and an initial quality control assessment can be performed. The system is currently functional, and user tools and access to the public image database are available at http://xnat.incf.org/.
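
The anonymize-then-upload step this record describes can be sketched as a simple metadata scrub. The field names below are an assumed, illustrative subset only; real DICOM de-identification follows the standard's confidentiality profiles and covers many more attributes.

```python
# Hypothetical subset of identifying attributes to strip before sharing.
IDENTIFYING_FIELDS = {"PatientName", "PatientBirthDate", "PatientAddress",
                      "InstitutionName", "PatientID"}

def anonymize(header, subject_id, session_id):
    """Return a copy of an image-header dict with identifying fields
    removed and neutral, user-chosen identifiers substituted, mirroring
    the subject/session identifiers the INCF tool asks the user for."""
    clean = {k: v for k, v in header.items() if k not in IDENTIFYING_FIELDS}
    clean["PatientID"] = subject_id   # e.g. a project-local subject code
    clean["StudyID"] = session_id     # e.g. a session label
    return clean

header = {"PatientName": "DOE^JANE", "PatientID": "12345",
          "Modality": "MR", "StudyDate": "20140501"}
clean = anonymize(header, "sub-01", "ses-01")
print(clean)
```

Copying into a fresh dict, rather than deleting in place, keeps the original header intact in case the user later changes the sharing policy.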

  2. To share or not to share? Expected pros and cons of data sharing in radiological research.

    Science.gov (United States)

    Sardanelli, Francesco; Alì, Marco; Hunink, Myriam G; Houssami, Nehmat; Sconfienza, Luca M; Di Leo, Giovanni

    2018-06-01

The aims of this paper are to illustrate the trend towards data sharing, i.e. the regulated availability of the original patient-level data obtained during a study, and to discuss the expected advantages (pros) and disadvantages (cons) of data sharing in radiological research. Expected pros include the potential for verification of original results with alternative or supplementary analyses (including estimation of reproducibility), advancement of knowledge by providing new results by testing new hypotheses (not explored by the original authors) on pre-existing databases, larger scale analyses based on individual-patient data, enhanced multidisciplinary cooperation, reduced publication of false studies, improved clinical practice, and reduced cost and time for clinical research. Expected cons include the risk that the original authors may not be able to exploit the full potential of the data they obtained, possible failures in patients' privacy protection, technical barriers such as the lack of standard formats, and possible data misinterpretation. Finally, open issues regarding data ownership and the role of individual patients, advocacy groups and funding institutions in decision making about sharing of data and images are discussed. • Regulated availability of patient-level data of published clinical studies (data sharing) is expected. • Expected benefits include verification/advancement of knowledge, reduced cost/time of research, clinical improvement. • Potential drawbacks include faults in patients' identity protection and data misinterpretation.

  3. Data sharing system for lithography APC

    Science.gov (United States)

    Kawamura, Eiichi; Teranishi, Yoshiharu; Shimabara, Masanori

    2007-03-01

We have developed a simple and cost-effective data sharing system between fabs for lithography advanced process control (APC). Lithography APC requires process flow, inter-layer information, history information, mask information, and so on, so an inter-APC data sharing system becomes necessary when lots are to be processed in multiple fabs (usually two). The development and maintenance costs also have to be taken into account. The system handles the minimum information necessary to make trend predictions for the lots. Three types of data have to be shared for precise trend prediction. The first is device information for the lots, e.g., the process flow of the device and inter-layer information. The second is mask information from mask suppliers, e.g., pattern characteristics and pattern widths. The last is history data for the lots. Device information is an electronic file and easy to handle; the file format is common between APCs and is uploaded into the database. As for mask information sharing, mask information described in a common format is obtained from the mask vendor via a wide area network (WAN) and stored on the mask-information data server. This information is periodically transferred to one specific lithography-APC server and compiled into its database, and that server periodically delivers the mask information to every other lithography-APC server. The process-history data sharing system mainly consists of a function for delivering process-history data: when production lots are shipped to another fab, the product-related process-history data is delivered by the lithography-APC server of the shipping site. We have confirmed the function and effectiveness of the data sharing systems.
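
Exchanging mask information in a common format between servers, as this record describes, can be sketched as plain JSON serialization. The field names below are assumptions for illustration, not the actual inter-fab format.

```python
import json

def export_mask_info(mask_id, pattern_widths_nm):
    """Serialize mask information into a common, vendor-neutral JSON
    record for transfer to the mask-information data server."""
    record = {"mask_id": mask_id, "pattern_widths_nm": pattern_widths_nm}
    return json.dumps(record, sort_keys=True)

def import_mask_info(payload):
    """Parse a received record before compiling it into the APC database."""
    return json.loads(payload)

# A mask vendor exports a record; a lithography-APC server imports it.
payload = export_mask_info("MASK-0042", [65.0, 90.0])
record = import_mask_info(payload)
print(record["mask_id"])  # MASK-0042
```

A single agreed-upon serialization like this is what lets every lithography-APC server compile records from any vendor into its own database without per-vendor parsing code.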

  4. Data Sharing & Publishing at Nature Publishing Group

    Science.gov (United States)

    VanDecar, J. C.; Hrynaszkiewicz, I.; Hufton, A. L.

    2015-12-01

In recent years, the research community has come to recognize that upon-request data sharing has important limitations1,2. The Nature-titled journals feel that researchers have a duty to share data without undue qualifications, in a manner that allows others to replicate and build upon their published findings. Historically, the Nature journals have been strong supporters of data deposition in communities with existing data mandates, and have required data sharing upon request in all other cases. To help address some of the limitations of upon-request data sharing, the Nature titles have strengthened their existing data policies and forged a new partnership with Scientific Data, to promote wider data sharing in discoverable, citeable and reusable forms, and to ensure that scientists get appropriate credit for sharing3. Scientific Data is a new peer-reviewed journal for descriptions of research datasets, which works with a wide range of public data repositories4. Articles at Scientific Data may either expand on research publications at other journals or may be used to publish new datasets. The Nature Publishing Group has also signed the Joint Declaration of Data Citation Principles5, and Scientific Data is our first journal to include formal data citations. We are currently in the process of adding data citation support to our various journals. 1 Wicherts, J. M., Borsboom, D., Kats, J. & Molenaar, D. The poor availability of psychological research data for reanalysis. Am. Psychol. 61, 726-728, doi:10.1037/0003-066x.61.7.726 (2006). 2 Vines, T. H. et al. Mandated data archiving greatly improves access to research data. FASEB J. 27, 1304-1308, doi:10.1096/fj.12-218164 (2013). 3 Data-access practices strengthened. Nature 515, 312, doi:10.1038/515312a (2014). 4 More bang for your byte. Sci. Data 1, 140010, doi:10.1038/sdata.2014.10 (2014). 5 Data Citation Synthesis Group: Joint Declaration of Data Citation Principles. (FORCE11, San Diego, CA, 2014).
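
A formal data citation of the kind described here typically combines author, year, title, repository, and a resolvable DOI. A minimal sketch of assembling one follows; every value in the example, including the DOI, is a placeholder, and exact house styles vary by journal.

```python
def format_data_citation(authors, year, title, repository, doi):
    """Assemble a data citation string in a common author-year style,
    ending with a resolvable https://doi.org/ link so that readers
    and indexing services can locate the dataset."""
    return f"{authors} ({year}). {title}. {repository}. https://doi.org/{doi}"

# All values below are illustrative placeholders.
citation = format_data_citation("Smith J, Lee K", 2014,
                                "Example imaging dataset", "figshare",
                                "10.1234/example")
print(citation)
```

Expressing the DOI as an `https://doi.org/` URL rather than a bare `doi:` string is what makes the citation machine-resolvable as well as human-readable.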

  5. Data sharing and dual-use issues.

    Science.gov (United States)

    Bezuidenhout, Louise

    2013-03-01

The concept of dual-use encapsulates the potential for well-intentioned, beneficial scientific research to also be misused by a third party for malicious ends. The concept of dual-use challenges scientists to look beyond the immediate outcomes of their research and to develop an awareness of possible future (mis)uses of scientific research. Since 2001 much attention has been paid to the possible need to regulate the dual-use potential of the life sciences. Regulation initiatives fall under two broad categories: those that develop the ethical education of scientists and foster an awareness and responsibility of dual-use issues, and those which assess the regulation of information being generated by current research. Both types of initiatives are premised on a cautious, risk-averse philosophy which advocates careful examination of all future endpoints of research endeavors. This caution, advocated within initiatives such as pre-publication review of journal articles, contrasts with the obligation to share that underpins data sharing discussions. As the dual-use debate has yet to make a significant impact on data sharing discussions (and vice versa), it is possible that these two areas of knowledge control may present areas of ethical conflict for scientists, and thus need to be more closely examined. This paper examines the tension between the obligation to share exemplified by data sharing principles and the concerns raised by the risk-cautious culture of the dual-use debates. The paper concludes by reflecting on the issues of responsibility raised by dual-use as they relate to data sharing, such as the chain of custody for shared data.

  6. The backstage work of data sharing

    Energy Technology Data Exchange (ETDEWEB)

    Kervin, Karina E. [Univ. of Michigan, Ann Arbor, MI (United States); Cook, Robert B. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Michener, William K. [Univ. of New Mexico, Albuquerque, NM (United States)

    2014-11-09

    Conventional wisdom suggests that there are benefits to creating shared repositories of scientific data. Funding agencies require that the data from sponsored projects be shared publicly, but individual researchers often see little personal benefit to offset the work of creating easily sharable data. These conflicting forces have led to the emergence of a new role to support researchers: data managers. This paper identifies key differences between the socio-technical context of data managers and other "human infrastructure" roles articulated previously in the Computer Supported Cooperative Work (CSCW) literature and summarizes the challenges that data managers face when accepting data for archival and reuse. Finally, while data managers' work is critical for advancing science and science policy, it is often invisible and under-appreciated because it takes place behind the scenes.

  7. Open Access Data Sharing in Genomic Research

    Directory of Open Access Journals (Sweden)

    Stacey Pereira

    2014-08-01

    Full Text Available The current emphasis on broad sharing of human genomic data generated in research in order to maximize utility and public benefit is a significant legacy of the Human Genome Project. Concerns about privacy and discrimination have led to policy responses that restrict access to genomic data as the means for protecting research participants. Our research and experience show, however, that a considerable number of research participants agree to open access sharing of their genomic data when given the choice. General policies that limit access to all genomic data fail to respect the autonomy of these participants and, at the same time, unnecessarily limit the utility of the data. We advocate instead a more balanced approach that allows for individual choice and encourages informed decision making, while protecting against the misuse of genomic data through enhanced legislation.

  8. 2012 Information Sharing Environment Performance Data

    Data.gov (United States)

    Information Sharing Environment — This is a survey of federal departments and agencies who share terrorism information and are therefore considered part of the Information Sharing Environment. The...

  9. 2013 Information Sharing Environment Performance Data

    Data.gov (United States)

    Information Sharing Environment — This is a survey of federal departments and agencies who share terrorism information and are therefore considered part of the Information Sharing Environment. The...

  10. Examining Data Repository Guidelines for Qualitative Data Sharing.

    Science.gov (United States)

    Antes, Alison L; Walsh, Heidi A; Strait, Michelle; Hudson-Vitale, Cynthia R; DuBois, James M

    2018-02-01

    Qualitative data provide rich information on research questions in diverse fields. Recent calls for increased transparency and openness in research emphasize data sharing. However, qualitative data sharing has yet to become the norm internationally and is particularly uncommon in the United States. Guidance for archiving and secondary use of qualitative data is required for progress in this regard. In this study, we review the benefits and concerns associated with qualitative data sharing and then describe the results of a content analysis of guidelines from international repositories that archive qualitative data. A minority of repositories provide qualitative data sharing guidelines. Of the guidelines available, there is substantial variation in whether specific topics are addressed. Some topics, such as removing direct identifiers, are consistently addressed, while others, such as providing an anonymization log, are not. We discuss the implications of our study for education, best practices, and future research.

  11. Sharing information among existing data sources

    Science.gov (United States)

    Ashley, W. R., III

    1999-01-01

    share crucial investigative information across jurisdictional bounds by establishing a communications infrastructure for all of its law enforcement jurisdictions. The Criminal Justice Network (CJ-Net) is a statewide TCP/IP network, dedicated to the sharing of law enforcement information. CJ-Net is managed and maintained by the Florida Department of Law Enforcement (FDLE) and provides open access and privileges to any criminal justice agency, including the state court and penitentiary systems. In addition to Florida, other states, such as North Carolina, are also beginning to implement common protocol communication infrastructures and architectures in order to link local jurisdictions together throughout the state. The law enforcement domain is in an optimum situation for information-sharing technologies. Communication infrastructures are continually established, and as such, action is required to effectively use these networks to their full potential. Information technologies that are best suited for the law enforcement domain must be evaluated and implemented in a cost-effective manner. Unlike the Defense Department and other large federal agencies, individual jurisdictions at both the local and state level cannot afford to expend limited resources on research and development of prototype systems. Therefore, we must identify enabling technologies that have matured in related domains and transition them into law enforcement at a minimum cost. Crucial to this measure is the selection of the appropriate levels of information-sharing technologies to be inserted. Information-sharing technologies that are unproven or have extensive recurring costs are not suitable for this domain. Information-sharing technologies traditionally exist between two distinct polar bounds: the data warehousing approach and mediation across distributed heterogeneous data sources. These two ends of the spectrum represent extremely different philosophies in accomplishing the same goal. In the

  12. Interoperable Data Sharing for Diverse Scientific Disciplines

    Science.gov (United States)

    Hughes, John S.; Crichton, Daniel; Martinez, Santa; Law, Emily; Hardman, Sean

    2016-04-01

    For diverse scientific disciplines to interoperate they must be able to exchange information based on a shared understanding. To capture this shared understanding, we have developed a knowledge representation framework using ontologies and ISO level archive and metadata registry reference models. This framework provides multi-level governance, evolves independent of implementation technologies, and promotes agile development, namely adaptive planning, evolutionary development, early delivery, continuous improvement, and rapid and flexible response to change. The knowledge representation framework is populated through knowledge acquisition from discipline experts. It is also extended to meet specific discipline requirements. The result is a formalized and rigorous knowledge base that addresses data representation, integrity, provenance, context, quantity, and their relationships within the community. The contents of the knowledge base are translated and written to files in appropriate formats to configure system software and services, provide user documentation, validate ingested data, and support data analytics. This presentation will provide an overview of the framework, present the Planetary Data System's PDS4 as a use case that has been adopted by the international planetary science community, describe how the framework is being applied to other disciplines, and share some important lessons learned.

  13. Balancing data sharing requirements for analyses with data sensitivity

    Science.gov (United States)

    Jarnevich, C.S.; Graham, J.J.; Newman, G.J.; Crall, A.W.; Stohlgren, T.J.

    2007-01-01

    Data sensitivity can pose a formidable barrier to data sharing. Knowledge of species' current distributions from data sharing is critical for the creation of watch lists and an early warning/rapid response system and for model generation for the spread of invasive species. We have created an on-line system to synthesize disparate datasets of non-native species locations that includes a mechanism to account for data sensitivity. Data contributors are able to mark their data as sensitive. These data are then 'fuzzed' to quarter-quadrangle grid cells in mapping applications and downloaded files, but the actual locations are available for analyses. We propose that this system overcomes the hurdles to data sharing posed by sensitive data. © 2006 Springer Science+Business Media B.V.
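
The "fuzzing" step described above amounts to snapping a precise coordinate to the containing grid cell before display or download, while retaining the exact location internally for analysis. A minimal sketch, assuming a 0.0625-degree cell (a quarter of a USGS 7.5-minute quadrangle; the abstract does not state the exact cell size or snapping rule):

```python
# Sketch of location "fuzzing" for sensitive species records:
# snap a precise lat/lon to the south-west corner of its grid cell.
# The 0.0625-degree quarter-quadrangle cell size is an assumption.
import math

CELL = 0.0625  # degrees; assumed cell size

def fuzz(lat, lon, cell=CELL):
    """Return the SW corner of the grid cell containing the point."""
    return (math.floor(lat / cell) * cell,
            math.floor(lon / cell) * cell)

# A public map would plot fuzz(lat, lon); analyses use the raw point.
```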

  14. Collaborating and sharing data in epilepsy research.

    Science.gov (United States)

    Wagenaar, Joost B; Worrell, Gregory A; Ives, Zachary; Dümpelmann, Matthias; Litt, Brian; Schulze-Bonhage, Andreas

    2015-06-01

    Technological advances are dramatically accelerating translational research in epilepsy. Neurophysiology, imaging, and metadata are now recorded digitally in most centers, enabling quantitative analysis. Basic and translational research opportunities to use these data are exploding, but academic and funding cultures prevent this potential from being realized. Research on epileptogenic networks, antiepileptic devices, and biomarkers could progress rapidly if collaborative efforts to digest this "big neuro data" could be organized. Higher temporal and spatial resolution data are driving the need for novel multidimensional visualization and analysis tools. Crowd-sourced science, of the kind that drives innovation in computer science, could easily be mobilized for these tasks, were it not for competition for funding and attribution and the lack of standard data formats and platforms. As these efforts mature, there is a great opportunity to advance epilepsy research through data sharing and to increase collaboration within the international research community.

  15. Common Errors in Ecological Data Sharing

    Directory of Open Access Journals (Sweden)

    Robert B. Cook

    2013-04-01

    Full Text Available Objectives: (1) to identify common errors in data organization and metadata completeness that would preclude a “reader” from being able to interpret and re-use the data for a new purpose; and (2) to develop a set of best practices derived from these common errors that would guide researchers in creating more usable data products that could be readily shared, interpreted, and used. Methods: We used directed qualitative content analysis to assess and categorize data and metadata errors identified by peer reviewers of data papers published in the Ecological Society of America’s (ESA) Ecological Archives. Descriptive statistics provided the relative frequency of the errors identified during the peer review process. Results: There were seven overarching error categories: Collection & Organization, Assure, Description, Preserve, Discover, Integrate, and Analyze/Visualize. These categories represent errors researchers regularly make at each stage of the Data Life Cycle. Collection & Organization and Description errors were among the most common, both occurring in over 90% of the papers. Conclusions: Publishing data for sharing and reuse is error prone, and each stage of the Data Life Cycle presents opportunities for mistakes. The most common errors occurred when the researcher did not provide adequate metadata to enable others to interpret and potentially re-use the data. Fortunately, these mistakes can be minimized by carefully recording all details about study context, data collection, QA/QC, and analytical procedures from the beginning of a research project and then including this descriptive information in the metadata.

  16. Remote monitoring, data sharing, and information security

    International Nuclear Information System (INIS)

    Parise, D.; Dalton, C.; Regula, J.

    2009-01-01

    Full-text: Remote Monitoring (RM) is being used with increased frequency by the IAEA for safeguards in many parts of the world. This is especially true in Japan where there are also agreements for data sharing. The automated nature of RM lends itself to assist in modernizing old cumbersome data sharing techniques. For example, electronic declarations can be received, parsed and checked; then data for that time period and facility can be automatically released. This could save considerable time and effort now spent processing paper declarations and hand copying data. But care must be taken to ensure the parsing, transfers, and connections for these systems are secure. Advanced authentication and encryption techniques are still vital in this process. This paper will describe how to improve security with vulnerability assessments, the use of certificates, avoiding compromising dial-up connections and other methods. A detailed network layout will be presented that will resemble a future RM collaboration with the IAEA and the Japanese. From this network design, key strategic security points will be identified and suggestions will be made to help secure them. (author)

  17. Collection, verification, sharing and dissemination of data

    DEFF Research Database (Denmark)

    Saarnak, Christopher; Utzinger, Jürg; Kristensen, Thomas K.

    2013-01-01

    The scientific community is charged with growing demands regarding the management of project data and outputs and the dissemination of key results to various stakeholders. We discuss experiences and lessons from CONTRAST, a multidisciplinary alliance that had been funded by the European Commission … that would enable all project partners to have access through a password-protected Internet-based data portal. This required unanimous agreement on several common standardised sample forms, ranging from the mundane but important issue of using the same units of measurement to more complex challenges, for instance agreeing on the same protocols for double-treatment of praziquantel in different settings. With the experiences gained by the CONTRAST project, this paper discusses issues of data management and sharing in research projects in the light of the current donor demand, and offers advice and specific …

  18. Social Media, Education and Data Sharing

    Science.gov (United States)

    King, T. A.; Walker, R. J.; Masters, A.

    2011-12-01

    Social media is a blending of technology and social interactions which allows for the creation and exchange of user-generated content. Social media started as conversations between groups of people; now companies are using social media to communicate with customers and politicians use it to communicate with their constituents. Social media is now finding uses in the science communities. This adoption is driven by students' expectations that technology will be an integral part of their research and that it will match the technology they use in their social lives. Students are using social media to keep informed and collaborate with others. They have also replaced notepads with smart mobile devices. We have been introducing social media components into Virtual Observatories as a way to quickly access and exchange information with a tap or a click. We discuss the use of Quick Response (QR) codes, Digital Object Identifiers (DOIs), unique identifiers, Twitter, Facebook and tiny URL redirects as ways to enable easier sharing of data and information. We also discuss what services and features are needed in a Virtual Observatory to make data sharing with social media possible.

  19. Sharing Earth Observation Data for Health Management

    Science.gov (United States)

    Cox, E. L., Jr.

    2015-12-01

    While the global community is struck by pandemics and epidemics from time to time, the ability to fully utilize earth observations and integrate environmental information has been limited - until recently. Mature scientific understanding is making new levels of situational awareness possible when the relevant data are available and shared in a timely and usable manner. Satellite and other remote sensing tools have been used to observe, monitor, assess and predict weather and water impacts for decades. In the last few years much of this has included a focus on the ability to monitor changes on climate scales that suggest changes in the quantity and quality of ecosystem resources, as well as the "one-health" approach, where trans-disciplinary links between environmental, animal and vegetative health may provide indications of the best ways to manage susceptibility to infectious disease or outbreaks. But the scale of impacts and the availability of information from earth observing satellites, airborne platforms, health tracking systems and surveillance networks offer new integrated tools. This presentation will describe several recent events, such as Superstorm Sandy in the United States and the Ebola outbreak in Africa, where public health and health infrastructure were exposed to environmental hazards and where lessons learned from disaster response about sharing data have been effective in risk reduction.

  20. Design and study of a geosciences data share platform: platform framework, data interoperability, share approach

    Science.gov (United States)

    Lu, H.; Yi, D.

    2010-12-01

    Deep exploration is one of the important approaches to geoscience research. We began such work in the 1980s and have since accumulated a large volume of data. Researchers usually integrate data from both space exploration and deep exploration to study geological structures and represent the Earth's subsurface, analyzing and explaining on the basis of the integrated data. Because the exploration approaches differ, the resulting data are heterogeneous, and data access has therefore long been a confusing issue for researchers. The problem of data sharing and interoperability had to be solved during the development of the SinoProbe research project. Drawing on well-known domestic and overseas exploration projects and geoscience data platforms, this work explores solutions for data sharing and interoperability. Based on SOA, we present a deep exploration data sharing framework comprising three levels: the data level handles data storage and the integration of heterogeneous data; the middle level provides data services for geophysics, geochemistry, etc. by means of Web services, and supports application composition using GIS middleware and the Eclipse RCP; the interaction level gives professional and non-professional users access to data at different levels of accuracy. The framework adopts the GeoSciML data interchange approach. GeoSciML is a geoscience information markup language built as an application of the OpenGIS Consortium's (OGC) Geography Markup Language (GML). It maps heterogeneous data into one common earth frame and enables interoperation. In this article we discuss how the heterogeneous data are integrated and shared in the SinoProbe project.
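
As a toy illustration of the markup-based interchange that GeoSciML builds on GML, the fragment below uses simplified, hypothetical element names (not the real GeoSciML schema) to show how a feature encoded by one system can be decoded by another using only the shared markup convention:

```python
# Illustrative GML-style interchange: one system emits a feature as
# namespaced XML; another parses it back into a native structure.
# Element names here are simplified assumptions, not GeoSciML proper.
import xml.etree.ElementTree as ET

DOC = """
<Feature xmlns:gml="http://www.opengis.net/gml">
  <name>profile-01</name>
  <gml:pos>30.67 103.93</gml:pos>
</Feature>
"""

def parse_feature(xml_text):
    """Decode a feature's name and position from the shared markup."""
    ns = {"gml": "http://www.opengis.net/gml"}
    root = ET.fromstring(xml_text)
    lat, lon = map(float, root.find("gml:pos", ns).text.split())
    return {"name": root.findtext("name"), "lat": lat, "lon": lon}
```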

  1. Case Study: Indigenous Knowledge and Data Sharing

    Directory of Open Access Journals (Sweden)

    Cameron Neylon

    2017-10-01

    Full Text Available The IDRC-funded project 'Empowering Indigenous Peoples and Knowledge Systems Related to Climate Change and Intellectual Property Rights' is part of the Open and Collaborative Science in Development Network (OCSDNet). The project “examines processes of open and collaborative science related to indigenous peoples’ knowledge, climate change and intellectual property rights”. Natural Justice, the lead organisation, has a strong ethical stance on the agency and control over knowledge being vested with the contributing project participants, communities of the Nama and Griqua peoples of the Western Cape of South Africa. The project focuses on questions of how climate change is affecting these communities, how they produce and maintain knowledge relating to climate change, how that knowledge is characterised and shared (or not) with wider publics, and how legal frameworks promote or hinder the agenda of these indigenous communities and their choices to communicate and collaborate with wider publics. Indigenous Knowledge is an area where ethical issues of informed consent, historical injustice, non-compatible epistemologies and political, legal, and economic issues all collide in ways that challenge western and Anglo-American assumptions about data sharing. The group seeks to strongly model and internally critique its own ethical stance in the process of its research, through, for instance, using community contracts and questioning institutional informed consent systems.

  2. HydroShare: A Platform for Collaborative Data and Model Sharing in Hydrology

    Science.gov (United States)

    Tarboton, D. G.; Idaszak, R.; Horsburgh, J. S.; Ames, D. P.; Goodall, J. L.; Couch, A.; Hooper, R. P.; Dash, P. K.; Stealey, M.; Yi, H.; Bandaragoda, C.; Castronova, A. M.

    2017-12-01

    HydroShare is an online collaboration system for sharing hydrologic data, analytical tools, and models. It supports the sharing of and collaboration around "resources," which are defined by standardized content types for data formats and models commonly used in hydrology. With HydroShare you can: share your data and models with colleagues; manage who has access to the content that you share; share, access, visualize and manipulate a broad set of hydrologic data types and models; use the web services application programming interface (API) to program automated and client access; publish data and models and obtain a citable digital object identifier (DOI); aggregate your resources into collections; discover and access data and models published by others; and use web apps to visualize, analyze and run models on data in HydroShare. This presentation will describe the functionality and architecture of HydroShare, highlighting its use as a virtual environment supporting education and research. HydroShare has components that support: (1) resource storage, (2) resource exploration, and (3) web apps for actions on resources. The HydroShare data discovery, sharing and publishing functions, as well as HydroShare web apps, provide the capability to analyze data and execute models completely in the cloud (on servers remote from the user), overcoming desktop platform limitations. The HydroShare GIS app provides a basic capability to visualize spatial data. The HydroShare JupyterHub Notebook app provides flexible and documentable execution of Python code snippets for analysis and modeling, in a way that results can be shared among HydroShare users and groups to support research collaboration and education. We will discuss how these developments can support different types of educational efforts in hydrology, where being completely web based is valuable in an educational setting: students all have access to the same functionality regardless of their computers.
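
The web services API mentioned above can be scripted for automated access. A hedged sketch: the `hsapi/resource/` endpoint path and the `type` query filter are assumptions for illustration, not a verified description of HydroShare's current API; check the service documentation before relying on them:

```python
# Sketch of programmatic access to a HydroShare-style REST API.
# Endpoint path and filter names are assumptions, not verified.
import json
from urllib.parse import urlencode, urljoin
from urllib.request import Request, urlopen

BASE = "https://www.hydroshare.org/hsapi/"  # assumed service root

def resource_list_url(base=BASE, **filters):
    """Build the URL that lists shared resources, optionally filtered."""
    url = urljoin(base, "resource/")
    return url + ("?" + urlencode(filters) if filters else "")

def fetch_resources(url, timeout=10):
    """Fetch and decode the JSON resource listing (network required)."""
    req = Request(url, headers={"Accept": "application/json"})
    with urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

A client would call `fetch_resources(resource_list_url(type="GenericResource"))` and page through the results; authentication would be needed for private resources.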

  3. Data Sharing For Precision Medicine: Policy Lessons And Future Directions.

    Science.gov (United States)

    Blasimme, Alessandro; Fadda, Marta; Schneider, Manuel; Vayena, Effy

    2018-05-01

    Data sharing is a precondition of precision medicine. Numerous organizations have produced abundant guidance on data sharing. Despite such efforts, data are not being shared to a degree that can trigger the expected data-driven revolution in precision medicine. We set out to explore why. Here we report the results of a comprehensive analysis of data-sharing guidelines issued over the past two decades by multiple organizations. We found that the guidelines overlap on a restricted set of policy themes. However, we observed substantial fragmentation in the policy landscape across specific organizations and data types. This may have contributed to the current stalemate in data sharing. To move toward a more efficient data-sharing ecosystem for precision medicine, policy makers should explore innovative ways to cope with central policy themes such as privacy, consent, and data quality; focus guidance on interoperability, attribution, and public engagement; and promote data-sharing policies that can be adapted to multiple data types.

  4. Data Sharing Interviews with Crop Sciences Faculty: Why They Share Data and How the Library Can Help

    Science.gov (United States)

    Williams, Sarah C.

    2013-01-01

    This study was designed to generate a deeper understanding of data sharing by targeting faculty members who had already made data publicly available. During interviews, crop scientists at the University of Illinois at Urbana-Champaign were asked why they decided to share data, why they chose a data sharing method (e. g., supplementary file,…

  5. Journal Data Sharing Policies and Statistical Reporting Inconsistencies in Psychology

    Directory of Open Access Journals (Sweden)

    Michèle B. Nuijten

    2017-12-01

    Full Text Available In this paper, we present three retrospective observational studies that investigate the relation between data sharing and statistical reporting inconsistencies. Previous research found that reluctance to share data was related to a higher prevalence of statistical errors, often in the direction of statistical significance (Wicherts, Bakker, & Molenaar, 2011). We therefore hypothesized that journal policies about data sharing and data sharing itself would reduce these inconsistencies. In Study 1, we compared the prevalence of reporting inconsistencies in two similar journals on decision making with different data sharing policies. In Study 2, we compared reporting inconsistencies in psychology articles published in PLOS journals (with a data sharing policy) and Frontiers in Psychology (without a stipulated data sharing policy). In Study 3, we looked at papers published in the journal Psychological Science to check whether papers with or without an Open Practice Badge differed in the prevalence of reporting errors. Overall, we found no relationship between data sharing and reporting inconsistencies. We did find that journal policies on data sharing seem extremely effective in promoting data sharing. We argue that open data is essential in improving the quality of psychological science, and we discuss ways to detect and reduce reporting inconsistencies in the literature.

  6. Sharing Neuron Data: Carrots, Sticks, and Digital Records.

    Directory of Open Access Journals (Sweden)

    Giorgio A Ascoli

    2015-10-01

    Full Text Available Routine data sharing is greatly benefiting several scientific disciplines, such as molecular biology, particle physics, and astronomy. Neuroscience data, in contrast, are still rarely shared, greatly limiting the potential for secondary discovery and the acceleration of research progress. Although the attitude toward data sharing is non-uniform across neuroscience subdomains, widespread adoption of data sharing practice will require a cultural shift in the community. Digital reconstructions of axonal and dendritic morphology constitute a particularly "sharable" kind of data. The popularity of the public repository NeuroMorpho.Org demonstrates that data sharing can benefit both users and contributors. Increased data availability is also catalyzing the grassroots development and spontaneous integration of complementary resources, research tools, and community initiatives. Even in this rare successful subfield, however, more data are still unshared than shared. Our experience as developers and curators of NeuroMorpho.Org suggests that greater transparency regarding the expectations and consequences of sharing (or not sharing) data, combined with public disclosure of which datasets are shared and which are not, may expedite the transition to community-wide data sharing.

  7. A hybrid personalized data recommendation approach for geoscience data sharing

    Science.gov (United States)

    WANG, M.; Wang, J.

    2016-12-01

    Recommender systems are effective tools helping Internet users overcome information overload. The two most widely used recommendation algorithms are collaborative filtering (CF) and content-based filtering (CBF). A number of recommender systems based on those two algorithms have been developed for multimedia, online sales, and other domains. Each of the two algorithms has its advantages and shortcomings, and hybrid approaches that combine them are better choices in many cases. In the geoscience data sharing domain, where the items (datasets) are more informative (in space and time) and domain-specific, no recommender system has been specialized for data users. This paper reports a dynamic weighted hybrid recommendation algorithm that combines CF and CBF for a geoscience data sharing portal. We first derive users' ratings on items from their historical visit times using Jenks Natural Breaks. In the CBF part, we incorporate the space, time, and subject information of geoscience datasets to compute item similarity. Predicted ratings were computed with the k-NN method separately using CBF and CF, and then combined with weights. With a training dataset we sought the best model describing the ideal weights as a function of users' co-rating numbers; a logarithmic function was confirmed to be the best model. The model was then used to tune the weights of CF and CBF on a per-user-item basis with a test dataset. Evaluation results show that the dynamic weighted approach outperforms either the CF or the CBF approach alone in terms of Precision and Recall.
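
The dynamic weighting idea can be sketched as below. The logarithmic weight model and its coefficient `a` are illustrative assumptions: the abstract reports only that a logarithmic function of the co-rating count fit best, not its exact form or parameters.

```python
# Sketch of a dynamic weighted CF/CBF hybrid: the more co-ratings a
# user-item pair has, the more the (data-driven) CF prediction is
# trusted over the (metadata-driven) CBF prediction.
# The weight model w = min(1, a*log(1+n)) is an assumed illustration.
import math

def hybrid_rating(cf_pred, cbf_pred, n_corated, a=0.3):
    """Blend CF and CBF predicted ratings for one user-item pair."""
    w_cf = min(1.0, a * math.log(1 + n_corated))  # assumed model
    return w_cf * cf_pred + (1 - w_cf) * cbf_pred
```

With no co-ratings the blend falls back entirely to content-based filtering; with many co-ratings it converges to the collaborative prediction.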

  8. Advancing Collaboration through Hydrologic Data and Model Sharing

    Science.gov (United States)

    Tarboton, D. G.; Idaszak, R.; Horsburgh, J. S.; Ames, D. P.; Goodall, J. L.; Band, L. E.; Merwade, V.; Couch, A.; Hooper, R. P.; Maidment, D. R.; Dash, P. K.; Stealey, M.; Yi, H.; Gan, T.; Castronova, A. M.; Miles, B.; Li, Z.; Morsy, M. M.

    2015-12-01

    HydroShare is an online, collaborative system for open sharing of hydrologic data, analytical tools, and models. It supports the sharing of and collaboration around "resources" which are defined primarily by standardized metadata, content data models for each resource type, and an overarching resource data model based on the Open Archives Initiative's Object Reuse and Exchange (OAI-ORE) standard and a hierarchical file packaging system called "BagIt". HydroShare expands the data sharing capability of the CUAHSI Hydrologic Information System by broadening the classes of data accommodated to include geospatial and multidimensional space-time datasets commonly used in hydrology. HydroShare also includes new capability for sharing models, model components, and analytical tools and will take advantage of emerging social media functionality to enhance information about and collaboration around hydrologic data and models. It also supports web services and server/cloud based computation operating on resources for the execution of hydrologic models and analysis and visualization of hydrologic data. HydroShare uses iRODS as a network file system for underlying storage of datasets and models. Collaboration is enabled by casting datasets and models as "social objects". Social functions include both private and public sharing, formation of collaborative groups of users, and value-added annotation of shared datasets and models. The HydroShare web interface and social media functions were developed using the Django web application framework coupled to iRODS. Data visualization and analysis is supported through the Tethys Platform web GIS software stack. Links to external systems are supported by RESTful web service interfaces to HydroShare's content. This presentation will introduce the HydroShare functionality developed to date and describe ongoing development of functionality to support collaboration and integration of data and models.
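
The "BagIt" packaging mentioned above is a simple file-system convention: a `bagit.txt` declaration, a `data/` payload directory, and a checksum manifest. A minimal illustrative sketch (a fully conformant bag has additional required elements, e.g. tag manifests and bag-info metadata):

```python
# Minimal sketch of BagIt-style packaging: declaration file, data/
# payload directory, and an MD5 payload manifest. Illustrative only;
# see the BagIt specification (RFC 8493) for full conformance rules.
import hashlib
from pathlib import Path

def make_bag(bag_dir, payload):
    """Write payload {name: bytes} into a BagIt-style bag directory."""
    bag = Path(bag_dir)
    (bag / "data").mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for name, content in sorted(payload.items()):
        (bag / "data" / name).write_bytes(content)
        digest = hashlib.md5(content).hexdigest()
        manifest_lines.append(f"{digest}  data/{name}")
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n")
    (bag / "manifest-md5.txt").write_text("\n".join(manifest_lines) + "\n")
    return bag
```

The manifest lets a receiving repository verify every payload file before accepting the resource, which is what makes the format useful for exchange between systems.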

  9. Data Publishing and Sharing Via the THREDDS Data Repository

    Science.gov (United States)

    Wilson, A.; Caron, J.; Davis, E.; Baltzer, T.

    2007-12-01

    The terms "Team Science" and "Networked Science" have been coined to describe a virtual organization of researchers tied via some intellectual challenge, but often located in different organizations and locations. A critical component to these endeavors is publishing and sharing of content, including scientific data. Imagine pointing your web browser to a web page that interactively lets you upload data and metadata to a repository residing on a remote server, which can then be accessed by others in a secure fashion via the web. While any content can be added to this repository, it is designed particularly for storing and sharing scientific data and metadata. Server support includes uploading of data files that can subsequently be subsetted, aggregated, and served in NetCDF or other scientific data formats. Metadata can be associated with the data and interactively edited. The THREDDS Data Repository (TDR) is a server that provides client initiated, on demand, location transparent storage for data of any type that can then be served by the THREDDS Data Server (TDS). The TDR provides functionality to: * securely store and "own" data files and associated metadata * upload files via HTTP and gridftp * upload a collection of data as a single file * modify and restructure repository contents * incorporate metadata provided by the user * generate additional metadata programmatically * edit individual metadata elements The TDR can exist separately from a TDS, serving content via HTTP. Also, it can work in conjunction with the TDS, which includes functionality to provide: * access to data in a variety of formats via -- OPeNDAP -- OGC Web Coverage Service (for gridded datasets) -- bulk HTTP file transfer * a NetCDF view of datasets in NetCDF, OPeNDAP, HDF-5, GRIB, and NEXRAD formats * serving of very large volume datasets, such as NEXRAD radar * aggregation into virtual datasets * subsetting via OPeNDAP and NetCDF Subsetting services This talk will discuss TDR
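The HTTP upload path can be sketched from the client side. The helper below assembles a multipart/form-data request body using only the standard library; the form field names and metadata keys are assumptions for illustration, not the TDR's documented API, and the request is built but not sent:

```python
import io
import uuid

def build_multipart(field_name, filename, file_bytes, metadata=None):
    """Build a multipart/form-data body for an HTTP file upload.

    Returns (body_bytes, content_type). Field and metadata names here
    are hypothetical; a real TDR client would use the server's
    documented form fields.
    """
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    # Plain form fields carry user-supplied metadata alongside the file.
    for key, value in (metadata or {}).items():
        buf.write(f"--{boundary}\r\n"
                  f'Content-Disposition: form-data; name="{key}"\r\n\r\n'
                  f"{value}\r\n".encode())
    # The file part itself, sent as an opaque byte stream.
    buf.write(f"--{boundary}\r\n"
              f'Content-Disposition: form-data; name="{field_name}"; '
              f'filename="{filename}"\r\n'
              f"Content-Type: application/octet-stream\r\n\r\n".encode())
    buf.write(file_bytes)
    buf.write(f"\r\n--{boundary}--\r\n".encode())
    return buf.getvalue(), f"multipart/form-data; boundary={boundary}"
```

The returned body and content type would be attached to an ordinary POST (e.g. via `urllib.request.Request`) against the repository's upload endpoint.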

  10. Data sharing in international transboundary contexts: The Vietnamese perspective on data sharing in the Lower Mekong Basin

    Science.gov (United States)

    Thu, Hang Ngo; Wehn, Uta

    2016-05-01

    Transboundary data sharing is widely recognised as a necessary element in the successful handling of water-related climate change issues, as it is a means towards integrated water resources management (IWRM). However, in practice it is often a challenge to achieve it. The Mekong River Commission (MRC), an inter-governmental agency established by Cambodia, Lao PDR, Thailand and Vietnam, has adopted IWRM in its water strategy plan in order to properly manage the transboundary waters of the Mekong River. In this context, data sharing procedures were institutionalised and have been officially implemented by the four member countries since 2001. This paper uses a systematic approach to identify the extent of data sharing and the factors influencing the willingness of key individuals in the Vietnam National Mekong Committee and its Primary Custodians to share data. We find that the initial objectives of the Procedures for Data and Information Exchange and Sharing (PDIES) have not been fully achieved and, further, that Vietnam has much to gain and little to lose by engaging in data sharing in the MRC context. The primary motivation for data sharing stems from the desire to protect national benefits and to prevent upstream countries from overexploiting the shared water resources. However, data sharing is hindered by a lack of national regulations in the Vietnam context concerning data sharing between state agencies and outdated information management systems.

  11. An ethical framework for sharing patient data without consent

    Directory of Open Access Journals (Sweden)

    Robert Navarro

    2008-12-01

    Discussion The hard problem of non-consented data sharing should be divided into the easier (though non-trivial) ones of data and recipient breach risk measurement. Directed research in these two areas will help move the data sharing problem into the 'solved' pile.

  12. Principles of data management facilitating information sharing

    CERN Document Server

    Gordon, Keith

    2013-01-01

    Data is a valuable corporate asset and its effective management can be vital to success. This professional guide covers all the key areas of data management, including database development and corporate data modelling. The new edition covers web technology and its relation to databases and includes material on the management of master data.

  13. Principles of Data Management Facilitating Information Sharing

    CERN Document Server

    Gordon, Keith

    2007-01-01

    Organisations increasingly view data as a valuable corporate asset and its effective management can be vital to success. This professional guide covers all the key areas including database development, data quality and corporate data modelling. It provides the knowledge and techniques required to successfully implement the data management function.

  14. C-share: Optical circuits sharing for software-defined data-centers [arXiv

    DEFF Research Database (Denmark)

    Ben-Itzhak, Yaniv; Caba, Cosmin Marius; Schour, Liran

    2016-01-01

    Integrating optical circuit switches in data-centers is an ongoing research challenge. In recent years, state-of-the-art solutions introduce hybrid packet/circuit architectures for different optical circuit switch technologies, control techniques, and traffic rerouting methods. These solutions are based on separate packet and circuit planes which do not have the ability to utilize an optical circuit with flows that do not arrive from, or are not delivered to, switches directly connected to the circuit’s end-points. Moreover, current SDN-based elephant flow rerouting methods require a forwarding rule for each flow, which raises scalability issues. In this paper, we present C-Share - a practical, scalable SDN-based circuit sharing solution for data center networks. C-Share inherently enables elephant flows to share optical circuits by exploiting a flat upper tier network topology. C-Share is based...

  15. Data-sharing protocol: A prototype implementation

    International Nuclear Information System (INIS)

    Gibney, T.; Greenwood, D.

    1992-12-01

    This paper describes a client/server communication protocol which will allow physicists to access data from cooperating remote experiments. Special low-level "client" software within the user's "home" data-access library formulates a request for data from the remote experiment. This request is sent over a network to a server at the remote site. The server has specific knowledge about the location and format of the requested data. The server gets the data and sends it over the network to the requesting client, which reformats the data according to the local library's conventions. Our prototype is being developed to support remote access to data from ATF and PBX, and to MicroVAX data from Tore Supra. We have attempted to create a flexible design which should accommodate data from other experiments as well.
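The request/reformat pattern described above can be sketched as a minimal client/server exchange. The framing below is invented for illustration (a newline-terminated signal name in, a count followed by network-byte-order doubles out); the abstract does not specify the prototype's actual message formats:

```python
import socket
import struct
import threading

# Toy data held by the "remote experiment" server: signal name -> samples.
SIGNALS = {"ne_bar": [1.5, 2.5, 4.0]}

def serve_one(sock):
    """Answer one request: read a signal name, reply with a sample count
    followed by big-endian ("network order") doubles, then close."""
    conn, _ = sock.accept()
    with conn:
        name = conn.recv(1024).decode().strip()
        samples = SIGNALS.get(name, [])
        conn.sendall(struct.pack(f"!I{len(samples)}d", len(samples), *samples))

def fetch_signal(port, name):
    """Client side: request a signal and reformat the wire bytes into a
    native Python list, as a local data-access library would."""
    with socket.create_connection(("127.0.0.1", port)) as conn:
        conn.sendall(name.encode() + b"\n")
        payload = b""
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # server closed the connection: reply complete
                break
            payload += chunk
    (count,) = struct.unpack_from("!I", payload)
    return list(struct.unpack_from(f"!{count}d", payload, 4))
```

The key idea the paper describes survives even in this toy: only the server knows where and how the data are stored, and only the client knows the local library's conventions.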

  16. Citizen Science: Data Sharing For, By, and With the Public

    Science.gov (United States)

    Wiggins, A.

    2017-12-01

    Data sharing in citizen science is just as challenging as it is for any other type of science, except that there are more parties involved, with more diverse needs and interests. This talk provides an overview of the challenges and current efforts to advance data sharing in citizen science, and suggests refocusing data management activities on supporting the needs of multiple audiences. Early work on data sharing in citizen science advocated applying the standards and practices of academia, which can only address the needs of one of several audiences for citizen science data, and academics are not always the primary audience. Practitioners still need guidance on how to better share data with other key parties, such as participants and policymakers, and on which data management practices to prioritize for addressing the needs of multiple audiences. The benefits to the project of investing scarce resources into data products and dissemination strategies for each target audience still remain variable, unclear, or unpredictable. And as projects mature and change, the importance of data sharing activities and audiences are likely to change as well. This combination of multiple diverse audiences, shifting priorities, limited resources, and unclear benefits creates a perfect storm of conditions to suppress data sharing. Nonetheless, many citizen science projects make the effort, with exemplars showing substantial returns on data stewardship investments, and international initiatives are underway to bolster the data sharing capacity of the field. To improve the state of data sharing in citizen science, strategic use of limited resources suggests prioritizing data management activities that support the needs of multiple audiences. These may include better transparency about data access and usage, and standardized reporting of broader impacts from secondary data users, to both reward projects and incentivize further data sharing.

  17. Perceptions and Practices of Data Sharing in Engineering Education

    Science.gov (United States)

    Johri, Aditya; Yang, Seungwon; Vorvoreanu, Mihaela; Madhavan, Krishna

    2016-01-01

    As part of our NSF funded collaborative project on Data Sharing within Engineering Education Community, we conducted an empirical study to better understand the current climate of data sharing and participants' future expectations of the field. We present findings of this mixed method study and discuss implications. Overall, we found strong…

  18. Journal data sharing policies and statistical reporting inconsistencies in psychology.

    NARCIS (Netherlands)

    Nuijten, M.B.; Borghuis, J.; Veldkamp, C.L.S.; Dominguez Alvarez, L.; van Assen, M.A.L.M.; Wicherts, J.M.

    2018-01-01

    In this paper, we present three retrospective observational studies that investigate the relation between data sharing and statistical reporting inconsistencies. Previous research found that reluctance to share data was related to a higher prevalence of statistical errors, often in the direction of

  19. Privacy-preserving heterogeneous health data sharing.

    Science.gov (United States)

    Mohammed, Noman; Jiang, Xiaoqian; Chen, Rui; Fung, Benjamin C M; Ohno-Machado, Lucila

    2013-05-01

    Privacy-preserving data publishing addresses the problem of disclosing sensitive data when mining for useful information. Among existing privacy models, ε-differential privacy provides one of the strongest privacy guarantees and makes no assumptions about an adversary's background knowledge. All existing solutions that ensure ε-differential privacy handle the problem of disclosing relational and set-valued data in a privacy-preserving manner separately. In this paper, we propose an algorithm that considers both relational and set-valued data in differentially private disclosure of healthcare data. The proposed approach makes a simple yet fundamental switch in differentially private algorithm design: instead of listing all possible records (ie, a contingency table) for noise addition, records are generalized before noise addition. The algorithm first generalizes the raw data in a probabilistic way, and then adds noise to guarantee ε-differential privacy. We showed that the disclosed data could be used effectively to build a decision tree induction classifier. Experimental results demonstrated that the proposed algorithm is scalable and performs better than existing solutions for classification analysis. The resulting utility may degrade when the output domain size is very large, making it potentially inappropriate to generate synthetic data for large health databases. Unlike existing techniques, the proposed algorithm allows the disclosure of health data containing both relational and set-valued data in a differentially private manner, and can retain essential information for discriminative analysis.
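The noise-addition step of such an ε-differentially private release can be illustrated with the standard Laplace mechanism. The generalization step the paper proposes is elided here, the record keys are invented, and the scale = sensitivity/ε formula is the textbook mechanism rather than the paper's exact algorithm:

```python
import math
import random

def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise via inverse-CDF sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_counts(generalized_counts, epsilon, sensitivity=1.0, seed=None):
    """Add Laplace noise to each generalized count so the released
    histogram satisfies epsilon-differential privacy.

    `generalized_counts` maps a generalized record (e.g. an age range
    plus a diagnosis-code group) to its count. Sensitivity 1 assumes one
    patient affects one count.
    """
    rng = random.Random(seed)
    scale = sensitivity / epsilon  # smaller epsilon -> more noise
    return {key: count + laplace_noise(scale, rng)
            for key, count in generalized_counts.items()}
```

Generalizing before noise addition, as the abstract notes, keeps the output domain small enough that the noise does not swamp the signal, which is what makes the released data usable for classifier training.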

  20. Data sharing policy design for consortia: challenges for sustainability.

    Science.gov (United States)

    Kaye, Jane; Hawkins, Naomi

    2014-01-01

    The field of human genomics has led advances in the sharing of data with a view to facilitating translation of research into innovations for human health. This change in scientific practice has been implemented through new policy developed by many principal investigators, project managers and funders, which has ultimately led to new forms of practice and innovative governance models for data sharing. Here, we examine the development of the governance of data sharing in genomics, and explore some of the key challenges associated with the design and implementation of these policies. We examine how the incremental nature of policy design, the perennial problem of consent, the gridlock caused by multiple and overlapping access systems, the administrative burden and the problems with incentives and acknowledgment all have an impact on the potential for data sharing to be maximized. We conclude by proposing ways in which the scientific community can address these problems, to improve the sustainability of data sharing into the future.

  1. Sharing Data in the Global Ocean Observing System (Invited)

    Science.gov (United States)

    Lindstrom, E. J.; McCurdy, A.; Young, J.; Fischer, A. S.

    2010-12-01

    We examine the evolution of data sharing in the field of physical oceanography to highlight the challenges now before us. Synoptic global observation of the ocean from space and in situ platforms has significantly matured over the last two decades. In the early 1990’s the community data sharing challenges facing the World Ocean Circulation Experiment (WOCE) largely focused on the behavior of individual scientists. Satellite data sharing depended on the policy of individual agencies. Global data sets were delivered with considerable delay and with enormous personal sacrifice. In the 2000’s the requirements for global data sets and sustained observations from the likes of the U.N. Framework Convention on Climate Change have led to data sharing and cooperation at a grander level. It is more effective and certainly more efficient. The Joint WMO/IOC Technical Commission on Oceanography and Marine Meteorology (JCOMM) provided the means to organize many aspects of data collection and data dissemination globally, for the common good. In response the Committee on Earth Observing Satellites organized Virtual Constellations to enable the assembly and sharing of like kinds of satellite data (e.g., sea surface topography, ocean vector winds, and ocean color). Individuals in physical oceanography have largely adapted to the new rigors of sharing data for the common good, and as a result of this revolution new science has been enabled. Primary obstacles to sharing have shifted from the individual level to the national level. As we enter into the 2010’s the demands for ocean data continue to evolve with an expanded requirement for more real-time reporting and broader disciplinary coverage, to answer key scientific and societal questions. We are also seeing the development of more numerous national contributions to the global observing system. The drivers for the establishment of global ocean observing systems are expanding beyond climate to include biological and

  2. Data Sharing and the Cancer Moonshot

    Science.gov (United States)

    Health data enthusiasts of all stripes were in Washington, D.C., today for Health Datapalooza. NCI's Dr. Warren Kibbe explains that this annual event explores a topic that is central to NCI’s efforts against cancer: creating knowledge from data.

  3. Data Storage and sharing for the long tail of science

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, B. [Purdue Univ., West Lafayette, IN (United States); Pouchard, L. [Purdue Univ., West Lafayette, IN (United States); Smith, P. M. [Purdue Univ., West Lafayette, IN (United States); Gasc, A. [Purdue Univ., West Lafayette, IN (United States); Pijanowski, B. C. [Purdue Univ., West Lafayette, IN (United States)

    2016-11-21

    Research data infrastructure such as storage must now accommodate new requirements resulting from trends in research data management that require researchers to store their data for the long term and make it available to other researchers. We propose Data Depot, a system and service that provides capabilities for shared space within a group, shared applications, flexible access patterns and ease of transfer at Purdue University. We evaluate Depot as a solution for storing and sharing multi-terabytes of data produced in the long tail of science with a use case in soundscape ecology studies from the Human-Environment Modeling and Analysis Laboratory. We observe that with the capabilities enabled by Data Depot, researchers can easily deploy fine-grained data access control, manage data transfer and sharing, as well as integrate their workflows into a High Performance Computing environment.

  4. Stakeholders' views on data sharing in multicenter studies.

    Science.gov (United States)

    Mazor, Kathleen M; Richards, Allison; Gallagher, Mia; Arterburn, David E; Raebel, Marsha A; Nowell, W Benjamin; Curtis, Jeffrey R; Paolino, Andrea R; Toh, Sengwee

    2017-09-01

    To understand stakeholders' views on data sharing in multicenter comparative effectiveness research studies and the value of privacy-protecting methods. Semistructured interviews with five US stakeholder groups. We completed 11 interviews, involving patients (n = 15), researchers (n = 10), Institutional Review Board and regulatory staff (n = 3), multicenter research governance experts (n = 2) and healthcare system leaders (n = 4). Perceptions of the benefits and value of research were the strongest influences toward data sharing; cost and security risks were primary influences against sharing. Privacy-protecting methods that share summary-level data were acknowledged as being appealing, but there were concerns about increased cost and potential loss of research validity. Stakeholders were open to data sharing in multicenter studies that offer value and minimize security risks.

  5. Biospecimen Repository Access and Data Sharing (BRADS)

    Data.gov (United States)

    U.S. Department of Health & Human Services — BRADS is a repository for data and biospecimens from population health research initiatives and clinical or interventional trials designed and implemented by NICHD’s...

  6. Classification of cognitive systems dedicated to data sharing

    Science.gov (United States)

    Ogiela, Lidia; Ogiela, Marek R.

    2017-08-01

    This paper presents a classification of new cognitive information systems dedicated to cryptographic data splitting and sharing processes. Cognitive processes of semantic data analysis and interpretation will be used to describe new classes of intelligent information and vision systems. In addition, cryptographic data splitting algorithms and cryptographic threshold schemes will be used to improve processes of secure and efficient information management with application of such cognitive systems. The utility of the proposed cognitive sharing procedures and distributed data sharing algorithms will also be presented. A few possible applications of cognitive approaches for visual information management and encryption will also be described.
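The cryptographic threshold schemes mentioned can be illustrated with Shamir's (k, n) secret sharing, a standard construction (not necessarily the exact scheme these systems use): a secret is split into n shares such that any k reconstruct it and fewer reveal nothing.

```python
import random

PRIME = 2**127 - 1  # Mersenne prime, large enough for the toy secrets below

def split_secret(secret, k, n, rng=random):
    """Split `secret` into n shares; any k of them reconstruct it.

    Shamir's scheme: evaluate a random degree-(k-1) polynomial with
    constant term `secret` at x = 1..n over GF(PRIME).
    """
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(k - 1)]
    def poly(x):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc
    return [(x, poly(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # Modular inverse via Fermat's little theorem (PRIME is prime).
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret
```

Distributing shares among members of a collaborative group means no single holder can read the split data alone, which matches the secure-management goal the abstract describes.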

  7. Impact of HIPAA's minimum necessary standard on genomic data sharing.

    Science.gov (United States)

    Evans, Barbara J; Jarvik, Gail P

    2018-04-01

    This article provides a brief introduction to the Health Insurance Portability and Accountability Act of 1996 (HIPAA) Privacy Rule's minimum necessary standard, which applies to sharing of genomic data, particularly clinical data, following 2013 Privacy Rule revisions. This research used the Thomson Reuters Westlaw database and law library resources in its legal analysis of the HIPAA privacy tiers and the impact of the minimum necessary standard on genomic data sharing. We considered relevant example cases of genomic data-sharing needs. In a climate of stepped-up HIPAA enforcement, this standard is of concern to laboratories that generate, use, and share genomic information. How data-sharing activities are characterized - whether for research, public health, or clinical interpretation and medical practice support - affects how the minimum necessary standard applies and its overall impact on data access and use. There is no clear regulatory guidance on how to apply HIPAA's minimum necessary standard when considering the sharing of information in the data-rich environment of genomic testing. Laboratories that perform genomic testing should engage with policy makers to foster sound, well-informed policies and appropriate characterization of data-sharing activities to minimize adverse impacts on day-to-day workflows.

  8. Adaptative Peer to Peer Data Sharing for Technology Enhanced Learning

    Science.gov (United States)

    Angelaccio, Michele; Buttarazzi, Berta

    Starting from the hypothesis that P2P data sharing in a direct teaching scenario (e.g., a classroom lesson) may lead to relevant benefits, this paper explores the features of EduSHARE, a collaborative learning system useful for enhancing the learning process.

  9. Sharing Data for Global Infectious Disease Surveillance and Outbreak Detection

    DEFF Research Database (Denmark)

    Aarestrup, Frank Møller; Koopmans, Marion G.

    2016-01-01

    Rapid global sharing and comparison of epidemiological and genomic data on infectious diseases would enable more rapid and efficient global outbreak control and tracking of diseases. Several barriers for global sharing exist but, in our opinion, the presumed magnitude of the problems appears larger...

  10. Anonymizing patient genomic data for public sharing association studies.

    Science.gov (United States)

    Fernandez-Lozano, Carlos; Lopez-Campos, Guillermo; Seoane, Jose A; Lopez-Alonso, Victoria; Dorado, Julian; Martín-Sanchez, Fernando; Pazos, Alejandro

    2013-01-01

    The development of personalized medicine is tightly linked with the correct exploitation of molecular data, especially those associated with the genome sequence. Along with this use of genomic data, there is an increasing demand to share these data for research purposes. The transition of clinical data to research is based on the anonymization of these data so that the patient cannot be identified; however, the use of genomic data poses a great challenge because of its identifying nature. In this work we have analyzed current methods for genome anonymization and propose a one-way encryption method that may enable the process of genomic data sharing, granting access only to certain regions of genomes for research purposes.
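One-way protection that exposes only approved regions can be sketched with a keyed hash as the one-way function. This is an illustrative stand-in (HMAC-SHA256, invented region identifiers and key handling), not the paper's specific encryption method:

```python
import hashlib
import hmac

def protect_genome(variants, shared_regions, key):
    """One-way protect per-region genotype calls for sharing.

    `variants` maps a region id (e.g. "chr7:117559590") to a genotype
    string. Regions listed in `shared_regions` are released in clear for
    the approved study; all others are replaced by an HMAC-SHA256 digest,
    which still allows exact-match comparison across datasets sharing the
    key but cannot be inverted to recover the genotype.
    """
    protected = {}
    for region, genotype in variants.items():
        if region in shared_regions:
            protected[region] = genotype
        else:
            mac = hmac.new(key, f"{region}={genotype}".encode(),
                           hashlib.sha256)
            protected[region] = mac.hexdigest()
    return protected
```

Keeping the digests deterministic preserves linkability for association studies while the key, held by the data custodian, prevents dictionary attacks by outside parties.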

  11. STEP - Product Model Data Sharing and Exchange

    DEFF Research Database (Denmark)

    Kroszynski, Uri

    1998-01-01

    During the last fifteen years, a very large effort to standardize the product models employed in product design, manufacturing and other life-cycle phases has been undertaken. This effort has the acronym STEP, and resulted in the International Standard ISO-10303 "Industrial Automation Systems - Product Data Representation and Exchange", featuring at present some 30 released parts, and growing continuously. Many of the parts are Application Protocols (AP). This article presents an overview of STEP, based upon years of involvement in three ESPRIT projects, which contributed to the development...

  12. Promises and pitfalls of data sharing in qualitative research.

    Science.gov (United States)

    Tsai, Alexander C; Kohrt, Brandon A; Matthews, Lynn T; Betancourt, Theresa S; Lee, Jooyoung K; Papachristos, Andrew V; Weiser, Sheri D; Dworkin, Shari L

    2016-11-01

    The movement for research transparency has gained irresistible momentum over the past decade. Although qualitative research is rarely published in the high-impact journals that have adopted, or are most likely to adopt, data sharing policies, qualitative researchers who publish work in these and similar venues will likely encounter questions about data sharing within the next few years. The fundamental ways in which qualitative and quantitative data differ should be considered when assessing the extent to which qualitative and mixed methods researchers should be expected to adhere to data sharing policies developed with quantitative studies in mind. We outline several of the most critical concerns below, while also suggesting possible modifications that may help to reduce the probability of unintended adverse consequences and to ensure that the sharing of qualitative data is consistent with ethical standards in research.

  13. Security and efficiency data sharing scheme for cloud storage

    International Nuclear Information System (INIS)

    Han, Ke; Li, Qingbo; Deng, Zhongliang

    2016-01-01

    With the adoption and diffusion of the data sharing paradigm in cloud storage, there have been increasing demands and concerns for shared data security. Ciphertext Policy Attribute-Based Encryption (CP-ABE) is becoming a promising cryptographic solution to the security problem of shared data in cloud storage. However, due to key escrow, backward security and inefficiency problems, existing CP-ABE schemes cannot be directly applied to cloud storage systems. In this paper, an effective and secure access control scheme for shared data is proposed to solve those problems. The proposed scheme refines the security of existing CP-ABE based schemes. Specifically, the key escrow and collusion problems are addressed by dividing the key generation center into several distributed semi-trusted parts. Moreover, a secrecy revocation algorithm is proposed to address not only backward secrecy but also the efficiency problems in existing CP-ABE based schemes. Furthermore, security and performance analyses indicate that the proposed scheme is both secure and efficient for cloud storage.
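The access-policy side of CP-ABE can be illustrated without any cryptography: a ciphertext carries a policy over attributes, and a user's key decrypts only if the user's attributes satisfy that policy. The toy evaluator below models just the access decision; it is not an implementation of CP-ABE itself, and the policy node names are invented:

```python
def satisfies(policy, attributes):
    """Evaluate a CP-ABE-style access policy against a user's attributes.

    `policy` is a nested tuple: ("attr", name), ("and", p1, p2, ...),
    ("or", p1, p2, ...), or ("threshold", k, p1, p2, ...) meaning at
    least k of the sub-policies must hold. In a real CP-ABE scheme this
    decision is enforced cryptographically by the ciphertext itself.
    """
    kind = policy[0]
    if kind == "attr":
        return policy[1] in attributes
    if kind == "and":
        return all(satisfies(p, attributes) for p in policy[1:])
    if kind == "or":
        return any(satisfies(p, attributes) for p in policy[1:])
    if kind == "threshold":
        k = policy[1]
        return sum(satisfies(p, attributes) for p in policy[2:]) >= k
    raise ValueError(f"unknown policy node: {kind}")
```

Revocation, in this picture, amounts to ensuring a removed attribute can no longer satisfy any policy node, which is the property the proposed revocation algorithm must preserve cryptographically.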

  14. Privacy Protection in Data Sharing : Towards Feedback Solutions

    NARCIS (Netherlands)

    R. Meijer; P. Conradie; R. Choenni; M.S. Bargh

    2014-01-01

    Sharing data is gaining importance in recent years due to proliferation of social media and a growing tendency of governments to gain citizens’ trust through being transparent. Data dissemination, however, increases chance of compromising privacy sensitive data, which undermines trust of data

  15. SCSODC: Integrating Ocean Data for Visualization Sharing and Application

    International Nuclear Information System (INIS)

    Xu, C; Xie, Q; Li, S; Wang, D

    2014-01-01

    The South China Sea Ocean Data Center (SCSODC) was founded in 2010 in order to improve the collection and management of ocean data of the South China Sea Institute of Oceanology (SCSIO). The mission of SCSODC is to ensure the long term scientific stewardship of ocean data, information and products – collected through research groups, monitoring stations and observation cruises – and to facilitate their efficient use and distribution to potential users. However, data sharing and applications were limited due to the characteristics of distribution and heterogeneity that made it difficult to integrate the data. To surmount those difficulties, the Data Sharing System has been developed by the SCSODC using the most appropriate information management and information technology. The Data Sharing System uses open standards and tools to promote the capability to integrate ocean data and to interact with other data portals or users, and includes a full range of processes such as data discovery, evaluation and access, combining C/S and B/S modes. It provides a visualized management interface for the data managers and a transparent and seamless data access and application environment for users. Users are allowed to access data using the client software and to access an interactive visualization application interface via a web browser. The architecture, key technologies and functionality of the system are discussed briefly in this paper. It is shown that the system of SCSODC is able to implement web visualization sharing and seamless access to ocean data in a distributed and heterogeneous environment.

  16. SCSODC: Integrating Ocean Data for Visualization Sharing and Application

    Science.gov (United States)

    Xu, C.; Li, S.; Wang, D.; Xie, Q.

    2014-02-01

    The South China Sea Ocean Data Center (SCSODC) was founded in 2010 in order to improve the collection and management of ocean data of the South China Sea Institute of Oceanology (SCSIO). The mission of SCSODC is to ensure the long term scientific stewardship of ocean data, information and products - collected through research groups, monitoring stations and observation cruises - and to facilitate their efficient use and distribution to potential users. However, data sharing and applications were limited due to the characteristics of distribution and heterogeneity that made it difficult to integrate the data. To surmount those difficulties, the Data Sharing System has been developed by the SCSODC using the most appropriate information management and information technology. The Data Sharing System uses open standards and tools to promote the capability to integrate ocean data and to interact with other data portals or users, and includes a full range of processes such as data discovery, evaluation and access, combining C/S and B/S modes. It provides a visualized management interface for the data managers and a transparent and seamless data access and application environment for users. Users are allowed to access data using the client software and to access an interactive visualization application interface via a web browser. The architecture, key technologies and functionality of the system are discussed briefly in this paper. It is shown that the system of SCSODC is able to implement web visualization sharing and seamless access to ocean data in a distributed and heterogeneous environment.

  17. If we share data, will anyone use them? Data sharing and reuse in the long tail of science and technology.

    Science.gov (United States)

    Wallis, Jillian C; Rolando, Elizabeth; Borgman, Christine L

    2013-01-01

    Research on practices to share and reuse data will inform the design of infrastructure to support data collection, management, and discovery in the long tail of science and technology. These are research domains in which data tend to be local in character, minimally structured, and minimally documented. We report on a ten-year study of the Center for Embedded Network Sensing (CENS), a National Science Foundation Science and Technology Center. We found that CENS researchers are willing to share their data, but few are asked to do so, and in only a few domain areas do their funders or journals require them to deposit data. Few repositories exist to accept data in CENS research areas. Data sharing tends to occur only through interpersonal exchanges. CENS researchers obtain data from repositories, and occasionally from registries and individuals, to provide context, calibration, or other forms of background for their studies. Neither CENS researchers nor those who request access to CENS data appear to use external data for primary research questions or for replication of studies. CENS researchers are willing to share data if they receive credit and retain first rights to publish their results. Practices of releasing, sharing, and reusing of data in CENS reaffirm the gift culture of scholarship, in which goods are bartered between trusted colleagues rather than treated as commodities.

  18. Sharing and community curation of mass spectrometry data with GNPS

    Science.gov (United States)

    Nguyen, Don Duy; Watrous, Jeramie; Kapono, Clifford A; Luzzatto-Knaan, Tal; Porto, Carla; Bouslimani, Amina; Melnik, Alexey V; Meehan, Michael J; Liu, Wei-Ting; Crüsemann, Max; Boudreau, Paul D; Esquenazi, Eduardo; Sandoval-Calderón, Mario; Kersten, Roland D; Pace, Laura A; Quinn, Robert A; Duncan, Katherine R; Hsu, Cheng-Chih; Floros, Dimitrios J; Gavilan, Ronnie G; Kleigrewe, Karin; Northen, Trent; Dutton, Rachel J; Parrot, Delphine; Carlson, Erin E; Aigle, Bertrand; Michelsen, Charlotte F; Jelsbak, Lars; Sohlenkamp, Christian; Pevzner, Pavel; Edlund, Anna; McLean, Jeffrey; Piel, Jörn; Murphy, Brian T; Gerwick, Lena; Liaw, Chih-Chuang; Yang, Yu-Liang; Humpf, Hans-Ulrich; Maansson, Maria; Keyzers, Robert A; Sims, Amy C; Johnson, Andrew R.; Sidebottom, Ashley M; Sedio, Brian E; Klitgaard, Andreas; Larson, Charles B; Boya P., Cristopher A; Torres-Mendoza, Daniel; Gonzalez, David J; Silva, Denise B; Marques, Lucas M; Demarque, Daniel P; Pociute, Egle; O'Neill, Ellis C; Briand, Enora; Helfrich, Eric J. N.; Granatosky, Eve A; Glukhov, Evgenia; Ryffel, Florian; Houson, Hailey; Mohimani, Hosein; Kharbush, Jenan J; Zeng, Yi; Vorholt, Julia A; Kurita, Kenji L; Charusanti, Pep; McPhail, Kerry L; Nielsen, Kristian Fog; Vuong, Lisa; Elfeki, Maryam; Traxler, Matthew F; Engene, Niclas; Koyama, Nobuhiro; Vining, Oliver B; Baric, Ralph; Silva, Ricardo R; Mascuch, Samantha J; Tomasi, Sophie; Jenkins, Stefan; Macherla, Venkat; Hoffman, Thomas; Agarwal, Vinayak; Williams, Philip G; Dai, Jingqui; Neupane, Ram; Gurr, Joshua; Rodríguez, Andrés M. C.; Lamsa, Anne; Zhang, Chen; Dorrestein, Kathleen; Duggan, Brendan M; Almaliti, Jehad; Allard, Pierre-Marie; Phapale, Prasad; Nothias, Louis-Felix; Alexandrov, Theodore; Litaudon, Marc; Wolfender, Jean-Luc; Kyle, Jennifer E; Metz, Thomas O; Peryea, Tyler; Nguyen, Dac-Trung; VanLeer, Danielle; Shinn, Paul; Jadhav, Ajit; Müller, Rolf; Waters, Katrina M; Shi, Wenyuan; Liu, Xueting; Zhang, Lixin; Knight, Rob; Jensen, Paul R; Palsson, Bernhard O; Pogliano, Kit; Linington, Roger G; Gutiérrez, Marcelino; Lopes, Norberto P; Gerwick, William H; Moore, Bradley S; Dorrestein, Pieter C; Bandeira, Nuno

    2017-01-01

    The potential of the diverse chemistries present in natural products (NP) for biotechnology and medicine remains untapped because NP databases are not searchable with raw data and the NP community has no way to share data other than in published papers. Although mass spectrometry techniques are well-suited to high-throughput characterization of natural products, there is a pressing need for an infrastructure to enable sharing and curation of data. We present Global Natural Products Social molecular networking (GNPS, http://gnps.ucsd.edu), an open-access knowledge base for community-wide organization and sharing of raw, processed or identified tandem mass (MS/MS) spectrometry data. In GNPS, crowdsourced curation of freely available community-wide reference MS libraries will underpin improved annotations. Data-driven social-networking should facilitate identification of spectra and foster collaborations. We also introduce the concept of ‘living data’ through continuous reanalysis of deposited data. PMID:27504778

  19. Developing a data sharing community for spinal cord injury research.

    Science.gov (United States)

    Callahan, Alison; Anderson, Kim D; Beattie, Michael S; Bixby, John L; Ferguson, Adam R; Fouad, Karim; Jakeman, Lyn B; Nielson, Jessica L; Popovich, Phillip G; Schwab, Jan M; Lemmon, Vance P

    2017-09-01

    The rapid growth in data sharing presents new opportunities across the spectrum of biomedical research. Global efforts are underway to develop practical guidance for implementation of data sharing and open data resources. These include the recent recommendation of 'FAIR Data Principles', which assert that if data is to have broad scientific value, then digital representations of that data should be Findable, Accessible, Interoperable and Reusable (FAIR). The spinal cord injury (SCI) research field has a long history of collaborative initiatives that include sharing of preclinical research models and outcome measures. In addition, new tools and resources are being developed by the SCI research community to enhance opportunities for data sharing and access. With this in mind, the National Institute of Neurological Disorders and Stroke (NINDS) at the National Institutes of Health (NIH) hosted a workshop on October 5-6, 2016 in Bethesda, MD, in collaboration with the Open Data Commons for Spinal Cord Injury (ODC-SCI) titled "Preclinical SCI Data: Creating a FAIR Share Community". Workshop invitees were nominated by the workshop steering committee (co-chairs: ARF and VPL; members: AC, KDA, MSB, KF, LBJ, PGP, JMS), to bring together junior and senior level experts including preclinical and basic SCI researchers from academia and industry, data science and bioinformatics experts, investigators with expertise in other neurological disease fields, clinical researchers, members of the SCI community, and program staff representing federal and private funding agencies. The workshop and ODC-SCI efforts were sponsored by the International Spinal Research Trust (ISRT), the Rick Hansen Institute, Wings for Life, the Craig H. Neilsen Foundation and NINDS. The number of attendees was limited to ensure active participation and feedback in small groups. The goals were to examine the current landscape for data sharing in SCI research and provide a path to its future. Below are

  20. Open sharing of genomic data: Who does it and why?

    Directory of Open Access Journals (Sweden)

    Tobias Haeusermann

    We explored the characteristics and motivations of people who, having obtained their genetic or genomic data from Direct-To-Consumer genetic testing (DTC-GT) companies, voluntarily decide to share them on the publicly accessible web platform openSNP. The study is the first attempt to describe open data sharing activities undertaken by individuals without institutional oversight. In the paper we provide a detailed overview of the distribution of the demographic characteristics and motivations of people engaged in genetic or genomic open data sharing. The geographical distribution of the respondents showed the USA as dominant. There was no significant gender divide, the age distribution was broad, educational background varied and respondents with and without children were equally represented. Health, even though prominent, was not the respondents' primary or only motivation to be tested. As to their motivations to openly share their data, 86.05% indicated wanting to learn about themselves as relevant, followed by contributing to the advancement of medical research (80.30%), improving the predictability of genetic testing (76.02%) and considering it fun to explore genotype and phenotype data (75.51%). Whereas most respondents were well aware of the privacy risks of their involvement in open genetic data sharing and considered the possibility of direct, personal repercussions troubling, they estimated the risk of this happening to be negligible. Our findings highlight the diversity of DTC-GT consumers who decide to openly share their data. Instead of focusing exclusively on health-related aspects of genetic testing and data sharing, our study emphasizes the importance of taking into account benefits and risks that stretch beyond the health spectrum. Our results thus lend further support to the call for a broader and multi-faceted conceptualization of genomic utility.

  1. Finding the Law for Sharing Data in Academia

    NARCIS (Netherlands)

    Hoorn, Esther; Domingus, Marlon; Schmidt, Birgit; Dobreva, Milena

    2015-01-01

    How can universities provide good advice about the legal aspects of research data management? At the same time, how can universities prevent perceived legal risks from becoming barriers to conducting research, sharing research data, valorisation of research data, and control mechanisms for the

  2. Including all voices in international data-sharing governance.

    Science.gov (United States)

    Kaye, Jane; Terry, Sharon F; Juengst, Eric; Coy, Sarah; Harris, Jennifer R; Chalmers, Don; Dove, Edward S; Budin-Ljøsne, Isabelle; Adebamowo, Clement; Ogbe, Emilomo; Bezuidenhout, Louise; Morrison, Michael; Minion, Joel T; Murtagh, Madeleine J; Minari, Jusaku; Teare, Harriet; Isasi, Rosario; Kato, Kazuto; Rial-Sebbag, Emmanuelle; Marshall, Patricia; Koenig, Barbara; Cambon-Thomsen, Anne

    2018-03-07

    Governments, funding bodies, institutions, and publishers have developed a number of strategies to encourage researchers to facilitate access to datasets. The rationale behind this approach is that this will bring a number of benefits and enable advances in healthcare and medicine by allowing the maximum returns from the investment in research, as well as reducing waste and promoting transparency. As this approach gains momentum, these data-sharing practices have implications for many kinds of research as they become standard practice across the world. The governance frameworks that have been developed to support biomedical research are not well equipped to deal with the complexities of international data sharing. This system is nationally based and is dependent upon expert committees for oversight and compliance, which has often led to piece-meal decision-making. This system tends to perpetuate inequalities by obscuring the contributions and the important role of different data providers along the data stream, whether they be low- or middle-income country researchers, patients, research participants, groups, or communities. As research and data-sharing activities are largely publicly funded, there is a strong moral argument for including the people who provide the data in decision-making and to develop governance systems for their continued participation. We recommend that governance of science becomes more transparent, representative, and responsive to the voices of many constituencies by conducting public consultations about data-sharing addressing issues of access and use; including all data providers in decision-making about the use and sharing of data along the whole of the data stream; and using digital technologies to encourage accessibility, transparency, and accountability. We anticipate that this approach could enhance the legitimacy of the research process, generate insights that may otherwise be overlooked or ignored, and help to bring valuable

  3. Sharing and reuse of individual participant data from clinical trials

    DEFF Research Database (Denmark)

    Ohmann, Christian; Banzi, Rita; Canham, Steve

    2017-01-01

    OBJECTIVES: We examined major issues associated with sharing of individual clinical trial data and developed a consensus document on providing access to individual participant data from clinical trials, using a broad interdisciplinary approach. DESIGN AND METHODS: This was a consensus... Research Infrastructures Building Enduring Life-science Services) and coordinated by the European Clinical Research Infrastructure Network. Thus, the focus was on non-commercial trials and the perspective mainly European. OUTCOME: We developed principles and practical recommendations on how to share data... The adoption of the recommendations in this document would help to promote and support data sharing and reuse among researchers, adequately inform trial participants and protect their rights, and provide effective and efficient systems for preparing, storing and accessing data. The recommendations now need...

  4. Open innovation: Towards sharing of data, models and workflows.

    Science.gov (United States)

    Conrado, Daniela J; Karlsson, Mats O; Romero, Klaus; Sarr, Céline; Wilkins, Justin J

    2017-11-15

    Sharing of resources across organisations to support open innovation is an old idea that is being taken up by the scientific community at increasing speed, particularly where public sharing is concerned. The ability to address new questions or provide more precise answers to old questions through merged information is among the attractive features of sharing. Increased efficiency through reuse, and increased reliability of scientific findings through enhanced transparency, are expected outcomes from sharing. In the field of pharmacometrics, efforts to publicly share data, models and workflows have recently started. Sharing of individual-level longitudinal data for modelling requires solving legal, ethical and proprietary issues similar to many other fields, but there are also pharmacometric-specific aspects regarding data formats, exchange standards, and database properties. Several organisations (CDISC, C-Path, IMI, ISoP) are working to solve these issues and propose standards. There are also a number of initiatives aimed at collecting disease-specific databases - Alzheimer's Disease (ADNI, CAMD), malaria (WWARN), oncology (PDS), Parkinson's Disease (PPMI), tuberculosis (CPTR, TB-PACTS, ReSeqTB) - suitable for drug-disease modelling. Organized sharing of pharmacometric executable model code and associated information has in the past been sparse, but a model repository (DDMoRe Model Repository) intended for the purpose has recently been launched. In addition several other services can facilitate model sharing more generally. Pharmacometric workflows have matured over the last decades and initiatives to more fully capture those applied to analyses are ongoing. In order to maximize both the impact of pharmacometrics and the knowledge extracted from clinical data, the scientific community needs to take ownership of and create opportunities for open innovation. Copyright © 2017 Elsevier B.V. All rights reserved.

  5. Sharing data under the 21st Century Cures Act.

    Science.gov (United States)

    Majumder, Mary A; Guerrini, Christi J; Bollinger, Juli M; Cook-Deegan, Robert; McGuire, Amy L

    2017-12-01

    On 13 December 2016, President Obama signed the 21st Century Cures Act ("the Act") into law. Many of its provisions support the creation of an "Information Commons," an ecosystem of separate but interconnected initiatives that facilitate open and responsible sharing of genomic and other data for research and clinical purposes. For example, the Act supports the National Institutes of Health in mandating data sharing, provides funding and guidance for the large national cohort program now known as All of Us, expresses congressional support for a global pediatric study network, and strengthens patient access to health information. The Act also addresses potential barriers to data sharing. For example, it makes the issuance of certificates of confidentiality automatic for federally funded research involving "identifiable, sensitive" information and strengthens the associated protections. At the same time, the Act exacerbates or neglects several challenges, for example, increasing complexity by adding a new definition of "identifiable" and failing to address the financial sustainability of data sharing and the scope of commercialization. In sum, the Act is a positive step, yet there is still much work to be done before the goals of broad data sharing and utilization can be achieved.

  6. Best practice for analysis of shared clinical trial data

    Directory of Open Access Journals (Sweden)

    Sally Hollis

    2016-07-01

    Background: Greater transparency, including sharing of patient-level data for further research, is an increasingly important topic for organisations who sponsor, fund and conduct clinical trials. This is a major paradigm shift with the aim of maximising the value of patient-level data from clinical trials for the benefit of future patients and society. We consider the analysis of shared clinical trial data in three broad categories: (1) reanalysis - further investigation of the efficacy and safety of the randomized intervention; (2) meta-analysis; and (3) supplemental analysis for a research question that is not directly assessing the randomized intervention. Discussion: In order to support appropriate interpretation and limit the risk of misleading findings, analysis of shared clinical trial data should have a pre-specified analysis plan. However, it is not generally possible to limit bias and control multiplicity to the extent that is possible in the original trial design, conduct and analysis, and this should be acknowledged and taken into account when interpreting results. We highlight a number of areas where specific considerations arise in planning, conducting, interpreting and reporting analyses of shared clinical trial data. A key issue is that these analyses essentially share many of the limitations of any post hoc analyses beyond the original specified analyses. The use of individual patient data in meta-analysis can provide increased precision and reduce bias. Supplemental analyses are subject to many of the same issues that arise in broader epidemiological analyses. Specific discussion topics are addressed within each of these areas. Summary: Increased provision of patient-level data from industry and academic-led clinical trials for secondary research can benefit future patients and society. Responsible data sharing, including transparency of the research objectives, analysis plans and of the results will support appropriate

  7. Exploiting differentiated tuple distribution in shared data spaces

    NARCIS (Netherlands)

    Russello, G.; Chaudron, M.R.V.; Steen, van M.; Danelutto, M.; Vanneschi, M.

    2004-01-01

    The shared data space model has proven to be an effective paradigm for building distributed applications. However, building an efficient distributed implementation remains a challenge. A plethora of different implementations exists. Each of them has a specific policy for distributing data across
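The shared data space paradigm described above (Linda-style tuple spaces) is compact enough to sketch. Below is a minimal, single-process illustration of the core operations - `out` (write), `rd` (non-destructive read by pattern) and `take` (destructive read) - with `None` acting as a wildcard. This is a sketch of the abstract model only; the distribution policies that are the paper's subject are not modelled, and all names are illustrative.

```python
import threading

class TupleSpace:
    """Minimal in-memory tuple space: out/rd/take with wildcard matching."""

    def __init__(self):
        self._tuples = []
        self._lock = threading.Lock()

    def out(self, tup):
        """Write a tuple into the space."""
        with self._lock:
            self._tuples.append(tuple(tup))

    def _match(self, template, tup):
        # None in a template position matches any value.
        return len(template) == len(tup) and all(
            t is None or t == v for t, v in zip(template, tup))

    def rd(self, template):
        """Read (without removing) the first tuple matching the template."""
        with self._lock:
            for tup in self._tuples:
                if self._match(template, tup):
                    return tup
        return None

    def take(self, template):
        """Remove and return the first matching tuple ('in' in Linda)."""
        with self._lock:
            for i, tup in enumerate(self._tuples):
                if self._match(template, tup):
                    return self._tuples.pop(i)
        return None

space = TupleSpace()
space.out(("temperature", "sensor-7", 18.4))
space.out(("salinity", "sensor-7", 35.1))
reading = space.rd(("temperature", None, None))   # non-destructive read
removed = space.take(("salinity", "sensor-7", None))  # destructive take
```

A distributed implementation must then decide where each tuple physically lives, which is exactly the policy space the abstract explores.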

  8. Enabling Interoperable and Selective Data Sharing among Social Networking Sites

    Science.gov (United States)

    Shin, Dongwan; Lopes, Rodrigo

    With the widespread use of social networking (SN) sites and even the introduction of a social component in non-social oriented services, there is a growing concern over user privacy in general, and over how to handle and share user profiles across SN sites in particular. Although there have been several proprietary or open source-based approaches to unifying the creation of third party applications, the availability and retrieval of user profile information are still limited to the site where the third party application is run, mostly devoid of support for data interoperability. In this paper we propose an approach to enabling interoperable and selective data sharing among SN sites. To support selective data sharing, we discuss an authenticated dictionary (ADT)-based credential which enables a user to share only a subset of her information certified by external SN sites with applications running on an SN site. For interoperable data sharing, we propose an extension to the OpenSocial API so that it can provide an open source-based framework for allowing the ADT-based credential to be used seamlessly among different SN sites.
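The paper's ADT-based credential is not reproduced here, but the selective-disclosure property it provides can be illustrated with a closely related structure, a Merkle hash tree: the SN site certifies (signs) only the root hash of the user's profile, and the user later reveals individual attributes together with inclusion proofs, keeping the rest hidden. All names and the profile contents below are hypothetical; this is a sketch of the idea, not the authors' construction.

```python
import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def leaf_hash(key: str, value: str) -> bytes:
    return _h(b"leaf:" + key.encode() + b"=" + value.encode())

def build_levels(leaves):
    """Bottom-up list of tree levels; levels[-1][0] is the root hash."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        cur, nxt = levels[-1], []
        for i in range(0, len(cur), 2):
            right = cur[i + 1] if i + 1 < len(cur) else cur[i]  # duplicate odd tail
            nxt.append(_h(b"node:" + cur[i] + right))
        levels.append(nxt)
    return levels

def inclusion_proof(levels, index):
    """Sibling hashes (with position bits) from one leaf up to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):
            sibling = index          # odd tail node is its own sibling
        proof.append((index % 2, level[sibling]))
        index //= 2
    return proof

def verify(root, key, value, proof):
    """Recompute the root from one disclosed attribute and its proof."""
    h = leaf_hash(key, value)
    for is_right_child, sibling in proof:
        h = _h(b"node:" + sibling + h) if is_right_child else _h(b"node:" + h + sibling)
    return h == root

# The SN site signs only `root`; the full profile stays with the user.
profile = sorted({"country": "NZ", "age": "29", "email": "u@example.org"}.items())
levels = build_levels([leaf_hash(k, v) for k, v in profile])
root = levels[-1][0]

# The user discloses a single attribute to a third-party application.
idx = [k for k, _ in profile].index("country")
proof = inclusion_proof(levels, idx)
assert verify(root, "country", "NZ", proof)   # accepted
assert not verify(root, "age", "30", proof)   # forged value rejected
```

An authenticated dictionary additionally supports efficient proofs of absence and updates, which a plain Merkle tree as sketched does not.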

  9. Unidata: A geoscience e-infrastructure for International Data Sharing

    Science.gov (United States)

    Ramamurthy, Mohan

    2017-04-01

    The Internet and its myriad manifestations, including the World Wide Web, have amply demonstrated the compounding benefits of a global cyberinfrastructure and the power of networked communities as institutions and people exchange knowledge, ideas, and resources. The Unidata Program recognizes those benefits, and over the past several years it has developed a growing portfolio of international data distribution activities, conducted in close collaboration with academic, research and operational institutions on several continents, to advance earth system science education and research. The portfolio includes provision of data, tools, support and training as well as outreach activities that bring various stakeholders together to address important issues, all toward the goal of building a community with a shared vision. The overarching goals of Unidata's international data sharing activities include:
    • democratization of access to and use of data that describe the dynamic earth system, by facilitating access to a broad spectrum of observations and forecasts
    • building capacity and empowering geoscientists and educators worldwide, by encouraging local communities where data, tools, and best practices in education and research are shared
    • strengthening international science partnerships for exchanging knowledge and expertise
    • supporting faculty and students at research and educational institutions in the use of Unidata systems
    • building regional and global communities around specific geoscientific themes.
    In this presentation, I will present Unidata's ongoing data sharing activities in Latin America, Europe, Africa and Antarctica that are enabling linkages to existing and emergent e-infrastructures and operational networks, including recent advances to develop interoperable data systems, tools, and services that benefit the geosciences. Particular emphasis will be placed on describing examples of the use of Unidata

  10. Managing and sharing research data a guide to good practice

    CERN Document Server

    Corti, Louise; Bishop, Libby; Woollard, Matthew

    2014-01-01

    Research funders in the UK, USA and across Europe are implementing data management and sharing policies to maximize openness of data, transparency and accountability of the research they support. Written by experts from the UK Data Archive with over 20 years' experience, this book gives post-graduate students, researchers and research support staff the data management skills required in today’s changing research environment.

  11. iDASH: integrating data for analysis, anonymization, and sharing

    Science.gov (United States)

    Bafna, Vineet; Boxwala, Aziz A; Chapman, Brian E; Chapman, Wendy W; Chaudhuri, Kamalika; Day, Michele E; Farcas, Claudiu; Heintzman, Nathaniel D; Jiang, Xiaoqian; Kim, Hyeoneui; Kim, Jihoon; Matheny, Michael E; Resnic, Frederic S; Vinterbo, Staal A

    2011-01-01

    iDASH (integrating data for analysis, anonymization, and sharing) is the newest National Center for Biomedical Computing funded by the NIH. It focuses on algorithms and tools for sharing data in a privacy-preserving manner. Foundational privacy technology research performed within iDASH is coupled with innovative engineering for collaborative tool development and data-sharing capabilities in a private Health Insurance Portability and Accountability Act (HIPAA)-certified cloud. Driving Biological Projects, which span different biological levels (from molecules to individuals to populations) and focus on various health conditions, help guide research and development within this Center. Furthermore, training and dissemination efforts connect the Center with its stakeholders and educate data owners and data consumers on how to share and use clinical and biological data. Through these various mechanisms, iDASH implements its goal of providing biomedical and behavioral researchers with access to data, software, and a high-performance computing environment, thus enabling them to generate and test new hypotheses. PMID:22081224

  12. Informatics methods to enable sharing of quantitative imaging research data.

    Science.gov (United States)

    Levy, Mia A; Freymann, John B; Kirby, Justin S; Fedorov, Andriy; Fennessy, Fiona M; Eschrich, Steven A; Berglund, Anders E; Fenstermacher, David A; Tan, Yongqiang; Guo, Xiaotao; Casavant, Thomas L; Brown, Bartley J; Braun, Terry A; Dekker, Andre; Roelofs, Erik; Mountz, James M; Boada, Fernando; Laymon, Charles; Oborski, Matt; Rubin, Daniel L

    2012-11-01

    The National Cancer Institute Quantitative Imaging Network (QIN) is a collaborative research network whose goal is to share data, algorithms and research tools to accelerate quantitative imaging research. A challenge is the variability in tools and analysis platforms used in quantitative imaging. Our goal was to understand the extent of this variation and to develop an approach to enable sharing data and to promote reuse of quantitative imaging data in the community. We performed a survey of the current tools in use by the QIN member sites for representation and storage of their QIN research data including images, image meta-data and clinical data. We identified existing systems and standards for data sharing and their gaps for the QIN use case. We then proposed a system architecture to enable data sharing and collaborative experimentation within the QIN. There are a variety of tools currently used by each QIN institution. We developed a general information system architecture to support the QIN goals. We also describe the remaining architecture gaps we are developing to enable members to share research images and image meta-data across the network. As a research network, the QIN will stimulate quantitative imaging research by pooling data, algorithms and research tools. However, there are gaps in current functional requirements that will need to be met by future informatics development. Special attention must be given to the technical requirements needed to translate these methods into the clinical research workflow to enable validation and qualification of these novel imaging biomarkers. Copyright © 2012 Elsevier Inc. All rights reserved.

  13. Information And Data-Sharing Plan of IPY China Activity

    Science.gov (United States)

    Zhang, X.; Cheng, W.

    2007-12-01

    Polar data sharing is an effective response to global-system and polar-science problems and to interdisciplinary and sustainable study, as well as an important means to deal with IPY scientific heritages and realize IPY goals. In accordance with IPY data-sharing policies, the Information and Data-Sharing Plan was listed among the five sub-plans of the IPY Chinese Programme launched in March 2007: the scientific research program of the Prydz Bay, Amery Ice Shelf and Dome A transects (short title: 'PANDA'), the Arctic Scientific Research Expedition Plan, the International Cooperation Plan, the Information and Data-Sharing Plan, and Education and Outreach. Since the founding of Antarctic Zhongshan Station in 1989, China has carried out systematic scientific expeditions and research in the Larsemann Hills, Prydz Bay and the neighbouring sea areas, and has organized 14 Prydz Bay oceanographic investigations, 3 Amery Ice Shelf expeditions, 4 Grove Mountains expeditions and 5 inland ice cap scientific expeditions. Two comprehensive oceanographic investigations in the Arctic Ocean were conducted in 1999 and 2003, acquiring a large amount of data and samples along the PANDA section and in the fan areas of the Pacific sector of the Arctic Ocean. A mechanism for submitting, sharing and archiving basic data has been gradually set up since 2000. Presently, the Polar Science Database and the Polar Sample Resource Sharing Platform of China, established with the aim of sharing polar data and samples, have begun to provide sharing services to domestic and overseas users. Under the IPY Chinese Activity, 2 scientific expeditions in the Arctic Ocean, 3 in the Southern Ocean, 2 at the Amery Ice Shelf, 1 on the Grove Mountains and 2 inland ice cap expeditions to Dome A will be carried out during the IPY period. Building on the experience accumulated in the past and the work planned for the future, the Information and Data-Sharing Plan will, during 2007-2010, save, archive, and provide exchange and sharing services for the data obtained by scientific

  14. Data sharing platforms for de-identified data from human clinical trials.

    Science.gov (United States)

    Huser, Vojtech; Shmueli-Blumberg, Dikla

    2018-04-01

    Data sharing of de-identified individual participant data is being adopted by an increasing number of sponsors of human clinical trials. In addition to standardizing data syntax for shared trial data, semantic integration of various data elements is the focus of several initiatives that define research common data elements. This perspective article, in the first part, compares several data sharing platforms for de-identified clinical research data in terms of their size, policies and supported features. In the second part, we use a case study approach to describe in greater detail one data sharing platform (Data Share from the National Institute on Drug Abuse). We present data on the past use of the platform, data formats offered, data de-identification approaches and its use of research common data elements. We conclude with a summary of current and expected future trends that facilitate secondary research use of data from completed human clinical trials.

  15. Big data from small data: data-sharing in the ‘long tail’ of neuroscience

    Science.gov (United States)

    Ferguson, Adam R; Nielson, Jessica L; Cragin, Melissa H; Bandrowski, Anita E; Martone, Maryann E

    2016-01-01

    The launch of the US BRAIN and European Human Brain Projects coincides with growing international efforts toward transparency and increased access to publicly funded research in the neurosciences. The need for data-sharing standards and neuroinformatics infrastructure is more pressing than ever. However, ‘big science’ efforts are not the only drivers of data-sharing needs, as neuroscientists across the full spectrum of research grapple with the overwhelming volume of data being generated daily and a scientific environment that is increasingly focused on collaboration. In this commentary, we consider the issue of sharing of the richly diverse and heterogeneous small data sets produced by individual neuroscientists, so-called long-tail data. We consider the utility of these data, the diversity of repositories and options available for sharing such data, and emerging best practices. We provide use cases in which aggregating and mining diverse long-tail data convert numerous small data sources into big data for improved knowledge about neuroscience-related disorders. PMID:25349910

  16. Facilitating Secure Sharing of Personal Health Data in the Cloud.

    Science.gov (United States)

    Thilakanathan, Danan; Calvo, Rafael A; Chen, Shiping; Nepal, Surya; Glozier, Nick

    2016-05-27

    Internet-based applications are providing new ways of promoting health and reducing the cost of care. Although data can be kept encrypted in servers, the user does not have the ability to decide whom the data are shared with. Technically this is linked to the problem of who owns the data encryption keys required to decrypt the data. Currently, cloud service providers, rather than users, have full rights to the key. In practical terms this makes the users lose full control over their data. Trust and uptake of these applications can be increased by allowing patients to feel in control of their data, generally stored in cloud-based services. This paper addresses this security challenge by providing the user a way of controlling encryption keys independently of the cloud service provider. We provide a secure and usable system that enables a patient to share health information with doctors and specialists. We contribute a secure protocol for patients to share their data with doctors and others on the cloud while keeping complete ownership. We developed a simple, stereotypical health application and carried out security tests, performance tests, and usability tests with both students and doctors (N=15). We developed the health application as an app for Android mobile phones. We carried out the usability tests on potential participants and medical professionals. Of 20 participants, 14 (70%) either agreed or strongly agreed that they felt safer using our system. Using mixed methods, we show that participants agreed that privacy and security of health data are important and that our system addresses these issues. We presented a security protocol that enables patients to securely share their eHealth data with doctors and nurses and developed a secure and usable system that enables patients to share mental health information with doctors.
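The paper's central idea - that the patient, not the cloud provider, holds the data-encryption key, and that sharing means wrapping that key for a specific recipient - can be sketched with the Python standard library alone. The toy SHA-256 counter-mode cipher and the pre-shared recipient secret below are illustrative assumptions, not the authors' protocol; a real system would use an AEAD cipher (e.g. AES-GCM) and public-key encryption to the doctor's key pair.

```python
import hashlib
import secrets

def xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (SHA-256 in counter mode) -- illustration only;
    a production system would use an AEAD cipher such as AES-GCM."""
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, out))

class PatientVault:
    """Patient-side state: the cloud stores only ciphertext, never keys."""

    def __init__(self):
        self.data_key = secrets.token_bytes(32)  # never leaves the patient

    def encrypt_record(self, record: bytes):
        nonce = secrets.token_bytes(16)
        return nonce, xor_stream(self.data_key, nonce, record)

    def grant_access(self, recipient_secret: bytes):
        """Wrap the data key for one recipient (a stand-in for public-key
        encryption to the doctor's key pair)."""
        wrap_key = hashlib.sha256(b"wrap:" + recipient_secret).digest()
        nonce = secrets.token_bytes(16)
        return nonce, xor_stream(wrap_key, nonce, self.data_key)

def recipient_read(recipient_secret: bytes, wrapped, ciphertext) -> bytes:
    """Doctor-side: unwrap the data key, then decrypt the record."""
    wrap_key = hashlib.sha256(b"wrap:" + recipient_secret).digest()
    data_key = xor_stream(wrap_key, wrapped[0], wrapped[1])
    return xor_stream(data_key, ciphertext[0], ciphertext[1])

patient = PatientVault()
stored = patient.encrypt_record(b"BP 120/80, HR 62")  # what the cloud holds
doctor_secret = secrets.token_bytes(32)
wrapped_key = patient.grant_access(doctor_secret)     # given to the doctor
plaintext = recipient_read(doctor_secret, wrapped_key, stored)
```

Because the cloud never sees `data_key`, revoking all access amounts to re-encrypting under a fresh key, which mirrors the ownership model the abstract describes.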

  17. Beyond Our Borders? Public Resistance to Global Genomic Data Sharing.

    Directory of Open Access Journals (Sweden)

    Mary A Majumder

    2016-11-01

    Prospects have never seemed better for a truly global approach to science to improve human health, with leaders of national initiatives laying out their vision of a worldwide network of related projects. An extensive literature addresses obstacles to global genomic data sharing, yet a series of public polls suggests that the scientific community may be overlooking a significant barrier: potential public resistance to data sharing across national borders. In several large United States surveys, university researchers in other countries were deemed the least acceptable group of data users, and a just-completed US survey found a marked increase in privacy and security concerns related to data access by non-US researchers. Furthermore, diminished support for sharing beyond national borders is not unique to the US, although the limited data from outside the US suggest variation across countries as well as demographic groups. Possible sources of resistance include apprehension about privacy and security protections. Strategies for building public support include making the affirmative case for global data sharing, addressing privacy, security, and other legitimate concerns, and investigating public concerns in greater depth.

  18. Human Proteinpedia enables sharing of human protein data

    Energy Technology Data Exchange (ETDEWEB)

    Mathivanan, Suresh; Ahmed, Mukhtar; Ahn, Natalie G.; Alexandre, Hainard; Amanchy, Ramars; Andrews, Philip C.; Bader, Joel S.; Balgley, Brian M.; Bantscheff, Marcus; Bennett, Keiryn; Bjorling, Erik; Blagoev, Blagoy; Bose, Ron; Brahmachari, Samir K.; Burlingame, Alma S.; Bustelo, Xosé R.; Cagney, Gerard; Cantin, Greg T; Cardasis, Helene L; Celis, Julio E; Chaerkady, Raghothama; Chu, Feixia; Cole, Phillip A.; Costello, Catherine E; Cotter, Robert J.; Crockett, David; DeLany, James P.; De Marzo, Angelo M; DeSouza, Leroi V; Deutsch, Eric W.; Dransfield, Eric; Drewes, Gerard; Droit, Arnaud; Dunn, Michael; Elenitoba-Johnson, Kojo; Ewing, Rob M.; Van Eyk, Jennifer; Faca, Vitor; Falkner, Jayson; Fang, Xiangming; Fenselau, Catherine; Figeys, Daniel; Gagne, Pierre; Gelfi, Cecilia; Gevaert, Kris; Gimble, Jeffrey; Gnad, Florian; Goel, Renu; Gromov, Pavel; Hanash, Samir M.; Hancock, William S.; Harsha, HC; Hart, Gerald; Faith, Hays; He, Fuchu; Hebbar, Prashantha; Helsens, Kenny; Hermeking, Heiko; Hide, Winston; Hjerno, Karin; Hochstrasser, Denis F.; Hofmann, Oliver; Horn, David M.; Hruban, Ralph H.; Ibarrola, Nieves; James, Peter; Jensen, Ole N.; Jensen, Pia H.; Jung, Peter; Kandasamy, Kumaran; Kheterpal, Indu; Kikuno, Reiko; Korf, Ulrike; Korner, Roman; Kuster, Bernhard; Kwon, Min-Seok; Lee, Hyoung-Joo; Lee, Young-Jin; Lefevre, Michael; Lehvaslaiho, Minna; Lescuyer, Pierre; Levander, Fredrik; Lim, Megan S.; Lobke, Christian; Loo, Joseph; Mann, Matthias; Martens, Lennart; Martinez-Heredia, Juan; McComb, Mark E.; McRedmond, James; Mehrle, Alexander; Menon, Rajasree; Miller, Christine A.; Mischak, Harald; Mohan, S Sujatha; Mohmood, Riaz; Molina, Henrik; Moran, Michael F.; Morgan, James D.; Moritz, Robert; Morzel, Martine; Muddiman, David C.; Nalli, Anuradha; Navarro, J. D.; Neubert, Thomas A.; Ohara, Osamu; Oliva, Rafael; Omenn, Gilbert; Oyama, Masaaki; Paik, Young-Ki; Pennington, Kyla; Pepperkok, Rainer; Periaswamy, Balamurugan; Petricoin, Emanuel F.; Poirier, Guy G.; Prasad, T S Keshava; Purvine, Samuel O.; Rahiman, B Abdul; Ramachandran, Prasanna; Ramachandra, Y L; Rice, Robert H.; Rick, Jens; Ronnholm, Ragna H.; Salonen, Johanna; Sanchez, Jean-Charles; Sayd, Thierry; Seshi, Beerelli; Shankari, Kripa; Sheng, Shi Jun; Shetty, Vivekananda; Shivakumar, K.; Simpson, Richard J.; Sirdeshmukh, Ravi; Siu, K W Michael; Smith, Jeffrey C.; Smith, Richard D.; States, David J.; Sugano, Sumio; Sullivan, Matthew; Superti-Furga, Giulio; Takatalo, Maarit; Thongboonkerd, Visith; Trinidad, Jonathan C.; Uhlen, Mathias; Vandekerckhove, Joel; Vasilescu, Julian; Veenstra, Timothy D.; Vidal-Taboada, Jose-Manuel; Vihinen, Mauno; Wait, Robin; Wang, Xiaoyue; Wiemann, Stefan; Wu, Billy; Xu, Tao; Yates, John R.; Zhong, Jun; Zhou, Ming; Zhu, Yunping; Zurbig, Petra; Pandey, Akhilesh

    2008-02-01

    Proteomic technologies, such as yeast two-hybrid, mass spectrometry (MS), protein/peptide arrays and fluorescence microscopy, yield multi-dimensional data sets, which are often quite large and either not published or published as supplementary information that is not easily searchable. Without a system in place for standardizing and sharing data, it is not fruitful for the biomedical community to contribute these types of data to centralized repositories. Even more difficult is the annotation and display of pertinent information in the context of the corresponding proteins. Wikipedia, an online encyclopedia that anyone can edit, has already proven quite successful and can be used as a model for sharing biological data. However, the need for experimental evidence, data standardization and ownership of data creates scientific obstacles.

  19. Sharing Hydrologic Data with the CUAHSI Hydrologic Information System (Invited)

    Science.gov (United States)

    Tarboton, D. G.; Maidment, D. R.; Zaslavsky, I.; Horsburgh, J. S.; Whiteaker, T.; Piasecki, M.; Goodall, J. L.; Valentine, D. W.; Whitenack, T.

    2009-12-01

    The CUAHSI Hydrologic Information System (HIS) is an internet-based system that supports the sharing of hydrologic data; it consists of databases connected over the Internet through web services, as well as software for data discovery, access and publication. The HIS is founded upon an information model for observations at stationary points that supports its data services. A data model, the CUAHSI Observations Data Model (ODM), provides the community-defined semantics needed to share information from diverse data sources. A defined set of CUAHSI HIS web services allows for the development of data services, which scale from centralized data services, which support access to national datasets such as the USGS National Water Information System (NWIS) and the EPA Storage and Retrieval System (STORET) in a standard way, to distributed data services, which allow users to establish their own servers and publish their data. User data services are registered to a central HIS website, where they become searchable and accessible through the centralized discovery and data-access tools. HIS uses an XML schema for data transmission and a relational database schema for data storage. WaterML is the XML schema that underlies the machine-to-machine communications, while the ODM is implemented as a relational database model for persistent data storage. Web services support access to hydrologic data stored in ODM and communicate using WaterML directly from application software with Simple Object Access Protocol (SOAP) capability, such as Excel, MATLAB and ArcGIS. A significant value of web services derives from the capability to use them from within a user’s preferred analysis environment, using community-defined semantics, rather than requiring the user to learn new software. This allows a user to work with data from national and academic sources almost as though it were on their local disk. Users wishing to share or publish their data through CUAHSI
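
    As a rough illustration of the WaterML idea (machine-readable time series returned by a web service), the snippet below parses a simplified WaterML-like response with Python's standard library. The element and attribute names are illustrative assumptions, not the exact CUAHSI schema.

```python
import xml.etree.ElementTree as ET

# A simplified, WaterML-like response (element names are illustrative,
# not the real CUAHSI WaterML schema).
waterml = """
<timeSeriesResponse>
  <timeSeries siteCode="USGS:10109000" variableCode="discharge">
    <values unit="cfs">
      <value dateTime="2009-12-01T00:00:00">412.0</value>
      <value dateTime="2009-12-01T01:00:00">415.5</value>
    </values>
  </timeSeries>
</timeSeriesResponse>
"""

root = ET.fromstring(waterml)
series = root.find("timeSeries")

# Extract (timestamp, value) pairs, as a client tool would before analysis.
readings = [
    (v.get("dateTime"), float(v.text))
    for v in series.find("values").iter("value")
]
print(readings[0])  # ('2009-12-01T00:00:00', 412.0)
```

    A real client would obtain this XML from a SOAP call rather than a string literal; the parsing step is the same.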

  20. Genomic research and wide data sharing: views of prospective participants.

    Science.gov (United States)

    Trinidad, Susan Brown; Fullerton, Stephanie M; Bares, Julie M; Jarvik, Gail P; Larson, Eric B; Burke, Wylie

    2010-08-01

    Sharing study data within the research community generates tension between two important goods: promoting scientific goals and protecting the privacy interests of study participants. This study was designed to explore the perceptions, beliefs, and attitudes of research participants and possible future participants regarding genome-wide association studies and repository-based research. Focus group sessions with (1) current research participants, (2) surrogate decision-makers, and (3) three age-defined cohorts (18-34 years, 35-50, >50). Participants expressed a variety of opinions about the acceptability of wide sharing of genetic and phenotypic information for research purposes through large, publicly accessible data repositories. Most believed that making de-identified study data available to the research community is a social good that should be pursued. Privacy and confidentiality concerns were common, although they would not necessarily preclude participation. Many participants voiced reservations about sharing data with for-profit organizations. Trust is central in participants' views regarding data sharing. Further research is needed to develop governance models that enact the values of stewardship.

  1. Book Review. Mapping the determinants of spatial data sharing By ...

    African Journals Online (AJOL)

    Book Review. Mapping the determinants of spatial data sharing. By Uta Wehn de Montalvo (2003). Yoichi Mine. Abstract. Aldershot: Ashgate. Africa Development Vol. XXX(3) 2005: 145-146. http://dx.doi.org/10.4314/ad.v30i3.22237

  2. Resource Planning for SPARQL Query Execution on Data Sharing Platforms

    DEFF Research Database (Denmark)

    Hagedorn, Stefan; Hose, Katja; Sattler, Kai-Uwe

    2014-01-01

    To increase performance, data sharing platforms often make use of clusters of nodes where certain tasks can be executed in parallel. Resource planning and especially deciding how many processors should be chosen to exploit parallel processing is complex in such a setup as increasing the number...

  3. Transparency in Teaching: Faculty Share Data and Improve Students' Learning

    Science.gov (United States)

    Winkelmes, Mary-Ann

    2013-01-01

    The Illinois Initiative on Transparency in Learning and Teaching is a grassroots assessment project designed to promote students' conscious understanding of how they learn and to enable faculty to gather, share, and promptly benefit from data about students' learning by coordinating their efforts across disciplines, institutions, and countries.…

  4. Preventing, Controlling, and Sharing Data of Arsenicosis in China

    Directory of Open Access Journals (Sweden)

    Yuanqing Tong

    2007-09-01

    The first case of arsenicosis was reported in China in the 1950s. That incident was associated with the so-called "black foot disease." In the late 1970s and early 1980s, arsenic-specific cutaneous changes were diagnosed in the Xinjiang Autonomous Region and subsequently in the Inner Mongolia Autonomous Region and Shanxi Province. Recently, endemic arsenicosis was also found in Jilin, Ningxia, Qinghai, and Anhui Provinces. The prevalence of arsenicosis in China is becoming increasingly serious. In order to prevent and control it, many departments and institutes have begun to work in this field. They have made great progress, including the sharing of arsenicosis data within a limited area. But the limited nature of this data sharing is a barrier to preventing and controlling arsenicosis. Only once data sharing is realized nationwide can we discover the best way of eliminating arsenicosis. With this goal in mind, we have set up a rudimentary platform for arsenicosis data sharing, which now needs to be gradually improved.

  5. Operating System Support for Shared Hardware Data Structures

    Science.gov (United States)

    2013-01-31

    Intelligent software support is necessary to achieve good performance from hardware data structures (HWDSs) in the presence of overflow and sharing. Related topics include systolic priority queues and abstract datatype processors, which accelerate data types with mechanisms and performance similar to HWDSs. Abstract datatype instructions can reduce

  6. Geospatial Data Repository. Sharing Data Across the Organization and Beyond

    National Research Council Canada - National Science Library

    Ruiz, Marilyn

    2001-01-01

    .... This short Technical Note discusses a five-part approach to creating a data repository that addresses the problems of the historical organizational framework for geospatial data. Fort Hood, Texas was the site used to develop the prototype. A report documenting the complete study will be available in late Spring 2001.

  7. Protecting Privacy of Shared Epidemiologic Data without Compromising Analysis Potential

    Directory of Open Access Journals (Sweden)

    John Cologne

    2012-01-01

    Objective. Ensuring privacy of research subjects when epidemiologic data are shared with outside collaborators involves masking (modifying) the data, but overmasking can compromise utility (analysis potential). Methods of statistical disclosure control for protecting privacy may be impractical for individual researchers involved in small-scale collaborations. Methods. We investigated a simple approach based on measures of disclosure risk and analytical utility that are straightforward for epidemiologic researchers to derive. The method is illustrated using data from the Japanese Atomic-bomb Survivor population. Results. Masking by modest rounding did not adequately enhance security but rounding to remove several digits of relative accuracy effectively reduced the risk of identification without substantially reducing utility. Grouping or adding random noise led to noticeable bias. Conclusions. When sharing epidemiologic data, it is recommended that masking be performed using rounding. Specific treatment should be determined separately in individual situations after consideration of the disclosure risks and analysis needs.
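
    The recommended masking, rounding away low-order digits of relative accuracy, amounts to rounding each value to a fixed number of significant digits. A minimal sketch follows; the `mask` helper and the sample values are hypothetical, not taken from the study.

```python
import math

def mask(value: float, sig_digits: int = 2) -> float:
    """Round to a fixed number of significant digits, discarding the
    low-order digits that could help re-identify a subject."""
    if value == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(value)))
    return round(value, sig_digits - 1 - exponent)

# Hypothetical measurements masked to 2 significant digits.
print(mask(1.23456))   # 1.2
print(mask(987.654))   # 990.0
print(mask(0.004567))  # 0.0046
```

    Unlike grouping or added noise, this transform is unbiased on average and keeps the relative scale of the data, which is why the study found it preserved analytical utility.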

  8. A crystallographic perspective on sharing data and knowledge

    Science.gov (United States)

    Bruno, Ian J.; Groom, Colin R.

    2014-10-01

    The crystallographic community is in many ways an exemplar of the benefits and practices of sharing data. Since the inception of the technique, virtually every published crystal structure has been made available to others. This has been achieved through the establishment of several specialist data centres, including the Cambridge Crystallographic Data Centre, which produces the Cambridge Structural Database. The database, containing curated structures of small organic molecules (some containing a metal), has been produced for almost 50 years. This has required the development of complex informatics tools and an environment allowing expert human curation. As importantly, a financial model has evolved which has, to date, ensured the sustainability of the resource. However, the opportunities afforded by technological changes and changing attitudes to sharing data make it an opportune moment to review current practices.

  9. Protecting Privacy of Shared Epidemiologic Data without Compromising Analysis Potential

    International Nuclear Information System (INIS)

    Cologne, J.; Nakashima, E.; Funamoto, S.; Grant, E.J.; Chen, Y.; Katayama, H.

    2012-01-01

    Objective. Ensuring privacy of research subjects when epidemiologic data are shared with outside collaborators involves masking (modifying) the data, but overmasking can compromise utility (analysis potential). Methods of statistical disclosure control for protecting privacy may be impractical for individual researchers involved in small-scale collaborations. Methods. We investigated a simple approach based on measures of disclosure risk and analytical utility that are straightforward for epidemiologic researchers to derive. The method is illustrated using data from the Japanese Atomic-bomb Survivor population. Results. Masking by modest rounding did not adequately enhance security but rounding to remove several digits of relative accuracy effectively reduced the risk of identification without substantially reducing utility. Grouping or adding random noise led to noticeable bias. Conclusions. When sharing epidemiologic data, it is recommended that masking be performed using rounding. Specific treatment should be determined separately in individual situations after consideration of the disclosure risks and analysis needs.

  10. Generating Sustainable Value from Open Data in a Sharing Society

    DEFF Research Database (Denmark)

    Jetzek, Thorhildur; Avital, Michel; Bjørn-Andersen, Niels

    2014-01-01

    Our societies are in the midst of a paradigm shift that transforms hierarchical markets into an open and networked economy based on digital technology and information. In that context, open data is widely presumed to have a positive effect on social, environmental and economic value; however, the evidence to that effect has remained scarce. Subsequently, we address the question of how the use of open data can stimulate the generation of sustainable value. We argue that open data sharing and reuse can empower new ways of generating value in the sharing society. Moreover, we propose a model that describes how different mechanisms that take part within an open system generate sustainable value. These mechanisms are enabled by a number of contextual factors that provide individuals with the motivation, opportunity and ability to generate sustainable value.

  11. AMRZone: A Runtime AMR Data Sharing Framework For Scientific Applications

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, Wenzhao; Tang, Houjun; Harenberg, Steven; Byna, Suren; Zou, Xiaocheng; Devendran, Dharshi; Martin, Daniel; Wu, Kesheng; Dong, Bin; Klasky, Scott; Samatova, Nagiza

    2017-08-31

    Frameworks that facilitate runtime data sharing across multiple applications are of great importance for scientific data analytics. Although existing frameworks work well over uniform mesh data, they cannot effectively handle adaptive mesh refinement (AMR) data. The challenges in constructing an AMR-capable framework include: (1) designing an architecture that facilitates online AMR data management; (2) achieving a load-balanced AMR data distribution for the data staging space at runtime; and (3) building an effective online index to support the unique spatial data retrieval requirements for AMR data. To address these challenges and support runtime AMR data sharing across scientific applications, we present the AMRZone framework. Experiments over real-world AMR datasets demonstrate AMRZone's effectiveness at achieving a balanced workload distribution, reading/writing large-scale datasets with thousands of parallel processes, and satisfying queries with spatial constraints. Moreover, AMRZone's performance and scalability are comparable with existing state-of-the-art work when tested over uniform mesh data with up to 16384 cores; in the best case, our framework achieves a 46% performance improvement.
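
    The load-balancing challenge (challenge 2 above) can be illustrated with a classic greedy heuristic: assign blocks largest-first, each to the currently least-loaded staging node. This is a generic sketch under that assumption, not AMRZone's actual algorithm, and the block sizes are made up.

```python
import heapq

def balance(block_sizes, n_nodes):
    """Greedy largest-first assignment of AMR blocks to staging nodes:
    each block goes to the node with the smallest current load."""
    heap = [(0, node) for node in range(n_nodes)]  # (load, node id)
    heapq.heapify(heap)
    assignment = {}
    for block, size in sorted(enumerate(block_sizes), key=lambda b: -b[1]):
        load, node = heapq.heappop(heap)
        assignment[block] = node
        heapq.heappush(heap, (load + size, node))
    return assignment

# Refinement produces blocks of very different sizes; naive round-robin
# placement would leave some staging nodes far more loaded than others.
sizes = [512, 256, 256, 128, 64, 64, 32]
print(balance(sizes, 3))
```

    The heuristic keeps the maximum node load within a constant factor of optimal, which is usually sufficient for a staging space refreshed at every timestep.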

  12. The HydroServer Platform for Sharing Hydrologic Data

    Science.gov (United States)

    Tarboton, D. G.; Horsburgh, J. S.; Schreuders, K.; Maidment, D. R.; Zaslavsky, I.; Valentine, D. W.

    2010-12-01

    The CUAHSI Hydrologic Information System (HIS) is an internet-based system that supports sharing of hydrologic data. HIS consists of databases connected using the Internet through Web services, as well as software for data discovery, access, and publication. The HIS system architecture comprises servers for publishing and sharing data, a centralized catalog to support cross-server data discovery and a desktop client to access and analyze data. This paper focuses on HydroServer, the component developed for sharing and publishing space-time hydrologic datasets. A HydroServer is a computer server that contains a collection of databases, web services, tools, and software applications that allow data producers to store, publish, and manage the data from an experimental watershed or project site. HydroServer is designed to permit publication of data as part of a distributed national/international system, while still locally managing access to the data. We describe the HydroServer architecture and software stack, including tools for managing and publishing time series data for fixed-point monitoring sites as well as spatially distributed GIS datasets that describe a particular study area, watershed, or region. HydroServer adopts a standards-based approach to data publication, relying on accepted and emerging standards for data storage and transfer. CUAHSI-developed HydroServer code is free, with community code development managed through the CodePlex open source code repository and development system. There is some reliance on widely used commercial software for general purpose and standard data publication capability. The sharing of data in a common format is one way to stimulate interdisciplinary research and collaboration.
It is anticipated that the growing, distributed network of HydroServers will facilitate cross-site comparisons and large scale studies that synthesize information from diverse settings, making the network as a whole greater than the sum of its
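
    The point-observation storage idea behind HydroServer's time series publication can be sketched with a single relational table: one row per observed value at a monitoring site. The table layout and the site/variable codes below are simplified illustrations, not the real CUAHSI ODM schema.

```python
import sqlite3

# A drastically simplified, ODM-like layout (the real CUAHSI ODM schema
# is far richer: separate site, variable, method and quality tables).
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE datavalues (
        site_code TEXT,
        variable_code TEXT,
        local_datetime TEXT,
        data_value REAL
    )
""")
rows = [
    ("LR_WaterLab_A", "WaterTemp_C", "2010-12-01T00:00", 4.1),
    ("LR_WaterLab_A", "WaterTemp_C", "2010-12-01T01:00", 3.9),
]
db.executemany("INSERT INTO datavalues VALUES (?, ?, ?, ?)", rows)

# A data-access web service essentially answers queries of this shape:
# "all values for this site and variable, in time order".
cur = db.execute(
    "SELECT local_datetime, data_value FROM datavalues "
    "WHERE site_code = ? AND variable_code = ? ORDER BY local_datetime",
    ("LR_WaterLab_A", "WaterTemp_C"),
)
print(cur.fetchall())  # [('2010-12-01T00:00', 4.1), ('2010-12-01T01:00', 3.9)]
```

    Storing every observation in one common layout is what lets a single set of web services publish data from any site without site-specific code.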

  13. Challenges in sharing of geospatial data by data custodians in South Africa

    Science.gov (United States)

    Kay, Sissiel E.

    2018-05-01

    As most development planning and rendering of public services happens at a place or in a space, geospatial data is required. This geospatial data is best managed through a spatial data infrastructure, which has as a key objective to share geospatial data. The collection and maintenance of geospatial data is expensive and time-consuming, and so the principle of "collect once - use many times" should apply. It is best to obtain the geospatial data from the authoritative source - the appointed data custodian. In South Africa the South African Spatial Data Infrastructure (SASDI) is the means to achieve the requirement for geospatial data sharing. This requires geospatial data sharing to take place between the data custodian and the user. All data custodians are expected to comply with the Spatial Data Infrastructure Act (SDI Act) in terms of geospatial data sharing. Currently data custodians are experiencing challenges with regard to the sharing of geospatial data. This research is based on the current ten data themes selected by the Committee for Spatial Information and the organisations identified as the data custodians for these ten data themes. The objectives are to determine whether the identified data custodians comply with the SDI Act with respect to geospatial data sharing, and if not, what the reasons for this are. Through an international comparative assessment it then determines whether compliance with the SDI Act is too onerous on the data custodians. The research concludes that there are challenges with geospatial data sharing in South Africa and that the data custodians only partially comply with the SDI Act in terms of geospatial data sharing. However, it is shown that the South African legislation is not too onerous on the data custodians.

  14. Secure, Policy-Based, Multi-Recipient Data Sharing

    Science.gov (United States)

    2009-01-01

    Directed by Professor Virgil D. Gligor, Department of Electrical and Computer Engineering. In distributed systems users often need to share sensitive data. Combining an organizational role with short-term responsibilities results in useful compound attributes; e.g., 'Faculty' in 'College of Engineering' serving as... The KDC then contacts the Attribute Database that manages user attributes, privileges, and environmental attributes (i.e., context). The KDC uses these

  15. Virtual laboratory strategies for data sharing, communications and development

    Directory of Open Access Journals (Sweden)

    Enrique Canessa

    2006-01-01

    We present an overview of the virtual laboratory (VL) approach to promote research and education in developing countries and to help reduce the technology gap of the digital divide. We discuss software tools for instrument control, data sharing and e-collaboration and communications, with special attention to low-bandwidth networks. We analyse the tentative costs involved in VL and the skills needed for VL administration. We conclude by identifying some VL strategies for development.

  16. A Qualitative Analysis of Real-Time Continuous Glucose Monitoring Data Sharing with Care Partners: To Share or Not to Share?

    Science.gov (United States)

    Litchman, Michelle L; Allen, Nancy A; Colicchio, Vanessa D; Wawrzynski, Sarah E; Sparling, Kerri M; Hendricks, Krissa L; Berg, Cynthia A

    2018-01-01

    Little research exists regarding how real-time continuous glucose monitoring (RT-CGM) data sharing plays a role in the relationship between patients and their care partners. To (1) identify the benefits and challenges related to RT-CGM data sharing from the patient and care partner perspective and (2) explore the number and type of individuals who share and follow RT-CGM data. This qualitative content analysis was conducted by examining publicly available blogs focused on RT-CGM and data sharing. A thematic analysis of blogs and associated comments was conducted. A systematic appraisal of personal blogs examined 39 blogs with 206 corresponding comments. The results of the study provided insight into the benefits and challenges related to individuals with diabetes sharing their RT-CGM data with a care partner(s). The analysis resulted in three themes: (1) RT-CGM data sharing enhances feelings of safety, (2) the need to communicate boundaries to avoid judgment, and (3) choice about sharing and following RT-CGM data. RT-CGM data sharing occurred within dyads (n = 46), triads (n = 15), and tetrads (n = 2). Adults and children with type 1 diabetes and their care partners are empowered by the ability to share and follow RT-CGM data. Our findings suggest that RT-CGM data sharing between an individual with diabetes and their care partner can complicate relationships. Healthcare providers need to engage patients and care partners in discussions about best practices related to RT-CGM sharing and following to avoid frustrations within the relationship.

  17. Whose data set is it anyway? Sharing raw data from randomized trials

    Directory of Open Access Journals (Sweden)

    Vickers Andrew J

    2006-05-01

    Background Sharing of raw research data is common in many areas of medical research, genomics being perhaps the most well-known example. In the clinical trial community investigators routinely refuse to share raw data from a randomized trial without giving a reason. Discussion Data sharing benefits numerous research-related activities: reproducing analyses; testing secondary hypotheses; developing and evaluating novel statistical methods; teaching; aiding design of future trials; meta-analysis; and, possibly, preventing error, fraud and selective reporting. Clinical trialists, however, sometimes appear overly concerned with being scooped and with misrepresentation of their work. Both possibilities can be avoided with simple measures such as inclusion of the original trialists as co-authors on any publication resulting from data sharing. Moreover, if we treat any data set as belonging to the patients who comprise it, rather than the investigators, such concerns fall away. Conclusion Technological developments, particularly the Internet, have made data sharing generally a trivial logistical problem. Data sharing should come to be seen as an inherent part of conducting a randomized trial, similar to the way in which we consider ethical review and publication of study results. Journals and funding bodies should insist that trialists make raw data available, for example, by publishing data on the Web. If the clinical trial community continues to fail with respect to data sharing, we will only strengthen the public perception that we do clinical trials to benefit ourselves, not our patients.

  18. Whose data set is it anyway? Sharing raw data from randomized trials.

    Science.gov (United States)

    Vickers, Andrew J

    2006-05-16

    Sharing of raw research data is common in many areas of medical research, genomics being perhaps the most well-known example. In the clinical trial community investigators routinely refuse to share raw data from a randomized trial without giving a reason. Data sharing benefits numerous research-related activities: reproducing analyses; testing secondary hypotheses; developing and evaluating novel statistical methods; teaching; aiding design of future trials; meta-analysis; and, possibly, preventing error, fraud and selective reporting. Clinical trialists, however, sometimes appear overly concerned with being scooped and with misrepresentation of their work. Both possibilities can be avoided with simple measures such as inclusion of the original trialists as co-authors on any publication resulting from data sharing. Moreover, if we treat any data set as belonging to the patients who comprise it, rather than the investigators, such concerns fall away. Technological developments, particularly the Internet, have made data sharing generally a trivial logistical problem. Data sharing should come to be seen as an inherent part of conducting a randomized trial, similar to the way in which we consider ethical review and publication of study results. Journals and funding bodies should insist that trialists make raw data available, for example, by publishing data on the Web. If the clinical trial community continues to fail with respect to data sharing, we will only strengthen the public perception that we do clinical trials to benefit ourselves, not our patients.

  19. Data Sharing in DHT Based P2P Systems

    Science.gov (United States)

    Roncancio, Claudia; Del Pilar Villamil, María; Labbé, Cyril; Serrano-Alvarado, Patricia

    The evolution of peer-to-peer (P2P) systems triggered the building of large scale distributed applications. The main application domain is data sharing across a very large number of highly autonomous participants. Building such data sharing systems is particularly challenging because of the “extreme” characteristics of P2P infrastructures: massive distribution, high churn rate, no global control, potentially untrusted participants... This article focuses on declarative querying support, query optimization and data privacy on a major class of P2P systems: those based on Distributed Hash Tables (P2P DHT). The usual approaches and the algorithms used by classic distributed systems and databases for providing data privacy and querying services are not well suited to P2P DHT systems. A considerable amount of work was required to adapt them for the new challenges such systems present. This paper describes the most important solutions found. It also identifies important future research trends in data management in P2P DHT systems.
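
    The DHT foundation the article builds on can be sketched with consistent hashing: node identifiers and data keys are hashed onto the same ring, and a key is stored on the first node at or after its hash position. The toy class below illustrates only that lookup rule (a Chord-style successor lookup is the assumption); it is not any specific DHT protocol and omits routing, churn, and replication.

```python
import bisect
import hashlib

def h(name: str) -> int:
    # Map node ids and data keys onto the same 32-bit ring.
    return int.from_bytes(hashlib.sha256(name.encode()).digest()[:4], "big")

class ToyDHT:
    def __init__(self, nodes):
        self.ring = sorted((h(n), n) for n in nodes)

    def lookup(self, key: str) -> str:
        """Return the node responsible for key: the first node whose ring
        position is >= hash(key), wrapping around past the largest hash."""
        i = bisect.bisect_left(self.ring, (h(key), ""))
        return self.ring[i % len(self.ring)][1]

dht = ToyDHT(["peer-a", "peer-b", "peer-c", "peer-d"])
print(dht.lookup("shared/report.pdf"))  # deterministic: same answer on every peer
```

    Because every peer computes the same answer from the key alone, lookups need no global directory, which is exactly what makes declarative querying and privacy over DHTs nontrivial: the placement of data is fixed by the hash, not by the query planner.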

  20. Cooperative Data Sharing: Simple Support for Clusters of SMP Nodes

    Science.gov (United States)

    DiNucci, David C.; Bailey, David H. (Technical Monitor)

    1997-01-01

    Libraries like PVM and MPI send typed messages to allow for heterogeneous cluster computing. Lower-level libraries, such as GAM, provide more efficient access to communication by removing the need to copy messages between the interface and user space in some cases. Still lower-level interfaces, such as UNET, get right down to the hardware level to provide maximum performance. However, these are all still interfaces for passing messages from one process to another, and have limited utility in a shared-memory environment, due primarily to the fact that message passing is just another term for copying. This drawback is made more pertinent by today's hybrid architectures (e.g. clusters of SMPs), where it is difficult to know beforehand whether two communicating processes will share memory. As a result, even portable language tools (like HPF compilers) must either map all interprocess communication into message passing with the accompanying performance degradation in shared memory environments, or they must check each communication at run-time and implement the shared-memory case separately for efficiency. Cooperative Data Sharing (CDS) is a single user-level API which abstracts all communication between processes into the sharing and access coordination of memory regions, in a model which might be described as "distributed shared messages" or "large-grain distributed shared memory". As a result, the user programs to a simple latency-tolerant abstract communication specification which can be mapped efficiently to either a shared-memory or message-passing based run-time system, depending upon the available architecture. Unlike some distributed shared memory interfaces, the user still has complete control over the assignment of data to processors, the forwarding of data to its next likely destination, and the queuing of data until it is needed, so even the relatively high latency present in clusters can be accommodated. CDS does not require special use of an MMU, which

  1. A Philosophy Research Database to Share Data Resources

    Directory of Open Access Journals (Sweden)

    Jili Cheng

    2007-12-01

    Full Text Available Philosophy research used to rely mainly on traditional published journals and newspapers for collecting and communicating data. However, because of financial limits or a lack of capacity to collect data, required published materials, and even restricted materials and emerging information from research projects, often could not be obtained. The rise of digital techniques and the Internet has made data resource sharing possible in philosophy research. However, although several ICPs in China run large-scale comprehensive commercial databases in the field, no truly non-profit professional database for philosophy researchers exists. Therefore, in 2002, the Philosophy Institute of the Chinese Academy of Social Sciences began a project to build "The Database of Philosophy Research." By March 2006 the number of subsets had reached 30, with more than 30,000 records; retrieval services had reached 6,000, and article readings 30,000. Because of intellectual property concerns, the service of the database is currently limited to the information held in CASS. Nevertheless, this is the first academic database for philosophy research, and its orientation is towards resource sharing, leading users to data, and serving a large number of demands from other provinces and departments.

  2. Data on fossil fuel availability for Shared Socioeconomic Pathways

    Directory of Open Access Journals (Sweden)

    Nico Bauer

    2017-02-01

    Full Text Available The data files contain the assumptions and results for the construction of cumulative availability curves for coal, oil and gas for the five Shared Socioeconomic Pathways. The files include the maximum availability (also known as cumulative extraction cost curves) and the assumptions that are applied to construct the SSPs. The data are differentiated into twenty regions. The resulting cumulative availability curves are plotted, and the aggregate data as well as the cumulative availability curves are compared across SSPs. The methodology, the data sources and the assumptions are documented in a related article (N. Bauer, J. Hilaire, R.J. Brecha, J. Edmonds, K. Jiang, E. Kriegler, H.-H. Rogner, F. Sferra, 2016) [1] under DOI: http://dx.doi.org/10.1016/j.energy.2016.05.088.
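The notion of a cumulative availability (extraction cost) curve described above can be sketched in a few lines: sort resource grades by unit extraction cost and accumulate the quantity available at or below each cost level. The numbers below are made up for illustration and are not taken from the SSP dataset.

```python
# Illustrative sketch (not the paper's data): a cumulative availability curve
# is built by sorting resource grades by extraction cost and accumulating
# the quantity available at or below each cost level.

def cumulative_availability(grades):
    """grades: iterable of (cost_per_unit, quantity) pairs, in any order.
    Returns a list of (cost, cumulative_quantity) points sorted by cost."""
    curve, total = [], 0.0
    for cost, qty in sorted(grades):
        total += qty
        curve.append((cost, total))
    return curve

# Hypothetical coal grades: (USD per GJ, EJ available at that cost)
coal = [(2.0, 150.0), (1.0, 300.0), (4.0, 500.0)]
print(cumulative_availability(coal))
# -> [(1.0, 300.0), (2.0, 450.0), (4.0, 950.0)]
```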

  3. Practical Barriers and Ethical Challenges in Genetic Data Sharing

    Directory of Open Access Journals (Sweden)

    Claire L. Simpson

    2014-08-01

    Full Text Available The underlying ethos of dbGaP is that access to these data by secondary data analysts facilitates the advancement of science. NIH has required that genome-wide association study data be deposited in the Database of Genotypes and Phenotypes (dbGaP) since 2003. In 2013, a proposed updated policy extended this requirement to next-generation sequencing data. However, recent literature and anecdotal reports suggest lingering logistical and ethical concerns about subject identifiability, informed consent, publication embargo enforcement, and difficulty in accessing dbGaP data. We surveyed the International Genetic Epidemiology Society (IGES) membership about their experiences. One hundred and seventy-five (175) individuals completed the survey, a response rate of 27%. Of respondents who received data from dbGaP (43%), only 32% perceived the application process as easy, but most (75%) received data within five months. Remaining challenges include difficulty in identifying an institutional signing official and an overlong application process. Only 24% of respondents had contributed data to dbGaP. Of these, 31% reported local IRB restrictions on data release; an additional 15% had to reconsent study participants before depositing data. The majority of respondents (56%) disagreed that the publication embargo period was sufficient. In response, we recommend longer embargo periods and the use of varied data-sharing models rather than a one-size-fits-all approach.

  4. Data publication and sharing using the SciDrive service

    Science.gov (United States)

    Mishin, Dmitry; Medvedev, D.; Szalay, A. S.; Plante, R. L.

    2014-01-01

    Despite progress in recent years in scientific data storage, the problem remains of a public data storage and sharing system for relatively small scientific datasets. These are the collections forming the "long tail" of the power-law distribution of dataset sizes. The aggregated size of the long-tail data is comparable to the size of all data collections from large archives, and the value of the data is significant. The SciDrive project's main goal is to provide the scientific community with a place to reliably and freely store such data and to make it accessible to the broad scientific community. The primary target audience of the project is the astronomy community, and it will be extended to other fields. We aim to create a simple way of publishing a dataset, which can then be shared with other people. The data owner controls the permissions to modify and access the data and can assign a group of users or open access to everyone. The data contained in the dataset will be automatically recognized by a background process. Known data formats will be extracted according to the user's settings. Currently, tabular data can be automatically extracted to the user's MyDB table, where the user can make SQL queries against the dataset and merge it with other public CasJobs resources. Other data formats can be processed using a set of plugins that upload the data or metadata to user-defined side services. The current implementation targets some of the data formats commonly used by the astronomy communities, including FITS, ASCII and Excel tables, TIFF images, and YT simulation data archives. Along with generic metadata, format-specific metadata is also processed. For example, basic information about celestial objects is extracted from FITS files and TIFF images, if present. A 100 TB implementation has just been put into production at Johns Hopkins University. The system features a public data storage REST service supporting the VOSpace 2.0 and Dropbox protocols, an HTML5 web portal, a command-line client and Java

  5. Sustainable Materials Management: U.S. State Data Measurement Sharing Program

    Science.gov (United States)

    The State Data Measurement Sharing Program (SMP) is an online reporting, information sharing, and measurement tool that allows U.S. states to share a wide range of information about waste, recycling, and composting.

  6. NebHydro: Sharing Geospatial Data to Support Water Management in Nebraska

    Science.gov (United States)

    Kamble, B.; Irmak, A.; Hubbard, K.; Deogun, J.; Dvorak, B.

    2012-12-01

    Recent advances in web-enabled geographical technologies have the potential to make a dramatic impact on the development of highly interactive spatial applications on the web for visualization of large-scale geospatial data by water resources and irrigation scientists. Spatial and point-scale water resources data visualization is an emerging and challenging application domain. Query-based visual exploration of geospatial hydrological data can play an important role in stimulating scientific hypotheses and seeking causal relationships among hydrological variables. The Nebraska Hydrological Information System (NebHydro) utilizes ESRI's ArcGIS Server technology to increase technological awareness among farmers, irrigation managers and policy makers. Web-based geospatial applications are an effective way to expose scientific hydrological datasets to the research community and the public. NebHydro uses Adobe Flex technology to offer an online visualization and data analysis system for the presentation of social and economic data. Internet mapping services are an integrated product of GIS and Internet technologies and a favored solution for achieving GIS interoperability. The development of Internet-based GIS services in the state of Nebraska showcases the benefits of sharing geospatial hydrological data among agencies, resource managers and policy makers. Geospatial hydrological information (evapotranspiration from remote sensing, vegetation indices (NDVI), USGS stream gauge data, climatic data, etc.) is generally generated through model simulation (METRIC, SWAP, Linux- and Python-based scripting, etc.). Information is compiled into and stored within object-oriented relational spatial databases using a geodatabase information model that supports the key data types needed by applications, including features, relationships, networks, imagery, terrains, maps and layers. The system provides online access, querying, visualization, and analysis of the hydrological data from several sources
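Among the data layers mentioned above, the vegetation index NDVI has a simple closed form: it is the normalized difference of near-infrared and red reflectance. A minimal sketch, with illustrative reflectance values:

```python
# NDVI (Normalized Difference Vegetation Index), computed per pixel from
# near-infrared (NIR) and red surface reflectance; values lie in [-1, 1],
# with higher values indicating denser green vegetation.

def ndvi(nir, red):
    """Return (NIR - Red) / (NIR + Red) for one pixel."""
    return (nir - red) / (nir + red)

# Illustrative reflectances: vegetation reflects strongly in NIR
print(round(ndvi(0.5, 0.1), 3))  # -> 0.667
```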

  7. Microreact: visualizing and sharing data for genomic epidemiology and phylogeography.

    Science.gov (United States)

    Argimón, Silvia; Abudahab, Khalil; Goater, Richard J E; Fedosejev, Artemij; Bhai, Jyothish; Glasner, Corinna; Feil, Edward J; Holden, Matthew T G; Yeats, Corin A; Grundmann, Hajo; Spratt, Brian G; Aanensen, David M

    2016-11-01

    Visualization is frequently used to aid our interpretation of complex datasets. Within microbial genomics, visualizing the relationships between multiple genomes as a tree provides a framework onto which associated data (geographical, temporal, phenotypic and epidemiological) are added to generate hypotheses and to explore the dynamics of the system under investigation. Selected static images are then used within publications to highlight the key findings to a wider audience. However, these images are a very inadequate way of exploring and interpreting the richness of the data. There is, therefore, a need for flexible, interactive software that presents the population genomic outputs and associated data in a user-friendly manner for a wide range of end users, from trained bioinformaticians to front-line epidemiologists and health workers. Here, we present Microreact, a web application for the easy visualization of datasets consisting of any combination of trees, geographical, temporal and associated metadata. Data files can be uploaded to Microreact directly via the web browser or by linking to their location (e.g. from Google Drive/Dropbox or via API), and an integrated visualization via trees, maps, timelines and tables provides interactive querying of the data. The visualization can be shared as a permanent web link among collaborators, or embedded within publications to enable readers to explore and download the data. Microreact can act as an end point for any tool or bioinformatic pipeline that ultimately generates a tree, and provides a simple, yet powerful, visualization method that will aid research and discovery and the open sharing of datasets.
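The paired inputs such a viewer consumes can be sketched as a Newick tree whose tip labels match an identifier column in a metadata table; the column names below are illustrative rather than Microreact's exact schema.

```python
# Hedged sketch of a tree-plus-metadata dataset of the kind Microreact
# visualizes: a Newick tree and a CSV table linked by shared identifiers.
# Column names here are illustrative, not the tool's documented schema.

import csv, io

tree = "(isolate_A:0.1,(isolate_B:0.05,isolate_C:0.05):0.1);"
rows = [
    {"id": "isolate_A", "latitude": 51.5, "longitude": -0.1, "year": 2014},
    {"id": "isolate_B", "latitude": 52.2, "longitude": 0.1,  "year": 2015},
    {"id": "isolate_C", "latitude": 53.4, "longitude": -2.2, "year": 2015},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "latitude", "longitude", "year"])
writer.writeheader()
writer.writerows(rows)

# Consistency check: every metadata identifier appears as a tree tip label
tips = {r["id"] for r in rows}
assert all(tip in tree for tip in tips)
print(buf.getvalue().splitlines()[0])  # -> id,latitude,longitude,year
```

Keeping the identifiers consistent between the two files is what lets the tree, map and timeline views stay linked when a record is selected.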

  8. Strategies for Sharing Seismic Data Among Multiple Computer Platforms

    Science.gov (United States)

    Baker, L. M.; Fletcher, J. B.

    2001-12-01

    Seismic waveform data is readily available from a variety of sources, but it often comes in a distinct, instrument-specific data format. For example, data may be from portable seismographs, such as those made by Refraction Technology or Kinemetrics, from permanent seismograph arrays, such as the USGS Parkfield Dense Array, from public data centers, such as the IRIS Data Center, or from personal communication with other researchers through e-mail or ftp. A computer must be selected to import the data - usually whichever is the most suitable for reading the originating format. However, the computer best suited for a specific analysis may not be the same. When copies of the data are then made for analysis, a proliferation of copies of the same data results, in possibly incompatible, computer-specific formats. In addition, if an error is detected and corrected in one copy, or some other change is made, all the other copies must be updated to preserve their validity. Keeping track of what data is available, where it is located, and which copy is authoritative requires an effort that is easy to neglect. We solve this problem by importing waveform data to a shared network file server that is accessible to all our computers on our campus LAN. We use a Network Appliance file server running Sun's Network File System (NFS) software. Using an NFS client software package on each analysis computer, waveform data can then be read by our MatLab or Fortran applications without first copying the data. Since there is a single copy of the waveform data in a single location, the NFS file system hierarchy provides an implicit complete waveform data catalog and the single copy is inherently authoritative. 
Another part of our solution is to convert the original data into a blocked-binary format (known historically as USGS DR100 or VFBB format) that is interpreted by MatLab or Fortran library routines available on each computer so that the idiosyncrasies of each machine are not visible to
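The two ideas above, a single authoritative copy on a shared mount and a machine-independent binary layout, can be sketched as follows. The header format here is illustrative and does not reproduce the actual DR100/VFBB layout; the temporary directory stands in for the NFS mount point.

```python
# Hedged sketch: every analysis machine reads the same file from a shared
# mount, and the reader fixes the byte order explicitly (big-endian) so
# machine endianness never matters. The record layout is illustrative only.

import os
import struct
import tempfile

SHARED_MOUNT = tempfile.mkdtemp()   # stands in for the NFS mount point

def write_waveform(path, samples):
    """Write a sample count followed by big-endian float32 samples."""
    with open(path, "wb") as f:
        f.write(struct.pack(">I", len(samples)))
        f.write(struct.pack(f">{len(samples)}f", *samples))

def read_waveform(path):
    """Read the waveform back, independent of the local machine's endianness."""
    with open(path, "rb") as f:
        (n,) = struct.unpack(">I", f.read(4))
        return list(struct.unpack(f">{n}f", f.read(4 * n)))

path = os.path.join(SHARED_MOUNT, "event_001.vfbb")
write_waveform(path, [0.0, 1.5, -2.25])
print(read_waveform(path))  # -> [0.0, 1.5, -2.25]
```

Because all readers open the same path, corrections made to the one copy are immediately visible everywhere, which is the authoritativeness argument made above.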

  9. Sharing data for production scheduling using the ISA-95 standard

    Directory of Open Access Journals (Sweden)

    Iiro Harjunkoski

    2014-10-01

    Full Text Available In the development and deployment of production scheduling solutions one major challenge is to establish efficient information sharing with industrial production management systems. Information comprising production orders to be scheduled, processing plant structure, product recipes, available equipment and other resources are necessary for producing a realistic short-term production plan. Currently, a widely-accepted standard for information sharing is missing. This often leads to the implementation of costly custom-tailored interfaces, or in the worst case the scheduling solution will be abandoned. Additionally, it becomes difficult to easily compare different methods on various problem instances, which complicates the re-use of existing scheduling solutions. In order to overcome these hurdles, a platform-independent and holistic approach is needed. Nevertheless, it is difficult for any new solution to gain wide acceptance within industry as new standards are often refused by companies already using a different established interface. From an acceptance point of view, the ISA-95 standard could act as a neutral data-exchange platform. In this paper, we assess if this already widespread standard is simple, yet powerful enough to act as the desired holistic data-exchange for scheduling solutions.
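The style of exchange the paper evaluates can be sketched as a production order serialized to ISA-95-style XML. The element names below are simplified stand-ins, not the exact B2MML schema.

```python
# Hedged sketch of ISA-95-style information sharing: production orders
# serialized as a neutral XML document that both the production management
# system and the scheduler can read. Element names are simplified here.

import xml.etree.ElementTree as ET

def production_schedule(orders):
    """Serialize a list of order dicts into an ISA-95-style XML string."""
    root = ET.Element("ProductionSchedule")
    for order in orders:
        req = ET.SubElement(root, "ProductionRequest")
        ET.SubElement(req, "ID").text = order["id"]
        ET.SubElement(req, "Product").text = order["product"]
        ET.SubElement(req, "Quantity").text = str(order["qty"])
    return ET.tostring(root, encoding="unicode")

xml_doc = production_schedule([{"id": "PO-001", "product": "Resin-A", "qty": 500}])
parsed = ET.fromstring(xml_doc)
print(parsed.find("ProductionRequest/ID").text)  # -> PO-001
```

A neutral document like this is what lets the scheduling solution stay decoupled from any one vendor's production management interface.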

  10. Sharing Data for Production Scheduling Using the ISA-95 Standard

    Energy Technology Data Exchange (ETDEWEB)

    Harjunkoski, Iiro, E-mail: iiro.harjunkoski@de.abb.com; Bauer, Reinhard [ABB Corporate Research, Industrial Software and Applications, Ladenburg (Germany)

    2014-10-21

    In the development and deployment of production scheduling solutions, one major challenge is to establish efficient information sharing with industrial production management systems. Information comprising production orders to be scheduled, processing plant structure, product recipes, available equipment, and other resources are necessary for producing a realistic short-term production plan. Currently, a widely accepted standard for information sharing is missing. This often leads to the implementation of costly custom-tailored interfaces, or in the worst case the scheduling solution will be abandoned. Additionally, it becomes difficult to easily compare different methods on various problem instances, which complicates the re-use of existing scheduling solutions. In order to overcome these hurdles, a platform-independent and holistic approach is needed. Nevertheless, it is difficult for any new solution to gain wide acceptance within industry as new standards are often refused by companies already using a different established interface. From an acceptance point of view, the ISA-95 standard could act as a neutral data-exchange platform. In this paper, we assess if this already widespread standard is simple, yet powerful enough to act as the desired holistic data exchange for scheduling solutions.

  11. Sharing Data for Production Scheduling Using the ISA-95 Standard

    International Nuclear Information System (INIS)

    Harjunkoski, Iiro; Bauer, Reinhard

    2014-01-01

    In the development and deployment of production scheduling solutions, one major challenge is to establish efficient information sharing with industrial production management systems. Information comprising production orders to be scheduled, processing plant structure, product recipes, available equipment, and other resources are necessary for producing a realistic short-term production plan. Currently, a widely accepted standard for information sharing is missing. This often leads to the implementation of costly custom-tailored interfaces, or in the worst case the scheduling solution will be abandoned. Additionally, it becomes difficult to easily compare different methods on various problem instances, which complicates the re-use of existing scheduling solutions. In order to overcome these hurdles, a platform-independent and holistic approach is needed. Nevertheless, it is difficult for any new solution to gain wide acceptance within industry as new standards are often refused by companies already using a different established interface. From an acceptance point of view, the ISA-95 standard could act as a neutral data-exchange platform. In this paper, we assess if this already widespread standard is simple, yet powerful enough to act as the desired holistic data exchange for scheduling solutions.

  12. Modelling human mobility patterns using photographic data shared online.

    Science.gov (United States)

    Barchiesi, Daniele; Preis, Tobias; Bishop, Steven; Moat, Helen Susannah

    2015-08-01

    Humans are inherently mobile creatures. The way we move around our environment has consequences for a wide range of problems, including the design of efficient transportation systems and the planning of urban areas. Here, we gather data about the position in space and time of about 16 000 individuals who uploaded geo-tagged images from locations within the UK to the Flickr photo-sharing website. Inspired by the theory of Lévy flights, which has previously been used to describe the statistical properties of human mobility, we design a machine learning algorithm to infer the probability of finding people in geographical locations and the probability of movement between pairs of locations. Our findings are in general agreement with official figures in the UK and on travel flows between pairs of major cities, suggesting that online data sources may be used to quantify and model large-scale human mobility patterns.
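The simplest empirical counterpart of the inference described above is to grid the geo-tagged points and normalize the counts per cell; this sketch is illustrative and is not the authors' machine learning algorithm.

```python
# Illustrative sketch (not the paper's method): estimate the probability of
# finding a person in a grid cell from geo-tagged photo locations by
# binning points into cells and normalizing the counts.

from collections import Counter

def location_probabilities(points, cell=1.0):
    """points: iterable of (lat, lon) pairs; returns {cell -> probability}."""
    counts = Counter((int(lat // cell), int(lon // cell)) for lat, lon in points)
    n = len(points)
    return {c: k / n for c, k in counts.items()}

# Hypothetical photo coordinates: three near London, one near Manchester
photos = [(51.5, -0.1), (51.6, -0.3), (53.4, -2.2), (51.4, -0.9)]
probs = location_probabilities(photos)
print(probs[(51, -1)])  # -> 0.75
```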

  13. Data Rights and Responsibilities: A Human Rights Perspective on Data Sharing.

    Science.gov (United States)

    Harris, Theresa L; Wyndham, Jessica M

    2015-07-01

    A human-rights-based analysis can be a useful tool for the scientific community and policy makers as they develop codes of conduct, harmonized standards, and national policies for data sharing. The human rights framework provides a shared set of values and norms across borders, defines rights and responsibilities of various actors involved in data sharing, addresses the potential harms as well as the benefits of data sharing, and offers a framework for balancing competing values. The right to enjoy the benefits of scientific progress and its applications offers a particularly helpful lens through which to view data as both a tool of scientific inquiry to which access is vital and as a product of science from which everyone should benefit. © The Author(s) 2015.

  14. A SOA-based approach to geographical data sharing

    Science.gov (United States)

    Li, Zonghua; Peng, Mingjun; Fan, Wei

    2009-10-01

    In the last few years, large volumes of spatial data have become available in different government departments in China, but these data are mainly used within those departments. With the e-government project initiated, spatial data sharing has become more and more necessary. Currently, the Web is used not only for document searching but also for the provision and use of services, known as Web services, which are published in a directory and may be automatically discovered by software agents. Particularly in the spatial domain, the possibility of accessing these large spatial datasets via Web services has motivated research into the new field of Spatial Data Infrastructure (SDI) implemented using service-oriented architecture. In this paper a Service-Oriented Architecture (SOA) based Geographical Information System (GIS) is proposed, and a prototype system based on Open Geospatial Consortium (OGC) standards is deployed in Wuhan, China, so that all authorized departments can access the spatial data within the government intranet, and the spatial data can be easily integrated into various kinds of applications.
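The OGC service interfaces such a prototype exposes are plain HTTP endpoints with standardized query parameters. A minimal sketch of building a WMS 1.3.0 GetMap request follows; the endpoint URL is a placeholder, while the parameter names come from the WMS standard.

```python
# Sketch of an OGC WMS 1.3.0 GetMap request URL. The endpoint is a
# placeholder; SERVICE, VERSION, REQUEST, LAYERS, CRS, BBOX, WIDTH, HEIGHT
# and FORMAT are the standard WMS query parameters.

from urllib.parse import urlencode

def wms_getmap_url(endpoint, layer, bbox, size=(512, 512)):
    """Build a GetMap URL for one layer over a (minx, miny, maxx, maxy) bbox."""
    params = {
        "SERVICE": "WMS", "VERSION": "1.3.0", "REQUEST": "GetMap",
        "LAYERS": layer, "CRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": size[0], "HEIGHT": size[1], "FORMAT": "image/png",
    }
    return endpoint + "?" + urlencode(params)

url = wms_getmap_url("http://gis.example.gov/wms", "landuse",
                     (30.0, 114.0, 31.0, 115.0))
print("REQUEST=GetMap" in url)  # -> True
```

Because the parameters are standardized, any OGC-compliant client in any department can consume the same service without a custom interface.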

  15. Data Sharing and Publication Using the SciDrive Service

    Science.gov (United States)

    Mishin, D.; Medvedev, D.; Szalay, A. S.; Plante, R.; Graham, M.

    2014-05-01

    Despite all the progress made in recent years in the field of cloud data storage, the problem of fast and reliable data storage for the scientific community remains open. The SciDrive project meets the need for a free, open-source scientific data publishing platform. Although its primary target audience is astronomers, the largest data producers, the platform is not bound to any scientific domain and can be used by different communities. Our current installation provides a free and safe storage platform for scientists to publish their data and share it with the community with the simplicity of Dropbox. The system allows service providers to harvest metadata from the files and derive their broader context in a fairly automated fashion. Collecting various scientific data files in a single location or multiple connected sites allows building an intelligent system of metadata extractors. Our system is aimed at simplifying the cataloging and processing of large file collections for the long tail of scientific data. We propose an extensible plugin architecture for automatic metadata extraction and storage. The current implementation targets some of the data formats commonly used by the astronomy communities, including FITS, ASCII and Excel tables, TIFF images, and YT simulation data archives. Along with generic metadata, format-specific metadata is also processed. For example, basic information about celestial objects is extracted from FITS files and TIFF images, if present. This approach makes the simple BLOB storage a smart system providing access to various data in its own representation, such as a database for files containing tables, or providing additional search and access features such as full-text search, image pyramid or thumbnail creation, and simulation dataset ID extraction for fast search. A 100 TB implementation has just been put into production at Johns Hopkins University.
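An extensible plugin architecture like the one described above might dispatch extractors by file type along these lines; the registry and function names are hypothetical, not SciDrive's actual API.

```python
# Hedged sketch of a plugin registry for automatic metadata extraction, in
# the spirit of the architecture described above (names are illustrative).

import os

EXTRACTORS = {}

def extractor(*extensions):
    """Decorator registering a metadata extractor for given file extensions."""
    def register(fn):
        for ext in extensions:
            EXTRACTORS[ext] = fn
        return fn
    return register

@extractor(".csv", ".txt")
def tabular_metadata(path, first_line):
    # Format-specific metadata for tables: a rough column count
    return {"type": "table", "columns": first_line.count(",") + 1}

@extractor(".fits")
def fits_metadata(path, first_line):
    # A real extractor would read header keywords here
    return {"type": "image", "format": "FITS"}

def extract(path, first_line=""):
    """Dispatch to the registered extractor, falling back to generic BLOB."""
    ext = os.path.splitext(path)[1].lower()
    fn = EXTRACTORS.get(ext)
    return fn(path, first_line) if fn else {"type": "blob"}

print(extract("catalog.csv", "ra,dec,mag"))  # -> {'type': 'table', 'columns': 3}
```

New formats are supported by registering one more function, which is the extensibility property the abstract emphasizes.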

  16. International Data Sharing in Practice: New Technologies Meet Old Governance.

    Science.gov (United States)

    Murtagh, Madeleine J; Turner, Andrew; Minion, Joel T; Fay, Michaela; Burton, Paul R

    2016-06-01

    The social structures that govern data/sample release aim to safeguard the confidentiality and privacy of cohort research participants (without whom there would be no data or samples) and enable the realization of societal benefit through optimizing the scientific use of those cohorts. Within collaborations involving multiple cohorts and biobanks, however, the local, national, and supranational institutional and legal guidelines for research (which produce a multiplicity of data access governance structures and guidelines) risk impeding the very science that is the raison d'etre of these consortia. We present an ethnographic study, which examined the epistemic and nonepistemic values driving decisions about data access and their consequences in the context of the pilot of an integrated approach to co-analysis of data. We demonstrate how the potential analytic flexibility offered by this approach was lost under contemporary data access governance. We identify three dominant values: protecting the research participant, protecting the study, and protecting the researcher. These values were both supported by and juxtaposed against a "public good" argument, and each was used as a rationale to both promote and inhibit sharing of data. While protection of the research participants was central to access permissions, decisions were also attentive to the desire of researchers to see their efforts in building population biobanks and cohorts realized in the form of scientific outputs. We conclude that systems for governing and enabling data access in large consortia need to (1) protect disclosure of research participant information or identity, (2) ensure the specific expectations of research participants are met, (3) embody systems of review that are transparent and not compromised by the specific interests of one particular group of stakeholders, and (4) facilitate data access procedures that are timely and efficient. Practical solutions are urgently needed. 
New approaches

  17. A SOA-Based Platform to Support Clinical Data Sharing

    Directory of Open Access Journals (Sweden)

    R. Gazzarata

    2017-01-01

    Full Text Available The eSource Data Interchange Group, part of the Clinical Data Interchange Standards Consortium, proposed five scenarios to guide stakeholders in the development of solutions for the capture of eSource data. The fifth scenario was subdivided into four tiers to adapt the functionality of electronic health records to support clinical research. In order to develop a system belonging to the "Interoperable" Tier, the authors decided to adopt the service-oriented architecture paradigm to support technical interoperability, Health Level Seven Version 3 messages combined with the LOINC (Logical Observation Identifiers Names and Codes) vocabulary to ensure semantic interoperability, and Healthcare Services Specification Project standards to provide process interoperability. The developed architecture enhances the integration between patient-care practice and medical research, allowing clinical data sharing between two hospital information systems and four clinical data management systems/clinical registries. The core is formed by a set of standardized cloud services connected through standardized interfaces, involving client applications. The system was approved by the medical staff, since it reduces the workload for the management of clinical trials. Although this architecture can realize the "Interoperable" Tier, the current solution actually covers the "Connected" Tier, due to local hospital policy restrictions.

  18. How Collecting and Freely Sharing Geophysical Data Broadly Benefits Society

    Science.gov (United States)

    Frassetto, A.; Woodward, R.; Detrick, R. S.

    2017-12-01

    Valuable but often unintended observations of environmental and human-related processes have resulted from open sharing of multidisciplinary geophysical observations collected over the past 33 years. These data, intended to fuel fundamental academic research, are part of the Incorporated Research Institutions for Seismology (IRIS), which is sponsored by the National Science Foundation and has provided a community science facility supporting earthquake science and related disciplines since 1984. These community facilities have included arrays of geophysical instruments operated for EarthScope, an NSF-sponsored science initiative designed to understand the architecture and evolution of the North American continent, as well as the Global Seismographic Network, Greenland Ice Sheet Monitoring Network, a repository of data collected around the world, and other community assets. All data resulting from this facility have been made openly available to support researchers across any field of study and this has expanded the impact of these data beyond disciplinary boundaries. This presentation highlights vivid examples of how basic research activities using open data, collected as part of a community facility, can inform our understanding of manmade earthquakes, geomagnetic hazards, climate change, and illicit testing of nuclear weapons.

  19. Produce More Oil and Gas via eBusiness Data Sharing

    Energy Technology Data Exchange (ETDEWEB)

    Paul Jehn; Mike Stettner

    2004-09-30

    GWPC, DOGGR, and other state agencies propose to build eBusiness applications based on a .NET front-end user interface for the DOE's Energy 100 Award-winning Risk Based Data Management System (RBDMS) data source and XML Web services. This project will slash the costs of regulatory compliance by automating routine regulatory reporting and permit notice review and by making it easier to exchange data with the oil and gas industry--especially small, independent operators. Such operators, who often do not have sophisticated in-house databases, will be able to use a subset of the same RBDMS tools available to the agencies on the desktop to file permit notices and production reports online. Once the data passes automated quality control checks, the application will upload the data into the agency's RBDMS data source. The operators also will have access to state agency datasets to focus exploration efforts and to perform production forecasting, economic evaluations, and risk assessments. With the ability to identify economically feasible oil and gas prospects, including unconventional plays, over the Internet, operators will minimize travel and other costs. Because GWPC will coordinate these data sharing efforts with the Bureau of Land Management (BLM), this project will improve access to public lands and make strides towards reducing the duplicative reporting to which industry is now subject for leases that cross jurisdictions. The resulting regulatory streamlining and improved access to agency data will make more domestic oil and gas available to the American public while continuing to safeguard environmental assets.

  20. Data Sharing in Astrobiology: The Astrobiology Habitable Environments Database (AHED)

    Science.gov (United States)

    Lafuente, B.; Bristow, T.; Stone, N.; Pires, A.; Keller, R. M.; Downs, R. T.; Blake, D.; Fonda, M.

    2017-01-01

    Astrobiology is a multidisciplinary area of scientific research focused on studying the origins of life on Earth and the conditions under which life might have emerged elsewhere in the universe. NASA uses the results of Astrobiology research to help define targets for future missions that are searching for life elsewhere in the universe. The understanding of complex questions in Astrobiology requires integration and analysis of data spanning a range of disciplines including biology, chemistry, geology, astronomy and planetary science. However, the lack of a centralized repository makes it difficult for Astrobiology teams to share data and benefit from resultant synergies. Moreover, in recent years, federal agencies are requiring that results of any federally funded scientific research must be available and useful for the public and the science community. The Astrobiology Habitable Environments Database (AHED), developed with a consolidated group of astrobiologists from different active research teams at NASA Ames Research Center, is designed to help to address these issues. AHED is a central, high-quality, long-term data repository for mineralogical, textural, morphological, inorganic and organic chemical, isotopic and other information pertinent to the advancement of the field of Astrobiology.

  1. Big heart data: advancing health informatics through data sharing in cardiovascular imaging.

    Science.gov (United States)

    Suinesiaputra, Avan; Medrano-Gracia, Pau; Cowan, Brett R; Young, Alistair A

    2015-07-01

    The burden of heart disease is rapidly worsening due to the increasing prevalence of obesity and diabetes. Data sharing and open database resources for heart health informatics are important for advancing our understanding of cardiovascular function, disease progression and therapeutics. Data sharing enables valuable information, often obtained at considerable expense and effort, to be reused beyond the specific objectives of the original study. Many government funding agencies and journal publishers are requiring data reuse, and are providing mechanisms for data curation and archival. Tools and infrastructure are available to archive anonymous data from a wide range of studies, from descriptive epidemiological data to gigabytes of imaging data. Meta-analyses can be performed to combine raw data from disparate studies to obtain unique comparisons or to enhance statistical power. Open benchmark datasets are invaluable for validating data analysis algorithms and objectively comparing results. This review provides a rationale for increased data sharing and surveys recent progress in the cardiovascular domain. We also highlight the potential of recent large cardiovascular epidemiological studies enabling collaborative efforts to facilitate data sharing, algorithm benchmarking, disease modeling and statistical atlases.

  2. Data Citation Standard: A Means to Support Data Sharing, Attribution, and Traceability

    Directory of Open Access Journals (Sweden)

    McCallum I.

    2013-04-01

    Full Text Available An important incentive for scientists and researchers is the recognition and renown given to them in citations of their work. While citation rules are well developed for papers published by others, very few rules exist for the citation of data made available by others. Increasingly, citation of the source of data is also requested in the context of socially relevant topics, such as climate change and its potential impacts. Providing means for data citation would be a strong incentive for data sharing. Georeferenced data are crucial for addressing many pressing societal problems and for supporting related interdisciplinary research. The lack of a widely accepted method for giving credit to those who make their data freely available, and for tracking the use of data throughout their life-cycle, hampers data sharing. Furthermore, only clear and transparent data citation allows other scientists to obtain the identical data to replicate findings or to pursue further research.

  3. Accelerating knowledge discovery through community data sharing and integration.

    Science.gov (United States)

    Yip, Y L

    2009-01-01

    To summarize current excellent research in the field of bioinformatics. Synopsis of the articles selected for the IMIA Yearbook 2009. The selection process for this yearbook's section on Bioinformatics resulted in six excellent articles highlighting several important trends. First, it can be noted that Semantic Web technology continues to play an important role in heterogeneous data integration. Novel applications also put more emphasis on its ability to make logical inferences, leading to new insights and discoveries. Second, translational research, due to its complex nature, increasingly relies on collective intelligence made available through the adoption of community-defined protocols or software architectures for secure data annotation, sharing and analysis. Advances in systems biology, bio-ontologies and text-mining can also be noted. Current biomedical research is gradually evolving towards an environment characterized by intensive collaboration and more sophisticated knowledge processing activities. Enabling technologies, whether Semantic Web or other solutions, are expected to play an increasingly important role in generating new knowledge in the foreseeable future.

  4. LC Data QUEST: A Technical Architecture for Community Federated Clinical Data Sharing.

    Science.gov (United States)

    Stephens, Kari A; Lin, Ching-Ping; Baldwin, Laura-Mae; Echo-Hawk, Abigail; Keppel, Gina A; Buchwald, Dedra; Whitener, Ron J; Korngiebel, Diane M; Berg, Alfred O; Black, Robert A; Tarczy-Hornoch, Peter

    2012-01-01

    The University of Washington Institute of Translational Health Sciences is engaged in a project, LC Data QUEST, building data sharing capacity in primary care practices serving rural and tribal populations in the Washington, Wyoming, Alaska, Montana, Idaho region to build research infrastructure. We report on the iterative process of developing the technical architecture for semantically aligning electronic health data in primary care settings across our pilot sites and tools that will facilitate linkages between the research and practice communities. Our architecture emphasizes sustainable technical solutions for addressing data extraction, alignment, quality, and metadata management. The architecture provides immediate benefits to participating partners via a clinical decision support tool and data querying functionality to support local quality improvement efforts. The FInDiT tool catalogues type, quantity, and quality of the data that are available across the LC Data QUEST data sharing architecture. These tools facilitate the bi-directional process of translational research.

  5. Sharing Privacy Protected and Statistically Sound Clinical Research Data Using Outsourced Data Storage

    Directory of Open Access Journals (Sweden)

    Geontae Noh

    2014-01-01

    Full Text Available It is critical to scientific progress to share clinical research data stored in outsourced generally available cloud computing services. Researchers are able to obtain valuable information that they would not otherwise be able to access; however, privacy concerns arise when sharing clinical data in these outsourced publicly available data storage services. HIPAA requires researchers to deidentify private information when disclosing clinical data for research purposes and describes two available methods for doing so. Unfortunately, both techniques degrade statistical accuracy. Therefore, the need to protect privacy presents a significant problem for data sharing between hospitals and researchers. In this paper, we propose a controlled secure aggregation protocol to secure both privacy and accuracy when researchers outsource their clinical research data for sharing. Since clinical data must remain private beyond a patient’s lifetime, we take advantage of lattice-based homomorphic encryption to guarantee long-term security against quantum computing attacks. Using lattice-based homomorphic encryption, we design an aggregation protocol that aggregates outsourced ciphertexts under distinct public keys. It enables researchers to get aggregated results from outsourced ciphertexts of distinct researchers. To the best of our knowledge, our protocol is the first aggregation protocol which can aggregate ciphertexts which are encrypted with distinct public keys.
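The aggregation idea described in this record (ciphertexts are combined so that decryption yields the sum of the underlying values, without exposing any individual contribution) can be illustrated with a toy additively homomorphic scheme. The sketch below uses a textbook Paillier cryptosystem under a single key with deliberately tiny, insecure parameters; the paper's actual protocol is lattice-based and aggregates ciphertexts under distinct public keys, which this simplified example does not capture:

```python
import random
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

# toy parameters: far too small for real use, illustration only
p, q = 293, 433
n = p * q
n2 = n * n
g = n + 1
lam = lcm(p - 1, q - 1)
# mu = (L(g^lam mod n^2))^-1 mod n, where L(x) = (x - 1) // n
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)

def enc(m):
    """Encrypt m under the public key (n, g)."""
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def dec(c):
    """Decrypt with the private key (lam, mu)."""
    return ((pow(c, lam, n2) - 1) // n * mu) % n

# aggregation: the product of ciphertexts decrypts to the sum of plaintexts
values = [12, 30, 7]     # e.g. per-hospital patient counts
agg = 1
for v in values:
    agg = (agg * enc(v)) % n2

assert dec(agg) == sum(values)   # 49, recovered without decrypting any single value
```

The aggregator only ever multiplies ciphertexts, so it learns nothing about the individual contributions; only the key holder can decrypt the final sum.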

  6. Value at Risk on Composite Price Share Index Stock Data

    Science.gov (United States)

    Oktaviarina, A.

    2018-01-01

    The Financial Services Authority declared the Let's Save Campaign in commemoration of World Savings Day, which fell on October 31, 2016. The campaign was greeted enthusiastically by the Indonesia Stock Exchange, which put forward the slogan Let's Save The Stocks. A stock is a form of investment that is expected to yield future benefits despite its risks. Value at Risk (VaR) is a method that can measure how much risk a financial investment carries. The Composite Stock Price Index is the stock price index used by the Indonesia Stock Exchange as the stock volatility benchmark in Indonesia. This study estimated Value at Risk (VaR) on closing-price Composite Stock Price Index data for the period 20 September 2016 to 20 September 2017. The Box-Pierce test gave p value = 0.9528, which is greater than α and thus indicates homoskedasticity. The Value at Risk (VaR) with the Variance-Covariance Method is Rp 3,054,916.07, meaning that, with 99% confidence, someone who invests Rp 100,000,000.00 faces a maximum expected loss of Rp 3,054,916.07.
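The variance-covariance (parametric) method used in this study can be sketched as follows. The price series below is made up for illustration (the study used actual Composite Stock Price Index closes), so the resulting figure will not match the one reported:

```python
import numpy as np

# hypothetical closing prices; the study used actual Composite Stock
# Price Index closes from 20 Sep 2016 to 20 Sep 2017
prices = np.array([5200.0, 5225.0, 5190.0, 5260.0, 5300.0, 5285.0, 5340.0])
returns = np.diff(np.log(prices))      # daily log returns

position = 100_000_000                 # Rp 100,000,000 invested, as in the study
z99 = 2.326                            # ~99% standard normal quantile

sigma = returns.std(ddof=1)            # sample volatility of returns
var_1d = position * z99 * sigma        # one-day 99% parametric VaR
print("1-day 99% VaR: Rp {:,.0f}".format(var_1d))
```

Some formulations also subtract the mean return from the quantile term; this sketch uses the common zero-mean simplification for a one-day horizon.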

  7. Revocable Key-Aggregate Cryptosystem for Data Sharing in Cloud

    Directory of Open Access Journals (Sweden)

    Qingqing Gan

    2017-01-01

    Full Text Available With the rapid development of network and storage technology, cloud storage has become a new service mode, in which data sharing and user revocation are important functions. Therefore, according to the characteristics of cloud storage, a revocable key-aggregate encryption scheme is put forward based on the subset-cover framework. The proposed scheme not only has the key-aggregate characteristic, which greatly simplifies the user's key management, but also can revoke user access permissions, realizing flexible and effective access control. When user revocation occurs, it allows the cloud server to update the ciphertext so that revoked users cannot access the new ciphertext, while nonrevoked users do not need to update their private keys. In addition, a verification mechanism is provided in the proposed scheme, which can verify the updated ciphertext and ensure that the user revocation is performed correctly. Compared with existing schemes, this scheme not only reduces the cost of key management and storage, but also realizes user revocation and achieves user access control efficiently. Finally, the proposed scheme is proved selectively secure against chosen-plaintext attacks in the standard model.

  8. The need to redefine genomic data sharing: A focus on data accessibility

    Directory of Open Access Journals (Sweden)

    Tempest A. van Schaik

    2014-12-01

    Full Text Available DNAdigest's mission is to investigate and address the issues hindering efficient and ethical genomic data sharing in the human genomics research community. We conducted contextual interviews with human genomics researchers in clinical, academic or industrial R&D settings about their experience with accessing and sharing human genomic data. The qualitative interviews were followed by an online survey which provided quantitative support for our findings. Here we present the generalised workflow for accessing human genomic data through both public and restricted-access repositories and discuss reported points of frustration and their possible improvements. We discuss how data discoverability and accessibility are lacking in current mechanisms and how these are the prerequisites for adoption of best practices in the research community. We summarise current initiatives related to genomic data discovery and present a new data discovery platform available at http://nucleobase.co.uk.

  9. ASTER Global DEM contribution to GEOSS demonstrates open data sharing

    Science.gov (United States)

    Sohre, T.; Duda, K. A.; Meyer, D. J.; Behnke, J.; Nasa Esdis Lp Daac

    2010-12-01

    across all the GEOSS Societal Benefit areas was shown. The release of the global tiled research-grade DEM resulted in a significant increase in demand for ASTER elevation models, and increased awareness of related products. No cost access to these data has also promoted new applications of remotely sensed data, increasing their use across the full range of the GEOSS societal benefit areas. In addition, the simplified data access and greatly expanded pool of users resulted in a number of suggestions from researchers in many disciplines for possible enhancements to future versions of the ASTER GDEM. The broad distribution of the product can be directly attributed to the adoption of fundamental GEOSS data sharing principles, which are directed toward expanded access by minimizing time delay and cost, thus facilitating data use for education, research, and a range of other applications. The ASTER GDEM demonstrated the need and user demand for an improved global DEM product as well as the added benefit of not only “full and open” distribution, but “free and open” distribution.

  10. The Mason Water Data Information System (MWDIS): Enabling data sharing and discovery at George Mason University

    Science.gov (United States)

    Ferreira, C.; Da Silva, A. L.; Nunes, A.; Haddad, J.; Lawler, S.

    2014-12-01

    Enabling effective data use and re-use in scientific investigations relies heavily not only on data availability but also on efficient data sharing and discovery. The CUAHSI-led Hydrologic Information System (HIS) and supporting products have paved the way to efficient data sharing and discovery in the hydrological sciences. Based on the CUAHSI-HIS framework concepts for hydrologic data sharing, we developed a unique system devoted to the George Mason University scientific community to support university-wide data sharing and discovery as well as real-time data access for extreme-event situational awareness. The internet-based system provides an interface where researchers input data collected from measurement stations and present them to the public in the form of charts, tables, maps, and documents. The system is developed in ASP.NET MVC 4, uses Microsoft SQL Server 2008 R2 as its database management system, and is hosted on Amazon Web Services. Currently the system supports the Mason Watershed Project, providing historical hydrological, atmospheric and water quality data for the campus watershed and real-time flood conditions on campus. The system is also a gateway for an unprecedented data collection of hurricane storm surge hydrodynamics in coastal wetlands of the Chesapeake Bay, providing access not only to historical data but also to recent storms such as Hurricane Arthur. Future work includes coupling the system to a real-time flood alert system on campus and, beyond providing data on the World Wide Web, fostering and providing a venue for interdisciplinary collaboration among water scientists in the region.

  11. Organized Communities as a Hybrid Form of Data Sharing: Experiences from the Global STEP Project

    Directory of Open Access Journals (Sweden)

    Isabell Stamm

    2018-01-01

    Full Text Available With this article, I explore a new way for social scientists to share primary qualitative data with each other. More specifically, I examine organized research communities, which are small membership groups of scholars. This hybrid form of data sharing is positioned between informal sharing through collaboration and institutionalized sharing through accessing research archives. Using the global "Successful Transgenerational Entrepreneurship Practices" (STEP) project as an example, I draw attention to the pragmatic practices of data sharing in such communities. Through ongoing negotiations, organized communities can, at least temporarily, put forward sharing policies and create a culture of data sharing that elevates the re-use of qualitative data while being mindful of the data's intersubjective and processual character.

  12. Codifying collegiality: recent developments in data sharing policy in the life sciences.

    Directory of Open Access Journals (Sweden)

    Genevieve Pham-Kanter

    Full Text Available Over the last decade, there have been significant changes in data sharing policies and in the data sharing environment faced by life science researchers. Using data from a 2013 survey of over 1600 life science researchers, we analyze the effects of sharing policies of funding agencies and journals. We also examine the effects of new sharing infrastructure and tools (i.e., third party repositories and online supplements. We find that recently enacted data sharing policies and new sharing infrastructure and tools have had a sizable effect on encouraging data sharing. In particular, third party repositories and online supplements as well as data sharing requirements of funding agencies, particularly the NIH and the National Human Genome Research Institute, were perceived by scientists to have had a large effect on facilitating data sharing. In addition, we found a high degree of compliance with these new policies, although noncompliance resulted in few formal or informal sanctions. Despite the overall effectiveness of data sharing policies, some significant gaps remain: about one third of grant reviewers placed no weight on data sharing plans in their reviews, and a similar percentage ignored the requirements of material transfer agreements. These patterns suggest that although most of these new policies have been effective, there is still room for policy improvement.

  13. Meta-Key: A Secure Data-Sharing Protocol under Blockchain-Based Decentralised Storage Architecture

    OpenAIRE

    Fu, Yue

    2017-01-01

    In this paper a secure data-sharing protocol under a blockchain-based decentralised storage architecture is proposed, which serves users who need to share their encrypted data on-cloud. It implements a remote data-sharing mechanism that enables data owners to share their encrypted data with other users without revealing the original key, and without having to download, re-encrypt, and re-upload the on-cloud data. Data security as well as efficiency are ensured by symmetric encryption, whose k...

  14. Data sharing for public health research: A qualitative study of industry and academia.

    Science.gov (United States)

    Saunders, Pamela A; Wilhelm, Erin E; Lee, Sinae; Merkhofer, Elizabeth; Shoulson, Ira

    2014-01-01

    Data sharing is a key biomedical research theme for the 21st century. Biomedical data sharing is the exchange of data among (non)affiliated parties under mutually agreeable terms to promote scientific advancement and the development of safe and effective medical products. Wide sharing of research data is important for scientific discovery, medical product development, and public health. Data sharing enables improvements in development of medical products, more attention to rare diseases, and cost-efficiencies in biomedical research. We interviewed 11 participants about their attitudes and beliefs about data sharing. Using a qualitative, thematic analysis approach, our analysis revealed a number of themes including: experiences, approaches, perceived challenges, and opportunities for sharing data.

  15. Supporting the Maritime Information Dominance: Optimizing Tactical Network for Biometric Data Sharing in Maritime Interdiction Operations

    Science.gov (United States)

    2015-03-01

    Biometric data collection: capture of role-player mock biometric data including fingerprints, iris scans, and facial recognition photos (MOC training...). Author: Adam R. Sinsel.

  16. Data sharing and reanalysis of randomized controlled trials in leading biomedical journals with a full data sharing policy: survey of studies published in The BMJ and PLOS Medicine

    Science.gov (United States)

    Naudet, Florian; Sakarovitch, Charlotte; Janiaud, Perrine; Cristea, Ioana; Fanelli, Daniele; Moher, David

    2018-01-01

    Objectives: To explore the effectiveness of data sharing by randomized controlled trials (RCTs) in journals with a full data sharing policy and to describe potential difficulties encountered in the process of performing reanalyses of the primary outcomes. Design: Survey of published RCTs. Setting: PubMed/Medline. Eligibility criteria: RCTs that had been submitted and published by The BMJ and PLOS Medicine subsequent to the adoption of data sharing policies by these journals. Main outcome measure: The primary outcome was data availability, defined as the eventual receipt of complete data with clear labelling. Primary outcomes were reanalyzed to assess to what extent studies were reproduced. Difficulties encountered were described. Results: 37 RCTs (21 from The BMJ and 16 from PLOS Medicine) published between 2013 and 2016 met the eligibility criteria. 17/37 (46%, 95% confidence interval 30% to 62%) satisfied the definition of data availability, and 14 of the 17 (82%, 59% to 94%) were fully reproduced on all their primary outcomes. Of the remaining RCTs, errors were identified in two, but the reanalyses reached similar conclusions, and one paper did not provide enough information in the Methods section to reproduce the analyses. Difficulties identified included problems in contacting corresponding authors and a lack of resources on their behalf in preparing the datasets. In addition, there was a range of different data sharing practices across study groups. Conclusions: Data availability was not optimal in two journals with a strong policy for data sharing. When investigators shared data, most reanalyses largely reproduced the original results. Data sharing practices need to become more widespread and streamlined to allow meaningful reanalyses and reuse of data. Trial registration: Open Science Framework osf.io/c4zke. PMID:29440066

  17. Parallel compression of data chunks of a shared data object using a log-structured file system

    Science.gov (United States)

    Bent, John M.; Faibish, Sorin; Grider, Gary

    2016-10-25

    Techniques are provided for parallel compression of data chunks being written to a shared object. A client executing on a compute node or a burst buffer node in a parallel computing system stores a data chunk generated by the parallel computing system to a shared data object on a storage node by compressing the data chunk and providing the compressed data chunk to the storage node that stores the shared object. The client and storage node may employ log-structured file techniques. The compressed data chunk can be decompressed by the client when the data chunk is read. A storage node stores a data chunk as part of a shared object by receiving a compressed version of the data chunk from a compute node and storing the compressed version of the data chunk to the shared data object on the storage node.
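A minimal sketch of the client-side pattern described in this record (each writer compresses its chunk in parallel before shipping it to the storage node, and a reader decompresses transparently) might look like the following; zlib stands in for whatever codec the actual system uses, and an in-memory list stands in for the shared object on the storage node:

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_chunk(chunk):
    # client-side: compress before shipping to the storage node
    return zlib.compress(chunk)

def decompress_chunk(blob):
    # client-side on read: transparently decompress
    return zlib.decompress(blob)

# simulate a shared object assembled from per-writer chunks
chunks = [bytes([i]) * 4096 for i in range(8)]   # 8 writers, 4 KiB each

# compress the chunks in parallel, as the clients would
with ThreadPoolExecutor() as pool:
    compressed = list(pool.map(compress_chunk, chunks))

# the "storage node" holds only compressed chunks; a reader restores the object
restored = b"".join(decompress_chunk(c) for c in compressed)
assert restored == b"".join(chunks)
```

Because each chunk is compressed independently, a reader can decompress only the byte range it needs rather than the whole shared object.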

  18. Health Data Sharing Preferences of Consumers: Public Policy and Legal Implications of Consumer-Mediated Data Management

    Science.gov (United States)

    Moon, Lisa A.

    2017-01-01

    An individual's choice to share or have control of the sharing or withholding of their personal health information is one of the most significant public policy challenges associated with electronic information exchange. There were four aims of this study. First, to describe predictors of health data sharing preferences of consumers. Second, to…

  19. Principles of big data preparing, sharing, and analyzing complex information

    CERN Document Server

    Berman, Jules J

    2013-01-01

    Principles of Big Data helps readers avoid the common mistakes that endanger all Big Data projects. By stressing simple, fundamental concepts, this book teaches readers how to organize large volumes of complex data, and how to achieve data permanence when the content of the data is constantly changing. General methods for data verification and validation, as specifically applied to Big Data resources, are stressed throughout the book. The book demonstrates how adept analysts can find relationships among data objects held in disparate Big Data resources, when the data objects are endo

  20. 30 CFR 280.73 - Will MMS share data and information with coastal States?

    Science.gov (United States)

    2010-07-01

    ... 30 Mineral Resources 2 2010-07-01 2010-07-01 false Will MMS share data and information with coastal States? 280.73 Section 280.73 Mineral Resources MINERALS MANAGEMENT SERVICE, DEPARTMENT OF THE... Data Requirements Protections § 280.73 Will MMS share data and information with coastal States? (a) We...

  1. Internet Data Distribution – extending real-time data sharing throughout the Americas

    Directory of Open Access Journals (Sweden)

    T. Yoksas

    2006-01-01

    Full Text Available The Unidata Program Center (Unidata) of the University Corporation for Atmospheric Research (UCAR) is involved in three international collaborations whose goals are the extension of real-time data delivery to, and the sharing of locally held datasets by, educational institutions throughout the Americas. These efforts are based on the use of Unidata's Internet Data Distribution (IDD) system, which is built on top of its proven Local Data Manager Version 6 (LDM-6) technology. The Unidata IDD is an event-driven network of cooperating Unidata LDM servers that distributes discipline-neutral data products in near real-time over wide-area networks. The IDD, a collaboration of over 150 mostly North American institutions of higher education, has been the primary source of real-time atmospheric science data for the US university community for over a decade. In addition to providing a highly reliable mechanism for delivering real-time data, the IDD allows users to easily share locally held datasets.

  2. Sharing Responsibility for Data Stewardship Between Scientists and Curators

    Science.gov (United States)

    Hedstrom, M. L.

    2012-12-01

    Data stewardship is becoming increasingly important to support accurate conclusions from new forms of data, integration of and computation across heterogeneous data types, interactions between models and data, replication of results, data governance and long-term archiving. In addition to increasing recognition of the importance of data management, data science, and data curation by US and international scientific agencies, the National Academies of Science Board on Research Data and Information is sponsoring a study on Data Curation Education and Workforce Issues. Effective data stewardship requires a distributed effort among scientists who produce data, IT staff and/or vendors who provide data storage and computational facilities and services, and curators who enhance data quality, manage data governance, provide access to third parties, and assume responsibility for long-term archiving of data. The expertise necessary for scientific data management includes a mix of knowledge of the scientific domain; an understanding of domain data requirements, standards, ontologies and analytical methods; facility with leading edge information technology; and knowledge of data governance, standards, and best practices for long-term preservation and access that rarely are found in a single individual. Rather than developing data science and data curation as new and distinct occupations, this paper examines the set of tasks required for data stewardship. The paper proposes an alternative model that embeds data stewardship in scientific workflows and coordinates hand-offs between instruments, repositories, analytical processing, publishers, distributors, and archives. This model forms the basis for defining knowledge and skill requirements for specific actors in the processes required for data stewardship and the corresponding educational and training needs.

  3. DataUp: A tool to help researchers describe and share tabular data.

    Science.gov (United States)

    Strasser, Carly; Kunze, John; Abrams, Stephen; Cruse, Patricia

    2014-01-01

    Scientific datasets have immeasurable value, but they lose their value over time without proper documentation, long-term storage, and easy discovery and access. Across disciplines as diverse as astronomy, demography, archeology, and ecology, large numbers of small heterogeneous datasets (i.e., the long tail of data) are especially at risk unless they are properly documented, saved, and shared. One unifying factor for many of these at-risk datasets is that they reside in spreadsheets. In response to this need, the California Digital Library (CDL) partnered with Microsoft Research Connections and the Gordon and Betty Moore Foundation to create the DataUp data management tool for Microsoft Excel. Many researchers creating these small, heterogeneous datasets use Excel at some point in their data collection and analysis workflow, so we were interested in developing a data management tool that fits easily into those work flows and minimizes the learning curve for researchers. The DataUp project began in August 2011. We first formally assessed the needs of researchers by conducting surveys and interviews of our target research groups: earth, environmental, and ecological scientists. We found that, on average, researchers had very poor data management practices, were not aware of data centers or metadata standards, and did not understand the benefits of data management or sharing. Based on our survey results, we composed a list of desirable components and requirements and solicited feedback from the community to prioritize potential features of the DataUp tool. These requirements were then relayed to the software developers, and DataUp was successfully launched in October 2012.

  4. Classification of processes involved in sharing individual participant data from clinical trials.

    Science.gov (United States)

    Ohmann, Christian; Canham, Steve; Banzi, Rita; Kuchinke, Wolfgang; Battaglia, Serena

    2018-01-01

    Background: In recent years, a cultural change in the handling of data from research has resulted in the strong promotion of a culture of openness and increased sharing of data. In the area of clinical trials, sharing of individual participant data involves a complex set of processes and the interaction of many actors and actions. Individual services/tools to support data sharing are available, but what is missing is a detailed, structured and comprehensive list of processes/subprocesses involved and tools/services needed. Methods: Principles and recommendations from a published data sharing consensus document are analysed in detail by a small expert group. Processes/subprocesses involved in data sharing are identified and linked to actors and possible services/tools. Definitions are adapted from the business process model and notation (BPMN) and applied in the analysis. Results: A detailed and comprehensive list of individual processes/subprocesses involved in data sharing, structured according to 9 main processes, is provided. Possible tools/services to support these processes/subprocesses are identified and grouped according to major type of support. Conclusions: The list of individual processes/subprocesses and tools/services identified is a first step towards development of a generic framework or architecture for sharing of data from clinical trials. Such a framework is strongly needed to give an overview of how various actors, research processes and services could form an interoperable system for data sharing.

  5. A Spatial Data Infrastructure to Share Earth and Space Science Data

    Science.gov (United States)

    Nativi, S.; Mazzetti, P.; Bigagli, L.; Cuomo, V.

    2006-05-01

    Spatial Data Infrastructure (SDI), also known as Geospatial Data Infrastructure, is fundamentally a mechanism to facilitate the sharing and exchange of geospatial data. SDI is a scheme necessary for the effective collection, management, access, delivery and utilization of geospatial data; it is important for objective decision making and sound land-based policy, supports economic development, and encourages socially and environmentally sustainable development. As far as data model and semantics are concerned, a valuable and effective SDI should be able to cross the boundaries between the Geographic Information System/Science (GIS) and Earth and Space Science (ESS) communities. Hence, SDI should be able to discover, access and share information and data produced and managed by both GIS and ESS communities in an integrated way. In other terms, SDI must be built on a conceptual and technological framework which abstracts the nature and structure of shared datasets: feature-based data or Imagery, Gridded and Coverage Data (IGCD). ISO TC 211 and the Open Geospatial Consortium have provided important artifacts for building up this framework. In particular, the OGC Web Services (OWS) initiatives and several Interoperability Experiments (e.g. the GALEON IE) are extremely useful for this purpose. We present an SDI solution which is able to manage both GIS and ESS datasets. It is based on OWS and other well-accepted or promising technologies, such as UNIDATA netCDF and CDM, ncML and ncML-GML. Moreover, it uses a specific technology to implement a distributed and federated system of catalogues: the GI-Cat. This technology performs data model mediation and protocol adaptation tasks. It is used to work out a metadata clearinghouse service, implementing a common (federal) catalogue model which is based on the ISO 19115 core metadata for geo-datasets. Nevertheless, other well-accepted or standard catalogue data models can easily be implemented as the common view (e.g. OGC CS-W, the next coming

  6. A vision for end-to-end data services to foster international partnerships through data sharing

    Science.gov (United States)

    Ramamurthy, M.; Yoksas, T.

    2009-04-01

    Increasingly, the conduct of science requires scientific partnerships and sharing of knowledge, information, and other assets. This is particularly true in our field where the highly-coupled Earth system and its many linkages have heightened the importance of collaborations across geographic, disciplinary, and organizational boundaries. The climate system, for example, is far too complex a puzzle to be unraveled by individual investigators or nations. As articulated in the NSF Strategic Plan: FY 2006-2011, "…discovery increasingly requires expertise of individuals from different disciplines, with diverse perspectives, and often from different nations, working together to accommodate the extraordinary complexity of today's science and engineering challenges." The Nobel Prize winning IPCC assessments are a prime example of such an effort. Earth science education is also uniquely suited to drawing connections between the dynamic Earth system and societal issues. Events like the 2004 Indian Ocean tsunami and Hurricane Katrina provide ample evidence of this relevance, as they underscore the importance of timely and interdisciplinary integration and synthesis of data. Our success in addressing such complex problems and advancing geosciences depends on the availability of a state-of-the-art and robust cyberinfrastructure, transparent and timely access to high-quality data from diverse sources, and requisite tools to integrate and use the data effectively, toward creating new knowledge. To that end, Unidata's vision calls for providing comprehensive, well-integrated, and end-to-end data services for the geosciences. These include an array of functions for collecting, finding, and accessing data; data management tools for generating, cataloging, and exchanging metadata; and submitting or publishing, sharing, analyzing, visualizing, and integrating data. When this vision is realized, users — no matter where they are, how they are connected to the Internet, or what

  7. Global bike share: What the data tells us about road safety.

    Science.gov (United States)

    Fishman, Elliot; Schepers, Paul

    2016-02-01

    Bike share has emerged as a rapidly growing mode of transport in over 800 cities globally, up from just a handful in the 1990s. Some analysts had forecast a rise in the number of bicycle crashes after the introduction of bike share, but empirical research on bike share safety is rare. The goal of this study is to examine the impact of bike share programs on cycling safety. The paper has two substudies. Study 1 was a secondary analysis of longitudinal hospital injury data from the Graves et al. (2014) study. It compared cycling safety in cities that introduced bike share programs with cities that did not. Study 2 combined ridership data with crash data of selected North American and European cities to compare bike share users to other cyclists. Study 1 indicated that the introduction of a bike share system was associated with a reduction in cycling injury risk. Study 2 found that bike share users were less likely than other cyclists to sustain fatal or severe injuries. On a per kilometer basis, bike share is associated with decreased risk of both fatal and non-fatal bicycle crashes when compared to private bike riding. The results of this study suggest that concerns of decreased levels of cycling safety are unjustified and should not prevent decision makers from introducing public bike share schemes, especially if combined with other safety measures like traffic calming. Copyright © 2015 Elsevier Ltd and National Safety Council. All rights reserved.

  8. BI-LEVEL AUTHENTICATION FOR EFFECTIVE DATA SHARING IN CLOUD VIA PRIVACY-PRESERVING AUTHENTICATION PROTOCOL

    OpenAIRE

    J. Jeya Praise; A. Sam Silva

    2017-01-01

    Cloud computing is an emerging technology of distributed computing where users can remotely store their data in cloud storage and enjoy the on-demand cloud applications and services from a shared pool of configurable computing resources, without the burden of local infrastructure and maintenance. During data accessing, different users may share their data to achieve productive benefits. Storing the data in third party’s cloud system causes serious concern over the data confidentiality. The e...

  9. Hidden concerns of sharing research data by low/middle-income country scientists.

    Science.gov (United States)

    Bezuidenhout, Louise; Chakauya, Ereck

    2018-01-01

    There has been considerable interest in bringing low/middle-income country (LMIC) scientists into discussions on Open Data - both as contributors and users. The establishment of in situ data sharing practices within LMIC research institutions is vital for the development of an Open Data landscape in the Global South. Nonetheless, many LMICs face significant challenges - in resource provision, research support and extra-laboratory infrastructure. These low-resourced environments shape data sharing activities, but are rarely examined within Open Data discourse. In particular, little attention is given to how these research environments shape scientists' perceptions of data sharing (dis)incentives. This paper expands on these issues of incentivizing data sharing, using data from a quantitative survey disseminated to life scientists in 13 countries in sub-Saharan Africa. The survey interrogated not only perceptions of data sharing amongst LMIC scientists, but also how these are connected to the research environments and daily challenges they experience. The paper offers a series of analyses around commonly cited (dis)incentives such as data sharing as a means of improving research visibility; sharing and funding; and online connectivity. It identifies key areas that the Open Data community needs to consider if true openness in research is to be established in the Global South.

  10. Clinical research data sharing: what an open science world means for researchers involved in evidence synthesis.

    Science.gov (United States)

    Ross, Joseph S

    2016-09-20

    The International Committee of Medical Journal Editors (ICMJE) recently announced a bold step forward to require data generated by interventional clinical trials that are published in its member journals to be responsibly shared with external investigators. The movement toward a clinical research culture that supports data sharing has important implications for the design, conduct, and reporting of systematic reviews and meta-analyses. While data sharing is likely to enhance the science of evidence synthesis, facilitating the identification and inclusion of all relevant research, it will also pose key challenges, such as requiring broader search strategies and more thorough scrutiny of identified research. Furthermore, the adoption of data sharing initiatives by the clinical research community should challenge the community of researchers involved in evidence synthesis to follow suit, including the widespread adoption of systematic review registration, results reporting, and data sharing, to promote transparency and enhance the integrity of the research process.

  11. Data Citation Standard: A Means to Support Data Sharing, Attribution, and Traceability

    Science.gov (United States)

    McCallum, I.; Plag, H. P.; Fritz, S.

    2012-04-01

    Geo-referenced data are crucial for addressing many of the burning societal problems and to support related interdisciplinary research. Data sharing is hampered by the lack of a widely accepted method for giving credit to those who make their data freely available and for tracking the use of data throughout its life-cycle. Particularly in the scientific community, recognition and renown are important currencies. Providing means for data citation would be a strong incentive for data sharing. Recently, a number of organizations and projects have started to address the concept of data citation (e.g., PANGAEA, NASA DAACS, USGS, NOAA National Data Centers, ESIP, US National Academy of Sciences, and EGIDA). A number of proposals for data citation guidelines have emerged and a better understanding of the many issues at hand is evolving, but to date, no standard has been accepted. This is not surprising, as data citation is far more complicated than the citation of scientific publications. Data sets differ in many aspects from standard scientific publications. For example, data sets generally are not locatable and attributable in the same way as scientific publications. Data sets often are not static (introducing versioning), and they are mostly not peer-reviewed (requiring quality control). There is a consensus that the implementation of a standard would reveal new issues that are not obvious today. With the Global Earth Observation System of Systems (GEOSS), the Group on Earth Observations (GEO) is in a unique position to provide the testbed for the implementation of a draft standard. The GEO Plenary supports the implementation of a draft standard developed by the Science and Technology Committee (STC) of GEO with support of the EGIDA Project. This draft is based on guidelines developed by international groups. 
Currently, users of the GEO-Portal are not obliged or encouraged to cite data accessed through GEOSS - if at all, citation requirements come from the individual data
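As an illustration of what such a citation standard might prescribe, the sketch below assembles a dataset citation following the DataCite-style pattern Creator (Year): Title. Version. Publisher. Identifier. The names are fabricated and the DOI uses 10.5072, a reserved test prefix; the field order of any final standard may of course differ:

```python
def format_data_citation(creators, year, title, publisher, identifier, version=None):
    """Format a dataset citation in a DataCite-style pattern:
    Creator (PublicationYear): Title. Version. Publisher. Identifier.
    A sketch only; versioning is exactly the complication the abstract notes.
    """
    creator_str = "; ".join(creators)
    parts = [f"{creator_str} ({year}): {title}."]
    if version:
        parts.append(f"Version {version}.")  # datasets change; cite which one
    parts.append(f"{publisher}.")
    parts.append(identifier)  # a resolvable identifier (e.g. a DOI)
    return " ".join(parts)

# Fabricated example; 10.5072 is a DOI prefix reserved for testing.
citation = format_data_citation(
    ["Doe, J.", "Roe, R."], 2012,
    "Global Surface Temperature Grids", "Example Data Center",
    "https://doi.org/10.5072/EXAMPLE", version="2.1")
print(citation)
```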

  12. Sharing and executing linked data queries in a collaborative environment.

    Science.gov (United States)

    García Godoy, María Jesús; López-Camacho, Esteban; Navas-Delgado, Ismael; Aldana-Montes, José F

    2013-07-01

    Life Sciences have emerged as a key domain in the Linked Data community because of the diversity of data semantics and formats available through a great variety of databases and web technologies. Thus, it has been used as the perfect domain for applications in the web of data. Unfortunately, bioinformaticians are not exploiting the full potential of this already available technology, and experts in Life Sciences have real problems to discover, understand and devise how to take advantage of these interlinked (integrated) data. In this article, we present Bioqueries, a wiki-based portal that is aimed at community building around biological Linked Data. This tool has been designed to aid bioinformaticians in developing SPARQL queries to access biological databases exposed as Linked Data, and also to help biologists gain a deeper insight into the potential use of this technology. This public space offers several services and a collaborative infrastructure to stimulate the consumption of biological Linked Data and, therefore, contribute to implementing the benefits of the web of data in this domain. Bioqueries currently contains 215 query entries grouped by database and theme, 230 registered users and 44 end points that contain biological Resource Description Framework information. The Bioqueries portal is freely accessible at http://bioqueries.uma.es. Supplementary data are available at Bioinformatics online.
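The access pattern Bioqueries documents can be sketched with the standard library alone: compose a SPARQL query and encode it as the `query` parameter defined by the SPARQL 1.1 Protocol. The endpoint URL below is a placeholder rather than one of the portal's 44 real end points, and the UniProt-style prefix is only illustrative:

```python
from urllib.parse import urlencode

# Placeholder endpoint; Bioqueries lists real biological endpoints
# at http://bioqueries.uma.es.
ENDPOINT = "http://example.org/sparql"

# Illustrative query; the up: prefix mimics UniProt-style vocabularies.
QUERY = """\
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein WHERE { ?protein a up:Protein . } LIMIT 5
"""

def build_sparql_request(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as a GET request per the SPARQL 1.1 Protocol."""
    return endpoint + "?" + urlencode({"query": query})

request_url = build_sparql_request(ENDPOINT, QUERY)
print(request_url)
```

Fetching that URL from a live endpoint would return the result bindings; the value of a portal like Bioqueries is precisely that biologists can reuse curated queries like this instead of writing them from scratch.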

  13. Desktop war - data suppliers competing for bigger market share

    International Nuclear Information System (INIS)

    Sword, M.

    1999-01-01

    The intense competition among suppliers of computerized data and computer software to the petroleum and natural gas industry in western Canada is discussed. It is estimated that the Canadian oil patch spends a large sum, about $400 million annually, on geoscience information and related costs, and industry is looking for ways to significantly reduce those costs. There is a need for integrated, desktop driven data sets. Sensing the determination of industry to reduce information acquisition costs, data providers are responding with major consolidation of data sets. The major evolution in the industry is on-line access to increase the speed of information delivery. Data vendors continue to integrate land, well, log, production and other data sets whether public or proprietary. The result is stronger foundations as platforms for interpretive software. Another development is the rise of the Internet and Intranets and the re-definition of the role of information technology departments in the industry, as both of these are paving the way for electronic delivery of information and software tools to the desktop. Development of proprietary data sets and acquisition of competitors with complementary data sets that enhance products and services are just some of the ways data vendors are trying to get a bigger piece of the exploration and development pie

  14. A framework for secure data sharing in the cloud | Akomolafe ...

    African Journals Online (AJOL)

    Cloud storage is not a new technology and it is being embraced more every day. Security and privacy concern of the data on the cloud is growing every day, this ... a framework that allows user revocation without re-encrypting previous data.

  15. Data format standard for sharing light source measurements

    Science.gov (United States)

    Gregory, G. Groot; Ashdown, Ian; Brandenburg, Willi; Chabaud, Dominique; Dross, Oliver; Gangadhara, Sanjay; Garcia, Kevin; Gauvin, Michael; Hansen, Dirk; Haraguchi, Kei; Hasna, Günther; Jiao, Jianzhong; Kelley, Ryan; Koshel, John; Muschaweck, Julius

    2013-09-01

    Optical design requires accurate characterization of light sources for computer aided design (CAD) software. Various methods have been used to model sources, from accurate physical models to measurement of light output. It has become common practice for designers to include measured source data for design simulations. Typically, a measured source will contain rays which sample the output distribution of the source. The ray data must then be exported to various formats suitable for import into optical analysis or design software. Source manufacturers are also making measurements of their products and supplying CAD models along with ray data sets for designers. The increasing availability of data has been beneficial to the design community but has caused a large expansion in storage needs for the source manufacturers since each software program uses a unique format to describe the source distribution. In 2012, the Illuminating Engineering Society (IES) formed a working group to understand the data requirements for ray data and recommend a standard file format. The working group included representatives from software companies supplying the analysis and design tools, source measurement companies providing metrology, source manufacturers creating the data and users from the design community. Within one year the working group proposed a file format which was recently approved by the IES for publication as TM-25. This paper will discuss the process used to define the proposed format, highlight some of the significant decisions leading to the format and list the data to be included in the first version of the standard.
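Independently of TM-25's actual binary layout (which this sketch does not reproduce), a measured ray data set is essentially a list of per-ray records: a start position, a unit direction, and a flux weight, from which software reconstructs the source's output distribution. A minimal illustrative model, with invented field choices:

```python
from dataclasses import dataclass
import math

@dataclass
class Ray:
    """One sampled source ray: start position (mm), unit direction, flux (W).

    This mirrors the kind of per-ray payload a ray file carries; it is
    NOT the actual TM-25 record layout, which the standard itself defines.
    """
    x: float; y: float; z: float       # start position
    dx: float; dy: float; dz: float    # direction cosines (unit vector)
    flux: float                        # radiant flux carried by this ray

    def is_normalized(self, tol: float = 1e-9) -> bool:
        """A ray file is only usable if every direction is a unit vector."""
        return abs(self.dx**2 + self.dy**2 + self.dz**2 - 1.0) < tol

# Two sample rays; total flux approximates the source's measured output.
rays = [
    Ray(0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.01),
    Ray(0.1, 0.0, 0.0, 0.0, math.sin(0.1), math.cos(0.1), 0.01),
]
total_flux = sum(r.flux for r in rays)
print(total_flux)
```

A real data set contains millions of such records, which is why a single agreed file format matters so much for the storage burden the working group set out to reduce.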

  16. Using Globus to Transfer and Share Big Data | Poster

    Science.gov (United States)

    By Ashley DeVine, Staff Writer, and Mark Wance, Guest Writer; photo by Richard Frederickson, Staff Photographer. Editor's note: This article was updated April 30, 2018. Transferring big data, such as the genomics data delivered to customers from the Center for Cancer Research Sequencing Facility (CCR SF), has been difficult in the past because the transfer systems have not kept

  17. Sharing waste management data over a wide area computer network

    International Nuclear Information System (INIS)

    Menke, W.; Friberg, P.

    1992-01-01

    In this paper the authors envision a time when waste management professionals from any institution will be able to access high quality data, regardless of where this data may actually be archived. They will not have to know anything about where the data actually resides or what format it is stored in. They will only have to specify the type of data and the workstation software will handle the rest of the details of finding them and accessing them. A method - now in use at the Lamont-Doherty Geological Observatory of Columbia University and several other institutions - of achieving this vision is described in this paper. Institutions make views of their databases publicly available to users of the wide-area network (e.g. Internet), using database serving software that runs on one of their computers. This software completely automates the process of finding out what kind of data are available and of retrieving them

  18. Sharing Planetary-Scale Data in the Cloud

    Science.gov (United States)

    Sundwall, J.; Flasher, J.

    2016-12-01

    On 19 March 2015, Amazon Web Services (AWS) announced Landsat on AWS, an initiative to make data from the U.S. Geological Survey's Landsat satellite program freely available in the cloud. Because of Landsat's global coverage and long history, it has become a reference point for all Earth observation work and is considered the gold standard of natural resource satellite imagery. Within the first year of Landsat on AWS, the service served over a billion requests for Landsat imagery and metadata, globally. Availability of the data in the cloud has led to new product development by companies and startups including Mapbox, Esri, CartoDB, MathWorks, Development Seed, Trimble, Astro Digital, Blue Raster and Timbr.io. The model of staging data for analysis in the cloud established by Landsat on AWS has since been applied to high resolution radar data, European Space Agency satellite imagery, global elevation data and EPA air quality models. This session will provide an overview of lessons learned throughout these projects. It will demonstrate how cloud-based object storage is democratizing access to massive publicly-funded data sets that have previously only been available to people with access to large amounts of storage, bandwidth, and computing power. Technical discussion points will include: the differences between staging data for analysis using object storage versus file storage; using object stores to design simple RESTful APIs through thoughtful file naming conventions, header fields, and HTTP Range Requests; managing costs through data architecture and Amazon S3's "requester pays" feature; building tools that allow users to take their algorithm to the data in the cloud; and using serverless technologies to display dynamic frontends for massive data sets
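The HTTP Range Request point can be illustrated with the standard library alone: build (without sending) a GET for one byte window of a large object, which is what lets a client read a single tile out of a huge GeoTIFF instead of downloading the whole file. The bucket URL below is a placeholder, not a real Landsat path:

```python
from urllib.request import Request

# Placeholder object URL; Landsat on AWS serves scenes over plain HTTPS.
url = "https://example-bucket.s3.amazonaws.com/scene/B4.TIF"

def range_request(url: str, start: int, length: int) -> Request:
    """Build (but do not send) an HTTP GET for one byte window of an object.

    Cloud object stores honour the standard Range header, so a client can
    fetch just the bytes it needs from a multi-gigabyte file.
    """
    req = Request(url)
    req.add_header("Range", f"bytes={start}-{start + length - 1}")
    return req

req = range_request(url, 0, 16384)       # first 16 KiB, e.g. a file header
print(req.get_header("Range"))           # bytes=0-16383
```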

  19. Sharing Health Big Data for Research - A Design by Use Cases: The INSHARE Platform Approach.

    Science.gov (United States)

    Bouzillé, Guillaume; Westerlynck, Richard; Defossez, Gautier; Bouslimi, Dalel; Bayat, Sahar; Riou, Christine; Busnel, Yann; Le Guillou, Clara; Cauvin, Jean-Michel; Jacquelinet, Christian; Pladys, Patrick; Oger, Emmanuel; Stindel, Eric; Ingrand, Pierre; Coatrieux, Gouenou; Cuggia, Marc

    2017-01-01

    Sharing and exploiting Health Big Data (HBD) means tackling two challenges. First, data protection and governance: taking legal, ethical, and deontological aspects into account enables a trusted, transparent and win-win relationship between researchers, citizens, and data providers. Second, lack of interoperability: data are compartmentalized and syntactically/semantically heterogeneous. The INSHARE project explores, through an experimental proof of concept, how recent technologies overcome such issues. Involving six data providers, the platform is designed in three steps: (1) analyze use cases, needs, and requirements; (2) define data sharing governance and secure access to the platform; and (3) define the platform specifications. Three use cases, drawn from 5 studies and 11 data sources, were analyzed for the platform design. Governance derived from the SCANNER model was adapted to data sharing. The platform architecture integrates: data repository and hosting, semantic integration services, data processing, aggregate computing, data quality and integrity monitoring, Id linking, a multisource query builder, visualization and data export services, data governance, a study management service, and security including data watermarking.

  20. Web Platform for Sharing Spatial Data and Manipulating Them Online

    Science.gov (United States)

    Bachelet, Dominique; Comendant, Tosha; Strittholt, Jim

    2011-04-01

    To fill the need for readily accessible conservation-relevant spatial data sets, the Conservation Biology Institute (CBI) launched in 2010 a Web-based platform called Data Basin (http://www.databasin.org). It is the first custom application of ArcGIS technology, which provides Web access to free maps and imagery using the most current version of Environmental Systems Research Institute (ESRI; http://www.esri.com/) geographic information system (GIS) software, and its core functionality is being made freely available. Data Basin includes spatial data sets (Arc format shapefiles and grids, or layer packages) that can be biological (e.g., prairie dog range), physical (e.g., average summer temperature, 1950-2000), or socioeconomic (e.g., locations of Alaska oil and gas wells); based on observations as well as on simulation results; and of local to global relevance. They can be uploaded, downloaded, or simply visualized. Maps (overlays of multiple data sets) can be created and customized (e.g., western Massachusetts protected areas, time series of the Deep Water Horizon oil spill). Galleries are folders containing data sets and maps focusing on a theme (e.g., sea level rise projections for the Pacific Northwest region from the National Wildlife Federation, soil data sets for the conterminous United States).

  1. Time to consider sharing data extracted from trials included in systematic reviews

    Directory of Open Access Journals (Sweden)

    Luke Wolfenden

    2016-11-01

    Full Text Available Abstract Background While the debate regarding shared clinical trial data has shifted from whether such data should be shared to how this is best achieved, the sharing of data collected as part of systematic reviews has received little attention. In this commentary, we discuss the potential benefits of coordinated efforts to share data collected as part of systematic reviews. Main body There are a number of potential benefits of systematic review data sharing. Shared information and data obtained as part of the systematic review process may reduce unnecessary duplication, reduce demands on trialists to service repeated requests from reviewers for data, and improve the quality and efficiency of future reviews. Sharing also facilitates research to improve clinical trial and systematic review methods and supports additional analyses to address secondary research questions. While concerns regarding appropriate use of data, costs, or the academic return for original review authors may impede more open access to information extracted as part of systematic reviews, many of these issues are being addressed, and infrastructure to enable greater access to such information is being developed. Conclusion Embracing systems to enable more open access to systematic review data has considerable potential to maximise the benefits of research investment in undertaking systematic reviews.

  2. A Requirement Engineering Framework for Electronic Data Sharing of Health Care Data Between Organizations

    Science.gov (United States)

    Liu, Xia; Peyton, Liam; Kuziemsky, Craig

    Health care is increasingly provided to citizens by a network of collaboration that includes multiple providers and locations. Typically, that collaboration is on an ad-hoc basis via phone calls, faxes, and paper-based documentation. Internet and wireless technologies provide an opportunity to improve this situation via electronic data sharing. These new technologies make possible new ways of working and collaborating, but it can be difficult for health care organizations to understand how to use them while still ensuring that their policies and objectives are met. It is also important to have a systematic approach to validate that e-health processes deliver the performance improvements that are expected. Using a case study of a palliative care patient receiving home care from a team of collaborating health organizations, we introduce a framework based on requirements engineering. Key concerns and objectives (privacy, security, quality of care, and timeliness of service) are identified and modeled. Proposed business processes that use the new technologies are then modeled in terms of these concerns and objectives, to assess their impact and to ensure that electronic data sharing is well regulated.

  3. Sharing adverse drug event data using business intelligence technology.

    Science.gov (United States)

    Horvath, Monica M; Cozart, Heidi; Ahmad, Asif; Langman, Matthew K; Ferranti, Jeffrey

    2009-03-01

    Duke University Health System uses computerized adverse drug event surveillance as an integral part of medication safety at 2 community hospitals and an academic medical center. This information must be swiftly communicated to organizational patient safety stakeholders to find opportunities to improve patient care; however, this process is encumbered by highly manual methods of preparing the data. Following the examples of other industries, we deployed a business intelligence tool to provide dynamic safety reports on adverse drug events. Once data were migrated into the health system data warehouse, we developed census-adjusted reports with user-driven prompts. Drill-down functionality enables navigation from aggregate trends to event details by clicking report graphics. Reports can be accessed by patient safety leadership either through an existing safety reporting portal or the health system performance improvement Web site. Elaborate prompt screens allow many varieties of reports to be created quickly by patient safety personnel without consultation with the research analyst. The reduction in research analyst workload resulting from the business intelligence implementation made this individual available for additional patient safety projects, thereby leveraging their talents more effectively. Dedicated liaisons are essential to ensure clear communication between clinical and technical staff throughout the development life cycle. Design and development of the business intelligence model for adverse drug event data must reflect the eccentricities of the operational system, especially as new areas of emphasis evolve. Future usability studies examining the data presentation and access model are needed.

  4. Ethical sharing of health data in online platforms – which values should be considered?

    DEFF Research Database (Denmark)

    Riso, Brigida; Tupasela, Aaro Mikael; Vears, Danya

    2017-01-01

    Intensified and extensive data production and data storage are characteristics of contemporary western societies. Health data sharing is increasing with the growth of Information and Communication Technology (ICT) platforms devoted to the collection of personal health and genomic data. However...... and Ethical perspectives’ (IS 1303) identified six core values they considered to be essential for the ethical sharing of health data using ICT platforms. We believe that using this ethical framework will promote respectful scientific practices in order to maintain individuals’ trust in research. We use...... these values to analyse five ICT platforms and explore how emerging data sharing platforms are reconfiguring the data sharing experience from a range of perspectives. We discuss which types of values, rights and responsibilities they entail and enshrine within their philosophy or outlook on what it means...

  5. Sharing ATLAS data and research with young students

    CERN Document Server

    AUTHOR|(CDS)2073758; The ATLAS collaboration; Ould-Saada, Farid; Bugge, Magnar Kopangen

    2016-01-01

    In recent years the International Masterclasses (IMC) featured the use of real experimental data as produced by the Large Hadron Collider (LHC) and collected by the detectors. We present ATLAS-based educational material using these data allowing high-school students to learn about properties of known particles and search for new phenomena. The ambition to bring important LHC discoveries into the "classroom" is realised using the recent discovery of the Higgs boson. Approximately 10% of the ATLAS discovery data are made available for students to search for the Higgs boson: 2 fb⁻¹ at 8 TeV for the Z path, and 1 fb⁻¹ at 7 TeV for the W path, in the 2014 version of IMC. The Higgs study samples constitute one third of the total sample including Z, W and other low mass resonances. The educational material is tuned and expanded to follow LHC "heartbeats".

  6. Generating Sustainable Value from Open Data in a Sharing Society

    DEFF Research Database (Denmark)

    Jetzek, Thorhildur; Avital, Michel; Bjørn-Andersen, Niels

    2014-01-01

    Our societies are in the midst of a paradigm shift that transforms hierarchical markets into an open and networked economy based on digital technology and information. In that context, open data is widely presumed to have a positive effect on social, environmental and economic value; however...

  7. Consumers and their data : When and why they share it

    NARCIS (Netherlands)

    Demmers, J.

    2018-01-01

    Companies’ ability to collect, analyse, and use consumer data to gain insights into consumer preferences and behavior is pivotal in today’s marketing landscape and will become even more important as more elements of human life become digitalized. Concerns about consumer privacy may increasingly

  8. An Effective Grouping Method for Privacy-Preserving Bike Sharing Data Publishing

    Directory of Open Access Journals (Sweden)

    A S M Touhidul Hasan

    2017-10-01

    Full Text Available Bike sharing programs are eco-friendly transportation systems that are widespread in smart city environments. In this paper, we study the problem of privacy-preserving bike sharing microdata publishing. Bike sharing systems collect visiting information along with user identity and make it public by removing the user identity. Even after excluding user identification, the published bike sharing dataset will not be protected against privacy disclosure risks. An adversary may arrange published datasets based on a bike's visiting information to breach a user's privacy. In this paper, we propose a grouping-based anonymization method to protect the published bike sharing dataset from linking attacks. The proposed Grouping method ensures that the published bike sharing microdata will be protected from disclosure risks. Experimental results show that our approach can protect user privacy in the released datasets from disclosure risks while retaining more data utility than existing methods.
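A generic grouping sketch (not the paper's exact algorithm) conveys the idea behind such methods: bucket trips by a quasi-identifier and suppress buckets smaller than k, so every published record is indistinguishable from at least k-1 others. The trip fields below are invented for illustration:

```python
from collections import defaultdict

def k_group_publish(records, quasi_id, k=3):
    """Toy grouping-based anonymization.

    Bucket records by their quasi-identifier and suppress any bucket with
    fewer than k members; a k-anonymity-style sketch of the grouping idea,
    not the paper's proposed method.
    """
    buckets = defaultdict(list)
    for rec in records:
        buckets[quasi_id(rec)].append(rec)
    return [rec for group in buckets.values() if len(group) >= k
            for rec in group]

# Invented trip records: station visited and minute of arrival.
trips = [
    {"station": "A", "minute": 5}, {"station": "A", "minute": 7},
    {"station": "A", "minute": 9}, {"station": "B", "minute": 4},
]
published = k_group_publish(trips, quasi_id=lambda r: r["station"], k=3)
print(len(published))  # 3: the lone station-B trip is suppressed
```

The trade-off the abstract measures is visible even here: suppression protects the station-B rider but discards that record's utility.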

  9. Grant Project Information via a Shared Data Base

    Directory of Open Access Journals (Sweden)

    Justine Roberts

    1973-09-01

    Full Text Available A quarterly keyword index to campus grant projects is provided by the Health Science Library at the University of California, San Francisco, using a data base created and maintained by the campus' Contracts & Grants Office. The index is printed in KWOC format, using the chief investigator's name as the key to a section of project summaries. A third section is also included, listing the summaries under the name of the sponsoring department.

  10. Protecting patient privacy when sharing patient-level data from clinical trials.

    Science.gov (United States)

    Tucker, Katherine; Branson, Janice; Dilleen, Maria; Hollis, Sally; Loughlin, Paul; Nixon, Mark J; Williams, Zoë

    2016-07-08

    Greater transparency and, in particular, sharing of patient-level data for further scientific research is an increasingly important topic for the pharmaceutical industry and other organisations who sponsor and conduct clinical trials as well as generally in the interests of patients participating in studies. A concern remains, however, over how to appropriately prepare and share clinical trial data with third party researchers, whilst maintaining patient confidentiality. Clinical trial datasets contain very detailed information on each participant. Risk to patient privacy can be mitigated by data reduction techniques. However, retention of data utility is important in order to allow meaningful scientific research. In addition, for clinical trial data, an excessive application of such techniques may pose a public health risk if misleading results are produced. After considering existing guidance, this article makes recommendations with the aim of promoting an approach that balances data utility and privacy risk and is applicable across clinical trial data holders. Our key recommendations are as follows: 1. Data anonymisation/de-identification: Data holders are responsible for generating de-identified datasets which are intended to offer increased protection for patient privacy through masking or generalisation of direct and some indirect identifiers. 2. Controlled access to data, including use of a data sharing agreement: A legally binding data sharing agreement should be in place, including agreements not to download or further share data and not to attempt to seek to identify patients. Appropriate levels of security should be used for transferring data or providing access; one solution is use of a secure 'locked box' system which provides additional safeguards. This article provides recommendations on best practices to de-identify/anonymise clinical trial data for sharing with third-party researchers, as well as controlled access to data and data sharing
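The masking and generalisation techniques recommended above can be illustrated with a toy record: a salted one-way pseudonym replaces the direct identifier, a date of birth is generalised to a year, and age is banded. All field names and the salt are invented, and real de-identification requires a documented risk assessment, not just these mechanical steps:

```python
import hashlib

def deidentify(record, salt="per-study-secret"):
    """Sketch of common data-reduction steps for a trial record.

    Replaces the direct identifier with a salted one-way pseudonym and
    generalises indirect identifiers; illustrative only.
    """
    out = dict(record)
    pid = out.pop("patient_id")
    # Salted hash: stable within the study, not reversible to the raw ID.
    out["pseudonym"] = hashlib.sha256((salt + pid).encode()).hexdigest()[:12]
    out["birth_year"] = out.pop("date_of_birth")[:4]   # keep year only
    age = out.pop("age")
    out["age_band"] = f"{age // 10 * 10}-{age // 10 * 10 + 9}"
    return out

# Invented example record.
rec = {"patient_id": "P0042", "date_of_birth": "1957-03-14",
       "age": 66, "outcome": "responder"}
masked = deidentify(rec)
print(masked)
```

Note how the clinical variable of interest (`outcome`) passes through untouched: that is the utility-versus-privacy balance the recommendations aim at.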

  11. Sharing privacy-sensitive access to neuroimaging and genetics data: a review and preliminary validation.

    Science.gov (United States)

    Sarwate, Anand D; Plis, Sergey M; Turner, Jessica A; Arbabshirani, Mohammad R; Calhoun, Vince D

    2014-01-01

    The growth of data sharing initiatives for neuroimaging and genomics represents an exciting opportunity to confront the "small N" problem that plagues contemporary neuroimaging studies while further understanding the role genetic markers play in the function of the brain. When it is possible, open data sharing provides the most benefits. However, some data cannot be shared at all due to privacy concerns and/or risk of re-identification. Sharing other data sets is hampered by the proliferation of complex data use agreements (DUAs) which preclude truly automated data mining. These DUAs arise because of concerns about the privacy and confidentiality for subjects; though many do permit direct access to data, they often require a cumbersome approval process that can take months. An alternative approach is to only share data derivatives such as statistical summaries: the challenges here are to reformulate computational methods to quantify the privacy risks associated with sharing the results of those computations. For example, a derived map of gray matter is often as identifiable as a fingerprint. Thus alternative approaches to accessing data are needed. This paper reviews the relevant literature on differential privacy, a framework for measuring and tracking privacy loss in these settings, and demonstrates the feasibility of using this framework to calculate statistics on data distributed at many sites while still providing privacy.
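    The differential privacy framework the paper reviews can be illustrated with the classic Laplace mechanism: noise calibrated to a query's sensitivity is added before a statistic is released. This is a minimal sketch of the general technique, not the authors' implementation:

```python
import math
import random

def laplace_noise(scale):
    # Inverse-CDF sampling of a Laplace(0, scale) random variate.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_mean(values, lower, upper, epsilon):
    """Release an epsilon-differentially-private mean.

    Values are clipped to [lower, upper], so one participant can move
    the mean by at most (upper - lower) / n -- the query's sensitivity.
    """
    n = len(values)
    clipped = [min(max(v, lower), upper) for v in values]
    sensitivity = (upper - lower) / n
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon)
```

    A smaller epsilon gives stronger privacy but noisier statistics; the multi-site setting the paper validates applies the same calibration idea to statistics computed across distributed data.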

  12. Sharing privacy-sensitive access to neuroimaging and genetics data: a review and preliminary validation

    Directory of Open Access Journals (Sweden)

    Anand D. Sarwate

    2014-04-01

    Full Text Available The growth of data sharing initiatives for neuroimaging and genomics represents an exciting opportunity to confront the "small N" problem that plagues contemporary neuroimaging studies while further understanding the role genetic markers play in the function of the brain. When it is possible, open data sharing provides the most benefits. However, some data cannot be shared at all due to privacy concerns and/or risk of re-identification. Sharing other data sets is hampered by the proliferation of complex data use agreements (DUAs) which preclude truly automated data mining. These DUAs arise because of concerns about the privacy and confidentiality for subjects; though many do permit direct access to data, they often require a cumbersome approval process that can take months. An alternative approach is to only share data derivatives such as statistical summaries: the challenges here are to reformulate computational methods to quantify the privacy risks associated with sharing the results of those computations. For example, a derived map of gray matter is often as identifiable as a fingerprint. Thus alternative approaches to accessing data are needed. This paper reviews the relevant literature on differential privacy, a framework for measuring and tracking privacy loss in these settings, and demonstrates the feasibility of using this framework to calculate statistics on data distributed at many sites while still providing privacy.

  13. Balancing the risks and benefits of genomic data sharing: genome research participants' perspectives.

    Science.gov (United States)

    Oliver, J M; Slashinski, M J; Wang, T; Kelly, P A; Hilsenbeck, S G; McGuire, A L

    2012-01-01

    Technological advancements are rapidly propelling the field of genome research forward, while lawmakers attempt to keep apace with the risks these advances bear. Balancing the normative concerns of maximizing data utility and protecting human subjects, whose privacy is at risk due to the identifiability of DNA data, is central to policy decisions. Research on genome research participants making real-time data sharing decisions is limited; yet, these perspectives could provide critical information to ongoing deliberations. We conducted a randomized trial of 3 consent types affording varying levels of control over data release decisions. After debriefing participants about the randomization process, we invited them to a follow-up interview to assess their attitudes toward genetic research, privacy and data sharing. Participants were more restrictive in their reported data sharing preferences than in their actual data sharing decisions. They saw both benefits and risks associated with sharing their genomic data, but risks were seen as less concrete or happening in the future, and were largely outweighed by purported benefits. Policymakers must respect that participants' assessment of the risks and benefits of data sharing and their privacy-utility determinations, which are associated with their final data release decisions, vary. In order to advance the ethical conduct of genome research, proposed policy changes should carefully consider these stakeholder perspectives. Copyright © 2011 S. Karger AG, Basel.

  14. Ethical sharing of health data in online platforms - which values should be considered?

    Science.gov (United States)

    Riso, Brígida; Tupasela, Aaro; Vears, Danya F; Felzmann, Heike; Cockbain, Julian; Loi, Michele; Kongsholm, Nana C H; Zullo, Silvia; Rakic, Vojin

    2017-08-21

    Intensified and extensive data production and data storage are characteristics of contemporary western societies. Health data sharing is increasing with the growth of Information and Communication Technology (ICT) platforms devoted to the collection of personal health and genomic data. However, the sensitive and personal nature of health data poses ethical challenges when data is disclosed and shared even if for scientific research purposes. With this in mind, the Science and Values Working Group of the COST Action CHIP ME 'Citizen's Health through public-private Initiatives: Public health, Market and Ethical perspectives' (IS 1303) identified six core values they considered to be essential for the ethical sharing of health data using ICT platforms. We believe that using this ethical framework will promote respectful scientific practices in order to maintain individuals' trust in research. We use these values to analyse five ICT platforms and explore how emerging data sharing platforms are reconfiguring the data sharing experience from a range of perspectives. We discuss which types of values, rights and responsibilities they entail and enshrine within their philosophy or outlook on what it means to share personal health information. Through this discussion we address issues of the design and the development process of personal health data and patient-oriented infrastructures, as well as new forms of technologically-mediated empowerment.

  15. Sharing privacy-sensitive access to neuroimaging and genetics data: a review and preliminary validation

    Science.gov (United States)

    Sarwate, Anand D.; Plis, Sergey M.; Turner, Jessica A.; Arbabshirani, Mohammad R.; Calhoun, Vince D.

    2014-01-01

    The growth of data sharing initiatives for neuroimaging and genomics represents an exciting opportunity to confront the “small N” problem that plagues contemporary neuroimaging studies while further understanding the role genetic markers play in the function of the brain. When it is possible, open data sharing provides the most benefits. However, some data cannot be shared at all due to privacy concerns and/or risk of re-identification. Sharing other data sets is hampered by the proliferation of complex data use agreements (DUAs) which preclude truly automated data mining. These DUAs arise because of concerns about the privacy and confidentiality for subjects; though many do permit direct access to data, they often require a cumbersome approval process that can take months. An alternative approach is to only share data derivatives such as statistical summaries—the challenges here are to reformulate computational methods to quantify the privacy risks associated with sharing the results of those computations. For example, a derived map of gray matter is often as identifiable as a fingerprint. Thus alternative approaches to accessing data are needed. This paper reviews the relevant literature on differential privacy, a framework for measuring and tracking privacy loss in these settings, and demonstrates the feasibility of using this framework to calculate statistics on data distributed at many sites while still providing privacy. PMID:24778614

  16. Healthcare.gov shares personal data with third parties

    Directory of Open Access Journals (Sweden)

    Robbins RA

    2015-01-01

    Full Text Available No abstract available. Article truncated after 150 words. According to the Associated Press, the Centers for Medicare and Medicaid's (CMS) website, HealthCare.gov, has been sending consumers’ personal data to private companies that specialize in advertising and analyzing Internet data for performance and marketing (1). What information is being disclosed was not immediately clear, but it could include age, income, ZIP code, and smoking status. It could also include a computer’s Internet address, which can identify a person’s name or address when combined with other information collected by sophisticated online marketing or advertising firms. “We deploy tools on the window shopping application that collect basic information to optimize and assess system performance,” said CMS’s Aaron Albright in a statement. “We believe that the use of these tools are common and represent best practices for a typical e-commerce site.” There is no evidence that personal information has been misused. But connections to dozens of third-party tech firms were documented by ...

  17. From Data-Sharing to Model-Sharing: SCEC and the Development of Earthquake System Science (Invited)

    Science.gov (United States)

    Jordan, T. H.

    2009-12-01

    Earthquake system science seeks to construct system-level models of earthquake phenomena and use them to predict emergent seismic behavior—an ambitious enterprise that requires a high degree of interdisciplinary, multi-institutional collaboration. This presentation will explore model-sharing structures that have been successful in promoting earthquake system science within the Southern California Earthquake Center (SCEC). These include disciplinary working groups to aggregate data into community models; numerical-simulation working groups to investigate system-specific phenomena (process modeling) and further improve the data models (inverse modeling); and interdisciplinary working groups to synthesize predictive system-level models. SCEC has developed a cyberinfrastructure, called the Community Modeling Environment, that can distribute the community models; manage large suites of numerical simulations; vertically integrate the hardware, software, and wetware needed for system-level modeling; and promote the interactions among working groups needed for model validation and refinement. Various socio-scientific structures contribute to successful model-sharing. Two of the most important are “communities of trust” and collaborations between government and academic scientists on mission-oriented objectives. The latter include improvements of earthquake forecasts and seismic hazard models and the use of earthquake scenarios in promoting public awareness and disaster management.

  18. Storing, Browsing, Querying, and Sharing Data: the THREDDS Data Repository (TDR)

    Science.gov (United States)

    Wilson, A.; Lindholm, D.; Baltzer, T.

    2005-12-01

    The Unidata Internet Data Distribution (IDD) network delivers gigabytes of data per day in near real time to sites across the U.S. and beyond. The THREDDS Data Server (TDS) supports public browsing of metadata and data access via OPeNDAP-enabled URLs for datasets such as these. With such large quantities of data, sites generally employ a simple data management policy, keeping the data for a relatively short term on the order of hours to perhaps a week or two. In order to save interesting data in longer-term storage and make it available for sharing, a user must move the data herself. In this case the user is responsible for determining where space is available, executing the data movement, generating any desired metadata, and setting access control to enable sharing. This task sequence is generally based on execution of a sequence of low-level, operating-system-specific commands with significant user involvement. The LEAD (Linked Environments for Atmospheric Discovery) project is building a cyberinfrastructure to support research and education in mesoscale meteorology. LEAD orchestrations require large, robust, and reliable storage with speedy access to stage data and store both intermediate and final results. These requirements suggest storage solutions that involve distributed storage, replication, and interfacing to archival storage systems such as mass storage systems and tape or removable disks. LEAD requirements also include metadata generation and access in order to support querying. In support of both THREDDS and LEAD requirements, Unidata is designing and prototyping the THREDDS Data Repository (TDR), a framework for a modular data repository to support distributed data storage and retrieval using a variety of back-end storage media and interchangeable software components. The TDR interface will provide high-level abstractions for long-term storage, controlled, fast and reliable access, and data movement capabilities via a variety of technologies such as

  19. MareData, an initiative for the flow of shared data

    Directory of Open Access Journals (Sweden)

    Remedios MELERO-MELERO

    2018-01-01

    Full Text Available Research data (or scientific data) arouse great interest in their potential use and reuse, not only by the R & D sector but also by other actors such as industry or service companies that can use them for innovation of new products or for creation of new jobs. From the point of view of research, sharing and accessing the data generated during the research activity entails multiple benefits at institutional and individual level. Institutions provide transparency in the processes of obtaining or generating data. Researchers promote collaboration between interdisciplinary groups, and avoid duplication. For society, making data available enhances confidence in the science system and is an exercise in transparency, accountability and responsibility in the use of investment in science. Multiple forums point to the value of data and the need for collaboration among all stakeholders, since science forms a particularly complex socio-technological infrastructure, involving both the public and private sectors. The partners of the Maredata network come from seven working groups (CSIC-IATA, CSIC-INGENIO, UA, UB, UC3M, UOC, UPV) that have research lines related to the management of research data: interoperability, access, preservation, and metrics. The overall objective of this subject-oriented network is to coordinate the action of these groups and to consolidate in Spain the different actors interested in research data as the fundamental basis of open science, a responsible, transparent and accessible science. By consolidating the collaboration between the working groups, it will be possible to address research data issues in a multidimensional manner, and promote cross-cutting synergies that will enable the industrial sector, the infomedia sector and Spanish society in general to be reached.

  20. Who shares? Who doesn't? Factors associated with openly archiving raw research data.

    Science.gov (United States)

    Piwowar, Heather A

    2011-01-01

    Many initiatives encourage investigators to share their raw datasets in hopes of increasing research efficiency and quality. Despite these investments of time and money, we do not have a firm grasp of who openly shares raw research data, who doesn't, and which initiatives are correlated with high rates of data sharing. In this analysis I use bibliometric methods to identify patterns in the frequency with which investigators openly archive their raw gene expression microarray datasets after study publication. Automated methods identified 11,603 articles published between 2000 and 2009 that describe the creation of gene expression microarray data. Associated datasets in best-practice repositories were found for 25% of these articles, increasing from less than 5% in 2001 to 30%-35% in 2007-2009. Accounting for sensitivity of the automated methods, approximately 45% of recent gene expression studies made their data publicly available. First-order factor analysis on 124 diverse bibliometric attributes of the data creation articles revealed 15 factors describing authorship, funding, institution, publication, and domain environments. In multivariate regression, authors were most likely to share data if they had prior experience sharing or reusing data, if their study was published in an open access journal or a journal with a relatively strong data sharing policy, or if the study was funded by a large number of NIH grants. Authors of studies on cancer and human subjects were least likely to make their datasets available. These results suggest research data sharing levels are still low and increasing only slowly, and data is least available in areas where it could make the biggest impact. Let's learn from those with high rates of sharing to embrace the full potential of our research output.

  1. Who shares? Who doesn't? Factors associated with openly archiving raw research data.

    Directory of Open Access Journals (Sweden)

    Heather A Piwowar

    Full Text Available Many initiatives encourage investigators to share their raw datasets in hopes of increasing research efficiency and quality. Despite these investments of time and money, we do not have a firm grasp of who openly shares raw research data, who doesn't, and which initiatives are correlated with high rates of data sharing. In this analysis I use bibliometric methods to identify patterns in the frequency with which investigators openly archive their raw gene expression microarray datasets after study publication. Automated methods identified 11,603 articles published between 2000 and 2009 that describe the creation of gene expression microarray data. Associated datasets in best-practice repositories were found for 25% of these articles, increasing from less than 5% in 2001 to 30%-35% in 2007-2009. Accounting for sensitivity of the automated methods, approximately 45% of recent gene expression studies made their data publicly available. First-order factor analysis on 124 diverse bibliometric attributes of the data creation articles revealed 15 factors describing authorship, funding, institution, publication, and domain environments. In multivariate regression, authors were most likely to share data if they had prior experience sharing or reusing data, if their study was published in an open access journal or a journal with a relatively strong data sharing policy, or if the study was funded by a large number of NIH grants. Authors of studies on cancer and human subjects were least likely to make their datasets available. These results suggest research data sharing levels are still low and increasing only slowly, and data is least available in areas where it could make the biggest impact. Let's learn from those with high rates of sharing to embrace the full potential of our research output.

  2. The Safe and Effective Use of Shared Data Underpinned by Stakeholder Engagement and Evaluation Practice.

    Science.gov (United States)

    Georgiou, Andrew; Magrabi, Farah; Hypponen, Hannele; Wong, Zoie Shui-Yee; Nykänen, Pirkko; Scott, Philip J; Ammenwerth, Elske; Rigby, Michael

    2018-04-22

    The paper draws attention to: i) key considerations involving the confidentiality, privacy, and security of shared data; and ii) the requirements needed to build collaborative arrangements encompassing all stakeholders with the goal of ensuring safe, secure, and quality use of shared data. A narrative review of existing research and policy approaches along with expert perspectives drawn from the International Medical Informatics Association (IMIA) Working Group on Technology Assessment and Quality Development in Health Care and the European Federation for Medical Informatics (EFMI) Working Group for Assessment of Health Information Systems. The technological ability to merge, link, re-use, and exchange data has outpaced the establishment of policies, procedures, and processes to monitor the ethics and legality of shared use of data. Questions remain about how to guarantee the security of shared data, and how to establish and maintain public trust across large-scale shared data enterprises. This paper identifies the importance of data governance frameworks (incorporating engagement with all stakeholders) to underpin the management of the ethics and legality of shared data use. The paper also provides some key considerations for the establishment of national approaches and measures to monitor compliance with best practice. Data sharing endeavours can help to underpin new collaborative models of health care which provide shared information, engagement, and accountability amongst all stakeholders. We believe that commitment to rigorous evaluation and stakeholder engagement will be critical to delivering health data benefits and the establishment of collaborative models of health care into the future. Georg Thieme Verlag KG Stuttgart.

  3. Meta-analysis of randomized clinical trials in the era of individual patient data sharing.

    Science.gov (United States)

    Kawahara, Takuya; Fukuda, Musashi; Oba, Koji; Sakamoto, Junichi; Buyse, Marc

    2018-06-01

    Individual patient data (IPD) meta-analysis is considered to be a gold standard when the results of several randomized trials are combined. Recent initiatives on sharing IPD from clinical trials offer unprecedented opportunities for using such data in IPD meta-analyses. First, we discuss the evidence generated and the benefits obtained by a long-established prospective IPD meta-analysis in early breast cancer. Next, we discuss a data-sharing system that has been adopted by several pharmaceutical sponsors. We review a number of retrospective IPD meta-analyses that have already been proposed using this data-sharing system. Finally, we discuss the role of data sharing in IPD meta-analysis in the future. Treatment effects can be more reliably estimated in both types of IPD meta-analyses than with summary statistics extracted from published papers. Specifically, with rich covariate information available on each patient, prognostic and predictive factors can be identified or confirmed. Also, when several endpoints are available, surrogate endpoints can be assessed statistically. Although there are difficulties in conducting, analyzing, and interpreting retrospective IPD meta-analysis utilizing the currently available data-sharing systems, data sharing will play an important role in IPD meta-analysis in the future.

  4. Building a DBMS on top of the JuxMem Grid Data-Sharing Service

    OpenAIRE

    Almousa Almaksour , Abdullah; Antoniu , Gabriel; Bougé , Luc; Cudennec , Loïc; Gançarski , Stéphane

    2007-01-01

    Held in conjunction with Parallel Architectures and Compilation Techniques 2007 (PACT2007); International audience; We claim that building a distributed DBMS on top of a general-purpose grid data-sharing service is a natural extension of previous approaches based on the distributed shared memory paradigm. The approach we propose consists in providing the DBMS with a transparent, persistent and fault-tolerant access to the stored data, within an unstable, volatile and dynamic environment. The D...

  5. Public and Biobank Participant Attitudes toward Genetic Research Participation and Data Sharing

    OpenAIRE

    Lemke, A.A.; Wolf, W.A.; Hebert-Beirne, J.; Smith, M.E.

    2010-01-01

    Research assessing attitudes toward consent processes for high-throughput genome-wide technologies and widespread sharing of data is limited. In order to develop a better understanding of stakeholder views toward these issues, this cross-sectional study assessed public and biorepository participant attitudes toward research participation and sharing of genetic research data. Forty-nine individuals participated in 6 focus groups; 28 in 3 public focus groups and 21 in 3 NUgene biorepository pa...

  6. Involving Research Stakeholders in Developing Policy on Sharing Public Health Research Data in Kenya

    Science.gov (United States)

    Jao, Irene; Kombe, Francis; Mwalukore, Salim; Bull, Susan; Parker, Michael; Kamuya, Dorcas; Molyneux, Sassy

    2015-01-01

    Increased global sharing of public health research data has potential to advance scientific progress but may present challenges to the interests of research stakeholders, particularly in low-to-middle income countries. Policies for data sharing should be responsive to public views, but there is little evidence of the systematic study of these from low-income countries. This qualitative study explored views on fair data-sharing processes among 60 stakeholders in Kenya with varying research experience, using a deliberative approach. Stakeholders’ attitudes were informed by perceptions of benefit and concerns for research data sharing, including risks of stigmatization, loss of privacy, and undermining scientific careers and validity, reported in detail elsewhere. In this article, we discuss institutional trust-building processes seen as central to perceptions of fairness in sharing research data in this setting, including forms of community involvement, individual prior awareness and agreement to data sharing, independence and accountability of governance mechanisms, and operating under a national framework. PMID:26297748

  7. An Empirical Study on Android for Saving Non-shared Data on Public Storage

    OpenAIRE

    Liu, Xiangyu; Zhou, Zhe; Diao, Wenrui; Li, Zhou; Zhang, Kehuan

    2014-01-01

    With millions of apps that can be downloaded from official or third-party market, Android has become one of the most popular mobile platforms today. These apps help people in all kinds of ways and thus have access to lots of users' data that in general fall into three categories: sensitive data, data to be shared with other apps, and non-sensitive data not to be shared with others. For the first and second type of data, Android has provided very good storage models: an app's private sensitive...

  8. When data sharing gets close to 100%: what human paleogenetics can teach the open science movement.

    Science.gov (United States)

    Anagnostou, Paolo; Capocasa, Marco; Milia, Nicola; Sanna, Emanuele; Battaggia, Cinzia; Luzi, Daniela; Destro Bisol, Giovanni

    2015-01-01

    This study analyzes data sharing regarding mitochondrial, Y chromosomal and autosomal polymorphisms in a total of 162 papers on ancient human DNA published between 1988 and 2013. The estimated sharing rate was not far from totality (97.6% ± 2.1%) and substantially higher than observed in other fields of genetic research (evolutionary, medical and forensic genetics). Both a questionnaire-based survey and the examination of Journals' editorial policies suggest that this high sharing rate cannot be simply explained by the need to comply with stakeholders' requests. Most data were made available through body text, but the use of primary databases increased in coincidence with the introduction of complete mitochondrial and next-generation sequencing methods. Our study highlights three important aspects. First, our results imply that researchers' awareness of the importance of openness and transparency for scientific progress may complement stakeholders' policies in achieving very high sharing rates. Second, widespread data sharing does not necessarily coincide with a prevalent use of practices which maximize data findability, accessibility, usability and preservation. A detailed look at the different ways in which data are released can be very useful to detect failures to adopt the best sharing modalities and understand how to correct them. Third and finally, the case of human paleogenetics tells us that a widespread awareness of the importance of Open Science may be important to build reliable scientific practices even in the presence of complex experimental challenges.

  9. Parallel checksumming of data chunks of a shared data object using a log-structured file system

    Science.gov (United States)

    Bent, John M.; Faibish, Sorin; Grider, Gary

    2016-09-06

    Checksum values are generated and used to verify the data integrity. A client executing in a parallel computing system stores a data chunk to a shared data object on a storage node in the parallel computing system. The client determines a checksum value for the data chunk; and provides the checksum value with the data chunk to the storage node that stores the shared object. The data chunk can be stored on the storage node with the corresponding checksum value as part of the shared object. The storage node may be part of a Parallel Log-Structured File System (PLFS), and the client may comprise, for example, a Log-Structured File System client on a compute node or burst buffer. The checksum value can be evaluated when the data chunk is read from the storage node to verify the integrity of the data that is read.
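    The store/read flow described above can be sketched roughly as follows, with a Python dict standing in for the storage node and CRC32 as a placeholder checksum (the record does not name a specific checksum algorithm):

```python
import zlib

def store_chunk(store, chunk_id, data):
    # Client side: compute the checksum and provide it together with
    # the data chunk to the storage node holding the shared object.
    store[chunk_id] = (data, zlib.crc32(data))

def read_chunk(store, chunk_id):
    # Read side: recompute the checksum to verify the integrity of
    # the stored data before returning it to the caller.
    data, checksum = store[chunk_id]
    if zlib.crc32(data) != checksum:
        raise IOError(f"chunk {chunk_id!r}: checksum mismatch")
    return data
```

    In the PLFS setting described above, each client would checksum its own chunk in parallel; the dict here merely stands in for the shared object on the storage node.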

  10. Determining firms׳ utility functions and competitive roles from data on market shares using Lotka–Volterra models

    Directory of Open Access Journals (Sweden)

    A. Marasco

    2016-06-01

    Full Text Available In this article, we include data on historical and estimated market shares of two markets. In particular, we include annual data on the market shares of the Japanese beer market (1963–2000) and biannual data on the market shares of the mobile phones market in Greece (1998–2007). In addition, we estimate monthly data on market shares for both markets. We show how this data can be used to derive firms’ utility functions and their competitive roles.
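    A two-firm Lotka–Volterra competition model of the kind used in the article can be simulated with simple Euler steps; the parameters below are illustrative, not fitted to the beer or mobile-phone data:

```python
def lotka_volterra_shares(x0, y0, r, s, a, b, dt=0.01, steps=5000):
    # Euler integration of a two-species Lotka-Volterra competition
    # model, reading x and y as the two firms' market shares.
    # r, s: growth rates; a, b: cross-competition coefficients.
    x, y = x0, y0
    for _ in range(steps):
        dx = r * x * (1.0 - x - a * y)
        dy = s * y * (1.0 - y - b * x)
        x, y = x + dt * dx, y + dt * dy
    return x, y
```

    Fitting such a model to observed share series is what lets the competition coefficients be read as the firms' competitive roles.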

  11. DeID – A Data Sharing Tool for Neuroimaging Studies

    Directory of Open Access Journals (Sweden)

    Xuebo Song

    2015-09-01

    Full Text Available Funding institutions and researchers increasingly expect that data will be shared to increase scientific integrity and provide other scientists with the opportunity to use the data with novel methods that may advance understanding in a particular field of study. In practice, sharing human subject data can be complicated because data must be de-identified prior to sharing. Moreover, integrating varied data types collected in a study can be challenging and time consuming. For example, sharing data from structural imaging studies of a complex disorder requires the integration of imaging, demographic and/or behavioral data in such a way that no subject identifiers are included in the de-identified dataset, and with new subject labels or identification values that cannot be tracked back to the original ones. We have developed a Java program that users can use to remove identifying information in neuroimaging datasets, while still maintaining the association among different data types from the same subject for further studies. This software provides a series of user interaction wizards to allow users to select data variables to be de-identified, implements functions for auditing and validation of de-identified data, and enables the user to share the de-identified data in a single compressed package through various communication protocols, such as FTPS and SFTP. DeID runs on Windows, Linux, and Mac operating systems and its open architecture allows it to be easily adapted to support a broader array of data types, with the goal of facilitating data sharing. DeID can be obtained at http://www.nitrc.org/projects/deid.
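    The relabeling behaviour described above (new subject labels that cannot be tracked back, while records from the same subject stay associated) can be sketched as follows; this is a hypothetical illustration, not DeID's actual algorithm:

```python
import secrets

def deidentify(records, id_field="subject_id"):
    """Replace subject identifiers with fresh random labels.

    The old->new mapping is drawn from a cryptographic RNG and is not
    returned, so released records cannot be linked back to the original
    identifiers, while records from the same subject share one new label.
    """
    mapping = {}
    released = []
    for rec in records:
        old = rec[id_field]
        if old not in mapping:
            mapping[old] = f"sub-{secrets.token_hex(4)}"
        new_rec = dict(rec)           # leave the input records untouched
        new_rec[id_field] = mapping[old]
        released.append(new_rec)
    return released
```

    Discarding the mapping is what makes the relabeling one-way; a real tool would also audit the remaining fields for indirect identifiers, as the abstract notes.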

  12. Efficient Attribute-Based Secure Data Sharing with Hidden Policies and Traceability in Mobile Health Networks

    Directory of Open Access Journals (Sweden)

    Changhee Hahn

    2016-01-01

    Full Text Available Mobile health (also written as mHealth) provisions the practice of public health supported by mobile devices. mHealth systems let patients and healthcare providers collect and share sensitive information, such as electronic and personal health records (EHRs), at any time, allowing more rapid convergence to optimal treatment. Key to achieving this is securely sharing data by providing enhanced access control and reliability. Typically, such sharing follows policies that depend on patient and physician preferences defined by a set of attributes. In mHealth systems, not only the data but also the policies for sharing it may be sensitive, since they directly contain sensitive information that can reveal the underlying data protected by the policy. Also, since such policies usually incur linearly increasing communication costs, they are ill-suited to resource-constrained environments. Lastly, access privileges may be publicly known to users, so a malicious user could illegally share his access privileges without the risk of being traced. In this paper, we propose an efficient attribute-based secure data sharing scheme in mHealth. The proposed scheme guarantees hidden policies, constant-sized ciphertexts, and traceability, supported by security analyses. The computation cost to the user is reduced by delegating approximately 50% of the decryption operations to the more powerful storage systems.
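    At its simplest, an attribute-based sharing policy is a predicate over a user's attribute set. The sketch below shows only the plaintext logic of such a check, with invented attribute names; the scheme in the abstract enforces this check cryptographically and additionally keeps the policy itself hidden, which a plain comparison like this cannot do.

```python
def policy_satisfied(user_attributes, policy):
    """Check whether a user's attributes satisfy an access policy.

    Illustrative only: the policy here is a plain AND-of-attributes
    set. Real attribute-based encryption embeds this predicate in the
    ciphertext so that no server ever sees the policy or attributes.
    """
    return policy <= set(user_attributes)

# Hypothetical policy: only cardiologists at hospital A may decrypt.
policy = {"cardiologist", "hospital:A"}

ok = policy_satisfied({"cardiologist", "hospital:A", "senior"}, policy)
denied = policy_satisfied({"nurse", "hospital:A"}, policy)
```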

  13. Data Use for School Improvement: Knowledge Sharing and Knowledge Brokerage in Network Structures

    NARCIS (Netherlands)

    Hubers, Mireille Desirée; Moolenaar, Nienke; Schildkamp, Kim; Handelzalts, Adam; Pieters, Julius Marie; Daly, Alan J.

    2015-01-01

    Data teams are used in Dutch secondary education to support schools in data use for school improvement. Such teams are likely to be most effective when knowledge is shared between the data team members and brokered throughout the school. Social network structures may play an important role in this.

  14. ISA-TAB-Nano: A Specification for Sharing Nanomaterial Research Data in Spreadsheet-based Format

    Science.gov (United States)

    2013-01-01

    Background and motivation: The high-throughput genomics communities have been successfully using standardized spreadsheet-based formats to capture and share data within labs and among public repositories. The nanomedicine community has yet to adopt similar standards to share the diverse and multi-dimensional types of data (including metadata) pertaining to the description and characterization of nanomaterials. Owing to the lack of standardization in representing and sharing nanomaterial data, most of the data currently shared via publications and data resources are incomplete, poorly integrated, and not suitable for meaningful interpretation and re-use. Specifically, in its current state, the data cannot be effectively utilized for the development of predictive models that will inform the rational design of nanomaterials. Results: We have developed a specification called ISA-TAB-Nano, which comprises four spreadsheet-based file formats for representing and integrating various types of nanomaterial data. Three file formats (Investigation, Study, and Assay files) have been adapted from the established ISA-TAB specification, while the Material file format was developed de novo to more readily describe the complexity of nanomaterials and associated small molecules. In this paper, we discuss the main features of each file format and how to use them for sharing nanomaterial descriptions and assay metadata. Conclusion: The ISA-TAB-Nano file formats provide a general and flexible framework to record and integrate nanomaterial descriptions, assay data (metadata and endpoint measurements) and protocol information. Like ISA-TAB, ISA-TAB-Nano supports the use of ontology terms to promote standardized descriptions and to facilitate search and integration of the data. The ISA-TAB-Nano specification has been submitted as an ASTM work item to obtain community feedback and to provide a nanotechnology data-sharing standard for public development and adoption.

  15. Construction and Application of a National Data-Sharing Service Network of Material Environmental Corrosion

    Directory of Open Access Journals (Sweden)

    Xiaogang Li

    2007-12-01

    Full Text Available This article discusses the key features of a newly developed national data-sharing online network for material environmental corrosion. Written in Java and based on Oracle database technology, the central database in the network is supported by two unique series of corrosion failure data, both accumulated over a long period of time. The first category of data, provided by national environment corrosion test sites, is corrosion failure data for different materials in typical environments (atmosphere, seawater, and soil). The other category is corrosion data in production environments, provided by a variety of firms. This network system enables standardized management of environmental corrosion data, an effective data sharing process, and research and development support for new products and after-sale services. Moreover, this network system provides a firm basis and data-service platform for the evaluation of project bids, safety, and service life. This article also discusses issues including data quality management and evaluation in the material corrosion data sharing process, access authority of different users, compensation for providers of shared historical data, and the related policy and legal processes required to protect the intellectual property rights of the database.

  16. Towards Blockchain-based Auditable Storage and Sharing of IoT Data

    OpenAIRE

    Shafagh , Hossein; Hithnawi , Anwar; Duquennoy , Simon

    2017-01-01

    International audience; Today the cloud plays a central role in storing, processing, and distributing data. Despite contributing to the rapid development of various applications, including the IoT, the current centralized storage architecture has led to a myriad of isolated data silos and is preventing the full potential of holistic data-driven analytics for IoT data. In this abstract, we advocate a data-centric design for IoT with a focus on resilience, sharing, and auditable protection of ...

  17. Sharing Electrophysiological Data and Metadata on HBP Platforms – An Example Collaboratory Workflow

    OpenAIRE

    Sprenger, Julia; Yegenoglu, Alper; Grün, Sonja; Denker, Michael

    2017-01-01

    Introduction: The Human Brain Project (HBP) [1] aims at creating and operating a European scientific research infrastructure for the neurosciences. A main goal is to gather, organise and disseminate data describing the brain and its diseases on the basis of experimental as well as simulated data. Therefore a lot of effort is put into the development of tools for data registration, storage, access and sharing. The most prominent data type available through the HBP to date are anatomical data and dat...

  18. Determining firms' utility functions and competitive roles from data on market shares using Lotka-Volterra models

    NARCIS (Netherlands)

    A. Marasco; A. Picucci; A. Romano (Alessandro)

    2016-01-01

    textabstractIn this article, we include data on historical and estimated market shares of two markets. In particular, we include annual data on the market shares of the Japanese beer market (1963-2000) and biannual data on the market shares of the mobile phones market in Greece (1998-2007). In

  19. A secure and efficient audit mechanism for dynamic shared data in cloud storage.

    Science.gov (United States)

    Kwon, Ohmin; Koo, Dongyoung; Shin, Yongjoo; Yoon, Hyunsoo

    2014-01-01

    With the popularization of cloud services, multiple users easily share and update their data through cloud storage. To ensure data integrity and consistency in cloud storage, audit mechanisms have been proposed. However, existing approaches have some security vulnerabilities and incur substantial computational overhead. This paper proposes a secure and efficient audit mechanism for dynamic shared data in cloud storage. The proposed scheme prevents a malicious cloud service provider from deceiving an auditor. Moreover, it devises a new index table management method and reduces the auditing cost by employing less complex operations. We prove resistance against several attacks and show lower computation cost and shorter auditing time compared with conventional approaches. The results show that the proposed scheme is secure and efficient for cloud storage services managing dynamic shared data.

  20. From Rosalind Franklin to Barack Obama: Data Sharing Challenges and Solutions in Genomics and Personalised Medicine.

    Science.gov (United States)

    Lawler, Mark; Maughan, Tim

    2017-04-01

    The collection, storage and use of genomic and clinical data from patients and healthy individuals is a key component of personalised medicine enterprises such as the Precision Medicine Initiative, the Cancer Moonshot and the 100,000 Genomes Project. In order to maximise the value of this data, it is important to embed a culture within the scientific, medical and patient communities that supports the appropriate sharing of genomic and clinical information. However, this aspiration raises a number of ethical, legal and regulatory challenges that need to be addressed. The Global Alliance for Genomics and Health, a worldwide coalition of researchers, healthcare professionals, patients and industry partners, is developing innovative solutions to support the responsible and effective sharing of genomic and clinical data. This article identifies the challenges that a data sharing culture poses and highlights a series of practical solutions that will benefit patients, researchers and society.

  1. A Secure and Efficient Audit Mechanism for Dynamic Shared Data in Cloud Storage

    Science.gov (United States)

    2014-01-01

    With the popularization of cloud services, multiple users easily share and update their data through cloud storage. To ensure data integrity and consistency in cloud storage, audit mechanisms have been proposed. However, existing approaches have some security vulnerabilities and incur substantial computational overhead. This paper proposes a secure and efficient audit mechanism for dynamic shared data in cloud storage. The proposed scheme prevents a malicious cloud service provider from deceiving an auditor. Moreover, it devises a new index table management method and reduces the auditing cost by employing less complex operations. We prove resistance against several attacks and show lower computation cost and shorter auditing time compared with conventional approaches. The results show that the proposed scheme is secure and efficient for cloud storage services managing dynamic shared data. PMID:24959630

  2. Data-Sharing Method for Multi-Smart Devices at Close Range

    Directory of Open Access Journals (Sweden)

    Myoungbeom Chung

    2015-01-01

    Full Text Available We propose a data-sharing method for multiple smart devices at close range using inaudible frequencies and Wi-Fi. Existing near-field data-sharing methods mostly use Bluetooth technology, but these methods cannot interoperate across different operating systems. To correct this flaw, the proposed method uses inaudible frequencies emitted through the built-in speaker and microphone of a smart device. Using the proposed method, the sending device generates trigger signals composed of inaudible sound, and smart devices that receive the signals obtain the shared data from the sending device through Wi-Fi. To evaluate the efficacy of the proposed method, we developed a near data-sharing application based on the trigger signals and conducted a performance evaluation experiment. The success rate of the proposed method was 98.8%. Furthermore, we compared the usability of the proposed method with that of the Bump application and found the proposed method more useful. Therefore, the proposed method is an effective approach for sharing data among multiple smart devices at close range.
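    Generating a near-ultrasonic trigger tone of the kind this abstract describes can be sketched as below. The abstract does not state the exact frequency or duration used, so the 19 kHz / 0.1 s parameters are assumptions; 19 kHz is inaudible to most adults yet still reproducible by consumer speakers and microphones.

```python
import math

def make_trigger(freq_hz=19000, sample_rate=44100, duration_s=0.1):
    """Generate normalized PCM samples for a near-ultrasonic tone.

    Hypothetical parameters: the paper's actual signal design (e.g.
    frequency coding of the trigger) is not given in the abstract.
    """
    n = int(sample_rate * duration_s)
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate)
            for t in range(n)]

samples = make_trigger()  # 4410 samples in [-1.0, 1.0]
```

    A receiving device would run a matching detector (e.g. a Goertzel filter at the trigger frequency) on its microphone input, then fetch the actual payload over Wi-Fi.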

  3. Network computing infrastructure to share tools and data in global nuclear energy partnership

    International Nuclear Information System (INIS)

    Kim, Guehee; Suzuki, Yoshio; Teshima, Naoya

    2010-01-01

    CCSE/JAEA (Center for Computational Science and e-Systems/Japan Atomic Energy Agency) integrated a prototype system of a network computing infrastructure for sharing tools and data to support the U.S.-Japan collaboration in GNEP (Global Nuclear Energy Partnership). We focused on three technical issues in applying our information infrastructure: accessibility, security, and usability. In designing the prototype system, we integrated and improved both network and Web technologies. For accessibility, we adopted SSL-VPN (Secure Sockets Layer virtual private network) technology for access beyond firewalls. For security, we developed an authentication gateway based on the PKI (Public Key Infrastructure) authentication mechanism. We also set a fine-grained access control policy on shared tools and data and used a shared-key encryption method to protect tools and data against leakage to third parties. For usability, we chose Web browsers as the user interface and developed a Web application providing functions to support the sharing of tools and data. Using the WebDAV (Web-based Distributed Authoring and Versioning) function, users can manipulate shared tools and data through a Windows-like folder environment. We implemented the prototype system on the Grid infrastructure for atomic energy research, AEGIS (Atomic Energy Grid Infrastructure), developed by CCSE/JAEA. The prototype system was applied for trial use in the first period of GNEP. (author)
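    The WebDAV folder view mentioned in this record rests on methods such as PROPFIND, which returns a `multistatus` XML document (RFC 4918). A minimal offline sketch, with an invented file name and a hand-written sample response rather than a live server, shows how a client could extract the display names of shared tools:

```python
import xml.etree.ElementTree as ET

# Request body a WebDAV client would send with the PROPFIND method,
# asking only for each resource's display name.
PROPFIND_BODY = """<?xml version="1.0"?>
<propfind xmlns="DAV:"><prop><displayname/></prop></propfind>"""

# Hand-written example of the multistatus response (RFC 4918) a server
# might return for a shared-tools folder; the file name is invented.
sample_response = """<?xml version="1.0"?>
<multistatus xmlns="DAV:">
  <response>
    <href>/shared/tools/burnup-calc.jar</href>
    <propstat><prop><displayname>burnup-calc.jar</displayname></prop></propstat>
  </response>
</multistatus>"""

def list_shared_names(multistatus_xml):
    """Parse a DAV: multistatus document and collect display names."""
    ns = {"d": "DAV:"}
    root = ET.fromstring(multistatus_xml)
    return [e.text for e in root.findall(".//d:displayname", ns)]

names = list_shared_names(sample_response)
```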

  4. Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking

    DEFF Research Database (Denmark)

    Wang, Mingxun; Carver, Jeremy J.; Pevzner, Pavel

    2016-01-01

    are well-suited to high-throughput characterization of NP, there is a pressing need for an infrastructure to enable sharing and curation of data. We present Global Natural Products Social Molecular Networking (GNPS; http://gnps.ucsd.edu), an open-access knowledge base for community-wide organization...... and sharing of raw, processed or identified tandem mass (MS/MS) spectrometry data. In GNPS, crowdsourced curation of freely available community-wide reference MS libraries will underpin improved annotations. Data-driven social-networking should facilitate identification of spectra and foster collaborations...

  5. Neuroinformatics Software Applications Supporting Electronic Data Capture, Management, and Sharing for the Neuroimaging Community.

    Science.gov (United States)

    Nichols, B Nolan; Pohl, Kilian M

    2015-09-01

    Accelerating insight into the relation between brain and behavior entails conducting small and large-scale research endeavors that lead to reproducible results. Consensus is emerging between funding agencies, publishers, and the research community that data sharing is a fundamental requirement to ensure all such endeavors foster data reuse and fuel reproducible discoveries. Funding agency and publisher mandates to share data are bolstered by a growing number of data sharing efforts that demonstrate how information technologies can enable meaningful data reuse. Neuroinformatics evaluates scientific needs and develops solutions to facilitate the use of data across the cognitive and neurosciences. For example, electronic data capture and management tools designed to facilitate human neurocognitive research can decrease the setup time of studies, improve quality control, and streamline the process of harmonizing, curating, and sharing data across data repositories. In this article we outline the advantages and disadvantages of adopting software applications that support these features by reviewing the tools available and then presenting two contrasting neuroimaging study scenarios in the context of conducting a cross-sectional and a multisite longitudinal study.

  6. Legal assessment tool (LAT): an interactive tool to address privacy and data protection issues for data sharing.

    Science.gov (United States)

    Kuchinke, Wolfgang; Krauth, Christian; Bergmann, René; Karakoyun, Töresin; Woollard, Astrid; Schluender, Irene; Braasch, Benjamin; Eckert, Martin; Ohmann, Christian

    2016-07-07

    Data in the life sciences are generated and stored in many different databases at an unprecedented rate. An ever-increasing share of these data concern human health and therefore fall under legal data-protection regulations. As part of the BioMedBridges project, which created infrastructures connecting more than 10 ESFRI research infrastructures (RI), the legal and ethical prerequisites of data sharing were examined using a novel and pragmatic approach. We employed concepts from computer science to create legal requirement clusters that enable legal interoperability between databases for the areas of data protection, data security, intellectual property (IP) and security of biosample data. We analysed and extracted access rules and constraints from all data providers (databases) involved in the building of data bridges, covering many of Europe's most important databases. These requirement clusters were applied to five usage scenarios representing the data flow in different data bridges: Image bridge, Phenotype data bridge, Personalised medicine data bridge, Structural data bridge, and Biosample data bridge. A matrix was built to relate the important concepts from data protection regulations (e.g. pseudonymisation, identifiability, access control, consent management) with the results of the requirement clusters. An interactive user interface for querying the matrix for requirements necessary for compliant data sharing was created. To guide researchers through these legal requirements without the need for expert legal knowledge, an interactive tool, the Legal Assessment Tool (LAT), was developed. LAT interactively guides researchers through a selection process to characterise the types of data and databases involved, and provides suitable requirements and recommendations for concrete data access and sharing situations.
The results provided by LAT are based on an analysis of the data access and sharing conditions for different kinds of data of major databases in Europe
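    The matrix described in this record can be thought of as a lookup from (concept, scenario) pairs to lists of requirements. A toy sketch, with invented placeholder entries rather than the real BioMedBridges analysis:

```python
# Toy requirement matrix: rows are data-protection concepts, columns
# are usage scenarios (data bridges), cells hold requirements for
# compliant sharing. All entries are invented placeholders.
MATRIX = {
    ("pseudonymisation", "Personalised medicine data bridge"):
        ["assign coded IDs", "keep re-identification key with controller"],
    ("consent management", "Biosample data bridge"):
        ["verify donor consent scope", "record withdrawal requests"],
}

def requirements(concept, scenario):
    """Return the requirement list for a (concept, scenario) pair,
    falling back to a generic recommendation when no cell exists."""
    return MATRIX.get((concept, scenario), ["consult a legal expert"])

reqs = requirements("pseudonymisation",
                    "Personalised medicine data bridge")
```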

  7. Improving the analysis, storage and sharing of neuroimaging data using relational databases and distributed computing.

    Science.gov (United States)

    Hasson, Uri; Skipper, Jeremy I; Wilde, Michael J; Nusbaum, Howard C; Small, Steven L

    2008-01-15

    The increasingly complex research questions addressed by neuroimaging research impose substantial demands on computational infrastructures. These infrastructures need to support management of massive amounts of data in a way that affords rapid and precise data analysis, to allow collaborative research, and to achieve these aims securely and with minimum management overhead. Here we present an approach that overcomes many current limitations in data analysis and data sharing. This approach is based on open source database management systems that support complex data queries as an integral part of data analysis, flexible data sharing, and parallel and distributed data processing using cluster computing and Grid computing resources. We assess the strengths of these approaches as compared to current frameworks based on storage of binary or text files. We then describe in detail the implementation of such a system and provide a concrete description of how it was used to enable a complex analysis of fMRI time series data.
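    The idea of making queries an integral part of the analysis can be illustrated with any relational engine. Below is a minimal sqlite3 sketch with invented table and column names and fabricated values; the cited system's actual schema is not described in the abstract.

```python
import sqlite3

# Store per-scan metadata relationally so complex questions become
# SQL queries. Schema and data are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE scans (
    subject TEXT, task TEXT, run INTEGER, mean_activation REAL)""")
conn.executemany(
    "INSERT INTO scans VALUES (?, ?, ?, ?)",
    [("s01", "listening", 1, 0.42),
     ("s01", "listening", 2, 0.47),
     ("s02", "listening", 1, 0.39)])

# Query-as-analysis: average activation per subject for one task.
rows = conn.execute(
    """SELECT subject, AVG(mean_activation) FROM scans
       WHERE task = 'listening'
       GROUP BY subject ORDER BY subject""").fetchall()
```

    The same pattern scales to cluster or Grid back-ends: the query describes *what* slice of the data an analysis needs, and the engine handles *how* to fetch it.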

  8. Measuring and detecting errors in occupational coding: an analysis of SHARE data

    NARCIS (Netherlands)

    Belloni, M.; Brugiavini, A.; Meschi, E.; Tijdens, K.

    2016-01-01

    This article studies coding errors in occupational data, as the quality of this data is important but often neglected. In particular, we recoded open-ended questions on occupation for last and current job in the Dutch sample of the “Survey of Health, Ageing and Retirement in Europe” (SHARE) using a

  9. Perspectives on open science and scientific data sharing: An interdisciplinary workshop

    NARCIS (Netherlands)

    Destro Bisol, G.; Anagnostou, P.; Capocasa, M.; Bencivelli, S.; Cerroni, A.; Contreras, J.; Enke, N.; Fantini, B.; Greco, P.; Heeney, C.; Luzi, D.; Manghi, P.; Mascalzoni, D.; Molloy, J.; Parenti, F.; Wicherts, J.M.; Boulton, G.

    2014-01-01

    Looking at Open Science and Open Data from a broad perspective. This is the idea behind “Scientific data sharing: an interdisciplinary workshop”, an initiative designed to foster dialogue between scholars from different scientific domains which was organized by the Istituto Italiano di Antropologia

  10. Rethinking Data Sharing and Human Participant Protection in Social Science Research: Applications from the Qualitative Realm

    Directory of Open Access Journals (Sweden)

    Dessi Kirilova

    2017-09-01

    Full Text Available While data sharing is becoming increasingly common in quantitative social inquiry, qualitative data are rarely shared. One factor inhibiting data sharing is a concern about human participant protections and privacy. Protecting the confidentiality and safety of research participants is a concern for both quantitative and qualitative researchers, but it raises specific concerns within the epistemic context of qualitative research. Thus, the applicability of emerging protection models from the quantitative realm must be carefully evaluated for application to the qualitative realm. At the same time, qualitative scholars already employ a variety of strategies for human-participant protection implicitly or informally during the research process. In this practice paper, we assess available strategies for protecting human participants and how they can be deployed. We describe a spectrum of possible data management options, such as de-identification and applying access controls, including some already employed by the Qualitative Data Repository (QDR) in tandem with its pilot depositors. Throughout the discussion, we consider the tension between modifying data or restricting access to them, and retaining their analytic value. We argue that developing explicit guidelines for sharing qualitative data generated through interaction with humans will allow scholars to address privacy concerns and increase the secondary use of their data.

  11. Group Clustering Mechanism for P2P Large Scale Data Sharing Collaboration

    Institute of Scientific and Technical Information of China (English)

    DENG Qianni; LU Xinda; CHEN Li

    2005-01-01

    Research shows that P2P scientific collaboration networks will exhibit small-world topology, as do a large number of social networks for which the same pattern has been documented. In this paper we propose a topology building protocol to benefit from the small-world feature. We find that the idea of Freenet resembles the dynamic pattern of social interactions in scientific data sharing, and that the small-world characteristic of Freenet is propitious to improving file-locating performance in scientific data sharing. However, the LRU (least recently used) datastore cache replacement scheme of Freenet is not suitable for a scientific data sharing network. Based on the group locality of scientific collaboration, we propose an enhanced group clustering cache replacement scheme. Simulation shows that this scheme improves the request hit ratio dramatically while keeping the average hops per successful request comparable to LRU.
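    The LRU baseline that the authors argue against can be sketched in a few lines; their group clustering alternative would instead bias eviction toward items outside the requesting group's interests. The class below is a generic LRU sketch, not the Freenet implementation:

```python
from collections import OrderedDict

class LRUCache:
    """Plain LRU datastore cache: evict the least recently used entry
    when capacity is exceeded. Generic sketch, not Freenet's code."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def get(self, key):
        if key not in self.store:
            return None
        self.store.move_to_end(key)  # mark as most recently used
        return self.store[key]

    def put(self, key, value):
        self.store[key] = value
        self.store.move_to_end(key)
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" becomes most recently used
cache.put("c", 3)  # evicts "b", the least recently used entry
```

    A group-aware variant would keep a per-group recency order and prefer evicting entries whose group no longer issues requests, exploiting the group locality the abstract describes.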

  12. Harnessing modern web application technology to create intuitive and efficient data visualization and sharing tools

    Directory of Open Access Journals (Sweden)

    Dylan Wood

    2014-08-01

    Full Text Available Neuroscientists increasingly need to work with big data in order to derive meaningful results in their field. Collecting, organizing and analyzing this data can be a major hurdle on the road to scientific discovery. This hurdle can be lowered using the same technologies that are currently revolutionizing the way that cultural and social media sites represent and share information with their users. Web application technologies and standards such as RESTful webservices, HTML5 and high-performance in-browser JavaScript engines are being utilized to vastly improve the way that the world accesses and shares information. The neuroscience community can also benefit tremendously from these technologies. We present here a web application that allows users to explore and request the complex datasets that need to be shared among the neuroimaging community. The COINS (Collaborative Informatics and Neuroimaging Suite) Data Exchange uses web application technologies to facilitate data sharing in three phases: Exploration, Request/Communication, and Download. This paper will focus on the first phase, and how intuitive exploration of large and complex datasets is achieved using a framework that centers around asynchronous client-server communication (AJAX) and also exposes a powerful API that can be utilized by other applications to explore available data. First opened to the neuroscience community in August 2012, the Data Exchange has already provided researchers with over 2500 GB of data.

  13. Harnessing modern web application technology to create intuitive and efficient data visualization and sharing tools.

    Science.gov (United States)

    Wood, Dylan; King, Margaret; Landis, Drew; Courtney, William; Wang, Runtang; Kelly, Ross; Turner, Jessica A; Calhoun, Vince D

    2014-01-01

    Neuroscientists increasingly need to work with big data in order to derive meaningful results in their field. Collecting, organizing and analyzing this data can be a major hurdle on the road to scientific discovery. This hurdle can be lowered using the same technologies that are currently revolutionizing the way that cultural and social media sites represent and share information with their users. Web application technologies and standards such as RESTful webservices, HTML5 and high-performance in-browser JavaScript engines are being utilized to vastly improve the way that the world accesses and shares information. The neuroscience community can also benefit tremendously from these technologies. We present here a web application that allows users to explore and request the complex datasets that need to be shared among the neuroimaging community. The COINS (Collaborative Informatics and Neuroimaging Suite) Data Exchange uses web application technologies to facilitate data sharing in three phases: Exploration, Request/Communication, and Download. This paper will focus on the first phase, and how intuitive exploration of large and complex datasets is achieved using a framework that centers around asynchronous client-server communication (AJAX) and also exposes a powerful API that can be utilized by other applications to explore available data. First opened to the neuroscience community in August 2012, the Data Exchange has already provided researchers with over 2500 GB of data.

  14. Sharing data for public health research by members of an international online diabetes social network.

    Directory of Open Access Journals (Sweden)

    Elissa R Weitzman

    2011-04-01

    Full Text Available Surveillance and response to diabetes may be accelerated through engaging online diabetes social networks (SNs) in consented research. We tested the willingness of an online diabetes community to share data for public health research by providing members with a privacy-preserving social networking software application for rapid temporal-geographic surveillance of glycemic control. SN-mediated collection of cross-sectional, member-reported data from an international online diabetes SN entered into a software application we made available in a "Facebook-like" environment to enable reporting, charting and optional sharing of recent hemoglobin A1c values through a geographic display. Self-enrollment by 17% (n = 1,136) of n = 6,500 active members representing 32 countries and 50 US states. Data were current with 83.1% of most recent A1c values reported obtained within the past 90 days. Sharing was high with 81.4% of users permitting data donation to the community display. 34.1% of users also displayed their A1cs on their SN profile page. Users selecting the most permissive sharing options had a lower average A1c (6.8%) than users not sharing with the community (7.1%, p = .038). 95% of users permitted re-contact. Unadjusted aggregate A1c reported by US users closely resembled aggregate 2007-2008 NHANES estimates (respectively, 6.9% and 6.9%, p = 0.85). Success within an early adopter community demonstrates that online SNs may comprise efficient platforms for bidirectional communication with and data acquisition from disease populations. Advancing this model for cohort and translational science and for use as a complementary surveillance approach will require understanding of inherent selection and publication (sharing) biases in the data and a technology model that supports autonomy, anonymity and privacy.
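    The comparison of average A1c between sharing groups reduces to a grouped mean. The sketch below uses fabricated values chosen for illustration, not the study's data:

```python
from statistics import mean

# Fabricated member reports: each has an A1c value and a flag for
# whether the member shared it with the community display.
reports = [
    {"a1c": 6.5, "shared": True},
    {"a1c": 7.1, "shared": True},
    {"a1c": 6.8, "shared": True},
    {"a1c": 7.3, "shared": False},
    {"a1c": 6.9, "shared": False},
]

def group_mean(rows, shared):
    """Average A1c for members with the given sharing status."""
    return mean(r["a1c"] for r in rows if r["shared"] == shared)

shared_avg = group_mean(reports, True)
unshared_avg = group_mean(reports, False)
```

    The study's actual comparison additionally reports a significance test (p = .038); with real data one would pair the group means with such a test rather than compare them directly.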

  15. Sharing data for public health research by members of an international online diabetes social network.

    Science.gov (United States)

    Weitzman, Elissa R; Adida, Ben; Kelemen, Skyler; Mandl, Kenneth D

    2011-04-27

    Surveillance and response to diabetes may be accelerated through engaging online diabetes social networks (SNs) in consented research. We tested the willingness of an online diabetes community to share data for public health research by providing members with a privacy-preserving social networking software application for rapid temporal-geographic surveillance of glycemic control. SN-mediated collection of cross-sectional, member-reported data from an international online diabetes SN entered into a software application we made available in a "Facebook-like" environment to enable reporting, charting and optional sharing of recent hemoglobin A1c values through a geographic display. Self-enrollment by 17% (n = 1,136) of n = 6,500 active members representing 32 countries and 50 US states. Data were current with 83.1% of most recent A1c values reported obtained within the past 90 days. Sharing was high with 81.4% of users permitting data donation to the community display. 34.1% of users also displayed their A1cs on their SN profile page. Users selecting the most permissive sharing options had a lower average A1c (6.8%) than users not sharing with the community (7.1%, p = .038). 95% of users permitted re-contact. Unadjusted aggregate A1c reported by US users closely resembled aggregate 2007-2008 NHANES estimates (respectively, 6.9% and 6.9%, p = 0.85). Success within an early adopter community demonstrates that online SNs may comprise efficient platforms for bidirectional communication with and data acquisition from disease populations. Advancing this model for cohort and translational science and for use as a complementary surveillance approach will require understanding of inherent selection and publication (sharing) biases in the data and a technology model that supports autonomy, anonymity and privacy.

  16. Security controls in an integrated Biobank to protect privacy in data sharing: rationale and study design.

    Science.gov (United States)

    Takai-Igarashi, Takako; Kinoshita, Kengo; Nagasaki, Masao; Ogishima, Soichi; Nakamura, Naoki; Nagase, Sachiko; Nagaie, Satoshi; Saito, Tomo; Nagami, Fuji; Minegishi, Naoko; Suzuki, Yoichi; Suzuki, Kichiya; Hashizume, Hiroaki; Kuriyama, Shinichi; Hozawa, Atsushi; Yaegashi, Nobuo; Kure, Shigeo; Tamiya, Gen; Kawaguchi, Yoshio; Tanaka, Hiroshi; Yamamoto, Masayuki

    2017-07-06

    With the goal of realizing genome-based personalized healthcare, we have developed a biobank that integrates personal health, genome, and omics data along with biospecimens donated by 150,000 volunteers. Such large-scale data integration involves obvious risks of privacy violation. The research use of personal genome and health information is a topic of global discussion with regard to the protection of privacy while promoting scientific advancement. The present paper reports on our plans, current attempts, and accomplishments in addressing security problems involved in data sharing to ensure donor privacy while promoting scientific advancement. Biospecimens and data have been collected in prospective cohort studies under comprehensive agreement. The sample size of 150,000 participants was required for multiple research aims, including genome-wide screening of gene-by-environment interactions, haplotype phasing, and parametric linkage analysis. We established the Tohoku Medical Megabank (TMM) data sharing policy: a privacy protection rule that requires physical, personnel, and technological safeguards against privacy violation in the use and sharing of data. The proposed policy refers to those of NCBI and the Sanger Institute. It classifies shared data according to the strength of re-identification risk. Local committees organized by TMM evaluate re-identification risk and assign a security category to each dataset. Every dataset is stored in an assigned segment of a supercomputer in accordance with its security category. A security manager is designated to handle all security problems at individual data use locations. The proposed policy requires closed networks and IP-VPN remote connections. The mission of the biobank is to distribute biological resources most productively. This mission motivated us to collect biospecimens and health data and simultaneously analyze genome/omics data in-house.
The biobank also has the

  17. Web Syndication Approaches for Sharing Primary Data in "Small Science" Domains

    Directory of Open Access Journals (Sweden)

    Eric C Kansa

    2010-06-01

    Full Text Available In some areas of science, sophisticated web services and semantics underlie "cyberinfrastructure". However, in "small science" domains, especially in field sciences such as archaeology, conservation, and public health, datasets often resist standardization. Publishing data in the small sciences should embrace this diversity rather than attempt to corral research into "universal" (domain) standards. A growing ecosystem of increasingly powerful Web-syndication-based approaches for sharing data on the public Web can offer a viable alternative. Atom-feed-based services can be used with scientific collections to identify and create linkages across different datasets, even across disciplinary boundaries, without shared domain standards.
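
    The Atom-feed approach described above can be sketched with the standard library alone: each dataset record becomes a feed entry carrying a stable identifier and a link, with no domain schema required. The feed title, identifiers, and URLs below are invented for illustration.

```python
# Minimal sketch of exposing dataset records as an Atom feed (RFC 4287),
# using only the standard library. All names and URLs are hypothetical.
from xml.etree import ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"

def build_feed(title, feed_id, entries):
    ET.register_namespace("", ATOM)  # serialize Atom elements without a prefix
    feed = ET.Element(f"{{{ATOM}}}feed")
    ET.SubElement(feed, f"{{{ATOM}}}title").text = title
    ET.SubElement(feed, f"{{{ATOM}}}id").text = feed_id
    ET.SubElement(feed, f"{{{ATOM}}}updated").text = "2010-06-01T00:00:00Z"
    for entry_id, entry_title, link in entries:
        e = ET.SubElement(feed, f"{{{ATOM}}}entry")
        ET.SubElement(e, f"{{{ATOM}}}id").text = entry_id
        ET.SubElement(e, f"{{{ATOM}}}title").text = entry_title
        ET.SubElement(e, f"{{{ATOM}}}updated").text = "2010-06-01T00:00:00Z"
        ET.SubElement(e, f"{{{ATOM}}}link", href=link)
    return ET.tostring(feed, encoding="unicode")

xml = build_feed(
    "Excavation Unit Records",
    "urn:uuid:0f5c8e1a-example",
    [("urn:uuid:entry-1", "Unit 12, Context 3: ceramics",
      "https://example.org/units/12/3")],
)
print(xml)
```

    Because the entry payload is opaque to the feed format, heterogeneous datasets can be syndicated side by side and linked by identifier without first agreeing on a domain standard.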

  18. Ethics of sharing scientific and technological data: a heuristic for coping with complexity & uncertainty

    Directory of Open Access Journals (Sweden)

    J E Sieber

    2006-01-01

    Full Text Available Data sharing poses complex ethical questions for data management. Manifold conflicting and shifting values need to be reconciled in pursuing viable data-management policies. For example, how does one make data available in useable form to stakeholders including scientists, governments and businesses worldwide, while assuring confidentiality, satisfying one's research ethics committee, protecting intellectual property and national security, and containing costs? Increasingly, ethical problem solving requires integration of ethics with technological "know how" and empirical research on the presenting problem. Each problem is highly contextual; broad application of general ethical principles such as always practice openness, or prepare all data for sharing, may have harmful unintended consequences. Chaos theory provides a heuristic or vision for understanding and coping with complexity and uncertainty. It does not provide answers to problems of data management, but frames the issues, and provides appropriate expectations and heuristics for considering data management problems.

  19. The Data Warehouse: Keeping It Simple. MIT Shares Valuable Lessons Learned from a Successful Data Warehouse Implementation.

    Science.gov (United States)

    Thorne, Scott

    2000-01-01

    Explains why the data warehouse is important to the Massachusetts Institute of Technology community, describing its basic functions and technical design points; sharing some non-technical aspects of the school's data warehouse implementation that have proved to be important; examining the importance of proper training in a successful warehouse…

  20. Ownership as an Issue in Data and Information Sharing: a philosophically based review

    Directory of Open Access Journals (Sweden)

    Dennis Hart

    2002-11-01

    Full Text Available It has long been an aim of information management and information systems development to enable more effective and efficient data and information sharing within organisations. A commonplace assertion has been that data and information belong, or should belong, to the organisation as a whole as opposed to any individual or stakeholder within it. Nevertheless, despite the potential benefits of data and information sharing within organisations, efforts to achieve it have typically run into more difficulty than expected and have frequently been less successful than the technological capabilities would, at least prima facie, allow. This paper is based on the proposition that perceptions of ownership can have an important influence on data and information sharing behaviour, and explores philosophical theories of ownership and property with the aim of better understanding the origins of such behaviour. It is further proposed that what are here called “implicit” theories of information ownership on the part of different individuals or parties within an organisation can lead to varying perceptions as to who is the legitimate owner of particular data or information, and that this view is illuminating of the difficulties that have often been experienced in trying to achieve effective organisational data and information sharing.

  1. Adapting federated cyberinfrastructure for shared data collection facilities in structural biology.

    Science.gov (United States)

    Stokes-Rees, Ian; Levesque, Ian; Murphy, Frank V; Yang, Wei; Deacon, Ashley; Sliz, Piotr

    2012-05-01

    Early stage experimental data in structural biology is generally unmaintained and inaccessible to the public. It is increasingly believed that this data, which forms the basis for each macromolecular structure discovered by this field, must be archived and, in due course, published. Furthermore, the widespread use of shared scientific facilities such as synchrotron beamlines complicates the issue of data storage, access and movement, as does the increase of remote users. This work describes a prototype system that adapts existing federated cyberinfrastructure technology and techniques to significantly improve the operational environment for users and administrators of synchrotron data collection facilities used in structural biology. This is achieved through software from the Virtual Data Toolkit and Globus, bringing together federated users and facilities from the Stanford Synchrotron Radiation Lightsource, the Advanced Photon Source, the Open Science Grid, the SBGrid Consortium and Harvard Medical School. The performance and experience with the prototype provide a model for data management at shared scientific facilities.

  2. Sharing Data to Build a Medical Information Commons: From Bermuda to the Global Alliance.

    Science.gov (United States)

    Cook-Deegan, Robert; Ankeny, Rachel A; Maxson Jones, Kathryn

    2017-08-31

    The Human Genome Project modeled its open science ethos on nematode biology, most famously through daily release of DNA sequence data based on the 1996 Bermuda Principles. That open science philosophy persists, but daily, unfettered release of data has had to adapt to constraints occasioned by the use of data from individual people, broader use of data not only by scientists but also by clinicians and individuals, the global reach of genomic applications and diverse national privacy and research ethics laws, and the rising prominence of a diverse commercial genomics sector. The Global Alliance for Genomics and Health was established to enable the data sharing that is essential for making meaning of genomic variation. Data-sharing policies and practices will continue to evolve as researchers, health professionals, and individuals strive to construct a global medical and scientific information commons.

  3. Towards efficient data exchange and sharing for big-data driven materials science: metadata and data formats

    Science.gov (United States)

    Ghiringhelli, Luca M.; Carbogno, Christian; Levchenko, Sergey; Mohamed, Fawzi; Huhs, Georg; Lüders, Martin; Oliveira, Micael; Scheffler, Matthias

    2017-11-01

    With big-data driven materials research, the new paradigm of materials science, sharing and wide accessibility of data are becoming crucial aspects. Obviously, a prerequisite for data exchange and big-data analytics is standardization, which means using consistent and unique conventions for, e.g., units, zero base lines, and file formats. There are two main strategies to achieve this goal. One accepts the heterogeneous nature of the community, which comprises scientists from physics, chemistry, bio-physics, and materials science, by complying with the diverse ecosystem of computer codes and thus develops "converters" for the input and output files of all important codes. These converters then translate the data of each code into a standardized, code-independent format. The other strategy is to provide standardized open libraries that code developers can adopt for shaping their inputs, outputs, and restart files, directly into the same code-independent format. In this perspective paper, we present both strategies and argue that they can and should be regarded as complementary, if not even synergetic. The formats and conventions presented here were agreed upon by two teams, the Electronic Structure Library (ESL) of the European Center for Atomic and Molecular Computations (CECAM) and the NOvel MAterials Discovery (NOMAD) Laboratory, a European Centre of Excellence (CoE). A key element of this work is the definition of hierarchical metadata describing state-of-the-art electronic-structure calculations.
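
    The "converter" strategy can be sketched in a few lines: each code-specific output is mapped onto one shared, code-independent schema, normalizing keys and units. The code names, keys, units, and target schema below are invented for illustration and are not NOMAD's actual metadata format.

```python
# Sketch of per-code converters into a shared, code-independent record.
# "code_a"/"code_b", their keys, and the target schema are hypothetical.
HARTREE_TO_EV = 27.211386245988  # CODATA conversion factor

def convert_code_a(raw):
    # hypothetical code A reports total energy in Hartree under "etot"
    return {"total_energy_eV": raw["etot"] * HARTREE_TO_EV, "code": "code_a"}

def convert_code_b(raw):
    # hypothetical code B already reports eV under "TotalEnergy"
    return {"total_energy_eV": raw["TotalEnergy"], "code": "code_b"}

# The same physical result from two codes lands in one comparable format.
a = convert_code_a({"etot": -1.0})
b = convert_code_b({"TotalEnergy": -27.211386245988})
print(abs(a["total_energy_eV"] - b["total_energy_eV"]) < 1e-9)
```

    The alternative library strategy would move this normalization upstream, so codes emit the shared format directly instead of being translated after the fact.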

  4. Public and biobank participant attitudes toward genetic research participation and data sharing.

    Science.gov (United States)

    Lemke, A A; Wolf, W A; Hebert-Beirne, J; Smith, M E

    2010-01-01

    Research assessing attitudes toward consent processes for high-throughput genome-wide technologies and widespread sharing of data is limited. In order to develop a better understanding of stakeholder views toward these issues, this cross-sectional study assessed public and biorepository participant attitudes toward research participation and sharing of genetic research data. Forty-nine individuals participated in 6 focus groups; 28 in 3 public focus groups and 21 in 3 NUgene biorepository participant focus groups. In the public focus groups, 75% of participants were women, 75% had some college education or more, 46% were African-American and 29% were Hispanic. In the NUgene focus groups, 67% of participants were women, 95% had some college education or more, and the majority (76%) of participants were Caucasian. Five major themes were identified in the focus group data: (a) a wide spectrum of understanding of genetic research; (b) pros and cons of participation in genetic research; (c) influence of credibility and trust of the research institution; (d) concerns about sharing genetic research data and need for transparency in the Policy for Sharing of Data in National Institutes of Health-Supported or Conducted Genome-Wide Association Studies; (e) a need for more information and education about genetic research. In order to increase public understanding and address potential concerns about genetic research, future efforts should be aimed at involving the public in genetic research policy development and in identifying or developing appropriate educational strategies to meet the public's needs.

  5. Understanding Spatiotemporal Patterns of Biking Behavior by Analyzing Massive Bike Sharing Data in Chicago.

    Science.gov (United States)

    Zhou, Xiaolu

    2015-01-01

    The growing number of bike sharing systems (BSS) in many cities largely facilitates biking for transportation and recreation. Most recent bike sharing systems produce time- and location-specific data, which enables the study of travel behavior and mobility of each individual. However, despite a rapid growth of interest, studies on massive bike sharing data and the underlying travel patterns are still limited. Few studies have explored and visualized spatiotemporal patterns of bike sharing behavior using flow clustering, nor examined station functional profiles based on over-demand patterns. This study investigated spatiotemporal biking patterns in Chicago by analyzing massive BSS data from July to December in 2013 and 2014. The BSS in Chicago gained popularity over this period, with about 15.9% more people subscribing to the service. Specifically, we constructed a bike flow similarity graph and used the fastgreedy algorithm to detect spatial communities of biking flows. Using the proposed methods, we discovered unique travel patterns on weekdays and weekends as well as different travel trends for customers and subscribers from the noisy, massive data. In addition, we examined the temporal demands for bikes and docks using a hierarchical clustering method. Results demonstrated the modeled over-demand patterns in Chicago. This study offers better knowledge of biking flow patterns, which was difficult to obtain using traditional methods. Given the trend of increasing popularity of BSS and data openness in different cities, the methods used in this study can be extended to examine biking patterns and BSS functionality in other cities.
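
    As a toy illustration of the hierarchical-clustering step, the stdlib-only sketch below groups stations by hourly demand profiles using naive average-linkage agglomeration. The station names and demand vectors are invented; the study itself worked on massive trip records and used fastgreedy community detection for the flow graph.

```python
# Naive average-linkage agglomerative clustering of stations by their
# (hypothetical) hourly demand profiles, stopping at two clusters.
import math

profiles = {
    "Station-A": [2, 1, 30, 25, 4, 3],   # morning-peak demand
    "Station-B": [3, 2, 28, 27, 5, 2],
    "Station-C": [4, 3, 5, 6, 31, 29],   # evening-peak demand
    "Station-D": [2, 4, 6, 5, 29, 32],
}

def dist(p, q):
    return math.dist(p, q)  # Euclidean distance between demand profiles

clusters = [[name] for name in profiles]
while len(clusters) > 2:
    best = None
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            # average pairwise distance between the two clusters
            d = sum(dist(profiles[a], profiles[b])
                    for a in clusters[i] for b in clusters[j])
            d /= len(clusters[i]) * len(clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
    _, i, j = best
    clusters[i] += clusters.pop(j)  # merge the closest pair

print(sorted(sorted(c) for c in clusters))
```

    On this toy input the morning-peak and evening-peak stations separate cleanly; on real data the dendrogram cut would be chosen by inspecting demand regimes rather than fixed at two clusters.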

  6. Applying Triple-Matrix Masking for Privacy Preserving Data Collection and Sharing in HIV Studies.

    Science.gov (United States)

    Pei, Qinglin; Chen, Shigang; Xiao, Yao; Wu, Samuel S

    2016-01-01

    Many HIV research projects are plagued by high rates of missing self-reported information during data collection. Also, due to the sensitive nature of HIV research data, privacy protection is always a concern for data sharing in HIV studies. This paper applies a data masking approach, called triple-matrix masking [1], to the context of HIV research for ensuring privacy protection during the process of data collection and data sharing. Using a set of generated HIV patient data, we show step by step how the data are randomly transformed (masked) before leaving the patients' individual data collection devices (which ensures that nobody sees the actual data) and how the masked data are further transformed by a masking service provider and a data collector. We demonstrate that the masked data retain the statistical utility of the original data, yielding exactly the same inference results in the planned logistic regression on the effect of age on adherence to antiretroviral therapy and in the Cox proportional hazards model for the age effect on time to viral load suppression. Privacy-preserving data collection methods may help resolve the privacy protection issue in HIV research. The individual sensitive data can be completely hidden while the same inference results can still be obtained from the masked data, with the use of common statistical analysis methods.
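
    The key property, that masked data yield the same inference as the originals, can be demonstrated with a simplified one-matrix sketch: left-multiplying the data by a random orthogonal matrix scrambles individual records but leaves least-squares estimates unchanged. This is an illustration of the general matrix-masking idea, not the actual triple-matrix scheme of [1].

```python
# One-matrix masking sketch on synthetic data: because A is orthogonal,
# (AX)'(AX) = X'X and (AX)'(Ay) = X'y, so OLS estimates are preserved.
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Random orthogonal masking matrix from a QR decomposition.
A, _ = np.linalg.qr(rng.normal(size=(n, n)))
X_masked, y_masked = A @ X, A @ y

beta_orig, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_mask, *_ = np.linalg.lstsq(X_masked, y_masked, rcond=None)
print(np.allclose(beta_orig, beta_mask))
```

    The triple-matrix scheme extends this idea across three parties (patient device, masking service provider, data collector) so that no single party ever holds the unmasked records.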

  7. UnLynx: A Decentralized System for Privacy-Conscious Data Sharing

    Directory of Open Access Journals (Sweden)

    Froelicher David

    2017-10-01

    Full Text Available Current solutions for privacy-preserving data sharing among multiple parties either depend on a centralized authority that must be trusted and provides only weakest-link security (e.g., the entity that manages private/secret cryptographic keys), or rely on decentralized but impractical approaches (e.g., secure multi-party computation). When the data to be shared are of a sensitive nature and the number of data providers is high, these solutions are not appropriate. Therefore, we present UnLynx, a new decentralized system for efficient privacy-preserving data sharing. We consider m servers that constitute a collective authority whose goal is to verifiably compute on data sent from n data providers. UnLynx guarantees the confidentiality, unlinkability between data providers and their data, privacy of the end result and the correctness of computations by the servers. Furthermore, to support differentially private queries, UnLynx can collectively add noise under encryption. All of this is achieved through a combination of a set of new distributed and secure protocols that are based on homomorphic cryptography, verifiable shuffling and zero-knowledge proofs. UnLynx is highly parallelizable and modular by design as it enables multiple security/privacy vs. runtime tradeoffs. Our evaluation shows that UnLynx can execute a secure survey on 400,000 personal data records containing 5 encrypted attributes, distributed over 20 independent databases, for a total of 2,000,000 ciphertexts, in 24 minutes.
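
    The homomorphic building block, aggregating values without decrypting them, can be illustrated with textbook Paillier encryption in a stdlib-only sketch. UnLynx itself uses ElGamal-based collective encryption with secure parameters; the tiny fixed primes below are for illustration only and are not secure.

```python
# Toy additively homomorphic aggregation: multiplying Paillier ciphertexts
# adds their plaintexts, so a server can sum values it cannot read.
import math
import random

p, q = 2357, 2551                    # toy primes: NOT secure
n = p * q
n2 = n * n
g = n + 1
lam = math.lcm(p - 1, q - 1)         # Carmichael function of n

def L(x):
    return (x - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)  # modular inverse (Python 3.8+)

def enc(m):
    while True:
        r = random.randrange(1, n)   # fresh randomness, coprime to n
        if math.gcd(r, n) == 1:
            return pow(g, m, n2) * pow(r, n, n2) % n2

def dec(c):
    return L(pow(c, lam, n2)) * mu % n

# Three data providers encrypt their values; the aggregator multiplies the
# ciphertexts and only the final sum is ever decrypted.
values = [12, 7, 30]
agg = 1
for v in values:
    agg = agg * enc(v) % n2
print(dec(agg))
```

    In UnLynx this aggregation is additionally protected by verifiable shuffling and zero-knowledge proofs, so the servers can prove they computed the sum correctly without revealing any individual contribution.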

  8. Use FlowRepository to share your clinical data upon study publication.

    Science.gov (United States)

    Spidlen, Josef; Brinkman, Ryan R

    2018-01-01

    A fundamental tenet of scientific research is that published results, including the underlying data, should be open to independent validation and refutation. Data sharing encourages collaboration, facilitates quality and reduces redundancy in data production. Authors submitting manuscripts to several journals have already adopted the habit of sharing their underlying flow cytometry data by deposition to FlowRepository-a data repository that is jointly supported by the International Society for Advancement of Cytometry, the International Clinical Cytometry Society and the European Society for Clinical Cell Analysis. De-identification is required for publishing data from clinical studies, and we discuss ways to satisfy data sharing requirements and patient privacy requirements simultaneously. Scientific communities in the fields of microarray, proteomics, and sequencing have been benefiting from reuse and re-exploration of data in public repositories for over a decade. We believe it is time that clinicians follow suit and that de-identified clinical data also become routinely available along with published cytometry-based findings. © 2016 International Clinical Cytometry Society.

  9. Practicing what we preach: developing a data sharing policy for the Journal of the Medical Library Association.

    Science.gov (United States)

    Read, Kevin B; Amos, Liz; Federer, Lisa M; Logan, Ayaba; Plutchak, T Scott; Akers, Katherine G

    2018-04-01

    Providing access to the data underlying research results in published literature allows others to reproduce those results or analyze the data in new ways. Health sciences librarians and information professionals have long been advocates of data sharing. It is time for us to practice what we preach and share the data associated with our published research. This editorial describes the activity of a working group charged with developing a research data sharing policy for the Journal of the Medical Library Association.

  10. Challenges in Archiving and Sharing Video Data: Considering Moral, Pragmatic, and Substantial Arguments

    Directory of Open Access Journals (Sweden)

    Terhi Kirsi Korkiakangas

    2014-05-01

    Full Text Available Social science researchers are facing new challenges in data archiving and sharing. The challenges encountered for video data are different from those encountered for other types of qualitative data. I will consider these challenges with respect to the moral, pragmatic, and substantial arguments with which funding bodies justify data archiving and sharing. Throughout the article, I will draw on a recent Economic and Social Research Council funded project, “Transient Teams in the Operating Theatre,” in which our research team video recorded work activities in the operating theatre of a UK hospital, thereby dealing with highly sensitive footage. I will consider how video data, on most occasions, cannot be archived for re-use by the wider research community, but how new avenues could be developed so as to benefit from further research on such “unarchivable” datasets.

  11. Documentation in Otolaryngology. Sharing Otolaryngology research data in an open science ecosystem

    Directory of Open Access Journals (Sweden)

    Fernanda PESET

    2018-01-01

    Full Text Available Introduction and objective: This paper addresses the most significant aspects of sharing research data in otolaryngology in the context of open science as an ecosystem. Its aim is to offer a panoramic view that helps researchers manage their data as part of enriched science. Method: A bibliographic review was performed, combined with the authors' own experience in the field of research data. Results: The basic pillars for success are presented: policies, technical infrastructure, and the necessary skills. Discussion: The task of making data available should be recognized as part of the researcher's curriculum, because documenting data so that it is reusable is a highly specialized and time-consuming task. Conclusions: We are at a crucial moment to begin sharing data; it is being considered in all scientific policy scenarios, as in the EU through the European Open Science Cloud.

  12. Hierarchical data security in a Query-By-Example interface for a shared database.

    Science.gov (United States)

    Taylor, Merwyn

    2002-06-01

    Whenever a shared database resource, containing critical patient data, is created, protecting the contents of the database is a high priority goal. This goal can be achieved by developing a Query-By-Example (QBE) interface, designed to access a shared database, and embedding within the QBE a hierarchical security module that limits access to the data. The security module ensures that researchers working in one clinic do not get access to data from another clinic. The security can be based on a flexible taxonomy structure that allows ordinary users to access data from individual clinics and super users to access data from all clinics. All researchers submit queries through the same interface and the security module processes the taxonomy and user identifiers to limit access. Using this system, two different users with different access rights can submit the same query and get different results thus reducing the need to create different interfaces for different clinics and access rights.
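
    The taxonomy-based access idea, where the same query returns different rows depending on the user's position in the clinic hierarchy, can be sketched as a simple row filter. The taxonomy, clinic names, and records below are hypothetical, not the paper's actual schema.

```python
# Minimal sketch of taxonomy-based row filtering: super users query from the
# taxonomy root and see all clinics; ordinary users see only their own.
TAXONOMY = {"all": ["clinic_a", "clinic_b"], "clinic_a": [], "clinic_b": []}

def visible_clinics(scope):
    """Expand a taxonomy node into the set of leaf clinics it covers."""
    out = set()
    def walk(node):
        if not TAXONOMY[node]:
            out.add(node)          # leaf node: an actual clinic
        for child in TAXONOMY[node]:
            walk(child)
    walk(scope)
    return out

RECORDS = [
    {"patient": "p1", "clinic": "clinic_a"},
    {"patient": "p2", "clinic": "clinic_b"},
]

def query(user_scope):
    allowed = visible_clinics(user_scope)
    return [r for r in RECORDS if r["clinic"] in allowed]

print(len(query("all")), len(query("clinic_a")))
```

    Because the filter is applied inside the shared interface, two users submitting the identical query get different result sets, which is what removes the need for per-clinic interfaces.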

  13. NeuroLOG: sharing neuroimaging data using an ontology-based federated approach.

    Science.gov (United States)

    Gibaud, Bernard; Kassel, Gilles; Dojat, Michel; Batrancourt, Bénédicte; Michel, Franck; Gaignard, Alban; Montagnat, Johan

    2011-01-01

    This paper describes the design of the NeuroLOG middleware data management layer, which provides a platform to share heterogeneous and distributed neuroimaging data using a federated approach. The semantics of shared information is captured through a multi-layer application ontology and a derived Federated Schema used to align the heterogeneous database schemata from different legacy repositories. The system also provides a facility to translate the relational data into a semantic representation that can be queried using a semantic search engine, thus enabling the exploitation of knowledge embedded in the ontology. This work shows the relevance of the distributed approach for neuroscience data management. Although more complex than a centralized approach, it is also more realistic when considering the federation of large data sets, and it opens strong perspectives for implementing multi-centric neuroscience studies.

  14. Data Sharing in Interpretive Engineering Education Research: Challenges and Opportunities from a Research Quality Perspective

    Science.gov (United States)

    Walther, Joachim; Sochacka, Nicola W.; Pawley, Alice L.

    2016-01-01

    This article explores challenges and opportunities associated with sharing qualitative data in engineering education research. This exploration is theoretically informed by an existing framework of interpretive research quality with a focus on the concept of Communicative Validation. Drawing on practice anecdotes from the authors' work, the…

  15. An experimental evaluation of self-managing availability in shared data spaces

    NARCIS (Netherlands)

    Russello, G.; Chaudron, M.R.V.; Steen, van M.; Bokharouss, I.

    2007-01-01

    With its decoupling of processes in space and time, the shared data space model has proven to be a well-suited solution for developing distributed component-based systems. However, as in many distributed applications, functional and extra-functional aspects are still interwoven in components. In

  16. Legacy data sharing to improve drug safety assessment: the eTOX project

    DEFF Research Database (Denmark)

    Sanz, Ferran; Pognan, François; Steger-Hartmann, Thomas

    2017-01-01

    The sharing of legacy preclinical safety data among pharmaceutical companies and its integration with other information sources offers unprecedented opportunities to improve the early assessment of drug safety. Here, we discuss the experience of the eTOX project, which was established through...

  17. Inference for shared-frailty survival models with left-truncated data

    NARCIS (Netherlands)

    van den Berg, G.J.; Drepper, B.

    2016-01-01

    Shared-frailty survival models specify that systematic unobserved determinants of duration outcomes are identical within groups of individuals. We consider random-effects likelihood-based statistical inference if the duration data are subject to left-truncation. Such inference with left-truncated

  18. Seeing through the clouds: Processes and challenges for sharing geospatial data for disaster management in Haiti

    DEFF Research Database (Denmark)

    Clark, Nathan Edward; Guiffault, Flore

    2018-01-01

    This article examines the ways in which the production and sharing of geospatial data for disaster management purposes have evolved in Haiti, within the context of the 2010 earthquake and 2016 Hurricane Matthew. The conditions for these developments are traced through the institutional and operat...

  19. Genomic Research Data Generation, Analysis and Sharing – Challenges in the African Setting

    Directory of Open Access Journals (Sweden)

    Nicola Mulder

    2017-11-01

    Full Text Available Genomics is the study of the genetic material that constitutes the genomes of organisms. This genetic material can be sequenced, and it provides a powerful tool for the study of human, plant and animal evolutionary history and diseases. Genomics research is becoming increasingly commonplace due to significant advances in, and the falling costs of, technologies such as sequencing. This has led to new challenges, including the increasing cost and complexity of data. There is, therefore, an increasing need for computing infrastructure and skills to manage, store, analyze and interpret the data. In addition, there is a significant cost associated with recruitment of participants and collection and processing of biological samples, particularly for large human genetics studies on specific diseases. As a result, researchers are often reluctant to share the data due to the effort and associated cost. In Africa, where researchers most commonly work at the study-recruitment, phenotype-determination and biological-sample-collection end of the genomic research spectrum, rather than the generation of genomic data, data sharing without adequate safeguards for the interests of the primary data generators is a concern. There are substantial ethical considerations in the sharing of human genomics data. The broad consent for data sharing preferred by genomics researchers and funders does not necessarily align with the expectations of researchers, research participants, legal authorities and bioethicists. In Africa, this is complicated by concerns about comprehension of genomics research studies, the quality of research ethics reviews and understanding of the implications of broad consent, secondary analyses of shared data, return of results and incidental findings.
Additional challenges with genomics research in Africa include the inability to transfer, store, process and analyze large-scale genomics data on the continent, because this requires highly specialized skills

  20. Sharing and reuse of individual participant data from clinical trials: principles and recommendations.

    Science.gov (United States)

    Ohmann, Christian; Banzi, Rita; Canham, Steve; Battaglia, Serena; Matei, Mihaela; Ariyo, Christopher; Becnel, Lauren; Bierer, Barbara; Bowers, Sarion; Clivio, Luca; Dias, Monica; Druml, Christiane; Faure, Hélène; Fenner, Martin; Galvez, Jose; Ghersi, Davina; Gluud, Christian; Groves, Trish; Houston, Paul; Karam, Ghassan; Kalra, Dipak; Knowles, Rachel L; Krleža-Jerić, Karmela; Kubiak, Christine; Kuchinke, Wolfgang; Kush, Rebecca; Lukkarinen, Ari; Marques, Pedro Silverio; Newbigging, Andrew; O'Callaghan, Jennifer; Ravaud, Philippe; Schlünder, Irene; Shanahan, Daniel; Sitter, Helmut; Spalding, Dylan; Tudur-Smith, Catrin; van Reusel, Peter; van Veen, Evert-Ben; Visser, Gerben Rienk; Wilson, Julia; Demotes-Mainard, Jacques

    2017-12-14

    We examined major issues associated with sharing of individual clinical trial data and developed a consensus document on providing access to individual participant data from clinical trials, using a broad interdisciplinary approach. This was a consensus-building process among the members of a multistakeholder task force, involving a wide range of experts (researchers, patient representatives, methodologists, information technology experts, and representatives from funders, infrastructures and standards development organisations). An independent facilitator supported the process using the nominal group technique. The consensus was reached in a series of three workshops held over 1 year, supported by exchange of documents and teleconferences within focused subgroups when needed. This work was set within the Horizon 2020-funded project CORBEL (Coordinated Research Infrastructures Building Enduring Life-science Services) and coordinated by the European Clinical Research Infrastructure Network. Thus, the focus was on non-commercial trials and the perspective mainly European. We developed principles and practical recommendations on how to share data from clinical trials. The task force reached consensus on 10 principles and 50 recommendations, representing the fundamental requirements of any framework used for the sharing of clinical trials data. The document covers the following main areas: making data sharing a reality (eg, cultural change, academic incentives, funding), consent for data sharing, protection of trial participants (eg, de-identification), data standards, rights, types and management of access (eg, data request and access models), data management and repositories, discoverability, and metadata. The adoption of the recommendations in this document would help to promote and support data sharing and reuse among researchers, adequately inform trial participants and protect their rights, and provide effective and efficient systems for preparing, storing and

  1. Sustainable Regulation of Information Sharing with Electronic Data Interchange by a Trust-Embedded Contract

    Directory of Open Access Journals (Sweden)

    Guanghua Han

    2017-06-01

    Full Text Available This paper studies the risks in demand information sharing through electronic soft-orders using electronic data interchange (EDI) systems in e-commerce, and aims to suggest a sustainable regulation mechanism based on a trust-embedded contract. In a supply chain with one retailer and one supplier, the retailer solicits private forecasted demand and places soft-orders via EDI to the supplier. To ensure abundant supply, the retailer has an incentive to inflate her soft-orders, which potentially harms credible information sharing and the sustainability of business cooperation. The degree to which the supplier relies on the retailer’s order information is specified by trust, which in this study is evaluated according to the retailer’s reputation and the supplier’s intuition. Based on standard game theory, we find that both the retailer’s order and the quantity of materials the supplier prepares are independent of the retailer’s forecast. Therefore, EDI-based information sharing in e-commerce without a regulation mechanism leads to inefficient demand information sharing. Since both the supplier and the retailer are shown to face large potential profit losses due to the failure of information sharing, commerce relying on EDI-based information sharing alone is risky and unsustainable. A regulation mechanism led by the retailer is therefore proposed to establish ‘win-win’ sustainable cooperation. Numerical experiments highlight the value of trust, the impact of reputation and intuition on decisions, and the effectiveness of the regulation mechanism through a cost-sharing contract.

  2. Technical and policy approaches to balancing patient privacy and data sharing in clinical and translational research.

    Science.gov (United States)

    Malin, Bradley; Karp, David; Scheuermann, Richard H

    2010-01-01

    Clinical researchers need to share data to support scientific validation and information reuse, and to comply with a host of regulations and directives from funders. Various organizations are constructing informatics resources in the form of centralized databases to ensure the reuse of data derived from sponsored research. The widespread use of such open databases is contingent on the protection of patient privacy. We review privacy-related problems associated with data sharing for clinical research from technical and policy perspectives. We investigate existing policies for secondary data sharing and privacy requirements in the context of data derived from research and clinical settings. In particular, we focus on policies specified by the US National Institutes of Health and the Health Insurance Portability and Accountability Act, and touch on how these policies relate to current and future use of data stored in public database archives. We address aspects of data privacy and identifiability from a technical, although approachable, perspective and summarize how biomedical databanks can be exploited and seemingly anonymous records can be re-identified using various resources without hacking into secure computer systems. We highlight which clinical and translational data features, specified in emerging research models, are potentially vulnerable or exploitable. In the process, we recount a recent privacy-related concern associated with the publication of aggregate statistics from pooled genome-wide association studies, a concern that has had a significant impact on the data sharing policies of National Institutes of Health-sponsored databanks. Based on our analysis and observations, we provide a list of recommendations covering various technical, legal, and policy mechanisms that open clinical databases can adopt to strengthen data privacy protection as they move toward wider deployment and adoption.

  3. Toward a Tiered Model to Share Clinical Trial Data and Samples in Precision Oncology.

    Science.gov (United States)

    Broes, Stefanie; Lacombe, Denis; Verlinden, Michiel; Huys, Isabelle

    2018-01-01

    The recent revolution in science and technology applied to medical research has left in its wake a trail of biomedical data and human samples; however, its opportunities remain largely unfulfilled due to a number of legal, ethical, financial, strategic, and technical barriers. Precision oncology has been at the vanguard of leveraging this potential of "Big data" and samples into meaningful solutions for patients, considering the need for new drug development approaches in this area (due to high costs, late-stage failures, and the molecular diversity of cancer). To harness the potential of the vast quantities of data and samples currently fragmented across databases and biobanks, it is critical to engage all stakeholders and share data and samples across research institutes. Here, we identified two general types of sharing strategies: first, open access models, characterized by the absence of any review panel or decision maker, and second, controlled access models, where some form of control is exercised by either the donor (i.e., the patient), the data provider (i.e., the initial organization), or an independent party. Further, we theoretically describe and provide examples of nine different strategies focused on greater sharing of patient data and material. These models provide varying levels of control, access to various data and/or samples, and different types of relationship between the donor, data provider, and data requester. We propose a tiered model to share clinical data and samples that takes privacy issues into account and respects sponsors' legitimate interests. Its implementation would help maximize the value of existing datasets, enabling researchers to unravel the complexity of tumor biology, identify novel biomarkers, and better redirect treatment strategies, ultimately helping patients with cancer.

  4. Toward a Tiered Model to Share Clinical Trial Data and Samples in Precision Oncology

    Directory of Open Access Journals (Sweden)

    Stefanie Broes

    2018-01-01

    Full Text Available The recent revolution in science and technology applied to medical research has left in its wake a trail of biomedical data and human samples; however, its opportunities remain largely unfulfilled due to a number of legal, ethical, financial, strategic, and technical barriers. Precision oncology has been at the vanguard of leveraging this potential of “Big data” and samples into meaningful solutions for patients, considering the need for new drug development approaches in this area (due to high costs, late-stage failures, and the molecular diversity of cancer). To harness the potential of the vast quantities of data and samples currently fragmented across databases and biobanks, it is critical to engage all stakeholders and share data and samples across research institutes. Here, we identified two general types of sharing strategies: first, open access models, characterized by the absence of any review panel or decision maker, and second, controlled access models, where some form of control is exercised by either the donor (i.e., the patient), the data provider (i.e., the initial organization), or an independent party. Further, we theoretically describe and provide examples of nine different strategies focused on greater sharing of patient data and material. These models provide varying levels of control, access to various data and/or samples, and different types of relationship between the donor, data provider, and data requester. We propose a tiered model to share clinical data and samples that takes privacy issues into account and respects sponsors’ legitimate interests. Its implementation would help maximize the value of existing datasets, enabling researchers to unravel the complexity of tumor biology, identify novel biomarkers, and better redirect treatment strategies, ultimately helping patients with cancer.

  5. Perspectives on Open Science and scientific data sharing: an interdisciplinary workshop.

    Science.gov (United States)

    Destro Bisol, Giovanni; Anagnostou, Paolo; Capocasa, Marco; Bencivelli, Silvia; Cerroni, Andrea; Contreras, Jorge; Enke, Neela; Fantini, Bernardino; Greco, Pietro; Heeney, Catherine; Luzi, Daniela; Manghi, Paolo; Mascalzoni, Deborah; Molloy, Jennifer; Parenti, Fabio; Wicherts, Jelte; Boulton, Geoffrey

    2014-01-01

    Looking at Open Science and Open Data from a broad perspective: this is the idea behind "Scientific data sharing: an interdisciplinary workshop", an initiative designed to foster dialogue between scholars from different scientific domains, organized by the Istituto Italiano di Antropologia in Anagni, Italy, 2-4 September 2013. We here report summaries of the presentations and discussions at the meeting. They deal with four sets of issues: (i) setting a common framework, a general discussion of open data principles, values and opportunities; (ii) insights into scientific practices, a view of the way in which the open data movement is developing in a variety of scientific domains (biology, psychology, epidemiology and archaeology); (iii) a case study of human genomics, which was a trail-blazer in data sharing, and which encapsulates the tension that can occur between large-scale data sharing and one of the boundaries of openness, the protection of individual data; (iv) open science and the public, based on a round table discussion about the public communication of science and the societal implications of open science. Three proposals emerged for the planning of further interdisciplinary initiatives on open science. Firstly, there is a need to integrate top-down initiatives by governments, institutions and journals with bottom-up approaches from the scientific community. Secondly, more should be done to popularize the societal benefits of open science, not only in providing the evidence needed by citizens to draw their own conclusions on scientific issues that are of concern to them, but also in explaining the direct benefits of data sharing in areas such as the control of infectious disease. Finally, introducing arguments from the social sciences and humanities into the educational dissemination of open data may help students become more profoundly engaged with Open Science and look at science from a broader perspective.

  6. “Personas” to Support Development of Cyberinfrastructure for Scientific Data Sharing

    Directory of Open Access Journals (Sweden)

    Kevin Crowston

    2015-11-01

    Full Text Available Objective: To ensure that cyberinfrastructure for sharing scientific data is useful, system developers need to understand what scientists and other intended users do as well as the attitudes and beliefs that shape their behaviours. This paper introduces personas — detailed descriptions of an “archetypical user of a system” — as an approach for capturing and sharing knowledge about potential system users. Setting: Personas were developed to support development of the ‘DataONE’ (Data Observation Network for Earth) project, which has developed and deployed a sustainable long-term data preservation and access network to ensure the preservation of and access to multi-scale, multi-discipline, and multi-national environmental and biological science data (https://www.dataone.org/what-dataone) (Michener et al. 2012). Methods: Personas for DataONE were developed based on data from surveys and interviews done by members of DataONE working groups, along with sources such as usage scenarios for DataONE and the Data Conservancy project and the Purdue Data Curation Profiles (Witt et al. 2009). Results: A total of 11 personas were developed: five for various kinds of research scientists (e.g., at different career stages and using different types of data); a science data librarian; and five for secondary roles. Conclusion: Personas were found to be useful for helping developers and other project members to understand users and their needs. The developed DataONE personas may be useful for others trying to develop systems or programs for scientists involved in data sharing.

  7. Towards open sharing of task-based fMRI data: The OpenfMRI project

    Directory of Open Access Journals (Sweden)

    Russell A Poldrack

    2013-07-01

    Full Text Available The large-scale sharing of task-based functional neuroimaging data has the potential to allow novel insights into the organization of mental function in the brain, but the field of neuroimaging has lagged behind other areas of bioscience in the development of data sharing resources. This paper describes the OpenfMRI project (accessible online at http://www.openfmri.org), which aims to provide the neuroimaging community with a resource to support the open sharing of task-based fMRI studies. We describe the motivation behind the project, focusing particularly on how it addresses some of the well-known challenges to sharing task-based fMRI data. Results from a preliminary analysis of the current database are presented, which demonstrate the ability to classify between task contrasts with high generalization accuracy across subjects, and the ability to identify individual subjects from their activation maps with moderately high accuracy. Clustering analyses show that the similarity relations between statistical maps have a somewhat orderly relation to the mental functions engaged by the relevant tasks. These results highlight the potential of the project to support large-scale multivariate analyses of the relation between mental processes and brain function.

  8. Sharing product data of nuclear power plants across their lifecycles by utilizing a neutral model

    Energy Technology Data Exchange (ETDEWEB)

    Mun, Duhwan [WIG Craft Research Division, Maritime and Ocean Engineering Research Institute, KORDI, 171 Jang-dong, Yuseong-gu, Daejeon 305-343 (Korea, Republic of)], E-mail: dhmun@moeri.re.kr; Hwang, Jinsang [Department of Mechanical Engineering, KAIST (Korea, Republic of)], E-mail: mars@icad.kaist.ac.kr; Han, Soonhung [Department of Mechanical Engineering, KAIST (Korea, Republic of)], E-mail: shhan@kaist.ac.kr; Seki, Hiroshi [Hitachi Research Laboratory, Hitachi, Ltd. (Japan)], E-mail: hiroshi.seki.mf@hitachi.com; Yang, Jeongsam [Industrial and Information Systems Engineering, Ajou University (Korea, Republic of)], E-mail: jyang@ajou.ac.kr

    2008-02-15

    Many public and private Korean organizations are involved during the lifecycle of a domestic nuclear power plant. Korea Plant Engineering Co. (KOPEC) participates in the design stage, Korea Hydro and Nuclear Power (KHNP) operates and manages all nuclear power plants in Korea, Doosan Heavy Industries and Construction Co. manufactures the main equipment, and a construction company constructs the plant. Even though each organization has its own digital data management system and has achieved a certain level of automation, data sharing among organizations is poor. KHNP obtains drawings and technical specifications from KOPEC in the form of paper. This results in manual re-work of definitions, and errors can potentially occur in the process. In order to establish an information bridge between the design and operation and maintenance (O and M) phases, a generic product model (GPM), a data model from Hitachi, is extended for constructing a neutral data warehouse, and the Korean Nuclear Power Plant Information Sharing System (KNPISS) is implemented.

  9. Sharing product data of nuclear power plants across their lifecycles by utilizing a neutral model

    International Nuclear Information System (INIS)

    Mun, Duhwan; Hwang, Jinsang; Han, Soonhung; Seki, Hiroshi; Yang, Jeongsam

    2008-01-01

    Many public and private Korean organizations are involved during the lifecycle of a domestic nuclear power plant. Korea Plant Engineering Co. (KOPEC) participates in the design stage, Korea Hydro and Nuclear Power (KHNP) operates and manages all nuclear power plants in Korea, Doosan Heavy Industries and Construction Co. manufactures the main equipment, and a construction company constructs the plant. Even though each organization has its own digital data management system and has achieved a certain level of automation, data sharing among organizations is poor. KHNP obtains drawings and technical specifications from KOPEC in the form of paper. This results in manual re-work of definitions, and errors can potentially occur in the process. In order to establish an information bridge between the design and operation and maintenance (O and M) phases, a generic product model (GPM), a data model from Hitachi, is extended for constructing a neutral data warehouse, and the Korean Nuclear Power Plant Information Sharing System (KNPISS) is implemented.

  10. Ethics policies and ethics work in cross-national genetic research and data sharing

    DEFF Research Database (Denmark)

    Hoeyer, Klaus; Tupasela, Aaro; Rasmussen, Malene B.

    2017-01-01

    In recent years, cross-national collaboration in medical research has gained increased policy attention. Policies are developed to enhance data sharing, ensure open access, and harmonize international standards and ethics rules in order to promote access to existing resources and increase scientific output. In tandem with this promotion of data sharing, numerous ethics policies are developed to control data flows and protect privacy and confidentiality. Both sets of policy making, however, pay limited attention to the moral decisions and social ties enacted in the everyday routines of scientific work. This paper takes its point of departure in the practices of a Danish laboratory with great experience in international collaboration regarding genetic research. We focus on a simple query: what makes genetic material and health data flow, and which hopes and concerns travel along with them…

  11. Installation Mapping Enables Many Missions: The Benefits of and Barriers to Sharing Geospatial Data Assets

    Science.gov (United States)

    2007-01-01

    Sonoran Institute, and Instituto del Medio Ambiente y el Desarrollo Sustentable del Estado de Sonora, with support from the Department of Defense… Mexico, has an Environmental Data Management System (EDMS) web site for managing and mapping shared site data for environmental cleanup of UXO and… San Pedro River Basin in southeastern Arizona and northern Mexico. Many believe that the presence of large-scale groundwater pumping in the nearby…

  12. Factors affecting willingness to share electronic health data among California consumers.

    Science.gov (United States)

    Kim, Katherine K; Sankar, Pamela; Wilson, Machelle D; Haynes, Sarah C

    2017-04-04

    Robust technology infrastructure is needed to enable learning health care systems to improve quality, access, and cost. Such infrastructure relies on the trust and confidence of individuals to share their health data for healthcare and research. Few studies have addressed consumers' views on electronic data sharing, and fewer still have explored the dual purposes of healthcare and research together. The objective of the study is to explore factors that affect consumers' willingness to share electronic health information for healthcare and research. This study involved a random-digit-dial telephone survey of 800 adult Californians conducted in English and Spanish. Logistic regression was performed using backward selection to test for significant (p-value ≤ 0.05) associations of each explanatory variable with the outcome variable. The odds of consent for electronic data sharing for healthcare decreased as Likert-scale ratings for EHR impact worsened for privacy, odds ratio (OR) = 0.74, 95% CI [0.60, 0.90]; security, OR = 0.80, 95% CI [0.66, 0.98]; and quality, OR = 0.59, 95% CI [0.46, 0.75]. The odds of consent for sharing for research were greater for those who think EHRs will improve research quality, OR = 11.26, 95% CI [4.13, 30.73], and for those who value research benefit over privacy, OR = 2.72, 95% CI [1.55, 4.78], but lower for those who value control over research benefit, OR = 0.49, 95% CI [0.26, 0.94]. Consumers' choices about electronically sharing health information are affected by their attitudes toward EHRs as well as beliefs about research benefit and individual control. Design of person-centered interventions utilizing electronically collected health information, and policies regarding data sharing, should address these values of importance to people. Understanding these perspectives is critical for leveraging health data to support learning health care systems.
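    The odds ratios above are exponentiated logistic-regression coefficients: a one-step change in a predictor multiplies the odds of consent by OR = exp(β). As a reminder of the mechanics (a generic sketch, not the authors' code; the standard error used below is an invented value for illustration):

```python
import math

def odds_ratio_ci(beta, se, z=1.96):
    """Exponentiate a logistic-regression coefficient and its Wald
    95% confidence limits onto the odds-ratio scale."""
    return (math.exp(beta),
            math.exp(beta - z * se),   # lower limit
            math.exp(beta + z * se))   # upper limit

# A reported OR of 0.74 corresponds to beta = ln(0.74), roughly -0.30:
# each one-step worsening of the privacy rating multiplies the odds of
# consent by 0.74. (se = 0.10 is a made-up value for illustration.)
or_, lo, hi = odds_ratio_ci(math.log(0.74), 0.10)
```

    The same conversion applied in reverse explains why a reported OR below 1 (such as 0.49 for valuing control over research benefit) corresponds to a negative coefficient, i.e. lower odds of consent.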

  13. Data governance and data sharing agreements for community-wide health information exchange: lessons from the beacon communities.

    Science.gov (United States)

    Allen, Claudia; Des Jardins, Terrisca R; Heider, Arvela; Lyman, Kristin A; McWilliams, Lee; Rein, Alison L; Schachter, Abigail A; Singh, Ranjit; Sorondo, Barbara; Topper, Joan; Turske, Scott A

    2014-01-01

    Unprecedented efforts are underway across the United States to electronically capture and exchange health information to improve health care and population health, and reduce costs. This increased collection and sharing of electronic patient data raises several governance issues, including privacy, security, liability, and market competition. Those engaged in such efforts have had to develop data sharing agreements (DSAs) among entities involved in information exchange, many of whom are "nontraditional" health care entities and/or new partners. This paper shares lessons learned based on the experiences of six federally funded communities participating in the Beacon Community Cooperative Agreement Program, and offers guidance for navigating data governance issues and developing DSAs to facilitate community-wide health information exchange. While all entities involved in electronic data sharing must address governance issues and create DSAs accordingly, until recently little formal guidance existed for doing so - particularly for community-based initiatives. Despite this lack of guidance, together the Beacon Communities' experiences highlight promising strategies for navigating complex governance issues, which may be useful to other entities or communities initiating information exchange efforts to support delivery system transformation. For the past three years, AcademyHealth has provided technical assistance to most of the 17 Beacon Communities, 6 of whom contributed to this collaborative writing effort. Though these communities varied widely in terms of their demographics, resources, and Beacon-driven priorities, common themes emerged as they described their approaches to data governance and DSA development. The 6 Beacon Communities confirmed that DSAs are necessary to satisfy legal and market-based concerns, and they identified several specific issues, many of which have been noted by others involved in network data sharing initiatives. More importantly, these

  14. DataUp 2.0: Improving On a Tool For Helping Researchers Archive, Manage, and Share Their Tabular Data

    Science.gov (United States)

    Strasser, C.; Borda, S.; Cruse, P.; Kunze, J.

    2013-12-01

    There are many barriers to data management and sharing among earth and environmental scientists; among the most significant are a lack of knowledge about best practices for data management, metadata standards, or appropriate data repositories for archiving and sharing data. Last year we developed an open source web application, DataUp, to help researchers overcome these barriers. DataUp helps scientists to (1) determine whether their file is CSV compatible, (2) generate metadata in a standard format, (3) retrieve an identifier to facilitate data citation, and (4) deposit their data into a repository. With funding from the NSF via a supplemental grant to the DataONE project, we are working to improve upon DataUp. Our main goal for DataUp 2.0 is to ensure organizations and repositories are able to adopt and adapt DataUp to meet their unique needs, including connecting to analytical tools, adding new metadata schema, and expanding the list of connected data repositories. DataUp is a collaborative project between the California Digital Library, DataONE, the San Diego Supercomputing Center, and Microsoft Research Connections.
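    DataUp's first step, checking whether a file is CSV compatible, boils down to verifying that the content parses as delimited text with a consistent row shape. A minimal stdlib sketch of that idea (the rule used here, every row matching the header's column count, is illustrative and not DataUp's actual validation logic):

```python
import csv
import io

def is_csv_compatible(text):
    """Illustrative check: the content parses as CSV and every row
    has the same number of columns as the header row."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return False
    width = len(rows[0])
    return all(len(row) == width for row in rows)

# A well-formed table passes; a ragged row fails.
assert is_csv_compatible("site,temp_c\nA,3.2\nB,4.1\n")
assert not is_csv_compatible("site,temp_c\nA,3.2,extra\n")
```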

  15. A secure data outsourcing scheme based on Asmuth-Bloom secret sharing

    Science.gov (United States)

    Idris Muhammad, Yusuf; Kaiiali, Mustafa; Habbal, Adib; Wazan, A. S.; Sani Ilyasu, Auwal

    2016-11-01

    Data outsourcing is an emerging paradigm for data management in which a database is provided as a service by third-party service providers. One of the major benefits of offering database as a service is to provide organisations, which are unable to purchase expensive hardware and software to host their databases, with efficient data storage accessible online at a cheap rate. Despite that, several issues of data confidentiality, integrity, availability and efficient indexing of users' queries at the server side have to be addressed in the data outsourcing paradigm. Service providers have to guarantee that their clients' data are secured against internal (insider) and external attacks. This paper briefly analyses the existing indexing schemes in data outsourcing and highlights their advantages and disadvantages. Then, this paper proposes a secure data outsourcing scheme based on Asmuth-Bloom secret sharing which tries to address the issues in data outsourcing such as data confidentiality, availability and order preservation for efficient indexing.
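    For readers unfamiliar with the underlying primitive: Asmuth-Bloom is a threshold secret-sharing scheme built on the Chinese Remainder Theorem. The secret is blinded, reduced modulo a set of pairwise coprime moduli, and any k of the resulting residues suffice to reconstruct it. A toy sketch (parameters chosen for illustration only; a real deployment would use large moduli and a cryptographic random source):

```python
import random
from math import prod

def make_shares(secret, m0, moduli, k):
    """Split secret (0 <= secret < m0) into CRT shares.
    moduli: pairwise coprime, ascending, each coprime with m0, with
    m0 * prod(k-1 largest) < prod(k smallest) (the Asmuth-Bloom condition)."""
    assert 0 <= secret < m0
    assert m0 * prod(moduli[-(k - 1):]) < prod(moduli[:k])
    bound = prod(moduli[:k])
    # Blind the secret while staying below the k-share reconstruction bound.
    y = secret + m0 * random.randrange((bound - secret) // m0)
    return [(m, y % m) for m in moduli]

def recover(shares, m0):
    """Recombine any k shares with the Chinese Remainder Theorem,
    then strip the blinding factor modulo m0."""
    M = prod(m for m, _ in shares)
    y = sum(r * (M // m) * pow(M // m, -1, m) for m, r in shares) % M
    return y % m0

# Threshold 2-of-3: the condition holds since 7 * 17 = 119 < 11 * 13 = 143.
m0, moduli, k = 7, [11, 13, 17], 2
shares = make_shares(5, m0, moduli, k)
assert recover(shares[:k], m0) == 5   # any two shares recover the secret
assert recover(shares[1:], m0) == 5
```

    The order-preserving indexing discussed in the paper builds on top of such shares; this sketch only shows the share/recover core.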

  16. Rescuing and Sharing Historical Vegetation Data for Ecological Analysis: The California Vegetation Type Mapping Project

    Directory of Open Access Journals (Sweden)

    Maggi Kelly

    2016-10-01

    Full Text Available Research efforts that synthesize historical and contemporary ecological data with modeling approaches improve our understanding of the complex response of species, communities, and landscapes to changing biophysical conditions through time and in space. Historical ecological data are particularly important in this respect. There are remaining barriers that limit such data synthesis, and technological improvements that make multiple diverse datasets more readily available for integration and synthesis are needed. This paper presents one case study of the Wieslander Vegetation Type Mapping project in California and highlights the importance of rescuing, digitizing and sharing historical datasets. We review the varied ecological uses of the historical collection: the vegetation maps have been used to understand legacies of land use change and plan for the future; the plot data have been used to examine changes to chaparral and forest communities around the state and to predict community structure and shifts under a changing climate; the photographs have been used to understand changing vegetation structure; and the voucher specimens in combination with other specimen collections have been used for large scale distribution modeling efforts. The digitization and sharing of the data via the web has broadened the scope and scale of the types of analysis performed. Yet, additional research avenues can be pursued using multiple types of VTM data, and by linking VTM data with contemporary data. The digital VTM collection is an example of a data infrastructure that expands the potential of large scale research through the integration and synthesis of data drawn from numerous data sources; its journey from analog to digital is a cautionary tale of the importance of finding historical data, digitizing it with best practices, linking it with other datasets, and sharing it with the research community.

  17. Scaling to diversity: The DERECHOS distributed infrastructure for analyzing and sharing data

    Science.gov (United States)

    Rilee, M. L.; Kuo, K. S.; Clune, T.; Oloso, A.; Brown, P. G.

    2016-12-01

    Integrating Earth Science data from diverse sources such as satellite imagery and simulation output can be expensive and time-consuming, limiting scientific inquiry and the quality of our analyses. Reducing these costs will improve innovation and quality in science. The current Earth Science data infrastructure focuses on downloading data based on requests formed from the search and analysis of associated metadata. And while the data products provided by archives may use the best available data sharing technologies, scientist end-users generally do not have such resources (including staff) available to them. Furthermore, only once an end-user has received the data from multiple diverse sources and has integrated them can the actual analysis and synthesis begin. The cost of getting from idea to the point where synthesis can start dramatically slows progress. In this presentation we discuss a distributed computational and data storage framework that eliminates much of the aforementioned cost. The SciDB distributed array database is central, as it is optimized for scientific computing involving very large arrays, performing better than less specialized frameworks like Spark. Adding spatiotemporal functions to SciDB creates a powerful platform for analyzing and integrating massive, distributed datasets. SciDB allows Big Earth Data analysis to be performed "in place" without the need for expensive downloads and end-user resources. Spatiotemporal indexing technologies such as the hierarchical triangular mesh enable the compute and storage affinity needed to efficiently perform co-located and conditional analyses, minimizing data transfers. These technologies automate the integration of diverse data sources within the framework, a critical step beyond current metadata search and analysis. Instead of downloading data into their idiosyncratic local environments, end-users can generate and share data products integrated from multiple diverse sources using a common shared environment.
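    The compute and storage affinity described above comes from mapping observations to index values so that nearby data share a key and can be placed on the same node. The hierarchical triangular mesh achieves this with recursively subdivided spherical triangles; the flat lat/lon/time binning below is a deliberately simplified stand-in (bin sizes invented for illustration) that shows only the co-location idea:

```python
def spatiotemporal_key(lat, lon, t, lat_bins=180, lon_bins=360):
    """Map an observation to a coarse (lat_bin, lon_bin, time_bin) key.
    Observations close in space and time get the same key, so a
    distributed array store can place them on the same node."""
    i = min(int((lat + 90.0) / 180.0 * lat_bins), lat_bins - 1)
    j = min(int((lon + 180.0) / 360.0 * lon_bins), lon_bins - 1)
    return (i, j, int(t))

# Two nearby observations from different datasets land in the same chunk,
# so a join or conditional analysis over them needs no data transfer.
assert spatiotemporal_key(10.1, 20.1, 5.2) == spatiotemporal_key(10.4, 20.3, 5.9)
```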

  18. New tools for Content Innovation and data sharing: Enhancing reproducibility and rigor in biomechanics research.

    Science.gov (United States)

    Guilak, Farshid

    2017-03-21

    We are currently in one of the most exciting times for science and engineering as we witness unprecedented growth in our computational and experimental capabilities to generate new data and models. To facilitate data and model sharing, and to enhance reproducibility and rigor in biomechanics research, the Journal of Biomechanics has introduced a number of tools for Content Innovation to allow presentation, sharing, and archiving of methods, models, and data in our articles. The tools include an Interactive Plot Viewer, 3D Geometric Shape and Model Viewer, Virtual Microscope, Interactive MATLAB Figure Viewer, and Audioslides. Authors are highly encouraged to make use of these in upcoming journal submissions. Copyright © 2017 Elsevier Ltd. All rights reserved.

  19. A model-driven privacy compliance decision support for medical data sharing in Europe.

    Science.gov (United States)

    Boussi Rahmouni, H; Solomonides, T; Casassa Mont, M; Shiu, S; Rahmouni, M

    2011-01-01

    Clinical practitioners and medical researchers often have to share health data with colleagues across Europe. Privacy compliance in this context is very important but challenging. Automated privacy guidelines are a practical way of increasing users' awareness of privacy obligations and help eliminate unintentional breaches of privacy. In this paper we present an ontology-plus-rules approach to privacy decision support for the sharing of patient data across European platforms. We use ontologies to model the required domain and context information about data sharing and privacy requirements. In addition, we use a set of Semantic Web Rule Language rules to reason about the legal privacy requirements that apply to a specific context of data disclosure. We make the complete set invocable through a semantic web application that acts as an interactive privacy guideline system and can invoke the full model to provide decision support. When asked, the system generates privacy reports applicable to a specific case of data disclosure described by the user; reports showing guidelines per Member State may also be obtained. The advantage of this approach lies in the expressiveness and extensibility of the modelling and inference languages adopted, and in the ability they confer to reason with complex requirements interpreted from high-level regulations. However, the system cannot at this stage fully simulate the role of an ethics committee or review board.
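    The ontology-plus-rules pattern can be pictured in miniature as condition/obligation pairs evaluated against facts describing a disclosure. The rules below are invented for illustration; they do not reflect the paper's actual SWRL rule set or any real legal requirement:

```python
# Toy rule base: each rule pairs a condition on the disclosure context
# with the obligation it triggers. (Rules invented for illustration.)
RULES = [
    (lambda f: f["data_type"] == "patient" and not f["anonymised"],
     "Obtain explicit patient consent before disclosure."),
    (lambda f: f["recipient_state"] != f["origin_state"],
     "Check the recipient Member State's additional safeguards."),
]

def privacy_report(facts):
    """Return the obligations whose conditions hold for this disclosure."""
    return [obligation for condition, obligation in RULES if condition(facts)]

report = privacy_report({"data_type": "patient", "anonymised": False,
                         "origin_state": "UK", "recipient_state": "FR"})
assert len(report) == 2  # both rules fire for this cross-border disclosure
```

    The real system gains its expressiveness from OWL ontologies and SWRL inference rather than hard-coded predicates, but the report-generation flow is analogous.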

  20. Open Data Strategies and Experiences to Improve Sharing and Publication of Public Sector Information

    Directory of Open Access Journals (Sweden)

    Laura María Gutiérrez Medina

    2014-11-01

    The Canary Islands receive 10 million tourists every year, and tourism represents a key sector for economic development in the Canaries. This work presents the benefits of open data usage in the tourism sector, both in municipalities and in the island government. These public institutions hold valuable information that should be shared with other institutions: 600 hotels and apartments, 10,000 bars and restaurants, and more than 15,000 retail businesses. This article describes an open data project to validate and publish such data across multiple administrations. The main benefits for the public sector are improved data quality and interoperability between different administrations.

  1. GIFT-Cloud: A data sharing and collaboration platform for medical imaging research.

    Science.gov (United States)

    Doel, Tom; Shakir, Dzhoshkun I; Pratt, Rosalind; Aertsen, Michael; Moggridge, James; Bellon, Erwin; David, Anna L; Deprest, Jan; Vercauteren, Tom; Ourselin, Sébastien

    2017-02-01

    Clinical imaging data are essential for developing research software for computer-aided diagnosis, treatment planning and image-guided surgery, yet existing systems are poorly suited for data sharing between healthcare and academia: research systems rarely provide an integrated approach for data exchange with clinicians; hospital systems are focused towards clinical patient care with limited access for external researchers; and safe haven environments are not well suited to algorithm development. We have established GIFT-Cloud, a data and medical image sharing platform, to meet the needs of GIFT-Surg, an international research collaboration that is developing novel imaging methods for fetal surgery. GIFT-Cloud also has general applicability to other areas of imaging research. GIFT-Cloud builds upon well-established cross-platform technologies. The Server provides secure anonymised data storage, direct web-based data access and a REST API for integrating external software. The Uploader provides automated on-site anonymisation, encryption and data upload. Gateways provide a seamless process for uploading medical data from clinical systems to the research server. GIFT-Cloud has been implemented in a multi-centre study for fetal medicine research. We present a case study of placental segmentation for pre-operative surgical planning, showing how GIFT-Cloud underpins the research and integrates with the clinical workflow. GIFT-Cloud simplifies the transfer of imaging data from clinical to research institutions, facilitating the development and validation of medical research software and the sharing of results back to the clinical partners. GIFT-Cloud supports collaboration between multiple healthcare and research institutions while satisfying the demands of patient confidentiality, data security and data ownership. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
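The Uploader's on-site anonymisation step can be illustrated with a simplified sketch. The field names and salted-hash scheme below are illustrative only; GIFT-Cloud operates on DICOM data and has its own anonymisation and encryption pipeline:

```python
import hashlib

# Simplified sketch of on-site anonymisation before upload: direct
# identifiers are dropped, and the patient ID is replaced by a salted
# one-way hash so records from the same patient remain linkable without
# exposing the identifier. Field names and the salting scheme are
# illustrative, not GIFT-Cloud's actual implementation.

IDENTIFYING_FIELDS = {"PatientName", "PatientBirthDate", "PatientAddress"}

def anonymise(record, site_salt):
    out = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    out["PatientID"] = hashlib.sha256(
        (site_salt + record["PatientID"]).encode()).hexdigest()[:16]
    return out

record = {"PatientID": "12345", "PatientName": "Doe^Jane",
          "PatientBirthDate": "19800101", "Modality": "MR"}
anon = anonymise(record, site_salt="site-A-secret")
print(anon["Modality"], anon["PatientID"])
```

Because the hash is deterministic per site, repeated uploads of the same patient's studies map to the same research identifier, which is what allows longitudinal analysis on the server side.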

  2. Lessons learned while building the Deepwater Horizon Database: Toward improved data sharing in coastal science

    Science.gov (United States)

    Thessen, Anne E.; McGinnis, Sean; North, Elizabeth W.

    2016-02-01

    Process studies and coupled-model validation efforts in the geosciences often require integration of multiple data types across time and space. For example, improved prediction of hydrocarbon fate and transport is an important societal need which fundamentally relies upon synthesis of oceanography and hydrocarbon chemistry. Yet there are no publicly accessible databases which integrate these diverse data types in a georeferenced format, nor are there guidelines for developing such a database. The objective of this research was to analyze the process of building one such database to provide baseline information on data sources and data sharing and to document the challenges and solutions that arose during this major undertaking. The resulting Deepwater Horizon Database was approximately 2.4 GB in size and contained over 8 million georeferenced data points collected from industry, government databases, volunteer networks, and individual researchers. The major technical challenges that were overcome were reconciliation of terms, units, and quality flags, which was necessary to effectively integrate the disparate data sets. Assembling this database required the development of relationships with individual researchers and data managers, which often involved extensive email contact. The average number of emails exchanged per data set was 7.8. Of the 95 relevant data sets that were discovered, 38 (40%) were obtained, either in whole or in part. Over one third (36%) of the requests for data went unanswered. The majority of responses were received after the first request (64%) and within the first week of the first request (67%). Although fewer than half of the potentially relevant datasets were incorporated into the database, the level of sharing (40%) was high compared to some other disciplines, where sharing can be as low as 10%. Our suggestions for building integrated databases include budgeting significant time for email exchanges, being cognizant of the cost versus
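The reconciliation of terms, units, and quality flags described above can be sketched as per-provider mappings onto a common scheme. The vocabularies and conversions below are hypothetical examples, not the ones used in the Deepwater Horizon Database:

```python
# Sketch of reconciling heterogeneous source data onto common terms,
# units, and quality flags before integration into a georeferenced
# database. The mappings are illustrative; a real integration effort
# needs one such mapping per data provider.

TERM_MAP = {"temp": "water_temperature", "wtemp": "water_temperature",
            "sal": "salinity"}
UNIT_TO_SI = {"degF": lambda v: (v - 32.0) * 5.0 / 9.0,   # convert to degC
              "degC": lambda v: v,
              "psu": lambda v: v}
FLAG_MAP = {"G": "good", "1": "good", "S": "suspect", "4": "bad"}

def reconcile(row):
    """Map one provider-specific row onto the common scheme."""
    return {
        "variable": TERM_MAP[row["var"]],
        "value": round(UNIT_TO_SI[row["unit"]](row["value"]), 3),
        "quality": FLAG_MAP.get(row["flag"], "unknown"),
        "lat": row["lat"], "lon": row["lon"],
    }

row = {"var": "temp", "value": 68.0, "unit": "degF",
       "flag": "1", "lat": 29.1, "lon": -90.2}
print(reconcile(row))
```

Once every provider's rows pass through such a mapping, the integrated database can be queried by a single variable name, unit, and quality vocabulary.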

  3. A distributed authentication and authorization scheme for in-network big data sharing

    Directory of Open Access Journals (Sweden)

    Ruidong Li

    2017-11-01

    Big data has a strong demand for a network infrastructure with the capability to support data sharing and retrieval efficiently. Information-centric networking (ICN) is an emerging approach to satisfy this demand, where big data is cached ubiquitously in the network and retrieved using data names. However, existing authentication and authorization schemes rely mostly on centralized servers to provide certification and mediation services for data retrieval. This causes considerable traffic overhead for the secure distributed sharing of data. To solve this problem, we employ identity-based cryptography (IBC) to propose a Distributed Authentication and Authorization Scheme (DAAS), where an identity-based signature (IBS) is used to achieve distributed verification of the identities of publishers and users. Moreover, Ciphertext-Policy Attribute-Based Encryption (CP-ABE) is used to enable distributed and fine-grained authorization. DAAS consists of three phases: initialization, secure data publication, and secure data retrieval, which seamlessly integrate authentication and authorization with the interest/data communication paradigm in ICN. In particular, we propose trustworthy registration and Network Operator and Authority Manifest (NOAM) dissemination to provide initial secure registration and enable efficient authentication for global data retrieval. Meanwhile, Attribute Manifest (AM) distribution coupled with automatic attribute update is proposed to reduce the cost of attribute retrieval. We examine the performance of the proposed DAAS, which shows that it can achieve a lower bandwidth cost than existing schemes.
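The fine-grained authorization idea behind CP-ABE is that a ciphertext carries an access policy over attributes, and only a user whose attribute set satisfies the policy can decrypt. The sketch below models only that policy-evaluation logic, not the underlying cryptography, and the attribute names are hypothetical:

```python
# Sketch of the attribute-based authorization logic behind CP-ABE:
# an access policy is a tree of AND/OR nodes over attributes, and a
# user's attribute set either satisfies it or does not. This models
# the policy evaluation only; real CP-ABE enforces it cryptographically.

def satisfies(policy, attributes):
    """Recursively evaluate an AND/OR policy tree over a user's attributes."""
    op = policy[0]
    if op == "ATTR":
        return policy[1] in attributes
    if op == "AND":
        return all(satisfies(p, attributes) for p in policy[1:])
    if op == "OR":
        return any(satisfies(p, attributes) for p in policy[1:])
    raise ValueError(f"unknown operator: {op}")

# Policy: (researcher AND institute-A) OR network-operator
policy = ("OR",
          ("AND", ("ATTR", "role:researcher"), ("ATTR", "org:institute-A")),
          ("ATTR", "role:network-operator"))

alice = {"role:researcher", "org:institute-A"}
bob = {"role:researcher", "org:institute-B"}
print(satisfies(policy, alice), satisfies(policy, bob))
```

In CP-ABE this decision is made implicitly by decryption succeeding or failing, which is what removes the need for a central mediation server on each retrieval.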

  4. The Computable Catchment: An executable document for model-data software sharing, reproducibility and interactive visualization

    Science.gov (United States)

    Gil, Y.; Duffy, C.

    2015-12-01

    This paper proposes the concept of a "Computable Catchment", which is used to develop a collaborative platform for watershed modeling and data analysis. The object of the research is a sharable, executable document, similar to a PDF, but one that includes documentation of the underlying theoretical concepts, interactive computational/numerical resources, linkage to essential data repositories, and the ability for interactive model-data visualization and analysis. The executable document for each catchment is stored in the cloud with automatic provisioning and a unique identifier, allowing collaborative model and data enhancements for historical hydroclimatic reconstruction and/or future land-use or climate-change scenarios to be easily reconstructed or extended. The Computable Catchment adopts metadata standards for naming all variables in the model and the data. The a priori or initial data are derived from national data sources for soils, hydrogeology, climate, and land cover available from the www.hydroterre.psu.edu data service (Leonard and Duffy, 2015). The executable document is based on Wolfram CDF, or Computable Document Format, with an interactive open-source reader accessible on any modern computing platform. The CDF file and contents can be uploaded to a website or simply shared as a normal document, maintaining all interactive features of the model and data. The Computable Catchment concept represents one application of Geoscience Papers of the Future: an extensible document that combines theory, models, data, and analysis that are digitally shared, documented, and reused among research collaborators, students, educators, and decision makers.

  5. Using a Computational Study of Hydrodynamics in the Wax Lake Delta to Examine Data Sharing Principles

    Directory of Open Access Journals (Sweden)

    Qian Zhang

    2017-01-01

    In this paper we describe a complex dataset used to study the circulation and wind-driven flows in the Wax Lake Delta, Louisiana, USA under winter storm conditions. The whole package bundles a large dataset (approximately 74 GB), which includes the numerical model, software and scripts for data analysis and visualization, as well as detailed documentation. The raw data came from multiple external sources, including government agencies, community repositories, and deployed field instruments and surveys. Each raw dataset goes through data QA/QC, analysis, visualization, and interpretation. After integrating multiple datasets, new data products are obtained, which are then used with the numerical model. The numerical model undergoes verification, testing, calibration, and optimization. Through a complex computational algorithm, the model generates a structured output dataset which, after post-processing analysis, is presented as informative scientific figures and tables that support interpretations and conclusions contributing to the science of coastal physical oceanography. Performing this study required a tremendous amount of effort. While the work resulted in traditional dissemination via a thesis, journal articles, and conference proceedings, more can be gained. The data can be reused to study reproducibility or as a preliminary investigation of a new topic. With thorough documentation and well-organized data, both the input and output datasets should be ready for sharing in a domain or institutional repository. Furthermore, the data organization and documentation also serve as a guideline for future research data management and the development of workflow protocols. Here we describe the dataset created by this study, how sharing the dataset publicly could enable validation of the current study and extension by new studies, and the challenges that arise prior to sharing the dataset.

  6. Understanding Spatiotemporal Patterns of Biking Behavior by Analyzing Massive Bike Sharing Data in Chicago.

    Directory of Open Access Journals (Sweden)

    Xiaolu Zhou

    The growing number of bike sharing systems (BSS) in many cities largely facilitates biking for transportation and recreation. Most recent bike sharing systems produce time- and location-specific data, which enables the study of travel behavior and mobility at the level of each individual. However, despite a rapid growth of interest, studies of massive bike sharing data and the underlying travel patterns are still limited. Few studies have explored and visualized spatiotemporal patterns of bike sharing behavior using flow clustering, or examined station functional profiles based on over-demand patterns. This study investigated spatiotemporal biking patterns in Chicago by analyzing massive BSS data from July to December in 2013 and 2014. The BSS in Chicago gained popularity over this period: about 15.9% more people subscribed to the service. Specifically, we constructed a bike flow similarity graph and used the fastgreedy algorithm to detect spatial communities of biking flows. Using the proposed methods, we discovered distinct travel patterns on weekdays and weekends, as well as different travel trends for customers and subscribers, from the noisy massive data. In addition, we examined the temporal demand for bikes and docks using a hierarchical clustering method. Results demonstrated the modeled over-demand patterns in Chicago. This study offers better knowledge of biking flow patterns, which was difficult to obtain using traditional methods. Given the increasing popularity of BSS and data openness in different cities, the methods used in this study can be extended to examine biking patterns and BSS functionality in other cities.
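The community-detection step on station-to-station flows can be sketched with NetworkX, whose greedy modularity maximisation is the analogue of the fastgreedy algorithm used in the study. The station names and trip counts below are made up for illustration:

```python
import networkx as nx
from networkx.algorithms import community

# Sketch of detecting spatial communities in station-to-station bike
# flows: edges are weighted by trip counts, and greedy modularity
# maximisation (the NetworkX analogue of fastgreedy) groups stations
# with dense mutual flows. Station names and counts are hypothetical.

G = nx.Graph()
flows = [("Loop-1", "Loop-2", 120), ("Loop-1", "Loop-3", 95),
         ("Loop-2", "Loop-3", 80),
         ("Lake-1", "Lake-2", 110), ("Lake-1", "Lake-3", 90),
         ("Lake-2", "Lake-3", 75),
         ("Loop-1", "Lake-1", 5)]  # weak link between the two areas
for a, b, trips in flows:
    G.add_edge(a, b, weight=trips)

communities = community.greedy_modularity_communities(G, weight="weight")
for c in sorted(communities, key=len, reverse=True):
    print(sorted(c))
```

On real trip data the same call reveals clusters of stations whose riders mostly circulate among themselves, which is what the weekday/weekend and customer/subscriber comparisons are built on.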

  7. Data and Models as Social Objects in the HydroShare System for Collaboration in the Hydrology Community and Beyond

    Science.gov (United States)

    Tarboton, D. G.; Idaszak, R.; Horsburgh, J. S.; Ames, D. P.; Goodall, J. L.; Band, L. E.; Merwade, V.; Couch, A.; Hooper, R. P.; Maidment, D. R.; Dash, P. K.; Stealey, M.; Yi, H.; Gan, T.; Castronova, A. M.; Miles, B.; Li, Z.; Morsy, M. M.; Crawley, S.; Ramirez, M.; Sadler, J.; Xue, Z.; Bandaragoda, C.

    2016-12-01

    How do you share and publish hydrologic data and models for a large collaborative project? HydroShare is a new, web-based system for sharing hydrologic data and models with specific functionality aimed at making collaboration easier. HydroShare has been developed with U.S. National Science Foundation support under the auspices of the Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) to support the collaboration and community cyberinfrastructure needs of the hydrology research community. Within HydroShare, we have developed new functionality for creating datasets, describing them with metadata, and sharing them with collaborators. We cast hydrologic datasets and models as "social objects" that can be shared, collaborated around, annotated, published and discovered. In addition to data and model sharing, HydroShare supports web application programs (apps) that can act on data stored in HydroShare, just as software programs on your PC act on your data locally. This can free you from some of the limitations of local computing capacity and challenges in installing and maintaining software on your own PC. HydroShare's web-based cyberinfrastructure can take work off your desk or laptop computer and onto infrastructure or "cloud" based data and processing servers. This presentation will describe HydroShare's collaboration functionality that enables both public and private sharing with individual users and collaborative user groups, and makes it easier for collaborators to iterate on shared datasets and models, creating multiple versions along the way, and publishing them with a permanent landing page, metadata description, and citable Digital Object Identifier (DOI) when the work is complete. This presentation will also describe the web app architecture that supports interoperability with third party servers functioning as application engines for analysis and processing of big hydrologic datasets. While developed to support the

  8. Data Sharing in the Pharmaceutical Enterprise: The Genie's Out of the Bottle.

    Science.gov (United States)

    Beninger, Paul; Connelly, James; Natarajan, Chandrasekhar

    2017-09-01

    This Commentary shows that the present emphasis on the sharing of data from clinical trials can be extended to the entire pharmaceutical enterprise. The authors constructed a Data Sharing Dashboard that shows the relationship between all of the life-cycle domains of the pharmaceutical enterprise from discovery to obsolescence and the domain-bridging disciplines, such as target credentialing, structure-activity relationships, and exposure-effect relationships. The published literature encompassing the pharmaceutical enterprise is expansive, covering the major domains of discovery, translation, clinical development, and post-marketing outcomes research, all of which have even larger, though generally inaccessible, troves of legacy databases. Notable exceptions include the fields of genomics and bioinformatics. We have the opportunity to broaden the present momentum of interest in data sharing to the entire pharmaceutical enterprise, beginning with discovery and extending into health technology assessment and post-patent expiry generic use, with the plan of integrating new levels and disciplines of knowledge and with the ultimate goal of improving the care of our patients. Copyright © 2017 Elsevier HS Journals, Inc. All rights reserved.

  9. The European Union's Adequacy Approach to Privacy and International Data Sharing in Health Research.

    Science.gov (United States)

    Stoddart, Jennifer; Chan, Benny; Joly, Yann

    2016-03-01

    The European Union (EU) approach to data protection consists of assessing the adequacy of the data protection offered by the laws of a particular jurisdiction against a set of principles that includes purpose limitation, transparency, quality, proportionality, security, access, and rectification. The EU's Data Protection Directive sets conditions on the transfer of data to third countries by prohibiting Member States from transferring data to countries whose data protection regimes have been deemed inadequate. In theory, each jurisdiction is evaluated similarly and must be found fully compliant with the EU's data protection principles to be considered adequate. In practice, the inconsistency with which these evaluations are made presents a hurdle to international data sharing and makes the integration of different data-sharing approaches difficult; in the 20 years since the Directive was first adopted, the laws of only five countries outside the EU, the European Economic Area, or the European Free Trade Association have been deemed adequate to engage in data transfers without the need for further administrative safeguards. © 2016 American Society of Law, Medicine & Ethics.

  10. Adapting federated cyberinfrastructure for shared data collection facilities in structural biology

    International Nuclear Information System (INIS)

    Stokes-Rees, Ian; Levesque, Ian; Murphy, Frank V. IV; Yang, Wei; Deacon, Ashley; Sliz, Piotr

    2012-01-01

    It has been difficult, historically, to manage and maintain early-stage experimental data collected by structural biologists in synchrotron facilities. This work describes a prototype system that adapts existing federated cyberinfrastructure technology and techniques to manage collected data at synchrotrons and to facilitate the efficient and secure transfer of data to the owner's home institution. Early stage experimental data in structural biology is generally unmaintained and inaccessible to the public. It is increasingly believed that this data, which forms the basis for each macromolecular structure discovered by this field, must be archived and, in due course, published. Furthermore, the widespread use of shared scientific facilities such as synchrotron beamlines complicates the issue of data storage, access and movement, as does the increase of remote users. This work describes a prototype system that adapts existing federated cyberinfrastructure technology and techniques to significantly improve the operational environment for users and administrators of synchrotron data collection facilities used in structural biology. This is achieved through software from the Virtual Data Toolkit and Globus, bringing together federated users and facilities from the Stanford Synchrotron Radiation Lightsource, the Advanced Photon Source, the Open Science Grid, the SBGrid Consortium and Harvard Medical School. The performance and experience with the prototype provide a model for data management at shared scientific facilities

  12. A call for BMC Research Notes contributions promoting best practice in data standardization, sharing and publication.

    Science.gov (United States)

    Hrynaszkiewicz, Iain

    2010-09-02

    BMC Research Notes aims to ensure that data files underlying published articles are made available in standard, reusable formats, and the journal is calling for contributions from the scientific community to achieve this goal. Educational Data Notes included in this special series should describe a domain-specific data standard and provide an example data set with the article, or a link to data that are permanently hosted elsewhere. The contributions should also provide some evidence of the data standard's application and preparation guidance that could be used by others wishing to conduct similar experiments. The journal is also keen to receive contributions on broader aspects of scientific data sharing, archiving, and open data.

  13. Safeguards surveillance equipment and data sharing between IAEA and a member state

    International Nuclear Information System (INIS)

    Park, Seung Sik

    1999-01-01

    Efficiency and reliability are the two prongs of safeguards policy implementation. Unattended surveillance is gaining wide acceptance through field trials and technical advances. In achieving the goal of safeguards, a new safeguards system should be less intrusive than conventional inspection. Sharing of unattended surveillance data will be a major issue for countries that operate their own national inspection schemes in parallel with international safeguards, to limit the resource consumption incurred by repeated installations. Nonetheless, the issue has not yet been brought into focus among the States concerned, especially for a country like Korea with a national inspection system in operation. For balanced development of the safeguards regime between the IAEA and Korea, sharing of unattended surveillance data with the SSAC needs to be worked out in conjunction with the joint use of safeguards instruments, which is in process

  14. A framework for secure and decentralized sharing of medical imaging data via blockchain consensus.

    Science.gov (United States)

    Patel, Vishal

    2018-04-01

    The electronic sharing of medical imaging data is an important element of modern healthcare systems, but current infrastructure for cross-site image transfer depends on trust in third-party intermediaries. In this work, we examine the blockchain concept, which enables parties to establish consensus without relying on a central authority. We develop a framework for cross-domain image sharing that uses a blockchain as a distributed data store to establish a ledger of radiological studies and patient-defined access permissions. The blockchain framework is shown to eliminate third-party access to protected health information, satisfy many criteria of an interoperable health system, and readily generalize to domains beyond medical imaging. Relative drawbacks of the framework include the complexity of the privacy and security models and an unclear regulatory environment. Ultimately, the large-scale feasibility of such an approach remains to be demonstrated and will depend on a number of factors which we discuss in detail.
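The distributed-ledger idea above can be illustrated with a minimal hash-chained ledger of studies and access grants. The block fields and event names are illustrative; a real system adds a consensus protocol, digital signatures, and distributed replication, and stores the images themselves off-chain:

```python
import hashlib
import json

# Minimal sketch of a blockchain-style ledger of radiological studies
# and patient-defined access permissions. Each block commits to the
# previous block's hash, so tampering with any recorded entry breaks
# the chain. Fields and event names are illustrative only.

def make_block(prev_hash, payload):
    block = {"prev": prev_hash, "time": 0, "payload": payload}  # fixed time for determinism
    serialized = json.dumps(block, sort_keys=True).encode()
    block["hash"] = hashlib.sha256(serialized).hexdigest()
    return block

def verify_chain(chain):
    for i, block in enumerate(chain):
        body = {k: block[k] for k in ("prev", "time", "payload")}
        if block["hash"] != hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest():
            return False
        if i > 0 and block["prev"] != chain[i - 1]["hash"]:
            return False
    return True

genesis = make_block("0" * 64, {"event": "genesis"})
study = make_block(genesis["hash"],
                   {"event": "study", "study_id": "CT-001",
                    "image_ref": "stored off-chain"})
grant = make_block(study["hash"],
                   {"event": "grant", "study_id": "CT-001",
                    "grantee": "hospital-B"})
chain = [genesis, study, grant]
print(verify_chain(chain))

# Tampering with a recorded permission invalidates the chain.
grant["payload"]["grantee"] = "attacker"
print(verify_chain(chain))
```

Because every participant can recompute the hashes, no third-party intermediary is needed to vouch for the ledger of studies and permissions, which is the property the framework relies on.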

  15. CitSci.org: A New Model for Managing, Documenting, and Sharing Citizen Science Data.

    Directory of Open Access Journals (Sweden)

    Yiwei Wang

    2015-10-01

    Citizen science projects have the potential to advance science by increasing the volume and variety of data, as well as innovation. Yet this potential has not been fully realized, in part because citizen science data are typically not widely shared and reused. To address this and related challenges, we built CitSci.org (see www.citsci.org), a customizable platform that allows users to collect and generate diverse datasets. We hope that CitSci.org will ultimately increase discoverability and confidence in citizen science observations, encouraging scientists to use such data in their own scientific research.

  17. The EDEN-IW ontology model for sharing knowledge and water quality data between heterogenous databases

    DEFF Research Database (Denmark)

    Stjernholm, M.; Poslad, S.; Zuo, L.

    2004-01-01

    The Environmental Data Exchange Network for Inland Water (EDEN-IW) project's main aim is to develop a system for making disparate and heterogeneous databases of Inland Water quality more accessible to users. The core technology is based upon a combination of: an ontological model to represent... a Semantic Web based data model for IW; software agents as an infrastructure to share and reason about the IW semantic data model; and XML to make the information accessible to Web portals and mainstream Web services. This presentation focuses on the Semantic Web or ontological model. Currently, we have...

  18. Microbial Diagnostic Array Workstation (MDAW): a web server for diagnostic array data storage, sharing and analysis

    Directory of Open Access Journals (Sweden)

    Chang Yung-Fu

    2008-09-01

    Background: Microarrays are becoming a very popular tool for microbial detection and diagnostics. Although these diagnostic arrays are much simpler when compared to traditional transcriptome arrays, the high-throughput nature of the arrays means that the data analysis requirements still form a bottleneck for their widespread use. Hence we developed a new online data sharing and analysis environment customised for diagnostic arrays. Methods: The Microbial Diagnostic Array Workstation (MDAW) is a database-driven application, with the database designed in MS Access and the front end designed in ASP.NET. Conclusion: MDAW is a new resource that is customised to the data analysis requirements of microbial diagnostic arrays.

  19. Tutorial: Data sharing from a computer or cluster via SeedMe.org

    OpenAIRE

    Chourasia, Amit; Wong, Mona; Mishin, Dmitry; Nadeau, David; Norman, Michael

    2017-01-01

    High performance computing processes and workflows often have several steps, for example input preparation, computation monitoring, output validation, analysis, and visualization. All of these processes yield small-scale consumable data, such as computation progress, statistics, and plots, that are of high value to the research team. Sharing and accessing this information among team members is often slow and cumbersome in current HPC environments. The SeedMe platform lowers these barriers by providing cyberinfrastructure...

  20. Joint analysis of binary and quantitative traits with data sharing and outcome-dependent sampling.

    Science.gov (United States)

    Zheng, Gang; Wu, Colin O; Kwak, Minjung; Jiang, Wenhua; Joo, Jungnam; Lima, Joao A C

    2012-04-01

    We study the joint association between a genetic marker and both binary (case-control) and quantitative (continuous) traits, where the quantitative trait values are only available for the cases due to data sharing and outcome-dependent sampling. Data sharing is becoming common in genetic association studies, and outcome-dependent sampling is a consequence of data sharing, under which a phenotype of interest is not measured for some subgroup. The trend test (or Pearson's test) and the F-test are often used to analyze the binary and quantitative traits, respectively. Because of the outcome-dependent sampling, the usual F-test can be applied using the subgroup with the observed quantitative traits. We propose a modified F-test that also incorporates the genotype frequencies of the subgroup whose traits are not observed. Further, a combination of this modified F-test and Pearson's test is proposed, using Fisher's combination of their P-values, as a joint analysis. Because of the correlation of the two analyses, we propose to use a Gamma (scaled chi-squared) distribution to fit the asymptotic null distribution of the joint statistic. The proposed modified F-test and the joint analysis can also be applied to test single-trait association (either binary or quantitative). Through simulations, we identify the situations under which the proposed tests are more powerful than the existing ones. An application to a real dataset of rheumatoid arthritis is presented. © 2012 Wiley Periodicals, Inc.
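Fisher's combination of k P-values uses the statistic T = -2 Σ ln p_i, which is chi-squared with 2k degrees of freedom when the tests are independent; the paper's contribution is to replace that null with a moment-matched Gamma distribution to absorb the correlation between the two tests. A stdlib-only sketch of the independence case (the Gamma adjustment is noted but not implemented):

```python
import math

# Fisher's combination of k independent P-values: T = -2 * sum(ln p_i)
# is chi-squared with 2k degrees of freedom. For even degrees of
# freedom 2k the survival function has the closed form
#   P(T > t) = exp(-t/2) * sum_{i=0}^{k-1} (t/2)^i / i!
# The cited paper fits a Gamma (scaled chi-squared) null instead, with
# moments matched to the correlated statistics; that adjustment is not
# implemented in this sketch.

def fisher_combined_p(pvalues):
    t = -2.0 * sum(math.log(p) for p in pvalues)
    k = len(pvalues)  # chi-squared df = 2k, i.e. Gamma(shape=k, scale=2)
    half = t / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))

# Combining a trend-test P-value with a (modified) F-test P-value:
p_trend, p_f = 0.05, 0.05
print(round(fisher_combined_p([p_trend, p_f]), 4))  # → 0.0175
```

Two marginal P-values of 0.05 combine to about 0.0175 under independence; positive correlation between the tests inflates the variance of T, which is exactly what the fitted Gamma null corrects for.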

  1. The CUAHSI Water Data Center: Empowering scientists to discover, use, store, and share water data

    Science.gov (United States)

    Couch, A. L.; Hooper, R. P.; Arrigo, J. S.

    2012-12-01

    The proposed CUAHSI Water Data Center (WDC) will provide production-quality water data resources based upon the successful large-scale data services prototype developed by the CUAHSI Hydrologic Information System (HIS) project. The WDC, using the HIS technology, concentrates on providing time series data collected at fixed points or on moving platforms from sensors primarily (but not exclusively) in the medium of water. The WDC's missions include providing simple and effective data discovery tools useful to researchers in a variety of water-related disciplines, and providing simple and cost-effective data publication mechanisms for projects that do not desire to run their own data servers. The WDC's activities will include: 1. Rigorous curation of the water data catalog already assembled during the CUAHSI HIS project, to ensure accuracy of records and existence of declared sources. 2. Data backup and failover services for "at risk" data sources. 3. Creation and support for ubiquitously accessible data discovery and access, web-based search and smartphone applications. 4. Partnerships with researchers to extend the state of the art in water data use. 5. Partnerships with industry to create plug-and-play data publishing from sensors, and to create domain-specific tools. The WDC will serve as a knowledge resource for researchers of water-related issues, and will interface with other data centers to make their data more accessible to water researchers. The WDC will serve as a vehicle for addressing some of the grand challenges of accessing and using water data, including: a. Cross-domain data discovery: different scientific domains refer to the same kind of water data using different terminologies, making discovery of data difficult for researchers outside the data provider's domain. b. Cross-validation of data sources: much water data comes from sources lacking rigorous quality control procedures; such sources can be compared against others with rigorous quality

  2. Multi-Institutional Sharing of Electronic Health Record Data to Assess Childhood Obesity.

    Directory of Open Access Journals (Sweden)

    L Charles Bailey

    Full Text Available To evaluate the validity of multi-institutional electronic health record (EHR) data sharing for surveillance and study of childhood obesity. We conducted a non-concurrent cohort study of 528,340 children with outpatient visits to six pediatric academic medical centers during 2007-08, with sufficient data in the EHR for body mass index (BMI) assessment. EHR data were compared with data from the 2007-08 National Health and Nutrition Examination Survey (NHANES). Among children 2-17 years, BMI was evaluable for 1,398,655 visits (56%). The EHR dataset contained over 6,000 BMI measurements per month of age up to 16 years, yielding precise estimates of BMI. In the EHR dataset, 18% of children were obese versus 18% in NHANES, while 35% were obese or overweight versus 34% in NHANES. BMI for an individual was highly reliable over time (intraclass correlation coefficient 0.90 for obese children and 0.97 for all children). Only 14% of visits with measured obesity (BMI ≥95th percentile) had a diagnosis of obesity recorded, and only 20% of children with measured obesity had the diagnosis documented during the study period. Obese children had higher primary care (4.8 versus 4.0 visits, p<0.001) and specialty care (3.7 versus 2.7 visits, p<0.001) utilization than non-obese counterparts, and a higher prevalence of diverse co-morbidities. The cohort size in the EHR dataset permitted detection of associations with rare diagnoses. Data sharing did not require investment of extensive institutional resources, yet yielded high data quality. Multi-institutional EHR data sharing is a promising, feasible, and valid approach for population health surveillance. It provides a valuable complement to more resource-intensive national surveys, particularly for iterative surveillance and quality improvement. Low rates of obesity diagnosis present a significant obstacle to surveillance and quality improvement for care of children with obesity.
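
The BMI-based screening described in this record can be sketched as below. This is a minimal illustration, not the study's code: real pediatric surveillance uses CDC age- and sex-specific growth charts to obtain the percentile, which is assumed here to be a precomputed input.

```python
# Sketch of pediatric BMI classification as used in the record above.
# The age/sex-adjusted percentile is assumed to come from a growth-chart
# lookup (not implemented here).

def bmi(weight_kg, height_m):
    """Body mass index = weight / height^2."""
    return weight_kg / height_m ** 2

def weight_category(bmi_percentile):
    """Map an age/sex-adjusted BMI percentile to the study's categories."""
    if bmi_percentile >= 95:
        return "obese"
    if bmi_percentile >= 85:
        return "overweight"
    return "not overweight"

print(round(bmi(50.0, 1.50), 1))   # 22.2
print(weight_category(96))         # obese
```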

  3. The tissue microarray OWL schema: An open-source tool for sharing tissue microarray data

    Directory of Open Access Journals (Sweden)

    Hyunseok P Kang

    2010-01-01

    Full Text Available Background: Tissue microarrays (TMAs) are enormously useful tools for translational research, but incompatibilities in database systems between various researchers and institutions prevent the efficient sharing of data that could help realize their full potential. Resource Description Framework (RDF) provides a flexible method to represent knowledge in triples, which take the form Subject-Predicate-Object. All data resources are described using Uniform Resource Identifiers (URIs), which are global in scope. We present an OWL (Web Ontology Language) schema that expands upon the TMA data exchange specification to address this issue and assist in data sharing and integration. Methods: A minimal OWL schema was designed containing only concepts specific to TMA experiments. More general data elements were incorporated from predefined ontologies such as the NCI Thesaurus. URIs were assigned using the Linked Data format. Results: We present examples of files utilizing the schema and conversion of XML data (similar to the TMA DES) to OWL. Conclusion: By utilizing predefined ontologies and globally unique identifiers, this OWL schema provides a solution to the limitations of XML, which represents concepts defined in a localized setting. This will help increase the utilization of tissue resources, facilitating collaborative translational research efforts.
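
The Subject-Predicate-Object triples the record describes can be sketched with plain tuples and URIs. The namespaces and terms below are illustrative placeholders, not the actual TMA OWL schema or NCI Thesaurus URIs.

```python
# RDF-style triples for a hypothetical TMA core; a triple store is just a
# collection of (subject, predicate, object) statements over global URIs.
EX = "http://example.org/tma/"     # hypothetical project namespace
NCIT = "http://purl.org/ncit/"     # stand-in for NCI Thesaurus terms

triples = [
    (EX + "core42", NCIT + "hasDiagnosis", NCIT + "Adenocarcinoma"),
    (EX + "core42", EX + "stainIntensity", "2+"),
    (EX + "core42", EX + "partOfArray", EX + "array7"),
]

def objects_of(subject, predicate):
    """Querying a triple store is pattern matching over statements."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects_of(EX + "core42", EX + "stainIntensity"))  # ['2+']
```

Because every identifier is a global URI rather than a locally scoped XML element, two institutions can merge their triple sets and still refer unambiguously to the same core.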

  4. Automating dChip: toward reproducible sharing of microarray data analysis

    Directory of Open Access Journals (Sweden)

    Li Cheng

    2008-05-01

    Full Text Available Abstract Background During the past decade, many software packages have been developed for the analysis and visualization of various types of microarrays. We have developed and maintained the widely used dChip as a microarray analysis software package accessible to both biologists and data analysts. However, challenges arise when dChip users want to analyze large numbers of arrays automatically and share data analysis procedures and parameters. Improvement is also needed when the dChip user support team tries to identify the causes of analysis errors or bugs reported by users. Results We report here the implementation and application of the dChip automation module. Through this module, dChip automation files can be created to include menu steps, parameters, and data viewpoints to run automatically. A data-packaging function allows convenient transfer from one user to another of the dChip software, microarray data, and analysis procedures, so that the second user can reproduce the entire analysis session of the first user. An analysis report file can also be generated during an automated run, including analysis logs, user comments, and viewpoint screenshots. Conclusion The dChip automation module is a step toward reproducible research, and it can promote a more convenient and reproducible mechanism for sharing microarray software, data, and analysis procedures and results. Automation data packages can also be used as publication supplements. Similar automation mechanisms could be valuable to the research community if implemented in other genomics and bioinformatics software packages.
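
The core idea of an automation file, a declarative record of menu steps and parameters that can be replayed to reproduce an analysis session, can be sketched generically. The real dChip file format is not reproduced here; step and parameter names are illustrative.

```python
# Generic sketch of a replayable automation file: a JSON list of recorded
# steps (menu action + parameters) that a driver replays in order.
import json

automation_file = json.dumps({
    "steps": [
        {"menu": "Open group", "params": {"arrays": ["chip1.CEL", "chip2.CEL"]}},
        {"menu": "Normalize", "params": {"method": "invariant-set"}},
        {"menu": "Model-based expression", "params": {"model": "PM-only"}},
    ]
})

def replay(spec_json, log):
    """Replay each recorded step, appending to an analysis log."""
    for step in json.loads(spec_json)["steps"]:
        log.append(f'{step["menu"]} with {step["params"]}')

log = []
replay(automation_file, log)
print(len(log))  # 3
```

Packaging this file together with the input data is what lets a second user reproduce the first user's entire session.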

  5. The Devil's in the Details - Lessons in Operationalising Data Sharing and Credit in the Geosciences

    Science.gov (United States)

    Callaghan, S.

    2017-12-01

    There are many drivers for sharing and opening data, and a consensus is gradually forming that sharing data is the right thing to do, as is providing appropriate credit and attribution to those who put in the effort to make their data open and usable. The idea of data citation is now well defined in principle. Unsurprisingly, when trying to implement principles, edge cases rapidly appear, requiring extra thought and introducing complications. Citation for journal papers is well established and understood, and most researchers have a good understanding of what is, and is not, a publication. Datasets can be considered a "special type of paper", but only a small proportion of datasets actually fit into "paper-like" publication processes. The vast majority of datasets are too big, too complicated, or are continually modified and/or updated. Data publication (piggy-backing as it does on academic publication to obtain credit for data producers) is only one metaphor that can be used for data; different metaphors can provide other routes to credit for data producers. Explaining those metaphors, and how they apply to data and provide credit for producers, can be difficult for those who until now have judged research productivity solely on the number of papers produced in high-impact-factor journals. Different research domains seek to address these issues in different ways, and it is important to recognise that there is no "one size fits all" approach that will cover all disciplines from astronomy to zoology. Meanwhile, the number of journal publications is increasing dramatically, resulting in a peer-review system that is creaking at the seams. Adding peer review of data (in order to quantify the "quality" of the data) to the task list of already over-burdened volunteer reviewers will result in fewer reviewers being willing to take on the task, or in reviewers putting less effort into their reviews.
This presentation will consist of a series of

  6. Flexible, Secure, and Reliable Data Sharing Service Based on Collaboration in Multicloud Environment

    Directory of Open Access Journals (Sweden)

    Qiang Wei

    2018-01-01

    Full Text Available Due to the abundant storage resources and highly reliable data services of cloud computing, more individuals and enterprises are motivated to outsource their data to public cloud platforms and enable legal data users to search and download what they need from the outsourced dataset. However, in the “Paid Data Sharing” model, some valuable data should be encrypted before outsourcing to protect the owner’s economic benefits, which is an obstacle to flexible application. Specifically, if the owner does not know in advance which user will download which data files, and does not even know the attributes of the user, he/she has to either remain online all the time or import a trusted third party (TTP) to distribute the file decryption key to the data user. Obviously, making the owner always remain online is too inflexible, and wholly depending on the security of a TTP is a potential risk. In this paper, we propose a flexible, secure, and reliable data sharing scheme based on collaboration in a multicloud environment. To securely and instantly provide data sharing service even if the owner is offline and without a TTP, we distribute all encrypted split data/key blocks among multiple cloud service providers (CSPs). An elaborate cryptographic protocol we designed helps the owner verify the correctness of data exchange bills, which is directly related to the owner’s economic benefits. Besides, in order to support reliable data service, an erasure-correcting code technique is exploited to tolerate multiple failures among CSPs, and we offer a secure keyword search mechanism that makes the system closer to reality. Extensive security analyses and experiments on real-world data show that our scheme is secure and efficient.
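
The idea of splitting key blocks across CSPs so that no single provider learns the key can be sketched with n-of-n XOR secret sharing. This is a minimal illustration only: the scheme described above additionally uses erasure-correcting codes for fault tolerance, which is omitted here.

```python
# n-of-n XOR secret sharing: distribute one share per CSP; all shares are
# needed to reconstruct, so no single CSP learns the decryption key.
import os

def split(secret, n):
    """Split `secret` into n shares whose XOR equals the secret."""
    shares = [os.urandom(len(secret)) for _ in range(n - 1)]
    acc = bytearray(secret)
    for sh in shares:
        for i, b in enumerate(sh):
            acc[i] ^= b
    return shares + [bytes(acc)]

def combine(shares):
    """XOR all shares together to recover the secret."""
    acc = bytearray(len(shares[0]))
    for sh in shares:
        for i, b in enumerate(sh):
            acc[i] ^= b
    return bytes(acc)

key = b"file-decryption-key"
csp_shares = split(key, 3)         # one share per cloud provider
print(combine(csp_shares) == key)  # True
```

An erasure-coded variant would instead allow reconstruction from a subset of shares, tolerating CSP failures at the cost of a threshold rather than all-or-nothing secrecy.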

  7. Towards quantitative evaluation of privacy protection schemes for electricity usage data sharing

    Directory of Open Access Journals (Sweden)

    Daisuke Mashima

    2018-03-01

    Full Text Available Thanks to the roll-out of smart meters, availability of fine-grained electricity usage data has rapidly grown. Such data has enabled utility companies to perform robust and efficient grid operations. However, at the same time, privacy concerns associated with sharing and disclosure of such data have been raised. In this paper, we first demonstrate the feasibility of estimating privacy-sensitive household attributes based solely on the energy usage data of residential customers. We then discuss a framework to measure privacy gain and evaluate the effectiveness of customer-centric privacy-protection schemes, namely redaction of data irrelevant to services and addition of bounded artificial noise. Keywords: Privacy, Smart meter data, Quantitative evaluation
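
The "bounded artificial noise" scheme the record evaluates can be sketched as follows: each reading is perturbed by noise clipped to a fixed bound, trading some utility for privacy. Readings and bound are fabricated for illustration.

```python
# Add noise clipped to ±bound to each smart-meter reading, so utility-side
# error is strictly limited while fine-grained usage patterns are blurred.
import random

def add_bounded_noise(readings, bound, rng=None):
    rng = rng or random.Random(0)
    return [r + max(-bound, min(bound, rng.gauss(0, bound / 2)))
            for r in readings]

usage_kwh = [0.42, 0.40, 1.95, 2.10, 0.55]   # illustrative half-hourly data
noisy = add_bounded_noise(usage_kwh, bound=0.1)
print(all(abs(n - r) <= 0.1 for n, r in zip(noisy, usage_kwh)))  # True
```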

  8. A community effort to protect genomic data sharing, collaboration and outsourcing.

    Science.gov (United States)

    Wang, Shuang; Jiang, Xiaoqian; Tang, Haixu; Wang, Xiaofeng; Bu, Diyue; Carey, Knox; Dyke, Stephanie Om; Fox, Dov; Jiang, Chao; Lauter, Kristin; Malin, Bradley; Sofia, Heidi; Telenti, Amalio; Wang, Lei; Wang, Wenhao; Ohno-Machado, Lucila

    2017-01-01

    The human genome can reveal sensitive information and is potentially re-identifiable, which raises privacy and security concerns about sharing such data on wide scales. In 2016, we organized the third Critical Assessment of Data Privacy and Protection competition as a community effort to bring together biomedical informaticists, computer privacy and security researchers, and scholars in ethical, legal, and social implications (ELSI) to assess the latest advances on privacy-preserving techniques for protecting human genomic data. Teams were asked to develop novel protection methods for emerging genome privacy challenges in three scenarios: Track 1, data sharing through the Beacon service of the Global Alliance for Genomics and Health; Track 2, collaborative discovery of similar genomes between two institutions; and Track 3, data outsourcing to public cloud services. The latter two tracks represent continuing themes from our 2015 competition, while the former was new and a response to a recently established vulnerability. The winning strategy for Track 1 mitigated the privacy risk by hiding approximately 11% of the variation in the database while permitting around 160,000 queries, a significant improvement over the baseline. The winning strategies in Tracks 2 and 3 showed significant progress over the previous competition by achieving multiple orders of magnitude performance improvement in terms of computational runtime and memory requirements. The outcomes suggest that applying highly optimized privacy-preserving and secure computation techniques to safeguard genomic data sharing and analysis is useful. However, the results also indicate that further efforts are needed to refine these techniques into practical solutions.
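
The Track 1 defense, answering only yes/no allele-presence queries while withholding a subset of variants, can be sketched in a few lines. Variant positions and the withheld set below are illustrative, not drawn from any real dataset or from the winning entry.

```python
# Beacon-style service: yes/no allele-presence queries over a variant set,
# with a hidden subset withheld to blunt re-identification attacks.
variants = {("chr1", 10177, "AC"), ("chr2", 20301, "T"), ("chr7", 55249, "G")}
hidden = {("chr2", 20301, "T")}   # the winning entry hid ~11% of variation

def beacon_query(chrom, pos, allele):
    key = (chrom, pos, allele)
    return key in variants and key not in hidden

print(beacon_query("chr1", 10177, "AC"))  # True
print(beacon_query("chr2", 20301, "T"))   # False (present but withheld)
```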

  9. South Asia Water Resources Workshop: An effort to promote water quality data sharing in South Asia

    Energy Technology Data Exchange (ETDEWEB)

    RAJEN,GAURAV; BIRINGER,KENT L.; BETSILL,J. DAVID

    2000-04-01

    To promote cooperation in South Asia on environmental research, an international working group comprised of participants from Bangladesh, India, Nepal, Pakistan, Sri Lanka, and the US convened at the Soaltee Hotel in Kathmandu, Nepal, September 12 to 14, 1999. The workshop was sponsored in part by the Cooperative Monitoring Center (CMC) at Sandia National Laboratories in Albuquerque, New Mexico, through funding provided by the Department of Energy (DOE) Office of Nonproliferation and National Security. The CMC promotes collaborations among scientists and researchers in regions throughout the world as a means of achieving common regional security objectives. In the long term, the workshop organizers and participants are interested in the significance of regional information sharing as a means to build confidence and reduce conflict. The intermediate interests of the group focus on activities that might eventually foster regional management of some aspects of water resources utilization. The immediate purpose of the workshop was to begin the implementation phase of a project to collect and share water quality information at a number of river and coastal estuary locations throughout the region. The workshop participants achieved four objectives: (1) gaining a better understanding of the partner organizations involved; (2) garnering the support of existing regional organizations promoting environmental cooperation in South Asia; (3) identifying sites within the region at which data is to be collected; and (4) instituting a data and information collection and sharing process.

  10. Interactive Data Visualization for HIV Cohorts: Leveraging Data Exchange Standards to Share and Reuse Research Tools.

    Directory of Open Access Journals (Sweden)

    Meridith Blevins

    Full Text Available To develop and disseminate tools for interactive visualization of HIV cohort data. If a picture is worth a thousand words, then an interactive video, composed of a long string of pictures, can produce an even richer presentation of HIV population dynamics. We developed an HIV cohort data visualization tool using open-source software (the R statistical language). The tool requires that the data structure conform to the HIV Cohort Data Exchange Protocol (HICDEP), and our implementation utilized Caribbean, Central and South America network (CCASAnet) data. This tool currently presents patient-level data in three classes of plots: (1) longitudinal plots showing changes in measurements viewed alongside event probability curves, allowing for simultaneous inspection of outcomes by relevant patient classes; (2) bubble plots showing changes in indicators over time, allowing for observation of group-level dynamics; (3) heat maps of levels of indicators changing over time, allowing for observation of spatial-temporal dynamics. Examples of each class of plot are given using CCASAnet data investigating trends in CD4 count and AIDS at antiretroviral therapy (ART) initiation, CD4 trajectories after ART initiation, and mortality. We invite researchers interested in this data visualization effort to use these tools and to suggest new classes of data visualization. We aim to contribute additional shareable tools in the spirit of open scientific collaboration and hope that these tools further the participation in open data standards like HICDEP by the HIV research community.
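
The heat-map class of plot reduces to binning measurements into a matrix that a plotting library then renders. The sketch below builds such a matrix for CD4 counts by year since ART initiation; the data and bin edges are fabricated for illustration (the actual tool is written in R).

```python
# Bin (years-since-ART, CD4) pairs into a count matrix: rows = follow-up
# year, columns = CD4 strata. A plotting library would render this as a
# heat map.
measurements = [  # (years since ART initiation, CD4 cells/mm3)
    (0, 180), (0, 220), (1, 310), (1, 420), (2, 520), (2, 610),
]
year_bins = [0, 1, 2]
cd4_edges = [0, 200, 350, 500, 10_000]   # edges defining 4 CD4 strata

def heatmap_counts(data):
    grid = [[0] * (len(cd4_edges) - 1) for _ in year_bins]
    for year, cd4 in data:
        for j in range(len(cd4_edges) - 1):
            if cd4_edges[j] <= cd4 < cd4_edges[j + 1]:
                grid[year_bins.index(year)][j] += 1
    return grid

print(heatmap_counts(measurements))
# [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 0, 2]]
```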

  11. Grid-enabled measures: using Science 2.0 to standardize measures and share data.

    Science.gov (United States)

    Moser, Richard P; Hesse, Bradford W; Shaikh, Abdul R; Courtney, Paul; Morgan, Glen; Augustson, Erik; Kobrin, Sarah; Levin, Kerry Y; Helba, Cynthia; Garner, David; Dunn, Marsha; Coa, Kisha

    2011-05-01

    Scientists are taking advantage of the Internet and collaborative web technology to accelerate discovery in a massively connected, participative environment--a phenomenon referred to by some as Science 2.0. As a new way of doing science, this phenomenon has the potential to push science forward in a more efficient manner than was previously possible. The Grid-Enabled Measures (GEM) database has been conceptualized as an instantiation of Science 2.0 principles by the National Cancer Institute (NCI) with two overarching goals: (1) promote the use of standardized measures, which are tied to theoretically based constructs; and (2) facilitate the ability to share harmonized data resulting from the use of standardized measures. The first is accomplished by creating an online venue where a virtual community of researchers can collaborate together and come to consensus on measures by rating, commenting on, and viewing meta-data about the measures and associated constructs. The second is accomplished by connecting the constructs and measures to an ontological framework with data standards and common data elements such as the NCI Enterprise Vocabulary System (EVS) and the cancer Data Standards Repository (caDSR). This paper will describe the web 2.0 principles on which the GEM database is based, describe its functionality, and discuss some of the important issues involved with creating the GEM database, such as the role of mutually agreed-on ontologies (i.e., knowledge categories and the relationships among these categories) for data sharing. Published by Elsevier Inc.

  12. A Panel Data Analysis of the Impact of Macroeconomic Indicators on Firms’ Shares Performance in Nigeria

    Directory of Open Access Journals (Sweden)

    Michael S. Ogunmuyiwa

    2016-11-01

    Full Text Available This paper contributes to the ongoing debate on whether the impact of macroeconomic indicators on the stock market is positive, negative, or of no effect by analyzing the relationship between macroeconomic fundamentals and the performance of quoted firms on the Nigeria Stock Exchange market. A sample of fifty (50) quoted firms across eight (8) major sectors of the market was selected for the study. The static panel regression technique was employed on monthly data sourced from the Nigeria Stock Exchange (NSE) and the Central Bank of Nigeria (CBN) between 2007:1 and 2013:12. Empirical findings reveal that varying impacts exist between the macroeconomic indicators and firm share returns in Nigeria. They further affirm that the inflation rate, interest rate and exchange rate are the major significant macroeconomic indicators driving firm share returns in Nigeria.
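
A static panel regression with firm fixed effects can be sketched with the within-estimator: demean returns and an indicator by firm, then run OLS on the demeaned data. All numbers below are fabricated toy values (the study itself used 50 firms and several indicators).

```python
# Within-estimator sketch: remove firm fixed effects by demeaning each
# firm's series, then estimate the slope by OLS on the pooled residuals.
returns = {"FirmA": [0.02, 0.05, 0.03], "FirmB": [0.01, 0.04, 0.02]}
inflation = {"FirmA": [8.0, 11.0, 9.0], "FirmB": [8.0, 11.0, 9.0]}

def demean(series):
    m = sum(series) / len(series)
    return [v - m for v in series]

y = [v for f in returns for v in demean(returns[f])]
x = [v for f in inflation for v in demean(inflation[f])]
beta = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
print(round(beta, 4))  # 0.01
```

With these toy numbers the estimated slope is 0.01, i.e. a one-point rise in inflation associates with a one-percentage-point change in demeaned returns; a real analysis would add more regressors and robust standard errors.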

  13. The Climate-G testbed: towards a large scale data sharing environment for climate change

    Science.gov (United States)

    Aloisio, G.; Fiore, S.; Denvil, S.; Petitdidier, M.; Fox, P.; Schwichtenberg, H.; Blower, J.; Barbera, R.

    2009-04-01

    The Climate-G testbed provides an experimental large scale data environment for climate change addressing challenging data and metadata management issues. The main scope of Climate-G is to allow scientists to carry out geographical and cross-institutional climate data discovery, access, visualization and sharing. Climate-G is a multidisciplinary collaboration between climate and computer scientists, currently involving several partners: Centro Euro-Mediterraneo per i Cambiamenti Climatici (CMCC), Institut Pierre-Simon Laplace (IPSL), Fraunhofer Institut für Algorithmen und Wissenschaftliches Rechnen (SCAI), National Center for Atmospheric Research (NCAR), University of Reading, University of Catania and University of Salento. To perform distributed metadata search and discovery, we adopted a CMCC metadata solution (which provides a high level of scalability, transparency, fault tolerance and autonomy) leveraging both P2P and grid technologies (the GRelC Data Access and Integration Service). Moreover, data are available through OPeNDAP/THREDDS services, the Live Access Server and the OGC-compliant Web Map Service, and they can be downloaded, visualized and accessed in the proposed environment through the Climate-G Data Distribution Centre (DDC), the web gateway to the Climate-G digital library. The DDC is a data-grid portal allowing users to easily, securely and transparently perform search/discovery, metadata management, data access, data visualization, etc. Godiva2 (integrated into the DDC) displays 2D maps (and animations) and also exports maps for display on the Google Earth virtual globe. Presently, Climate-G publishes (through the DDC) about 2TB of data related to the ENSEMBLES project (also including distributed replicas of data) as well as to the IPCC AR4. The main results of the proposed work are: wide data access/sharing environment for climate change; P2P/grid metadata approach; production-level Climate-G DDC; high quality tools for

  14. Leveraging Crowdsourcing and Linked Open Data for Geoscience Data Sharing and Discovery

    Science.gov (United States)

    Narock, T. W.; Rozell, E. A.; Hitzler, P.; Arko, R. A.; Chandler, C. L.; Wilson, B. D.

    2013-12-01

    Data citation standards can form the basis for increased incentives, recognition, and rewards for scientists. Additionally, knowing which data were utilized in a particular publication can enhance discovery and reuse. Yet, a lack of data citation information in existing publications as well as ambiguities across datasets can limit the accuracy of automated linking approaches. We describe a crowdsourcing approach, based on Linked Open Data, in which AGU abstracts are linked to the data used in those presentations. We discuss our efforts to incentivize participants through promotion of their research, the role that the Semantic Web can play in this effort, and how this work differs from existing platforms such as Mendeley and ResearchGate. Further, we discuss the benefits and challenges of Linked Open Data as a technical solution including the role of provenance, trust, and computational reasoning.

  15. caNanoLab: data sharing to expedite the use of nanotechnology in biomedicine

    International Nuclear Information System (INIS)

    Gaheen, Sharon; Hinkal, George W; Morris, Stephanie A; Lijowski, Michal; Heiskanen, Mervi; Klemm, Juli D

    2013-01-01

    The use of nanotechnology in biomedicine involves the engineering of nanomaterials to act as therapeutic carriers, targeting agents and diagnostic imaging devices. The application of nanotechnology in cancer aims to transform early detection, targeted therapeutics and cancer prevention and control. To assist in expediting and validating the use of nanomaterials in biomedicine, the National Cancer Institute (NCI) Center for Biomedical Informatics and Information Technology, in collaboration with the NCI Alliance for Nanotechnology in Cancer (Alliance), has developed a data sharing portal called caNanoLab. caNanoLab provides access to experimental and literature curated data from the NCI Nanotechnology Characterization Laboratory, the Alliance and the greater cancer nanotechnology community. (paper)

  16. Using Social Media and Mobile Devices to Discover and Share Disaster Data Products Derived From Satellites

    Science.gov (United States)

    Mandl, Daniel; Cappelaere, Patrice; Frye, Stuart; Evans, John; Moe, Karen

    2014-01-01

    Data products derived from Earth observing satellites are difficult to find and share without specialized software and often times a highly paid and specialized staff. For our research effort, we endeavored to prototype a distributed architecture that depends on a standardized communication protocol and applications program interface (API) that makes it easy for anyone to discover and access disaster related data. Providers can easily supply the public with their disaster related products by building an adapter for our API. Users can use the API to browse and find products that relate to the disaster at hand, without a centralized catalogue, for example floods, and then are able to share that data via social media. Furthermore, a longerterm goal for this architecture is to enable other users who see the shared disaster product to be able to generate the same product for other areas of interest via simple point and click actions on the API on their mobile device. Furthermore, the user will be able to edit the data with on the ground local observations and return the updated information to the original repository of this information if configured for this function. This architecture leverages SensorWeb functionality [1] presented at previous IGARSS conferences. The architecture is divided into two pieces, the frontend, which is the GeoSocial API, and the backend, which is a standardized disaster node that knows how to talk to other disaster nodes, and also can communicate with the GeoSocial API. The GeoSocial API, along with the disaster node basic functionality enables crowdsourcing and thus can leverage insitu observations by people external to a group to perform tasks such as improving water reference maps, which are maps of existing water before floods. This can lower the cost of generating precision water maps. Keywords-Data Discovery, Disaster Decision Support, Disaster Management, Interoperability, CEOS WGISS Disaster Architecture

  17. OpenElectrophy: an electrophysiological data- and analysis-sharing framework

    Directory of Open Access Journals (Sweden)

    Samuel Garcia

    2009-05-01

    Full Text Available Progress in experimental tools and design is allowing the acquisition of increasingly large datasets. Storage, manipulation and efficient analysis of such large amounts of data are now primary issues. We present OpenElectrophy, an electrophysiological data- and analysis-sharing framework developed to fill this niche. It stores all experiment data and metadata in a single central MySQL database, and provides a graphical user interface to visualize and explore the data, and a library of functions for user analysis scripting in Python. It implements multiple spike sorting methods, and oscillation detection based on the ridge extraction methods of Roux et al. (2007). OpenElectrophy is open source and is freely available for download at http://neuralensemble.org/trac/OpenElectrophy.

  18. Cross-layer shared protection strategy towards data plane in software defined optical networks

    Science.gov (United States)

    Xiong, Yu; Li, Zhiqiang; Zhou, Bin; Dong, Xiancun

    2018-04-01

    In order to ensure reliable data transmission on the data plane and minimize resource consumption, a novel protection strategy towards the data plane is proposed for software defined optical networks (SDON). Firstly, we establish a SDON architecture with a hierarchical data plane, which divides the data plane into four layers to obtain fine-grained bandwidth resources. Then, we design cross-layer routing and resource allocation based on this network architecture. By jointly considering the bandwidth resources on all layers, the SDN controller can allocate bandwidth to working and backup paths in an economical manner. Next, we construct auxiliary graphs and transform the shared protection problem into the graph vertex coloring problem, so that resource consumption on backup paths can be reduced further. The simulation results demonstrate that the proposed protection strategy can achieve lower protection overhead and a higher resource utilization ratio.
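
The vertex-coloring reduction can be sketched as follows: backup paths become vertices, an edge joins two backups that cannot share spare capacity (e.g., their working paths may fail together), and a coloring assigns shared resource pools. The conflict graph below is illustrative, and greedy coloring is only a heuristic, not an optimal solver.

```python
# Greedy vertex coloring over a backup-path conflict graph: each color is a
# pool of spare capacity that the backups assigned to it can safely share.
conflicts = {            # adjacency: backups that must NOT share capacity
    "b1": {"b2"},
    "b2": {"b1", "b3"},
    "b3": {"b2"},
    "b4": set(),
}

def greedy_coloring(adj):
    color = {}
    for v in sorted(adj):                  # deterministic visiting order
        used = {color[u] for u in adj[v] if u in color}
        color[v] = next(c for c in range(len(adj)) if c not in used)
    return color

colors = greedy_coloring(conflicts)
print(colors)                      # {'b1': 0, 'b2': 1, 'b3': 0, 'b4': 0}
print(len(set(colors.values())))   # 2 shared pools instead of 4 dedicated ones
```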

  19. The Impact of Varying Statutory Arrangements on Spatial Data Sharing and Access in Regional NRM Bodies

    Science.gov (United States)

    Paudyal, D. R.; McDougall, K.; Apan, A.

    2014-12-01

    Spatial information plays an important role in many social, environmental and economic decisions and is increasingly acknowledged as a national resource essential for wider societal and environmental benefits. Natural resource management is one area where spatial information can be used for improved planning and decision-making processes. In Australia, state government organisations are the custodians of the spatial information necessary for natural resource management, and regional NRM bodies are responsible for the regional delivery of NRM activities. The access to and sharing of spatial information between government agencies and regional NRM bodies is therefore an important issue for improving natural resource management outcomes. The aim of this paper is to evaluate the current status of spatial information access, sharing and use under varying statutory arrangements and their impacts on spatial data infrastructure (SDI) development in the catchment management sector in Australia. Further, it critically examines whether any trends and significant variations exist due to different institutional arrangements (statutory versus non-statutory). A survey method was used to collect primary data from 56 regional natural resource management (NRM) bodies responsible for catchment management in Australia. Descriptive statistics were used to show the similarities and differences between statutory and non-statutory arrangements. The key factors which influence sharing of and access to spatial information are also explored. The results show that the current statutory and administrative arrangements and regional focus for natural resource management are reasonable from a spatial information management perspective and provide an opportunity for building SDI at the catchment scale. However, effective institutional arrangements should align catchment SDI development activities with sub-national and national SDI development activities to address catchment management issues. We found minor

  20. Building a Culture of Data Sharing: Policy Design and Implementation for Research Data Management in Development Research

    Directory of Open Access Journals (Sweden)

    Cameron Neylon

    2017-10-01

    Full Text Available A pilot project worked with seven existing projects funded by the International Development Research Centre of Canada (IDRC) to investigate the implementation of data management and sharing requirements within development research projects. The seven projects, which were selected to achieve a diversity of project types, locations, host institutions and subject areas, demonstrated a broad range of existing capacities to work with data and access to technical expertise and infrastructures. The pilot project provided an introduction to data management and sharing concepts, helped projects develop a Data Management Plan, and then observed the implementation of that plan. In examining the uptake of data management and sharing practice amongst these seven groups, the project came to question the underlying goals of funders in introducing data management and sharing requirements. It was established that the ultimate goal was a change in culture amongst grantees. The project therefore looked for evidence of how funder interventions might promote or hinder such cultural change. The project had two core findings. First, the shift from an aim of changing behaviour to one of changing culture has both subtle and profound implications for policy design and implementation. A particular finding is that the single point of contact that many data management and sharing policies create, where a Data Management Plan is required at grant submission but then not further utilised, is at best neutral and likely counterproductive in supporting change in researcher culture. As expected, there are significant bottlenecks within research institutions and for grantees in effectively sharing data, including a lack of resources and expertise. However, a core finding is that many of the bottlenecks for change relate to structural issues at the funder level. Specifically, the expectation that policy initiatives are implemented, monitored, and evaluated by Program Officers who are

  1. Integrating hydrologic modeling web services with online data sharing to prepare, store, and execute models in hydrology

    Science.gov (United States)

    Gan, T.; Tarboton, D. G.; Dash, P. K.; Gichamo, T.; Horsburgh, J. S.

    2017-12-01

    Web based apps, web services and online data and model sharing technology are becoming increasingly available to support research. This promises benefits in terms of collaboration, platform independence, transparency and reproducibility of modeling workflows and results. However, challenges still exist in real application of these capabilities and the programming skills researchers need to use them. In this research we combined hydrologic modeling web services with an online data and model sharing system to develop functionality to support reproducible hydrologic modeling work. We used HydroDS, a system that provides web services for input data preparation and execution of a snowmelt model, and HydroShare, a hydrologic information system that supports the sharing of hydrologic data, model and analysis tools. To make the web services easy to use, we developed a HydroShare app (based on the Tethys platform) to serve as a browser based user interface for HydroDS. In this integration, HydroDS receives web requests from the HydroShare app to process the data and execute the model. HydroShare supports storage and sharing of the results generated by HydroDS web services. The snowmelt modeling example served as a use case to test and evaluate this approach. We show that, after the integration, users can prepare model inputs or execute the model through the web user interface of the HydroShare app without writing program code. The model input/output files and metadata describing the model instance are stored and shared in HydroShare. These files include a Python script that is automatically generated by the HydroShare app to document and reproduce the model input preparation workflow. Once stored in HydroShare, inputs and results can be shared with other users, or published so that other users can directly discover, repeat or modify the modeling work. This approach provides a collaborative environment that integrates hydrologic web services with a data and model sharing
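The pattern described above, in which the HydroShare app issues web requests to HydroDS data-preparation services, can be sketched in Python. The host, service name, and parameter names below are hypothetical placeholders for illustration, not the actual HydroDS API:

```python
from urllib.parse import urlencode

def build_service_request(base_url, service, params):
    """Compose a URL for a hydrologic data-preparation web service.

    The endpoint and parameter names used by callers of this helper
    are illustrative placeholders, not the real HydroDS interface.
    """
    query = urlencode(sorted(params.items()))
    return f"{base_url}/{service}?{query}"

# e.g. request a bounding-box subset of a terrain raster (hypothetical call)
url = build_service_request(
    "https://hydrods.example.org/api",   # hypothetical host
    "subsetrastertobbox",                # hypothetical service name
    {"xmin": -111.8, "ymin": 41.7, "xmax": -111.5, "ymax": 41.9, "epsg": 4326},
)
```

A script composed of such calls, saved alongside the model inputs, is what makes the preparation workflow reproducible: rerunning the script regenerates the inputs.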

  2. Challenges and successes in developing a data sharing culture in the Gulf of Mexico following the Deepwater Horizon disaster.

    Science.gov (United States)

    Showalter, L. M.

    2017-12-01

The Gulf Research Program (GRP) was developed as part of legal settlements with the companies involved in the Deepwater Horizon (DWH) disaster. The Federal Government asked the National Academy of Sciences to establish a new program to fund and conduct activities to enhance offshore energy system safety and protect human health and the environment in the Gulf of Mexico and other regions along the U.S. outer continental shelf. An important part of the program is a commitment to open data and data sharing among the variety of disciplines it funds. The DWH disaster produced a major influx of funding for the Gulf region, and various groups and organizations are collaborating to ensure that the science being conducted via these funding streams is not duplicative. A number of data-focused subgroups have formed and are working to leverage existing efforts to strengthen data sharing and collaboration in the region. For its part, the GRP is developing a data program that encourages researchers to share data openly while providing avenues for acknowledgement of data sharing and research collaborations. A main problem with collaborative data sharing is often not the technologies available but the human component. The "traditional" path for scientific research has not generally involved making data widely or readily available in a short time frame. It takes considerable effort to challenge this norm and change the way researchers view data sharing and its value for them and the world at large. The GRP data program aims to build a community of researchers who not only share their data but also help show the value of this practice to the greater scientific community.
To this end, the GRP will support a variety of education and training opportunities to help develop a base of researchers more informed on issues related to open data and data sharing and working to leverage the technology and expertise of others to develop a culture of data sharing in the Gulf of Mexico.

  3. Distributed data networks: a blueprint for Big Data sharing and healthcare analytics.

    Science.gov (United States)

    Popovic, Jennifer R

    2017-01-01

    This paper defines the attributes of distributed data networks and outlines the data and analytic infrastructure needed to build and maintain a successful network. We use examples from one successful implementation of a large-scale, multisite, healthcare-related distributed data network, the U.S. Food and Drug Administration-sponsored Sentinel Initiative. Analytic infrastructure-development concepts are discussed from the perspective of promoting six pillars of analytic infrastructure: consistency, reusability, flexibility, scalability, transparency, and reproducibility. This paper also introduces one use case for machine learning algorithm development to fully utilize and advance the portfolio of population health analytics, particularly those using multisite administrative data sources. © 2016 New York Academy of Sciences.

  4. The PREVIEW Global Risk Data Platform: a geoportal to serve and share global data on risk to natural hazards

    Directory of Open Access Journals (Sweden)

    G. Giuliani

    2011-01-01

Full Text Available With growing world population and concentration in urban and coastal areas, the exposure to natural hazards is increasing and results in higher risk of human and economic losses. Improving the identification of areas, population and assets potentially exposed to natural hazards is essential to reduce the consequences of such events. Disaster risk is a function of hazard, exposure and vulnerability. Modelling risk at the global level requires accessing and processing a large amount of data from numerous collaborating centres.

These data need to be easily updated, and there is a need for centralizing access to this information as well as simplifying its use for non-GIS specialists. The Hyogo Framework for Action provides the mandate for data sharing, so that governments and international development agencies can take appropriate decisions for disaster risk reduction.

Timely access and easy integration of geospatial data are essential to support efforts in Disaster Risk Reduction. However, various issues in data availability, accessibility and integration limit the use of such data. Consequently, a framework that facilitates the sharing and exchange of geospatial data on natural hazards should improve the decision-making process. The PREVIEW Global Risk Data Platform is a highly interactive web-based GIS portal supported by a Spatial Data Infrastructure that offers free and interoperable access to more than 60 global data sets on nine types of natural hazards (tropical cyclones and related storm surges, drought, earthquakes, biomass fires, floods, landslides, tsunamis and volcanic eruptions) and related exposure and risk. The application provides an easy-to-use online interactive mapping interface so that users can easily work with it and seamlessly integrate data into their own data flow using fully compliant OGC Web Services (OWS).

  5. The PREVIEW Global Risk Data Platform: a geoportal to serve and share global data on risk to natural hazards

    Science.gov (United States)

    Giuliani, G.; Peduzzi, P.

    2011-01-01

With growing world population and concentration in urban and coastal areas, the exposure to natural hazards is increasing and results in higher risk of human and economic losses. Improving the identification of areas, population and assets potentially exposed to natural hazards is essential to reduce the consequences of such events. Disaster risk is a function of hazard, exposure and vulnerability. Modelling risk at the global level requires accessing and processing a large amount of data from numerous collaborating centres. These data need to be easily updated, and there is a need for centralizing access to this information as well as simplifying its use for non-GIS specialists. The Hyogo Framework for Action provides the mandate for data sharing, so that governments and international development agencies can take appropriate decisions for disaster risk reduction. Timely access and easy integration of geospatial data are essential to support efforts in Disaster Risk Reduction. However, various issues in data availability, accessibility and integration limit the use of such data. Consequently, a framework that facilitates the sharing and exchange of geospatial data on natural hazards should improve the decision-making process. The PREVIEW Global Risk Data Platform is a highly interactive web-based GIS portal supported by a Spatial Data Infrastructure that offers free and interoperable access to more than 60 global data sets on nine types of natural hazards (tropical cyclones and related storm surges, drought, earthquakes, biomass fires, floods, landslides, tsunamis and volcanic eruptions) and related exposure and risk. The application provides an easy-to-use online interactive mapping interface so that users can easily work with it and seamlessly integrate data into their own data flow using fully compliant OGC Web Services (OWS).
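Interoperable access via OGC Web Services typically means issuing standard requests such as WMS GetMap. The sketch below builds such a request; the server URL and layer name are hypothetical, while the parameter names follow the WMS 1.3.0 specification:

```python
from urllib.parse import urlencode

def wms_getmap_url(endpoint, layer, bbox, width, height, crs="EPSG:4326"):
    """Build a standard OGC WMS 1.3.0 GetMap request URL."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return f"{endpoint}?{urlencode(params)}"

url = wms_getmap_url(
    "https://preview.example.org/wms",  # hypothetical server
    "flood_hazard",                     # hypothetical layer name
    bbox=(-10.0, 35.0, 30.0, 60.0), width=800, height=500,
)
```

Because the request is just a standard URL, any compliant client (desktop GIS, web map, or script) can fetch the same map layer, which is what makes the platform's data reusable in users' own workflows.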

  6. The Nanomaterial Registry: facilitating the sharing and analysis of data in the diverse nanomaterial community

    Directory of Open Access Journals (Sweden)

    Ostraat ML

    2013-09-01

Full Text Available Michele L Ostraat, Karmann C Mills, Kimberly A Guzan, Damaris Murry; RTI International, Durham, NC, USA. Abstract: The amount of data being generated in the nanotechnology research space is significant, and the coordination, sharing, and downstream analysis of the data is complex and consistently deliberated. The complexities of the data are due in large part to the inherently complicated characteristics of nanomaterials. Also, testing protocols and assays used for nanomaterials are diverse and lacking standardization. The Nanomaterial Registry has been developed to address such challenges as the need for standard methods, data formatting, and controlled vocabularies for data sharing. The Registry is an authoritative, web-based tool whose purpose is to simplify the community's level of effort in assessing nanomaterial data from environmental and biological interaction studies. Because the Registry is meant to be an authoritative resource, all data-driven content is systematically archived and reviewed by subject-matter experts. To support and advance nanomaterial research, a set of minimal information about nanomaterials (MIAN) has been developed and is foundational to the Registry data model. The MIAN has been used to create evaluation and similarity criteria for nanomaterials that are curated into the Registry. The Registry is a publicly available resource that is being built through collaborations with many stakeholder groups in the nanotechnology community, including industry, regulatory, government, and academia. Features of the Registry website (https://www.nanomaterialregistry.org/) currently include search, browse, side-by-side comparison of nanomaterials, compliance ratings based on the quality and quantity of data, and the ability to search for similar nanomaterials within the Registry. This paper is a modification and extension of a proceedings paper for the Institute of Electrical and Electronics Engineers. Keywords: nanoinformatics

  7. NREL Develops OpenEI.org, a Public Website Where Energy Data can be Generated, Shared, and Compared (Fact Sheet)

    Energy Technology Data Exchange (ETDEWEB)

    2013-12-01

    The National Renewable Energy Laboratory (NREL) has developed OpenEI.org, a public, open, data-sharing platform where consumers, analysts, industry experts, and energy decision makers can go to boost their energy IQs, search for energy data, share data, and get access to energy applications. The free site blends elements of social media, linked open-data practices, and MediaWiki-based technology to build a collaborative environment for creating and sharing energy data with the world. The result is a powerful platform that is helping government and industry leaders around the world define policy options, make informed investment decisions, and create new businesses.

  8. Research and implementation of geography service bus in spatial data sharing platform

    Science.gov (United States)

    Zou, Zhiqiang; Nan, Jiang; Lin, Tao; Bai, Mingbai; He, Xingfu

    2006-10-01

Geographic Information System (GIS) software is widely used in business; however, interoperability among heterogeneous GIS has become a challenge. This paper presents a solution based on a Geography Service Bus that uses web services to achieve interoperability among heterogeneous GIS, allowing users to share geosciences data as well as access services. Referring to the abstract specification of OWS (OGC Web Services), the proposed solution adopts SOA (Service-Oriented Architecture) in implementing the SDSP (Spatial Data Sharing Platform). To accomplish this, a new abstraction layer, the GSB (Geography Service Bus), is created to provide standard interfaces. The GSB extends the ESB (Enterprise Service Bus) proposed by IBM and Sun and adapts it to geographic applications. The GSB inherits the general features of an ESB, such as interoperability, heterogeneity and service orientation, while offering unique functions such as high-volume geo-data access and better management of geographic services. The GSB includes the following Java-implemented components: the management component of the geography registry service, the route component of the geography request service, the geographical business process component, etc. The GSB plays an important role in the SDSP and has been developed and successfully applied in the Data Center for Resources & Environmental Sciences in East China as a key project of the Chinese Academy of Sciences. The introduction of the GSB has greatly improved both the performance and the interoperability of the SDSP among heterogeneous GIS compared with traditional methods.

  9. Industry Contributions to Seafloor Mapping: Building Partnerships for Collecting, Sharing, and Compiling Data

    Science.gov (United States)

    Brumley, K. J.; Mitchell, G. A.; Millar, D.; Saade, E. J.; Gharib, J. J.

    2017-12-01

In an effort to map the remaining 85% of the world's seafloor, The Nippon Foundation and GEBCO have launched Seabed 2030 to provide high-resolution bathymetry for all ocean waters by the year 2030. This ambitious effort will require sharing of bathymetric information to build a global baseline bathymetry database. Multibeam echosounder (MBES) data are a promising source of data for Seabed 2030. These data benefit multiple users: they include not only bathymetric information but also valuable backscatter data (useful for determining seafloor characteristics), as well as water column data, which can be used to explore other aspects of the marine environment and potentially help constrain some of the ocean's methane flux estimates. Fugro provides global survey services for clients in the oil and gas, telecommunications, and infrastructure industries, and for state and federal agencies. With a global fleet of survey vessels and autonomous vehicles equipped with state-of-the-art MBES systems, Fugro has performed some of the world's largest offshore surveys over the past several years, mapping close to 1,000,000 km2 of seafloor per year with high-resolution MBES data using multi-vessel operational models and new methods for merging datasets from different multibeam sonar systems. Although most of these data are proprietary, Fugro is working with clients in the private sector to make data available to the Seabed 2030 project at a decimated resolution of 100 m. The company is also contributing the MBES data acquired during transits to survey locations. Fugro has also partnered with the Shell Ocean Discovery XPRIZE to support development of new rapid, unmanned, high-resolution ocean mapping technologies that can benefit understanding of the world's oceans. Collaborative approaches such as these are helping to establish a new standard for other industry contributions, and to facilitate a new outlook for data sharing among the public and private sectors.
Recognizing the importance of an

  10. Can We Drink the Water? Data Sharing Lessons From the Great Lakes

    Science.gov (United States)

    Aufdenkampe, A. K.; Paige, K.; Slawecki, T. A.

    2017-12-01

    The Great Lakes Observing System (GLOS) is one of 11 regional associations of the Integrated Ocean Observing System (IOOS). Over time, GLOS has built a reputation as a trusted data aggregator and resource for managers, policy makers and recreational boaters in the region. This was evidenced best when, in response to the 2014 Lake Erie harmful algal bloom event, local stakeholders including universities, state government, and municipal water managers turned to GLOS as a repository for sharing and finding data. The IOOS Certification process, required under the authority of the Integrated Coastal and Ocean Observation System Act of 2009 (ICOOS Act), further legitimizes these data assembly centers that serve as valuable coordinators of data for their regions.

  11. A Distributed Architecture for Sharing Ecological Data Sets with Access and Usage Control Guarantees

    DEFF Research Database (Denmark)

    Bonnet, Philippe; Gonzalez, Javier; Granados, Joel Andres

    2014-01-01

new insights, there are significant barriers to the realization of this vision. One of the key challenges is to allow scientists to share their data widely while retaining some form of control over who accesses this data (access control) and, more importantly, how it is used (usage control). Access and usage control is necessary to enforce existing open data policies. We have proposed the vision of trusted cells: a decentralized infrastructure, based on secure hardware running on devices equipped with trusted execution environments at the edges of the Internet. We originally described the utilization ... data sets with access and usage control guarantees. We rely on examples from terrestrial research and monitoring in the Arctic in the context of the INTERACT project.

  12. The Hierarchical Data Format as a Foundation for Community Data Sharing

    Science.gov (United States)

    Habermann, T.

    2017-12-01

    Hierarchical Data Format (HDF) formats and libraries have been used by individual researchers and major science programs across many Earth and Space Science disciplines and sectors to provide high-performance information storage and access for several decades. Generic group, dataset, and attribute objects in HDF have been combined in many ways to form domain objects that scientists understand and use. Well-known applications of HDF in the Earth Sciences include thousands of global satellite observations and products produced by NASA's Earth Observing System using the HDF-EOS conventions, navigation quality bathymetry produced as Bathymetric Attributed Grids (BAGs) by the OpenNavigationSurface project and others, seismic wave collections written into the Adoptable Seismic Data Format (ASDF) and many oceanographic and atmospheric products produced using the climate-forecast conventions with the netCDF4 data model and API to HDF5. This is the modus operandi of these communities: 1) develop a model of scientific data objects and associated metadata used in a domain, 2) implement that model using HDF, 3) develop software libraries that connect that model to tools and 4) encourage adoption of those tools in the community. Understanding these domain object implementations and facilitating communication across communities is an important goal of The HDF Group. We will discuss these examples and approaches to community outreach during this session.
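The layering described here, generic groups, datasets, and attributes combined into domain objects, can be illustrated with a toy object model in plain Python. This is purely a sketch of the data model; real code would use a library such as h5py, and the CF-style names below are only an example:

```python
class Group:
    """Minimal stand-in for HDF5's generic objects: a group holds child
    groups and datasets, and both groups and datasets carry attributes."""

    def __init__(self, attrs=None):
        self.attrs = dict(attrs or {})
        self.children = {}

    def create_group(self, name, attrs=None):
        g = Group(attrs)
        self.children[name] = g
        return g

    def create_dataset(self, name, data, attrs=None):
        self.children[name] = {"data": list(data), "attrs": dict(attrs or {})}

# A domain convention built from the generic objects (climate-forecast style)
root = Group(attrs={"Conventions": "CF-1.8"})
grid = root.create_group("grid")
grid.create_dataset("temperature", [280.1, 281.4, 279.9],
                    attrs={"units": "K", "standard_name": "air_temperature"})
```

The point of conventions like HDF-EOS, BAG, ASDF, or CF is exactly this: the names and attribute vocabularies layered on top of the generic objects are what tools in a community agree on.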

  13. Feasibility of an International Multiple Sclerosis Rehabilitation Data Repository: Perceived Challenges and Motivators for Sharing Data.

    Science.gov (United States)

    Bradford, Elissa Held; Baert, Ilse; Finlayson, Marcia; Feys, Peter; Wagner, Joanne

    2018-01-01

Multiple sclerosis (MS) rehabilitation evidence is limited due to methodological factors, which may be addressed by a data repository. We describe the perceived challenges of, motivators for, interest in participating in, and key features of an international MS rehabilitation data repository. A multimethod sequential investigation was performed, with the results of two focus groups (using nominal group technique) and the study aims informing the development of an online questionnaire. Percentage agreement and key quotations illustrated questionnaire findings. Subgroup comparisons were made between clinicians and researchers and between participants in North America and Europe. Rehabilitation professionals from 25 countries participated (focus groups: n = 21; questionnaire: n = 166). The top ten challenges (C) and motivators (M) identified by the focus groups were database control/management (C); ethical/legal concerns (C); data quality (C); time, effort, and cost (C); best practice (M); uniformity (C); sustainability (C); deeper analysis (M); collaboration (M); and identifying research needs (M). Percentage agreement with questionnaire statements regarding challenges to, motivators for, interest in, and key features of a successful repository was at least 80%, 85%, 72%, and 83%, respectively, across each group of statements. Questionnaire subgroup analysis revealed a few significant differences. Findings support clinician and researcher interest in and potential for success of an international MS rehabilitation data repository if prioritized challenges and motivators are addressed and key features are included.

  14. The data harvest how sharing research data can yield knowledge, jobs and growth : an RDA Europe report

    CERN Document Server

    Moran, Nuala

    2014-01-01

In October 2010, the High Level Group on Scientific Data presented the "Riding the Wave" report to the European Commission, outlining a series of policy recommendations on how Europe could gain from the rising tide of scientific data. More than four years later, a team of European experts has produced a new report, "The Data Harvest: How sharing research data can yield knowledge, jobs and growth", updating the landscape described in the previous report and sounding a warning that Europe must act now to secure its standing in future data markets. In this report, we outline the benefits and challenges, and offer recommendations to European policy makers. The seeds have been sown. Now is the time to plan the harvest.

  15. Patient Perceptions About Data Sharing & Privacy: Insights from ActionADE.

    Science.gov (United States)

    Small, Serena S; Peddie, David; Ackerley, Christine; Hohl, Corinne M; Balka, Ellen

    2017-01-01

    Information communication technologies (ICTs) may improve health delivery by enhancing informational continuity of care and enabling secondary use of health data including public health surveillance and research. ICTs also introduce concerns related to privacy. In this paper, we examine and address this tension in the context of the development and implementation of a novel platform that will enable the documentation and communication of patient-specific ADE information, titled ActionADE. We explored privacy concerns qualitatively from the perspective of patients. Our findings will inform a series of recommendations for system design that seek to balance the need to both share and protect personal health information.

  16. Real-time sharing of gaze data between multiple eye trackers-evaluation, tools, and advice.

    Science.gov (United States)

    Nyström, Marcus; Niehorster, Diederick C; Cornelissen, Tim; Garde, Henrik

    2017-08-01

Technological advancements in combination with significant reductions in price have made it practically feasible to run experiments with multiple eye trackers. This enables new types of experiments with simultaneous recordings of eye movement data from several participants, which is of interest for researchers in, e.g., social and educational psychology. The Lund University Humanities Laboratory recently acquired 25 remote eye trackers, which are connected over a local wireless network. As a first step toward running experiments with this setup, demanding situations with real-time sharing of gaze data were investigated in terms of network performance as well as clock and screen synchronization. Results show that data can be shared with a sufficiently low packet loss (0.1%) and latency (M = 3 ms, MAD = 2 ms) across 8 eye trackers at a rate of 60 Hz. For a similar performance using 24 computers, the send rate needs to be reduced to 20 Hz. To help researchers conduct similar measurements on their own multi-eye-tracker setup, open-source software written in Python and PsychoPy is provided. Part of the software contains a minimal working example to help researchers kick-start experiments with two or more eye trackers.
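A minimal sketch of sharing gaze samples over a local network, here as JSON datagrams over UDP on the loopback interface. The message format is an illustrative choice made for this sketch, not necessarily the wire format of the cited software:

```python
import json
import socket

def encode_sample(tracker_id, x, y, timestamp):
    """Serialize one gaze sample (normalized screen coordinates) for transport."""
    return json.dumps({"id": tracker_id, "x": x, "y": y, "t": timestamp}).encode()

def decode_sample(packet):
    return json.loads(packet.decode())

# Loopback demo: one "tracker" sends a sample, a "receiver" reads it back.
recv_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv_sock.bind(("127.0.0.1", 0))        # let the OS pick a free port
addr = recv_sock.getsockname()

send_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_sock.sendto(encode_sample("et01", 0.42, 0.58, 1.234), addr)

recv_sock.settimeout(2.0)
sample = decode_sample(recv_sock.recvfrom(1024)[0])
send_sock.close()
recv_sock.close()
```

UDP keeps per-sample latency low at the cost of possible packet loss, which matches the trade-off measured in the study (low loss and a few milliseconds of latency at 60 Hz).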

  17. MedBlock: Efficient and Secure Medical Data Sharing Via Blockchain.

    Science.gov (United States)

    Fan, Kai; Wang, Shangyang; Ren, Yanhui; Li, Hui; Yang, Yintang

    2018-06-21

With the development of electronic information technology, electronic medical records (EMRs) have become a common way to store patients' data in hospitals. However, even for the same patient, records are stored in different hospitals' databases. It is therefore difficult to construct a summarized EMR for one patient from multiple hospital databases due to security and privacy concerns. Meanwhile, current EMR systems lack a standard data management and sharing policy, making it difficult for pharmaceutical scientists to develop precise medicines based on data obtained under different policies. To solve these problems, we propose a blockchain-based information management system, MedBlock, to handle patients' information. In this scheme, the distributed ledger of MedBlock allows efficient EMR access and retrieval. The improved consensus mechanism achieves consensus on EMRs without large energy consumption or network congestion. In addition, MedBlock exhibits high information security by combining customized access control protocols and symmetric cryptography. MedBlock can play an important role in sensitive medical information sharing.
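The tamper-evidence property of a ledger like MedBlock's rests on hash-chaining records: each block's hash covers both its record and the previous block's hash. The sketch below shows only this idea, using the standard library; MedBlock's actual consensus mechanism, access control protocols, and encryption are not modelled here:

```python
import hashlib
import json

def _block_hash(record, prev_hash):
    # Canonical serialization so the same record always hashes identically.
    payload = json.dumps({"record": record, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def build_chain(records):
    """Chain entries by hash; the genesis block links to an all-zero hash."""
    chain, prev = [], "0" * 64
    for rec in records:
        h = _block_hash(rec, prev)
        chain.append({"record": rec, "prev": prev, "hash": h})
        prev = h
    return chain

def verify_chain(chain):
    """Recompute every hash; any modified record breaks the chain."""
    prev = "0" * 64
    for block in chain:
        if block["prev"] != prev or block["hash"] != _block_hash(block["record"], prev):
            return False
        prev = block["hash"]
    return True
```

Because each hash depends on the entire history before it, altering one EMR entry invalidates every later block, which is what makes after-the-fact tampering detectable.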

  18. Interoperable web applications for sharing data and products of the International DORIS Service

    Science.gov (United States)

    Soudarin, L.; Ferrage, P.

    2017-12-01

    The International DORIS Service (IDS) was created in 2003 under the umbrella of the International Association of Geodesy (IAG) to foster scientific research related to the French satellite tracking system DORIS and to deliver scientific products, mostly related to the International Earth rotation and Reference systems Service (IERS). Since its start, the organization has continuously evolved, leading to additional and improved operational products from an expanded set of DORIS Analysis Centers. In addition, IDS has developed services for sharing data and products with the users. Metadata and interoperable web applications are proposed to explore, visualize and download the key products such as the position time series of the geodetic points materialized at the ground tracking stations. The Global Geodetic Observing System (GGOS) encourages the IAG Services to develop such interoperable facilities on their website. The objective for GGOS is to set up an interoperable portal through which the data and products produced by the IAG Services can be served to the user community. We present the web applications proposed by IDS to visualize time series of geodetic observables or to get information about the tracking ground stations and the tracked satellites. We discuss the future plans for IDS to meet the recommendations of GGOS. The presentation also addresses the needs for the IAG Services to adopt common metadata thesaurus to describe data and products, and interoperability standards to share them.

  19. Archiving, sharing, processing and publishing historical earthquakes data: the IT point of view

    Science.gov (United States)

    Locati, Mario; Rovida, Andrea; Albini, Paola

    2014-05-01

Digital tools devised for seismological data are mostly designed for handling instrumentally recorded data. Researchers working on historical seismology are forced to perform their daily work using general-purpose tools and/or coding their own to address their specific tasks. The lack of out-of-the-box tools expressly conceived to deal with historical data leads to a huge amount of time lost performing tedious tasks: searching for the data and manually reformatting it in order to move from one tool to another, sometimes causing a loss of the original data. This reality is common to all activities related to the study of earthquakes of past centuries, from the interpretation of historical sources to the compilation of earthquake catalogues. A platform able to preserve historical earthquake data, trace back their sources, and fulfil many common tasks was very much needed. In the framework of two European projects (NERIES and SHARE) and one global project (Global Earthquake History, GEM), two new data portals were designed and implemented. The European portal "Archive of Historical Earthquakes Data" (AHEAD) and the worldwide "Global Historical Earthquake Archive" (GHEA) are aimed at addressing at least some of the above-mentioned issues. The availability of these new portals and their well-defined standards makes the development of side tools for archiving, publishing and processing the available historical earthquake data easier than before. The AHEAD and GHEA portals, their underlying technologies and the developed side tools are presented.

  20. The SEEK: a platform for sharing data and models in systems biology.

    Science.gov (United States)

    Wolstencroft, Katy; Owen, Stuart; du Preez, Franco; Krebs, Olga; Mueller, Wolfgang; Goble, Carole; Snoep, Jacky L

    2011-01-01

Systems biology research is typically performed by multidisciplinary groups of scientists, often in large consortia and in distributed locations. The data generated in these projects tend to be heterogeneous and often involve high-throughput "omics" analyses. Models are developed iteratively from data generated in the projects and from the literature. Consequently, there is a growing requirement for exchanging experimental data, mathematical models, and scientific protocols between consortium members and a necessity to record and share the outcomes of experiments and the links between data and models. The overall output of a research consortium is also a valuable commodity in its own right. The research and associated data and models should eventually be available to the whole community for reuse and future analysis. The SEEK is an open-source, Web-based platform designed for the management and exchange of systems biology data and models. The SEEK was originally developed for the SysMO (systems biology of microorganisms) consortia, but the principles and objectives are applicable to any systems biology project. The SEEK provides an index of consortium resources and acts as a gateway to other tools and services commonly used in the community. For example, the model simulation tool, JWS Online, has been integrated into the SEEK, and a plug-in to PubMed allows publications to be linked to supporting data and author profiles in the SEEK. The SEEK is a pragmatic solution to data management which encourages, but does not force, researchers to share and disseminate their data in community standard formats. It provides tools to assist with management and annotation as well as incentives and added value for following these recommendations. Data exchange and reuse rely on sufficient annotation, consistent metadata descriptions, and the use of standard exchange formats for models, data, and the experiments they are derived from. In this chapter, we present the SEEK platform

  1. Analysis of personal data-sharing consent factors, with focus on loyalty programs in the Czech Republic

    Directory of Open Access Journals (Sweden)

    Tomas Formanek

    2018-05-01

    The purpose of this study is to provide a structured, topical and representative analysis of personal data-sharing preferences in the Czech Republic. Within the context of personal data sharing and protection, we focus on profiling individuals who voluntarily share their personal data with good-faith corporate entities. Loyalty program operators serve as a common and representative model of commercially driven collection and processing of personal data. We address different types of personal data and factors affecting individual data-sharing consent. Our original research is based on primary survey data (806 respondents surveyed during 2017). Multiple quantitative methods, such as hierarchical clustering and logistic regression, are employed in the analysis. An important part of our research is also based on the evaluation of structured in-depth interviews focused on personal data sharing and protection topics. We find pronounced socio-demographic differences in individual propensity to share one’s personal data with commercial data processors. The main findings and contrasting factors are pointed out and discussed within the paper. Our analysis reflects the needs of academic and corporate researchers, to whom it provides actionable and stratified results, especially in the context of the new EU legislation on personal data protection, the GDPR.
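
    The analysis pipeline described above (clustering and logistic regression on surveyed consent outcomes) can be illustrated with a minimal, self-contained sketch. The features, toy data and coefficients below are purely hypothetical and are not the study's variables or results:

```python
# Hypothetical sketch: modelling the probability that a respondent consents to
# personal data sharing from socio-demographic features, in the spirit of the
# study's logistic-regression analysis. All features and data are invented.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Plain batch gradient-descent logistic regression; w[0] is the intercept."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            grad[0] += p - yi
            for j, xj in enumerate(xi):
                grad[j + 1] += (p - yi) * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def predict(w, xi):
    """Predicted probability that this respondent consents to data sharing."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))

# Toy respondents: [age / 100, member of a loyalty program]; 1 = gave consent.
X = [[0.25, 1], [0.30, 1], [0.80, 0], [0.90, 0], [0.40, 1], [0.70, 0]]
y = [1, 1, 0, 0, 1, 0]
w = fit_logistic(X, y)
```

    On the toy data the model learns that loyalty-program members are far more likely to consent, mirroring the kind of stratified, socio-demographic result the study reports.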

  2. Sharing and re-use of phylogenetic trees (and associated data) to facilitate synthesis.

    Science.gov (United States)

    Stoltzfus, Arlin; O'Meara, Brian; Whitacre, Jamie; Mounce, Ross; Gillespie, Emily L; Kumar, Sudhir; Rosauer, Dan F; Vos, Rutger A

    2012-10-22

    Recently, various evolution-related journals adopted policies to encourage or require archiving of phylogenetic trees and associated data. Such attention to practices that promote sharing of data reflects rapidly improving information technology, and rapidly expanding potential to use this technology to aggregate and link data from previously published research. Nevertheless, little is known about current practices, or best practices, for publishing trees and associated data so as to promote re-use. Here we summarize results of an ongoing analysis of current practices for archiving phylogenetic trees and associated data, current practices of re-use, and current barriers to re-use. We find that the technical infrastructure is available to support rudimentary archiving, but the frequency of archiving is low. Currently, most phylogenetic knowledge is not easily re-used due to a lack of archiving, lack of awareness of best practices, and lack of community-wide standards for formatting data, naming entities, and annotating data. Most attempts at data re-use seem to end in disappointment. Nevertheless, we find many positive examples of data re-use, particularly those that involve customized species trees generated by grafting to, and pruning from, a much larger tree. The technologies and practices that facilitate data re-use can catalyze synthetic and integrative research. However, success will require engagement from various stakeholders including individual scientists who produce or consume shareable data, publishers, policy-makers, technology developers and resource-providers. The critical challenges for facilitating re-use of phylogenetic trees and associated data, we suggest, include: a broader commitment to public archiving; more extensive use of globally meaningful identifiers; development of user-friendly technology for annotating, submitting, searching, and retrieving data and their metadata; and development of a minimum reporting standard (MIAPA) indicating

  3. Supporting Effective Data Sharing and Re-Use: What Can Funders Really Do?

    Science.gov (United States)

    Uhle, M. E.

    2017-12-01

    Most research funding agencies have data policies that grantees must abide by to receive financial support for projects and activities. These policies, however, are typically not uniform, can be inconsistent and, in some cases, can be contradictory, preventing national and international collaboration. In addition, disciplinary divisions within a single agency may implement agency policy differently. These barriers are particularly profound for the multi-, inter- and/or transdisciplinary research needed to address many global environmental challenges. Recognizing the crucial role of open and effective data and information exchange in supporting effective international transdisciplinary research for understanding, mitigating and adapting to global environmental change, the Belmont Forum adopted its Open Data Policy and Principles in 2015. This policy signals a commitment by these 25 funders to increase access to scientific data, a step widely recognized as essential to making informed decisions in the face of rapid changes affecting the Earth's environment. Through collaborative research actions and community-driven activities, the Belmont Forum seeks to widen access to data and promote its long-term preservation in global change research; encourage re-use of existing data; help improve data management and exploitation; coordinate and integrate disparate organizational and technical elements; fill critical global e-infrastructure gaps; share best practices; and foster new data literacy.

  4. Designing a scalable video-on-demand server with data sharing

    Science.gov (United States)

    Lim, Hyeran; Du, David H. C.

    2001-01-01

    As current disk space and transfer speed increase, the bandwidth between a server and its disks has become critical for video-on-demand (VOD) services. Our VOD server consists of several hosts sharing data on disks through a ring-based network. Data sharing provided by the spatial-reuse ring network between servers and disks not only increases utilization towards the full bandwidth but also improves the availability of videos. Striping and replication methods are introduced in order to improve the efficiency of our VOD server system as well as the availability of videos. We consider two kinds of resources in a VOD server system. Given a representative access profile, we propose an algorithm to find an initial configuration that successfully places videos on the disks in the system. If any copy of a video cannot be placed due to lack of resources, more servers/disks are added. When all videos are placed on the disks by our algorithm, the final configuration is determined, together with an indicator of how tolerant it is to fluctuations in video demand. Although the underlying problem is NP-hard, our algorithm generates the final configuration in O(M log M) time at best, where M is the number of movies.
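
    The abstract does not spell out the placement algorithm beyond its O(M log M) bound, so the following is only a plausible greedy sketch in the same spirit: sort videos by expected demand (the O(M log M) step), place each on the disk with the most spare bandwidth, and add a disk when nothing fits. Every name and the strategy itself are assumptions, not the paper's method:

```python
# Assumed greedy sketch, not the paper's actual algorithm: first-fit-decreasing
# placement of videos onto disks by expected bandwidth demand.
import heapq

def place_videos(demands, disk_capacity):
    """Return a (video -> disk) placement and the number of disks used."""
    assert max(demands) <= disk_capacity, "a single video must fit on one disk"
    # Sort videos by demand, largest first: the O(M log M) step.
    order = sorted(range(len(demands)), key=lambda m: -demands[m])
    disks = [(-disk_capacity, 0)]   # max-heap of (free bandwidth, disk id)
    placement, n_disks = {}, 1
    for m in order:
        neg_free, d = heapq.heappop(disks)
        free = -neg_free
        if demands[m] > free:       # even the emptiest disk is full: add one
            heapq.heappush(disks, (neg_free, d))
            d, free = n_disks, disk_capacity
            n_disks += 1
        placement[m] = d
        heapq.heappush(disks, (-(free - demands[m]), d))
    return placement, n_disks
```

    For example, `place_videos([5, 3, 3, 2], 6)` packs a total demand of 13 bandwidth units onto three disks, never exceeding any disk's capacity.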

  5. The Unstructured Data Sharing System for Natural resources and Environment Science Data of the Chinese Academy of Science

    Directory of Open Access Journals (Sweden)

    Dafang Zhuang

    2007-10-01

    The data sharing system for the resource and environment science databases of the Chinese Academy of Sciences (CAS) has an open three-tiered architecture, which integrates the geographical databases of about 9 institutes of CAS through mechanisms for distributed unstructured data management, metadata integration, catalogue services, and security control. The data tier consists of several distributed data servers, located in each CAS institute, that support such unstructured data formats as vector files, remote sensing images or other raster files, documents, multimedia files, tables, and other formats. For spatial data files, a format transformation service is provided. The middle tier involves a centralized metadata server, which stores metadata records for the data on all data servers. The primary function of this tier is the catalog service, supporting the creation, search, browsing, updating, and deletion of catalogs. The client tier involves an integrated client that provides end-users with interfaces to search, browse, and download data, or to create a catalog and upload data.
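
    The middle tier's catalog service (creation, search, browsing, updating and deletion of metadata records) can be sketched as a minimal in-memory analogue. The field names below are illustrative assumptions, not the system's actual metadata schema:

```python
class MetadataCatalog:
    """Toy stand-in for the centralized catalog service: one metadata record
    per dataset held on the distributed data servers, with create, search,
    update and delete operations. All field names are illustrative."""

    def __init__(self):
        self._records = {}
        self._next_id = 1

    def create(self, title, data_server, data_format, keywords):
        """Register a dataset and return its catalog record id."""
        rec_id = self._next_id
        self._next_id += 1
        self._records[rec_id] = {
            "title": title,
            "data_server": data_server,  # which institute's server holds it
            "format": data_format,       # e.g. vector, raster, document, table
            "keywords": set(keywords),
        }
        return rec_id

    def search(self, keyword):
        """Return ids of all records tagged with the keyword."""
        return [rid for rid, r in self._records.items()
                if keyword in r["keywords"]]

    def update(self, rec_id, **fields):
        self._records[rec_id].update(fields)

    def delete(self, rec_id):
        del self._records[rec_id]
```

    A client-tier search then reduces to looking up the catalog and fetching the file from whichever institute's data server the record points to.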

  6. Sharing regulatory data as tools for strengthening health systems in the Region of the Americas

    Directory of Open Access Journals (Sweden)

    Varley Dias Sousa

    Regulatory transparency is an imperative characteristic of a reliable National Regulatory Authority. In the region of the Americas, the process of building an open government is still fragile and fragmented across various Health Regulatory Agencies (HRAs) and Regional Reference Authorities (RRAs). This study assessed the transparency status of RRAs, focusing on various medicine life-cycle documents (the Medicine Dossier, Clinical Trial Report, and Inspection Report) as tools for strengthening health systems. Based on a narrative (nonsystematic) review of RRA regulatory transparency, transparency status was classified as one of two types: public disclosure of information (intra-agency data) and data- and work-sharing (inter-agency data). The risks/benefits of public disclosure of medicine-related information were assessed, taking into account 1) the involvement and roles of multiple stakeholders (health care professionals, regulators, industry, community, and academics) and 2) the protection of commercial and personal confidential data. Inter-agency data- and work-sharing was evaluated in the context of harmonization and cooperation projects that focus on regulatory convergence. Technical and practical steps for establishing an openness directive for the pharmaceutical regulatory environment are proposed to improve and strengthen health systems in the Americas. Addressing these challenges requires leadership from entities such as the Pan American Health Organization to steer and support collaborative regional alliances that advance the development and establishment of a trustworthy regulatory environment and a sustainable public health system in the Americas, using international successful initiatives as reference and taking into account the domestic characteristics and experiences of each individual country.

  7. Sharing regulatory data as tools for strengthening health systems in the Region of the Americas.

    Science.gov (United States)

    Sousa, Varley Dias; Ramalho, Pedro I; Silveira, Dâmaris

    2016-05-01

    Regulatory transparency is an imperative characteristic of a reliable National Regulatory Authority. In the region of the Americas, the process of building an open government is still fragile and fragmented across various Health Regulatory Agencies (HRAs) and Regional Reference Authorities (RRAs). This study assessed the transparency status of RRAs, focusing on various medicine life-cycle documents (the Medicine Dossier, Clinical Trial Report, and Inspection Report) as tools for strengthening health systems. Based on a narrative (nonsystematic) review of RRA regulatory transparency, transparency status was classified as one of two types: public disclosure of information (intra-agency data) and data- and work-sharing (inter-agency data). The risks/benefits of public disclosure of medicine-related information were assessed, taking into account 1) the involvement and roles of multiple stakeholders (health care professionals, regulators, industry, community, and academics) and 2) the protection of commercial and personal confidential data. Inter-agency data- and work-sharing was evaluated in the context of harmonization and cooperation projects that focus on regulatory convergence. Technical and practical steps for establishing an openness directive for the pharmaceutical regulatory environment are proposed to improve and strengthen health systems in the Americas. Addressing these challenges requires leadership from entities such as the Pan American Health Organization to steer and support collaborative regional alliances that advance the development and establishment of a trustworthy regulatory environment and a sustainable public health system in the Americas, using international successful initiatives as reference and taking into account the domestic characteristics and experiences of each individual country.

  8. Optimizing primary care research participation: a comparison of three recruitment methods in data-sharing studies.

    Science.gov (United States)

    Lord, Paul A; Willis, Thomas A; Carder, Paul; West, Robert M; Foy, Robbie

    2016-04-01

    Recruitment of representative samples in primary care research is essential to ensure high-quality, generalizable results. This is particularly important for research using routinely recorded patient data to examine the delivery of care. Yet little is known about how different recruitment strategies influence the characteristics of the practices included in research. We describe three approaches for recruiting practices to data-sharing studies, examining differences in recruitment levels and practice representativeness. We examined three studies that included varying populations of practices from West Yorkshire, UK. All used anonymized patient data to explore aspects of clinical practice. Recruitment strategies were 'opt-in', 'mixed opt-in and opt-out' and 'opt-out'. We compared aggregated practice data between recruited and not-recruited practices for practice list size, deprivation, chronic disease management, patient experience and rates of unplanned hospital admission. The opt-out strategy had the highest recruitment (80%), followed by mixed (70%) and opt-in (58%). Practices opting in were larger (median 7153 versus 4722 patients, P = 0.03) than practices that declined to opt in. Practices recruited by the mixed approach were larger (median 7091 versus 5857 patients, P = 0.04) and differed in the clinical quality measure (58.4% versus 53.9% of diabetic patients with HbA1c ≤ 59 mmol/mol). Researchers should, with appropriate ethical safeguards, consider opt-out recruitment of practices for studies involving anonymized patient data sharing. © The Author 2016. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  9. A shared data approach more accurately represents the rates and patterns of violence with injury assaults.

    Science.gov (United States)

    Gray, Benjamin J; Barton, Emma R; Davies, Alisha R; Long, Sara J; Roderick, Janine; Bellis, Mark A

    2017-12-01

    To investigate whether sharing and linking routinely collected violence data across health and criminal justice systems can provide a more comprehensive understanding of violence, establish patterns of under-reporting and better inform the development, implementation and evaluation of violence prevention initiatives. Police violence with injury (VWI) crimed data and emergency department (ED) assault attendee data for South Wales were collected between 1 April 2014 and 31 March 2016 to examine the rates and patterns of VWI. Person identifiable data (PID) were cross-referenced to establish whether certain victims or events were less likely to be reported to criminal justice services. A total of 18 316 police crimed VWI victims and 10 260 individual ED attendances with an assault-related injury were considered. The majority of ED assault attendances (59.0%) were unknown to police. The key demographic under-reporting to police was young males aged 18-34 years, while a significant proportion of non-reported assaults involved a stranger. The combined monthly age-standardised rates were recalculated and were on average 74.7 (95% CI 72.1 to 77.2) and 66.1 (95% CI 64.0 to 68.2) per 100 000 population for males and females, respectively. Consideration of the additional ED cases resulted in increases of 35.3% and 18.1% over the original police totals for male and female VWI victims. This study identified that violence is currently undermeasured, demonstrated the importance of continued sharing of routinely collected ED data and highlighted the benefits of using PID from a number of services in a linked way to provide a more comprehensive picture of violence. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
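
    The cross-referencing of person-identifiable data (PID) between the ED and police datasets is, in essence, record linkage. A toy sketch of the idea follows, using hashed pseudonyms so raw identifiers are never exchanged; the abstract does not describe the study's actual linkage procedure, so every detail here is illustrative:

```python
# Illustrative record linkage: estimate the share of ED assault attendances
# unknown to police by intersecting pseudonymised identifiers. Real linkage
# would use keyed/salted hashing, more fields, and fuzzy matching.
import hashlib

def pseudonymise(name, dob):
    """Deterministic pseudonym derived from person-identifiable fields."""
    return hashlib.sha256(f"{name.lower()}|{dob}".encode()).hexdigest()

def share_unknown_to_police(ed_records, police_records):
    """Fraction of ED attendances with no matching police record."""
    police_ids = {pseudonymise(name, dob) for name, dob in police_records}
    unknown = [r for r in ed_records if pseudonymise(*r) not in police_ids]
    return len(unknown) / len(ed_records)

# Invented example: four ED attendances, one also known to police.
ed = [("Alice Smith", "1990-01-01"), ("Bob Jones", "1985-06-30"),
      ("Carol White", "1999-12-12"), ("Dan Brown", "1992-03-04")]
police = [("Alice Smith", "1990-01-01")]
```

    Applied to the toy lists, three of the four ED attendances are unknown to police, which is exactly the kind of under-reporting estimate the study derives at scale.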

  10. The Virtual Seismic Atlas Project: sharing the interpretation of seismic data

    Science.gov (United States)

    Butler, R.; Mortimer, E.; McCaffrey, B.; Stuart, G.; Sizer, M.; Clayton, S.

    2007-12-01

    Through the activities of academic research programs, national institutions and corporations, especially oil and gas companies, a substantial volume of seismic reflection data has been acquired. Although the majority is proprietary and confidential, there are significant volumes of data that are potentially within the public domain and available for research. Yet the community is poorly connected to these data, and consequently geological and other research using seismic reflection data is limited to very few groups of researchers. This is about to change. The Virtual Seismic Atlas (VSA) is generating an independent, free-to-use, community-based internet resource that captures and shares the geological interpretation of seismic data globally. Images and associated documents are explicitly indexed using not only existing survey and geographical data but also the geology they portray. By using "Guided Navigation" to search, discover and retrieve images, users are exposed to arrays of geological analogues that provide novel insights and opportunities for research and education. The VSA goes live, with evolving content and functionality, through 2008. There are opportunities for designed integration with other global data programs in the earth sciences.

  11. caNanoLab: data sharing to expedite the use of nanotechnology in biomedicine

    Science.gov (United States)

    Gaheen, Sharon; Hinkal, George W.; Morris, Stephanie A.; Lijowski, Michal; Heiskanen, Mervi

    2014-01-01

    The use of nanotechnology in biomedicine involves the engineering of nanomaterials to act as therapeutic carriers, targeting agents and diagnostic imaging devices. The application of nanotechnology in cancer aims to transform early detection, targeted therapeutics and cancer prevention and control. To assist in expediting and validating the use of nanomaterials in biomedicine, the National Cancer Institute (NCI) Center for Biomedical Informatics and Information Technology, in collaboration with the NCI Alliance for Nanotechnology in Cancer (Alliance), has developed a data sharing portal called caNanoLab. caNanoLab provides access to experimental and literature curated data from the NCI Nanotechnology Characterization Laboratory, the Alliance and the greater cancer nanotechnology community. PMID:25364375

  12. How Do We Measure Value in Data Reuse? Ethical Data Sharing for the Social Sciences and Indigenous Knowledge

    Science.gov (United States)

    Strawhacker, C.

    2017-12-01

    As a result of the `open data' movement, the question of how data should be attributed and cited has become increasingly important. As data are reused in analyses not performed by the original creator, efforts have turned to crediting that creator, for example through data citation and metrics of reuse that ensure appropriate attribution to the original data author. This increased focus on metrics and citation, however, needs to be carefully considered when it comes to social science data, local observations, and Indigenous Knowledge held by Indigenous communities. These diverse and sometimes sensitive data/information/knowledge sets often require deep nuance, thought, and compromise within the `open data' framework, in order to consider issues of the confidentiality of research subjects and the ownership of data and information, often in a colonial context. Furthermore, these datasets are often highly valuable to one or two villages, saving lives and retaining culture within them. In these cases, quantitative metrics of "data reuse" and citation do not adequately measure a dataset's `value.' On this panel, I will provide examples of datasets that are highly valuable to small communities, drawn from my research in the Arctic and US Southwest. These datasets are not highly cited and do not have impressive quantitative metrics (e.g., number of downloads), but they have been incredibly valuable to the communities where the data/information/Knowledge are held. These cases include atlases of placenames held by elders in small Arctic communities, as well as databases of local observations of wildlife and sea ice in Alaska that are essential for sharing knowledge across multiple villages. These examples suggest that a more nuanced approach to understanding how data should be credited would be useful when working with social science data and Indigenous Knowledge.

  13. Social Cultural Data - Social Impacts of Catch Shares in the West Coast Groundfish Fishery

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — Catch shares are one method of catch allocation utilized by fisheries managers in the United States West Coast groundfish fishery. Catch share management results in...

  14. Finding our way: On the sharing and reuse of animal telemetry data in Australasia

    International Nuclear Information System (INIS)

    Campbell, Hamish A.; Beyer, Hawthorne L.; Dennis, Todd E.; Dwyer, Ross G.; Forester, James D.; Fukuda, Yusuke; Lynch, Catherine; Hindell, Mark A.; Menke, Norbert; Morales, Juan M.; Richardson, Craig; Rodgers, Essie; Taylor, Graeme; Watts, Matt E.; Westcott, David A.

    2015-01-01

    The presence and movements of organisms both reflect and influence the distribution of ecological resources in space and time. The monitoring of animal movement by telemetry devices is being increasingly used to inform management of marine, freshwater and terrestrial ecosystems. Here, we brought together academics and environmental managers to determine the extent of animal movement research in the Australasian region, and assess the opportunities and challenges in the sharing and reuse of these data. This working group was formed under the Australian Centre for Ecological Analysis and Synthesis (ACEAS), whose overall aim was to facilitate trans-organisational and transdisciplinary synthesis. We discovered that between 2000 and 2012 at least 501 peer-reviewed scientific papers were published that report animal location data collected by telemetry devices from within the Australasian region. Collectively, this involved the capture and electronic tagging of 12 656 animals. The majority of studies were undertaken to address specific management questions; rarely were these data used beyond their original intent. We estimate that approximately half (~ 500) of all animal telemetry projects undertaken remained unpublished, a similar proportion were not discoverable via online resources, and less than 8.8% of all animals tagged and tracked had their data stored in a discoverable and accessible manner. Animal telemetry data contain a wealth of information about how animals and species interact with each other and the landscapes they inhabit. These data are expensive and difficult to collect and can reduce survivorship of the tagged individuals, which implies an ethical obligation to make the data available to the scientific community. This is the first study to quantify the gap between telemetry devices placed on animals and findings/data published, and presents methods for improvement. 
Instigation of these strategies will enhance the cost-effectiveness of the research and

  15. Finding our way: On the sharing and reuse of animal telemetry data in Australasia

    Energy Technology Data Exchange (ETDEWEB)

    Campbell, Hamish A., E-mail: hamish.campbell@une.edu.au [Department of Ecosystem Management, School of Environment and Rural Sciences, University of New England, Armidale, NSW (Australia); Beyer, Hawthorne L. [ARC Centre of Excellence for Environmental Decisions, Centre for Biodiversity & Conservation Science, University of Queensland, Brisbane, QLD (Australia); Dennis, Todd E. [School of Biological Sciences, University of Auckland, Private Bag 92019, Auckland (New Zealand); Dwyer, Ross G. [School of Biological Sciences, University of Queensland, St Lucia, QLD (Australia); Forester, James D. [Dept. Fisheries, Wildlife, and Conservation Biology, University of Minnesota, St. Paul, MN (United States); Fukuda, Yusuke [Department of Land Resource Management, PO Box 496, Palmerston, NT (Australia); Lynch, Catherine [Arid Recovery, PO Box 147, Roxby Downs, SA (Australia); Hindell, Mark A. [University of Tasmania, Hobart, TAS (Australia); Menke, Norbert [Queensland Department of Science, Information Technology, Innovation and the Arts, Brisbane, QLD (Australia); Morales, Juan M. [Ecotono, INIBIOMA—CONICET, Universidad Nacional del Comahue, Quintral 1250, 8400 Bariloche (Argentina); Richardson, Craig [Ecological Resources Information Network, Department of the Environment, Canberra, ACT (Australia); Rodgers, Essie [School of Biological Sciences, University of Queensland, St Lucia, QLD (Australia); Taylor, Graeme [Department of Conservation, PO Box 10420, Wellington 6143 (New Zealand); Watts, Matt E. [ARC Centre of Excellence for Environmental Decisions, Centre for Biodiversity & Conservation Science, University of Queensland, Brisbane, QLD (Australia); Westcott, David A. [Commonwealth Scientific and Industrial Research Organisation, PO Box 780, Atherton, QLD (Australia)

    2015-11-15

    The presence and movements of organisms both reflect and influence the distribution of ecological resources in space and time. The monitoring of animal movement by telemetry devices is being increasingly used to inform management of marine, freshwater and terrestrial ecosystems. Here, we brought together academics and environmental managers to determine the extent of animal movement research in the Australasian region, and assess the opportunities and challenges in the sharing and reuse of these data. This working group was formed under the Australian Centre for Ecological Analysis and Synthesis (ACEAS), whose overall aim was to facilitate trans-organisational and transdisciplinary synthesis. We discovered that between 2000 and 2012 at least 501 peer-reviewed scientific papers were published that report animal location data collected by telemetry devices from within the Australasian region. Collectively, this involved the capture and electronic tagging of 12 656 animals. The majority of studies were undertaken to address specific management questions; rarely were these data used beyond their original intent. We estimate that approximately half (~ 500) of all animal telemetry projects undertaken remained unpublished, a similar proportion were not discoverable via online resources, and less than 8.8% of all animals tagged and tracked had their data stored in a discoverable and accessible manner. Animal telemetry data contain a wealth of information about how animals and species interact with each other and the landscapes they inhabit. These data are expensive and difficult to collect and can reduce survivorship of the tagged individuals, which implies an ethical obligation to make the data available to the scientific community. This is the first study to quantify the gap between telemetry devices placed on animals and findings/data published, and presents methods for improvement. 
Instigation of these strategies will enhance the cost-effectiveness of the research and

  16. Global bike share: What the data tells us about road safety

    NARCIS (Netherlands)

    Fishman, E.; Schepers, J.P.

    Introduction: Bike share has emerged as a rapidly growing mode of transport in over 800 cities globally, up from just a handful in the 1990s. Some analysts had forecast a rise in the number of bicycle crashes after the introduction of bike share, but empirical research on bike share safety is rare.

  17. Better governance, better access: practising responsible data sharing in the METADAC governance infrastructure.

    Science.gov (United States)

    Murtagh, Madeleine J; Blell, Mwenza T; Butters, Olly W; Cowley, Lorraine; Dove, Edward S; Goodman, Alissa; Griggs, Rebecca L; Hall, Alison; Hallowell, Nina; Kumari, Meena; Mangino, Massimo; Maughan, Barbara; Mills, Melinda C; Minion, Joel T; Murphy, Tom; Prior, Gillian; Suderman, Matthew; Ring, Susan M; Rogers, Nina T; Roberts, Stephanie J; Van der Straeten, Catherine; Viney, Will; Wiltshire, Deborah; Wong, Andrew; Walker, Neil; Burton, Paul R

    2018-04-26

    Genomic and biosocial research data about individuals is rapidly proliferating, bringing the potential for novel opportunities for data integration and use. The scale, pace and novelty of these applications raise a number of urgent sociotechnical, ethical and legal questions, including optimal methods of data storage, management and access. Although the open science movement advocates unfettered access to research data, many of the UK's longitudinal cohort studies operate systems of managed data access, in which access is governed by legal and ethical agreements between stewards of research datasets and researchers wishing to make use of them. Amongst other things, these agreements aim to respect the reasonable expectations of the research participants who provided data and samples, as expressed in the consent process. Arguably, responsible data management and governance of data and sample use are foundational to the consent process in longitudinal studies and are an important source of trustworthiness in the eyes of those who contribute data to genomic and biosocial research. This paper presents an ethnographic case study exploring the foundational principles of a governance infrastructure for Managing Ethico-social, Technical and Administrative issues in Data ACcess (METADAC), which are operationalised through a committee known as the METADAC Access Committee. METADAC governs access to phenotype, genotype and 'omic' data and samples from five UK longitudinal studies. Using the example of METADAC, we argue that three key structural features are foundational for practising responsible data sharing: independence and transparency; interdisciplinarity; and participant-centric decision-making. We observe that the international research community is proactively working towards optimising the use of research data, integrating/linking these data with routine data generated by health and social care services and other administrative data services to improve the analysis

  18. The national drug abuse treatment clinical trials network data share project: website design, usage, challenges, and future directions.

    Science.gov (United States)

    Shmueli-Blumberg, Dikla; Hu, Lian; Allen, Colleen; Frasketi, Michael; Wu, Li-Tzy; Vanveldhuisen, Paul

    2013-01-01

    There are many benefits of data sharing, including the promotion of new research through effective use of existing data, replication of findings through re-analysis of pooled data files, meta-analysis using individual patient data, and reinforcement of open scientific inquiry. A randomized controlled trial is considered the 'gold standard' for establishing treatment effectiveness, but clinical trial research is very costly, and sharing data is an opportunity to expand the investment in a clinical trial beyond its original goals at minimal cost. We describe the goals, development, and usage of the Data Share website (http://www.ctndatashare.org) for the National Drug Abuse Treatment Clinical Trials Network (CTN) in the United States, including lessons learned, limitations, major revisions, and considerations for future directions to improve data sharing. Data management and programming procedures were conducted to produce uniform and Health Insurance Portability and Accountability Act (HIPAA)-compliant de-identified research data files from the completed trials of the CTN for archiving, managing, and sharing on the Data Share website. Since its inception in 2006 and through October 2012, nearly 1700 downloads of data from 27 clinical trials have been made from the Data Share website, with use increasing over the years. Individuals from 31 countries have downloaded data from the website, and at least 13 publications have derived from analyses of data obtained through the public Data Share website. Minimal control over data requests and usage has resulted in little information about, and little control over, how the data from the website are used. Lack of uniformity in the data elements collected across CTN trials has limited cross-study analyses. The Data Share website offers researchers easy access to de-identified data files with the goal of promoting additional research and identifying new findings from completed CTN studies. To maximize the utility of the website

  19. Calculation of retention time tolerance windows with absolute confidence from shared liquid chromatographic retention data.

    Science.gov (United States)

    Boswell, Paul G; Abate-Pella, Daniel; Hewitt, Joshua T

    2015-09-18

    Compound identification by liquid chromatography-mass spectrometry (LC-MS) is a tedious process, mainly because authentic standards must be run on a user's system to be able to confidently reject a potential identity from its retention time and mass spectral properties. Instead, it would be preferable to use shared retention time/index data to narrow down the identity, but shared data cannot be used to reject candidates with an absolute level of confidence because the data are strongly affected by differences between HPLC systems and experimental conditions. However, a technique called "retention projection" was recently shown to account for many of the differences. In this manuscript, we discuss an approach to calculate appropriate retention time tolerance windows for projected retention times, potentially making it possible to exclude candidates with an absolute level of confidence, without needing to have authentic standards of each candidate on hand. In a range of multi-segment gradients and flow rates run among seven different labs, the new approach calculated tolerance windows that were significantly more appropriate for each retention projection than global tolerance windows calculated for retention projections or linear retention indices. Though there were still some small differences between the labs that evidently were not taken into account, the calculated tolerance windows only needed to be relaxed by 50% to make them appropriate for all labs. Even then, 42% of the tolerance windows calculated in this study without standards were narrower than those required by WADA for positive identification, where standards must be run contemporaneously. Copyright © 2015 Elsevier B.V. All rights reserved.
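    The acceptance test described above reduces to a window check around the projected retention time; the times and tolerance below are illustrative, not values from the study:

```python
def within_window(measured: float, projected: float, tolerance: float) -> bool:
    """A candidate identity survives only if the measured retention time
    falls inside the projected time +/- its calculated tolerance window.
    (The study found windows needed ~50% relaxation to cover all labs.)"""
    return abs(measured - projected) <= tolerance

# Illustrative values in minutes; real windows come from the retention
# projection procedure, not fixed constants.
assert within_window(measured=12.43, projected=12.50, tolerance=0.15)
assert not within_window(measured=13.10, projected=12.50, tolerance=0.15)
```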

  20. OLYMPUS DISS - A Readily Implemented Geographic Data and Information Sharing System

    Science.gov (United States)

    Necsoiu, D. M.; Winfrey, B.; Murphy, K.; McKague, H. L.

    2002-12-01

    Electronic information technology has become a crucial component of business, government, and scientific organizations. In this technology era, many enterprises are moving away from the perception that information repositories are only a tool for decision-making. Instead, many organizations are learning that information systems, which are capable of organizing and following the interrelations between information and both the short-term and strategic organizational goals, are assets themselves, with inherent value. Olympus Data and Information Sharing System (DISS) is a system developed at the Center for Nuclear Waste Regulatory Analyses (CNWRA) to address several difficult tasks associated with the management of geographical, geological and geophysical data. Three of the tasks were to (1) gather the large amount of heterogeneous information that has accumulated over the operational lifespan of CNWRA, (2) store the data in a central, knowledge-based, searchable database and (3) create quick, easy, convenient, and reliable access to that information. Faced with these difficult tasks, CNWRA identified the requirements for designing such a system. Key design criteria were: (a) ability to ingest different data formats (i.e., raster, vector, and tabular data); (b) minimal expense using open-source and commercial off-the-shelf software; (c) seamless management of geospatial data, freeing up time for researchers to focus on analyses or algorithm development, rather than on time-consuming format conversions; (d) controlled access; and (e) scalable architecture to meet new and continuing demands. Olympus DISS is a solution that can be easily adapted to small and mid-size enterprises dealing with heterogeneous geographic data. It uses established data standards, provides a flexible mechanism to build applications upon, and outputs geographic data in multiple, clear ways. This abstract is an independent product of the CNWRA and does not necessarily reflect the views or

  1. It's all in the timing: calibrating temporal penalties for biomedical data sharing.

    Science.gov (United States)

    Xia, Weiyi; Wan, Zhiyu; Yin, Zhijun; Gaupp, James; Liu, Yongtai; Clayton, Ellen Wright; Kantarcioglu, Murat; Vorobeychik, Yevgeniy; Malin, Bradley A

    2018-01-01

    Biomedical science is driven by datasets that are being accumulated at an unprecedented rate, with ever-growing volume and richness. There are various initiatives to make these datasets more widely available to recipients who sign Data Use Certificate agreements, whereby penalties are levied for violations. A particularly popular penalty is the temporary revocation, often for several months, of the recipient's data usage rights. This policy is based on the assumption that the value of biomedical research data depreciates significantly over time; however, no studies have been performed to substantiate this belief. This study investigates whether that assumption holds true and examines the implications for data-sharing policy. It tests the hypothesis that the value of data for scientific investigators, in terms of the impact of the publications based on the data, decreases over time. The hypothesis is tested formally through a linear mixed-effects model using approximately 1200 publications between 2007 and 2013 that used datasets from the Database of Genotypes and Phenotypes, a data-sharing initiative of the National Institutes of Health. The analysis shows that the impact factors for publications based on Database of Genotypes and Phenotypes datasets depreciate in a statistically significant manner. However, we further discover that the depreciation rate is slow, only ∼10% per year, on average. The enduring value of data for subsequent studies implies that revoking usage for short periods of time may not sufficiently deter those who would violate Data Use Certificate agreements and that alternative penalty mechanisms may need to be invoked. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com
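    The policy implication follows from simple arithmetic: assuming the reported ~10% annual depreciation and smooth exponential decay, a short revocation forfeits very little data value:

```python
# If data value depreciates ~10% per year, a 6-month revocation forfeits
# only a small slice of value. Smooth exponential decay assumed.
ANNUAL_DEPRECIATION = 0.10

def value_retained(years: float) -> float:
    """Fraction of the data's original value remaining after `years`."""
    return (1 - ANNUAL_DEPRECIATION) ** years

print(round(value_retained(0.5), 3))  # 0.949: ~95% of value survives a 6-month ban
```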

  2. A Large-Scale Initiative Inviting Patients to Share Personal Fitness Tracker Data with Their Providers: Initial Results.

    Directory of Open Access Journals (Sweden)

    Joshua M Pevnick

    Full Text Available Personal fitness trackers (PFTs) have substantial potential to improve healthcare. To quantify and characterize early adopters who shared their PFT data with providers, we used bivariate statistics and logistic regression to compare patients who shared any PFT data with patients who did not. A patient portal was used to invite 79,953 registered portal users to share their data. Of 66,105 users included in our analysis, 499 (0.8%) uploaded data during an initial 37-day study period. Bivariate and regression analyses showed that early adopters were more likely than non-adopters to be younger, male, white, health system employees, and to have higher BMIs. Neither comorbidities nor utilization predicted adoption. Our results demonstrate that patients had little intrinsic desire to share PFT data with their providers, and suggest that patients most at risk for poor health outcomes are least likely to share PFT data. Marketing, incentives, and/or cultural change may be needed to induce such data sharing.
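    The headline adoption rate follows directly from the figures in the abstract; the odds-ratio helper illustrates the kind of bivariate 2x2 comparison reported, with hypothetical subgroup counts:

```python
# Adoption rate among analyzed portal users (figures from the abstract).
adopters, analyzed = 499, 66105
rate = adopters / analyzed
print(f"{rate:.1%}")  # 0.8%

def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """2x2 odds ratio: a,b = adopters/non-adopters with a trait;
    c,d = adopters/non-adopters without it. Counts here are hypothetical."""
    return (a * d) / (b * c)

print(odds_ratio(20, 80, 10, 90))  # 2.25
```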

  3. On the evolving portfolio of community-standards and data sharing policies: turning challenges into new opportunities

    Directory of Open Access Journals (Sweden)

    Sansone Susanna-Assunta

    2012-07-01

    Full Text Available Abstract There are thousands of biology databases with hundreds of terminologies, reporting guidelines, representations models, and exchange formats to help annotate, report, and share bioscience investigations. It is evident, however, that researchers and bioinformaticians struggle to navigate the various standards and to find the appropriate database to collect, manage, and share data. Further, policy makers, funders, and publishers lack sufficient information to formulate their guidelines. In this paper, we highlight a number of key issues that can be used to turn these challenges into new opportunities. It is time for all stakeholders to work together to reconcile cause and effect and make the data-sharing culture functional and efficient.

  4. Who Got All of My Personal Data? Enabling Users to Monitor the Proliferation of Shared Personally Identifiable Information

    OpenAIRE

    Labitzke , Sebastian

    2011-01-01

    Part 4: Privacy and Transparency in the Age of Cloud Computing; International audience; The risk involved when users publish information, which becomes available to an unintentional broad audience via online social networks is evident. It is especially difficult for users of social networks to determine who will get the information before it is shared. Moreover, it is impossible to monitor data flows or to control the access to personal data after sharing the information. In contrast to enter...

  5. panMetaDocs and DataSync - providing a convenient way to share and publish research data

    Science.gov (United States)

    Ulbricht, D.; Klump, J. F.

    2013-12-01

    any XML-based metadata schema. To reduce manual entry of metadata to a minimum and make use of contextual information in a project setting, metadata fields can be populated with static or dynamic content. Access rights can be defined to control visibility and access to stored objects. Notifications about recently updated datasets are available by RSS and e-mail, and the entire inventory can be harvested via OAI-PMH. panMetaDocs is optimized to be harvested by panFMP [4]. panMetaDocs is able to mint dataset DOIs through DataCite and uses eSciDoc's REST API to transfer eSciDoc objects from the non-public 'pending' status to the published status 'released', which makes data and metadata of the published object available worldwide through the internet. The application scenario presented here shows the adaptation of open-source applications for data sharing and publication. An eSciDoc repository is used as storage for data and metadata. DataSync serves as a file ingester and distributor, whereas panMetaDocs' main function is to annotate the dataset files with metadata to make them ready for publication and sharing with one's own team, or with the scientific community.
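    OAI-PMH harvesting, mentioned above, is just an HTTP GET with a `verb` parameter. A minimal sketch, using a hypothetical endpoint rather than an actual panMetaDocs URL:

```python
from urllib.parse import urlencode

def oai_request(base_url: str, **params: str) -> str:
    """Build an OAI-PMH request URL; the protocol is plain HTTP GET
    with a 'verb' parameter selecting the operation."""
    return f"{base_url}?{urlencode(params)}"

# Hypothetical endpoint; ListRecords + oai_dc is the standard full harvest.
url = oai_request("https://example.org/oai",
                  verb="ListRecords", metadataPrefix="oai_dc")
print(url)  # https://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
```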

  6. Key exchange using biometric identity based encryption for sharing encrypted data in cloud environment

    Science.gov (United States)

    Hassan, Waleed K.; Al-Assam, Hisham

    2017-05-01

    The main problem associated with using symmetric/asymmetric keys is how to securely store and exchange keys between parties over open networks, particularly in open environments such as cloud computing. Public Key Infrastructure (PKI) has provided a practical solution for session key exchange for many web services. The key limitation of the PKI solution is not only the need for a trusted third party (e.g., a certificate authority) but also the absence of a link between the data owner and the encryption keys. The latter is arguably more important where access to data needs to be linked to the identity of the owner. Currently available key exchange protocols depend on using trusted couriers or secure channels, which can be subject to man-in-the-middle and various other attacks. This paper proposes a new protocol for Key Exchange using Biometric Identity Based Encryption (KE-BIBE) that enables parties to securely exchange cryptographic keys even when an adversary is monitoring the communication channel between them. The proposed protocol combines biometrics with IBE in order to provide a secure way to access symmetric keys based on the identity of the users in an unsecure environment. In the KE-BIBE protocol, the message is first encrypted by the data owner using a traditional symmetric key before migrating it to cloud storage. The symmetric key is then encrypted, based on Fuzzy Identity-Based Encryption, using the public biometrics of the users selected by the data owner to decrypt the message. Only the selected users will be able to decrypt the message by providing a fresh sample of their biometric data. The paper argues that the proposed solution eliminates the need for a key distribution centre in traditional cryptography. It also gives the data owner the power of fine-grained sharing of encrypted data by controlling who can access their data.
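    The envelope structure of such a protocol (data under a symmetric key; the key wrapped under an identity-derived secret) can be sketched with a toy XOR keystream. This is NOT secure and is only a stand-in for real symmetric encryption and for Fuzzy IBE:

```python
import hashlib

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy XOR cipher keyed by SHA-256 output -- NOT secure; a stand-in
    for real symmetric encryption and for biometric IBE key wrapping."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

# 1) Owner encrypts the message under a symmetric key before cloud upload.
sym_key = b"random-session-key"
ciphertext = keystream_xor(sym_key, b"patient dataset")
# 2) The symmetric key is wrapped under a secret tied to the recipient's
#    identity (here just a hash of a biometric template -- hypothetical).
identity_secret = hashlib.sha256(b"alice-biometric-template").digest()
wrapped_key = keystream_xor(identity_secret, sym_key)
# 3) Recipient recovers the key, then the message (XOR is its own inverse).
recovered_key = keystream_xor(identity_secret, wrapped_key)
assert keystream_xor(recovered_key, ciphertext) == b"patient dataset"
```

    Real Fuzzy IBE additionally tolerates noise between biometric samples, which a hash-derived secret cannot.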

  7. PRODUCE MORE OIL AND GAS VIA eBUSINESS DATA SHARING

    Energy Technology Data Exchange (ETDEWEB)

    Paul Jehn; Mike Stettner

    2004-04-30

    GWPC, DOGGR, and other state agencies propose to build eBusiness applications based on a .NET front-end user interface for the DOE's Energy 100 Award-winning Risk Based Data Management System (RBDMS) data source and XML Web services. This project will slash the costs of regulatory compliance by automating routine regulatory reporting and permit notice review and by making it easier to exchange data with the oil and gas industry--especially small, independent operators. Such operators, who often do not have sophisticated in-house databases, will be able to use a subset of the same RBDMS tools available to the agencies on the desktop to file permit notices and production reports online. Once the data passes automated quality control checks, the application will upload the data into the agency's RBDMS data source. The operators also will have access to state agency datasets to focus exploration efforts and to perform production forecasting, economic evaluations, and risk assessments. With the ability to identify economically feasible oil and gas prospects, including unconventional plays, over the Internet, operators will minimize travel and other costs. Because GWPC will coordinate these data sharing efforts with the Bureau of Land Management (BLM), this project will improve access to public lands and make strides towards reducing the duplicative reporting to which industry is now subject for leases that cross jurisdictions. The resulting regulatory streamlining and improved access to agency data will make more domestic oil and gas available to the American public while continuing to safeguard environmental assets.

  8. Produce More Oil and Gas via eBusiness Data Sharing

    Energy Technology Data Exchange (ETDEWEB)

    Paul Jehn; Mike Stettner; Ben Grunewald

    2005-07-22

    GWPC, DOGGR, and other state agencies propose to build eBusiness applications based on a .NET front-end user interface for the DOE's Energy 100 Award-winning Risk Based Data Management System (RBDMS) data source and XML Web services. This project will slash the costs of regulatory compliance by automating routine regulatory reporting and permit notice review and by making it easier to exchange data with the oil and gas industry--especially small, independent operators. Such operators, who often do not have sophisticated in-house databases, will be able to use a subset of the same RBDMS tools available to the agencies on the desktop to file permit notices and production reports online. Once the data passes automated quality control checks, the application will upload the data into the agency's RBDMS data source. The operators also will have access to state agency datasets to focus exploration efforts and to perform production forecasting, economic evaluations, and risk assessments. With the ability to identify economically feasible oil and gas prospects, including unconventional plays, over the Internet, operators will minimize travel and other costs. Because GWPC will coordinate these data sharing efforts with the Bureau of Land Management (BLM), this project will improve access to public lands and make strides towards reducing the duplicative reporting to which industry is now subject for leases that cross jurisdictions. The resulting regulatory streamlining and improved access to agency data will make more domestic oil and gas available to the American public while continuing to safeguard environmental assets.

  9. Semantic Document Library: A Virtual Research Environment for Documents, Data and Workflows Sharing

    Science.gov (United States)

    Kotwani, K.; Liu, Y.; Myers, J.; Futrelle, J.

    2008-12-01

    The Semantic Document Library (SDL) was driven by use cases from the environmental observatory communities and is designed to provide conventional document repository features of uploading, downloading, editing and versioning of documents as well as value-adding features of tagging, querying, sharing, annotating, ranking, provenance, social networking and geo-spatial mapping services. It allows users to organize a catalogue of watershed observation data, model output, workflows, as well as publications and documents related to the same watershed study through the tagging capability. Users can tag all relevant materials using the same watershed name and find all of them easily later using this tag. The underpinning semantic content repository can store materials from other cyberenvironments such as workflow or simulation tools, and SDL provides an effective interface to query and organize materials from various sources. Advanced features of the SDL allow users to visualize the provenance of the materials, such as the source and how the output data is derived. Other novel features include visualizing all geo-referenced materials on a geospatial map. SDL, as a component of a cyberenvironment portal (the NCSA Cybercollaboratory), has the goal of efficient management of information and relationships between published artifacts (validated models, vetted data, workflows, annotations, best practices, reviews and papers) produced from raw research artifacts (data, notes, plans etc.) through agents (people, sensors etc.). The tremendous scientific potential of artifacts is achieved through mechanisms of sharing, reuse and collaboration - empowering scientists to spread their knowledge and protocols and to benefit from the knowledge of others. SDL successfully implements web 2.0 technologies and design patterns along with a semantic content management approach that enables use of multiple ontologies and dynamic evolution (e.g. folksonomies) of terminology. Scientific documents involved with
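    The watershed-tagging retrieval described above amounts to an inverted index from tags to artifacts; a toy sketch with hypothetical file names:

```python
from collections import defaultdict

# Toy tag index in the spirit of SDL's tagging; artifact names hypothetical.
index: defaultdict = defaultdict(set)

def tag(artifact: str, *tags: str) -> None:
    """Attach one or more tags to an artifact."""
    for t in tags:
        index[t].add(artifact)

tag("flow_2008.csv", "clear-creek", "observation-data")
tag("hydro_model_v2.xml", "clear-creek", "workflow")
tag("watershed_paper.pdf", "clear-creek", "publication")

# One watershed tag retrieves data, workflows and publications together.
print(sorted(index["clear-creek"]))
# ['flow_2008.csv', 'hydro_model_v2.xml', 'watershed_paper.pdf']
```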

  10. Pediatric data sharing in genomic research: attitudes and preferences of parents.

    Science.gov (United States)

    Burstein, Matthew D; Robinson, Jill Oliver; Hilsenbeck, Susan G; McGuire, Amy L; Lau, Ching C

    2014-04-01

    In the United States, data from federally funded genomics studies are stored in national databases, which may be accessible to anyone online (public release) or only to qualified researchers (restricted release). The availability of such data exposes participants to privacy risk and limits the ability to withdraw from research. This exposure is especially challenging for pediatric participants, who are enrolled in studies with parental permission. The current study examines genomic research participants' attitudes to explore differences in data sharing (DS) preferences between parents of pediatric patients and adult patients. A total of 113 parents of pediatric patients and 196 adult participants from 6 genomics studies were randomly assigned to 3 experimental consent forms. Participants were invited to a follow-up structured interview exploring DS preferences, study understanding, and attitudes. Descriptive analyses and regression models were built on responses. Most parents (73.5%) and adult participants (90.3%) ultimately consented to broad public release. However, parents were significantly more restrictive in their data release decisions, not because of understanding or perceived benefits of participation but rather autonomy and control. Parents want to be more involved in the decision about DS and are significantly more concerned than adult participants about unknown future risks. Parents have the same altruistic motivations and grasp of genomics studies as adult participants. However, they are more concerned about future risks to their child, which probably motivates them to choose more restrictive DS options, but only when such options are made available.

  11. A novel, privacy-preserving cryptographic approach for sharing sequencing data

    Science.gov (United States)

    Cassa, Christopher A; Miller, Rachel A; Mandl, Kenneth D

    2013-01-01

    Objective DNA samples are often processed and sequenced in facilities external to the point of collection. These samples are routinely labeled with patient identifiers or pseudonyms, allowing for potential linkage to identity and private clinical information if intercepted during transmission. We present a cryptographic scheme to securely transmit externally generated sequence data which does not require any patient identifiers, public key infrastructure, or the transmission of passwords. Materials and methods This novel encryption scheme cryptographically protects participant sequence data using a shared secret key that is derived from a unique subset of an individual’s genetic sequence. This scheme requires access to a subset of an individual’s genetic sequence to acquire full access to the transmitted sequence data, which helps to prevent sample mismatch. Results We validate that the proposed encryption scheme is robust to sequencing errors, population uniqueness, and sibling disambiguation, and provides sufficient cryptographic key space. Discussion Access to a set of an individual’s genotypes and a mutually agreed cryptographic seed is needed to unlock the full sequence, which provides additional sample authentication and authorization security. We present modest fixed and marginal costs to implement this transmission architecture. Conclusions It is possible for genomics researchers who sequence participant samples externally to protect the transmission of sequence data using unique features of an individual’s genetic sequence. PMID:23125421
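    The core idea, deriving a shared key from a mutually agreed subset of genotype calls plus a seed, can be sketched as below. Unlike the published scheme, this toy has no tolerance for sequencing errors, and the positions and genotypes are hypothetical:

```python
import hashlib

def derive_key(genotypes: dict, positions: list, seed: bytes) -> bytes:
    """Hash a mutually agreed subset of genotype calls together with a
    shared seed. Toy version: no error tolerance, unlike the real scheme."""
    material = seed + "".join(genotypes[p] for p in positions).encode()
    return hashlib.sha256(material).digest()

# Both ends hold the individual's genotypes (hypothetical calls).
sender   = {"rs123": "AG", "rs456": "CC", "rs789": "TT"}
receiver = {"rs123": "AG", "rs456": "CC", "rs789": "TT"}
positions, seed = ["rs123", "rs789"], b"agreed-seed"

# Matching genotypes at the agreed positions yield the same key.
assert derive_key(sender, positions, seed) == derive_key(receiver, positions, seed)
```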

  12. A EUROPEAN FRAMEWORK FOR RECORDING AND SHARING DISASTER DAMAGE AND LOSS DATA

    Directory of Open Access Journals (Sweden)

    C. Corbane

    2015-08-01

    Full Text Available The recently adopted ‘Sendai Framework for Disaster Risk Reduction 2015-2030’ sets the goals to reduce loss of life, livelihood and critical infrastructure through enhanced national planning and international cooperation. The new Framework is expected to enhance global, regional and national efforts for building resilience to disasters, across the entire disaster management cycle (prevention, preparedness, response and early recovery). Improved monitoring and accountability frameworks, relying on harmonized disaster loss data, will be required for meeting the targets and for capturing the levels of progress across different scales of governance. To overcome the problems of heterogeneous disaster data and terminologies, guidelines for reporting disaster damage and losses in a structured manner will be necessary to help national and regional bodies compile this information. In the European Union, the Member States and the European Commission worked together on the establishment of guidelines for recording and sharing disaster damage and loss data as a first step towards the development of operational indicators to translate the Sendai Framework into action. This paper describes the progress to date in setting a common framework for recording disaster damage and loss data in the European Union and identifies the challenges ahead.

  13. Controlling the signal: Practical privacy protection of genomic data sharing through Beacon services.

    Science.gov (United States)

    Wan, Zhiyu; Vorobeychik, Yevgeniy; Kantarcioglu, Murat; Malin, Bradley

    2017-07-26

    Genomic data is increasingly collected by a wide array of organizations. As such, there is a growing demand to make summary information about such collections available more widely. However, over the past decade, a series of investigations have shown that attacks, rooted in statistical inference methods, can be applied to discern the presence of a known individual's DNA sequence in the pool of subjects. Recently, it was shown that the Beacon Project of the Global Alliance for Genomics and Health, a web service for querying about the presence (or absence) of a specific allele, was vulnerable. The Integrating Data for Analysis, Anonymization, and Sharing (iDASH) Center modeled a track in their third Privacy Protection Challenge on how to mitigate the Beacon vulnerability. We developed the winning solution for this track. This paper describes our computational method to optimize the tradeoff between the utility and the privacy of the Beacon service. We generalize the genomic data sharing problem beyond that which was introduced in the iDASH Challenge to be more representative of real world scenarios to allow for a more comprehensive evaluation. We then conduct a sensitivity analysis of our method with respect to several state-of-the-art methods using a dataset of 400,000 positions in Chromosome 10 for 500 individuals from Phase 3 of the 1000 Genomes Project. All methods are evaluated for utility, privacy and efficiency. Our method achieves better performance than all state-of-the-art methods, irrespective of how key factors (e.g., the allele frequency in the population, the size of the pool and utility weights) change from the original parameters of the problem. We further illustrate that it is possible for our method to exhibit subpar performance under special cases of allele query sequences. However, we show our method can be extended to address this issue when the query sequence is fixed and known a priori to the data custodian, so that they may plan stage their
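    A Beacon answers presence/absence queries about alleles, and one mitigation family suppresses answers for selected alleles. The sketch below hardcodes the masked set; choosing it optimally, trading utility against privacy, is the contribution of the paper:

```python
# Toy Beacon: presence queries with a suppression list. Alleles hypothetical.
pool_alleles = {("chr10", 1001, "A"), ("chr10", 1002, "T"), ("chr10", 1003, "G")}
masked = {("chr10", 1002, "T")}          # e.g. a rare, highly identifying allele

def beacon_query(chrom: str, pos: int, allele: str) -> bool:
    """Answer 'present?' truthfully except for masked alleles, which are
    always reported absent to blunt membership-inference attacks."""
    q = (chrom, pos, allele)
    return q in pool_alleles and q not in masked

assert beacon_query("chr10", 1001, "A") is True
assert beacon_query("chr10", 1002, "T") is False   # present, but suppressed
assert beacon_query("chr10", 9999, "C") is False   # genuinely absent
```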

  14. Facilitating Oil Industry Access to Federal Lands through Interagency Data Sharing

    Energy Technology Data Exchange (ETDEWEB)

    Paul Jehn; Ben Grunewald

    2007-05-31

    -commerce. The next step beyond mere data sharing for facilitating the permitting process is to make it possible for industry to file those permit applications electronically. This process will involve the use of common XML schemas.
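    Electronic permit filing against a common XML schema might look like the sketch below; the element names are hypothetical, not the actual RBDMS schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical elements; a real filing would follow the agreed common schema.
notice = ET.Element("PermitNotice")
ET.SubElement(notice, "Operator").text = "Example Energy LLC"
ET.SubElement(notice, "WellAPI").text = "05-123-45678"

xml_text = ET.tostring(notice, encoding="unicode")
print(xml_text)
# <PermitNotice><Operator>Example Energy LLC</Operator><WellAPI>05-123-45678</WellAPI></PermitNotice>
```

    The payoff of a shared schema is that every receiving agency can validate and ingest the same document without format conversion.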

  15. How Do Astronomers Share Data? Reliability and Persistence of Datasets Linked in AAS Publications and a Qualitative Study of Data Practices among US Astronomers

    Science.gov (United States)

    Pepe, Alberto; Goodman, Alyssa; Muench, August; Crosas, Merce; Erdmann, Christopher

    2014-08-01

    We analyze data sharing practices of astronomers over the past fifteen years. An analysis of URL links embedded in papers published by the American Astronomical Society reveals that the total number of links included in the literature rose dramatically from 1997 until 2005, when it leveled off at around 1500 per year. The analysis also shows that the availability of linked material decays with time: in 2011, 44% of links published a decade earlier, in 2001, were broken. A rough analysis of link types reveals that links to data hosted on astronomers' personal websites become unreachable much faster than links to datasets on curated institutional sites. To gauge astronomers' current data sharing practices and preferences further, we performed in-depth interviews with 12 scientists and online surveys with 173 scientists, all at a large astrophysical research institute in the United States: the Harvard-Smithsonian Center for Astrophysics, in Cambridge, MA. Both the in-depth interviews and the online survey indicate that, in principle, there is no philosophical objection to data-sharing among astronomers at this institution. Key reasons that more data are not presently shared more efficiently in astronomy include: the difficulty of sharing large data sets; over reliance on non-robust, non-reproducible mechanisms for sharing data (e.g. emailing it); unfamiliarity with options that make data-sharing easier (faster) and/or more robust; and, lastly, a sense that other researchers would not want the data to be shared. We conclude with a short discussion of a new effort to implement an easy-to-use, robust, system for data sharing in astronomy, at theastrodata.org, and we analyze the uptake of that system to-date.
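    The link-decay measurement reduces to computing, per publication year, the fraction of embedded links that no longer resolve; the sample data below is illustrative, not the study's actual link census:

```python
# Fraction of published links that no longer resolve, by publication year.
# Sample data is illustrative only.
links = [
    {"year": 2001, "alive": False}, {"year": 2001, "alive": True},
    {"year": 2001, "alive": False}, {"year": 2001, "alive": True},
    {"year": 2005, "alive": True},  {"year": 2005, "alive": False},
]

def broken_fraction(links: list, year: int) -> float:
    """Share of links from a given publication year that are now broken."""
    cohort = [l for l in links if l["year"] == year]
    return sum(not l["alive"] for l in cohort) / len(cohort)

print(broken_fraction(links, 2001))  # 0.5
```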

  16. How do astronomers share data? Reliability and persistence of datasets linked in AAS publications and a qualitative study of data practices among US astronomers.

    Science.gov (United States)

    Pepe, Alberto; Goodman, Alyssa; Muench, August; Crosas, Merce; Erdmann, Christopher

    2014-01-01

    We analyze data sharing practices of astronomers over the past fifteen years. An analysis of URL links embedded in papers published by the American Astronomical Society reveals that the total number of links included in the literature rose dramatically from 1997 until 2005, when it leveled off at around 1500 per year. The analysis also shows that the availability of linked material decays with time: in 2011, 44% of links published a decade earlier, in 2001, were broken. A rough analysis of link types reveals that links to data hosted on astronomers' personal websites become unreachable much faster than links to datasets on curated institutional sites. To gauge astronomers' current data sharing practices and preferences further, we performed in-depth interviews with 12 scientists and online surveys with 173 scientists, all at a large astrophysical research institute in the United States: the Harvard-Smithsonian Center for Astrophysics, in Cambridge, MA. Both the in-depth interviews and the online survey indicate that, in principle, there is no philosophical objection to data-sharing among astronomers at this institution. Key reasons that more data are not presently shared more efficiently in astronomy include: the difficulty of sharing large data sets; over reliance on non-robust, non-reproducible mechanisms for sharing data (e.g. emailing it); unfamiliarity with options that make data-sharing easier (faster) and/or more robust; and, lastly, a sense that other researchers would not want the data to be shared. We conclude with a short discussion of a new effort to implement an easy-to-use, robust, system for data sharing in astronomy, at theastrodata.org, and we analyze the uptake of that system to-date.

  17. How do astronomers share data? Reliability and persistence of datasets linked in AAS publications and a qualitative study of data practices among US astronomers.

    Directory of Open Access Journals (Sweden)

    Alberto Pepe

    Full Text Available We analyze data sharing practices of astronomers over the past fifteen years. An analysis of URL links embedded in papers published by the American Astronomical Society reveals that the total number of links included in the literature rose dramatically from 1997 until 2005, when it leveled off at around 1500 per year. The analysis also shows that the availability of linked material decays with time: in 2011, 44% of links published a decade earlier, in 2001, were broken. A rough analysis of link types reveals that links to data hosted on astronomers' personal websites become unreachable much faster than links to datasets on curated institutional sites. To gauge astronomers' current data sharing practices and preferences further, we performed in-depth interviews with 12 scientists and online surveys with 173 scientists, all at a large astrophysical research institute in the United States: the Harvard-Smithsonian Center for Astrophysics, in Cambridge, MA. Both the in-depth interviews and the online survey indicate that, in principle, there is no philosophical objection to data-sharing among astronomers at this institution. Key reasons that more data are not presently shared more efficiently in astronomy include: the difficulty of sharing large data sets; over reliance on non-robust, non-reproducible mechanisms for sharing data (e.g. emailing it); unfamiliarity with options that make data-sharing easier (faster) and/or more robust; and, lastly, a sense that other researchers would not want the data to be shared. We conclude with a short discussion of a new effort to implement an easy-to-use, robust, system for data sharing in astronomy, at theastrodata.org, and we analyze the uptake of that system to-date.

  18. The Shared Health Research Information Network (SHRINE): a prototype federated query tool for clinical data repositories.

    Science.gov (United States)

    Weber, Griffin M; Murphy, Shawn N; McMurry, Andrew J; Macfadden, Douglas; Nigrin, Daniel J; Churchill, Susanne; Kohane, Isaac S

    2009-01-01

    The authors developed a prototype Shared Health Research Information Network (SHRINE) to identify the technical, regulatory, and political challenges of creating a federated query tool for clinical data repositories. Separate Institutional Review Boards (IRBs) at Harvard's three largest affiliated health centers approved use of their data, and the Harvard Medical School IRB approved building a Query Aggregator Interface that can simultaneously send queries to each hospital and display aggregate counts of the number of matching patients. Our experience creating three local repositories using the open source Informatics for Integrating Biology and the Bedside (i2b2) platform can be used as a road map for other institutions. The authors are actively working with the IRBs and regulatory groups to develop procedures that will ultimately allow investigators to obtain identified patient data and biomaterials through SHRINE. This will guide us in creating a future technical architecture that is scalable to a national level, compliant with ethical guidelines, and protective of the interests of the participating hospitals.
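
    The core idea of a SHRINE-style federated query, as described above, is that each site returns only an aggregate patient count, never record-level data, and the aggregator displays per-site and total counts. The following is a minimal conceptual sketch under that assumption; the site data, field names, and query format are invented and do not reflect the actual i2b2/SHRINE APIs.

```python
def query_site(site_db, criteria):
    """Each 'site' exposes only an aggregate count of matching patients,
    never the patient records themselves."""
    return sum(1 for patient in site_db
               if all(patient.get(k) == v for k, v in criteria.items()))

def aggregate(sites, criteria):
    """Send the same query to every site and collect per-site and total counts."""
    counts = {name: query_site(db, criteria) for name, db in sites.items()}
    counts["TOTAL"] = sum(counts.values())
    return counts

# Hypothetical local repositories at two hospitals.
sites = {
    "Hospital A": [{"dx": "T2DM", "sex": "F"}, {"dx": "T2DM", "sex": "M"}],
    "Hospital B": [{"dx": "asthma", "sex": "F"}, {"dx": "T2DM", "sex": "F"}],
}
print(aggregate(sites, {"dx": "T2DM"}))  # per-hospital counts plus a TOTAL
```

    A production system would add the regulatory layer the abstract emphasizes: IRB-approved query types, authentication, and small-count obfuscation before counts leave each site.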

  19. Invited review: Experimental design, data reporting, and sharing in support of animal systems modeling research.

    Science.gov (United States)

    McNamara, J P; Hanigan, M D; White, R R

    2016-12-01

    The National Animal Nutrition Program "National Research Support Project 9" supports efforts in livestock nutrition, including the National Research Council's committees on the nutrient requirements of animals. Our objective was to review the status of experimentation and data reporting in animal nutrition literature and to provide suggestions for the advancement of animal nutrition research and the ongoing improvement of field-applied nutrient requirement models. Improved data reporting consistency and completeness represent a substantial opportunity to improve nutrition-related mathematical models. We reviewed a body of nutrition research; recorded common phrases used to describe diets, animals, housing, and environmental conditions; and proposed equivalent numerical data that could be reported. With the increasing availability of online supplementary material sections in journals, we developed a comprehensive checklist of data that should be included in publications. To continue to improve our research effectiveness, studies utilizing multiple research methodologies to address complex systems and measure multiple variables will be necessary. From the current body of animal nutrition literature, we identified a series of opportunities to integrate research focuses (nutrition, reproduction and genetics) to advance the development of nutrient requirement models. From our survey of current experimentation and data reporting in animal nutrition, we identified 4 key opportunities to advance animal nutrition knowledge: (1) coordinated experiments should be designed to employ multiple research methodologies; (2) systems-oriented research approaches should be encouraged and supported; (3) publication guidelines should be updated to encourage and support sharing of more complete data sets; and (4) new experiments should be more rapidly integrated into our knowledge bases, research programs and practical applications. Copyright © 2016 American Dairy Science Association

  20. Community-managed Data Sharing, Curation, and Publication: SEN on SEAD

    Science.gov (United States)

    Martin, R. L.; Myers, J.; Hsu, L.

    2017-12-01

    While data publication in support of reuse and scientific reproducibility is increasingly being recognized as a key aspect of modern research practice, best practices are still to be developed at the level of scientific communities. Often, such practices are discussed in the abstract - as community standards for data plans or as requirements for yet-to-be-built software - with no clear path to community adoption. In contrast, the Sediment Experimentalist Network, supported through the National Science Foundation's (NSF) EarthCube initiative, has encouraged an iterative, practice-based approach within its community that has resulted in the publication of dozens of datasets, comprised of millions of files totaling more than 4 TB in size, and the documentation of more than 100 experimental procedures, instruments, and facilities, by multiple research teams. A key element of SEN's approach has been to leverage cloud-based data services that provide robust core capabilities with community-based management and customization capabilities. These services - data sharing, curation, and publication services developed through the NSF-supported Sustainable Environment - Actionable Data (SEAD) project and the wiki-based SEN Knowledge Base (KB) - have allowed the SEN team to ground discussions in reality and leverage the practical questions arising as researchers publish data to drive discussion and evolve towards better practices. In this presentation we summarize how SEN interacts with researchers, the best practices that have been developed, and the capabilities of SEAD and the SEN KB that support them. We also describe issues that have arisen in the community - related, for example, to recommended and required metadata, individual, project and community branding, and data version and derivation relationships - and describe how SEN's outreach activities, collaboration with the SEAD team, and the flexible design of the data services themselves have, in combination, been able to

  1. The International Data Sharing Challenge: Realities and Lessons Learned from International Field Projects and Data Analysis Efforts

    Science.gov (United States)

    Williams, S. F.; Moore, J. A.

    2014-12-01

    data sharing and open data. This will be done through the framework of the projects noted above in an environment of proprietary data claims, multiple formats and data collection procedures, stockpiling of data, international data restrictions and mistrust of other scientists.

  2. Virtual memory support for distributed computing environments using a shared data object model

    Science.gov (United States)

    Huang, F.; Bacon, J.; Mapp, G.

    1995-12-01

    Conventional storage management systems provide one interface for accessing memory segments and another for accessing secondary storage objects. This hinders application programming and affects overall system performance due to mandatory data copying and user/kernel boundary crossings, which in the microkernel case may involve context switches. Memory-mapping techniques may be used to provide programmers with a unified view of the storage system. This paper extends such techniques to support a shared data object model for distributed computing environments in which good support for coherence and synchronization is essential. The approach is based on a microkernel, typed memory objects, and integrated coherence control. A microkernel architecture is used to support multiple coherence protocols and the addition of new protocols. Memory objects are typed and applications can choose the most suitable protocols for different types of object to avoid protocol mismatch. Low-level coherence control is integrated with high-level concurrency control so that the number of messages required to maintain memory coherence is reduced and system-wide synchronization is realized without severely impacting the system performance. Together, these features constitute a novel approach to supporting flexible coherence under application control.
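
    The unified-view idea in this abstract - accessing a secondary storage object through ordinary memory operations instead of a separate read/write interface - can be illustrated with standard memory mapping. This is only a single-process sketch of the general technique, not the paper's distributed, coherence-controlled system; the file name is arbitrary.

```python
import mmap
import os
import tempfile

# Back a 'storage object' with 4 KiB of secondary storage.
path = os.path.join(tempfile.mkdtemp(), "shared.obj")
with open(path, "wb") as f:
    f.write(b"\x00" * 4096)

# Map the object into the address space: the same byte-indexing code now
# works on the stored object as if it were a memory segment, with no
# explicit read()/write() copy step in application code.
with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as mem:
        mem[0:5] = b"hello"   # plain memory assignment...
        mem.flush()           # ...propagated to the storage object

# The update is visible through the conventional file interface as well.
with open(path, "rb") as f:
    print(f.read(5))
```

    In the paper's setting, the interesting part lies beyond this sketch: keeping such mappings coherent when multiple nodes map the same typed object.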

  3. As Libraries Go Digital, Sharing of Data Is at Odds with Tradition of Privacy

    Science.gov (United States)

    Parry, Marc

    2012-01-01

    Colleges share many things on Twitter, but one topic can be risky to broach: the reading habits of library patrons. Patrons' privacy is precious to most librarians. Yet new Web services thrive on collecting and sharing the very information that has long been protected. This points to an emerging tension as libraries embrace digital services.…

  4. Is There Rent Sharing in Italy? Evidence from Employer-Employee Data

    Directory of Open Access Journals (Sweden)

    Alessia Matano

    2011-12-01

    Full Text Available Using a unique employer-employee panel database, we investigate the extent of rent sharing in Italy from 1996 to 2003. We derive the following findings. First, after controlling for the national bargaining level, there is robust evidence of rent sharing at firm level. Second, by means of fixed effects estimates we show that the sorting of high-ability workers into high-profit firms appears to play a substantial role, since it captures a significant amount of cross sectional estimates of rent sharing. Third, in accordance with the related literature, the endogeneity of profits causes a severe underestimation of rent sharing. Our final IV estimate of the elasticity of wages with respect to profits per employee amounts to 6%, with a "Lester" range of 24%. Moreover, we point out that the impact of rent sharing is not homogeneous across several dimensions (gender, occupation, sector and macroarea).

  5. A Shared Decision-Making System for Diabetes Medication Choice Utilizing Electronic Health Record Data.

    Science.gov (United States)

    Wang, Yu; Li, Peng-Fei; Tian, Yu; Ren, Jing-Jing; Li, Jing-Song

    2017-09-01

    The use of a shared decision-making (SDM) process in antihyperglycemic medication strategy decisions is necessary due to the complexity of the conditions of diabetes patients. Knowledge of guidelines is used as decision aids in clinical situations, and during this process, no patient health conditions are considered. In this paper, we propose an SDM system framework for type-2 diabetes mellitus (T2DM) patients that not only contains knowledge abstracted from guidelines but also employs a multilabel classification model that uses class-imbalanced electronic health record (EHR) data and that aims to provide a recommended list of available antihyperglycemic medications to help physicians and patients have an SDM conversation. The use of EHR data to serve as a decision-support component in decision aids helps physicians and patients to reach a more intuitive understanding of current health conditions and allows the tailoring of the available knowledge to each patient, leading to a more effective SDM. Real-world data from 2542 T2DM inpatient EHRs were substituted by 77 features and eight output labels, i.e., eight antihyperglycemic medications, and these data were utilized to build and validate the recommendation model. The multilabel recommendation model exhibited stable performance in every single-label classification and showed the ability to predict minority positive cases in which the average recall value of the eight classes was 0.9898. As a whole multilabel classifier, the recommendation model demonstrated outstanding performance, with scores of 0.0941 for Hamming Loss, 0.7611 for Accuracy_exam, 0.9664 for Recall_exam, and 0.8269 for F_exam.
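
    The example-based multilabel metrics reported in this abstract (Hamming loss and the Accuracy/Recall/F "exam" scores) have standard definitions that are easy to compute directly. The sketch below uses tiny invented label matrices over four hypothetical medication classes, purely to show how the metrics are calculated; it is not the paper's model or data.

```python
def hamming_loss(Y_true, Y_pred):
    """Fraction of individual label slots predicted incorrectly."""
    n, L = len(Y_true), len(Y_true[0])
    wrong = sum(t != p for yt, yp in zip(Y_true, Y_pred)
                for t, p in zip(yt, yp))
    return wrong / (n * L)

def example_based(Y_true, Y_pred):
    """Example-based Accuracy (Jaccard), Recall, and F1, averaged over examples."""
    acc = rec = f1 = 0.0
    for yt, yp in zip(Y_true, Y_pred):
        t = {i for i, v in enumerate(yt) if v}   # true label set
        p = {i for i, v in enumerate(yp) if v}   # predicted label set
        inter, union = len(t & p), len(t | p)
        acc += inter / union if union else 1.0
        rec += inter / len(t) if t else 1.0
        f1 += 2 * inter / (len(t) + len(p)) if (t or p) else 1.0
    n = len(Y_true)
    return acc / n, rec / n, f1 / n

# Invented 0/1 label matrices: rows are patients, columns are medications.
Y_true = [[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]]
Y_pred = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 1, 0, 0]]
print(hamming_loss(Y_true, Y_pred), example_based(Y_true, Y_pred))
```

    Libraries such as scikit-learn provide these metrics directly (`sklearn.metrics.hamming_loss`, for example); the hand-rolled versions here just make the definitions explicit.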

  6. Toward Implementation of the Global Earth Observation System of Systems Data Sharing Principles

    Directory of Open Access Journals (Sweden)

    Paul F Uhlir

    2009-09-01

    Full Text Available This article reflects the views of the authors and not necessarily those of their institutions of employment or affiliation. This article was first written as a “white paper” for the Group on Earth Observations (GEO) under Task DA-06-01, “Furthering the Practical Application of the Agreed GEOSS Data Sharing Principles,” which was led beginning in 2006 by the Committee on Data for Science and Technology (CODATA) of the International Council for Science (ICSU) under the auspices of the GEO Architecture and Data Committee. We would like to thank the many individuals who helped facilitate the writing of the article or its review. These include Michael Rast, Michael Tanner, and Masami Onoda of the GEO Secretariat, and Kathleen Cass of the CODATA Secretariat, all of whom provided a great deal of project guidance and administrative support; Charles Barton, Australian National University, and Jack Hill, the U.S. Geological Survey, for their contributions to the drafting of the article; Santiago Borrero, Dora Ann Lange Canhos, Yukiko Fukasaku, Huadong Guo, Alexei Gvishiani, Bernard Minster, Steve Rossouw, and Fraser Taylor for providing review comments on earlier drafts; and the many representatives to GEO from its Member States or Affiliated Organizations, who also provided significant substantive comments and suggestions. We also acknowledge the strong support and encouragement of José Achache, Director of the GEO Secretariat, who recognized early on the importance of this effort. Finally, we wish to thank the editors of the Journal of Space Law and the CODATA Data Science Journal, for their assistance with the publishing of this article.

  7. Taxation Pressure, Capital Squeeze and Wage Share: A Study Based on the Data of China's Manufacturing Listed Companies

    Institute of Scientific and Technical Information of China (English)

    ZHONG Chunping; CHEN Liang; XUE Cuiying

    2017-01-01

    From the micro level, this paper measures the proportion of labor income and tries to reveal the deep-lying causes, the underlying interest distribution mechanism, and the internal logic behind the low proportion of labor income from an institutional perspective. Using data on China's manufacturing listed companies, it measures the changing trend of the wage share paid by enterprises and conducts a quantitative test of the factors affecting the wage share. The results show that, on average, taxation accounts for the highest share at 41.0%, labor wages account for 32.8%, and capital accounts for 26.2%; the share of capital is rising, indicating that laborers are generally in a disadvantaged position. The analysis reveals that the wage share is affected by such factors as the bargaining capacity of both labor and management, the business taxes paid to the government, and the level of capital cost payments. When the labor side lacks sufficient protection, enterprises shift the excessive taxation and capital pressure onto the laborers, leading to a decrease in the wage share. The internal causes are that the government levies taxes on the enterprises, and the enterprises in turn squeeze the wages of the laborers; because the laborers lack sufficient ability to protect their interests, their wages and the wage share remain low and declining.

  8. Use of Internet audience measurement data to gauge market share for online health information services.

    Science.gov (United States)

    Wood, Fred B; Benson, Dennis; LaCroix, Eve-Marie; Siegel, Elliot R; Fariss, Susan

    2005-07-01

    The transition to a largely Internet and Web-based environment for dissemination of health information has changed the health information landscape and the framework for evaluation of such activities. A multidimensional evaluative approach is needed. This paper discusses one important dimension of Web evaluation: usage data. In particular, we discuss the collection and analysis of external data on website usage in order to develop a better understanding of the health information (and related US government information) market space, and to estimate the market share or relative levels of usage for National Library of Medicine (NLM) and National Institutes of Health (NIH) websites compared to other health information providers. The primary method presented is Internet audience measurement based on Web usage by external panels of users and assembled by private vendors (in this case, comScore). A secondary method discussed is Web usage based on Web log software data. The principal metrics for both methods are unique visitors and total pages downloaded per month. NLM websites (primarily MedlinePlus and PubMed) account for 55% to 80% of total NIH website usage depending on the metric used. In turn, NIH.gov top-level domain usage (inclusive of NLM) ranks second only behind WebMD in the US domestic home health information market and ranks first on a global basis. NIH.gov consistently ranks among the top three or four US government top-level domains based on global Web usage. On a site-specific basis, the top health information websites in terms of global usage appear to be WebMD, MSN Health, PubMed, Yahoo! Health, AOL Health, and MedlinePlus. Based on MedlinePlus Web log data and external Internet audience measurement data, the three most heavily used cancer-centric websites appear to be www.cancer.gov (National Cancer Institute), www.cancer.org (American Cancer Society), and www.breastcancer.org (non-profit organization). Internet audience measurement has proven useful to NLM
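
    Given panel-based audience data of the kind described above, market share is simply each site's fraction of the total for a chosen metric (unique visitors or pages per month). The sketch below shows the arithmetic with invented site names and visitor counts; it does not reproduce the paper's comScore figures.

```python
def market_share(unique_visitors):
    """Convert per-site unique-visitor counts into percentage shares of the total."""
    total = sum(unique_visitors.values())
    return {site: round(100 * v / total, 1)
            for site, v in unique_visitors.items()}

# Hypothetical unique visitors per month from an audience-measurement panel.
panel_data = {
    "SiteA": 8_000_000,
    "SiteB": 5_000_000,
    "SiteC": 3_000_000,
    "SiteD": 4_000_000,
}
print(market_share(panel_data))  # percentage share of each site
```

    The same calculation applies to the secondary metric (total pages downloaded per month); comparing the two shares is one way to see whether a site's audience is broad but shallow or small but heavily engaged.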

  9. Data sharing in stem cell translational science: policy statement by the International Stem Cell Forum Ethics Working Party.

    Science.gov (United States)

    Bredenoord, Annelien L; Mostert, Menno; Isasi, Rosario; Knoppers, Bartha M

    2015-01-01

    Data and sample sharing constitute a scientific and ethical imperative but need to be conducted in a responsible manner in order to protect individual interests as well as maintain public trust. In 2014, the Global Alliance for Genomics and Health (GA4GH) adopted a common Framework for Responsible Sharing of Genomic and Health-Related Data. The GA4GH Framework is applicable to data sharing in the stem cell field; however, interpretation is required so as to provide guidance for this specific context. In this paper, the International Stem Cell Forum Ethics Working Party discusses those principles that are specific to translational stem cell science, including engagement, data quality and safety, privacy, security and confidentiality, risk-benefit analysis and sustainability.

  10. GeoSearch: a new virtual globe application for the submission, storage, and sharing of point-based ecological data

    Science.gov (United States)

    Cardille, J. A.; Gonzales, R.; Parrott, L.; Bai, J.

    2009-12-01

    How should researchers store and share data? For most of history, scientists with results and data to share have been mostly limited to books and journal articles. In recent decades, the advent of personal computers and shared data formats has made it feasible, though often cumbersome, to transfer data between individuals or among small groups. Meanwhile, the use of automatic samplers, simulation models, and other data-production techniques has increased greatly. The result is that there is more and more data to store, and a greater expectation that they will be available at the click of a button. In 10 or 20 years, will we still send emails to each other to learn about what data exist? The development and widespread familiarity with virtual globes like Google Earth and NASA WorldWind has created the potential, in just the last few years, to revolutionize the way we share data, search for and search through data, and understand the relationship between individual projects in research networks, where sharing and dissemination of knowledge is encouraged. For the last two years, we have been building the GeoSearch application, a cutting-edge online resource for the storage, sharing, search, and retrieval of data produced by research networks. Linking NASA’s WorldWind globe platform, the data browsing toolkit prefuse, and SQL databases, GeoSearch’s version 1.0 enables flexible searches and novel geovisualizations of large amounts of related scientific data. These data may be submitted to the database by individual researchers and processed by GeoSearch’s data parser. Ultimately, data from research groups gathered in a research network would be shared among users via the platform. Access is not limited to the scientists themselves; administrators can determine which data can be presented publicly and which require group membership. Under the auspices of Canada’s Sustainable Forestry Management Network of Excellence, we have created a moderate-sized database

  11. Why is data sharing in collaborative natural resource efforts so hard and what can we do to improve it?

    Science.gov (United States)

    Volk, Carol J; Lucero, Yasmin; Barnas, Katie

    2014-05-01

    Increasingly, research and management in natural resource science rely on very large datasets compiled from multiple sources. While it is generally good to have more data, utilizing large, complex datasets has introduced challenges in data sharing, especially for collaborating researchers in disparate locations ("distributed research teams"). We surveyed natural resource scientists about common data-sharing problems. The major issues identified by our survey respondents (n = 118) when providing data were lack of clarity in the data request (including format of data requested). When receiving data, survey respondents reported various insufficiencies in documentation describing the data (e.g., no data collection description/no protocol, data aggregated, or summarized without explanation). Since metadata, or "information about the data," is a central obstacle in efficient data handling, we suggest documenting metadata through data dictionaries, protocols, read-me files, explicit null value documentation, and process metadata as essential to any large-scale research program. We advocate for all researchers, but especially those involved in distributed teams to alleviate these problems with the use of several readily available communication strategies including the use of organizational charts to define roles, data flow diagrams to outline procedures and timelines, and data update cycles to guide data-handling expectations. In particular, we argue that distributed research teams magnify data-sharing challenges making data management training even more crucial for natural resource scientists. If natural resource scientists fail to overcome communication and metadata documentation issues, then negative data-sharing experiences will likely continue to undermine the success of many large-scale collaborative projects.
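
    One of the concrete remedies this abstract recommends is a data dictionary: a small, machine-readable file documenting each field's type, units, null-value convention, and collection protocol. The sketch below generates a minimal data dictionary as CSV; the field definitions are hypothetical examples, not taken from the survey.

```python
import csv
import io

# Hypothetical field documentation for a shared natural-resource dataset,
# covering the gaps the survey identified: units, explicit null values,
# and a pointer to the collection protocol.
fields = [
    {"name": "site_id",  "type": "str",   "units": "",
     "nulls": "NA",   "protocol": "assigned by project office"},
    {"name": "temp_c",   "type": "float", "units": "degC",
     "nulls": "-999", "protocol": "hourly logger, averaged daily"},
    {"name": "flow_cms", "type": "float", "units": "m3/s",
     "nulls": "-999", "protocol": "stream gauge, 15-min raw"},
]

def write_data_dictionary(fields):
    """Render the field documentation as a CSV data dictionary."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["name", "type", "units", "nulls", "protocol"])
    writer.writeheader()
    writer.writerows(fields)
    return buf.getvalue()

print(write_data_dictionary(fields))
```

    Shipping a file like this alongside every data delivery addresses the most common complaints the survey respondents reported: aggregation without explanation, undocumented null values, and missing protocol descriptions.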

  12. Why is Data Sharing in Collaborative Natural Resource Efforts so Hard and What can We Do to Improve it?

    Science.gov (United States)

    Volk, Carol J.; Lucero, Yasmin; Barnas, Katie

    2014-05-01

    Increasingly, research and management in natural resource science rely on very large datasets compiled from multiple sources. While it is generally good to have more data, utilizing large, complex datasets has introduced challenges in data sharing, especially for collaborating researchers in disparate locations ("distributed research teams"). We surveyed natural resource scientists about common data-sharing problems. The major issues identified by our survey respondents (n = 118) when providing data were lack of clarity in the data request (including format of data requested). When receiving data, survey respondents reported various insufficiencies in documentation describing the data (e.g., no data collection description/no protocol, data aggregated, or summarized without explanation). Since metadata, or "information about the data," is a central obstacle in efficient data handling, we suggest documenting metadata through data dictionaries, protocols, read-me files, explicit null value documentation, and process metadata as essential to any large-scale research program. We advocate for all researchers, but especially those involved in distributed teams to alleviate these problems with the use of several readily available communication strategies including the use of organizational charts to define roles, data flow diagrams to outline procedures and timelines, and data update cycles to guide data-handling expectations. In particular, we argue that distributed research teams magnify data-sharing challenges making data management training even more crucial for natural resource scientists. If natural resource scientists fail to overcome communication and metadata documentation issues, then negative data-sharing experiences will likely continue to undermine the success of many large-scale collaborative projects.

  13. Occupational position, work stress and depressive symptoms: a pathway analysis of longitudinal SHARE data.

    Science.gov (United States)

    Hoven, H; Wahrendorf, M; Siegrist, J

    2015-05-01

    Several studies tested whether stressful work mediates the association between socioeconomic position (SEP) and health. Although providing moderate support, evidence is still inconclusive, partly due to a lack of theory-based measures of SEP and work stress, and because of methodological limitations. This contribution aims at overcoming these limitations. We conduct pathway analysis and investigate indirect effects of SEP on mental health via stressful work. Data are derived from the first two waves of the 'Survey of Health, Ageing and Retirement in Europe' (SHARE) with information from employed men and women aged 50-64 across 11 European countries (N=2798). SEP is measured according to two alternative measures of occupational position: occupational class (focus on employment relations) and occupational status (focus on prestige). We assess work stress according to the effort-reward imbalance and the demand-control model (wave 1), and we use newly occurring depressive symptoms as health outcome (wave 2). Effort-reward imbalance and, less consistently, low control mediate the effect of occupational class and occupational status on depressive symptoms. Our findings point to two important aspects of work stress (effort-reward imbalance and low control) in explaining socioeconomic differences in health. Further, we illustrate the significance of two alternative dimensions of occupational position, occupational class and occupational status. Published by the BMJ Publishing Group Limited.
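
    The mediation logic behind this kind of pathway analysis can be made concrete with the classic product-of-coefficients approach: the indirect effect of occupational position (X) on depressive symptoms (Y) via work stress (M) is a*b, where a is the coefficient from regressing M on X and b is the coefficient of M when Y is regressed on X and M together. The sketch below is a toy illustration with synthetic numbers, not the SHARE analysis, and omits the standard errors a real mediation analysis would report.

```python
def ols(X_cols, y):
    """Least squares with intercept, solved via the normal equations
    with Gaussian elimination. Returns [intercept, coef1, coef2, ...]."""
    cols = [[1.0] * len(y)] + X_cols
    k = len(cols)
    A = [[sum(ci * cj for ci, cj in zip(cols[i], cols[j])) for j in range(k)]
         for i in range(k)]
    b = [sum(ci * yi for ci, yi in zip(cols[i], y)) for i in range(k)]
    for i in range(k):                       # forward elimination with pivoting
        p = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[p], b[i], b[p] = A[p], A[i], b[p], b[i]
        for r in range(i + 1, k):
            f = A[r][i] / A[i][i]
            A[r] = [arj - f * aij for arj, aij in zip(A[r], A[i])]
            b[r] -= f * b[i]
    beta = [0.0] * k
    for i in reversed(range(k)):             # back substitution
        beta[i] = (b[i] - sum(A[i][j] * beta[j]
                              for j in range(i + 1, k))) / A[i][i]
    return beta

# Synthetic data: higher occupational position -> lower work stress -> fewer symptoms.
X = [0, 1, 2, 3, 4, 5, 6, 7]                      # occupational position (low -> high)
M = [7.1, 6.0, 5.2, 4.1, 3.0, 2.2, 1.1, 0.2]      # work stress
Y = [8.0, 6.9, 6.1, 5.0, 3.9, 3.1, 2.0, 1.1]      # depressive symptoms

a = ols([X], M)[1]        # a-path: X -> M
b = ols([X, M], Y)[2]     # b-path: M -> Y, controlling for X
print("indirect effect a*b =", round(a * b, 3))
```

    With real survey data one would also bootstrap a confidence interval for a*b rather than rely on the point estimate alone.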

  14. submitter BioSharing: curated and crowd-sourced metadata standards, databases and data policies in the life sciences

    CERN Document Server

    McQuilton, Peter; Rocca-Serra, Philippe; Thurston, Milo; Lister, Allyson; Maguire, Eamonn; Sansone, Susanna-Assunta

    2016-01-01

    BioSharing (http://www.biosharing.org) is a manually curated, searchable portal of three linked registries. These resources cover standards (terminologies, formats and models, and reporting guidelines), databases, and data policies in the life sciences, broadly encompassing the biological, environmental and biomedical sciences. Launched in 2011 and built by the same core team as the successful MIBBI portal, BioSharing harnesses community curation to collate and cross-reference resources across the life sciences from around the world. BioSharing makes these resources findable and accessible (the core of the FAIR principle). Every record is designed to be interlinked, providing a detailed description not only on the resource itself, but also on its relations with other life science infrastructures. Serving a variety of stakeholders, BioSharing cultivates a growing community, to which it offers diverse benefits. It is a resource for funding bodies and journal publishers to navigate the metadata landscape of the ...

  15. Secure and Privacy-Preserving Data Sharing and Collaboration in Mobile Healthcare Social Networks of Smart Cities

    Directory of Open Access Journals (Sweden)

    Qinlong Huang

    2017-01-01

    Full Text Available Mobile healthcare social networks (MHSN) integrated with connected medical sensors and cloud-based health data storage provide preventive and curative health services in smart cities. The fusion of social data together with real-time health data facilitates a novel paradigm of healthcare big data analysis. However, the collaboration of healthcare and social network service providers may pose a series of security and privacy issues. In this paper, we propose a secure health and social data sharing and collaboration scheme in MHSN. To preserve the data privacy, we realize secure and fine-grained health data and social data sharing with attribute-based encryption and identity-based broadcast encryption techniques, respectively, which allows patients to share their private personal data securely. In order to achieve enhanced data collaboration, we allow the healthcare analyzers to access both the reencrypted health data and the social data with authorization from the data owner based on proxy reencryption. Specifically, most of the health data encryption and decryption computations are outsourced from resource-constrained mobile devices to a health cloud, and the decryption of the healthcare analyzer incurs a low cost. The security and performance analysis results show the security and efficiency of our scheme.

  16. Development of a cyber-threat intelligence-sharing model from big data sources

    CSIR Research Space (South Africa)

    Mtshweni, Jabu

    2016-01-01

    Full Text Available that are increasing in severity, complexity, and frequency. In fact, cybercriminals are developing and sharing advanced techniques for their cyber espionage, reconnaissance missions, and ultimately devastating attacks. In order to reduce cybersecurity risks...

  17. Current Efforts in European Projects to Facilitate the Sharing of Scientific Observation Data

    Science.gov (United States)

    Bredel, Henning; Rieke, Matthes; Maso, Joan; Jirka, Simon; Stasch, Christoph

    2017-04-01

    This presentation is intended to provide an overview of currently ongoing efforts in European projects to facilitate and promote the interoperable sharing of scientific observation data. This will be illustrated through two examples: a prototypical portal developed in the ConnectinGEO project for matching available (in-situ) data sources to the needs of users and a joint activity of several research projects to harmonise the usage of the OGC Sensor Web Enablement standards for providing access to marine observation data. ENEON is an activity initiated by the European ConnectinGEO project to coordinate in-situ Earth observation networks with the aim to harmonise the access to observations, improve discoverability, and identify/close gaps in European earth observation data resources. In this context, ENEON commons has been developed as a supporting Web portal for facilitating discovery, access, re-use and creation of knowledge about observations, networks, and related activities (e.g. projects). The portal is based on developments resulting from the European WaterInnEU project and has been extended to cover the requirements for handling knowledge about in-situ earth observation networks. A first prototype of the portal was completed in January 2017 which offers functionality for interactive discussion, information exchange and querying information about data delivered by different observation networks. Within this presentation, we will introduce the presented prototype and initiate a discussion about potential future work directions. The second example concerns the harmonisation of data exchange in the marine domain. There are many organisations that operate ocean observatories or data archives. In recent years, the application of the OGC Sensor Web Enablement (SWE) technology has become more and more popular to increase the interoperability between marine observation networks. However, as the SWE standards were intentionally designed in a domain independent manner

  18. United Kingdom health research analyses and the benefits of shared data.

    Science.gov (United States)

    Carter, James G; Sherbon, Beverley J; Viney, Ian S

    2016-06-24

    nationwide assessment of health research funding, but achieving coverage of the United Kingdom portfolio relies on sourcing these details from a large number of individual funding agencies. The effort needed to compile this data could be minimised if funders routinely shared or published this information in a standard and accessible way. The United Kingdom approach to landscaping analyses could be readily adapted to suit other groups or nations, and global availability of research funding data would support better national and international coordination of health research.

  19. Sharing with More Caring: Coordinating and Improving the Ethical Governance of Data and Biomaterials Obtained from Children.

    Science.gov (United States)

    Longstaff, Holly; Khramova, Vera; Portales-Casamar, Elodie; Illes, Judy

    2015-01-01

    Research on complex health conditions such as neurodevelopmental disorders increasingly relies on large-scale research and clinical studies that would benefit from data sharing initiatives. Organizations that share data stand to maximize the efficiency of invested research dollars, expedite research findings, minimize the burden on the patient community, and increase citation rates of publications associated with the data. This study examined ethics and governance information on websites of databases involving neurodevelopmental disorders to determine the availability of information on key factors crucial for comprehension of, and trust and participation in, such initiatives. We identified relevant databases using online keyword searches. Two researchers reviewed each of the websites and identified thematic content using principles from grounded theory. The content for each organization was interrogated using the gap analysis method. Sixteen websites from data sharing organizations met our inclusion criteria. Information about types of data and tissues stored, data access requirements and procedures, and protections for confidentiality were significantly addressed by data sharing organizations. However, special considerations for minors (absent from 63%), controls to check if data and tissues are being submitted (absent from 81%), disaster recovery plans (absent from 81%), and discussions of incidental findings (absent from 88%) emerged as major gaps in thematic website content. When present, content pertaining to special considerations for youth, along with other ethics guidelines and requirements, was scattered throughout the websites or available only from associated documents accessed through live links. The complexities of sharing data acquired from children and adolescents will only increase with advances in genomics and neuroscience. Our findings suggest that there is a need to improve the consistency, depth and accessibility of governance and

  20. Sharing with More Caring: Coordinating and Improving the Ethical Governance of Data and Biomaterials Obtained from Children.

    Directory of Open Access Journals (Sweden)

    Holly Longstaff

    Full Text Available Research on complex health conditions such as neurodevelopmental disorders increasingly relies on large-scale research and clinical studies that would benefit from data sharing initiatives. Organizations that share data stand to maximize the efficiency of invested research dollars, expedite research findings, minimize the burden on the patient community, and increase citation rates of publications associated with the data. This study examined ethics and governance information on websites of databases involving neurodevelopmental disorders to determine the availability of information on key factors crucial for comprehension of, and trust and participation in, such initiatives. We identified relevant databases using online keyword searches. Two researchers reviewed each of the websites and identified thematic content using principles from grounded theory. The content for each organization was interrogated using the gap analysis method. Sixteen websites from data sharing organizations met our inclusion criteria. Information about types of data and tissues stored, data access requirements and procedures, and protections for confidentiality were significantly addressed by data sharing organizations. However, special considerations for minors (absent from 63%), controls to check if data and tissues are being submitted (absent from 81%), disaster recovery plans (absent from 81%), and discussions of incidental findings (absent from 88%) emerged as major gaps in thematic website content. When present, content pertaining to special considerations for youth, along with other ethics guidelines and requirements, was scattered throughout the websites or available only from associated documents accessed through live links. The complexities of sharing data acquired from children and adolescents will only increase with advances in genomics and neuroscience. Our findings suggest that there is a need to improve the consistency, depth and accessibility of

  1. Sharing reference data and including cows in the reference population improve genomic predictions in Danish Jersey.

    Science.gov (United States)

    Su, G; Ma, P; Nielsen, U S; Aamand, G P; Wiggans, G; Guldbrandtsen, B; Lund, M S

    2016-06-01

    Small reference populations limit the accuracy of genomic prediction in numerically small breeds, such as the Danish Jersey. The objective of this study was to investigate two approaches to improve genomic prediction by increasing the size of the reference population in Danish Jersey. The first approach was to include North American Jersey bulls in the Danish Jersey reference population. The second was to genotype cows and use them as reference animals. The validation of genomic prediction was carried out on bulls and cows, respectively. In the validation on bulls, about 300 Danish bulls (depending on traits) born in 2005 and later were used as validation data, and the reference populations were: (1) about 1050 Danish bulls, and (2) about 1050 Danish bulls and about 1150 US bulls. In the validation on cows, about 3000 Danish cows from 87 young half-sib families were used as validation data, and the reference populations were: (1) about 1250 Danish bulls, (2) about 1250 Danish bulls and about 1150 US bulls, (3) about 1250 Danish bulls and about 4800 cows, and (4) about 1250 Danish bulls, 1150 US bulls and 4800 Danish cows. A genomic best linear unbiased prediction model was used to predict breeding values. De-regressed proofs were used as response variables. In the validation on bulls for eight traits, the joint DK-US bull reference population led to higher reliability of genomic prediction than the DK bull reference population for six traits, but not for fertility and longevity. Averaged over the eight traits, the gain was 3 percentage points. In the validation on cows for six traits (fertility and longevity were not available), the gain from the inclusion of US bulls in the reference population was 6.6 percentage points on average over the six traits, and the gain from the inclusion of cows was 8.2 percentage points. However, the gains from cows and US bulls were not additive. The total gain of including both US bulls and Danish cows was 10.5 percentage points. The results indicate that sharing reference
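    The genomic best linear unbiased prediction (GBLUP) model used in the abstract above can be sketched compactly: a genomic relationship matrix G is built from centered marker genotypes (VanRaden's method), and breeding values are predicted by solving (G + λI)α = y − ȳ, ĝ = Gα, where λ is the residual-to-genetic variance ratio. The data below are simulated purely for illustration; they are not the Danish Jersey data, and the marker counts and variance ratio are arbitrary assumptions:

    ```python
    import numpy as np

    rng = np.random.default_rng(42)
    n_animals, n_markers = 200, 500

    # Simulate genotypes coded 0/1/2 and build VanRaden's G matrix.
    p = rng.uniform(0.1, 0.9, n_markers)                  # allele frequencies
    X = rng.binomial(2, p, size=(n_animals, n_markers)).astype(float)
    W = X - 2 * p                                         # centered genotypes
    G = W @ W.T / (2 * np.sum(p * (1 - p)))               # genomic relationship matrix

    # Simulate true breeding values and phenotypes (heritability ~ 0.5).
    marker_effects = rng.normal(0, 0.1, n_markers)
    g_true = W @ marker_effects
    y = g_true + rng.normal(0, g_true.std(), n_animals)

    # GBLUP in kernel form: ghat = G (G + lambda*I)^{-1} (y - ybar).
    lam = 1.0                                             # sigma_e^2 / sigma_g^2
    alpha = np.linalg.solve(G + lam * np.eye(n_animals), y - y.mean())
    g_hat = G @ alpha

    accuracy = np.corrcoef(g_hat, g_true)[0, 1]
    print(f"correlation with true breeding values: {accuracy:.2f}")
    ```

    In practice the response would be de-regressed proofs rather than raw phenotypes, and reliability would be evaluated on a hold-out validation set, as in the study.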

  2. Comprehension and Data-Sharing Behavior of Direct-To-Consumer Genetic Test Customers.

    Science.gov (United States)

    McGrath, Scott P; Coleman, Jason; Najjar, Lotfollah; Fruhling, Ann; Bastola, Dhundy R

    2016-01-01

    The aim of this study was to evaluate current direct-to-consumer (DTC) genetic customers' ability to interpret and comprehend test results and to determine if honest brokers are needed. One hundred and twenty-two customers of the DTC genetic testing company 23andMe were polled in an online survey. The subjects were asked about their personal test results and to interpret the results of two mock test cases (type 2 diabetes and multiple sclerosis), where results were translated into disease probability for an individual compared to the public. When asked to evaluate the risk, 72.1% correctly assessed the first case and 77% were correct on the second case. Only 23.8% of those surveyed were able to interpret both cases correctly. χ2 tests and logistic regression were used to interpret the results. Participants who took the time to read the DTC test-provided supplemental material were 3.93 times (p = 0.040) more likely to correctly interpret the test results than those who did not. The odds for correctly interpreting the test cases were 3.289 times (p = 0.011) higher for those who made more than USD 50,000 than those who made less. Survey results were compared to the Health Information National Trends Survey (HINTS) phase 4 cycle 3 data to evaluate national trends. Most of the subjects were able to correctly interpret the test cases, yet a majority did not share their results with a health-care professional. As the market for DTC genetic testing grows, test comprehension will become more critical. Involving more health professionals in this process may be necessary to ensure proper interpretations. © 2016 S. Karger AG, Basel.
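    The odds ratios reported above (e.g. 3.93 for readers of the supplemental material) are the standard output of logistic regression. As a reminder of the arithmetic, an odds ratio can be computed directly from a 2×2 table, or recovered from a fitted logistic coefficient as exp(β). The counts below are invented for illustration and are not taken from the survey:

    ```python
    import math

    # Hypothetical 2x2 table: rows = read supplemental material (yes/no),
    # columns = interpreted both cases correctly / incorrectly.
    a, b = 40, 20   # read material: correct, incorrect
    c, d = 20, 40   # did not read: correct, incorrect

    odds_ratio = (a / b) / (c / d)   # ratio of the two groups' odds
    beta = math.log(odds_ratio)      # the corresponding logistic-regression coefficient
    recovered = math.exp(beta)       # exp(beta) gives back the odds ratio

    print(odds_ratio, recovered)
    ```

    With these made-up counts the odds of a correct interpretation are 2:1 for readers and 1:2 for non-readers, giving an odds ratio of 4.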

  3. Public-private collaboration in spatial data infrastructure: Overview of exposure, acceptance and sharing platform in Malaysia

    Science.gov (United States)

    Othman, Raha binti; Bakar, Muhamad Shahbani Abu; Mahamud, Ku Ruhana Ku

    2017-10-01

    While Spatial Data Infrastructure (SDI) has been established in Malaysia, its full potential can be further realized. To a large degree, geospatial industry users are hopeful that they can easily get access to the system and start utilizing the data. Some users expect SDI to provide them with readily available data, without the necessary steps of requesting the data from the data providers and of processing and preparing the data for their use. Some further argue that the usability of the system could be improved if an appropriate combination of data sharing and focused applications were found within the services. In order to address the current challenges and to enhance the effectiveness of the SDI in Malaysia, there is a possibility of establishing a collaborative business venture between public and private entities, which can help address these issues and expectations. In this paper, we discuss the possibility of collaboration between these two entities. Interviews with seven entities were held to collect information on exposure, acceptance and sharing platforms. The outcomes indicate that, though the growth of GIS technology and the high level of technology acceptance provide a solid basis for utilizing geospatial data, the absence of a concrete policy on data sharing, of quality geospatial data, and of an authoritative coordinating agency leaves a vacuum in the successful implementation of the SDI initiative.

  4. The productivity effects of profit sharing, employee ownership, stock option and team incentive plans: Evidence from korean panel data

    OpenAIRE

    Kato, Takao; Lee, Ju Ho; Ryu, Jang-soo

    2010-01-01

    We report the first results for Korean firms on the incidence, diffusion, scope and effects of diverse employee financial participation schemes, such as Profit Sharing Plans (PSPs), Employee Stock Ownership Plans (ESOPs), Stock Option Plans (SOPs) and Team Incentive Plans (TIPs). In so doing, we assemble important new panel data by merging data from a survey of all Korean firms listed on the Korean Stock Exchange, which enjoys an unusually high response rate of 60 percent, with accounting data from...

  5. Profile Building, Research Sharing and Data Proliferation using Social Media Tools for Scientists (RTI presentation)

    Science.gov (United States)

    Many of us nowadays invest significant amounts of time in sharing our activities and opinions with friends and family via social networking tools such as Facebook, Twitter or other related websites. However, despite the availability of many platforms for scientists to connect and...

  6. Feasibility, Process, and Outcomes of Cardiovascular Clinical Trial Data Sharing: A Reproduction Analysis of the SMART-AF Trial.

    Science.gov (United States)

    Gay, Hawkins C; Baldridge, Abigail S; Huffman, Mark D

    2017-12-01

    Data sharing is an expanding initiative for enhancing trust in the clinical research enterprise. To evaluate the feasibility, process, and outcomes of a reproduction analysis of the THERMOCOOL SMARTTOUCH Catheter for the Treatment of Symptomatic Paroxysmal Atrial Fibrillation (SMART-AF) trial using shared clinical trial data. A reproduction analysis of the SMART-AF trial was performed using the data sets, data dictionary, case report file, and statistical analysis plan from the original trial, accessed through the Yale Open Data Access Project using the SAS Clinical Trials Data Transparency platform. SMART-AF was a multicenter, single-arm trial evaluating the effectiveness and safety of an irrigated, contact force-sensing catheter for ablation of drug refractory, symptomatic paroxysmal atrial fibrillation in 172 participants recruited from 21 sites between June 2011 and December 2011. Analysis of the data was conducted between December 2016 and April 2017. Effectiveness outcomes included freedom from atrial arrhythmias after ablation and the proportion of participants without any arrhythmia recurrence over the 12 months of follow-up after a 3-month blanking period. Safety outcomes included major adverse device- or procedure-related events. The SMART-AF trial participants' mean age was 58.7 (10.8) years, and 72% were men. The time from initial proposal submission to final analysis was 11 months. Freedom from atrial arrhythmias at 12 months postprocedure was similar compared with the primary study report (74.0%; 95% CI, 66.0-82.0 vs 76.4%; 95% CI, 68.7-84.1). The reproduction analysis success rate was higher than the primary study report (65.8%; 95% CI 56.5-74.2 vs 75.6%; 95% CI, 67.2-82.5). Adverse events were minimal and similar between the 2 analyses, but contact force range or regression models could not be reproduced. The feasibility of a reproduction analysis of the SMART-AF trial was demonstrated through an academic data-sharing platform. Data sharing can be

  7. Specification and development of the sharing memory data management module for a nuclear processes simulator

    International Nuclear Information System (INIS)

    Telesforo R, D.

    2003-01-01

    A simulator of nuclear processes for research and teaching purposes is currently being developed in the Engineering Faculty of UNAM. It consists of several modules, including the shared memory module described in the present work. The module uses the IPC mechanisms of the UNIX System V operating system and was coded in the C language. The RELAP code is used to model the various components of the simulator. The function of the module is to create shared memory segments in which the variables needed for interaction among the simulator's processes are deposited. Through these segments, the information generated by the running simulation can be read and written, and the internal variables of the code can be accessed at execution time. The graphic displays (mimics, pictorials, trend graphs, virtual instrumentation, etc.) also obtain their information from shared memory. In turn, user actions on the interactive displays modify the shared memory segments, and the information is sent to the RELAP code to alter the course of the simulation. The program has two start-up modes: automatic and manual. In automatic mode, it takes a RELAP input file (indta) and places into shared memory the control variables that appear in it. In manual mode, the user adds, reads and writes the desired control variables, provided they exist in the input file (indta). This provides a dynamic way of interacting with the simulator directly, and even of altering values when no board elements are associated with the variables. (Author)
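    The abstract above describes exchanging simulation variables through System V shared memory segments created from C. As an illustrative analogue of the same pattern, one process publishing a named shared memory segment that another process attaches to by name, here is a sketch using Python's multiprocessing.shared_memory (both "sides" run in one process here, and the variable names and layout are invented):

    ```python
    import struct
    from multiprocessing import shared_memory

    # "Simulator" side: create a named segment and deposit three control variables.
    shm = shared_memory.SharedMemory(create=True, size=3 * 8)
    struct.pack_into("<3d", shm.buf, 0, 550.0, 7.2, 0.95)  # e.g. temperature, pressure, valve position

    # "Display" side: attach to the same segment by name and read the variables.
    view = shared_memory.SharedMemory(name=shm.name)
    temp, pressure, valve = struct.unpack_from("<3d", view.buf, 0)
    print(temp, pressure, valve)

    # Detach both handles and remove the segment.
    view.close()
    shm.close()
    shm.unlink()
    ```

    A display process could likewise write user inputs back into the segment for the simulation code to pick up, which is the round trip the module described above implements for RELAP.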

  8. An Attribute-Based Access Control with Efficient and Secure Attribute Revocation for Cloud Data Sharing Service

    Institute of Scientific and Technical Information of China (English)

    Nyamsuren Vaanchig; Wei Chen; Zhi-Guang Qin

    2017-01-01

    Nowadays, there is a tendency to outsource data to cloud storage servers for data sharing purposes. In fact, this makes access control for the outsourced data a challenging issue. Ciphertext-policy attribute-based encryption (CP-ABE) is a promising cryptographic solution for this challenge. It gives the data owner (DO) direct control over the access policy and enforces the access policy cryptographically. However, the practical application of CP-ABE in the data sharing service also has its own inherent challenge with regard to attribute revocation. To address this challenge, we proposed an attribute-revocable CP-ABE scheme by taking advantage of the over-encryption mechanism and the CP-ABE scheme, and by considering a semi-trusted cloud service provider (CSP) that participates in decryption processes to issue decryption tokens for authorized users. We further presented the security and performance analysis in order to assess the effectiveness of the scheme. As compared with the existing attribute-revocable CP-ABE schemes, our attribute-revocable scheme is reasonably efficient and more secure to enable attribute-based access control over the outsourced data in the cloud data sharing service.

  9. A multi-agent architecture for sharing knowledge and experimental data about waste water treatment plants through the Internet

    International Nuclear Information System (INIS)

    Abu Yaman, I. R.; Kerckhoffs, J. E.

    1998-01-01

    In this paper, we present a first prototype of a local multi-agent architecture for the sharing of knowledge and experimental data about waste water treatment plants through the Internet, or more specifically the WWW. Using a Web browser such as Netscape, a user can have access to a CLIPS expert system (advising on waste water cleaning technologies) and experimental data files. The discussed local prototype is part of a proposed global agent architecture. (authors)

  10. Using GIS servers and interactive maps in spectral data sharing and administration: Case study of Ahvaz Spectral Geodatabase Platform (ASGP)

    Science.gov (United States)

    Karami, Mojtaba; Rangzan, Kazem; Saberi, Azim

    2013-10-01

    With the emergence of air-borne and space-borne hyperspectral sensors, spectroscopic measurements are gaining more importance in remote sensing. Therefore, the number of available spectral reference data is constantly increasing. This rapid increase is often accompanied by poor data management, which leads to the ultimate isolation of data on disk storage. Spectral data without a precise description of the target, methods, environment, and sampling geometry cannot be used by other researchers. Moreover, existing spectral data (even when accompanied by good documentation) become virtually invisible or unreachable for researchers. Providing documentation and a data-sharing framework for spectral data, in which researchers are able to search for or share spectral data and documentation, would definitely improve the data lifetime. Relational Database Management Systems (RDBMS) are the main candidates for spectral data management, and their efficiency has been proven by many studies and applications to date. In this study, a new approach to spectral data administration is presented based on the spatial identity of spectral samples. This method benefits from the scalability and performance of RDBMS for the storage of spectral data, but uses GIS servers to provide users with interactive maps as an interface to the system. The spectral files, photographs and descriptive data are considered as belongings of a geospatial object. A spectral processing unit is responsible for evaluating metadata quality and performing routine spectral processing tasks for newly added data. As a result, using internet browser software, users are able to visually examine the availability of data and/or search for data based on the descriptive attributes associated with it. The proposed system is scalable and, besides giving users a good sense of what data are available in the database, it facilitates the participation of spectral reference data in producing geoinformation.
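    The core idea above, treating each spectrum as a belonging of a geospatial object stored in an RDBMS and queried by location, can be sketched minimally with SQLite. The schema, targets and coordinates below are invented for illustration; a production system like the one described would sit behind a GIS server serving interactive maps rather than raw SQL:

    ```python
    import json
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE spectra (
            id         INTEGER PRIMARY KEY,
            target     TEXT,   -- description of the sampled material
            lat        REAL,   -- sample location (WGS84)
            lon        REAL,
            instrument TEXT,
            spectrum   TEXT    -- reflectance values, JSON-encoded
        )""")

    samples = [
        ("gypsum outcrop", 31.32, 48.67, "field spectrometer", json.dumps([0.41, 0.44, 0.47])),
        ("vegetation",     31.35, 48.70, "field spectrometer", json.dumps([0.05, 0.09, 0.35])),
        ("river water",    30.10, 47.90, "field spectrometer", json.dumps([0.02, 0.03, 0.01])),
    ]
    conn.executemany(
        "INSERT INTO spectra (target, lat, lon, instrument, spectrum) VALUES (?, ?, ?, ?, ?)",
        samples)

    # A map-based interface reduces to a bounding-box query over sample locations.
    rows = conn.execute(
        "SELECT target, spectrum FROM spectra "
        "WHERE lat BETWEEN 31.0 AND 31.5 AND lon BETWEEN 48.5 AND 49.0").fetchall()
    print(rows)
    ```

    Clicking a map extent in the Web interface would issue exactly this kind of spatial filter, returning the spectra and metadata attached to the objects inside it.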

  11. Data-Sharing Politics and the Logics of Competition in Biobanking

    DEFF Research Database (Denmark)

    Tupasela, Aaro Mikael

    2017-01-01

    specific populations (usually national). The collections are usually from healthy donors, who are then tracked over decades to see what diseases they develop over the course of their lifespan. LPC biobanks, therefore, also collect large amounts of health and lifestyle information that can be attributed...... to the tissue samples collected from the donors. The chapter evaluates the emerging tension between policies and practices in LPC biobanking with reference to the sharing of samples, and re-considers the applicability of social scientific theories that have sought to explain the emerging bio-economy....

  12. An Open Source Software and Web-GIS Based Platform for Airborne SAR Remote Sensing Data Management, Distribution and Sharing

    Science.gov (United States)

    Changyong, Dou; Huadong, Guo; Chunming, Han; Ming, Liu

    2014-03-01

    With more and more Earth observation data available to the community, how to manage and share these valuable remote sensing datasets is becoming an urgent issue to be solved. Web-based Geographical Information Systems (GIS) technology provides a convenient way for users in different locations to share and make use of the same dataset. In order to efficiently use the airborne Synthetic Aperture Radar (SAR) remote sensing data acquired by the Airborne Remote Sensing Center of the Institute of Remote Sensing and Digital Earth (RADI), Chinese Academy of Sciences (CAS), a Web-GIS based platform for airborne SAR data management, distribution and sharing was designed and developed. The major features of the system include a map-based navigation search interface, full-resolution imagery shown overlaid on the map, and the use of Open Source Software (OSS) for all software adopted in the platform. The functions of the platform include browsing the imagery on the map-based navigation interface, ordering and downloading data online, and image dataset and user management. At present, the system is under testing in RADI and will enter regular operation soon.

  13. Deployment Strategy for Car-Sharing Depots by Clustering Urban Traffic Big Data Based on Affinity Propagation

    Directory of Open Access Journals (Sweden)

    Zhihan Liu

    2018-01-01

    Full Text Available Car sharing is a type of car rental service in which consumers rent cars for short periods of time, often charged by the hour. The analysis of urban traffic big data is of great importance for determining the locations of depots for a car-sharing system. Taxi OD (Origin-Destination) data are a typical dataset of urban traffic. The volume of the data is extremely large, so traditional data processing applications do not work well. In this paper, an optimization method to determine depot locations by clustering taxi OD points with the AP (Affinity Propagation) clustering algorithm is presented. By analyzing the characteristics of the AP clustering algorithm, AP clustering has been optimized hierarchically based on administrative region segmentation. Considering the sparse similarity matrix of taxi OD points, the input parameters of AP clustering have been adapted. In the case study, we choose the OD pair information from Beijing’s taxi GPS trajectory data. The number and locations of depots are determined by clustering the OD points based on the optimized AP clustering. We describe experimental results of our approach and compare it with the standard K-means method using quantitative and stationarity indices. Experiments on the real datasets show that the proposed method for determining car-sharing depots has a superior performance.
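    The core of the approach above is Affinity Propagation over OD-point similarities, where the exemplars found by the algorithm correspond to candidate depot sites. A compact, self-contained version of the standard AP message-passing updates (responsibilities and availabilities, with damping) is sketched below on synthetic 2-D points standing in for pick-up/drop-off locations; the coordinates and parameters are illustrative, not the Beijing data or the paper's hierarchical variant:

    ```python
    import numpy as np

    def affinity_propagation(S, damping=0.8, iters=300):
        """Standard AP message passing on an n x n similarity matrix S
        (diagonal of S holds the preferences). Assumes at least one
        exemplar emerges; returns (exemplar indices, labels)."""
        n = S.shape[0]
        R = np.zeros((n, n))  # responsibilities r(i, k)
        A = np.zeros((n, n))  # availabilities  a(i, k)
        for _ in range(iters):
            # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
            AS = A + S
            top = AS.argmax(axis=1)
            first = AS[np.arange(n), top]
            AS[np.arange(n), top] = -np.inf
            second = AS.max(axis=1)
            R_new = S - first[:, None]
            R_new[np.arange(n), top] = S[np.arange(n), top] - second
            R = damping * R + (1 - damping) * R_new
            # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
            Rp = np.maximum(R, 0)
            np.fill_diagonal(Rp, R.diagonal())
            col = Rp.sum(axis=0)
            A_new = col[None, :] - Rp
            dA = A_new.diagonal().copy()      # a(k,k) = sum_{i' != k} max(0, r(i',k))
            A_new = np.minimum(A_new, 0)
            np.fill_diagonal(A_new, dA)
            A = damping * A + (1 - damping) * A_new
        exemplars = np.flatnonzero((A + R).diagonal() > 0)
        labels = exemplars[S[:, exemplars].argmax(axis=1)]
        labels[exemplars] = exemplars
        return exemplars, labels

    # Two synthetic clusters of OD points (hypothetical coordinates).
    rng = np.random.default_rng(0)
    pts = np.vstack([rng.normal((0, 0), 0.3, (10, 2)),
                     rng.normal((10, 10), 0.3, (10, 2))])
    # Similarity = negative squared distance; preference = median similarity.
    D2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    S = -D2
    np.fill_diagonal(S, np.median(S[~np.eye(len(pts), dtype=bool)]))
    exemplars, labels = affinity_propagation(S)
    print(len(exemplars), "depot candidates")
    ```

    Raising the diagonal preference yields more exemplars (more, smaller depot catchments); the paper's adaptation of input parameters for sparse similarity matrices plays the same tuning role.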

  14. Guidelines for governance of data sharing in agri-food networks

    NARCIS (Netherlands)

    Wolfert, J.; Bogaardt, M.J.; Ge, L.; Soma, K.; Verdouw, C.N.

    2017-01-01

    Big Data is becoming a new asset in the agri-food sector including enterprise data from operational systems, sensor data, farm equipment data, etc. Recently, Big Data applications are being implemented to improve farm and chain performance in agri-food networks. Still, many companies are refraining

  15. Sustaining an Online, Shared Community Resource for Models, Robust Open source Software Tools and Data for Volcanology - the Vhub Experience

    Science.gov (United States)

    Patra, A. K.; Valentine, G. A.; Bursik, M. I.; Connor, C.; Connor, L.; Jones, M.; Simakov, N.; Aghakhani, H.; Jones-Ivey, R.; Kosar, T.; Zhang, B.

    2015-12-01

    Over the last 5 years we have created a community collaboratory, Vhub.org [Palma et al, J. App. Volc. 3:2 doi:10.1186/2191-5040-3-2], as a place to find volcanology-related resources, a venue for users to disseminate tools, teaching resources and data, and an online platform to support collaborative efforts. As the community (current active users > 6000, from an estimated community of comparable size) embeds the tools in the collaboratory into educational and research workflows, it became imperative to: a) redesign tools into robust, open source, reusable software for online and offline usage/enhancement; b) share large datasets with remote collaborators and other users seamlessly and securely; c) support complex workflows for uncertainty analysis, validation and verification, and data assimilation with large data. The focus on tool development/redevelopment has been twofold: firstly, to use best practices in software engineering and new hardware such as multi-core and graphics processing units; secondly, to enhance capabilities to support inverse modeling, uncertainty quantification using large ensembles and design of experiments, calibration, and validation. The software engineering practices we follow include open source development facilitating community contributions, modularity and reusability. Our initial targets are four popular tools on Vhub: TITAN2D, TEPHRA2, PUFF and LAVA. Use of tools like these requires many observation-driven datasets, e.g. digital elevation models of topography, satellite imagery, field observations on deposits, etc. These data are often maintained in private repositories that are privately shared by "sneaker-net". As a partial solution to this, we tested mechanisms using iRODS software for online sharing of private data with public metadata and access limits. Finally, we adapted the use of workflow engines (e.g. Pegasus) to support the complex data and computing workflows needed for usage such as uncertainty quantification for hazard analysis using physical

  16. HydroShare for iUTAH: Collaborative Publication, Interoperability, and Reuse of Hydrologic Data and Models for a Large, Interdisciplinary Water Research Project

    Science.gov (United States)

    Horsburgh, J. S.; Jones, A. S.

    2016-12-01

    Data and models used within the hydrologic science community are diverse. New research data and model repositories have succeeded in making data and models more accessible, but have been, in most cases, limited to particular types or classes of data or models, and also lack the collaborative and iterative functionality needed to enable shared data collection and modeling workflows. File sharing systems currently used within many scientific communities for private sharing of preliminary and intermediate data and modeling products do not support collaborative data capture, description, visualization, and annotation. More recently, hydrologic datasets and models have been cast as "social objects" that can be published, collaborated around, annotated, discovered, and accessed. Yet it can be difficult using existing software tools to achieve the kind of collaborative workflows and data/model reuse that many envision. HydroShare is a new, web-based system for sharing hydrologic data and models with specific functionality aimed at making collaboration easier and achieving new levels of interactive functionality and interoperability. Within HydroShare, we have developed new functionality for creating datasets, describing them with metadata, and sharing them with collaborators. HydroShare is enabled by a generic data model and content packaging scheme that supports describing and sharing diverse hydrologic datasets and models. Interoperability among the diverse types of data and models used by hydrologic scientists is achieved through the use of consistent storage, management, sharing, publication, and annotation within HydroShare. In this presentation, we highlight and demonstrate how the flexibility of HydroShare's data model and packaging scheme, HydroShare's access control and sharing functionality, and versioning and publication capabilities have enabled the sharing and publication of research datasets for a large, interdisciplinary water research project

  17. 78 FR 57860 - Draft NIH Genomic Data Sharing Policy Request for Public Comments

    Science.gov (United States)

    2013-09-20

    [Garbled excerpt of a table from the notice relating data types (sequence alignments, genetic variant calls, gene expression peaks, genotype-phenotype analyses) to expected data submission and release timelines (e.g., up to 6 months, generally within 3 months after data analysis, or released with publication).]

  18. Willingness of older adults to share data and privacy concerns after exposure to unobtrusive in-home monitoring.

    Science.gov (United States)

    Boise, Linda; Wild, Katherine; Mattek, Nora; Ruhl, Mary; Dodge, Hiroko H; Kaye, Jeffrey

    2013-01-01

    Older adult participants in the Intelligent Systems for Assessment of Aging Changes study (ISAAC) carried out by the Oregon Center for Aging and Technology (ORCATECH) were surveyed regarding their attitudes about unobtrusive home monitoring and computer use at baseline and after one year (n=119). The survey was part of a longitudinal study using in-home sensor technology to detect cognitive changes and other health problems. Our primary objective was to measure willingness to share health or activity data with one's doctor or family members and concerns about privacy or security of monitoring over one year of study participation. Differences in attitudes of participants with Mild Cognitive Impairment (MCI) compared to those with normal cognition were also examined. A high proportion (over 72%) of participants reported acceptance of in-home and computer monitoring and willingness to have data shared with their doctor or family members. However, a majority (60%) reported concerns related to privacy or security; these concerns increased after one year of participation. Few differences between participants with MCI and those with normal cognition were identified. Findings suggest that involvement in this unobtrusive in-home monitoring study may have raised awareness about the potential privacy risks of technology. Still, results show high acceptance, stable over time, of sharing information from monitoring systems with family members and doctors. Our findings have important implications for the deployment of technologies among older adults in research studies as well as in the general community.

  19. Classification of processes involved in sharing individual participant data from clinical trials [version 1; referees: 1 approved, 2 approved with reservations]

    Directory of Open Access Journals (Sweden)

    Christian Ohmann

    2018-02-01

    Full Text Available Background: In recent years, a cultural change in the handling of data from research has resulted in the strong promotion of a culture of openness and increased sharing of data. In the area of clinical trials, sharing of individual participant data involves a complex set of processes and the interaction of many actors and actions. Individual services/tools to support data sharing are available, but what is missing is a detailed, structured and comprehensive list of processes/subprocesses involved and tools/services needed. Methods: Principles and recommendations from a published data sharing consensus document are analysed in detail by a small expert group. Processes/subprocesses involved in data sharing are identified and linked to actors and possible services/tools. Definitions are adapted from the business process model and notation (BPMN) and applied in the analysis. Results: A detailed and comprehensive list of individual processes/subprocesses involved in data sharing, structured according to 9 main processes, is provided. Possible tools/services to support these processes/subprocesses are identified and grouped according to major type of support. Conclusions: The list of individual processes/subprocesses and tools/services identified is a first step towards development of a generic framework or architecture for sharing of data from clinical trials. Such a framework is strongly needed to give an overview of how various actors, research processes and services could form an interoperable system for data sharing.

  20. Classification of processes involved in sharing individual participant data from clinical trials [version 2; referees: 1 approved, 2 approved with reservations]

    Directory of Open Access Journals (Sweden)

    Christian Ohmann

    2018-04-01

    Full Text Available Background: In recent years, a cultural change in the handling of research data has resulted in the promotion of a culture of openness and an increased sharing of data. In the area of clinical trials, sharing of individual participant data involves a complex set of processes and the interaction of many actors and actions. Individual services and tools to support data sharing are becoming available, but what is missing is a detailed, structured and comprehensive list of processes and subprocesses involved and the tools and services needed. Methods: Principles and recommendations from a published consensus document on data sharing were analysed in detail by a small expert group. Processes and subprocesses involved in data sharing were identified and linked to actors and possible supporting services and tools. Definitions adapted from the business process model and notation (BPMN) were applied in the analysis. Results: A detailed and comprehensive tabulation of individual processes and subprocesses involved in data sharing, structured according to 9 main processes, is provided. Possible tools and services to support these processes are identified and grouped according to the major type of support. Conclusions: The identification of the individual processes and subprocesses and supporting tools and services is a first step towards development of a generic framework or architecture for the sharing of data from clinical trials. Such a framework is needed to provide an overview of how the various actors, research processes and services could interact to form a sustainable system for data sharing.

  1. A Knowledge Model Sharing Based Approach to Privacy-Preserving Data Mining

    OpenAIRE

    Hongwei Tian; Weining Zhang; Shouhuai Xu; Patrick Sharkey

    2012-01-01

    Privacy-preserving data mining (PPDM) is an important problem and is currently studied in three approaches: the cryptographic approach, the data publishing, and the model publishing. However, each of these approaches has some problems. The cryptographic approach does not protect privacy of learned knowledge models and may have performance and scalability issues. The data publishing, although is popular, may suffer from too much utility loss for certain types of data mining applications. The m...

  2. Full and Open Access to Data in the Global Earth Observing System of Systems (GEOSS): Implementing the GEOSS Data Sharing Principles

    Science.gov (United States)

    Chen, R. S.; Uhlir, P. F.; Gabrinowicz, J. I.

    2008-12-01

    Full and open access to data from remote sensing platforms and other sources can facilitate not only scientific research but also the more widespread and effective use of scientific data for the benefit of society. The Global Earth Observing System of Systems (GEOSS) is a major international initiative of the Group on Earth Observations (GEO) to develop "coordinated, comprehensive and sustained Earth observations and information." In 2005, GEO adopted the GEOSS Data Sharing Principles, which call for the "full and open exchange of data, metadata, and products shared within GEOSS, recognizing relevant international instruments and national policies and legislation." These Principles also note that "All shared data, metadata, and products will be made available with minimum time delay and at minimum cost" and that "All shared data, metadata, and products being free of charge or no more than cost of reproduction will be encouraged for research and education." GEOSS Task DA-06-01, aimed at developing a set of recommended implementation guidelines for the Principles, was established in 2006 under the leadership of CODATA, the Committee on Data for Science and Technology of the International Council for Science (ICSU). An international team of authors has developed a draft White Paper on the GEOSS Data Sharing Principles and a proposed set of implementation guidelines. These have been carefully reviewed by independent reviewers, various GEO Committees, and GEO National Members and Participating Organizations. It is expected that the proposed implementation guidelines will be discussed at the GEO-V Plenary in Budapest in November 2008. The current version of the proposed implementation guidelines recognizes the importance of good faith, voluntary adherence to the Principles by GEO National Members and Participating Organizations. 
It underscores the value of reuse and re-dissemination of GEOSS data with minimum restrictions, not only within GEOSS itself but on the part of

  3. Capturing, Sharing, and Discovering Product Data at a Semantic Level--Moving Forward to the Semantic Web for Advancing the Engineering Product Design Process

    Science.gov (United States)

    Zhu, Lijuan

    2011-01-01

    Along with the greater productivity that CAD automation provides nowadays, the product data of engineering applications needs to be shared and managed efficiently to gain a competitive edge for the engineering product design. However, exchanging and sharing the heterogeneous product data is still challenging. This dissertation first presents a…

  4. Health administrative data can be used to define a shared care typology for people with HIV.

    Science.gov (United States)

    Kendall, Claire E; Younger, Jaime; Manuel, Douglas G; Hogg, William; Glazier, Richard H; Taljaard, Monica

    2015-11-01

    Building on an existing theoretical shared primary care/specialist care framework to (1) develop a unique typology of care for people living with human immunodeficiency virus (HIV) in Ontario, (2) assess sensitivity of the typology by varying typology definitions, and (3) describe characteristics of typology categories. Retrospective population-based observational study from April 1, 2009, to March 31, 2012. A total of 13,480 eligible patients with HIV receiving publicly funded health care in Ontario. We derived a typology of care by linking patients to usual family physicians and to HIV specialists with five possible patterns of care. Patient and physician characteristics and outpatient visits for HIV-related and non-HIV-related care were used to assess the robustness and characteristics of the typology. Five possible patterns of care were described as low engagement (8.6%), exclusively primary care (52.7%), family physician-dominated comanagement (10.0%), specialist-dominated comanagement (30.5%), and exclusively specialist care (5.2%). Sensitivity analyses demonstrated robustness of typology assignments. Visit patterns varied in ways that conform to typology assignments. We anticipate this typology can be used to assess the impact of care patterns on the quality of primary care for people living with HIV. Copyright © 2015 Elsevier Inc. All rights reserved.

  5. ECDS - a Swedish Research Infrastructure for the Open Sharing of Environment and Climate Data

    Directory of Open Access Journals (Sweden)

    T Klein

    2013-02-01

    Full Text Available Environment Climate Data Sweden (ECDS) is a new Swedish research infrastructure, furthering the reuse of scientific data in the domains of environment and climate. ECDS consists of a technical infrastructure and a service organization, supporting the management, exchange, and re-use of scientific data. The technical components of ECDS include a portal and an underlying data catalogue with information on datasets. The datasets are described using a metadata profile compliant with international standards. The datasets accessible through ECDS can be hosted by universities, institutes, or research groups or at the new Swedish federated data storage facility Swestore of the Swedish National Infrastructure for Computing (SNIC).

  6. The development of the Older Persons and Informal Caregivers Survey Minimum DataSet (TOPICS-MDS): a large-scale data sharing initiative.

    Science.gov (United States)

    Lutomski, Jennifer E; Baars, Maria A E; Schalk, Bianca W M; Boter, Han; Buurman, Bianca M; den Elzen, Wendy P J; Jansen, Aaltje P D; Kempen, Gertrudis I J M; Steunenberg, Bas; Steyerberg, Ewout W; Olde Rikkert, Marcel G M; Melis, René J F

    2013-01-01

    In 2008, the Ministry of Health, Welfare and Sport commissioned the National Care for the Elderly Programme. While numerous research projects in older persons' health care were to be conducted under this national agenda, the Programme further advocated the development of The Older Persons and Informal Caregivers Survey Minimum DataSet (TOPICS-MDS) which would be integrated into all funded research protocols. In this context, we describe TOPICS data sharing initiative (www.topics-mds.eu). A working group drafted TOPICS-MDS prototype, which was subsequently approved by a multidisciplinary panel. Using instruments validated for older populations, information was collected on demographics, morbidity, quality of life, functional limitations, mental health, social functioning and health service utilisation. For informal caregivers, information was collected on demographics, hours of informal care and quality of life (including subjective care-related burden). Between 2010 and 2013, a total of 41 research projects contributed data to TOPICS-MDS, resulting in preliminary data available for 32,310 older persons and 3,940 informal caregivers. The majority of studies sampled were from primary care settings and inclusion criteria differed across studies. TOPICS-MDS is a public data repository which contains essential data to better understand health challenges experienced by older persons and informal caregivers. Such findings are relevant for countries where increasing health-related expenditure has necessitated the evaluation of contemporary health care delivery. Although open sharing of data can be difficult to achieve in practice, proactively addressing issues of data protection, conflicting data analysis requests and funding limitations during TOPICS-MDS developmental phase has fostered a data sharing culture. To date, TOPICS-MDS has been successfully incorporated into 41 research projects, thus supporting the feasibility of constructing a large (>30,000 observations

  7. SynapticDB, effective web-based management and sharing of data from serial section electron microscopy.

    Science.gov (United States)

    Shi, Bitao; Bourne, Jennifer; Harris, Kristen M

    2011-03-01

    Serial section electron microscopy (ssEM) is rapidly expanding as a primary tool to investigate synaptic circuitry and plasticity. The ultrastructural images collected through ssEM are content rich and their comprehensive analysis is beyond the capacity of an individual laboratory. Hence, sharing ultrastructural data is becoming crucial to visualize, analyze, and discover the structural basis of synaptic circuitry and function in the brain. We devised a web-based management system called SynapticDB (http://synapses.clm.utexas.edu/synapticdb/) that catalogues, extracts, analyzes, and shares experimental data from ssEM. The management strategy involves a library with check-in, checkout and experimental tracking mechanisms. We developed a series of spreadsheet templates (MS Excel, Open Office spreadsheet, etc) that guide users in methods of data collection, structural identification, and quantitative analysis through ssEM. SynapticDB provides flexible access to complete templates, or to individual columns with instructional headers that can be selected to create user-defined templates. New templates can also be generated and uploaded. Research progress is tracked via experimental note management and dynamic PDF forms that allow new investigators to follow standard protocols and experienced researchers to expand the range of data collected and shared. The combined use of templates and tracking notes ensures that the supporting experimental information is populated into the database and associated with the appropriate ssEM images and analyses. We anticipate that SynapticDB will serve future meta-analyses towards new discoveries about the composition and circuitry of neurons and glia, and new understanding about structural plasticity during development, behavior, learning, memory, and neuropathology.

  8. Using Globus GridFTP to Transfer and Share Big Data | Poster

    Science.gov (United States)

    By Ashley DeVine, Staff Writer, and Mark Wance, Guest Writer; photo by Richard Frederickson, Staff Photographer Transferring big data, such as the genomics data delivered to customers from the Center for Cancer Research Sequencing Facility (CCR SF), has been difficult in the past because the transfer systems have not kept pace with the size of the data. However, the situation is changing as a result of the Globus GridFTP project.

  9. B-CAN: a resource sharing platform to improve the operation, visualization and integrated analysis of TCGA breast cancer data.

    Science.gov (United States)

    Wen, Can-Hong; Ou, Shao-Min; Guo, Xiao-Bo; Liu, Chen-Feng; Shen, Yan-Bo; You, Na; Cai, Wei-Hong; Shen, Wen-Jun; Wang, Xue-Qin; Tan, Hai-Zhu

    2017-12-12

    Breast cancer is a high-risk heterogeneous disease with myriad subtypes and complicated biological features. The Cancer Genome Atlas (TCGA) breast cancer database provides researchers with the large-scale genome and clinical data via web portals and FTP services. Researchers are able to gain new insights into their related fields, and evaluate experimental discoveries with TCGA. However, researchers with little experience in databases and bioinformatics find TCGA difficult to access and operate on because of its complex data format and diverse files. For ease of use, we build the breast cancer (B-CAN) platform, which enables data customization, data visualization, and a private data center. The B-CAN platform runs on an Apache server and interacts with the back-end MySQL database via PHP. Users can customize data based on their needs by combining tables from the original TCGA database and selecting variables from each table. The private data center is applicable for private data and two types of customized data. A key feature of the B-CAN is that it provides single table display and multiple table display. Customized data with one barcode corresponding to many records and processed customized data are allowed in Multiple Tables Display. The B-CAN is an intuitive and highly efficient data-sharing platform.

  10. A Standard for Sharing and Accessing Time Series Data: The Heliophysics Application Programmers Interface (HAPI) Specification

    Science.gov (United States)

    Vandegriff, J. D.; King, T. A.; Weigel, R. S.; Faden, J.; Roberts, D. A.; Harris, B. T.; Lal, N.; Boardsen, S. A.; Candey, R. M.; Lindholm, D. M.

    2017-12-01

    We present the Heliophysics Application Programmers Interface (HAPI), a new interface specification that both large and small data centers can use to expose time series data holdings in a standard way. HAPI was inspired by the similarity of existing services at many Heliophysics data centers, and these data centers have collaborated to define a single interface that captures best practices and represents what everyone considers the essential, lowest common denominator for basic data access. This low level access can serve as infrastructure to support greatly enhanced interoperability among analysis tools, with the goal being simplified analysis and comparison of data from any instrument, model, mission or data center. The three main services a HAPI server must perform are 1. list a catalog of datasets (one unique ID per dataset), 2. describe the content of one dataset (JSON metadata), and 3. retrieve numerical content for one dataset (stream the actual data). HAPI defines both the format of the query to the server, and the response from the server. The metadata is lightweight, focusing on use rather than discovery, and the data format is a streaming one, with Comma Separated Values (CSV) being required and binary or JSON streaming being optional. The HAPI specification is available at GitHub, where projects are also underway to develop reference implementation servers that data providers can adapt and use at their own sites. Also in the works are data analysis clients in multiple languages (IDL, Python, Matlab, and Java). Institutions which have agreed to adopt HAPI include Goddard (CDAWeb for data and CCMC for models), LASP at the University of Colorado Boulder, the Particles and Plasma Interactions node of the Planetary Data System (PPI/PDS) at UCLA, the Plasma Wave Group at the University of Iowa, the Space Sector at the Johns Hopkins Applied Physics Lab (APL), and the tsds.org site maintained at George Mason University. Over the next year, the adoption of a
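    The three services a HAPI server must perform map directly onto three HTTP endpoints. As an illustrative sketch (the server URL and dataset ID below are hypothetical, and the `time.min`/`time.max` parameter names should be checked against the HAPI specification version a given server implements), a client can construct the request URLs like this:

```python
from urllib.parse import urlencode

def hapi_urls(server: str, dataset: str, start: str, stop: str) -> dict:
    """Build the three HAPI request URLs: catalog, info, and data.

    `server` is the base URL of a HAPI endpoint. Time-range parameter
    names follow the `time.min`/`time.max` convention; newer revisions
    of the specification may name them differently.
    """
    base = server.rstrip("/")
    return {
        # 1. list the catalog of datasets
        "catalog": f"{base}/catalog",
        # 2. describe one dataset (JSON metadata)
        "info": f"{base}/info?{urlencode({'id': dataset})}",
        # 3. retrieve numerical content (streamed, CSV by default)
        "data": f"{base}/data?" + urlencode(
            {"id": dataset, "time.min": start, "time.max": stop}
        ),
    }

# Hypothetical server and dataset ID, for illustration only:
urls = hapi_urls("https://example.org/hapi", "AC_H0_MFI",
                 "2016-01-01T00:00:00Z", "2016-01-02T00:00:00Z")
```

    Since CSV is the one response format every HAPI server must support, the `data` URL can be fetched and parsed with any standard CSV reader.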

  11. Big data sharing and analysis to advance research in post-traumatic epilepsy.

    Science.gov (United States)

    Duncan, Dominique; Vespa, Paul; Pitkanen, Asla; Braimah, Adebayo; Lapinlampi, Nina; Toga, Arthur W

    2018-06-01

    We describe the infrastructure and functionality for a centralized preclinical and clinical data repository and analytic platform to support importing heterogeneous multi-modal data, automatically and manually linking data across modalities and sites, and searching content. We have developed and applied innovative image and electrophysiology processing methods to identify candidate biomarkers from MRI, EEG, and multi-modal data. Based on heterogeneous biomarkers, we present novel analytic tools designed to study epileptogenesis in animal models and humans with the goal of tracking the probability of developing epilepsy over time. Copyright © 2017. Published by Elsevier Inc.

  12. Towards Blockchain-based Auditable Storage and Sharing of IoT Data

    OpenAIRE

    Shafagh, Hossein; Burkhalter, Lukas; Hithnawi, Anwar; Duquennoy, Simon

    2017-01-01

    Today the cloud plays a central role in storing, processing, and distributing data. Despite contributing to the rapid development of IoT applications, the current IoT cloud-centric architecture has led into a myriad of isolated data silos that hinders the full potential of holistic data-driven analytics within the IoT. In this paper, we present a blockchain-based design for the IoT that brings a distributed access control and data management. We depart from the current trust model that delega...

  13. FHIR Healthcare Directories: Adopting Shared Interfaces to Achieve Interoperable Medical Device Data Integration.

    Science.gov (United States)

    Tyndall, Timothy; Tyndall, Ayami

    2018-01-01

    Healthcare directories are vital for interoperability among healthcare providers, researchers and patients. Past efforts at directory services have not provided the tools to allow integration of the diverse data sources. Many are overly strict, incompatible with legacy databases, and do not provide Data Provenance. A more architecture-independent system is needed to enable secure, GDPR-compatible (8) service discovery across organizational boundaries. We review our development of a portable Data Provenance Toolkit supporting provenance within Health Information Exchange (HIE) systems. The Toolkit has been integrated with client software and successfully leveraged in clinical data integration. The Toolkit validates provenance stored in a Blockchain or Directory record and creates provenance signatures, providing standardized provenance that moves with the data. This healthcare directory suite implements discovery of healthcare data by HIE and EHR systems via FHIR. Shortcomings of past directory efforts include the ability to map complex datasets and enabling interoperability via exchange endpoint discovery. By delivering data without dictating how it is stored we improve exchange and facilitate discovery on a multi-national level through open source, fully interoperable tools. With the development of Data Provenance resources we enhance exchange and improve security and usability throughout the health data continuum.
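    The provenance signatures described above, which travel with the data, can be illustrated with a minimal sketch. The HMAC scheme, the canonical JSON form, and the key handling here are assumptions made for illustration; the toolkit's actual signature format is not specified in the abstract:

```python
import hashlib
import hmac
import json

def sign_record(record: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 signature over a canonical JSON form.

    A stand-in for a detached provenance signature: the record is
    serialized with sorted keys so that logically equal records
    always produce the same signature.
    """
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, canonical.encode("utf-8"), hashlib.sha256).hexdigest()

def verify_record(record: dict, key: bytes, signature: str) -> bool:
    """Check a signature in constant time; any field change invalidates it."""
    return hmac.compare_digest(sign_record(record, key), signature)

# Hypothetical provenance record, loosely shaped like a FHIR resource:
rec = {"resourceType": "Provenance", "target": ["Patient/123"], "agent": "hie-node-1"}
sig = sign_record(rec, b"demo-key")
```

    In a directory or Blockchain setting, only the signature would need to be anchored in the shared record; the data itself can stay wherever it is stored.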

  14. ArrayWiki: an enabling technology for sharing public microarray data repositories and meta-analyses

    Science.gov (United States)

    Stokes, Todd H; Torrance, JT; Li, Henry; Wang, May D

    2008-01-01

    Background A survey of microarray databases reveals that most of the repository contents and data models are heterogeneous (i.e., data obtained from different chip manufacturers), and that the repositories provide only basic biological keywords linking to PubMed. As a result, it is difficult to find datasets using research context or analysis parameters information beyond a few keywords. For example, to reduce the "curse-of-dimension" problem in microarray analysis, the number of samples is often increased by merging array data from different datasets. Knowing chip data parameters such as pre-processing steps (e.g., normalization, artefact removal, etc), and knowing any previous biological validation of the dataset is essential due to the heterogeneity of the data. However, most of the microarray repositories do not have meta-data information in the first place, and do not have a mechanism to add or insert this information. Thus, there is a critical need to create "intelligent" microarray repositories that (1) enable update of meta-data with the raw array data, and (2) provide standardized archiving protocols to minimize bias from the raw data sources. Results To address the problems discussed, we have developed a community maintained system called ArrayWiki that unites disparate meta-data of microarray meta-experiments from multiple primary sources with four key features. First, ArrayWiki provides a user-friendly knowledge management interface in addition to a programmable interface using standards developed by Wikipedia. Second, ArrayWiki includes automated quality control processes (caCORRECT) and novel visualization methods (BioPNG, Gel Plots), which provide extra information about data quality unavailable in other microarray repositories. Third, it provides a user-curation capability through the familiar Wiki interface.
Fourth, ArrayWiki provides users with simple text-based searches across all experiment meta-data, and exposes data to search engine crawlers

  15. tranSMART: An Open Source and Community-Driven Informatics and Data Sharing Platform for Clinical and Translational Research.

    Science.gov (United States)

    Athey, Brian D; Braxenthaler, Michael; Haas, Magali; Guo, Yike

    2013-01-01

    tranSMART is an emerging global open source public private partnership community developing a comprehensive informatics-based analysis and data-sharing cloud platform for clinical and translational research. The tranSMART consortium includes pharmaceutical and other companies, not-for-profits, academic entities, patient advocacy groups, and government stakeholders. The tranSMART value proposition relies on the concept that the global community of users, developers, and stakeholders are the best source of innovation for applications and for useful data. Continued development and use of the tranSMART platform will create a means to enable "pre-competitive" data sharing broadly, saving money and potentially accelerating research translation to cures. Significant transformative effects of tranSMART include 1) allowing all its user community to benefit from experts globally, 2) capturing the best of innovation in analytic tools, 3) a growing 'big data' resource, 4) convergent standards, and 5) new informatics-enabled translational science in the pharma, academic, and not-for-profit sectors.

  16. An ecoinformatics application for forest dynamics plot data management and sharing

    Science.gov (United States)

    Chau-Chin Lin; Abd Rahman Kassim; Kristin Vanderbilt; Donald Henshaw; Eda C. Melendez-Colom; John H. Porter; Kaoru Niiyama; Tsutomu Yagihashi; Sek Aun Tan; Sheng-Shan Lu; Chi-Wen Hsiao; Li-Wan Chang; Meei-Ru. Jeng

    2011-01-01

    Several forest dynamics plot research projects in the East-Asia Pacific region of the International Long-Term Ecological Research network actively collect long-term data, and some of these large plots are members of the Center for Tropical Forest Science network. The wealth of forest plot data presents challenges in information management to researchers. In order to...

  17. Watershed and Economic Data InterOperability (WEDO): Facilitating Discovery, Evaluation and Integration through the Sharing of Watershed Modeling Data

    Science.gov (United States)

    Watershed and Economic Data InterOperability (WEDO) is a system of information technologies designed to publish watershed modeling studies for reuse. WEDO facilitates three aspects of interoperability: discovery, evaluation and integration of data. This increased level of interop...

  18. Strategic Planning for a Data-Driven, Shared-Access Research Enterprise: Virginia Tech Research Data Assessment and Landscape Study

    Science.gov (United States)

    Shen, Yi

    2016-01-01

    The data landscape study at Virginia Tech addresses the changing modes of faculty scholarship and supports the development of a user-centric data infrastructure, management, and curation system. The study investigates faculty researchers' current practices in organizing, describing, and preserving data and the emerging needs for services and…

  19. The eTOX Data-Sharing Project to Advance in Silico Drug-Induced Toxicity Prediction

    Directory of Open Access Journals (Sweden)

    Montserrat Cases

    2014-11-01

    Full Text Available The high-quality in vivo preclinical safety data produced by the pharmaceutical industry during drug development, which follows numerous strict guidelines, are mostly not available in the public domain. These safety data are sometimes published as a condensed summary for the few compounds that reach the market, but the majority of studies are never made public and are often difficult to access in an automated way, even sometimes within the owning company itself. It is evident from many academic and industrial examples that useful data mining and model development requires large and representative data sets and careful curation of the collected data. In 2010, under the auspices of the Innovative Medicines Initiative, the eTOX project started with the objective of extracting and sharing preclinical study data from paper or pdf archives of toxicology departments of the 13 participating pharmaceutical companies and using such data for establishing a detailed, well-curated database, which could then serve as a source for read-across approaches (early assessment of the potential toxicity of a drug candidate by comparison of similar structure and/or effects) and training of predictive models. The paper describes the efforts undertaken to allow effective data sharing, intellectual property (IP) protection and set-up of adequate controlled vocabularies, and to establish the database (currently with over 4000 studies contributed by the pharma companies, corresponding to more than 1400 compounds). In addition, the status of predictive model building and some specific features of the eTOX predictive system (eTOXsys) are presented as decision support knowledge-based tools for the drug development process at an early stage.

  20. The Open Spectral Database: an open platform for sharing and searching spectral data.

    Science.gov (United States)

    Chalk, Stuart J

    2016-01-01

    A number of websites make available spectral data for download (typically as JCAMP-DX text files) and one (ChemSpider) also allows users to contribute spectral files. As a result, searching and retrieving such spectral data can be time consuming, and the data are difficult to reuse if compressed in the JCAMP-DX file. What is needed is a single resource that allows submission of JCAMP-DX files, export of the raw data in multiple formats, and searching based on multiple chemical identifiers, and that is open in terms of license and access. To address these issues a new online resource called the Open Spectral Database (OSDB) http://osdb.info/ has been developed and is now available. Built using open source tools, using open code (hosted on GitHub), providing open data, and open to community input about design and functionality, the OSDB is available for anyone to submit spectral data, making it searchable and available to the scientific community. This paper details the concept and coding, internal architecture, export formats, Representational State Transfer (REST) Application Programming Interface and options for submission of data. The OSDB website went live in November 2015. Concurrently, the GitHub repository was made available at https://github.com/stuchalk/OSDB/, and is open for collaborators to join the project, submit issues, and contribute code. The combination of a scripting environment (PHPStorm), a PHP framework (CakePHP), a relational database (MySQL) and a code repository (GitHub) provides all the capabilities needed to easily develop REST-based websites for ingestion, curation and exposure of open chemical data to the community at all levels. It is hoped this software stack (or equivalent ones in other scripting languages) will be leveraged to make more chemical data available for both humans and computers.
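    As a rough illustration of the JCAMP-DX format mentioned above, its labeled header records take the form `##LABEL= value` and can be extracted in a few lines. This sketch handles only single-line labeled records; real JCAMP-DX files also contain multi-line XYDATA blocks and compressed ordinate formats that it deliberately ignores:

```python
def parse_jcamp_headers(text: str) -> dict:
    """Extract ##LABEL= value header records from JCAMP-DX text.

    Minimal sketch: one record per line, label uppercased, value
    stripped. Multi-line data tables (e.g. ##XYDATA=) are not parsed.
    """
    headers = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("##") and "=" in line:
            label, _, value = line[2:].partition("=")
            headers[label.strip().upper()] = value.strip()
    return headers

# Invented minimal example file, for illustration only:
sample = """##TITLE= caffeine
##JCAMP-DX= 4.24
##DATA TYPE= INFRARED SPECTRUM
##END=
"""
meta = parse_jcamp_headers(sample)
```

    A repository like the OSDB needs exactly this kind of metadata extraction step so that submitted files can be indexed by chemical identifiers and searched.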

  1. Using business intelligence to analyze and share health system infrastructure data in a rural health authority.

    Science.gov (United States)

    Haque, Waqar; Urquhart, Bonnie; Berg, Emery; Dhanoa, Ramandeep

    2014-08-06

    Health care organizations gather large volumes of data, which has been traditionally stored in legacy formats making it difficult to analyze or use effectively. Though recent government-funded initiatives have improved the situation, the quality of most existing data is poor, suffers from inconsistencies, and lacks integrity. Generating reports from such data is generally not considered feasible due to extensive labor, lack of reliability, and time constraints. Advanced data analytics is one way of extracting useful information from such data. The intent of this study was to propose how Business Intelligence (BI) techniques can be applied to health system infrastructure data in order to make this information more accessible and comprehensible for a broader group of people. An integration process was developed to cleanse and integrate data from disparate sources into a data warehouse. An Online Analytical Processing (OLAP) cube was then built to allow slicing along multiple dimensions determined by various key performance indicators (KPIs), representing population and patient profiles, case mix groups, and healthy community indicators. The use of mapping tools, customized shape files, and embedded objects further augment the navigation. Finally, Web forms provide a mechanism for remote uploading of data and transparent processing of the cube. For privileged information, access controls were implemented. Data visualization has eliminated tedious analysis through legacy reports and provided a mechanism for optimally aligning resources with needs. Stakeholders are able to visualize KPIs on a main dashboard, slice-and-dice data, generate ad hoc reports, and quickly find the desired information. In addition, comparison, availability, and service level reports can also be generated on demand. All reports can be drilled down for navigation at a finer granularity. We have demonstrated how BI techniques and tools can be used in the health care environment to make informed
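    The slice-and-dice operations described above can be illustrated with a toy roll-up over fact records. The dimensions and figures below are invented, and the study itself used a dedicated OLAP cube rather than this simplified in-memory aggregation:

```python
from collections import defaultdict

def cube_rollup(rows, dims, measure):
    """Aggregate a measure along chosen dimensions (an OLAP-style slice).

    `rows` are fact records (dicts); choosing different `dims` tuples
    mimics slicing the cube along different KPI dimensions.
    """
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += row[measure]
    return dict(totals)

# Invented fact table, for illustration only:
facts = [
    {"region": "North", "year": 2012, "visits": 120},
    {"region": "North", "year": 2013, "visits": 150},
    {"region": "South", "year": 2012, "visits": 90},
]
by_region = cube_rollup(facts, ("region",), "visits")   # slice on region
by_year = cube_rollup(facts, ("year",), "visits")       # re-slice on year
```

    Drilling down to a finer granularity, as the report describes, corresponds to adding a dimension to the tuple, e.g. `("region", "year")`.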

  2. Sweat, Skepticism, and Uncharted Territory: A Qualitative Study of Opinions on Data Sharing Among Public Health Researchers and Research Participants in Mumbai, India.

    Science.gov (United States)

    Hate, Ketaki; Meherally, Sanna; Shah More, Neena; Jayaraman, Anuja; Bull, Susan; Parker, Michael; Osrin, David

    2015-07-01

    Efforts to internalize data sharing in research practice have been driven largely by developing international norms that have not incorporated opinions from researchers in low- and middle-income countries. We sought to identify the issues around ethical data sharing in the context of research involving women and children in urban India. We interviewed researchers, managers, and research participants associated with a Mumbai non-governmental organization, as well as researchers from other organizations and members of ethics committees. We conducted 22 individual semi-structured interviews and involved 44 research participants in focus group discussions. We used framework analysis to examine ideas about data and data sharing in general; its potential benefits or harms, barriers, obligations, and governance; and the requirements for consent. Both researchers and participants were generally in favor of data sharing, although limited experience amplified their reservations. We identified three themes: concerns that the work of data producers may not receive appropriate acknowledgment, skepticism about the process of sharing, and the fact that the terrain of data sharing was essentially uncharted and confusing. To increase data sharing in India, we need to provide guidelines, protocols, and examples of good practice in terms of consent, data preparation, screening of applications, and what individuals and organizations can expect in terms of validation, acknowledgment, and authorship. © The Author(s) 2015.

  3. Launch of Village Blue Web Application Shares Water Monitoring Data with Baltimore Community

    Science.gov (United States)

    EPA and the U.S. Geological Survey (USGS) have launched their mobile-friendly web application for Village Blue, a project that provides real-time water quality monitoring data to the Baltimore, Maryland community.

  4. Tools for Interdisciplinary Data Assimilation and Sharing in Support of Hydrologic Science

    Science.gov (United States)

    Blodgett, D. L.; Walker, J.; Suftin, I.; Warren, M.; Kunicki, T.

    2013-12-01

    Information consumed and produced in hydrologic analyses is interdisciplinary and massive. These factors put a heavy information management burden on the hydrologic science community. The U.S. Geological Survey (USGS) Office of Water Information Center for Integrated Data Analytics (CIDA) seeks to assist hydrologic science investigators with all components of the scientific data management life cycle. Ongoing data publication and software development projects will be presented, demonstrating publicly available data access services and manipulation tools being developed with support from two Department of the Interior initiatives. The USGS-led National Water Census seeks to provide both data and tools in support of nationally consistent water availability estimates. Newly available data include national coverages of radar-indicated precipitation, actual evapotranspiration, water use estimates aggregated by county, and Southeast-region estimates of streamflow for 12-digit hydrologic unit code watersheds. Web services making these data available, and applications to access them, will be demonstrated. Web-available processing services that can compute numerous streamflow statistics for any USGS daily flow record or model-result time series will also be demonstrated, along with other National Water Census processing tools. The National Climate Change and Wildlife Science Center is a USGS center leading DOI-funded academic global change adaptation research. It has a mission goal to ensure that data used and produced by funded projects are available via web services and tools that streamline data management tasks in interdisciplinary science. For example, collections of downscaled climate projections, typically large collections of files that must be downloaded to be accessed, are being published using web services that allow access to the entire dataset via simple web-service requests and numerous processing tools. Recent progress on this front includes data web services for Climate
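    As a rough illustration of the kind of streamflow statistics such processing services compute, the sketch below derives a few common statistics from a synthetic daily series; the series and the choice of statistics are assumptions, not the services' actual output:

```python
import numpy as np
import pandas as pd

# Synthetic daily streamflow record; a real request would pull a USGS
# daily-values series, which is not queried here.
dates = pd.date_range("2020-01-01", periods=365, freq="D")
flow = pd.Series(np.linspace(10.0, 100.0, 365), index=dates)

stats = {
    "mean_flow": flow.mean(),                  # average daily flow
    "q90": flow.quantile(0.10),                # flow exceeded 90% of the time
    "min_7day": flow.rolling(7).mean().min(),  # lowest 7-day average flow
}
```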

  5. Escher: A Web Application for Building, Sharing, and Embedding Data-Rich Visualizations of Biological Pathways

    DEFF Research Database (Denmark)

    King, Zachary A.; Draeger, Andreas; Ebrahim, Ali

    2015-01-01

    Escher is a web application for visualizing data on biological pathways. Three key features make Escher a uniquely effective tool for pathway visualization. First, users can rapidly design new pathway maps. Escher provides pathway suggestions based on user data and genome-scale models, so users c...... of these features and explains how the development approach used for Escher can be used to guide the development of future visualization tools....

  6. Nanocuration workflows: Establishing best practices for identifying, inputting, and sharing data to inform decisions on nanomaterials

    Directory of Open Access Journals (Sweden)

    Christina M. Powers

    2015-09-01

    Full Text Available There is a critical opportunity in the field of nanoscience to compare and integrate information across diverse fields of study through informatics (i.e., nanoinformatics). This paper is one in a series of articles on the data curation process in nanoinformatics (nanocuration). Other articles in this series discuss key aspects of nanocuration (temporal metadata, data completeness, database integration), while the focus of this article is on the nanocuration workflow, or the process of identifying, inputting, and reviewing nanomaterial data in a data repository. In particular, the article discusses: 1) the rationale and importance of a defined workflow in nanocuration, 2) the influence of organizational goals or purpose on the workflow, 3) established workflow practices in other fields, 4) current workflow practices in nanocuration, 5) key challenges for workflows in emerging fields like nanomaterials, 6) examples to make these challenges more tangible, and 7) recommendations to address the identified challenges. Throughout the article, there is an emphasis on illustrating key concepts and current practices in the field. Data on current practices in the field are from a group of stakeholders active in nanocuration. In general, the development of workflows for nanocuration is nascent, with few individuals formally trained in data curation or utilizing available nanocuration resources (e.g., ISA-TAB-Nano). Additional emphasis on the potential benefits of cultivating nanomaterial data via nanocuration processes (e.g., the capability to analyze data from across research groups) and providing nanocuration resources (e.g., training) will likely prove crucial for the wider application of nanocuration workflows in the scientific community.

  7. Data Sharing: A New Editorial Initiative of the International Committee of Medical Journal Editors. Implications for the Editors’ Network

    Directory of Open Access Journals (Sweden)

    Fernando Alfonso, MD

    2017-05-01

    Full Text Available The International Committee of Medical Journal Editors (ICMJE) provides recommendations to improve the editorial standards and scientific quality of biomedical journals. These recommendations range from uniform technical requirements to more complex and elusive editorial issues, including ethical aspects of the scientific process. Recently, registration of clinical trials, disclosure of conflicts of interest, and new criteria for authorship, emphasizing the importance of responsibility and accountability, have been proposed. Last year, a new editorial initiative to foster sharing of clinical trial data was launched. This review discusses this novel initiative with the aim of increasing awareness among readers, investigators, authors, and editors belonging to the Editors' Network of the European Society of Cardiology.

  8. Data sharing: A new editorial initiative of the International Committee of Medical Journal Editors. Implications for the editors’ network

    Directory of Open Access Journals (Sweden)

    Fernando Alfonso

    2017-06-01

    Full Text Available The International Committee of Medical Journal Editors (ICMJE) provides recommendations to improve the editorial standards and scientific quality of biomedical journals. These recommendations range from uniform technical requirements to more complex and elusive editorial issues, including ethical aspects of the scientific process. Recently, registration of clinical trials, disclosure of conflicts of interest, and new criteria for authorship, emphasizing the importance of responsibility and accountability, have been proposed. Last year, a new editorial initiative to foster sharing of clinical trial data was launched. This review discusses this novel initiative with the aim of increasing awareness among readers, investigators, authors, and editors belonging to the Editors' Network of the European Society of Cardiology.

  9. Enhancing interdisciplinary collaboration and decisionmaking with J-Earth: an open source data sharing, visualization and GIS analysis platform

    Science.gov (United States)

    Prashad, L. C.; Christensen, P. R.; Fink, J. H.; Anwar, S.; Dickenshied, S.; Engle, E.; Noss, D.

    2010-12-01

    Our society is currently facing a number of major environmental challenges, most notably the threat of climate change. A multifaceted, interdisciplinary approach involving physical and social scientists, engineers, and decisionmakers is critical to adequately address these complex issues. To best facilitate this interdisciplinary approach, data and models at various scales, from local to global, must be quickly and easily shared between disciplines to effectively understand environmental phenomena and human-environmental interactions. When data are acquired and studied on different scales and within different disciplines, researchers and practitioners may not be able to easily learn from each other's results. For example, climate change models are often developed at a global scale, while strategies that address human vulnerability to climate change and mitigation/adaptation strategies are often assessed at a local level. Linkages between urban heat island phenomena and global climate change may be better understood with increased data flow among researchers and those making policy decisions. In these cases it would be useful to have a single platform on which to share, visualize, and analyze numerical model and satellite/airborne remote sensing data together with social, environmental, and economic data, between researchers and practitioners. The Arizona State University 100 Cities Project and Mars Space Flight Facility are developing the open source application J-Earth with the goal of providing this single platform, which facilitates data sharing, visualization, and analysis between researchers and applied practitioners around environmental and other sustainability challenges. This application is being designed for user communities including physical and social scientists, NASA researchers, non-governmental organizations, and decisionmakers to share and analyze data at multiple scales. We are initially focusing on urban heat island and urban ecology studies, with data and users from

  10. Share Data with OPeNDAP Hyrax: New Features and Improvements

    Science.gov (United States)

    Gallagher, James

    2016-01-01

    During the upcoming Summer 2016 meeting of the ESIP Federation (July 19-22), OPeNDAP will hold a Developers and Users Workshop. While a broad set of topics will be covered, a key focus is capitalizing on recent EOSDIS-sponsored advances in Hyrax, OPeNDAP's own software for server-side realization of the DAP2 and DAP4 protocols. These Hyrax advances are as important to data users as to data providers, and the workshop will include hands-on experiences of value to both. Specifically, a balanced set of presentations and hands-on tutorials will address advances in 1. server installation, 2. server configuration, 3. Hyrax aggregation capabilities, 4. support for data access from clients that are HTTP-based, JSON-based, or OGC-compliant (especially WCS and WMS), 5. support for DAP4, 6. use and extension of server-side computational capabilities, and 7. several performance-affecting matters. Topics 2 through 7 will be relevant to data consumers, data providers and, notably, due to the open-source nature of all OPeNDAP software, to developers wishing to extend Hyrax, to build compatible clients and servers, and/or to employ Hyrax as middleware that enables interoperability across a variety of end-user and source-data contexts. A session for contributed talks will elaborate on the topics listed above and embrace additional ones.
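    For readers unfamiliar with DAP2 request syntax, the sketch below shows how a client subsets a dataset server-side by appending a constraint expression to the dataset URL; the server address, dataset path, and variable name are hypothetical:

```python
# Building DAP2 request URLs against a Hyrax server. The server address,
# dataset path, and variable name below are hypothetical.
base = "http://example.org/opendap/hyrax/data/sst.nc"

# A constraint expression subsets the variable server-side:
# sst[time][lat][lon] -> first 10 time steps, two latitude rows, all longitudes.
constraint = "sst[0:1:9][40:1:41][0:1:359]"

dds_url = f"{base}.dds"                   # dataset structure (no data)
ascii_url = f"{base}.ascii?{constraint}"  # subset returned as ASCII
dods_url = f"{base}.dods?{constraint}"    # subset returned as binary DAP2 response
```

    The same subsetting idea carries over to DAP4, though the URL suffixes and constraint syntax differ.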

  11. Step-Based Data Sharing and Exchange in One-of-a-Kind Product Collaborative Design for Cloud Manufacturing

    Directory of Open Access Journals (Sweden)

    B. M. Li

    2013-01-01

    Full Text Available With the trend toward global collaboration, there is a need for collaborative design between geographically distributed teams and companies. In particular, this need is inevitable for companies doing business based on one-of-a-kind production (OKP). One important problem is the lack of interoperability and compatibility of data between different CAx systems. This problem is further highlighted in data exchange in cloud manufacturing. To the best of the authors' knowledge, current studies have limitations in achieving the interoperability and compatibility of data. In this paper, a STEP-based data model is proposed to represent OKP product data/knowledge, which contains four categories of product knowledge (i.e., customer, product, manufacturing, and resource, respectively). A STEP-based data modelling approach is proposed to describe each category of knowledge separately and then connect them to form the final integrated model. Compared with most current product models, this model includes more complete product data/knowledge involved in OKP product development (OKPPD), and thus it can provide more adequate knowledge support for OKPPD activities. Based on the proposed STEP-based data model, a product data exchange and sharing (DES) framework is proposed and developed to enable DES in collaborative OKPPD in the cloud manufacturing environment. Case studies were carried out to validate the proposed data model and DES framework.
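    A minimal sketch of the four-category model described above, using plain Python dataclasses rather than actual STEP (ISO 10303) entities; all class and field names are illustrative assumptions:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Customer:
    name: str
    requirements: List[str] = field(default_factory=list)

@dataclass
class Resource:
    machine: str

@dataclass
class Manufacturing:
    process: str
    resources: List[Resource] = field(default_factory=list)

@dataclass
class Product:
    part_number: str
    customer: Customer
    manufacturing: List[Manufacturing] = field(default_factory=list)

# Each category is modelled separately, then connected into the integrated model.
cust = Customer("ACME", ["corrosion resistant"])
prod = Product("OKP-001", cust,
               [Manufacturing("milling", [Resource("5-axis mill")])])
```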

  12. Willingness to share personal health record data for care improvement and public health: a survey of experienced personal health record users

    Directory of Open Access Journals (Sweden)

    Weitzman Elissa R

    2012-05-01

    Full Text Available Abstract Background Data stored in personally controlled health records (PCHRs) may hold value for clinicians and public health entities, if patients and their families will share them. We sought to characterize consumer willingness and unwillingness (reticence) to share PCHR data across health topics, and with different stakeholders, to advance understanding of this issue. Methods Cross-sectional 2009 Web survey of repeat PCHR users who were patients over 18 years old or parents of patients, to assess willingness to share their PCHR data with an out-of-hospital provider to support care, and with the state/local public health authority to support monitoring; the odds of reticence to share PCHR information about ten exemplary health topics were estimated using a repeated measures approach. Results Of 261 respondents (56% response rate), more reported they would share all information with the state/local public health authority (63.3%) than with an out-of-hospital provider (54.1%) (OR 1.5, 95% CI 1.1, 1.9; p = .005); few would not share any information with these parties (7.9% and 5.2%, respectively). For public health sharing, reticence was higher for most topics compared to contagious illness (ORs 4.9 to 1.4, all p-values …). Conclusions Pediatric patients and their families are often willing to share electronic health information to support health improvement, but remain cautious. Robust trust models for PCHR sharing are needed.
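    The odds ratios quoted above can be reproduced mechanically from a 2x2 table; the sketch below uses the Woolf (log) method, with invented counts chosen only so the point estimate matches the reported OR of 1.5:

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf 95% CI for the 2x2 table [[a, b], [c, d]]."""
    or_point = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_point) - z * se_log)
    upper = math.exp(math.log(or_point) + z * se_log)
    return or_point, lower, upper

# Illustrative counts only; the survey's underlying table is not published
# in the abstract.
or_point, lower, upper = odds_ratio_ci(60, 40, 50, 50)
```

    Note that the paper estimated its ORs with a repeated-measures model, which accounts for within-respondent correlation; the simple 2x2 calculation here is only the textbook baseline.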

  13. Scalable privacy-preserving data sharing methodology for genome-wide association studies.

    Science.gov (United States)

    Yu, Fei; Fienberg, Stephen E; Slavković, Aleksandra B; Uhler, Caroline

    2014-08-01

    The protection of privacy of individual-level information in genome-wide association study (GWAS) databases has been a major concern of researchers following the publication of "an attack" on GWAS data by Homer et al. (2008). Traditional statistical methods for confidentiality and privacy protection of statistical databases do not scale well to GWAS data, especially in terms of guarantees regarding protection from linkage to external information. The more recent concept of differential privacy, introduced by the cryptographic community, provides a rigorous definition of privacy with meaningful guarantees in the presence of arbitrary external information, although the guarantees may come at a serious price in terms of data utility. Building on such notions, Uhler et al. (2013) proposed new methods to release aggregate GWAS data without compromising an individual's privacy. We extend the methods developed in Uhler et al. (2013) for releasing differentially private χ²-statistics by allowing for an arbitrary number of cases and controls, and for releasing differentially private allelic test statistics. We also provide a new interpretation by assuming the controls' data are known, which is a realistic assumption because some GWAS use publicly available data as controls. We assess the performance of the proposed methods through a risk-utility analysis on a real data set consisting of DNA samples collected by the Wellcome Trust Case Control Consortium and compare the methods with the differentially private release mechanism proposed by Johnson and Shmatikov (2013). Copyright © 2014 Elsevier Inc. All rights reserved.
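    At its core, a differentially private release of a statistic adds calibrated Laplace noise; the sketch below shows the generic mechanism, with a purely illustrative statistic and sensitivity bound rather than the tight bounds derived in the papers above:

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng):
    """Release value with Laplace(sensitivity/epsilon) noise, which gives
    epsilon-differential privacy when sensitivity bounds the statistic's
    change under a one-record change to the database."""
    return value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

rng = np.random.default_rng(seed=0)
chi2_stat = 12.4    # hypothetical chi-squared statistic for one SNP
sensitivity = 4.0   # illustrative global sensitivity bound, not the papers' value
private_stat = laplace_mechanism(chi2_stat, sensitivity, epsilon=1.0, rng=rng)
```

    Smaller epsilon means stronger privacy but noisier released statistics, which is the utility price the abstract refers to.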

  14. An overview on integrated data system for archiving and sharing marine geology and geophysical data in Korea Institute of Ocean Science & Technology (KIOST)

    Science.gov (United States)

    Choi, Sang-Hwa; Kim, Sung Dae; Park, Hyuk Min; Lee, SeungHa

    2016-04-01

    We established and have operated an integrated data system for managing, archiving, and sharing marine geology and geophysical data around Korea produced by various research projects and programs in the Korea Institute of Ocean Science & Technology (KIOST). First, to keep the data system consistent as data are continually updated, we set up standard operating procedures (SOPs) for data archiving, data processing and conversion, data quality control, data uploading, DB maintenance, etc. The system comprises two databases, ARCHIVE DB and GIS DB. ARCHIVE DB stores archived data in their original forms and formats as received from data providers, while GIS DB manages all other compilation, processed, and reproduction data and information for data services and GIS application services. Oracle 11g was adopted as the relational DBMS, and open-source GIS technologies were applied for GIS services: OpenLayers for the user interface, GeoServer for the application server, and PostGIS on PostgreSQL for the GIS database. For convenient use of geophysical data in SEG-Y format, a viewer program was developed and embedded in the system. Users can search data through the GIS user interface and save the results as a report.

  15. A PUBLIC PLATFORM FOR GEOSPATIAL DATA SHARING FOR DISASTER RISK MANAGEMENT

    Directory of Open Access Journals (Sweden)

    S. Balbo

    2014-01-01

    This paper presents a case study scenario of setting up a Web platform based on GeoNode. It is a public platform called MASDAP, promoted by the Government of Malawi to support the country's development and build resilience against natural disasters. A substantial amount of geospatial data has already been collected about hydrogeological risk, as well as several other kinds of disaster-related information. Moreover, this platform will help to ensure that the data created by a number of past or ongoing projects are maintained and that this information remains accessible and useful. An Integrated Flood Risk Management Plan for a river basin has already been included in the platform, and other data fr