WorldWideScience

Sample records for developing publicly-accessible datasets

  1. The case for developing publicly-accessible datasets for health services research in the Middle East and North Africa (MENA region

    Directory of Open Access Journals (Sweden)

    El-Jardali Fadi

    2009-10-01

    Full Text Available Abstract Background The existence of publicly-accessible datasets comprised a significant opportunity for health services research to evolve into a science that supports health policy making and evaluation, proper inter- and intra-organizational decisions and optimal clinical interventions. This paper investigated the role of publicly-accessible datasets in the enhancement of health care systems in the developed world and highlighted the importance of their wide existence and use in the Middle East and North Africa (MENA region. Discussion A search was conducted to explore the availability of publicly-accessible datasets in the MENA region. Although datasets were found in most countries in the region, those were limited in terms of their relevance, quality and public-accessibility. With rare exceptions, publicly-accessible datasets - as present in the developed world - were absent. Based on this, we proposed a gradual approach and a set of recommendations to promote the development and use of publicly-accessible datasets in the region. These recommendations target potential actions by governments, researchers, policy makers and international organizations. Summary We argue that the limited number of publicly-accessible datasets in the MENA region represents a lost opportunity for the evidence-based advancement of health systems in the region. The availability and use of publicly-accessible datasets would encourage policy makers in this region to base their decisions on solid representative data and not on estimates or small-scale studies; researchers would be able to exercise their expertise in a meaningful manner to both, policy makers and the public. The population of the MENA countries would exercise the right to benefit from locally- or regionally-based studies, versus imported and in 'best cases' customized ones. Furthermore, on a macro scale, the availability of regionally comparable publicly-accessible datasets would allow for the

  2. Public Access Points, Location of public beach access along the Oregon Coast. Boat ramp locations were added to the dataset to allow users to view the location of boat ramps along the Columbia River and the Willamete River north of the Oregon City Dam., Published in 2005, 1:100000 (1in=8333ft) scale, Oregon Geospatial Enterprise Office (GEO).

    Data.gov (United States)

    NSGIC State | GIS Inventory — Public Access Points dataset current as of 2005. Location of public beach access along the Oregon Coast. Boat ramp locations were added to the dataset to allow users...

  3. Public Access ICT across Cultures

    International Development Research Centre (IDRC) Digital Library (Canada)

    3 Impact of Public Access to ICT Skills on Job Prospects in Rwanda ... reform and information and communication technology (ICT) policies, particularly in developing countries. ... (From internal memoranda prepared by Amy Mahan, 2008) ..... little decision-making autonomy, power, or financial control within the household.

  4. Open For Business? An Historical, Comparative Study of Public Access to Information about Two Controversial Coastal Developments in North-East Scotland

    Science.gov (United States)

    Baxter, Graeme

    2014-01-01

    Introduction: This paper compares public access to information about two controversial coastal developments in North-east Scotland: the construction of a gas terminal by the British Gas Council and Total in the 1970s, and the current development of "the world's greatest golf course" by the tycoon Donald Trump. Method: Data has been…

  5. Development of a SPARK Training Dataset

    Energy Technology Data Exchange (ETDEWEB)

    Sayre, Amanda M. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Olson, Jarrod R. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

    2015-03-01

    In its first five years, the National Nuclear Security Administration’s (NNSA) Next Generation Safeguards Initiative (NGSI) sponsored more than 400 undergraduate, graduate, and post-doctoral students in internships and research positions (Wyse 2012). In the past seven years, the NGSI program has, and continues to produce a large body of scientific, technical, and policy work in targeted core safeguards capabilities and human capital development activities. Not only does the NGSI program carry out activities across multiple disciplines, but also across all U.S. Department of Energy (DOE)/NNSA locations in the United States. However, products are not readily shared among disciplines and across locations, nor are they archived in a comprehensive library. Rather, knowledge of NGSI-produced literature is localized to the researchers, clients, and internal laboratory/facility publication systems such as the Electronic Records and Information Capture Architecture (ERICA) at the Pacific Northwest National Laboratory (PNNL). There is also no incorporated way of analyzing existing NGSI literature to determine whether the larger NGSI program is achieving its core safeguards capabilities and activities. A complete library of NGSI literature could prove beneficial to a cohesive, sustainable, and more economical NGSI program. The Safeguards Platform for Automated Retrieval of Knowledge (SPARK) has been developed to be a knowledge storage, retrieval, and analysis capability to capture safeguards knowledge to exist beyond the lifespan of NGSI. During the development process, it was necessary to build a SPARK training dataset (a corpus of documents) for initial entry into the system and for demonstration purposes. We manipulated these data to gain new information about the breadth of NGSI publications, and they evaluated the science-policy interface at PNNL as a practical demonstration of SPARK’s intended analysis capability. The analysis demonstration sought to answer the

  6. Development of a SPARK Training Dataset

    International Nuclear Information System (INIS)

    Sayre, Amanda M.; Olson, Jarrod R.

    2015-01-01

    In its first five years, the National Nuclear Security Administration's (NNSA) Next Generation Safeguards Initiative (NGSI) sponsored more than 400 undergraduate, graduate, and post-doctoral students in internships and research positions (Wyse 2012). In the past seven years, the NGSI program has, and continues to produce a large body of scientific, technical, and policy work in targeted core safeguards capabilities and human capital development activities. Not only does the NGSI program carry out activities across multiple disciplines, but also across all U.S. Department of Energy (DOE)/NNSA locations in the United States. However, products are not readily shared among disciplines and across locations, nor are they archived in a comprehensive library. Rather, knowledge of NGSI-produced literature is localized to the researchers, clients, and internal laboratory/facility publication systems such as the Electronic Records and Information Capture Architecture (ERICA) at the Pacific Northwest National Laboratory (PNNL). There is also no incorporated way of analyzing existing NGSI literature to determine whether the larger NGSI program is achieving its core safeguards capabilities and activities. A complete library of NGSI literature could prove beneficial to a cohesive, sustainable, and more economical NGSI program. The Safeguards Platform for Automated Retrieval of Knowledge (SPARK) has been developed to be a knowledge storage, retrieval, and analysis capability to capture safeguards knowledge to exist beyond the lifespan of NGSI. During the development process, it was necessary to build a SPARK training dataset (a corpus of documents) for initial entry into the system and for demonstration purposes. We manipulated these data to gain new information about the breadth of NGSI publications, and they evaluated the science-policy interface at PNNL as a practical demonstration of SPARK's intended analysis capability. The analysis demonstration sought to answer

  7. Developing a Data-Set for Stereopsis

    Directory of Open Access Journals (Sweden)

    D.W Hunter

    2014-08-01

    Full Text Available Current research on binocular stereopsis in humans and non-human primates has been limited by a lack of available data-sets. Current data-sets fall into two categories; stereo-image sets with vergence but no ranging information (Hibbard, 2008, Vision Research, 48(12, 1427-1439 or combinations of depth information with binocular images and video taken from cameras in fixed fronto-parallel configurations exhibiting neither vergence or focus effects (Hirschmuller & Scharstein, 2007, IEEE Conf. Computer Vision and Pattern Recognition. The techniques for generating depth information are also imperfect. Depth information is normally inaccurate or simply missing near edges and on partially occluded surfaces. For many areas of vision research these are the most interesting parts of the image (Goutcher, Hunter, Hibbard, 2013, i-Perception, 4(7, 484; Scarfe & Hibbard, 2013, Vision Research. Using state-of-the-art open-source ray-tracing software (PBRT as a back-end, our intention is to release a set of tools that will allow researchers in this field to generate artificial binocular stereoscopic data-sets. Although not as realistic as photographs, computer generated images have significant advantages in terms of control over the final output and ground-truth information about scene depth is easily calculated at all points in the scene, even partially occluded areas. While individual researchers have been developing similar stimuli by hand for many decades, we hope that our software will greatly reduce the time and difficulty of creating naturalistic binocular stimuli. Our intension in making this presentation is to elicit feedback from the vision community about what sort of features would be desirable in such software.

  8. 22 CFR 214.37 - Public access to committee records.

    Science.gov (United States)

    2010-04-01

    ... 22 Foreign Relations 1 2010-04-01 2010-04-01 false Public access to committee records. 214.37 Section 214.37 Foreign Relations AGENCY FOR INTERNATIONAL DEVELOPMENT ADVISORY COMMITTEE MANAGEMENT Operation of Advisory Committees § 214.37 Public access to committee records. Records maintained in...

  9. Public Access ICT across Cultures: Diversifying Participation in the ...

    International Development Research Centre (IDRC) Digital Library (Canada)

    2015-06-01

    Jun 1, 2015 ... Public access venues – most often Internet cafés in cities and ... 35 years working as an economist for international development agencies. ... Birth registration is the basis for advancing gender equality and children's rights.

  10. The Wired Island: The First Two Years of Public Access To Cable Television In Manhattan.

    Science.gov (United States)

    Othmer, David

    A review is presented of the first two years of free public access programing on New York City's cable television (CATV) systems. The report provides some background information on franchising, public access to CATV in New York City, and Federal Communications Commission regulations. It also deals with the public access programing developed; it…

  11. Designing User Manuals for the Online Public Access Catalog.

    Science.gov (United States)

    Seiden, Peggy; Sullivan, Patricia

    1986-01-01

    Describes the process of developing and revising a brochure to guide library patrons in conducting an author search on an online public access catalog in order to demonstrate the application of four steps in production of a functional document--analysis; planning; development; evaluation, testing, and revision. Three sources are given. (EJS)

  12. 24 CFR 598.410 - Public access to materials and proceedings.

    Science.gov (United States)

    2010-04-01

    ... DESIGNATIONS Post-Designation Requirements § 598.410 Public access to materials and proceedings. After... 24 Housing and Urban Development 3 2010-04-01 2010-04-01 false Public access to materials and... Development (Continued) OFFICE OF ASSISTANT SECRETARY FOR COMMUNITY PLANNING AND DEVELOPMENT, DEPARTMENT OF...

  13. Online Public Access Catalogs. ERIC Fact Sheet.

    Science.gov (United States)

    Cochrane, Pauline A.

    A listing is presented of 17 documents in the ERIC database concerning the Online Catalog (sometimes referred to as OPAC or Online Public Access Catalog), a computer-based and supported library catalog designed for patron use. The database usually represents recent acquisitions and often contains information about books on order and items in…

  14. The challenges and possibilities of public access defibrillation.

    Science.gov (United States)

    Ringh, M; Hollenberg, J; Palsgaard-Moeller, T; Svensson, L; Rosenqvist, M; Lippert, F K; Wissenberg, M; Malta Hansen, C; Claesson, A; Viereck, S; Zijlstra, J A; Koster, R W; Herlitz, J; Blom, M T; Kramer-Johansen, J; Tan, H L; Beesems, S G; Hulleman, M; Olasveengen, T M; Folke, F

    2018-03-01

    Out-of-hospital cardiac arrest (OHCA) is a major health problem that affects approximately four hundred and thousand patients annually in the United States alone. It is a major challenge for the emergency medical system as decreased survival rates are directly proportional to the time delay from collapse to defibrillation. Historically, defibrillation has only been performed by physicians and in-hospital. With the development of automated external defibrillators (AEDs), rapid defibrillation by nonmedical professionals and subsequently by trained or untrained lay bystanders has become possible. Much hope has been put to the concept of Public Access Defibrillation with a massive dissemination of public available AEDs throughout most Western countries. Accordingly, current guidelines recommend that AEDs should be deployed in places with a high likelihood of OHCA. Despite these efforts, AED use is in most settings anecdotal with little effect on overall OHCA survival. The major reasons for low use of public AEDs are that most OHCAs take place outside high incidence sites of cardiac arrest and that most OHCAs take place in residential settings, currently defined as not suitable for Public Access Defibrillation. However, the use of new technology for identification and recruitment of lay bystanders and nearby AEDs to the scene of the cardiac arrest as well as new methods for strategic AED placement redefines and challenges the current concept and definitions of Public Access Defibrillation. Existing evidence of Public Access Defibrillation and knowledge gaps and future directions to improve outcomes for OHCA are discussed. In addition, a new definition of the different levels of Public Access Defibrillation is offered as well as new strategies for increasing AED use in the society. © 2018 The Association for the Publication of the Journal of Internal Medicine.

  15. Development of an online, publicly accessible naive Bayesian decision support tool for mammographic mass lesions based on the American College of Radiology (ACR) BI-RADS lexicon

    Energy Technology Data Exchange (ETDEWEB)

    Benndorf, Matthias; Kotter, Elmar; Langer, Mathias [University Hospital Freiburg, Department of Radiology, Freiburg (Germany); Herda, Christoph [Kantonsspital Graubuenden, Chur (Switzerland); Wu, Yirong; Burnside, Elizabeth S. [University of Wisconsin-Madison School of Medicine and Public Health, Department of Radiology, Madison, WI (United States)

    2015-06-01

    To develop and validate a decision support tool for mammographic mass lesions based on a standardized descriptor terminology (BI-RADS lexicon) to reduce variability of practice. We used separate training data (1,276 lesions, 138 malignant) and validation data (1,177 lesions, 175 malignant). We created naive Bayes (NB) classifiers from the training data with tenfold cross-validation. Our ''inclusive model'' comprised BI-RADS categories, BI-RADS descriptors, and age as predictive variables; our ''descriptor model'' comprised BI-RADS descriptors and age. The resulting NB classifiers were applied to the validation data. We evaluated and compared classifier performance with ROC-analysis. In the training data, the inclusive model yields an AUC of 0.959; the descriptor model yields an AUC of 0.910 (P < 0.001). The inclusive model is superior to the clinical performance (BI-RADS categories alone, P < 0.001); the descriptor model performs similarly. When applied to the validation data, the inclusive model yields an AUC of 0.935; the descriptor model yields an AUC of 0.876 (P < 0.001). Again, the inclusive model is superior to the clinical performance (P < 0.001); the descriptor model performs similarly. We consider our classifier a step towards a more uniform interpretation of combinations of BI-RADS descriptors. We provide our classifier at www.ebm-radiology.com/nbmm/index.html. (orig.)

  16. Development of an online, publicly accessible naive Bayesian decision support tool for mammographic mass lesions based on the American College of Radiology (ACR) BI-RADS lexicon

    International Nuclear Information System (INIS)

    Benndorf, Matthias; Kotter, Elmar; Langer, Mathias; Herda, Christoph; Wu, Yirong; Burnside, Elizabeth S.

    2015-01-01

    To develop and validate a decision support tool for mammographic mass lesions based on a standardized descriptor terminology (BI-RADS lexicon) to reduce variability of practice. We used separate training data (1,276 lesions, 138 malignant) and validation data (1,177 lesions, 175 malignant). We created naive Bayes (NB) classifiers from the training data with tenfold cross-validation. Our ''inclusive model'' comprised BI-RADS categories, BI-RADS descriptors, and age as predictive variables; our ''descriptor model'' comprised BI-RADS descriptors and age. The resulting NB classifiers were applied to the validation data. We evaluated and compared classifier performance with ROC-analysis. In the training data, the inclusive model yields an AUC of 0.959; the descriptor model yields an AUC of 0.910 (P < 0.001). The inclusive model is superior to the clinical performance (BI-RADS categories alone, P < 0.001); the descriptor model performs similarly. When applied to the validation data, the inclusive model yields an AUC of 0.935; the descriptor model yields an AUC of 0.876 (P < 0.001). Again, the inclusive model is superior to the clinical performance (P < 0.001); the descriptor model performs similarly. We consider our classifier a step towards a more uniform interpretation of combinations of BI-RADS descriptors. We provide our classifier at www.ebm-radiology.com/nbmm/index.html. (orig.)

  17. 22 CFR 214.51 - Administrative review of denial for public access to records.

    Science.gov (United States)

    2010-04-01

    ... 22 Foreign Relations 1 2010-04-01 2010-04-01 false Administrative review of denial for public access to records. 214.51 Section 214.51 Foreign Relations AGENCY FOR INTERNATIONAL DEVELOPMENT ADVISORY COMMITTEE MANAGEMENT Administrative Remedies § 214.51 Administrative review of denial for public access to...

  18. Public Access to NASA's Earth Science Data

    Science.gov (United States)

    Behnke, J.; James, N.

    2013-12-01

    survey to over 80,000 users. Our staff also engages in face-to-face discussions with over 200 scientists/users via DAAC User Working Group meetings. Using the feedback from the survey, face-to-face meetings and from our metrics, we determine how to change the system to meet the needs of our user community. We will be presenting on our best practices and methods for improving public access to the NASA EOSDIS data collection.

  19. Recent Development on the NOAA's Global Surface Temperature Dataset

    Science.gov (United States)

    Zhang, H. M.; Huang, B.; Boyer, T.; Lawrimore, J. H.; Menne, M. J.; Rennie, J.

    2016-12-01

    Global Surface Temperature (GST) is one of the most widely used indicators for climate trend and extreme analyses. A widely used GST dataset is the NOAA merged land-ocean surface temperature dataset known as NOAAGlobalTemp (formerly MLOST). The NOAAGlobalTemp had recently been updated from version 3.5.4 to version 4. The update includes a significant improvement in the ocean surface component (Extended Reconstructed Sea Surface Temperature or ERSST, from version 3b to version 4) which resulted in an increased temperature trends in recent decades. Since then, advancements in both the ocean component (ERSST) and land component (GHCN-Monthly) have been made, including the inclusion of Argo float SSTs and expanded EOT modes in ERSST, and the use of ISTI databank in GHCN-Monthly. In this presentation, we describe the impact of those improvements on the merged global temperature dataset, in terms of global trends and other aspects.

  20. participatory development of a minimum dataset for the khayelitsha ...

    African Journals Online (AJOL)

    This dataset was integrated with data requirements at ... model for defining health information needs at district level. This participatory process has enabled health workers to appraise their .... of reproductive health, mental health, disability and community ... each chose a facilitator and met in between the forum meetings.

  1. National Scale Marine Geophysical Data Portal for the Israel EEZ with Public Access Web-GIS Platform

    Science.gov (United States)

    Ketter, T.; Kanari, M.; Tibor, G.

    2017-12-01

    Recent offshore discoveries and regulation in the Israel Exclusive Economic Zone (EEZ) are the driving forces behind increasing marine research and development initiatives such as infrastructure development, environmental protection and decision making among many others. All marine operations rely on existing seabed information, while some also generate new data. We aim to create a single platform knowledge-base to enable access to existing information, in a comprehensive, publicly accessible web-based interface. The Israel EEZ covers approx. 26,000 sqkm and has been surveyed continuously with various geophysical instruments over the past decades, including 10,000 km of multibeam survey lines, 8,000 km of sub-bottom seismic lines, and hundreds of sediment sampling stations. Our database consists of vector and raster datasets from multiple sources compiled into a repository of geophysical data and metadata, acquired nation-wide by several research institutes and universities. The repository will enable public access via a web portal based on a GIS platform, including datasets from multibeam, sub-bottom profiling, single- and multi-channel seismic surveys and sediment sampling analysis. Respective data products will also be available e.g. bathymetry, substrate type, granulometry, geological structure etc. Operating a web-GIS based repository allows retrieval of pre-existing data for potential users to facilitate planning of future activities e.g. conducting marine surveys, construction of marine infrastructure and other private or public projects. User interface is based on map oriented spatial selection, which will reveal any relevant data for designated areas of interest. Querying the database will allow the user to obtain information about the data owner and to address them for data retrieval as required. Wide and free public access to existing data and metadata can save time and funds for academia, government and commercial sectors, while aiding in cooperation

  2. PUBLIC ACCESS TO PRIVATE LAND IN SCOTLAND

    Directory of Open Access Journals (Sweden)

    David L Carey Miller

    2012-08-01

    Full Text Available This article attempts to understand the radical reform of Scottish land law in its provision for a general right of public access to private land introduced in 2003 as part of land reform legislation, an important aspect of the initial agenda of the Scottish Parliament revived in 1999. The right is to recreational access for a limited period and the right to cross land. Access can be taken only on foot or by horse or bicycle. As a starting point clarification of the misunderstood pre-reform position is attempted. The essential point is that Scots common law does not give civil damages for a simple act of trespass (as English law does but only a right to obtain removal of the trespasser. Under the reforms the longstanding Scottish position of landowners allowing walkers access to the hills and mountains becomes a legal right. A critical aspect of the new right is that it is one of responsible access; provided a landowner co-operates with the spirit and system of the Act access can be denied on the basis that it is not being exercised responsibly. But the onus is on the landowner to show that the exercise of the right is not responsible.Although the right applies to all land a general exception protects the privacy of a domestic dwelling. Early case law suggests that the scope of this limit depends upon particular circumstances although reasonable 'garden ground' is likely to be protected. There are various particular limits such as school land.Compliance with the protection of property under the European Convention on Human Rights is discussed. The article emphasises the latitude, open to nations, for limitations to the right of ownership in land in the public interest. The extent of the Scottish access inroad illustrates this. This leads to the conclusion that 'land governance' – the subject of the Potchefstroom Conference at which the paper was initially presented – largely remains a matter for domestic law; the lex situs concept is alive

  3. Investigating the Social and Economic Impact of Public Access to ...

    International Development Research Centre (IDRC) Digital Library (Canada)

    The Bill and Melinda Gates Foundation has invested in a number of public access to information and communication technology (ICT) ... University of Washington ... It is now generally accepted that affordable, effective telecommunication ...

  4. Multimedia Content Development as a Facial Expression Datasets for Recognition of Human Emotions

    Science.gov (United States)

    Mamonto, N. E.; Maulana, H.; Liliana, D. Y.; Basaruddin, T.

    2018-02-01

    Datasets that have been developed before contain facial expression from foreign people. The development of multimedia content aims to answer the problems experienced by the research team and other researchers who will conduct similar research. The method used in the development of multimedia content as facial expression datasets for human emotion recognition is the Villamil-Molina version of the multimedia development method. Multimedia content developed with 10 subjects or talents with each talent performing 3 shots with each capturing talent having to demonstrate 19 facial expressions. After the process of editing and rendering, tests are carried out with the conclusion that the multimedia content can be used as a facial expression dataset for recognition of human emotions.

  5. Metropolitan natural area protection to maximize public access and species representation

    Science.gov (United States)

    Jane A. Ruliffson; Robert G. Haight; Paul H. Gobster; Frances R. Homans

    2003-01-01

    In response to widespread urban development, local governments in metropolitan areas in the United States acquire and protect privately-owned open space. We addressed the planner's problem of allocating a fixed budget for open space protection among eligible natural areas with the twin objectives of maximizing public access and species representation. Both...

  6. [Research on developping the spectral dataset for Dunhuang typical colors based on color constancy].

    Science.gov (United States)

    Liu, Qiang; Wan, Xiao-Xia; Liu, Zhen; Li, Chan; Liang, Jin-Xing

    2013-11-01

    The present paper aims at developping a method to reasonably set up the typical spectral color dataset for different kinds of Chinese cultural heritage in color rendering process. The world famous wall paintings dating from more than 1700 years ago in Dunhuang Mogao Grottoes was taken as typical case in this research. In order to maintain the color constancy during the color rendering workflow of Dunhuang culture relics, a chromatic adaptation based method for developping the spectral dataset of typical colors for those wall paintings was proposed from the view point of human vision perception ability. Under the help and guidance of researchers in the art-research institution and protection-research institution of Dunhuang Academy and according to the existing research achievement of Dunhuang Research in the past years, 48 typical known Dunhuang pigments were chosen and 240 representative color samples were made with reflective spectral ranging from 360 to 750 nm was acquired by a spectrometer. In order to find the typical colors of the above mentioned color samples, the original dataset was devided into several subgroups by clustering analysis. The grouping number, together with the most typical samples for each subgroup which made up the firstly built typical color dataset, was determined by wilcoxon signed rank test according to the color inconstancy index comprehensively calculated under 6 typical illuminating conditions. Considering the completeness of gamut of Dunhuang wall paintings, 8 complementary colors was determined and finally the typical spectral color dataset was built up which contains 100 representative spectral colors. The analytical calculating results show that the median color inconstancy index of the built dataset in 99% confidence level by wilcoxon signed rank test was 3.28 and the 100 colors are distributing in the whole gamut uniformly, which ensures that this dataset can provide reasonable reference for choosing the color with highest

  7. The challenges and possibilities of public access defibrillation

    DEFF Research Database (Denmark)

    Ringh, Mattias; Hollenberg, Jacob; Palsgaard-Moeller, Thea

    2018-01-01

    . Much hope has been put to the concept of Public Access Defibrillation with a massive dissemination of public available AEDs throughout most western countries. Accordingly, current guidelines recommend that AEDs should be deployed in places with a high likelihood of OHCA. Despite these efforts, AED use...... is in most settings anecdotal with little effect on overall OHCA survival. The major reasons for low use of public AEDs are that most OHCA take place outside high incidence sites of cardiac arrest and that most OHCAs take place in residential settings, currently defined as not suitable for Public Access...... Defibrillation. However, the use of new technology for identification and recruitment of lay bystanders and nearby AEDs to the scene of the cardiac arrest as well as new methods for strategic AED placement redefines and challenges the current concept and definitions of Public Access Defibrillation. Existing...

  8. The Battle to Secure Our Public Access Computers

    Science.gov (United States)

    Sendze, Monique

    2006-01-01

    Securing public access workstations should be a significant part of any library's network and information-security strategy because of the sensitive information patrons enter on these workstations. As the IT manager for the Johnson County Library in Kansas City, Kan., this author is challenged to make sure that thousands of patrons get the access…

  9. Economic assessment of strategies to deploy publicly accessible charging infrastructure

    OpenAIRE

    Madina, Carlos; Barlag, Heike; Coppola, Giovanni; Gómez-Arriola, Inés; Rodríguez-Sánchez, Raúl; Zabala, Eduardo

    2015-01-01

    From the end user perspective, the main barriers for widespread electric vehicle (EV) adoption are high purchase cost and range anxiety, both regarding battery capacity and availability of accessible EV charging infrastructure. Governments and public bodies in general are taking steps towards overcoming these barriers by, among others, setting up regulatory requirements regarding standardisation, customer information and recommending objectives of publicly accessible charging infrastructure. ...

  10. Public Access to Government Electronic Information. Policy Framework.

    Science.gov (United States)

    Bulletin of the American Society for Information Science, 1992

    1992-01-01

    This policy framework provides guidelines for federal agencies on public access to government electronic information. Highlights include reasons for disseminating information; defining user groups; which technology to use; pricing flexibility; security and privacy issues; and the private sector and state and local government roles. (LRW)

  11. Public access to private land in Scotland | Miller | Potchefstroom ...

    African Journals Online (AJOL)

    This article attempts to understand the radical reform of Scottish land law in its provision for a general right of public access to private land introduced in 2003 as part of land reform legislation, an important aspect of the initial agenda of the Scottish Parliament revived in 1999. The right is to recreational access for a limited ...

  12. Extending the Online Public Access Catalog into the Microcomputer Environment.

    Science.gov (United States)

    Sutton, Brett

    1990-01-01

    Describes PCBIS, a database program for MS-DOS microcomputers that features a utility for automatically converting online public access catalog search results stored as text files into structured database files that can be searched, sorted, edited, and printed. Topics covered include the general features of the program, record structure, record…

  13. Support Services for Remote Users of Online Public Access Catalogs.

    Science.gov (United States)

    Kalin, Sally W.

    1991-01-01

    Discusses the needs of remote users of online public access catalogs (OPACs). User expectations are discussed; problems encountered by remote-access users are examined, including technical problems and searching problems; support services are described, including instruction, print guides, and online help; and differences from the needs of…

  14. Training Concerns for an Online Public Access Catalog.

    Science.gov (United States)

    Rockman, Ilene F.; Adalian, Paul T., Jr.

    This report is designed to raise issues and concerns which will affect the successful implementation of an education and training program once an online public access catalog (OLPAC) has been installed in the Kennedy Library at California Polytechnic State University (Cal Poly), San Luis Obispo. Information presented in the document was gathered…

  15. Public Access; Public Interest. The Network Project. Notebook Number 11.

    Science.gov (United States)

    Columbia Univ., New York, NY. Network Project.

    The transcript of a panel discussion and an essay on public access to and control of society's information resources are presented. It is contended that the electronic Media--including radio, television, and communication satellites--are controlled by a select group of individuals and corporations and that they are not meeting the public interest.…

  16. Public access to Indian geographical data - Guest editorial

    Digital Repository Service at National Institute of Oceanography (India)

    Narasimha, R.; Shetye, S.R.

    . 79, NO. 4, 25 AUGUST 2000 391 In this issue Public access to Indian Geographical Data In late 1997 the Indian Academy of Sciences set up a Panel on Scientific Data of Public Interest to examine all issues concerning scientific data... available in the country. The mem- bers of the Panel are R. Narasimha (Chairman), S. Dhawan, M. Gadgil, I. Nath, S. Shetye (Secretary). In early 1998 the Panel issued a public announcement outlining its concerns and inviting suggestions on how...

  17. Developing a Resource for Implementing ArcSWAT Using Global Datasets

    Science.gov (United States)

    Taggart, M.; Caraballo Álvarez, I. O.; Mueller, C.; Palacios, S. L.; Schmidt, C.; Milesi, C.; Palmer-Moloney, L. J.

    2015-12-01

    This project developed a comprehensive user manual outlining methods for adapting and implementing global datasets for use within ArcSWAT for international and worldwide applications. The Soil and Water Assessment Tool (SWAT) is a hydrologic model that looks at a number of hydrologic variables including runoff and the chemical makeup of water at a given location on the Earth's surface using Digital Elevation Models (DEM), land cover, soil, and weather data. However, the application of ArcSWAT for projects outside of the United States is challenging as there is no standard framework for inputting global datasets into ArcSWAT. This project aims to remove this obstacle by outlining methods for adapting and implementing these global datasets via the user manual. The manual takes the user through the processes of data conditioning while providing solutions and suggestions for common errors. The efficacy of the manual was explored using examples from watersheds located in Puerto Rico, Mexico and Western Africa. Each run explored the various options for setting up a ArcSWAT project as well as a range of satellite data products and soil databases. Future work will incorporate in-situ data for validation and calibration of the model and outline additional resources to assist future users in efficiently implementing the model for worldwide applications. The capacity to manage and monitor freshwater availability is of critical importance in both developed and developing countries. As populations grow and climate changes, both the quality and quantity of freshwater are affected resulting in negative impacts on the health of the surrounding population. The use of hydrologic models such as ArcSWAT can help stakeholders and decision makers understand the future impacts of these changes enabling informed and substantiated decisions.

  18. Development of an Operational TS Dataset Production System for the Data Assimilation System

    Science.gov (United States)

    Kim, Sung Dae; Park, Hyuk Min; Kim, Young Ho; Park, Kwang Soon

    2017-04-01

    An operational TS (Temperature and Salinity) dataset production system was developed to provide near real-time data to the data assimilation system periodically. It collects the latest 15 days' TS data of the north western pacific area (20°N - 55°N, 110°E - 150°E), applies QC tests to the archived data and supplies them to numerical prediction models of KIOST (Korea Institute of Ocean Science and Technology). The latest real-time TS data are collected from Argo GDAC and GTSPP data server every week. Argo data are downloaded from /latest_data directory of Argo GDAC. Because many duplicated data exist when all profile data are extracted from all Argo netCDF files, DB system is used to avoid duplication. All metadata (float ID, location, observation date and time, etc) of all Argo floats is stored into Database system and a Matlab program was developed to manipulate DB data, to check the duplication and to exclude duplicated data. GTSPP data are downloaded from /realtime directory of GTSPP data service. The latest data except ARGO data are extracted from the original data. Another Matlab program was coded to inspect all collected data using 10 QC tests and produce final dataset which can be used by the assimilation system. Three regional range tests to inspect annual, seasonal and monthly variations are included in the QC procedures. The C program was developed to provide regional ranges to data managers. It can calculate upper limit and lower limit of temperature and salinity at depth from 0 to 1550m. The final TS dataset contains the latest 15 days' TS data in netCDF format. It is updated every week and transmitted to numerical modeler of KIOST for operational use.

  19. DEVELOPING VERIFICATION SYSTEMS FOR BUILDING INFORMATION MODELS OF HERITAGE BUILDINGS WITH HETEROGENEOUS DATASETS

    Directory of Open Access Journals (Sweden)

    L. Chow

    2017-08-01

    Full Text Available The digitization and abstraction of existing buildings into building information models requires the translation of heterogeneous datasets that may include CAD, technical reports, historic texts, archival drawings, terrestrial laser scanning, and photogrammetry into model elements. In this paper, we discuss a project undertaken by the Carleton Immersive Media Studio (CIMS that explored the synthesis of heterogeneous datasets for the development of a building information model (BIM for one of Canada’s most significant heritage assets – the Centre Block of the Parliament Hill National Historic Site. The scope of the project included the development of an as-found model of the century-old, six-story building in anticipation of specific model uses for an extensive rehabilitation program. The as-found Centre Block model was developed in Revit using primarily point cloud data from terrestrial laser scanning. The data was captured by CIMS in partnership with Heritage Conservation Services (HCS, Public Services and Procurement Canada (PSPC, using a Leica C10 and P40 (exterior and large interior spaces and a Faro Focus (small to mid-sized interior spaces. Secondary sources such as archival drawings, photographs, and technical reports were referenced in cases where point cloud data was not available. As a result of working with heterogeneous data sets, a verification system was introduced in order to communicate to model users/viewers the source of information for each building element within the model.

  20. Developing Verification Systems for Building Information Models of Heritage Buildings with Heterogeneous Datasets

    Science.gov (United States)

    Chow, L.; Fai, S.

    2017-08-01

    The digitization and abstraction of existing buildings into building information models requires the translation of heterogeneous datasets that may include CAD, technical reports, historic texts, archival drawings, terrestrial laser scanning, and photogrammetry into model elements. In this paper, we discuss a project undertaken by the Carleton Immersive Media Studio (CIMS) that explored the synthesis of heterogeneous datasets for the development of a building information model (BIM) for one of Canada's most significant heritage assets - the Centre Block of the Parliament Hill National Historic Site. The scope of the project included the development of an as-found model of the century-old, six-story building in anticipation of specific model uses for an extensive rehabilitation program. The as-found Centre Block model was developed in Revit using primarily point cloud data from terrestrial laser scanning. The data was captured by CIMS in partnership with Heritage Conservation Services (HCS), Public Services and Procurement Canada (PSPC), using a Leica C10 and P40 (exterior and large interior spaces) and a Faro Focus (small to mid-sized interior spaces). Secondary sources such as archival drawings, photographs, and technical reports were referenced in cases where point cloud data was not available. As a result of working with heterogeneous data sets, a verification system was introduced in order to communicate to model users/viewers the source of information for each building element within the model.

  1. Developing predictive imaging biomarkers using whole-brain classifiers: Application to the ABIDE I dataset

    Directory of Open Access Journals (Sweden)

    Swati Rane

    2017-03-01

    Full Text Available We designed a modular machine learning program that uses functional magnetic resonance imaging (fMRI data in order to distinguish individuals with autism spectrum disorders from neurodevelopmentally normal individuals. Data was selected from the Autism Brain Imaging Dataset Exchange (ABIDE I Preprocessed Dataset.

  2. Mesoscale atmospheric modelling technology as a tool for the long-term meteorological dataset development

    Science.gov (United States)

    Platonov, Vladimir; Kislov, Alexander; Rivin, Gdaly; Varentsov, Mikhail; Rozinkina, Inna; Nikitin, Mikhail; Chumakov, Mikhail

    2017-04-01

    The detailed hydrodynamic modelling of meteorological parameters during the last 30 years (1985 - 2014) was performed for the Okhotsk Sea and the Sakhalin island regions. The regional non-hydrostatic atmospheric model COSMO-CLM used for this long-term simulation with 13.2, 6.6 and 2.2 km horizontal resolutions. The main objective of creation this dataset was the outlook of the investigation of statistical characteristics and the physical mechanisms of extreme weather events (primarily, wind speed extremes) on the small spatio-temporal scales. COSMO-CLM is the climate version of the well-known mesoscale COSMO model, including some modifications and extensions adapting to the long-term numerical experiments. The downscaling technique was realized and developed for the long-term simulations with three consequent nesting domains. ERA-Interim reanalysis ( 0.75 degrees resolution) used as global forcing data for the starting domain ( 13.2 km horizontal resolution), then these simulation data used as initial and boundary conditions for the next model runs over the domain with 6.6 km resolution, and similarly, for the next step to 2.2 km domain. Besides, the COSMO-CLM model configuration for 13.2 km run included the spectral nudging technique, i.e. an additional assimilation of reanalysis data not only at boundaries, but also inside the whole domain. Practically, this computational scheme realized on the SGI Altix 4700 supercomputer system in the Main Computer Center of Roshydromet and used 2,400 hours of CPU time total. According to modelling results, the verification of the obtained dataset was performed on the observation data. Estimations showed the mean error -0.5 0C, up to 2 - 3 0C RMSE in temperature, and overestimation in wind speed (RMSE is up to 2 m/s). Overall, analysis showed that the used downscaling technique with applying the COSMO-CLM model reproduced the meteorological conditions, spatial distribution, seasonal and synoptic variability of temperature and

  3. Intellectual property rights vs. public access rights: ethical aspects of the DeCSS decryptation program

    Directory of Open Access Journals (Sweden)

    Robert Vaagan

    2005-01-01

    Full Text Available Introduction. In 1999-2000, a Norwegian youth cracked a DVD-access code and published a decryptation program on the Internet. He was sued by the US DVD Copy Control Association (DVD-CCA and the Norwegian Motion Picture Association (MAP, allies of the US Motion Picture Association of America (MPAA, arrested by Norwegian police and charged with data crime. Two Norwegian court rulings in 2003 unanimously ruled that the program did not amount to a breach of Norwegian law, and he was fully acquitted. In the US, there have been related cases, some with other outcomes. Method. Based on a theoretical framework developed by Zwass, the paper discusses these court rulings and the wider issues of intellectual property rights versus public access rights. Analysis. The DVD-Jon case illustrates that intellectual property rights can conflict with public access rights, as the struggle between proprietary software and public domain software, as well as the SPARC and Open Archives Initiative reflect. Results. An assessment of the DVD-Jon case based on the Zwass framework does not give a clear information ethics answer. The analysis depends on whether one ascribes to consequentialist (e.g., utilitarian or de-ontological reflection, and also on which side of the digital gap is to be accorded most weight. Conclusion. While copyright interests are being legally strengthened, there may be ethically- grounded access rights that outweigh property rights.

  4. Development of Gridded Ensemble Precipitation and Temperature Datasets for the Contiguous United States Plus Hawai'i and Alaska

    Science.gov (United States)

    Newman, A. J.; Clark, M. P.; Nijssen, B.; Wood, A.; Gutmann, E. D.; Mizukami, N.; Longman, R. J.; Giambelluca, T. W.; Cherry, J.; Nowak, K.; Arnold, J.; Prein, A. F.

    2016-12-01

    Gridded precipitation and temperature products are inherently uncertain due to myriad factors. These include interpolation from a sparse observation network, measurement representativeness, and measurement errors. Despite this inherent uncertainty, uncertainty is typically not included, or is a specific addition to each dataset without much general applicability across different datasets. A lack of quantitative uncertainty estimates for hydrometeorological forcing fields limits their utility to support land surface and hydrologic modeling techniques such as data assimilation, probabilistic forecasting and verification. To address this gap, we have developed a first of its kind gridded, observation-based ensemble of precipitation and temperature at a daily increment for the period 1980-2012 over the United States (including Alaska and Hawaii). A longer, higher resolution version (1970-present, 1/16th degree) has also been implemented to support real-time hydrologic- monitoring and prediction in several regional US domains. We will present the development and evaluation of the dataset, along with initial applications of the dataset for ensemble data assimilation and probabilistic evaluation of high resolution regional climate model simulations. We will also present results on the new high resolution products for Alaska and Hawaii (2 km and 250 m respectively), to complete the first ensemble observation based product suite for the entire 50 states. Finally, we will present plans to improve the ensemble dataset, focusing on efforts to improve the methods used for station interpolation and ensemble generation, as well as methods to fuse station data with numerical weather prediction model output.

  5. Megastudies, crowdsourcing, and large datasets in psycholinguistics: An overview of recent developments.

    Science.gov (United States)

    Keuleers, Emmanuel; Balota, David A

    2015-01-01

    This paper introduces and summarizes the special issue on megastudies, crowdsourcing, and large datasets in psycholinguistics. We provide a brief historical overview and show how the papers in this issue have extended the field by compiling new databases and making important theoretical contributions. In addition, we discuss several studies that use text corpora to build distributional semantic models to tackle various interesting problems in psycholinguistics. Finally, as is the case across the papers, we highlight some methodological issues that are brought forth via the analyses of such datasets.

  6. Long Hole Film Cooling Dataset for CFD Development . Part 1; Infrared Thermography and Thermocouple Surveys

    Science.gov (United States)

    Shyam, Vikram; Thurman, Douglas; Poinsatte, Phillip; Ameri, Ali; Eichele, Peter; Knight, James

    2013-01-01

    An experiment investigating flow and heat transfer of long (length to diameter ratio of 18) cylindrical film cooling holes has been completed. In this paper, the thermal field in the flow and on the surface of the film cooled flat plate is presented for nominal freestream turbulence intensities of 1.5 and 8 percent. The holes are inclined at 30deg above the downstream direction, injecting chilled air of density ratio 1.0 onto the surface of a flat plate. The diameter of the hole is 0.75 in. (0.01905 m) with center to center spacing (pitch) of 3 hole diameters. Coolant was injected into the mainstream flow at nominal blowing ratios of 0.5, 1.0, 1.5, and 2.0. The Reynolds number of the freestream was approximately 11,000 based on hole diameter. Thermocouple surveys were used to characterize the thermal field. Infrared thermography was used to determine the adiabatic film effectiveness on the plate. Hotwire anemometry was used to provide flowfield physics and turbulence measurements. The results are compared to existing data in the literature. The aim of this work is to produce a benchmark dataset for Computational Fluid Dynamics (CFD) development to eliminate the effects of hole length to diameter ratio and to improve resolution in the near-hole region. In this report, a Time-Filtered Navier Stokes (TFNS), also known as Partially Resolved Navier Stokes (PRNS), method that was implemented in the Glenn-HT code is used to model coolant-mainstream interaction. This method is a high fidelity unsteady method that aims to represent large scale flow features and mixing more accurately.

  7. Review and Analysis of Algorithmic Approaches Developed for Prognostics on CMAPSS Dataset

    Science.gov (United States)

    2014-12-23

    training datasets that have complete (run-to-failure) tra- jectories. Using data with complete trajectories gives access to the true End-of-Life ( EOL ) to...2010) as the actual EOL is known apriori. This allows testing the critical time aspect of a prediction in addi- tion to accuracy and precision

  8. A novel industry grade dataset for fault prediction based on model-driven developed automotive embedded software

    NARCIS (Netherlands)

    Altinger, H.; Siegl, S.; Dajsuren, Y.; Wotawa, F.

    2015-01-01

    In this paper, we present a novel industry dataset on static software and change metrics for Matlab/Simulink models and their corresponding auto-generated C source code. The data set comprises data of three automotive projects developed and tested accordingly to industry standards and restrictive

  9. Proteomics dataset

    DEFF Research Database (Denmark)

    Bennike, Tue Bjerg; Carlsen, Thomas Gelsing; Ellingsen, Torkell

    2017-01-01

    The datasets presented in this article are related to the research articles entitled “Neutrophil Extracellular Traps in Ulcerative Colitis: A Proteome Analysis of Intestinal Biopsies” (Bennike et al., 2015 [1]), and “Proteome Analysis of Rheumatoid Arthritis Gut Mucosa” (Bennike et al., 2017 [2])...... been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifiers PXD001608 for ulcerative colitis and control samples, and PXD003082 for rheumatoid arthritis samples....

  10. Datasets will not be made accessible to the public due to the fact that they include household level data with PII.

    Data.gov (United States)

    U.S. Environmental Protection Agency — Datasets will not be made accessible to the public due to the fact that they include household level data with PII. This dataset is not publicly accessible because:...

  11. Significant differences between the Nordic laws on public access to documents

    DEFF Research Database (Denmark)

    Jørgensen, Oluf

    2017-01-01

    Transparency and public access to information work as a check on the exercise of power and the existence of corruption. In Sweden the constitutional right of access to documents is justified precisely by its contribution to democracy, the rule of law and efficiency in the public administration....... The wide access to information in today’s world also makes possible the publication of personal information about individuals’ private life in an unprecedented way. Does this mean that the relative importance of the protection of privacy has to be strengthened at the cost of access to information? What...... will be the impact of the developing information and communication technology on access to information? The right of access to documents has traditionally been discussed on the level of domestic administration but when public administration is internationalised the issue of access to documents makes itself felt also...

  12. Development and Application of Improved Long-Term Datasets of Surface Hydrology for Texas

    Directory of Open Access Journals (Sweden)

    Kyungtae Lee

    2017-01-01

    Full Text Available Freshwater availability and agricultural production are key factors for sustaining the fast growing population and economy in the state of Texas, which is the third largest state in terms of agricultural production in the United States. This paper describes a long-term (1918–2011 grid-based (1/8° surface hydrological dataset for Texas at a daily time step based on simulations from the Variable Infiltration Capacity (VIC hydrological model. The model was calibrated and validated against observed streamflow over 10 Texas river basins. The simulated soil moisture was also evaluated using in situ observations. Results suggest that there is a decreasing trend in precipitation and an increasing trend in temperature in most of the basins. Droughts and floods were reconstructed and analyzed. In particular, the spatially distributed severity and duration of major Texas droughts were compared to identify new characteristics. The modeled flood recurrence interval and the return period were also compared with observations. Results suggest the performance of extreme flood simulations needs further improvement. This dataset is expected to serve as a benchmark which may contribute to water resources management and to mitigating agricultural drought, especially in the context of understanding the effects of climate change on crop yield in Texas.

  13. Economic growth and decline in mortality in developing countries: an analysis of the World Bank development datasets.

    Science.gov (United States)

    Renton, A; Wall, M; Lintott, J

    2012-07-01

    The 1999 World Bank report claimed that growth in gross domestic product (GDP) between 1960 and 1990 only accounted for 15% of concomitant growth in life expectancy in developing countries. These findings were used repeatedly by the World Health Organization (WHO) to support a policy shift away from promoting social and economic development, towards vertical technology-driven programmes. This paper updates the 1999 World Bank report using the World Bank's 2005 dataset, providing a new assessment of the relative contribution of economic growth. Time-series analysis. Cross-sectional time-series regression analysis using a random effect model of associations between GDP, education and technical progress and improved health outcomes. The proportion of improvement in health indicators between 1970 and 2000 associated with changes in GDP, education and technical progress was estimated. In 1970, a 1% difference in GDP between countries was associated with 6% difference in female (LEBF) and 5% male (LEBM) life expectancy at birth. By 2000, these values had increased to 14% and 12%, explaining most of the observed health gain. Excluding Europe and Central Asia, the proportion of the increase in LEBF and LEBM attributable to increased GDP was 31% and 33% in the present analysis, vs. 17% and 14%, respectively, estimated by the World Bank. In the poorest countries, higher GDPs were required in 2000 than in 1970 to achieve the same health outcomes. In the poorest countries, socio-economic change is likely to be a more important source of health improvement than technical progress. Technical progress, operating by increasing the size of the effect of a unit of GDP on health, is likely to benefit richer countries more than poorer countries, thereby increasing global health inequalities. Copyright © 2012 The Royal Society for Public Health. Published by Elsevier Ltd. All rights reserved.

  14. Proteomics dataset

    DEFF Research Database (Denmark)

    Bennike, Tue Bjerg; Carlsen, Thomas Gelsing; Ellingsen, Torkell

    2017-01-01

    patients (Morgan et al., 2012; Abraham and Medzhitov, 2011; Bennike, 2014) [8–10. Therefore, we characterized the proteome of colon mucosa biopsies from 10 inflammatory bowel disease ulcerative colitis (UC) patients, 11 gastrointestinal healthy rheumatoid arthritis (RA) patients, and 10 controls. We...... been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifiers PXD001608 for ulcerative colitis and control samples, and PXD003082 for rheumatoid arthritis samples....

  15. Development of an online radiology case review system featuring interactive navigation of volumetric image datasets using advanced visualization techniques

    International Nuclear Information System (INIS)

    Yang, Hyun Kyung; Kim, Boh Kyoung; Jung, Ju Hyun; Kang, Heung Sik; Lee, Kyoung Ho; Woo, Hyun Soo; Jo, Jae Min; Lee, Min Hee

    2015-01-01

    To develop an online radiology case review system that allows interactive navigation of volumetric image datasets using advanced visualization techniques. Our Institutional Review Board approved the use of the patient data and waived the need for informed consent. We determined the following system requirements: volumetric navigation, accessibility, scalability, undemanding case management, trainee encouragement, and simulation of a busy practice. The system comprised a case registry server, client case review program, and commercially available cloud-based image viewing system. In the pilot test, we used 30 cases of low-dose abdomen computed tomography for the diagnosis of acute appendicitis. In each case, a trainee was required to navigate through the images and submit answers to the case questions. The trainee was then given the correct answers and key images, as well as the image dataset with annotations on the appendix. After evaluation of all cases, the system displayed the diagnostic accuracy and average review time, and the trainee was asked to reassess the failed cases. The pilot system was deployed successfully in a hands-on workshop course. We developed an online radiology case review system that allows interactive navigation of volumetric image datasets using advanced visualization techniques

  16. Development of an online radiology case review system featuring interactive navigation of volumetric image datasets using advanced visualization techniques

    Energy Technology Data Exchange (ETDEWEB)

    Yang, Hyun Kyung; Kim, Boh Kyoung; Jung, Ju Hyun; Kang, Heung Sik; Lee, Kyoung Ho [Dept. of Radiology, Seoul National University Bundang Hospital, Seongnam (Korea, Republic of); Woo, Hyun Soo [Dept. of Radiology, SMG-SNU Boramae Medical Center, Seoul (Korea, Republic of); Jo, Jae Min [Dept. of Computer Science and Engineering, Seoul National University, Seoul (Korea, Republic of); Lee, Min Hee [Dept. of Radiology, Soonchunhyang University Bucheon Hospital, Bucheon (Korea, Republic of)

    2015-11-15

    To develop an online radiology case review system that allows interactive navigation of volumetric image datasets using advanced visualization techniques. Our Institutional Review Board approved the use of the patient data and waived the need for informed consent. We determined the following system requirements: volumetric navigation, accessibility, scalability, undemanding case management, trainee encouragement, and simulation of a busy practice. The system comprised a case registry server, client case review program, and commercially available cloud-based image viewing system. In the pilot test, we used 30 cases of low-dose abdomen computed tomography for the diagnosis of acute appendicitis. In each case, a trainee was required to navigate through the images and submit answers to the case questions. The trainee was then given the correct answers and key images, as well as the image dataset with annotations on the appendix. After evaluation of all cases, the system displayed the diagnostic accuracy and average review time, and the trainee was asked to reassess the failed cases. The pilot system was deployed successfully in a hands-on workshop course. We developed an online radiology case review system that allows interactive navigation of volumetric image datasets using advanced visualization techniques.

  17. 76 FR 68518 - Request for Information: Public Access to Peer-Reviewed Scholarly Publications Resulting From...

    Science.gov (United States)

    2011-11-04

    ... archiving publications and making them publically accessible be used to grow the economy and improve the... can ensure long-term stewardship if content is distributed across multiple private sources? (4) Are...

  18. 76 FR 80418 - Request for Information: Public Access to Peer-Reviewed Scholarly Publications Resulting From...

    Science.gov (United States)

    2011-12-23

    ... archiving publications and making them publically accessible be used to grow the economy and improve the... can ensure long-term stewardship if content is distributed across multiple private sources? (4) Are...

  19. What Use Patterns Were Observed for PEV Drivers at Publicly Accessible AC Level 2 EVSE Sites?

    Energy Technology Data Exchange (ETDEWEB)

    Francfort, James Edward [Idaho National Lab. (INL), Idaho Falls, ID (United States)

    2015-12-01

    The EV Project deployed over 4,000 ACL2 EVSE for drivers to charge their plug-in electric vehicle (PEV) when away-from-home. The vast majority of these EVSE stations were installed to be available to all PEV drivers at publicly accessible locations. Some were also deployed for use at workplaces and fleets. This paper examines only the use patterns of PEV drivers using the EVSE intended to be publicly accessible.

  20. Development of a gridded meteorological dataset over Java island, Indonesia 1985-2014.

    Science.gov (United States)

    Yanto; Livneh, Ben; Rajagopalan, Balaji

    2017-05-23

    We describe a gridded daily meteorology dataset consisting of precipitation, minimum and maximum temperature over Java Island, Indonesia at 0.125°×0.125° (~14 km) resolution spanning 30 years from 1985-2014. Importantly, this data set represents a marked improvement from existing gridded data sets over Java with higher spatial resolution, derived exclusively from ground-based observations unlike existing satellite or reanalysis-based products. Gap-infilling and gridding were performed via the Inverse Distance Weighting (IDW) interpolation method (radius, r, of 25 km and power of influence, α, of 3 as optimal parameters) restricted to only those stations including at least 3,650 days (~10 years) of valid data. We employed MSWEP and CHIRPS rainfall products in the cross-validation. It shows that the gridded rainfall presented here produces the most reasonable performance. Visual inspection reveals an increasing performance of gridded precipitation from grid, watershed to island scale. The data set, stored in a network common data form (NetCDF), is intended to support watershed-scale and island-scale studies of short-term and long-term climate, hydrology and ecology.

  1. Applying Advances in GPM Radiometer Intercalibration and Algorithm Development to a Long-Term TRMM/GPM Global Precipitation Dataset

    Science.gov (United States)

    Berg, W. K.

    2016-12-01

    The Global Precipitation Mission (GPM) Core Observatory, which was launched in February of 2014, provides a number of advances for satellite monitoring of precipitation including a dual-frequency radar, high frequency channels on the GPM Microwave Imager (GMI), and coverage over middle and high latitudes. The GPM concept, however, is about producing unified precipitation retrievals from a constellation of microwave radiometers to provide approximately 3-hourly global sampling. This involves intercalibration of the input brightness temperatures from the constellation radiometers, development of an apriori precipitation database using observations from the state-of-the-art GPM radiometer and radars, and accounting for sensor differences in the retrieval algorithm in a physically-consistent way. Efforts by the GPM inter-satellite calibration working group, or XCAL team, and the radiometer algorithm team to create unified precipitation retrievals from the GPM radiometer constellation were fully implemented into the current version 4 GPM precipitation products. These include precipitation estimates from a total of seven conical-scanning and six cross-track scanning radiometers as well as high spatial and temporal resolution global level 3 gridded products. Work is now underway to extend this unified constellation-based approach to the combined TRMM/GPM data record starting in late 1997. The goal is to create a long-term global precipitation dataset employing these state-of-the-art calibration and retrieval algorithm approaches. This new long-term global precipitation dataset will incorporate the physics provided by the combined GPM GMI and DPR sensors into the apriori database, extend prior TRMM constellation observations to high latitudes, and expand the available TRMM precipitation data to the full constellation of available conical and cross-track scanning radiometers. This combined TRMM/GPM precipitation data record will thus provide a high-quality high

  2. APLIKASI ONLINE PUBLIC ACCESS CATALOQUE (OPAC BERBASIS ANDROID SEBAGAI SARANA TEMU KEMBALI INFORMASI DI PERPUSTAKAAN UNIVERSITAS PENDIDIKAN GANESHA

    Directory of Open Access Journals (Sweden)

    Putu Tika Parmawati

    2016-08-01

    Full Text Available Abstrak Penelitian ini bertujuan untuk mengembangkan perangkat lunak aplikasi Online Public Access Cataloque (OPAC berbasis android. Jenis penelitian ini merupakan Research and Development (R & D dengan metode pengembangan menggunakan model prototyping. Pengembangan sistem informasi layanan audio visual berbasis video streaming dengan enam tahap, yaitu : 1 Tahap pengumpulan kebutuhan dan perbaikan, 2 Tahap perancangan desain cepat (desain awal, 3 Tahap membangun prototipe, 4 Tahap evaluasi prototype, 5 Tahap perbaikan prototype, dan 6 Tahap rekayasa produk. Penentuan tingkat kelayakan aplikasi Online Public Access Cataloque (OPAC berbasis android berdasarkan uji validasi ahli bidang teknologi informasi dan uji coba terbatas pada pengguna. Hasil uji coba sebagai berikut : 1 Pengembangan aplikasi Online Public Access Cataloque (OPAC berbasis android sudah sesuai dengan spesifikasi yang telah ditentukan sebagai aplikasi penelusuran informasi koleksi buku teks umum secara online melalui smartphone. 2 Indikator penilaian dari program ini adalah kebenaran atau ketepatan operasional sistem, ketegaran, keterluasan, keterpakaian ulang, efisiensi atau kinerja, portabilitas, integritas, modularitas, keterbacaan mendapat kualifikasi cukup baik, sedangkan verifikasi mendapat kualifikasi baik. 3 Secara umum dari hasil penilaian tersebut aplikasi OPAC berbasis android ini cukup layak untuk digunakan sebagai alternatif pelengkap pemberian layanan penelusuran informasi koleksi buku teks umum di Perpustakaan Undiksha. Kata Kunci: OPAC, android, dan temu kembali informasi Abstract Aim of this study to develop the software of Online Public Access Cataloque (OPAC based on android. Research and Development (R & D design was applied in this study which was developed through prototyping models. The software was constructed through six stages, namely: 1 needs analysis and repairment, 2 rapid design (preliminary design, 3 prototypes building, 4 prototype evaluation, 5

  3. Population Health Metrics Research Consortium gold standard verbal autopsy validation study: design, implementation, and development of analysis datasets

    Directory of Open Access Journals (Sweden)

    Ohno Summer

    2011-08-01

    Full Text Available Abstract Background Verbal autopsy methods are critically important for evaluating the leading causes of death in populations without adequate vital registration systems. With a myriad of analytical and data collection approaches, it is essential to create a high quality validation dataset from different populations to evaluate comparative method performance and make recommendations for future verbal autopsy implementation. This study was undertaken to compile a set of strictly defined gold standard deaths for which verbal autopsies were collected to validate the accuracy of different methods of verbal autopsy cause of death assignment. Methods Data collection was implemented in six sites in four countries: Andhra Pradesh, India; Bohol, Philippines; Dar es Salaam, Tanzania; Mexico City, Mexico; Pemba Island, Tanzania; and Uttar Pradesh, India. The Population Health Metrics Research Consortium (PHMRC developed stringent diagnostic criteria including laboratory, pathology, and medical imaging findings to identify gold standard deaths in health facilities as well as an enhanced verbal autopsy instrument based on World Health Organization (WHO standards. A cause list was constructed based on the WHO Global Burden of Disease estimates of the leading causes of death, potential to identify unique signs and symptoms, and the likely existence of sufficient medical technology to ascertain gold standard cases. Blinded verbal autopsies were collected on all gold standard deaths. Results Over 12,000 verbal autopsies on deaths with gold standard diagnoses were collected (7,836 adults, 2,075 children, 1,629 neonates, and 1,002 stillbirths. Difficulties in finding sufficient cases to meet gold standard criteria as well as problems with misclassification for certain causes meant that the target list of causes for analysis was reduced to 34 for adults, 21 for children, and 10 for neonates, excluding stillbirths. To ensure strict independence for the validation of

  4. Towards a consensus on datasets and evaluation metrics for developing B-cell epitope prediction tools

    DEFF Research Database (Denmark)

    Greenbaum, Jason A.; Andersen, Pernille; Blythe, Martin

    2007-01-01

    and immunology communities. Improving the accuracy of B-cell epitope prediction methods depends on a community consensus on the data and metrics utilized to develop and evaluate such tools. A workshop, sponsored by the National Institute of Allergy and Infectious Disease (NIAID), was recently held in Washington...

  5. Development of Regional Wind Resource and Wind Plant Output Datasets for the Hawaiian Islands

    Energy Technology Data Exchange (ETDEWEB)

    Manobianco, J.; Alonge, C.; Frank, J.; Brower, M.

    2010-07-01

    In March 2009, AWS Truepower was engaged by the National Renewable Energy Laboratory (NREL) to develop a set of wind resource and plant output data for the Hawaiian Islands. The objective of this project was to expand the methods and techniques employed in the Eastern Wind Integration and Transmission Study (EWITS) to include the state of Hawaii.

  6. Development of a Computer-Assisted Cranial Nerve Simulation from the Visible Human Dataset

    Science.gov (United States)

    Yeung, Jeffrey C.; Fung, Kevin; Wilson, Timothy D.

    2011-01-01

    Advancements in technology and personal computing have allowed for the development of novel teaching modalities such as online web-based modules. These modules are currently being incorporated into medical curricula and, in some paradigms, have been shown to be superior to classroom instruction. We believe that these modules have the potential of…

  7. Integrated Automatic Workflow for Phylogenetic Tree Analysis Using Public Access and Local Web Services.

    Science.gov (United States)

    Damkliang, Kasikrit; Tandayya, Pichaya; Sangket, Unitsa; Pasomsub, Ekawat

    2016-11-28

    At the present, coding sequence (CDS) has been discovered and larger CDS is being revealed frequently. Approaches and related tools have also been developed and upgraded concurrently, especially for phylogenetic tree analysis. This paper proposes an integrated automatic Taverna workflow for the phylogenetic tree inferring analysis using public access web services at European Bioinformatics Institute (EMBL-EBI) and Swiss Institute of Bioinformatics (SIB), and our own deployed local web services. The workflow input is a set of CDS in the Fasta format. The workflow supports 1,000 to 20,000 numbers in bootstrapping replication. The workflow performs the tree inferring such as Parsimony (PARS), Distance Matrix - Neighbor Joining (DIST-NJ), and Maximum Likelihood (ML) algorithms of EMBOSS PHYLIPNEW package based on our proposed Multiple Sequence Alignment (MSA) similarity score. The local web services are implemented and deployed into two types using the Soaplab2 and Apache Axis2 deployment. There are SOAP and Java Web Service (JWS) providing WSDL endpoints to Taverna Workbench, a workflow manager. The workflow has been validated, the performance has been measured, and its results have been verified. Our workflow's execution time is less than ten minutes for inferring a tree with 10,000 replicates of the bootstrapping numbers. This paper proposes a new integrated automatic workflow which will be beneficial to the bioinformaticians with an intermediate level of knowledge and experiences. All local services have been deployed at our portal http://bioservices.sci.psu.ac.th.

  8. Suppression of AC railway power-line interference in ECG signals recorded by public access defibrillators

    Directory of Open Access Journals (Sweden)

    Dotsinsky Ivan

    2005-11-01

    Full Text Available Abstract Background Public access defibrillators (PADs are now available for more efficient and rapid treatment of out-of-hospital sudden cardiac arrest. PADs are used normally by untrained people on the streets and in sports centers, airports, and other public areas. Therefore, automated detection of ventricular fibrillation, or its exclusion, is of high importance. A special case exists at railway stations, where electric power-line frequency interference is significant. Many countries, especially in Europe, use 16.7 Hz AC power, which introduces high level frequency-varying interference that may compromise fibrillation detection. Method Moving signal averaging is often used for 50/60 Hz interference suppression if its effect on the ECG spectrum has little importance (no morphological analysis is performed. This approach may be also applied to the railway situation, if the interference frequency is continuously detected so as to synchronize the analog-to-digital conversion (ADC for introducing variable inter-sample intervals. A better solution consists of rated ADC, software frequency measuring, internal irregular re-sampling according to the interference frequency, and a moving average over a constant sample number, followed by regular back re-sampling. Results The proposed method leads to a total railway interference cancellation, together with suppression of inherent noise, while the peak amplitudes of some sharp complexes are reduced. This reduction has negligible effect on accurate fibrillation detection. Conclusion The method is developed in the MATLAB environment and represents a useful tool for real time railway interference suppression.

  9. Suppression of AC railway power-line interference in ECG signals recorded by public access defibrillators.

    Science.gov (United States)

    Dotsinsky, Ivan

    2005-11-26

    Public access defibrillators (PADs) are now available for more efficient and rapid treatment of out-of-hospital sudden cardiac arrest. PADs are used normally by untrained people on the streets and in sports centers, airports, and other public areas. Therefore, automated detection of ventricular fibrillation, or its exclusion, is of high importance. A special case exists at railway stations, where electric power-line frequency interference is significant. Many countries, especially in Europe, use 16.7 Hz AC power, which introduces high level frequency-varying interference that may compromise fibrillation detection. Moving signal averaging is often used for 50/60 Hz interference suppression if its effect on the ECG spectrum has little importance (no morphological analysis is performed). This approach may be also applied to the railway situation, if the interference frequency is continuously detected so as to synchronize the analog-to-digital conversion (ADC) for introducing variable inter-sample intervals. A better solution consists of rated ADC, software frequency measuring, internal irregular re-sampling according to the interference frequency, and a moving average over a constant sample number, followed by regular back re-sampling. The proposed method leads to a total railway interference cancellation, together with suppression of inherent noise, while the peak amplitudes of some sharp complexes are reduced. This reduction has negligible effect on accurate fibrillation detection. The method is developed in the MATLAB environment and represents a useful tool for real time railway interference suppression.

  10. Development and Evaluation of a Methodology for the Generation of Gridded Isotopic Datasets

    Energy Technology Data Exchange (ETDEWEB)

    Argiriou, A. A.; Salamalikis, V [University of Patras, Department of Physics, Laboratory of Atmospheric Physics, Patras (Greece); Lykoudis, S. P. [National Observatory of Athens, Institute of Environmental and Sustainable Development, Athens (Greece)

    2013-07-15

    The accurate knowledge of the spatial distribution of stable isotopes in precipitation is necessary for several applications. Since the number of rain sampling stations is small and unevenly distributed around the globe, the global distribution of stable isotopes can be calculated via the generation of gridded isotopic data sets. Several methods have been proposed for this purpose. In this work a methodology is proposed for the development of 10'x 10' gridded isotopic data from precipitation in the central and eastern Mediterranean. Statistical models are developed taking into account geographical and meteorological parameters as regressors. The residuals are interpolated onto the grid using ordinary kriging and thin plate splines. The result is added to the model grids, to obtain the final isotopic gridded data sets. Models are evaluated using an independent data set. the overall performance of the procedure is satisfactory and the obtained gridded data reproduce the isotopic parameters successfully. (author)

  11. Development and Evaluation of a Methodology for the Generation of Gridded Isotopic Datasets

    International Nuclear Information System (INIS)

    Argiriou, A.A.; Salamalikis, V; Lykoudis, S.P.

    2013-01-01

    The accurate knowledge of the spatial distribution of stable isotopes in precipitation is necessary for several applications. Since the number of rain sampling stations is small and unevenly distributed around the globe, the global distribution of stable isotopes can be calculated via the generation of gridded isotopic data sets. Several methods have been proposed for this purpose. In this work a methodology is proposed for the development of 10'x 10' gridded isotopic data from precipitation in the central and eastern Mediterranean. Statistical models are developed taking into account geographical and meteorological parameters as regressors. The residuals are interpolated onto the grid using ordinary kriging and thin plate splines. The result is added to the model grids, to obtain the final isotopic gridded data sets. Models are evaluated using an independent data set. the overall performance of the procedure is satisfactory and the obtained gridded data reproduce the isotopic parameters successfully. (author)

  12. Development of a browser application to foster research on linking climate and health datasets: Challenges and opportunities.

    Science.gov (United States)

    Hajat, Shakoor; Whitmore, Ceri; Sarran, Christophe; Haines, Andy; Golding, Brian; Gordon-Brown, Harriet; Kessel, Anthony; Fleming, Lora E

    2017-01-01

    Improved data linkages between diverse environment and health datasets have the potential to provide new insights into the health impacts of environmental exposures, including complex climate change processes. Initiatives that link and explore big data in the environment and health arenas are now being established. To encourage advances in this nascent field, this article documents the development of a web browser application to facilitate such future research, the challenges encountered to date, and how they were addressed. A 'storyboard approach' was used to aid the initial design and development of the application. The application followed a 3-tier architecture: a spatial database server for storing and querying data, server-side code for processing and running models, and client-side browser code for user interaction and for displaying data and results. The browser was validated by reproducing previously published results from a regression analysis of time-series datasets of daily mortality, air pollution and temperature in London. Data visualisation and analysis options of the application are presented. The main factors that shaped the development of the browser were: accessibility, open-source software, flexibility, efficiency, user-friendliness, licensing restrictions and data confidentiality, visualisation limitations, cost-effectiveness, and sustainability. Creating dedicated data and analysis resources, such as the one described here, will become an increasingly vital step in improving understanding of the complex interconnections between the environment and human health and wellbeing, whilst still ensuring appropriate confidentiality safeguards. The issues raised in this paper can inform the future development of similar tools by other researchers working in this field. Copyright © 2016 Elsevier B.V. All rights reserved.

  13. Regional-Scale Differential Time Tomography Methods: Development and Application to the Sichuan, China, Dataset

    Science.gov (United States)

    Zhang, H.; Thurber, C.; Wang, W.; Roecker, S. W.

    2008-12-01

    We extended our recent development of double-difference seismic tomography [Zhang and Thurber, BSSA, 2003] to the use of station-pair residual differences in addition to event-pair residual differences. Tomography using station- pair residual differences is somewhat akin to teleseismic tomography but with the sources contained within the model region. Synthetic tests show that the inversion using both event- and station-pair residual differences has advantages in terms of more accurately recovering higher-resolution structure in both the source and receiver regions. We used the Spherical-Earth Finite-Difference (SEFD) travel time calculation method in the tomographic system. The basic concept is the extension of a standard Cartesian FD travel time algorithm [Vidale, 1990] to the spherical case by developing a mesh in radius, co-latitude, and longitude, expressing the FD derivatives in a form appropriate to the spherical mesh, and constructing"stencil" to calculate extrapolated travel times. The SEFD travel time calculation method is more advantageous in dealing with heterogeneity and sphericity of the Earth than the simple Earth flattening transformation and the"sphere-in-a-bo" approach [Flanagan et al., 2007]. We applied this method to the Sichuan, China data set for the period of 2001 to 2004. The Vp, Vs and Vp/Vs models show that there is a clear contrast across the Longmenshan Fault, where the 2008 M8 Wenchuan earthquake initiated.

  14. New Insights into Passive Margin Development from a Global Deep Seismic Reflection Dataset

    Science.gov (United States)

    Bellingham, Paul; Pindell, James; Graham, Rod; Horn, Brian

    2014-05-01

    The kinematic and dynamic evolution of the world's passive margins is still poorly understood. Yet the need to replace reserves, a high oil price and advances in drilling technology have pushed the international oil and gas industry to explore in the deep and ultra-deep waters of the continental margins. To support this exploration and help understand these margins, ION-GXT has acquired, processed and interpreted BasinSPAN surveys across many of the world's passive margins. Observations from these data lead us to consider the modes of subsidence and uplift at both volcanic and non-volcanic margins. At non-volcanic margins, it appears that frequently much of the subsidence post-dates major rifting and is not thermal in origin. Rather the subsidence is associated with extensional displacement on a major fault or shear zone running at least as deep as the continental Moho. We believe that the subsidence is structural and is probably associated with the pinching out (boudinage) of the Lower Crust so that the Upper crust effectively collapses onto the mantle. Eventually this will lead to the exhumation of the sub-continental mantle at the sea bed. Volcanic margins present more complex challenges both in terms of imaging and interpretation. The addition of volcanic and plutonic material into the system and dynamic effects all impact subsidence and uplift. However, we will show some fundamental observations regarding the kinematic development of volcanic margins and especially SDRs which demonstate that the process of collapse and the development of shear zones within and below the crust are also in existence at this type of margin. A model is presented of 'magma welds' whereby packages of SDRs collapse onto an emerging sub-crustal shear zone and it is this collapse which creates the commonly observed SDR geometry. Examples will be shown from East India, Newfoundland, Brazil, Argentina and the Gulf of Mexico.

  15. Dataset exploited for the development and validation of automated cyanobacteria quantification algorithm, ACQUA

    Directory of Open Access Journals (Sweden)

    Emanuele Gandola

    2016-09-01

    Full Text Available The estimation and quantification of potentially toxic cyanobacteria in lakes and reservoirs are often used as a proxy of risk for water intended for human consumption and recreational activities. Here, we present data sets collected from three volcanic Italian lakes (Albano, Vico, Nemi that present filamentous cyanobacteria strains at different environments. Presented data sets were used to estimate abundance and morphometric characteristics of potentially toxic cyanobacteria comparing manual Vs. automated estimation performed by ACQUA (“ACQUA: Automated Cyanobacterial Quantification Algorithm for toxic filamentous genera using spline curves, pattern recognition and machine learning” (Gandola et al., 2016 [1]. This strategy was used to assess the algorithm performance and to set up the denoising algorithm. Abundance and total length estimations were used for software development, to this aim we evaluated the efficiency of statistical tools and mathematical algorithms, here described. The image convolution with the Sobel filter has been chosen to denoise input images from background signals, then spline curves and least square method were used to parameterize detected filaments and to recombine crossing and interrupted sections aimed at performing precise abundances estimations and morphometric measurements. Keywords: Comparing data, Filamentous cyanobacteria, Algorithm, Deoising, Natural sample

  16. Outer Continental Shelf Stratigraphic Development and Sand Resource Potential: Integration of New and Legacy Geologic Datasets

    Science.gov (United States)

    Hughes, M.; Harris, S.; Luciano, K. E.; Alexander, C. R., Jr.

    2017-12-01

    glacial maxima and portions of the marine transgression. Further research will be conducted once the surveys are compiled and merged with previous data sets to build a more complete understanding of the paleolandscapes, shelf development, and sand resources in the South Carolina and Georgia OCS.

  17. Comparison and Evaluation of End-User Interfaces for Online Public Access Catalogs.

    Science.gov (United States)

    Zumer, Maja

    End-user interfaces for the online public access catalogs (OPACs) of OhioLINK, a system linking major university and research libraries in Ohio, and its 16 member libraries, accessible through the Internet, are compared and evaluated from the user-oriented perspective. A common, systematic framework was used for the scientific observation of the…

  18. Pilot Test of the Online Public Access Catalog Project's User and Nonuser Questionnaires. Final Report.

    Science.gov (United States)

    Markey, Karen

    This report describes the pilot data collections and post-questionnaire interview activities of the Council on Library Resources (CLR)/Online Computer Library Center (OCLC) Online Public Access Project. The background of the project is briefly described, the purpose and adminstration of the post-questionnaire interviews are outlined, and pilot…

  19. Towards a Methodology for the Design of Multimedia Public Access Interfaces.

    Science.gov (United States)

    Rowley, Jennifer

    1998-01-01

    Discussion of information systems methodologies that can contribute to interface design for public access systems covers: the systems life cycle; advantages of adopting information systems methodologies; soft systems methodologies; task-oriented approaches to user interface design; holistic design, the Star model, and prototyping; the…

  20. 48 CFR 504.602-71 - Federal Procurement Data System-Public access to data.

    Science.gov (United States)

    2010-10-01

    ... Government procurement to the public. (b) Fee for direct hook-up. To the extent that a member of the public... 48 Federal Acquisition Regulations System 4 2010-10-01 2010-10-01 false Federal Procurement Data... Procurement Data System—Public access to data. (a) The FPDS database. The General Services Administration...

  1. Computer self efficacy as correlate of on-line public access ...

    African Journals Online (AJOL)

    The use of Online Public Access Catalogue (OPAC) by students has a lot of advantages and computer self-efficacy is a factor that could determine its effective utilization. Little appears to be known about colleges of education students‟ use of OPAC, computer self-efficacy and the relationship between OPAC and computer ...

  2. Online Public Access Catalog User Studies: A Review of Research Methodologies, March 1986-November 1989.

    Science.gov (United States)

    Seymour, Sharon

    1991-01-01

    Review of research methodologies used in studies of online public access catalog (OPAC) users finds that a variety of research methodologies--e.g., surveys, transaction log analysis, interviews--have been used with varying degrees of expertise. It is concluded that poor research methodology resulting from limited training and resources limits the…

  3. Increasing Access to Archival Records in Library Online Public Access Catalogs.

    Science.gov (United States)

    Gilmore, Matthew B.

    1988-01-01

    Looks at the use of online public access catalogs, the utility of subject and call-number searching, and possible archival applications. The Wallace Archives at the Claremont Colleges is used as an example of the availability of bibliographic descriptions of multiformat archival materials through the library catalog. Sample records and searches…

  4. A Comparison of Keyword Subject Searching on Six British University OPACs Online Public Access Catalogs.

    Science.gov (United States)

    Aanonson, John

    1987-01-01

    Compares features of online public access catalogs (OPACs) at six British universities: (1) Cambridge; (2) Hull; (3) Newcastle; (4) Surrey; (5) Sussex; and (6) York. Results of keyword subject searches on two topics performed on each of the OPACs are reported and compared. Six references are listed. (MES)

  5. The Distribution of Information: The Role for Online Public Access Catalogs.

    Science.gov (United States)

    Matthews, Joseph R.

    1994-01-01

    Describes the Online Public Access Catalog (OPAC) and the inclusion of abstracting and indexing industry databases in OPACs. Topics addressed include the implications of including abstracting and indexing tape and CD-ROM products in OPACs; the need for standards allowing library systems to communicate with dissimilar CD-ROM products; and computer,…

  6. The Online Public Access Catalogue at the Cite des Sciences Mediatheque in Paris.

    Science.gov (United States)

    Witt, Maria

    1990-01-01

    Provides background on the holdings, services, and layout of the mediatheque (multimedia library) at the Cite des Sciences et de l'Industrie (originally the Museum of Science, Technology, and Industry) in Paris. The library's online public access catalog and use of the catalog by children and the visually handicapped are described. (four…

  7. Penerapan Bahasa Alami Sederhana pada Online Public Access Catalog (Opac) Berbasis Web Semantik

    OpenAIRE

    Andri, Andri

    2012-01-01

    Online Public Access Catalog (OPAC) merupakan sistem katalog online yang memanfaatkan teknologi komputer dan internet sebagai media pengaksesan dan penyimpanan datanya. Sebuah katalog biasanya memberikan informasi mengenai koleksi yang disimpan dalam sebuah perpustakaan digital. Dalam penelitian ini akan dibuat sebuah prototipe aplikasi pencarian pada katalog online di perpustakaan Universitas Binadarma Palembang berbasis teknologi web semantik serta menerapkan pengolahan bahasa alami sederha...

  8. User Problems with Access to Fictional Characters and Personal Names in Online Public Access Catalogs.

    Science.gov (United States)

    Yee, Martha M.; Soto, Raymond

    1991-01-01

    Describes a survey of reference librarians in libraries with online public access catalogs that was conducted to determine what types of searches patrons would use to look for names of fictional characters. Name, subject, and author indexes are discussed, and implications for cataloging using the MARC format are suggested. (10 references) (LRW)

  9. System Design and Cataloging Meet the User: User Interfaces to Online Public Access Catalogs.

    Science.gov (United States)

    Yee, Martha M.

    1991-01-01

    Discusses features of online public access catalogs: (1) demonstration of relationships between records; (2) provision of entry vocabularies; (3) arrangement of multiple entries on the screen; (4) provision of access points; (5) display of single records; and (6) division of catalogs into separate files or indexes. User studies and other research…

  10. Microcomputer Database Management Systems that Interface with Online Public Access Catalogs.

    Science.gov (United States)

    Rice, James

    1988-01-01

    Describes a study that assessed the availability and use of microcomputer database management interfaces to online public access catalogs. The software capabilities needed to effect such an interface are identified, and available software packages are evaluated by these criteria. A directory of software vendors is provided. (4 notes with…

  11. User Practices in Keyword and Boolean Searching on an Online Public Access Catalog.

    Science.gov (United States)

    Ensor, Pat

    1992-01-01

    Discussion of keyword and Boolean searching techniques in online public access catalogs (OPACs) focuses on a study conducted at Indiana State University that examined users' attitudes toward searching on NOTIS (Northwestern Online Total Integrated System). Relevant literature is reviewed, and implications for library instruction are suggested. (17…

  12. Online Public Access Catalog: The Google Maps of the Library World

    Science.gov (United States)

    Bailey, Kieren

    2011-01-01

    What do Google Maps and a library's Online Public Access Catalog (OPAC) have in common? Google Maps provides users with all the information they need for a trip in one place; users can get directions and find out what attractions, hotels, and restaurants are close by. Librarians must find the ultimate OPAC that will provide, in one place, all the…

  13. Information Resources on Online Public Access Catalogs. A Selected ERIC Bibliography.

    Science.gov (United States)

    ERIC Clearinghouse on Information Resources, Syracuse, NY.

    Sixteen articles, books, and reports published between 1978 and 1983 and cited in "Resources in Education" and "Current Index to Journals in Education" are listed in this bibliography on online public access catalogs (OPACs). Emphasis is on the movement toward computer-based alternatives to library card catalogs and user…

  14. The Searching Behavior of Remote Users: A Study of One Online Public Access Catalog (OPAC).

    Science.gov (United States)

    Kalin, Sally W.

    1991-01-01

    Describes a study that was conducted to determine whether the searching behavior of remote users of LIAS (Library Information Access System), Pennsylvania State University's online public access catalog (OPAC), differed from those using the OPAC within the library. Differences in search strategies and in user satisfaction are discussed. (eight…

  15. EPA's Public Access Website Children’s Privacy and Copyright Issues

    Science.gov (United States)

    This document establishes the policy for protecting the privacy of children on EPA’s Public Access Web site. It concerns the collection, both online and off, of information from ages 13 and under, and the display of Personally Identifying Information (PII)

  16. Classification of forest development stages from national low-density lidar datasets: a comparison of machine learning methods

    Directory of Open Access Journals (Sweden)

    R. Valbuena

    2016-02-01

    Full Text Available The area-based method has become a widespread approach in airborne laser scanning (ALS, being mainly employed for the estimation of continuous variables describing forest attributes: biomass, volume, density, etc. However, to date, classification methods based on machine learning, which are fairly common in other remote sensing fields, such as land use / land cover classification using multispectral sensors, have been largely overseen in forestry applications of ALS. In this article, we wish to draw the attention on statistical methods predicting discrete responses, for supervised classification of ALS datasets. A wide spectrum of approaches are reviewed: discriminant analysis (DA using various classifiers –maximum likelihood, minimum volume ellipsoid, naïve Bayes–, support vector machine (SVM, artificial neural networks (ANN, random forest (RF and nearest neighbour (NN methods. They are compared in the context of a classification of forest areas into development classes (DC used in practical silvicultural management in Finland, using their low-density national ALS dataset. We observed that RF and NN had the most balanced error matrices, with cross-validated predictions which were mainly unbiased for all DCs. Although overall accuracies were higher for SVM and ANN, their results were very dissimilar across DCs, and they can therefore be only advantageous if certain DCs are targeted. DA methods underperformed in comparison to other alternatives, and were only advantageous for the detection of seedling stands. These results show that, besides the well demonstrated capacity of ALS for quantifying forest stocks, there is a great deal of potential for predicting categorical variables in general, and forest types in particular. In conclusion, we consider that the presented methodology shall also be adapted to the type of forest classes that can be relevant to Mediterranean ecosystems, opening a range of possibilities for future research, in which

  17. Development and Assessment of the Sand Dust Prediction Model by Utilizing Microwave-Based Satellite Soil Moisture and Reanalysis Datasets in East Asian Desert Areas

    Directory of Open Access Journals (Sweden)

    Hyunglok Kim

    2017-01-01

    Full Text Available For several decades, satellite-based microwave sensors have provided valuable soil moisture monitoring in various surface conditions. We have first developed a modeled aerosol optical depth (AOD dataset by utilizing Soil Moisture and Ocean Salinity (SMOS, Advanced Microwave Scanning Radiometer 2 (AMSR2, and the Global Land Data Assimilation System (GLDAS soil moisture datasets in order to estimate dust outbreaks over desert areas of East Asia. Moderate Resolution Imaging Spectroradiometer- (MODIS- based AOD products were used as reference datasets to validate the modeled AOD (MA. The SMOS-based MA (SMOS-MA dataset showed good correspondence with observed AOD (R-value: 0.56 compared to AMSR2- and GLDAS-based MA datasets, and it overestimated AOD compared to observed AOD. The AMSR2-based MA dataset was found to underestimate AOD, and it showed a relatively low R-value (0.35 with respect to observed AOD. Furthermore, SMOS-MA products were able to simulate the short-term AOD trends, having a high R-value (0.65. The results of this study may allow us to acknowledge the utilization of microwave-based soil moisture datasets for investigation of near-real time dust outbreak predictions and short-term dust outbreak trend analysis.

  18. Public access of environmental information. Report of an Interdepartmental Working Party on public access to information held by Pollution Control Authorities

    International Nuclear Information System (INIS)

    1986-01-01

    The working party was set up to report to the Government ways of implementing the recommendations of the Royal Commission on Environmental Pollution that 'there should be a presumption in favour of unrestricted public access to the information which the pollution control authorities obtain or receive by virtue of their statutory powers'. Chapter 6 deals with Radioactive wastes. The present situation (eg on how the information is gathered, which department or bodies are involved etc) and the current state of the law are discussed. Licensed nuclear sites, sea disposal, inspections and defence wastes are all considered briefly. The case for improving public access to information and, recommendations on how to achieve this made, and the resource implications considered. On control of radioactive wastes there is currently no power for the responsible Government Departments to make information public. It is recommended that new legislation should confer powers to make information available, including a power to require public registers to be kept at prescribed places giving information related to certificates issued under the Radioactive Substances Act 1960. (UK)

  19. The Purdue Agro-climatic (PAC dataset for the U.S. Corn Belt: Development and initial results

    Directory of Open Access Journals (Sweden)

    Xing Liu

    2017-01-01

    This high-resolution dataset is available to the wider community, and can fill gaps in observed data records and increase accessibility for the agricultural sector, and for conducting variety of if-then assessments.

  20. 40 CFR 1400.3 - Public access to paper copies of off-site consequence analysis information.

    Science.gov (United States)

    2010-07-01

    ...-site consequence analysis information. 1400.3 Section 1400.3 Protection of Environment ENVIRONMENTAL... PROGRAMS UNDER THE CLEAN AIR ACT SECTION 112(r)(7); DISTRIBUTION OF OFF-SITE CONSEQUENCE ANALYSIS INFORMATION DISTRIBUTION OF OFF-SITE CONSEQUENCE ANALYSIS INFORMATION Public Access § 1400.3 Public access to...

  1. 41 CFR 102-79.65 - May Executive agencies outlease space on major public access levels, courtyards and rooftops of...

    Science.gov (United States)

    2010-07-01

    ... 41 Public Contracts and Property Management 3 2010-07-01 2010-07-01 false May Executive agencies outlease space on major public access levels, courtyards and rooftops of public buildings? 102-79.65... Utilization of Space Outleasing § 102-79.65 May Executive agencies outlease space on major public access...

  2. The right of public access to legal information : A proposal for its universal recognition as a human right

    NARCIS (Netherlands)

    Mitee, Leesi Ebenezer

    2017-01-01

    Abstract: This Article examines the desirability of the universal recognition of the right of public access to legal information as a human right and therefore as part of a legal framework for improving national and global access to legal information. It discusses the right of public access to legal

  3. Developing a minimum dataset for nursing team leader handover in the intensive care unit: A focus group study.

    Science.gov (United States)

    Spooner, Amy J; Aitken, Leanne M; Corley, Amanda; Chaboyer, Wendy

    2018-01-01

    Despite increasing demand for structured processes to guide clinical handover, nursing handover tools are limited in the intensive care unit. The study aim was to identify key items to include in a minimum dataset for intensive care nursing team leader shift-to-shift handover. This focus group study was conducted in a 21-bed medical/surgical intensive care unit in Australia. Senior registered nurses involved in team leader handovers were recruited. Focus groups were conducted using a nominal group technique to generate and prioritise minimum dataset items. Nurses were presented with content from previous team leader handovers and asked to select which content items to include in a minimum dataset. Participant responses were summarised as frequencies and percentages. Seventeen senior nurses participated in three focus groups. Participants agreed that ISBAR (Identify-Situation-Background-Assessment-Recommendations) was a useful tool to guide clinical handover. Items recommended to be included in the minimum dataset (≥65% agreement) included Identify (name, age, days in intensive care), Situation (diagnosis, surgical procedure), Background (significant event(s), management of significant event(s)) and Recommendations (patient plan for next shift, tasks to follow up for next shift). Overall, 30 of the 67 (45%) items in the Assessment category were considered important to include in the minimum dataset and focused on relevant observations and treatment within each body system. Other non-ISBAR items considered important to include related to the ICU (admissions to ICU, staffing/skill mix, theatre cases) and patients (infectious status, site of infection, end of life plan). Items were further categorised into those to include in all handovers and those to discuss only when relevant to the patient. The findings suggest a minimum dataset for intensive care nursing team leader shift-to-shift handover should contain items within ISBAR along with unit and patient specific

  4. Examining the Impact of the National Institutes of Health Public Access Policy on the Citation Rates of Journal Articles

    Science.gov (United States)

    De Groote, Sandra L.; Shultz, Mary; Smalheiser, Neil R.

    2015-01-01

    Purpose To examine whether National Institutes of Health (NIH) funded articles that were archived in PubMed Central (PMC) after the release of the 2008 NIH Public Access Policy show greater scholarly impact than comparable articles not archived in PMC. Methods A list of journals across several subject areas was developed from which to collect article citation data. Citation information and cited reference counts of the articles published in 2006 and 2009 from 122 journals were obtained from the Scopus database. The articles were separated into categories of NIH funded, non-NIH funded and whether they were deposited in PubMed Central. An analysis of citation data across a five-year timespan was performed on this set of articles. Results A total of 45,716 articles were examined, including 7,960 with NIH-funding. An analysis of the number of times these articles were cited found that NIH-funded 2006 articles in PMC were not cited significantly more than NIH-funded non-PMC articles. However, 2009 NIH funded articles in PMC were cited 26% more than 2009 NIH funded articles not in PMC, 5 years after publication. This result is highly significant even after controlling for journal (as a proxy of article quality and topic). Conclusion Our analysis suggests that factors occurring between 2006 and 2009 produced a subsequent boost in scholarly impact of PubMed Central. The 2008 Public Access Policy is likely to be one such factor, but others may have contributed as well (e.g., growing size and visibility of PMC, increasing availability of full-text linkouts from PubMed, and indexing of PMC articles by Google Scholar). PMID:26448551

  5. Examining the Impact of the National Institutes of Health Public Access Policy on the Citation Rates of Journal Articles.

    Science.gov (United States)

    De Groote, Sandra L; Shultz, Mary; Smalheiser, Neil R

    2015-01-01

    To examine whether National Institutes of Health (NIH) funded articles that were archived in PubMed Central (PMC) after the release of the 2008 NIH Public Access Policy show greater scholarly impact than comparable articles not archived in PMC. A list of journals across several subject areas was developed from which to collect article citation data. Citation information and cited reference counts of the articles published in 2006 and 2009 from 122 journals were obtained from the Scopus database. The articles were separated into categories of NIH funded, non-NIH funded and whether they were deposited in PubMed Central. An analysis of citation data across a five-year timespan was performed on this set of articles. A total of 45,716 articles were examined, including 7,960 with NIH-funding. An analysis of the number of times these articles were cited found that NIH-funded 2006 articles in PMC were not cited significantly more than NIH-funded non-PMC articles. However, 2009 NIH funded articles in PMC were cited 26% more than 2009 NIH funded articles not in PMC, 5 years after publication. This result is highly significant even after controlling for journal (as a proxy of article quality and topic). Our analysis suggests that factors occurring between 2006 and 2009 produced a subsequent boost in scholarly impact of PubMed Central. The 2008 Public Access Policy is likely to be one such factor, but others may have contributed as well (e.g., growing size and visibility of PMC, increasing availability of full-text linkouts from PubMed, and indexing of PMC articles by Google Scholar).

  6. Use and benefits of public access defibrillation in a nation-wide network

    DEFF Research Database (Denmark)

    Nielsen, Anne Møller; Folke, Fredrik; Lippert, Freddy Knudsen

    2013-01-01

    BACKGROUND: Automated External Defibrillators (AEDs) are known to increase survival after out-of-hospital cardiac arrest (OHCA). The aim of this study was to examine the use and benefit of public-access defibrillation (PAD) in a nation-wide network. We primarily sought to assess survival at 1 month...... to exercise (42% vs. 0%), and with improved 30-day survival (69% vs. 15%, p=0.001). Among those presenting with a shockable rhythm, 20 (65%) had Return of Spontaneous Circulation upon arrival of EMS and 8 (26%) were conscious, which emphasizes the diagnostic value of ECG downloads from AEDs. Survival could...

  7. Open University Learning Analytics dataset.

    Science.gov (United States)

    Kuzilek, Jakub; Hlosta, Martin; Zdrahal, Zdenek

    2017-11-28

    Learning Analytics focuses on the collection and analysis of learners' data to improve their learning experience by providing informed guidance and to optimise learning materials. To support the research in this area we have developed a dataset, containing data from courses presented at the Open University (OU). What makes the dataset unique is the fact that it contains demographic data together with aggregated clickstream data of students' interactions in the Virtual Learning Environment (VLE). This enables the analysis of student behaviour, represented by their actions. The dataset contains the information about 22 courses, 32,593 students, their assessment results, and logs of their interactions with the VLE represented by daily summaries of student clicks (10,655,280 entries). The dataset is freely available at https://analyse.kmi.open.ac.uk/open_dataset under a CC-BY 4.0 license.

  8. Do PEV Drivers Park Near Publicly Accessible EVSE in San Diego but Not Use Them?

    Energy Technology Data Exchange (ETDEWEB)

    Francfort, James Edward [Idaho National Lab. (INL), Idaho Falls, ID (United States)

    2015-06-01

    The PEV charging stations deployed as part of The EV Project included both residential and non-residential sites. Non-residential sites included EVSE installed in workplace environments, fleet applications and those that were publicly accessible near retail centers, parking lots, and similar locations. The EV Project utilized its Micro-Climate® planning process to determine potential sites for publicly accessible EVSE in San Diego. This process worked with local stakeholders to target EVSE deployment near areas where significant PEV traffic and parking was expected. This planning process is described in The Micro-Climate deployment Process in San Diego1. The EV Project issued its deployment plan for San Diego in November 2010, prior to the sale of PEVs by Nissan and Chevrolet. The Project deployed residential EVSE concurrent with vehicle delivery starting in December 2010. The installation of non-residential EVSE commenced in April 2011 consistent with the original Project schedule, closely following the adoption of PEVs. The residential participation portion of The EV Project was fully subscribed by January 2013 and the non-residential EVSE deployment was essentially completed by August 2013.

  9. EPA Nanorelease Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — EPA Nanorelease Dataset. This dataset is associated with the following publication: Wohlleben, W., C. Kingston, J. Carter, E. Sahle-Demessie, S. Vazquez-Campos, B....

  10. Development and validation of a national data registry for midwife-led births: the Midwives Alliance of North America Statistics Project 2.0 dataset.

    Science.gov (United States)

    Cheyney, Melissa; Bovbjerg, Marit; Everson, Courtney; Gordon, Wendy; Hannibal, Darcy; Vedam, Saraswathi

    2014-01-01

    In 2004, the Midwives Alliance of North America's (MANA's) Division of Research developed a Web-based data collection system to gather information on the practices and outcomes associated with midwife-led births in the United States. This system, called the MANA Statistics Project (MANA Stats), grew out of a widely acknowledged need for more reliable data on outcomes by intended place of birth. This article describes the history and development of the MANA Stats birth registry and provides an analysis of the 2.0 dataset's content, strengths, and limitations. Data collection and review procedures for the MANA Stats 2.0 dataset are described, along with methods for the assessment of data accuracy. We calculated descriptive statistics for client demographics and contributing midwife credentials, and assessed the quality of data by calculating point estimates, 95% confidence intervals, and kappa statistics for key outcomes on pre- and postreview samples of records. The MANA Stats 2.0 dataset (2004-2009) contains 24,848 courses of care, 20,893 of which are for women who planned a home or birth center birth at the onset of labor. The majority of these records were planned home births (81%). Births were attended primarily by certified professional midwives (73%), and clients were largely white (92%), married (87%), and college-educated (49%). Data quality analyses of 9932 records revealed no differences between pre- and postreviewed samples for 7 key benchmarking variables (kappa, 0.98-1.00). The MANA Stats 2.0 data were accurately entered by participants; any errors in this dataset are likely random and not systematic. The primary limitation of the 2.0 dataset is that the sample was captured through voluntary participation; thus, it may not accurately reflect population-based outcomes. The dataset's primary strength is that it will allow for the examination of research questions on normal physiologic birth and midwife-led birth outcomes by intended place of birth.

  11. Environmental Assessment for Authorizing the Puerto Rico Electric Power Authority (PREPA) to allow Public Access to the Boiling Nuclear Superheat (BONUS) Reactor Building, Rincon, Puerto Rico

    International Nuclear Information System (INIS)

    2003-01-01

    The U.S. Department of Energy (DOE) proposes to consent to a proposal by the Puerto Rico Electric Power Authority (PREPA) to allow public access to the Boiling Nuclear Superheat (BONUS) reactor building located near Rincon, Puerto Rico for use as a museum. PREPA, the owner of the BONUS facility, has determined that the historical significance of this facility, as one of only two reactors of this design ever constructed in the world, warrants preservation in a museum, and that this museum would provide economic benefits to the local community through increased tourism. Therefore, PREPA is proposing development of the BONUS facility as a museum

  12. Conservation and Recreation Lands with Public Access in the State of Iowa

    Data.gov (United States)

    Iowa State University GIS Support and Research Facility — This dataset represents conservation and recreation lands in the state of Iowa. Boundaries of areas represent differences in ownership and managing agency of the...

  13. The SunCloud project: An initiative for a development of a worldwide sunshine duration and cloudiness observations dataset

    Science.gov (United States)

    Sanchez-Lorenzo, A.

    2010-09-01

    One problem encountered when establishing the causes of global dimming and brightening is the limited number of long-term solar radiation series with accurate and calibrated measurements. For this reason, the analysis is often supported and extended with the use of other climatic variables such as sunshine duration and cloud cover. Specifically, sunshine duration is defined as the amount of time usually expressed in hours that direct solar radiation exceeds a certain threshold (usually taken at 120 W m-2). Consequently, this variable can be considered as an excellent proxy measure of solar radiation at interannual and decadal time scales, with the advantage that measurements of this variable were initiated in the late 19th century in different, worldwide, main meteorological stations. Nevertheless, detailed and up-to-date analysis of sunshine duration behavior on global or hemispheric scales are still missing. Thus, starting on September 2010 in the framework of different research projects, we will engage a worldwide compilation of the longest daily or monthly sunshine duration series from the late 19th century until present. Several quality control checks and homogenization methods will be applied to the generated sunshine dataset. The relationship between the more precise downward solar radiation series from the Global Energy Balance Archive (GEBA) and the homogenized sunshine series will be studied in order to reconstruct global and regional solar irradiance at the Earth's surface since the late 19th century. Since clouds are the main cause of interannual and decadal variability of radiation reaching the Earth's surface, as a complement to the long-term sunshine series we will also compile worldwide surface cloudiness observations. With this presentation we seek to encourage the climate community to contribute with their own local datasets to the SunCloud project. The SunCloud Team: M. Wild, Institute for Atmospheric and Climate Science, ETH Zurich, Switzerland

  14. Barriers and facilitators to public access defibrillation in out-of-hospital cardiac arrest: a systematic review.

    Science.gov (United States)

    Smith, Christopher M; Lim Choi Keung, Sarah N; Khan, Mohammed O; Arvanitis, Theodoros N; Fothergill, Rachael; Hartley-Sharpe, Christopher; Wilson, Mark H; Perkins, Gavin D

    2017-10-01

    Public access defibrillation initiatives make automated external defibrillators available to the public. This facilitates earlier defibrillation of out-of-hospital cardiac arrest victims and could save many lives. It is currently only used for a minority of cases. The aim of this systematic review was to identify barriers and facilitators to public access defibrillation. A comprehensive literature review was undertaken defining formal search terms for a systematic review of the literature in March 2017. Studies were included if they considered reasons affecting the likelihood of public access defibrillation and presented original data. An electronic search strategy was devised searching MEDLINE and EMBASE, supplemented by bibliography and related-article searches. Given the low-quality and observational nature of the majority of articles, a narrative review was performed. Sixty-four articles were identified in the initial literature search. An additional four unique articles were identified from the electronic search strategies. The following themes were identified related to public access defibrillation: knowledge and awareness; willingness to use; acquisition and maintenance; availability and accessibility; training issues; registration and regulation; medicolegal issues; emergency medical services dispatch-assisted use of automated external defibrillators; automated external defibrillator-locator systems; demographic factors; other behavioural factors. In conclusion, several barriers and facilitators to public access defibrillation deployment were identified. However, the evidence is of very low quality and there is not enough information to inform changes in practice. This is an area in urgent need of further high-quality research if public access defibrillation is to be increased and more lives saved. PROSPERO registration number CRD42016035543. Published on behalf of the European Society of Cardiology. All rights reserved. © The Author 2017. For permissions

  15. Creation of the Naturalistic Engagement in Secondary Tasks (NEST) distracted driving dataset.

    Science.gov (United States)

    Owens, Justin M; Angell, Linda; Hankey, Jonathan M; Foley, James; Ebe, Kazutoshi

    2015-09-01

    Distracted driving has become a topic of critical importance to driving safety research over the past several decades. Naturalistic driving data offer a unique opportunity to study how drivers engage with secondary tasks in real-world driving; however, the complexities involved with identifying and coding relevant epochs of naturalistic data have limited its accessibility to the general research community. This project was developed to help address this problem by creating an accessible dataset of driver behavior and situational factors observed during distraction-related safety-critical events and baseline driving epochs, using the Strategic Highway Research Program 2 (SHRP2) naturalistic dataset. The new NEST (Naturalistic Engagement in Secondary Tasks) dataset was created using crashes and near-crashes from the SHRP2 dataset that were identified as including secondary task engagement as a potential contributing factor. Data coding included frame-by-frame video analysis of secondary task and hands-on-wheel activity, as well as summary event information. In addition, information about each secondary task engagement within the trip prior to the crash/near-crash was coded at a higher level. Data were also coded for four baseline epochs and trips per safety-critical event. 1,180 events and baseline epochs were coded, and a dataset was constructed. The project team is currently working to determine the most useful way to allow broad public access to the dataset. We anticipate that the NEST dataset will be extraordinarily useful in allowing qualified researchers access to timely, real-world data concerning how drivers interact with secondary tasks during safety-critical events and baseline driving. The coded dataset developed for this project will allow future researchers to have access to detailed data on driver secondary task engagement in the real world. It will be useful for standalone research, as well as for integration with additional SHRP2 data to enable the

  16. The EV Project Price/Fee Models for Publicly Accessible Charging

    Energy Technology Data Exchange (ETDEWEB)

    Francfort, James Edward [Idaho National Lab. (INL), Idaho Falls, ID (United States)

    2015-12-01

    As plug-in electric vehicles (PEVs) are introduced to the market place and gain more consumer acceptance, it is important for a robust and self-sustaining non-residential infrastructure of electric vehicle supply equipment (EVSE) to be established to meet the needs of PEV drivers. While federal and state financial incentives for electric vehicles were in place and remain so today, future incentives are uncertain. In order for PEVs to achieve mainstream adoption, an adequate and sustainable commercial or publicly available charging infrastructure was pursued by The EV Project to encourage increased PEV purchases by alleviating range anxiety, and by removing adoption barriers for consumers without a dedicated overnight parking location to provide a home-base charger. This included determining a business model for publicly accessible charge infrastructure. To establish this business model, The EV Project team created a fee for charge model along with various ancillary offerings related to charging that would generate revenue. And after placing chargers in the field the Project rolled out this fee structure.

  17. IPCC Socio-Economic Baseline Dataset

    Data.gov (United States)

    National Aeronautics and Space Administration — The Intergovernmental Panel on Climate Change (IPCC) Socio-Economic Baseline Dataset consists of population, human development, economic, water resources, land...

  18. Public access to New Hampshire state waters: a comparison of three cohorts of residents across three distinct geographic

    Science.gov (United States)

    Kim Pawlawski; Robert A. Robertson; Laura Pfister

    2003-01-01

    This study was intended to provide New Hampshire agencies with a better understanding of public access-related demand information. Through an analysis of three groups of New Hampshire residents based upon geographic location and length of residency, important issues and attitudes were identified from all over the State. The results of this study will assist in policy-...

  19. Enhancing public access to legal information : A proposal for a new official legal information generic top-level domain

    NARCIS (Netherlands)

    Mitee, Leesi Ebenezer

    2017-01-01

    Abstract: This article examines the use of a new legal information generic Top-Level Domain (gTLD) as a viable tool for easy identification of official legal information websites (OLIWs) and enhancing global public access to their resources. This intervention is necessary because of the existence of

  20. User-Based Information Retrieval System Interface Evaluation: An Examination of an On-Line Public Access Catalog.

    Science.gov (United States)

    Hert, Carol A.; Nilan, Michael S.

    1991-01-01

    Presents preliminary data that characterizes the relationship between what users say they are trying to accomplish when using an online public access catalog (OPAC) and their perceptions of what input to give the system. Human-machine interaction is discussed, and appropriate methods for evaluating information retrieval systems are considered. (18…

  1. Children, Technology, and Instruction: A Case Study of Elementary School Children Using an Online Public Access Catalog (OPAC).

    Science.gov (United States)

    Solomon, Paul

    1994-01-01

    Examines elementary school students' use of an online public access catalog to investigate the interaction between children, technology, curriculum, instruction, and learning. Highlights include patterns of successes and breakdowns; search strategies; instructional approaches and childrens' interests; structure of interaction; search terms; and…

  2. Terminal Ailments Need Not Be Fatal: A Speculative Assessment of the Impact of Online Public Access Catalogs in Academic Settings.

    Science.gov (United States)

    Sandler, Mark

    1985-01-01

    Discusses several concerns about nature of online public access catalogs (OPAC) that have particular import to reference librarians: user passivity and loss of control growing out of "human-machine interface" and the larger social context; and the tendency of computerized bibliographic systems to obfuscate human origins of library…

  3. Development of pharmacophore similarity-based quantitative activity hypothesis and its applicability domain: applied on a diverse data-set of HIV-1 integrase inhibitors.

    Science.gov (United States)

    Kumar, Sivakumar Prasanth; Jasrai, Yogesh T; Mehta, Vijay P; Pandya, Himanshu A

    2015-01-01

    Quantitative pharmacophore hypothesis combines the 3D spatial arrangement of pharmacophore features with biological activities of the ligand data-set and predicts the activities of geometrically and/or pharmacophoric similar ligands. Most pharmacophore discovery programs face difficulties in conformational flexibility, molecular alignment, pharmacophore features sampling, and feature selection to score models if the data-set constitutes diverse ligands. Towards this focus, we describe a ligand-based computational procedure to introduce flexibility in aligning the small molecules and generating a pharmacophore hypothesis without geometrical constraints to define pharmacophore space, enriched with chemical features necessary to elucidate common pharmacophore hypotheses (CPHs). Maximal common substructure (MCS)-based alignment method was adopted to guide the alignment of carbon molecules, deciphered the MCS atom connectivity to cluster molecules in bins and subsequently, calculated the pharmacophore similarity matrix with the bin-specific reference molecules. After alignment, the carbon molecules were enriched with original atoms in their respective positions and conventional pharmacophore features were perceived. Distance-based pharmacophoric descriptors were enumerated by computing the interdistance between perceived features and MCS-aligned 'centroid' position. The descriptor set and biological activities were used to develop support vector machine models to predict the activities of the external test set. Finally, fitness score was estimated based on pharmacophore similarity with its bin-specific reference molecules to recognize the best and poor alignments and, also with each reference molecule to predict outliers of the quantitative hypothesis model. We applied this procedure to a diverse data-set of 40 HIV-1 integrase inhibitors and discussed its effectiveness with the reported CPH model.

  4. Saving lives with public access defibrillation: A deadly game of hide and seek.

    Science.gov (United States)

    Sidebottom, David B; Potter, Ryan; Newitt, Laura K; Hodgetts, Gillian A; Deakin, Charles D

    2018-07-01

    Early defibrillation is a critical link in the chain of survival. Public access defibrillation (PAD) programmes utilising automated external defibrillators (AEDs) aim to decrease the time-to-first-shock, and improve survival from out-of-hospital cardiac arrest. Effective use of PADs requires rapid location of the device, facilitated by adequate signage. We aimed to therefore assess the quality of signage for PADs in the community. From April 2017 to January 2018 we surveyed community PADs available for public use on the 'Save a Life' AED locator mobile application in and around Southampton, UK. Location and signage characteristics were collected, and the distance from the furthest sign to the AED was measured. Researchers evaluated 201 separate PADs. All devices visited were included in the final analysis. No signage at all was present for 135 (67.2%) devices. Only 15/201 (7.5%) AEDs had signage at a distance from AED itself. In only 5 of these cases (2.5%) was signage mounted more than 5.0 m from the AED. When signage was present, 46 used 2008 ILCOR signage and 15 used 2006 Resuscitation Council (UK) signage. Signage visibility was partially or severely obstructed at 27/66 (40.9%) sites. None of the 45 GP surgeries surveyed used exterior signage or an exterior 24/7 access box. Current signage of PADs is poor and limits the device effectiveness by impeding public awareness and location of AEDs. Recommendations should promote visible signage within the operational radius of each AED. Copyright © 2018 Elsevier B.V. All rights reserved.

  5. Underutilisation of public access defibrillation is related to retrieval distance and time-dependent availability.

    Science.gov (United States)

    Deakin, Charles D; Anfield, Steve; Hodgetts, Gillian A

    2018-05-14

    Public access defibrillation doubles the chances of neurologically intact survival following out-of-hospital cardiac arrest (OHCA). Although there are increasing numbers of defibrillators (automated external defibrillator (AEDs)) available in the community, they are used infrequently, despite often being available. We aimed to match OHCAs with known AED locations in order to understand AED availability, the effects of reduced AED availability at night and the operational radius at which they can be effectively retrieved. All emergency calls to South Central Ambulance Service from April 2014 to April 2016 were screened to identify cardiac arrests. Each was mapped to the nearest AED, according to the time of day. Mapping software was used to calculate the actual walking distance for a bystander between each OHCA and respective AED, when travelling at a brisk walking speed (4 mph). 4012 cardiac arrests were identified and mapped to one of 2076 AEDs. All AEDs were available during daytime hours, but only 713 at night (34.3%). 5.91% of cardiac arrests were within a retrieval (walking) radius of 100 m during the day, falling to 1.59% out-of-hours. Distances to rural AEDs were greater than in urban areas (P<0.0001). An AED could potentially have been retrieved prior to actual ambulance arrival in 25.3% cases. Existing AEDs are underused; 36.4% of OHCAs are located within 500 m of an AED. Although more AEDs will improve availability, greater use can be made of existing AEDs, particularly by ensuring they are all available on a 24/7 basis. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  6. Prediction of Scylla olivacea (Crustacea; Brachyura) peptide hormones using publicly accessible transcriptome shotgun assembly (TSA) sequences.

    Science.gov (United States)

    Christie, Andrew E

    2016-05-01

    The aquaculture of crabs from the genus Scylla is of increasing economic importance for many Southeast Asian countries. Expansion of Scylla farming has led to increased efforts to understand the physiology and behavior of these crabs, and as such, there are growing molecular resources for them. Here, publicly accessible Scylla olivacea transcriptomic data were mined for putative peptide-encoding transcripts; the proteins deduced from the identified sequences were then used to predict the structures of mature peptide hormones. Forty-nine pre/preprohormone-encoding transcripts were identified, allowing for the prediction of 187 distinct mature peptides. The identified peptides included isoforms of adipokinetic hormone-corazonin-like peptide, allatostatin A, allatostatin B, allatostatin C, bursicon β, CCHamide, corazonin, crustacean cardioactive peptide, crustacean hyperglycemic hormone/molt-inhibiting hormone, diuretic hormone 31, eclosion hormone, FMRFamide-like peptide, HIGSLYRamide, insulin-like peptide, intocin, leucokinin, myosuppressin, neuroparsin, neuropeptide F, orcokinin, pigment dispersing hormone, pyrokinin, red pigment concentrating hormone, RYamide, short neuropeptide F, SIFamide and tachykinin-related peptide, all well-known neuropeptide families. Surprisingly, the tissue used to generate the transcriptome mined here is reported to be testis. Whether or not the testis samples had neural contamination is unknown. However, if the peptides are truly produced by this reproductive organ, it could have far reaching consequences for the study of crustacean endocrinology, particularly in the area of reproductive control. Regardless, this peptidome is the largest thus far predicted for any brachyuran (true crab) species, and will serve as a foundation for future studies of peptidergic control in members of the commercially important genus Scylla. Copyright © 2016 Elsevier Inc. All rights reserved.

  7. Aaron Journal article datasets

    Data.gov (United States)

    U.S. Environmental Protection Agency — All figures used in the journal article are in netCDF format. This dataset is associated with the following publication: Sims, A., K. Alapaty , and S. Raman....

  8. Integrated Surface Dataset (Global)

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The Integrated Surface (ISD) Dataset (ISD) is composed of worldwide surface weather observations from over 35,000 stations, though the best spatial coverage is...

  9. Control Measure Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — The EPA Control Measure Dataset is a collection of documents describing air pollution control available to regulated facilities for the control and abatement of air...

  10. National Hydrography Dataset (NHD)

    Data.gov (United States)

    Kansas Data Access and Support Center — The National Hydrography Dataset (NHD) is a feature-based database that interconnects and uniquely identifies the stream segments or reaches that comprise the...

  11. Market Squid Ecology Dataset

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This dataset contains ecological information collected on the major adult spawning and juvenile habitats of market squid off California and the US Pacific Northwest....

  12. Tables and figure datasets

    Data.gov (United States)

    U.S. Environmental Protection Agency — Soil and air concentrations of asbestos in Sumas study. This dataset is associated with the following publication: Wroble, J., T. Frederick, A. Frame, and D....

  13. Geoscience Meets Social Science: A Flexible Data Driven Approach for Developing High Resolution Population Datasets at Global Scale

    Science.gov (United States)

    Rose, A.; McKee, J.; Weber, E.; Bhaduri, B. L.

    2017-12-01

    Leveraging decades of expertise in population modeling, and in response to growing demand for higher resolution population data, Oak Ridge National Laboratory is now generating LandScan HD at global scale. LandScan HD is conceived as a 90m resolution population distribution where modeling is tailored to the unique geography and data conditions of individual countries or regions by combining social, cultural, physiographic, and other information with novel geocomputation methods. Similarities among these areas are exploited in order to leverage existing training data and machine learning algorithms to rapidly scale development. Drawing on ORNL's unique set of capabilities, LandScan HD adapts highly mature population modeling methods developed for LandScan Global and LandScan USA, settlement mapping research and production in high-performance computing (HPC) environments, land use and neighborhood mapping through image segmentation, and facility-specific population density models. Adopting a flexible methodology to accommodate different geographic areas, LandScan HD accounts for the availability, completeness, and level of detail of relevant ancillary data. Beyond core population and mapped settlement inputs, these factors determine the model complexity for an area, requiring that for any given area, a data-driven model could support either a simple top-down approach, a more detailed bottom-up approach, or a hybrid approach.

  14. Analysis, Modeling, and Simulation (AMS) Testbed Development and Evaluation to Support Dynamic Mobility Applications (DMA) and Active Transportation and Demand Management (ATDM) Programs - calibration Report for Phoenix Testbed : Final Report. [supporting datasets - Phoenix Testbed

    Science.gov (United States)

    2017-07-26

    The datasets in this zip file are in support of FHWA-JPO-16-379, Analysis, Modeling, and Simulation (AMS) Testbed Development and Evaluation to Support Dynamic Mobility Applications (DMA) and Active Transportation and Demand Management (ATDM) Program...

  15. Using LandSat and SRTM datasets to develop relationships for estimating bankfull channel widths in the Amazon Basin

    Science.gov (United States)

    Gummadi, V.; He, Y.; Beighley, E. R.

    2007-12-01

    Modeling fine scale spatial and temporal processes of the hydrologic cycle over continental to global extents is vital for assessing the potential impacts of climate and land use change on global water resources and related systems. Significant advancement in understanding and predicting the magnitude, trend, timing and partitioning of terrestrial water stores and fluxes requires the development of methodologies and knowledge for extracting representative hydraulic geometries from remote sensing data products and field data, suitable for estimating inundation characteristics and water storage changes which are limited for much of the globe. In this research, relationships between channel and floodplain widths and spatial drainage characteristics are developed for the Amazon Basin. Channel and floodplain widths were measured using SRTM data and LandSat TM/ETM imagery at 510 sites. The study sites were selected based on the Pfafstetter decomposition methodology which provides an irregular model grid based on repeatedly subdividing landscape units into nine subunits consisting of basins and interbasins. The selected sites encompass all possible combinations of Pfafstetter modeling units (ex., basins of interbasins, interbasins of basins, etc.). The 510 study sites are within the Amazon Basin with drainage areas ranging 10 to 5.4 million sq km and mean watershed ground slopes ranging from 0.4 and 30 percent. Preliminary results indicate that channel widths can be predicted using drainage area and mean watershed slope (R2 = 0.85). Floodplain widths can be predicted using channel width and the local slope (R2 = 0.70). Using the Purus watershed, a sub-basin to the Amazon (350,000 sq km), effects of channel and floodplain widths on simulated hydrographs are presented.

  16. Development of a 30 m Spatial Resolution Land Cover of Canada: Contribution to the Harmonized North America Land Cover Dataset

    Science.gov (United States)

    Pouliot, D.; Latifovic, R.; Olthof, I.

    2017-12-01

    Land cover is needed for a large range of environmental applications regarding climate impacts and adaption, emergency response, wildlife habitat, air quality, water yield, etc. In Canada a 2008 user survey revealed that the most practical scale for provision of land cover data is 30 m, nationwide, with an update frequency of five years (Ball, 2008). In response to this need the Canada Centre for Remote Sensing has generated a 30 m land cover of Canada for the base year 2010 as part of a planned series of maps at the recommended five year update frequency. This land cover is the Canadian contribution to the North American Land Change Monitoring System initiative, which seeks to provide harmonized land cover across Canada, the United States, and Mexico. The methodology developed in this research utilized a combination of unsupervised and machine learning techniques to map land cover, blend results between mapping units, locally optimize results, and process some thematic attributes with specific features sets. Accuracy assessment with available field data shows it was on average 75% for the five study areas assessed. In this presentation an overview of the unique processing aspects, example results, and initial accuracy assessment will be discussed.

  17. 41 CFR 102-79.115 - What guidelines must an agency follow if it elects to establish a public access defibrillation...

    Science.gov (United States)

    2010-07-01

    ... SPACE Assignment and Utilization of Space Public Access Defibrillation Programs § 102-79.115 What... 41 Public Contracts and Property Management 3 2010-07-01 2010-07-01 false What guidelines must an agency follow if it elects to establish a public access defibrillation program in a Federal facility? 102...

  18. Isfahan MISP Dataset.

    Science.gov (United States)

    Kashefpur, Masoud; Kafieh, Rahele; Jorjandi, Sahar; Golmohammadi, Hadis; Khodabande, Zahra; Abbasi, Mohammadreza; Teifuri, Nilufar; Fakharzadeh, Ali Akbar; Kashefpoor, Maryam; Rabbani, Hossein

    2017-01-01

    An online depository was introduced to share clinical ground truth with the public and provide open access for researchers to evaluate their computer-aided algorithms. PHP was used for web programming and MySQL for database managing. The website was entitled "biosigdata.com." It was a fast, secure, and easy-to-use online database for medical signals and images. Freely registered users could download the datasets and could also share their own supplementary materials while maintaining their privacies (citation and fee). Commenting was also available for all datasets, and automatic sitemap and semi-automatic SEO indexing have been set for the site. A comprehensive list of available websites for medical datasets is also presented as a Supplementary (http://journalonweb.com/tempaccess/4800.584.JMSS_55_16I3253.pdf).

  19. The development of the Older Persons and Informal Caregivers Survey Minimum DataSet (TOPICS-MDS): a large-scale data sharing initiative.

    Science.gov (United States)

    Lutomski, Jennifer E; Baars, Maria A E; Schalk, Bianca W M; Boter, Han; Buurman, Bianca M; den Elzen, Wendy P J; Jansen, Aaltje P D; Kempen, Gertrudis I J M; Steunenberg, Bas; Steyerberg, Ewout W; Olde Rikkert, Marcel G M; Melis, René J F

    2013-01-01

    In 2008, the Ministry of Health, Welfare and Sport commissioned the National Care for the Elderly Programme. While numerous research projects in older persons' health care were to be conducted under this national agenda, the Programme further advocated the development of The Older Persons and Informal Caregivers Survey Minimum DataSet (TOPICS-MDS) which would be integrated into all funded research protocols. In this context, we describe TOPICS data sharing initiative (www.topics-mds.eu). A working group drafted TOPICS-MDS prototype, which was subsequently approved by a multidisciplinary panel. Using instruments validated for older populations, information was collected on demographics, morbidity, quality of life, functional limitations, mental health, social functioning and health service utilisation. For informal caregivers, information was collected on demographics, hours of informal care and quality of life (including subjective care-related burden). Between 2010 and 2013, a total of 41 research projects contributed data to TOPICS-MDS, resulting in preliminary data available for 32,310 older persons and 3,940 informal caregivers. The majority of studies sampled were from primary care settings and inclusion criteria differed across studies. TOPICS-MDS is a public data repository which contains essential data to better understand health challenges experienced by older persons and informal caregivers. Such findings are relevant for countries where increasing health-related expenditure has necessitated the evaluation of contemporary health care delivery. Although open sharing of data can be difficult to achieve in practice, proactively addressing issues of data protection, conflicting data analysis requests and funding limitations during TOPICS-MDS developmental phase has fostered a data sharing culture. To date, TOPICS-MDS has been successfully incorporated into 41 research projects, thus supporting the feasibility of constructing a large (>30,000 observations

  20. Mridangam stroke dataset

    OpenAIRE

    CompMusic

    2014-01-01

    The audio examples were recorded from a professional Carnatic percussionist in a semi-anechoic studio conditions by Akshay Anantapadmanabhan using SM-58 microphones and an H4n ZOOM recorder. The audio was sampled at 44.1 kHz and stored as 16 bit wav files. The dataset can be used for training models for each Mridangam stroke. /n/nA detailed description of the Mridangam and its strokes can be found in the paper below. A part of the dataset was used in the following paper. /nAkshay Anantapadman...

  1. The GTZAN dataset

    DEFF Research Database (Denmark)

    Sturm, Bob L.

    2013-01-01

    The GTZAN dataset appears in at least 100 published works, and is the most-used public dataset for evaluation in machine listening research for music genre recognition (MGR). Our recent work, however, shows GTZAN has several faults (repetitions, mislabelings, and distortions), which challenge...... of GTZAN, and provide a catalog of its faults. We review how GTZAN has been used in MGR research, and find few indications that its faults have been known and considered. Finally, we rigorously study the effects of its faults on evaluating five different MGR systems. The lesson is not to banish GTZAN...

  2. ­A curated transcriptomic dataset collection relevant to embryonic development associated with in vitro fertilization in healthy individuals and patients with polycystic ovary syndrome [version 1; referees: 1 approved, 2 approved with reservations

    Directory of Open Access Journals (Sweden)

    Rafah Mackeh

    2017-02-01

    Full Text Available The collection of large-scale datasets available in public repositories is rapidly growing and providing opportunities to identify and fill gaps in different fields of biomedical research. However, users of these datasets should be able to selectively browse datasets related to their field of interest. Here we made available a collection of transcriptome datasets related to human follicular cells from normal individuals or patients with polycystic ovary syndrome, in the process of their development, during in vitro fertilization. After RNA-seq dataset exclusion and careful selection based on study description and sample information, 12 datasets, encompassing a total of 85 unique transcriptome profiles, were identified in NCBI Gene Expression Omnibus and uploaded to the Gene Expression Browser (GXB, a web application specifically designed for interactive query and visualization of integrated large-scale data. Once annotated in GXB, multiple sample grouping has been made in order to create rank lists to allow easy data interpretation and comparison. The GXB tool also allows the users to browse a single gene across multiple projects to evaluate its expression profiles in multiple biological systems/conditions in a web-based customized graphical views. The curated dataset is accessible at the following link: http://ivf.gxbsidra.org/dm3/landing.gsp.

  3. Dataset - Adviesregel PPL 2010

    NARCIS (Netherlands)

    Evert, van F.K.; Schans, van der D.A.; Geel, van W.C.A.; Slabbekoorn, J.J.; Booij, R.; Jukema, J.N.; Meurs, E.J.J.; Uenk, D.

    2011-01-01

    This dataset contains experimental data from a number of field experiments with potato in The Netherlands (Van Evert et al., 2011). The data are presented as an SQL dump of a PostgreSQL database (version 8.4.4). An outline of the entity-relationship diagram of the database is given in an

  4. Development of Web GIS for complex processing and visualization of climate geospatial datasets as an integral part of dedicated Virtual Research Environment

    Science.gov (United States)

    Gordov, Evgeny; Okladnikov, Igor; Titov, Alexander

    2017-04-01

    For comprehensive usage of large geospatial meteorological and climate datasets it is necessary to create a distributed software infrastructure based on the spatial data infrastructure (SDI) approach. Currently, it is generally accepted that the development of client applications as integrated elements of such infrastructure should be based on the usage of modern web and GIS technologies. The paper describes the Web GIS for complex processing and visualization of geospatial (mainly in NetCDF and PostGIS formats) datasets as an integral part of the dedicated Virtual Research Environment for comprehensive study of ongoing and possible future climate change, and analysis of their implications, providing full information and computing support for the study of economic, political and social consequences of global climate change at the global and regional levels. The Web GIS consists of two basic software parts: 1. Server-side part representing PHP applications of the SDI geoportal and realizing the functionality of interaction with computational core backend, WMS/WFS/WPS cartographical services, as well as implementing an open API for browser-based client software. Being the secondary one, this part provides a limited set of procedures accessible via standard HTTP interface. 2. Front-end part representing Web GIS client developed according to a "single page application" technology based on JavaScript libraries OpenLayers (http://openlayers.org/), ExtJS (https://www.sencha.com/products/extjs), GeoExt (http://geoext.org/). It implements application business logic and provides intuitive user interface similar to the interface of such popular desktop GIS applications, as uDIG, QuantumGIS etc. Boundless/OpenGeo architecture was used as a basis for Web-GIS client development. According to general INSPIRE requirements to data visualization Web GIS provides such standard functionality as data overview, image navigation, scrolling, scaling and graphical overlay, displaying map

  5. SIMADL: Simulated Activities of Daily Living Dataset

    Directory of Open Access Journals (Sweden)

    Talal Alshammari

    2018-04-01

    Full Text Available With the realisation of the Internet of Things (IoT paradigm, the analysis of the Activities of Daily Living (ADLs, in a smart home environment, is becoming an active research domain. The existence of representative datasets is a key requirement to advance the research in smart home design. Such datasets are an integral part of the visualisation of new smart home concepts as well as the validation and evaluation of emerging machine learning models. Machine learning techniques that can learn ADLs from sensor readings are used to classify, predict and detect anomalous patterns. Such techniques require data that represent relevant smart home scenarios, for training, testing and validation. However, the development of such machine learning techniques is limited by the lack of real smart home datasets, due to the excessive cost of building real smart homes. This paper provides two datasets for classification and anomaly detection. The datasets are generated using OpenSHS, (Open Smart Home Simulator, which is a simulation software for dataset generation. OpenSHS records the daily activities of a participant within a virtual environment. Seven participants simulated their ADLs for different contexts, e.g., weekdays, weekends, mornings and evenings. Eighty-four files in total were generated, representing approximately 63 days worth of activities. Forty-two files of classification of ADLs were simulated in the classification dataset and the other forty-two files are for anomaly detection problems in which anomalous patterns were simulated and injected into the anomaly detection dataset.

  6. SIAM 2007 Text Mining Competition dataset

    Data.gov (United States)

    National Aeronautics and Space Administration — Subject Area: Text Mining Description: This is the dataset used for the SIAM 2007 Text Mining competition. This competition focused on developing text mining...

  7. Daily precipitation grids for Austria since 1961—development and evaluation of a spatial dataset for hydroclimatic monitoring and modelling

    Science.gov (United States)

    Hiebl, Johann; Frei, Christoph

    2018-04-01

    Spatial precipitation datasets that are long-term consistent, highly resolved and extend over several decades are an increasingly popular basis for modelling and monitoring environmental processes and planning tasks in hydrology, agriculture, energy resources management, etc. Here, we present a grid dataset of daily precipitation for Austria meant to promote such applications. It has a grid spacing of 1 km, extends back till 1961 and is continuously updated. It is constructed with the classical two-tier analysis, involving separate interpolations for mean monthly precipitation and daily relative anomalies. The former was accomplished by kriging with topographic predictors as external drift utilising 1249 stations. The latter is based on angular distance weighting and uses 523 stations. The input station network was kept largely stationary over time to avoid artefacts on long-term consistency. Example cases suggest that the new analysis is at least as plausible as previously existing datasets. Cross-validation and comparison against experimental high-resolution observations (WegenerNet) suggest that the accuracy of the dataset depends on interpretation. Users interpreting grid point values as point estimates must expect systematic overestimates for light and underestimates for heavy precipitation as well as substantial random errors. Grid point estimates are typically within a factor of 1.5 from in situ observations. Interpreting grid point values as area mean values, conditional biases are reduced and the magnitude of random errors is considerably smaller. Together with a similar dataset of temperature, the new dataset (SPARTACUS) is an interesting basis for modelling environmental processes, studying climate change impacts and monitoring the climate of Austria.

  8. Report of the Association of Coloproctology of Great Britain and Ireland/British Society of Gastroenterology Colorectal Polyp Working Group: the development of a complex colorectal polyp minimum dataset.

    Science.gov (United States)

    Chattree, A; Barbour, J A; Thomas-Gibson, S; Bhandari, P; Saunders, B P; Veitch, A M; Anderson, J; Rembacken, B J; Loughrey, M B; Pullan, R; Garrett, W V; Lewis, G; Dolwani, S; Rutter, M D

    2017-01-01

    The management of large non-pedunculated colorectal polyps (LNPCPs) is complex, with widespread variation in management and outcome, even amongst experienced clinicians. Variations in the assessment and decision-making processes are likely to be a major factor in this variability. The creation of a standardized minimum dataset to aid decision-making may therefore result in improved clinical management. An official working group of 13 multidisciplinary specialists was appointed by the Association of Coloproctology of Great Britain and Ireland (ACPGBI) and the British Society of Gastroenterology (BSG) to develop a minimum dataset on LNPCPs. The literature review used to structure the ACPGBI/BSG guidelines for the management of LNPCPs was used by a steering subcommittee to identify various parameters pertaining to the decision-making processes in the assessment and management of LNPCPs. A modified Delphi consensus process was then used for voting on proposed parameters over multiple voting rounds with at least 80% agreement defined as consensus. The minimum dataset was used in a pilot process to ensure rigidity and usability. A 23-parameter minimum dataset with parameters relating to patient and lesion factors, including six parameters relating to image retrieval, was formulated over four rounds of voting with two pilot processes to test rigidity and usability. This paper describes the development of the first reported evidence-based and expert consensus minimum dataset for the management of LNPCPs. It is anticipated that this dataset will allow comprehensive and standardized lesion assessment to improve decision-making in the assessment and management of LNPCPs. Colorectal Disease © 2016 The Association of Coloproctology of Great Britain and Ireland.

  9. 76 FR 80417 - Request for Information: Public Access to Digital Data Resulting From Federally Funded Scientific...

    Science.gov (United States)

    2011-12-23

    ... specific data management and data sharing requirements for specific types of projects, such as genome-wide... disciplines and different types of digital data when developing policies on the management of data? (4) How... to provide recommendations on approaches for ensuring long-term stewardship and encouraging broad...

  10. Erotized, AIDS-HIV information on public-access television: a study of obscenity, state censorship and cultural resistance.

    Science.gov (United States)

    Lukenbill, W B

    1998-06-01

    This study analyzes court records of a county-level obscenity trial in Austin, Texas, and the appeal of the guilty verdict beginning with a Texas appellate court up to the U.S. Supreme Court of two individuals who broadcast erotized AIDS and HIV safer sex information on a public-access cable television. The trial and appellate court decisions are reviewed in terms of argument themes, and the nature of sexual value controversy is outlined. Erotic materials often conflict with broad-based sexual and community values, and providing erotized HIV and AIDS information products can be a form of radical political action designed to force societal change. This study raises question as to how this trial and this type of informational product might affect the programs and activities of information resource centers, community-based organizations, libraries, and the overall mission of public health education.

  11. Simulation of Smart Home Activity Datasets

    Directory of Open Access Journals (Sweden)

    Jonathan Synnott

    2015-06-01

    Full Text Available A globally ageing population is resulting in an increased prevalence of chronic conditions which affect older adults. Such conditions require long-term care and management to maximize quality of life, placing an increasing strain on healthcare resources. Intelligent environments such as smart homes facilitate long-term monitoring of activities in the home through the use of sensor technology. Access to sensor datasets is necessary for the development of novel activity monitoring and recognition approaches. Access to such datasets is limited due to issues such as sensor cost, availability and deployment time. The use of simulated environments and sensors may address these issues and facilitate the generation of comprehensive datasets. This paper provides a review of existing approaches for the generation of simulated smart home activity datasets, including model-based approaches and interactive approaches which implement virtual sensors, environments and avatars. The paper also provides recommendation for future work in intelligent environment simulation.

  12. Simulation of Smart Home Activity Datasets.

    Science.gov (United States)

    Synnott, Jonathan; Nugent, Chris; Jeffers, Paul

    2015-06-16

    A globally ageing population is resulting in an increased prevalence of chronic conditions which affect older adults. Such conditions require long-term care and management to maximize quality of life, placing an increasing strain on healthcare resources. Intelligent environments such as smart homes facilitate long-term monitoring of activities in the home through the use of sensor technology. Access to sensor datasets is necessary for the development of novel activity monitoring and recognition approaches. Access to such datasets is limited due to issues such as sensor cost, availability and deployment time. The use of simulated environments and sensors may address these issues and facilitate the generation of comprehensive datasets. This paper provides a review of existing approaches for the generation of simulated smart home activity datasets, including model-based approaches and interactive approaches which implement virtual sensors, environments and avatars. The paper also provides recommendation for future work in intelligent environment simulation.

  13. The CMS dataset bookkeeping service

    Science.gov (United States)

    Afaq, A.; Dolgert, A.; Guo, Y.; Jones, C.; Kosyakov, S.; Kuznetsov, V.; Lueking, L.; Riley, D.; Sekhri, V.

    2008-07-01

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, Command Line, and a Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems.

  14. The CMS dataset bookkeeping service

    Energy Technology Data Exchange (ETDEWEB)

    Afaq, A; Guo, Y; Kosyakov, S; Lueking, L; Sekhri, V [Fermilab, Batavia, Illinois 60510 (United States); Dolgert, A; Jones, C; Kuznetsov, V; Riley, D [Cornell University, Ithaca, New York 14850 (United States)

    2008-07-15

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, Command Line, and a Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems.

  15. The CMS dataset bookkeeping service

    International Nuclear Information System (INIS)

    Afaq, A; Guo, Y; Kosyakov, S; Lueking, L; Sekhri, V; Dolgert, A; Jones, C; Kuznetsov, V; Riley, D

    2008-01-01

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, Command Line, and a Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems

  16. The CMS dataset bookkeeping service

    International Nuclear Information System (INIS)

    Afaq, Anzar; Dolgert, Andrew; Guo, Yuyi; Jones, Chris; Kosyakov, Sergey; Kuznetsov, Valentin; Lueking, Lee; Riley, Dan; Sekhri, Vijay

    2007-01-01

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, Command Line, and a Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems

  17. Thirty Years, One Million Spectra: Public Access to the SAO Spectral Archives

    Science.gov (United States)

    Mink, J.; Moran, S.

    2015-09-01

    Over the last 30 years, the SAO Telescope Data Center has reduced and archived over 1,000,000 spectra, consisting of 287,000 spectra from five high dispersion Echelle spectrographs and 717,000 spectra from four low dispersion spectrographs, across three telescopes. 151,000 spectra from six instruments are currently online and publicly available, covering many interesting objects in the northern sky, including most of the galaxies in the Updated Zwicky Catalog which are reachable through NED or Simbad. A majority of the high dispersion spectra will soon be made public, as will more data from the MMT multi-fiber spectrographs. Many objects in the archive have multiple spectra over time, which make them a valuable resource for archival time-domain studies. We are now developing a system to make all of the public spectra more easily searchable and viewable through the Virtual Observatory.

  18. Enhancing Public Access to Relevant and Valued Medical Information: Fresh Directions for RadiologyInfo.org.

    Science.gov (United States)

    Rubin, Geoffrey D; Krishnaraj, Arun; Mahesh, Mahadevappa; Rajendran, Ramji R; Fishman, Elliot K

    2017-05-01

    RadiologyInfo.org is a public information portal designed to support patient care and broaden public awareness of the essential role radiology plays in overall patient health care. Over the past 14 years, RadiologyInfo.org has evolved considerably to provide access to more than 220 mixed-media descriptions of tests, treatments, and diseases through a spectrum of mobile and desktop platforms, social media, and downloadable documents in both English and Spanish. In 2014, the RSNA-ACR Public Information Website Committee, which stewards RadiologyInfo.org, developed 3- to 5-year strategic and implementation plans for the website. The process was informed by RadiologyInfo.org user surveys, formal stakeholder interviews, focus groups, and usability testing. Metrics were established as key performance indicators to assess progress toward the stated goals of (1) optimizing content to enhance patient-centeredness, (2) enhancing reach and engagement, and (3) maintaining sustainability. Major changes resulting from this process include a complete redesign of the website, the replacement of text-rich PowerPoint presentations with conversational videos, and the development of an affiliate network. Over the past year, visits to RadiologyInfo.org have increased by 60.27% to 1,424,523 in August 2016 from 235 countries and territories. Twenty-two organizations have affiliated with RadiologyInfo.org with new organizations being added on a monthly basis. RadiologyInfo provides a tangible demonstration of how radiologists can engage directly with the global public to educate them on the value of radiology in their health care and to allay concerns and dispel misconceptions. Regular self-assessment and responsive planning will ensure its continued growth and relevance. Copyright © 2016 American College of Radiology. Published by Elsevier Inc. All rights reserved.

  19. National Elevation Dataset

    Science.gov (United States)

    ,

    2002-01-01

    The National Elevation Dataset (NED) is a new raster product assembled by the U.S. Geological Survey. NED is designed to provide National elevation data in a seamless form with a consistent datum, elevation unit, and projection. Data corrections were made in the NED assembly process to minimize artifacts, perform edge matching, and fill sliver areas of missing data. NED has a resolution of one arc-second (approximately 30 meters) for the conterminous United States, Hawaii, Puerto Rico and the island territories and a resolution of two arc-seconds for Alaska. NED data sources have a variety of elevation units, horizontal datums, and map projections. In the NED assembly process the elevation values are converted to decimal meters as a consistent unit of measure, NAD83 is consistently used as horizontal datum, and all the data are recast in a geographic projection. Older DEM's produced by methods that are now obsolete have been filtered during the NED assembly process to minimize artifacts that are commonly found in data produced by these methods. Artifact removal greatly improves the quality of the slope, shaded-relief, and synthetic drainage information that can be derived from the elevation data. Figure 2 illustrates the results of this artifact removal filtering. NED processing also includes steps to adjust values where adjacent DEM's do not match well, and to fill sliver areas of missing data between DEM's. These processing steps ensure that NED has no void areas and artificial discontinuities have been minimized. The artifact removal filtering process does not eliminate all of the artifacts. In areas where the only available DEM is produced by older methods, then "striping" may still occur.

  20. NP-PAH Interaction Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — Dataset presents concentrations of organic pollutants, such as polyaromatic hydrocarbon compounds, in water samples. Water samples of known volume and concentration...

  1. Development of a quality management system (QMS) for borehole investigations. (2) Evaluation of applicability of QMS methodology for the hydrochemical dataset

    International Nuclear Information System (INIS)

    Kunimaru, Takanori; Ota, Kunio; Amano, Kenji; Alexander, W. Russell

    2011-01-01

    An appropriate QMS, which is among the first tools required for repository site characterisation, will save on effort by reducing errors and the requirement to resample and reanalyse - but this can only be guaranteed by continuously assessing if the system is truly fit-for-purpose and amending it as necessary based on the practical experience of the end-users on-site. A QA audit of hydrochemical datasets for boreholes HDB-1-11 from Horonobe URL project by JAEA has been carried out by the application of a formal QA analysis which is based on the methodology previously employed for groundwaters during the recent site characterisation programme in Sweden. This methodology has been successfully applied to the groundwaters of the fractured crystalline rocks of the Fennoscandian Shield and has now been adapted and applied to some of the ground- and porewaters of the Horonobe URL area. This paper will present this system in the context of the Japanese national programme and elucidate improvements made during hands-on application of the borehole investigation QMS. Further improvements foreseen for the future will also be discussed with a view of removing inter-operator variability as much as is possible. Only then can confidence by placed in URL project or repository site hydrochemical datasets. (author)

  2. Public Access Defibrillation

    DEFF Research Database (Denmark)

    Agerskov, Marianne; Nielsen, Anne Møller; Hansen, Carolina Malta

    2015-01-01

    BACKGROUND: In Copenhagen, a volunteer-based Automated External Defibrillator (AED) network provides a unique opportunity to assess AED use. We aimed to determine the proportion of Out-of-Hospital Cardiac Arrest (OHCA) where an AED was applied before arrival of the ambulance, and the proportion o...

  3. Editorial: Datasets for Learning Analytics

    NARCIS (Netherlands)

    Dietze, Stefan; George, Siemens; Davide, Taibi; Drachsler, Hendrik

    2018-01-01

    The European LinkedUp and LACE (Learning Analytics Community Exchange) project have been responsible for setting up a series of data challenges at the LAK conferences 2013 and 2014 around the LAK dataset. The LAK datasets consists of a rich collection of full text publications in the domain of

  4. New developments in delivering public access to data from the National Center for Computational Toxicology at the EPA

    Science.gov (United States)

    Researchers at EPA’s National Center for Computational Toxicology integrate advances in biology, chemistry, and computer science to examine the toxicity of chemicals and help prioritize chemicals for further research based on potential human health risks. The goal of this researc...

  5. Do no harm: the role of community pharmacists in regulating public access to prescription drugs in Saudi Arabia.

    Science.gov (United States)

    Bahnassi, Anas

    2016-04-01

    Pharmacists have a crucial role to ensure regulated public access to prescription drugs. The study aimed to investigate the views of community pharmacists practising in Saudi Arabia on their role in the unauthorised supply of prescription drugs, consider the possible contributory factors and report pharmacists' suggested strategies to regulate supply. One hundred community pharmacists were invited to participate in an interview-based survey, including questions on demographic characteristics, and the unauthorised supply of prescription drugs. Descriptive statistics were conducted, and associations between categorical responses tested; a P value of ≤0.05 was considered significant. Responses to open questions were analysed thematically. In Saudi Arabia, there is widespread unregulated supply of prescription drugs; pharmacists are under pressure from patients to provide prescription drugs for a wide range of clinical conditions. There are safety and appropriateness concerns when drugs are provided based on patient demand rather than clinical need. Pharmacists do not maintain patient records with information on drugs supplied and associated actions. While most pharmacists supply prescription drugs without the necessary prescriber authorisation, they also this may jeopardise patients safety. While we have many concerns about this practice its present form, we believe pharmacists should have certain prescribing privileges within their areas of competence. A legal framework is needed to guarantee proper pharmacists' training, support, mentorship and access to the tools required to provide safe pharmacy practice. © 2015 Royal Pharmaceutical Society.

  6. NCI Program for Natural Product Discovery: A Publicly-Accessible Library of Natural Product Fractions for High-Throughput Screening.

    Science.gov (United States)

    Thornburg, Christopher C; Britt, John R; Evans, Jason R; Akee, Rhone K; Whitt, James A; Trinh, Spencer K; Harris, Matthew J; Thompson, Jerell R; Ewing, Teresa L; Shipley, Suzanne M; Grothaus, Paul G; Newman, David J; Schneider, Joel P; Grkovic, Tanja; O'Keefe, Barry R

    2018-06-13

    The US National Cancer Institute's (NCI) Natural Product Repository is one of the world's largest, most diverse collections of natural products containing over 230,000 unique extracts derived from plant, marine, and microbial organisms that have been collected from biodiverse regions throughout the world. Importantly, this national resource is available to the research community for the screening of extracts and the isolation of bioactive natural products. However, despite the success of natural products in drug discovery, compatibility issues that make extracts challenging for liquid handling systems, extended timelines that complicate natural product-based drug discovery efforts and the presence of pan-assay interfering compounds have reduced enthusiasm for the high-throughput screening (HTS) of crude natural product extract libraries in targeted assay systems. To address these limitations, the NCI Program for Natural Product Discovery (NPNPD), a newly launched, national program to advance natural product discovery technologies and facilitate the discovery of structurally defined, validated lead molecules ready for translation will create a prefractionated library from over 125,000 natural product extracts with the aim of producing a publicly-accessible, HTS-amenable library of >1,000,000 fractions. This library, representing perhaps the largest accumulation of natural-product based fractions in the world, will be made available free of charge in 384-well plates for screening against all disease states in an effort to reinvigorate natural product-based drug discovery.

  7. National Elevation Dataset (NED)

    Data.gov (United States)

    Kansas Data Access and Support Center — The U.S. Geological Survey has developed a National Elevation Database (NED). The NED is a seamless mosaic of best-available elevation data. The 7.5-minute elevation...

  8. Public Access to Library Automation. Clinic on Library Applications of Data Processing (17th, University of Illinois at Urbana-Champaign, 1980).

    Science.gov (United States)

    Divilbiss, J. L., Ed.

    Eight studies by experts in the field of information retrieval examine aspects of public use of such automated systems as online catalogs in libraries. Ward Shaw discusses "Design Principles for Public Access," outlining desirable characteristics of an information retrieval system. Allen Avner and H. George Friedman, Jr. treat problems…

  9. Is There a Standard Default Keyword Operator? A Bibliometric Analysis of Processing Options Chosen by Libraries To Execute Keyword Searches in Online Public Access Catalogs.

    Science.gov (United States)

    Klein, Gary M.

    1994-01-01

    Online public access catalogs from 67 libraries using NOTIS software were searched using Internet connections to determine the positional operators selected as the default keyword operator on each catalog. Results indicate the lack of a processing standard for keyword searches. Five tables provide information. (Author/AEF)

  10. Comparison of Shallow Survey 2012 Multibeam Datasets

    Science.gov (United States)

    Ramirez, T. M.

    2012-12-01

    The purpose of the Shallow Survey common dataset is a comparison of the different technologies utilized for data acquisition in the shallow survey marine environment. The common dataset consists of a series of surveys conducted over a common area of seabed using a variety of systems. It provides equipment manufacturers the opportunity to showcase their latest systems while giving hydrographic researchers and scientists a chance to test their latest algorithms on the dataset so that rigorous comparisons can be made. Five companies collected data for the Common Dataset in the Wellington Harbor area in New Zealand between May 2010 and May 2011; including Kongsberg, Reson, R2Sonic, GeoAcoustics, and Applied Acoustics. The Wellington harbor and surrounding coastal area was selected since it has a number of well-defined features, including the HMNZS South Seas and HMNZS Wellington wrecks, an armored seawall constructed of Tetrapods and Akmons, aquifers, wharves and marinas. The seabed inside the harbor basin is largely fine-grained sediment, with gravel and reefs around the coast. The area outside the harbor on the southern coast is an active environment, with moving sand and exposed reefs. A marine reserve is also in this area. For consistency between datasets, the coastal research vessel R/V Ikatere and crew were used for all surveys conducted for the common dataset. Using Triton's Perspective processing software multibeam datasets collected for the Shallow Survey were processed for detail analysis. Datasets from each sonar manufacturer were processed using the CUBE algorithm developed by the Center for Coastal and Ocean Mapping/Joint Hydrographic Center (CCOM/JHC). Each dataset was gridded at 0.5 and 1.0 meter resolutions for cross comparison and compliance with International Hydrographic Organization (IHO) requirements. Detailed comparisons were made of equipment specifications (transmit frequency, number of beams, beam width), data density, total uncertainty, and

  11. Turkey Run Landfill Emissions Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — landfill emissions measurements for the Turkey run landfill in Georgia. This dataset is associated with the following publication: De la Cruz, F., R. Green, G....

  12. Dataset of NRDA emission data

    Data.gov (United States)

    U.S. Environmental Protection Agency — Emissions data from open air oil burns. This dataset is associated with the following publication: Gullett, B., J. Aurell, A. Holder, B. Mitchell, D. Greenwell, M....

  13. Chemical product and function dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — Merged product weight fraction and chemical function data. This dataset is associated with the following publication: Isaacs , K., M. Goldsmith, P. Egeghy , K....

  14. Prediction of the neuropeptidomes of members of the Astacidea (Crustacea, Decapoda) using publicly accessible transcriptome shotgun assembly (TSA) sequence data.

    Science.gov (United States)

    Christie, Andrew E; Chi, Megan

    2015-12-01

    The decapod infraorder Astacidea is comprised of clawed lobsters and freshwater crayfish. Due to their economic importance and their use as models for investigating neurochemical signaling, much work has focused on elucidating their neurochemistry, particularly their peptidergic systems. Interestingly, no astacidean has been the subject of large-scale peptidomic analysis via in silico transcriptome mining, this despite growing transcriptomic resources for members of this taxon. Here, the publicly accessible astacidean transcriptome shotgun assembly data were mined for putative peptide-encoding transcripts; these sequences were used to predict the structures of mature neuropeptides. One hundred seventy-six distinct peptides were predicted for Procambarus clarkii, including isoforms of adipokinetic hormone-corazonin-like peptide (ACP), allatostatin A (AST-A), allatostatin B, allatostatin C (AST-C) bursicon α, bursicon β, CCHamide, crustacean hyperglycemic hormone (CHH)/ion transport peptide (ITP), diuretic hormone 31 (DH31), eclosion hormone (EH), FMRFamide-like peptide, GSEFLamide, intocin, leucokinin, neuroparsin, neuropeptide F, pigment dispersing hormone, pyrokinin, RYamide, short neuropeptide F (sNPF), SIFamide, sulfakinin and tachykinin-related peptide (TRP). Forty-six distinct peptides, including isoforms of AST-A, AST-C, bursicon α, CCHamide, CHH/ITP, DH31, EH, intocin, myosuppressin, neuroparsin, red pigment concentrating hormone, sNPF and TRP, were predicted for Pontastacus leptodactylus, with a bursicon β and a neuroparsin predicted for Cherax quadricarinatus. The identification of ACP is the first from a decapod, while the predictions of CCHamide, EH, GSEFLamide, intocin, neuroparsin and RYamide are firsts for the Astacidea. Collectively, these data greatly expand the catalog of known astacidean neuropeptides and provide a foundation for functional studies of peptidergic signaling in members of this decapod infraorder. Copyright © 2015 Elsevier Inc

  15. Pattern Analysis On Banking Dataset

    Directory of Open Access Journals (Sweden)

    Amritpal Singh

    2015-06-01

    Full Text Available Abstract Everyday refinement and development of technology has led to an increase in the competition between the Tech companies and their going out of way to crack the system andbreak down. Thus providing Data mining a strategically and security-wise important area for many business organizations including banking sector. It allows the analyzes of important information in the data warehouse and assists the banks to look for obscure patterns in a group and discover unknown relationship in the data.Banking systems needs to process ample amount of data on daily basis related to customer information their credit card details limit and collateral details transaction details risk profiles Anti Money Laundering related information trade finance data. Thousands of decisionsbased on the related data are taken in a bank daily. This paper analyzes the banking dataset in the weka environment for the detection of interesting patterns based on its applications ofcustomer acquisition customer retention management and marketing and management of risk fraudulence detections.

  16. The NOAA Dataset Identifier Project

    Science.gov (United States)

    de la Beaujardiere, J.; Mccullough, H.; Casey, K. S.

    2013-12-01

    The US National Oceanic and Atmospheric Administration (NOAA) initiated a project in 2013 to assign persistent identifiers to datasets archived at NOAA and to create informational landing pages about those datasets. The goals of this project are to enable the citation of datasets used in products and results in order to help provide credit to data producers, to support traceability and reproducibility, and to enable tracking of data usage and impact. A secondary goal is to encourage the submission of datasets for long-term preservation, because only archived datasets will be eligible for a NOAA-issued identifier. A team was formed with representatives from the National Geophysical, Oceanographic, and Climatic Data Centers (NGDC, NODC, NCDC) to resolve questions including which identifier scheme to use (answer: Digital Object Identifier - DOI), whether or not to embed semantics in identifiers (no), the level of granularity at which to assign identifiers (as coarsely as reasonable), how to handle ongoing time-series data (do not break into chunks), creation mechanism for the landing page (stylesheet from formal metadata record preferred), and others. Decisions made and implementation experience gained will inform the writing of a Data Citation Procedural Directive to be issued by the Environmental Data Management Committee in 2014. Several identifiers have been issued as of July 2013, with more on the way. NOAA is now reporting the number as a metric to federal Open Government initiatives. This paper will provide further details and status of the project.

  17. The Harvard organic photovoltaic dataset.

    Science.gov (United States)

    Lopez, Steven A; Pyzer-Knapp, Edward O; Simm, Gregor N; Lutzow, Trevor; Li, Kewei; Seress, Laszlo R; Hachmann, Johannes; Aspuru-Guzik, Alán

    2016-09-27

    The Harvard Organic Photovoltaic Dataset (HOPV15) presented in this work is a collation of experimental photovoltaic data from the literature, and corresponding quantum-chemical calculations performed over a range of conformers, each with quantum chemical results using a variety of density functionals and basis sets. It is anticipated that this dataset will be of use in both relating electronic structure calculations to experimental observations through the generation of calibration schemes, as well as for the creation of new semi-empirical methods and the benchmarking of current and future model chemistries for organic electronic applications.

  18. The Harvard organic photovoltaic dataset

    Science.gov (United States)

    Lopez, Steven A.; Pyzer-Knapp, Edward O.; Simm, Gregor N.; Lutzow, Trevor; Li, Kewei; Seress, Laszlo R.; Hachmann, Johannes; Aspuru-Guzik, Alán

    2016-01-01

    The Harvard Organic Photovoltaic Dataset (HOPV15) presented in this work is a collation of experimental photovoltaic data from the literature, and corresponding quantum-chemical calculations performed over a range of conformers, each with quantum chemical results using a variety of density functionals and basis sets. It is anticipated that this dataset will be of use in both relating electronic structure calculations to experimental observations through the generation of calibration schemes, as well as for the creation of new semi-empirical methods and the benchmarking of current and future model chemistries for organic electronic applications. PMID:27676312

  19. Tracking past changes in lake-water phosphorus with a 251-lake calibration dataset in British Columbia: tool development and application in a multiproxy assessment of eutrophication and recovery in Osoyoos Lake, a transboundary lake in western North America

    Directory of Open Access Journals (Sweden)

    Brian Fraser Cumming

    2015-08-01

    Full Text Available Recently there has been an active discussion about the potential and challenges of tracking past lake-water trophic state using paleolimnological methods. Herein, we present analyses of the relationship between modern-day diatom assemblages from the surface sediments of 251 fresh-water lakes from British Columbia and contemporary limnological variables. Total phosphorus (TP was significantly related to the modern distribution of diatom assemblages. The large size of this new calibration dataset resulted in higher abundances and occurrences of many diatom taxa thereby allowing a more accurate quantification of the optima of diatom taxa to TP in comparison to previous smaller calibration datasets. Robust diatom-based TP inference models with a moderate predictive power were developed using weighted-averaging regression and calibration. Information from the calibration dataset was used to interpret changes in the diatom assemblages from the north and south basins of Osoyoos Lake, in conjunction with fossil pigment analyses. Osoyoos Lake is a large salmon-bearing lake that straddles the British Columbia-Washington border and has undergone cultural eutrophication followed by recovery due to substantial mitigation efforts in managing sources of nutrients. Both diatom assemblages and sedimentary pigments indicate that eutrophication began c. 1950 in the north basin and c. 1960 in the southern basin, reaching peak levels of production between 1960 and 1990, after which decreases in sedimentary pigments occurred, as well as decreases in the relative abundance and concentrations of diatom taxa inferred to have high TP optima. Post-1990 changes in the diatom assemblage suggests conditions have become less productive with a shift to taxa more indicative of lower TP optima in concert with measurements of declining TP, two of these diatom taxa, Cyclotella comensis and Cyclotella gordonensis, that were previously rare are now abundant.

  20. Querying Large Biological Network Datasets

    Science.gov (United States)

    Gulsoy, Gunhan

    2013-01-01

    New experimental methods has resulted in increasing amount of genetic interaction data to be generated every day. Biological networks are used to store genetic interaction data gathered. Increasing amount of data available requires fast large scale analysis methods. Therefore, we address the problem of querying large biological network datasets.…

  1. Fluxnet Synthesis Dataset Collaboration Infrastructure

    Energy Technology Data Exchange (ETDEWEB)

    Agarwal, Deborah A. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Humphrey, Marty [Univ. of Virginia, Charlottesville, VA (United States); van Ingen, Catharine [Microsoft. San Francisco, CA (United States); Beekwilder, Norm [Univ. of Virginia, Charlottesville, VA (United States); Goode, Monte [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Jackson, Keith [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Rodriguez, Matt [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Weber, Robin [Univ. of California, Berkeley, CA (United States)

    2008-02-06

    The Fluxnet synthesis dataset originally compiled for the La Thuile workshop contained approximately 600 site years. Since the workshop, several additional site years have been added and the dataset now contains over 920 site years from over 240 sites. A data refresh update is expected to increase those numbers in the next few months. The ancillary data describing the sites continues to evolve as well. There are on the order of 120 site contacts and 60proposals have been approved to use thedata. These proposals involve around 120 researchers. The size and complexity of the dataset and collaboration has led to a new approach to providing access to the data and collaboration support and the support team attended the workshop and worked closely with the attendees and the Fluxnet project office to define the requirements for the support infrastructure. As a result of this effort, a new website (http://www.fluxdata.org) has been created to provide access to the Fluxnet synthesis dataset. This new web site is based on a scientific data server which enables browsing of the data on-line, data download, and version tracking. We leverage database and data analysis tools such as OLAP data cubes and web reports to enable browser and Excel pivot table access to the data.

  2. Really big data: Processing and analysis of large datasets

    Science.gov (United States)

    Modern animal breeding datasets are large and getting larger, due in part to the recent availability of DNA data for many animals. Computational methods for efficiently storing and analyzing those data are under development. The amount of storage space required for such datasets is increasing rapidl...

  3. Enhancing Scientific Collaboration, Transparency, and Public Access: Utilizing the Second Life Platform to Convene a Scientific Conference in 3-D Virtual Space

    Science.gov (United States)

    McGee, B. W.

    2006-12-01

    Recent studies reveal a general mistrust of science as well as a distorted perception of the scientific method by the public at-large. Concurrently, the number of science undergraduate and graduate students is in decline. By taking advantage of emergent technologies not only for direct public outreach but also to enhance public accessibility to the science process, it may be possible to both begin a reversal of popular scientific misconceptions and to engage a new generation of scientists. The Second Life platform is a 3-D virtual world produced and operated by Linden Research, Inc., a privately owned company instituted to develop new forms of immersive entertainment. Free and downloadable to the public, Second Life offers an imbedded physics engine, streaming audio and video capability, and unlike other "multiplayer" software, the objects and inhabitants of Second Life are entirely designed and created by its users, providing an open-ended experience without the structure of a traditional video game. Already, educational institutions, virtual museums, and real-world businesses are utilizing Second Life for teleconferencing, pre-visualization, and distance education, as well as to conduct traditional business. However, the untapped potential of Second Life lies in its versatility, where the limitations of traditional scientific meeting venues do not exist, and attendees need not be restricted by prohibitive travel costs. It will be shown that the Second Life system enables scientific authors and presenters at a "virtual conference" to display figures and images at full resolution, employ audio-visual content typically not available to conference organizers, and to perform demonstrations or premier three-dimensional renderings of objects, processes, or information. An enhanced presentation like those possible with Second Life would be more engaging to non- scientists, and such an event would be accessible to the general users of Second Life, who could have an

  4. A geospatial dataset for U.S. hurricane storm surge and sea-level rise vulnerability: Development and case study applications

    Directory of Open Access Journals (Sweden)

    Megan C. Maloney

    2014-01-01

    Full Text Available The consequences of future sea-level rise for coastal communities are a priority concern arising from anthropogenic climate change. Here, previously published methods are scaled up in order to undertake a first pass assessment of exposure to hurricane storm surge and sea-level rise for the U.S. Gulf of Mexico and Atlantic coasts. Sea-level rise scenarios ranging from +0.50 to +0.82 m by 2100 increased estimates of the area exposed to inundation by 4–13% and 7–20%, respectively, among different Saffir-Simpson hurricane intensity categories. Potential applications of these hazard layers for vulnerability assessment are demonstrated with two contrasting case studies: potential exposure of current energy infrastructure in the U.S. Southeast and exposure of current and future housing along both the Gulf and Atlantic Coasts. Estimates of the number of Southeast electricity generation facilities potentially exposed to hurricane storm surge ranged from 69 to 291 for category 1 and category 5 storms, respectively. Sea-level rise increased the number of exposed facilities by 6–60%, depending on the sea-level rise scenario and the intensity of the hurricane under consideration. Meanwhile, estimates of the number of housing units currently exposed to hurricane storm surge ranged from 4.1 to 9.4 million for category 1 and category 4 storms, respectively, while exposure for category 5 storms was estimated at 7.1 million due to the absence of landfalling category 5 hurricanes in the New England region. Housing exposure was projected to increase 83–230% by 2100 among different sea-level rise and housing scenarios, with the majority of this increase attributed to future housing development. These case studies highlight the utility of geospatial hazard information for national-scale coastal exposure or vulnerability assessment as well as the importance of future socioeconomic development in the assessment of coastal vulnerability.

  5. Automatic processing of multimodal tomography datasets.

    Science.gov (United States)

    Parsons, Aaron D; Price, Stephen W T; Wadeson, Nicola; Basham, Mark; Beale, Andrew M; Ashton, Alun W; Mosselmans, J Frederick W; Quinn, Paul D

    2017-01-01

    With the development of fourth-generation high-brightness synchrotrons on the horizon, the already large volume of data that will be collected on imaging and mapping beamlines is set to increase by orders of magnitude. As such, an easy and accessible way of dealing with such large datasets as quickly as possible is required in order to be able to address the core scientific problems during the experimental data collection. Savu is an accessible and flexible big data processing framework that is able to deal with both the variety and the volume of data of multimodal and multidimensional scientific datasets output such as those from chemical tomography experiments on the I18 microfocus scanning beamline at Diamond Light Source.

  6. CERC Dataset (Full Hadza Data)

    DEFF Research Database (Denmark)

    2016-01-01

    The dataset includes demographic, behavioral, and religiosity data from eight different populations from around the world. The samples were drawn from: (1) Coastal and (2) Inland Tanna, Vanuatu; (3) Hadzaland, Tanzania; (4) Lovu, Fiji; (5) Pointe aux Piment, Mauritius; (6) Pesqueiro, Brazil; (7......) Kyzyl, Tyva Republic; and (8) Yasawa, Fiji. Related publication: Purzycki, et al. (2016). Moralistic Gods, Supernatural Punishment and the Expansion of Human Sociality. Nature, 530(7590): 327-330....

  7. Information management for aged care provision in Australia: development of an aged care minimum dataset and strategies to improve quality and continuity of care.

    Science.gov (United States)

    Davis, Jenny; Morgans, Amee; Burgess, Stephen

    2016-04-01

    Efficient information systems support the provision of multi-disciplinary aged care and a variety of organisational purposes, including quality, funding, communication and continuity of care. Agreed minimum data sets enable accurate communication across multiple care settings. However, in aged care multiple and poorly integrated data collection frameworks are commonly used for client assessment, government reporting and funding purposes. To determine key information needs in aged care settings to improve information quality, information transfer, safety, quality and continuity of care to meet the complex needs of aged care clients. Modified Delphi methods involving five stages were employed by one aged care provider in Victoria, Australia, to establish stakeholder consensus for a derived minimum data set and address barriers to data quality. Eleven different aged care programs were identified; with five related data dictionaries, three minimum data sets, five program standards or quality frameworks. The remaining data collection frameworks related to diseases classification, funding, service activity reporting, and statistical standards and classifications. A total of 170 different data items collected across seven internal information systems were consolidated to a derived set of 60 core data items and aligned with nationally consistent data collection frameworks. Barriers to data quality related to inconsistencies in data items, staff knowledge, workflow, system access and configuration. The development an internal aged care minimum data set highlighted the critical role of primary data quality in the upstream and downstream use of client information; and presents a platform to build national consistency across the sector.

  8. A New Outlier Detection Method for Multidimensional Datasets

    KAUST Repository

    Abdel Messih, Mario A.

    2012-07-01

    This study develops a novel hybrid method for outlier detection (HMOD) that combines the idea of distance based and density based methods. The proposed method has two main advantages over most of the other outlier detection methods. The first advantage is that it works well on both dense and sparse datasets. The second advantage is that, unlike most other outlier detection methods that require careful parameter setting and prior knowledge of the data, HMOD is not very sensitive to small changes in parameter values within certain parameter ranges. The only required parameter to set is the number of nearest neighbors. In addition, we made a fully parallelized implementation of HMOD that made it very efficient in applications. Moreover, we proposed a new way of using the outlier detection for redundancy reduction in datasets where the confidence level that evaluates how accurate the less redundant dataset can be used to represent the original dataset can be specified by users. HMOD is evaluated on synthetic datasets (dense and mixed “dense and sparse”) and a bioinformatics problem of redundancy reduction of dataset of position weight matrices (PWMs) of transcription factor binding sites. In addition, in the process of assessing the performance of our redundancy reduction method, we developed a simple tool that can be used to evaluate the confidence level of reduced dataset representing the original dataset. The evaluation of the results shows that our method can be used in a wide range of problems.

  9. Viking Seismometer PDS Archive Dataset

    Science.gov (United States)

    Lorenz, R. D.

    2016-12-01

    The Viking Lander 2 seismometer operated successfully for over 500 Sols on the Martian surface, recording at least one likely candidate Marsquake. The Viking mission, in an era when data handling hardware (both on board and on the ground) was limited in capability, predated modern planetary data archiving, and ad-hoc repositories of the data, and the very low-level record at NSSDC, were neither convenient to process nor well-known. In an effort supported by the NASA Mars Data Analysis Program, we have converted the bulk of the Viking dataset (namely the 49,000 and 270,000 records made in High- and Event- modes at 20 and 1 Hz respectively) into a simple ASCII table format. Additionally, since wind-generated lander motion is a major component of the signal, contemporaneous meteorological data are included in summary records to facilitate correlation. These datasets are being archived at the PDS Geosciences Node. In addition to brief instrument and dataset descriptions, the archive includes code snippets in the freely-available language 'R' to demonstrate plotting and analysis. Further, we present examples of lander-generated noise, associated with the sampler arm, instrument dumps and other mechanical operations.

  10. PHYSICS PERFORMANCE AND DATASET (PPD)

    CERN Multimedia

    L. Silvestris

    2013-01-01

    The first part of the Long Shutdown period has been dedicated to the preparation of the samples for the analysis targeting the summer conferences. In particular, the 8 TeV data acquired in 2012, including most of the “parked datasets”, have been reconstructed profiting from improved alignment and calibration conditions for all the sub-detectors. A careful planning of the resources was essential in order to deliver the datasets well in time to the analysts, and to schedule the update of all the conditions and calibrations needed at the analysis level. The newly reprocessed data have undergone detailed scrutiny by the Dataset Certification team allowing to recover some of the data for analysis usage and further improving the certification efficiency, which is now at 91% of the recorded luminosity. With the aim of delivering a consistent dataset for 2011 and 2012, both in terms of conditions and release (53X), the PPD team is now working to set up a data re-reconstruction and a new MC pro...

  11. RARD: The Related-Article Recommendation Dataset

    OpenAIRE

    Beel, Joeran; Carevic, Zeljko; Schaible, Johann; Neusch, Gabor

    2017-01-01

    Recommender-system datasets are used for recommender-system evaluations, training machine-learning algorithms, and exploring user behavior. While there are many datasets for recommender systems in the domains of movies, books, and music, there are rather few datasets from research-paper recommender systems. In this paper, we introduce RARD, the Related-Article Recommendation Dataset, from the digital library Sowiport and the recommendation-as-a-service provider Mr. DLib. The dataset contains ...

  12. ATLAS File and Dataset Metadata Collection and Use

    CERN Document Server

    Albrand, S; The ATLAS collaboration; Lambert, F; Gallas, E J

    2012-01-01

    The ATLAS Metadata Interface (“AMI”) was designed as a generic cataloguing system, and as such it has found many uses in the experiment including software release management, tracking of reconstructed event sizes and control of dataset nomenclature. The primary use of AMI is to provide a catalogue of datasets (file collections) which is searchable using physics criteria. In this paper we discuss the various mechanisms used for filling the AMI dataset and file catalogues. By correlating information from different sources we can derive aggregate information which is important for physics analysis; for example the total number of events contained in dataset, and possible reasons for missing events such as a lost file. Finally we will describe some specialized interfaces which were developed for the Data Preparation and reprocessing coordinators. These interfaces manipulate information from both the dataset domain held in AMI, and the run-indexed information held in the ATLAS COMA application (Conditions and ...

  13. Passive Containment DataSet

    Science.gov (United States)

    This data is for Figures 6 and 7 in the journal article. The data also includes the two EPANET input files used for the analysis described in the paper, one for the looped system and one for the block system.This dataset is associated with the following publication:Grayman, W., R. Murray , and D. Savic. Redesign of Water Distribution Systems for Passive Containment of Contamination. JOURNAL OF THE AMERICAN WATER WORKS ASSOCIATION. American Water Works Association, Denver, CO, USA, 108(7): 381-391, (2016).

  14. Homogenised Australian climate datasets used for climate change monitoring

    International Nuclear Information System (INIS)

    Trewin, Blair; Jones, David; Collins; Dean; Jovanovic, Branislava; Braganza, Karl

    2007-01-01

    Full text: The Australian Bureau of Meteorology has developed a number of datasets for use in climate change monitoring. These datasets typically cover 50-200 stations distributed as evenly as possible over the Australian continent, and have been subject to detailed quality control and homogenisation.The time period over which data are available for each element is largely determined by the availability of data in digital form. Whilst nearly all Australian monthly and daily precipitation data have been digitised, a significant quantity of pre-1957 data (for temperature and evaporation) or pre-1987 data (for some other elements) remains to be digitised, and is not currently available for use in the climate change monitoring datasets. In the case of temperature and evaporation, the start date of the datasets is also determined by major changes in instruments or observing practices for which no adjustment is feasible at the present time. The datasets currently available cover: Monthly and daily precipitation (most stations commence 1915 or earlier, with many extending back to the late 19th century, and a few to the mid-19th century); Annual temperature (commences 1910); Daily temperature (commences 1910, with limited station coverage pre-1957); Twice-daily dewpoint/relative humidity (commences 1957); Monthly pan evaporation (commences 1970); Cloud amount (commences 1957) (Jovanovic etal. 2007). As well as the station-based datasets listed above, an additional dataset being developed for use in climate change monitoring (and other applications) covers tropical cyclones in the Australian region. This is described in more detail in Trewin (2007). The datasets already developed are used in analyses of observed climate change, which are available through the Australian Bureau of Meteorology website (http://www.bom.gov.au/silo/products/cli_chg/). They are also used as a basis for routine climate monitoring, and in the datasets used for the development of seasonal

  15. 2008 TIGER/Line Nationwide Dataset

    Data.gov (United States)

    California Natural Resource Agency — This dataset contains a nationwide build of the 2008 TIGER/Line datasets from the US Census Bureau downloaded in April 2009. The TIGER/Line Shapefiles are an extract...

  16. Satellite-Based Precipitation Datasets

    Science.gov (United States)

    Munchak, S. J.; Huffman, G. J.

    2017-12-01

    Of the possible sources of precipitation data, those based on satellites provide the greatest spatial coverage. There is a wide selection of datasets, algorithms, and versions from which to choose, which can be confusing to non-specialists wishing to use the data. The International Precipitation Working Group (IPWG) maintains tables of the major publicly available, long-term, quasi-global precipitation data sets (http://www.isac.cnr.it/ ipwg/data/datasets.html), and this talk briefly reviews the various categories. As examples, NASA provides two sets of quasi-global precipitation data sets: the older Tropical Rainfall Measuring Mission (TRMM) Multi-satellite Precipitation Analysis (TMPA) and current Integrated Multi-satellitE Retrievals for Global Precipitation Measurement (GPM) mission (IMERG). Both provide near-real-time and post-real-time products that are uniformly gridded in space and time. The TMPA products are 3-hourly 0.25°x0.25° on the latitude band 50°N-S for about 16 years, while the IMERG products are half-hourly 0.1°x0.1° on 60°N-S for over 3 years (with plans to go to 16+ years in Spring 2018). In addition to the precipitation estimates, each data set provides fields of other variables, such as the satellite sensor providing estimates and estimated random error. The discussion concludes with advice about determining suitability for use, the necessity of being clear about product names and versions, and the need for continued support for satellite- and surface-based observation.

  17. Viability of Controlling Prosthetic Hand Utilizing Electroencephalograph (EEG) Dataset Signal

    Science.gov (United States)

    Miskon, Azizi; A/L Thanakodi, Suresh; Raihan Mazlan, Mohd; Mohd Haziq Azhar, Satria; Nooraya Mohd Tawil, Siti

    2016-11-01

    This project presents the development of an artificial hand controlled by Electroencephalograph (EEG) signal datasets for the prosthetic application. The EEG signal datasets were used as to improvise the way to control the prosthetic hand compared to the Electromyograph (EMG). The EMG has disadvantages to a person, who has not used the muscle for a long time and also to person with degenerative issues due to age factor. Thus, the EEG datasets found to be an alternative for EMG. The datasets used in this work were taken from Brain Computer Interface (BCI) Project. The datasets were already classified for open, close and combined movement operations. It served the purpose as an input to control the prosthetic hand by using an Interface system between Microsoft Visual Studio and Arduino. The obtained results reveal the prosthetic hand to be more efficient and faster in response to the EEG datasets with an additional LiPo (Lithium Polymer) battery attached to the prosthetic. Some limitations were also identified in terms of the hand movements, weight of the prosthetic, and the suggestions to improve were concluded in this paper. Overall, the objective of this paper were achieved when the prosthetic hand found to be feasible in operation utilizing the EEG datasets.

  18. Quality Controlling CMIP datasets at GFDL

    Science.gov (United States)

    Horowitz, L. W.; Radhakrishnan, A.; Balaji, V.; Adcroft, A.; Krasting, J. P.; Nikonov, S.; Mason, E. E.; Schweitzer, R.; Nadeau, D.

    2017-12-01

    As GFDL makes the switch from model development to production in light of the Climate Model Intercomparison Project (CMIP), GFDL's efforts are shifted to testing and more importantly establishing guidelines and protocols for Quality Controlling and semi-automated data publishing. Every CMIP cycle introduces key challenges and the upcoming CMIP6 is no exception. The new CMIP experimental design comprises of multiple MIPs facilitating research in different focus areas. This paradigm has implications not only for the groups that develop the models and conduct the runs, but also for the groups that monitor, analyze and quality control the datasets before data publishing, before their knowledge makes its way into reports like the IPCC (Intergovernmental Panel on Climate Change) Assessment Reports. In this talk, we discuss some of the paths taken at GFDL to quality control the CMIP-ready datasets including: Jupyter notebooks, PrePARE, LAMP (Linux, Apache, MySQL, PHP/Python/Perl): technology-driven tracker system to monitor the status of experiments qualitatively and quantitatively, provide additional metadata and analysis services along with some in-built controlled-vocabulary validations in the workflow. In addition to this, we also discuss the integration of community-based model evaluation software (ESMValTool, PCMDI Metrics Package, and ILAMB) as part of our CMIP6 workflow.

  19. Integrated remotely sensed datasets for disaster management

    Science.gov (United States)

    McCarthy, Timothy; Farrell, Ronan; Curtis, Andrew; Fotheringham, A. Stewart

    2008-10-01

    Video imagery can be acquired from aerial, terrestrial and marine based platforms and has been exploited for a range of remote sensing applications over the past two decades. Examples include coastal surveys using aerial video, routecorridor infrastructures surveys using vehicle mounted video cameras, aerial surveys over forestry and agriculture, underwater habitat mapping and disaster management. Many of these video systems are based on interlaced, television standards such as North America's NTSC and European SECAM and PAL television systems that are then recorded using various video formats. This technology has recently being employed as a front-line, remote sensing technology for damage assessment post-disaster. This paper traces the development of spatial video as a remote sensing tool from the early 1980s to the present day. The background to a new spatial-video research initiative based at National University of Ireland, Maynooth, (NUIM) is described. New improvements are proposed and include; low-cost encoders, easy to use software decoders, timing issues and interoperability. These developments will enable specialists and non-specialists collect, process and integrate these datasets within minimal support. This integrated approach will enable decision makers to access relevant remotely sensed datasets quickly and so, carry out rapid damage assessment during and post-disaster.

  20. Background qualitative analysis of the European reference life cycle database (ELCD) energy datasets - part II: electricity datasets.

    Science.gov (United States)

    Garraín, Daniel; Fazio, Simone; de la Rúa, Cristina; Recchioni, Marco; Lechón, Yolanda; Mathieux, Fabrice

    2015-01-01

    The aim of this paper is to identify areas of potential improvement of the European Reference Life Cycle Database (ELCD) electricity datasets. The revision is based on the data quality indicators described by the International Life Cycle Data system (ILCD) Handbook, applied on sectorial basis. These indicators evaluate the technological, geographical and time-related representativeness of the dataset and the appropriateness in terms of completeness, precision and methodology. Results show that ELCD electricity datasets have a very good quality in general terms, nevertheless some findings and recommendations in order to improve the quality of Life-Cycle Inventories have been derived. Moreover, these results ensure the quality of the electricity-related datasets to any LCA practitioner, and provide insights related to the limitations and assumptions underlying in the datasets modelling. Giving this information, the LCA practitioner will be able to decide whether the use of the ELCD electricity datasets is appropriate based on the goal and scope of the analysis to be conducted. The methodological approach would be also useful for dataset developers and reviewers, in order to improve the overall Data Quality Requirements of databases.

  1. Public Health Applications of Remotely-sensed Environmental Datasets for the Conterminous United States

    Science.gov (United States)

    Al-Hamdan, Mohammad; Crosson, William; Economou, Sigrid; Estes, Marice Jr; Estes, Sue; Hemmings, Sarah; Kent, Shia; Puckett, Mark; Quattrochi, Dale; Wade, Gina

    2013-01-01

    NASA Marshall Space Flight Center is collaborating with the University of Alabama at Birmingham (UAB) School of Public Health and the Centers for Disease Control and Prevention (CDC) National Center for Public Health Informatics to address issues of environmental health and enhance public health decision-making using NASA remotely-sensed data and products. The objectives of this study are to develop high-quality spatial data sets of environmental variables, link these with public health data from a national cohort study, and deliver the linked data sets and associated analyses to local, state and federal end-user groups. Three daily environmental data sets were developed for the conterminous U.S. on different spatial resolutions for the period 2003-2008: (1) spatial surfaces of estimated fine particulate matter (PM2.5) exposures on a 10-km grid using the US Environmental Protection Agency (EPA) ground observations and NASA's MODerate-resolution Imaging Spectroradiometer (MODIS) data; (2) a 1-km grid of Land Surface Temperature (LST) using MODIS data; and (3) a 12-km grid of daily Incoming Solar Radiation (Insolation) and heat-related products using the North American Land Data Assimilation System (NLDAS) forcing data. These environmental data sets were linked with public health data from the UAB REasons for Geographic And Racial Differences in Stroke (REGARDS) national cohort study to determine whether exposures to these environmental risk factors are related to cognitive decline, stroke and other health outcomes. These environmental datasets and the results of the public health linkage analyses will be disseminated to end-users for decision-making through the CDC Wide-ranging Online Data for Epidemiologic Research (WONDER) system and through peer-reviewed publications respectively. The linkage of these data with the CDC WONDER system substantially expands public access to NASA data, making their use by a wide range of decision makers feasible. By successful

  2. PHYSICS PERFORMANCE AND DATASET (PPD)

    CERN Multimedia

    L. Silvestris

    2012-01-01

      Introduction The first part of the year presented an important test for the new Physics Performance and Dataset (PPD) group (cf. its mandate: http://cern.ch/go/8f77). The activity was focused on the validation of the new releases meant for the Monte Carlo (MC) production and the data-processing in 2012 (CMSSW 50X and 52X), and on the preparation of the 2012 operations. In view of the Chamonix meeting, the PPD and physics groups worked to understand the impact of the higher pile-up scenario on some of the flagship Higgs analyses to better quantify the impact of the high luminosity on the CMS physics potential. A task force is working on the optimisation of the reconstruction algorithms and on the code to cope with the performance requirements imposed by the higher event occupancy as foreseen for 2012. Concerning the preparation for the analysis of the new data, a new MC production has been prepared. The new samples, simulated at 8 TeV, are already being produced and the digitisation and recons...

  3. PHYSICS PERFORMANCE AND DATASET (PPD)

    CERN Multimedia

    L. Silvestris

    2013-01-01

    The PPD activities, in the first part of 2013, have been focused mostly on the final physics validation and preparation for the data reprocessing of the full 8 TeV datasets with the latest calibrations. These samples will be the basis for the preliminary results for summer 2013 but most importantly for the final publications on the 8 TeV Run 1 data. The reprocessing involves also the reconstruction of a significant fraction of “parked data” that will allow CMS to perform a whole new set of precision analyses and searches. In this way the CMSSW release 53X is becoming the legacy release for the 8 TeV Run 1 data. The regular operation activities have included taking care of the prolonged proton-proton data taking and the run with proton-lead collisions that ended in February. The DQM and Data Certification team has deployed a continuous effort to promptly certify the quality of the data. The luminosity-weighted certification efficiency (requiring all sub-detectors to be certified as usab...

  4. MIPS bacterial genomes functional annotation benchmark dataset.

    Science.gov (United States)

    Tetko, Igor V; Brauner, Barbara; Dunger-Kaltenbach, Irmtraud; Frishman, Goar; Montrone, Corinna; Fobo, Gisela; Ruepp, Andreas; Antonov, Alexey V; Surmeli, Dimitrij; Mewes, Hans-Wernen

    2005-05-15

    Any development of new methods for automatic functional annotation of proteins according to their sequences requires high-quality data (as benchmark) as well as tedious preparatory work to generate sequence parameters required as input data for the machine learning methods. Different program settings and incompatible protocols make a comparison of the analyzed methods difficult. The MIPS Bacterial Functional Annotation Benchmark dataset (MIPS-BFAB) is a new, high-quality resource comprising four bacterial genomes manually annotated according to the MIPS functional catalogue (FunCat). These resources include precalculated sequence parameters, such as sequence similarity scores, InterPro domain composition and other parameters that could be used to develop and benchmark methods for functional annotation of bacterial protein sequences. These data are provided in XML format and can be used by scientists who are not necessarily experts in genome annotation. BFAB is available at http://mips.gsf.de/proj/bfab

  5. Comparison between publicly accessible publications, registries, and protocols of phase III trials indicated persistence of selective outcome reporting.

    Science.gov (United States)

    Zhang, Sheng; Liang, Fei; Li, Wenfeng

    2017-11-01

    The decision to make protocols of phase III randomized controlled trials (RCTs) publicly accessible by leading journals was a landmark event in clinical trial reporting. Here, we compared primary outcomes defined in protocols with those in publications describing the trials and in trial registration. We identified phase III RCTs published between January 1, 2012, and June 30, 2015, in The New England Journal of Medicine, The Lancet, The Journal of the American Medical Association, and The BMJ with available protocols. Consistency in primary outcomes between protocols and registries (articles) was evaluated. We identified 299 phase III RCTs with available protocols in this analysis. Out of them, 25 trials (8.4%) had some discrepancy for primary outcomes between publications and protocols. Types of discrepancies included protocol-defined primary outcome reported as nonprimary outcome in publication (11 trials, 3.7%), protocol-defined primary outcome omitted in publication (10 trials, 3.3%), new primary outcome introduced in publication (8 trials, 2.7%), protocol-defined nonprimary outcome reported as primary outcome in publication (4 trials, 1.3%), and different timing of assessment of primary outcome (4 trials, 1.3%). Out of trials with discrepancies in primary outcome, 15 trials (60.0%) had discrepancies that favored statistically significant results. Registration could be seen as a valid surrogate of protocol in 237 of 299 trials (79.3%) with regard to primary outcome. Despite unrestricted public access to protocols, selective outcome reporting persists in a small fraction of phase III RCTs. Only studies from four leading journals were included, which may cause selection bias and limit the generalizability of this finding. Copyright © 2017 Elsevier Inc. All rights reserved.

  6. An Affinity Propagation Clustering Algorithm for Mixed Numeric and Categorical Datasets

    Directory of Open Access Journals (Sweden)

    Kang Zhang

    2014-01-01

    Full Text Available Clustering has been widely used in different fields of science, technology, social science, and so forth. In real world, numeric as well as categorical features are usually used to describe the data objects. Accordingly, many clustering methods can process datasets that are either numeric or categorical. Recently, algorithms that can handle the mixed data clustering problems have been developed. Affinity propagation (AP algorithm is an exemplar-based clustering method which has demonstrated good performance on a wide variety of datasets. However, it has limitations on processing mixed datasets. In this paper, we propose a novel similarity measure for mixed type datasets and an adaptive AP clustering algorithm is proposed to cluster the mixed datasets. Several real world datasets are studied to evaluate the performance of the proposed algorithm. Comparisons with other clustering algorithms demonstrate that the proposed method works well not only on mixed datasets but also on pure numeric and categorical datasets.

  7. The Geometry of Finite Equilibrium Datasets

    DEFF Research Database (Denmark)

    Balasko, Yves; Tvede, Mich

    We investigate the geometry of finite datasets defined by equilibrium prices, income distributions, and total resources. We show that the equilibrium condition imposes no restrictions if total resources are collinear, a property that is robust to small perturbations. We also show that the set...... of equilibrium datasets is pathconnected when the equilibrium condition does impose restrictions on datasets, as for example when total resources are widely non collinear....

  8. Utilizing Public Access Data and Open Source Statistical Programs to Teach Climate Science to Interdisciplinary Undergraduate Students

    Science.gov (United States)

    Collins, L.

    2014-12-01

    Students in the Environmental Studies major at the University of Southern California fulfill their curriculum requirements by taking a broad range of courses in the social and natural sciences. Climate change is often taught in 1-2 lectures in these courses with limited examination of this complex topic. Several upper division elective courses focus on the science, policy, and social impacts of climate change. In an upper division course focused on the scientific tools used to determine paleoclimate and predict future climate, I have developed a project where students download, manipulate, and analyze data from the National Climatic Data Center. Students are required to download 100 or more years of daily temperature records and use the statistical program R to analyze that data, calculating daily, monthly, and yearly temperature averages along with changes in the number of extreme hot or cold days (≥90˚F and ≤30˚F, respectively). In parallel, they examine population growth, city expansion, and changes in transportation looking for correlations between the social data and trends observed in the temperature data. Students examine trends over time to determine correlations to urban heat island effect. This project exposes students to "real" data, giving them the tools necessary to critically analyze scientific studies without being experts in the field. Utilizing the existing, public, online databases provides almost unlimited, free data. Open source statistical programs provide a cost-free platform for examining the data although some in-class time is required to help students navigate initial data importation and analysis. Results presented will highlight data compiled over three years of course projects.

  9. Veterans Affairs Suicide Prevention Synthetic Dataset

    Data.gov (United States)

    Department of Veterans Affairs — The VA's Veteran Health Administration, in support of the Open Data Initiative, is providing the Veterans Affairs Suicide Prevention Synthetic Dataset (VASPSD). The...

  10. Nanoparticle-organic pollutant interaction dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — Dataset presents concentrations of organic pollutants, such as polyaromatic hydrocarbon compounds, in water samples. Water samples of known volume and concentration...

  11. An Annotated Dataset of 14 Meat Images

    DEFF Research Database (Denmark)

    Stegmann, Mikkel Bille

    2002-01-01

    This note describes a dataset consisting of 14 annotated images of meat. Points of correspondence are placed on each image. As such, the dataset can be readily used for building statistical models of shape. Further, format specifications and terms of use are given.......This note describes a dataset consisting of 14 annotated images of meat. Points of correspondence are placed on each image. As such, the dataset can be readily used for building statistical models of shape. Further, format specifications and terms of use are given....

  12. Comparision of analysis of the QTLMAS XII common dataset

    DEFF Research Database (Denmark)

    Lund, Mogens Sandø; Sahana, Goutam; de Koning, Dirk-Jan

    2009-01-01

    A dataset was simulated and distributed to participants of the QTLMAS XII workshop who were invited to develop genomic selection models. Each contributing group was asked to describe the model development and validation as well as to submit genomic predictions for three generations of individuals...

  13. ASSISTments Dataset from Multiple Randomized Controlled Experiments

    Science.gov (United States)

    Selent, Douglas; Patikorn, Thanaporn; Heffernan, Neil

    2016-01-01

    In this paper, we present a dataset consisting of data generated from 22 previously and currently running randomized controlled experiments inside the ASSISTments online learning platform. This dataset provides data mining opportunities for researchers to analyze ASSISTments data in a convenient format across multiple experiments at the same time.…

  14. Synthetic and Empirical Capsicum Annuum Image Dataset

    NARCIS (Netherlands)

    Barth, R.

    2016-01-01

    This dataset consists of per-pixel annotated synthetic (10500) and empirical images (50) of Capsicum annuum, also known as sweet or bell pepper, situated in a commercial greenhouse. Furthermore, the source models to generate the synthetic images are included. The aim of the datasets are to

  15. Design of an audio advertisement dataset

    Science.gov (United States)

    Fu, Yutao; Liu, Jihong; Zhang, Qi; Geng, Yuting

    2015-12-01

    Since more and more advertisements swarm into radios, it is necessary to establish an audio advertising dataset which could be used to analyze and classify the advertisement. A method of how to establish a complete audio advertising dataset is presented in this paper. The dataset is divided into four different kinds of advertisements. Each advertisement's sample is given in *.wav file format, and annotated with a txt file which contains its file name, sampling frequency, channel number, broadcasting time and its class. The classifying rationality of the advertisements in this dataset is proved by clustering the different advertisements based on Principal Component Analysis (PCA). The experimental results show that this audio advertisement dataset offers a reliable set of samples for correlative audio advertisement experimental studies.

  16. Internationally coordinated glacier monitoring: strategy and datasets

    Science.gov (United States)

    Hoelzle, Martin; Armstrong, Richard; Fetterer, Florence; Gärtner-Roer, Isabelle; Haeberli, Wilfried; Kääb, Andreas; Kargel, Jeff; Nussbaumer, Samuel; Paul, Frank; Raup, Bruce; Zemp, Michael

    2014-05-01

    Internationally coordinated monitoring of long-term glacier changes provide key indicator data about global climate change and began in the year 1894 as an internationally coordinated effort to establish standardized observations. Today, world-wide monitoring of glaciers and ice caps is embedded within the Global Climate Observing System (GCOS) in support of the United Nations Framework Convention on Climate Change (UNFCCC) as an important Essential Climate Variable (ECV). The Global Terrestrial Network for Glaciers (GTN-G) was established in 1999 with the task of coordinating measurements and to ensure the continuous development and adaptation of the international strategies to the long-term needs of users in science and policy. The basic monitoring principles must be relevant, feasible, comprehensive and understandable to a wider scientific community as well as to policy makers and the general public. Data access has to be free and unrestricted, the quality of the standardized and calibrated data must be high and a combination of detailed process studies at selected field sites with global coverage by satellite remote sensing is envisaged. Recently a GTN-G Steering Committee was established to guide and advise the operational bodies responsible for the international glacier monitoring, which are the World Glacier Monitoring Service (WGMS), the US National Snow and Ice Data Center (NSIDC), and the Global Land Ice Measurements from Space (GLIMS) initiative. Several online databases containing a wealth of diverse data types having different levels of detail and global coverage provide fast access to continuously updated information on glacier fluctuation and inventory data. For world-wide inventories, data are now available through (a) the World Glacier Inventory containing tabular information of about 130,000 glaciers covering an area of around 240,000 km2, (b) the GLIMS-database containing digital outlines of around 118,000 glaciers with different time stamps and

  17. The global coastline dataset: the observed relation between erosion and sea-level rise

    Science.gov (United States)

    Donchyts, G.; Baart, F.; Luijendijk, A.; Hagenaars, G.

    2017-12-01

    Erosion of sandy coasts is considered one of the key risks of sea-level rise. Because sandy coastlines of the world are often highly populated, erosive coastline trends result in risk to populations and infrastructure. Most of our understanding of the relation between sea-level rise and coastal erosion is based on local or regional observations and generalizations of numerical and physical experiments. Until recently there was no reliable global scale assessment of the location of sandy coasts and their rate of erosion and accretion. Here we present the global coastline dataset that covers erosion indicators on a local scale with global coverage. The dataset uses our global coastline transects grid defined with an alongshore spacing of 250 m and a cross shore length extending 1 km seaward and 1 km landward. This grid matches up with pre-existing local grids where available. We present the latest results on validation of coastal-erosion trends (based on optical satellites) and classification of sandy versus non-sandy coasts. We show the relation between sea-level rise (based both on tide-gauges and multi-mission satellite altimetry) and observed erosion trends over the last decades, taking into account broken-coastline trends (for example due to nourishments).An interactive web application presents the publicly-accessible results using a backend based on Google Earth Engine. It allows both researchers and stakeholders to use objective estimates of coastline trends, particularly when authoritative sources are not available.

  18. Comparison of CORA and EN4 in-situ datasets validation methods, toward a better quality merged dataset.

    Science.gov (United States)

    Szekely, Tanguy; Killick, Rachel; Gourrion, Jerome; Reverdin, Gilles

    2017-04-01

    CORA and EN4 are both global delayed time mode validated in-situ ocean temperature and salinity datasets distributed by the Met Office (http://www.metoffice.gov.uk/) and Copernicus (www.marine.copernicus.eu). A large part of the profiles distributed by CORA and EN4 in recent years are Argo profiles from the ARGO DAC, but profiles are also extracted from the World Ocean Database and TESAC profiles from GTSPP. In the case of CORA, data coming from the EUROGOOS Regional operationnal oserving system( ROOS) operated by European institutes no managed by National Data Centres and other datasets of profiles povided by scientific sources can also be found (Sea mammals profiles from MEOP, XBT datasets from cruises ...). (EN4 also takes data from the ASBO dataset to supplement observations in the Arctic). First advantage of this new merge product is to enhance the space and time coverage at global and european scales for the period covering 1950 till a year before the current year. This product is updated once a year and T&S gridded fields are alos generated for the period 1990-year n-1. The enhancement compared to the revious CORA product will be presented Despite the fact that the profiles distributed by both datasets are mostly the same, the quality control procedures developed by the Met Office and Copernicus teams differ, sometimes leading to different quality control flags for the same profile. Started in 2016 a new study started that aims to compare both validation procedures to move towards a Copernicus Marine Service dataset with the best features of CORA and EN4 validation.A reference data set composed of the full set of in-situ temperature and salinity measurements collected by Coriolis during 2015 is used. These measurements have been made thanks to wide range of instruments (XBTs, CTDs, Argo floats, Instrumented sea mammals,...), covering the global ocean. The reference dataset has been validated simultaneously by both teams.An exhaustive comparison of the

  19. The Kinetics Human Action Video Dataset

    OpenAIRE

    Kay, Will; Carreira, Joao; Simonyan, Karen; Zhang, Brian; Hillier, Chloe; Vijayanarasimhan, Sudheendra; Viola, Fabio; Green, Tim; Back, Trevor; Natsev, Paul; Suleyman, Mustafa; Zisserman, Andrew

    2017-01-01

    We describe the DeepMind Kinetics human action video dataset. The dataset contains 400 human action classes, with at least 400 video clips for each action. Each clip lasts around 10s and is taken from a different YouTube video. The actions are human focussed and cover a broad range of classes including human-object interactions such as playing instruments, as well as human-human interactions such as shaking hands. We describe the statistics of the dataset, how it was collected, and give some ...

  20. Synthetic ALSPAC longitudinal datasets for the Big Data VR project.

    Science.gov (United States)

    Avraam, Demetris; Wilson, Rebecca C; Burton, Paul

    2017-01-01

    Three synthetic datasets - of observation size 15,000, 155,000 and 1,555,000 participants, respectively - were created by simulating eleven cardiac and anthropometric variables from nine collection ages of the ALSAPC birth cohort study. The synthetic datasets retain similar data properties to the ALSPAC study data they are simulated from (co-variance matrices, as well as the mean and variance values of the variables) without including the original data itself or disclosing participant information.  In this instance, the three synthetic datasets have been utilised in an academia-industry collaboration to build a prototype virtual reality data analysis software, but they could have a broader use in method and software development projects where sensitive data cannot be freely shared.

  1. NEW WEB-BASED ACCESS TO NUCLEAR STRUCTURE DATASETS.

    Energy Technology Data Exchange (ETDEWEB)

    WINCHELL,D.F.

    2004-09-26

    As part of an effort to migrate the National Nuclear Data Center (NNDC) databases to a relational platform, a new web interface has been developed for the dissemination of the nuclear structure datasets stored in the Evaluated Nuclear Structure Data File and Experimental Unevaluated Nuclear Data List.

  2. BASE MAP DATASET, LOS ANGELES COUNTY, CALIFORNIA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  3. BASE MAP DATASET, CHEROKEE COUNTY, SOUTH CAROLINA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  4. Harvard Aging Brain Study : Dataset and accessibility

    NARCIS (Netherlands)

    Dagley, Alexander; LaPoint, Molly; Huijbers, Willem; Hedden, Trey; McLaren, Donald G.; Chatwal, Jasmeer P.; Papp, Kathryn V.; Amariglio, Rebecca E.; Blacker, Deborah; Rentz, Dorene M.; Johnson, Keith A.; Sperling, Reisa A.; Schultz, Aaron P.

    2017-01-01

    The Harvard Aging Brain Study is sharing its data with the global research community. The longitudinal dataset consists of a 284-subject cohort with the following modalities acquired: demographics, clinical assessment, comprehensive neuropsychological testing, clinical biomarkers, and neuroimaging.

  5. BASE MAP DATASET, HONOLULU COUNTY, HAWAII, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  6. BASE MAP DATASET, EDGEFIELD COUNTY, SOUTH CAROLINA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  7. Environmental Dataset Gateway (EDG) REST Interface

    Data.gov (United States)

    U.S. Environmental Protection Agency — Use the Environmental Dataset Gateway (EDG) to find and access EPA's environmental resources. Many options are available for easily reusing EDG content in other...

  8. BASE MAP DATASET, INYO COUNTY, OKLAHOMA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  9. BASE MAP DATASET, JACKSON COUNTY, OKLAHOMA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  10. BASE MAP DATASET, SANTA CRIZ COUNTY, CALIFORNIA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  11. Climate Prediction Center IR 4km Dataset

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — CPC IR 4km dataset was created from all available individual geostationary satellite data which have been merged to form nearly seamless global (60N-60S) IR...

  12. BASE MAP DATASET, MAYES COUNTY, OKLAHOMA, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications: cadastral, geodetic control,...

  13. BASE MAP DATASET, KINGFISHER COUNTY, OKLAHOMA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  14. Using Multiple Big Datasets and Machine Learning to Produce a New Global Particulate Dataset: A Technology Challenge Case Study

    Science.gov (United States)

    Lary, D. J.

    2013-12-01

    A BigData case study is described where multiple datasets from several satellites, high-resolution global meteorological data, social media and in-situ observations are combined using machine learning on a distributed cluster using an automated workflow. The global particulate dataset is relevant to global public health studies and would not be possible to produce without the use of the multiple big datasets, in-situ data and machine learning.To greatly reduce the development time and enhance the functionality a high level language capable of parallel processing has been used (Matlab). A key consideration for the system is high speed access due to the large data volume, persistence of the large data volumes and a precise process time scheduling capability.

  15. Comparison of recent SnIa datasets

    International Nuclear Information System (INIS)

    Sanchez, J.C. Bueno; Perivolaropoulos, L.; Nesseris, S.

    2009-01-01

    We rank the six latest Type Ia supernova (SnIa) datasets (Constitution (C), Union (U), ESSENCE (Davis) (E), Gold06 (G), SNLS 1yr (S) and SDSS-II (D)) in the context of the Chevalier-Polarski-Linder (CPL) parametrization w(a) = w 0 +w 1 (1−a), according to their Figure of Merit (FoM), their consistency with the cosmological constant (ΛCDM), their consistency with standard rulers (Cosmic Microwave Background (CMB) and Baryon Acoustic Oscillations (BAO)) and their mutual consistency. We find a significant improvement of the FoM (defined as the inverse area of the 95.4% parameter contour) with the number of SnIa of these datasets ((C) highest FoM, (U), (G), (D), (E), (S) lowest FoM). Standard rulers (CMB+BAO) have a better FoM by about a factor of 3, compared to the highest FoM SnIa dataset (C). We also find that the ranking sequence based on consistency with ΛCDM is identical with the corresponding ranking based on consistency with standard rulers ((S) most consistent, (D), (C), (E), (U), (G) least consistent). The ranking sequence of the datasets however changes when we consider the consistency with an expansion history corresponding to evolving dark energy (w 0 ,w 1 ) = (−1.4,2) crossing the phantom divide line w = −1 (it is practically reversed to (G), (U), (E), (S), (D), (C)). The SALT2 and MLCS2k2 fitters are also compared and some peculiar features of the SDSS-II dataset when standardized with the MLCS2k2 fitter are pointed out. Finally, we construct a statistic to estimate the internal consistency of a collection of SnIa datasets. We find that even though there is good consistency among most samples taken from the above datasets, this consistency decreases significantly when the Gold06 (G) dataset is included in the sample

  16. Visualization of conserved structures by fusing highly variable datasets.

    Science.gov (United States)

    Silverstein, Jonathan C; Chhadia, Ankur; Dech, Fred

    2002-01-01

    Skill, effort, and time are required to identify and visualize anatomic structures in three-dimensions from radiological data. Fundamentally, automating these processes requires a technique that uses symbolic information not in the dynamic range of the voxel data. We were developing such a technique based on mutual information for automatic multi-modality image fusion (MIAMI Fuse, University of Michigan). This system previously demonstrated facility at fusing one voxel dataset with integrated symbolic structure information to a CT dataset (different scale and resolution) from the same person. The next step of development of our technique was aimed at accommodating the variability of anatomy from patient to patient by using warping to fuse our standard dataset to arbitrary patient CT datasets. A standard symbolic information dataset was created from the full color Visible Human Female by segmenting the liver parenchyma, portal veins, and hepatic veins and overwriting each set of voxels with a fixed color. Two arbitrarily selected patient CT scans of the abdomen were used for reference datasets. We used the warping functions in MIAMI Fuse to align the standard structure data to each patient scan. The key to successful fusion was the focused use of multiple warping control points that place themselves around the structure of interest automatically. The user assigns only a few initial control points to align the scans. Fusion 1 and 2 transformed the atlas with 27 points around the liver to CT1 and CT2 respectively. Fusion 3 transformed the atlas with 45 control points around the liver to CT1 and Fusion 4 transformed the atlas with 5 control points around the portal vein. The CT dataset is augmented with the transformed standard structure dataset, such that the warped structure masks are visualized in combination with the original patient dataset. This combined volume visualization is then rendered interactively in stereo on the ImmersaDesk in an immersive Virtual

  17. Data-Driven Decision Support for Radiologists: Re-using the National Lung Screening Trial Dataset for Pulmonary Nodule Management

    OpenAIRE

    Morrison, James J.; Hostetter, Jason; Wang, Kenneth; Siegel, Eliot L.

    2014-01-01

    Real-time mining of large research trial datasets enables development of case-based clinical decision support tools. Several applicable research datasets exist including the National Lung Screening Trial (NLST), a dataset unparalleled in size and scope for studying population-based lung cancer screening. Using these data, a clinical decision support tool was developed which matches patient demographics and lung nodule characteristics to a cohort of similar patients. The NLST dataset was conve...

  18. A high-resolution European dataset for hydrologic modeling

    Science.gov (United States)

    Ntegeka, Victor; Salamon, Peter; Gomes, Goncalo; Sint, Hadewij; Lorini, Valerio; Thielen, Jutta

    2013-04-01

    There is an increasing demand for large scale hydrological models not only in the field of modeling the impact of climate change on water resources but also for disaster risk assessments and flood or drought early warning systems. These large scale models need to be calibrated and verified against large amounts of observations in order to judge their capabilities to predict the future. However, the creation of large scale datasets is challenging for it requires collection, harmonization, and quality checking of large amounts of observations. For this reason, only a limited number of such datasets exist. In this work, we present a pan European, high-resolution gridded dataset of meteorological observations (EFAS-Meteo) which was designed with the aim to drive a large scale hydrological model. Similar European and global gridded datasets already exist, such as the HadGHCND (Caesar et al., 2006), the JRC MARS-STAT database (van der Goot and Orlandi, 2003) and the E-OBS gridded dataset (Haylock et al., 2008). However, none of those provide similarly high spatial resolution and/or a complete set of variables to force a hydrologic model. EFAS-Meteo contains daily maps of precipitation, surface temperature (mean, minimum and maximum), wind speed and vapour pressure at a spatial grid resolution of 5 x 5 km for the time period 1 January 1990 - 31 December 2011. It furthermore contains calculated radiation, which is calculated by using a staggered approach depending on the availability of sunshine duration, cloud cover and minimum and maximum temperature, and evapotranspiration (potential evapotranspiration, bare soil and open water evapotranspiration). The potential evapotranspiration was calculated using the Penman-Monteith equation with the above-mentioned meteorological variables. The dataset was created as part of the development of the European Flood Awareness System (EFAS) and has been continuously updated throughout the last years. The dataset variables are used as

  19. An integrated dataset for in silico drug discovery

    Directory of Open Access Journals (Sweden)

    Cockell Simon J

    2010-12-01

    Full Text Available Drug development is expensive and prone to failure. It is potentially much less risky and expensive to reuse a drug developed for one condition for treating a second disease, than it is to develop an entirely new compound. Systematic approaches to drug repositioning are needed to increase throughput and find candidates more reliably. Here we address this need with an integrated systems biology dataset, developed using the Ondex data integration platform, for the in silico discovery of new drug repositioning candidates. We demonstrate that the information in this dataset allows known repositioning examples to be discovered. We also propose a means of automating the search for new treatment indications of existing compounds.

  20. Augmented Reality Prototype for Visualizing Large Sensors’ Datasets

    Directory of Open Access Journals (Sweden)

    Folorunso Olufemi A.

    2011-04-01

    Full Text Available This paper addressed the development of an augmented reality (AR based scientific visualization system prototype that supports identification, localisation, and 3D visualisation of oil leakages sensors datasets. Sensors generates significant amount of multivariate datasets during normal and leak situations which made data exploration and visualisation daunting tasks. Therefore a model to manage such data and enhance computational support needed for effective explorations are developed in this paper. A challenge of this approach is to reduce the data inefficiency. This paper presented a model for computing information gain for each data attributes and determine a lead attribute.The computed lead attribute is then used for the development of an AR-based scientific visualization interface which automatically identifies, localises and visualizes all necessary data relevant to a particularly selected region of interest (ROI on the network. Necessary architectural system supports and the interface requirements for such visualizations are also presented.

  1. 3DSEM: A 3D microscopy dataset

    Directory of Open Access Journals (Sweden)

    Ahmad P. Tafti

    2016-03-01

    Full Text Available The Scanning Electron Microscope (SEM as a 2D imaging instrument has been widely used in many scientific disciplines including biological, mechanical, and materials sciences to determine the surface attributes of microscopic objects. However the SEM micrographs still remain 2D images. To effectively measure and visualize the surface properties, we need to truly restore the 3D shape model from 2D SEM images. Having 3D surfaces would provide anatomic shape of micro-samples which allows for quantitative measurements and informative visualization of the specimens being investigated. The 3DSEM is a dataset for 3D microscopy vision which is freely available at [1] for any academic, educational, and research purposes. The dataset includes both 2D images and 3D reconstructed surfaces of several real microscopic samples. Keywords: 3D microscopy dataset, 3D microscopy vision, 3D SEM surface reconstruction, Scanning Electron Microscope (SEM

  2. Data Mining for Imbalanced Datasets: An Overview

    Science.gov (United States)

    Chawla, Nitesh V.

    A dataset is imbalanced if the classification categories are not approximately equally represented. Recent years brought increased interest in applying machine learning techniques to difficult "real-world" problems, many of which are characterized by imbalanced data. Additionally the distribution of the testing data may differ from that of the training data, and the true misclassification costs may be unknown at learning time. Predictive accuracy, a popular choice for evaluating performance of a classifier, might not be appropriate when the data is imbalanced and/or the costs of different errors vary markedly. In this Chapter, we discuss some of the sampling techniques used for balancing the datasets, and the performance measures more appropriate for mining imbalanced datasets.

  3. Genomics dataset of unidentified disclosed isolates

    Directory of Open Access Journals (Sweden)

    Bhagwan N. Rekadwad

    2016-09-01

    Full Text Available Analysis of DNA sequences is necessary for higher hierarchical classification of the organisms. It gives clues about the characteristics of organisms and their taxonomic position. This dataset is chosen to find complexities in the unidentified DNA in the disclosed patents. A total of 17 unidentified DNA sequences were thoroughly analyzed. The quick response codes were generated. AT/GC content of the DNA sequences analysis was carried out. The QR is helpful for quick identification of isolates. AT/GC content is helpful for studying their stability at different temperatures. Additionally, a dataset on cleavage code and enzyme code studied under the restriction digestion study, which helpful for performing studies using short DNA sequences was reported. The dataset disclosed here is the new revelatory data for exploration of unique DNA sequences for evaluation, identification, comparison and analysis. Keywords: BioLABs, Blunt ends, Genomics, NEB cutter, Restriction digestion, Short DNA sequences, Sticky ends

  4. A dataset of forest biomass structure for Eurasia.

    Science.gov (United States)

    Schepaschenko, Dmitry; Shvidenko, Anatoly; Usoltsev, Vladimir; Lakyda, Petro; Luo, Yunjian; Vasylyshyn, Roman; Lakyda, Ivan; Myklush, Yuriy; See, Linda; McCallum, Ian; Fritz, Steffen; Kraxner, Florian; Obersteiner, Michael

    2017-05-16

    The most comprehensive dataset of in situ destructive sampling measurements of forest biomass in Eurasia have been compiled from a combination of experiments undertaken by the authors and from scientific publications. Biomass is reported as four components: live trees (stem, bark, branches, foliage, roots); understory (above- and below ground); green forest floor (above- and below ground); and coarse woody debris (snags, logs, dead branches of living trees and dead roots), consisting of 10,351 unique records of sample plots and 9,613 sample trees from ca 1,200 experiments for the period 1930-2014 where there is overlap between these two datasets. The dataset also contains other forest stand parameters such as tree species composition, average age, tree height, growing stock volume, etc., when available. Such a dataset can be used for the development of models of biomass structure, biomass extension factors, change detection in biomass structure, investigations into biodiversity and species distribution and the biodiversity-productivity relationship, as well as the assessment of the carbon pool and its dynamics, among many others.

  5. Increasing Public Access to Scientific Research through Stakeholder Involvement: Ecological Effects of Sea Level Rise in the Northern Gulf of Mexico

    Science.gov (United States)

    Hagen, S. C.; Stephens, S. H.; DeLorme, D. E.; Ruple, D.; Graham, L.

    2013-12-01

    Sea level rise (SLR) has the potential to have a myriad of deleterious effects on coastal ecology and human infrastructure. Stakeholders, including managers of coastal resources, must be aware of potential consequences of SLR and adjust their plans accordingly to protect and preserve the resources under their care. Members of the public, particularly those who live or work in coastal areas, should also be informed about the results of scientific research on the effects of SLR. However, research results are frequently published in venues or formats to which resource managers and the broader public have limited access. It is imperative for scientists to move beyond traditional publication venues in order to more effectively disseminate the results of their research (Dennison, W. 2007, Estu. Coast. Shelf Sci. 77, 185). One potentially effective way to advance public access to research is to incorporate stakeholder involvement into the research project process in order to target study objectives and tailor communication products toward stakeholder needs (Lemos, M. & Morehouse, B. 2005, Glob. Env. Chg. 15, 57). However, it is important to manage communication and clarify participant expectations during this type of research (Gawith, M. et al. 2009, Glob. Env. Chg. 19, 113). This presentation describes the process being undertaken by an ongoing 5-year multi-disciplinary NOAA-funded project, Ecological Effects of Sea Level Rise in the Northern Gulf of Mexico (EESLR-NGOM), to improve accessibility and utility of scientific research results through stakeholder engagement. The EESLR-NGOM project is assessing the ecological risks from SLR along the Mississippi, Alabama and Florida Panhandle coasts, coastal habitats, and floodplains. It has incorporated stakeholder involvement throughout the research process so as to better target and tailor the emerging research products to meet resource managers' needs, as well as to facilitate eventual public dissemination of results. An

  6. Los catalogos en linea de aceso publico en las bibliotecas argentinas concolecciones juridicas Online public access catalogs in argentine libraries with law collections

    Directory of Open Access Journals (Sweden)

    Carolina Gregui

    2006-01-01

    Full Text Available Se analizan las interfaces de usuario de los catálogos en línea de acceso público (OPACs en entorno web de las bibliotecas argentinas con colecciones jurídicas, a fin de elaborar un diagnóstico de situación sobre la descripción bibliográfica, el análisis temático, los mensajes de ayuda al usuario y la visualización de los datos bibliográficos. Se adopta una metodología cuali-cuantitativa, se utiliza como instrumento de recolección de datos la lista de funcionalidades del sistema que proporciona Hildreth (1982, se actualiza en función de los nuevos desarrollos y se obtiene un formulario que permite, mediante 38 preguntas cerradas, observar la frecuencia de aparición de las funcionalidades básicas propias de cuatro áreas: Área I - control de operaciones; Área II, subdividida en control de formulación de la búsqueda y punto de acceso; Área III - control de salida y Área IV - asistencia al usuario: información e instrucción. Se trabaja con la información correspondiente a 43 unidades. Los resultados demuestran que la mayoría de los OPACs relevados brindan prestaciones mínimas, por lo que se encuentran en una fase inicial de implementación y no responden, por lo tanto, a las necesidades de los usuarios.User interfaces of web based online public access catalogs (OPACs of Argentine libraries with law collections are studied to provide a diagnosis of the situation of bibliographic description, subject analisis, help messages and bibliographic display. A cuali-cuantitative methodology is adopted and a checklist of systems functions created by Hildreth (1982, updated according to recent advances, has been used as data collection tool. The resulting 38 closed questions checklist has allowed to observe the frequency of appearance of the basic functions of four areas: Area I – operational control; Area II, subdivided in search formulation control and access points; Area III – output control and IV – user assistance

  7. The Effects of Public Access Defibrillation on Survival After Out-of-Hospital Cardiac Arrest: A Systematic Review of Observational Studies.

    Science.gov (United States)

    Bækgaard, Josefine S; Viereck, Søren; Møller, Thea Palsgaard; Ersbøll, Annette Kjær; Lippert, Freddy; Folke, Fredrik

    2017-09-05

    Despite recent advances, the average survival after out-of-hospital cardiac arrest (OHCA) remains 50%. Accordingly, placement of automated external defibrillators in the community as part of a public access defibrillation program (PAD) is recommended by international guidelines. However, different strategies have been proposed on how exactly to increase and make use of publicly available automated external defibrillators. This systematic review aimed to evaluate the effect of PAD and the different PAD strategies on survival after OHCA. PubMed, Embase, and the Cochrane Library were systematically searched on August 31, 2015 for observational studies reporting survival to hospital discharge in OHCA patients where an automated external defibrillator had been used by nonemergency medical services. PAD was divided into 3 groups according to who applied the defibrillator: nondispatched lay first responders, professional first responders (firefighters/police) dispatched by the Emergency Medical Dispatch Center (EMDC), or lay first responders dispatched by the EMDC. A total of 41 studies were included; 18 reported PAD by nondispatched lay first responders, 20 reported PAD by EMDC-dispatched professional first responders (firefighters/police), and 3 reported both. We identified no qualified studies reporting survival after PAD by EMDC-dispatched lay first responders. The overall survival to hospital discharge after OHCA treated with PAD showed a median survival of 40.0% (range, 9.1-83.3). Defibrillation by nondispatched lay first responders was associated with the highest survival with a median survival of 53.0% (range, 26.0-72.0), whereas defibrillation by EMDC-dispatched professional first responders (firefighters/police) was associated with a median survival of 28.6% (range, 9.0-76.0). A meta-analysis of the different survival outcomes could not be performed because of the large heterogeneity of the included studies. This systematic review showed a median overall

  8. Harvard Aging Brain Study: Dataset and accessibility.

    Science.gov (United States)

    Dagley, Alexander; LaPoint, Molly; Huijbers, Willem; Hedden, Trey; McLaren, Donald G; Chatwal, Jasmeer P; Papp, Kathryn V; Amariglio, Rebecca E; Blacker, Deborah; Rentz, Dorene M; Johnson, Keith A; Sperling, Reisa A; Schultz, Aaron P

    2017-01-01

    The Harvard Aging Brain Study is sharing its data with the global research community. The longitudinal dataset consists of a 284-subject cohort with the following modalities acquired: demographics, clinical assessment, comprehensive neuropsychological testing, clinical biomarkers, and neuroimaging. To promote more extensive analyses, imaging data was designed to be compatible with other publicly available datasets. A cloud-based system enables access to interested researchers with blinded data available contingent upon completion of a data usage agreement and administrative approval. Data collection is ongoing and currently in its fifth year. Copyright © 2015 Elsevier Inc. All rights reserved.

  9. A new dataset validation system for the Planetary Science Archive

    Science.gov (United States)

    Manaud, N.; Zender, J.; Heather, D.; Martinez, S.

    2007-08-01

    The Planetary Science Archive is the official archive for the Mars Express mission. It has received its first data by the end of 2004. These data are delivered by the PI teams to the PSA team as datasets, which are formatted conform to the Planetary Data System (PDS). The PI teams are responsible for analyzing and calibrating the instrument data as well as the production of reduced and calibrated data. They are also responsible of the scientific validation of these data. ESA is responsible of the long-term data archiving and distribution to the scientific community and must ensure, in this regard, that all archived products meet quality. To do so, an archive peer-review is used to control the quality of the Mars Express science data archiving process. However a full validation of its content is missing. An independent review board recently recommended that the completeness of the archive as well as the consistency of the delivered data should be validated following well-defined procedures. A new validation software tool is being developed to complete the overall data quality control system functionality. This new tool aims to improve the quality of data and services provided to the scientific community through the PSA, and shall allow to track anomalies in and to control the completeness of datasets. It shall ensure that the PSA end-users: (1) can rely on the result of their queries, (2) will get data products that are suitable for scientific analysis, (3) can find all science data acquired during a mission. We defined dataset validation as the verification and assessment process to check the dataset content against pre-defined top-level criteria, which represent the general characteristics of good quality datasets. The dataset content that is checked includes the data and all types of information that are essential in the process of deriving scientific results and those interfacing with the PSA database. The validation software tool is a multi-mission tool that

  10. Data Recommender: An Alternative Way to Discover Open Scientific Datasets

    Science.gov (United States)

    Klump, J. F.; Devaraju, A.; Williams, G.; Hogan, D.; Davy, R.; Page, J.; Singh, D.; Peterson, N.

    2017-12-01

    Over the past few years, institutions and government agencies have adopted policies to openly release their data, which has resulted in huge amounts of open data becoming available on the web. When trying to discover the data, users face two challenges: an overload of choice and the limitations of the existing data search tools. On the one hand, there are too many datasets to choose from, and therefore, users need to spend considerable effort to find the datasets most relevant to their research. On the other hand, data portals commonly offer keyword and faceted search, which depend fully on the user queries to search and rank relevant datasets. Consequently, keyword and faceted search may return loosely related or irrelevant results, although the results may contain the same query. They may also return highly specific results that depend more on how well metadata was authored. They do not account well for variance in metadata due to variance in author styles and preferences. The top-ranked results may also come from the same data collection, and users are unlikely to discover new and interesting datasets. These search modes mainly suits users who can express their information needs in terms of the structure and terminology of the data portals, but may pose a challenge otherwise. The above challenges reflect that we need a solution that delivers the most relevant (i.e., similar and serendipitous) datasets to users, beyond the existing search functionalities on the portals. A recommender system is an information filtering system that presents users with relevant and interesting contents based on users' context and preferences. Delivering data recommendations to users can make data discovery easier, and as a result may enhance user engagement with the portal. We developed a hybrid data recommendation approach for the CSIRO Data Access Portal. The approach leverages existing recommendation techniques (e.g., content-based filtering and item co-occurrence) to produce

  11. Random Coefficient Logit Model for Large Datasets

    NARCIS (Netherlands)

    C. Hernández-Mireles (Carlos); D. Fok (Dennis)

    2010-01-01

    textabstractWe present an approach for analyzing market shares and products price elasticities based on large datasets containing aggregate sales data for many products, several markets and for relatively long time periods. We consider the recently proposed Bayesian approach of Jiang et al [Jiang,

  12. Thesaurus Dataset of Educational Technology in Chinese

    Science.gov (United States)

    Wu, Linjing; Liu, Qingtang; Zhao, Gang; Huang, Huan; Huang, Tao

    2015-01-01

    The thesaurus dataset of educational technology is a knowledge description of educational technology in Chinese. The aims of this thesaurus were to collect the subject terms in the domain of educational technology, facilitate the standardization of terminology and promote the communication between Chinese researchers and scholars from various…

  13. Public Availability to ECS Collected Datasets

    Science.gov (United States)

    Henderson, J. F.; Warnken, R.; McLean, S. J.; Lim, E.; Varner, J. D.

    2013-12-01

    Coastal nations have spent considerable resources exploring the limits of their extended continental shelf (ECS) beyond 200 nm. Although these studies are funded to fulfill requirements of the UN Convention on the Law of the Sea, the investments are producing new data sets in frontier areas of Earth's oceans that will be used to understand, explore, and manage the seafloor and sub-seafloor for decades to come. Although many of these datasets are considered proprietary until a nation's potential ECS has become 'final and binding' an increasing amount of data are being released and utilized by the public. Data sets include multibeam, seismic reflection/refraction, bottom sampling, and geophysical data. The U.S. ECS Project, a multi-agency collaboration whose mission is to establish the full extent of the continental shelf of the United States consistent with international law, relies heavily on data and accurate, standard metadata. The United States has made it a priority to make available to the public all data collected with ECS-funding as quickly as possible. The National Oceanic and Atmospheric Administration's (NOAA) National Geophysical Data Center (NGDC) supports this objective by partnering with academia and other federal government mapping agencies to archive, inventory, and deliver marine mapping data in a coordinated, consistent manner. This includes ensuring quality, standard metadata and developing and maintaining data delivery capabilities built on modern digital data archives. Other countries, such as Ireland, have submitted their ECS data for public availability and many others have made pledges to participate in the future. The data services provided by NGDC support the U.S. ECS effort as well as many developing nation's ECS effort through the U.N. Environmental Program. Modern discovery, visualization, and delivery of scientific data and derived products that span national and international sources of data ensure the greatest re-use of data and

  14. TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild

    KAUST Repository

    Mü ller, Matthias; Bibi, Adel Aamer; Giancola, Silvio; Al-Subaihi, Salman; Ghanem, Bernard

    2018-01-01

    Despite the numerous developments in object tracking, further development of current tracking algorithms is limited by small and mostly saturated datasets. As a matter of fact, data-hungry trackers based on deep-learning currently rely on object detection datasets due to the scarcity of dedicated large-scale tracking datasets. In this work, we present TrackingNet, the first large-scale dataset and benchmark for object tracking in the wild. We provide more than 30K videos with more than 14 million dense bounding box annotations. Our dataset covers a wide selection of object classes in broad and diverse context. By releasing such a large-scale dataset, we expect deep trackers to further improve and generalize. In addition, we introduce a new benchmark composed of 500 novel videos, modeled with a distribution similar to our training dataset. By sequestering the annotation of the test set and providing an online evaluation server, we provide a fair benchmark for future development of object trackers. Deep trackers fine-tuned on a fraction of our dataset improve their performance by up to 1.6% on OTB100 and up to 1.7% on TrackingNet Test. We provide an extensive benchmark on TrackingNet by evaluating more than 20 trackers. Our results suggest that object tracking in the wild is far from being solved.

  15. TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild

    KAUST Repository

    Müller, Matthias

    2018-03-28

    Despite the numerous developments in object tracking, further development of current tracking algorithms is limited by small and mostly saturated datasets. As a matter of fact, data-hungry trackers based on deep-learning currently rely on object detection datasets due to the scarcity of dedicated large-scale tracking datasets. In this work, we present TrackingNet, the first large-scale dataset and benchmark for object tracking in the wild. We provide more than 30K videos with more than 14 million dense bounding box annotations. Our dataset covers a wide selection of object classes in broad and diverse context. By releasing such a large-scale dataset, we expect deep trackers to further improve and generalize. In addition, we introduce a new benchmark composed of 500 novel videos, modeled with a distribution similar to our training dataset. By sequestering the annotation of the test set and providing an online evaluation server, we provide a fair benchmark for future development of object trackers. Deep trackers fine-tuned on a fraction of our dataset improve their performance by up to 1.6% on OTB100 and up to 1.7% on TrackingNet Test. We provide an extensive benchmark on TrackingNet by evaluating more than 20 trackers. Our results suggest that object tracking in the wild is far from being solved.

  16. Sharing Video Datasets in Design Research

    DEFF Research Database (Denmark)

    Christensen, Bo; Abildgaard, Sille Julie Jøhnk

    2017-01-01

    This paper examines how design researchers, design practitioners and design education can benefit from sharing a dataset. We present the Design Thinking Research Symposium 11 (DTRS11) as an exemplary project that implied sharing video data of design processes and design activity in natural settings...... with a large group of fellow academics from the international community of Design Thinking Research, for the purpose of facilitating research collaboration and communication within the field of Design and Design Thinking. This approach emphasizes the social and collaborative aspects of design research, where...... a multitude of appropriate perspectives and methods may be utilized in analyzing and discussing the singular dataset. The shared data is, from this perspective, understood as a design object in itself, which facilitates new ways of working, collaborating, studying, learning and educating within the expanding...

  17. Interpolation of diffusion weighted imaging datasets

    DEFF Research Database (Denmark)

    Dyrby, Tim B; Lundell, Henrik; Burke, Mark W

    2014-01-01

    anatomical details and signal-to-noise-ratio for reliable fibre reconstruction. We assessed the potential benefits of interpolating DWI datasets to a higher image resolution before fibre reconstruction using a diffusion tensor model. Simulations of straight and curved crossing tracts smaller than or equal......Diffusion weighted imaging (DWI) is used to study white-matter fibre organisation, orientation and structural connectivity by means of fibre reconstruction algorithms and tractography. For clinical settings, limited scan time compromises the possibilities to achieve high image resolution for finer...... interpolation methods fail to disentangle fine anatomical details if PVE is too pronounced in the original data. As for validation we used ex-vivo DWI datasets acquired at various image resolutions as well as Nissl-stained sections. Increasing the image resolution by a factor of eight yielded finer geometrical...

  18. Omicseq: a web-based search engine for exploring omics datasets

    Science.gov (United States)

    Sun, Xiaobo; Pittard, William S.; Xu, Tianlei; Chen, Li; Zwick, Michael E.; Jiang, Xiaoqian; Wang, Fusheng

    2017-01-01

    Abstract The development and application of high-throughput genomics technologies has resulted in massive quantities of diverse omics data that continue to accumulate rapidly. These rich datasets offer unprecedented and exciting opportunities to address long standing questions in biomedical research. However, our ability to explore and query the content of diverse omics data is very limited. Existing dataset search tools rely almost exclusively on the metadata. A text-based query for gene name(s) does not work well on datasets wherein the vast majority of their content is numeric. To overcome this barrier, we have developed Omicseq, a novel web-based platform that facilitates the easy interrogation of omics datasets holistically to improve ‘findability’ of relevant data. The core component of Omicseq is trackRank, a novel algorithm for ranking omics datasets that fully uses the numerical content of the dataset to determine relevance to the query entity. The Omicseq system is supported by a scalable and elastic, NoSQL database that hosts a large collection of processed omics datasets. In the front end, a simple, web-based interface allows users to enter queries and instantly receive search results as a list of ranked datasets deemed to be the most relevant. Omicseq is freely available at http://www.omicseq.org. PMID:28402462

  19. Omicseq: a web-based search engine for exploring omics datasets.

    Science.gov (United States)

    Sun, Xiaobo; Pittard, William S; Xu, Tianlei; Chen, Li; Zwick, Michael E; Jiang, Xiaoqian; Wang, Fusheng; Qin, Zhaohui S

    2017-07-03

    The development and application of high-throughput genomics technologies has resulted in massive quantities of diverse omics data that continue to accumulate rapidly. These rich datasets offer unprecedented and exciting opportunities to address long standing questions in biomedical research. However, our ability to explore and query the content of diverse omics data is very limited. Existing dataset search tools rely almost exclusively on the metadata. A text-based query for gene name(s) does not work well on datasets wherein the vast majority of their content is numeric. To overcome this barrier, we have developed Omicseq, a novel web-based platform that facilitates the easy interrogation of omics datasets holistically to improve 'findability' of relevant data. The core component of Omicseq is trackRank, a novel algorithm for ranking omics datasets that fully uses the numerical content of the dataset to determine relevance to the query entity. The Omicseq system is supported by a scalable and elastic, NoSQL database that hosts a large collection of processed omics datasets. In the front end, a simple, web-based interface allows users to enter queries and instantly receive search results as a list of ranked datasets deemed to be the most relevant. Omicseq is freely available at http://www.omicseq.org. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  20. Data assimilation and model evaluation experiment datasets

    Science.gov (United States)

    Lai, Chung-Cheng A.; Qian, Wen; Glenn, Scott M.

    1994-01-01

    The Institute for Naval Oceanography, in cooperation with Naval Research Laboratories and universities, executed the Data Assimilation and Model Evaluation Experiment (DAMEE) for the Gulf Stream region during fiscal years 1991-1993. Enormous effort has gone into the preparation of several high-quality and consistent datasets for model initialization and verification. This paper describes the preparation process, the temporal and spatial scopes, the contents, the structure, etc., of these datasets. The goal of DAMEE and the need of data for the four phases of experiment are briefly stated. The preparation of DAMEE datasets consisted of a series of processes: (1) collection of observational data; (2) analysis and interpretation; (3) interpolation using the Optimum Thermal Interpolation System package; (4) quality control and re-analysis; and (5) data archiving and software documentation. The data products from these processes included a time series of 3D fields of temperature and salinity, 2D fields of surface dynamic height and mixed-layer depth, analysis of the Gulf Stream and rings system, and bathythermograph profiles. To date, these are the most detailed and high-quality data for mesoscale ocean modeling, data assimilation, and forecasting research. Feedback from ocean modeling groups who tested this data was incorporated into its refinement. Suggestions for DAMEE data usages include (1) ocean modeling and data assimilation studies, (2) diagnosis and theoretical studies, and (3) comparisons with locally detailed observations.

  1. A hybrid organic-inorganic perovskite dataset

    Science.gov (United States)

    Kim, Chiho; Huan, Tran Doan; Krishnan, Sridevi; Ramprasad, Rampi

    2017-05-01

    Hybrid organic-inorganic perovskites (HOIPs) have been attracting a great deal of attention due to their versatility of electronic properties and fabrication methods. We prepare a dataset of 1,346 HOIPs, which features 16 organic cations, 3 group-IV cations and 4 halide anions. Using a combination of an atomic structure search method and density functional theory calculations, the optimized structures, the bandgap, the dielectric constant, and the relative energies of the HOIPs are uniformly prepared and validated by comparing with relevant experimental and/or theoretical data. We make the dataset available at Dryad Digital Repository, NoMaD Repository, and Khazana Repository (http://khazana.uconn.edu/), hoping that it could be useful for future data-mining efforts that can explore possible structure-property relationships and phenomenological models. Progressive extension of the dataset is expected as new organic cations become appropriate within the HOIP framework, and as additional properties are calculated for the new compounds found.

  2. The Role of Datasets on Scientific Influence within Conflict Research

    Science.gov (United States)

    Van Holt, Tracy; Johnson, Jeffery C.; Moates, Shiloh; Carley, Kathleen M.

    2016-01-01

    We inductively tested if a coherent field of inquiry in human conflict research emerged in an analysis of published research involving “conflict” in the Web of Science (WoS) over a 66-year period (1945–2011). We created a citation network that linked the 62,504 WoS records and their cited literature. We performed a critical path analysis (CPA), a specialized social network analysis on this citation network (~1.5 million works), to highlight the main contributions in conflict research and to test if research on conflict has in fact evolved to represent a coherent field of inquiry. Out of this vast dataset, 49 academic works were highlighted by the CPA suggesting a coherent field of inquiry; which means that researchers in the field acknowledge seminal contributions and share a common knowledge base. Other conflict concepts that were also analyzed—such as interpersonal conflict or conflict among pharmaceuticals, for example, did not form their own CP. A single path formed, meaning that there was a cohesive set of ideas that built upon previous research. This is in contrast to a main path analysis of conflict from 1957–1971 where ideas didn’t persist in that multiple paths existed and died or emerged reflecting lack of scientific coherence (Carley, Hummon, and Harty, 1993). The critical path consisted of a number of key features: 1) Concepts that built throughout include the notion that resource availability drives conflict, which emerged in the 1960s-1990s and continued on until 2011. More recent intrastate studies that focused on inequalities emerged from interstate studies on the democracy of peace earlier on the path. 2) Recent research on the path focused on forecasting conflict, which depends on well-developed metrics and theories to model. 3) We used keyword analysis to independently show how the CP was topically linked (i.e., through democracy, modeling, resources, and geography). Publically available conflict datasets developed early on helped

  3. The Role of Datasets on Scientific Influence within Conflict Research.

    Directory of Open Access Journals (Sweden)

    Tracy Van Holt

    Full Text Available We inductively tested if a coherent field of inquiry in human conflict research emerged in an analysis of published research involving "conflict" in the Web of Science (WoS over a 66-year period (1945-2011. We created a citation network that linked the 62,504 WoS records and their cited literature. We performed a critical path analysis (CPA, a specialized social network analysis on this citation network (~1.5 million works, to highlight the main contributions in conflict research and to test if research on conflict has in fact evolved to represent a coherent field of inquiry. Out of this vast dataset, 49 academic works were highlighted by the CPA suggesting a coherent field of inquiry; which means that researchers in the field acknowledge seminal contributions and share a common knowledge base. Other conflict concepts that were also analyzed-such as interpersonal conflict or conflict among pharmaceuticals, for example, did not form their own CP. A single path formed, meaning that there was a cohesive set of ideas that built upon previous research. This is in contrast to a main path analysis of conflict from 1957-1971 where ideas didn't persist in that multiple paths existed and died or emerged reflecting lack of scientific coherence (Carley, Hummon, and Harty, 1993. The critical path consisted of a number of key features: 1 Concepts that built throughout include the notion that resource availability drives conflict, which emerged in the 1960s-1990s and continued on until 2011. More recent intrastate studies that focused on inequalities emerged from interstate studies on the democracy of peace earlier on the path. 2 Recent research on the path focused on forecasting conflict, which depends on well-developed metrics and theories to model. 3 We used keyword analysis to independently show how the CP was topically linked (i.e., through democracy, modeling, resources, and geography. Publically available conflict datasets developed early on helped

  4. The Role of Datasets on Scientific Influence within Conflict Research.

    Science.gov (United States)

    Van Holt, Tracy; Johnson, Jeffery C; Moates, Shiloh; Carley, Kathleen M

    2016-01-01

    We inductively tested if a coherent field of inquiry in human conflict research emerged in an analysis of published research involving "conflict" in the Web of Science (WoS) over a 66-year period (1945-2011). We created a citation network that linked the 62,504 WoS records and their cited literature. We performed a critical path analysis (CPA), a specialized social network analysis on this citation network (~1.5 million works), to highlight the main contributions in conflict research and to test if research on conflict has in fact evolved to represent a coherent field of inquiry. Out of this vast dataset, 49 academic works were highlighted by the CPA suggesting a coherent field of inquiry; which means that researchers in the field acknowledge seminal contributions and share a common knowledge base. Other conflict concepts that were also analyzed-such as interpersonal conflict or conflict among pharmaceuticals, for example, did not form their own CP. A single path formed, meaning that there was a cohesive set of ideas that built upon previous research. This is in contrast to a main path analysis of conflict from 1957-1971 where ideas didn't persist in that multiple paths existed and died or emerged reflecting lack of scientific coherence (Carley, Hummon, and Harty, 1993). The critical path consisted of a number of key features: 1) Concepts that built throughout include the notion that resource availability drives conflict, which emerged in the 1960s-1990s and continued on until 2011. More recent intrastate studies that focused on inequalities emerged from interstate studies on the democracy of peace earlier on the path. 2) Recent research on the path focused on forecasting conflict, which depends on well-developed metrics and theories to model. 3) We used keyword analysis to independently show how the CP was topically linked (i.e., through democracy, modeling, resources, and geography). Publically available conflict datasets developed early on helped shape the

  5. Quantifying uncertainty in observational rainfall datasets

    Science.gov (United States)

    Lennard, Chris; Dosio, Alessandro; Nikulin, Grigory; Pinto, Izidine; Seid, Hussen

    2015-04-01

    The CO-ordinated Regional Downscaling Experiment (CORDEX) has to date seen the publication of at least ten journal papers that examine the African domain during 2012 and 2013. Five of these papers consider Africa generally (Nikulin et al. 2012, Kim et al. 2013, Hernandes-Dias et al. 2013, Laprise et al. 2013, Panitz et al. 2013) and five have regional foci: Tramblay et al. (2013) on Northern Africa, Mariotti et al. (2014) and Gbobaniyi el al. (2013) on West Africa, Endris et al. (2013) on East Africa and Kalagnoumou et al. (2013) on southern Africa. There also are a further three papers that the authors know about under review. These papers all use an observed rainfall and/or temperature data to evaluate/validate the regional model output and often proceed to assess projected changes in these variables due to climate change in the context of these observations. The most popular reference rainfall data used are the CRU, GPCP, GPCC, TRMM and UDEL datasets. However, as Kalagnoumou et al. (2013) point out there are many other rainfall datasets available for consideration, for example, CMORPH, FEWS, TAMSAT & RIANNAA, TAMORA and the WATCH & WATCH-DEI data. They, with others (Nikulin et al. 2012, Sylla et al. 2012) show that the observed datasets can have a very wide spread at a particular space-time coordinate. As more ground, space and reanalysis-based rainfall products become available, all which use different methods to produce precipitation data, the selection of reference data is becoming an important factor in model evaluation. A number of factors can contribute to a uncertainty in terms of the reliability and validity of the datasets such as radiance conversion algorithims, the quantity and quality of available station data, interpolation techniques and blending methods used to combine satellite and guage based products. However, to date no comprehensive study has been performed to evaluate the uncertainty in these observational datasets. We assess 18 gridded

  6. Parton Distributions based on a Maximally Consistent Dataset

    Science.gov (United States)

    Rojo, Juan

    2016-04-01

    The choice of data that enters a global QCD analysis can have a substantial impact on the resulting parton distributions and their predictions for collider observables. One of the main reasons for this has to do with the possible presence of inconsistencies, either internal within an experiment or external between different experiments. In order to assess the robustness of the global fit, different definitions of a conservative PDF set, that is, a PDF set based on a maximally consistent dataset, have been introduced. However, these approaches are typically affected by theory biases in the selection of the dataset. In this contribution, after a brief overview of recent NNPDF developments, we propose a new, fully objective, definition of a conservative PDF set, based on the Bayesian reweighting approach. Using the new NNPDF3.0 framework, we produce various conservative sets, which turn out to be mutually in agreement within the respective PDF uncertainties, as well as with the global fit. We explore some of their implications for LHC phenomenology, finding also good consistency with the global fit result. These results provide a non-trivial validation test of the new NNPDF3.0 fitting methodology, and indicate that possible inconsistencies in the fitted dataset do not affect substantially the global fit PDFs.

  7. Kernel-based discriminant feature extraction using a representative dataset

    Science.gov (United States)

    Li, Honglin; Sancho Gomez, Jose-Luis; Ahalt, Stanley C.

    2002-07-01

    Discriminant Feature Extraction (DFE) is widely recognized as an important pre-processing step in classification applications. Most DFE algorithms are linear and thus can only explore the linear discriminant information among the different classes. Recently, there has been several promising attempts to develop nonlinear DFE algorithms, among which is Kernel-based Feature Extraction (KFE). The efficacy of KFE has been experimentally verified by both synthetic data and real problems. However, KFE has some known limitations. First, KFE does not work well for strongly overlapped data. Second, KFE employs all of the training set samples during the feature extraction phase, which can result in significant computation when applied to very large datasets. Finally, KFE can result in overfitting. In this paper, we propose a substantial improvement to KFE that overcomes the above limitations by using a representative dataset, which consists of critical points that are generated from data-editing techniques and centroid points that are determined by using the Frequency Sensitive Competitive Learning (FSCL) algorithm. Experiments show that this new KFE algorithm performs well on significantly overlapped datasets, and it also reduces computational complexity. Further, by controlling the number of centroids, the overfitting problem can be effectively alleviated.

  8. Principal Component Analysis of Process Datasets with Missing Values

    Directory of Open Access Journals (Sweden)

    Kristen A. Severson

    2017-07-01

    Full Text Available Datasets with missing values arising from causes such as sensor failure, inconsistent sampling rates, and merging data from different systems are common in the process industry. Methods for handling missing data typically operate during data pre-processing, but can also occur during model building. This article considers missing data within the context of principal component analysis (PCA, which is a method originally developed for complete data that has widespread industrial application in multivariate statistical process control. Due to the prevalence of missing data and the success of PCA for handling complete data, several PCA algorithms that can act on incomplete data have been proposed. Here, algorithms for applying PCA to datasets with missing values are reviewed. A case study is presented to demonstrate the performance of the algorithms and suggestions are made with respect to choosing which algorithm is most appropriate for particular settings. An alternating algorithm based on the singular value decomposition achieved the best results in the majority of test cases involving process datasets.

  9. Analysis, modeling, and simulation (AMS) testbed development and evaluation to support dynamic mobility applications (DMA) and active transportation and demand management (ATDM) programs — evaluation report for ATDM program. [supporting datasets - Pasadena Testbed

    Science.gov (United States)

    2017-07-26

    This zip file contains POSTDATA.ATT (.ATT); Print to File (.PRN); Portable Document Format (.PDF); and document (.DOCX) files of data to support FHWA-JPO-16-385, Analysis, modeling, and simulation (AMS) testbed development and evaluation to support d...

  10. Strontium removal jar test dataset for all figures and tables.

    Data.gov (United States)

    U.S. Environmental Protection Agency — The datasets where used to generate data to demonstrate strontium removal under various water quality and treatment conditions. This dataset is associated with the...

  11. Predicting dataset popularity for the CMS experiment

    CERN Document Server

    INSPIRE-00005122; Li, Ting; Giommi, Luca; Bonacorsi, Daniele; Wildish, Tony

    2016-01-01

    The CMS experiment at the LHC accelerator at CERN relies on its computing infrastructure to stay at the frontier of High Energy Physics, searching for new phenomena and making discoveries. Even though computing plays a significant role in physics analysis we rarely use its data to predict the system behavior itself. A basic information about computing resources, user activities and site utilization can be really useful for improving the throughput of the system and its management. In this paper, we discuss a first CMS analysis of dataset popularity based on CMS meta-data which can be used as a model for dynamic data placement and provide the foundation of data-driven approach for the CMS computing infrastructure.

  12. Predicting dataset popularity for the CMS experiment

    International Nuclear Information System (INIS)

    Kuznetsov, V.; Li, T.; Giommi, L.; Bonacorsi, D.; Wildish, T.

    2016-01-01

    The CMS experiment at the LHC accelerator at CERN relies on its computing infrastructure to stay at the frontier of High Energy Physics, searching for new phenomena and making discoveries. Even though computing plays a significant role in physics analysis we rarely use its data to predict the system behavior itself. A basic information about computing resources, user activities and site utilization can be really useful for improving the throughput of the system and its management. In this paper, we discuss a first CMS analysis of dataset popularity based on CMS meta-data which can be used as a model for dynamic data placement and provide the foundation of data-driven approach for the CMS computing infrastructure. (paper)

  13. 2006 Fynmeet sea clutter measurement trial: Datasets

    CSIR Research Space (South Africa)

    Herselman, PLR

    2007-09-06

    Full Text Available -011............................................................................................................................................................................................. 25 iii Dataset CAD14-001 0 5 10 15 20 25 30 35 10 20 30 40 50 60 70 80 90 R an ge G at e # Time [s] A bs ol ut e R an ge [m ] RCS [dBm2] vs. time and range for f1 = 9.000 GHz - CAD14-001 2400 2600 2800... 40 10 20 30 40 50 60 70 80 90 R an ge G at e # Time [s] A bs ol ut e R an ge [m ] RCS [dBm2] vs. time and range for f1 = 9.000 GHz - CAD14-002 2400 2600 2800 3000 3200 3400 3600 -30 -25 -20 -15 -10 -5 0 5 10...

  14. A new bed elevation dataset for Greenland

    Directory of Open Access Journals (Sweden)

    J. L. Bamber

    2013-03-01

    Full Text Available We present a new bed elevation dataset for Greenland derived from a combination of multiple airborne ice thickness surveys undertaken between the 1970s and 2012. Around 420 000 line kilometres of airborne data were used, with roughly 70% of this having been collected since the year 2000, when the last comprehensive compilation was undertaken. The airborne data were combined with satellite-derived elevations for non-glaciated terrain to produce a consistent bed digital elevation model (DEM over the entire island including across the glaciated–ice free boundary. The DEM was extended to the continental margin with the aid of bathymetric data, primarily from a compilation for the Arctic. Ice thickness was determined where an ice shelf exists from a combination of surface elevation and radar soundings. The across-track spacing between flight lines warranted interpolation at 1 km postings for significant sectors of the ice sheet. Grids of ice surface elevation, error estimates for the DEM, ice thickness and data sampling density were also produced alongside a mask of land/ocean/grounded ice/floating ice. Errors in bed elevation range from a minimum of ±10 m to about ±300 m, as a function of distance from an observation and local topographic variability. A comparison with the compilation published in 2001 highlights the improvement in resolution afforded by the new datasets, particularly along the ice sheet margin, where ice velocity is highest and changes in ice dynamics most marked. We estimate that the volume of ice included in our land-ice mask would raise mean sea level by 7.36 m, excluding any solid earth effects that would take place during ice sheet decay.

  15. A multimodal dataset for authoring and editing multimedia content: The MAMEM project

    Directory of Open Access Journals (Sweden)

    Spiros Nikolopoulos

    2017-12-01

    Full Text Available We present a dataset that combines multimodal biosignals and eye tracking information gathered under a human-computer interaction framework. The dataset was developed in the vein of the MAMEM project that aims to endow people with motor disabilities with the ability to edit and author multimedia content through mental commands and gaze activity. The dataset includes EEG, eye-tracking, and physiological (GSR and Heart rate signals collected from 34 individuals (18 able-bodied and 16 motor-impaired. Data were collected during the interaction with specifically designed interface for web browsing and multimedia content manipulation and during imaginary movement tasks. The presented dataset will contribute towards the development and evaluation of modern human-computer interaction systems that would foster the integration of people with severe motor impairments back into society.

  16. Wind Integration National Dataset Toolkit | Grid Modernization | NREL

    Science.gov (United States)

    Integration National Dataset Toolkit Wind Integration National Dataset Toolkit The Wind Integration National Dataset (WIND) Toolkit is an update and expansion of the Eastern Wind Integration Data Set and Western Wind Integration Data Set. It supports the next generation of wind integration studies. WIND

  17. Solar Integration National Dataset Toolkit | Grid Modernization | NREL

    Science.gov (United States)

    Solar Integration National Dataset Toolkit Solar Integration National Dataset Toolkit NREL is working on a Solar Integration National Dataset (SIND) Toolkit to enable researchers to perform U.S . regional solar generation integration studies. It will provide modeled, coherent subhourly solar power data

  18. Technical note: An inorganic water chemistry dataset (1972–2011 ...

    African Journals Online (AJOL)

    A national dataset of inorganic chemical data of surface waters (rivers, lakes, and dams) in South Africa is presented and made freely available. The dataset comprises more than 500 000 complete water analyses from 1972 up to 2011, collected from more than 2 000 sample monitoring stations in South Africa. The dataset ...

  19. QSAR ligand dataset for modelling mutagenicity, genotoxicity, and rodent carcinogenicity

    Directory of Open Access Journals (Sweden)

    Davy Guan

    2018-04-01

    Full Text Available Five datasets were constructed from ligand and bioassay result data from the literature. These datasets include bioassay results from the Ames mutagenicity assay, Greenscreen GADD-45a-GFP assay, Syrian Hamster Embryo (SHE assay, and 2 year rat carcinogenicity assay results. These datasets provide information about chemical mutagenicity, genotoxicity and carcinogenicity.

  20. Knowledge Mining from Clinical Datasets Using Rough Sets and Backpropagation Neural Network

    Directory of Open Access Journals (Sweden)

    Kindie Biredagn Nahato

    2015-01-01

    Full Text Available The availability of clinical datasets and knowledge mining methodologies encourages the researchers to pursue research in extracting knowledge from clinical datasets. Different data mining techniques have been used for mining rules, and mathematical models have been developed to assist the clinician in decision making. The objective of this research is to build a classifier that will predict the presence or absence of a disease by learning from the minimal set of attributes that has been extracted from the clinical dataset. In this work rough set indiscernibility relation method with backpropagation neural network (RS-BPNN is used. This work has two stages. The first stage is handling of missing values to obtain a smooth data set and selection of appropriate attributes from the clinical dataset by indiscernibility relation method. The second stage is classification using backpropagation neural network on the selected reducts of the dataset. The classifier has been tested with hepatitis, Wisconsin breast cancer, and Statlog heart disease datasets obtained from the University of California at Irvine (UCI machine learning repository. The accuracy obtained from the proposed method is 97.3%, 98.6%, and 90.4% for hepatitis, breast cancer, and heart disease, respectively. The proposed system provides an effective classification model for clinical datasets.

  1. Scientific Datasets: Discovery and Aggregation for Semantic Interpretation.

    Science.gov (United States)

    Lopez, L. A.; Scott, S.; Khalsa, S. J. S.; Duerr, R.

    2015-12-01

    One of the biggest challenges that interdisciplinary researchers face is finding suitable datasets in order to advance their science; this problem remains consistent across multiple disciplines. A surprising number of scientists, when asked what tool they use for data discovery, reply "Google", which is an acceptable solution in some cases but not even Google can find -or cares to compile- all the data that's relevant for science and particularly geo sciences. If a dataset is not discoverable through a well known search provider it will remain dark data to the scientific world.For the past year, BCube, an EarthCube Building Block project, has been developing, testing and deploying a technology stack capable of data discovery at web-scale using the ultimate dataset: The Internet. This stack has 2 principal components, a web-scale crawling infrastructure and a semantic aggregator. The web-crawler is a modified version of Apache Nutch (the originator of Hadoop and other big data technologies) that has been improved and tailored for data and data service discovery. The second component is semantic aggregation, carried out by a python-based workflow that extracts valuable metadata and stores it in the form of triples through the use semantic technologies.While implementing the BCube stack we have run into several challenges such as a) scaling the project to cover big portions of the Internet at a reasonable cost, b) making sense of very diverse and non-homogeneous data, and lastly, c) extracting facts about these datasets using semantic technologies in order to make them usable for the geosciences community. Despite all these challenges we have proven that we can discover and characterize data that otherwise would have remained in the dark corners of the Internet. Having all this data indexed and 'triplelized' will enable scientists to access a trove of information relevant to their work in a more natural way. An important characteristic of the BCube stack is that all

  2. The Path from Large Earth Science Datasets to Information

    Science.gov (United States)

    Vicente, G. A.

    2013-12-01

    The NASA Goddard Earth Sciences Data (GES) and Information Services Center (DISC) is one of the major Science Mission Directorate (SMD) for archiving and distribution of Earth Science remote sensing data, products and services. This virtual portal provides convenient access to Atmospheric Composition and Dynamics, Hydrology, Precipitation, Ozone, and model derived datasets (generated by GSFC's Global Modeling and Assimilation Office), the North American Land Data Assimilation System (NLDAS) and the Global Land Data Assimilation System (GLDAS) data products (both generated by GSFC's Hydrological Sciences Branch). This presentation demonstrates various tools and computational technologies developed in the GES DISC to manage the huge volume of data and products acquired from various missions and programs over the years. It explores approaches to archive, document, distribute, access and analyze Earth Science data and information as well as addresses the technical and scientific issues, governance and user support problem faced by scientists in need of multi-disciplinary datasets. It also discusses data and product metrics, user distribution profiles and lessons learned through interactions with the science communities around the world. Finally it demonstrates some of the most used data and product visualization and analyses tools developed and maintained by the GES DISC.

  3. A synthetic dataset for evaluating soft and hard fusion algorithms

    Science.gov (United States)

    Graham, Jacob L.; Hall, David L.; Rimland, Jeffrey

    2011-06-01

    There is an emerging demand for the development of data fusion techniques and algorithms that are capable of combining conventional "hard" sensor inputs such as video, radar, and multispectral sensor data with "soft" data including textual situation reports, open-source web information, and "hard/soft" data such as image or video data that includes human-generated annotations. New techniques that assist in sense-making over a wide range of vastly heterogeneous sources are critical to improving tactical situational awareness in counterinsurgency (COIN) and other asymmetric warfare situations. A major challenge in this area is the lack of realistic datasets available for test and evaluation of such algorithms. While "soft" message sets exist, they tend to be of limited use for data fusion applications due to the lack of critical message pedigree and other metadata. They also lack corresponding hard sensor data that presents reasonable "fusion opportunities" to evaluate the ability to make connections and inferences that span the soft and hard data sets. This paper outlines the design methodologies, content, and some potential use cases of a COIN-based synthetic soft and hard dataset created under a United States Multi-disciplinary University Research Initiative (MURI) program funded by the U.S. Army Research Office (ARO). The dataset includes realistic synthetic reports from a variety of sources, corresponding synthetic hard data, and an extensive supporting database that maintains "ground truth" through logical grouping of related data into "vignettes." The supporting database also maintains the pedigree of messages and other critical metadata.

  4. ClimateNet: A Machine Learning dataset for Climate Science Research

    Science.gov (United States)

    Prabhat, M.; Biard, J.; Ganguly, S.; Ames, S.; Kashinath, K.; Kim, S. K.; Kahou, S.; Maharaj, T.; Beckham, C.; O'Brien, T. A.; Wehner, M. F.; Williams, D. N.; Kunkel, K.; Collins, W. D.

    2017-12-01

    Deep Learning techniques have revolutionized commercial applications in Computer vision, speech recognition and control systems. The key for all of these developments was the creation of a curated, labeled dataset ImageNet, for enabling multiple research groups around the world to develop methods, benchmark performance and compete with each other. The success of Deep Learning can be largely attributed to the broad availability of this dataset. Our empirical investigations have revealed that Deep Learning is similarly poised to benefit the task of pattern detection in climate science. Unfortunately, labeled datasets, a key pre-requisite for training, are hard to find. Individual research groups are typically interested in specialized weather patterns, making it hard to unify, and share datasets across groups and institutions. In this work, we are proposing ClimateNet: a labeled dataset that provides labeled instances of extreme weather patterns, as well as associated raw fields in model and observational output. We develop a schema in NetCDF to enumerate weather pattern classes/types, store bounding boxes, and pixel-masks. We are also working on a TensorFlow implementation to natively import such NetCDF datasets, and are providing a reference convolutional architecture for binary classification tasks. Our hope is that researchers in Climate Science, as well as ML/DL, will be able to use (and extend) ClimateNet to make rapid progress in the application of Deep Learning for Climate Science research.

  5. CoVennTree: A new method for the comparative analysis of large datasets

    Directory of Open Access Journals (Sweden)

    Steffen C. Lott

    2015-02-01

    Full Text Available The visualization of massive datasets, such as those resulting from comparative metatranscriptome analyses or the analysis of microbial population structures using ribosomal RNA sequences, is a challenging task. We developed a new method called CoVennTree (Comparative weighted Venn Tree that simultaneously compares up to three multifarious datasets by aggregating and propagating information from the bottom to the top level and produces a graphical output in Cytoscape. With the introduction of weighted Venn structures, the contents and relationships of various datasets can be correlated and simultaneously aggregated without losing information. We demonstrate the suitability of this approach using a dataset of 16S rDNA sequences obtained from microbial populations at three different depths of the Gulf of Aqaba in the Red Sea. CoVennTree has been integrated into the Galaxy ToolShed and can be directly downloaded and integrated into the user instance.

  6. Evaluation of Modified Categorical Data Fuzzy Clustering Algorithm on the Wisconsin Breast Cancer Dataset

    Directory of Open Access Journals (Sweden)

    Amir Ahmad

    2016-01-01

    Full Text Available The early diagnosis of breast cancer is an important step in a fight against the disease. Machine learning techniques have shown promise in improving our understanding of the disease. As medical datasets consist of data points which cannot be precisely assigned to a class, fuzzy methods have been useful for studying of these datasets. Sometimes breast cancer datasets are described by categorical features. Many fuzzy clustering algorithms have been developed for categorical datasets. However, in most of these methods Hamming distance is used to define the distance between the two categorical feature values. In this paper, we use a probabilistic distance measure for the distance computation among a pair of categorical feature values. Experiments demonstrate that the distance measure performs better than Hamming distance for Wisconsin breast cancer data.

  7. Statistical segmentation of multidimensional brain datasets

    Science.gov (United States)

    Desco, Manuel; Gispert, Juan D.; Reig, Santiago; Santos, Andres; Pascau, Javier; Malpica, Norberto; Garcia-Barreno, Pedro

    2001-07-01

    This paper presents an automatic segmentation procedure for MRI neuroimages that overcomes part of the problems involved in multidimensional clustering techniques like partial volume effects (PVE), processing speed and difficulty of incorporating a priori knowledge. The method is a three-stage procedure: 1) Exclusion of background and skull voxels using threshold-based region growing techniques with fully automated seed selection. 2) Expectation Maximization algorithms are used to estimate the probability density function (PDF) of the remaining pixels, which are assumed to be mixtures of gaussians. These pixels can then be classified into cerebrospinal fluid (CSF), white matter and grey matter. Using this procedure, our method takes advantage of using the full covariance matrix (instead of the diagonal) for the joint PDF estimation. On the other hand, logistic discrimination techniques are more robust against violation of multi-gaussian assumptions. 3) A priori knowledge is added using Markov Random Field techniques. The algorithm has been tested with a dataset of 30 brain MRI studies (co-registered T1 and T2 MRI). Our method was compared with clustering techniques and with template-based statistical segmentation, using manual segmentation as a gold-standard. Our results were more robust and closer to the gold-standard.

  8. ASSESSING SMALL SAMPLE WAR-GAMING DATASETS

    Directory of Open Access Journals (Sweden)

    W. J. HURLEY

    2013-10-01

    Full Text Available One of the fundamental problems faced by military planners is the assessment of changes to force structure. An example is whether to replace an existing capability with an enhanced system. This can be done directly with a comparison of measures such as accuracy, lethality, survivability, etc. However this approach does not allow an assessment of the force multiplier effects of the proposed change. To gauge these effects, planners often turn to war-gaming. For many war-gaming experiments, it is expensive, both in terms of time and dollars, to generate a large number of sample observations. This puts a premium on the statistical methodology used to examine these small datasets. In this paper we compare the power of three tests to assess population differences: the Wald-Wolfowitz test, the Mann-Whitney U test, and re-sampling. We employ a series of Monte Carlo simulation experiments. Not unexpectedly, we find that the Mann-Whitney test performs better than the Wald-Wolfowitz test. Resampling is judged to perform slightly better than the Mann-Whitney test.

  9. Overview of the CERES Edition-4 Multilayer Cloud Property Datasets

    Science.gov (United States)

    Chang, F. L.; Minnis, P.; Sun-Mack, S.; Chen, Y.; Smith, R. A.; Brown, R. R.

    2014-12-01

    Knowledge of the cloud vertical distribution is important for understanding the role of clouds on earth's radiation budget and climate change. Since high-level cirrus clouds with low emission temperatures and small optical depths can provide a positive feedback to a climate system and low-level stratus clouds with high emission temperatures and large optical depths can provide a negative feedback effect, the retrieval of multilayer cloud properties using satellite observations, like Terra and Aqua MODIS, is critically important for a variety of cloud and climate applications. For the objective of the Clouds and the Earth's Radiant Energy System (CERES), new algorithms have been developed using Terra and Aqua MODIS data to allow separate retrievals of cirrus and stratus cloud properties when the two dominant cloud types are simultaneously present in a multilayer system. In this paper, we will present an overview of the new CERES Edition-4 multilayer cloud property datasets derived from Terra as well as Aqua. Assessment of the new CERES multilayer cloud datasets will include high-level cirrus and low-level stratus cloud heights, pressures, and temperatures as well as their optical depths, emissivities, and microphysical properties.

  10. ISC-EHB: Reconstruction of a robust earthquake dataset

    Science.gov (United States)

    Weston, J.; Engdahl, E. R.; Harris, J.; Di Giacomo, D.; Storchak, D. A.

    2018-04-01

    The EHB Bulletin of hypocentres and associated travel-time residuals was originally developed with procedures described by Engdahl, Van der Hilst and Buland (1998) and currently ends in 2008. It is a widely used seismological dataset, which is now expanded and reconstructed, partly by exploiting updated procedures at the International Seismological Centre (ISC), to produce the ISC-EHB. The reconstruction begins in the modern period (2000-2013) to which new and more rigorous procedures for event selection, data preparation, processing, and relocation are applied. The selection criteria minimise the location bias produced by unmodelled 3D Earth structure, resulting in events that are relatively well located in any given region. Depths of the selected events are significantly improved by a more comprehensive review of near station and secondary phase travel-time residuals based on ISC data, especially for the depth phases pP, pwP and sP, as well as by a rigorous review of the event depths in subduction zone cross sections. The resulting cross sections and associated maps are shown to provide details of seismicity in subduction zones in much greater detail than previously achievable. The new ISC-EHB dataset will be especially useful for global seismicity studies and high-frequency regional and global tomographic inversions.

  11. Datasets of Odontocete Sounds Annotated for Developing Automatic Detection Methods

    National Research Council Canada - National Science Library

    Mellinger, David K

    2007-01-01

    .... However, baleen whales are only a small fraction of marine mammals, whereas the greatest public concern and possible impact to Navy operations now centers on odontocetes, particularly beaked whales...

  12. The SAIL databank: linking multiple health and social care datasets

    Directory of Open Access Journals (Sweden)

    Ford David V

    2009-01-01

    Full Text Available Abstract Background Vast amounts of data are collected about patients and service users in the course of health and social care service delivery. Electronic data systems for patient records have the potential to revolutionise service delivery and research. But in order to achieve this, it is essential that the ability to link the data at the individual record level be retained whilst adhering to the principles of information governance. The SAIL (Secure Anonymised Information Linkage databank has been established using disparate datasets, and over 500 million records from multiple health and social care service providers have been loaded to date, with further growth in progress. Methods Having established the infrastructure of the databank, the aim of this work was to develop and implement an accurate matching process to enable the assignment of a unique Anonymous Linking Field (ALF to person-based records to make the databank ready for record-linkage research studies. An SQL-based matching algorithm (MACRAL, Matching Algorithm for Consistent Results in Anonymised Linkage was developed for this purpose. Firstly the suitability of using a valid NHS number as the basis of a unique identifier was assessed using MACRAL. Secondly, MACRAL was applied in turn to match primary care, secondary care and social services datasets to the NHS Administrative Register (NHSAR, to assess the efficacy of this process, and the optimum matching technique. Results The validation of using the NHS number yielded specificity values > 99.8% and sensitivity values > 94.6% using probabilistic record linkage (PRL at the 50% threshold, and error rates were Conclusion With the infrastructure that has been put in place, the reliable matching process that has been developed enables an ALF to be consistently allocated to records in the databank. The SAIL databank represents a research-ready platform for record-linkage studies.

  13. A robust post-processing workflow for datasets with motion artifacts in diffusion kurtosis imaging.

    Science.gov (United States)

    Li, Xianjun; Yang, Jian; Gao, Jie; Luo, Xue; Zhou, Zhenyu; Hu, Yajie; Wu, Ed X; Wan, Mingxi

    2014-01-01

    The aim of this study was to develop a robust post-processing workflow for motion-corrupted datasets in diffusion kurtosis imaging (DKI). The proposed workflow consisted of brain extraction, rigid registration, distortion correction, artifacts rejection, spatial smoothing and tensor estimation. Rigid registration was utilized to correct misalignments. Motion artifacts were rejected by using local Pearson correlation coefficient (LPCC). The performance of LPCC in characterizing relative differences between artifacts and artifact-free images was compared with that of the conventional correlation coefficient in 10 randomly selected DKI datasets. The influence of rejected artifacts with information of gradient directions and b values for the parameter estimation was investigated by using mean square error (MSE). The variance of noise was used as the criterion for MSEs. The clinical practicality of the proposed workflow was evaluated by the image quality and measurements in regions of interest on 36 DKI datasets, including 18 artifact-free (18 pediatric subjects) and 18 motion-corrupted datasets (15 pediatric subjects and 3 essential tremor patients). The relative difference between artifacts and artifact-free images calculated by LPCC was larger than that of the conventional correlation coefficient (pworkflow improved the image quality and reduced the measurement biases significantly on motion-corrupted datasets (pworkflow was reliable to improve the image quality and the measurement precision of the derived parameters on motion-corrupted DKI datasets. The workflow provided an effective post-processing method for clinical applications of DKI in subjects with involuntary movements.

  14. NERIES: Seismic Data Gateways and User Composed Datasets Metadata Management

    Science.gov (United States)

    Spinuso, Alessandro; Trani, Luca; Kamb, Linus; Frobert, Laurent

    2010-05-01

    One of the NERIES EC project main objectives is to establish and improve the networking of seismic waveform data exchange and access among four main data centers in Europe: INGV, GFZ, ORFEUS and IPGP. Besides the implementation of the data backbone, several investigations and developments have been conducted in order to offer to the users the data available from this network, either programmatically or interactively. One of the challenges is to understand how to enable users` activities such as discovering, aggregating, describing and sharing datasets to obtain a decrease in the replication of similar data queries towards the network, exempting the data centers to guess and create useful pre-packed products. We`ve started to transfer this task more and more towards the users community, where the users` composed data products could be extensively re-used. The main link to the data is represented by a centralized webservice (SeismoLink) acting like a single access point to the whole data network. Users can download either waveform data or seismic station inventories directly from their own software routines by connecting to this webservice, which routes the request to the data centers. The provenance of the data is maintained and transferred to the users in the form of URIs, that identify the dataset and implicitly refer to the data provider. SeismoLink, combined with other webservices (eg EMSC-QuakeML earthquakes catalog service), is used from a community gateway such as the NERIES web portal (http://www.seismicportal.eu). Here the user interacts with a map based portlet which allows the dynamic composition of a data product, binding seismic event`s parameters with a set of seismic stations. The requested data is collected by the back-end processes of the portal, preserved and offered to the user in a personal data cart, where metadata can be generated interactively on-demand. The metadata, expressed in RDF, can also be remotely ingested. They offer rating

  15. The Dataset of Countries at Risk of Electoral Violence

    OpenAIRE

    Birch, Sarah; Muchlinski, David

    2017-01-01

    Electoral violence is increasingly affecting elections around the world, yet researchers have been limited by a paucity of granular data on this phenomenon. This paper introduces and describes a new dataset of electoral violence – the Dataset of Countries at Risk of Electoral Violence (CREV) – that provides measures of 10 different types of electoral violence across 642 elections held around the globe between 1995 and 2013. The paper provides a detailed account of how and why the dataset was ...

  16. Norwegian Hydrological Reference Dataset for Climate Change Studies

    Energy Technology Data Exchange (ETDEWEB)

    Magnussen, Inger Helene; Killingland, Magnus; Spilde, Dag

    2012-07-01

    Based on the Norwegian hydrological measurement network, NVE has selected a Hydrological Reference Dataset for studies of hydrological change. The dataset meets international standards with high data quality. It is suitable for monitoring and studying the effects of climate change on the hydrosphere and cryosphere in Norway. The dataset includes streamflow, groundwater, snow, glacier mass balance and length change, lake ice and water temperature in rivers and lakes.(Author)

  17. BIA Indian Lands Dataset (Indian Lands of the United States)

    Data.gov (United States)

    Federal Geographic Data Committee — The American Indian Reservations / Federally Recognized Tribal Entities dataset depicts feature location, selected demographics and other associated data for the 561...

  18. Framework for Interactive Parallel Dataset Analysis on the Grid

    Energy Technology Data Exchange (ETDEWEB)

    Alexander, David A.; Ananthan, Balamurali; /Tech-X Corp.; Johnson, Tony; Serbo, Victor; /SLAC

    2007-01-10

    We present a framework for use at a typical Grid site to facilitate custom interactive parallel dataset analysis targeting terabyte-scale datasets of the type typically produced by large multi-institutional science experiments. We summarize the needs for interactive analysis and show a prototype solution that satisfies those needs. The solution consists of desktop client tool and a set of Web Services that allow scientists to sign onto a Grid site, compose analysis script code to carry out physics analysis on datasets, distribute the code and datasets to worker nodes, collect the results back to the client, and to construct professional-quality visualizations of the results.

  19. Socioeconomic Data and Applications Center (SEDAC) Treaty Status Dataset

    Data.gov (United States)

    National Aeronautics and Space Administration — The Socioeconomic Data and Application Center (SEDAC) Treaty Status Dataset contains comprehensive treaty information for multilateral environmental agreements,...

  20. The SAIL databank: linking multiple health and social care datasets.

    Science.gov (United States)

    Lyons, Ronan A; Jones, Kerina H; John, Gareth; Brooks, Caroline J; Verplancke, Jean-Philippe; Ford, David V; Brown, Ginevra; Leake, Ken

    2009-01-16

    Vast amounts of data are collected about patients and service users in the course of health and social care service delivery. Electronic data systems for patient records have the potential to revolutionise service delivery and research. But in order to achieve this, it is essential that the ability to link the data at the individual record level be retained whilst adhering to the principles of information governance. The SAIL (Secure Anonymised Information Linkage) databank has been established using disparate datasets, and over 500 million records from multiple health and social care service providers have been loaded to date, with further growth in progress. Having established the infrastructure of the databank, the aim of this work was to develop and implement an accurate matching process to enable the assignment of a unique Anonymous Linking Field (ALF) to person-based records to make the databank ready for record-linkage research studies. An SQL-based matching algorithm (MACRAL, Matching Algorithm for Consistent Results in Anonymised Linkage) was developed for this purpose. Firstly the suitability of using a valid NHS number as the basis of a unique identifier was assessed using MACRAL. Secondly, MACRAL was applied in turn to match primary care, secondary care and social services datasets to the NHS Administrative Register (NHSAR), to assess the efficacy of this process, and the optimum matching technique. The validation of using the NHS number yielded specificity values > 99.8% and sensitivity values > 94.6% using probabilistic record linkage (PRL) at the 50% threshold, and error rates were SAIL databank represents a research-ready platform for record-linkage studies.

  1. Large Scale Flood Risk Analysis using a New Hyper-resolution Population Dataset

    Science.gov (United States)

    Smith, A.; Neal, J. C.; Bates, P. D.; Quinn, N.; Wing, O.

    2017-12-01

    Here we present the first national scale flood risk analyses, using high resolution Facebook Connectivity Lab population data and data from a hyper resolution flood hazard model. In recent years the field of large scale hydraulic modelling has been transformed by new remotely sensed datasets, improved process representation, highly efficient flow algorithms and increases in computational power. These developments have allowed flood risk analysis to be undertaken in previously unmodeled territories and from continental to global scales. Flood risk analyses are typically conducted via the integration of modelled water depths with an exposure dataset. Over large scales and in data poor areas, these exposure data typically take the form of a gridded population dataset, estimating population density using remotely sensed data and/or locally available census data. The local nature of flooding dictates that for robust flood risk analysis to be undertaken both hazard and exposure data should sufficiently resolve local scale features. Global flood frameworks are enabling flood hazard data to produced at 90m resolution, resulting in a mis-match with available population datasets which are typically more coarsely resolved. Moreover, these exposure data are typically focused on urban areas and struggle to represent rural populations. In this study we integrate a new population dataset with a global flood hazard model. The population dataset was produced by the Connectivity Lab at Facebook, providing gridded population data at 5m resolution, representing a resolution increase over previous countrywide data sets of multiple orders of magnitude. Flood risk analysis undertaken over a number of developing countries are presented, along with a comparison of flood risk analyses undertaken using pre-existing population datasets.

  2. Evolving hard problems: Generating human genetics datasets with a complex etiology

    Directory of Open Access Journals (Sweden)

    Himmelstein Daniel S

    2011-07-01

    Full Text Available Abstract Background A goal of human genetics is to discover genetic factors that influence individuals' susceptibility to common diseases. Most common diseases are thought to result from the joint failure of two or more interacting components instead of single component failures. This greatly complicates both the task of selecting informative genetic variants and the task of modeling interactions between them. We and others have previously developed algorithms to detect and model the relationships between these genetic factors and disease. Previously these methods have been evaluated with datasets simulated according to pre-defined genetic models. Results Here we develop and evaluate a model free evolution strategy to generate datasets which display a complex relationship between individual genotype and disease susceptibility. We show that this model free approach is capable of generating a diverse array of datasets with distinct gene-disease relationships for an arbitrary interaction order and sample size. We specifically generate eight-hundred Pareto fronts; one for each independent run of our algorithm. In each run the predictiveness of single genetic variation and pairs of genetic variants have been minimized, while the predictiveness of third, fourth, or fifth-order combinations is maximized. Two hundred runs of the algorithm are further dedicated to creating datasets with predictive four or five order interactions and minimized lower-level effects. Conclusions This method and the resulting datasets will allow the capabilities of novel methods to be tested without pre-specified genetic models. This allows researchers to evaluate which methods will succeed on human genetics problems where the model is not known in advance. We further make freely available to the community the entire Pareto-optimal front of datasets from each run so that novel methods may be rigorously evaluated. These 76,600 datasets are available from http://discovery.dartmouth.edu/model_free_data/.

  3. An Analysis of the GTZAN Music Genre Dataset

    DEFF Research Database (Denmark)

    Sturm, Bob L.

    2012-01-01

    Most research in automatic music genre recognition has used the dataset assembled by Tzanetakis et al. in 2001. The composition and integrity of this dataset, however, has never been formally analyzed. For the first time, we provide an analysis of its composition, and create a machine...

  4. An Annotated Dataset of 14 Cardiac MR Images

    DEFF Research Database (Denmark)

    Stegmann, Mikkel Bille

    2002-01-01

    This note describes a dataset consisting of 14 annotated cardiac MR images. Points of correspondence are placed on each image at the left ventricle (LV). As such, the dataset can be readily used for building statistical models of shape. Further, format specifications and terms of use are given....

  5. Review of ATLAS Open Data 8 TeV datasets, tools and activities

    CERN Document Server

    The ATLAS collaboration

    2018-01-01

    The ATLAS Collaboration has released two 8 TeV datasets and relevant simulated samples to the public for educational use. A number of groups within ATLAS have used these ATLAS Open Data 8 TeV datasets, developing tools and educational material to promote particle physics. The general aim of these activities is to provide simple and user-friendly interactive interfaces to simulate the procedures used by high-energy physics researchers. International Masterclasses introduce particle physics to high school students and have been studying 8 TeV ATLAS Open Data since 2015. Inspired by this success, a new ATLAS Open Data initiative was launched in 2016 for university students. A comprehensive educational platform was thus developed featuring a second 8 TeV dataset and a new set of educational tools. The 8 TeV datasets and associated tools are presented and discussed here, as well as a selection of activities studying the ATLAS Open Data 8 TeV datasets.

  6. Benchmarking of Typical Meteorological Year datasets dedicated to Concentrated-PV systems

    Science.gov (United States)

    Realpe, Ana Maria; Vernay, Christophe; Pitaval, Sébastien; Blanc, Philippe; Wald, Lucien; Lenoir, Camille

    2016-04-01

    Accurate analysis of meteorological and pyranometric data for long-term analysis is the basis of decision-making for banks and investors, regarding solar energy conversion systems. This has led to the development of methodologies for the generation of Typical Meteorological Years (TMY) datasets. The most used method for solar energy conversion systems was proposed in 1978 by the Sandia Laboratory (Hall et al., 1978) considering a specific weighted combination of different meteorological variables with notably global, diffuse horizontal and direct normal irradiances, air temperature, wind speed, relative humidity. In 2012, a new approach was proposed in the framework of the European project FP7 ENDORSE. It introduced the concept of "driver" that is defined by the user as an explicit function of the pyranometric and meteorological relevant variables to improve the representativeness of the TMY datasets with respect the specific solar energy conversion system of interest. The present study aims at comparing and benchmarking different TMY datasets considering a specific Concentrated-PV (CPV) system as the solar energy conversion system of interest. Using long-term (15+ years) time-series of high quality meteorological and pyranometric ground measurements, three types of TMY datasets generated by the following methods: the Sandia method, a simplified driver with DNI as the only representative variable and a more sophisticated driver. The latter takes into account the sensitivities of the CPV system with respect to the spectral distribution of the solar irradiance and wind speed. Different TMY datasets from the three methods have been generated considering different numbers of years in the historical dataset, ranging from 5 to 15 years. The comparisons and benchmarking of these TMY datasets are conducted considering the long-term time series of simulated CPV electric production as a reference. The results of this benchmarking clearly show that the Sandia method is not

  7. A dataset on tail risk of commodities markets.

    Science.gov (United States)

    Powell, Robert J; Vo, Duc H; Pham, Thach N; Singh, Abhay K

    2017-12-01

    This article contains the datasets related to the research article "The long and short of commodity tails and their relationship to Asian equity markets"(Powell et al., 2017) [1]. The datasets contain the daily prices (and price movements) of 24 different commodities decomposed from the S&P GSCI index and the daily prices (and price movements) of three share market indices including World, Asia, and South East Asia for the period 2004-2015. Then, the dataset is divided into annual periods, showing the worst 5% of price movements for each year. The datasets are convenient to examine the tail risk of different commodities as measured by Conditional Value at Risk (CVaR) as well as their changes over periods. The datasets can also be used to investigate the association between commodity markets and share markets.

  8. MANAGEMENT AND COMPARATIVE ANALYSIS OF DATASET ENSEMBLES

    Energy Technology Data Exchange (ETDEWEB)

    Geveci, Berk [Senior Director, Scientific Computing

    2010-05-17

    The primary Phase I technical objective was to develop a prototype that demonstrates the functionality of all components required for an end-to-end meta-data management and comparative visualization system.

  9. PUBLIC ACCESS TO PRIVATE LAND IN SCOTLAND

    African Journals Online (AJOL)

    David

    The Forum's work led to advice to the Scottish Government concluding that ... respect of the facilitation and upholding of access rights. This is ...... particular protection of a person's rights in respect of private and family life and the home under ... jurisprudence emphasising the necessity of a balancing-of-interests approach.

  10. 20 CFR 655.350 - Public access.

    Science.gov (United States)

    2010-04-01

    ... FOREIGN WORKERS IN THE UNITED STATES Attestations by Facilities Using Nonimmigrant Aliens as Registered... available to any interested parties within 72 hours upon written or oral request. If a party requests a copy...

  11. State Wildlife Management Area Boundaries - Publicly Accessible

    Data.gov (United States)

    Minnesota Department of Natural Resources — This polygon theme contains boundaries for approximately 1392 Wildlife Management Areas (WMAs) across the state covering nearly 1,288,000 acres. WMAs are part of the...

  12. Automatic registration method for multisensor datasets adopted for dimensional measurements on cutting tools

    International Nuclear Information System (INIS)

    Shaw, L; Mehari, F; Weckenmann, A; Ettl, S; Häusler, G

    2013-01-01

    Multisensor systems with optical 3D sensors are frequently employed to capture complete surface information by measuring workpieces from different views. During coarse and fine registration the resulting datasets are afterward transformed into one common coordinate system. Automatic fine registration methods are well established in dimensional metrology, whereas there is a deficit in automatic coarse registration methods. The advantage of a fully automatic registration procedure is twofold: it enables a fast and contact-free alignment and further a flexible application to datasets of any kind of optical 3D sensor. In this paper, an algorithm adapted for a robust automatic coarse registration is presented. The method was originally developed for the field of object reconstruction or localization. It is based on a segmentation of planes in the datasets to calculate the transformation parameters. The rotation is defined by the normals of three corresponding segmented planes of two overlapping datasets, while the translation is calculated via the intersection point of the segmented planes. First results have shown that the translation is strongly shape dependent: 3D data of objects with non-orthogonal planar flanks cannot be registered with the current method. In the novel supplement for the algorithm, the translation is additionally calculated via the distance between centroids of corresponding segmented planes, which results in more than one option for the transformation. A newly introduced measure considering the distance between the datasets after coarse registration evaluates the best possible transformation. Results of the robust automatic registration method are presented on the example of datasets taken from a cutting tool with a fringe-projection system and a focus-variation system. The successful application in dimensional metrology is proven with evaluations of shape parameters based on the registered datasets of a calibrated workpiece. (paper)

  13. Medical Image Data and Datasets in the Era of Machine Learning-Whitepaper from the 2016 C-MIMI Meeting Dataset Session.

    Science.gov (United States)

    Kohli, Marc D; Summers, Ronald M; Geis, J Raymond

    2017-08-01

    At the first annual Conference on Machine Intelligence in Medical Imaging (C-MIMI), held in September 2016, a conference session on medical image data and datasets for machine learning identified multiple issues. The common theme from attendees was that everyone participating in medical image evaluation with machine learning is data starved. There is an urgent need to find better ways to collect, annotate, and reuse medical imaging data. Unique domain issues with medical image datasets require further study, development, and dissemination of best practices and standards, and a coordinated effort among medical imaging domain experts, medical imaging informaticists, government and industry data scientists, and interested commercial, academic, and government entities. High-level attributes of reusable medical image datasets suitable to train, test, validate, verify, and regulate ML products should be better described. NIH and other government agencies should promote and, where applicable, enforce, access to medical image datasets. We should improve communication among medical imaging domain experts, medical imaging informaticists, academic clinical and basic science researchers, government and industry data scientists, and interested commercial entities.

  14. Discovery and Reuse of Open Datasets: An Exploratory Study

    Directory of Open Access Journals (Sweden)

    Sara

    2016-07-01

    Full Text Available Objective: This article analyzes twenty cited or downloaded datasets and the repositories that house them, in order to produce insights that can be used by academic libraries to encourage discovery and reuse of research data in institutional repositories. Methods: Using Thomson Reuters’ Data Citation Index and repository download statistics, we identified twenty cited/downloaded datasets. We documented the characteristics of the cited/downloaded datasets and their corresponding repositories in a self-designed rubric. The rubric includes six major categories: basic information; funding agency and journal information; linking and sharing; factors to encourage reuse; repository characteristics; and data description. Results: Our small-scale study suggests that cited/downloaded datasets generally comply with basic recommendations for facilitating reuse: data are documented well; formatted for use with a variety of software; and shared in established, open access repositories. Three significant factors also appear to contribute to dataset discovery: publishing in discipline-specific repositories; indexing in more than one location on the web; and using persistent identifiers. The cited/downloaded datasets in our analysis came from a few specific disciplines, and tended to be funded by agencies with data publication mandates. Conclusions: The results of this exploratory research provide insights that can inform academic librarians as they work to encourage discovery and reuse of institutional datasets. Our analysis also suggests areas in which academic librarians can target open data advocacy in their communities in order to begin to build open data success stories that will fuel future advocacy efforts.

  15. Sparse Group Penalized Integrative Analysis of Multiple Cancer Prognosis Datasets

    Science.gov (United States)

    Liu, Jin; Huang, Jian; Xie, Yang; Ma, Shuangge

    2014-01-01

    SUMMARY In cancer research, high-throughput profiling studies have been extensively conducted, searching for markers associated with prognosis. Because of the “large d, small n” characteristic, results generated from the analysis of a single dataset can be unsatisfactory. Recent studies have shown that integrative analysis, which simultaneously analyzes multiple datasets, can be more effective than single-dataset analysis and classic meta-analysis. In most of existing integrative analysis, the homogeneity model has been assumed, which postulates that different datasets share the same set of markers. Several approaches have been designed to reinforce this assumption. In practice, different datasets may differ in terms of patient selection criteria, profiling techniques, and many other aspects. Such differences may make the homogeneity model too restricted. In this study, we assume the heterogeneity model, under which different datasets are allowed to have different sets of markers. With multiple cancer prognosis datasets, we adopt the AFT (accelerated failure time) model to describe survival. This model may have the lowest computational cost among popular semiparametric survival models. For marker selection, we adopt a sparse group MCP (minimax concave penalty) approach. This approach has an intuitive formulation and can be computed using an effective group coordinate descent algorithm. Simulation study shows that it outperforms the existing approaches under both the homogeneity and heterogeneity models. Data analysis further demonstrates the merit of heterogeneity model and proposed approach. PMID:23938111

  16. PROVIDING GEOGRAPHIC DATASETS AS LINKED DATA IN SDI

    Directory of Open Access Journals (Sweden)

    E. Hietanen

    2016-06-01

    Full Text Available In this study, a prototype service to provide data from Web Feature Service (WFS as linked data is implemented. At first, persistent and unique Uniform Resource Identifiers (URI are created to all spatial objects in the dataset. The objects are available from those URIs in Resource Description Framework (RDF data format. Next, a Web Ontology Language (OWL ontology is created to describe the dataset information content using the Open Geospatial Consortium’s (OGC GeoSPARQL vocabulary. The existing data model is modified in order to take into account the linked data principles. The implemented service produces an HTTP response dynamically. The data for the response is first fetched from existing WFS. Then the Geographic Markup Language (GML format output of the WFS is transformed on-the-fly to the RDF format. Content Negotiation is used to serve the data in different RDF serialization formats. This solution facilitates the use of a dataset in different applications without replicating the whole dataset. In addition, individual spatial objects in the dataset can be referred with URIs. Furthermore, the needed information content of the objects can be easily extracted from the RDF serializations available from those URIs. A solution for linking data objects to the dataset URI is also introduced by using the Vocabulary of Interlinked Datasets (VoID. The dataset is divided to the subsets and each subset is given its persistent and unique URI. This enables the whole dataset to be explored with a web browser and all individual objects to be indexed by search engines.

  17. Leveraging multiple datasets for deep leaf counting

    OpenAIRE

    Dobrescu, Andrei; Giuffrida, Mario Valerio; Tsaftaris, Sotirios A

    2017-01-01

    The number of leaves a plant has is one of the key traits (phenotypes) describing its development and growth. Here, we propose an automated, deep learning based approach for counting leaves in model rosette plants. While state-of-the-art results on leaf counting with deep learning methods have recently been reported, they obtain the count as a result of leaf segmentation and thus require per-leaf (instance) segmentation to train the models (a rather strong annotation). Instead, our method tre...

  18. Inquiring with Geoscience Datasets: Instruction and Assessment

    Science.gov (United States)

    Zalles, D.; Quellmalz, E.; Gobert, J.

    2005-12-01

    This session will describe a new NSF-funded project in Geoscience education, Inquiring with Geoscience Data Sets. The goals of the project are to (1) Study the impacts on student learning of Web-based supplementary curriculum modules that engage secondary-level students in inquiry projects addressing important geoscience problems using an Earth System Science approach. Students will use technologies to access real data sets in the geosciences and to interpret, analyze, and communicate findings based on the data sets. The standards addressed will include geoscience concepts, inquiry abilities in NSES and Benchmarks for Science Literacy, data literacy, NCTM standards, and 21st-century skills and technology proficiencies (NETTS/ISTE). (2) Develop design principles, specification templates, and prototype exemplars for technology-based performance assessments that provide evidence of students' geoscientific knowledge and inquiry skills (including data literacy skills) and students' ability to access, use, analyze, and interpret technology-based geoscience data sets. (3) Develop scenarios based on the specification templates that describe curriculum modules and performance assessments that could be developed for other Earth Science standards and curriculum programs. Also to be described in the session are the project's efforts to differentiate among the dimensions of data literacy and scientific inquiry that are relevant for the geoscience discplines, and how recognition and awareness of the differences can be effectively channelled for the betterment of geoscience education.

  19. Elsevier’s approach to the bioCADDIE 2016 Dataset Retrieval Challenge

    Science.gov (United States)

    Scerri, Antony; Kuriakose, John; Deshmane, Amit Ajit; Stanger, Mark; Moore, Rebekah; Naik, Raj; de Waard, Anita

    2017-01-01

    Abstract We developed a two-stream, Apache Solr-based information retrieval system in response to the bioCADDIE 2016 Dataset Retrieval Challenge. One stream was based on the principle of word embeddings, the other was rooted in ontology based indexing. Despite encountering several issues in the data, the evaluation procedure and the technologies used, the system performed quite well. We provide some pointers towards future work: in particular, we suggest that more work in query expansion could benefit future biomedical search engines. Database URL: https://data.mendeley.com/datasets/zd9dxpyybg/1 PMID:29220454

  20. Tension in the recent Type Ia supernovae datasets

    International Nuclear Information System (INIS)

    Wei, Hao

    2010-01-01

    In the present work, we investigate the tension in the recent Type Ia supernovae (SNIa) datasets Constitution and Union. We show that they are in tension not only with the observations of the cosmic microwave background (CMB) anisotropy and the baryon acoustic oscillations (BAO), but also with other SNIa datasets such as Davis and SNLS. Then, we find the main sources responsible for the tension. Further, we make this more robust by employing the method of random truncation. Based on the results of this work, we suggest two truncated versions of the Union and Constitution datasets, namely the UnionT and ConstitutionT SNIa samples, whose behaviors are more regular.

  1. Immersive Interaction, Manipulation and Analysis of Large 3D Datasets for Planetary and Earth Sciences

    Science.gov (United States)

    Pariser, O.; Calef, F.; Manning, E. M.; Ardulov, V.

    2017-12-01

    We will present implementation and study of several use-cases of utilizing Virtual Reality (VR) for immersive display, interaction and analysis of large and complex 3D datasets. These datasets have been acquired by the instruments across several Earth, Planetary and Solar Space Robotics Missions. First, we will describe the architecture of the common application framework that was developed to input data, interface with VR display devices and program input controllers in various computing environments. Tethered and portable VR technologies will be contrasted and advantages of each highlighted. We'll proceed to presenting experimental immersive analytics visual constructs that enable augmentation of 3D datasets with 2D ones such as images and statistical and abstract data. We will conclude by presenting comparative analysis with traditional visualization applications and share the feedback provided by our users: scientists and engineers.

  2. Visual Comparison of Multiple Gene Expression Datasets in a Genomic Context

    Directory of Open Access Journals (Sweden)

    Borowski Krzysztof

    2008-06-01

    Full Text Available The need for novel methods of visualizing microarray data is growing. New perspectives are beneficial to finding patterns in expression data. The Bluejay genome browser provides an integrative way of visualizing gene expression datasets in a genomic context. We have now developed the functionality to display multiple microarray datasets simultaneously in Bluejay, in order to provide researchers with a comprehensive view of their datasets linked to a graphical representation of gene function. This will enable biologists to obtain valuable insights on expression patterns, by allowing them to analyze the expression values in relation to the gene locations as well as to compare expression profiles of related genomes or of di erent experiments for the same genome.

  3. Check your biosignals here: a new dataset for off-the-person ECG biometrics.

    Science.gov (United States)

    da Silva, Hugo Plácido; Lourenço, André; Fred, Ana; Raposo, Nuno; Aires-de-Sousa, Marta

    2014-02-01

    The Check Your Biosignals Here initiative (CYBHi) was developed as a way of creating a dataset and consistently repeatable acquisition framework, to further extend research in electrocardiographic (ECG) biometrics. In particular, our work targets the novel trend towards off-the-person data acquisition, which opens a broad new set of challenges and opportunities both for research and industry. While datasets with ECG signals collected using medical grade equipment at the chest can be easily found, for off-the-person ECG data the solution is generally for each team to collect their own corpus at considerable expense of resources. In this paper we describe the context, experimental considerations, methods, and preliminary findings of two public datasets created by our team, one for short-term and another for long-term assessment, with ECG data collected at the hand palms and fingers. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  4. Long Term Cloud Property Datasets From MODIS and AVHRR Using the CERES Cloud Algorithm

    Science.gov (United States)

    Minnis, Patrick; Bedka, Kristopher M.; Doelling, David R.; Sun-Mack, Sunny; Yost, Christopher R.; Trepte, Qing Z.; Bedka, Sarah T.; Palikonda, Rabindra; Scarino, Benjamin R.; Chen, Yan; hide

    2015-01-01

    Cloud properties play a critical role in climate change. Monitoring cloud properties over long time periods is needed to detect changes and to validate and constrain models. The Clouds and the Earth's Radiant Energy System (CERES) project has developed several cloud datasets from Aqua and Terra MODIS data to better interpret broadband radiation measurements and improve understanding of the role of clouds in the radiation budget. The algorithms applied to MODIS data have been adapted to utilize various combinations of channels on the Advanced Very High Resolution Radiometer (AVHRR) on the long-term time series of NOAA and MetOp satellites to provide a new cloud climate data record. These datasets can be useful for a variety of studies. This paper presents results of the MODIS and AVHRR analyses covering the period from 1980-2014. Validation and comparisons with other datasets are also given.

  5. Synthetic ALSPAC longitudinal datasets for the Big Data VR project [version 1; referees: 2 approved

    Directory of Open Access Journals (Sweden)

    Demetris Avraam

    2017-08-01

    Full Text Available Three synthetic datasets - of observation size 15,000, 155,000 and 1,555,000 participants, respectively - were created by simulating eleven cardiac and anthropometric variables from nine collection ages of the ALSAPC birth cohort study. The synthetic datasets retain similar data properties to the ALSPAC study data they are simulated from (co-variance matrices, as well as the mean and variance values of the variables without including the original data itself or disclosing participant information.  In this instance, the three synthetic datasets have been utilised in an academia-industry collaboration to build a prototype virtual reality data analysis software, but they could have a broader use in method and software development projects where sensitive data cannot be freely shared.

  6. RE-Europe, a large-scale dataset for modeling a highly renewable European electricity system

    Science.gov (United States)

    Jensen, Tue V.; Pinson, Pierre

    2017-11-01

    Future highly renewable energy systems will couple to complex weather and climate dynamics. This coupling is generally not captured in detail by the open models developed in the power and energy system communities, where such open models exist. To enable modeling such a future energy system, we describe a dedicated large-scale dataset for a renewable electric power system. The dataset combines a transmission network model, as well as information for generation and demand. Generation includes conventional generators with their technical and economic characteristics, as well as weather-driven forecasts and corresponding realizations for renewable energy generation for a period of 3 years. These may be scaled according to the envisioned degrees of renewable penetration in a future European energy system. The spatial coverage, completeness and resolution of this dataset, open the door to the evaluation, scaling analysis and replicability check of a wealth of proposals in, e.g., market design, network actor coordination and forecasting of renewable power generation.

  7. Correction of elevation offsets in multiple co-located lidar datasets

    Science.gov (United States)

    Thompson, David M.; Dalyander, P. Soupy; Long, Joseph W.; Plant, Nathaniel G.

    2017-04-07

    IntroductionTopographic elevation data collected with airborne light detection and ranging (lidar) can be used to analyze short- and long-term changes to beach and dune systems. Analysis of multiple lidar datasets at Dauphin Island, Alabama, revealed systematic, island-wide elevation differences on the order of 10s of centimeters (cm) that were not attributable to real-world change and, therefore, were likely to represent systematic sampling offsets. These offsets vary between the datasets, but appear spatially consistent within a given survey. This report describes a method that was developed to identify and correct offsets between lidar datasets collected over the same site at different times so that true elevation changes over time, associated with sediment accumulation or erosion, can be analyzed.

  8. RE-Europe, a large-scale dataset for modeling a highly renewable European electricity system.

    Science.gov (United States)

    Jensen, Tue V; Pinson, Pierre

    2017-11-28

    Future highly renewable energy systems will couple to complex weather and climate dynamics. This coupling is generally not captured in detail by the open models developed in the power and energy system communities, where such open models exist. To enable modeling such a future energy system, we describe a dedicated large-scale dataset for a renewable electric power system. The dataset combines a transmission network model, as well as information for generation and demand. Generation includes conventional generators with their technical and economic characteristics, as well as weather-driven forecasts and corresponding realizations for renewable energy generation for a period of 3 years. These may be scaled according to the envisioned degrees of renewable penetration in a future European energy system. The spatial coverage, completeness and resolution of this dataset, open the door to the evaluation, scaling analysis and replicability check of a wealth of proposals in, e.g., market design, network actor coordination and forecasting of renewable power generation.

  9. The Problem with Big Data: Operating on Smaller Datasets to Bridge the Implementation Gap.

    Science.gov (United States)

    Mann, Richard P; Mushtaq, Faisal; White, Alan D; Mata-Cervantes, Gabriel; Pike, Tom; Coker, Dalton; Murdoch, Stuart; Hiles, Tim; Smith, Clare; Berridge, David; Hinchliffe, Suzanne; Hall, Geoff; Smye, Stephen; Wilkie, Richard M; Lodge, J Peter A; Mon-Williams, Mark

    2016-01-01

    Big datasets have the potential to revolutionize public health. However, there is a mismatch between the political and scientific optimism surrounding big data and the public's perception of its benefit. We suggest a systematic and concerted emphasis on developing models derived from smaller datasets to illustrate to the public how big data can produce tangible benefits in the long term. In order to highlight the immediate value of a small data approach, we produced a proof-of-concept model predicting hospital length of stay. The results demonstrate that existing small datasets can be used to create models that generate a reasonable prediction, facilitating health-care delivery. We propose that greater attention (and funding) needs to be directed toward the utilization of existing information resources in parallel with current efforts to create and exploit "big data."

  10. Valuation of large variable annuity portfolios: Monte Carlo simulation and synthetic datasets

    Directory of Open Access Journals (Sweden)

    Gan Guojun

    2017-12-01

    Full Text Available Metamodeling techniques have recently been proposed to address the computational issues related to the valuation of large portfolios of variable annuity contracts. However, it is extremely diffcult, if not impossible, for researchers to obtain real datasets frominsurance companies in order to test their metamodeling techniques on such real datasets and publish the results in academic journals. To facilitate the development and dissemination of research related to the effcient valuation of large variable annuity portfolios, this paper creates a large synthetic portfolio of variable annuity contracts based on the properties of real portfolios of variable annuities and implements a simple Monte Carlo simulation engine for valuing the synthetic portfolio. In addition, this paper presents fair market values and Greeks for the synthetic portfolio of variable annuity contracts that are important quantities for managing the financial risks associated with variable annuities. The resulting datasets can be used by researchers to test and compare the performance of various metamodeling techniques.

  11. Dataset of the relationship between unconfined compressive strength and tensile strength of rock mass

    International Nuclear Information System (INIS)

    Sugita, Yutaka; Yui, Mikazu

    2002-02-01

    This report summary the dataset of the relationship between unconfined compressive strength and tensile strength of the rock mass described in supporting report 2; repository design and engineering technology of second progress report (H12 report) on research and development for the geological disposal of HLW in Japan. (author)

  12. The Changing Shape of Global Inequality 1820--2000; Exploring a New Dataset

    NARCIS (Netherlands)

    van Zanden, Jan Luiten|info:eu-repo/dai/nl/071115374; Baten, Joerg; Foldvari, Peter|info:eu-repo/dai/nl/323382045; van Leeuwen, Bas|info:eu-repo/dai/nl/330811924

    2014-01-01

    new dataset for charting the development of global inequality between 1820 and 2000 is presented, based on a large variety of sources and methods for estimating (gross household) income inequality. On this basis we estimate the evolution of global income inequality over the past two centuries. Two

  13. Use of Maple Seeding Canopy Reflectance Dataset for Validation of SART/LEAFMOD Radiative Transfer Model

    Science.gov (United States)

    Bond, Barbara J.; Peterson, David L.

    1999-01-01

    This project was a collaborative effort by researchers at ARC, OSU and the University of Arizona. The goal was to use a dataset obtained from a previous study to "empirically validate a new canopy radiative-transfer model (SART) which incorporates a recently-developed leaf-level model (LEAFMOD)". The document includes a short research summary.

  14. Large datasets: Segmentation, feature extraction, and compression

    Energy Technology Data Exchange (ETDEWEB)

    Downing, D.J.; Fedorov, V.; Lawkins, W.F.; Morris, M.D.; Ostrouchov, G.

    1996-07-01

    Large data sets with more than several mission multivariate observations (tens of megabytes or gigabytes of stored information) are difficult or impossible to analyze with traditional software. The amount of output which must be scanned quickly dilutes the ability of the investigator to confidently identify all the meaningful patterns and trends which may be present. The purpose of this project is to develop both a theoretical foundation and a collection of tools for automated feature extraction that can be easily customized to specific applications. Cluster analysis techniques are applied as a final step in the feature extraction process, which helps make data surveying simple and effective.

  15. Forest restoration: a global dataset for biodiversity and vegetation structure.

    Science.gov (United States)

    Crouzeilles, Renato; Ferreira, Mariana S; Curran, Michael

    2016-08-01

    Restoration initiatives are becoming increasingly applied around the world. Billions of dollars have been spent on ecological restoration research and initiatives, but restoration outcomes differ widely among these initiatives in part due to variable socioeconomic and ecological contexts. Here, we present the most comprehensive dataset gathered to date on forest restoration. It encompasses 269 primary studies across 221 study landscapes in 53 countries and contains 4,645 quantitative comparisons between reference ecosystems (e.g., old-growth forest) and degraded or restored ecosystems for five taxonomic groups (mammals, birds, invertebrates, herpetofauna, and plants) and five measures of vegetation structure reflecting different ecological processes (cover, density, height, biomass, and litter). We selected studies that (1) were conducted in forest ecosystems; (2) had multiple replicate sampling sites to measure indicators of biodiversity and/or vegetation structure in reference and restored and/or degraded ecosystems; and (3) used less-disturbed forests as a reference to the ecosystem under study. We recorded (1) latitude and longitude; (2) study year; (3) country; (4) biogeographic realm; (5) past disturbance type; (6) current disturbance type; (7) forest conversion class; (8) restoration activity; (9) time that a system has been disturbed; (10) time elapsed since restoration started; (11) ecological metric used to assess biodiversity; and (12) quantitative value of the ecological metric of biodiversity and/or vegetation structure for reference and restored and/or degraded ecosystems. These were the most common data available in the selected studies. We also estimated forest cover and configuration in each study landscape using a recently developed 1 km consensus land cover dataset. We measured forest configuration as the (1) mean size of all forest patches; (2) size of the largest forest patch; and (3) edge:area ratio of forest patches. Global analyses of the

  16. Handling limited datasets with neural networks in medical applications: A small-data approach.

    Science.gov (United States)

    Shaikhina, Torgyn; Khovanova, Natalia A

    2017-01-01

    Single-centre studies in medical domain are often characterised by limited samples due to the complexity and high costs of patient data collection. Machine learning methods for regression modelling of small datasets (less than 10 observations per predictor variable) remain scarce. Our work bridges this gap by developing a novel framework for application of artificial neural networks (NNs) for regression tasks involving small medical datasets. In order to address the sporadic fluctuations and validation issues that appear in regression NNs trained on small datasets, the method of multiple runs and surrogate data analysis were proposed in this work. The approach was compared to the state-of-the-art ensemble NNs; the effect of dataset size on NN performance was also investigated. The proposed framework was applied for the prediction of compressive strength (CS) of femoral trabecular bone in patients suffering from severe osteoarthritis. The NN model was able to estimate the CS of osteoarthritic trabecular bone from its structural and biological properties with a standard error of 0.85MPa. When evaluated on independent test samples, the NN achieved accuracy of 98.3%, outperforming an ensemble NN model by 11%. We reproduce this result on CS data of another porous solid (concrete) and demonstrate that the proposed framework allows for an NN modelled with as few as 56 samples to generalise on 300 independent test samples with 86.5% accuracy, which is comparable to the performance of an NN developed with 18 times larger dataset (1030 samples). The significance of this work is two-fold: the practical application allows for non-destructive prediction of bone fracture risk, while the novel methodology extends beyond the task considered in this study and provides a general framework for application of regression NNs to medical problems characterised by limited dataset sizes. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  17. Sigma-2 receptor ligands QSAR model dataset

    Directory of Open Access Journals (Sweden)

    Antonio Rescifina

    2017-08-01

    Full Text Available The data have been obtained from the Sigma-2 Receptor Selective Ligands Database (S2RSLDB and refined according to the QSAR requirements. These data provide information about a set of 548 Sigma-2 (σ2 receptor ligands selective over Sigma-1 (σ1 receptor. The development of the QSAR model has been undertaken with the use of CORAL software using SMILES, molecular graphs and hybrid descriptors (SMILES and graph together. Data here reported include the regression for σ2 receptor pKi QSAR models. The QSAR model was also employed to predict the σ2 receptor pKi values of the FDA approved drugs that are herewith included.

  18. Transition to Operations Plans for GPM Datasets

    Science.gov (United States)

    Zavodsky, Bradley; Jedlovec, Gary; Case, Jonathan; Leroy, Anita; Molthan, Andrew; Bell, Jordan; Fuell, Kevin; Stano, Geoffrey

    2013-01-01

    Founded in 2002 at the National Space Science Technology Center at Marshall Space Flight Center in Huntsville, AL. Focused on transitioning unique NASA and NOAA observations and research capabilities to the operational weather community to improve short-term weather forecasts on a regional and local scale. NASA directed funding; NOAA funding from Proving Grounds (PG). Demonstrate capabilities experimental products to weather applications and societal benefit to prepare forecasters for the use of data from next generation of operational satellites. Objective of this poster is to highlight SPoRT's research to operations (R2O) paradigm and provide examples of work done by the team with legacy instruments relevant to GPM in order to promote collaborations with groups developing GPM products.

  19. Dataset definition for CMS operations and physics analyses

    Science.gov (United States)

    Franzoni, Giovanni; Compact Muon Solenoid Collaboration

    2016-04-01

    Data recorded at the CMS experiment are funnelled into streams, integrated in the HLT menu, and further organised in a hierarchical structure of primary datasets and secondary datasets/dedicated skims. Datasets are defined according to the final-state particles reconstructed by the high level trigger, the data format and the use case (physics analysis, alignment and calibration, performance studies). During the first LHC run, new workflows have been added to this canonical scheme, to exploit at best the flexibility of the CMS trigger and data acquisition systems. The concepts of data parking and data scouting have been introduced to extend the physics reach of CMS, offering the opportunity of defining physics triggers with extremely loose selections (e.g. dijet resonance trigger collecting data at a 1 kHz). In this presentation, we review the evolution of the dataset definition during the LHC run I, and we discuss the plans for the run II.

  20. U.S. Climate Divisional Dataset (Version Superseded)

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This data has been superseded by a newer version of the dataset. Please refer to NOAA's Climate Divisional Database for more information. The U.S. Climate Divisional...

  1. Karna Particle Size Dataset for Tables and Figures

    Data.gov (United States)

    U.S. Environmental Protection Agency — This dataset contains 1) table of bulk Pb-XAS LCF results, 2) table of bulk As-XAS LCF results, 3) figure data of particle size distribution, and 4) figure data for...

  2. NOAA Global Surface Temperature Dataset, Version 4.0

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The NOAA Global Surface Temperature Dataset (NOAAGlobalTemp) is derived from two independent analyses: the Extended Reconstructed Sea Surface Temperature (ERSST)...

  3. National Hydrography Dataset (NHD) - USGS National Map Downloadable Data Collection

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — The USGS National Hydrography Dataset (NHD) Downloadable Data Collection from The National Map (TNM) is a comprehensive set of digital spatial data that encodes...

  4. Watershed Boundary Dataset (WBD) - USGS National Map Downloadable Data Collection

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — The Watershed Boundary Dataset (WBD) from The National Map (TNM) defines the perimeter of drainage areas formed by the terrain and other landscape characteristics....

  5. BASE MAP DATASET, LE FLORE COUNTY, OKLAHOMA, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme, orthographic...

  6. USGS National Hydrography Dataset from The National Map

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — USGS The National Map - National Hydrography Dataset (NHD) is a comprehensive set of digital spatial data that encodes information about naturally occurring and...

  7. A robust dataset-agnostic heart disease classifier from Phonocardiogram.

    Science.gov (United States)

    Banerjee, Rohan; Dutta Choudhury, Anirban; Deshpande, Parijat; Bhattacharya, Sakyajit; Pal, Arpan; Mandana, K M

    2017-07-01

    Automatic classification of normal and abnormal heart sounds is a popular area of research. However, building a robust algorithm unaffected by signal quality and patient demography is a challenge. In this paper we have analysed a wide list of Phonocardiogram (PCG) features in time and frequency domain along with morphological and statistical features to construct a robust and discriminative feature set for dataset-agnostic classification of normal and cardiac patients. The large and open access database, made available in Physionet 2016 challenge was used for feature selection, internal validation and creation of training models. A second dataset of 41 PCG segments, collected using our in-house smart phone based digital stethoscope from an Indian hospital was used for performance evaluation. Our proposed methodology yielded sensitivity and specificity scores of 0.76 and 0.75 respectively on the test dataset in classifying cardiovascular diseases. The methodology also outperformed three popular prior art approaches, when applied on the same dataset.

  8. AFSC/REFM: Seabird Necropsy dataset of North Pacific

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The seabird necropsy dataset contains information on seabird specimens that were collected under salvage and scientific collection permits primarily by...

  9. Dataset definition for CMS operations and physics analyses

    CERN Document Server

    AUTHOR|(CDS)2051291

    2016-01-01

    Data recorded at the CMS experiment are funnelled into streams, integrated in the HLT menu, and further organised in a hierarchical structure of primary datasets, secondary datasets, and dedicated skims. Datasets are defined according to the final-state particles reconstructed by the high level trigger, the data format and the use case (physics analysis, alignment and calibration, performance studies). During the first LHC run, new workflows have been added to this canonical scheme, to exploit at best the flexibility of the CMS trigger and data acquisition systems. The concept of data parking and data scouting have been introduced to extend the physics reach of CMS, offering the opportunity of defining physics triggers with extremely loose selections (e.g. dijet resonance trigger collecting data at a 1 kHz). In this presentation, we review the evolution of the dataset definition during the first run, and we discuss the plans for the second LHC run.

  10. USGS National Boundary Dataset (NBD) Downloadable Data Collection

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — The USGS Governmental Unit Boundaries dataset from The National Map (TNM) represents major civil areas for the Nation, including States or Territories, counties (or...

  11. Environmental Dataset Gateway (EDG) CS-W Interface

    Data.gov (United States)

    U.S. Environmental Protection Agency — Use the Environmental Dataset Gateway (EDG) to find and access EPA's environmental resources. Many options are available for easily reusing EDG content in other...

  12. Global Man-made Impervious Surface (GMIS) Dataset From Landsat

    Data.gov (United States)

    National Aeronautics and Space Administration — The Global Man-made Impervious Surface (GMIS) Dataset From Landsat consists of global estimates of fractional impervious cover derived from the Global Land Survey...

  13. A Comparative Analysis of Classification Algorithms on Diverse Datasets

    Directory of Open Access Journals (Sweden)

    M. Alghobiri

    2018-04-01

    Full Text Available Data mining involves the computational process to find patterns from large data sets. Classification, one of the main domains of data mining, involves known structure generalizing to apply to a new dataset and predict its class. There are various classification algorithms being used to classify various data sets. They are based on different methods such as probability, decision tree, neural network, nearest neighbor, boolean and fuzzy logic, kernel-based etc. In this paper, we apply three diverse classification algorithms on ten datasets. The datasets have been selected based on their size and/or number and nature of attributes. Results have been discussed using some performance evaluation measures like precision, accuracy, F-measure, Kappa statistics, mean absolute error, relative absolute error, ROC Area etc. Comparative analysis has been carried out using the performance evaluation measures of accuracy, precision, and F-measure. We specify features and limitations of the classification algorithms for the diverse nature datasets.

  14. Newton SSANTA Dr Water using POU filters dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — This dataset contains information about all the features extracted from the raw data files, the formulas that were assigned to some of these features, and the...

  15. Estimating parameters for probabilistic linkage of privacy-preserved datasets.

    Science.gov (United States)

    Brown, Adrian P; Randall, Sean M; Ferrante, Anna M; Semmens, James B; Boyd, James H

    2017-07-10

    Probabilistic record linkage is a process used to bring together person-based records from within the same dataset (de-duplication) or from disparate datasets using pairwise comparisons and matching probabilities. The linkage strategy and associated match probabilities are often estimated through investigations into data quality and manual inspection. However, as privacy-preserved datasets comprise encrypted data, such methods are not possible. In this paper, we present a method for estimating the probabilities and threshold values for probabilistic privacy-preserved record linkage using Bloom filters. Our method was tested through a simulation study using synthetic data, followed by an application using real-world administrative data. Synthetic datasets were generated with error rates from zero to 20% error. Our method was used to estimate parameters (probabilities and thresholds) for de-duplication linkages. Linkage quality was determined by F-measure. Each dataset was privacy-preserved using separate Bloom filters for each field. Match probabilities were estimated using the expectation-maximisation (EM) algorithm on the privacy-preserved data. Threshold cut-off values were determined by an extension to the EM algorithm allowing linkage quality to be estimated for each possible threshold. De-duplication linkages of each privacy-preserved dataset were performed using both estimated and calculated probabilities. Linkage quality using the F-measure at the estimated threshold values was also compared to the highest F-measure. Three large administrative datasets were used to demonstrate the applicability of the probability and threshold estimation technique on real-world data. Linkage of the synthetic datasets using the estimated probabilities produced an F-measure that was comparable to the F-measure using calculated probabilities, even with up to 20% error. Linkage of the administrative datasets using estimated probabilities produced an F-measure that was higher

  16. Toward computational cumulative biology by combining models of biological datasets.

    Science.gov (United States)

    Faisal, Ali; Peltonen, Jaakko; Georgii, Elisabeth; Rung, Johan; Kaski, Samuel

    2014-01-01

    A main challenge of data-driven sciences is how to make maximal use of the progressively expanding databases of experimental datasets in order to keep research cumulative. We introduce the idea of a modeling-based dataset retrieval engine designed for relating a researcher's experimental dataset to earlier work in the field. The search is (i) data-driven to enable new findings, going beyond the state of the art of keyword searches in annotations, (ii) modeling-driven, to include both biological knowledge and insights learned from data, and (iii) scalable, as it is accomplished without building one unified grand model of all data. Assuming each dataset has been modeled beforehand, by the researchers or automatically by database managers, we apply a rapidly computable and optimizable combination model to decompose a new dataset into contributions from earlier relevant models. By using the data-driven decomposition, we identify a network of interrelated datasets from a large annotated human gene expression atlas. While tissue type and disease were major driving forces for determining relevant datasets, the found relationships were richer, and the model-based search was more accurate than the keyword search; moreover, it recovered biologically meaningful relationships that are not straightforwardly visible from annotations-for instance, between cells in different developmental stages such as thymocytes and T-cells. Data-driven links and citations matched to a large extent; the data-driven links even uncovered corrections to the publication data, as two of the most linked datasets were not highly cited and turned out to have wrong publication entries in the database.

  17. Testing the Neutral Theory of Biodiversity with Human Microbiome Datasets

    OpenAIRE

    Li, Lianwei; Ma, Zhanshan (Sam)

    2016-01-01

    The human microbiome project (HMP) has made it possible to test important ecological theories for arguably the most important ecosystem to human health?the human microbiome. Existing limited number of studies have reported conflicting evidence in the case of the neutral theory; the present study aims to comprehensively test the neutral theory with extensive HMP datasets covering all five major body sites inhabited by the human microbiome. Utilizing 7437 datasets of bacterial community samples...

  18. General Purpose Multimedia Dataset - GarageBand 2008

    DEFF Research Database (Denmark)

    Meng, Anders

    This document describes a general purpose multimedia data-set to be used in cross-media machine learning problems. In more detail we describe the genre taxonomy applied at http://www.garageband.com, from where the data-set was collected, and how the taxonomy have been fused into a more human...... understandable taxonomy. Finally, a description of various features extracted from both the audio and text are presented....

  19. Artificial intelligence (AI) systems for interpreting complex medical datasets.

    Science.gov (United States)

    Altman, R B

    2017-05-01

    Advances in machine intelligence have created powerful capabilities in algorithms that find hidden patterns in data, classify objects based on their measured characteristics, and associate similar patients/diseases/drugs based on common features. However, artificial intelligence (AI) applications in medical data have several technical challenges: complex and heterogeneous datasets, noisy medical datasets, and explaining their output to users. There are also social challenges related to intellectual property, data provenance, regulatory issues, economics, and liability. © 2017 ASCPT.

  20. Public Access to Digital Material; A Call to Researchers: Digital Libraries Need Collaboration across Disciplines; Greenstone: Open-Source Digital Library Software; Retrieval Issues for the Colorado Digitization Project's Heritage Database; Report on the 5th European Conference on Digital Libraries, ECDL 2001; Report on the First Joint Conference on Digital Libraries.

    Science.gov (United States)

    Kahle, Brewster; Prelinger, Rick; Jackson, Mary E.; Boyack, Kevin W.; Wylie, Brian N.; Davidson, George S.; Witten, Ian H.; Bainbridge, David; Boddie, Stefan J.; Garrison, William A.; Cunningham, Sally Jo; Borgman, Christine L.; Hessel, Heather

    2001-01-01

    These six articles discuss various issues relating to digital libraries. Highlights include public access to digital materials; intellectual property concerns; the need for collaboration across disciplines; Greenstone software for construction and presentation of digital information collections; the Colorado Digitization Project; and conferences…

  1. Heuristics for Relevancy Ranking of Earth Dataset Search Results

    Science.gov (United States)

    Lynnes, Christopher; Quinn, Patrick; Norton, James

    2016-01-01

    As the Variety of Earth science datasets increases, science researchers find it more challenging to discover and select the datasets that best fit their needs. The most common way of search providers to address this problem is to rank the datasets returned for a query by their likely relevance to the user. Large web page search engines typically use text matching supplemented with reverse link counts, semantic annotations and user intent modeling. However, this produces uneven results when applied to dataset metadata records simply externalized as a web page. Fortunately, data and search provides have decades of experience in serving data user communities, allowing them to form heuristics that leverage the structure in the metadata together with knowledge about the user community. Some of these heuristics include specific ways of matching the user input to the essential measurements in the dataset and determining overlaps of time range and spatial areas. Heuristics based on the novelty of the datasets can prioritize later, better versions of data over similar predecessors. And knowledge of how different user types and communities use data can be brought to bear in cases where characteristics of the user (discipline, expertise) or their intent (applications, research) can be divined. The Earth Observing System Data and Information System has begun implementing some of these heuristics in the relevancy algorithm of its Common Metadata Repository search engine.

  2. Standardization of GIS datasets for emergency preparedness of NPPs

    International Nuclear Information System (INIS)

    Saindane, Shashank S.; Suri, M.M.K.; Otari, Anil; Pradeepkumar, K.S.

    2012-01-01

    Probability of a major nuclear accident which can lead to large scale release of radioactivity into environment is extremely small by the incorporation of safety systems and defence-in-depth philosophy. Nevertheless emergency preparedness for implementation of counter measures to reduce the consequences are required for all major nuclear facilities. Iodine prophylaxis, Sheltering, evacuation etc. are protective measures to be implemented for members of public in the unlikely event of any significant releases from nuclear facilities. Bhabha Atomic Research Centre has developed a GIS supported Nuclear Emergency Preparedness Program. Preparedness for Response to Nuclear emergencies needs geographical details of the affected locations specially Nuclear Power Plant Sites and nearby public domain. Geographical information system data sets which the planners are looking for will have appropriate details in order to take decision and mobilize the resources in time and follow the Standard Operating Procedures. Maps are 2-dimensional representations of our real world and GIS makes it possible to manipulate large amounts of geo-spatially referenced data and convert it into information. This has become an integral part of the nuclear emergency preparedness and response planning. This GIS datasets consisting of layers such as village settlements, roads, hospitals, police stations, shelters etc. is standardized and effectively used during the emergency. The paper focuses on the need of standardization of GIS datasets which in turn can be used as a tool to display and evaluate the impact of standoff distances and selected zones in community planning. It will also highlight the database specifications which will help in fast processing of data and analysis to derive useful and helpful information. GIS has the capability to store, manipulate, analyze and display the large amount of required spatial and tabular data. This study intends to carry out a proper response and preparedness

  3. Datasets collected in general practice: an international comparison using the example of obesity.

    Science.gov (United States)

    Sturgiss, Elizabeth; van Boven, Kees

    2018-06-04

    be able to partake in these kinds of comparison studies. What are the implications for practitioners? Australian primary care researchers and clinicians will be at a disadvantage in any international collaboration if they are unable to accurately describe current general practice management. The Netherlands has developed an impressive dataset that requires within-consultation data collection. These datasets allow for person-centred, symptom-specific, longitudinal understanding of general practice management. The possibilities for the quasi-experimental questions that can be answered with such a dataset are limitless. It is only with the ability to answer clinically driven questions that are relevant to primary care that the clinical care of patients can be measured, developed and improved.

  4. Statistically and Computationally Efficient Estimating Equations for Large Spatial Datasets

    KAUST Repository

    Sun, Ying

    2014-11-07

    For Gaussian process models, likelihood based methods are often difficult to use with large irregularly spaced spatial datasets, because exact calculations of the likelihood for n observations require O(n3) operations and O(n2) memory. Various approximation methods have been developed to address the computational difficulties. In this paper, we propose new unbiased estimating equations based on score equation approximations that are both computationally and statistically efficient. We replace the inverse covariance matrix that appears in the score equations by a sparse matrix to approximate the quadratic forms, then set the resulting quadratic forms equal to their expected values to obtain unbiased estimating equations. The sparse matrix is constructed by a sparse inverse Cholesky approach to approximate the inverse covariance matrix. The statistical efficiency of the resulting unbiased estimating equations are evaluated both in theory and by numerical studies. Our methods are applied to nearly 90,000 satellite-based measurements of water vapor levels over a region in the Southeast Pacific Ocean.

  5. Statistically and Computationally Efficient Estimating Equations for Large Spatial Datasets

    KAUST Repository

    Sun, Ying; Stein, Michael L.

    2014-01-01

    For Gaussian process models, likelihood based methods are often difficult to use with large irregularly spaced spatial datasets, because exact calculations of the likelihood for n observations require O(n3) operations and O(n2) memory. Various approximation methods have been developed to address the computational difficulties. In this paper, we propose new unbiased estimating equations based on score equation approximations that are both computationally and statistically efficient. We replace the inverse covariance matrix that appears in the score equations by a sparse matrix to approximate the quadratic forms, then set the resulting quadratic forms equal to their expected values to obtain unbiased estimating equations. The sparse matrix is constructed by a sparse inverse Cholesky approach to approximate the inverse covariance matrix. The statistical efficiency of the resulting unbiased estimating equations are evaluated both in theory and by numerical studies. Our methods are applied to nearly 90,000 satellite-based measurements of water vapor levels over a region in the Southeast Pacific Ocean.

  6. Automated Fault Interpretation and Extraction using Improved Supplementary Seismic Datasets

    Science.gov (United States)

    Bollmann, T. A.; Shank, R.

    2017-12-01

    During the interpretation of seismic volumes, it is necessary to interpret faults along with horizons of interest. With the improvement of technology, the interpretation of faults can be expedited with the aid of different algorithms that create supplementary seismic attributes, such as semblance and coherency. These products highlight discontinuities, but still need a large amount of human interaction to interpret faults and are plagued by noise and stratigraphic discontinuities. Hale (2013) presents a method to improve on these datasets by creating what is referred to as a Fault Likelihood volume. In general, these volumes contain less noise and do not emphasize stratigraphic features. Instead, planar features within a specified strike and dip range are highlighted. Once a satisfactory Fault Likelihood Volume is created, extraction of fault surfaces is much easier. The extracted fault surfaces are then exported to interpretation software for QC. Numerous software packages have implemented this methodology with varying results. After investigating these platforms, we developed a preferred Automated Fault Interpretation workflow.

  7. Annotating spatio-temporal datasets for meaningful analysis in the Web

    Science.gov (United States)

    Stasch, Christoph; Pebesma, Edzer; Scheider, Simon

    2014-05-01

    More and more environmental datasets that vary in space and time are available in the Web. This comes along with an advantage of using the data for other purposes than originally foreseen, but also with the danger that users may apply inappropriate analysis procedures due to lack of important assumptions made during the data collection process. In order to guide towards a meaningful (statistical) analysis of spatio-temporal datasets available in the Web, we have developed a Higher-Order-Logic formalism that captures some relevant assumptions in our previous work [1]. It allows to proof on meaningful spatial prediction and aggregation in a semi-automated fashion. In this poster presentation, we will present a concept for annotating spatio-temporal datasets available in the Web with concepts defined in our formalism. Therefore, we have defined a subset of the formalism as a Web Ontology Language (OWL) pattern. It allows capturing the distinction between the different spatio-temporal variable types, i.e. point patterns, fields, lattices and trajectories, that in turn determine whether a particular dataset can be interpolated or aggregated in a meaningful way using a certain procedure. The actual annotations that link spatio-temporal datasets with the concepts in the ontology pattern are provided as Linked Data. In order to allow data producers to add the annotations to their datasets, we have implemented a Web portal that uses a triple store at the backend to store the annotations and to make them available in the Linked Data cloud. Furthermore, we have implemented functions in the statistical environment R to retrieve the RDF annotations and, based on these annotations, to support a stronger typing of spatio-temporal datatypes guiding towards a meaningful analysis in R. [1] Stasch, C., Scheider, S., Pebesma, E., Kuhn, W. (2014): "Meaningful spatial prediction and aggregation", Environmental Modelling & Software, 51, 149-165.

  8. Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Metadata, Usage Metrics, and User Feedback to Improve Data Discovery and Access

    Data.gov (United States)

    National Aeronautics and Space Administration — We propose to mine and utilize the combination of Earth Science dataset, metadata with usage metrics and user feedback to objectively extract relevance for improved...

  9. ProDaMa: an open source Python library to generate protein structure datasets.

    Science.gov (United States)

    Armano, Giuliano; Manconi, Andrea

    2009-10-02

    The huge difference between the number of known sequences and known tertiary structures has justified the use of automated methods for protein analysis. Although a general methodology to solve these problems has not been yet devised, researchers are engaged in developing more accurate techniques and algorithms whose training plays a relevant role in determining their performance. From this perspective, particular importance is given to the training data used in experiments, and researchers are often engaged in the generation of specialized datasets that meet their requirements. To facilitate the task of generating specialized datasets we devised and implemented ProDaMa, an open source Python library than provides classes for retrieving, organizing, updating, analyzing, and filtering protein data. ProDaMa has been used to generate specialized datasets useful for secondary structure prediction and to develop a collaborative web application aimed at generating and sharing protein structure datasets. The library, the related database, and the documentation are freely available at the URL http://iasc.diee.unica.it/prodama.

  10. ProDaMa: an open source Python library to generate protein structure datasets

    Directory of Open Access Journals (Sweden)

    Manconi Andrea

    2009-10-01

    Full Text Available Abstract Background The huge difference between the number of known sequences and known tertiary structures has justified the use of automated methods for protein analysis. Although a general methodology to solve these problems has not been yet devised, researchers are engaged in developing more accurate techniques and algorithms whose training plays a relevant role in determining their performance. From this perspective, particular importance is given to the training data used in experiments, and researchers are often engaged in the generation of specialized datasets that meet their requirements. Findings To facilitate the task of generating specialized datasets we devised and implemented ProDaMa, an open source Python library than provides classes for retrieving, organizing, updating, analyzing, and filtering protein data. Conclusion ProDaMa has been used to generate specialized datasets useful for secondary structure prediction and to develop a collaborative web application aimed at generating and sharing protein structure datasets. The library, the related database, and the documentation are freely available at the URL http://iasc.diee.unica.it/prodama.

  11. Technical note: Space-time analysis of rainfall extremes in Italy: clues from a reconciled dataset

    Science.gov (United States)

    Libertino, Andrea; Ganora, Daniele; Claps, Pierluigi

    2018-05-01

    Like other Mediterranean areas, Italy is prone to the development of events with significant rainfall intensity, lasting for several hours. The main triggering mechanisms of these events are quite well known, but the aim of developing rainstorm hazard maps compatible with their actual probability of occurrence is still far from being reached. A systematic frequency analysis of these occasional highly intense events would require a complete countrywide dataset of sub-daily rainfall records, but this kind of information was still lacking for the Italian territory. In this work several sources of data are gathered, for assembling the first comprehensive and updated dataset of extreme rainfall of short duration in Italy. The resulting dataset, referred to as the Italian Rainfall Extreme Dataset (I-RED), includes the annual maximum rainfalls recorded in 1 to 24 consecutive hours from more than 4500 stations across the country, spanning the period between 1916 and 2014. A detailed description of the spatial and temporal coverage of the I-RED is presented, together with an exploratory statistical analysis aimed at providing preliminary information on the climatology of extreme rainfall at the national scale. Due to some legal restrictions, the database can be provided only under certain conditions. Taking into account the potentialities emerging from the analysis, a description of the ongoing and planned future work activities on the database is provided.

  12. EEG datasets for motor imagery brain-computer interface.

    Science.gov (United States)

    Cho, Hohyun; Ahn, Minkyu; Ahn, Sangtae; Kwon, Moonyoung; Jun, Sung Chan

    2017-07-01

    Most investigators of brain-computer interface (BCI) research believe that BCI can be achieved through induced neuronal activity from the cortex, but not by evoked neuronal activity. Motor imagery (MI)-based BCI is one of the standard concepts of BCI, in that the user can generate induced activity by imagining motor movements. However, variations in performance over sessions and subjects are too severe to overcome easily; therefore, a basic understanding and investigation of BCI performance variation is necessary to find critical evidence of performance variation. Here we present not only EEG datasets for MI BCI from 52 subjects, but also the results of a psychological and physiological questionnaire, EMG datasets, the locations of 3D EEG electrodes, and EEGs for non-task-related states. We validated our EEG datasets by using the percentage of bad trials, event-related desynchronization/synchronization (ERD/ERS) analysis, and classification analysis. After conventional rejection of bad trials, we showed contralateral ERD and ipsilateral ERS in the somatosensory area, which are well-known patterns of MI. Finally, we showed that 73.08% of datasets (38 subjects) included reasonably discriminative information. Our EEG datasets included the information necessary to determine statistical significance; they consisted of well-discriminated datasets (38 subjects) and less-discriminative datasets. These may provide researchers with opportunities to investigate human factors related to MI BCI performance variation, and may also achieve subject-to-subject transfer by using metadata, including a questionnaire, EEG coordinates, and EEGs for non-task-related states. © The Authors 2017. Published by Oxford University Press.

  13. Error characterisation of global active and passive microwave soil moisture datasets

    Directory of Open Access Journals (Sweden)

    W. A. Dorigo

    2010-12-01

    datasets are uncorrelated the errors estimated for the remote sensing products are hardly influenced by the choice of the third independent dataset. The results obtained in this study can help us in developing adequate strategies for the combined use of various scatterometer and radiometer-based soil moisture datasets, e.g. for improved flood forecast modelling or the generation of superior multi-mission long-term soil moisture datasets.

  14. Harnessing Connectivity in a Large-Scale Small-Molecule Sensitivity Dataset | Office of Cancer Genomics

    Science.gov (United States)

    Identifying genetic alterations that prime a cancer cell to respond to a particular therapeutic agent can facilitate the development of precision cancer medicines. Cancer cell-line (CCL) profiling of small-molecule sensitivity has emerged as an unbiased method to assess the relationships between genetic or cellular features of CCLs and small-molecule response. Here, we developed annotated cluster multidimensional enrichment analysis to explore the associations between groups of small molecules and groups of CCLs in a new, quantitative sensitivity dataset.

  15. Macroeconomic dataset for generating macroeconomic volatility among selected countries in the Asia Pacific region

    OpenAIRE

    Chow, Yee Peng; Muhammad, Junaina; Amin Noordin, Bany Ariffin; Cheng, Fan Fah

    2017-01-01

    This data article provides macroeconomic data that can be used to generate macroeconomic volatility. The data cover a sample of seven selected countries in the Asia Pacific region for the period 2004–2014, including both developing and developed countries. This dataset was generated to enhance our understanding of the sources of macroeconomic volatility affecting the countries in this region. Although the Asia Pacific region continues to remain as the most dynamic part of the world's economy,...

  16. On the visualization of water-related big data: extracting insights from drought proxies' datasets

    Science.gov (United States)

    Diaz, Vitali; Corzo, Gerald; van Lanen, Henny A. J.; Solomatine, Dimitri

    2017-04-01

    Big data is a growing area of science where hydroinformatics can benefit largely. There have been a number of important developments in the area of data science aimed at analysis of large datasets. Such datasets related to water include measurements, simulations, reanalysis, scenario analyses and proxies. By convention, information contained in these databases is referred to a specific time and a space (i.e., longitude/latitude). This work is motivated by the need to extract insights from large water-related datasets, i.e., transforming large amounts of data into useful information that helps to better understand of water-related phenomena, particularly about drought. In this context, data visualization, part of data science, involves techniques to create and to communicate data by encoding it as visual graphical objects. They may help to better understand data and detect trends. Base on existing methods of data analysis and visualization, this work aims to develop tools for visualizing water-related large datasets. These tools were developed taking advantage of existing libraries for data visualization into a group of graphs which include both polar area diagrams (PADs) and radar charts (RDs). In both graphs, time steps are represented by the polar angles and the percentages of area in drought by the radios. For illustration, three large datasets of drought proxies are chosen to identify trends, prone areas and spatio-temporal variability of drought in a set of case studies. The datasets are (1) SPI-TS2p1 (1901-2002, 11.7 GB), (2) SPI-PRECL0p5 (1948-2016, 7.91 GB) and (3) SPEI-baseV2.3 (1901-2013, 15.3 GB). All of them are on a monthly basis and with a spatial resolution of 0.5 degrees. First two were retrieved from the repository of the International Research Institute for Climate and Society (IRI). They are included into the Analyses Standardized Precipitation Index (SPI) project (iridl.ldeo.columbia.edu/SOURCES/.IRI/.Analyses/.SPI/). The third dataset was

  17. Wind and wave dataset for Matara, Sri Lanka

    Science.gov (United States)

    Luo, Yao; Wang, Dongxiao; Priyadarshana Gamage, Tilak; Zhou, Fenghua; Madusanka Widanage, Charith; Liu, Taiwei

    2018-01-01

    We present a continuous in situ hydro-meteorology observational dataset from a set of instruments first deployed in December 2012 in the south of Sri Lanka, facing toward the north Indian Ocean. In these waters, simultaneous records of wind and wave data are sparse due to difficulties in deploying measurement instruments, although the area hosts one of the busiest shipping lanes in the world. This study describes the survey, deployment, and measurements of wind and waves, with the aim of offering future users of the dataset the most comprehensive and as much information as possible. This dataset advances our understanding of the nearshore hydrodynamic processes and wave climate, including sea waves and swells, in the north Indian Ocean. Moreover, it is a valuable resource for ocean model parameterization and validation. The archived dataset (Table 1) is examined in detail, including wave data at two locations with water depths of 20 and 10 m comprising synchronous time series of wind, ocean astronomical tide, air pressure, etc. In addition, we use these wave observations to evaluate the ERA-Interim reanalysis product. Based on Buoy 2 data, the swells are the main component of waves year-round, although monsoons can markedly alter the proportion between swell and wind sea. The dataset (Luo et al., 2017) is publicly available from Science Data Bank (https://doi.org/10.11922/sciencedb.447).

  18. The LANDFIRE Refresh strategy: updating the national dataset

    Science.gov (United States)

    Nelson, Kurtis J.; Connot, Joel A.; Peterson, Birgit E.; Martin, Charley

    2013-01-01

    The LANDFIRE Program provides comprehensive vegetation and fuel datasets for the entire United States. As with many large-scale ecological datasets, vegetation and landscape conditions must be updated periodically to account for disturbances, growth, and natural succession. The LANDFIRE Refresh effort was the first attempt to consistently update these products nationwide. It incorporated a combination of specific systematic improvements to the original LANDFIRE National data, remote sensing based disturbance detection methods, field collected disturbance information, vegetation growth and succession modeling, and vegetation transition processes. This resulted in the creation of two complete datasets for all 50 states: LANDFIRE Refresh 2001, which includes the systematic improvements, and LANDFIRE Refresh 2008, which includes the disturbance and succession updates to the vegetation and fuel data. The new datasets are comparable for studying landscape changes in vegetation type and structure over a decadal period, and provide the most recent characterization of fuel conditions across the country. The applicability of the new layers is discussed and the effects of using the new fuel datasets are demonstrated through a fire behavior modeling exercise using the 2011 Wallow Fire in eastern Arizona as an example.

  19. Interactive visualization and analysis of multimodal datasets for surgical applications.

    Science.gov (United States)

    Kirmizibayrak, Can; Yim, Yeny; Wakid, Mike; Hahn, James

    2012-12-01

    Surgeons use information from multiple sources when making surgical decisions. These include volumetric datasets (such as CT, PET, MRI, and their variants), 2D datasets (such as endoscopic videos), and vector-valued datasets (such as computer simulations). Presenting all the information to the user in an effective manner is a challenging problem. In this paper, we present a visualization approach that displays the information from various sources in a single coherent view. The system allows the user to explore and manipulate volumetric datasets, display analysis of dataset values in local regions, combine 2D and 3D imaging modalities and display results of vector-based computer simulations. Several interaction methods are discussed: in addition to traditional interfaces including mouse and trackers, gesture-based natural interaction methods are shown to control these visualizations with real-time performance. An example of a medical application (medialization laryngoplasty) is presented to demonstrate how the combination of different modalities can be used in a surgical setting with our approach.

  20. Wind and wave dataset for Matara, Sri Lanka

    Directory of Open Access Journals (Sweden)

    Y. Luo

    2018-01-01

    Full Text Available We present a continuous in situ hydro-meteorology observational dataset from a set of instruments first deployed in December 2012 in the south of Sri Lanka, facing toward the north Indian Ocean. In these waters, simultaneous records of wind and wave data are sparse due to difficulties in deploying measurement instruments, although the area hosts one of the busiest shipping lanes in the world. This study describes the survey, deployment, and measurements of wind and waves, with the aim of offering future users of the dataset the most comprehensive and as much information as possible. This dataset advances our understanding of the nearshore hydrodynamic processes and wave climate, including sea waves and swells, in the north Indian Ocean. Moreover, it is a valuable resource for ocean model parameterization and validation. The archived dataset (Table 1 is examined in detail, including wave data at two locations with water depths of 20 and 10 m comprising synchronous time series of wind, ocean astronomical tide, air pressure, etc. In addition, we use these wave observations to evaluate the ERA-Interim reanalysis product. Based on Buoy 2 data, the swells are the main component of waves year-round, although monsoons can markedly alter the proportion between swell and wind sea. The dataset (Luo et al., 2017 is publicly available from Science Data Bank (https://doi.org/10.11922/sciencedb.447.

  1. Process mining in oncology using the MIMIC-III dataset

    Science.gov (United States)

    Prima Kurniati, Angelina; Hall, Geoff; Hogg, David; Johnson, Owen

    2018-03-01

    Process mining is a data analytics approach to discover and analyse process models based on the real activities captured in information systems. There is a growing body of literature on process mining in healthcare, including oncology, the study of cancer. In earlier work we found 37 peer-reviewed papers describing process mining research in oncology with a regular complaint being the limited availability and accessibility of datasets with suitable information for process mining. Publicly available datasets are one option and this paper describes the potential to use MIMIC-III, for process mining in oncology. MIMIC-III is a large open access dataset of de-identified patient records. There are 134 publications listed as using the MIMIC dataset, but none of them have used process mining. The MIMIC-III dataset has 16 event tables which are potentially useful for process mining and this paper demonstrates the opportunities to use MIMIC-III for process mining in oncology. Our research applied the L* lifecycle method to provide a worked example showing how process mining can be used to analyse cancer pathways. The results and data quality limitations are discussed along with opportunities for further work and reflection on the value of MIMIC-III for reproducible process mining research.

  2. Vikodak--A Modular Framework for Inferring Functional Potential of Microbial Communities from 16S Metagenomic Datasets.

    Directory of Open Access Journals (Sweden)

    Sunil Nagpal

    Full Text Available The overall metabolic/functional potential of any given environmental niche is a function of the sum total of genes/proteins/enzymes that are encoded and expressed by various interacting microbes residing in that niche. Consequently, prior (collated information pertaining to genes, enzymes encoded by the resident microbes can aid in indirectly (reconstructing/ inferring the metabolic/ functional potential of a given microbial community (given its taxonomic abundance profile. In this study, we present Vikodak--a multi-modular package that is based on the above assumption and automates inferring and/ or comparing the functional characteristics of an environment using taxonomic abundance generated from one or more environmental sample datasets. With the underlying assumptions of co-metabolism and independent contributions of different microbes in a community, a concerted effort has been made to accommodate microbial co-existence patterns in various modules incorporated in Vikodak.Validation experiments on over 1400 metagenomic samples have confirmed the utility of Vikodak in (a deciphering enzyme abundance profiles of any KEGG metabolic pathway, (b functional resolution of distinct metagenomic environments, (c inferring patterns of functional interaction between resident microbes, and (d automating statistical comparison of functional features of studied microbiomes. Novel features incorporated in Vikodak also facilitate automatic removal of false positives and spurious functional predictions.With novel provisions for comprehensive functional analysis, inclusion of microbial co-existence pattern based algorithms, automated inter-environment comparisons; in-depth analysis of individual metabolic pathways and greater flexibilities at the user end, Vikodak is expected to be an important value addition to the family of existing tools for 16S based function prediction.A web implementation of Vikodak can be publicly accessed at: http

  3. The OXL format for the exchange of integrated datasets

    Directory of Open Access Journals (Sweden)

    Taubert Jan

    2007-12-01

    Full Text Available A prerequisite for systems biology is the integration and analysis of heterogeneous experimental data stored in hundreds of life-science databases and millions of scientific publications. Several standardised formats for the exchange of specific kinds of biological information exist. Such exchange languages facilitate the integration process; however they are not designed to transport integrated datasets. A format for exchanging integrated datasets needs to i cover data from a broad range of application domains, ii be flexible and extensible to combine many different complex data structures, iii include metadata and semantic definitions, iv include inferred information, v identify the original data source for integrated entities and vi transport large integrated datasets. Unfortunately, none of the exchange formats from the biological domain (e.g. BioPAX, MAGE-ML, PSI-MI, SBML or the generic approaches (RDF, OWL fulfil these requirements in a systematic way.

  4. Dataset of transcriptional landscape of B cell early activation

    Directory of Open Access Journals (Sweden)

    Alexander S. Garruss

    2015-09-01

    Full Text Available Signaling via B cell receptors (BCR and Toll-like receptors (TLRs result in activation of B cells with distinct physiological outcomes, but transcriptional regulatory mechanisms that drive activation and distinguish these pathways remain unknown. At early time points after BCR and TLR ligand exposure, 0.5 and 2 h, RNA-seq was performed allowing observations on rapid transcriptional changes. At 2 h, ChIP-seq was performed to allow observations on important regulatory mechanisms potentially driving transcriptional change. The dataset includes RNA-seq, ChIP-seq of control (Input, RNA Pol II, H3K4me3, H3K27me3, and a separate RNA-seq for miRNA expression, which can be found at Gene Expression Omnibus Dataset GSE61608. Here, we provide details on the experimental and analysis methods used to obtain and analyze this dataset and to examine the transcriptional landscape of B cell early activation.

  5. The Global Precipitation Climatology Project (GPCP) Combined Precipitation Dataset

    Science.gov (United States)

    Huffman, George J.; Adler, Robert F.; Arkin, Philip; Chang, Alfred; Ferraro, Ralph; Gruber, Arnold; Janowiak, John; McNab, Alan; Rudolf, Bruno; Schneider, Udo

    1997-01-01

    The Global Precipitation Climatology Project (GPCP) has released the GPCP Version 1 Combined Precipitation Data Set, a global, monthly precipitation dataset covering the period July 1987 through December 1995. The primary product in the dataset is a merged analysis incorporating precipitation estimates from low-orbit-satellite microwave data, geosynchronous-orbit -satellite infrared data, and rain gauge observations. The dataset also contains the individual input fields, a combination of the microwave and infrared satellite estimates, and error estimates for each field. The data are provided on 2.5 deg x 2.5 deg latitude-longitude global grids. Preliminary analyses show general agreement with prior studies of global precipitation and extends prior studies of El Nino-Southern Oscillation precipitation patterns. At the regional scale there are systematic differences with standard climatologies.

  6. A cross-country Exchange Market Pressure (EMP) dataset.

    Science.gov (United States)

    Desai, Mohit; Patnaik, Ila; Felman, Joshua; Shah, Ajay

    2017-06-01

    The data presented in this article are related to the research article titled - "An exchange market pressure measure for cross country analysis" (Patnaik et al. [1]). In this article, we present the dataset for Exchange Market Pressure values (EMP) for 139 countries along with their conversion factors, ρ (rho). Exchange Market Pressure, expressed in percentage change in exchange rate, measures the change in exchange rate that would have taken place had the central bank not intervened. The conversion factor ρ can interpreted as the change in exchange rate associated with $1 billion of intervention. Estimates of conversion factor ρ allow us to calculate a monthly time series of EMP for 139 countries. Additionally, the dataset contains the 68% confidence interval (high and low values) for the point estimates of ρ 's. Using the standard errors of estimates of ρ 's, we obtain one sigma intervals around mean estimates of EMP values. These values are also reported in the dataset.

  7. The Most Common Geometric and Semantic Errors in CityGML Datasets

    Science.gov (United States)

    Biljecki, F.; Ledoux, H.; Du, X.; Stoter, J.; Soon, K. H.; Khoo, V. H. S.

    2016-10-01

    To be used as input in most simulation and modelling software, 3D city models should be geometrically and topologically valid, and semantically rich. We investigate in this paper what is the quality of currently available CityGML datasets, i.e. we validate the geometry/topology of the 3D primitives (Solid and MultiSurface), and we validate whether the semantics of the boundary surfaces of buildings is correct or not. We have analysed all the CityGML datasets we could find, both from portals of cities and on different websites, plus a few that were made available to us. We have thus validated 40M surfaces in 16M 3D primitives and 3.6M buildings found in 37 CityGML datasets originating from 9 countries, and produced by several companies with diverse software and acquisition techniques. The results indicate that CityGML datasets without errors are rare, and those that are nearly valid are mostly simple LOD1 models. We report on the most common errors we have found, and analyse them. One main observation is that many of these errors could be automatically fixed or prevented with simple modifications to the modelling software. Our principal aim is to highlight the most common errors so that these are not repeated in the future. We hope that our paper and the open-source software we have developed will help raise awareness for data quality among data providers and 3D GIS software producers.

  8. HEp-2 cell image classification method based on very deep convolutional networks with small datasets

    Science.gov (United States)

    Lu, Mengchi; Gao, Long; Guo, Xifeng; Liu, Qiang; Yin, Jianping

    2017-07-01

    Human Epithelial-2 (HEp-2) cell images staining patterns classification have been widely used to identify autoimmune diseases by the anti-Nuclear antibodies (ANA) test in the Indirect Immunofluorescence (IIF) protocol. Because manual test is time consuming, subjective and labor intensive, image-based Computer Aided Diagnosis (CAD) systems for HEp-2 cell classification are developing. However, methods proposed recently are mostly manual features extraction with low accuracy. Besides, the scale of available benchmark datasets is small, which does not exactly suitable for using deep learning methods. This issue will influence the accuracy of cell classification directly even after data augmentation. To address these issues, this paper presents a high accuracy automatic HEp-2 cell classification method with small datasets, by utilizing very deep convolutional networks (VGGNet). Specifically, the proposed method consists of three main phases, namely image preprocessing, feature extraction and classification. Moreover, an improved VGGNet is presented to address the challenges of small-scale datasets. Experimental results over two benchmark datasets demonstrate that the proposed method achieves superior performance in terms of accuracy compared with existing methods.

  9. Performance evaluation of tile-based Fisher Ratio analysis using a benchmark yeast metabolome dataset.

    Science.gov (United States)

    Watson, Nathanial E; Parsons, Brendon A; Synovec, Robert E

    2016-08-12

    Performance of tile-based Fisher Ratio (F-ratio) data analysis, recently developed for discovery-based studies using comprehensive two-dimensional gas chromatography coupled with time-of-flight mass spectrometry (GC×GC-TOFMS), is evaluated with a metabolomics dataset that had been previously analyzed in great detail, but while taking a brute force approach. The previously analyzed data (referred to herein as the benchmark dataset) were intracellular extracts from Saccharomyces cerevisiae (yeast), either metabolizing glucose (repressed) or ethanol (derepressed), which define the two classes in the discovery-based analysis to find metabolites that are statistically different in concentration between the two classes. Beneficially, this previously analyzed dataset provides a concrete means to validate the tile-based F-ratio software. Herein, we demonstrate and validate the significant benefits of applying tile-based F-ratio analysis. The yeast metabolomics data are analyzed more rapidly in about one week versus one year for the prior studies with this dataset. Furthermore, a null distribution analysis is implemented to statistically determine an adequate F-ratio threshold, whereby the variables with F-ratio values below the threshold can be ignored as not class distinguishing, which provides the analyst with confidence when analyzing the hit table. Forty-six of the fifty-four benchmarked changing metabolites were discovered by the new methodology while consistently excluding all but one of the benchmarked nineteen false positive metabolites previously identified. Copyright © 2016 Elsevier B.V. All rights reserved.

  10. Partition dataset according to amino acid type improves the prediction of deleterious non-synonymous SNPs

    International Nuclear Information System (INIS)

    Yang, Jing; Li, Yuan-Yuan; Li, Yi-Xue; Ye, Zhi-Qiang

    2012-01-01

    Highlights: ► Proper dataset partition can improve the prediction of deleterious nsSNPs. ► Partition according to original residue type at nsSNP is a good criterion. ► Similar strategy is supposed promising in other machine learning problems. -- Abstract: Many non-synonymous SNPs (nsSNPs) are associated with diseases, and numerous machine learning methods have been applied to train classifiers for sorting disease-associated nsSNPs from neutral ones. The continuously accumulated nsSNP data allows us to further explore better prediction approaches. In this work, we partitioned the training data into 20 subsets according to either original or substituted amino acid type at the nsSNP site. Using support vector machine (SVM), training classification models on each subset resulted in an overall accuracy of 76.3% or 74.9% depending on the two different partition criteria, while training on the whole dataset obtained an accuracy of only 72.6%. Moreover, the dataset was also randomly divided into 20 subsets, but the corresponding accuracy was only 73.2%. Our results demonstrated that partitioning the whole training dataset into subsets properly, i.e., according to the residue type at the nsSNP site, will improve the performance of the trained classifiers significantly, which should be valuable in developing better tools for predicting the disease-association of nsSNPs.

  11. Palmprint and Palmvein Recognition Based on DCNN and A New Large-Scale Contactless Palmvein Dataset

    Directory of Open Access Journals (Sweden)

    Lin Zhang

    2018-03-01

    Full Text Available Among the members of biometric identifiers, the palmprint and the palmvein have received significant attention due to their stability, uniqueness, and non-intrusiveness. In this paper, we investigate the problem of palmprint/palmvein recognition and propose a Deep Convolutional Neural Network (DCNN based scheme, namely P a l m R CNN (short for palmprint/palmvein recognition using CNNs. The effectiveness and efficiency of P a l m R CNN have been verified through extensive experiments conducted on benchmark datasets. In addition, though substantial effort has been devoted to palmvein recognition, it is still quite difficult for the researchers to know the potential discriminating capability of the contactless palmvein. One of the root reasons is that a large-scale and publicly available dataset comprising high-quality, contactless palmvein images is still lacking. To this end, a user-friendly acquisition device for collecting high quality contactless palmvein images is at first designed and developed in this work. Then, a large-scale palmvein image dataset is established, comprising 12,000 images acquired from 600 different palms in two separate collection sessions. The collected dataset now is publicly available.

  12. Enhancing Conservation with High Resolution Productivity Datasets for the Conterminous United States

    Science.gov (United States)

    Robinson, Nathaniel Paul

    Human driven alteration of the earth's terrestrial surface is accelerating through land use changes, intensification of human activity, climate change, and other anthropogenic pressures. These changes occur at broad spatio-temporal scales, challenging our ability to effectively monitor and assess the impacts and subsequent conservation strategies. While satellite remote sensing (SRS) products enable monitoring of the earth's terrestrial surface continuously across space and time, the practical applications for conservation and management of these products are limited. Often the processes driving ecological change occur at fine spatial resolutions and are undetectable given the resolution of available datasets. Additionally, the links between SRS data and ecologically meaningful metrics are weak. Recent advances in cloud computing technology along with the growing record of high resolution SRS data enable the development of SRS products that quantify ecologically meaningful variables at relevant scales applicable for conservation and management. The focus of my dissertation is to improve the applicability of terrestrial gross and net primary productivity (GPP/NPP) datasets for the conterminous United States (CONUS). In chapter one, I develop a framework for creating high resolution datasets of vegetation dynamics. I use the entire archive of Landsat 5, 7, and 8 surface reflectance data and a novel gap filling approach to create spatially continuous 30 m, 16-day composites of the normalized difference vegetation index (NDVI) from 1986 to 2016. In chapter two, I integrate this with other high resolution datasets and the MOD17 algorithm to create the first high resolution GPP and NPP datasets for CONUS. I demonstrate the applicability of these products for conservation and management, showing the improvements beyond currently available products. In chapter three, I utilize this dataset to evaluate the relationships between land ownership and terrestrial production

  13. The effects of spatial population dataset choice on estimates of population at risk of disease

    Directory of Open Access Journals (Sweden)

    Gething Peter W

    2011-02-01

    consistently more accurate than the others in estimating PAR. The sizes of such differences among modeled human populations were related to variations in the methods, input resolution, and date of the census data underlying each dataset. Data quality varied from country to country within the spatial population datasets. Conclusions Detailed, highly spatially resolved human population data are an essential resource for planning health service delivery for disease control, for the spatial modeling of epidemics, and for decision-making processes related to public health. However, our results highlight that for the low-income regions of the world where disease burden is greatest, existing datasets display substantial variations in estimated population distributions, resulting in uncertainty in disease assessments that utilize them. Increased efforts are required to gather contemporary and spatially detailed demographic data to reduce this uncertainty, particularly in Africa, and to develop population distribution modeling methods that match the rigor, sophistication, and ability to handle uncertainty of contemporary disease mapping and spread modeling. In the meantime, studies that utilize a particular spatial population dataset need to acknowledge the uncertainties inherent within them and consider how the methods and data that comprise each will affect conclusions.

  14. FY13 Summary Report on the Augmentation of the Spent Fuel Composition Dataset for Nuclear Forensics: SFCOMPO/NF

    Energy Technology Data Exchange (ETDEWEB)

    Brady Raap, Michaele C.; Lyons, Jennifer A.; Collins, Brian A.; Livingston, James V.

    2014-03-31

    This report documents the FY13 efforts to enhance a dataset of spent nuclear fuel isotopic composition data for use in developing intrinsic signatures for nuclear forensics. A review and collection of data from the open literature was performed in FY10. In FY11, the Spent Fuel COMPOsition (SFCOMPO) excel-based dataset for nuclear forensics (NF), SFCOMPO/NF was established and measured data for graphite production reactors, Boiling Water Reactors (BWRs) and Pressurized Water Reactors (PWRs) were added to the dataset and expanded to include a consistent set of data simulated by calculations. A test was performed to determine whether the SFCOMPO/NF dataset will be useful for the analysis and identification of reactor types from isotopic ratios observed in interdicted samples.

  15. Dataset of herbarium specimens of threatened vascular plants in Catalonia.

    Science.gov (United States)

    Nualart, Neus; Ibáñez, Neus; Luque, Pere; Pedrol, Joan; Vilar, Lluís; Guàrdia, Roser

    2017-01-01

    This data paper describes a specimens' dataset of the Catalonian threatened vascular plants conserved in five public Catalonian herbaria (BC, BCN, HGI, HBIL and MTTE). Catalonia is an administrative region of Spain that includes large autochthon plants diversity and 199 taxa with IUCN threatened categories (EX, EW, RE, CR, EN and VU). This dataset includes 1,618 records collected from 17 th century to nowadays. For each specimen, the species name, locality indication, collection date, collector, ecology and revision label are recorded. More than 94% of the taxa are represented in the herbaria, which evidence the paper of the botanical collections as an essential source of occurrence data.

  16. A Large-Scale 3D Object Recognition dataset

    DEFF Research Database (Denmark)

    Sølund, Thomas; Glent Buch, Anders; Krüger, Norbert

    2016-01-01

    geometric groups; concave, convex, cylindrical and flat 3D object models. The object models have varying amount of local geometric features to challenge existing local shape feature descriptors in terms of descriptiveness and robustness. The dataset is validated in a benchmark which evaluates the matching...... performance of 7 different state-of-the-art local shape descriptors. Further, we validate the dataset in a 3D object recognition pipeline. Our benchmark shows as expected that local shape feature descriptors without any global point relation across the surface have a poor matching performance with flat...

  17. Traffic sign classification with dataset augmentation and convolutional neural network

    Science.gov (United States)

    Tang, Qing; Kurnianggoro, Laksono; Jo, Kang-Hyun

    2018-04-01

    This paper presents a method for traffic sign classification using a convolutional neural network (CNN). In this method, firstly we transfer a color image into grayscale, and then normalize it in the range (-1,1) as the preprocessing step. To increase robustness of classification model, we apply a dataset augmentation algorithm and create new images to train the model. To avoid overfitting, we utilize a dropout module before the last fully connection layer. To assess the performance of the proposed method, the German traffic sign recognition benchmark (GTSRB) dataset is utilized. Experimental results show that the method is effective in classifying traffic signs.

  18. Towards interoperable and reproducible QSAR analyses: Exchange of datasets.

    Science.gov (United States)

    Spjuth, Ola; Willighagen, Egon L; Guha, Rajarshi; Eklund, Martin; Wikberg, Jarl Es

    2010-06-30

    QSAR is a widely used method to relate chemical structures to responses or properties based on experimental observations. Much effort has been made to evaluate and validate the statistical modeling in QSAR, but these analyses treat the dataset as fixed. An overlooked but highly important issue is the validation of the setup of the dataset, which comprises addition of chemical structures as well as selection of descriptors and software implementations prior to calculations. This process is hampered by the lack of standards and exchange formats in the field, making it virtually impossible to reproduce and validate analyses and drastically constrain collaborations and re-use of data. We present a step towards standardizing QSAR analyses by defining interoperable and reproducible QSAR datasets, consisting of an open XML format (QSAR-ML) which builds on an open and extensible descriptor ontology. The ontology provides an extensible way of uniquely defining descriptors for use in QSAR experiments, and the exchange format supports multiple versioned implementations of these descriptors. Hence, a dataset described by QSAR-ML makes its setup completely reproducible. We also provide a reference implementation as a set of plugins for Bioclipse which simplifies setup of QSAR datasets, and allows for exporting in QSAR-ML as well as old-fashioned CSV formats. The implementation facilitates addition of new descriptor implementations from locally installed software and remote Web services; the latter is demonstrated with REST and XMPP Web services. Standardized QSAR datasets open up new ways to store, query, and exchange data for subsequent analyses. QSAR-ML supports completely reproducible creation of datasets, solving the problems of defining which software components were used and their versions, and the descriptor ontology eliminates confusions regarding descriptors by defining them crisply. This makes is easy to join, extend, combine datasets and hence work collectively, but

  19. Towards interoperable and reproducible QSAR analyses: Exchange of datasets

    Directory of Open Access Journals (Sweden)

    Spjuth Ola

    2010-06-01

    Full Text Available Abstract Background QSAR is a widely used method to relate chemical structures to responses or properties based on experimental observations. Much effort has been made to evaluate and validate the statistical modeling in QSAR, but these analyses treat the dataset as fixed. An overlooked but highly important issue is the validation of the setup of the dataset, which comprises addition of chemical structures as well as selection of descriptors and software implementations prior to calculations. This process is hampered by the lack of standards and exchange formats in the field, making it virtually impossible to reproduce and validate analyses and drastically constrain collaborations and re-use of data. Results We present a step towards standardizing QSAR analyses by defining interoperable and reproducible QSAR datasets, consisting of an open XML format (QSAR-ML which builds on an open and extensible descriptor ontology. The ontology provides an extensible way of uniquely defining descriptors for use in QSAR experiments, and the exchange format supports multiple versioned implementations of these descriptors. Hence, a dataset described by QSAR-ML makes its setup completely reproducible. We also provide a reference implementation as a set of plugins for Bioclipse which simplifies setup of QSAR datasets, and allows for exporting in QSAR-ML as well as old-fashioned CSV formats. The implementation facilitates addition of new descriptor implementations from locally installed software and remote Web services; the latter is demonstrated with REST and XMPP Web services. Conclusions Standardized QSAR datasets open up new ways to store, query, and exchange data for subsequent analyses. QSAR-ML supports completely reproducible creation of datasets, solving the problems of defining which software components were used and their versions, and the descriptor ontology eliminates confusions regarding descriptors by defining them crisply. This makes is easy to join

  20. The Wind Integration National Dataset (WIND) toolkit (Presentation)

    Energy Technology Data Exchange (ETDEWEB)

    Caroline Draxl: NREL

    2014-01-01

    Regional wind integration studies require detailed wind power output data at many locations to perform simulations of how the power system will operate under high penetration scenarios. The wind datasets that serve as inputs into the study must realistically reflect the ramping characteristics, spatial and temporal correlations, and capacity factors of the simulated wind plants, as well as being time synchronized with available load profiles.As described in this presentation, the WIND Toolkit fulfills these requirements by providing a state-of-the-art national (US) wind resource, power production and forecast dataset.

  1. Measuring river from the cloud - River width algorithm development on Google Earth Engine

    Science.gov (United States)

    Yang, X.; Pavelsky, T.; Allen, G. H.; Donchyts, G.

    2017-12-01

    Rivers are some of the most dynamic features of the terrestrial land surface. They help distribute freshwater, nutrients, sediment, and they are also responsible for some of the greatest natural hazards. Despite their importance, our understanding of river behavior is limited at the global scale, in part because we do not have a river observational dataset that spans both time and space. Remote sensing data represent a rich, largely untapped resource for observing river dynamics. In particular, publicly accessible archives of satellite optical imagery, which date back to the 1970s, can be used to study the planview morphodynamics of rivers at the global scale. Here we present an image processing algorithm developed using the Google Earth Engine cloud-based platform, that can automatically extracts river centerlines and widths from Landsat 5, 7, and 8 scenes at 30 m resolution. Our algorithm makes use of the latest monthly global surface water history dataset and an existing Global River Width from Landsat (GRWL) dataset to efficiently extract river masks from each Landsat scene. Then a combination of distance transform and skeletonization techniques are used to extract river centerlines. Finally, our algorithm calculates wetted river width at each centerline pixel perpendicular to its local centerline direction. We validated this algorithm using in situ data estimated from 16 USGS gauge stations (N=1781). We find that 92% of the width differences are within 60 m (i.e. the minimum length of 2 Landsat pixels). Leveraging Earth Engine's infrastructure of collocated data and processing power, our goal is to use this algorithm to reconstruct the morphodynamic history of rivers globally by processing over 100,000 Landsat 5 scenes, covering from 1984 to 2013.

  2. Being an honest broker of hydrology: Uncovering, communicating and addressing model error in a climate change streamflow dataset

    Science.gov (United States)

    Chegwidden, O.; Nijssen, B.; Pytlak, E.

    2017-12-01

    Any model simulation has errors, including errors in meteorological data, process understanding, model structure, and model parameters. These errors may express themselves as bias, timing lags, and differences in sensitivity between the model and the physical world. The evaluation and handling of these errors can greatly affect the legitimacy, validity and usefulness of the resulting scientific product. In this presentation we will discuss a case study of handling and communicating model errors during the development of a hydrologic climate change dataset for the Pacific Northwestern United States. The dataset was the result of a four-year collaboration between the University of Washington, Oregon State University, the Bonneville Power Administration, the United States Army Corps of Engineers and the Bureau of Reclamation. Along the way, the partnership facilitated the discovery of multiple systematic errors in the streamflow dataset. Through an iterative review process, some of those errors could be resolved. For the errors that remained, honest communication of the shortcomings promoted the dataset's legitimacy. Thoroughly explaining errors also improved ways in which the dataset would be used in follow-on impact studies. Finally, we will discuss the development of the "streamflow bias-correction" step often applied to climate change datasets that will be used in impact modeling contexts. We will describe the development of a series of bias-correction techniques through close collaboration among universities and stakeholders. Through that process, both universities and stakeholders learned about the others' expectations and workflows. This mutual learning process allowed for the development of methods that accommodated the stakeholders' specific engineering requirements. The iterative revision process also produced a functional and actionable dataset while preserving its scientific merit. We will describe how encountering earlier techniques' pitfalls allowed us

  3. Hydrological simulation of the Brahmaputra basin using global datasets

    Science.gov (United States)

    Bhattacharya, Biswa; Conway, Crystal; Craven, Joanne; Masih, Ilyas; Mazzolini, Maurizio; Shrestha, Shreedeepy; Ugay, Reyne; van Andel, Schalk Jan

    2017-04-01

    Brahmaputra River flows through China, India and Bangladesh to the Bay of Bengal and is one of the largest rivers of the world with a catchment size of 580K km2. The catchment is largely hilly and/or forested with sparse population and with limited urbanisation and economic activities. The catchment experiences heavy monsoon rainfall leading to very high flood discharges. Large inter-annual variation of discharge leading to flooding, erosion and morphological changes are among the major challenges. The catchment is largely ungauged; moreover, limited availability of hydro-meteorological data limits the possibility of carrying out evidence based research, which could provide trustworthy information for managing and when needed, controlling, the basin processes by the riparian countries for overall basin development. The paper presents initial results of a current research project on Brahmaputra basin. A set of hydrological and hydraulic models (SWAT, HMS, RAS) are developed by employing publicly available datasets of DEM, land use and soil and simulated using satellite based rainfall products, evapotranspiration and temperature estimates. Remotely sensed data are compared with sporadically available ground data. The set of models are able to produce catchment wide hydrological information that potentially can be used in the future in managing the basin's water resources. The model predications should be used with caution due to high level of uncertainty because the semi-calibrated models are developed with uncertain physical representation (e.g. cross-section) and simulated with global meteorological forcing (e.g. TRMM) with limited validation. Major scientific challenges are seen in producing robust information that can be reliably used in managing the basin. The information generated by the models are uncertain and as a result, instead of using them per se, they are used in improving the understanding of the catchment, and by running several scenarios with varying

  4. Would the ‘real’ observed dataset stand up? A critical examination of eight observed gridded climate datasets for China

    International Nuclear Information System (INIS)

    Sun, Qiaohong; Miao, Chiyuan; Duan, Qingyun; Kong, Dongxian; Ye, Aizhong; Di, Zhenhua; Gong, Wei

    2014-01-01

    This research compared and evaluated the spatio-temporal similarities and differences of eight widely used gridded datasets. The datasets include daily precipitation over East Asia (EA), the Climate Research Unit (CRU) product, the Global Precipitation Climatology Centre (GPCC) product, the University of Delaware (UDEL) product, Precipitation Reconstruction over Land (PREC/L), the Asian Precipitation Highly Resolved Observational (APHRO) product, the Institute of Atmospheric Physics (IAP) dataset from the Chinese Academy of Sciences, and the National Meteorological Information Center dataset from the China Meteorological Administration (CN05). The meteorological variables focus on surface air temperature (SAT) or precipitation (PR) in China. All datasets presented general agreement on the whole spatio-temporal scale, but some differences appeared for specific periods and regions. On a temporal scale, EA shows the highest amount of PR, while APHRO shows the lowest. CRU and UDEL show higher SAT than IAP or CN05. On a spatial scale, the most significant differences occur in western China for PR and SAT. For PR, the difference between EA and CRU is the largest. When compared with CN05, CRU shows higher SAT in the central and southern Northwest river drainage basin, UDEL exhibits higher SAT over the Southwest river drainage system, and IAP has lower SAT in the Tibetan Plateau. The differences in annual mean PR and SAT primarily come from summer and winter, respectively. Finally, potential factors impacting agreement among gridded climate datasets are discussed, including raw data sources, quality control (QC) schemes, orographic correction, and interpolation techniques. The implications and challenges of these results for climate research are also briefly addressed. (paper)

  5. Uncertainty Assessment of the NASA Earth Exchange Global Daily Downscaled Climate Projections (NEX-GDDP) Dataset

    Science.gov (United States)

    Wang, Weile; Nemani, Ramakrishna R.; Michaelis, Andrew; Hashimoto, Hirofumi; Dungan, Jennifer L.; Thrasher, Bridget L.; Dixon, Keith W.

    2016-01-01

    The NASA Earth Exchange Global Daily Downscaled Projections (NEX-GDDP) dataset is comprised of downscaled climate projections that are derived from 21 General Circulation Model (GCM) runs conducted under the Coupled Model Intercomparison Project Phase 5 (CMIP5) and across two of the four greenhouse gas emissions scenarios (RCP4.5 and RCP8.5). Each of the climate projections includes daily maximum temperature, minimum temperature, and precipitation for the periods from 1950 through 2100 and the spatial resolution is 0.25 degrees (approximately 25 km x 25 km). The GDDP dataset has received warm welcome from the science community in conducting studies of climate change impacts at local to regional scales, but a comprehensive evaluation of its uncertainties is still missing. In this study, we apply the Perfect Model Experiment framework (Dixon et al. 2016) to quantify the key sources of uncertainties from the observational baseline dataset, the downscaling algorithm, and some intrinsic assumptions (e.g., the stationary assumption) inherent to the statistical downscaling techniques. We developed a set of metrics to evaluate downscaling errors resulted from bias-correction ("quantile-mapping"), spatial disaggregation, as well as the temporal-spatial non-stationarity of climate variability. Our results highlight the spatial disaggregation (or interpolation) errors, which dominate the overall uncertainties of the GDDP dataset, especially over heterogeneous and complex terrains (e.g., mountains and coastal area). In comparison, the temporal errors in the GDDP dataset tend to be more constrained. Our results also indicate that the downscaled daily precipitation also has relatively larger uncertainties than the temperature fields, reflecting the rather stochastic nature of precipitation in space. Therefore, our results provide insights in improving statistical downscaling algorithms and products in the future.

  6. Large scale validation of the M5L lung CAD on heterogeneous CT datasets

    Energy Technology Data Exchange (ETDEWEB)

    Lopez Torres, E., E-mail: Ernesto.Lopez.Torres@cern.ch, E-mail: cerello@to.infn.it [CEADEN, Havana 11300, Cuba and INFN, Sezione di Torino, Torino 10125 (Italy); Fiorina, E.; Pennazio, F.; Peroni, C. [Department of Physics, University of Torino, Torino 10125, Italy and INFN, Sezione di Torino, Torino 10125 (Italy); Saletta, M.; Cerello, P., E-mail: Ernesto.Lopez.Torres@cern.ch, E-mail: cerello@to.infn.it [INFN, Sezione di Torino, Torino 10125 (Italy); Camarlinghi, N.; Fantacci, M. E. [Department of Physics, University of Pisa, Pisa 56127, Italy and INFN, Sezione di Pisa, Pisa 56127 (Italy)

    2015-04-15

    Purpose: M5L, a fully automated computer-aided detection (CAD) system for the detection and segmentation of lung nodules in thoracic computed tomography (CT), is presented and validated on several image datasets. Methods: M5L is the combination of two independent subsystems, based on the Channeler Ant Model as a segmentation tool [lung channeler ant model (lungCAM)] and on the voxel-based neural approach. The lungCAM was upgraded with a scan equalization module and a new procedure to recover the nodules connected to other lung structures; its classification module, which makes use of a feed-forward neural network, is based of a small number of features (13), so as to minimize the risk of lacking generalization, which could be possible given the large difference between the size of the training and testing datasets, which contain 94 and 1019 CTs, respectively. The lungCAM (standalone) and M5L (combined) performance was extensively tested on 1043 CT scans from three independent datasets, including a detailed analysis of the full Lung Image Database Consortium/Image Database Resource Initiative database, which is not yet found in literature. Results: The lungCAM and M5L performance is consistent across the databases, with a sensitivity of about 70% and 80%, respectively, at eight false positive findings per scan, despite the variable annotation criteria and acquisition and reconstruction conditions. A reduced sensitivity is found for subtle nodules and ground glass opacities (GGO) structures. A comparison with other CAD systems is also presented. Conclusions: The M5L performance on a large and heterogeneous dataset is stable and satisfactory, although the development of a dedicated module for GGOs detection could further improve it, as well as an iterative optimization of the training procedure. The main aim of the present study was accomplished: M5L results do not deteriorate when increasing the dataset size, making it a candidate for supporting radiologists on large

  7. Using Real Datasets for Interdisciplinary Business/Economics Projects

    Science.gov (United States)

    Goel, Rajni; Straight, Ronald L.

    2005-01-01

    The workplace's global and dynamic nature allows and requires improved approaches for providing business and economics education. In this article, the authors explore ways of enhancing students' understanding of course material by using nontraditional, real-world datasets of particular interest to them. Teaching at a historically Black university,…

  8. Dataset-driven research for improving recommender systems for learning

    NARCIS (Netherlands)

    Verbert, Katrien; Drachsler, Hendrik; Manouselis, Nikos; Wolpers, Martin; Vuorikari, Riina; Duval, Erik

    2011-01-01

    Verbert, K., Drachsler, H., Manouselis, N., Wolpers, M., Vuorikari, R., & Duval, E. (2011). Dataset-driven research for improving recommender systems for learning. In Ph. Long, & G. Siemens (Eds.), Proceedings of 1st International Conference Learning Analytics & Knowledge (pp. 44-53). February,

  9. dataTEL - Datasets for Technology Enhanced Learning

    NARCIS (Netherlands)

    Drachsler, Hendrik; Verbert, Katrien; Sicilia, Miguel-Angel; Wolpers, Martin; Manouselis, Nikos; Vuorikari, Riina; Lindstaedt, Stefanie; Fischer, Frank

    2011-01-01

    Drachsler, H., Verbert, K., Sicilia, M. A., Wolpers, M., Manouselis, N., Vuorikari, R., Lindstaedt, S., & Fischer, F. (2011). dataTEL - Datasets for Technology Enhanced Learning. STELLAR Alpine Rendez-Vous White Paper. Alpine Rendez-Vous 2011 White paper collection, Nr. 13., France (2011)

  10. A reanalysis dataset of the South China Sea

    Science.gov (United States)

    Zeng, Xuezhi; Peng, Shiqiu; Li, Zhijin; Qi, Yiquan; Chen, Rongyu

    2014-01-01

    Ocean reanalysis provides a temporally continuous and spatially gridded four-dimensional estimate of the ocean state for a better understanding of the ocean dynamics and its spatial/temporal variability. Here we present a 19-year (1992–2010) high-resolution ocean reanalysis dataset of the upper ocean in the South China Sea (SCS) produced from an ocean data assimilation system. A wide variety of observations, including in-situ temperature/salinity profiles, ship-measured and satellite-derived sea surface temperatures, and sea surface height anomalies from satellite altimetry, are assimilated into the outputs of an ocean general circulation model using a multi-scale incremental three-dimensional variational data assimilation scheme, yielding a daily high-resolution reanalysis dataset of the SCS. Comparisons between the reanalysis and independent observations support the reliability of the dataset. The presented dataset provides the research community of the SCS an important data source for studying the thermodynamic processes of the ocean circulation and meso-scale features in the SCS, including their spatial and temporal variability. PMID:25977803

  11. Comparision of analysis of the QTLMAS XII common dataset

    DEFF Research Database (Denmark)

    Crooks, Lucy; Sahana, Goutam; de Koning, Dirk-Jan

    2009-01-01

    As part of the QTLMAS XII workshop, a simulated dataset was distributed and participants were invited to submit analyses of the data based on genome-wide association, fine mapping and genomic selection. We have evaluated the findings from the groups that reported fine mapping and genome-wide asso...

  12. The LAMBADA dataset: Word prediction requiring a broad discourse context

    NARCIS (Netherlands)

    Paperno, D.; Kruszewski, G.; Lazaridou, A.; Pham, Q.N.; Bernardi, R.; Pezzelle, S.; Baroni, M.; Boleda, G.; Fernández, R.; Erk, K.; Smith, N.A.

    2016-01-01

    We introduce LAMBADA, a dataset to evaluate the capabilities of computational models for text understanding by means of a word prediction task. LAMBADA is a collection of narrative passages sharing the characteristic that human subjects are able to guess their last word if they are exposed to the

  13. Cross-Cultural Concept Mapping of Standardized Datasets

    DEFF Research Database (Denmark)

    Kano Glückstad, Fumiko

    2012-01-01

    This work compares four feature-based similarity measures derived from cognitive sciences. The purpose of the comparative analysis is to verify the potentially most effective model that can be applied for mapping independent ontologies in a culturally influenced domain [1]. Here, datasets based...

  14. Level-1 muon trigger performance with the full 2017 dataset

    CERN Document Server

    CMS Collaboration

    2018-01-01

    This document describes the performance of the CMS Level-1 Muon Trigger with the full dataset of 2017. Efficiency plots are included for each track finder (TF) individually and for the system as a whole. The efficiency is measured to be greater than 90% for all track finders.

  15. A Dataset for Visual Navigation with Neuromorphic Methods

    Directory of Open Access Journals (Sweden)

    Francisco eBarranco

    2016-02-01

    Full Text Available Standardized benchmarks in Computer Vision have greatly contributed to the advance of approaches to many problems in the field. If we want to enhance the visibility of event-driven vision and increase its impact, we will need benchmarks that allow comparison among different neuromorphic methods as well as comparison to Computer Vision conventional approaches. We present datasets to evaluate the accuracy of frame-free and frame-based approaches for tasks of visual navigation. Similar to conventional Computer Vision datasets, we provide synthetic and real scenes, with the synthetic data created with graphics packages, and the real data recorded using a mobile robotic platform carrying a dynamic and active pixel vision sensor (DAVIS and an RGB+Depth sensor. For both datasets the cameras move with a rigid motion in a static scene, and the data includes the images, events, optic flow, 3D camera motion, and the depth of the scene, along with calibration procedures. Finally, we also provide simulated event data generated synthetically from well-known frame-based optical flow datasets.

  16. Evaluation of Uncertainty in Precipitation Datasets for New Mexico, USA

    Science.gov (United States)

    Besha, A. A.; Steele, C. M.; Fernald, A.

    2014-12-01

    Climate change, population growth and other factors are endangering water availability and sustainability in semiarid/arid areas particularly in the southwestern United States. Wide coverage of spatial and temporal measurements of precipitation are key for regional water budget analysis and hydrological operations which themselves are valuable tool for water resource planning and management. Rain gauge measurements are usually reliable and accurate at a point. They measure rainfall continuously, but spatial sampling is limited. Ground based radar and satellite remotely sensed precipitation have wide spatial and temporal coverage. However, these measurements are indirect and subject to errors because of equipment, meteorological variability, the heterogeneity of the land surface itself and lack of regular recording. This study seeks to understand precipitation uncertainty and in doing so, lessen uncertainty propagation into hydrological applications and operations. We reviewed, compared and evaluated the TRMM (Tropical Rainfall Measuring Mission) precipitation products, NOAA's (National Oceanic and Atmospheric Administration) Global Precipitation Climatology Centre (GPCC) monthly precipitation dataset, PRISM (Parameter elevation Regression on Independent Slopes Model) data and data from individual climate stations including Cooperative Observer Program (COOP), Remote Automated Weather Stations (RAWS), Soil Climate Analysis Network (SCAN) and Snowpack Telemetry (SNOTEL) stations. Though not yet finalized, this study finds that the uncertainty within precipitation estimates datasets is influenced by regional topography, season, climate and precipitation rate. Ongoing work aims to further evaluate precipitation datasets based on the relative influence of these phenomena so that we can identify the optimum datasets for input to statewide water budget analysis.

  17. Dataset: Multi Sensor-Orientation Movement Data of Goats

    NARCIS (Netherlands)

    Kamminga, Jacob Wilhelm

    2018-01-01

    This is a labeled dataset. Motion data were collected from six sensor nodes that were fixed with different orientations to a collar around the neck of goats. These six sensor nodes simultaneously, with different orientations, recorded various activities performed by the goat. We recorded the

  18. A dataset of human decision-making in teamwork management

    Science.gov (United States)

    Yu, Han; Shen, Zhiqi; Miao, Chunyan; Leung, Cyril; Chen, Yiqiang; Fauvel, Simon; Lin, Jun; Cui, Lizhen; Pan, Zhengxiang; Yang, Qiang

    2017-01-01

    Today, most endeavours require teamwork by people with diverse skills and characteristics. In managing teamwork, decisions are often made under uncertainty and resource constraints. The strategies and the effectiveness of the strategies different people adopt to manage teamwork under different situations have not yet been fully explored, partially due to a lack of detailed large-scale data. In this paper, we describe a multi-faceted large-scale dataset to bridge this gap. It is derived from a game simulating complex project management processes. It presents the participants with different conditions in terms of team members' capabilities and task characteristics for them to exhibit their decision-making strategies. The dataset contains detailed data reflecting the decision situations, decision strategies, decision outcomes, and the emotional responses of 1,144 participants from diverse backgrounds. To our knowledge, this is the first dataset simultaneously covering these four facets of decision-making. With repeated measurements, the dataset may help establish baseline variability of decision-making in teamwork management, leading to more realistic decision theoretic models and more effective decision support approaches.

  19. UK surveillance: provision of quality assured information from combined datasets.

    Science.gov (United States)

    Paiba, G A; Roberts, S R; Houston, C W; Williams, E C; Smith, L H; Gibbens, J C; Holdship, S; Lysons, R

    2007-09-14

    Surveillance information is most useful when provided within a risk framework, which is achieved by presenting results against an appropriate denominator. Often the datasets are captured separately and for different purposes, and will have inherent errors and biases that can be further confounded by the act of merging. The United Kingdom Rapid Analysis and Detection of Animal-related Risks (RADAR) system contains data from several sources and provides both data extracts for research purposes and reports for wider stakeholders. Considerable efforts are made to optimise the data in RADAR during the Extraction, Transformation and Loading (ETL) process. Despite efforts to ensure data quality, the final dataset inevitably contains some data errors and biases, most of which cannot be rectified during subsequent analysis. So, in order for users to establish the 'fitness for purpose' of data merged from more than one data source, Quality Statements are produced as defined within the overarching surveillance Quality Framework. These documents detail identified data errors and biases following ETL and report construction as well as relevant aspects of the datasets from which the data originated. This paper illustrates these issues using RADAR datasets, and describes how they can be minimised.

  20. The NASA Subsonic Jet Particle Image Velocimetry (PIV) Dataset

    Science.gov (United States)

    Bridges, James; Wernet, Mark P.

    2011-01-01

    Many tasks in fluids engineering require prediction of turbulence of jet flows. The present document documents the single-point statistics of velocity, mean and variance, of cold and hot jet flows. The jet velocities ranged from 0.5 to 1.4 times the ambient speed of sound, and temperatures ranged from unheated to static temperature ratio 2.7. Further, the report assesses the accuracies of the data, e.g., establish uncertainties for the data. This paper covers the following five tasks: (1) Document acquisition and processing procedures used to create the particle image velocimetry (PIV) datasets. (2) Compare PIV data with hotwire and laser Doppler velocimetry (LDV) data published in the open literature. (3) Compare different datasets acquired at the same flow conditions in multiple tests to establish uncertainties. (4) Create a consensus dataset for a range of hot jet flows, including uncertainty bands. (5) Analyze this consensus dataset for self-consistency and compare jet characteristics to those of the open literature. The final objective was fulfilled by using the potential core length and the spread rate of the half-velocity radius to collapse of the mean and turbulent velocity fields over the first 20 jet diameters.

  1. Dataset demonstrating the temperature effect on average output polarization for QCA based reversible logic gates

    Directory of Open Access Journals (Sweden)

    Md. Kamrul Hassan

    2017-08-01

    Full Text Available Quantum-dot cellular automata (QCA is a developing nanotechnology, which seems to be a good candidate to replace the conventional complementary metal-oxide-semiconductor (CMOS technology. In this article, we present the dataset of average output polarization (AOP for basic reversible logic gates presented in Ali Newaz et al. (2016 [1]. QCADesigner 2.0.3 has been employed to analysis the AOP of reversible gates at different temperature levels in Kelvin (K unit.

  2. Comparison of global 3-D aviation emissions datasets

    Directory of Open Access Journals (Sweden)

    S. C. Olsen

    2013-01-01

    Full Text Available Aviation emissions are unique from other transportation emissions, e.g., from road transportation and shipping, in that they occur at higher altitudes as well as at the surface. Aviation emissions of carbon dioxide, soot, and water vapor have direct radiative impacts on the Earth's climate system while emissions of nitrogen oxides (NOx, sulfur oxides, carbon monoxide (CO, and hydrocarbons (HC impact air quality and climate through their effects on ozone, methane, and clouds. The most accurate estimates of the impact of aviation on air quality and climate utilize three-dimensional chemistry-climate models and gridded four dimensional (space and time aviation emissions datasets. We compare five available aviation emissions datasets currently and historically used to evaluate the impact of aviation on climate and air quality: NASA-Boeing 1992, NASA-Boeing 1999, QUANTIFY 2000, Aero2k 2002, and AEDT 2006 and aviation fuel usage estimates from the International Energy Agency. Roughly 90% of all aviation emissions are in the Northern Hemisphere and nearly 60% of all fuelburn and NOx emissions occur at cruise altitudes in the Northern Hemisphere. While these datasets were created by independent methods and are thus not strictly suitable for analyzing trends they suggest that commercial aviation fuelburn and NOx emissions increased over the last two decades while HC emissions likely decreased and CO emissions did not change significantly. The bottom-up estimates compared here are consistently lower than International Energy Agency fuelburn statistics although the gap is significantly smaller in the more recent datasets. Overall the emissions distributions are quite similar for fuelburn and NOx with regional peaks over the populated land masses of North America, Europe, and East Asia. For CO and HC there are relatively larger differences. There are however some distinct differences in the altitude distribution

  3. Geoseq: a tool for dissecting deep-sequencing datasets

    Directory of Open Access Journals (Sweden)

    Homann Robert

    2010-10-01

    Full Text Available Abstract Background Datasets generated on deep-sequencing platforms have been deposited in various public repositories such as the Gene Expression Omnibus (GEO, Sequence Read Archive (SRA hosted by the NCBI, or the DNA Data Bank of Japan (ddbj. Despite being rich data sources, they have not been used much due to the difficulty in locating and analyzing datasets of interest. Results Geoseq http://geoseq.mssm.edu provides a new method of analyzing short reads from deep sequencing experiments. Instead of mapping the reads to reference genomes or sequences, Geoseq maps a reference sequence against the sequencing data. It is web-based, and holds pre-computed data from public libraries. The analysis reduces the input sequence to tiles and measures the coverage of each tile in a sequence library through the use of suffix arrays. The user can upload custom target sequences or use gene/miRNA names for the search and get back results as plots and spreadsheet files. Geoseq organizes the public sequencing data using a controlled vocabulary, allowing identification of relevant libraries by organism, tissue and type of experiment. Conclusions Analysis of small sets of sequences against deep-sequencing datasets, as well as identification of public datasets of interest, is simplified by Geoseq. We applied Geoseq to, a identify differential isoform expression in mRNA-seq datasets, b identify miRNAs (microRNAs in libraries, and identify mature and star sequences in miRNAS and c to identify potentially mis-annotated miRNAs. The ease of using Geoseq for these analyses suggests its utility and uniqueness as an analysis tool.

  4. On sample size and different interpretations of snow stability datasets

    Science.gov (United States)

    Schirmer, M.; Mitterer, C.; Schweizer, J.

    2009-04-01

    Interpretations of snow stability variations need an assessment of the stability itself, independent of the scale investigated in the study. Studies on stability variations at a regional scale have often chosen stability tests such as the Rutschblock test or combinations of various tests in order to detect differences in aspect and elevation. The question arose: ‘how capable are such stability interpretations in drawing conclusions'. There are at least three possible errors sources: (i) the variance of the stability test itself; (ii) the stability variance at an underlying slope scale, and (iii) that the stability interpretation might not be directly related to the probability of skier triggering. Various stability interpretations have been proposed in the past that provide partly different results. We compared a subjective one based on expert knowledge with a more objective one based on a measure derived from comparing skier-triggered slopes vs. slopes that have been skied but not triggered. In this study, the uncertainties are discussed and their effects on regional scale stability variations will be quantified in a pragmatic way. An existing dataset with very large sample sizes was revisited. This dataset contained the variance of stability at a regional scale for several situations. The stability in this dataset was determined using the subjective interpretation scheme based on expert knowledge. The question to be answered was how many measurements were needed to obtain similar results (mainly stability differences in aspect or elevation) as with the complete dataset. The optimal sample size was obtained in several ways: (i) assuming a nominal data scale the sample size was determined with a given test, significance level and power, and by calculating the mean and standard deviation of the complete dataset. With this method it can also be determined if the complete dataset consists of an appropriate sample size. (ii) Smaller subsets were created with similar

  5. Data-driven decision support for radiologists: re-using the National Lung Screening Trial dataset for pulmonary nodule management.

    Science.gov (United States)

    Morrison, James J; Hostetter, Jason; Wang, Kenneth; Siegel, Eliot L

    2015-02-01

    Real-time mining of large research trial datasets enables development of case-based clinical decision support tools. Several applicable research datasets exist including the National Lung Screening Trial (NLST), a dataset unparalleled in size and scope for studying population-based lung cancer screening. Using these data, a clinical decision support tool was developed which matches patient demographics and lung nodule characteristics to a cohort of similar patients. The NLST dataset was converted into Structured Query Language (SQL) tables hosted on a web server, and a web-based JavaScript application was developed which performs real-time queries. JavaScript is used for both the server-side and client-side language, allowing for rapid development of a robust client interface and server-side data layer. Real-time data mining of user-specified patient cohorts achieved a rapid return of cohort cancer statistics and lung nodule distribution information. This system demonstrates the potential of individualized real-time data mining using large high-quality clinical trial datasets to drive evidence-based clinical decision-making.

  6. Estimated Perennial Streams of Idaho and Related Geospatial Datasets

    Science.gov (United States)

    Rea, Alan; Skinner, Kenneth D.

    2009-01-01

    The perennial or intermittent status of a stream has bearing on many regulatory requirements. Because of changing technologies over time, cartographic representation of perennial/intermittent status of streams on U.S. Geological Survey (USGS) topographic maps is not always accurate and (or) consistent from one map sheet to another. Idaho Administrative Code defines an intermittent stream as one having a 7-day, 2-year low flow (7Q2) less than 0.1 cubic feet per second. To establish consistency with the Idaho Administrative Code, the USGS developed regional regression equations for Idaho streams for several low-flow statistics, including 7Q2. Using these regression equations, the 7Q2 streamflow may be estimated for naturally flowing streams anywhere in Idaho to help determine perennial/intermittent status of streams. Using these equations in conjunction with a Geographic Information System (GIS) technique known as weighted flow accumulation allows for an automated and continuous estimation of 7Q2 streamflow at all points along a stream, which in turn can be used to determine if a stream is intermittent or perennial according to the Idaho Administrative Code operational definition. The selected regression equations were applied to create continuous grids of 7Q2 estimates for the eight low-flow regression regions of Idaho. By applying the 0.1 ft3/s criterion, the perennial streams have been estimated in each low-flow region. Uncertainty in the estimates is shown by identifying a 'transitional' zone, corresponding to flow estimates of 0.1 ft3/s plus and minus one standard error. Considerable additional uncertainty exists in the model of perennial streams presented in this report. The regression models provide overall estimates based on general trends within each regression region. These models do not include local factors such as a large spring or a losing reach that may greatly affect flows at any given point. Site-specific flow data, assuming a sufficient period of

  7. Scalable and portable visualization of large atomistic datasets

    Science.gov (United States)

    Sharma, Ashish; Kalia, Rajiv K.; Nakano, Aiichiro; Vashishta, Priya

    2004-10-01

    A scalable and portable code named Atomsviewer has been developed to interactively visualize a large atomistic dataset consisting of up to a billion atoms. The code uses a hierarchical view frustum-culling algorithm based on the octree data structure to efficiently remove atoms outside of the user's field-of-view. Probabilistic and depth-based occlusion-culling algorithms then select atoms, which have a high probability of being visible. Finally a multiresolution algorithm is used to render the selected subset of visible atoms at varying levels of detail. Atomsviewer is written in C++ and OpenGL, and it has been tested on a number of architectures including Windows, Macintosh, and SGI. Atomsviewer has been used to visualize tens of millions of atoms on a standard desktop computer and, in its parallel version, up to a billion atoms. Program summaryTitle of program: Atomsviewer Catalogue identifier: ADUM Program summary URL:http://cpc.cs.qub.ac.uk/summaries/ADUM Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland Computer for which the program is designed and others on which it has been tested: 2.4 GHz Pentium 4/Xeon processor, professional graphics card; Apple G4 (867 MHz)/G5, professional graphics card Operating systems under which the program has been tested: Windows 2000/XP, Mac OS 10.2/10.3, SGI IRIX 6.5 Programming languages used: C++, C and OpenGL Memory required to execute with typical data: 1 gigabyte of RAM High speed storage required: 60 gigabytes No. of lines in the distributed program including test data, etc.: 550 241 No. of bytes in the distributed program including test data, etc.: 6 258 245 Number of bits in a word: Arbitrary Number of processors used: 1 Has the code been vectorized or parallelized: No Distribution format: tar gzip file Nature of physical problem: Scientific visualization of atomic systems Method of solution: Rendering of atoms using computer graphic techniques, culling algorithms for data

  8. Vehicle Classification Using an Imbalanced Dataset Based on a Single Magnetic Sensor

    Directory of Open Access Journals (Sweden)

    Chang Xu

    2018-05-01

    Full Text Available This paper aims to improve the accuracy of automatic vehicle classifiers for imbalanced datasets. Classification is made through utilizing a single anisotropic magnetoresistive sensor, with the models of vehicles involved being classified into hatchbacks, sedans, buses, and multi-purpose vehicles (MPVs. Using time domain and frequency domain features in combination with three common classification algorithms in pattern recognition, we develop a novel feature extraction method for vehicle classification. These three common classification algorithms are the k-nearest neighbor, the support vector machine, and the back-propagation neural network. Nevertheless, a problem remains with the original vehicle magnetic dataset collected being imbalanced, and may lead to inaccurate classification results. With this in mind, we propose an approach called SMOTE, which can further boost the performance of classifiers. Experimental results show that the k-nearest neighbor (KNN classifier with the SMOTE algorithm can reach a classification accuracy of 95.46%, thus minimizing the effect of the imbalance.

  9. Vehicle Classification Using an Imbalanced Dataset Based on a Single Magnetic Sensor.

    Science.gov (United States)

    Xu, Chang; Wang, Yingguan; Bao, Xinghe; Li, Fengrong

    2018-05-24

    This paper aims to improve the accuracy of automatic vehicle classifiers for imbalanced datasets. Classification is made through utilizing a single anisotropic magnetoresistive sensor, with the models of vehicles involved being classified into hatchbacks, sedans, buses, and multi-purpose vehicles (MPVs). Using time domain and frequency domain features in combination with three common classification algorithms in pattern recognition, we develop a novel feature extraction method for vehicle classification. These three common classification algorithms are the k-nearest neighbor, the support vector machine, and the back-propagation neural network. Nevertheless, a problem remains with the original vehicle magnetic dataset collected being imbalanced, and may lead to inaccurate classification results. With this in mind, we propose an approach called SMOTE, which can further boost the performance of classifiers. Experimental results show that the k-nearest neighbor (KNN) classifier with the SMOTE algorithm can reach a classification accuracy of 95.46%, thus minimizing the effect of the imbalance.

  10. Experiences and lessons learned from creating a generalized workflow for data publication of field campaign datasets

    Science.gov (United States)

    Santhana Vannan, S. K.; Ramachandran, R.; Deb, D.; Beaty, T.; Wright, D.

    2017-12-01

    This paper summarizes the workflow challenges of curating and publishing data produced from disparate data sources and provides a generalized workflow solution to efficiently archive data generated by researchers. The Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC) for biogeochemical dynamics and the Global Hydrology Resource Center (GHRC) DAAC have been collaborating on the development of a generalized workflow solution to efficiently manage the data publication process. The generalized workflow presented here are built on lessons learned from implementations of the workflow system. Data publication consists of the following steps: Accepting the data package from the data providers, ensuring the full integrity of the data files. Identifying and addressing data quality issues Assembling standardized, detailed metadata and documentation, including file level details, processing methodology, and characteristics of data files Setting up data access mechanisms Setup of the data in data tools and services for improved data dissemination and user experience Registering the dataset in online search and discovery catalogues Preserving the data location through Digital Object Identifiers (DOI) We will describe the steps taken to automate, and realize efficiencies to the above process. The goals of the workflow system are to reduce the time taken to publish a dataset, to increase the quality of documentation and metadata, and to track individual datasets through the data curation process. Utilities developed to achieve these goal will be described. We will also share metrics driven value of the workflow system and discuss the future steps towards creation of a common software framework.

  11. A Hybrid Method for Interpolating Missing Data in Heterogeneous Spatio-Temporal Datasets

    Directory of Open Access Journals (Sweden)

    Min Deng

    2016-02-01

    Full Text Available Space-time interpolation is widely used to estimate missing or unobserved values in a dataset integrating both spatial and temporal records. Although space-time interpolation plays a key role in space-time modeling, existing methods were mainly developed for space-time processes that exhibit stationarity in space and time. It is still challenging to model heterogeneity of space-time data in the interpolation model. To overcome this limitation, in this study, a novel space-time interpolation method considering both spatial and temporal heterogeneity is developed for estimating missing data in space-time datasets. The interpolation operation is first implemented in spatial and temporal dimensions. Heterogeneous covariance functions are constructed to obtain the best linear unbiased estimates in spatial and temporal dimensions. Spatial and temporal correlations are then considered to combine the interpolation results in spatial and temporal dimensions to estimate the missing data. The proposed method is tested on annual average temperature and precipitation data in China (1984–2009. Experimental results show that, for these datasets, the proposed method outperforms three state-of-the-art methods—e.g., spatio-temporal kriging, spatio-temporal inverse distance weighting, and point estimation model of biased hospitals-based area disease estimation methods.

  12. A conceptual prototype for the next-generation national elevation dataset

    Science.gov (United States)

    Stoker, Jason M.; Heidemann, Hans Karl; Evans, Gayla A.; Greenlee, Susan K.

    2013-01-01

    In 2012 the U.S. Geological Survey's (USGS) National Geospatial Program (NGP) funded a study to develop a conceptual prototype for a new National Elevation Dataset (NED) design with expanded capabilities to generate and deliver a suite of bare earth and above ground feature information over the United States. This report details the research on identifying operational requirements based on prior research, evaluation of what is needed for the USGS to meet these requirements, and development of a possible conceptual framework that could potentially deliver the kinds of information that are needed to support NGP's partners and constituents. This report provides an initial proof-of-concept demonstration using an existing dataset, and recommendations for the future, to inform NGP's ongoing and future elevation program planning and management decisions. The demonstration shows that this type of functional process can robustly create derivatives from lidar point cloud data; however, more research needs to be done to see how well it extends to multiple datasets.

  13. SPICE: exploration and analysis of post-cytometric complex multivariate datasets.

    Science.gov (United States)

    Roederer, Mario; Nozzi, Joshua L; Nason, Martha C

    2011-02-01

    Polychromatic flow cytometry results in complex, multivariate datasets. To date, tools for the aggregate analysis of these datasets across multiple specimens grouped by different categorical variables, such as demographic information, have not been optimized. Often, the exploration of such datasets is accomplished by visualization of patterns with pie charts or bar charts, without easy access to statistical comparisons of measurements that comprise multiple components. Here we report on algorithms and a graphical interface we developed for these purposes. In particular, we discuss thresholding necessary for accurate representation of data in pie charts, the implications for display and comparison of normalized versus unnormalized data, and the effects of averaging when samples with significant background noise are present. Finally, we define a statistic for the nonparametric comparison of complex distributions to test for difference between groups of samples based on multi-component measurements. While originally developed to support the analysis of T cell functional profiles, these techniques are amenable to a broad range of datatypes. Published 2011 Wiley-Liss, Inc.

  14. Distributed solar photovoltaic array location and extent dataset for remote sensing object identification

    Science.gov (United States)

    Bradbury, Kyle; Saboo, Raghav; L. Johnson, Timothy; Malof, Jordan M.; Devarajan, Arjun; Zhang, Wuming; M. Collins, Leslie; G. Newell, Richard

    2016-12-01

    Earth-observing remote sensing data, including aerial photography and satellite imagery, offer a snapshot of the world from which we can learn about the state of natural resources and the built environment. The components of energy systems that are visible from above can be automatically assessed with these remote sensing data when processed with machine learning methods. Here, we focus on the information gap in distributed solar photovoltaic (PV) arrays, of which there is limited public data on solar PV deployments at small geographic scales. We created a dataset of solar PV arrays to initiate and develop the process of automatically identifying solar PV locations using remote sensing imagery. This dataset contains the geospatial coordinates and border vertices for over 19,000 solar panels across 601 high-resolution images from four cities in California. Dataset applications include training object detection and other machine learning algorithms that use remote sensing imagery, developing specific algorithms for predictive detection of distributed PV systems, estimating installed PV capacity, and analysis of the socioeconomic correlates of PV deployment.

  15. Los catálogos en línea de acceso público del Mercosur disponibles en entorno web Web accessible online public access catalogs in the Mercosur

    Directory of Open Access Journals (Sweden)

    Elsa Barber

    2008-06-01

    Full Text Available Se analizan las interfaces de usuario de los catálogos en línea de acceso público (OPACs en entorno web de las bibliotecas universitarias, especializadas, públicas y nacionales de los países parte del Mercosur (Argentina, Brasil, Paraguay, Uruguay, para elaborar un diagnóstico de situación sobre: descripción bibliográfica, análisis temático, mensajes de ayuda al usuario, visualización de datos bibliográficos. Se adopta una metodología cuali-cuantitativa, se utiliza como instrumento de recolección de datos la lista de funcionalidades del sistema que proporciona Hildreth (1982, se actualiza, se obtiene un formulario que permite, mediante 38 preguntas cerradas, observar la frecuencia de aparición de las funcionalidades básicas propias de cuatro áreas: Área I - control de operaciones; Área II - control de formulación de la búsqueda y puntos de acceso; Área III - control de salida y Área IV - asistencia al usuario: información e instrucción. Se trabaja con la información correspondiente a 297 unidades. Se delimitan estratos por tipo de software, tipo de biblioteca y país. Se aplican a los resultados las pruebas de Chi-cuadrado, Odds ratio y regresión logística multinomial. El análisis corrobora la existencia de diferencias significativas en cada uno de los estratos y verifica que la mayoría de los OPACs relevados brindan prestaciones mínimas.User interfaces of web based online public access catalogs (OPACs of academic, special, public and national libraries in countries belonging to Mercosur (Argentina, Brazil, Paraguay, Uruguay are studied to provide a diagnosis of the situation of bibliographic description, subject analisis, help messages and bibliographic display. A cuali-cuantitative methodology is adopted and a checklist of systems functions created by Hildreth (1982 is updated and used as data collection tool. The resulting 38 closed questions checklist has allowed to observe the frequency of appearance of the

  16. A multimodal MRI dataset of professional chess players.

    Science.gov (United States)

    Li, Kaiming; Jiang, Jing; Qiu, Lihua; Yang, Xun; Huang, Xiaoqi; Lui, Su; Gong, Qiyong

    2015-01-01

    Chess is a good model to study high-level human brain functions such as spatial cognition, memory, planning, learning and problem solving. Recent studies have demonstrated that non-invasive MRI techniques are valuable for researchers to investigate the underlying neural mechanism of playing chess. For professional chess players (e.g., chess grand masters and masters or GM/Ms), what are the structural and functional alterations due to long-term professional practice, and how these alterations relate to behavior, are largely veiled. Here, we report a multimodal MRI dataset from 29 professional Chinese chess players (most of whom are GM/Ms), and 29 age matched novices. We hope that this dataset will provide researchers with new materials to further explore high-level human brain functions.

  17. Knowledge discovery with classification rules in a cardiovascular dataset.

    Science.gov (United States)

    Podgorelec, Vili; Kokol, Peter; Stiglic, Milojka Molan; Hericko, Marjan; Rozman, Ivan

    2005-12-01

    In this paper we study an evolutionary machine learning approach to data mining and knowledge discovery based on the induction of classification rules. A method for automatic rules induction called AREX using evolutionary induction of decision trees and automatic programming is introduced. The proposed algorithm is applied to a cardiovascular dataset consisting of different groups of attributes which should possibly reveal the presence of some specific cardiovascular problems in young patients. A case study is presented that shows the use of AREX for the classification of patients and for discovering possible new medical knowledge from the dataset. The defined knowledge discovery loop comprises a medical expert's assessment of induced rules to drive the evolution of rule sets towards more appropriate solutions. The final result is the discovery of a possible new medical knowledge in the field of pediatric cardiology.

  18. Quality-control of an hourly rainfall dataset and climatology of extremes for the UK.

    Science.gov (United States)

    Blenkinsop, Stephen; Lewis, Elizabeth; Chan, Steven C; Fowler, Hayley J

    2017-02-01

    Sub-daily rainfall extremes may be associated with flash flooding, particularly in urban areas but, compared with extremes on daily timescales, have been relatively little studied in many regions. This paper describes a new, hourly rainfall dataset for the UK based on ∼1600 rain gauges from three different data sources. This includes tipping bucket rain gauge data from the UK Environment Agency (EA), which has been collected for operational purposes, principally flood forecasting. Significant problems in the use of such data for the analysis of extreme events include the recording of accumulated totals, high frequency bucket tips, rain gauge recording errors and the non-operation of gauges. Given the prospect of an intensification of short-duration rainfall in a warming climate, the identification of such errors is essential if sub-daily datasets are to be used to better understand extreme events. We therefore first describe a series of procedures developed to quality control this new dataset. We then analyse ∼380 gauges with near-complete hourly records for 1992-2011 and map the seasonal climatology of intense rainfall based on UK hourly extremes using annual maxima, n-largest events and fixed threshold approaches. We find that the highest frequencies and intensities of hourly extreme rainfall occur during summer when the usual orographically defined pattern of extreme rainfall is replaced by a weaker, north-south pattern. A strong diurnal cycle in hourly extremes, peaking in late afternoon to early evening, is also identified in summer and, for some areas, in spring. This likely reflects the different mechanisms that generate sub-daily rainfall, with convection dominating during summer. The resulting quality-controlled hourly rainfall dataset will provide considerable value in several contexts, including the development of standard, globally applicable quality-control procedures for sub-daily data, the validation of the new generation of very high

  19. Publishing datasets with eSciDoc and panMetaDocs

    Science.gov (United States)

    Ulbricht, D.; Klump, J.; Bertelmann, R.

    2012-04-01

    Currently serveral research institutions worldwide undertake considerable efforts to have their scientific datasets published and to syndicate them to data portals as extensively described objects identified by a persistent identifier. This is done to foster the reuse of data, to make scientific work more transparent, and to create a citable entity that can be referenced unambigously in written publications. GFZ Potsdam established a publishing workflow for file based research datasets. Key software components are an eSciDoc infrastructure [1] and multiple instances of the data curation tool panMetaDocs [2]. The eSciDoc repository holds data objects and their associated metadata in container objects, called eSciDoc items. A key metadata element in this context is the publication status of the referenced data set. PanMetaDocs, which is based on PanMetaWorks [3], is a PHP based web application that allows to describe data with any XML-based metadata schema. The metadata fields can be filled with static or dynamic content to reduce the number of fields that require manual entries to a minimum and make use of contextual information in a project setting. Access rights can be applied to set visibility of datasets to other project members and allow collaboration on and notifying about datasets (RSS) and interaction with the internal messaging system, that was inherited from panMetaWorks. When a dataset is to be published, panMetaDocs allows to change the publication status of the eSciDoc item from status "private" to "submitted" and prepare the dataset for verification by an external reviewer. After quality checks, the item publication status can be changed to "published". This makes the data and metadata available through the internet worldwide. PanMetaDocs is developed as an eSciDoc application. It is an easy to use graphical user interface to eSciDoc items, their data and metadata. It is also an application supporting a DOI publication agent during the process of

  20. Application of Density Estimation Methods to Datasets from a Glider

    Science.gov (United States)

    2014-09-30

    humpback and sperm whales as well as different dolphin species. OBJECTIVES The objective of this research is to extend existing methods for cetacean...collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources...estimation from single sensor datasets. Required steps for a cue counting approach, where a cue has been defined as a clicking event (Küsel et al., 2011), to

  1. A review of continent scale hydrological datasets available for Africa

    OpenAIRE

    Bonsor, H.C.

    2010-01-01

    As rainfall becomes less reliable with predicted climate change the ability to assess the spatial and seasonal variations in groundwater availability on a large-scale (catchment and continent) is becoming increasingly important (Bates, et al. 2007; MacDonald et al. 2009). The scarcity of observed hydrological data, or difficulty in obtaining such data, within Africa means remotely sensed (RS) datasets must often be used to drive large-scale hydrological models. The different ap...

  2. Dataset of mitochondrial genome variants in oncocytic tumors

    Directory of Open Access Journals (Sweden)

    Lihua Lyu

    2018-04-01

    Full Text Available This dataset presents the mitochondrial genome variants associated with oncocytic tumors. These data were obtained by Sanger sequencing of the whole mitochondrial genomes of oncocytic tumors and the adjacent normal tissues from 32 patients. The mtDNA variants are identified after compared with the revised Cambridge sequence, excluding those defining haplogroups of our patients. The pathogenic prediction for the novel missense variants found in this study was performed with the Mitimpact 2 program.

  3. GLEAM version 3: Global Land Evaporation Datasets and Model

    Science.gov (United States)

    Martens, B.; Miralles, D. G.; Lievens, H.; van der Schalie, R.; de Jeu, R.; Fernandez-Prieto, D.; Verhoest, N.

    2015-12-01

    Terrestrial evaporation links energy, water and carbon cycles over land and is therefore a key variable of the climate system. However, the global-scale magnitude and variability of the flux, and the sensitivity of the underlying physical process to changes in environmental factors, are still poorly understood due to limitations in in situ measurements. As a result, several methods have risen to estimate global patterns of land evaporation from satellite observations. However, these algorithms generally differ in their approach to model evaporation, resulting in large differences in their estimates. One of these methods is GLEAM, the Global Land Evaporation: the Amsterdam Methodology. GLEAM estimates terrestrial evaporation based on daily satellite observations of meteorological variables, vegetation characteristics and soil moisture. Since the publication of the first version of the algorithm (2011), the model has been widely applied to analyse trends in the water cycle and land-atmospheric feedbacks during extreme hydrometeorological events. A third version of the GLEAM global datasets is foreseen by the end of 2015. Given the relevance of having a continuous and reliable record of global-scale evaporation estimates for climate and hydrological research, the establishment of an online data portal to host these data to the public is also foreseen. In this new release of the GLEAM datasets, different components of the model have been updated, with the most significant change being the revision of the data assimilation algorithm. In this presentation, we will highlight the most important changes of the methodology and present three new GLEAM datasets and their validation against in situ observations and an alternative dataset of terrestrial evaporation (ERA-Land). Results of the validation exercise indicate that the magnitude and the spatiotemporal variability of the modelled evaporation agree reasonably well with the estimates of ERA-Land and the in situ

  4. Soil chemistry in lithologically diverse datasets: the quartz dilution effect

    Science.gov (United States)

    Bern, Carleton R.

    2009-01-01

    National- and continental-scale soil geochemical datasets are likely to move our understanding of broad soil geochemistry patterns forward significantly. Patterns of chemistry and mineralogy delineated from these datasets are strongly influenced by the composition of the soil parent material, which itself is largely a function of lithology and particle size sorting. Such controls present a challenge by obscuring subtler patterns arising from subsequent pedogenic processes. Here the effect of quartz concentration is examined in moist-climate soils from a pilot dataset of the North American Soil Geochemical Landscapes Project. Due to variable and high quartz contents (6.2–81.7 wt.%), and its residual and inert nature in soil, quartz is demonstrated to influence broad patterns in soil chemistry. A dilution effect is observed whereby concentrations of various elements are significantly and strongly negatively correlated with quartz. Quartz content drives artificial positive correlations between concentrations of some elements and obscures negative correlations between others. Unadjusted soil data show the highly mobile base cations Ca, Mg, and Na to be often strongly positively correlated with intermediately mobile Al or Fe, and generally uncorrelated with the relatively immobile high-field-strength elements (HFS) Ti and Nb. Both patterns are contrary to broad expectations for soils being weathered and leached. After transforming bulk soil chemistry to a quartz-free basis, the base cations are generally uncorrelated with Al and Fe, and negative correlations generally emerge with the HFS elements. Quartz-free element data may be a useful tool for elucidating patterns of weathering or parent-material chemistry in large soil datasets.

  5. Dataset on records of Hericium erinaceus in Slovakia

    OpenAIRE

    Vladimír Kunca; Marek Čiliak

    2017-01-01

    The data presented in this article are related to the research article entitled ?Habitat preferences of Hericium erinaceus in Slovakia? (Kunca and ?iliak, 2016) [FUNECO607] [2]. The dataset include all available and unpublished data from Slovakia, besides the records from the same tree or stem. We compiled a database of records of collections by processing data from herbaria, personal records and communication with mycological activists. Data on altitude, tree species, host tree vital status,...

  6. Diffeomorphic Iterative Centroid Methods for Template Estimation on Large Datasets

    OpenAIRE

    Cury , Claire; Glaunès , Joan Alexis; Colliot , Olivier

    2014-01-01

    International audience; A common approach for analysis of anatomical variability relies on the stimation of a template representative of the population. The Large Deformation Diffeomorphic Metric Mapping is an attractive framework for that purpose. However, template estimation using LDDMM is computationally expensive, which is a limitation for the study of large datasets. This paper presents an iterative method which quickly provides a centroid of the population in the shape space. This centr...

  7. A Dataset from TIMSS to Examine the Relationship between Computer Use and Mathematics Achievement

    Science.gov (United States)

    Kadijevich, Djordje M.

    2015-01-01

    Because the relationship between computer use and achievement is still puzzling, there is a need to prepare and analyze good quality datasets on computer use and achievement. Such a dataset can be derived from TIMSS data. This paper describes how this dataset can be prepared. It also gives an example of how the dataset may be analyzed. The…

  8. An Analysis on Better Testing than Training Performances on the Iris Dataset

    NARCIS (Netherlands)

    Schutten, Marten; Wiering, Marco

    2016-01-01

    The Iris dataset is a well known dataset containing information on three different types of Iris flowers. A typical and popular method for solving classification problems on datasets such as the Iris set is the support vector machine (SVM). In order to do so the dataset is separated in a set used

  9. New public dataset for spotting patterns in medieval document images

    Science.gov (United States)

    En, Sovann; Nicolas, Stéphane; Petitjean, Caroline; Jurie, Frédéric; Heutte, Laurent

    2017-01-01

    With advances in technology, a large part of our cultural heritage is becoming digitally available. In particular, in the field of historical document image analysis, there is now a growing need for indexing and data mining tools, thus allowing us to spot and retrieve the occurrences of an object of interest, called a pattern, in a large database of document images. Patterns may present some variability in terms of color, shape, or context, making the spotting of patterns a challenging task. Pattern spotting is a relatively new field of research, still hampered by the lack of available annotated resources. We present a new publicly available dataset named DocExplore dedicated to spotting patterns in historical document images. The dataset contains 1500 images and 1464 queries, and allows the evaluation of two tasks: image retrieval and pattern localization. A standardized benchmark protocol along with ad hoc metrics is provided for a fair comparison of the submitted approaches. We also provide some first results obtained with our baseline system on this new dataset, which show that there is room for improvement and that should encourage researchers of the document image analysis community to design new systems and submit improved results.

  10. Decoys Selection in Benchmarking Datasets: Overview and Perspectives

    Science.gov (United States)

    Réau, Manon; Langenfeld, Florent; Zagury, Jean-François; Lagarde, Nathalie; Montes, Matthieu

    2018-01-01

    Virtual Screening (VS) is designed to prospectively help identifying potential hits, i.e., compounds capable of interacting with a given target and potentially modulate its activity, out of large compound collections. Among the variety of methodologies, it is crucial to select the protocol that is the most adapted to the query/target system under study and that yields the most reliable output. To this aim, the performance of VS methods is commonly evaluated and compared by computing their ability to retrieve active compounds in benchmarking datasets. The benchmarking datasets contain a subset of known active compounds together with a subset of decoys, i.e., assumed non-active molecules. The composition of both the active and the decoy compounds subsets is critical to limit the biases in the evaluation of the VS methods. In this review, we focus on the selection of decoy compounds that has considerably changed over the years, from randomly selected compounds to highly customized or experimentally validated negative compounds. We first outline the evolution of decoys selection in benchmarking databases as well as current benchmarking databases that tend to minimize the introduction of biases, and secondly, we propose recommendations for the selection and the design of benchmarking datasets. PMID:29416509

  11. ENHANCED DATA DISCOVERABILITY FOR IN SITU HYPERSPECTRAL DATASETS

    Directory of Open Access Journals (Sweden)

    B. Rasaiah

    2016-06-01

    Full Text Available Field spectroscopic metadata is a central component in the quality assurance, reliability, and discoverability of hyperspectral data and the products derived from it. Cataloguing, mining, and interoperability of these datasets rely upon the robustness of metadata protocols for field spectroscopy, and on the software architecture to support the exchange of these datasets. Currently no standard for in situ spectroscopy data or metadata protocols exist. This inhibits the effective sharing of growing volumes of in situ spectroscopy datasets, to exploit the benefits of integrating with the evolving range of data sharing platforms. A core metadataset for field spectroscopy was introduced by Rasaiah et al., (2011-2015 with extended support for specific applications. This paper presents a prototype model for an OGC and ISO compliant platform-independent metadata discovery service aligned to the specific requirements of field spectroscopy. In this study, a proof-of-concept metadata catalogue has been described and deployed in a cloud-based architecture as a demonstration of an operationalized field spectroscopy metadata standard and web-based discovery service.

  12. Multiresolution persistent homology for excessively large biomolecular datasets

    Energy Technology Data Exchange (ETDEWEB)

    Xia, Kelin; Zhao, Zhixiong [Department of Mathematics, Michigan State University, East Lansing, Michigan 48824 (United States); Wei, Guo-Wei, E-mail: wei@math.msu.edu [Department of Mathematics, Michigan State University, East Lansing, Michigan 48824 (United States); Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824 (United States); Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824 (United States)

    2015-10-07

    Although persistent homology has emerged as a promising tool for the topological simplification of complex data, it is computationally intractable for large datasets. We introduce multiresolution persistent homology to handle excessively large datasets. We match the resolution with the scale of interest so as to represent large scale datasets with appropriate resolution. We utilize flexibility-rigidity index to access the topological connectivity of the data set and define a rigidity density for the filtration analysis. By appropriately tuning the resolution of the rigidity density, we are able to focus the topological lens on the scale of interest. The proposed multiresolution topological analysis is validated by a hexagonal fractal image which has three distinct scales. We further demonstrate the proposed method for extracting topological fingerprints from DNA molecules. In particular, the topological persistence of a virus capsid with 273 780 atoms is successfully analyzed which would otherwise be inaccessible to the normal point cloud method and unreliable by using coarse-grained multiscale persistent homology. The proposed method has also been successfully applied to the protein domain classification, which is the first time that persistent homology is used for practical protein domain analysis, to our knowledge. The proposed multiresolution topological method has potential applications in arbitrary data sets, such as social networks, biological networks, and graphs.

  13. Tissue-Based MRI Intensity Standardization: Application to Multicentric Datasets

    Directory of Open Access Journals (Sweden)

    Nicolas Robitaille

    2012-01-01

    Full Text Available Intensity standardization in MRI aims at correcting scanner-dependent intensity variations. Existing simple and robust techniques aim at matching the input image histogram onto a standard, while we think that standardization should aim at matching spatially corresponding tissue intensities. In this study, we present a novel automatic technique, called STI for STandardization of Intensities, which not only shares the simplicity and robustness of histogram-matching techniques, but also incorporates tissue spatial intensity information. STI uses joint intensity histograms to determine intensity correspondence in each tissue between the input and standard images. We compared STI to an existing histogram-matching technique on two multicentric datasets, Pilot E-ADNI and ADNI, by measuring the intensity error with respect to the standard image after performing nonlinear registration. The Pilot E-ADNI dataset consisted in 3 subjects each scanned in 7 different sites. The ADNI dataset consisted in 795 subjects scanned in more than 50 different sites. STI was superior to the histogram-matching technique, showing significantly better intensity matching for the brain white matter with respect to the standard image.

  14. Exploring massive, genome scale datasets with the genometricorr package

    KAUST Repository

    Favorov, Alexander; Mularoni, Loris; Cope, Leslie M.; Medvedeva, Yulia; Mironov, Andrey A.; Makeev, Vsevolod J.; Wheelan, Sarah J.

    2012-01-01

    We have created a statistically grounded tool for determining the correlation of genomewide data with other datasets or known biological features, intended to guide biological exploration of high-dimensional datasets, rather than providing immediate answers. The software enables several biologically motivated approaches to these data and here we describe the rationale and implementation for each approach. Our models and statistics are implemented in an R package that efficiently calculates the spatial correlation between two sets of genomic intervals (data and/or annotated features), for use as a metric of functional interaction. The software handles any type of pointwise or interval data and instead of running analyses with predefined metrics, it computes the significance and direction of several types of spatial association; this is intended to suggest potentially relevant relationships between the datasets. Availability and implementation: The package, GenometriCorr, can be freely downloaded at http://genometricorr.sourceforge.net/. Installation guidelines and examples are available from the sourceforge repository. The package is pending submission to Bioconductor. © 2012 Favorov et al.

  15. Image segmentation evaluation for very-large datasets

    Science.gov (United States)

    Reeves, Anthony P.; Liu, Shuang; Xie, Yiting

    2016-03-01

    With the advent of modern machine learning methods and fully automated image analysis there is a need for very large image datasets having documented segmentations for both computer algorithm training and evaluation. Current approaches of visual inspection and manual markings do not scale well to big data. We present a new approach that depends on fully automated algorithm outcomes for segmentation documentation, requires no manual marking, and provides quantitative evaluation for computer algorithms. The documentation of new image segmentations and new algorithm outcomes are achieved by visual inspection. The burden of visual inspection on large datasets is minimized by (a) customized visualizations for rapid review and (b) reducing the number of cases to be reviewed through analysis of quantitative segmentation evaluation. This method has been applied to a dataset of 7,440 whole-lung CT images for 6 different segmentation algorithms designed to fully automatically facilitate the measurement of a number of very important quantitative image biomarkers. The results indicate that we could achieve 93% to 99% successful segmentation for these algorithms on this relatively large image database. The presented evaluation method may be scaled to much larger image databases.

  16. Exploring massive, genome scale datasets with the genometricorr package

    KAUST Repository

    Favorov, Alexander

    2012-05-31

    We have created a statistically grounded tool for determining the correlation of genomewide data with other datasets or known biological features, intended to guide biological exploration of high-dimensional datasets, rather than providing immediate answers. The software enables several biologically motivated approaches to these data and here we describe the rationale and implementation for each approach. Our models and statistics are implemented in an R package that efficiently calculates the spatial correlation between two sets of genomic intervals (data and/or annotated features), for use as a metric of functional interaction. The software handles any type of pointwise or interval data and instead of running analyses with predefined metrics, it computes the significance and direction of several types of spatial association; this is intended to suggest potentially relevant relationships between the datasets. Availability and implementation: The package, GenometriCorr, can be freely downloaded at http://genometricorr.sourceforge.net/. Installation guidelines and examples are available from the sourceforge repository. The package is pending submission to Bioconductor. © 2012 Favorov et al.

  17. A cross-country Exchange Market Pressure (EMP dataset

    Directory of Open Access Journals (Sweden)

    Mohit Desai

    2017-06-01

    Full Text Available The data presented in this article are related to the research article titled - “An exchange market pressure measure for cross country analysis” (Patnaik et al. [1]. In this article, we present the dataset for Exchange Market Pressure values (EMP for 139 countries along with their conversion factors, ρ (rho. Exchange Market Pressure, expressed in percentage change in exchange rate, measures the change in exchange rate that would have taken place had the central bank not intervened. The conversion factor ρ can interpreted as the change in exchange rate associated with $1 billion of intervention. Estimates of conversion factor ρ allow us to calculate a monthly time series of EMP for 139 countries. Additionally, the dataset contains the 68% confidence interval (high and low values for the point estimates of ρ’s. Using the standard errors of estimates of ρ’s, we obtain one sigma intervals around mean estimates of EMP values. These values are also reported in the dataset.

  18. Feature Selection Method Based on Artificial Bee Colony Algorithm and Support Vector Machines for Medical Datasets Classification

    Directory of Open Access Journals (Sweden)

    Mustafa Serter Uzer

    2013-01-01

    Full Text Available This paper offers a hybrid approach that uses the artificial bee colony (ABC algorithm for feature selection and support vector machines for classification. The purpose of this paper is to test the effect of elimination of the unimportant and obsolete features of the datasets on the success of the classification, using the SVM classifier. The developed approach conventionally used in liver diseases and diabetes diagnostics, which are commonly observed and reduce the quality of life, is developed. For the diagnosis of these diseases, hepatitis, liver disorders and diabetes datasets from the UCI database were used, and the proposed system reached a classification accuracies of 94.92%, 74.81%, and 79.29%, respectively. For these datasets, the classification accuracies were obtained by the help of the 10-fold cross-validation method. The results show that the performance of the method is highly successful compared to other results attained and seems very promising for pattern recognition applications.

  19. Global distribution of urban parameters derived from high-resolution global datasets for weather modelling

    Science.gov (United States)

    Kawano, N.; Varquez, A. C. G.; Dong, Y.; Kanda, M.

    2016-12-01

    Numerical model such as Weather Research and Forecasting model coupled with single-layer Urban Canopy Model (WRF-UCM) is one of the powerful tools to investigate urban heat island. Urban parameters such as average building height (Have), plain area index (λp) and frontal area index (λf), are necessary inputs for the model. In general, these parameters are uniformly assumed in WRF-UCM but this leads to unrealistic urban representation. Distributed urban parameters can also be incorporated into WRF-UCM to consider a detail urban effect. The problem is that distributed building information is not readily available for most megacities especially in developing countries. Furthermore, acquiring real building parameters often require huge amount of time and money. In this study, we investigated the potential of using globally available satellite-captured datasets for the estimation of the parameters, Have, λp, and λf. Global datasets comprised of high spatial resolution population dataset (LandScan by Oak Ridge National Laboratory), nighttime lights (NOAA), and vegetation fraction (NASA). True samples of Have, λp, and λf were acquired from actual building footprints from satellite images and 3D building database of Tokyo, New York, Paris, Melbourne, Istanbul, Jakarta and so on. Regression equations were then derived from the block-averaging of spatial pairs of real parameters and global datasets. Results show that two regression curves to estimate Have and λf from the combination of population and nightlight are necessary depending on the city's level of development. An index which can be used to decide which equation to use for a city is the Gross Domestic Product (GDP). On the other hand, λphas less dependence on GDP but indicated a negative relationship to vegetation fraction. Finally, a simplified but precise approximation of urban parameters through readily-available, high-resolution global datasets and our derived regressions can be utilized to estimate a

  20. SoccerNet: A Scalable Dataset for Action Spotting in Soccer Videos

    KAUST Repository

    Giancola, Silvio; Amine, Mohieddine; Dghaily, Tarek; Ghanem, Bernard

    2018-01-01

    In this paper, we introduce SoccerNet, a benchmark for action spotting in soccer videos. The dataset is composed of 500 complete soccer games from six main European leagues, covering three seasons from 2014 to 2017 and a total duration of 764 hours. A total of 6,637 temporal annotations are automatically parsed from online match reports at a one minute resolution for three main classes of events (Goal, Yellow/Red Card, and Substitution). As such, the dataset is easily scalable. These annotations are manually refined to a one second resolution by anchoring them at a single timestamp following well-defined soccer rules. With an average of one event every 6.9 minutes, this dataset focuses on the problem of localizing very sparse events within long videos. We define the task of spotting as finding the anchors of soccer events in a video. Making use of recent developments in the realm of generic action recognition and detection in video, we provide strong baselines for detecting soccer events. We show that our best model for classifying temporal segments of length one minute reaches a mean Average Precision (mAP) of 67.8%. For the spotting task, our baseline reaches an Average-mAP of 49.7% for tolerances $\\delta$ ranging from 5 to 60 seconds.

  1. SoccerNet: A Scalable Dataset for Action Spotting in Soccer Videos

    KAUST Repository

    Giancola, Silvio

    2018-04-12

    In this paper, we introduce SoccerNet, a benchmark for action spotting in soccer videos. The dataset is composed of 500 complete soccer games from six main European leagues, covering three seasons from 2014 to 2017 and a total duration of 764 hours. A total of 6,637 temporal annotations are automatically parsed from online match reports at a one minute resolution for three main classes of events (Goal, Yellow/Red Card, and Substitution). As such, the dataset is easily scalable. These annotations are manually refined to a one second resolution by anchoring them at a single timestamp following well-defined soccer rules. With an average of one event every 6.9 minutes, this dataset focuses on the problem of localizing very sparse events within long videos. We define the task of spotting as finding the anchors of soccer events in a video. Making use of recent developments in the realm of generic action recognition and detection in video, we provide strong baselines for detecting soccer events. We show that our best model for classifying temporal segments of length one minute reaches a mean Average Precision (mAP) of 67.8%. For the spotting task, our baseline reaches an Average-mAP of 49.7% for tolerances $\\\\delta$ ranging from 5 to 60 seconds.

  2. Se-SAD serial femtosecond crystallography datasets from selenobiotinyl-streptavidin

    Science.gov (United States)

    Yoon, Chun Hong; Demirci, Hasan; Sierra, Raymond G.; Dao, E. Han; Ahmadi, Radman; Aksit, Fulya; Aquila, Andrew L.; Batyuk, Alexander; Ciftci, Halilibrahim; Guillet, Serge; Hayes, Matt J.; Hayes, Brandon; Lane, Thomas J.; Liang, Meng; Lundström, Ulf; Koglin, Jason E.; Mgbam, Paul; Rao, Yashas; Rendahl, Theodore; Rodriguez, Evan; Zhang, Lindsey; Wakatsuki, Soichi; Boutet, Sébastien; Holton, James M.; Hunter, Mark S.

    2017-04-01

    We provide a detailed description of selenobiotinyl-streptavidin (Se-B SA) co-crystal datasets recorded using the Coherent X-ray Imaging (CXI) instrument at the Linac Coherent Light Source (LCLS) for selenium single-wavelength anomalous diffraction (Se-SAD) structure determination. Se-B SA was chosen as the model system for its high affinity between biotin and streptavidin where the sulfur atom in the biotin molecule (C10H16N2O3S) is substituted with selenium. The dataset was collected at three different transmissions (100, 50, and 10%) using a serial sample chamber setup which allows for two sample chambers, a front chamber and a back chamber, to operate simultaneously. Diffraction patterns from Se-B SA were recorded to a resolution of 1.9 Å. The dataset is publicly available through the Coherent X-ray Imaging Data Bank (CXIDB) and also on LCLS compute nodes as a resource for research and algorithm development.

  3. Mapping Global Ocean Surface Albedo from Satellite Observations: Models, Algorithms, and Datasets

    Science.gov (United States)

    Li, X.; Fan, X.; Yan, H.; Li, A.; Wang, M.; Qu, Y.

    2018-04-01

    Ocean surface albedo (OSA) is one of the important parameters in surface radiation budget (SRB). It is usually considered as a controlling factor of the heat exchange among the atmosphere and ocean. The temporal and spatial dynamics of OSA determine the energy absorption of upper level ocean water, and have influences on the oceanic currents, atmospheric circulations, and transportation of material and energy of hydrosphere. Therefore, various parameterizations and models have been developed for describing the dynamics of OSA. However, it has been demonstrated that the currently available OSA datasets cannot full fill the requirement of global climate change studies. In this study, we present a literature review on mapping global OSA from satellite observations. The models (parameterizations, the coupled ocean-atmosphere radiative transfer (COART), and the three component ocean water albedo (TCOWA)), algorithms (the estimation method based on reanalysis data, and the direct-estimation algorithm), and datasets (the cloud, albedo and radiation (CLARA) surface albedo product, dataset derived by the TCOWA model, and the global land surface satellite (GLASS) phase-2 surface broadband albedo product) of OSA have been discussed, separately.

  4. Similar Data Retrieval from Enormous Datasets on ELF/VLF Wave Spectrum Observed by Akebono

    Directory of Open Access Journals (Sweden)

    Y Kasahara

    2010-02-01

    Full Text Available As the total amount of data measured by scientific spacecraft is drastically increasing, it is necessary for researchers to develop new computation methods for efficient analysis of these enormous datasets. In the present study, we propose a new algorithm for similar data retrieval. We first discuss key descriptors that represent characteristics of the VLF/ELF waves observed by the Akebono spacecraft. Second, an algorithm for similar data retrieval is introduced. Finally, we demonstrate that the developed algorithm works well for the retrieval of the VLF spectrum with a small amount of CPU load.

  5. GRIP: A web-based system for constructing Gold Standard datasets for protein-protein interaction prediction

    Directory of Open Access Journals (Sweden)

    Zheng Huiru

    2009-01-01

    Full Text Available Abstract Background Information about protein interaction networks is fundamental to understanding protein function and cellular processes. Interaction patterns among proteins can suggest new drug targets and aid in the design of new therapeutic interventions. Efforts have been made to map interactions on a proteomic-wide scale using both experimental and computational techniques. Reference datasets that contain known interacting proteins (positive cases and non-interacting proteins (negative cases are essential to support computational prediction and validation of protein-protein interactions. Information on known interacting and non interacting proteins are usually stored within databases. Extraction of these data can be both complex and time consuming. Although, the automatic construction of reference datasets for classification is a useful resource for researchers no public resource currently exists to perform this task. Results GRIP (Gold Reference dataset constructor from Information on Protein complexes is a web-based system that provides researchers with the functionality to create reference datasets for protein-protein interaction prediction in Saccharomyces cerevisiae. Both positive and negative cases for a reference dataset can be extracted, organised and downloaded by the user. GRIP also provides an upload facility whereby users can submit proteins to determine protein complex membership. A search facility is provided where a user can search for protein complex information in Saccharomyces cerevisiae. Conclusion GRIP is developed to retrieve information on protein complex, cellular localisation, and physical and genetic interactions in Saccharomyces cerevisiae. Manual construction of reference datasets can be a time consuming process requiring programming knowledge. GRIP simplifies and speeds up this process by allowing users to automatically construct reference datasets. GRIP is free to access at http://rosalind.infj.ulst.ac.uk/GRIP/.

  6. Astronaut Photography of the Earth: A Long-Term Dataset for Earth Systems Research, Applications, and Education

    Science.gov (United States)

    Stefanov, William L.

    2017-01-01

    The NASA Earth observations dataset obtained by humans in orbit using handheld film and digital cameras is freely accessible to the global community through the online searchable database at https://eol.jsc.nasa.gov, and offers a useful compliment to traditional ground-commanded sensor data. The dataset includes imagery from the NASA Mercury (1961) through present-day International Space Station (ISS) programs, and currently totals over 2.6 million individual frames. Geographic coverage of the dataset includes land and oceans areas between approximately 52 degrees North and South latitudes, but is spatially and temporally discontinuous. The photographic dataset includes some significant impediments for immediate research, applied, and educational use: commercial RGB films and camera systems with overlapping bandpasses; use of different focal length lenses, unconstrained look angles, and variable spacecraft altitudes; and no native geolocation information. Such factors led to this dataset being underutilized by the community but recent advances in automated and semi-automated image geolocation, image feature classification, and web-based services are adding new value to the astronaut-acquired imagery. A coupled ground software and on-orbit hardware system for the ISS is in development for planned deployment in mid-2017; this system will capture camera pose information for each astronaut photograph to allow automated, full georegistration of the data. The ground system component of the system is currently in use to fully georeference imagery collected in response to International Disaster Charter activations, and the auto-registration procedures are being applied to the extensive historical database of imagery to add value for research and educational purposes. In parallel, machine learning techniques are being applied to automate feature identification and classification throughout the dataset, in order to build descriptive metadata that will improve search

  7. BayesMotif: de novo protein sorting motif discovery from impure datasets.

    Science.gov (United States)

    Hu, Jianjun; Zhang, Fan

    2010-01-18

    Protein sorting is the process that newly synthesized proteins are transported to their target locations within or outside of the cell. This process is precisely regulated by protein sorting signals in different forms. A major category of sorting signals are amino acid sub-sequences usually located at the N-terminals or C-terminals of protein sequences. Genome-wide experimental identification of protein sorting signals is extremely time-consuming and costly. Effective computational algorithms for de novo discovery of protein sorting signals is needed to improve the understanding of protein sorting mechanisms. We formulated the protein sorting motif discovery problem as a classification problem and proposed a Bayesian classifier based algorithm (BayesMotif) for de novo identification of a common type of protein sorting motifs in which a highly conserved anchor is present along with a less conserved motif regions. A false positive removal procedure is developed to iteratively remove sequences that are unlikely to contain true motifs so that the algorithm can identify motifs from impure input sequences. Experiments on both implanted motif datasets and real-world datasets showed that the enhanced BayesMotif algorithm can identify anchored sorting motifs from pure or impure protein sequence dataset. It also shows that the false positive removal procedure can help to identify true motifs even when there is only 20% of the input sequences containing true motif instances. We proposed BayesMotif, a novel Bayesian classification based algorithm for de novo discovery of a special category of anchored protein sorting motifs from impure datasets. Compared to conventional motif discovery algorithms such as MEME, our algorithm can find less-conserved motifs with short highly conserved anchors. Our algorithm also has the advantage of easy incorporation of additional meta-sequence features such as hydrophobicity or charge of the motifs which may help to overcome the limitations of

  8. Solar resource assessment in complex orography: a comparison of available datasets for the Trentino region

    Science.gov (United States)

    Laiti, Lavinia; Giovannini, Lorenzo; Zardi, Dino

    2015-04-01

    The accurate assessment of the solar radiation available at the Earth's surface is essential for a wide range of energy-related applications, such as the design of solar power plants, water heating systems and energy-efficient buildings, as well as in the fields of climatology, hydrology, ecology and agriculture. The characterization of solar radiation is particularly challenging in complex-orography areas, where topographic shadowing and altitude effects, together with local weather phenomena, greatly increase the spatial and temporal variability of such variable. At present, approaches ranging from surface measurements interpolation to orographic down-scaling of satellite data, to numerical model simulations are adopted for mapping solar radiation. In this contribution a high-resolution (200 m) solar atlas for the Trentino region (Italy) is presented, which was recently developed on the basis of hourly observations of global radiation collected from the local radiometric stations during the period 2004-2012. Monthly and annual climatological irradiation maps were obtained by the combined use of a GIS-based clear-sky model (r.sun module of GRASS GIS) and geostatistical interpolation techniques (kriging). Moreover, satellite radiation data derived by the MeteoSwiss HelioMont algorithm (2 km resolution) were used for missing-data reconstruction and for the final mapping, thus integrating ground-based and remote-sensing information. The results are compared with existing solar resource datasets, such as the PVGIS dataset, produced by the Joint Research Center Institute for Energy and Transport, and the HelioMont dataset, in order to evaluate the accuracy of the different datasets available for the region of interest.

  9. The sound of migration: exploring data sonification as a means of interpreting multivariate salmon movement datasets

    Directory of Open Access Journals (Sweden)

    Jens C. Hegg

    2018-02-01

    Full Text Available The migration of Pacific salmon is an important part of functioning freshwater ecosystems, but as populations have decreased and ecological conditions have changed, so have migration patterns. Understanding how the environment, and human impacts, change salmon migration behavior requires observing migration at small temporal and spatial scales across large geographic areas. Studying these detailed fish movements is particularly important for one threatened population of Chinook salmon in the Snake River of Idaho whose juvenile behavior may be rapidly evolving in response to dams and anthropogenic impacts. However, exploring movement data sets of large numbers of salmon can present challenges due to the difficulty of visualizing the multivariate, time-series datasets. Previous research indicates that sonification, representing data using sound, has the potential to enhance exploration of multivariate, time-series datasets. We developed sonifications of individual fish movements using a large dataset of salmon otolith microchemistry from Snake River Fall Chinook salmon. Otoliths, a balance and hearing organ in fish, provide a detailed chemical record of fish movements recorded in the tree-like rings they deposit each day the fish is alive. This data represents a scalable, multivariate dataset of salmon movement ideal for sonification. We tested independent listener responses to validate the effectiveness of the sonification tool and mapping methods. The sonifications were presented in a survey to untrained listeners to identify salmon movements with increasingly more fish, with and without visualizations. Our results showed that untrained listeners were most sensitive to transitions mapped to pitch and timbre. Accuracy results were non-intuitive; in aggregate, respondents clearly identified important transitions, but individual accuracy was low. This aggregate effect has potential implications for the use of sonification in the context of crowd

  10. SAMNet: a network-based approach to integrate multi-dimensional high throughput datasets.

    Science.gov (United States)

    Gosline, Sara J C; Spencer, Sarah J; Ursu, Oana; Fraenkel, Ernest

    2012-11-01

    The rapid development of high throughput biotechnologies has led to an onslaught of data describing genetic perturbations and changes in mRNA and protein levels in the cell. Because each assay provides a one-dimensional snapshot of active signaling pathways, it has become desirable to perform multiple assays (e.g. mRNA expression and phospho-proteomics) to measure a single condition. However, as experiments expand to accommodate various cellular conditions, proper analysis and interpretation of these data have become more challenging. Here we introduce a novel approach called SAMNet, for Simultaneous Analysis of Multiple Networks, that is able to interpret diverse assays over multiple perturbations. The algorithm uses a constrained optimization approach to integrate mRNA expression data with upstream genes, selecting edges in the protein-protein interaction network that best explain the changes across all perturbations. The result is a putative set of protein interactions that succinctly summarizes the results from all experiments, highlighting the network elements unique to each perturbation. We evaluated SAMNet in both yeast and human datasets. The yeast dataset measured the cellular response to seven different transition metals, and the human dataset measured cellular changes in four different lung cancer models of Epithelial-Mesenchymal Transition (EMT), a crucial process in tumor metastasis. SAMNet was able to identify canonical yeast metal-processing genes unique to each commodity in the yeast dataset, as well as human genes such as β-catenin and TCF7L2/TCF4 that are required for EMT signaling but escaped detection in the mRNA and phospho-proteomic data. Moreover, SAMNet also highlighted drugs likely to modulate EMT, identifying a series of less canonical genes known to be affected by the BCR-ABL inhibitor imatinib (Gleevec), suggesting a possible influence of this drug on EMT.

  11. Classifying and scoring of molecules with the NGN: new datasets, significance tests, and generalization

    Directory of Open Access Journals (Sweden)

    Cameron Christopher JF

    2010-10-01

    Full Text Available Abstract This paper demonstrates how a Neural Grammar Network learns to classify and score molecules for a variety of tasks in chemistry and toxicology. In addition to a more detailed analysis on datasets previously studied, we introduce three new datasets (BBB, FXa, and toxicology to show the generality of the approach. A new experimental methodology is developed and applied to both the new datasets as well as previously studied datasets. This methodology is rigorous and statistically grounded, and ultimately culminates in a Wilcoxon significance test that proves the effectiveness of the system. We further include a complete generalization of the specific technique to arbitrary grammars and datasets using a mathematical abstraction that allows researchers in different domains to apply the method to their own work. Background Our work can be viewed as an alternative to existing methods to solve the quantitative structure-activity relationship (QSAR problem. To this end, we review a number approaches both from a methodological and also a performance perspective. In addition to these approaches, we also examined a number of chemical properties that can be used by generic classifier systems, such as feed-forward artificial neural networks. In studying these approaches, we identified a set of interesting benchmark problem sets to which many of the above approaches had been applied. These included: ACE, AChE, AR, BBB, BZR, Cox2, DHFR, ER, FXa, GPB, Therm, and Thr. Finally, we developed our own benchmark set by collecting data on toxicology. Results Our results show that our system performs better than, or comparatively to, the existing methods over a broad range of problem types. Our method does not require the expert knowledge that is necessary to apply the other methods to novel problems. Conclusions We conclude that our success is due to the ability of our system to: 1 encode molecules losslessly before presentation to the learning system, and 2

  12. CHARMe Commentary metadata for Climate Science: collecting, linking and sharing user feedback on climate datasets

    Science.gov (United States)

    Blower, Jon; Lawrence, Bryan; Kershaw, Philip; Nagni, Maurizio

    2014-05-01

    The research process can be thought of as an iterative activity, initiated based on prior domain knowledge, as well on a number of external inputs, and producing a range of outputs including datasets, studies and peer reviewed publications. These outputs may describe the problem under study, the methodology used, the results obtained, etc. In any new publication, the author may cite or comment other papers or datasets in order to support their research hypothesis. However, as their work progresses, the researcher may draw from many other latent channels of information. These could include for example, a private conversation following a lecture or during a social dinner; an opinion expressed concerning some significant event such as an earthquake or for example a satellite failure. In addition, other sources of information of grey literature are important public such as informal papers such as the arxiv deposit, reports and studies. The climate science community is not an exception to this pattern; the CHARMe project, funded under the European FP7 framework, is developing an online system for collecting and sharing user feedback on climate datasets. This is to help users judge how suitable such climate data are for an intended application. The user feedback could be comments about assessments, citations, or provenance of the dataset, or other information such as descriptions of uncertainty or data quality. We define this as a distinct category of metadata called Commentary or C-metadata. We link C-metadata with target climate datasets using a Linked Data approach via the Open Annotation data model. In the context of Linked Data, C-metadata plays the role of a resource which, depending on its nature, may be accessed as simple text or as more structured content. The project is implementing a range of software tools to create, search or visualize C-metadata including a JavaScript plugin enabling this functionality to be integrated in situ with data provider portals

  13. Computational Methods for Large Spatio-temporal Datasets and Functional Data Ranking

    KAUST Repository

    Huang, Huang

    2017-07-16

    This thesis focuses on two topics, computational methods for large spatial datasets and functional data ranking. Both are tackling the challenges of big and high-dimensional data. The first topic is motivated by the prohibitive computational burden in fitting Gaussian process models to large and irregularly spaced spatial datasets. Various approximation methods have been introduced to reduce the computational cost, but many rely on unrealistic assumptions about the process and retaining statistical efficiency remains an issue. We propose a new scheme to approximate the maximum likelihood estimator and the kriging predictor when the exact computation is infeasible. The proposed method provides different types of hierarchical low-rank approximations that are both computationally and statistically efficient. We explore the improvement of the approximation theoretically and investigate the performance by simulations. For real applications, we analyze a soil moisture dataset with 2 million measurements with the hierarchical low-rank approximation and apply the proposed fast kriging to fill gaps for satellite images. The second topic is motivated by rank-based outlier detection methods for functional data. Compared to magnitude outliers, it is more challenging to detect shape outliers as they are often masked among samples. We develop a new notion of functional data depth by taking the integration of a univariate depth function. Having a form of the integrated depth, it shares many desirable features. Furthermore, the novel formation leads to a useful decomposition for detecting both shape and magnitude outliers. Our simulation studies show the proposed outlier detection procedure outperforms competitors in various outlier models. We also illustrate our methodology using real datasets of curves, images, and video frames. Finally, we introduce the functional data ranking technique to spatio-temporal statistics for visualizing and assessing covariance properties, such as

  14. Animated analysis of geoscientific datasets: An interactive graphical application

    Science.gov (United States)

    Morse, Peter; Reading, Anya; Lueg, Christopher

    2017-12-01

    Geoscientists are required to analyze and draw conclusions from increasingly large volumes of data. There is a need to recognise and characterise features and changing patterns of Earth observables within such large datasets. It is also necessary to identify significant subsets of the data for more detailed analysis. We present an innovative, interactive software tool and workflow to visualise, characterise, sample and tag large geoscientific datasets from both local and cloud-based repositories. It uses an animated interface and human-computer interaction to utilise the capacity of human expert observers to identify features via enhanced visual analytics. 'Tagger' enables users to analyze datasets that are too large in volume to be drawn legibly on a reasonable number of single static plots. Users interact with the moving graphical display, tagging data ranges of interest for subsequent attention. The tool provides a rapid pre-pass process using fast GPU-based OpenGL graphics and data-handling and is coded in the Quartz Composer visual programing language (VPL) on Mac OSX. It makes use of interoperable data formats, and cloud-based (or local) data storage and compute. In a case study, Tagger was used to characterise a decade (2000-2009) of data recorded by the Cape Sorell Waverider Buoy, located approximately 10 km off the west coast of Tasmania, Australia. These data serve as a proxy for the understanding of Southern Ocean storminess, which has both local and global implications. This example shows use of the tool to identify and characterise 4 different types of storm and non-storm events during this time. Events characterised in this way are compared with conventional analysis, noting advantages and limitations of data analysis using animation and human interaction. Tagger provides a new ability to make use of humans as feature detectors in computer-based analysis of large-volume geosciences and other data.

  15. Designing the colorectal cancer core dataset in Iran

    Directory of Open Access Journals (Sweden)

    Sara Dorri

    2017-01-01

    Full Text Available Background: There is no need to explain the importance of collection, recording and analyzing the information of disease in any health organization. In this regard, systematic design of standard data sets can be helpful to record uniform and consistent information. It can create interoperability between health care systems. The main purpose of this study was design the core dataset to record colorectal cancer information in Iran. Methods: For the design of the colorectal cancer core data set, a combination of literature review and expert consensus were used. In the first phase, the draft of the data set was designed based on colorectal cancer literature review and comparative studies. Then, in the second phase, this data set was evaluated by experts from different discipline such as medical informatics, oncology and surgery. Their comments and opinion were taken. In the third phase refined data set, was evaluated again by experts and eventually data set was proposed. Results: In first phase, based on the literature review, a draft set of 85 data elements was designed. In the second phase this data set was evaluated by experts and supplementary information was offered by professionals in subgroups especially in treatment part. In this phase the number of elements totally were arrived to 93 numbers. In the third phase, evaluation was conducted by experts and finally this dataset was designed in five main parts including: demographic information, diagnostic information, treatment information, clinical status assessment information, and clinical trial information. Conclusion: In this study the comprehensive core data set of colorectal cancer was designed. This dataset in the field of collecting colorectal cancer information can be useful through facilitating exchange of health information. Designing such data set for similar disease can help providers to collect standard data from patients and can accelerate retrieval from storage systems.

  16. FTSPlot: fast time series visualization for large datasets.

    Directory of Open Access Journals (Sweden)

    Michael Riss

    Full Text Available The analysis of electrophysiological recordings often involves visual inspection of time series data to locate specific experiment epochs, mask artifacts, and verify the results of signal processing steps, such as filtering or spike detection. Long-term experiments with continuous data acquisition generate large amounts of data. Rapid browsing through these massive datasets poses a challenge to conventional data plotting software because the plotting time increases proportionately to the increase in the volume of data. This paper presents FTSPlot, which is a visualization concept for large-scale time series datasets using techniques from the field of high performance computer graphics, such as hierarchic level of detail and out-of-core data handling. In a preprocessing step, time series data, event, and interval annotations are converted into an optimized data format, which then permits fast, interactive visualization. The preprocessing step has a computational complexity of O(n x log(N; the visualization itself can be done with a complexity of O(1 and is therefore independent of the amount of data. A demonstration prototype has been implemented and benchmarks show that the technology is capable of displaying large amounts of time series data, event, and interval annotations lag-free with < 20 ms ms. The current 64-bit implementation theoretically supports datasets with up to 2(64 bytes, on the x86_64 architecture currently up to 2(48 bytes are supported, and benchmarks have been conducted with 2(40 bytes/1 TiB or 1.3 x 10(11 double precision samples. The presented software is freely available and can be included as a Qt GUI component in future software projects, providing a standard visualization method for long-term electrophysiological experiments.

  17. A collection of annotated and harmonized human breast cancer transcriptome datasets, including immunologic classification [version 2; referees: 2 approved

    Directory of Open Access Journals (Sweden)

    Jessica Roelands

    2018-02-01

    Full Text Available The increased application of high-throughput approaches in translational research has expanded the number of publicly available data repositories. Gathering additional valuable information contained in the datasets represents a crucial opportunity in the biomedical field. To facilitate and stimulate utilization of these datasets, we have recently developed an interactive data browsing and visualization web application, the Gene Expression Browser (GXB. In this note, we describe a curated compendium of 13 public datasets on human breast cancer, representing a total of 2142 transcriptome profiles. We classified the samples according to different immune based classification systems and integrated this information into the datasets. Annotated and harmonized datasets were uploaded to GXB. Study samples were categorized in different groups based on their immunologic tumor response profiles, intrinsic molecular subtypes and multiple clinical parameters. Ranked gene lists were generated based on relevant group comparisons. In this data note, we demonstrate the utility of GXB to evaluate the expression of a gene of interest, find differential gene expression between groups and investigate potential associations between variables with a specific focus on immunologic classification in breast cancer. This interactive resource is publicly available online at: http://breastcancer.gxbsidra.org/dm3/geneBrowser/list.

  18. Identifying frauds and anomalies in Medicare-B dataset.

    Science.gov (United States)

    Jiwon Seo; Mendelevitch, Ofer

    2017-07-01

    Healthcare industry is growing at a rapid rate to reach a market value of $7 trillion dollars world wide. At the same time, fraud in healthcare is becoming a serious problem, amounting to 5% of the total healthcare spending, or $100 billion dollars each year in US. Manually detecting healthcare fraud requires much effort. Recently, machine learning and data mining techniques are applied to automatically detect healthcare frauds. This paper proposes a novel PageRank-based algorithm to detect healthcare frauds and anomalies. We apply the algorithm to Medicare-B dataset, a real-life data with 10 million healthcare insurance claims. The algorithm successfully identifies tens of previously unreported anomalies.

  19. Power analysis dataset for QCA based multiplexer circuits

    Directory of Open Access Journals (Sweden)

    Md. Abdullah-Al-Shafi

    2017-04-01

    Full Text Available Power consumption in irreversible QCA logic circuits is a vital and a major issue; however in the practical cases, this focus is mostly omitted.The complete power depletion dataset of different QCA multiplexers have been worked out in this paper. At −271.15 °C temperature, the depletion is evaluated under three separate tunneling energy levels. All the circuits are designed with QCADesigner, a broadly used simulation engine and QCAPro tool has been applied for estimating the power dissipation.

  20. Equalizing imbalanced imprecise datasets for genetic fuzzy classifiers

    Directory of Open Access Journals (Sweden)

    AnaM. Palacios

    2012-04-01

    Full Text Available Determining whether an imprecise dataset is imbalanced is not immediate. The vagueness in the data causes that the prior probabilities of the classes are not precisely known, and therefore the degree of imbalance can also be uncertain. In this paper we propose suitable extensions of different resampling algorithms that can be applied to interval valued, multi-labelled data. By means of these extended preprocessing algorithms, certain classification systems designed for minimizing the fraction of misclassifications are able to produce knowledge bases that are also adequate under common metrics for imbalanced classification.

  1. Dataset concerning the analytical approximation of the Ae3 temperature

    Directory of Open Access Journals (Sweden)

    B.L. Ennis

    2017-02-01

    The dataset includes the terms of the function and the values for the polynomial coefficients for major alloying elements in steel. A short description of the approximation method used to derive and validate the coefficients has also been included. For discussion and application of this model, please refer to the full length article entitled “The role of aluminium in chemical and phase segregation in a TRIP-assisted dual phase steel” 10.1016/j.actamat.2016.05.046 (Ennis et al., 2016 [1].

  2. Gene set analysis of the EADGENE chicken data-set

    DEFF Research Database (Denmark)

    Skarman, Axel; Jiang, Li; Hornshøj, Henrik

    2009-01-01

     Abstract Background: Gene set analysis is considered to be a way of improving our biological interpretation of the observed expression patterns. This paper describes different methods applied to analyse expression data from a chicken DNA microarray dataset. Results: Applying different gene set...... analyses to the chicken expression data led to different ranking of the Gene Ontology terms tested. A method for prediction of possible annotations was applied. Conclusion: Biological interpretation based on gene set analyses dependent on the statistical method used. Methods for predicting the possible...

  3. A Validation Dataset for CryoSat Sea Ice Investigators

    DEFF Research Database (Denmark)

    Julia, Gaudelli,; Baker, Steve; Haas, Christian

    Since its launch in April 2010 Cryosat has been collecting valuable sea ice data over the Arctic region. Over the same period ESA’s CryoVEx and NASA IceBridge validation campaigns have been collecting a unique set of coincident airborne measurements in the Arctic. The CryoVal-SI project has...... community. In this talk we will describe the composition of the validation dataset, summarising how it was processed and how to understand the content and format of the data. We will also explain how to access the data and the supporting documentation....

  4. Dataset of statements on policy integration of selected intergovernmental organizations

    Directory of Open Access Journals (Sweden)

    Jale Tosun

    2018-04-01

    Full Text Available This article describes data for 78 intergovernmental organizations (IGOs working on topics related to energy governance, environmental protection, and the economy. The number of IGOs covered also includes organizations active in other sectors. The point of departure for data construction was the Correlates of War dataset, from which we selected this sample of IGOs. We updated and expanded the empirical information on the IGOs selected by manual coding. Most importantly, we collected the primary law texts of the individual IGOs in order to code whether they commit themselves to environmental policy integration (EPI, climate policy integration (CPI and/or energy policy integration (EnPI.

  5. Dataset on the energy performance of atrium type hotel buildings.

    Science.gov (United States)

    Vujosevic, Milica; Krstic-Furundzic, Aleksandra

    2018-04-01

    The data presented in this article are related to the research article entitled "The Influence of Atrium on Energy Performance of Hotel Building" (Vujosevic and Krstic-Furundzic, 2017) [1], which describes the annual energy performance of atrium type hotel building in Belgrade climate conditions, with the objective to present the impact of the atrium on the hotel building's energy demands for space heating and cooling. This dataset is made publicly available to show energy performance of selected hotel design alternatives, in order to enable extended analyzes of these data for other researchers.

  6. Dataset on records of Hericium erinaceus in Slovakia.

    Science.gov (United States)

    Kunca, Vladimír; Čiliak, Marek

    2017-06-01

    The data presented in this article are related to the research article entitled "Habitat preferences of Hericium erinaceus in Slovakia" (Kunca and Čiliak, 2016) [FUNECO607] [2]. The dataset include all available and unpublished data from Slovakia, besides the records from the same tree or stem. We compiled a database of records of collections by processing data from herbaria, personal records and communication with mycological activists. Data on altitude, tree species, host tree vital status, host tree position and intensity of management of forest stands were evaluated in this study. All surveys were based on basidioma occurrence and some result from targeted searches.

  7. Dataset on records of Hericium erinaceus in Slovakia

    Directory of Open Access Journals (Sweden)

    Vladimír Kunca

    2017-06-01

    Full Text Available The data presented in this article are related to the research article entitled “Habitat preferences of Hericium erinaceus in Slovakia” (Kunca and Čiliak, 2016 [FUNECO607] [2]. The dataset include all available and unpublished data from Slovakia, besides the records from the same tree or stem. We compiled a database of records of collections by processing data from herbaria, personal records and communication with mycological activists. Data on altitude, tree species, host tree vital status, host tree position and intensity of management of forest stands were evaluated in this study. All surveys were based on basidioma occurrence and some result from targeted searches.

  8. Explore Earth Science Datasets for STEM with the NASA GES DISC Online Visualization and Analysis Tool, Giovanni

    Science.gov (United States)

    Liu, Z.; Acker, J.; Kempler, S.

    2016-01-01

    The NASA Goddard Earth Sciences (GES) Data and Information Services Center(DISC) is one of twelve NASA Science Mission Directorate (SMD) Data Centers that provide Earth science data, information, and services to users around the world including research and application scientists, students, citizen scientists, etc. The GESDISC is the home (archive) of remote sensing datasets for NASA Precipitation and Hydrology, Atmospheric Composition and Dynamics, etc. To facilitate Earth science data access, the GES DISC has been developing user-friendly data services for users at different levels in different countries. Among them, the Geospatial Interactive Online Visualization ANd aNalysis Infrastructure (Giovanni, http:giovanni.gsfc.nasa.gov) allows users to explore satellite-based datasets using sophisticated analyses and visualization without downloading data and software, which is particularly suitable for novices (such as students) to use NASA datasets in STEM (science, technology, engineering and mathematics) activities. In this presentation, we will briefly introduce Giovanni along with examples for STEM activities.

  9. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets

    Energy Technology Data Exchange (ETDEWEB)

    Wu, Yu-Wei [Joint BioEnergy Inst. (JBEI), Emeryville, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Simmons, Blake A. [Joint BioEnergy Inst. (JBEI), Emeryville, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Singer, Steven W. [Joint BioEnergy Inst. (JBEI), Emeryville, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

    2015-10-29

    The recovery of genomes from metagenomic datasets is a critical step to defining the functional roles of the underlying uncultivated populations. We previously developed MaxBin, an automated binning approach for high-throughput recovery of microbial genomes from metagenomes. Here, we present an expanded binning algorithm, MaxBin 2.0, which recovers genomes from co-assembly of a collection of metagenomic datasets. Tests on simulated datasets revealed that MaxBin 2.0 is highly accurate in recovering individual genomes, and the application of MaxBin 2.0 to several metagenomes from environmental samples demonstrated that it could achieve two complementary goals: recovering more bacterial genomes compared to binning a single sample as well as comparing the microbial community composition between different sampling environments. Availability and implementation: MaxBin 2.0 is freely available at http://sourceforge.net/projects/maxbin/ under BSD license. Supplementary information: Supplementary data are available at Bioinformatics online.

  10. a Variant of Lsd-Slam Capable of Processing High-Speed Low-Framerate Monocular Datasets

    Science.gov (United States)

    Schmid, S.; Fritsch, D.

    2017-11-01

    We develop a new variant of LSD-SLAM, called C-LSD-SLAM, which is capable of performing monocular tracking and mapping in high-speed low-framerate situations such as those of the KITTI datasets. The methods used here are robust against the influence of erronously triangulated points near the epipolar direction, which otherwise causes tracking divergence.

  11. High-resolution precipitation mapping in a mountainous watershed: ground truth for evaluating uncertainty in a national precipitation dataset

    Science.gov (United States)

    Christopher Daly; Melissa E. Slater; Joshua A. Roberti; Stephanie H. Laseter; Lloyd W. Swift

    2017-01-01

    A 69-station, densely spaced rain gauge network was maintained over the period 1951–1958 in the Coweeta Hydrologic Laboratory, located in the southern Appalachians in western North Carolina, USA. This unique dataset was used to develop the first digital seasonal and annual precipitation maps for the Coweeta basin, using elevation regression functions and...

  12. The Stream-Catchment (StreamCat) Dataset: A database of watershed metrics for the conterminous USA

    Science.gov (United States)

    We developed an extensive database of landscape metrics for ~2.65 million streams, and their associated catchments, within the conterminous USA: The Stream-Catchment (StreamCat) Dataset. These data are publically available and greatly reduce the specialized geospatial expertise n...

  13. The MetabolomeExpress Project: enabling web-based processing, analysis and transparent dissemination of GC/MS metabolomics datasets

    Directory of Open Access Journals (Sweden)

    Carroll Adam J

    2010-07-01

    Full Text Available Abstract Background Standardization of analytical approaches and reporting methods via community-wide collaboration can work synergistically with web-tool development to result in rapid community-driven expansion of online data repositories suitable for data mining and meta-analysis. In metabolomics, the inter-laboratory reproducibility of gas-chromatography/mass-spectrometry (GC/MS makes it an obvious target for such development. While a number of web-tools offer access to datasets and/or tools for raw data processing and statistical analysis, none of these systems are currently set up to act as a public repository by easily accepting, processing and presenting publicly submitted GC/MS metabolomics datasets for public re-analysis. Description Here, we present MetabolomeExpress, a new File Transfer Protocol (FTP server and web-tool for the online storage, processing, visualisation and statistical re-analysis of publicly submitted GC/MS metabolomics datasets. Users may search a quality-controlled database of metabolite response statistics from publicly submitted datasets by a number of parameters (eg. metabolite, species, organ/biofluid etc.. Users may also perform meta-analysis comparisons of multiple independent experiments or re-analyse public primary datasets via user-friendly tools for t-test, principal components analysis, hierarchical cluster analysis and correlation analysis. They may interact with chromatograms, mass spectra and peak detection results via an integrated raw data viewer. Researchers who register for a free account may upload (via FTP their own data to the server for online processing via a novel raw data processing pipeline. Conclusions MetabolomeExpress https://www.metabolome-express.org provides a new opportunity for the general metabolomics community to transparently present online the raw and processed GC/MS data underlying their metabolomics publications. Transparent sharing of these data will allow researchers to

  14. Parallel Framework for Dimensionality Reduction of Large-Scale Datasets

    Directory of Open Access Journals (Sweden)

    Sai Kiranmayee Samudrala

    2015-01-01

    Full Text Available Dimensionality reduction refers to a set of mathematical techniques used to reduce complexity of the original high-dimensional data, while preserving its selected properties. Improvements in simulation strategies and experimental data collection methods are resulting in a deluge of heterogeneous and high-dimensional data, which often makes dimensionality reduction the only viable way to gain qualitative and quantitative understanding of the data. However, existing dimensionality reduction software often does not scale to datasets arising in real-life applications, which may consist of thousands of points with millions of dimensions. In this paper, we propose a parallel framework for dimensionality reduction of large-scale data. We identify key components underlying the spectral dimensionality reduction techniques, and propose their efficient parallel implementation. We show that the resulting framework can be used to process datasets consisting of millions of points when executed on a 16,000-core cluster, which is beyond the reach of currently available methods. To further demonstrate applicability of our framework we perform dimensionality reduction of 75,000 images representing morphology evolution during manufacturing of organic solar cells in order to identify how processing parameters affect morphology evolution.

  15. Benchmarking undedicated cloud computing providers for analysis of genomic datasets.

    Science.gov (United States)

    Yazar, Seyhan; Gooden, George E C; Mackey, David A; Hewitt, Alex W

    2014-01-01

    A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR) on Amazon EC2 instances and Google Compute Engine (GCE), using publicly available genomic datasets (E.coli CC102 strain and a Han Chinese male genome) and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5-78.2) for E.coli and 53.5% (95% CI: 34.4-72.6) for human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with the costs on EMR being 257.3% (95% CI: 211.5-303.1) and 173.9% (95% CI: 134.6-213.1) more expensive for E.coli and human assemblies respectively. Thus, GCE was found to outperform EMR both in terms of cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we present available ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE.

  16. Robust computational analysis of rRNA hypervariable tag datasets.

    Directory of Open Access Journals (Sweden)

    Maksim Sipos

    Full Text Available Next-generation DNA sequencing is increasingly being utilized to probe microbial communities, such as gastrointestinal microbiomes, where it is important to be able to quantify measures of abundance and diversity. The fragmented nature of the 16S rRNA datasets obtained, coupled with their unprecedented size, has led to the recognition that the results of such analyses are potentially contaminated by a variety of artifacts, both experimental and computational. Here we quantify how multiple alignment and clustering errors contribute to overestimates of abundance and diversity, reflected by incorrect OTU assignment, corrupted phylogenies, inaccurate species diversity estimators, and rank abundance distribution functions. We show that straightforward procedural optimizations, combining preexisting tools, are effective in handling large (10(5-10(6 16S rRNA datasets, and we describe metrics to measure the effectiveness and quality of the estimators obtained. We introduce two metrics to ascertain the quality of clustering of pyrosequenced rRNA data, and show that complete linkage clustering greatly outperforms other widely used methods.

  17. BLAST-EXPLORER helps you building datasets for phylogenetic analysis

    Directory of Open Access Journals (Sweden)

    Claverie Jean-Michel

    2010-01-01

    Full Text Available Abstract Background The right sampling of homologous sequences for phylogenetic or molecular evolution analyses is a crucial step, the quality of which can have a significant impact on the final interpretation of the study. There is no single way for constructing datasets suitable for phylogenetic analysis, because this task intimately depends on the scientific question we want to address, Moreover, database mining softwares such as BLAST which are routinely used for searching homologous sequences are not specifically optimized for this task. Results To fill this gap, we designed BLAST-Explorer, an original and friendly web-based application that combines a BLAST search with a suite of tools that allows interactive, phylogenetic-oriented exploration of the BLAST results and flexible selection of homologous sequences among the BLAST hits. Once the selection of the BLAST hits is done using BLAST-Explorer, the corresponding sequence can be imported locally for external analysis or passed to the phylogenetic tree reconstruction pipelines available on the Phylogeny.fr platform. Conclusions BLAST-Explorer provides a simple, intuitive and interactive graphical representation of the BLAST results and allows selection and retrieving of the BLAST hit sequences based a wide range of criterions. Although BLAST-Explorer primarily aims at helping the construction of sequence datasets for further phylogenetic study, it can also be used as a standard BLAST server with enriched output. BLAST-Explorer is available at http://www.phylogeny.fr

  18. Multiresolution comparison of precipitation datasets for large-scale models

    Science.gov (United States)

    Chun, K. P.; Sapriza Azuri, G.; Davison, B.; DeBeer, C. M.; Wheater, H. S.

    2014-12-01

    Gridded precipitation datasets are crucial for driving large-scale models which are related to weather forecast and climate research. However, the quality of precipitation products is usually validated individually. Comparisons between gridded precipitation products along with ground observations provide another avenue for investigating how the precipitation uncertainty would affect the performance of large-scale models. In this study, using data from a set of precipitation gauges over British Columbia and Alberta, we evaluate several widely used North America gridded products including the Canadian Gridded Precipitation Anomalies (CANGRD), the National Center for Environmental Prediction (NCEP) reanalysis, the Water and Global Change (WATCH) project, the thin plate spline smoothing algorithms (ANUSPLIN) and Canadian Precipitation Analysis (CaPA). Based on verification criteria for various temporal and spatial scales, results provide an assessment of possible applications for various precipitation datasets. For long-term climate variation studies (~100 years), CANGRD, NCEP, WATCH and ANUSPLIN have different comparative advantages in terms of their resolution and accuracy. For synoptic and mesoscale precipitation patterns, CaPA provides appealing performance of spatial coherence. In addition to the products comparison, various downscaling methods are also surveyed to explore new verification and bias-reduction methods for improving gridded precipitation outputs for large-scale models.

  19. Benchmarking Deep Learning Models on Large Healthcare Datasets.

    Science.gov (United States)

    Purushotham, Sanjay; Meng, Chuizheng; Che, Zhengping; Liu, Yan

    2018-06-04

    Deep learning models (aka Deep Neural Networks) have revolutionized many fields including computer vision, natural language processing, speech recognition, and is being increasingly used in clinical healthcare applications. However, few works exist which have benchmarked the performance of the deep learning models with respect to the state-of-the-art machine learning models and prognostic scoring systems on publicly available healthcare datasets. In this paper, we present the benchmarking results for several clinical prediction tasks such as mortality prediction, length of stay prediction, and ICD-9 code group prediction using Deep Learning models, ensemble of machine learning models (Super Learner algorithm), SAPS II and SOFA scores. We used the Medical Information Mart for Intensive Care III (MIMIC-III) (v1.4) publicly available dataset, which includes all patients admitted to an ICU at the Beth Israel Deaconess Medical Center from 2001 to 2012, for the benchmarking tasks. Our results show that deep learning models consistently outperform all the other approaches especially when the 'raw' clinical time series data is used as input features to the models. Copyright © 2018 Elsevier Inc. All rights reserved.

  20. Testing the Neutral Theory of Biodiversity with Human Microbiome Datasets.

    Science.gov (United States)

    Li, Lianwei; Ma, Zhanshan Sam

    2016-08-16

    The human microbiome project (HMP) has made it possible to test important ecological theories for arguably the most important ecosystem to human health-the human microbiome. Existing limited number of studies have reported conflicting evidence in the case of the neutral theory; the present study aims to comprehensively test the neutral theory with extensive HMP datasets covering all five major body sites inhabited by the human microbiome. Utilizing 7437 datasets of bacterial community samples, we discovered that only 49 communities (less than 1%) satisfied the neutral theory, and concluded that human microbial communities are not neutral in general. The 49 positive cases, although only a tiny minority, do demonstrate the existence of neutral processes. We realize that the traditional doctrine of microbial biogeography "Everything is everywhere, but the environment selects" first proposed by Baas-Becking resolves the apparent contradiction. The first part of Baas-Becking doctrine states that microbes are not dispersal-limited and therefore are neutral prone, and the second part reiterates that the freely dispersed microbes must endure selection by the environment. Therefore, in most cases, it is the host environment that ultimately shapes the community assembly and tip the human microbiome to niche regime.