WorldWideScience

Sample records for fluxes gsstf dataset

  1. A Newly Distributed Satellite-based Global Air-sea Surface Turbulent Fluxes Data Set -- GSSTF2b

    Science.gov (United States)

    Shie, C.; Nelkin, E.; Ardizzone, J.; Savtchenko, A.; Chiu, L. S.; Adler, R. F.; Lin, I.; Gao, S.

    2010-12-01

    Accurate sea surface turbulent flux measurements are crucial to understanding changes in the global water and energy cycles. Remote sensing is a valuable tool for global monitoring of these fluxes. The GSSTF (Goddard Satellite-based Surface Turbulent Fluxes) algorithm was developed for this purpose and applied to remote sensing research and applications. The recently revived daily global (1°×1°) GSSTF2b (Version-2b) dataset (July 1987-December 2008) is currently being prepared for official distribution by NASA GES DISC (Goddard Earth Sciences Data and Information Services Center), due by the end of September 2010. Like its predecessor product GSSTF2, GSSTF2b is expected to provide the scientific community with a longer-period, reliable turbulent surface flux dataset for global energy and water cycle research, as well as for regional and short-period data analyses. We have recently been funded by the NASA/MEaSUREs Program to resume processing of the GSSTF, with the objective of continually producing an up-to-date, uniform, and reliable dataset of sea surface turbulent fluxes, derived from improved input remote sensing data and model reanalysis, which will continue to be useful for global energy and water flux research and applications. The daily global (1°×1°) GSSTF2b dataset has been produced using upgraded and improved input datasets: the Special Sensor Microwave Imager (SSM/I) Version-6 (V6) product (including brightness temperature [Tb], total precipitable water [W], and wind speed [U]) and the NCEP/DOE Reanalysis-2 (R2) product (including sea skin temperature [SKT], 2-meter air temperature [T2m], and sea level pressure [SLP]). The input datasets previously used for producing the GSSTF2 product were the SSM/I Version-4 (V4) product and the NCEP Reanalysis-1 (R1) product. The newly produced GSSTF2b was found to generally agree better with available ship measurements obtained from several field experiments in 1999 than its counterpart
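
    GSSTF-type retrievals rest on bulk aerodynamic formulas that turn the near-surface variables listed above (wind speed, humidity, temperatures) into fluxes. The sketch below illustrates the bulk formulas only; it is not the GSSTF code: the transfer coefficients are taken as constants here, whereas operational algorithms make them stability-dependent, and all numeric values are invented.

```python
import math

RHO_AIR = 1.2      # air density (kg m^-3), rough surface value
LV = 2.5e6         # latent heat of vaporization (J kg^-1)
CP = 1004.0        # specific heat of air (J kg^-1 K^-1)
CE = CH = 1.2e-3   # illustrative constant transfer coefficients

def saturation_q(t_c, slp_hpa=1013.0):
    """Saturation specific humidity (kg/kg) from temperature (deg C),
    via the Magnus formula for saturation vapor pressure."""
    es = 6.112 * math.exp(17.67 * t_c / (t_c + 243.5))  # hPa
    return 0.622 * es / (slp_hpa - 0.378 * es)

def bulk_fluxes(u10, skt_c, t2m_c, q2m):
    """Latent and sensible heat flux (W m^-2) from bulk formulas:
    LHF = rho * Lv * CE * U * (qs - qa), SHF = rho * cp * CH * U * (Ts - Ta)."""
    qs = 0.98 * saturation_q(skt_c)   # 98% factor accounts for salinity
    lhf = RHO_AIR * LV * CE * u10 * (qs - q2m)
    shf = RHO_AIR * CP * CH * u10 * (skt_c - t2m_c)
    return lhf, shf

# Example: 7 m/s wind, 28 C skin temperature, 26 C air, q = 17 g/kg
print(bulk_fluxes(7.0, 28.0, 26.0, 0.017))
```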

  2. NCEP/DOE Reanalysis II in HDF-EOS5, for GSSTF2c, 1x1 deg Daily grid V2c

    Data.gov (United States)

    National Aeronautics and Space Administration — These data are the Goddard Satellite-based Surface Turbulent Fluxes Version-2c (GSSTF2c) Dataset recently produced through a MEaSUREs-funded project led by Dr....

  3. Surface Turbulent Fluxes, 1x1 deg Daily Grid, Satellite F10 V2c

    Data.gov (United States)

    National Aeronautics and Space Administration — These data are part of the Goddard Satellite-based Surface Turbulent Fluxes Version-2c (GSSTF2c) Dataset recently produced through a MEaSUREs-funded project led by...

  4. Surface Turbulent Fluxes, 1x1 deg Daily Grid, Satellite F15 V2c

    Data.gov (United States)

    National Aeronautics and Space Administration — These data are part of the Goddard Satellite-based Surface Turbulent Fluxes Version-2c (GSSTF2c) Dataset recently produced through a MEaSUREs-funded project led by...

  5. Surface Turbulent Fluxes, 1x1 deg Daily Grid, Satellite F13 V2c

    Data.gov (United States)

    National Aeronautics and Space Administration — These data are part of the Goddard Satellite-based Surface Turbulent Fluxes Version-2c (GSSTF2c) Dataset recently produced through a MEaSUREs-funded project led by...

  6. Surface Turbulent Fluxes, 1x1 deg Daily Grid, Satellite F11 V2c

    Data.gov (United States)

    National Aeronautics and Space Administration — These data are part of the Goddard Satellite-based Surface Turbulent Fluxes Version-2c (GSSTF2c) Dataset recently produced through a MEaSUREs-funded project led by...

  7. Surface Turbulent Fluxes, 1x1 deg Daily Grid, Satellite F08 V2c

    Data.gov (United States)

    National Aeronautics and Space Administration — These data are part of the Goddard Satellite-based Surface Turbulent Fluxes Version-2c (GSSTF2c) Dataset recently produced through a MEaSUREs-funded project led by...

  8. The SeaFlux Turbulent Flux Dataset Version 1.0 Documentation

    Science.gov (United States)

    Clayson, Carol Anne; Roberts, J. Brent; Bogdanoff, Alec S.

    2012-01-01

    Under the auspices of the World Climate Research Programme (WCRP) Global Energy and Water cycle EXperiment (GEWEX) Data and Assessment Panel (GDAP), the SeaFlux Project was created to investigate producing a high-resolution satellite-based dataset of surface turbulent fluxes over the global oceans. The most current release of the SeaFlux product is Version 1.0; this represents the initial release of turbulent surface heat fluxes and associated near-surface variables, including a diurnally varying sea surface temperature.

  9. Synthesizing Global and Local Datasets to Estimate Jurisdictional Forest Carbon Fluxes in Berau, Indonesia.

    Directory of Open Access Journals (Sweden)

    Bronson W Griscom

    Forest conservation efforts are increasingly being implemented at the scale of sub-national jurisdictions in order to mitigate global climate change and provide other ecosystem services. We see an urgent need for robust estimates of historic forest carbon emissions at this scale, as the basis for credible measures of climate and other benefits achieved. Despite the arrival of a new generation of global datasets on forest area change and biomass, confusion remains about how to produce credible jurisdictional estimates of forest emissions. We demonstrate a method for estimating the relevant historic forest carbon fluxes within the Regency of Berau in eastern Borneo, Indonesia. Our method integrates best available global and local datasets, and includes a comprehensive analysis of uncertainty at the regency scale. We find that Berau generated 8.91 ± 1.99 million tonnes of net CO2 emissions per year during 2000-2010. Berau is an early frontier landscape where gross emissions are 12 times higher than gross sequestration. Yet most (85%) of Berau's original forests are still standing. The majority of net emissions were due to conversion of native forests to unspecified agriculture (43% of total), oil palm (28%), and fiber plantations (9%). Most of the remainder was due to legal commercial selective logging (17%). Our overall uncertainty estimate offers an independent basis for assessing three other estimates for Berau. Two other estimates were above the upper end of our uncertainty range. We emphasize the importance of including an uncertainty range for all parameters of the emissions equation to generate a comprehensive uncertainty estimate, which has not been done before. We believe comprehensive estimates of carbon flux uncertainty are increasingly important as national and international institutions are challenged with comparing alternative estimates and identifying a credible range of historic emissions values.

  10. Synthesizing Global and Local Datasets to Estimate Jurisdictional Forest Carbon Fluxes in Berau, Indonesia.

    Science.gov (United States)

    Griscom, Bronson W; Ellis, Peter W; Baccini, Alessandro; Marthinus, Delon; Evans, Jeffrey S; Ruslandi

    2016-01-01

    Forest conservation efforts are increasingly being implemented at the scale of sub-national jurisdictions in order to mitigate global climate change and provide other ecosystem services. We see an urgent need for robust estimates of historic forest carbon emissions at this scale, as the basis for credible measures of climate and other benefits achieved. Despite the arrival of a new generation of global datasets on forest area change and biomass, confusion remains about how to produce credible jurisdictional estimates of forest emissions. We demonstrate a method for estimating the relevant historic forest carbon fluxes within the Regency of Berau in eastern Borneo, Indonesia. Our method integrates best available global and local datasets, and includes a comprehensive analysis of uncertainty at the regency scale. We find that Berau generated 8.91 ± 1.99 million tonnes of net CO2 emissions per year during 2000-2010. Berau is an early frontier landscape where gross emissions are 12 times higher than gross sequestration. Yet most (85%) of Berau's original forests are still standing. The majority of net emissions were due to conversion of native forests to unspecified agriculture (43% of total), oil palm (28%), and fiber plantations (9%). Most of the remainder was due to legal commercial selective logging (17%). Our overall uncertainty estimate offers an independent basis for assessing three other estimates for Berau. Two other estimates were above the upper end of our uncertainty range. We emphasize the importance of including an uncertainty range for all parameters of the emissions equation to generate a comprehensive uncertainty estimate, which has not been done before. We believe comprehensive estimates of carbon flux uncertainty are increasingly important as national and international institutions are challenged with comparing alternative estimates and identifying a credible range of historic emissions values.

  11. Arctic ocean radiative fluxes and cloud forcing estimated from the ISCCP C2 cloud dataset, 1983-1990

    Science.gov (United States)

    Schweiger, Axel J.; Key, Jeffrey R.

    1994-01-01

    Radiative fluxes and cloud forcings for the ocean areas of the Arctic are computed from the monthly cloud product of the International Satellite Cloud Climatology Project (ISCCP) for 1983-90. Spatially averaged shortwave fluxes compare well with climatological values, while downwelling longwave fluxes are significantly lower, probably because the ISCCP cloud amounts are underestimates. Top-of-the-atmosphere radiative fluxes are in excellent agreement with measurements from the Earth Radiation Budget Experiment (ERBE). Computed cloud forcings indicate that clouds have a warming effect at the surface and at the top of the atmosphere during winter and a cooling effect during summer. The net radiative effect of clouds is larger at the surface during winter but greater at the top of the atmosphere during summer. Overall, the net radiative effect of clouds at the top of the atmosphere is one of cooling. This is in contrast to a previous result from ERBE data showing that Arctic cloud forcings have a net warming effect. Sensitivities to errors in input parameters are generally greater during winter, with cloud amount being the most important parameter. During summer the surface radiation balance is most sensitive to errors in the measurements of surface reflectance. The results are encouraging, but the estimated error of 20 W/sq m in surface net radiative fluxes is too large, given that estimates of the net radiative warming effect due to a doubling of CO2 are on the order of 4 W/sq m. Because it is difficult to determine the accuracy of results with existing in situ observations, it is recommended that the development of improved algorithms for the retrieval of surface radiative properties be accompanied by the simultaneous assembly of validation datasets.
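
    The "cloud forcing" quoted in this record is simply the all-sky minus clear-sky net flux. A minimal sketch of that definition, with invented flux values rather than the paper's:

```python
def cloud_radiative_effect(net_allsky, net_clear):
    """Cloud radiative effect (W m^-2): CRE = F_net,all-sky - F_net,clear-sky.
    Positive values mean clouds warm the system; negative values mean cooling."""
    return net_allsky - net_clear

# Illustrative surface values (W m^-2), not taken from the paper:
winter_sfc = cloud_radiative_effect(net_allsky=-45.0, net_clear=-80.0)  # > 0: warming
summer_sfc = cloud_radiative_effect(net_allsky=95.0, net_clear=130.0)   # < 0: cooling
print(winter_sfc, summer_sfc)  # 35.0 -35.0
```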

  12. Towards closure of regional heat budgets in the North Atlantic using Argo floats and surface flux datasets

    Directory of Open Access Journals (Sweden)

    N. C. Wells

    2009-01-01

    The upper ocean heat budget (0–300 m) of the North Atlantic from 20°–60° N is investigated using data from Argo profiling floats for 1999–2005 and the NCEP/NCAR and NOC surface flux datasets. Estimates of the different terms in the budget (heat storage, advection, diffusion, and surface exchange) are obtained using the methodology developed by Hadfield et al. (2007). The method includes optimal interpolation of the individual profiles to produce gridded fields with error estimates at a 10°×10° grid box resolution. Closure of the heat budget is obtained within the error estimates for some regions – particularly the eastern subtropical Atlantic – but not for those boxes that include the Gulf Stream. Over the whole range considered, closure is obtained for 13 (9) out of 20 boxes with the NOC (NCEP/NCAR) surface fluxes. The seasonal heat budget at 20°–30° N, 35°–25° W is considered in detail. Here, the NCEP-based budget has an annual mean residual of −55±35 W m−2 compared with a NOC-based value of −4±35 W m−2. For this box, the net heat divergence of 36 W m−2 (Ekman = −4 W m−2, geostrophic = 11 W m−2, diffusion = 29 W m−2) offsets the net heating of 32 W m−2 from the NOC surface heat fluxes. The results in this box are consistent with an earlier evaluation of the fluxes using measurements from research buoys in the subduction array, which revealed biases in NCEP but good agreement of the buoy values with the NOC fields.

  13. Towards closure of regional heat budgets in the North Atlantic using Argo floats and surface flux datasets

    Directory of Open Access Journals (Sweden)

    N. C. Wells

    2009-04-01

    The upper ocean heat budget (0–300 m) of the North Atlantic from 20°–60° N is investigated using data from Argo profiling floats for 1999–2005 and the NCEP/NCAR and NOC surface flux datasets. Estimates of the different terms in the budget (heat storage, advection, diffusion, and surface exchange) are obtained using the methodology developed by Hadfield et al. (2007a, b). The method includes optimal interpolation of the individual profiles to produce gridded fields with error estimates at a 10°×10° grid box resolution. Closure of the heat budget is obtained within the error estimates for some regions – particularly the eastern subtropical Atlantic – but not for those boxes that include the Gulf Stream. Over the whole range considered, closure is obtained for 13 (9) out of 20 boxes with the NOC (NCEP/NCAR) surface fluxes. The seasonal heat budget at 20–30° N, 35–25° W is considered in detail. Here, the NCEP-based budget has an annual mean residual of −55±35 W m−2 compared with a NOC-based value of −4±35 W m−2. For this box, the net heat divergence of 36 W m−2 (Ekman = −4 W m−2, geostrophic = 11 W m−2, diffusion = 29 W m−2) offsets the net heating of 32 W m−2 from the NOC surface heat fluxes. The results in this box are consistent with an earlier evaluation of the fluxes using measurements from research buoys in the subduction array, which revealed biases in NCEP but good agreement of the buoy values with the NOC fields.
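
    The closure test described in this record (and in the variant above) amounts to checking whether the budget residual is smaller than the combined uncertainty of the terms. A minimal sketch of that bookkeeping, assuming independent errors combined in quadrature and using invented numbers (the paper's actual error model may differ):

```python
import math

def budget_residual(storage, advection, diffusion, surface_flux):
    """Residual of the upper-ocean heat budget (W m^-2):
    storage = advection + diffusion + surface flux + residual."""
    return storage - (advection + diffusion + surface_flux)

def combined_error(*errors):
    """Combine independent term errors in quadrature."""
    return math.sqrt(sum(e * e for e in errors))

# Illustrative numbers (W m^-2), loosely echoing the scale of the box budgets:
res = budget_residual(storage=0.0, advection=-7.0, diffusion=29.0, surface_flux=-26.0)
err = combined_error(15.0, 20.0, 10.0, 20.0)
print(f"residual = {res:.0f} +/- {err:.0f} W m-2")  # closure if |res| < err
```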

  14. Greenhouse gas fluxes from drained organic soils - a synthesis of a large dataset

    Science.gov (United States)

    Tiemeyer, Bärbel

    2016-04-01

    Drained peatlands are hotspots of greenhouse gas (GHG) emissions. Agriculture is the major land use type for peatlands in Germany and other European countries, but varies strongly in intensity with respect to groundwater level and management. Although the mean annual water table depth is sometimes proposed as an overall predictor for GHG emissions, its effect varies strongly between peatlands. We synthesized 164 annual GHG budgets for 65 different sites in 13 German peatlands. Land use comprised arable land with different crops (n = 17) and grassland with a management gradient from very intensive use with up to five cuts per year to partially rewetted conservation grassland (n = 48). Carbon dioxide (net ecosystem exchange and ecosystem respiration), nitrous oxide and methane fluxes were measured with transparent and opaque manual chambers. Besides the GHG fluxes, biomass yield, fertilisation, groundwater level, climatic data, vegetation composition and soil properties were measured. Overall, we found a large variability of the total GHG budget, ranging from small uptakes to extremely high emissions (> 70 t CO2-equivalents/(ha yr)). At nearly all sites, carbon dioxide was the major component of the GHG budget. Site conditions, especially the nitrogen content of the unsaturated zone and the intra-annual water level distribution, dominated the GHG emissions from grassland. Although these factors are influenced by natural conditions (peat type, regional hydrology), they could be modified by an improved water management. In the case of grassland, agricultural management such as the number of cuts had only a minor influence on the GHG budgets. Given comparable site conditions, there was no significant difference between the emissions from grassland and arable land. Due to the large heterogeneity of site conditions and crop types, emissions from arable land are difficult to explain, but management decisions such as the duration of soil
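
    Budgets like the "> 70 t CO2-equivalents/(ha yr)" above combine the three gases by weighting each with its global warming potential. A minimal sketch, assuming GWP100 factors from IPCC AR5 and invented site fluxes (the study's own factors and values may differ):

```python
# GWP100 factors from IPCC AR5 (assumed here; the study may use different ones)
GWP = {"CO2": 1.0, "CH4": 28.0, "N2O": 265.0}

def co2_equivalents(fluxes_t_ha_yr):
    """Total GHG budget in t CO2-eq/(ha yr) from annual fluxes given in
    tonnes of each gas per hectare and year."""
    return sum(GWP[gas] * f for gas, f in fluxes_t_ha_yr.items())

# Illustrative drained-grassland fluxes (t gas/(ha yr)), not measured values:
site = {"CO2": 25.0, "CH4": 0.01, "N2O": 0.015}
print(f"{co2_equivalents(site):.1f} t CO2-eq/(ha yr)")  # ~29.3
```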

  15. A 3-year dataset of sensible and latent heat fluxes from the Tibetan Plateau, derived using eddy covariance measurements

    NARCIS (Netherlands)

    Li, Maoshan; Babel, Wolfgang; Chen, Xuelong; Zhang, Lang; Sun, Fanglin; Wang, Binbin; Ma, Yaoming; Hu, Zeyong; Foken, Thomas

    2016-01-01

    The Tibetan Plateau (TP) has become a focus of strong scientific interest due to its role in the global water cycle and its reaction to climate change. Regional flux estimates of sensible and latent heat are important variables for linking the energy and hydrological cycles at the TP’s surface. With

  16. The early vs the late 20th century Arctic warming: The role of energy and aerosol fluxes in reanalysis driven datasets

    Science.gov (United States)

    Wegmann, Martin; Broennimann, Stefan

    2014-05-01

    During the last two decades, the Arctic has come into scientific focus as one of the regions most affected by anthropogenic global warming. However, the warming between 1920 and 1940 proves the importance of internal variability on yearly and decadal scales. It is therefore important to further investigate the role of external and internal forcings on the Arctic climate and to attribute the processes and causes leading to changes in the Arctic climate regime (Serreze & Barry 2009). Although much research effort has been spent on understanding the links and influences of and on the Arctic climate, there is still a need for further insights concerning this topic. Especially the results and discussion about anthropogenic global warming and Arctic amplification have put the Arctic into the public and academic focus (Serreze & Barry 2011). However, the early 20th century Arctic warming, although discovered immediately, was scientifically forgotten until recently (Delworth & Knutson 2000, Bengtsson et al. 2004, Grant et al. 2009, Bekryaev et al. 2010). The comparison of this earlier Arctic warming and the recent warming period offers a chance to deepen knowledge about the drivers of Arctic climate and can be used to evaluate the anthropogenic impact. The authors use the Twentieth Century Reanalysis (20CR) dataset and a nudged, reanalysis-driven Aerosol Global Circulation Model (A-GCM) to investigate the impact of atmospheric energy and aerosol fluxes into the Arctic during the 20th century. The 20CR dataset covers the period 1871-2010 with a temporal resolution of 6 h and a spatial resolution of 2° x 2°. For the first time, this dataset (and its 56 ensemble members) is used to compute the atmospheric energy flux, consisting of sensible heat, latent heat, potential energy and kinetic energy. The values are integrated around 70° N and between 1000-100 hPa. Aerosol fluxes for the same domain, but for the years 1957-2000, are calculated based on the A-GCM nudged to the ECMWF

  17. Unified Scaling Law for flux pinning in practical superconductors: III. Minimum datasets, core parameters, and application of the Extrapolative Scaling Expression

    Science.gov (United States)

    Ekin, Jack W.; Cheggour, Najib; Goodrich, Loren; Splett, Jolene

    2017-03-01

    In Part 2 of these articles, an extensive analysis of pinning-force curves and raw scaling data was used to derive the Extrapolative Scaling Expression (ESE). This is a parameterization of the Unified Scaling Law (USL) that has the extrapolation capability of fundamental unified scaling, coupled with the application ease of a simple fitting equation. Here in Part 3, the accuracy of the ESE relation to interpolate and extrapolate limited critical-current data to obtain complete Ic(B,T,ε) datasets is evaluated and compared with present fitting equations. Accuracy is analyzed in terms of root mean square (RMS) error and fractional deviation statistics. Highlights from 92 test cases are condensed and summarized, covering most fitting protocols and proposed parameterizations of the USL. The results show that ESE reliably extrapolates critical currents at fields B, temperatures T, and strains ε that are remarkably different from the fitted minimum dataset. Depending on whether the conductor is moderate-Jc or high-Jc, effective RMS extrapolation errors for ESE are in the range 2–5 A at 12 T, which approaches the Ic measurement error (1–2%). The minimum dataset for extrapolating full Ic(B,T,ε) characteristics is also determined from raw scaling data. It consists of one set of Ic(B,ε) data at a fixed temperature (e.g., liquid helium temperature), and one set of Ic(B,T) data at a fixed strain (e.g., zero applied strain). Error analysis of extrapolations from the minimum dataset with different fitting equations shows that ESE reduces the percentage extrapolation errors at individual data points at high fields, temperatures, and compressive strains down to 1/10th to 1/40th the size of those for extrapolations with present fitting equations. Depending on the conductor, percentage fitting errors for interpolations are also reduced to as little as 1/15th the size. The extrapolation accuracy of the ESE relation offers the prospect of straightforward implementation
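
    The RMS error and fractional deviation statistics named above are straightforward to compute once measured and extrapolated critical currents are paired up. A small sketch with hypothetical Ic values, not data from the paper:

```python
import numpy as np

def fit_error_stats(ic_measured, ic_predicted):
    """RMS error (A) and per-point fractional deviations between measured
    and extrapolated critical currents."""
    ic_measured = np.asarray(ic_measured, dtype=float)
    ic_predicted = np.asarray(ic_predicted, dtype=float)
    resid = ic_predicted - ic_measured
    rms = np.sqrt(np.mean(resid ** 2))
    frac = resid / ic_measured
    return rms, frac

# Hypothetical Ic data (A) at a few fields:
rms, frac = fit_error_stats([220.0, 140.0, 90.0], [223.0, 138.5, 91.2])
print(f"RMS = {rms:.2f} A, max |fractional deviation| = {abs(frac).max():.3%}")
```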

  18. Temperature and heat flux datasets of a complex object in a fire plume for the validation of fire and thermal response codes.

    Energy Technology Data Exchange (ETDEWEB)

    Jernigan, Dann A.; Blanchat, Thomas K.

    2010-09-01

    Validation of the SIERRA/FUEGO/SYRINX fire codes and the SIERRA/CALORE thermal response code requires improved understanding and temporally and spatially resolved, integral-scale validation data of the heat flux incident on a complex object, in addition to measurements of the thermal response of that object within the fire plume. To meet this objective, a complex calorimeter with sufficient instrumentation to allow validation of the coupling between FUEGO/SYRINX/CALORE has been designed, fabricated, and tested in the Fire Laboratory for Accreditation of Models and Experiments (FLAME) facility. Validation experiments are specifically designed for direct comparison with the computational predictions. Making meaningful comparisons between the computational and experimental results requires careful characterization and control of the experimental features or parameters used as inputs into the computational model. Validation experiments must be designed to capture the essential physical phenomena, including all relevant initial and boundary conditions. This report presents the data validation steps and processes, the results of the penlight radiant heat experiments (for the purpose of validating the CALORE heat transfer modeling of the complex calorimeter), and the results of the fire tests in FLAME.

  19. FLUXNET2015 Dataset: Batteries included

    Science.gov (United States)

    Pastorello, G.; Papale, D.; Agarwal, D.; Trotta, C.; Chu, H.; Canfora, E.; Torn, M. S.; Baldocchi, D. D.

    2016-12-01

    The synthesis datasets have become one of the signature products of the FLUXNET global network. They are composed from contributions of individual site teams to regional networks and then compiled into uniform data products, now used in a wide variety of research efforts: from plant-scale microbiology to global-scale climate change. The FLUXNET Marconi Dataset in 2000 was the first in the series, followed by the FLUXNET LaThuile Dataset in 2007, with significant additions of data products and coverage, solidifying the adoption of the datasets as a research tool. The FLUXNET2015 Dataset brings another round of substantial improvements, including extended quality control processes and checks, use of downscaled reanalysis data for filling long gaps in micrometeorological variables, multiple methods for USTAR threshold estimation and flux partitioning, and uncertainty estimates, all accompanied by auxiliary flags. This "batteries included" approach provides a lot of information for anyone who wants to explore the data (and the processing methods) in detail. This inevitably leads to a large number of data variables. Although dealing with all these variables might seem overwhelming at first, especially to someone looking at eddy covariance data for the first time, there is method to our madness. In this work we describe the data products and variables that are part of the FLUXNET2015 Dataset, and the rationale behind the organization of the dataset, covering the simplified version (labeled SUBSET), the complete version (labeled FULLSET), and the auxiliary products in the dataset.
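
    As an illustration of working with the SUBSET product, the sketch below reads a half-hourly site file and keeps only measured or good-quality gap-filled NEE using its quality flag. The file name is hypothetical; the variable and flag names follow the FLUXNET2015 naming convention as we understand it:

```python
import pandas as pd

# Hypothetical SUBSET half-hourly file name; variable names below follow the
# FLUXNET2015 convention (e.g. NEE_VUT_REF with its _QC companion flag).
fn = "FLX_XX-Sit_FLUXNET2015_SUBSET_HH_2000-2014_1-3.csv"

df = pd.read_csv(fn, na_values=[-9999])  # -9999 marks missing data
df["time"] = pd.to_datetime(df["TIMESTAMP_START"].astype(str), format="%Y%m%d%H%M")

# Keep only measured or good-quality gap-filled NEE (QC flag 0 or 1),
# then aggregate to daily means.
good = df[df["NEE_VUT_REF_QC"].isin([0, 1])]
daily_nee = good.set_index("time")["NEE_VUT_REF"].resample("D").mean()
print(daily_nee.head())
```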

  20. Photographic dataset: random peppercorns

    CERN Document Server

    Helenius, Teemu

    2016-01-01

    This is a photographic dataset collected for testing image processing algorithms. The idea is to have sets of different but statistically similar images. In this work the images show randomly distributed peppercorns. The dataset is made available at www.fips.fi/photographic_dataset.php.

  1. Dataset Lifecycle Policy

    Science.gov (United States)

    Armstrong, Edward; Tauer, Eric

    2013-01-01

    The presentation focused on describing a new dataset lifecycle policy that the NASA Physical Oceanography DAAC (PO.DAAC) has implemented for its new and current datasets to foster improved stewardship and consistency across its archive. The overarching goal is to implement this dataset lifecycle policy for all new GHRSST GDS2 datasets and bridge the mission statements from the GHRSST Project Office and PO.DAAC to provide the best quality SST data in a cost-effective, efficient manner, preserving its integrity so that it will be available and usable to a wide audience.

  2. Fixing Dataset Search

    Science.gov (United States)

    Lynnes, Chris

    2014-01-01

    Three current search engines are queried for ozone data at the GES DISC. The results range from sub-optimal to counter-intuitive. We propose a method to fix dataset search by implementing a robust relevancy ranking scheme. The relevancy ranking scheme is based on several heuristics culled from more than 20 years of helping users select datasets.
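
    The paper's heuristics are not spelled out in this abstract, so the sketch below is only a toy relevancy ranker in the same spirit: weighted term matches in title and description plus a mild recency boost, with all weights invented:

```python
def relevance(query_terms, dataset):
    """Toy relevancy score: weighted term matches plus a small recency boost.
    Weights and heuristics are illustrative, not the paper's."""
    title = dataset["title"].lower()
    desc = dataset["description"].lower()
    score = 0.0
    for term in query_terms:
        t = term.lower()
        score += 3.0 * title.count(t)  # title hits weigh most
        score += 1.0 * desc.count(t)
    score += 0.1 * max(0, dataset["year"] - 2000)  # mild recency boost
    return score

datasets = [
    {"title": "Ozone Monthly L3", "description": "global ozone columns", "year": 2012},
    {"title": "Aerosol Index", "description": "includes some ozone fields", "year": 2015},
]
ranked = sorted(datasets, key=lambda d: relevance(["ozone"], d), reverse=True)
print([d["title"] for d in ranked])
```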

  3. Market Squid Ecology Dataset

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This dataset contains ecological information collected on the major adult spawning and juvenile habitats of market squid off California and the US Pacific Northwest....

  4. Tables and figure datasets

    Data.gov (United States)

    U.S. Environmental Protection Agency — Soil and air concentrations of asbestos in Sumas study. This dataset is associated with the following publication: Wroble, J., T. Frederick, A. Frame, and D....

  5. 2016 TRI Preliminary Dataset

    Science.gov (United States)

    The TRI preliminary dataset includes the most current TRI data available and reflects toxic chemical releases and pollution prevention activities that occurred at TRI facilities during the 2016 calendar year.

  6. National Hydrography Dataset (NHD)

    Data.gov (United States)

    Kansas Data Access and Support Center — The National Hydrography Dataset (NHD) is a feature-based database that interconnects and uniquely identifies the stream segments or reaches that comprise the...

  7. USEWOD 2016 Research Dataset

    OpenAIRE

    Luczak-Roesch, Markus; Aljaloud, Saud; Berendt, Bettina; Hollink, Laura

    2016-01-01

    The USEWOD 2016 research dataset is a collection of usage data from Web of Data sources, collected in 2015. It covers sources such as DBpedia, the Linked Data Fragments interface to DBpedia, as well as Wikidata page views. This dataset can be requested via http://library.soton.ac.uk/datarequest - please also email a scanned copy of the signed Usage Agreement (to ).

  8. The GTZAN dataset

    DEFF Research Database (Denmark)

    Sturm, Bob L.

    2013-01-01

    The GTZAN dataset appears in at least 100 published works, and is the most-used public dataset for evaluation in machine listening research for music genre recognition (MGR). Our recent work, however, shows GTZAN has several faults (repetitions, mislabelings, and distortions), which challenge the interpretability of any result derived using it. In this article, we disprove the claims that all MGR systems are affected in the same ways by these faults, and that the performances of MGR systems in GTZAN are still meaningfully comparable since they all face the same faults. We identify and analyze the contents...

  9. Dataset - Adviesregel PPL 2010

    NARCIS (Netherlands)

    Evert, van F.K.; Schans, van der D.A.; Geel, van W.C.A.; Slabbekoorn, J.J.; Booij, R.; Jukema, J.N.; Meurs, E.J.J.; Uenk, D.

    2011-01-01

    This dataset contains experimental data from a number of field experiments with potato in The Netherlands (Van Evert et al., 2011). The data are presented as an SQL dump of a PostgreSQL database (version 8.4.4). An outline of the entity-relationship diagram of the database is given in an accompanyin

  10. SAMHSA Federated Datasets

    Data.gov (United States)

    Substance Abuse and Mental Health Services Administration, Department of Health and Human Services — This link provides a temporary method of accessing SAMHSA datasets that are found on the interactive portion of the Data.gov catalog. This is a temporary solution...

  11. Wiki-talk Datasets

    OpenAIRE

    Sun, Jun; Kunegis, Jérôme

    2016-01-01

    User interaction networks of Wikipedia in 28 different languages. Nodes (original Wikipedia user IDs) represent users of the Wikipedia, and an edge from user A to user B denotes that user A wrote a message on the talk page of user B at a certain timestamp. More info: http://yfiua.github.io/academic/2016/02/14/wiki-talk-datasets.html

  12. Microarray Analysis Dataset

    Science.gov (United States)

    This file contains a link to the Gene Expression Omnibus and the GSE designations for the publicly available gene expression data used in the study and reflected in Figures 6 and 7 of the Das et al., 2016 paper. This dataset is associated with the following publication: Das, K., C. Wood, M. Lin, A.A. Starkov, C. Lau, K.B. Wallace, C. Corton, and B. Abbott. Perfluoroalkyl acids-induced liver steatosis: Effects on genes controlling lipid homeostasis. TOXICOLOGY. Elsevier Science Ltd, New York, NY, USA, 378: 32-52, (2017).

  13. Pilgrims Face Recognition Dataset -- HUFRD

    OpenAIRE

    Aly, Salah A.

    2012-01-01

    In this work, we define a new pilgrims face recognition dataset, called the HUFRD dataset. The newly developed dataset presents various pilgrims' images taken outside the Holy Masjid El-Harram in Makkah during the 2011-2012 Hajj and Umrah seasons. This dataset will be used to test our developed facial recognition and detection algorithms, as well as to assist the missing-and-found recognition system \cite{crowdsensing}.

  14. NP-PAH Interaction Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — Dataset presents concentrations of organic pollutants, such as polyaromatic hydrocarbon compounds, in water samples. Water samples of known volume and concentration...

  15. Dataset of NRDA emission data

    Data.gov (United States)

    U.S. Environmental Protection Agency — Emissions data from open air oil burns. This dataset is associated with the following publication: Gullett, B., J. Aurell, A. Holder, B. Mitchell, D. Greenwell, M....

  16. Turkey Run Landfill Emissions Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — landfill emissions measurements for the Turkey run landfill in Georgia. This dataset is associated with the following publication: De la Cruz, F., R. Green, G....

  17. Genomic Datasets for Cancer Research

    Science.gov (United States)

    A variety of datasets from genome-wide association studies of cancer and other genotype-phenotype studies, including sequencing and molecular diagnostic assays, are available to approved investigators through the Extramural National Cancer Institute Data Access Committee.

  18. Chemical product and function dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — Merged product weight fraction and chemical function data. This dataset is associated with the following publication: Isaacs , K., M. Goldsmith, P. Egeghy , K....

  19. Atlantic Offshore Seabird Dataset Catalog

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — Several bureaus within the Department of Interior compiled available information from seabird observation datasets from the Atlantic Outer Continental Shelf into a...

  1. Detecting bimodality in astronomical datasets

    Science.gov (United States)

    Ashman, Keith A.; Bird, Christina M.; Zepf, Stephen E.

    1994-01-01

    We discuss statistical techniques for detecting and quantifying bimodality in astronomical datasets. We concentrate on the KMM algorithm, which estimates the statistical significance of bimodality in such datasets and objectively partitions data into subpopulations. By simulating bimodal distributions with a range of properties we investigate the sensitivity of KMM to datasets with varying characteristics. Our results facilitate the planning of optimal observing strategies for systems where bimodality is suspected. Mixture-modeling algorithms similar to the KMM algorithm have been used in previous studies to partition the stellar population of the Milky Way into subsystems. We illustrate the broad applicability of KMM by analyzing published data on globular cluster metallicity distributions, velocity distributions of galaxies in clusters, and burst durations of gamma-ray sources. FORTRAN code for the KMM algorithm and directions for its use are available from the authors upon request.
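
    KMM itself is FORTRAN code available from the authors; a present-day stand-in for the same mixture-modeling idea is scikit-learn's GaussianMixture: fit one- and two-component models, compare them (here by BIC rather than KMM's likelihood-ratio statistic), and partition the data into subpopulations. Synthetic data throughout; this is not the authors' code:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic bimodal sample, e.g. a metallicity distribution with two peaks:
x = np.concatenate([rng.normal(-1.0, 0.3, 150), rng.normal(0.5, 0.3, 100)])[:, None]

# Fit 1- and 2-component Gaussian mixtures and compare by BIC
# (a stand-in for KMM's likelihood-ratio test; lower BIC is better).
g1 = GaussianMixture(n_components=1).fit(x)
g2 = GaussianMixture(n_components=2).fit(x)
print("BIC 1 comp:", g1.bic(x), " BIC 2 comp:", g2.bic(x))

# Objectively partition the data into subpopulations:
labels = g2.predict(x)
print("group sizes:", np.bincount(labels))
```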

  2. The Harvard organic photovoltaic dataset

    Science.gov (United States)

    Lopez, Steven A.; Pyzer-Knapp, Edward O.; Simm, Gregor N.; Lutzow, Trevor; Li, Kewei; Seress, Laszlo R.; Hachmann, Johannes; Aspuru-Guzik, Alán

    2016-09-01

    The Harvard Organic Photovoltaic Dataset (HOPV15) presented in this work is a collation of experimental photovoltaic data from the literature, and corresponding quantum-chemical calculations performed over a range of conformers, each with quantum chemical results using a variety of density functionals and basis sets. It is anticipated that this dataset will be of use in both relating electronic structure calculations to experimental observations through the generation of calibration schemes, as well as for the creation of new semi-empirical methods and the benchmarking of current and future model chemistries for organic electronic applications.

  3. The Harvard organic photovoltaic dataset

    Science.gov (United States)

    Lopez, Steven A.; Pyzer-Knapp, Edward O.; Simm, Gregor N.; Lutzow, Trevor; Li, Kewei; Seress, Laszlo R.; Hachmann, Johannes; Aspuru-Guzik, Alán

    2016-01-01

    The Harvard Organic Photovoltaic Dataset (HOPV15) presented in this work is a collation of experimental photovoltaic data from the literature, and corresponding quantum-chemical calculations performed over a range of conformers, each with quantum chemical results using a variety of density functionals and basis sets. It is anticipated that this dataset will be of use in both relating electronic structure calculations to experimental observations through the generation of calibration schemes, as well as for the creation of new semi-empirical methods and the benchmarking of current and future model chemistries for organic electronic applications. PMID:27676312

  4. Statewide Datasets for Idaho StreamStats

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This dataset consists of a workspace (folder) containing four gridded datasets and a personal geodatabase. The gridded datasets are a grid of mean annual...

  5. Statewide datasets for Hawaii StreamStats

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This dataset consists of a workspace (folder) containing 41 gridded datasets and a personal geodatabase. The gridded datasets consist of 28 precipitation-frequency...

  6. CERC Dataset (Full Hadza Data)

    DEFF Research Database (Denmark)

    2016-01-01

    The dataset includes demographic, behavioral, and religiosity data from eight different populations from around the world. The samples were drawn from: (1) Coastal and (2) Inland Tanna, Vanuatu; (3) Hadzaland, Tanzania; (4) Lovu, Fiji; (5) Pointe aux Piment, Mauritius; (6) Pesqueiro, Brazil; (7...

  7. Querying Large Biological Network Datasets

    Science.gov (United States)

    Gulsoy, Gunhan

    2013-01-01

    New experimental methods have resulted in an increasing amount of genetic interaction data being generated every day. Biological networks are used to store the gathered genetic interaction data. The increasing amount of available data requires fast, large-scale analysis methods. Therefore, we address the problem of querying large biological network datasets.…

  8. Methane Flux

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — Methane (CH4) flux is the net rate of methane exchange between an ecosystem and the atmosphere. Data of this variable were generated by the USGS LandCarbon project...

  9. Matchmaking, datasets and physics analysis

    CERN Document Server

    Donno, Flavia; Eulisse, Giulio; Mazzucato, Mirco; Steenberg, Conrad; CERN. Geneva. IT Department; 10.1109/ICPPW.2005.48

    2005-01-01

    Grid-enabled physics analysis requires a workload management system (WMS) that takes care of finding suitable computing resources to execute data-intensive jobs. A typical example is the WMS available in the LCG2 (also referred to as EGEE-0) software system, used by several scientific experiments. Like many other current grid systems, LCG2 provides a file-level granularity for accessing and analysing data. However, application scientists such as high energy physicists often require a higher abstraction level for accessing data, i.e. they prefer to use datasets rather than files in their physics analysis. We have improved the current WMS (in particular the Matchmaker) to allow physicists to express their analysis job requirements in terms of datasets. This required modifications to the WMS and its interface to potential data catalogues. As a result, we propose a simple data location interface that is based on a Web service approach and allows for interoperability of the WMS with new dataset and file catalogues...

  10. Viking Seismometer PDS Archive Dataset

    Science.gov (United States)

    Lorenz, R. D.

    2016-12-01

    The Viking Lander 2 seismometer operated successfully for over 500 Sols on the Martian surface, recording at least one likely candidate Marsquake. The Viking mission, in an era when data handling hardware (both on board and on the ground) was limited in capability, predated modern planetary data archiving, and ad-hoc repositories of the data, and the very low-level record at NSSDC, were neither convenient to process nor well-known. In an effort supported by the NASA Mars Data Analysis Program, we have converted the bulk of the Viking dataset (namely the 49,000 and 270,000 records made in High- and Event-modes at 20 and 1 Hz, respectively) into a simple ASCII table format. Additionally, since wind-generated lander motion is a major component of the signal, contemporaneous meteorological data are included in summary records to facilitate correlation. These datasets are being archived at the PDS Geosciences Node. In addition to brief instrument and dataset descriptions, the archive includes code snippets in the freely-available language 'R' to demonstrate plotting and analysis. Further, we present examples of lander-generated noise, associated with the sampler arm, instrument dumps and other mechanical operations.

  11. PHYSICS PERFORMANCE AND DATASET (PPD)

    CERN Multimedia

    L. Silvestris

    2013-01-01

    The first part of the Long Shutdown period has been dedicated to the preparation of the samples for the analysis targeting the summer conferences. In particular, the 8 TeV data acquired in 2012, including most of the “parked datasets”, have been reconstructed profiting from improved alignment and calibration conditions for all the sub-detectors. A careful planning of the resources was essential in order to deliver the datasets well in time to the analysts, and to schedule the update of all the conditions and calibrations needed at the analysis level. The newly reprocessed data have undergone detailed scrutiny by the Dataset Certification team, allowing the recovery of some of the data for analysis usage and further improving the certification efficiency, which is now at 91% of the recorded luminosity. With the aim of delivering a consistent dataset for 2011 and 2012, both in terms of conditions and release (53X), the PPD team is now working to set up a data re-reconstruction and a new MC pro...

  12. Plankton, temperature and other measurements found in datasets OSD and CTD taken from the HUDSON, PARIZEAU and other platforms in the North Atlantic, Coastal N Atlantic and other locations from 1989 to 1997 Joint Global Ocean Flux Study Canada from 1989 to 1998 (NODC Accession 0000480)

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This report presents data sets collected as part of the Canadian portion of the Joint Global Ocean Flux Study. It contains sets of data, which are stored as text...

  13. 2008 TIGER/Line Nationwide Dataset

    Data.gov (United States)

    California Department of Resources — This dataset contains a nationwide build of the 2008 TIGER/Line datasets from the US Census Bureau downloaded in April 2009. The TIGER/Line Shapefiles are an extract...

  14. VT Hydrography Dataset - High Resolution NHD

    Data.gov (United States)

    Vermont Center for Geographic Information — (Link to Metadata) The Vermont Hydrography Dataset (VHD) is compliant with the local resolution (also known as High Resolution) National Hydrography Dataset (NHD)...

  15. TAO/TRITON, RAMA, and PIRATA Buoys, Quarterly, Buoyancy Flux

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This dataset has quarterly Buoyancy Flux data from the TAO/TRITON (Pacific Ocean, http://www.pmel.noaa.gov/tao/), RAMA (Indian Ocean,...

  16. TAO/TRITON, RAMA, and PIRATA Buoys, Daily, Buoyancy Flux

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This dataset has daily Buoyancy Flux data from the TAO/TRITON (Pacific Ocean, http://www.pmel.noaa.gov/tao/), RAMA (Indian Ocean,...

  17. PHYSICS PERFORMANCE AND DATASET (PPD)

    CERN Multimedia

    L. Silvestris

    2012-01-01

      Introduction The first part of the year presented an important test for the new Physics Performance and Dataset (PPD) group (cf. its mandate: http://cern.ch/go/8f77). The activity was focused on the validation of the new releases meant for the Monte Carlo (MC) production and the data-processing in 2012 (CMSSW 50X and 52X), and on the preparation of the 2012 operations. In view of the Chamonix meeting, the PPD and physics groups worked to understand the impact of the higher pile-up scenario on some of the flagship Higgs analyses to better quantify the impact of the high luminosity on the CMS physics potential. A task force is working on the optimisation of the reconstruction algorithms and on the code to cope with the performance requirements imposed by the higher event occupancy as foreseen for 2012. Concerning the preparation for the analysis of the new data, a new MC production has been prepared. The new samples, simulated at 8 TeV, are already being produced and the digitisation and recons...

  18. PHYSICS PERFORMANCE AND DATASET (PPD)

    CERN Multimedia

    L. Silvestris

    2013-01-01

    The PPD activities, in the first part of 2013, have been focused mostly on the final physics validation and preparation for the data reprocessing of the full 8 TeV datasets with the latest calibrations. These samples will be the basis for the preliminary results for summer 2013 but most importantly for the final publications on the 8 TeV Run 1 data. The reprocessing involves also the reconstruction of a significant fraction of “parked data” that will allow CMS to perform a whole new set of precision analyses and searches. In this way the CMSSW release 53X is becoming the legacy release for the 8 TeV Run 1 data. The regular operation activities have included taking care of the prolonged proton-proton data taking and the run with proton-lead collisions that ended in February. The DQM and Data Certification team has deployed a continuous effort to promptly certify the quality of the data. The luminosity-weighted certification efficiency (requiring all sub-detectors to be certified as usab...

  19. Pattern Analysis On Banking Dataset

    Directory of Open Access Journals (Sweden)

    Amritpal Singh

    2015-06-01

    Everyday refinement and development of technology has led to an increase in competition between tech companies and to attempts to crack and break down their systems. This makes data mining a strategically and security-wise important area for many business organizations, including the banking sector. It allows the analysis of important information in the data warehouse and assists banks in looking for obscure patterns in a group and discovering unknown relationships in the data. Banking systems need to process ample amounts of data on a daily basis related to customer information, credit card details, limit and collateral details, transaction details, risk profiles, anti-money-laundering-related information, and trade finance data. Thousands of decisions based on the related data are taken in a bank daily. This paper analyzes the banking dataset in the Weka environment for the detection of interesting patterns based on its applications of customer acquisition, customer retention, management and marketing, and management of risk and fraudulence detection.

  20. An alternative measure of solar activity from detailed sunspot datasets

    CERN Document Server

    Muraközy, Judit; Ludmány, András

    2016-01-01

    The sunspot number is analyzed by using detailed sunspot data, including aspects of observability, sunspot sizes, and proper identification of sunspot groups as discrete entities of the solar activity. The tests show that besides the subjective factors there are also objective causes of the ambiguities in the series of sunspot numbers. To introduce an alternative activity measure the physical meaning of the sunspot number has to be reconsidered. It contains two components whose numbers are governed by different physical mechanisms, this is one source of the ambiguity. This article suggests an activity index, which is the amount of emerged magnetic flux. The only long-term proxy measure is the detailed sunspot area dataset with proper calibration to the magnetic flux amount. The Debrecen sunspot databases provide an appropriate source for the establishment of the suggested activity index.

  1. Critical flux determination by flux-stepping

    DEFF Research Database (Denmark)

    Beier, Søren; Jonsson, Gunnar Eigil

    2010-01-01

    In the membrane filtration literature, step-by-step determined critical fluxes are often reported. Using a dynamic microfiltration device, it is shown that critical fluxes determined from two different flux-stepping methods depend upon operational parameters such as step…; such values are more or less useless in themselves as critical flux predictors, and constant-flux verification experiments have to be conducted to check whether the determined critical fluxes can predict sustainable flux regimes. However, it is shown that using the step-by-step predicted critical fluxes as start...
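
    A flux-stepping experiment raises the flux in steps and watches the transmembrane pressure (TMP): the critical flux is read off as the last step at which TMP stays stable. A toy detector for that criterion, with an invented drift threshold and made-up data (the paper's methods differ in detail):

```python
def critical_flux(steps, dtmp_threshold=0.5):
    """Return the highest flux step whose TMP drift stays below a threshold.
    `steps` is a list of (flux, tmp_start, tmp_end); drift in kPa per step.
    Threshold and criterion are illustrative, not from the paper."""
    critical = None
    for flux, tmp0, tmp1 in steps:
        if tmp1 - tmp0 < dtmp_threshold:
            critical = flux  # TMP stable: still below the critical flux
        else:
            break            # fouling onset: TMP drifts upward
    return critical

# Made-up flux steps (L m^-2 h^-1) with start/end TMP (kPa):
steps = [(20, 10.0, 10.1), (30, 10.2, 10.4), (40, 10.5, 11.9), (50, 12.5, 16.0)]
print(critical_flux(steps))  # -> 30
```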

  2. River network routing on the NHDPlus dataset

    OpenAIRE

    David, Cédric; Maidment, David; Niu, Guo-Yue; Yang, Zong-Liang; Habets, Florence; Eijkhout, Victor

    2011-01-01

    The mapped rivers and streams of the contiguous United States are available in a geographic information system (GIS) dataset called National Hydrography Dataset Plus (NHDPlus). This hydrographic dataset has about 3 million river and water body reaches along with information on how they are connected into networks. The U.S. Geological Survey (USGS) National Water Information System (NWIS) provides streamflow observations at about 20 thousand gauges located on the NHDP...

  3. Veterans Affairs Suicide Prevention Synthetic Dataset

    Data.gov (United States)

    Department of Veterans Affairs — The VA's Veteran Health Administration, in support of the Open Data Initiative, is providing the Veterans Affairs Suicide Prevention Synthetic Dataset (VASPSD). The...

  4. A global distributed basin morphometric dataset

    Science.gov (United States)

    Shen, Xinyi; Anagnostou, Emmanouil N.; Mei, Yiwen; Hong, Yang

    2017-01-01

    Basin morphometry is vital information for relating storms to hydrologic hazards, such as landslides and floods. In this paper we present the first comprehensive global dataset of distributed basin morphometry at 30 arc seconds resolution. The dataset includes nine prime morphometric variables; in addition we present formulas for generating twenty-one additional morphometric variables based on combinations of the prime variables. The dataset can aid different applications, including studies of land-atmosphere interaction and modelling of floods and droughts for sustainable water management. The validity of the dataset has been consolidated by successfully reproducing Hack's law.
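
    Hack's law, used above as a validity check, relates mainstream length L to drainage area A as L = c * A**h, so the exponent follows from a log-log fit. A minimal sketch with invented basin data, not values from the paper:

```python
import numpy as np

# Hack's law: L = c * A**h, so log L = log c + h * log A.
# Made-up basin areas (km^2) and mainstream lengths (km):
area = np.array([10.0, 50.0, 200.0, 1000.0, 5000.0])
length = np.array([5.1, 13.0, 28.0, 70.0, 160.0])

h, log_c = np.polyfit(np.log(area), np.log(length), 1)
print(f"Hack exponent h = {h:.2f}, coefficient c = {np.exp(log_c):.2f}")
# Exponents near ~0.5-0.6 are the classic Hack's-law range.
```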

  5. Nanoparticle-organic pollutant interaction dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — Dataset presents concentrations of organic pollutants, such as polyaromatic hydrocarbon compounds, in water samples. Water samples of known volume and concentration...

  6. Veterans Affairs Suicide Prevention Synthetic Dataset Metadata

    Data.gov (United States)

    Department of Veterans Affairs — The VA's Veteran Health Administration, in support of the Open Data Initiative, is providing the Veterans Affairs Suicide Prevention Synthetic Dataset (VASPSD). The...

  7. Comparison of Radiative Energy Flows in Observational Datasets and Climate Modeling

    Science.gov (United States)

    Raschke, Ehrhard; Kinne, Stefan; Rossow, William B.; Stackhouse, Paul W. Jr.; Wild, Martin

    2016-01-01

    This study examines radiative flux distributions and local spread of values from three major observational datasets (CERES, ISCCP, and SRB) and compares them with results from climate modeling (CMIP3). Examinations of the spread and differences also differentiate among contributions from cloudy and clear-sky conditions. The spread among observational datasets is in large part caused by noncloud ancillary data. Average differences of at least 10 W m−2 each for clear-sky downward solar, upward solar, and upward infrared fluxes at the surface demonstrate via spatial difference patterns major differences in assumptions for atmospheric aerosol, solar surface albedo and surface temperature, and/or emittance in observational datasets. At the top of the atmosphere (TOA), observational datasets are less influenced by the ancillary data errors than at the surface. Comparisons of spatial radiative flux distributions at the TOA between observations and climate modeling indicate large deficiencies in the strength and distribution of model-simulated cloud radiative effects. Differences are largest for lower-altitude clouds over low-latitude oceans. Global modeling simulates stronger cloud radiative effects (CRE) by +30 W m−2 over trade wind cumulus regions, yet smaller CRE by about −30 W m−2 over (smaller in area) stratocumulus regions. At the surface, climate modeling simulates on average about 15 W m−2 smaller radiative net flux imbalances, as if climate modeling underestimates latent heat release (and precipitation). Relative to observational datasets, simulated surface net fluxes are particularly lower over oceanic trade wind regions (where global modeling tends to overestimate the radiative impact of clouds). Still, with the uncertainty in noncloud ancillary data, observational data do not establish a reliable reference.

  8. Seasonal variability of turbulent heat fluxes in the tropical Atlantic Ocean based on WHOI flux product

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    The mean seasonal variability of turbulent heat fluxes in the tropical Atlantic Ocean is examined using the Woods Hole Oceanographic Institution (WHOI) flux product. The largest turbulent heat fluxes occur during the winter seasons of the two hemispheres, with centers located at 10°-20°N and 5°-15°S, respectively. In the climatological ITCZ, the turbulent heat fluxes are greatest from June to August, and in the equatorial cold tongue they are greatest from March to May. The seasonal variability of sensible heat flux is smaller than that of latent heat flux and is mainly dominated by variations of the air-sea temperature difference. In regions with larger climatological mean wind speed (air-sea humidity difference), variations of the air-sea humidity difference (wind speed) dominate the variability of latent heat flux. The turbulent heat flux characteristics obtained from theoretical analysis and from the WHOI dataset are physically consistent, which indicates that the WHOI flux data are reliable in the tropical Atlantic Ocean.

  9. Earth's surface heat flux

    Directory of Open Access Journals (Sweden)

    J. H. Davies

    2009-11-01

    We present a revised estimate of Earth's surface heat flux that is based upon a heat flow dataset with 38,347 measurements, which is 55% more than used in previous estimates. Our methodology, like others, accounts for hydrothermal circulation in young oceanic crust by utilising a half-space cooling approximation. For the rest of Earth's surface, we estimate the average heat flow for different geologic domains as defined by global digital geology maps, and then produce the global estimate by multiplying each domain average by the total global area of that geologic domain. The averaging is done on a polygon set which results from an intersection of a 1-degree equal-area grid with the original geology polygons; this minimises the adverse influence of clustering. These operations and estimates are derived accurately using methodologies from Geographical Information Science. We consider the virtually unsampled Antarctica separately and also make a small correction for hot-spots in young oceanic lithosphere. A range of analyses is presented. These, combined with statistical estimates of the error, provide a measure of robustness. Our final preferred estimate is 47±2 TW, which is greater than previous estimates.
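
    The half-space cooling approximation mentioned above predicts seafloor heat flow decaying with the square root of crustal age, q = k*dT/sqrt(pi*kappa*t). A minimal sketch with standard textbook parameter values, which are assumptions rather than the paper's calibration:

```python
import math

K = 3.3            # thermal conductivity (W m^-1 K^-1), assumed
KAPPA = 1e-6       # thermal diffusivity (m^2 s^-1), assumed
DT_MANTLE = 1350.0 # mantle temperature excess over seafloor (K), assumed
SEC_PER_MYR = 3.156e13

def halfspace_heat_flow(age_myr):
    """Surface heat flow (mW m^-2) of oceanic lithosphere from half-space
    cooling: q = k * dT / sqrt(pi * kappa * t)."""
    t = age_myr * SEC_PER_MYR
    return 1e3 * K * DT_MANTLE / math.sqrt(math.pi * KAPPA * t)

for age in (1, 10, 50, 100):
    print(f"{age:>4} Myr: {halfspace_heat_flow(age):6.0f} mW m-2")
```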

  11. Flux-P: Automating Metabolic Flux Analysis

    OpenAIRE

    Ebert, Birgitta E.; Anna-Lena Lamprecht; Bernhard Steffen; Blank, Lars M.

    2012-01-01

    Quantitative knowledge of intracellular fluxes in metabolic networks is invaluable for inferring metabolic system behavior and the design principles of biological systems. However, intracellular reaction rates often cannot be calculated directly but have to be estimated; for instance, via 13C-based metabolic flux analysis, a model-based interpretation of stable carbon isotope patterns in intermediates of metabolism. Existing software such as FiatFlux, OpenFLUX or 13CFLUX supports experts in ...

  12. An improved Antarctic dataset for high resolution numerical ice sheet models (ALBMAP v1)

    Directory of Open Access Journals (Sweden)

    A. M. Le Brocq

    2010-10-01

    Full Text Available The dataset described in this paper (ALBMAP) has been created for the purposes of high-resolution numerical ice sheet modelling of the Antarctic Ice Sheet. It brings together data on the ice sheet configuration (e.g. ice surface and ice thickness) and boundary conditions, such as the surface air temperature, accumulation and geothermal heat flux. The ice thickness and basal topography are based on the BEDMAP dataset (Lythe et al., 2001); however, there are a number of inconsistencies within BEDMAP and, since its release, more data have become available. The dataset described here addresses these inconsistencies, including some novel interpolation schemes for sub-ice-shelf cavities, and incorporates some major new datasets. The inclusion of new datasets is not exhaustive; this considerable task is left for the next release of BEDMAP. However, the data and procedure documented here provide another step forward and demonstrate the issues that need addressing in a continental-scale dataset useful for high-resolution ice sheet modelling. The dataset provides an initial condition that is as close as possible to the present-day ice sheet configuration, aiding modelling of the response of the Antarctic Ice Sheet to various forcings, which are, at present, not fully understood.

  13. BASE MAP DATASET, LOGAN COUNTY, OKLAHOMA, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  14. BASE MAP DATASET, KENDALL COUNTY, TEXAS, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme, orthographic...

  15. BASE MAP DATASET, LOS ANGELES COUNTY, CALIFORNIA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  16. SIAM 2007 Text Mining Competition dataset

    Data.gov (United States)

    National Aeronautics and Space Administration — Subject Area: Text Mining Description: This is the dataset used for the SIAM 2007 Text Mining competition. This competition focused on developing text mining...

  17. BASE MAP DATASET, ROGERS COUNTY, OKLAHOMA, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  18. Simulation of Smart Home Activity Datasets

    Directory of Open Access Journals (Sweden)

    Jonathan Synnott

    2015-06-01

    Full Text Available A globally ageing population is resulting in an increased prevalence of chronic conditions which affect older adults. Such conditions require long-term care and management to maximize quality of life, placing an increasing strain on healthcare resources. Intelligent environments such as smart homes facilitate long-term monitoring of activities in the home through the use of sensor technology. Access to sensor datasets is necessary for the development of novel activity monitoring and recognition approaches. Access to such datasets is limited due to issues such as sensor cost, availability and deployment time. The use of simulated environments and sensors may address these issues and facilitate the generation of comprehensive datasets. This paper provides a review of existing approaches for the generation of simulated smart home activity datasets, including model-based approaches and interactive approaches which implement virtual sensors, environments and avatars. The paper also provides recommendations for future work in intelligent environment simulation.

  19. BASE MAP DATASET, HARRISON COUNTY, TEXAS, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  20. BASE MAP DATASET, HONOLULU COUNTY, HAWAII, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  1. BASE MAP DATASET, SEQUOYAH COUNTY, OKLAHOMA, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme, orthographic...

  2. BASE MAP DATASET, MAYES COUNTY, OKLAHOMA, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications: cadastral, geodetic control,...

  3. BASE MAP DATASET, CADDO COUNTY, OKLAHOMA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  4. Climate Prediction Center IR 4km Dataset

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — CPC IR 4km dataset was created from all available individual geostationary satellite data which have been merged to form nearly seamless global (60N-60S) IR...

  5. Environmental Dataset Gateway (EDG) Search Widget

    Data.gov (United States)

    U.S. Environmental Protection Agency — Use the Environmental Dataset Gateway (EDG) to find and access EPA's environmental resources. Many options are available for easily reusing EDG content in other...

  6. BASE MAP DATASET, CHEROKEE COUNTY, OKLAHOMA, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme, orthographic...

  7. Hajj and Umrah Event Recognition Datasets

    CERN Document Server

    Zawbaa, Hossam

    2012-01-01

    In this note, new Hajj and Umrah Event Recognition datasets (HUER) are presented. The demonstrated datasets are based on videos and images taken during the 2011-2012 Hajj and Umrah seasons. HUER is the first collection of datasets covering the six types of Hajj and Umrah ritual events (rotating in Tawaf around Kabaa, performing Sa'y between Safa and Marwa, standing on the mount of Arafat, staying overnight in Muzdalifah, staying two or three days in Mina, and throwing Jamarat). The HUER datasets also contain video and image databases for nine types of human actions during Hajj and Umrah (walking, drinking from Zamzam water, sleeping, smiling, eating, praying, sitting, shaving hairs and ablutions, reading the holy Quran and making duaa). The spatial resolutions are 1280 x 720 pixels for images and 640 x 480 pixels for videos; the videos average 20 seconds in length at a rate of 30 frames per second.

  8. VT Hydrography Dataset - cartographic extract lines

    Data.gov (United States)

    Vermont Center for Geographic Information — (Link to Metadata) VHDCARTO is a simplified version of the local resolution Vermont Hydrography Dataset (VHD) that has been enriched with stream perenniality, e.g.,...

  9. VT Hydrography Dataset - cartographic extract polygons

    Data.gov (United States)

    Vermont Center for Geographic Information — (Link to Metadata) VHDCARTO is a simplified version of the local resolution Vermont Hydrography Dataset (VHD) that has been enriched with stream perenniality, e.g.,...

  10. Environmental Dataset Gateway (EDG) REST Interface

    Data.gov (United States)

    U.S. Environmental Protection Agency — Use the Environmental Dataset Gateway (EDG) to find and access EPA's environmental resources. Many options are available for easily reusing EDG content in other...

  11. BASE MAP DATASET, GARVIN COUNTY, OKLAHOMA, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  12. BASE MAP DATASET, OUACHITA COUNTY, ARKANSAS

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  13. BASE MAP DATASET, SANTA CRUZ COUNTY, CALIFORNIA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  14. Simulation of Smart Home Activity Datasets.

    Science.gov (United States)

    Synnott, Jonathan; Nugent, Chris; Jeffers, Paul

    2015-06-16

    A globally ageing population is resulting in an increased prevalence of chronic conditions which affect older adults. Such conditions require long-term care and management to maximize quality of life, placing an increasing strain on healthcare resources. Intelligent environments such as smart homes facilitate long-term monitoring of activities in the home through the use of sensor technology. Access to sensor datasets is necessary for the development of novel activity monitoring and recognition approaches. Access to such datasets is limited due to issues such as sensor cost, availability and deployment time. The use of simulated environments and sensors may address these issues and facilitate the generation of comprehensive datasets. This paper provides a review of existing approaches for the generation of simulated smart home activity datasets, including model-based approaches and interactive approaches which implement virtual sensors, environments and avatars. The paper also provides recommendations for future work in intelligent environment simulation.

  15. BASE MAP DATASET, BRYAN COUNTY, OKLAHOMA, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme, orthographic...

  16. BASE MAP DATASET, DELAWARE COUNTY, OKLAHOMA, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  17. BASE MAP DATASET, STEPHENS COUNTY, OKLAHOMA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  18. BASE MAP DATASET, WOODWARD COUNTY, OKLAHOMA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  19. BASE MAP DATASET, HOWARD COUNTY, ARKANSAS

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  20. Relevancy Ranking of Satellite Dataset Search Results

    Science.gov (United States)

    Lynnes, Christopher; Quinn, Patrick; Norton, James

    2017-01-01

    As the variety of Earth science datasets increases, science researchers find it more challenging to discover and select the datasets that best fit their needs. The most common way for search providers to address this problem is to rank the datasets returned for a query by their likely relevance to the user. Large web page search engines typically use text matching supplemented with reverse link counts, semantic annotations and user intent modeling. However, this produces uneven results when applied to dataset metadata records simply externalized as a web page. Fortunately, data and search providers have decades of experience in serving data user communities, allowing them to form heuristics that leverage the structure in the metadata together with knowledge about the user community. Some of these heuristics include specific ways of matching the user input to the essential measurements in the dataset and determining overlaps of time range and spatial areas. Heuristics based on the novelty of the datasets can prioritize later, better versions of data over similar predecessors. And knowledge of how different user types and communities use data can be brought to bear in cases where characteristics of the user (discipline, expertise) or their intent (applications, research) can be divined. The Earth Observing System Data and Information System has begun implementing some of these heuristics in the relevancy algorithm of its Common Metadata Repository search engine.
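
    One heuristic named above, the overlap of a query's time range with a dataset's temporal coverage, is easy to make concrete. A minimal sketch in Python; the function and its scoring are illustrative assumptions, not the Common Metadata Repository's actual ranking code:

```python
from datetime import date

def temporal_overlap_score(q_start, q_end, d_start, d_end):
    """Fraction of the query interval covered by the dataset's coverage."""
    latest_start = max(q_start, d_start)
    earliest_end = min(q_end, d_end)
    overlap_days = (earliest_end - latest_start).days
    query_days = (q_end - q_start).days
    return max(0.0, overlap_days / query_days) if query_days > 0 else 0.0

# A dataset covering 2001-2007 scores ~0.6 against a 2000-2010 query.
print(temporal_overlap_score(date(2000, 1, 1), date(2010, 1, 1),
                             date(2001, 1, 1), date(2007, 1, 1)))
```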

  1. Evaluation of Global Observations-Based Evapotranspiration Datasets and IPCC AR4 Simulations

    Science.gov (United States)

    Mueller, B.; Seneviratne, S. I.; Jimenez, C.; Corti, T.; Hirschi, M.; Balsamo, G.; Ciais, P.; Dirmeyer, P.; Fisher, J. B.; Guo, Z.; hide

    2011-01-01

    Quantification of global land evapotranspiration (ET) has long been associated with large uncertainties due to the lack of reference observations. Several recently developed products now provide the capacity to estimate ET at global scales. These products, partly based on observational data, include satellite-based products, land surface model (LSM) simulations, atmospheric reanalysis output, estimates based on empirical upscaling of eddy-covariance flux measurements, and atmospheric water balance datasets. The LandFlux-EVAL project aims to evaluate and compare these newly developed datasets. Additionally, an evaluation of IPCC AR4 global climate model (GCM) simulations is presented, providing an assessment of their capacity to reproduce flux behavior relative to the observations-based products. Though differently constrained with observations, the analyzed reference datasets display similar large-scale ET patterns. ET from the IPCC AR4 simulations was significantly smaller than that from the other products for India (up to 1 mm/d) and parts of eastern South America, and larger in the western USA, Australia and China. The inter-product variance is lower across the IPCC AR4 simulations than across the reference datasets in several regions, which indicates that uncertainties may be underestimated in the IPCC AR4 models due to shared biases of these simulations.

  2. An Evaluation of Satellite-Based and Re-Analysis Radiation Budget Datasets Using CERES EBAF Products

    Science.gov (United States)

    Gupta, Shashi; Stackhouse, Paul; Wong, Takmeng; Mikovitz, Colleen; Cox, Stephen; Zhang, Taiping

    2016-04-01

    Top-of-atmosphere (TOA) and surface radiative fluxes from CERES Energy Balanced and Filled (EBAF; Loeb et al., 2009; Kato et al. 2013) products are used to evaluate the performance of several widely used long-term radiation budget datasets. Two of those are derived from satellite observations and five more are from re-analysis products. The satellite-derived datasets are the NASA/GEWEX Surface and TOA Radiation Budget Dataset Release-3 and the ISCCP-FD Dataset. The re-analysis datasets are taken from NCEP-CFSR, ERA-Interim, Japanese Re-Analysis (JRA-55), MERRA and the newly released MERRA2 products. Close examination is made of the differences between the MERRA and MERRA2 products to identify improvements achieved in MERRA2. Many of these datasets have undergone quality assessment under the GEWEX Radiative Flux Assessment (RFA) project. For the purposes of the present study, the EBAF datasets are treated as the reference and the other datasets are compared with them. All-sky and clear-sky, SW and LW, TOA and surface fluxes are included in this study. A 7-year period (2001-2007) common to all datasets is chosen for comparisons of global and zonal averages, monthly and annual average timeseries, and their anomalies. These comparisons show significant differences between EBAF and the other datasets. Certain anomalies and trends observed in the satellite-derived datasets are attributable to corresponding features in the satellite datasets used as input, especially ISCCP cloud properties. Comparisons of zonal averages showed significant differences, especially over higher latitudes, even when those differences are not obvious in the global averages. Special emphasis is placed on analyzing the correspondence between spatial patterns of the geographical distribution of the above fluxes, both as 7-year averages and on a month-by-month basis, using the Taylor (2001) methodology. Results showed that for 7-year average fields correlation coefficients between spatial patterns

  3. Comparison of Shallow Survey 2012 Multibeam Datasets

    Science.gov (United States)

    Ramirez, T. M.

    2012-12-01

    The purpose of the Shallow Survey common dataset is a comparison of the different technologies utilized for data acquisition in the shallow survey marine environment. The common dataset consists of a series of surveys conducted over a common area of seabed using a variety of systems. It provides equipment manufacturers the opportunity to showcase their latest systems while giving hydrographic researchers and scientists a chance to test their latest algorithms on the dataset so that rigorous comparisons can be made. Five companies collected data for the Common Dataset in the Wellington Harbor area in New Zealand between May 2010 and May 2011: Kongsberg, Reson, R2Sonic, GeoAcoustics, and Applied Acoustics. The Wellington harbor and surrounding coastal area was selected since it has a number of well-defined features, including the HMNZS South Seas and HMNZS Wellington wrecks, an armored seawall constructed of Tetrapods and Akmons, aquifers, wharves and marinas. The seabed inside the harbor basin is largely fine-grained sediment, with gravel and reefs around the coast. The area outside the harbor on the southern coast is an active environment, with moving sand and exposed reefs. A marine reserve is also in this area. For consistency between datasets, the coastal research vessel R/V Ikatere and crew were used for all surveys conducted for the common dataset. Using Triton's Perspective processing software, the multibeam datasets collected for the Shallow Survey were processed for detailed analysis. Datasets from each sonar manufacturer were processed using the CUBE algorithm developed by the Center for Coastal and Ocean Mapping/Joint Hydrographic Center (CCOM/JHC). Each dataset was gridded at 0.5 and 1.0 meter resolutions for cross comparison and compliance with International Hydrographic Organization (IHO) requirements. Detailed comparisons were made of equipment specifications (transmit frequency, number of beams, beam width), data density, total uncertainty, and

  4. Two ultraviolet radiation datasets that cover China

    Science.gov (United States)

    Liu, Hui; Hu, Bo; Wang, Yuesi; Liu, Guangren; Tang, Liqin; Ji, Dongsheng; Bai, Yongfei; Bao, Weikai; Chen, Xin; Chen, Yunming; Ding, Weixin; Han, Xiaozeng; He, Fei; Huang, Hui; Huang, Zhenying; Li, Xinrong; Li, Yan; Liu, Wenzhao; Lin, Luxiang; Ouyang, Zhu; Qin, Boqiang; Shen, Weijun; Shen, Yanjun; Su, Hongxin; Song, Changchun; Sun, Bo; Sun, Song; Wang, Anzhi; Wang, Genxu; Wang, Huimin; Wang, Silong; Wang, Youshao; Wei, Wenxue; Xie, Ping; Xie, Zongqiang; Yan, Xiaoyuan; Zeng, Fanjiang; Zhang, Fawei; Zhang, Yangjian; Zhang, Yiping; Zhao, Chengyi; Zhao, Wenzhi; Zhao, Xueyong; Zhou, Guoyi; Zhu, Bo

    2017-07-01

    Ultraviolet (UV) radiation has significant effects on ecosystems, environments, and human health, as well as atmospheric processes and climate change. Two ultraviolet radiation datasets are described in this paper. One contains hourly observations of UV radiation measured at 40 Chinese Ecosystem Research Network stations from 2005 to 2015. CUV3 broadband radiometers were used to observe the UV radiation, with an accuracy of 5%, which meets the World Meteorological Organization's measurement standards. The extremum method was used to control the quality of the measured datasets. The other dataset contains daily cumulative UV radiation estimates that were calculated using an all-sky estimation model combined with a hybrid model. The reconstructed daily UV radiation data span from 1961 to 2014. The mean absolute bias error and root-mean-square error are smaller than 30% at most stations, and most of the mean bias error values are negative, which indicates underestimation of the UV radiation intensity. These datasets can improve our basic knowledge of the spatial and temporal variations in UV radiation. Additionally, these datasets can be used in studies of potential ozone formation and atmospheric oxidation, as well as simulations of ecological processes.
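
    The "extremum method" of quality control mentioned above amounts to rejecting values that fall outside physically plausible bounds. A minimal sketch of such a bounds filter for hourly UV irradiance in W m^-2; the thresholds are illustrative assumptions, not those used by the dataset authors:

```python
import numpy as np

def extremum_qc(uv_hourly, lower=0.0, upper=100.0):
    """Mask hourly UV irradiance values outside plausible bounds.

    Returns a masked array so that flagged values are excluded from
    any downstream daily accumulation.
    """
    return np.ma.masked_outside(np.asarray(uv_hourly, dtype=float),
                                lower, upper)

hourly = [0.0, 12.5, 38.0, -4.0, 55.2, 120.0]  # two implausible values
print(extremum_qc(hourly).compressed())        # [ 0.  12.5 38.  55.2]
```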

  5. Quality Visualization of Microarray Datasets Using Circos

    Directory of Open Access Journals (Sweden)

    Martin Koch

    2012-08-01

    Full Text Available Quality control and normalization is considered the most important step in the analysis of microarray data. At present, there are various methods available for quality assessment of microarray datasets. However, there seems to be no standard visualization routine which also depicts individual microarray quality. Here we present a convenient method for visualizing the results of standard quality control tests using Circos plots. In these plots, various quality measurements are drawn in a circular fashion, thus allowing for visualization of the quality and all outliers of each distinct array within a microarray dataset. The proposed method is intended for use with the Affymetrix Human Genome platform (i.e., GPL96, GPL570 and GPL571). Circos quality measurement plots are a convenient way to obtain an initial quality estimate of Affymetrix datasets that are stored in publicly available databases.

  6. Pbm: A new dataset for blog mining

    CERN Document Server

    Aziz, Mehwish

    2012-01-01

    Text mining is becoming vital as Web 2.0 offers collaborative content creation and sharing. Researchers now have a growing interest in text mining methods for discovering knowledge. Text mining researchers come from a variety of areas, such as Natural Language Processing, Computational Linguistics, Machine Learning, and Statistics. A typical text mining application involves preprocessing of text, stemming and lemmatization, tagging and annotation, deriving knowledge patterns, and evaluating and interpreting the results. There are numerous approaches for performing text mining tasks, such as clustering, categorization, sentiment analysis, and summarization. There is a growing need to standardize the evaluation of these tasks. One major component of establishing standardization is to provide standard datasets for these tasks. Although there are various standard datasets available for traditional text mining tasks, there are very few and expensive datasets for the blog-mining task. Blogs, a new genre in web 2.0 is a digital...

  7. Genomics dataset of unidentified disclosed isolates

    Directory of Open Access Journals (Sweden)

    Bhagwan N. Rekadwad

    2016-09-01

    Full Text Available Analysis of DNA sequences is necessary for higher hierarchical classification of organisms. It gives clues about the characteristics of organisms and their taxonomic position. This dataset was chosen to find complexities in the unidentified DNA in the disclosed patents. A total of 17 unidentified DNA sequences were thoroughly analyzed. Quick response (QR) codes were generated, and an AT/GC content analysis of the DNA sequences was carried out. The QR codes are helpful for quick identification of isolates, and the AT/GC content is helpful for studying their stability at different temperatures. Additionally, a dataset of cleavage codes and enzyme codes from the restriction digestion study was reported, which is helpful for performing studies using short DNA sequences. The dataset disclosed here provides new revelatory data for the exploration of unique DNA sequences for evaluation, identification, comparison and analysis.
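
    The AT/GC content analysis mentioned above is a simple per-sequence tally; a minimal sketch in pure Python (the function name is ours, not from the dataset):

```python
def at_gc_content(seq):
    """Return (AT%, GC%) of a DNA sequence, ignoring ambiguous bases."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    total = at + gc
    if total == 0:
        return 0.0, 0.0
    return 100.0 * at / total, 100.0 * gc / total

print(at_gc_content("ATGCGCGTATTA"))  # -> (58.33..., 41.66...)
```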

  8. Genomics dataset of unidentified disclosed isolates.

    Science.gov (United States)

    Rekadwad, Bhagwan N

    2016-09-01

    Analysis of DNA sequences is necessary for higher hierarchical classification of organisms. It gives clues about the characteristics of organisms and their taxonomic position. This dataset was chosen to find complexities in the unidentified DNA in the disclosed patents. A total of 17 unidentified DNA sequences were thoroughly analyzed. Quick response (QR) codes were generated, and an AT/GC content analysis of the DNA sequences was carried out. The QR codes are helpful for quick identification of isolates, and the AT/GC content is helpful for studying their stability at different temperatures. Additionally, a dataset of cleavage codes and enzyme codes from the restriction digestion study was reported, which is helpful for performing studies using short DNA sequences. The dataset disclosed here provides new revelatory data for the exploration of unique DNA sequences for evaluation, identification, comparison and analysis.

  9. Spatial Evolution of Openstreetmap Dataset in Turkey

    Science.gov (United States)

    Zia, M.; Seker, D. Z.; Cakir, Z.

    2016-10-01

    A large amount of research work has already been done regarding many aspects of the OpenStreetMap (OSM) dataset in recent years for developed countries and major world cities. On the other hand, limited work is present in the scientific literature for developing or underdeveloped ones because of poor data coverage. In the presented study, it is demonstrated how the Turkey-OSM dataset has spatially evolved over an 8-year time span (2007-2015) throughout the country. It is observed that there is an east-west spatial bias in OSM feature density across the country. Population density and literacy level are found to be the two main governing factors controlling this spatial trend. Future research paradigms may involve considering contributors' involvement and commenting on dataset health.

  10. Visualising Large Datasets in TOPCAT v4

    CERN Document Server

    Taylor, Mark

    2014-01-01

    TOPCAT is a widely used desktop application for manipulation of astronomical catalogues and other tables, which has long provided fast interactive visualisation features including 1, 2 and 3-d plots, multiple datasets, linked views, color coding, transparency and more. In Version 4 a new plotting library has been written from scratch to deliver new and enhanced visualisation capabilities. This paper describes some of the considerations in the design and implementation, particularly in regard to providing comprehensible interactive visualisation for multi-million point datasets.

  11. ArcHydro global datasets for Hawaii StreamStats

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This dataset consists of a personal geodatabase containing several vector datasets. These datasets may be used with the ArcHydro Tools, developed by ESRI in...

  12. MVIRI/SEVIRI TOA Radiation Datasets within the Climate Monitoring SAF

    Science.gov (United States)

    Urbain, Manon; Clerbaux, Nicolas; Ipe, Alessandro; Baudrez, Edward; Velazquez Blazquez, Almudena; Moreels, Johan

    2016-04-01

    Within CM SAF, Interim Climate Data Records (ICDR) of Top-Of-Atmosphere (TOA) radiation products from the Geostationary Earth Radiation Budget (GERB) instruments on the Meteosat Second Generation (MSG) satellites were released in 2013. These datasets (referred to as CM-113 and CM-115, resp. for shortwave (SW) and longwave (LW) radiation) are based on the instantaneous TOA fluxes from the GERB Edition-1 dataset. They cover the time period 2004-2011. Extending these datasets backward in the past is not possible, as no GERB instruments were available on the Meteosat First Generation (MFG) satellites. As an alternative, it is proposed to rely on the Meteosat Visible and InfraRed Imager (MVIRI - from 1982 until 2004) and the Spinning Enhanced Visible and Infrared Imager (SEVIRI - from 2004 onward) to generate a long Thematic Climate Data Record (TCDR) from Meteosat instruments. Combining MVIRI and SEVIRI allows an unprecedented temporal (30 minutes / 15 minutes) and spatial (2.5 km / 3 km) resolution compared to the Clouds and the Earth's Radiant Energy System (CERES) products. This is a step forward as it helps to increase the knowledge of the diurnal cycle and the small-scale spatial variations of radiation. The MVIRI/SEVIRI datasets (referred to as CM-23311 and CM-23341, resp. for SW and LW radiation) will provide daily and monthly averaged TOA Reflected Solar (TRS) and Emitted Thermal (TET) radiation in "all-sky" conditions (no clear-sky conditions for this first version of the datasets), as well as monthly averages of the hourly integrated values. The SEVIRI Solar Channels Calibration (SSCC) and the operational calibration have been used resp. for the SW and LW channels. For MFG, it is foreseen to replace the latter by the EUMETSAT/GSICS recalibration of MVIRI using HIRS. The CERES TRMM angular dependency models have been used to compute TRS fluxes while theoretical models have been used for TET fluxes. The CM-23311 and CM-23341 datasets will cover a 32 years

  13. Thesaurus Dataset of Educational Technology in Chinese

    Science.gov (United States)

    Wu, Linjing; Liu, Qingtang; Zhao, Gang; Huang, Huan; Huang, Tao

    2015-01-01

    The thesaurus dataset of educational technology is a knowledge description of educational technology in Chinese. The aims of this thesaurus were to collect the subject terms in the domain of educational technology, facilitate the standardization of terminology and promote the communication between Chinese researchers and scholars from various…

  14. The Geometry of Finite Equilibrium Datasets

    DEFF Research Database (Denmark)

    Balasko, Yves; Tvede, Mich

    We investigate the geometry of finite datasets defined by equilibrium prices, income distributions, and total resources. We show that the equilibrium condition imposes no restrictions if total resources are collinear, a property that is robust to small perturbations. We also show that the set...

  15. A Neural Network Classifier of Volume Datasets

    CERN Document Server

    Zukić, Dženan; Kolb, Andreas

    2009-01-01

    Many state-of-the-art visualization techniques must be tailored to the specific type of dataset, its modality (CT, MRI, etc.), the recorded object or anatomical region (head, spine, abdomen, etc.) and other parameters related to the data acquisition process. While parts of the information (imaging modality and acquisition sequence) may be obtained from the meta-data stored with the volume scan, there is important information which is not stored explicitly (anatomical region, tracing compound). Also, meta-data might be incomplete, inappropriate or simply missing. This paper presents a novel and simple method of determining the type of dataset from previously defined categories. 2D histograms based on the intensity and gradient magnitude of a dataset are used as input to a neural network, which classifies the dataset into one of several categories it was trained with. The proposed method is an important building block for visualization systems to be used autonomously by non-experts. The method has been tested on 80 datasets,...
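
    The 2D intensity/gradient-magnitude histogram that serves as the network input can be computed directly from a volume. A minimal sketch with NumPy; the volume shape, bin count, and normalization are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

def intensity_gradient_histogram(volume, bins=64):
    """2D histogram over (intensity, gradient magnitude) of a 3D volume."""
    vol = volume.astype(float)
    gx, gy, gz = np.gradient(vol)
    grad_mag = np.sqrt(gx**2 + gy**2 + gz**2)
    hist, _, _ = np.histogram2d(vol.ravel(), grad_mag.ravel(), bins=bins)
    hist = np.log1p(hist)     # compress the dynamic range
    return hist / hist.max()  # normalize for use as a classifier input

volume = np.random.default_rng(1).integers(0, 4096, size=(64, 64, 64))
features = intensity_gradient_histogram(volume)
print(features.shape)  # (64, 64), flattened before feeding the network
```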

  16. Flux-P: Automating Metabolic Flux Analysis

    Directory of Open Access Journals (Sweden)

    Birgitta E. Ebert

    2012-11-01

    Full Text Available Quantitative knowledge of intracellular fluxes in metabolic networks is invaluable for inferring metabolic system behavior and the design principles of biological systems. However, intracellular reaction rates often cannot be calculated directly but have to be estimated; for instance, via 13C-based metabolic flux analysis, a model-based interpretation of stable carbon isotope patterns in intermediates of metabolism. Existing software such as FiatFlux, OpenFLUX or 13CFLUX supports experts in this complex analysis, but requires several steps that have to be carried out manually, hence restricting the use of this software for data interpretation to a rather small number of experiments. In this paper, we present Flux-P as an approach to automate and standardize 13C-based metabolic flux analysis, using the Bio-jETI workflow framework. Exemplarily based on the FiatFlux software, it demonstrates how services can be created that carry out the different analysis steps autonomously and how these can subsequently be assembled into software workflows that perform automated, high-throughput intracellular flux analysis of high quality and reproducibility. Besides significant acceleration and standardization of the data analysis, the agile workflow-based realization supports flexible changes of the analysis workflows on the user level, making it easy to perform custom analyses.

  17. Interpolation of diffusion weighted imaging datasets.

    Science.gov (United States)

    Dyrby, Tim B; Lundell, Henrik; Burke, Mark W; Reislev, Nina L; Paulson, Olaf B; Ptito, Maurice; Siebner, Hartwig R

    2014-12-01

    Diffusion weighted imaging (DWI) is used to study white-matter fibre organisation, orientation and structural connectivity by means of fibre reconstruction algorithms and tractography. For clinical settings, limited scan time compromises the possibilities to achieve high image resolution for finer anatomical details and signal-to-noise-ratio for reliable fibre reconstruction. We assessed the potential benefits of interpolating DWI datasets to a higher image resolution before fibre reconstruction using a diffusion tensor model. Simulations of straight and curved crossing tracts smaller than or equal to the voxel size showed that conventional higher-order interpolation methods improved the geometrical representation of white-matter tracts with reduced partial-volume-effect (PVE), except at tract boundaries. Simulations and interpolation of ex-vivo monkey brain DWI datasets revealed that conventional interpolation methods fail to disentangle fine anatomical details if PVE is too pronounced in the original data. For validation, we used ex-vivo DWI datasets acquired at various image resolutions as well as Nissl-stained sections. Increasing the image resolution by a factor of eight yielded finer geometrical resolution and more anatomical details in complex regions such as tract boundaries and cortical layers, which are normally only visualized at higher image resolutions. Similar results were found with a typical clinical human DWI dataset. However, a possible bias in quantitative values imposed by the interpolation method used should be considered. The results indicate that conventional interpolation methods can be successfully applied to DWI datasets for mining anatomical details that are normally seen only at higher resolutions, which will aid in tractography and microstructural mapping of tissue compartments.
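
    The pre-reconstruction upsampling assessed above corresponds to a conventional higher-order (spline) interpolation of each diffusion-weighted volume. A minimal sketch using SciPy; the zoom factor and array shapes are illustrative choices, not the study's protocol:

```python
import numpy as np
from scipy.ndimage import zoom

def upsample_dwi(dwi, factor=2, spline_order=3):
    """Upsample a 4D DWI array (x, y, z, n_directions) with cubic
    b-spline interpolation before fibre reconstruction."""
    # Interpolate only the three spatial axes, not the gradient axis.
    return zoom(dwi, (factor, factor, factor, 1), order=spline_order)

dwi = np.random.default_rng(2).random((32, 32, 20, 33))
print(upsample_dwi(dwi).shape)  # (64, 64, 40, 33)
```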

  18. TAO/TRITON, RAMA, and PIRATA Buoys, Daily, Sensible Heat Flux

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This dataset has daily Sensible Heat Flux data from the TAO/TRITON (Pacific Ocean, http://www.pmel.noaa.gov/tao/), RAMA (Indian Ocean,...

  19. TAO/TRITON, RAMA, and PIRATA Buoys, 5-Day, Buoyancy Flux

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This dataset has 5-day Buoyancy Flux data from the TAO/TRITON (Pacific Ocean, http://www.pmel.noaa.gov/tao/), RAMA (Indian Ocean,...

  20. TAO/TRITON, RAMA, and PIRATA Buoys, Daily, Total Heat Flux

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This dataset has daily Total Heat Flux data from the TAO/TRITON (Pacific Ocean, http://www.pmel.noaa.gov/tao/), RAMA (Indian Ocean,...

  1. TAO/TRITON, RAMA, and PIRATA Buoys, Quarterly, Heat Flux Due To Rain

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This dataset has quarterly Heat Flux Due To Rain data from the TAO/TRITON (Pacific Ocean, http://www.pmel.noaa.gov/tao/), RAMA (Indian Ocean,...

  2. TAO/TRITON, RAMA, and PIRATA Buoys, 5-Day, Heat Flux Due To Rain

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This dataset has 5-day Heat Flux Due To Rain data from the TAO/TRITON (Pacific Ocean, http://www.pmel.noaa.gov/tao/), RAMA (Indian Ocean,...

  3. TAO/TRITON, RAMA, and PIRATA Buoys, Monthly, Heat Flux Due To Rain

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This dataset has monthly Heat Flux Due To Rain data from the TAO/TRITON (Pacific Ocean, http://www.pmel.noaa.gov/tao/), RAMA (Indian Ocean,...

  4. TAO/TRITON, RAMA, and PIRATA Buoys, Daily, Heat Flux Due To Rain

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This dataset has daily Heat Flux Due To Rain data from the TAO/TRITON (Pacific Ocean, http://www.pmel.noaa.gov/tao/), RAMA (Indian Ocean,...

  5. TAO/TRITON, RAMA, and PIRATA Buoys, Monthly, Latent Heat Flux

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This dataset has monthly Latent Heat Flux data from the TAO/TRITON (Pacific Ocean, http://www.pmel.noaa.gov/tao/), RAMA (Indian Ocean,...

  6. Surface Turbulent Fluxes, 1x1 deg Yearly Climatology, Set1 and NCEP V2c

    Data.gov (United States)

    National Aeronautics and Space Administration — These data are the Goddard Satellite-based Surface Turbulent Fluxes Version-2c Dataset recently produced through a MEaSURES funded project led by Dr. Chung-Lin Shie...

  7. Surface Turbulent Fluxes, 1x1 deg Seasonal Climatology, Set1 and NCEP V2c

    Data.gov (United States)

    National Aeronautics and Space Administration — These data are the Goddard Satellite-based Surface Turbulent Fluxes Version-2c Dataset recently produced through a MEaSUREs funded project led by Dr. Chung-Lin Shie...

  8. TAO/TRITON, RAMA, and PIRATA Buoys, Quarterly, Sensible Heat Flux

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This dataset has quarterly Sensible Heat Flux data from the TAO/TRITON (Pacific Ocean, http://www.pmel.noaa.gov/tao/), RAMA (Indian Ocean,...

  9. TAO/TRITON, RAMA, and PIRATA Buoys, Quarterly, Latent Heat Flux

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This dataset has quarterly Latent Heat Flux data from the TAO/TRITON (Pacific Ocean, http://www.pmel.noaa.gov/tao/), RAMA (Indian Ocean,...

  10. TAO/TRITON, RAMA, and PIRATA Buoys, Monthly, Total Heat Flux

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This dataset has monthly Total Heat Flux data from the TAO/TRITON (Pacific Ocean, http://www.pmel.noaa.gov/tao/), RAMA (Indian Ocean,...

  11. TAO/TRITON, RAMA, and PIRATA Buoys, Daily, Latent Heat Flux

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This dataset has daily Latent Heat Flux data from the TAO/TRITON (Pacific Ocean, http://www.pmel.noaa.gov/tao/), RAMA (Indian Ocean,...

  12. TAO/TRITON, RAMA, and PIRATA Buoys, 5-Day, Latent Heat Flux

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This dataset has 5-day Latent Heat Flux data from the TAO/TRITON (Pacific Ocean, http://www.pmel.noaa.gov/tao/), RAMA (Indian Ocean,...

  13. TAO/TRITON, RAMA, and PIRATA Buoys, 5-Day, Total Heat Flux

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This dataset has 5-day Total Heat Flux data from the TAO/TRITON (Pacific Ocean, http://www.pmel.noaa.gov/tao/), RAMA (Indian Ocean,...

  14. TAO/TRITON, RAMA, and PIRATA Buoys, Quarterly, Total Heat Flux

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This dataset has quarterly Total Heat Flux data from the TAO/TRITON (Pacific Ocean, http://www.pmel.noaa.gov/tao/), RAMA (Indian Ocean,...

  15. Quantifying uncertainty in observational rainfall datasets

    Science.gov (United States)

    Lennard, Chris; Dosio, Alessandro; Nikulin, Grigory; Pinto, Izidine; Seid, Hussen

    2015-04-01

    The CO-ordinated Regional Downscaling Experiment (CORDEX) has to date seen the publication of at least ten journal papers that examine the African domain during 2012 and 2013. Five of these papers consider Africa generally (Nikulin et al. 2012, Kim et al. 2013, Hernandes-Dias et al. 2013, Laprise et al. 2013, Panitz et al. 2013) and five have regional foci: Tramblay et al. (2013) on Northern Africa, Mariotti et al. (2014) and Gbobaniyi et al. (2013) on West Africa, Endris et al. (2013) on East Africa and Kalagnoumou et al. (2013) on southern Africa. There are also a further three papers that the authors know about under review. These papers all use observed rainfall and/or temperature data to evaluate/validate the regional model output and often proceed to assess projected changes in these variables due to climate change in the context of these observations. The most popular reference rainfall data used are the CRU, GPCP, GPCC, TRMM and UDEL datasets. However, as Kalagnoumou et al. (2013) point out, there are many other rainfall datasets available for consideration, for example, CMORPH, FEWS, TAMSAT & RIANNAA, TAMORA and the WATCH & WATCH-DEI data. They, with others (Nikulin et al. 2012, Sylla et al. 2012), show that the observed datasets can have a very wide spread at a particular space-time coordinate. As more ground-, space- and reanalysis-based rainfall products become available, all of which use different methods to produce precipitation data, the selection of reference data is becoming an important factor in model evaluation. A number of factors can contribute to uncertainty in terms of the reliability and validity of the datasets, such as radiance conversion algorithms, the quantity and quality of available station data, interpolation techniques and the blending methods used to combine satellite and gauge-based products. However, to date no comprehensive study has been performed to evaluate the uncertainty in these observational datasets. We assess 18 gridded
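
    The per-gridpoint spread across products that motivates this assessment can be quantified with simple ensemble statistics. A minimal sketch, assuming the gridded products have already been regridded to a common latitude/longitude grid (array shapes and names are hypothetical):

```python
import numpy as np

def ensemble_spread(products):
    """Per-gridpoint spread across rainfall products.

    products: array of shape (n_products, n_lat, n_lon), e.g. monthly
    means. Returns the value range and the coefficient of variation.
    """
    value_range = products.max(axis=0) - products.min(axis=0)
    mean = products.mean(axis=0)
    cv = np.where(mean > 0, products.std(axis=0) / mean, np.nan)
    return value_range, cv

# 18 synthetic products on a 2-degree grid, stand-ins for real datasets.
products = np.random.default_rng(3).gamma(2.0, 1.5, size=(18, 90, 180))
rng_map, cv_map = ensemble_spread(products)
print(float(np.nanmean(cv_map)))
```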

  16. Clustering of Emerging Flux

    Science.gov (United States)

    Ruzmaikin, A.

    1997-01-01

    Observations show that newly emerging flux tends to appear on the solar surface at sites where there is flux already. This results in the clustering of solar activity. Standard dynamo theories do not predict this effect.

  17. Method of generating features optimal to a dataset and classifier

    Energy Technology Data Exchange (ETDEWEB)

    Bruillard, Paul J.; Gosink, Luke J.; Jarman, Kenneth D.

    2016-10-18

    A method of generating features optimal to a particular dataset and classifier is disclosed. A dataset of messages is inputted and a classifier is selected. An algebra of features is encoded. Computable features that are capable of describing the dataset from the algebra of features are selected. Irredundant features that are optimal for the classifier and the dataset are selected.

  18. Sharing Video Datasets in Design Research

    DEFF Research Database (Denmark)

    Christensen, Bo; Abildgaard, Sille Julie Jøhnk

    2017-01-01

    This paper examines how design researchers, design practitioners and design education can benefit from sharing a dataset. We present the Design Thinking Research Symposium 11 (DTRS11) as an exemplary project that implied sharing video data of design processes and design activity in natural settings with a large group of fellow academics from the international community of Design Thinking Research, for the purpose of facilitating research collaboration and communication within the field of Design and Design Thinking. This approach emphasizes the social and collaborative aspects of design research, where a multitude of appropriate perspectives and methods may be utilized in analyzing and discussing the singular dataset. The shared data is, from this perspective, understood as a design object in itself, which facilitates new ways of working, collaborating, studying, learning and educating within the expanding...

  19. RTK: efficient rarefaction analysis of large datasets.

    Science.gov (United States)

    Saary, Paul; Forslund, Kristoffer; Bork, Peer; Hildebrand, Falk

    2017-08-15

    The rapidly expanding microbiomics field is generating increasingly larger datasets, characterizing the microbiota in diverse environments. Although classical numerical ecology methods provide a robust statistical framework for their analysis, software currently available is inadequate for large datasets and some computationally intensive tasks, like rarefaction and associated analysis. Here we present a software package for rarefaction analysis of large count matrices, as well as estimation and visualization of diversity, richness and evenness. Our software is designed for ease of use, operating at least 7x faster than existing solutions, despite requiring 10x less memory. C++ and R source code (GPL v.2) as well as binaries are available from https://github.com/hildebra/Rarefaction and from CRAN (https://cran.r-project.org/). bork@embl.de or falk.hildebrand@embl.de. Supplementary data are available at Bioinformatics online.
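
    Rarefaction itself, repeated subsampling of a count vector to a fixed depth followed by averaging the observed richness, is compact to state. A minimal Python sketch of the core operation; RTK's C++ implementation is, per the abstract, far faster and more memory-efficient:

```python
import numpy as np

def rarefied_richness(counts, depth, n_rep=100, seed=0):
    """Mean number of taxa observed when drawing `depth` reads without
    replacement from a sample's taxon count vector."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts)
    reads = np.repeat(np.arange(counts.size), counts)  # one entry per read
    richness = [
        np.unique(rng.choice(reads, size=depth, replace=False)).size
        for _ in range(n_rep)
    ]
    return float(np.mean(richness))

sample = [500, 200, 50, 10, 3, 1]  # reads per taxon
print(rarefied_richness(sample, depth=100))
```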

  20. Interpolation of diffusion weighted imaging datasets

    DEFF Research Database (Denmark)

    Dyrby, Tim B; Lundell, Henrik; Burke, Mark W

    2014-01-01

    Diffusion weighted imaging (DWI) is used to study white-matter fibre organisation, orientation and structural connectivity by means of fibre reconstruction algorithms and tractography. For clinical settings, limited scan time compromises the possibilities to achieve high image resolution for finer anatomical details and signal-to-noise-ratio for reliable fibre reconstruction. We assessed the potential benefits of interpolating DWI datasets to a higher image resolution before fibre reconstruction using a diffusion tensor model. Simulations of straight and curved crossing tracts smaller than or equal to the voxel size showed that conventional higher-order interpolation methods improved the geometrical representation of white-matter tracts with reduced partial-volume-effect (PVE), except at tract boundaries. Simulations and interpolation of ex-vivo monkey brain DWI datasets revealed that conventional...

  1. Automatic processing of multimodal tomography datasets.

    Science.gov (United States)

    Parsons, Aaron D; Price, Stephen W T; Wadeson, Nicola; Basham, Mark; Beale, Andrew M; Ashton, Alun W; Mosselmans, J Frederick W; Quinn, Paul D

    2017-01-01

    With the development of fourth-generation high-brightness synchrotrons on the horizon, the already large volume of data collected on imaging and mapping beamlines is set to increase by orders of magnitude. As such, an easy and accessible way of dealing with such large datasets as quickly as possible is required in order to be able to address the core scientific problems during the experimental data collection. Savu is an accessible and flexible big data processing framework that is able to deal with both the variety and the volume of multimodal and multidimensional scientific datasets, such as those output by chemical tomography experiments on the I18 microfocus scanning beamline at Diamond Light Source.

  2. 3DSEM: A 3D microscopy dataset

    Directory of Open Access Journals (Sweden)

    Ahmad P. Tafti

    2016-03-01

    Full Text Available The Scanning Electron Microscope (SEM), as a 2D imaging instrument, has been widely used in many scientific disciplines, including the biological, mechanical, and materials sciences, to determine the surface attributes of microscopic objects. However, SEM micrographs remain 2D images. To effectively measure and visualize surface properties, we need to restore the true 3D shape model from the 2D SEM images. Having 3D surfaces would provide the anatomic shape of micro-samples, which allows for quantitative measurements and informative visualization of the specimens being investigated. 3DSEM is a dataset for 3D microscopy vision which is freely available at [1] for any academic, educational, and research purposes. The dataset includes both 2D images and 3D reconstructed surfaces of several real microscopic samples.

  3. Scalable Machine Learning for Massive Astronomical Datasets

    Science.gov (United States)

    Ball, Nicholas M.; Gray, A.

    2014-04-01

    We present the ability to perform data mining and machine learning operations on a catalog of half a billion astronomical objects. This is the result of the combination of robust, highly accurate machine learning algorithms with linear scalability that renders the applications of these algorithms to massive astronomical data tractable. We demonstrate the core algorithms: kernel density estimation, K-means clustering, linear regression, nearest neighbors, random forest and gradient-boosted decision tree, singular value decomposition, support vector machine, and two-point correlation function. Each of these is relevant for astronomical applications such as finding novel astrophysical objects, characterizing artifacts in data, object classification (including for rare objects), object distances, finding the important features describing objects, density estimation of distributions, probabilistic quantities, and exploring the unknown structure of new data. The software, Skytree Server, runs on any UNIX-based machine, a virtual machine, or cloud-based and distributed systems including Hadoop. We have integrated it on the cloud computing system of the Canadian Astronomical Data Centre, the Canadian Advanced Network for Astronomical Research (CANFAR), creating the world's first cloud computing data mining system for astronomy. We demonstrate results showing the scaling of each of our major algorithms on large astronomical datasets, including the full 470,992,970 objects of the 2 Micron All-Sky Survey (2MASS) Point Source Catalog. We demonstrate the ability to find outliers in the full 2MASS dataset utilizing multiple methods, e.g., nearest neighbors. This is likely of particular interest to the radio astronomy community given, for example, that survey projects contain groups dedicated to this topic. 2MASS is used as a proof-of-concept dataset due to its convenience and availability. These results are of interest to any astronomical project with large and/or complex

  4. Scalable persistent identifier systems for dynamic datasets

    Science.gov (United States)

    Golodoniuc, P.; Cox, S. J. D.; Klump, J. F.

    2016-12-01

    Reliable and persistent identification of objects, whether tangible or not, is essential in information management. Many Internet-based systems have been developed to identify digital data objects, e.g., PURL, LSID, Handle, ARK. These were largely designed for the identification of static digital objects. The amount of data made available online has grown exponentially over the last two decades, and fine-grained identification of dynamically generated data objects within large datasets using conventional systems (e.g., PURL) has become impractical. We have compared the capabilities of various technological solutions to enable resolvability of data objects in dynamic datasets, and developed a dataset-centric approach to the resolution of identifiers. This is particularly important in Semantic Linked Data environments, where dynamic, frequently changing data is delivered live via web services, so registration of individual data objects to obtain identifiers is impractical. We use identifier patterns and pattern hierarchies for the identification of data objects, which allows relationships between identifiers to be expressed and also provides a means for resolving a single identifier into multiple forms (i.e. views or representations of an object). The latter can be implemented through (a) HTTP content negotiation, or (b) use of URI querystring parameters. The pattern and hierarchy approach has been implemented in the Linked Data API supporting the United Nations Spatial Data Infrastructure (UNSDI) initiative and later in the implementation of geoscientific data delivery for the Capricorn Distal Footprints project using International Geo Sample Numbers (IGSN). This enables flexible resolution of multi-view persistent identifiers and provides a scalable solution for large heterogeneous datasets.
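
    The pattern-and-hierarchy resolution described above can be illustrated with a small resolver: identifier patterns are tried from most to least specific, and a simplified stand-in for HTTP content negotiation selects among multiple representations. All patterns and names below are hypothetical, not those of the UNSDI or IGSN services:

```python
import re

# Hypothetical pattern hierarchy, ordered from most to least specific.
PATTERNS = [
    (re.compile(r"^igsn/(?P<sample>[A-Z0-9.]+)\.(?P<sub>\d+)$"), "subsample"),
    (re.compile(r"^igsn/(?P<sample>[A-Z0-9.]+)$"), "sample"),
]

REPRESENTATIONS = {
    "text/html": "landing page",
    "application/json": "metadata record",
}

def resolve(identifier, accept="text/html"):
    """Match an identifier against the pattern hierarchy and select a
    representation, mimicking negotiation on an HTTP Accept header."""
    for pattern, kind in PATTERNS:
        match = pattern.match(identifier)
        if match:
            view = REPRESENTATIONS.get(accept, "landing page")
            return kind, match.groupdict(), view
    return None

print(resolve("igsn/AU1234.5", accept="application/json"))
```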

  5. Data Assimilation and Model Evaluation Experiment Datasets.

    Science.gov (United States)

    Lai, Chung-Chieng A.; Qian, Wen; Glenn, Scott M.

    1994-05-01

    The Institute for Naval Oceanography, in cooperation with Naval Research Laboratories and universities, executed the Data Assimilation and Model Evaluation Experiment (DAMEE) for the Gulf Stream region during fiscal years 1991-1993. Enormous effort has gone into the preparation of several high-quality and consistent datasets for model initialization and verification. This paper describes the preparation process, the temporal and spatial scopes, the contents, the structure, etc., of these datasets. The goal of DAMEE and the need for data in the four phases of the experiment are briefly stated. The preparation of the DAMEE datasets consisted of a series of processes: 1) collection of observational data; 2) analysis and interpretation; 3) interpolation using the Optimum Thermal Interpolation System package; 4) quality control and re-analysis; and 5) data archiving and software documentation. The data products from these processes included a time series of 3D fields of temperature and salinity, 2D fields of surface dynamic height and mixed-layer depth, analysis of the Gulf Stream and rings system, and bathythermograph profiles. To date, these are the most detailed and high-quality data for mesoscale ocean modeling, data assimilation, and forecasting research. Feedback from ocean modeling groups who tested this data was incorporated into its refinement. Suggested uses of the DAMEE data include 1) ocean modeling and data assimilation studies, 2) diagnostic and theoretical studies, and 3) comparisons with locally detailed observations.

  6. Development of a 10 year (2001–2010) 0.1° dataset of land-surface energy balance for mainland China

    Directory of Open Access Journals (Sweden)

    X. Chen

    2014-06-01

Full Text Available In the absence of high-resolution estimates of the components of surface energy balance for China, we developed an algorithm based on the surface energy balance system (SEBS) to generate a dataset of land-surface energy and water fluxes on a monthly time scale from 2001 to 2010 at a 0.1° × 0.1° spatial resolution by using multi-satellite and meteorological forcing data. A remote-sensing-based method was developed to estimate canopy height, which was used to calculate roughness length and flux dynamics. The land-surface flux dataset was validated against "ground-truth" observations from 11 flux tower stations in China. The estimated fluxes correlate well with the stations' measurements for different vegetation types and climatic conditions (average bias = 15.3 W m−2, RMSE = 26.4 W m−2). The quality of the data product was also assessed against the GLDAS dataset. The results show that our method is efficient for producing a high-resolution dataset of surface energy flux for the Chinese landmass from satellite data. The validation results demonstrate that more accurate downward long-wave radiation datasets are needed to accurately estimate turbulent fluxes and evapotranspiration when using the surface energy balance model. Trend analysis of land-surface radiation and energy exchange fluxes revealed that the Tibetan Plateau has undergone relatively stronger climatic change than other parts of China during the last 10 years. The capability of the dataset to provide spatial and temporal information on the water cycle and land–atmosphere interactions for the Chinese landmass is examined. The product is free to download for studies of the water cycle and environmental change in China.
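As a side note, the bias and RMSE statistics quoted for the tower validation are straightforward to reproduce in principle; here is a minimal sketch with synthetic values standing in for the real tower and gridded fluxes.

```python
import numpy as np

def bias_and_rmse(estimated, observed):
    """Mean bias and root-mean-square error of flux estimates (W m-2)
    against flux-tower observations, as used to validate gridded products."""
    diff = np.asarray(estimated, float) - np.asarray(observed, float)
    return diff.mean(), np.sqrt((diff ** 2).mean())

# Synthetic monthly sensible-heat fluxes at one hypothetical tower (W m-2)
rng = np.random.default_rng(0)
obs = rng.uniform(20.0, 120.0, size=120)           # 10 years of monthly values
est = obs + rng.normal(15.0, 20.0, size=obs.size)  # model with bias and noise
bias, rmse = bias_and_rmse(est, obs)
print("bias = %.1f W m-2, RMSE = %.1f W m-2" % (bias, rmse))
```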

  7. Continued development of a global precipitation dataset from satellite and ground-based gauges

    Science.gov (United States)

    Dietzsch, Felix; Andersson, Axel; Schröder, Marc; Ziese, Markus; Becker, Andreas

    2017-04-01

The project framework MiKlip ("Mittelfristige Klimaprognosen") is focused on the development of an operational forecast system for decadal climate predictions. The objective of the "Daily Precipitation Analysis for the validation of Global medium-range Climate predictions Operationalized" (DAPAGLOCO) project is the development and operationalization of a global precipitation dataset for forecast validation of the MPI-ESM experiments used in MiKlip. The dataset is a combination of rain gauge measurements over land and satellite-based precipitation retrievals over ocean. Over land, gauge data from the Global Precipitation Climatology Centre (GPCC) at Deutscher Wetterdienst (DWD) are used. Over ocean, retrievals from the Hamburg Ocean Atmosphere Parameters and Fluxes from Satellite Data (HOAPS) dataset are used as the data source. The currently available dataset consists of 21 years of data (1988-2008) and is provided at spatial resolutions of 1° and 2.5° on the global scale, and 0.5° for Europe. Rain rates over ocean are currently derived from satellite microwave imagers by using a neural network; it is intended to switch to a 1D-Var retrieval method in the future. The current state of the dataset is presented, an introduction to the future retrieval and its features is given, and first results from evaluation and application are shown.

  8. GLEAM v3: updated land evaporation and root-zone soil moisture datasets

    Science.gov (United States)

    Martens, Brecht; Miralles, Diego; Lievens, Hans; van der Schalie, Robin; de Jeu, Richard; Fernández-Prieto, Diego; Verhoest, Niko

    2016-04-01

    Evaporation determines the availability of surface water resources and the requirements for irrigation. In addition, through its impacts on the water, carbon and energy budgets, evaporation influences the occurrence of rainfall and the dynamics of air temperature. Therefore, reliable estimates of this flux at regional to global scales are of major importance for water management and meteorological forecasting of extreme events. However, the global-scale magnitude and variability of the flux, and the sensitivity of the underlying physical process to changes in environmental factors, are still poorly understood due to the limited global coverage of in situ measurements. Remote sensing techniques can help to overcome the lack of ground data. However, evaporation is not directly observable from satellite systems. As a result, recent efforts have focussed on combining the observable drivers of evaporation within process-based models. The Global Land Evaporation Amsterdam Model (GLEAM, www.gleam.eu) estimates terrestrial evaporation based on daily satellite observations of meteorological drivers of terrestrial evaporation, vegetation characteristics and soil moisture. Since the publication of the first version of the model in 2011, GLEAM has been widely applied for the study of trends in the water cycle, interactions between land and atmosphere and hydrometeorological extreme events. A third version of the GLEAM global datasets will be available from the beginning of 2016 and will be distributed using www.gleam.eu as gateway. The updated datasets include separate estimates for the different components of the evaporative flux (i.e. transpiration, bare-soil evaporation, interception loss, open-water evaporation and snow sublimation), as well as variables like the evaporative stress, potential evaporation, root-zone soil moisture and surface soil moisture. A new dataset using SMOS-based input data of surface soil moisture and vegetation optical depth will also be

  9. Recurrence Analysis of Eddy Covariance Fluxes

    Science.gov (United States)

    Lange, Holger; Flach, Milan; Foken, Thomas; Hauhs, Michael

    2015-04-01

The eddy covariance (EC) method is one key method to quantify fluxes in biogeochemical cycles in general, and carbon and energy transport across the vegetation-atmosphere boundary layer in particular. EC data from the worldwide net of flux towers (Fluxnet) have also been used to validate biogeochemical models. The high-resolution data are usually obtained at a 20 Hz sampling rate but are affected by missing values and other restrictions. In this contribution, we investigate the nonlinear dynamics of EC fluxes using Recurrence Analysis (RA). High-resolution data from the site DE-Bay (Waldstein-Weidenbrunnen) and fluxes calculated at half-hourly resolution from eight locations (part of the La Thuile dataset) provide a set of very long time series to analyze. After careful quality assessment and Fluxnet-standard gap-filling pretreatment, we calculate properties and indicators of the recurrent structure based both on Recurrence Plots and on Recurrence Networks. Time series of RA measures obtained from windows moving along the time axis are presented. Their interpretation is guided by five questions: (1) Is RA able to discern periods where the (atmospheric) conditions are particularly suitable to obtain reliable EC fluxes? (2) Is RA capable of detecting dynamical transitions (different behavior) beyond those obvious from visual inspection? (3) Does RA contribute to an understanding of the nonlinear synchronization between EC fluxes and atmospheric parameters, which is crucial both for improving carbon flux models and for reliable interpolation of gaps? (4) Is RA able to recommend an optimal time resolution for measuring EC data and for analyzing EC fluxes? (5) Is it possible to detect non-trivial periodicities with a global RA? We will demonstrate that the answer to all five questions is affirmative, and that RA provides insights into EC dynamics not easily obtained otherwise.
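For readers new to RA, the central object is the recurrence plot: pairs of time-delay-embedded states are marked as recurrent when they fall within a distance threshold. A minimal sketch (synthetic half-hourly series and illustrative embedding parameters, not the authors' pipeline):

```python
import numpy as np

def embed(x, dim=3, tau=2):
    """Time-delay embedding of a scalar series into dim-dimensional vectors."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])

def recurrence_matrix(x, dim=3, tau=2, eps=0.5):
    """Binary recurrence plot: R[i, j] = 1 where embedded states i and j
    are closer than the threshold eps (Euclidean norm)."""
    v = embed(np.asarray(x, float), dim, tau)
    d = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=-1)
    return (d < eps).astype(int)

# Toy half-hourly "flux" series: daily cycle (48 samples) plus noise
rng = np.random.default_rng(1)
t = np.arange(500)
flux = np.sin(2 * np.pi * t / 48) + 0.2 * rng.standard_normal(t.size)

R = recurrence_matrix(flux, dim=3, tau=6, eps=0.4)
print("recurrence rate:", R.mean())  # fraction of recurrent state pairs
```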

  10. FLUXES FOR MECHANIZED ELECTRIC WELDING,

    Science.gov (United States)

WELDING FLUXES, WELDING, ARC WELDING, WELDS, STABILITY, POROSITY, WELDING RODS, STEEL, CERAMIC MATERIALS, FLUXES (FUSION), TITANIUM ALLOYS, ALUMINUM ALLOYS, COPPER ALLOYS, ELECTRODEPOSITION

  11. Patterns of Flux Emergence

    Science.gov (United States)

    Title, A.; Cheung, M.

    2008-05-01

The high spatial resolution and high cadence of the Solar Optical Telescope on the JAXA Hinode spacecraft have allowed the capture of many examples of magnetic flux emergence, from the scale of granulation to active regions. The observed patterns of emergence are quite similar. Flux emerges as an array of small bipoles on scales from 1 to 5 arc seconds throughout the region in which the flux eventually condenses. Because the fields emerging from the underlying flux rope may appear in many small segments, and the total flux (absolute sum) is not a conserved quantity, the amount of total flux on the surface may vary significantly during the emergence process. Numerical simulations of flux emergence exhibit patterns similar to observations. Movies of both observations and numerical simulations will be presented.

  12. Comparing methods of analysing datasets with small clusters: case studies using four paediatric datasets.

    Science.gov (United States)

    Marston, Louise; Peacock, Janet L; Yu, Keming; Brocklehurst, Peter; Calvert, Sandra A; Greenough, Anne; Marlow, Neil

    2009-07-01

Studies of prematurely born infants contain a relatively large percentage of multiple births, so the resulting data have a hierarchical structure with small clusters of size 1, 2 or 3. Ignoring the clustering may lead to incorrect inferences. The aim of this study was to compare statistical methods which can be used to analyse such data: generalised estimating equations, multilevel models, multiple linear regression and logistic regression. Four datasets which differed in total size and in percentage of multiple births (n = 254, multiple 18%; n = 176, multiple 9%; n = 10 098, multiple 3%; n = 1585, multiple 8%) were analysed. With the continuous outcome, two-level models produced similar results in the larger dataset, while generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) produced divergent estimates using the smaller dataset. For the dichotomous outcome, most methods, except maximum likelihood multilevel modelling (ML GH 'xtlogit' in Stata), gave similar odds ratios and 95% confidence intervals within datasets. For the continuous outcome, our results suggest using multilevel modelling. We conclude that generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) should be used with caution when the dataset is small. Where the outcome is dichotomous and there is a relatively large percentage of non-independent data, it is recommended that these are accounted for in analyses using logistic regression with adjusted standard errors or multilevel modelling. If, however, the dataset has a small percentage of clusters greater than size 1 (e.g. a population dataset of children where there are few multiples) there appears to be less need to adjust for clustering.
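One of the simpler adjustments recommended above, logistic regression with standard errors adjusted for clustering, can be sketched as follows. The data are synthetic stand-ins for a cohort with clusters of size 1-3, and the statsmodels cluster-robust option shown is one common implementation, not necessarily the Stata setup used in the paper.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic cohort: clusters of size 1-3 mimic singleton/twin/triplet births
rng = np.random.default_rng(2)
rows = []
for cluster_id in range(400):
    u = rng.normal(0, 0.8)                        # shared within-birth effect
    for _ in range(rng.choice([1, 1, 1, 2, 3])):  # mostly singletons
        x = rng.normal()
        p = 1 / (1 + np.exp(-(-0.5 + 0.7 * x + u)))
        rows.append({"cluster": cluster_id, "x": x, "y": rng.binomial(1, p)})
df = pd.DataFrame(rows)

# Ordinary logistic regression, with standard errors adjusted for clustering
fit = smf.logit("y ~ x", data=df).fit(
    disp=False, cov_type="cluster", cov_kwds={"groups": df["cluster"]}
)
print(fit.summary().tables[1])
```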

  13. Development of a SPARK Training Dataset

    Energy Technology Data Exchange (ETDEWEB)

    Sayre, Amanda M. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Olson, Jarrod R. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

    2015-03-01

In its first five years, the National Nuclear Security Administration’s (NNSA) Next Generation Safeguards Initiative (NGSI) sponsored more than 400 undergraduate, graduate, and post-doctoral students in internships and research positions (Wyse 2012). Over the past seven years, the NGSI program has produced, and continues to produce, a large body of scientific, technical, and policy work in targeted core safeguards capabilities and human capital development activities. Not only does the NGSI program carry out activities across multiple disciplines, but also across all U.S. Department of Energy (DOE)/NNSA locations in the United States. However, products are not readily shared among disciplines and across locations, nor are they archived in a comprehensive library. Rather, knowledge of NGSI-produced literature is localized to the researchers, clients, and internal laboratory/facility publication systems such as the Electronic Records and Information Capture Architecture (ERICA) at the Pacific Northwest National Laboratory (PNNL). There is also no built-in way of analyzing existing NGSI literature to determine whether the larger NGSI program is achieving its core safeguards capabilities and activities. A complete library of NGSI literature could prove beneficial to a cohesive, sustainable, and more economical NGSI program. The Safeguards Platform for Automated Retrieval of Knowledge (SPARK) has been developed as a knowledge storage, retrieval, and analysis capability that captures safeguards knowledge so it persists beyond the lifespan of NGSI. During the development process, it was necessary to build a SPARK training dataset (a corpus of documents) for initial entry into the system and for demonstration purposes. We manipulated these data to gain new information about the breadth of NGSI publications, and evaluated the science-policy interface at PNNL as a practical demonstration of SPARK’s intended analysis capability. The analysis demonstration sought to answer the

  14. Developing a Data-Set for Stereopsis

    Directory of Open Access Journals (Sweden)

    D.W Hunter

    2014-08-01

Full Text Available Current research on binocular stereopsis in humans and non-human primates has been limited by a lack of available data-sets. Current data-sets fall into two categories: stereo-image sets with vergence but no ranging information (Hibbard, 2008, Vision Research, 48(12), 1427-1439) or combinations of depth information with binocular images and video taken from cameras in fixed fronto-parallel configurations exhibiting neither vergence nor focus effects (Hirschmuller & Scharstein, 2007, IEEE Conf. Computer Vision and Pattern Recognition). The techniques for generating depth information are also imperfect. Depth information is normally inaccurate or simply missing near edges and on partially occluded surfaces. For many areas of vision research these are the most interesting parts of the image (Goutcher, Hunter, Hibbard, 2013, i-Perception, 4(7), 484; Scarfe & Hibbard, 2013, Vision Research). Using state-of-the-art open-source ray-tracing software (PBRT) as a back-end, our intention is to release a set of tools that will allow researchers in this field to generate artificial binocular stereoscopic data-sets. Although not as realistic as photographs, computer-generated images have significant advantages in terms of control over the final output, and ground-truth information about scene depth is easily calculated at all points in the scene, even partially occluded areas. While individual researchers have been developing similar stimuli by hand for many decades, we hope that our software will greatly reduce the time and difficulty of creating naturalistic binocular stimuli. Our intention in making this presentation is to elicit feedback from the vision community about what sort of features would be desirable in such software.

  15. Wild Type and PPAR KO Dataset

    Science.gov (United States)

Data set 1 consists of the experimental data for the Wild Type and PPAR KO animal study and includes data used to prepare Figures 1-4 and Table 1 of the Das et al., 2016 paper. This dataset is associated with the following publication: Das, K., C. Wood, M. Lin, A.A. Starkov, C. Lau, K.B. Wallace, C. Corton, and B. Abbott. Perfluoroalkyl acids-induced liver steatosis: Effects on genes controlling lipid homeostasis. TOXICOLOGY. Elsevier Science Ltd, New York, NY, USA, 378: 32-52, (2017).

  16. Standardized Automated CO2/H2O Flux Systems for Individual Research Groups and Flux Networks

    Science.gov (United States)

    Burba, George; Begashaw, Israel; Fratini, Gerardo; Griessbaum, Frank; Kathilankal, James; Xu, Liukang; Franz, Daniela; Joseph, Everette; Larmanou, Eric; Miller, Scott; Papale, Dario; Sabbatini, Simone; Sachs, Torsten; Sakai, Ricardo; McDermitt, Dayle

    2017-04-01

In recent years, spatial and temporal flux data coverage improved significantly, and on multiple scales, from a single station to continental networks, due to standardization, automation, and management of data collection, and better handling of the extensive amounts of generated data. With more stations and networks, larger data flows from each station, and smaller operating budgets, modern tools are required to handle the entire process effectively and efficiently. Such tools are needed to maximize the time dedicated to authoring publications and answering research questions, and to minimize the time and expense spent on data acquisition, processing, and quality control. Thus, these tools should produce standardized, verifiable datasets and provide a way to cross-share the standardized data with external collaborators to leverage available funding and promote data analyses and publications. LI-COR gas analyzers are widely used in past and present flux networks such as AmeriFlux, ICOS, AsiaFlux, OzFlux, NEON, CarboEurope, and FluxNet-Canada. These analyzers have gone through several major improvements over the past 30 years. However, in 2016, a three-prong development was completed to create an automated flux system which can accept multiple sonic anemometer and datalogger models, compute final and complete fluxes on-site, merge final fluxes with supporting weather, soil, and radiation data, monitor station outputs and send automated alerts to researchers, and allow secure sharing and cross-sharing of station and data access. Two types of these research systems were developed: open-path (LI-7500RS) and enclosed-path (LI-7200RS). Key developments included:
• Improvement of gas analyzer performance
• Standardization and automation of final flux calculations on-site, and in real time
• Seamless integration with the latest site management and data sharing tools
In terms of gas analyzer performance, the RS analyzers are based on the established LI-7500/A and LI-7200

  17. Prompt atmospheric neutrino flux

    CERN Document Server

    Jeong, Yu Seon; Enberg, Rikard; Kim, C S; Reno, Mary Hall; Sarcevic, Ina; Stasto, Anna

    2016-01-01

We evaluate the prompt atmospheric neutrino flux, including nuclear corrections and the $B$-hadron contribution, in two different frameworks: NLO perturbative QCD and dipole models. The nuclear effect is larger in the prompt neutrino flux than in the total charm production cross section, and it reduces the fluxes by $10\% - 30\%$ depending on the model. We also investigate the uncertainty using the QCD scales allowed by the charm cross section data from RHIC and LHC experiments.

  18. ArcHydro global datasets for Idaho StreamStats

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This dataset consists of a personal geodatabase containing several vector datasets. This database contains the information needed to link the HUCs together so a...

  19. Strontium removal jar test dataset for all figures and tables.

    Data.gov (United States)

U.S. Environmental Protection Agency — The datasets were used to generate data to demonstrate strontium removal under various water quality and treatment conditions. This dataset is associated with the...

  20. Overview of the 2013 FireFlux II grass fire field experiment

    Science.gov (United States)

    C.B. Clements; B. Davis; D. Seto; J. Contezac; A. Kochanski; J.-B. Fillipi; N. Lareau; B. Barboni; B. Butler; S. Krueger; R. Ottmar; R. Vihnanek; W.E. Heilman; J. Flynn; M.A. Jenkins; J. Mandel; C. Teske; D. Jimenez; J. O' Brien; B. Lefer

    2014-01-01

In order to better understand the dynamics of fire-atmosphere interactions and the role of micrometeorology in fire behaviour, the FireFlux campaign was conducted in 2006 on a coastal tall-grass prairie in southeast Texas, USA. The FireFlux campaign dataset has become the international standard for evaluating coupled fire-atmosphere model systems. While FireFlux is one...

  1. Statistics of large detrital geochronology datasets

    Science.gov (United States)

    Saylor, J. E.; Sundell, K. E., II

    2014-12-01

Implementation of quantitative metrics for inter-sample comparison of detrital geochronological data sets has lagged behind the increase in data set size and the ability to identify sub-populations and quantify their relative proportions. Visual comparison, or the application of statistical approaches such as the Kolmogorov-Smirnov (KS) test that initially appeared to provide a simple way of comparing detrital data sets, may be inadequate to quantify their similarity. We evaluate several proposed metrics by applying them to four large synthetic datasets drawn randomly from a parent dataset, as well as a recently published large empirical dataset consisting of four separate (n = ~1000 each) analyses of the same rock sample. Visual inspection of the cumulative distribution functions (CDF) and relative probability density functions (PDF) confirms an increasingly close correlation between data sets as the number of analyses increases. However, as data set size increases, the KS test yields lower mean p-values, implying greater confidence that the samples were not drawn from the same parent population, and high standard deviations, despite minor decreases in the mean difference between sample CDFs. We attribute this to the increasing sensitivity of the KS test when applied to larger data sets, which in turn limits its use for quantitative inter-sample comparison in detrital geochronology. Proposed alternative metrics, including Similarity, Likeness (complement to Mismatch), and the coefficient of determination (R2) of a cross-plot of PDF quantiles, point to an increasingly close correlation between data sets with increasing size, although they are most sensitive at different ranges of data set sizes. The Similarity test is most sensitive to variation in data sets with n < 100 and is relatively insensitive to further convergence between larger data sets. The Likeness test reaches 90% of its asymptotic maximum at data set sizes of n = 200. The PDF cross-plot R2 value
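The reported behavior of the KS test is easy to reproduce with synthetic data: when two samples differ only slightly in sub-population proportions, the p-value collapses as the number of analyses grows. A sketch (three hypothetical age peaks and illustrative proportions, not the paper's data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def ages(n, p_young):
    """Synthetic detrital ages (Ma): three age peaks with Gaussian scatter."""
    peaks = rng.choice([100.0, 500.0, 1100.0], size=n,
                       p=[p_young, 0.8 - p_young, 0.2])
    return peaks + rng.normal(0.0, 30.0, size=n)

# Two samples whose sub-population proportions differ only slightly (50% vs
# 47% young grains): the KS p-value collapses as the number of analyses grows.
for n in (100, 300, 1000, 3000):
    p = np.mean([stats.ks_2samp(ages(n, 0.50), ages(n, 0.47)).pvalue
                 for _ in range(100)])
    print("n = %4d: mean KS p-value = %.3f" % (n, p))
```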

  2. Evaluation of reanalysis datasets against observational soil temperature data over China

    Science.gov (United States)

    Yang, Kai; Zhang, Jingyong

    2017-03-01

Soil temperature is a key land surface variable and a potential predictor of seasonal climate anomalies and extremes. Using observational soil temperature data in China for 1981-2005, we evaluate four reanalysis datasets, the land surface reanalysis of the European Centre for Medium-Range Weather Forecasts (ERA-Interim/Land), the second modern-era retrospective analysis for research and applications (MERRA-2), the National Center for Environmental Prediction Climate Forecast System Reanalysis (NCEP-CFSR), and version 2 of the Global Land Data Assimilation System (GLDAS-2.0), with a focus on the 40 cm soil layer. The results show that the reanalysis data broadly reproduce the spatial distributions of soil temperature in summer and winter, especially over eastern China, but generally underestimate their magnitudes. Owing to the influence of precipitation on soil temperature, the four datasets perform better in winter than in summer. The ERA-Interim/Land and GLDAS-2.0 produce spatial characteristics of the climatological mean that are similar to observations. The interannual variability of soil temperature is well reproduced by the ERA-Interim/Land dataset in summer and by the CFSR dataset in winter. The linear trend of soil temperature in summer is well reproduced by the reanalysis datasets. We demonstrate that soil heat fluxes in April-June and in winter are highly correlated with soil temperature in summer and winter, respectively. Different estimations of surface energy balance components can contribute to different behaviors of the reanalysis products in estimating soil temperature. In addition, the reanalysis datasets largely reproduce the northwest-southeast gradient of soil temperature memory over China.

  3. Video Meteor Fluxes

    Science.gov (United States)

    Campbell-Brown, M. D.; Braid, D.

    2011-01-01

The flux of meteoroids, or number of meteoroids per unit area per unit time, is critical for calibrating models of meteoroid stream formation and for estimating the hazard to spacecraft from shower and sporadic meteors. Although observations of meteors in the millimetre to centimetre size range are common, flux measurements (particularly for sporadic meteors, which make up the majority of meteoroid flux) are less so. It is necessary to know the collecting area and collection time for a given set of observations, and to correct for observing biases and the sensitivity of the system. Previous measurements of sporadic fluxes are summarized in Figure 1; the values are given as a total number of meteoroids striking the Earth in one year down to a given limiting mass. The Grün et al. (1985) flux model is included in the figure for reference. Fluxes for sporadic meteoroids impacting the Earth have been calculated for objects in the centimeter size range using Super-Schmidt observations (Hawkins & Upton, 1958); this study used about 300 meteors, and used only the physical area of overlap of the cameras at 90 km to calculate the flux, corrected for the angular speed of meteors, since a large angular speed reduces the maximum brightness of the meteor on the film, and for radiant elevation, which takes into account the geometric reduction in flux when the meteors are not perpendicular to the horizontal. They bring up corrections for both partial trails (which tend to increase the collecting area) and incomplete overlap at heights other than 90 km (which tends to decrease it) as effects that will affect the flux, but estimated that the two effects cancelled one another. Halliday et al. (1984) calculated the flux of meteorite-dropping fireballs with fragment masses greater than 50 g, over the physical area of sky accessible to the MORP fireball cameras, counting only observations in clear weather. In the micron size range, LDEF measurements of small craters on spacecraft have been used to
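In its simplest form, the flux calculation described here divides the meteor count by the collecting area, the observation time, and a radiant-elevation correction. A deliberately simplified sketch (the numbers are illustrative, and the bias and sensitivity corrections discussed above are ignored):

```python
import numpy as np

def meteor_flux(n_meteors, area_km2, hours, radiant_elev_deg):
    """Simplified meteoroid flux (meteoroids km-2 h-1) to a limiting magnitude:
    count divided by collecting area and time, corrected for the geometric
    reduction when the radiant is not at the zenith."""
    correction = np.sin(np.radians(radiant_elev_deg))
    return n_meteors / (area_km2 * hours * correction)

# Illustrative numbers only: 120 meteors over a 10 000 km^2 camera-overlap
# area at 90 km height, 6 h of clear-sky observation, radiant at 40 degrees
print("%.2e meteoroids km^-2 h^-1" % meteor_flux(120, 1.0e4, 6.0, 40.0))
```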

  4. Controlled Vocabulary Standards for Anthropological Datasets

    Directory of Open Access Journals (Sweden)

    Celia Emmelhainz

    2014-07-01

    Full Text Available This article seeks to outline the use of controlled vocabulary standards for qualitative datasets in cultural anthropology, which are increasingly held in researcher-accessible government repositories and online digital libraries. As a humanistic science that can address almost any aspect of life with meaning to humans, cultural anthropology has proven difficult for librarians and archivists to effectively organize. Yet as anthropology moves onto the web, the challenge of organizing and curating information within the field only grows. In considering the subject classification of digital information in anthropology, I ask how we might best use controlled vocabularies for indexing digital anthropological data. After a brief discussion of likely concerns, I outline thesauri which may potentially be used for vocabulary control in metadata fields for language, location, culture, researcher, and subject. The article concludes with recommendations for those existing thesauri most suitable to provide a controlled vocabulary for describing digital objects in the anthropological world.

  5. Visualization of Cosmological Particle-Based Datasets

    CERN Document Server

    Navrátil, Paul Arthur; Bromm, Volker

    2007-01-01

    We describe our visualization process for a particle-based simulation of the formation of the first stars and their impact on cosmic history. The dataset consists of several hundred time-steps of point simulation data, with each time-step containing approximately two million point particles. For each time-step, we interpolate the point data onto a regular grid using a method taken from the radiance estimate of photon mapping. We import the resulting regular grid representation into ParaView, with which we extract isosurfaces across multiple variables. Our images provide insights into the evolution of the early universe, tracing the cosmic transition from an initially homogeneous state to one of increasing complexity. Specifically, our visualizations capture the build-up of regions of ionized gas around the first stars, their evolution, and their complex interactions with the surrounding matter. These observations will guide the upcoming James Webb Space Telescope, the key astronomy mission of the next decade.
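The gridding step described here, interpolating point particles onto a regular grid with a method taken from the photon-mapping radiance estimate, can be approximated with a k-nearest-neighbour density estimate. A sketch under that assumption (toy particle counts and grid size, not the authors' code):

```python
import numpy as np
from scipy.spatial import cKDTree

def particles_to_grid(positions, masses, grid_n=64, k=32):
    """Deposit particle masses onto a regular grid with a k-nearest-neighbour
    density estimate, in the spirit of the photon-mapping radiance estimate:
    density at a grid point = mass of the k nearest particles divided by the
    volume of the sphere that encloses them."""
    tree = cKDTree(positions)
    axes = [np.linspace(0.0, 1.0, grid_n)] * 3
    pts = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    dist, idx = tree.query(pts, k=k)
    r = dist[:, -1]                            # radius enclosing k neighbours
    rho = masses[idx].sum(axis=1) / (4.0 / 3.0 * np.pi * r ** 3)
    return rho.reshape(grid_n, grid_n, grid_n)

rng = np.random.default_rng(4)
pos = rng.random((200_000, 3))                 # toy stand-in for 2M particles
grid = particles_to_grid(pos, np.ones(len(pos)), grid_n=32)
print(grid.shape, grid.mean())
```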

  6. Predicting dataset popularity for the CMS experiment

    Science.gov (United States)

    Kuznetsov, V.; Li, T.; Giommi, L.; Bonacorsi, D.; Wildish, T.

    2016-10-01

The CMS experiment at the LHC accelerator at CERN relies on its computing infrastructure to stay at the frontier of High Energy Physics, searching for new phenomena and making discoveries. Even though computing plays a significant role in physics analysis, we rarely use its data to predict the behavior of the system itself. Basic information about computing resources, user activities and site utilization can be very useful for improving the throughput of the system and its management. In this paper, we discuss a first CMS analysis of dataset popularity based on CMS meta-data, which can be used as a model for dynamic data placement and provide the foundation of a data-driven approach for the CMS computing infrastructure.

  7. Internationally coordinated glacier monitoring: strategy and datasets

    Science.gov (United States)

    Hoelzle, Martin; Armstrong, Richard; Fetterer, Florence; Gärtner-Roer, Isabelle; Haeberli, Wilfried; Kääb, Andreas; Kargel, Jeff; Nussbaumer, Samuel; Paul, Frank; Raup, Bruce; Zemp, Michael

    2014-05-01

(c) the Randolph Glacier Inventory (RGI), a new and globally complete digital dataset of outlines from about 180,000 glaciers with some meta-information, which has been used for many applications relating to the IPCC AR5 report. Concerning glacier changes, a database (Fluctuations of Glaciers) exists containing information about mass balance, front variations including past reconstructed time series, geodetic changes and special events. Annual mass balance reporting contains information for about 125 glaciers, with a subset of 37 glaciers with continuous observational series since 1980 or earlier. Front variation observations of around 1800 glaciers are available from most of the mountain ranges world-wide. This database was recently updated with 26 glaciers having an unprecedented dataset of length changes from reconstructions of well-dated historical evidence going back as far as the 16th century. Geodetic observations of about 430 glaciers are available. The database is completed by a dataset containing information on special events, including glacier surges, glacier lake outbursts, ice avalanches, eruptions of ice-clad volcanoes, etc., related to about 200 glaciers. A special database of glacier photographs contains 13,000 pictures from around 500 glaciers, some of them dating back to the 19th century. A key challenge is to combine and extend the traditional observations with fast-evolving datasets from new technologies.

  8. Predicting dataset popularity for the CMS experiment

    CERN Document Server

Kuznetsov, V.; Li, Ting; Giommi, Luca; Bonacorsi, Daniele; Wildish, Tony

    2016-01-01

The CMS experiment at the LHC accelerator at CERN relies on its computing infrastructure to stay at the frontier of High Energy Physics, searching for new phenomena and making discoveries. Even though computing plays a significant role in physics analysis, we rarely use its data to predict the behavior of the system itself. Basic information about computing resources, user activities and site utilization can be very useful for improving the throughput of the system and its management. In this paper, we discuss a first CMS analysis of dataset popularity based on CMS meta-data, which can be used as a model for dynamic data placement and provide the foundation of a data-driven approach for the CMS computing infrastructure.

  9. BDML Datasets - SSBD | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

Full Text Available Data name: BDML Datasets (SSBD). DOI: 10.18908/lsdba.nbdc01349-001.

  10. SAGE Research Methods Datasets: A Data Analysis Educational Tool.

    Science.gov (United States)

    Vardell, Emily

    2016-01-01

SAGE Research Methods Datasets (SRMD) is an educational tool designed to offer users the opportunity to obtain hands-on experience with data analysis. Users can search for and browse authentic datasets by method, discipline, and data type. Each of the datasets is supplemented with educational material on the research method and clear guidelines for how to approach data analysis.

  11. A new bed elevation dataset for Greenland

    Science.gov (United States)

    Bamber, J. L.; Griggs, J. A.; Hurkmans, R. T. W. L.; Dowdeswell, J. A.; Gogineni, S. P.; Howat, I.; Mouginot, J.; Paden, J.; Palmer, S.; Rignot, E.; Steinhage, D.

    2013-03-01

    We present a new bed elevation dataset for Greenland derived from a combination of multiple airborne ice thickness surveys undertaken between the 1970s and 2012. Around 420 000 line kilometres of airborne data were used, with roughly 70% of this having been collected since the year 2000, when the last comprehensive compilation was undertaken. The airborne data were combined with satellite-derived elevations for non-glaciated terrain to produce a consistent bed digital elevation model (DEM) over the entire island including across the glaciated-ice free boundary. The DEM was extended to the continental margin with the aid of bathymetric data, primarily from a compilation for the Arctic. Ice thickness was determined where an ice shelf exists from a combination of surface elevation and radar soundings. The across-track spacing between flight lines warranted interpolation at 1 km postings for significant sectors of the ice sheet. Grids of ice surface elevation, error estimates for the DEM, ice thickness and data sampling density were also produced alongside a mask of land/ocean/grounded ice/floating ice. Errors in bed elevation range from a minimum of ±10 m to about ±300 m, as a function of distance from an observation and local topographic variability. A comparison with the compilation published in 2001 highlights the improvement in resolution afforded by the new datasets, particularly along the ice sheet margin, where ice velocity is highest and changes in ice dynamics most marked. We estimate that the volume of ice included in our land-ice mask would raise mean sea level by 7.36 m, excluding any solid earth effects that would take place during ice sheet decay.

  12. A new bed elevation dataset for Greenland

    Directory of Open Access Journals (Sweden)

    J. L. Bamber

    2013-03-01

Full Text Available We present a new bed elevation dataset for Greenland derived from a combination of multiple airborne ice thickness surveys undertaken between the 1970s and 2012. Around 420 000 line kilometres of airborne data were used, with roughly 70% of this having been collected since the year 2000, when the last comprehensive compilation was undertaken. The airborne data were combined with satellite-derived elevations for non-glaciated terrain to produce a consistent bed digital elevation model (DEM) over the entire island, including across the glaciated–ice free boundary. The DEM was extended to the continental margin with the aid of bathymetric data, primarily from a compilation for the Arctic. Ice thickness was determined where an ice shelf exists from a combination of surface elevation and radar soundings. The across-track spacing between flight lines warranted interpolation at 1 km postings for significant sectors of the ice sheet. Grids of ice surface elevation, error estimates for the DEM, ice thickness and data sampling density were also produced, alongside a mask of land/ocean/grounded ice/floating ice. Errors in bed elevation range from a minimum of ±10 m to about ±300 m, as a function of distance from an observation and local topographic variability. A comparison with the compilation published in 2001 highlights the improvement in resolution afforded by the new datasets, particularly along the ice sheet margin, where ice velocity is highest and changes in ice dynamics most marked. We estimate that the volume of ice included in our land-ice mask would raise mean sea level by 7.36 m, excluding any solid earth effects that would take place during ice sheet decay.

  13. Electron heat flux instability

    Science.gov (United States)

    Saeed, Sundas; Sarfraz, M.; Yoon, P. H.; Lazar, M.; Qureshi, M. N. S.

    2017-02-01

The heat flux instability is an electromagnetic mode excited by a relative drift between the protons and two-component core-halo electrons. The most prominent application may be in association with the solar wind, where drifting electron velocity distributions are observed. The heat flux instability is somewhat analogous to the electrostatic Buneman or ion-acoustic instability driven by the net drift between the protons and bulk electrons, except that the heat flux instability operates in magnetized plasmas and possesses transverse electromagnetic polarization. The heat flux instability is also distinct from its electrostatic counterpart in that it requires two electron species with relative drifts with respect to each other. In the literature, the heat flux instability is often called the 'whistler' heat flux instability, but it is actually polarized in the opposite sense to the whistler wave. This paper elucidates all of these fundamental plasma physical properties associated with the heat flux instability, starting from a simple model and gradually building up more complexity towards solar wind-like distribution functions. It is found that the essential properties of the instability are already present in the cold counter-streaming electron model, and that the instability is absent if the protons are ignored. These instability characteristics are highly reminiscent of the electron firehose instability driven by excessive parallel temperature anisotropy, propagating in the parallel direction with respect to the ambient magnetic field, except that the free energy source for the heat flux instability resides in the effective parallel pressure provided by the counter-streaming electrons.

  14. Surface Turbulent Fluxes, 1x1 deg Monthly Grid, Set1 and Interpolated Data V2c

    Data.gov (United States)

    National Aeronautics and Space Administration — These data are the Goddard Satellite-based Surface Turbulent Fluxes Version-2c Dataset recently produced through a MEaSUREs funded project led by Dr. Chung-Lin Shie...

  15. Hydrology Research with the North American Land Data Assimilation System (NLDAS) Datasets at the NASA GES DISC Using Giovanni

    Science.gov (United States)

    Mocko, David M.; Rui, Hualan; Acker, James G.

    2013-01-01

The North American Land Data Assimilation System (NLDAS) is a collaboration project between NASA/GSFC, NOAA, Princeton Univ., and the Univ. of Washington. NLDAS has created a surface meteorology dataset using the best available observations and reanalyses; the backbone of this dataset is a gridded precipitation analysis from rain gauges. This dataset is used to drive four separate land-surface models (LSMs) to produce datasets of soil moisture, snow, runoff, and surface fluxes. NLDAS datasets are available hourly and extend from Jan 1979 to near real-time with a typical 4-day lag. The datasets are available at 1/8th-degree resolution over CONUS and portions of Canada and Mexico from 25-53 North. The datasets have been extensively evaluated against observations and are also used as part of a drought monitor. NLDAS datasets are available from the NASA GES DISC and can be accessed via ftp, GDS, Mirador, and Giovanni. GES DISC news articles were published showing figures from the heat wave of 2011, Hurricane Irene, Tropical Storm Lee, and the low-snow winter of 2011-2012. For this presentation, Giovanni-generated figures using NLDAS data from the derecho across the U.S. Midwest and Mid-Atlantic will be presented. Also, similar figures will be presented from the landfall of Hurricane Isaac and the before-and-after drought conditions along the path of tropical moisture into the central states of the U.S. Updates on future products and datasets from the NLDAS project will also be introduced.

  16. The impact of different soil texture datasets on soil moisture and evapotranspiration simulated by CLM4

    Science.gov (United States)

    Yan, B.; Dickinson, R. E.

    2012-12-01

Evapotranspiration (ET) is both a moisture flux and an energy flux, and it has a substantial impact on climate. The Community Land Model Version 4 (CLM4) is a widely used land surface model that simulates moisture, energy and momentum exchange between land and atmosphere. However, ET from CLM4 suffers from relatively low accuracy, especially for ground evaporation. In the parameterization of CLM4, soil texture, by determining soil hydraulic properties, affects the evolution of soil moisture and consequently ET. The three components of ET in climate models can more readily be improved after an evaluation of the impact of the soil texture dataset on ET simulations. Besides the IGBP-DIS (International Geosphere-Biosphere Programme Data and Information System) dataset used in CLM4, two other US multi-layer soil particle content datasets, the Soil Database for the Conterminous United States (CONUS-SOIL) and the Global Soil Texture and Derived Water-Holding Capacities (Webb2000), are also used. The latter two show a consistent, substantial reduction of both sand and clay contents in the Mississippi River Basin. CLM4 is run offline over the US with the three different soil texture datasets (Control Run, CONUS SOIL and Webb2000). Comparisons of simulated soil moisture with NCEP (National Centers for Environmental Prediction) reanalysis data show higher agreement between CONUS SOIL and NCEP over the Mississippi River Basin. Compared with the Control Run, soil moisture from the other two runs increases in the western US and decreases in the eastern US, which produces a stronger west-east soil moisture gradient. The response of ET to soil moisture change differs between climate regimes. In the Mississippi River Basin, the change in ET is negligible even if soil moisture increases substantially. On the other hand, in the eastern US and the US Central Great Plains, ET is very sensitive to soil moisture during the warm seasons, with changes of up to 10 W/m2.

  17. Towards GERB Edition 2 TOA fluxes

    Science.gov (United States)

    Ipe, Alessandro; Baudrez, Edward; Clerbaux, Nicolas; Moreels, Johan; Urbain, Manon; Velazquez Blazquez, Almudena

    2016-04-01

The Geostationary Earth Radiation Budget (GERB) dataset currently covers more than 10 years from 2004, which makes it a unique record for the climate and numerical weather prediction communities through assimilation in various models and climate studies. Indeed, the geostationary platform of this broadband radiometer, flying together with the Spinning Enhanced Visible and InfraRed Imager (SEVIRI) on board the Meteosat Second Generation (MSG) satellites, allows TOA solar and thermal fluxes to be estimated every 15 minutes at spatial resolutions up to 10 km (nadir). In this contribution, we will discuss the improvements that were developed for the Edition 1 post-processing. These include terminator and sunglint modeling through scene identification extrapolation. Moreover, with the experience acquired by generating the Edition 1 dataset, as well as through its critical assessment, an improved Edition 2 of the processing is being implemented. This second version aims to fulfill climate data record standards. Such a goal will be achieved by improving the scene identification for the selection of solar angular dependency models (ADMs) and the solar and thermal narrow-to-broadband conversion schemes, as well as by including new thermal ADMs for radiance-to-flux conversion and GERB instrument ageing correction schemes.

  18. ASSESSING SMALL SAMPLE WAR-GAMING DATASETS

    Directory of Open Access Journals (Sweden)

    W. J. HURLEY

    2013-10-01

Full Text Available One of the fundamental problems faced by military planners is the assessment of changes to force structure. An example is whether to replace an existing capability with an enhanced system. This can be done directly with a comparison of measures such as accuracy, lethality, survivability, etc. However, this approach does not allow an assessment of the force-multiplier effects of the proposed change. To gauge these effects, planners often turn to war-gaming. For many war-gaming experiments, it is expensive, both in terms of time and dollars, to generate a large number of sample observations. This puts a premium on the statistical methodology used to examine these small datasets. In this paper we compare the power of three tests to assess population differences: the Wald-Wolfowitz test, the Mann-Whitney U test, and re-sampling. We employ a series of Monte Carlo simulation experiments. Not unexpectedly, we find that the Mann-Whitney test performs better than the Wald-Wolfowitz test. Re-sampling is judged to perform slightly better than the Mann-Whitney test.
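A minimal version of such a Monte Carlo power comparison can be sketched as follows, here restricted to the Mann-Whitney U test and a permutation-based re-sampling test on two small normal samples (the sample sizes and effect size are illustrative, not the paper's war-gaming data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

def power(test, n=8, shift=1.0, trials=500, alpha=0.05):
    """Fraction of small-sample experiments in which a real difference
    between two outcome populations is detected at level alpha."""
    hits = 0
    for _ in range(trials):
        a = rng.normal(0.0, 1.0, n)
        b = rng.normal(shift, 1.0, n)
        hits += test(a, b) < alpha
    return hits / trials

def mann_whitney(a, b):
    return stats.mannwhitneyu(a, b, alternative="two-sided").pvalue

def resampling(a, b, n_perm=999):
    """Permutation (re-sampling) test on the difference in means."""
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        count += abs(pooled[:len(a)].mean() - pooled[len(a):].mean()) >= observed
    return (count + 1) / (n_perm + 1)

print("Mann-Whitney power:", power(mann_whitney))
print("re-sampling power: ", power(resampling, trials=200))
```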

  19. Reconstructing thawing quintessence with multiple datasets

    CERN Document Server

    Lima, Nelson A; Sahlén, Martin; Parkinson, David

    2015-01-01

In this work we model the quintessence potential in a Taylor series expansion, up to second order, around the present-day value of the scalar field. The field is evolved in a thawing regime assuming zero initial velocity. We use the latest data from the Planck satellite, baryonic acoustic oscillation observations from the Sloan Digital Sky Survey, and Supernovae luminosity distance information from Union2.1 to constrain our model's parameters, and also include perturbation growth data from WiggleZ. We show explicitly that the growth data do not perform as well as the other datasets in constraining the dark energy parameters we introduce. We also show that the constraints we obtain for our model parameters, when compared to previous works of nearly a decade ago, have not improved significantly. This is indicative of how little dark energy constraints, overall, have improved in the last decade, even when new growth-of-structure data are added to previously existing types of data.

  20. Workflow to numerically reproduce laboratory ultrasonic datasets

    Institute of Scientific and Technical Information of China (English)

    A. Biryukov; N. Tisato; G. Grasselli

    2014-01-01

The risks and uncertainties related to the storage of high-level radioactive waste (HLRW) can be reduced thanks to focused studies and investigations. HLRW is going to be placed in deep geological repositories, enveloped in an engineered bentonite barrier, whose physical conditions are subject to change throughout the lifespan of the infrastructure. Seismic tomography can be employed to monitor its physical state and integrity. The design of the seismic monitoring system can be optimized by conducting and analyzing numerical simulations of wave propagation in a representative repository geometry. However, the quality of the numerical results relies on their initial calibration. The main aim of this paper is to provide a workflow to calibrate numerical tools employing laboratory ultrasonic datasets. The finite difference code SOFI2D was employed to model ultrasonic waves propagating through a laboratory sample. Specifically, the input velocity model was calibrated to achieve the best match between experimental and numerical ultrasonic traces. Likely due to the imperfections of the contact surfaces, the resultant velocities of P- and S-wave propagation tend to be noticeably lower than those a priori assigned. The calibrated model was then employed to estimate the attenuation in a montmorillonite sample. The obtained low quality factors (Q) suggest that the pronounced inelastic behavior of the clay has to be taken into account in geophysical modeling and analysis. Consequently, this contribution should be considered a first step towards the creation of a numerical tool to evaluate wave propagation in nuclear waste repositories.
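The calibration idea, adjusting the input velocity model until the numerical traces best match the laboratory ones, reduces to minimizing a trace misfit. A toy sketch in which a placeholder forward model stands in for the SOFI2D finite-difference solver (the sample length, velocities, and pulse shape are all illustrative assumptions):

```python
import numpy as np

# Toy stand-in for the forward solver (SOFI2D in the paper): the velocity
# model only controls the first-arrival time of a pulse through a 50 mm sample.
def forward(velocity_m_s, t):
    arrival = 0.05 / velocity_m_s                   # 50 mm propagation path
    return np.exp(-((t - arrival) / 2e-6) ** 2)     # Gaussian pulse "trace"

t = np.linspace(0.0, 60e-6, 2000)                   # 60 microsecond window
observed = forward(2300.0, t)                       # "laboratory" trace

# Calibration: choose the model velocity minimising the L2 trace misfit
candidates = np.linspace(1800.0, 2800.0, 201)
misfits = [np.sum((forward(v, t) - observed) ** 2) for v in candidates]
best = candidates[int(np.argmin(misfits))]
print("calibrated P-wave velocity: %.0f m/s" % best)
```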

  1. Classification of antimicrobial peptides with imbalanced datasets

    Science.gov (United States)

    Camacho, Francy L.; Torres, Rodrigo; Ramos Pollán, Raúl

    2015-12-01

In recent years, pattern recognition has been applied in several fields to solve multiple problems in science and technology, for example protein prediction. This methodology can be useful for predicting the activity of biological molecules, e.g. for determination of the antimicrobial activity of synthetic and natural peptides. In this work, we evaluate the performance of different physico-chemical properties of peptides (descriptor groups) in the presence of imbalanced datasets, when facing the task of detecting whether a peptide has antimicrobial activity. We evaluate undersampling and class-weighting techniques to deal with the class imbalance, with different classification methods and descriptor groups. Our classification model showed an estimated precision of 96%, showing that the descriptors used to codify the amino acid sequences contain enough information to correlate the peptide sequences with their antimicrobial activity by means of machine learning. Moreover, we show how certain descriptor groups (pseudo-amino acid composition type I) work better with imbalanced datasets while others (dipeptide composition) work better with balanced ones.
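Class weighting, one of the two imbalance remedies evaluated, is a one-line change in most learning libraries. A sketch with synthetic descriptors at a 1:10 class ratio (scikit-learn is shown for illustration; this is not the authors' exact pipeline or descriptor set):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for peptide descriptors: 1 active per 10 inactive peptides
rng = np.random.default_rng(6)
n_pos, n_neg, n_feat = 200, 2000, 20
X = np.vstack([rng.normal(0.6, 1.0, (n_pos, n_feat)),
               rng.normal(0.0, 1.0, (n_neg, n_feat))])
y = np.array([1] * n_pos + [0] * n_neg)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for weights in (None, "balanced"):   # class weighting to counter imbalance
    clf = LogisticRegression(max_iter=1000, class_weight=weights)
    clf.fit(X_tr, y_tr)
    acc = balanced_accuracy_score(y_te, clf.predict(X_te))
    print("class_weight=%-9s balanced accuracy = %.3f" % (weights, acc))
```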

  2. Net Ecosystem Carbon Flux

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — Net Ecosystem Carbon Flux is defined as the year-over-year change in Total Ecosystem Carbon Stock, or the net rate of carbon exchange between an ecosystem and the...

  3. Aeronet Solar Flux

    Data.gov (United States)

    National Aeronautics and Space Administration — SolRad-Net (Solar Radiation Network) is an established network of ground-based sensors providing high-frequency solar flux measurements in quasi-realtime to the...

  4. Flux in Tallinn

    Index Scriptorium Estoniae

    2004-01-01

Club night "Flux in Tallinn" of the international electronic art symposium ISEA2004 at the club Bon Bon. Estonia was represented by Ropotator, Ars Intel Inc., Urmas Puhkan, Joel Tammik, and Taavi Tulev (pseud. Wochtzchee). Club night coordinator: Andres Lõo

  5. Flux in Tallinn

    Index Scriptorium Estoniae

    2004-01-01

Club night "Flux in Tallinn" of the international electronic art symposium ISEA2004 at the club Bon Bon. Estonia was represented by Ropotator, Ars Intel Inc., Urmas Puhkan, Joel Tammik, and Taavi Tulev (pseud. Wochtzchee). Club night coordinator: Andres Lõo

  6. Nitrous Oxide Flux

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — Nitrous Oxide (N20) flux is the net rate of nitrous oxide exchange between an ecosystem and the atmosphere. Data of this variable were generated by the USGS...

  7. Carbon Dioxide Flux Measurement Systems

    Data.gov (United States)

    Oak Ridge National Laboratory — The Southern Great Plains (SGP) carbon dioxide flux (CO2 flux) measurement systems provide half-hour average fluxes of CO2, H2O (latent heat), and sensible heat. The...

  8. Flux Emergence (Theory)

    Science.gov (United States)

    Cheung, Mark C. M.; Isobe, Hiroaki

    2014-12-01

Magnetic flux emergence from the solar convection zone into the overlying atmosphere is the driver of a diverse range of phenomena associated with solar activity. In this article, we introduce theoretical concepts central to the study of flux emergence and discuss how the inclusion of different physical effects (e.g., magnetic buoyancy, magnetoconvection, reconnection, magnetic twist, interaction with ambient field) in models impacts the evolution of the emerging field and plasma.

  9. Theoretical magnetic flux emergence

    OpenAIRE

    MacTaggart, David

    2011-01-01

    Magnetic flux emergence is the subject of how magnetic fields from the solar interior can rise and expand into the atmosphere to produce active regions. It is the link that joins dynamics in the convection zone with dynamics in the atmosphere. In this thesis, we study many aspects of magnetic flux emergence through mathematical modelling and computer simulations. Our primary aim is to understand the key physical processes that lie behind emergence. The first chapter intro...

  10. Flux Emergence (Theory)

    Directory of Open Access Journals (Sweden)

    Mark C. M. Cheung

    2014-07-01

Full Text Available Magnetic flux emergence from the solar convection zone into the overlying atmosphere is the driver of a diverse range of phenomena associated with solar activity. In this article, we introduce theoretical concepts central to the study of flux emergence and discuss how the inclusion of different physical effects (e.g., magnetic buoyancy, magnetoconvection, reconnection, magnetic twist, interaction with ambient field) in models impacts the evolution of the emerging field and plasma.

  11. A reference GNSS tropospheric dataset over Europe.

    Science.gov (United States)

    Pacione, Rosa; Di Tomaso, Simona

    2016-04-01

The present availability of 18 years of GNSS data belonging to the European Permanent Network (EPN, http://www.epncb.oma.be/) is a valuable database for the development of a climate data record of GNSS tropospheric products over Europe. This dataset has high potential for monitoring trends and variability in atmospheric water vapour, improving the knowledge of climatic trends of atmospheric water vapour, and being useful for global and regional NWP reanalyses as well as climate model simulations. In the framework of EPN-Repro2, a second reprocessing campaign of the EPN, five Analysis Centres have homogeneously reprocessed the EPN network for 1996-2013. Three Analysis Centres are providing homogeneously reprocessed solutions for the entire network, analyzed with three different software packages: Bernese, GAMIT and GIPSY-OASIS. Smaller subnetworks based on Bernese 5.2 are also provided. A huge effort is made to provide solutions that are the basis for deriving new coordinates, velocities and troposphere parameters, Zenith Tropospheric Delays and Horizontal Gradients, for the entire EPN. These individual contributions are combined in order to provide the official EPN reprocessed products. A preliminary tropospheric combined solution for the period 1996-2013 has been carried out. It is based on all the available homogeneously reprocessed solutions and offers the possibility to assess each of them prior to the ongoing final combination. We will present the results of the EPN-Repro2 tropospheric combined products and how the climate community will benefit from them. Acknowledgment: The EPN-Repro2 working group is acknowledged for providing the EPN solutions used in this work. E-GEOS activity is carried out in the framework of ASI contract 2015-050-R.0.

  12. A new bed elevation dataset for Greenland

    Directory of Open Access Journals (Sweden)

    J. A. Griggs

    2012-11-01

Full Text Available We present a new bed elevation dataset for Greenland derived from a combination of multiple airborne ice thickness surveys undertaken between the 1970s and 2011. Around 344 000 line kilometres of airborne data were used, with the majority of this having been collected since the year 2000, when the last comprehensive compilation was undertaken. The airborne data were combined with satellite-derived elevations for non-glaciated terrain to produce a consistent bed digital elevation model (DEM) over the entire island, including across the glaciated/ice free boundary. The DEM was extended to the continental margin with the aid of bathymetric data, primarily from a compilation for the Arctic. Ice shelf thickness was determined where a floating tongue exists, in particular in the north. The across-track spacing between flight lines warranted interpolation at 1 km postings near the ice sheet margin and 2.5 km in the interior. Grids of ice surface elevation, error estimates for the DEM, ice thickness and data sampling density were also produced, alongside a mask of land/ocean/grounded ice/floating ice. Errors in bed elevation range from a minimum of ±6 m to about ±200 m, as a function of distance from an observation and local topographic variability. A comparison with the compilation published in 2001 highlights the improvement in resolution afforded by the new data sets, particularly along the ice sheet margin, where ice velocity is highest and changes most marked. We use the new bed and surface DEMs to calculate the hydraulic potential for subglacial flow and present the large-scale pattern of water routing. We estimate that the volume of ice included in our land/ice mask would raise eustatic sea level by 7.36 m, excluding any solid earth effects that would take place during ice sheet decay.

  13. Climate Model Evaluation using New Datasets from the Clouds and the Earth's Radiant Energy System (CERES)

    Science.gov (United States)

    Loeb, Norman G.; Wielicki, Bruce A.; Doelling, David R.

    2008-01-01

    There are some in the science community who believe that the response of the climate system to anthropogenic radiative forcing is unpredictable and that we should therefore "call off the quest". The key limitation in climate predictability is associated with cloud feedback. Narrowing the uncertainty in cloud feedback (and therefore climate sensitivity) requires optimal use of the best available observations to evaluate and improve climate model processes and constrain climate model simulations over longer time scales. The Clouds and the Earth's Radiant Energy System (CERES) is a satellite-based program that provides global cloud, aerosol and radiative flux observations for improving our understanding of cloud-aerosol-radiation feedbacks in the Earth's climate system. CERES is the successor to the Earth Radiation Budget Experiment (ERBE), which has been widely used to evaluate climate models both at short time scales (e.g., process studies) and at decadal time scales. A CERES instrument flew on the TRMM satellite and captured the dramatic 1998 El Niño, and four other CERES instruments are currently flying aboard the Terra and Aqua platforms. Plans are underway to fly the remaining copy of CERES on the upcoming NPP spacecraft (mid-2010 launch date). Every aspect of CERES represents a significant improvement over ERBE. While both CERES and ERBE measure broadband radiation, CERES calibration is a factor of 2 better than ERBE's. In order to improve the characterization of clouds and aerosols within a CERES footprint, we use coincident higher-resolution imager observations (VIRS, MODIS or VIIRS) to provide a consistent cloud-aerosol-radiation dataset at climate accuracy. Improved radiative fluxes are obtained by using new CERES-derived Angular Distribution Models (ADMs) for converting measured radiances to fluxes. CERES radiative fluxes are a factor of 2 more accurate than ERBE's overall, but the improvement by cloud type and at high latitudes can be as high as a factor of 5.

  14. Application of Huang-Hilbert Transforms to Geophysical Datasets

    Science.gov (United States)

    Duffy, Dean G.

    2003-01-01

    The Huang-Hilbert transform is a promising new method for analyzing nonstationary and nonlinear datasets. In this talk I will apply this technique to several important geophysical datasets. To understand the strengths and weaknesses of this method, multi-year, hourly datasets of sea level heights and solar radiation will be analyzed. Then we will apply this transform to the analysis of gravity waves observed in a mesoscale observational net.
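
    To make the workflow concrete, the sketch below illustrates the two stages of the Huang-Hilbert (Hilbert-Huang) analysis the abstract describes: empirical mode decomposition into intrinsic mode functions, followed by a Hilbert transform of each mode. The PyEMD package and the synthetic "sea level" signal are illustrative assumptions, not material from the talk.

        import numpy as np
        from scipy.signal import hilbert
        from PyEMD import EMD  # assumed third-party package for empirical mode decomposition

        # Synthetic hourly record: a tide-like oscillation plus a trend and noise
        t = np.arange(0, 24 * 365) / 24.0                      # one year, hourly, in days
        signal = np.sin(2 * np.pi * t / 0.5175) + 0.1 * t + 0.2 * np.random.randn(t.size)

        # Stage 1: empirical mode decomposition into intrinsic mode functions (IMFs)
        imfs = EMD().emd(signal)

        # Stage 2: Hilbert transform of each IMF gives instantaneous amplitude and frequency
        dt = t[1] - t[0]
        for k, imf in enumerate(imfs):
            analytic = hilbert(imf)
            amplitude = np.abs(analytic)
            phase = np.unwrap(np.angle(analytic))
            inst_freq = np.diff(phase) / (2.0 * np.pi * dt)    # cycles per day
            print(f"IMF {k}: mean amplitude {amplitude.mean():.3f}, "
                  f"mean frequency {inst_freq.mean():.3f} cpd")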

  15. Norwegian Hydrological Reference Dataset for Climate Change Studies

    Energy Technology Data Exchange (ETDEWEB)

    Magnussen, Inger Helene; Killingland, Magnus; Spilde, Dag

    2012-07-01

    Based on the Norwegian hydrological measurement network, NVE has selected a Hydrological Reference Dataset for studies of hydrological change. The dataset meets international standards with high data quality. It is suitable for monitoring and studying the effects of climate change on the hydrosphere and cryosphere in Norway. The dataset includes streamflow, groundwater, snow, glacier mass balance and length change, lake ice and water temperature in rivers and lakes. (Author)

  16. Pgu-Face: A dataset of partially covered facial images

    Directory of Open Access Journals (Sweden)

    Seyed Reza Salari

    2016-12-01

    In this article we introduce a human face image dataset. Images were taken in close-to-real-world conditions using several cameras, often mobile phone cameras. The dataset contains 224 subjects imaged under four different configurations (a nearly clean-shaven countenance, a nearly clean-shaven countenance with sunglasses, an unshaven or stubble countenance, and an unshaven or stubble countenance with sunglasses) in up to two recording sessions. The existence of partially covered face images in this dataset can reveal the robustness and efficiency of several facial image processing algorithms. In this work we present the dataset and explain the recording method.

  17. Compression method based on training dataset of SVM

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    A method to compress the training dataset of a Support Vector Machine (SVM), based on the characteristics of the SVM, is proposed. First, the distances between the samples in the two training datasets are computed, and then the samples that lie far from the hyperplane are discarded in order to compress the training dataset. The time spent training the SVM with the compressed dataset is shortened markedly. Experimental results show that the algorithm is effective.
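
    The abstract is terse about the discarding criterion; one plausible reading (train a provisional SVM, keep only samples near its hyperplane, retrain on the reduced set) can be sketched as follows. The scikit-learn API and the distance threshold are illustrative assumptions, not the paper's algorithm.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.svm import SVC

        X, y = make_classification(n_samples=5000, n_features=10, random_state=0)

        # Provisional SVM on the full training set (or a subsample for very large data)
        svm = SVC(kernel="linear").fit(X, y)

        # Unsigned distance of each sample to the provisional hyperplane
        margin = np.abs(svm.decision_function(X))

        # Keep only samples near the hyperplane; distant samples are unlikely support vectors
        keep = margin < 1.5                      # assumed threshold
        X_small, y_small = X[keep], y[keep]

        # Retraining on the compressed set is much faster, with little change in the model
        svm_small = SVC(kernel="linear").fit(X_small, y_small)
        print(f"kept {keep.mean():.1%} of samples; "
              f"prediction agreement: {(svm_small.predict(X) == svm.predict(X)).mean():.3f}")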

  18. Providing Geographic Datasets as Linked Data in Sdi

    Science.gov (United States)

    Hietanen, E.; Lehto, L.; Latvala, P.

    2016-06-01

    In this study, a prototype service to provide data from a Web Feature Service (WFS) as linked data is implemented. At first, persistent and unique Uniform Resource Identifiers (URI) are created for all spatial objects in the dataset. The objects are available from those URIs in Resource Description Framework (RDF) data format. Next, a Web Ontology Language (OWL) ontology is created to describe the dataset information content using the Open Geospatial Consortium's (OGC) GeoSPARQL vocabulary. The existing data model is modified in order to take into account the linked data principles. The implemented service produces an HTTP response dynamically. The data for the response is first fetched from the existing WFS. Then the Geographic Markup Language (GML) format output of the WFS is transformed on-the-fly to the RDF format. Content Negotiation is used to serve the data in different RDF serialization formats. This solution facilitates the use of a dataset in different applications without replicating the whole dataset. In addition, individual spatial objects in the dataset can be referenced with URIs. Furthermore, the needed information content of the objects can be easily extracted from the RDF serializations available from those URIs. A solution for linking data objects to the dataset URI is also introduced by using the Vocabulary of Interlinked Datasets (VoID). The dataset is divided into subsets, and each subset is given its own persistent and unique URI. This enables the whole dataset to be explored with a web browser and all individual objects to be indexed by search engines.
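
    A minimal sketch of the content-negotiation step, assuming Flask and rdflib (the feature URI scheme is hypothetical, and the WFS query plus the GML-to-RDF mapping are stubbed out, since the service's actual transformation rules are not given in the abstract):

        from flask import Flask, Response, request
        from rdflib import Graph, Literal, Namespace, URIRef

        app = Flask(__name__)
        GEO = Namespace("http://www.opengis.net/ont/geosparql#")

        def fetch_feature_as_graph(feature_id: str) -> Graph:
            # Stub: the real service would query the WFS and map GML to RDF here
            g = Graph()
            uri = URIRef(f"http://example.org/feature/{feature_id}")  # assumed URI scheme
            g.add((uri, GEO.asWKT, Literal("POINT(24.94 60.17)", datatype=GEO.wktLiteral)))
            return g

        # Preferred MIME types mapped to rdflib serialization formats
        FORMATS = {"text/turtle": "turtle", "application/rdf+xml": "xml"}

        @app.route("/feature/<feature_id>")
        def feature(feature_id):
            # Content Negotiation: pick the serialization from the Accept header
            mime = request.accept_mimetypes.best_match(list(FORMATS)) or "text/turtle"
            data = fetch_feature_as_graph(feature_id).serialize(format=FORMATS[mime])
            return Response(data, mimetype=mime)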

  19. Synthetic neuronal datasets for benchmarking directed functional connectivity metrics

    National Research Council Canada - National Science Library

    Rodrigues, João; Andrade, Alexandre

    2015-01-01

    Background. Datasets consisting of synthetic neural data generated with quantifiable and controlled parameters are a valuable asset in the process of testing and validating directed functional connectivity metrics...

  20. BIA Indian Lands Dataset (Indian Lands of the United States)

    Data.gov (United States)

    Federal Geographic Data Committee — The American Indian Reservations / Federally Recognized Tribal Entities dataset depicts feature location, selected demographics and other associated data for the 561...

  1. Mechanistic analysis of multi-omics datasets to generate kinetic parameters for constraint-based metabolic models

    Directory of Open Access Journals (Sweden)

    Cotten Cameron

    2013-01-01

    Background: Constraint-based modeling uses mass balances, flux capacity, and reaction directionality constraints to predict fluxes through metabolism. Although transcriptional regulation and thermodynamic constraints have been integrated into constraint-based modeling, kinetic rate laws have not been extensively used. Results: In this study, an in vivo kinetic parameter estimation problem was formulated and solved using multi-omic data sets for Escherichia coli. To narrow the confidence intervals for kinetic parameters, a series of kinetic model simplifications were made, resulting in fewer kinetic parameters than the full kinetic model. These new parameter values are able to account for flux and concentration data from 20 different experimental conditions used in our training dataset. Concentration estimates from the simplified kinetic model were within one standard deviation for 92.7% of the 790 experimental measurements in the training set. Gibbs free energy changes of reaction were calculated to identify reactions that were often operating close to or far from equilibrium. In addition, enzymes whose activities were positively or negatively influenced by metabolite concentrations were also identified. The kinetic model was then used to calculate the maximum and minimum possible flux values for individual reactions from independent metabolite and enzyme concentration data that were not used to estimate parameter values. Incorporating these kinetically-derived flux limits into the constraint-based metabolic model improved predictions for uptake and secretion rates and intracellular fluxes in constraint-based models of central metabolism. Conclusions: This study has produced a method for in vivo kinetic parameter estimation and identified strategies and outcomes of kinetic model simplification. We also have illustrated how kinetic constraints can be used to improve constraint-based model predictions for intracellular fluxes and biomass

  2. Mechanistic analysis of multi-omics datasets to generate kinetic parameters for constraint-based metabolic models.

    Science.gov (United States)

    Cotten, Cameron; Reed, Jennifer L

    2013-01-30

    Constraint-based modeling uses mass balances, flux capacity, and reaction directionality constraints to predict fluxes through metabolism. Although transcriptional regulation and thermodynamic constraints have been integrated into constraint-based modeling, kinetic rate laws have not been extensively used. In this study, an in vivo kinetic parameter estimation problem was formulated and solved using multi-omic data sets for Escherichia coli. To narrow the confidence intervals for kinetic parameters, a series of kinetic model simplifications were made, resulting in fewer kinetic parameters than the full kinetic model. These new parameter values are able to account for flux and concentration data from 20 different experimental conditions used in our training dataset. Concentration estimates from the simplified kinetic model were within one standard deviation for 92.7% of the 790 experimental measurements in the training set. Gibbs free energy changes of reaction were calculated to identify reactions that were often operating close to or far from equilibrium. In addition, enzymes whose activities were positively or negatively influenced by metabolite concentrations were also identified. The kinetic model was then used to calculate the maximum and minimum possible flux values for individual reactions from independent metabolite and enzyme concentration data that were not used to estimate parameter values. Incorporating these kinetically-derived flux limits into the constraint-based metabolic model improved predictions for uptake and secretion rates and intracellular fluxes in constraint-based models of central metabolism. This study has produced a method for in vivo kinetic parameter estimation and identified strategies and outcomes of kinetic model simplification. We also have illustrated how kinetic constraints can be used to improve constraint-based model predictions for intracellular fluxes and biomass yield and identify potential metabolic limitations through the
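
    As a concrete illustration of the final step (imposing kinetically derived flux limits as extra bounds in a constraint-based model), here is a minimal flux-balance sketch. The toy stoichiometry and the tightened bound are invented for illustration and are not the paper's E. coli model.

        import numpy as np
        from scipy.optimize import linprog

        # Toy network: A_ext -> A -> B -> biomass, carried by reactions v1, v2, v3
        S = np.array([[ 1, -1,  0],    # metabolite A
                      [ 0,  1, -1]])   # metabolite B

        # Default bounds 0 <= v <= 10; kinetics tighten v2 to [0.5, 3.0] (assumed values)
        bounds = [(0, 10), (0.5, 3.0), (0, 10)]

        # Maximize the biomass flux v3 subject to steady state S v = 0
        res = linprog(c=[0, 0, -1], A_eq=S, b_eq=np.zeros(2), bounds=bounds,
                      method="highs")
        print("optimal fluxes:", res.x)  # the kinetic bound on v2 caps biomass flux at 3.0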

  3. Public Availability to ECS Collected Datasets

    Science.gov (United States)

    Henderson, J. F.; Warnken, R.; McLean, S. J.; Lim, E.; Varner, J. D.

    2013-12-01

    Coastal nations have spent considerable resources exploring the limits of their extended continental shelf (ECS) beyond 200 nm. Although these studies are funded to fulfill requirements of the UN Convention on the Law of the Sea, the investments are producing new datasets in frontier areas of Earth's oceans that will be used to understand, explore, and manage the seafloor and sub-seafloor for decades to come. Although many of these datasets are considered proprietary until a nation's potential ECS has become 'final and binding', an increasing amount of data are being released and utilized by the public. Datasets include multibeam, seismic reflection/refraction, bottom sampling, and geophysical data. The U.S. ECS Project, a multi-agency collaboration whose mission is to establish the full extent of the continental shelf of the United States consistent with international law, relies heavily on data and accurate, standard metadata. The United States has made it a priority to make all data collected with ECS funding available to the public as quickly as possible. The National Oceanic and Atmospheric Administration's (NOAA) National Geophysical Data Center (NGDC) supports this objective by partnering with academia and other federal government mapping agencies to archive, inventory, and deliver marine mapping data in a coordinated, consistent manner. This includes ensuring quality, standard metadata and developing and maintaining data delivery capabilities built on modern digital data archives. Other countries, such as Ireland, have submitted their ECS data for public availability and many others have pledged to participate in the future. The data services provided by NGDC support the U.S. ECS effort as well as many developing nations' ECS efforts through the U.N. Environmental Program. Modern discovery, visualization, and delivery of scientific data and derived products that span national and international sources of data ensure the greatest re-use of data and

  4. The Flux-Flux Correlation Function for Anharmonic Barriers

    CERN Document Server

    Goussev, Arseni; Waalkens, Holger; Wiggins, Stephen

    2010-01-01

    The flux-flux correlation function formalism is a standard and widely used approach for the computation of reaction rates. In this paper we introduce a method to compute the classical and quantum flux-flux correlation functions for anharmonic barriers essentially analytically through the use of the classical and quantum normal forms. In the quantum case we show that the quantum normal form reduces the computation of the flux-flux correlation function to that of an effective one-dimensional anharmonic barrier. The example of the computation of the quantum flux-flux correlation function for a fourth-order anharmonic barrier is worked out in detail, and we present an analytical expression for the quantum mechanical microcanonical flux-flux correlation function. We then give a discussion of the short-time and harmonic limits.
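
    For orientation, the quantum flux-flux correlation function that this formalism builds on is usually written in the Miller-Schwartz-Tromp form (quoted from the general rate-theory literature, not from this paper's normal-form result):

        C_{ff}(t) = \mathrm{tr}\left[ \hat{F} e^{i\hat{H}t_c^{*}/\hbar} \hat{F} e^{-i\hat{H}t_c/\hbar} \right], \qquad t_c = t - i\hbar\beta/2,

    with the thermal rate following from k(T) Q_r(T) = \int_0^{\infty} C_{ff}(t)\,dt, where \hat{F} is the flux operator and Q_r(T) the reactant partition function.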

  5. Flux pinning in superconductors

    CERN Document Server

    Matsushita, Teruo

    2014-01-01

    The book covers the flux pinning mechanisms and properties and the electromagnetic phenomena caused by flux pinning, common to metallic, high-Tc and MgB2 superconductors. The condensation energy interaction known for normal precipitates or grain boundaries and the kinetic energy interaction proposed for artificial Nb pins in Nb-Ti, etc., are introduced as pinning mechanisms. Summation theories to derive the critical current density are discussed in detail. Irreversible magnetization and AC loss caused by flux pinning are also discussed. The loss originally stems from the ohmic dissipation of normal electrons in the normal core, driven by the electric field induced by the flux motion. The readers will learn why the resultant loss is of hysteresis type in spite of such a mechanism. The influence of flux pinning on the vortex phase diagram in high-Tc superconductors is discussed, and the dependence of the irreversibility field on other quantities such as anisotropy of supercondu...

  6. Flux Pinning in Superconductors

    CERN Document Server

    Matsushita, Teruo

    2007-01-01

    The book covers the flux pinning mechanisms and properties and the electromagnetic phenomena caused by flux pinning, common to metallic, high-Tc and MgB2 superconductors. The condensation energy interaction known for normal precipitates or grain boundaries and the kinetic energy interaction proposed for artificial Nb pins in Nb-Ti, etc., are introduced as pinning mechanisms. Summation theories to derive the critical current density are discussed in detail. Irreversible magnetization and AC loss caused by flux pinning are also discussed. The loss originally stems from the ohmic dissipation of normal electrons in the normal core, driven by the electric field induced by the flux motion. The readers will learn why the resultant loss is of hysteresis type in spite of such a mechanism. The influence of flux pinning on the vortex phase diagram in high-Tc superconductors is discussed, and the dependence of the irreversibility field on other quantities such as anisotropy of supercondu...

  7. Quality control of CarboEurope flux data - Part 2: Inter-comparison of eddy-covariance software

    NARCIS (Netherlands)

    Mauder, M.; Foken, T.; Clement, R.; Elbers, J.A.; Eugster, W.; Grunwald, T.; Heusinkveld, B.G.; Kolle, O.

    2008-01-01

    As part of the quality assurance and quality control activities within the CarboEurope-IP network, a comparison of eddy-covariance software was conducted. For four five-day datasets, CO2 flux estimates were calculated by seven commonly used software packages to assess the uncertainty of CO2 flux est

  8. Development of a Global Historic Monthly Mean Precipitation Dataset

    Institute of Scientific and Technical Information of China (English)

    杨溯; 徐文慧; 许艳; 李庆祥

    2016-01-01

    A global historic precipitation dataset is the basis for climate and water cycle research. There have been several global historic land surface precipitation datasets developed by international data centers such as the US National Climatic Data Center (NCDC), the European Climate Assessment & Dataset project team, the Met Office, etc., but so far there are no such datasets developed by any research institute in China. In addition, each dataset has its own focus of study region, and the existing global precipitation datasets only contain sparse observational stations over China, which may result in uncertainties in East Asian precipitation studies. In order to take into account comprehensive historic information, users might need to employ two or more datasets. However, the non-uniform data formats, data units, station IDs, and so on add extra difficulties for users to exploit these datasets. For this reason, a complete historic precipitation dataset that takes advantage of various datasets has been developed and produced in the National Meteorological Information Center of China. Precipitation observations from 12 sources are aggregated, and the data formats, data units, and station IDs are unified. Duplicated stations with the same ID are identified, with duplicated observations removed. A consistency test, correlation coefficient test, significance t-test at the 95% confidence level, and significance F-test at the 95% confidence level are conducted first to ensure data reliability. Only those datasets that satisfy all four criteria are integrated to produce the China Meteorological Administration global precipitation (CGP) historic precipitation dataset version 1.0. It contains observations at 31 thousand stations with 1.87 × 10^7 data records, among which 4152 time series of precipitation are longer than 100 yr. This dataset plays a critical role in climate research due to its advantages in large data volume and high density of station network, compared to

  9. Development of a global historic monthly mean precipitation dataset

    Science.gov (United States)

    Yang, Su; Xu, Wenhui; Xu, Yan; Li, Qingxiang

    2016-04-01

    A global historic precipitation dataset is the basis for climate and water cycle research. There have been several global historic land surface precipitation datasets developed by international data centers such as the US National Climatic Data Center (NCDC), the European Climate Assessment & Dataset project team, the Met Office, etc., but so far there are no such datasets developed by any research institute in China. In addition, each dataset has its own focus of study region, and the existing global precipitation datasets only contain sparse observational stations over China, which may result in uncertainties in East Asian precipitation studies. In order to take into account comprehensive historic information, users might need to employ two or more datasets. However, the non-uniform data formats, data units, station IDs, and so on add extra difficulties for users to exploit these datasets. For this reason, a complete historic precipitation dataset that takes advantage of various datasets has been developed and produced in the National Meteorological Information Center of China. Precipitation observations from 12 sources are aggregated, and the data formats, data units, and station IDs are unified. Duplicated stations with the same ID are identified, with duplicated observations removed. A consistency test, correlation coefficient test, significance t-test at the 95% confidence level, and significance F-test at the 95% confidence level are conducted first to ensure data reliability. Only those datasets that satisfy all four criteria are integrated to produce the China Meteorological Administration global precipitation (CGP) historic precipitation dataset version 1.0. It contains observations at 31 thousand stations with 1.87 × 10^7 data records, among which 4152 time series of precipitation are longer than 100 yr. This dataset plays a critical role in climate research due to its advantages in large data volume and high density of station network, compared to
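
    A hedged sketch of the duplicate-station screening the abstract describes, for two overlapping series that share a station ID (the exact statistics used for CGP v1.0 are not spelled out in the abstract, so the implementation below is illustrative):

        import numpy as np
        from scipy import stats

        def screen_duplicate(series_a, series_b, alpha=0.05):
            """Accept a candidate duplicate pair only if the correlation is significant
            and neither the t-test nor the F-test rejects equality at the 95% level."""
            a = np.asarray(series_a, dtype=float)
            b = np.asarray(series_b, dtype=float)

            r, r_p = stats.pearsonr(a, b)          # correlation and its significance
            t_stat, t_p = stats.ttest_ind(a, b)    # test for equal means

            # Two-sided F-test for equal variances from the sample variance ratio
            f = np.var(a, ddof=1) / np.var(b, ddof=1)
            dfa, dfb = a.size - 1, b.size - 1
            f_p = 2 * min(stats.f.cdf(f, dfa, dfb), stats.f.sf(f, dfa, dfb))

            return (r_p < alpha) and (t_p > alpha) and (f_p > alpha)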

  10. Accuracy assessment of gridded precipitation datasets in the Himalayas

    Science.gov (United States)

    Khan, A.

    2015-12-01

    Accurate precipitation data are vital for hydro-climatic modelling and water resources assessments. Based on mass balance calculations and Turc-Budyko analysis, this study investigates the accuracy of twelve widely used gridded precipitation datasets for sub-basins in the Upper Indus Basin (UIB) in the Himalayas-Karakoram-Hindukush (HKH) region. These datasets are: 1) Global Precipitation Climatology Project (GPCP), 2) Climate Prediction Centre (CPC) Merged Analysis of Precipitation (CMAP), 3) NCEP/NCAR, 4) Global Precipitation Climatology Centre (GPCC), 5) Climatic Research Unit (CRU), 6) Asian Precipitation Highly Resolved Observational Data Integration Towards Evaluation of Water Resources (APHRODITE), 7) Tropical Rainfall Measuring Mission (TRMM), 8) European Reanalysis (ERA) Interim data, 9) PRINCETON, 10) European Reanalysis-40 (ERA-40), 11) Willmott and Matsuura, and 12) WATCH Forcing Data based on ERA-Interim (WFDEI). Precipitation accuracy and consistency were assessed by a physical mass balance involving the sum of annual measured flow, estimated actual evapotranspiration (average of 4 datasets), estimated glacier mass balance melt contribution (average of 4 datasets), and groundwater recharge (average of 3 datasets), during 1999-2010. The mass balance assessment was complemented by non-dimensional Turc-Budyko analysis, where annual precipitation, measured flow and potential evapotranspiration (average of 5 datasets) data were used for the same period. Both analyses suggest that all tested precipitation datasets significantly underestimate precipitation in the Karakoram sub-basins. For the Hindukush and Himalayan sub-basins most datasets underestimate precipitation, except ERA-Interim and ERA-40. The analysis indicates that for this large region with complicated terrain features and stark spatial precipitation gradients the reanalysis datasets have better consistency with flow measurements than datasets derived from records of only sparsely distributed climatic
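
    Read as a hedged reconstruction (the abstract does not spell out sign conventions), the mass-balance check compares each candidate annual precipitation P against

        P \approx Q + ET_a + R_{gw} - M_{gl},

    where Q is measured flow, ET_a the actual evapotranspiration, R_{gw} groundwater recharge and M_{gl} the glacier-melt contribution to flow, while the complementary Turc-Budyko check requires each sub-basin to fall inside the physically plausible region of the non-dimensional space spanned by Q/P and PET/P.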

  11. Protected Flux Pairing Qubit

    Science.gov (United States)

    Bell, Matthew; Zhang, Wenyuan; Ioffe, Lev; Gershenson, Michael

    2014-03-01

    We have studied coherent flux tunneling in a qubit containing two submicron Josephson junctions shunted by a superinductor (a dissipationless inductor with an impedance much greater than the resistance quantum). The two low-energy quantum states of this device, |0⟩ and |1⟩, are represented by an even and an odd number of flux quanta in the loop, respectively. This device is dual to the charge-pairing Josephson rhombi qubit. The spectrum of the device, studied by microwave spectroscopy, reflects the interference between coherent quantum phase slips in the two junctions (the Aharonov-Casher effect). Time-domain measurements demonstrate the suppression of the qubit's energy relaxation in the protected regime, which illustrates the potential of this flux-pairing device as a protected quantum circuit. Work supported by the Templeton Foundation, NSF, and ARO.

  12. Solar Magnetic Flux Ropes

    Indian Academy of Sciences (India)

    Boris Filippov; Olesya Martsenyuk; Abhishek K. Srivastava; Wahab Uddin

    2015-03-01

    In the early 1990s, it was found that the strongest space-weather disturbances were associated with huge ejections of plasma from the solar corona, which take the form of magnetic clouds as they move away from the Sun. It is the collisions of the magnetic clouds with the Earth's magnetosphere that lead to strong, sometimes catastrophic changes in space weather. The onset of a coronal mass ejection (CME) is sudden, and no reliable forerunners of CMEs have been found to date. CME prediction methodologies are less developed than those for solar flares. The most probable initial magnetic configuration of a CME is a flux rope consisting of twisted field lines which fill the whole volume of a dark coronal cavity. The flux ropes can remain in stable equilibrium in the coronal magnetic field for weeks and even months, but suddenly they lose their stability and erupt with high speed. Their transition to the unstable phase depends on the parameters of the flux rope (i.e., total electric current, twist, mass loading, etc.), as well as on the properties of the ambient coronal magnetic field. One of the major governing factors is the vertical gradient of the coronal magnetic field, quantified by the decay index (n). Cold dense prominence material can collect in the lower parts of the helical flux tubes. Filaments are, therefore, good tracers of the flux ropes in the corona, which become visible long before the beginning of the eruption. The prospects of filament eruptions and following CMEs can be estimated by comparing observed filament heights with calculated decay index distributions. The present paper reviews the formation of magnetic flux ropes, their stable and unstable phases, eruption conditions, and also discusses their physical implications in the solar corona.

  13. Lunar Meteorites: A Global Geochemical Dataset

    Science.gov (United States)

    Zeigler, R. A.; Joy, K. H.; Arai, T.; Gross, J.; Korotev, R. L.; McCubbin, F. M.

    2017-01-01

    To date, the world's meteorite collections contain over 260 lunar meteorite stones representing at least 120 different lunar meteorites. Additionally, there are 20-30 as yet unnamed stones currently in the process of being classified. Collectively these lunar meteorites likely represent 40-50 distinct sampling locations from random locations on the Moon. Although the exact provenance of each individual lunar meteorite is unknown, collectively the lunar meteorites represent the best global average of the lunar crust. The Apollo sites are all within or near the Procellarum KREEP Terrane (PKT), thus lithologies from the PKT are overrepresented in the Apollo sample suite. Nearly all of the lithologies present in the Apollo sample suite are found within the lunar meteorites (high-Ti basalts are a notable exception), and the lunar meteorites contain several lithologies not present in the Apollo sample suite (e.g., magnesian anorthosite). This chapter will not be a sample-by-sample summary of each individual lunar meteorite. Rather, the chapter will summarize the different types of lunar meteorites and their relative abundances, comparing and contrasting the lunar meteorite sample suite with the Apollo sample suite. This chapter will act as one of the introductory chapters to the volume, introducing lunar samples in general and setting the stage for more detailed discussions in later more specialized chapters. The chapter will begin with a description of how lunar meteorites are ejected from the Moon, how deep samples are being excavated from, what the likely pairing relationships are among the lunar meteorite samples, and how the lunar meteorites can help to constrain the impactor flux in the inner solar system. There will be a discussion of the biases inherent to the lunar meteorite sample suite in terms of underrepresented lithologies or regions of the Moon, and an examination of the contamination and limitations of lunar meteorites due to terrestrial weathering. The

  14. Global Drought Assessment using a Multi-Model Dataset

    NARCIS (Netherlands)

    Lanen, van H.A.J.; Huijgevoort, van M.H.J.; Corzo Perez, G.; Wanders, N.; Hazenberg, P.; Loon, van A.F.; Estifanos, S.; Melsen, L.A.

    2011-01-01

    Large-scale models are often applied to study past drought (forced with global reanalysis datasets) and to assess future drought (using downscaled, bias-corrected forcing from climate models). The EU project WATer and global CHange (WATCH) provides a 0.5° global dataset of meteorological

  15. Really big data: Processing and analysis of large datasets

    Science.gov (United States)

    Modern animal breeding datasets are large and getting larger, due in part to the recent availability of DNA data for many animals. Computational methods for efficiently storing and analyzing those data are under development. The amount of storage space required for such datasets is increasing rapidl...

  16. Primary Datasets for Case Studies of River-Water Quality

    Science.gov (United States)

    Goulder, Raymond

    2008-01-01

    Level 6 (final-year BSc) students undertook case studies on between-site and temporal variation in river-water quality. They used professionally-collected datasets supplied by the Environment Agency. The exercise gave students the experience of working with large, real-world datasets and led to their understanding how the quality of river water is…

  17. An Analysis of the GTZAN Music Genre Dataset

    DEFF Research Database (Denmark)

    Sturm, Bob L.

    2012-01-01

    Most research in automatic music genre recognition has used the dataset assembled by Tzanetakis et al. in 2001. The composition and integrity of this dataset, however, has never been formally analyzed. For the first time, we provide an analysis of its composition, and create a machine...

  20. Querying Patterns in High-Dimensional Heterogenous Datasets

    Science.gov (United States)

    Singh, Vishwakarma

    2012-01-01

    The recent technological advancements have led to the availability of a plethora of heterogenous datasets, e.g., images tagged with geo-location and descriptive keywords. An object in these datasets is described by a set of high-dimensional feature vectors. For example, a keyword-tagged image is represented by a color-histogram and a…

  1. Squish: Near-Optimal Compression for Archival of Relational Datasets

    Science.gov (United States)

    Gao, Yihan; Parameswaran, Aditya

    2017-01-01

    Relational datasets are being generated at an alarmingly rapid rate across organizations and industries. Compressing these datasets could significantly reduce storage and archival costs. Traditional compression algorithms, e.g., gzip, are suboptimal for compressing relational datasets since they ignore the table structure and relationships between attributes. We study compression algorithms that leverage the relational structure to compress datasets to a much greater extent. We develop Squish, a system that uses a combination of Bayesian Networks and Arithmetic Coding to capture multiple kinds of dependencies among attributes and achieve near-entropy compression rate. Squish also supports user-defined attributes: users can instantiate new data types by simply implementing five functions for a new class interface. We prove the asymptotic optimality of our compression algorithm and conduct experiments to show the effectiveness of our system: Squish achieves a reduction of over 50% in storage size relative to systems developed in prior work on a variety of real datasets.
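
    The gain over structure-blind compressors such as gzip comes from modelling dependencies between attributes: the sketch below (illustrative only, not the Squish codebase) shows how conditioning one column on another lowers the entropy bound that an arithmetic coder can then approach.

        import math
        from collections import Counter

        rows = [("NY", "US"), ("LA", "US"), ("NY", "US"), ("Lyon", "FR")] * 250

        def entropy(counts):
            n = sum(counts.values())
            return -sum(c / n * math.log2(c / n) for c in counts.values())

        # Marginal entropy of the country column, ignoring the table structure
        h_country = entropy(Counter(country for _, country in rows))

        # Conditional entropy H(country | city): near zero when city determines country
        cities = Counter(city for city, _ in rows)
        h_cond = sum(
            cities[city] / len(rows)
            * entropy(Counter(c for ct, c in rows if ct == city))
            for city in cities
        )
        print(f"H(country) = {h_country:.3f} bits, H(country | city) = {h_cond:.3f} bits")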

  2. New model for datasets citation and extraction reproducibility in VAMDC

    CERN Document Server

    Zwölf, Carlo Maria; Dubernet, Marie-Lise

    2016-01-01

    In this paper we present a new paradigm for the identification of datasets extracted from the Virtual Atomic and Molecular Data Centre (VAMDC) e-science infrastructure. Such identification includes information on the origin and version of the datasets, references associated to individual data in the datasets, as well as timestamps linked to the extraction procedure. This paradigm is described through the modifications of the language used to exchange data within the VAMDC and through the services that will implement those modifications. This new paradigm should enforce traceability of datasets, favour reproducibility of datasets extraction, and facilitate the systematic citation of the authors having originally measured and/or calculated the extracted atomic and molecular data.

  3. New model for datasets citation and extraction reproducibility in VAMDC

    Science.gov (United States)

    Zwölf, Carlo Maria; Moreau, Nicolas; Dubernet, Marie-Lise

    2016-09-01

    In this paper we present a new paradigm for the identification of datasets extracted from the Virtual Atomic and Molecular Data Centre (VAMDC) e-science infrastructure. Such identification includes information on the origin and version of the datasets, references associated to individual data in the datasets, as well as timestamps linked to the extraction procedure. This paradigm is described through the modifications of the language used to exchange data within the VAMDC and through the services that will implement those modifications. This new paradigm should enforce traceability of datasets, favor reproducibility of datasets extraction, and facilitate the systematic citation of the authors having originally measured and/or calculated the extracted atomic and molecular data.

  4. Airborne flux measurements of biogenic volatile organic compounds over California

    Science.gov (United States)

    Misztal, P. K.; Karl, T.; Weber, R.; Jonsson, H. H.; Guenther, A. B.; Goldstein, A. H.

    2014-03-01

    Biogenic Volatile Organic Compound (BVOC) fluxes were measured onboard the CIRPAS Twin Otter aircraft as part of the California Airborne BVOC Emission Research in Natural Ecosystem Transects (CABERNET) campaign during June 2011. The airborne virtual disjunct eddy covariance (AvDEC) approach used measurements from a PTR-MS and a wind radome probe to directly determine fluxes of isoprene, MVK + MAC, methanol, monoterpenes, and MBO over ∼10 000 km of flight paths focusing on areas of California predicted to have the largest emissions of isoprene. The Fast Fourier Transform (FFT) approach was used to calculate fluxes over long transects of more than 15 km, most commonly between 50 and 150 km. The Continuous Wavelet Transformation (CWT) approach was used over the same transects to also calculate "instantaneous" fluxes with localization of both frequency and time independent of non-stationarities. Vertical flux divergence of isoprene is expected due to its relatively short lifetime and was measured directly using "racetrack" profiles at multiple altitudes. It was found to be linear and in the range 5% to 30% depending on the ratio of aircraft altitude to PBL height (z/z_i). Fluxes were generally measured by flying consistently at 400 ± 50 m a.g.l. and extrapolated to the surface according to the determined flux divergence. The wavelet-derived surface fluxes of isoprene averaged to 2 km spatial resolution showed good correspondence to Basal Emission Factor (BEF) landcover datasets used to drive biogenic VOC (BVOC) emission models. The surface flux of isoprene was close to zero over Central Valley crops and desert shrublands, but was very high (up to 15 mg m-2 h-1) above oak woodlands, with clear dependence of emissions on temperature and oak density. Isoprene concentrations of up to 8 ppb were observed at aircraft height on the hottest days and over the dominant source regions. While isoprene emissions from agricultural crop regions, shrublands, and
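
    At its core, the eddy-covariance flux for each segment is the covariance of vertical wind and scalar concentration fluctuations. The minimal sketch below uses synthetic data and omits the despiking, detrending, lag correction and wavelet machinery of the actual AvDEC/CWT processing.

        import numpy as np

        def ec_flux(w, c):
            """Eddy-covariance flux from vertical wind w (m/s) and a scalar c:
            the mean product of the fluctuations about the segment means."""
            return np.mean((w - w.mean()) * (c - c.mean()))

        # Synthetic 10 Hz segment: updrafts slightly enriched in the scalar
        rng = np.random.default_rng(0)
        w = rng.normal(0.0, 0.5, 36000)                   # 1 h of vertical wind
        c = 2.0 + 0.3 * w + rng.normal(0.0, 0.2, 36000)   # correlated scalar
        print(f"flux = {ec_flux(w, c):.3f} (scalar units x m/s)")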

  5. Muon and neutrino fluxes

    Science.gov (United States)

    Edwards, P. G.; Protheroe, R. J.

    1985-01-01

    The result of a new calculation of the atmospheric muon and neutrino fluxes and the energy spectrum of muon-neutrinos produced in individual extensive air showers (EAS) initiated by proton and gamma-ray primaries is reported. Also explained is the possibility of detecting atmospheric νμ's due to gamma-rays from these sources.

  6. Coupled superconducting flux qubits

    NARCIS (Netherlands)

    Plantenberg, J.H.

    2007-01-01

    This thesis presents results of theoretical and experimental work on superconducting persistent-current quantum bits. These qubits offer an attractive route towards scalable solid-state quantum computing. The focus of this work is on the gradiometer flux qubit which has a special geometric design, t

  7. Generic flux coupling analysis

    NARCIS (Netherlands)

    Reimers, A.C.; Goldstein, Y.; Bockmayr, A.

    2015-01-01

    Flux coupling analysis (FCA) has become a useful tool for aiding metabolic reconstructions and guiding genetic manipulations. Originally, it was introduced for constraint-based models of metabolic networks that are based on the steady-state assumption. Recently, we have shown that the steady-state a

  8. Lobotomy of flux compactifications

    NARCIS (Netherlands)

    Dibitetto, Giuseppe; Guarino, Adolfo; Roest, Diederik

    2014-01-01

    We provide the dictionary between four-dimensional gauged supergravity and type II compactifications on T^6 with metric and gauge fluxes in the absence of supersymmetry breaking sources, such as branes and orientifold planes. Secondly, we prove that there is a unique isotropic compactification

  9. Disconnecting Solar Magnetic Flux

    CERN Document Server

    DeForest, C E; McComas, D J

    2011-01-01

    Disconnection of open magnetic flux by reconnection is required to balance the injection of open flux by CMEs and other eruptive events. Making use of recent advances in heliospheric background subtraction, we have imaged many abrupt disconnection events. These events produce dense plasma clouds whose distinctive shape can now be traced from the corona across the inner solar system via heliospheric imaging. The morphology of each initial event is characteristic of magnetic reconnection across a current sheet, and the newly-disconnected flux takes the form of a "U"-shaped loop that moves outward, accreting coronal and solar wind material. We analyzed one such event on 2008 December 18 as it formed and accelerated at 20 m/s^2 to 320 km/s, expanding self-similarly until it exited our field of view 1.2 AU from the Sun. From acceleration and photometric mass estimates we derive the coronal magnetic field strength to be 8 μT, 6 R_s above the photosphere, and the entrained flux to be 1.6×10^11 Wb (1.6×10^19 Mx). We mod...

  11. Evaluating multimodel variability of humidity over Europe using long term GPS network and ground base datasets

    Science.gov (United States)

    Bastin, Sophie; Bock, Olivier; Parracho, Ana

    2016-04-01

    Thanks to recent efforts to reanalyse observed data into long-term homogenized datasets of new parameters or multi-parameter records, we can better characterize, evaluate and analyse the water cycle in models at different scales. In this paper, a few MED-CORDEX simulations covering the ERA-Interim period are evaluated against reprocessed IWV from GPS datasets over the European domain, from 1995 to 2008. Humidity is an important component of the water cycle, and models often have difficulties representing it. The high-quality, consistent, long-term IWV dataset recently produced from GPS at more than 100 stations over Europe, with about half of the stations having nearly 15 years of data over the period 1995-2010, is therefore used to evaluate the simulated IWV at seasonal, interannual and possibly diurnal time scales. Regional features are then identified, corresponding to different climate regimes. Other datasets, such as reanalyses of multiple parameters observed at one site (SIRTA, Palaiseau, France) over more than 10 years, or more regional networks, are used to explain the dispersion of IWV among the different models and their biases against observations. The relationship between IWV and surface temperature is also evaluated locally, to assess whether humidity supplied by advection or surface fluxes is sufficient to fill the atmosphere's increased water-holding capacity as temperature rises. Over arid areas, this relation can depart from the Clausius-Clapeyron relation when temperature becomes too high. The ability of models to reproduce this relation in the present climate is of high importance for estimating future climate.
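
    The Clausius-Clapeyron relation referred to above bounds the saturation vapour pressure, and hence the atmosphere's water-holding capacity, as a function of temperature:

        \frac{de_s}{dT} = \frac{L_v e_s}{R_v T^2} \quad\Rightarrow\quad e_s(T) \approx e_{s0} \exp\left[ \frac{L_v}{R_v} \left( \frac{1}{T_0} - \frac{1}{T} \right) \right],

    which near surface temperatures corresponds to roughly a 7% increase in e_s per kelvin; IWV can follow this scaling only where moisture supply is not limiting.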

  12. Stationary versus non-stationary (13)C-MFA: a comparison using a consistent dataset.

    Science.gov (United States)

    Noack, Stephan; Nöh, Katharina; Moch, Matthias; Oldiges, Marco; Wiechert, Wolfgang

    2011-07-10

    Besides the well-established (13)C-metabolic flux analysis ((13)C-MFA), which characterizes a cell's fluxome in a metabolic and isotopic stationary state, a current area of research is isotopically non-stationary MFA. Non-stationary (13)C-MFA uses short-time isotopic transient data instead of long-time isotopic equilibrium data and is thus capable of resolving fluxes from much shorter labeling experiments. However, a comparison of both methods with data from one single experiment has not been made so far. In order to create a consistent database for directly comparing both methods, a (13)C-labeling experiment in a fed-batch cultivation with a Corynebacterium glutamicum lysine producer was carried out. During the experiment the substrate glucose was switched from unlabeled to a specifically labeled glucose mixture, which was immediately traced by fast sampling and metabolite quenching. The time course of labeling enrichments in intracellular metabolites until isotopic stationarity was monitored by LC-MS/MS. The resulting dataset was evaluated using the classical as well as the isotopic non-stationary MFA approach. The results show that not only the obtained relative data, i.e. intracellular flux distributions, but also the more informative quantitative fluxome data depend significantly on the combination of the measurements and the underlying modeling approach used for data integration. Taking further criteria on the experimental and computational side into consideration, the current limitations of both methods are demonstrated and possible pitfalls are identified.
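
    The essential difference between the two approaches can be stated compactly. At metabolic steady state the pool sizes X_i are constant, and non-stationary (13)C-MFA fits the transient labeling enrichments x_i(t), which obey balances of the schematic form (a generic statement of the instationary-MFA equations, not this paper's full isotopomer model)

        X_i \frac{dx_i}{dt} = \sum_{j \in \mathrm{in}(i)} v_j \left( x_j^{\mathrm{in}} - x_i \right),

    whereas stationary (13)C-MFA uses only the limit dx_i/dt = 0, which is why it needs long labeling times but no metabolite pool-size measurements.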

  13. Viability of Controlling Prosthetic Hand Utilizing Electroencephalograph (EEG) Dataset Signal

    Science.gov (United States)

    Miskon, Azizi; A/L Thanakodi, Suresh; Raihan Mazlan, Mohd; Mohd Haziq Azhar, Satria; Nooraya Mohd Tawil, Siti

    2016-11-01

    This project presents the development of an artificial hand controlled by Electroencephalograph (EEG) signal datasets for prosthetic applications. The EEG signal datasets were used to improve the way the prosthetic hand is controlled, compared to Electromyograph (EMG) control. EMG has disadvantages for a person who has not used the relevant muscles for a long time, and also for persons with age-related degenerative conditions. Thus, the EEG datasets were found to be an alternative to EMG. The datasets used in this work were taken from a Brain Computer Interface (BCI) project and were already classified for open, close and combined movement operations. They served as input to control the prosthetic hand through an interface between Microsoft Visual Studio and Arduino. The obtained results reveal the prosthetic hand to be more efficient and faster in response to the EEG datasets, with an additional LiPo (Lithium Polymer) battery attached to the prosthetic. Some limitations were also identified in terms of hand movements and the weight of the prosthetic, and suggestions for improvement are given in this paper. Overall, the objectives of this paper were achieved, with the prosthetic hand found to be feasible in operation utilizing the EEG datasets.

  14. Discovery and Reuse of Open Datasets: An Exploratory Study

    Directory of Open Access Journals (Sweden)

    Sara

    2016-07-01

    Objective: This article analyzes twenty cited or downloaded datasets and the repositories that house them, in order to produce insights that can be used by academic libraries to encourage discovery and reuse of research data in institutional repositories. Methods: Using Thomson Reuters’ Data Citation Index and repository download statistics, we identified twenty cited/downloaded datasets. We documented the characteristics of the cited/downloaded datasets and their corresponding repositories in a self-designed rubric. The rubric includes six major categories: basic information; funding agency and journal information; linking and sharing; factors to encourage reuse; repository characteristics; and data description. Results: Our small-scale study suggests that cited/downloaded datasets generally comply with basic recommendations for facilitating reuse: data are documented well; formatted for use with a variety of software; and shared in established, open access repositories. Three significant factors also appear to contribute to dataset discovery: publishing in discipline-specific repositories; indexing in more than one location on the web; and using persistent identifiers. The cited/downloaded datasets in our analysis came from a few specific disciplines, and tended to be funded by agencies with data publication mandates. Conclusions: The results of this exploratory research provide insights that can inform academic librarians as they work to encourage discovery and reuse of institutional datasets. Our analysis also suggests areas in which academic librarians can target open data advocacy in their communities in order to begin to build open data success stories that will fuel future advocacy efforts.

  15. PROVIDING GEOGRAPHIC DATASETS AS LINKED DATA IN SDI

    Directory of Open Access Journals (Sweden)

    E. Hietanen

    2016-06-01

    In this study, a prototype service to provide data from a Web Feature Service (WFS) as linked data is implemented. At first, persistent and unique Uniform Resource Identifiers (URI) are created for all spatial objects in the dataset. The objects are available from those URIs in Resource Description Framework (RDF) data format. Next, a Web Ontology Language (OWL) ontology is created to describe the dataset information content using the Open Geospatial Consortium's (OGC) GeoSPARQL vocabulary. The existing data model is modified in order to take into account the linked data principles. The implemented service produces an HTTP response dynamically. The data for the response is first fetched from the existing WFS. Then the Geographic Markup Language (GML) format output of the WFS is transformed on-the-fly to the RDF format. Content Negotiation is used to serve the data in different RDF serialization formats. This solution facilitates the use of a dataset in different applications without replicating the whole dataset. In addition, individual spatial objects in the dataset can be referenced with URIs. Furthermore, the needed information content of the objects can be easily extracted from the RDF serializations available from those URIs. A solution for linking data objects to the dataset URI is also introduced by using the Vocabulary of Interlinked Datasets (VoID). The dataset is divided into subsets, and each subset is given its own persistent and unique URI. This enables the whole dataset to be explored with a web browser and all individual objects to be indexed by search engines.

  16. Flux measurement and modeling in a typical mediterranean vineyard

    Science.gov (United States)

    Marras, Serena; Bellucco, Veronica; Pyles, David R.; Falk, Matthias; Sirca, Costantino; Duce, Pierpaolo; Snyder, Richard L.; Tha Paw U, Kyaw; Spano, Donatella

    2014-05-01

    Vineyard ecosystems are typical of the Mediterranean area, where wine is one of the most important economic sectors. Nevertheless, only a few studies have investigated the interactions between this kind of vegetation and the atmosphere. This information is important for understanding the behaviour of such ecosystems under different environmental conditions and is crucial for parameterizing crop and flux simulation models. Combining direct measurements and modelling can provide reliable estimates of surface fluxes and crop evapotranspiration. This study contributes (1) direct measurements of energy fluxes and evapotranspiration in a typical Mediterranean vineyard, located in the south of Sardinia (Italy), obtained with the eddy covariance micrometeorological technique, and (2) an evaluation of the land surface model ACASA (Advanced-Canopy-Atmosphere-Soil Algorithm) in simulating energy fluxes and evapotranspiration over the vineyard. Independent datasets of direct measurements were used to calibrate and validate model results during the growing period. Statistical analysis was performed to evaluate model performance and accuracy in predicting surface fluxes. Results will be shown, as well as the model's capability to be used in future studies to predict energy fluxes and crop water requirements under current and future climate.

  17. Triples, Fluxes, and Strings

    CERN Document Server

    Boer, Jan de; Dijkgraaf, Robbert; Hori, Kentaro; Keurentjes, Arjan; Morgan, John; Morrison, David R.; Sethi, Savdeep

    2002-01-01

    We study string compactifications with sixteen supersymmetries. The moduli space for these compactifications becomes quite intricate in lower dimensions, partly because there are many different irreducible components. We focus primarily, but not exclusively, on compactifications to seven or more dimensions. These vacua can be realized in a number of ways: the perturbative constructions we study include toroidal compactifications of the heterotic/type I strings, asymmetric orbifolds, and orientifolds. In addition, we describe less conventional M and F theory compactifications on smooth spaces. The last class of vacua considered are compactifications on singular spaces with non-trivial discrete fluxes. We find a number of new components in the string moduli space. Contained in some of these components are M theory compactifications with novel kinds of "frozen" singularities. We are naturally led to conjecture the existence of new dualities relating spaces with different singular geometries and fluxes. As our stu...

  18. Atmospheric lepton fluxes

    Directory of Open Access Journals (Sweden)

    Gaisser, Thomas K.

    2015-01-01

    This review of atmospheric muons and neutrinos emphasizes the high-energy range relevant for backgrounds to high-energy neutrinos of astrophysical origin. After a brief historical introduction, the main distinguishing features of atmospheric νμ and νe are discussed, along with the implications of the muon charge ratio for the νμ / ν̄μ ratio. Methods to account for effects of the knee in the primary cosmic-ray spectrum and the energy dependence of hadronic interactions on the neutrino fluxes are discussed and illustrated in the context of recent results from IceCube. A simple numerical/analytic method is proposed for systematic investigation of uncertainties in neutrino fluxes arising from uncertainties in the primary cosmic-ray spectrum/composition and hadronic interactions.
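
    The analytic backbone of such methods is the standard approximation for the conventional atmospheric neutrino flux (quoted here in its textbook form, as an aid to the reader rather than from this paper):

        \phi_\nu(E_\nu, \theta) \simeq \phi_N(E_\nu) \left[ \frac{A_{\pi\nu}}{1 + B_{\pi\nu} E_\nu \cos\theta^{*} / \epsilon_\pi} + \frac{A_{K\nu}}{1 + B_{K\nu} E_\nu \cos\theta^{*} / \epsilon_K} \right],

    where \phi_N is the primary nucleon spectrum and the two terms are the pion and kaon contributions, with critical energies \epsilon_\pi \approx 115 GeV and \epsilon_K \approx 850 GeV above which meson decay (and hence neutrino production) is suppressed relative to reinteraction.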

  19. Lobotomy of flux compactifications

    Science.gov (United States)

    Dibitetto, Giuseppe; Guarino, Adolfo; Roest, Diederik

    2014-05-01

    We provide the dictionary between four-dimensional gauged supergravity and type II compactifications on T^6 with metric and gauge fluxes in the absence of supersymmetry breaking sources, such as branes and orientifold planes. Secondly, we prove that there is a unique isotropic compactification allowing for critical points. It corresponds to a type IIA background given by a product of two 3-tori with SO(3) twists and results in a unique theory (gauging) with a non-semisimple gauge algebra. Besides the known four AdS solutions surviving the orientifold projection to N = 4 induced by O6-planes, this theory contains a novel AdS solution that requires non-trivial orientifold-odd fluxes, hence being a genuine critical point of the N = 8 theory.

  20. Lobotomy of flux compactifications

    Energy Technology Data Exchange (ETDEWEB)

    Dibitetto, Giuseppe [Institutionen för fysik och astronomi, University of Uppsala,Box 803, SE-751 08 Uppsala (Sweden); Guarino, Adolfo [Albert Einstein Center for Fundamental Physics, Institute for Theoretical Physics,Bern University, Sidlerstrasse 5, CH-3012 Bern (Switzerland); Roest, Diederik [Centre for Theoretical Physics, University of Groningen,Nijenborgh 4 9747 AG Groningen (Netherlands)

    2014-05-15

    We provide the dictionary between four-dimensional gauged supergravity and type II compactifications on T^6 with metric and gauge fluxes in the absence of supersymmetry breaking sources, such as branes and orientifold planes. Secondly, we prove that there is a unique isotropic compactification allowing for critical points. It corresponds to a type IIA background given by a product of two 3-tori with SO(3) twists and results in a unique theory (gauging) with a non-semisimple gauge algebra. Besides the known four AdS solutions surviving the orientifold projection to N=4 induced by O6-planes, this theory contains a novel AdS solution that requires non-trivial orientifold-odd fluxes, hence being a genuine critical point of the N=8 theory.

  1. High Flux Calorimetry.

    Science.gov (United States)

    1984-05-05

    These approaches are based on proven principles which have served the thermal test community well for years. Other concepts hold promise of being able to... The thermal test community has developed instrumentation which is quite suitable for the moderate, and relatively constant, flux... on the maximum phase II system fluence of 400 cal/cm2. Second, the present thermal test community will have confidence in the performance of an...

  2. Lobotomy of Flux Compactifications

    OpenAIRE

Giuseppe Dibitetto; Adolfo Guarino (Albert Einstein Center for Fundamental Physics, Institute for Theoretical Physics, Bern University, Sidlerstrasse 5, CH-3012 Bern, Switzerland); Diederik Roest

    2014-01-01

We provide the dictionary between four-dimensional gauged supergravity and type II compactifications on $\mathbb{T}^6$ with metric and gauge fluxes in the absence of supersymmetry breaking sources, such as branes and orientifold planes. Secondly, we prove that there is a unique isotropic compactification allowing for critical points. It corresponds to a type IIA background given by a product of two 3-tori with SO(3) twists and results in a unique theory (gauging) with a non-semisimple gauge...

  3. Background qualitative analysis of the European reference life cycle database (ELCD) energy datasets - part II: electricity datasets.

    Science.gov (United States)

    Garraín, Daniel; Fazio, Simone; de la Rúa, Cristina; Recchioni, Marco; Lechón, Yolanda; Mathieux, Fabrice

    2015-01-01

The aim of this paper is to identify areas of potential improvement of the European Reference Life Cycle Database (ELCD) electricity datasets. The revision is based on the data quality indicators described by the International Life Cycle Data system (ILCD) Handbook, applied on a sectoral basis. These indicators evaluate the technological, geographical and time-related representativeness of the dataset and its appropriateness in terms of completeness, precision and methodology. Results show that the ELCD electricity datasets are of very good quality in general terms; nevertheless, some findings and recommendations for improving the quality of Life Cycle Inventories have been derived. Moreover, these results confirm the quality of the electricity-related datasets for any LCA practitioner, and provide insights into the limitations and assumptions underlying the dataset modelling. Given this information, the LCA practitioner will be able to decide whether the use of the ELCD electricity datasets is appropriate to the goal and scope of the analysis to be conducted. The methodological approach would also be useful for dataset developers and reviewers seeking to improve the overall Data Quality Requirements of databases.

  4. Estuarine Shoreline and Barrier-Island Sandline Change Assessment Dataset

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — The Barrier Island and Estuarine Wetland Physical Change Assessment Dataset was created to calibrate and test probability models of barrier island sandline and...

  5. Original Vector Datasets for Hawaii StreamStats

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — These datasets each consist of a folder containing a personal geodatabase of the NHD, and shapefiles used in the HydroDEM process. These files are provided as a...

  6. AFSC/REFM: Seabird Necropsy dataset of North Pacific

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The seabird necropsy dataset contains information on seabird specimens that were collected under salvage and scientific collection permits primarily by...

  7. BASE MAP DATASET, RIO ARRIBA COUNTY, NEW MEXICO, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — Basemap datasets comprise the seven FGDC themes of geospatial data that are used by most GIS applications: cadastral, geodetic control, governmental unit,...

  8. U.S. Climate Divisional Dataset (Version Superseded)

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This data has been superseded by a newer version of the dataset. Please refer to NOAA's Climate Divisional Database for more information. The U.S. Climate Divisional...

  9. BASE MAP DATASET, SOLANO COUNTY, CALIFORNIA, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  10. NOAA Global Surface Temperature Dataset, Version 4.0

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The NOAA Global Surface Temperature Dataset (NOAAGlobalTemp) is derived from two independent analyses: the Extended Reconstructed Sea Surface Temperature (ERSST)...

  11. A Large-Scale 3D Object Recognition dataset

    DEFF Research Database (Denmark)

    Sølund, Thomas; Glent Buch, Anders; Krüger, Norbert

    2016-01-01

This paper presents a new large-scale dataset targeting evaluation of local shape descriptors and 3D object recognition algorithms. The dataset consists of point clouds and triangulated meshes from 292 physical scenes taken from 11 different views; a total of approximately 3204 views. Each... geometric groups; concave, convex, cylindrical and flat 3D object models. The object models have varying amounts of local geometric features to challenge existing local shape feature descriptors in terms of descriptiveness and robustness. The dataset is validated in a benchmark which evaluates the matching... performance of 7 different state-of-the-art local shape descriptors. Further, we validate the dataset in a 3D object recognition pipeline. Our benchmark shows, as expected, that local shape feature descriptors without any global point relation across the surface have a poor matching performance with flat...

  12. National Elevation Dataset (NED) of Rocky Mountain National Park

    Data.gov (United States)

    National Park Service, Department of the Interior — (USGS text) The U.S. Geological Survey has developed a National Elevation Dataset (NED). The NED is a seamless mosaic of best-available elevation data. The...

  13. Native Prairie Adaptive Management (NPAM) Monitoring Tabular Datasets

    Data.gov (United States)

    US Fish and Wildlife Service, Department of the Interior — Four core tabular datasets are collected annually for the NPAM Project. The first is tbl_PlantGoups_Monitoring which includes the belt transect monitoring data for...

  14. Wi-Fi Crowdsourced Fingerprinting Dataset for Indoor Positioning

    Directory of Open Access Journals (Sweden)

    Elena Simona Lohan

    2017-10-01

Benchmark open-source Wi-Fi fingerprinting datasets for indoor positioning studies are still hard to find in the current literature and existing public repositories. This is unlike other research fields, such as image processing, where benchmark test images such as the Lenna image or the Face Recognition Technology (FERET) database exist, or machine learning, where huge datasets are available, for example, at the University of California Irvine (UCI) Machine Learning Repository. The purpose of this paper is to present a new openly available Wi-Fi fingerprint dataset, comprising 4648 fingerprints collected with 21 devices in a university building in Tampere, Finland, and to present some benchmark indoor positioning results using these data. The datasets and the benchmarking software are distributed under the open-source MIT license and can be found in the EU Zenodo repository.
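
A natural first experiment with a fingerprint dataset like this is the classic k-nearest-neighbours positioning baseline: estimate a position as the centroid of the k training fingerprints closest in received-signal-strength (RSS) space. A minimal sketch with synthetic data follows; the function and array names are illustrative, not part of the published benchmark software.

```python
# Illustrative k-NN fingerprint positioning baseline (synthetic data; not the
# paper's benchmark code). Each fingerprint is a vector of RSS values (dBm)
# from known access points; position is estimated as the mean location of the
# k training fingerprints closest in signal space.
import numpy as np

def knn_position(train_rss, train_xy, query_rss, k=3):
    d = np.linalg.norm(train_rss - query_rss, axis=1)  # distance in RSS space
    nearest = np.argsort(d)[:k]                        # k closest fingerprints
    return train_xy[nearest].mean(axis=0)              # centroid of their positions

rng = np.random.default_rng(0)
train_rss = rng.uniform(-90, -30, size=(100, 5))       # 100 fingerprints, 5 APs
train_xy = rng.uniform(0, 50, size=(100, 2))           # surveyed positions (m)
print(knn_position(train_rss, train_xy, train_rss[0])) # lands near train_xy[0]
```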

  15. The Flora Mycologica Iberica Project fungi occurrence dataset

    Directory of Open Access Journals (Sweden)

    Francisco Pando

    2016-09-01

The dataset contains detailed distribution information on several fungal groups. The information has been revised, and in many cases compiled, by expert mycologists working on the monographs for the Flora Mycologica Iberica Project (FMI). Records comprise both collection and observational data, obtained from a variety of sources including field work, herbaria, and the literature. The dataset contains 59,235 records, of which 21,393 are georeferenced. These correspond to 2,445 species, grouped in 18 classes. The geographical scope of the dataset is the Iberian Peninsula (Continental Portugal and Spain, and Andorra) and the Balearic Islands. The complete dataset is available in Darwin Core Archive format via the Global Biodiversity Information Facility (GBIF).

  16. Food recognition: a new dataset, experiments and results.

    Science.gov (United States)

    Ciocca, Gianluigi; Napoletano, Paolo; Schettini, Raimondo

    2016-12-07

We propose a new dataset for the evaluation of food recognition algorithms that can be used in dietary monitoring applications. Each image depicts a real canteen tray with dishes and foods arranged in different ways. Each tray contains multiple instances of food classes. The dataset contains 1,027 canteen trays for a total of 3,616 food instances belonging to 73 food classes. The food items on the tray images have been manually segmented using carefully drawn polygonal boundaries. We have benchmarked the dataset by designing an automatic tray analysis pipeline that takes a tray image as input, finds the regions of interest, and predicts for each region the corresponding food class. We have experimented with three different classification strategies, also using several visual descriptors. We achieve about 79% food and tray recognition accuracy using Convolutional-Neural-Network-based features. The dataset, as well as the benchmark framework, is available to the research community.

  17. Karna Particle Size Dataset for Tables and Figures

    Data.gov (United States)

    U.S. Environmental Protection Agency — This dataset contains 1) table of bulk Pb-XAS LCF results, 2) table of bulk As-XAS LCF results, 3) figure data of particle size distribution, and 4) figure data for...

  18. Environmental Dataset Gateway (EDG) CS-W Interface

    Data.gov (United States)

    U.S. Environmental Protection Agency — Use the Environmental Dataset Gateway (EDG) to find and access EPA's environmental resources. Many options are available for easily reusing EDG content in other...

  19. BASE MAP DATASET, LE FLORE COUNTY, OKLAHOMA, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme, orthographic...

  1. Comparative visualization for parameter studies of dataset series.

    Science.gov (United States)

Malik, Muhammad Muddassir; Heinzl, Christoph; Gröller, M. Eduard

    2010-01-01

    This paper proposes comparison and visualization techniques to carry out parameter studies for the special application area of dimensional measurement using 3D X-ray computed tomography (3DCT). A dataset series is generated by scanning a specimen multiple times by varying parameters of an industrial 3DCT device. A high-resolution series is explored using our planar-reformatting-based visualization system. We present a novel multi-image view and an edge explorer for comparing and visualizing gray values and edges of several datasets simultaneously. Visualization results and quantitative data are displayed side by side. Our technique is scalable and generic. It can be effective in various application areas like parameter studies of imaging modalities and dataset artifact detection. For fast data retrieval and convenient usability, we use bricking of the datasets and efficient data structures. We evaluate the applicability of the proposed techniques in collaboration with our company partners.

  2. Eddy Correlation Flux Measurement System

    Data.gov (United States)

    Oak Ridge National Laboratory — The eddy correlation (ECOR) flux measurement system provides in situ, half-hour measurements of the surface turbulent fluxes of momentum, sensible heat, latent heat,...

  3. Multi-year investigation of flux ropes in the Martian ionosphere

    Science.gov (United States)

    Cartwright, M. L.; Brain, D.; Halekas, J. S.; Eastwood, J. P.

    2011-12-01

A magnetic flux rope is a collection of twisted magnetic field lines capable of transporting plasma from one region to another. Several studies report the occurrence of magnetic flux ropes in the Martian ionosphere [Cloutier et al., 1999; Vignes et al., 2004; Eastwood et al., 2008; Brain et al., 2010; Morgan et al., 2011]. Observations of a flux rope transporting ionospheric plasma away from Mars indicate that flux ropes could be an important means of atmospheric loss. Interestingly, there are at least three suggested flux rope formation mechanisms at Mars. The first is similar to Venus-type events, where the flux rope is formed via a shear-related instability that occurs through interaction with the solar wind [Cloutier et al., 1999; Vignes et al., 2004]. The second mechanism is similar to plasmoid creation in the Earth's magnetotail, where the flux rope is created when the crustal fields stretch and shear due to interaction with the solar wind [Brain et al., 2010; Morgan et al., 2011]. The third formation mechanism is based on the identification of flux ropes near current sheets on the night side of Mars, likely created via collisionless magnetic reconnection [Eastwood et al., 2008]. Previous statistical surveys suggest that all three of these formation mechanisms are continuously active at Mars, but they have had difficulty differentiating the three populations of flux ropes due to spacecraft orbit limitations or a lack of events. We conducted a larger statistical study of Martian flux ropes using two years of MGS magnetic field and suprathermal electron datasets from the circular mapping orbit at ~400 km. The purpose of this study is to collect a large dataset of events to characterize the flux rope formation mechanisms and study their relationship to the solar cycle.

  4. Interacting with Large 3D Datasets on a Mobile Device.

    Science.gov (United States)

    Schultz, Chris; Bailey, Mike

    2016-01-01

    A detail-on-demand scheme can alleviate both memory and GPU pressure on mobile devices caused by volume rendering. This approach allows a user to explore an entire dataset at its native resolution while simultaneously constraining the texture size being rendered to a dimension that does not exceed the processing capabilities of a portable device. This scheme produces higher-quality, more focused images rendered at interactive frame rates, while preserving the native resolution of the dataset.
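
The trade-off described above can be made concrete with a toy level-of-detail selector: given a fixed texture budget, either return a region of interest at native resolution or a decimated overview of the whole volume. This is only a sketch of the general idea; the names and the simple decimation rule are illustrative, not the paper's scheme.

```python
# Toy detail-on-demand selector: render a native-resolution crop when the
# region of interest fits the texture budget, otherwise a decimated overview.
import numpy as np

def texture_for_view(volume, roi, max_dim=64):
    sub = volume[roi]                       # roi: tuple of slices
    if max(sub.shape) <= max_dim:
        return sub                          # fits: keep native resolution
    step = int(np.ceil(max(sub.shape) / max_dim))
    return sub[::step, ::step, ::step]      # decimate down to the budget

vol = np.random.rand(128, 128, 128).astype(np.float32)
overview = texture_for_view(vol, (slice(None),) * 3)        # whole volume
detail = texture_for_view(vol, np.s_[40:80, 40:80, 40:80])  # zoomed region
print(overview.shape, detail.shape)         # (64, 64, 64) (40, 40, 40)
```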

  5. Geoseq: a tool for dissecting deep-sequencing datasets

    OpenAIRE

    Homann Robert; George Ajish; Levovitz Chaya; Shah Hardik; Cancio Anthony; Gurtowski James; Sachidanandam Ravi

    2010-01-01

Background: Datasets generated on deep-sequencing platforms have been deposited in various public repositories such as the Gene Expression Omnibus (GEO), the Sequence Read Archive (SRA) hosted by the NCBI, or the DNA Data Bank of Japan (DDBJ). Despite being rich data sources, they have not been used much due to the difficulty in locating and analyzing datasets of interest. Results: Geoseq (http://geoseq.mssm.edu) provides a new method of analyzing short reads from deep sequencing experiments...

  6. Nanomaterial datasets to advance tomography in scanning transmission electron microscopy

    OpenAIRE

    Levin, Barnaby D.A.; Padgett, Elliot; Chen, Chien-Chun; Scott, M C; Xu, Rui; Theis, Wolfgang; Jiang, Yi; Yang, Yongsoo; Ophus, Colin; Zhang, Haitao; Ha, Don-Hyung; Wang, Deli; Yu, Yingchao; Abruña, Hector D.; Robinson, Richard D.

    2016-01-01

    Electron tomography in materials science has flourished with the demand to characterize nanoscale materials in three dimensions (3D). Access to experimental data is vital for developing and validating reconstruction methods that improve resolution and reduce radiation dose requirements. This work presents five high-quality scanning transmission electron microscope (STEM) tomography datasets in order to address the critical need for open access data in this field. The datasets represent the cu...

  7. Sampling Within k-Means Algorithm to Cluster Large Datasets

    Energy Technology Data Exchange (ETDEWEB)

Bejarano, Jeremy [Brigham Young University]; Bose, Koushiki [Brown University]; Brannan, Tyler [North Carolina State University]; Thomas, Anita [Illinois Institute of Technology]; Adragni, Kofi [University of Maryland]; Neerchal, Nagaraj [University of Maryland]; Ostrouchov, George [ORNL]

    2011-08-01

Due to current data collection technology, our ability to gather data has surpassed our ability to analyze it. In particular, k-means, one of the simplest and fastest clustering algorithms, is ill-equipped to handle extremely large datasets on even the most powerful machines. Our new algorithm uses a sample from a dataset to decrease runtime by reducing the amount of data analyzed. We perform a simulation study to compare our sampling-based k-means to the standard k-means algorithm by analyzing both the speed and accuracy of the two methods. Results show that our algorithm is significantly more efficient than the existing algorithm with comparable accuracy. Further work on this project might include a more comprehensive study, both on more varied test datasets and on real weather datasets; this is especially important considering that this preliminary study was performed on rather tame datasets. Such studies should also analyze the performance of the algorithm for varied values of k. Lastly, this paper showed that the algorithm was accurate for relatively low sample sizes. We would like to analyze this further to see how accurate the algorithm is for even lower sample sizes. We could find the lowest sample sizes, by manipulating width and confidence level, for which the algorithm would be acceptably accurate. In order for our algorithm to be a success, it needs to meet two benchmarks: match the accuracy of the standard k-means algorithm and significantly reduce runtime. Both goals are accomplished for all six datasets analyzed. However, on datasets of three and four dimensions, as the data become more difficult to cluster, both algorithms fail to obtain the correct classifications on some trials. Nevertheless, our algorithm consistently matches the performance of the standard algorithm while running in considerably less time. Therefore, we conclude that analysts can use our algorithm, expecting accurate results in considerably less time.
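
The core idea is simple enough to sketch: cluster a random sample, then make one pass over the full data to assign each point to the nearest learned centroid. The sketch below uses scikit-learn and synthetic data; parameter names and the sampling fraction are illustrative, not the paper's settings.

```python
# Sketch of sampling within k-means: fit centroids on a sample, then label the
# full dataset in a single pass (synthetic data; illustrative parameters).
import numpy as np
from sklearn.cluster import KMeans

def sampled_kmeans(X, k, sample_frac=0.1, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=max(k, int(sample_frac * X.shape[0])),
                     replace=False)
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X[idx])
    # One pass over all points: assign each to its nearest centroid.
    d = ((X[:, None, :] - km.cluster_centers_[None, :, :]) ** 2).sum(axis=2)
    return km.cluster_centers_, d.argmin(axis=1)

X = np.vstack([np.random.randn(5000, 3) + c
               for c in ([0, 0, 0], [5, 5, 5], [0, 5, 0])])
centers, labels = sampled_kmeans(X, k=3)
```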

  8. A survey of results on mobile phone datasets analysis

    CERN Document Server

    Blondel, Vincent D; Krings, Gautier

    2015-01-01

In this paper, we review some advances made recently in the study of mobile phone datasets. This area of research emerged a decade ago, with the increasing availability of large-scale anonymized datasets, and has grown into a stand-alone topic. We survey the contributions made so far on the social networks that can be constructed with such data, the study of personal mobility, geographical partitioning, urban planning, and applications to development, as well as security and privacy issues.

  9. General Purpose Multimedia Dataset - GarageBand 2008

    DEFF Research Database (Denmark)

    Meng, Anders

This document describes a general-purpose multimedia dataset to be used in cross-media machine learning problems. In more detail, we describe the genre taxonomy applied at http://www.garageband.com, from where the dataset was collected, and how that taxonomy has been fused into a more human-understandable taxonomy. Finally, a description of various features extracted from both the audio and text is presented.

  10. Artificial intelligence (AI) systems for interpreting complex medical datasets.

    Science.gov (United States)

    Altman, R B

    2017-05-01

Advances in machine intelligence have created powerful capabilities in algorithms that find hidden patterns in data, classify objects based on their measured characteristics, and associate similar patients/diseases/drugs based on common features. However, artificial intelligence (AI) applications to medical data face several technical challenges: complex and heterogeneous datasets, noisy medical data, and the need to explain their output to users. There are also social challenges related to intellectual property, data provenance, regulatory issues, economics, and liability. © 2017 ASCPT.

  11. Comparison of the global air-sea freshwater exchange evaluated from independent datasets

    Institute of Scientific and Technical Information of China (English)

    ZHOU Tianjun

    2003-01-01

The implied air-sea freshwater flux is examined in two reanalysis datasets, provided respectively by the European Centre for Medium-Range Weather Forecasts and the National Centers for Environmental Prediction. Not only for the annual mean state but also for the seasonal variation, the two reanalyses agree qualitatively well with each other, and both reasonably reproduce the global distribution of E-P (i.e., evaporation minus precipitation). In terms of quantitative comparison, however, remarkable differences are found on regional scales, especially over the middle and lower latitudes, with some local disagreement exceeding 100 cm/yr. One important difference between the current and previous evaluations is the newly found net evaporation over the high-latitude North Atlantic, which is shown to result from transient disturbances during boreal winter.
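
A comparison of this kind reduces to differencing two gridded E-P fields with latitude-dependent area weights. A hedged numpy sketch follows; the fields are synthetic stand-ins, where real values would be read from the reanalysis files.

```python
# Sketch: comparing E-P (evaporation minus precipitation) fields from two
# reanalyses on a common 1-degree latitude-longitude grid (synthetic values).
import numpy as np

lat = np.linspace(-89.5, 89.5, 180)           # grid-cell centres, degrees
w = np.cos(np.deg2rad(lat))[:, None]          # area weights ~ cos(latitude)

ep_ecmwf = np.random.randn(180, 360) * 50     # stand-in E-P fields, cm/yr
ep_ncep = ep_ecmwf + np.random.randn(180, 360) * 20

diff = ep_ecmwf - ep_ncep
wmean = (diff * w).sum() / (w.sum() * 360)    # area-weighted mean difference
large = np.abs(diff) > 100                    # local disagreement > 100 cm/yr
print(f"weighted mean diff: {wmean:.2f} cm/yr; "
      f"{large.mean():.1%} of cells differ by >100 cm/yr")
```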

  12. Energetics of the martian atmosphere using the Mars Analysis Correction Data Assimilation (MACDA) dataset

    Science.gov (United States)

    Battalio, Michael; Szunyogh, Istvan; Lemmon, Mark

    2016-09-01

The energetics of the atmosphere of the northern hemisphere of Mars during the pre-winter-solstice period are explored using the Mars Analysis Correction Data Assimilation (MACDA) dataset (v1.0) and the eddy kinetic energy equation, with the quasi-geostrophic omega equation providing vertical velocities. Traveling waves are typically triggered by geopotential flux convergence. The effect of dust on baroclinic instability is examined by comparing a year with a global-scale dust storm (GDS) to two years without one. During the non-GDS years, results agree with those of a previous study using a general circulation model simulation. In the GDS year, waves develop a mixed baroclinic/barotropic growth phase before decaying barotropically. Though the total amount of eddy kinetic energy generated by baroclinic energy conversion is lower during the GDS year, the maximum eddy intensity is not diminished. Instead, the number of intense eddies is reduced by about 50%.
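
For reference, the omega equation invoked above has the standard quasi-geostrophic textbook form below; the notation is the usual one and is an assumption here, since the record does not quote the paper's exact conventions.

```latex
% Standard QG omega equation: sigma is the static stability, f_0 the
% reference Coriolis parameter, zeta_g the geostrophic relative vorticity,
% Phi the geopotential, and \mathbf{v}_g the geostrophic wind.
\left( \nabla^2 + \frac{f_0^2}{\sigma} \frac{\partial^2}{\partial p^2} \right) \omega
  = \frac{f_0}{\sigma} \frac{\partial}{\partial p}
      \left[ \mathbf{v}_g \cdot \nabla \left( \zeta_g + f \right) \right]
  + \frac{1}{\sigma} \nabla^2
      \left[ \mathbf{v}_g \cdot \nabla \left( -\frac{\partial \Phi}{\partial p} \right) \right]
```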

  13. [The flux of historiography].

    Science.gov (United States)

    Mazzolini, R G

    2001-01-01

    The author places Grmek's editorial within the flux of the historiographical debate which, since the middle of the 1970s, has concentrated on two major crises due to the end of social science-oriented 'scientific history' and to the 'linguistic turn'. He also argues that Grmek's historiographical work of the 1980s and 1990s was to some extent an alternative to certain observed changes in historical fashion and has achieved greater intelligibility because of its commitment to a rational vision of science and historiography.

  14. Estimating Annual CO2 Flux for Lutjewad Station Using Three Different Gap-Filling Techniques

    NARCIS (Netherlands)

    Dragomir, Carmelia M.; Klaassen, Wim; Voiculescu, Mirela; Georgescu, Lucian P.; van der Laan, Sander; Calfapietra, C.; Staebler, R.M.

    2012-01-01

    Long-term measurements of CO2 flux can be obtained using the eddy covariance technique, but these datasets are affected by gaps which hinder the estimation of robust long-term means and annual ecosystem exchanges. We compare results obtained using three gap-fill techniques: multiple regression (MR),
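
The multiple regression (MR) technique named above amounts to fitting the observed flux against meteorological drivers and predicting the flux inside the gaps. A minimal sketch with synthetic data follows; the driver set and coefficients are invented for illustration, not the paper's exact predictors.

```python
# Minimal multiple-regression gap-filling sketch for an eddy covariance flux
# series (synthetic drivers and gaps; illustrative only).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 1000
drivers = rng.normal(size=(n, 3))                  # e.g. radiation, temperature, VPD
flux = drivers @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 0.3, n)
flux[rng.random(n) < 0.2] = np.nan                 # ~20% gaps, as in real datasets

obs = ~np.isnan(flux)
model = LinearRegression().fit(drivers[obs], flux[obs])
filled = flux.copy()
filled[~obs] = model.predict(drivers[~obs])        # fill gaps with predictions
annual = filled.mean()                             # long-term mean after filling
```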

  15. Critical heat flux thermodynamics

    Energy Technology Data Exchange (ETDEWEB)

    Collado, F.J. E-mail: fjk@posta.unizar.es

    2002-11-01

Convective boiling of subcooled water flowing through a heated channel is essential in many engineering applications where high heat fluxes need to be accommodated, such as in the divertor plates of fusion reactors. There are many available correlations for predicting heat transfer in the individual regimes of the empirical Nukiyama boiling curve, although unfortunately there are no physical fundamentals underlying the curve. Recently, the author has shown that the classical entropy balance may contain key information about boiling heat transfer: the average thermal gap in the heated channel (the wall temperature minus the average temperature of the coolant fluid) was found to be strongly correlated with the efficiency of a theoretical reversible engine placed across this thermal gap. In this work, starting from the newly proposed correlation, a new expression for the wall temperature as a function of the average fluid temperature is derived and successfully checked against experimental data from General Electric. This expression suggests a new and simple definition of the critical heat flux (CHF), a key parameter in the thermal-hydraulic design of fusion reactors. Finally, based on the new definition, the CHF trends are discussed.
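
The "theoretical reversible engine placed in this thermal gap" corresponds to a Carnot engine operating between the wall and the bulk coolant; in symbols (notation introduced here for illustration, with absolute temperatures):

```latex
% Reversible (Carnot) efficiency across the thermal gap between the wall at
% T_w and the coolant at bulk temperature T_f; the gap itself is Delta T.
\eta_{\mathrm{rev}} = 1 - \frac{T_f}{T_w},
\qquad
\Delta T_{\mathrm{gap}} = T_w - T_f
```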

  16. A multiscale dataset for understanding complex eco-hydrological processes in a heterogeneous oasis system

    Science.gov (United States)

    Li, Xin; Liu, Shaomin; Xiao, Qin; Ma, Mingguo; Jin, Rui; Che, Tao; Wang, Weizhen; Hu, Xiaoli; Xu, Ziwei; Wen, Jianguang; Wang, Liangxu

    2017-06-01

We introduce a multiscale dataset obtained from the Heihe Watershed Allied Telemetry Experimental Research (HiWATER) campaign in an oasis-desert area in 2012. Upscaling of eco-hydrological processes on a heterogeneous surface is a grand challenge, and progress in this field is hindered by the poor availability of multiscale observations. HiWATER is an experiment designed to address this challenge through instrumentation on hierarchically nested scales to obtain multiscale and multidisciplinary data. The HiWATER observation system consists of a flux observation matrix of eddy covariance towers, large-aperture scintillometers, and automatic meteorological stations; an eco-hydrological sensor network for soil moisture and leaf area index; hyper-resolution airborne remote sensing using LiDAR, an imaging spectrometer, a multi-angle thermal imager, and an L-band microwave radiometer; and synchronous ground measurements of vegetation dynamics and photosynthesis processes. All observational data were carefully quality controlled throughout sensor calibration, data collection, data processing, and dataset generation. The data are freely available at figshare and the Cold and Arid Regions Science Data Centre. They should be useful for elucidating multiscale eco-hydrological processes and developing upscaling methods.

  17. Comparison of CORA and EN4 in-situ datasets validation methods, toward a better quality merged dataset.

    Science.gov (United States)

    Szekely, Tanguy; Killick, Rachel; Gourrion, Jerome; Reverdin, Gilles

    2017-04-01

CORA and EN4 are both global, delayed-mode, validated in-situ ocean temperature and salinity datasets distributed by the Met Office (http://www.metoffice.gov.uk/) and Copernicus (www.marine.copernicus.eu). A large part of the profiles distributed by CORA and EN4 in recent years are Argo profiles from the Argo DAC, but profiles are also extracted from the World Ocean Database, and TESAC profiles from GTSPP. In the case of CORA, data coming from the EuroGOOS Regional Operational Observing Systems (ROOS) operated by European institutes not managed by National Data Centres, as well as other profile datasets provided by scientific sources, can also be found (sea mammal profiles from MEOP, XBT datasets from cruises, ...). (EN4 also takes data from the ASBO dataset to supplement observations in the Arctic.) The first advantage of this new merged product is to enhance the space and time coverage at global and European scales for the period from 1950 until the year before the current year. This product is updated once a year, and T&S gridded fields are also generated for the period from 1990 to year n-1. The enhancement compared to the previous CORA product will be presented. Although the profiles distributed by both datasets are mostly the same, the quality control procedures developed by the Met Office and Copernicus teams differ, sometimes leading to different quality control flags for the same profile. In 2016, a new study started that aims to compare both validation procedures and to move towards a Copernicus Marine Service dataset with the best features of CORA and EN4 validation. A reference dataset composed of the full set of in-situ temperature and salinity measurements collected by Coriolis during 2015 is used. These measurements were made with a wide range of instruments (XBTs, CTDs, Argo floats, instrumented sea mammals, ...), covering the global ocean. The reference dataset has been validated simultaneously by both teams. An exhaustive comparison of the...

  18. ATLANTIC BATS: a dataset of bat communities from the Atlantic Forests of South America.

    Science.gov (United States)

    de Lara Muylaert, Renata; Stevens, Richard D; Esbérard, Carlos Eduardo Lustosa; Mello, Marco Aurelio Ribeiro; Garbino, Guilherme Siniciato Terra; Varzinczak, Luiz H; Faria, Deborah; de Moraes Weber, Marcelo; Kerches Rogeri, Patricia; Regolin, André Luis; de Oliveira, Hernani Fernandes Magalhães; Costa, Luciana de Moraes; Barros, Marília A S; Sabino-Santos, Gilberto; Crepaldi de Morais, Mara Ariane; Kavagutti, Vinicius Silva; Passos, Fernando C; Marjakangas, Emma-Liina; Maia, Felipe Gonçalves Motta; Ribeiro, Milton Cezar; Galetti, Mauro

    2017-09-06

Bats are the second most diverse mammal order and they provide vital ecosystem functions (e.g., pollination, seed dispersal, and nutrient flux in caves) and services (e.g., crop pest suppression). Bats are also important vectors of infectious diseases, harboring more than 100 different virus types. In the present study, we compiled information on bat communities from the Atlantic Forests of South America, a species-rich biome that is highly threatened by habitat loss and fragmentation. The ATLANTIC BATS dataset comprises 135 quantitative studies carried out in 205 sites, which cover most vegetation types of the tropical and subtropical Atlantic Forest: dense ombrophilous forest, mixed ombrophilous forest, semideciduous forest, deciduous forest, savanna, steppe, and open ombrophilous forest. The dataset includes information on more than 90,000 captures of 98 bat species of 8 families. Species richness averaged 12.1 per site, with a median value of 10 species (ranging from 1 to 53 species). Six species occurred in more than 50% of the communities: Artibeus lituratus, Carollia perspicillata, Sturnira lilium, Artibeus fimbriatus, Glossophaga soricina, and Platyrrhinus lineatus. The number of captures divided by sampling effort, a proxy for abundance, varied from 0.000001 to 0.77 individuals/(hour·m²) (0.04 ± 0.007 individuals/(hour·m²)). Our dataset reveals a hyper-dominance of eight species that together comprise 80% of all captures: Platyrrhinus lineatus (2.3%), Molossus molossus (2.8%), Artibeus obscurus (3.4%), Artibeus planirostris (5.2%), Artibeus fimbriatus (7%), Sturnira lilium (14.5%), Carollia perspicillata (15.6%), and Artibeus lituratus (29.2%). This article is protected by copyright. All rights reserved.

  19. Global dataset of biogenic VOC emissions calculated by the MEGAN model over the last 30 years

    Directory of Open Access Journals (Sweden)

    K. Sindelarova

    2014-04-01

The Model of Emissions of Gases and Aerosols from Nature (MEGANv2.1), together with the Modern-Era Retrospective Analysis for Research and Applications (MERRA) meteorological fields, was used to create a global emission dataset of biogenic volatile organic compounds (BVOC), available on a monthly basis for the time period 1980–2010. This dataset is called MEGAN-MACC. The model estimated a mean annual total BVOC emission of 760 Tg C yr−1, consisting of isoprene (70%), monoterpenes (11%), methanol (6%), acetone (3%), sesquiterpenes (2.5%), and other BVOC species each contributing less than 2%. Several sensitivity model runs were performed to study the impact of different model inputs and model settings on isoprene estimates, and resulted in differences of up to ±17% of the reference isoprene total. A greater impact was observed for a sensitivity run applying a parameterization of soil moisture deficit, which led to a 50% reduction of isoprene emissions on a global scale, most significantly in specific regions of Africa, South America and Australia. MEGAN-MACC estimates are comparable to results of previous studies. More detailed comparison with other isoprene inventories indicated significant spatial and temporal differences between the datasets, especially for Australia, Southeast Asia and South America. MEGAN-MACC estimates of isoprene, α-pinene and the group of monoterpenes showed reasonable agreement with surface flux measurements at sites located in tropical forests in the Amazon and Malaysia. The model was able to capture the seasonal variation of isoprene emissions in the Amazon forest.

  1. E-Flux2 and SPOT: Validated Methods for Inferring Intracellular Metabolic Flux Distributions from Transcriptomic Data.

    Directory of Open Access Journals (Sweden)

    Min Kyung Kim

Several methods have been developed to predict system-wide and condition-specific intracellular metabolic fluxes by integrating transcriptomic data with genome-scale metabolic models. While powerful in many settings, existing methods have several shortcomings, and it is unclear which method has the best accuracy in general because of limited validation against experimentally measured intracellular fluxes. We present a general optimization strategy for inferring intracellular metabolic flux distributions from transcriptomic data coupled with genome-scale metabolic reconstructions. It consists of two different template models, called DC (determined carbon source model) and AC (all possible carbon sources model), and two different new methods, called E-Flux2 (E-Flux method combined with minimization of the l2 norm) and SPOT (Simplified Pearson cOrrelation with Transcriptomic data), which can be chosen and combined depending on the availability of knowledge on the carbon source or objective function. This enables us to simulate a broad range of experimental conditions. We examined E. coli and S. cerevisiae as representative prokaryotic and eukaryotic microorganisms, respectively. The predictive accuracy of our algorithm was validated by calculating the uncentered Pearson correlation between predicted fluxes and measured fluxes. To this end, we compiled 20 experimental conditions (11 in E. coli and 9 in S. cerevisiae) of transcriptome measurements coupled with corresponding central carbon metabolism intracellular flux measurements determined by 13C metabolic flux analysis (13C-MFA), which is the largest dataset assembled to date for the purpose of validating inference methods for predicting intracellular fluxes. In both organisms, our method achieves an average correlation coefficient ranging from 0.59 to 0.87, outperforming a representative sample of competing methods. Easy-to-use implementations of E-Flux2 and SPOT are available as part of the open-source package...
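
The validation metric named above, the uncentered Pearson correlation, is simply the cosine similarity of the raw (not mean-subtracted) flux vectors; a one-line implementation using its standard definition:

```python
# Uncentered Pearson correlation between predicted and measured flux vectors:
# sum(x*y) / sqrt(sum(x^2) * sum(y^2)), i.e. no mean subtraction.
import numpy as np

def uncentered_pearson(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    return (x @ y) / np.sqrt((x @ x) * (y @ y))

predicted = np.array([1.2, 0.8, 3.1, 0.0, 2.2])   # toy flux values
measured = np.array([1.0, 1.1, 2.8, 0.2, 2.5])
print(uncentered_pearson(predicted, measured))     # close to 1 for good agreement
```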

  2. The LANDFIRE Refresh strategy: updating the national dataset

    Science.gov (United States)

    Nelson, Kurtis J.; Connot, Joel A.; Peterson, Birgit E.; Martin, Charley

    2013-01-01

    The LANDFIRE Program provides comprehensive vegetation and fuel datasets for the entire United States. As with many large-scale ecological datasets, vegetation and landscape conditions must be updated periodically to account for disturbances, growth, and natural succession. The LANDFIRE Refresh effort was the first attempt to consistently update these products nationwide. It incorporated a combination of specific systematic improvements to the original LANDFIRE National data, remote sensing based disturbance detection methods, field collected disturbance information, vegetation growth and succession modeling, and vegetation transition processes. This resulted in the creation of two complete datasets for all 50 states: LANDFIRE Refresh 2001, which includes the systematic improvements, and LANDFIRE Refresh 2008, which includes the disturbance and succession updates to the vegetation and fuel data. The new datasets are comparable for studying landscape changes in vegetation type and structure over a decadal period, and provide the most recent characterization of fuel conditions across the country. The applicability of the new layers is discussed and the effects of using the new fuel datasets are demonstrated through a fire behavior modeling exercise using the 2011 Wallow Fire in eastern Arizona as an example.

  3. Securely measuring the overlap between private datasets with cryptosets.

    Science.gov (United States)

    Swamidass, S Joshua; Matlock, Matthew; Rozenblit, Leon

    2015-01-01

    Many scientific questions are best approached by sharing data--collected by different groups or across large collaborative networks--into a combined analysis. Unfortunately, some of the most interesting and powerful datasets--like health records, genetic data, and drug discovery data--cannot be freely shared because they contain sensitive information. In many situations, knowing if private datasets overlap determines if it is worthwhile to navigate the institutional, ethical, and legal barriers that govern access to sensitive, private data. We report the first method of publicly measuring the overlap between private datasets that is secure under a malicious model without relying on private protocols or message passing. This method uses a publicly shareable summary of a dataset's contents, its cryptoset, to estimate its overlap with other datasets. Cryptosets approach "information-theoretic" security, the strongest type of security possible in cryptography, which is not even crackable with infinite computing power. We empirically and theoretically assess both the accuracy of these estimates and the security of the approach, demonstrating that cryptosets are informative, with a stable accuracy, and secure.
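
One generic way such a public summary can work (a sketch of the idea only, not necessarily the authors' exact construction): hash each private identifier into a fixed-length vector of counts, publish the counts, and estimate overlap from the inner product of two summaries after correcting for chance collisions.

```python
# Sketch of a cryptoset-style overlap estimate. Under uniform hashing into L
# buckets, E[c1 . c2] ~ overlap + n1*n2/L, so overlap ~= c1 . c2 - n1*n2/L.
import hashlib
import numpy as np

L = 1024  # summary length; longer summaries give tighter estimates

def cryptoset(ids):
    counts = np.zeros(L, dtype=int)
    for s in ids:
        counts[int(hashlib.sha256(s.encode()).hexdigest(), 16) % L] += 1
    return counts

a = {f"patient{i}" for i in range(2000)}
b = {f"patient{i}" for i in range(1500, 3500)}      # true overlap: 500
ca, cb = cryptoset(a), cryptoset(b)
estimate = ca @ cb - len(a) * len(b) / L
print(f"estimated overlap: {estimate:.0f} (true: {len(a & b)})")
```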

  4. The KUSC Classical Music Dataset for Audio Key Finding

    Directory of Open Access Journals (Sweden)

    Ching-Hua Chuan

    2014-08-01

In this paper, we present a benchmark dataset based on the KUSC classical music collection and provide baseline key-finding comparison results. Audio key finding is a basic music information retrieval task; it forms an essential component of systems for music segmentation, similarity assessment, and mood detection. Due to copyright restrictions and a labor-intensive annotation process, audio key-finding algorithms have to date only been evaluated on small proprietary datasets. To create a common base for systematic comparisons, we have constructed a dataset comprising more than 3,000 excerpts of classical music. The excerpts are made publicly accessible via commonly used acoustic features such as pitch-based spectrograms and chromagrams. We introduce a hybrid annotation scheme that combines the use of title keys with expert validation and correction of only the challenging cases. The expert musicians also provide ratings of key recognition difficulty. Other metadata include instrumentation. As a demonstration of the use of the dataset, and to provide initial benchmark comparisons for evaluating new algorithms, we conduct a series of experiments reporting the key determination accuracy of four state-of-the-art algorithms. We further show the importance of considering factors such as estimated tuning frequency, key strength or confidence value, and key recognition difficulty in key finding. In the future, we plan to expand the dataset to include metadata for other music information retrieval tasks.
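
As a point of reference for what audio key finding involves, the classic Krumhansl-Schmuckler template-matching baseline correlates an excerpt's averaged chroma vector against rotated major and minor key profiles. The sketch below is that common baseline, not one of the four algorithms benchmarked in the paper; the profile values are the widely cited Krumhansl-Kessler ratings.

```python
# Krumhansl-Schmuckler style key finding from a 12-bin chroma vector.
import numpy as np

MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                  2.52, 5.19, 2.39, 3.66, 2.29, 2.88])  # Krumhansl-Kessler major
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                  2.54, 4.75, 3.98, 2.69, 3.34, 3.17])  # Krumhansl-Kessler minor
NAMES = "C C# D D# E F F# G G# A A# B".split()

def find_key(chroma):
    """chroma: length-12 pitch-class energy vector averaged over the excerpt."""
    return max(
        ((np.corrcoef(np.roll(profile, shift), chroma)[0, 1],
          NAMES[shift] + suffix)
         for profile, suffix in ((MAJOR, " major"), (MINOR, " minor"))
         for shift in range(12)),
        key=lambda t: t[0],
    )[::-1]                                  # e.g. ("G major", 0.99)

chroma = np.roll(MAJOR, 7) + np.random.rand(12) * 0.1   # toy G-major-ish input
print(find_key(chroma))
```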

  5. Determining the Real Data Completeness of a Relational Dataset

    Institute of Scientific and Technical Information of China (English)

    Yong-Nan Liu; Jian-Zhong Li; Zhao-Nian Zou

    2016-01-01

Low quality of data is a serious problem in the new era of big data; it can severely reduce the usability of data, mislead or bias querying, analysis and mining, and lead to huge losses. Incomplete data is common in low-quality data, and it is necessary to determine the data completeness of a dataset to provide hints for follow-up operations on it. Little existing work focuses on the completeness of a dataset, and such work views all missing values as unknown values. In this paper, we study how to determine the real data completeness of a relational dataset. By taking advantage of given functional dependencies, we aim to determine some missing attribute values from other tuples and to capture the really missing attribute cells. We propose a data completeness model, formalize the problem of determining the real data completeness of a relational dataset, and give a lower bound on the time complexity of this problem. Two optimal algorithms to determine the data completeness of a dataset for different cases are proposed. We empirically show the effectiveness and scalability of our algorithms on both real-world data and synthetic data.
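
The central idea can be shown in a few lines: with a functional dependency zip → city, a missing city is recoverable whenever another tuple agrees on zip, and anything left over is really missing. The toy relation and attribute names below are illustrative, not the paper's algorithms.

```python
# FD-based completeness sketch: recover missing values via zip -> city, then
# count the cells that remain really missing.
rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": None},      # recoverable via FD zip -> city
    {"zip": "99999", "city": None},      # really missing
]

known = {r["zip"]: r["city"] for r in rows if r["city"] is not None}
recovered = really_missing = 0
for r in rows:
    if r["city"] is None:
        if r["zip"] in known:
            r["city"] = known[r["zip"]]
            recovered += 1
        else:
            really_missing += 1

complete = sum(r["city"] is not None for r in rows)
print(f"completeness: {complete}/{len(rows)} "
      f"({recovered} recovered, {really_missing} really missing)")
```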

  6. Spatial Accuracy Assessment and Integration of Global Land Cover Datasets

    Directory of Open Access Journals (Sweden)

    Nandin-Erdene Tsendbazar

    2015-11-01

Along with the creation of new maps, current efforts for improving global land cover (GLC) maps focus on integrating maps by accounting for their relative merits, e.g., agreement amongst maps or map accuracy. Such integration efforts may benefit from the use of multiple GLC reference datasets. Using available reference datasets, this study assesses the spatial accuracy of recent GLC maps and compares methods for creating an improved land cover (LC) map. Spatial correspondence with a reference dataset was modeled for the Globcover-2009, Land Cover-CCI-2010, MODIS-2010 and Globeland30 maps for Africa. Using different scenarios concerning the input data used, five integration methods for an improved LC map were tested and cross-validated. Comparison of the spatial correspondences showed that the preferences for GLC maps varied spatially. Integration methods using both the GLC maps and reference data at their locations resulted in 4.5%–13% higher correspondence with the reference LC than any of the input GLC maps. An integrated LC map and LC class probability maps were computed using regression kriging, which produced the highest correspondence (76%). Our results demonstrate the added value of using reference datasets and geostatistics for improving GLC maps. This approach is useful as more GLC reference datasets are becoming publicly available and their reuse is being encouraged.

  7. Gaseous mercury flux from salt marshes is mediated by solar radiation and temperature

    Science.gov (United States)

    Sizmur, Tom; McArthur, Gordon; Risk, David; Tordon, Robert; O'Driscoll, Nelson J.

    2017-03-01

Salt marshes are ecologically sensitive ecosystems where mercury (Hg) methylation and biomagnification can occur. Understanding the mechanisms controlling gaseous Hg flux from salt marshes is important for predicting the retention of Hg in coastal wetlands and projecting the impact of environmental change on the global Hg cycle. We monitored Hg flux from a remote salt marsh over 9 days, which included three cloudless days and a 4 mm rainfall event. We observed a cyclical diel relationship between Hg flux and solar radiation. When measurements at the same irradiance intensity are considered, Hg flux was greater in the evening, when the sediment was warm, than in the morning, when the sediment was cool. This is evidence that both solar radiation and sediment temperature directly influence the rate of Hg(II) photoreduction in salt marshes. Hg flux could be predicted from solar radiation and sediment temperature in sub-datasets collected during cloudless days (R2 = 0.99), and before (R2 = 0.97) and after (R2 = 0.95) the rainfall event, but a model fit to the combined dataset could not account for the lower Hg flux after the rainfall event, which contrasts with the greater Hg flux observed from soils after rainfall events.

  8. Modeling water and carbon fluxes above summer maize field in North China Plain with back-propagation neural networks

    Institute of Scientific and Technical Information of China (English)

    QIN Zhong; SU Gao-li; YU Qiang; HU Bing-min; LI Jun

    2005-01-01

In this work, datasets of water and carbon fluxes measured with the eddy covariance technique above a summer maize field in the North China Plain were simulated with artificial neural networks (ANNs) to explore the responses of the fluxes to local environmental variables. The results showed that photosynthetically active radiation (PAR), vapor pressure deficit (VPD), air temperature (T) and leaf area index (LAI) were the primary factors regulating both water vapor and carbon dioxide fluxes. Three-layer back-propagation (BP) neural networks could be applied to model the flux exchange between the cropland surface and the atmosphere without using detailed physiological information or plant-specific parameters.
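
A sketch of the described setup, mapping (PAR, VPD, T, LAI) to a flux with a single hidden layer (i.e. a "three-layer" network), here using scikit-learn's MLPRegressor as a stand-in for a hand-rolled BP network; the data and coefficients are synthetic.

```python
# Three-layer BP network sketch for flux modeling (synthetic data).
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 2000
par = rng.uniform(0, 2000, n)          # photosynthetically active radiation
vpd = rng.uniform(0, 4, n)             # vapour pressure deficit
temp = rng.uniform(5, 35, n)           # air temperature
lai = rng.uniform(0.5, 6, n)           # leaf area index
X = np.column_stack([par, vpd, temp, lai])
# Synthetic flux response in arbitrary units (invented functional form).
flux = (-0.01 * par * lai / (1 + vpd) + 0.05 * temp) / 10 + rng.normal(0, 0.3, n)

Xs = StandardScaler().fit_transform(X)
net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000,
                   random_state=0).fit(Xs, flux)   # one hidden layer
print("R^2 on training data:", net.score(Xs, flux))
```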

  9. Modeling the latent dimensions of multivariate signaling datasets

    Science.gov (United States)

    Jensen, Karin J.; Janes, Kevin A.

    2012-08-01

    Cellular signal transduction is coordinated by modifications of many proteins within cells. Protein modifications are not independent, because some are connected through shared signaling cascades and others jointly converge upon common cellular functions. This coupling creates a hidden structure within a signaling network that can point to higher level organizing principles of interest to systems biology. One can identify important covariations within large-scale datasets by using mathematical models that extract latent dimensions—the key structural elements of a measurement set. In this paper, we introduce two principal component-based methods for identifying and interpreting latent dimensions. Principal component analysis provides a starting point for unbiased inspection of the major sources of variation within a dataset. Partial least-squares regression reorients these dimensions toward a specific hypothesis of interest. Both approaches have been used widely in studies of cell signaling, and they should be standard analytical tools once highly multivariate datasets become straightforward to accumulate.
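
Both methods named above are a few lines each with scikit-learn; the "signaling measurements" below are synthetic stand-ins for a conditions-by-proteins matrix, and the variable names are illustrative.

```python
# PCA for unbiased inspection of variation; PLS regression to reorient the
# latent dimensions toward a response of interest (synthetic data).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 12))            # 30 conditions x 12 protein measurements
y = X[:, 0] - 0.5 * X[:, 3] + rng.normal(0, 0.1, 30)   # e.g. a phenotype readout

pca = PCA(n_components=2).fit(X)
print("variance explained:", pca.explained_variance_ratio_)  # unbiased structure

pls = PLSRegression(n_components=2).fit(X, y)
scores = pls.transform(X)                 # latent dimensions oriented toward y
print("PLS R^2:", pls.score(X, y))
```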

  10. Nanomaterial datasets to advance tomography in scanning transmission electron microscopy

    CERN Document Server

    Levin, Barnaby D A; Chen, Chien-Chun; Scott, M C; Xu, Rui; Theis, Wolfgang; Jiang, Yi; Yang, Yongsoo; Ophus, Colin; Zhang, Haitao; Ha, Don-Hyung; Wang, Deli; Yu, Yingchao; Abruna, Hector D; Robinson, Richard D; Ercius, Peter; Kourkoutis, Lena F; Miao, Jianwei; Muller, David A; Hovden, Robert

    2016-01-01

Electron tomography in materials science has flourished with the demand to characterize nanoscale materials in three dimensions (3D). Access to experimental data is vital for developing and validating reconstruction methods that improve resolution and reduce radiation dose requirements. This work presents five high-quality scanning transmission electron microscope (STEM) tomography datasets in order to address the critical need for open access data in this field. The datasets represent the current limits of experimental technique, are of high quality, and contain materials with structural complexity. Included are tomographic series of a hyperbranched Co2P nanocrystal, platinum nanoparticles on a carbon nanofibre imaged over the complete 180° tilt range, a platinum nanoparticle and a tungsten needle both imaged at atomic resolution by equal slope tomography, and a through-focal tilt series of PtCu nanoparticles. A volumetric reconstruction from every dataset is provided for comparison and development of p...

  11. Dataset of transcriptional landscape of B cell early activation

    Directory of Open Access Journals (Sweden)

    Alexander S. Garruss

    2015-09-01

Signaling via B cell receptors (BCRs) and Toll-like receptors (TLRs) results in activation of B cells with distinct physiological outcomes, but the transcriptional regulatory mechanisms that drive activation and distinguish these pathways remain unknown. At early time points after BCR and TLR ligand exposure, 0.5 and 2 h, RNA-seq was performed, allowing observations on rapid transcriptional changes. At 2 h, ChIP-seq was performed to allow observations on important regulatory mechanisms potentially driving transcriptional change. The dataset includes RNA-seq; ChIP-seq of control (input), RNA Pol II, H3K4me3, and H3K27me3; and a separate RNA-seq for miRNA expression, which can be found at Gene Expression Omnibus Dataset GSE61608. Here, we provide details on the experimental and analysis methods used to obtain and analyze this dataset and to examine the transcriptional landscape of B cell early activation.

  12. Robust Machine Learning Applied to Terascale Astronomical Datasets

    CERN Document Server

    Ball, Nicholas M; Myers, Adam D

    2008-01-01

    We present recent results from the LCDM (Laboratory for Cosmological Data Mining; http://lcdm.astro.uiuc.edu) collaboration between UIUC Astronomy and NCSA to deploy supercomputing cluster resources and machine learning algorithms for the mining of terascale astronomical datasets. This is a novel application in the field of astronomy, because we are using such resources for data mining, and not just performing simulations. Via a modified implementation of the NCSA cyberenvironment Data-to-Knowledge, we are able to provide improved classifications for over 100 million stars and galaxies in the Sloan Digital Sky Survey, improved distance measures, and a full exploitation of the simple but powerful k-nearest neighbor algorithm. A driving principle of this work is that our methods should be extensible from current terascale datasets to upcoming petascale datasets and beyond. We discuss issues encountered to-date, and further issues for the transition to petascale. In particular, disk I/O will become a major limit...

  13. The Global Precipitation Climatology Project (GPCP) Combined Precipitation Dataset

    Science.gov (United States)

    Huffman, George J.; Adler, Robert F.; Arkin, Philip; Chang, Alfred; Ferraro, Ralph; Gruber, Arnold; Janowiak, John; McNab, Alan; Rudolf, Bruno; Schneider, Udo

    1997-01-01

The Global Precipitation Climatology Project (GPCP) has released the GPCP Version 1 Combined Precipitation Data Set, a global, monthly precipitation dataset covering the period July 1987 through December 1995. The primary product in the dataset is a merged analysis incorporating precipitation estimates from low-orbit-satellite microwave data, geosynchronous-orbit-satellite infrared data, and rain gauge observations. The dataset also contains the individual input fields, a combination of the microwave and infrared satellite estimates, and error estimates for each field. The data are provided on 2.5 deg x 2.5 deg latitude-longitude global grids. Preliminary analyses show general agreement with prior studies of global precipitation and extend prior studies of El Nino-Southern Oscillation precipitation patterns. At the regional scale there are systematic differences with standard climatologies.
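
A standard way to merge estimates that carry error estimates, sketched here with invented numbers, is inverse-error-variance weighting; this illustrates the general technique only, not the GPCP production algorithm.

```python
# Inverse-error-variance merging sketch: inputs with smaller error estimates
# get larger weights; missing inputs (NaN) get zero weight.
import numpy as np

est = np.array([[3.1, 0.4, 7.9],        # microwave estimate, mm/day
                [2.7, 0.6, 9.2],        # infrared estimate
                [2.9, np.nan, 8.4]])    # gauge (missing over ocean)
err = np.array([[0.8, 0.3, 1.5],        # corresponding error estimates
                [1.0, 0.4, 2.0],
                [0.4, np.nan, 0.9]])

w = 1.0 / err**2                        # inverse-variance weights
w[np.isnan(est)] = 0.0                  # drop missing inputs
merged = (w * np.nan_to_num(est)).sum(axis=0) / w.sum(axis=0)
print(merged)                           # one merged value per grid cell
```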

  14. Review Studies for the ATLAS Open Data Dataset

    CERN Document Server

    The ATLAS collaboration

    2016-01-01

This document presents approval plots from selected analyses using the ATLAS Open Data dataset. This dataset, containing 1 fb−1 of 8 TeV data collected by ATLAS along with a selection of Monte Carlo simulated events, is intended to be released to the public for educational use only, alongside tools to enable students to get started quickly and easily. The corrections applied to the Monte Carlo have been simplified for the purposes of the intended use and to reduce processing time, and the approval plots should indicate clearly any reasons for disagreement between Monte Carlo and data. As the dataset is for educational purposes only, although some low-statistics analyses can be done and the educational objectives achieved, it will be clear that the user cannot use it beyond this use case due to the low statistics.

  15. Identifying Differentially Abundant Metabolic Pathways in Metagenomic Datasets

    Science.gov (United States)

    Liu, Bo; Pop, Mihai

Enabled by rapid advances in sequencing technology, metagenomic studies aim to characterize entire communities of microbes, bypassing the need to culture individual bacterial members. One major goal of such studies is to identify specific functional adaptations of microbial communities to their habitats. Here we describe a powerful analytical method (MetaPath) that can identify differentially abundant pathways in metagenomic datasets, relying on a combination of metagenomic sequence data and prior metabolic pathway knowledge. We show that MetaPath outperforms other common approaches when evaluated on simulated datasets. We also demonstrate the power of our methods in analyzing two publicly available metagenomic datasets: a comparison of the gut microbiome of obese and lean twins, and a comparison of the gut microbiome of infant and adult subjects. We demonstrate that the subpathways identified by our method provide valuable insights into the biological activities of the microbiome.

  16. ESTATE: Strategy for Exploring Labeled Spatial Datasets Using Association Analysis

    Science.gov (United States)

    Stepinski, Tomasz F.; Salazar, Josue; Ding, Wei; White, Denis

We propose an association analysis-based strategy for the exploration of multi-attribute spatial datasets possessing a naturally arising classification. The proposed strategy, ESTATE (Exploring Spatial daTa Association patTErns), inverts such classification by interpreting the different classes found in the dataset in terms of sets of discriminative patterns of its attributes. It consists of several core steps including discriminative data mining, similarity between transactional patterns, and visualization. An algorithm for calculating a similarity measure between patterns is the major original contribution; it facilitates summarization of discovered information and makes the entire framework practical for real-life applications. A detailed description of the ESTATE framework is followed by its application to the domain of ecology, using a dataset that fuses information on the geographical distribution of the biodiversity of bird species across the contiguous United States with the distributions of 32 environmental variables across the same area.

  17. A cross-country Exchange Market Pressure (EMP) dataset.

    Science.gov (United States)

    Desai, Mohit; Patnaik, Ila; Felman, Joshua; Shah, Ajay

    2017-06-01

The data presented in this article are related to the research article titled "An exchange market pressure measure for cross country analysis" (Patnaik et al. [1]). In this article, we present the dataset of Exchange Market Pressure (EMP) values for 139 countries, along with their conversion factors, ρ (rho). Exchange Market Pressure, expressed as a percentage change in the exchange rate, measures the change in the exchange rate that would have taken place had the central bank not intervened. The conversion factor ρ can be interpreted as the change in the exchange rate associated with $1 billion of intervention. Estimates of the conversion factor ρ allow us to calculate a monthly time series of EMP for 139 countries. Additionally, the dataset contains the 68% confidence intervals (high and low values) for the point estimates of ρ. Using the standard errors of the estimates of ρ, we obtain one-sigma intervals around the mean estimates of EMP values. These values are also reported in the dataset.
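
On the reading described above, a monthly EMP value combines the observed percentage change in the exchange rate with ρ times the intervention in $ billions. A toy calculation follows; the numbers are invented and sign conventions vary by study.

```python
# Toy EMP computation under the description above (invented numbers; sign
# conventions for depreciation and intervention vary across studies).
delta_e_pct = -1.2   # observed exchange-rate change this month, in percent
rho = 0.8            # % change associated with $1 bn of intervention
intervention = 2.5   # $ bn sold by the central bank to defend the currency

emp = delta_e_pct + rho * intervention
print(f"EMP = {emp:.2f}%")   # pressure that intervention kept out of the rate
```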

  18. Modelling and analysis of turbulent datasets using ARMA processes

    CERN Document Server

    Faranda, Davide; Dubrulle, Bérèngere; Daviaud, François; Saint-Michel, Brice; Herbert, Éric; Cortet, Pierre-Philippe

    2014-01-01

    We introduce a novel way to extract information from turbulent datasets by applying an ARMA statistical analysis. Such an analysis goes well beyond the analysis of the mean flow and of the fluctuations, and links the behavior of the recorded time series to a discrete version of a stochastic differential equation which is able to describe the correlation structure of the dataset. We introduce a new intermittency parameter Υ that measures the difference between the resulting analysis and the Obukhov model of turbulence, the simplest stochastic model reproducing both the Richardson law and the Kolmogorov spectrum. We test the method on datasets measured in a von Kármán swirling flow experiment. We find that the ARMA analysis is well correlated with spatial structures of the flow, and can discriminate between two different flows with comparable mean velocities, obtained by changing the forcing. Moreover, we show that the intermittency parameter is highest in regions where shear layer vortices are present, t...
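
    A minimal sketch of the ARMA-fitting step with statsmodels, selecting the order (p, q) by BIC on a synthetic stationary series; the intermittency parameter Υ is defined in the paper and not reproduced here:

        import numpy as np
        from statsmodels.tsa.arima.model import ARIMA

        # synthetic AR(1) record standing in for a measured velocity time series
        rng = np.random.default_rng(0)
        x = np.zeros(4096)
        eps = rng.standard_normal(4096)
        for k in range(1, len(x)):
            x[k] = 0.9 * x[k - 1] + eps[k]

        # ARMA(p, q) is ARIMA(p, 0, q); scan small orders and keep the best by BIC
        bic = {}
        for p in (1, 2, 3):
            for q in (0, 1, 2):
                bic[(p, q)] = ARIMA(x, order=(p, 0, q)).fit().bic
        print("best (p, q) by BIC:", min(bic, key=bic.get))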

  19. Permanent magnet flux-biased magnetic actuator with flux feedback

    Science.gov (United States)

    Groom, Nelson J. (Inventor)

    1991-01-01

    The invention is a permanent magnet flux-biased magnetic actuator with flux feedback for adjustably suspending an element on a single axis. The magnetic actuator includes a pair of opposing electromagnets and provides bi-directional forces along the single axis to the suspended element. Permanent magnets in flux feedback loops from the opposing electromagnets establish a reference permanent magnet flux-bias to linearize the force characteristics of the electromagnets to extend the linear range of the actuator without the need for continuous bias currents in the electromagnets.

  20. Braneworld Flux Inflation

    CERN Document Server

    Kanno, S; Wands, D; Kanno, Sugumi; Soda, Jiro; Wands, David

    2005-01-01

    We propose a geometrical model of brane inflation where inflation is driven by the flux generated by opposing brane charges and terminated by the collision of the branes, with charge annihilation. We assume the collision process is completely inelastic and that the kinetic energy is transformed into thermal energy after the collision. Thereafter the two branes coalesce and behave as a single brane universe with zero effective cosmological constant. In the Einstein frame, the 4-dimensional effective theory changes abruptly at the collision point. Therefore, our inflationary model is necessarily 5-dimensional in nature. As the collision process has no singularity in 5-dimensional gravity, we can follow the evolution of fluctuations during the whole history of the universe. It turns out that the radion field fluctuations have a steeply tilted, red spectrum, while the primordial gravitational waves have a flat spectrum. Instead, primordial density perturbations could be generated by a curvaton mechanism.

  1. Extension of research data repository system to support direct compute access to biomedical datasets: enhancing Dataverse to support large datasets.

    Science.gov (United States)

    McKinney, Bill; Meyer, Peter A; Crosas, Mercè; Sliz, Piotr

    2017-01-01

    Access to experimental X-ray diffraction image data is important for validation and reproduction of macromolecular models and indispensable for the development of structural biology processing methods. In response to the evolving needs of the structural biology community, we recently established a diffraction data publication system, the Structural Biology Data Grid (SBDG, data.sbgrid.org), to preserve primary experimental datasets supporting scientific publications. All datasets published through the SBDG are freely available to the research community under a public domain dedication license, with metadata compliant with the DataCite Schema (schema.datacite.org). A proof-of-concept study demonstrated community interest and utility. Publication of large datasets is a challenge shared by several fields, and the SBDG has begun collaborating with the Institute for Quantitative Social Science at Harvard University to extend the Dataverse (dataverse.org) open-source data repository system to structural biology datasets. Several extensions are necessary to support the size and metadata requirements for structural biology datasets. In this paper, we describe one such extension: functionality supporting preservation of file system structure within Dataverse, which is essential both for in-place computation and for supporting non-HTTP data transfers.

  2. Structural diversity of biologically interesting datasets: a scaffold analysis approach

    Directory of Open Access Journals (Sweden)

    Khanna Varun

    2011-08-01

    Full Text Available Abstract Background The recent public availability of the human metabolome and natural product datasets has revitalized "metabolite-likeness" and "natural product-likeness" as drug design concepts for designing lead libraries targeting specific pathways. Many reports have analyzed the physicochemical property space of biologically important datasets, but only a few have comprehensively characterized the scaffold diversity in public datasets of biological interest. With large collections of high-quality public data currently available, we carried out a comparative analysis of current-day leads with other biologically relevant datasets. Results In this study, we note a two-fold enrichment of metabolite scaffolds in the drug dataset (42%) as compared to currently used lead libraries (23%). We also note that only a small percentage (5%) of the natural product scaffold space is shared by the lead dataset. We have identified specific scaffolds that are present in metabolites and natural products, with close counterparts in the drugs, but missing from the lead dataset. To determine the distribution of compounds in physicochemical property space we analyzed the molecular polar surface area, the molecular solubility, the number of rings and the number of rotatable bonds, in addition to four well-known Lipinski properties. Here, we note that, with only a few exceptions, most of the drugs follow Lipinski's rule. The average values of the molecular polar surface area and the molecular solubility are highest in metabolites, while the number of rings is lowest. In addition, we note that natural products contain more rings and rotatable bonds than any other dataset under consideration. Conclusions Currently used lead libraries make little use of the metabolite and natural product scaffold space. We believe that metabolites and natural products are recognized by at least one protein in the biosphere; therefore, sampling the fragment and scaffold

  3. A synthetic Longitudinal Study dataset for England and Wales.

    Science.gov (United States)

    Dennett, Adam; Norman, Paul; Shelton, Nicola; Stuchbury, Rachel

    2016-12-01

    This article describes the new synthetic England and Wales Longitudinal Study 'spine' dataset designed for teaching and experimentation purposes. In the United Kingdom, there exist three Census-based longitudinal micro-datasets, known collectively as the Longitudinal Studies. The England and Wales Longitudinal Study (LS) is a 1% sample of the population of England and Wales (around 500,000 individuals), linking individual person records from the 1971 to 2011 Censuses. The synthetic data presented contain a similar number of individuals to the original data, and accurate longitudinal transitions between 2001 and 2011 for key demographic variables, but unlike the original data, they are open access.

  4. Towards interoperable and reproducible QSAR analyses: Exchange of datasets

    Directory of Open Access Journals (Sweden)

    Spjuth Ola

    2010-06-01

    Full Text Available Abstract Background QSAR is a widely used method to relate chemical structures to responses or properties based on experimental observations. Much effort has been made to evaluate and validate the statistical modeling in QSAR, but these analyses treat the dataset as fixed. An overlooked but highly important issue is the validation of the setup of the dataset, which comprises the addition of chemical structures as well as the selection of descriptors and software implementations prior to calculations. This process is hampered by the lack of standards and exchange formats in the field, making it virtually impossible to reproduce and validate analyses and drastically constraining collaborations and re-use of data. Results We present a step towards standardizing QSAR analyses by defining interoperable and reproducible QSAR datasets, consisting of an open XML format (QSAR-ML) which builds on an open and extensible descriptor ontology. The ontology provides an extensible way of uniquely defining descriptors for use in QSAR experiments, and the exchange format supports multiple versioned implementations of these descriptors. Hence, a dataset described by QSAR-ML makes its setup completely reproducible. We also provide a reference implementation as a set of plugins for Bioclipse which simplifies the setup of QSAR datasets, and allows for exporting in QSAR-ML as well as old-fashioned CSV formats. The implementation facilitates the addition of new descriptor implementations from locally installed software and remote Web services; the latter is demonstrated with REST and XMPP Web services. Conclusions Standardized QSAR datasets open up new ways to store, query, and exchange data for subsequent analyses. QSAR-ML supports completely reproducible creation of datasets, solving the problems of defining which software components were used and their versions, and the descriptor ontology eliminates confusion regarding descriptors by defining them crisply. This makes it easy to join

  5. An ensemble approach for feature selection of Cyber Attack Dataset

    CERN Document Server

    Singh, Shailendra

    2009-01-01

    Feature selection is an indispensable preprocessing step when mining huge datasets and can significantly improve overall system performance. Therefore, in this paper we focus on a hybrid approach to feature selection. This method falls into two phases: the filter phase selects the features with the highest information gain and guides the initialization of the search process for the wrapper phase, whose output is the final feature subset. The final feature subsets are passed through a K-nearest neighbor classifier for the classification of attacks. The effectiveness of this algorithm is demonstrated on the DARPA KDDCUP99 cyber attack dataset.
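
    A minimal sketch of the filter phase plus the final K-nearest neighbor classification with scikit-learn, using mutual information as a stand-in for information gain and synthetic data in place of KDDCUP99; the wrapper search phase is omitted:

        import numpy as np
        from sklearn.feature_selection import SelectKBest, mutual_info_classif
        from sklearn.model_selection import train_test_split
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.pipeline import make_pipeline

        rng = np.random.default_rng(1)
        X = rng.standard_normal((500, 41))   # 41 features, as in KDDCUP99
        y = rng.integers(0, 2, 500)          # synthetic attack / normal labels

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        clf = make_pipeline(
            SelectKBest(score_func=mutual_info_classif, k=10),  # filter phase
            KNeighborsClassifier(n_neighbors=5),                # classification
        )
        clf.fit(X_tr, y_tr)
        print("accuracy:", clf.score(X_te, y_te))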

  6. Southern Hemisphere strong polar mesoscale cyclones in high-resolution datasets

    Science.gov (United States)

    Pezza, Alexandre; Sadler, Katherine; Uotila, Petteri; Vihma, Timo; Mesquita, Michel D. S.; Reid, Phil

    2016-09-01

    Mesoscale cyclones are small low-pressure systems, among them "explosive cyclones" (i.e., systems that intensify rapidly but are not necessarily short lived). A short climatology (2009-2012) is obtained by using high-resolution (0.5°) Antarctic Mesoscale Prediction System (AMPS) mean sea level pressure. The results show a significant improvement in spatial detail compared to the 0.75° resolution ERA-Interim dataset, with a total count approximately 46% higher in AMPS. The subset of mesoscale cyclones that are explosive is small, with a total genesis number of about 13% that of polar lows. In addition, only about 1% of the polar lows are explosive, suggesting that cyclones that undergo rapid intensification tend to become larger and longer lived (and hence are no longer regarded as polar lows). Mesoscale cyclones are more frequent in winter, with a maximum concentration around the Antarctic but also occurring as far north as Tasmania and New Zealand. Analysis of sensible heat flux and sea ice extent anomalies during the genesis days shows that there is a large spread of genesis points over both positive and negative flux anomalies in winter, with a somewhat random pattern in the other seasons.

  7. ArcHydro 8-digit HUC datasets for Hawaii StreamStats

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — These datasets each consist of a workspace (folder) containing a collection of gridded datasets plus a personal geodatabase containing several vector datasets. These...

  8. ArcHydro 8-digit HUC datasets for Idaho StreamStats

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — These datasets consist of a workspace (folder) containing a collection of gridded datasets plus a personal geodatabase containing several vector datasets. These...

  9. Optimal fluxes and Reynolds stresses

    CERN Document Server

    Jimenez, Javier

    2016-01-01

    It is remarked that fluxes in conservation laws, such as the Reynolds stresses in the momentum equation of turbulent shear flows, or the spectral energy flux in isotropic turbulence, are only defined up to an arbitrary solenoidal field. While this is not usually significant for long-time averages, it becomes important when fluxes are modelled locally in large-eddy simulations, or in the analysis of intermittency and cascades. As an example, a numerical procedure is introduced to compute fluxes in scalar conservation equations in such a way that their total integrated magnitude is minimised. The result is an irrotational vector field that derives from a potential, thus minimising sterile flux 'circuits'. The algorithm is generalised to tensor fluxes and applied to the transfer of momentum in a turbulent channel. The resulting instantaneous Reynolds stresses are compared with their traditional expressions, and found to be substantially different.

  10. Heat Flux Apportionment to Heterogeneous Surfaces Using Flux Footprint Analysis

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    Heat flux data collected from the Baiyangdian Heterogeneous Field Experiment were analyzed using the footprint method. High-resolution (25 m) Landsat-5 satellite imaging was used to determine the land cover as one of four surface types: farmland, lake, wetland, or village. Data from two observation sites in September 2005 were used. One site (Wangjiazhai) was characterized by highly heterogeneous surfaces in the central lake/wetland area of the Baiyangdian. The other site (Xiongxian) was on land with more uniform surface cover. An improved Eulerian analytical flux footprint model was used to determine the "source areas" of the heat fluxes measured at towers located at each site from the surrounding landscapes of mixed surface types. In relative terms, results show that wetland and lake areas generally contributed most to the observed heat flux at Wangjiazhai, while farmland contributed most at Xiongxian. Given the areal distribution of surface-type contributions, calculations were made to obtain the magnitudes of the contributions of lake, wetland and farmland to the total observed flux, and to apportion the contribution of each surface type to the sensible and latent heat fluxes. Results show that on average the sensible heat fluxes from wetland and farmland were comparable over the diurnal cycle, while the latent heat flux from farmland was somewhat larger, by about 30-50 W m-2, during daytime. The latent and sensible fluxes from the lake source in daytime were about 50 W m-2 and 100 W m-2 less, respectively, than those from wetland and farmland. The results are judged reasonable and serve to demonstrate the potential for flux apportionment over heterogeneous surfaces.
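
    A minimal sketch of the apportionment step, assuming (as the abstract implies) that each observed flux is a footprint-weighted sum of per-surface-type fluxes; the weights would come from the footprint model and the land-cover map, and the numbers below are invented:

        import numpy as np

        # rows: observation periods; columns: farmland, lake, wetland, village
        W = np.array([[0.55, 0.15, 0.25, 0.05],
                      [0.30, 0.40, 0.20, 0.10],
                      [0.20, 0.50, 0.25, 0.05],
                      [0.45, 0.10, 0.30, 0.15],
                      [0.25, 0.25, 0.45, 0.05]])   # footprint fractions, rows sum to 1
        F_obs = np.array([155.0, 119.0, 102.5, 153.5, 124.0])  # observed flux, W m-2

        # Solve F_obs ≈ W @ F_surface in the least-squares sense
        F_surface, *_ = np.linalg.lstsq(W, F_obs, rcond=None)
        print(dict(zip(["farmland", "lake", "wetland", "village"],
                       F_surface.round(1))))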

  11. New Examples of Flux Vacua

    CERN Document Server

    Maxfield, Travis; Robbins, Daniel; Sethi, Savdeep

    2013-01-01

    Type IIB toroidal orientifolds are among the earliest examples of flux vacua. By applying T-duality, we construct the first examples of massive IIA flux vacua with Minkowski space-times, along with new examples of type IIA flux vacua. The backgrounds are surprisingly simple with no four-form flux at all. They serve as illustrations of the ingredients needed to build type IIA and massive IIA solutions with scale separation. To check that these backgrounds are actually solutions, we formulate the complete set of type II supergravity equations of motion in a very useful form that treats the R-R fields democratically.

  12. Heat Flux Instrumentation Laboratory (HFIL)

    Data.gov (United States)

    Federal Laboratory Consortium — Description: The Heat Flux Instrumentation Laboratory is used to develop advanced, flexible, thin film gauge instrumentation for the Air Force Research Laboratory....

  13. Using Multiple Big Datasets and Machine Learning to Produce a New Global Particulate Dataset: A Technology Challenge Case Study

    Science.gov (United States)

    Lary, D. J.

    2013-12-01

    A BigData case study is described where multiple datasets from several satellites, high-resolution global meteorological data, social media and in-situ observations are combined using machine learning on a distributed cluster using an automated workflow. The global particulate dataset is relevant to global public health studies and would not be possible to produce without the use of the multiple big datasets, in-situ data and machine learning. To greatly reduce the development time and enhance the functionality, a high-level language capable of parallel processing has been used (Matlab). Key considerations for the system are high-speed access, given the large data volume, persistence of the large data volumes, and a precise process-time scheduling capability.

  14. Quantifying the Observability of CO2 Flux Uncertainty in Atmospheric CO2 Records Using Products from Nasa's Carbon Monitoring Flux Pilot Project

    Science.gov (United States)

    Ott, Lesley; Pawson, Steven; Collatz, Jim; Watson, Gregg; Menemenlis, Dimitris; Brix, Holger; Rousseaux, Cecile; Bowman, Kevin; Bowman, Kevin; Liu, Junjie; Eldering, Annmarie; Gunson, Michael; Kawa, Stephan R.

    2014-01-01

    NASA's Carbon Monitoring System (CMS) Flux Pilot Project (FPP) was designed to better understand contemporary carbon fluxes by bringing together state-of-the-art models with remote sensing datasets. Here we report on simulations using NASA's Goddard Earth Observing System Model, version 5 (GEOS-5), which was used to evaluate the consistency of two different sets of observationally constrained land and ocean fluxes with atmospheric CO2 records. Despite the strong data constraint, the average difference in annual terrestrial biosphere flux between the two land (NASA Ames CASA and CASA-GFED) models is 1.7 Pg C for 2009-2010. Ocean models (NOBM and ECCO2-Darwin) differ by 35% in their global estimates of carbon flux, with particularly strong disagreement at high latitudes. Based upon combinations of terrestrial and ocean fluxes, GEOS-5 reasonably simulated the seasonal cycle observed at northern hemisphere surface sites and by the Greenhouse gases Observing SATellite (GOSAT), while the model struggled to simulate the seasonal cycle at southern hemisphere surface locations. Though GEOS-5 was able to reasonably reproduce the patterns of XCO2 observed by GOSAT, it struggled to reproduce the corresponding aspects of the AIRS observations. Despite large differences between land and ocean flux estimates, the resulting differences in atmospheric mixing ratio were small, typically less than 5 ppmv at the surface and 3 ppmv in the XCO2 column. A statistical analysis based on the variability of observations shows that flux differences of these magnitudes are difficult to distinguish from natural variability, regardless of measurement platform.

  15. Kernel regression estimates of time delays between gravitationally lensed fluxes

    CERN Document Server

    Otaibi, Sultanah AL; Cuevas-Tello, Juan C; Mandel, Ilya; Raychaudhury, Somak

    2015-01-01

    Strongly lensed variable quasars can serve as precise cosmological probes, provided that time delays between the image fluxes can be accurately measured. A number of methods have been proposed to address this problem. In this paper, we explore in detail a new approach based on kernel regression estimates, which is able to estimate a single time delay given several datasets for the same quasar. We develop realistic artificial datasets in order to carry out controlled experiments to test the performance of this new approach. We also test our method on real data from the strongly lensed quasar Q0957+561 and compare our estimates against existing results.
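
    A hypothetical, much-simplified sketch of a kernel-regression delay estimate: smooth image A's light curve with a Nadaraya-Watson (Gaussian kernel) regressor and pick the shift that best maps image B onto it:

        import numpy as np

        def nw_smooth(t_query, t, y, h):
            """Nadaraya-Watson regression with a Gaussian kernel of bandwidth h."""
            w = np.exp(-0.5 * ((t_query[:, None] - t[None, :]) / h) ** 2)
            return (w @ y) / w.sum(axis=1)

        def estimate_delay(tA, yA, tB, yB, delays, h=5.0):
            """Return the candidate delay minimizing the squared residuals."""
            costs = [np.mean((yB - nw_smooth(tB - d, tA, yA, h)) ** 2) for d in delays]
            return delays[int(np.argmin(costs))]

        # toy example: image B lags image A by 30 days
        t = np.linspace(0.0, 400.0, 200)
        yA = np.sin(t / 40.0)
        yB = np.sin((t - 30.0) / 40.0)
        print(estimate_delay(t, yA, t, yB, delays=np.arange(0.0, 60.0, 1.0)))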

  16. A dataset of musical video content, based on emotions

    Directory of Open Access Journals (Sweden)

    Luis Alejandro Solarte Moncayo

    2016-07-01

    Full Text Available Speeding up access to content by reducing the time spent browsing multimedia catalogs is one of the challenges for video-on-demand (VoD) services, a consequence of the growing amount of content on today's networks. This article describes the process of building a dataset of music videos. The dataset was used for the design and implementation of a VoD service that seeks to improve access to content through an emotion-based classification of music. We present the adaptation of an emotion classification model based on the arousal-valence model, and describe the development of a Java tool for content classification, which was used to build the dataset. Finally, in order to evaluate the constructed dataset, we present the functional structure of the VoD service that was developed.

  17. A dataset of human decision-making in teamwork management

    Science.gov (United States)

    Yu, Han; Shen, Zhiqi; Miao, Chunyan; Leung, Cyril; Chen, Yiqiang; Fauvel, Simon; Lin, Jun; Cui, Lizhen; Pan, Zhengxiang; Yang, Qiang

    2017-01-01

    Today, most endeavours require teamwork by people with diverse skills and characteristics. In managing teamwork, decisions are often made under uncertainty and resource constraints. The strategies and the effectiveness of the strategies different people adopt to manage teamwork under different situations have not yet been fully explored, partially due to a lack of detailed large-scale data. In this paper, we describe a multi-faceted large-scale dataset to bridge this gap. It is derived from a game simulating complex project management processes. It presents the participants with different conditions in terms of team members' capabilities and task characteristics for them to exhibit their decision-making strategies. The dataset contains detailed data reflecting the decision situations, decision strategies, decision outcomes, and the emotional responses of 1,144 participants from diverse backgrounds. To our knowledge, this is the first dataset simultaneously covering these four facets of decision-making. With repeated measurements, the dataset may help establish baseline variability of decision-making in teamwork management, leading to more realistic decision theoretic models and more effective decision support approaches.

  18. A dataset of forest biomass structure for Eurasia

    Science.gov (United States)

    Schepaschenko, Dmitry; Shvidenko, Anatoly; Usoltsev, Vladimir; Lakyda, Petro; Luo, Yunjian; Vasylyshyn, Roman; Lakyda, Ivan; Myklush, Yuriy; See, Linda; McCallum, Ian; Fritz, Steffen; Kraxner, Florian; Obersteiner, Michael

    2017-05-01

    The most comprehensive dataset of in situ destructive sampling measurements of forest biomass in Eurasia has been compiled from a combination of experiments undertaken by the authors and from scientific publications. Biomass is reported as four components: live trees (stem, bark, branches, foliage, roots); understory (above- and below-ground); green forest floor (above- and below-ground); and coarse woody debris (snags, logs, dead branches of living trees and dead roots). The data consist of 10,351 unique records of sample plots and 9,613 sample trees from ca. 1,200 experiments for the period 1930-2014; these two sets partially overlap. The dataset also contains other forest stand parameters such as tree species composition, average age, tree height, growing stock volume, etc., when available. Such a dataset can be used for the development of models of biomass structure, biomass extension factors, change detection in biomass structure, investigations into biodiversity and species distribution and the biodiversity-productivity relationship, as well as the assessment of the carbon pool and its dynamics, among many others.

  19. The NASA Subsonic Jet Particle Image Velocimetry (PIV) Dataset

    Science.gov (United States)

    Bridges, James; Wernet, Mark P.

    2011-01-01

    Many tasks in fluids engineering require prediction of the turbulence of jet flows. The present report documents the single-point statistics of velocity, mean and variance, of cold and hot jet flows. The jet velocities ranged from 0.5 to 1.4 times the ambient speed of sound, and temperatures ranged from unheated to a static temperature ratio of 2.7. Further, the report assesses the accuracy of the data, e.g., establishing uncertainties for the data. This paper covers the following five tasks: (1) Document the acquisition and processing procedures used to create the particle image velocimetry (PIV) datasets. (2) Compare PIV data with hotwire and laser Doppler velocimetry (LDV) data published in the open literature. (3) Compare different datasets acquired at the same flow conditions in multiple tests to establish uncertainties. (4) Create a consensus dataset for a range of hot jet flows, including uncertainty bands. (5) Analyze this consensus dataset for self-consistency and compare jet characteristics to those of the open literature. The final objective was fulfilled by using the potential core length and the spread rate of the half-velocity radius to collapse the mean and turbulent velocity fields over the first 20 jet diameters.

  1. SisFall: A Fall and Movement Dataset

    Science.gov (United States)

    Sucerquia, Angela; López, José David; Vargas-Bonilla, Jesús Francisco

    2017-01-01

    Research on fall and movement detection with wearable devices has witnessed promising growth. However, there are few publicly available datasets, all recorded with smartphones, which are insufficient for testing new proposals due to the absence of an objective population, the limited set of performed activities, and limited information. Here, we present a dataset of falls and activities of daily living (ADLs) acquired with a self-developed device composed of two types of accelerometer and one gyroscope. It consists of 19 ADLs and 15 fall types performed by 23 young adults, 15 ADL types performed by 14 healthy and independent participants over 62 years old, and data from one 60-year-old participant who performed all ADLs and falls. These activities were selected based on a survey and a literature analysis. We test the dataset with widely used feature extraction and a simple-to-implement threshold-based classification, achieving up to 96% accuracy in fall detection. An individual activity analysis demonstrates that most errors are concentrated in a small number of activities, where new approaches could be focused. Finally, validation tests with elderly people significantly reduced the fall detection performance of the tested features. This validates the findings of other authors and encourages the development of new strategies with this new dataset as the benchmark. PMID:28117691
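
    A minimal sketch of the simple threshold-based classification mentioned above, flagging a fall when the acceleration magnitude exceeds a threshold; the threshold value is a placeholder to be tuned on the dataset:

        import numpy as np

        def detect_fall(acc_xyz, threshold_g=2.5):
            """Flag a fall when the acceleration magnitude exceeds threshold_g.

            acc_xyz -- (N, 3) array of accelerations in units of g.
            """
            magnitude = np.linalg.norm(acc_xyz, axis=1)
            return bool((magnitude > threshold_g).any()), float(magnitude.max())

        # toy record: quiet standing (~1 g) with one brief impact sample
        acc = np.tile([0.0, 0.0, 1.0], (200, 1))
        acc[100] = [1.8, 0.5, 2.2]
        print(detect_fall(acc))   # (True, ~2.89)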

  2. Div400: a social image retrieval result diversification dataset

    DEFF Research Database (Denmark)

    Ionescu, Bogdan; Radu, Anca-Livia; Menendez Blanco, Maria

    2014-01-01

    In this paper we propose a new dataset, Div400, that was designed to support shared evaluation in different areas of social media photo retrieval, e.g., machine analysis (re-ranking, machine learning), human-based computation (crowdsourcing) or hybrid approaches (relevance feedback, machine-crowd ...

  3. Dataset - Droevendaal, Rolde and Colijnsplaat, 1996-2003

    NARCIS (Netherlands)

    Evert, van F.K.; Schans, van der D.A.; Geel, van W.C.A.; Slabbekoorn, J.J.; Booij, R.; Jukema, J.N.; Meurs, E.J.J.; Uenk, D.

    2011-01-01

    This dataset contains experimental data from a number of field experiments with potato in The Netherlands (Booij et al., 2001; Slabbekoorn, 2002; Slabbekoorn, 2003; Van Evert et al., 2011; Van Geel, 2003; Van Geel and Wijnholds, 2003; Van Geel et al., 2004). The data are presented as an SQL dump of

  4. Image dataset for testing search and detection models

    NARCIS (Netherlands)

    Toet, A.; Bijl, P.; Valeton, J.M.

    2001-01-01

    The TNO Human Factors Search_2 image dataset consists of: a set of 44 high-resolution digital color images of different complex natural scenes, the ground truth corresponding to each of these scenes, and the results of psychophysical experiments on each of these images. The images in the Search_2 da

  5. Cross-Cultural Concept Mapping of Standardized Datasets

    DEFF Research Database (Denmark)

    Kano Glückstad, Fumiko

    2012-01-01

    This work compares four feature-based similarity measures derived from cognitive sciences. The purpose of the comparative analysis is to verify the potentially most effective model that can be applied for mapping independent ontologies in a culturally influenced domain [1]. Here, datasets based o...

  6. Automated single particle detection and tracking for large microscopy datasets.

    Science.gov (United States)

    Wilson, Rhodri S; Yang, Lei; Dun, Alison; Smyth, Annya M; Duncan, Rory R; Rickman, Colin; Lu, Weiping

    2016-05-01

    Recent advances in optical microscopy have enabled the acquisition of very large datasets from living cells with unprecedented spatial and temporal resolution. Our ability to process these datasets now plays an essential role in understanding many biological processes. In this paper, we present an automated particle detection algorithm capable of operating in low signal-to-noise fluorescence microscopy environments and handling large datasets. When combined with our particle linking framework, it can provide hitherto intractable quantitative measurements describing the dynamics of large cohorts of cellular components, from organelles to single molecules. We begin by validating the performance of our method on synthetic image data, and then extend the validation to include experimental images with ground truth. Finally, we apply the algorithm to two single-particle-tracking photo-activated localization microscopy biological datasets, acquired from living primary cells at very high temporal rates. Our analysis of the dynamics of very large cohorts of 10,000s of membrane-associated protein molecules shows that they behave as if caged in nanodomains. We show that the robustness and efficiency of our method provide a tool for the examination of single-molecule behaviour with unprecedented spatial detail and high acquisition rates.
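
    A generic sketch of the spot-detection step (not the authors' algorithm): suppress noise with a Gaussian filter, then keep local maxima above an intensity threshold, using SciPy:

        import numpy as np
        from scipy import ndimage

        def detect_particles(frame, sigma=1.5, threshold=0.3, size=5):
            """Return (row, col) coordinates of bright spots in a 2-D frame."""
            smoothed = ndimage.gaussian_filter(frame.astype(float), sigma)
            local_max = ndimage.maximum_filter(smoothed, size=size) == smoothed
            return np.argwhere(local_max & (smoothed > threshold))

        # toy frame: weak noise plus two bright point emitters
        rng = np.random.default_rng(2)
        frame = 0.05 * rng.standard_normal((64, 64))
        for r, c in [(20, 20), (45, 30)]:
            frame[r, c] += 5.0
        print(detect_particles(frame))   # approximately [[20, 20], [45, 30]]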

  7. Using Real Datasets for Interdisciplinary Business/Economics Projects

    Science.gov (United States)

    Goel, Rajni; Straight, Ronald L.

    2005-01-01

    The workplace's global and dynamic nature allows and requires improved approaches for providing business and economics education. In this article, the authors explore ways of enhancing students' understanding of course material by using nontraditional, real-world datasets of particular interest to them. Teaching at a historically Black university,…

  8. A global experimental dataset for assessing grain legume production

    Science.gov (United States)

    Cernay, Charles; Pelzer, Elise; Makowski, David

    2016-09-01

    Grain legume crops are a significant component of the human diet and animal feed and have an important role in the environment, but the global diversity of agricultural legume species is currently underexploited. Experimental assessments of grain legume performances are required to identify potential species with high yields. Here, we introduce a dataset including results of field experiments published in 173 articles. The selected experiments were carried out over five continents on 39 grain legume species. The dataset includes measurements of grain yield, aerial biomass, crop nitrogen content, residual soil nitrogen content and water use. When available, yields for cereals and oilseeds grown after grain legumes in the crop sequence are also included. The dataset is arranged into a relational database with nine structured tables and 198 standardized attributes. Tillage, fertilization, pest and irrigation management are systematically recorded for each of the 8,581 crop × field site × growing season × treatment combinations. The dataset is freely reusable and easy to update. We anticipate that it will provide valuable information for assessing grain legume production worldwide.

  9. Determining Scale-dependent Patterns in Spatial and Temporal Datasets

    Science.gov (United States)

    Roy, A.; Perfect, E.; Mukerji, T.; Sylvester, L.

    2016-12-01

    Spatial and temporal datasets of interest to Earth scientists often contain plots of one variable against another, e.g., rainfall magnitude vs. time or fracture aperture vs. spacing. Such data, comprised of distributions of events along a transect / timeline along with their magnitudes, can display persistent or antipersistent trends, as well as random behavior, that may contain signatures of underlying physical processes. Lacunarity is a technique that was originally developed for multiscale analysis of data. In a recent study we showed that lacunarity can be used for revealing changes in scale-dependent patterns in fracture spacing data. Here we present a further improvement in our technique, with lacunarity applied to various non-binary datasets comprised of event spacings and magnitudes. We test our technique on a set of four synthetic datasets, three of which are based on an autoregressive model and have magnitudes at every point along the "timeline" thus representing antipersistent, persistent, and random trends. The fourth dataset is made up of five clusters of events, each containing a set of random magnitudes. The concept of lacunarity ratio, LR, is introduced; this is the lacunarity of a given dataset normalized to the lacunarity of its random counterpart. It is demonstrated that LR can successfully delineate scale-dependent changes in terms of antipersistence and persistence in the synthetic datasets. This technique is then applied to three different types of data: a hundred-year rainfall record from Knoxville, TN, USA, a set of varved sediments from Marca Shale, and a set of fracture aperture and spacing data from NE Mexico. While the rainfall data and varved sediments both appear to be persistent at small scales, at larger scales they both become random. On the other hand, the fracture data shows antipersistence at small scale (within cluster) and random behavior at large scales. Such differences in behavior with respect to scale-dependent changes in
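
    A minimal sketch of gliding-box lacunarity for a 1-D series of event magnitudes, using the standard definition Λ(r) = ⟨M²⟩/⟨M⟩² for box masses M, normalized to shuffled copies of the data in the spirit of the lacunarity ratio LR described above:

        import numpy as np

        def lacunarity(signal, box_size):
            """Gliding-box lacunarity <M^2>/<M>^2 of a 1-D magnitude series."""
            x = np.asarray(signal, dtype=float)
            masses = np.array([x[i:i + box_size].sum()
                               for i in range(len(x) - box_size + 1)])
            return masses.var() / masses.mean() ** 2 + 1.0

        def lacunarity_ratio(signal, box_size, n_shuffles=100, seed=0):
            """Lacunarity relative to that of shuffled (random) counterparts."""
            rng = np.random.default_rng(seed)
            shuffled = [lacunarity(rng.permutation(signal), box_size)
                        for _ in range(n_shuffles)]
            return lacunarity(signal, box_size) / np.mean(shuffled)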

  10. Comparison and validation of gridded precipitation datasets for Spain

    Science.gov (United States)

    Quintana-Seguí, Pere; Turco, Marco; Míguez-Macho, Gonzalo

    2016-04-01

    In this study, two gridded precipitation datasets are compared and validated in Spain: the recently developed SAFRAN dataset and the Spain02 dataset. These are validated using rain gauges, and they are also compared to the low-resolution ERA-Interim reanalysis. The SAFRAN precipitation dataset has been produced recently, using the SAFRAN meteorological analysis, which is extensively used in France (Durand et al. 1993, 1999; Quintana-Seguí et al. 2008; Vidal et al., 2010) and which has recently been applied to Spain (Quintana-Seguí et al., 2015). SAFRAN uses an optimal interpolation (OI) algorithm and all available rain gauges from the Spanish State Meteorological Agency (Agencia Estatal de Meteorología, AEMET). The product has a spatial resolution of 5 km and spans from September 1979 to August 2014. This dataset has been produced mainly to be used in large-scale hydrological applications. Spain02 (Herrera et al. 2012, 2015) is another high-quality precipitation dataset for Spain, based on a dense network of quality-controlled stations, and it has different versions at different resolutions. In this study we used the version with a resolution of 0.11°. The product spans from 1971 to 2010. Spain02 is well tested and widely used, mainly, but not exclusively, for RCM model validation and statistical downscaling. ERA-Interim is a well-known global reanalysis with a spatial resolution of ~79 km. It has been included in the comparison because it is a widely used product for continental- and global-scale studies, and also for smaller-scale studies in data-poor countries. Thus, its comparison with higher-resolution products of a data-rich country, such as Spain, allows us to quantify the errors made when using such datasets for national-scale studies, in line with some of the objectives of the EU-FP7 eartH2Observe project. The comparison shows that SAFRAN and Spain02 perform similarly, even though their underlying principles are different. Both products are largely

  11. From elementary flux modes to elementary flux vectors: Metabolic pathway analysis with arbitrary linear flux constraints.

    Science.gov (United States)

    Klamt, Steffen; Regensburger, Georg; Gerstl, Matthias P; Jungreuthmayer, Christian; Schuster, Stefan; Mahadevan, Radhakrishnan; Zanghellini, Jürgen; Müller, Stefan

    2017-04-01

    Elementary flux modes (EFMs) emerged as a formal concept to describe metabolic pathways and have become an established tool for constraint-based modeling and metabolic network analysis. EFMs are characteristic (support-minimal) vectors of the flux cone that contains all feasible steady-state flux vectors of a given metabolic network. EFMs account for (homogeneous) linear constraints arising from reaction irreversibilities and the assumption of steady state; however, other (inhomogeneous) linear constraints, such as minimal and maximal reaction rates frequently used by other constraint-based techniques (such as flux balance analysis [FBA]), cannot be directly integrated. These additional constraints further restrict the space of feasible flux vectors and turn the flux cone into a general flux polyhedron in which the concept of EFMs is not directly applicable anymore. For this reason, there has been a conceptual gap between EFM-based (pathway) analysis methods and linear optimization (FBA) techniques, as they operate on different geometric objects. One approach to overcome these limitations was proposed ten years ago and is based on the concept of elementary flux vectors (EFVs). Only recently has the community started to recognize the potential of EFVs for metabolic network analysis. In fact, EFVs exactly represent the conceptual development required to generalize the idea of EFMs from flux cones to flux polyhedra. This work aims to present a concise theoretical and practical introduction to EFVs that is accessible to a broad audience. We highlight the close relationship between EFMs and EFVs and demonstrate that almost all applications of EFMs (in flux cones) are possible for EFVs (in flux polyhedra) as well. In fact, certain properties can only be studied with EFVs. Thus, we conclude that EFVs provide a powerful and unifying framework for constraint-based modeling of metabolic networks.
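
    A minimal FBA-style sketch of the inhomogeneous constraints discussed above, steady state S v = 0 plus rate bounds, solved as a linear program with SciPy on a toy three-reaction network (hypothetical stoichiometry):

        import numpy as np
        from scipy.optimize import linprog

        # Toy network: v1 (uptake -> A), v2 (A -> B), v3 (B -> biomass)
        S = np.array([[1, -1,  0],    # metabolite A
                      [0,  1, -1]])   # metabolite B
        bounds = [(0, 10), (0, None), (0, None)]  # uptake capped at 10 (inhomogeneous)

        # Maximize the biomass flux v3 subject to S v = 0 and the bounds
        res = linprog(c=[0, 0, -1], A_eq=S, b_eq=np.zeros(2),
                      bounds=bounds, method="highs")
        print(res.x)   # optimal steady-state flux vector: [10, 10, 10]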

  12. Regional climate change study requires new temperature datasets

    Science.gov (United States)

    Wang, Kaicun; Zhou, Chunlüe

    2017-04-01

    Analyses of global mean air temperature (Ta), i.e., NCDC GHCN, GISS, and CRUTEM4, are the fundamental datasets for climate change studies and provide key evidence for global warming. All of the global temperature analyses over land are primarily based on meteorological observations of the daily maximum and minimum temperatures (Tmax and Tmin) and their averages (T2), because in most weather stations the measurements of Tmax and Tmin may be the only choice for a homogeneous century-long analysis of mean temperature. Our studies show that these datasets are suitable for long-term global warming studies. However, they may have substantial biases in quantifying local and regional warming rates, i.e., a root mean square error of more than 25% at 5-degree grids. From 1973 to 1997, the current datasets tend to significantly underestimate the warming rate over the central U.S. and overestimate the warming rate over the northern high latitudes. Similar results during the period 1998-2013, the warming hiatus period, indicate that the use of T2 enlarges the spatial contrast of temperature trends. This is because T2 over land samples air temperature only twice daily and cannot accurately reflect land-atmosphere and incoming radiation variations in the temperature diurnal cycle. For better regional climate change detection and attribution, we suggest creating new global mean air temperature datasets based on the recently available high spatiotemporal resolution meteorological observations, i.e., weather stations reporting four observations daily since the 1960s. These datasets will not only help investigate the dynamical processes behind temperature variance but also help better evaluate the reanalyzed and modeled simulations of temperature, and enable substantial improvements for other related climate variables in models, especially in regional and seasonal aspects.

  13. Comparison of global 3-D aviation emissions datasets

    Directory of Open Access Journals (Sweden)

    S. C. Olsen

    2013-01-01

    Full Text Available Aviation emissions are unique among transportation emissions, e.g., those from road transportation and shipping, in that they occur at higher altitudes as well as at the surface. Aviation emissions of carbon dioxide, soot, and water vapor have direct radiative impacts on the Earth's climate system, while emissions of nitrogen oxides (NOx), sulfur oxides, carbon monoxide (CO), and hydrocarbons (HC) impact air quality and climate through their effects on ozone, methane, and clouds. The most accurate estimates of the impact of aviation on air quality and climate utilize three-dimensional chemistry-climate models and gridded four-dimensional (space and time) aviation emissions datasets. We compare five available aviation emissions datasets currently and historically used to evaluate the impact of aviation on climate and air quality: NASA-Boeing 1992, NASA-Boeing 1999, QUANTIFY 2000, Aero2k 2002, and AEDT 2006, along with aviation fuel usage estimates from the International Energy Agency. Roughly 90% of all aviation emissions are in the Northern Hemisphere, and nearly 60% of all fuelburn and NOx emissions occur at cruise altitudes in the Northern Hemisphere. While these datasets were created by independent methods and are thus not strictly suitable for analyzing trends, they suggest that commercial aviation fuelburn and NOx emissions increased over the last two decades, while HC emissions likely decreased and CO emissions did not change significantly. The bottom-up estimates compared here are consistently lower than International Energy Agency fuelburn statistics, although the gap is significantly smaller in the more recent datasets. Overall the emissions distributions are quite similar for fuelburn and NOx, with regional peaks over the populated land masses of North America, Europe, and East Asia. For CO and HC there are relatively larger differences. There are, however, some distinct differences in the altitude distribution of emissions in certain regions for the Aero2k dataset.

  14. Comparison of global 3-D aviation emissions datasets

    Directory of Open Access Journals (Sweden)

    S. C. Olsen

    2012-07-01

    Full Text Available Aviation emissions are unique among transportation emissions, e.g., those from road transportation and shipping, in that they occur at higher altitudes as well as at the surface. Aviation emissions of carbon dioxide, soot, and water vapor have direct radiative impacts on the Earth's climate system, while emissions of nitrogen oxides (NOx), sulfur oxides, carbon monoxide (CO), and hydrocarbons (HC) impact air quality and climate through their effects on ozone, methane, and clouds. The most accurate estimates of the impact of aviation on air quality and climate utilize three-dimensional chemistry-climate models and gridded four-dimensional (space and time) aviation emissions datasets. We compare five available aviation emissions datasets currently and historically used to evaluate the impact of aviation on climate and air quality: NASA-Boeing 1992, NASA-Boeing 1999, QUANTIFY 2000, Aero2k 2002, and AEDT 2006, along with aviation fuel usage estimates from the International Energy Agency. Roughly 90% of all aviation emissions are in the Northern Hemisphere, and nearly 60% of all fuelburn and NOx emissions occur at cruise altitudes in the Northern Hemisphere. While these datasets were created by independent methods and are thus not strictly suitable for analyzing trends, they suggest that commercial aviation fuelburn and NOx emissions increased over the last two decades, while HC emissions likely decreased and CO emissions did not change significantly. The bottom-up estimates compared here are consistently lower than International Energy Agency fuelburn statistics, although the gap is significantly smaller in the more recent datasets. Overall the emissions distributions are quite similar for fuelburn and NOx, while for CO and HC there are relatively larger differences. There are, however, some distinct differences in the altitude distribution of emissions in certain regions for the Aero2k dataset.

  15. Geoseq: a tool for dissecting deep-sequencing datasets

    Directory of Open Access Journals (Sweden)

    Homann Robert

    2010-10-01

    Full Text Available Abstract Background Datasets generated on deep-sequencing platforms have been deposited in various public repositories such as the Gene Expression Omnibus (GEO), the Sequence Read Archive (SRA) hosted by the NCBI, or the DNA Data Bank of Japan (DDBJ). Despite being rich data sources, they have not been used much due to the difficulty in locating and analyzing datasets of interest. Results Geoseq (http://geoseq.mssm.edu) provides a new method of analyzing short reads from deep sequencing experiments. Instead of mapping the reads to reference genomes or sequences, Geoseq maps a reference sequence against the sequencing data. It is web-based, and holds pre-computed data from public libraries. The analysis reduces the input sequence to tiles and measures the coverage of each tile in a sequence library through the use of suffix arrays. The user can upload custom target sequences or use gene/miRNA names for the search and get back results as plots and spreadsheet files. Geoseq organizes the public sequencing data using a controlled vocabulary, allowing identification of relevant libraries by organism, tissue and type of experiment. Conclusions Analysis of small sets of sequences against deep-sequencing datasets, as well as identification of public datasets of interest, is simplified by Geoseq. We applied Geoseq to (a) identify differential isoform expression in mRNA-seq datasets, (b) identify miRNAs (microRNAs) in libraries, and identify mature and star sequences in miRNAs, and (c) identify potentially mis-annotated miRNAs. The ease of using Geoseq for these analyses suggests its utility and uniqueness as an analysis tool.

  16. Predicting MHC class I epitopes in large datasets

    Directory of Open Access Journals (Sweden)

    Lengauer Thomas

    2010-02-01

    Full Text Available Abstract Background Experimental screening of large sets of peptides with respect to their MHC binding capabilities is still very demanding due to the large number of possible peptide sequences and the extensive polymorphism of the MHC proteins. Therefore, there is significant interest in the development of computational methods for predicting the binding capability of peptides to MHC molecules, as a first step towards selecting peptides for actual screening. Results We have examined the performance of four diverse MHC Class I prediction methods on comparatively large HLA-A and HLA-B allele peptide binding datasets extracted from the Immune Epitope Database and Analysis resource (IEDB). The chosen methods span a representative cross-section of available methodology for MHC binding predictions. Until the development of IEDB, such an analysis was not possible, as the available peptide sequence datasets were small and spread out over many separate efforts. We tested three datasets which differ in the IC50 cutoff criteria used to select the binders and non-binders. The best performance was achieved when predictions were performed on the dataset consisting only of strong binders (IC50 less than 10 nM) and clear non-binders (IC50 greater than 10,000 nM). In addition, robustness of the predictions was only achieved for alleles that were represented with a sufficiently large (greater than 200), balanced set of binders and non-binders. Conclusions All four methods show good to excellent performance on the comprehensive datasets, with the artificial neural network-based method outperforming the other methods. However, all methods show pronounced difficulties in correctly categorizing intermediate binders.
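
    A minimal sketch of the cutoff-based labelling used to assemble such datasets, with the strong-binder and clear non-binder thresholds quoted above; intermediate binders are left out of the strict dataset:

        def label_by_ic50(ic50_nm, strong=10.0, non_binder=10000.0):
            """Label a peptide by its measured IC50 in nM."""
            if ic50_nm < strong:
                return "binder"
            if ic50_nm > non_binder:
                return "non-binder"
            return None   # intermediate binder, excluded from the strict dataset

        print([label_by_ic50(v) for v in (3.2, 250.0, 50000.0)])
        # ['binder', None, 'non-binder']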

  17. Continuous SO2 flux measurements for Vulcano Island, Italy

    Directory of Open Access Journals (Sweden)

    Fabio Vita

    2012-06-01

    Full Text Available The La Fossa cone of Vulcano Island (Aeolian Archipelago, Italy) is a closed-conduit volcano. Today, Vulcano Island is characterized by sulfataric activity, with a large fumarolic field that is mainly located in the summit area. A scanning differential optical absorption spectroscopy instrument designed by the Optical Sensing Group of Chalmers University of Technology in Göteborg, Sweden, was installed in March 2008, in the framework of the European project "Network for Observation of Volcanic and Atmospheric Change". This study presents the first dataset of SO2 plume fluxes recorded for a closed volcanic system. Between 2008 and 2010, the SO2 fluxes recorded showed average values of 12 t d−1 during the normal sulfataric activity of Vulcano Island, with one exceptional event of strong degassing that occurred between September and December 2009, when the SO2 emissions reached up to 100 t d−1.

  18. A case study of eddy covariance flux of N2O measured within forest ecosystems: quality control and flux error analysis

    Directory of Open Access Journals (Sweden)

    T. Markkanen

    2010-02-01

    Full Text Available Eddy covariance (EC) flux measurements of nitrous oxide (N2O) obtained by using a 3-D sonic anemometer and a tunable diode laser gas analyzer for N2O were investigated. Two datasets (Sorø, Denmark and Kalevansuo, Finland) from different measurement campaigns, including sub-canopy flux measurements of energy and carbon dioxide, are discussed with a focus on selected quality control aspects and flux error analysis. Although fast response trace gas analyzers based on spectroscopic techniques are increasingly used in ecosystem research, their suitability for reliable estimates of EC fluxes is still limited, and some assumptions have to be made for filtering and processing data. The N2O concentration signal was frequently dominated by offset drifts (fringe effect), which can give an artificial extra contribution to the fluxes when the resulting concentration fluctuations are correlated with the fluctuations of the vertical wind velocity. Based on Allan variance analysis of the N2O signal, we found that a recursive running mean filter with a time constant equal to 50 s was suitable to damp the influence of the periodic drift. Although the net N2O fluxes over the whole campaign periods were quite small at both sites (~5 μg N m−2 h−1 for Kalevansuo and ~10 μg N m−2 h−1 for Sorø), the calculated sub-canopy EC fluxes were in good agreement with those estimated by automatic soil chambers. However, EC N2O flux measurements show larger random uncertainty than the sensible heat fluxes, and classification according to the statistical significance of single flux values indicates that downward N2O fluxes have larger random error.
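
    A minimal sketch of the detrending and covariance steps, using a recursive running-mean (first-order autoregressive) filter with the 50 s time constant quoted above; a 10 Hz sampling rate is assumed for illustration:

        import numpy as np

        def detrend_recursive(x, dt=0.1, tau=50.0):
            """Subtract a recursive running mean with time constant tau (s)."""
            alpha = np.exp(-dt / tau)
            mean = np.empty(len(x))
            mean[0] = x[0]
            for k in range(1, len(x)):
                mean[k] = alpha * mean[k - 1] + (1.0 - alpha) * x[k]
            return np.asarray(x) - mean

        def eddy_flux(w, c, dt=0.1, tau=50.0):
            """EC flux as the mean product of the detrended w and c series."""
            return np.mean(detrend_recursive(w, dt, tau) *
                           detrend_recursive(c, dt, tau))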

  19. Data Acquisition and Flux Calculations

    DEFF Research Database (Denmark)

    Rebmann, C.; Kolle, O; Heinesch, B;

    2012-01-01

    In this chapter, the basic theory and the procedures used to obtain turbulent fluxes of energy, mass, and momentum with the eddy covariance technique will be detailed. This includes a description of data acquisition, pretreatment of high-frequency data and flux calculation....

  20. Superconducting wires and fractional flux

    Science.gov (United States)

    Sá de Melo, C. A. R.

    1996-05-01

    The quantization of flux in superconductors is revisited and analyzed in a new geometry. The system analyzed is a superconducting wire. The geometry is such that the superconducting wire winds N times around an insulating cylinder and the wire has its end connected back to its beginning, thus producing an N-loop short-circuited solenoid. The winding number N acts as a topological index that controls flux quantization. In this case, fractional flux quanta can be measured through the center of the insulating cylinder, provided that the cylinder radius is small enough. The Little-Parks experiment for an identical geometry is discussed. The period of oscillation of the transition temperature of the wire is found to vary as 1/N in units of the flux Φ relative to the flux quantum Φ0. When a SQUID is made in such a geometry, the maximal current through the SQUID varies with period Φ0/N.
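
    A short worked sketch of the 1/N period, assuming the standard fluxoid-quantization reading of the abstract (not the paper's own derivation): single-valuedness of the order parameter around the complete wire circuit, which winds N times around the cylinder and hence encloses the flux Φ a total of N times, requires

        \oint \nabla\varphi \cdot d\boldsymbol{\ell} = 2\pi n
        \quad\Longrightarrow\quad
        N\,\Phi = n\,\Phi_0 , \qquad \Phi_0 = \frac{h}{2e},

    so consecutive quantum numbers n are spaced by ΔΦ = Φ0/N, which is the Little-Parks oscillation period quoted above.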

  1. Interpreting Flux from Broadband Photometry

    CERN Document Server

    Brown, Peter J; Roming, Peter W A; Siegel, Michael

    2016-01-01

    We discuss the transformation of observed photometry into flux for the creation of spectral energy distributions and the computation of bolometric luminosities. We do this in the context of supernova studies, particularly as observed with the Swift spacecraft, but the concepts and techniques should be applicable to many other types of sources and wavelength regimes. Traditional methods of converting observed magnitudes to flux densities are not very accurate when applied to UV photometry. Common methods for extinction correction and the integration of pseudo-bolometric fluxes can also lead to inaccurate results. The sources of inaccuracy, though, also apply to other wavelengths. Because of the complicated nature of translating broad-band photometry into monochromatic flux densities, comparison between observed photometry and a spectroscopic model is best done by comparing in the natural units of the observations. We recommend that integrated flux measurements be made using a spectrum or spectral energy distribution whic...
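
    A minimal sketch of the traditional magnitude-to-flux-density conversion that the abstract cautions about, for AB magnitudes with zero-point flux density 3631 Jy:

        def ab_mag_to_flux_density_jy(m_ab):
            """Convert an AB magnitude to a monochromatic flux density in Jy."""
            return 3631.0 * 10.0 ** (-0.4 * m_ab)

        print(ab_mag_to_flux_density_jy(20.0))   # ~3.6e-5 Jy (0.036 mJy)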

  2. Climate impacts on the structures of the North Pacific air-sea CO2 flux variability

    Directory of Open Access Journals (Sweden)

    Y. Nojiri

    2011-05-01

    Full Text Available Some dominant spatial and temporal structures of the North Pacific air-sea CO2 fluxes in response to the Pacific Decadal Oscillation (PDO) are identified in four data products from four independent sources: an assimilated CO2 flux product, two forward model solutions, and a gridded pCO2 dataset constructed with a neural network approach. The interannual variability of the CO2 flux is found to be an order of magnitude weaker than the seasonal cycle of the CO2 flux in the North Pacific. A statistical approach is employed to quantify the signal-to-noise ratio in the reconstructed dataset to delineate the representativity errors. The dominant variability with a signal-to-noise ratio above one is identified and its correlations with the PDO are examined. A tentative four-box structure in the North Pacific air-sea CO2 flux variability linked to the PDO emerges, in which two positively correlated boxes are oriented in the northwest and southeast directions and, contrarily, the negatively correlated boxes are oriented in the northeast and southwest directions. This pattern is verified with the CO2 and pCO2 from the four products, and its relations to the interannual El Niño-Southern Oscillation (ENSO) and the lower-frequency PDO are separately identified. A combined EOF analysis between the air-sea CO2 flux and key variables representing ocean-atmosphere interactions is carried out to elicit robust oscillations in the North Pacific CO2 flux in response to the PDO. The proposed spatial and temporal structures of the North Pacific CO2 fluxes are insightful since they separate the secular trends of the surface ocean carbon from the interannual variability. The regional characterization of the North Pacific in terms of PDO and CO2 flux variability is also instructive for determining homogeneous oceanic domains for the Regional Carbon Cycle and Assessment Processes (RECCAP).
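
    A minimal sketch of an EOF decomposition via SVD of the anomaly field, the generic machinery behind analyses like the combined EOF described above (grid and values are synthetic):

        import numpy as np

        rng = np.random.default_rng(3)
        flux = rng.standard_normal((240, 500))   # months x grid cells (synthetic)

        anomaly = flux - flux.mean(axis=0)       # remove the time mean per cell
        U, s, Vt = np.linalg.svd(anomaly, full_matrices=False)

        eofs = Vt                                # spatial patterns (mode x cell)
        pcs = U * s                              # principal-component time series
        explained = s**2 / np.sum(s**2)
        print("variance explained by mode 1:", explained[0])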

  3. An Analysis on Better Testing than Training Performances on the Iris Dataset

    NARCIS (Netherlands)

    Schutten, Marten; Wiering, Marco

    2016-01-01

    The Iris dataset is a well-known dataset containing information on three different types of Iris flowers. A typical and popular method for solving classification problems on datasets such as the Iris set is the support vector machine (SVM). In order to do so, the dataset is separated into a set used for training and a set used for testing.
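
    A minimal sketch of the standard SVM-on-Iris workflow the abstract refers to, using scikit-learn; the kernel, hyperparameters, and split fraction are illustrative choices, not the paper's setup.

```python
# Train/test split and SVM classification on the Iris dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = SVC(kernel="rbf", C=1.0).fit(X_train, y_train)
print("train accuracy:", clf.score(X_train, y_train))
print("test accuracy:", clf.score(X_test, y_test))
# A higher test than train score, the "better testing than training"
# performance of the title, can occur by chance on small splits.
```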

  4. Augmented Reality Prototype for Visualizing Large Sensors’ Datasets

    Directory of Open Access Journals (Sweden)

    Folorunso Olufemi A.

    2011-04-01

    This paper addresses the development of an augmented reality (AR)-based scientific visualization system prototype that supports identification, localisation, and 3D visualisation of oil-leakage sensor datasets. Sensors generate significant amounts of multivariate data during normal and leak situations, which makes data exploration and visualisation daunting tasks. Therefore a model to manage such data and provide the computational support needed for effective exploration is developed in this paper. A challenge of this approach is to reduce data inefficiency. This paper presents a model for computing the information gain for each data attribute and determining a lead attribute. The computed lead attribute is then used for the development of an AR-based scientific visualization interface which automatically identifies, localises and visualizes all necessary data relevant to a particular selected region of interest (ROI) on the network. Necessary architectural system supports and the interface requirements for such visualizations are also presented.
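
    A minimal sketch of the information-gain computation used to pick a "lead attribute", as described above; the discretization into bins and the data layout are my assumptions, not the paper's implementation.

```python
# Information gain of each sensor attribute with respect to leak/normal labels.
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(attribute, labels, bins=10):
    """Gain of a continuous attribute, discretized into equal-width bins."""
    binned = np.digitize(attribute, np.histogram_bin_edges(attribute, bins))
    h_before = entropy(labels)
    h_after = sum(
        (binned == b).mean() * entropy(labels[binned == b])
        for b in np.unique(binned))
    return h_before - h_after

# Lead attribute = the column with maximal gain (X: samples x attributes):
# lead = max(range(X.shape[1]), key=lambda j: information_gain(X[:, j], y))
```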

  5. Application of XML database technology to biological pathway datasets.

    Science.gov (United States)

    Jiang, Keyuan; Nash, Christopher

    2006-01-01

    The study of biological systems has accumulated a significant amount of biological pathway data, which is evident from the continued growth in both the number of databases and the amount of data available. The development of the BioPAX standard has increased the availability of biological pathway datasets through the use of a special XML format, but the lack of a standard storage mechanism makes the querying and aggregation of BioPAX-compliant data challenging. To address this shortcoming, we have developed a storage mechanism leveraging existing XML technologies: the XML database and XQuery. The goal of our project is to provide a generic and centralized store with efficient queries for the needs of biomedical research. A SOAP-based Web service and direct HTTP request methods have also been developed to facilitate public consumption of the datasets online.
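
    A minimal sketch of querying BioPAX XML, using XPath via lxml in Python as a lightweight stand-in for the XQuery-over-XML-database approach described above; the file name is hypothetical, while the namespaces and element names follow BioPAX Level 3.

```python
# List the display names of all pathways in a BioPAX Level 3 document.
from lxml import etree

ns = {
    "bp": "http://www.biopax.org/release/biopax-level3.owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}
tree = etree.parse("pathways.owl")  # hypothetical BioPAX export

for name in tree.xpath("//bp:Pathway/bp:displayName/text()", namespaces=ns):
    print(name)
```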

  6. Robust Machine Learning Applied to Terascale Astronomical Datasets

    Science.gov (United States)

    Ball, N. M.; Brunner, R. J.; Myers, A. D.

    2008-08-01

    We present recent results from the Laboratory for Cosmological Data Mining (http://lcdm.astro.uiuc.edu) at the National Center for Supercomputing Applications (NCSA) to provide robust classifications and photometric redshifts for objects in the terascale-class Sloan Digital Sky Survey (SDSS). Through a combination of machine learning in the form of decision trees, k-nearest neighbor, and genetic algorithms, the use of supercomputing resources at NCSA, and the cyberenvironment Data-to-Knowledge, we are able to provide improved classifications for over 100 million objects in the SDSS, improved photometric redshifts, and a full exploitation of the powerful k-nearest neighbor algorithm. This work is the first to apply the full power of these algorithms to contemporary terascale astronomical datasets, and the improvement over existing results is demonstrable. We discuss issues that we have encountered in dealing with data on the terascale, and possible solutions that can be implemented to deal with upcoming petascale datasets.

  7. An axiomatic approach to intrinsic dimension of a dataset

    CERN Document Server

    Pestov, Vladimir

    2007-01-01

    We perform a deeper analysis of an axiomatic approach to the concept of intrinsic dimension of a dataset proposed by us in the IJCNN'07 paper (arXiv:cs/0703125). The main features of our approach are that a high intrinsic dimension of a dataset reflects the presence of the curse of dimensionality (in a certain mathematically precise sense), and that the dimension of a discrete i.i.d. sample of a low-dimensional manifold is, with high probability, close to that of the manifold. At the same time, the intrinsic dimension of a sample is easily corrupted by moderate high-dimensional noise (of the same amplitude as the size of the manifold) and suffers from prohibitively high computational complexity (computing it is an $NP$-complete problem). We outline a possible way to overcome these difficulties.

  8. Content-level deduplication on mobile internet datasets

    Science.gov (United States)

    Hou, Ziyu; Chen, Xunxun; Wang, Yang

    2017-06-01

    Various systems and applications involve a large volume of duplicate items. Given the high data redundancy in real-world datasets, data deduplication can reduce storage capacity requirements and improve the utilization of network bandwidth. However, the chunks used by existing deduplication systems range in size from 4 KB to over 16 KB, so these systems are not applicable to datasets consisting of short records. In this paper, we propose a new framework called SF-Dedup which is able to implement the deduplication process on a large set of Mobile Internet records, where records can be smaller than 100 B, or even smaller than 10 B. SF-Dedup is a short-fingerprint, in-line, hash-collision-resolving deduplication scheme. Experimental results illustrate that SF-Dedup is able to reduce storage capacity and shorten query times on a relational database.
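
    A minimal sketch of short-fingerprint deduplication with explicit collision resolution, the general idea named in the abstract (not the authors' SF-Dedup implementation); the fingerprint length and in-memory store are illustrative assumptions.

```python
# Content-level dedup: short fingerprints, full-content collision resolution.
import hashlib

store = {}          # short fingerprint -> list of distinct records sharing it

def dedup_insert(record: bytes) -> bool:
    """Insert a record; return True if it was new, False if a duplicate."""
    fp = hashlib.sha1(record).digest()[:4]   # short (4-byte) fingerprint
    bucket = store.setdefault(fp, [])
    if record in bucket:                     # resolve hash collisions by
        return False                         # comparing full record content
    bucket.append(record)
    return True

assert dedup_insert(b"GET /index") is True
assert dedup_insert(b"GET /index") is False  # duplicate detected
```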

  9. Serial femtosecond crystallography datasets from G protein-coupled receptors.

    Science.gov (United States)

    White, Thomas A; Barty, Anton; Liu, Wei; Ishchenko, Andrii; Zhang, Haitao; Gati, Cornelius; Zatsepin, Nadia A; Basu, Shibom; Oberthür, Dominik; Metz, Markus; Beyerlein, Kenneth R; Yoon, Chun Hong; Yefanov, Oleksandr M; James, Daniel; Wang, Dingjie; Messerschmidt, Marc; Koglin, Jason E; Boutet, Sébastien; Weierstall, Uwe; Cherezov, Vadim

    2016-08-01

    We describe the deposition of four datasets consisting of X-ray diffraction images acquired using serial femtosecond crystallography experiments on microcrystals of human G protein-coupled receptors, grown and delivered in lipidic cubic phase, at the Linac Coherent Light Source. The receptors are: the human serotonin receptor 2B in complex with an agonist ergotamine, the human δ-opioid receptor in complex with a bi-functional peptide ligand DIPP-NH2, the human smoothened receptor in complex with an antagonist cyclopamine, and finally the human angiotensin II type 1 receptor in complex with the selective antagonist ZD7155. All four datasets have been deposited, with minimal processing, in an HDF5-based file format, which can be used directly for crystallographic processing with CrystFEL or other software. We have provided processing scripts and supporting files for recent versions of CrystFEL, which can be used to validate the data.

  10. Out-of-core clustering of volumetric datasets

    Institute of Scientific and Technical Information of China (English)

    GRANBERG Carl J.; LI Ling

    2006-01-01

    In this paper we present a novel method for dividing and clustering large out-of-core volumetric scalar datasets. This work is based on the Ordered Cluster Binary Tree (OCBT) structure created using a top-down, or divisive, clustering method. The OCBT structure allows fast and efficient sub-volume queries to be made in combination with level of detail (LOD) queries of the tree. The initial partitioning of the large out-of-core dataset is done using non-axis-aligned planes calculated with Principal Component Analysis (PCA). A hybrid OCBT structure is also proposed, in which an in-core cluster binary tree is combined with a large out-of-core file.
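
    A minimal sketch of the kind of non-axis-aligned binary split described above: partition sample points by a plane normal to their first principal component. Details (median threshold, in-core arrays) are illustrative assumptions, not the paper's out-of-core code.

```python
# One PCA-based splitting step for a top-down (divisive) cluster binary tree.
import numpy as np

def pca_split(points):
    """points: (n, 3) array of sample positions; returns two index sets."""
    centered = points - points.mean(axis=0)
    # First principal axis = leading right-singular vector of the data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[0]
    side = centered @ normal                 # signed distance to the plane
    threshold = np.median(side)              # balanced split
    return np.where(side <= threshold)[0], np.where(side > threshold)[0]

# Recursing on each half yields a cluster binary tree over the volume.
```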

  11. Interpreting Flux from Broadband Photometry

    Science.gov (United States)

    Brown, Peter J.; Breeveld, Alice; Roming, Peter W. A.; Siegel, Michael

    2016-10-01

    We discuss the transformation of observed photometry into flux for the creation of spectral energy distributions (SED) and the computation of bolometric luminosities. We do this in the context of supernova studies, particularly as observed with the Swift spacecraft, but the concepts and techniques should be applicable to many other types of sources and wavelength regimes. Traditional methods of converting observed magnitudes to flux densities are not very accurate when applied to UV photometry. Common methods for extinction and the integration of pseudo-bolometric fluxes can also lead to inaccurate results. The sources of inaccuracy, though, also apply to other wavelengths. Because of the complicated nature of translating broadband photometry into monochromatic flux densities, comparison between observed photometry and a spectroscopic model is best done by forward modeling the spectrum into the count rates or magnitudes of the observations. We recommend that integrated flux measurements be made using a spectrum or SED which is consistent with the multi-band photometry rather than converting individual photometric measurements to flux densities, linearly interpolating between the points, and integrating. We also highlight some specific areas where the UV flux can be mischaracterized.

  12. MEME-ChIP: motif analysis of large DNA datasets.

    Science.gov (United States)

    Machanick, Philip; Bailey, Timothy L

    2011-06-15

    Advances in high-throughput sequencing have resulted in rapid growth in large, high-quality datasets including those arising from transcription factor (TF) ChIP-seq experiments. While there are many existing tools for discovering TF binding site motifs in such datasets, most web-based tools cannot directly process such large datasets. The MEME-ChIP web service is designed to analyze ChIP-seq 'peak regions'--short genomic regions surrounding declared ChIP-seq 'peaks'. Given a set of genomic regions, it performs (i) ab initio motif discovery, (ii) motif enrichment analysis, (iii) motif visualization, (iv) binding affinity analysis and (v) motif identification. It runs two complementary motif discovery algorithms on the input data--MEME and DREME--and uses the motifs they discover in subsequent visualization, binding affinity and identification steps. MEME-ChIP also performs motif enrichment analysis using the AME algorithm, which can detect very low levels of enrichment of binding sites for TFs with known DNA-binding motifs. Importantly, unlike with the MEME web service, there is no restriction on the size or number of uploaded sequences, allowing very large ChIP-seq datasets to be analyzed. The analyses performed by MEME-ChIP provide the user with a varied view of the binding and regulatory activity of the ChIP-ed TF, as well as the possible involvement of other DNA-binding TFs. MEME-ChIP is available as part of the MEME Suite at http://meme.nbcr.net.

  13. Simultaneous clustering of multiple gene expression and physical interaction datasets.

    Directory of Open Access Journals (Sweden)

    Manikandan Narayanan

    2010-04-01

    Many genome-wide datasets are routinely generated to study different aspects of biological systems, but integrating them to obtain a coherent view of the underlying biology remains a challenge. We propose simultaneous clustering of multiple networks as a framework to integrate large-scale datasets on the interactions among and activities of cellular components. Specifically, we develop an algorithm JointCluster that finds sets of genes that cluster well in multiple networks of interest, such as coexpression networks summarizing correlations among the expression profiles of genes and physical networks describing protein-protein and protein-DNA interactions among genes or gene-products. Our algorithm provides an efficient solution to a well-defined problem of jointly clustering networks, using techniques that permit certain theoretical guarantees on the quality of the detected clustering relative to the optimal clustering. These guarantees coupled with an effective scaling heuristic and the flexibility to handle multiple heterogeneous networks make our method JointCluster an advance over earlier approaches. Simulation results showed JointCluster to be more robust than alternate methods in recovering clusters implanted in networks with high false positive rates. In systematic evaluation of JointCluster and some earlier approaches for combined analysis of the yeast physical network and two gene expression datasets under glucose and ethanol growth conditions, JointCluster discovers clusters that are more consistently enriched for various reference classes capturing different aspects of yeast biology or yield better coverage of the analysed genes. These robust clusters, which are supported across multiple genomic datasets and diverse reference classes, agree with known biology of yeast under these growth conditions, elucidate the genetic control of coordinated transcription, and enable functional predictions for a number of uncharacterized genes.
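
    To make the problem concrete, here is a generic "simultaneous clustering of multiple networks" sketch: spectral clustering on an averaged adjacency matrix. This is emphatically not the JointCluster algorithm (which comes with theoretical guarantees this sketch lacks); the normalization and cluster count are illustrative assumptions.

```python
# Joint clustering of several networks defined over the same gene set.
import numpy as np
from sklearn.cluster import SpectralClustering

def joint_cluster(adjacencies, n_clusters=10):
    """adjacencies: list of (n, n) symmetric similarity matrices over genes."""
    combined = np.mean([a / a.max() for a in adjacencies], axis=0)
    model = SpectralClustering(n_clusters=n_clusters, affinity="precomputed")
    return model.fit_predict(combined)   # one cluster label per gene
```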

  14. Microscopic images dataset for automation of RBCs counting

    Directory of Open Access Journals (Sweden)

    Sherif Abbas

    2015-12-01

    A method for Red Blood Corpuscle (RBC) counting has been developed using RBC light-microscopic images and a MATLAB algorithm. The dataset consists of Red Blood Corpuscle (RBC) images and their corresponding segmented images. A detailed description using a flow chart is given in order to show how to produce the RBC mask. The RBC mask was used to count the number of RBCs in the blood smear image.
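
    A minimal sketch of the counting step implied above: connected-component labeling of the binary RBC mask (in Python for illustration; the dataset's own pipeline is in MATLAB, and the minimum-size filter is my assumption).

```python
# Count cells in a binary segmentation mask via connected components.
import numpy as np
from scipy import ndimage

def count_rbcs(mask: np.ndarray, min_pixels: int = 50) -> int:
    """mask: 2-D boolean array, True where a cell was segmented."""
    labeled, n_components = ndimage.label(mask)
    # Drop tiny specks that are likely segmentation noise.
    sizes = ndimage.sum(mask, labeled, range(1, n_components + 1))
    return int(np.sum(sizes >= min_pixels))
```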

  15. Evaluating summarised radionuclide concentration ratio datasets for wildlife.

    Science.gov (United States)

    Wood, M D; Beresford, N A; Howard, B J; Copplestone, D

    2013-12-01

    Concentration ratios (CR(wo-media)) are used in most radioecological models to predict whole-body radionuclide activity concentrations in wildlife from those in environmental media. This simplistic approach amalgamates the various factors influencing transfer within a single generic value and, as a result, comparisons of model predictions with site-specific measurements can vary by orders of magnitude. To improve model predictions, the development of 'condition-specific' CR(wo-media) values has been proposed (e.g. for a specific habitat). However, the underlying datasets for most CR(wo-media) databases, such as the wildlife transfer database (WTD) developed within the IAEA EMRAS II programme, include summarised data. This presents challenges for the calculation and subsequent statistical evaluation of condition-specific CR(wo-media) values. A further complication is the common use of arithmetic summary statistics in source references, even though CR(wo-media) values generally tend towards a lognormal distribution and should, therefore, be summarised using geometric statistics. In this paper, we propose a statistically defensible and robust method for reconstructing underlying datasets to calculate condition-specific CR(wo-media) values from summarised data and to derive geometric summary statistics. This method is applied to terrestrial datasets from the WTD. Statistically significant differences in sub-category CR(wo-media) values (e.g. mammals categorised by feeding strategy) were identified, which may justify the use of these CR(wo-media) values for specific assessment contexts. However, biases and limitations within the underlying datasets of the WTD explain some of these differences. Given the uncertainty in the summarised CR(wo-media) values, we suggest that the CR(wo-media) approach to estimating transfer be used with caution above screening-level assessments.
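
    A minimal sketch of one conversion such reconstruction relies on: recovering geometric summary statistics from arithmetic ones under a lognormal assumption. These are the standard lognormal moment relations, not necessarily the authors' exact procedure, and the example values are illustrative.

```python
# Arithmetic mean/SD of a lognormal variable -> geometric mean/GSD.
import numpy as np

def geometric_from_arithmetic(am, asd):
    """Uses AM = exp(mu + sigma^2/2) and CV^2 = exp(sigma^2) - 1."""
    cv2 = (asd / am) ** 2                  # squared coefficient of variation
    sigma2 = np.log(1.0 + cv2)             # variance of log(X)
    mu = np.log(am) - sigma2 / 2.0         # mean of log(X)
    return np.exp(mu), np.exp(np.sqrt(sigma2))   # GM, GSD

gm, gsd = geometric_from_arithmetic(am=2.0, asd=3.0)  # illustrative CR values
```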

  16. Microscopic images dataset for automation of RBCs counting.

    Science.gov (United States)

    Abbas, Sherif

    2015-12-01

    A method for Red Blood Corpuscle (RBC) counting has been developed using RBC light-microscopic images and a MATLAB algorithm. The dataset consists of Red Blood Corpuscle (RBC) images and their corresponding segmented images. A detailed description using a flow chart is given in order to show how to produce the RBC mask. The RBC mask was used to count the number of RBCs in the blood smear image.

  17. Analysis of Heart Diseases Dataset using Neural Network Approach

    CERN Document Server

    Rani, K Usha

    2011-01-01

    Classification is one of the important techniques of data mining. Many real-world problems in fields such as business, science, industry and medicine can be solved using a classification approach. Neural networks have emerged as an important tool for classification, and their advantages make for efficient classification of the given data. In this study a heart disease dataset is analyzed using a neural network approach. To increase the efficiency of the classification process, a parallel approach is also adopted in the training phase.

  18. Circumpolar dataset of sequenced specimens of Promachocrinus kerguelensis (Echinodermata, Crinoidea

    Directory of Open Access Journals (Sweden)

    Lenaïg G. Hemery

    2013-07-01

    This circumpolar dataset of the comatulid (Echinodermata: Crinoidea) Promachocrinus kerguelensis (Carpenter, 1888) from the Southern Ocean documents biodiversity associated with the specimens sequenced in Hemery et al. (2012). The aim of the Hemery et al. (2012) paper was to use phylogeographic and phylogenetic tools to assess the genetic diversity, demographic history and evolutionary relationships of this very common and abundant comatulid, in the context of the glacial history of the Antarctic and Sub-Antarctic shelves (Thatje et al. 2005, 2008). Over one thousand three hundred specimens (1307) used in this study were collected during seventeen cruises from 1996 to 2010, in eight regions of the Southern Ocean: Kerguelen Plateau, Davis Sea, Dumont d'Urville Sea, Ross Sea, Amundsen Sea, West Antarctic Peninsula, East Weddell Sea and Scotia Arc including the tip of the Antarctic Peninsula and the Bransfield Strait. We give here the metadata of this dataset, which lists sampling sources (cruise ID, ship name, sampling date, sampling gear), sampling sites (station, geographic coordinates, depth) and genetic data (phylogroup, haplotype, sequence ID) for each of the 1307 specimens. The identification of the specimens was controlled by an expert taxonomist specialising in crinoids (Marc Eléaume, Muséum national d'Histoire naturelle, Paris) and all the COI sequences were matched against those available on the Barcode of Life Data System (BOLD: http://www.boldsystems.org/index.php/IDS_OpenIdEngine). This dataset can be used by studies dealing with, among other interests, Antarctic and/or crinoid diversity (species richness, distribution patterns, biogeography) or habitat/ecological niche modeling. This dataset is accessible through the GBIF network at http://ipt.biodiversity.aq/resource.do?r=proke.

  19. How To Break Anonymity of the Netflix Prize Dataset

    OpenAIRE

    Narayanan, Arvind; Shmatikov, Vitaly

    2006-01-01

    We present a new class of statistical de-anonymization attacks against high-dimensional micro-data, such as individual preferences, recommendations, transaction records and so on. Our techniques are robust to perturbation in the data and tolerate some mistakes in the adversary's background knowledge. We apply our de-anonymization methodology to the Netflix Prize dataset, which contains anonymous movie ratings of 500,000 subscribers of Netflix, the world's largest online movie rental service. ...

  20. OpenCL based machine learning labeling of biomedical datasets

    Science.gov (United States)

    Amoros, Oscar; Escalera, Sergio; Puig, Anna

    2011-03-01

    In this paper, we propose a two-stage labeling method for large biomedical datasets through a parallel approach on a single GPU. Diagnostic methods, structure volume measurements, and visualization systems are of major importance for surgery planning, intra-operative imaging and image-guided surgery. In all cases, providing an automatic and interactive method to label or tag the different structures contained in the input data becomes imperative. Several approaches to label or segment biomedical datasets have been proposed to discriminate different anatomical structures in an output tagged dataset. Among existing methods, supervised learning methods for segmentation have been devised to let a non-expert user easily analyze biomedical datasets. However, they still have some problems in practical application, such as slow learning and testing speeds. In addition, recent technological developments have led to widespread availability of multi-core CPUs and GPUs, as well as new software languages, such as NVIDIA's CUDA and OpenCL, making it possible to apply parallel programming paradigms on conventional personal computers. Adaboost is one of the most widely applied methods for labeling in the machine learning community. In a first stage, Adaboost trains a binary classifier from a set of pre-labeled samples described by a set of features. This binary classifier is defined as a weighted combination of weak classifiers. Each weak classifier is a simple decision function estimated on a single feature value. Then, at the testing stage, each weak classifier is independently applied on the features of a set of unlabeled samples. In this work, we propose an alternative representation of the Adaboost binary classifier. We use this proposed representation to define a new GPU-based parallelized Adaboost testing stage using OpenCL. We provide numerical experiments based on large available data sets and we compare our results to CPU-based strategies in terms of time and
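
    A minimal sketch of Adaboost with decision stumps, the weak-learner form the abstract describes (a threshold on a single feature value). This is plain serial NumPy; the paper's contribution, an OpenCL parallelization of the testing stage, is not attempted here.

```python
# AdaBoost with single-feature threshold stumps as weak classifiers.
import numpy as np

def adaboost_train(X, y, n_rounds=50):
    """X: (n, d) features; y: labels in {-1, +1}. Returns weighted stumps."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)                      # sample weights
    stumps = []
    for _ in range(n_rounds):
        best = None
        for j in range(d):                       # search feature, threshold,
            for t in np.unique(X[:, j]):         # and sign for lowest
                for s in (1, -1):                # weighted error
                    pred = s * np.where(X[:, j] <= t, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, t, s)
        err, j, t, s = best
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
        pred = s * np.where(X[:, j] <= t, 1, -1)
        w *= np.exp(-alpha * y * pred)           # re-weight samples
        w /= w.sum()
        stumps.append((alpha, j, t, s))
    return stumps

def adaboost_predict(stumps, X):
    """Testing stage: weighted vote of the independent weak classifiers."""
    score = sum(a * s * np.where(X[:, j] <= t, 1, -1) for a, j, t, s in stumps)
    return np.sign(score)
```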

  1. Data Integration Framework Data Management Plan Remote Sensing Dataset

    Science.gov (United States)

    2016-07-01

    Lidar, X-band radar, and electro-optical (EO) and infrared (IR) imagery at the FRF. These datasets provide near real-time observations of the littoral...

  2. Pantheon: A Dataset for the Study of Global Cultural Production

    CERN Document Server

    Yu, Amy Zhao; Hu, Kevin; Lu, Tiffany; Hidalgo, César A

    2015-01-01

    We present the Pantheon 1.0 dataset: a manually curated dataset of individuals that have transcended linguistic, temporal, and geographic boundaries. The Pantheon 1.0 dataset includes the 11,341 biographies present in more than 25 languages in Wikipedia and is enriched with: (i) manually curated demographic information (place of birth, date of birth, and gender), (ii) a cultural domain classification categorizing each biography at three levels of aggregation (i.e. Arts/Fine Arts/Painting), and (iii) measures of global visibility (fame) including the number of languages in which a biography is present in Wikipedia, the monthly page-views received by a biography (2008-2013), and a global visibility metric we name the Historical Popularity Index (HPI). We validate our measures of global visibility (HPI and Wikipedia language editions) using external measures of accomplishment in several cultural domains: Tennis, Swimming, Car Racing, and Chess. In all of these cases we find that measures of accomplishments and f...

  3. Increasing consistency of disease biomarker prediction across datasets.

    Directory of Open Access Journals (Sweden)

    Maria D Chikina

    Microarray studies with human subjects often have limited sample sizes, which hampers the ability to detect reliable biomarkers associated with disease and motivates the need to aggregate data across studies. However, human gene expression measurements may be influenced by many non-random factors such as genetics, sample preparation, and tissue heterogeneity. These factors can contribute to a lack of agreement among related studies, limiting the utility of their aggregation. We show that it is feasible to carry out an automatic correction of individual datasets to reduce the effect of such 'latent variables' (without prior knowledge of the variables) in such a way that datasets addressing the same condition show better agreement once each is corrected. We build our approach on the method of surrogate variable analysis (SVA), but we demonstrate that the original algorithm is unsuitable for the analysis of human tissue samples that are mixtures of different cell types. We propose a modification to SVA that is crucial to obtaining the improvement in agreement that we observe. We develop our method on a compendium of multiple sclerosis data and verify it on an independent compendium of Parkinson's disease datasets. In both cases, we show that our method is able to improve agreement across varying study designs, platforms, and tissues. This approach has the potential for wide applicability to any field where lack of inter-study agreement has been a concern.

  4. Principal Component Analysis of Process Datasets with Missing Values

    Directory of Open Access Journals (Sweden)

    Kristen A. Severson

    2017-07-01

    Datasets with missing values arising from causes such as sensor failure, inconsistent sampling rates, and merging data from different systems are common in the process industry. Methods for handling missing data typically operate during data pre-processing, but can also be applied during model building. This article considers missing data within the context of principal component analysis (PCA), a method originally developed for complete data that has widespread industrial application in multivariate statistical process control. Due to the prevalence of missing data and the success of PCA for handling complete data, several PCA algorithms that can act on incomplete data have been proposed. Here, algorithms for applying PCA to datasets with missing values are reviewed. A case study is presented to demonstrate the performance of the algorithms, and suggestions are made with respect to choosing which algorithm is most appropriate for particular settings. An alternating algorithm based on the singular value decomposition achieved the best results in the majority of test cases involving process datasets.
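
    A minimal sketch of an alternating, SVD-based approach to PCA with missing values, the generic technique behind the class of algorithm the review found strongest: iterate between imputing missing entries from a low-rank reconstruction and refitting the SVD. The rank, initialization, and iteration count are illustrative assumptions, not the paper's benchmark code.

```python
# Alternating SVD imputation for PCA on data with missing entries.
import numpy as np

def pca_missing(X, rank=2, n_iter=100):
    """X: (n, d) array with np.nan marking missing entries."""
    miss = np.isnan(X)
    Xf = np.where(miss, np.nanmean(X, axis=0), X)   # start from column means
    for _ in range(n_iter):
        mu = Xf.mean(axis=0)
        u, s, vt = np.linalg.svd(Xf - mu, full_matrices=False)
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank] + mu
        Xf[miss] = low_rank[miss]                   # re-impute missing cells only
    return low_rank, vt[:rank]                      # reconstruction, loadings
```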

  5. Synchronization of networks of chaotic oscillators: Structural and dynamical datasets

    Directory of Open Access Journals (Sweden)

    Ricardo Sevilla-Escoboza

    2016-06-01

    We provide the topological structure of a series of N=28 Rössler chaotic oscillators diffusively coupled through one of their variables. The dynamics of the y variable describing the evolution of the individual nodes of the network are given for a wide range of coupling strengths. The datasets capture the transition from unsynchronized to synchronized behavior as a function of the coupling strength between oscillators. The fact that both the underlying topology of the system and the dynamics of the nodes are given together makes this dataset a suitable candidate to evaluate the interplay between functional and structural networks and to serve as a benchmark to quantify the ability of a given algorithm to extract the structural network of connections from the observation of the dynamics of the nodes. At the same time, it is possible to use the dataset to analyze the different dynamical properties (randomness, complexity, reproducibility, etc.) of an ensemble of oscillators as a function of the coupling strength.
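
    A minimal sketch of the underlying system: N Rössler oscillators diffusively coupled through the y variable, as in the dataset description. The parameters, random topology, and integrator are illustrative choices, not the network used to generate the dataset.

```python
# N diffusively coupled Rossler oscillators, coupling through y.
import numpy as np
from scipy.integrate import solve_ivp

N = 28
rng = np.random.default_rng(0)
A = (rng.random((N, N)) < 0.1).astype(float)      # random topology
A = np.triu(A, 1)
A = A + A.T                                       # symmetric adjacency
L = np.diag(A.sum(axis=1)) - A                    # graph Laplacian
a, b, c, sigma = 0.2, 0.2, 5.7, 0.05              # sigma = coupling strength

def rhs(t, state):
    x, y, z = state.reshape(3, N)
    dx = -y - z
    dy = x + a * y - sigma * (L @ y)              # diffusive coupling in y
    dz = b + z * (x - c)
    return np.concatenate([dx, dy, dz])

sol = solve_ivp(rhs, (0, 100), rng.standard_normal(3 * N), max_step=0.05)
y_series = sol.y.reshape(3, N, -1)[1]             # y(t) of each node
```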

  6. Exploring massive, genome scale datasets with the genometricorr package

    KAUST Repository

    Favorov, Alexander

    2012-05-31

    We have created a statistically grounded tool for determining the correlation of genomewide data with other datasets or known biological features, intended to guide biological exploration of high-dimensional datasets, rather than providing immediate answers. The software enables several biologically motivated approaches to these data and here we describe the rationale and implementation for each approach. Our models and statistics are implemented in an R package that efficiently calculates the spatial correlation between two sets of genomic intervals (data and/or annotated features), for use as a metric of functional interaction. The software handles any type of pointwise or interval data and instead of running analyses with predefined metrics, it computes the significance and direction of several types of spatial association; this is intended to suggest potentially relevant relationships between the datasets. Availability and implementation: The package, GenometriCorr, can be freely downloaded at http://genometricorr.sourceforge.net/. Installation guidelines and examples are available from the sourceforge repository. The package is pending submission to Bioconductor. © 2012 Favorov et al.

  7. Scaling statistical multiple sequence alignment to large datasets

    Directory of Open Access Journals (Sweden)

    Michael Nute

    2016-11-01

    Background: Multiple sequence alignment is an important task in bioinformatics, and alignments of large datasets containing hundreds or thousands of sequences are increasingly of interest. While many alignment methods exist, the most accurate alignments are likely to be based on stochastic models where sequences evolve down a tree with substitutions, insertions, and deletions. While some methods have been developed to estimate alignments under these stochastic models, only the Bayesian method BAli-Phy has been able to run on even moderately large datasets, containing 100 or so sequences. A technique to extend BAli-Phy to enable alignments of thousands of sequences could potentially improve alignment and phylogenetic tree accuracy on large-scale data beyond the best-known methods today. Results: We use simulated data with up to 10,000 sequences representing a variety of model conditions, including some that are significantly divergent from the statistical models used in BAli-Phy and elsewhere. We give a method for incorporating BAli-Phy into PASTA and UPP, two strategies for enabling alignment methods to scale to large datasets, and give alignment and tree accuracy results measured against the ground truth from simulations. Comparable results are also given for other methods capable of aligning this many sequences. Conclusions: Extensions of BAli-Phy using PASTA and UPP produce significantly more accurate alignments and phylogenetic trees than the current leading methods.

  8. ENHANCED DATA DISCOVERABILITY FOR IN SITU HYPERSPECTRAL DATASETS

    Directory of Open Access Journals (Sweden)

    B. Rasaiah

    2016-06-01

    Field spectroscopic metadata is a central component in the quality assurance, reliability, and discoverability of hyperspectral data and the products derived from it. Cataloguing, mining, and interoperability of these datasets rely upon the robustness of metadata protocols for field spectroscopy, and on the software architecture to support the exchange of these datasets. Currently no standard for in situ spectroscopy data or metadata protocols exists. This inhibits the effective sharing of growing volumes of in situ spectroscopy datasets and prevents exploitation of the benefits of integrating with the evolving range of data sharing platforms. A core metadataset for field spectroscopy was introduced by Rasaiah et al. (2011-2015) with extended support for specific applications. This paper presents a prototype model for an OGC- and ISO-compliant, platform-independent metadata discovery service aligned to the specific requirements of field spectroscopy. In this study, a proof-of-concept metadata catalogue has been described and deployed in a cloud-based architecture as a demonstration of an operationalized field spectroscopy metadata standard and web-based discovery service.

  9. Enhanced Data Discoverability for in Situ Hyperspectral Datasets

    Science.gov (United States)

    Rasaiah, B.; Bellman, C.; Hewson, R. D.; Jones, S. D.; Malthus, T. J.

    2016-06-01

    Field spectroscopic metadata is a central component in the quality assurance, reliability, and discoverability of hyperspectral data and the products derived from it. Cataloguing, mining, and interoperability of these datasets rely upon the robustness of metadata protocols for field spectroscopy, and on the software architecture to support the exchange of these datasets. Currently no standard for in situ spectroscopy data or metadata protocols exists. This inhibits the effective sharing of growing volumes of in situ spectroscopy datasets and prevents exploitation of the benefits of integrating with the evolving range of data sharing platforms. A core metadataset for field spectroscopy was introduced by Rasaiah et al. (2011-2015) with extended support for specific applications. This paper presents a prototype model for an OGC- and ISO-compliant, platform-independent metadata discovery service aligned to the specific requirements of field spectroscopy. In this study, a proof-of-concept metadata catalogue has been described and deployed in a cloud-based architecture as a demonstration of an operationalized field spectroscopy metadata standard and web-based discovery service.

  10. Nanomaterial datasets to advance tomography in scanning transmission electron microscopy.

    Science.gov (United States)

    Levin, Barnaby D A; Padgett, Elliot; Chen, Chien-Chun; Scott, M C; Xu, Rui; Theis, Wolfgang; Jiang, Yi; Yang, Yongsoo; Ophus, Colin; Zhang, Haitao; Ha, Don-Hyung; Wang, Deli; Yu, Yingchao; Abruña, Hector D; Robinson, Richard D; Ercius, Peter; Kourkoutis, Lena F; Miao, Jianwei; Muller, David A; Hovden, Robert

    2016-06-07

    Electron tomography in materials science has flourished with the demand to characterize nanoscale materials in three dimensions (3D). Access to experimental data is vital for developing and validating reconstruction methods that improve resolution and reduce radiation dose requirements. This work presents five high-quality scanning transmission electron microscope (STEM) tomography datasets in order to address the critical need for open access data in this field. The datasets represent the current limits of experimental technique, are of high quality, and contain materials with structural complexity. Included are tomographic series of a hyperbranched Co2P nanocrystal, platinum nanoparticles on a carbon nanofibre imaged over the complete 180° tilt range, a platinum nanoparticle and a tungsten needle both imaged at atomic resolution by equal slope tomography, and a through-focal tilt series of PtCu nanoparticles. A volumetric reconstruction from every dataset is provided for comparison and development of post-processing and visualization techniques. Researchers interested in creating novel data processing and reconstruction algorithms will now have access to state of the art experimental test data.

  11. Improved Cosmological Constraints from New, Old and Combined Supernova Datasets

    CERN Document Server

    Kowalski, M; Aldering, G; Agostinho, R J; Amadon, A; Amanullah, R; Balland, C; Barbary, K; Blanc, G; Challis, P J; Conley, A; Connolly, N V; Covarrubias, R; Dawson, K S; Deustua, S E; Ellis, R; Fabbro, S; Fadeev, V; Fan, X; Farris, B; Folatelli, G; Frye, B L; Garavini, G; Gates, E L; Germany, L; Goldhaber, G; Goldman, B; Goobar, A; Groom, D E; Haïssinski, J; Hardin, D; Hook, I; Kent, S; Kim, A G; Knop, R A; Lidman, C; Linder, E V; Méndez, J; Meyers, J; Miller, G J; Moniez, M; Mourão, A M; Newberg, H; Nobili, S; Nugent, P E; Pain, R; Perdereau, O; Perlmutter, S; Phillips, M M; Prasad, V; Quimby, R; Regnault, N; Rich, J; Rubenstein, E P; Ruiz-Lapuente, P; Santos, F D; Schaefer, B E; Schommer, R A; Smith, R C; Soderberg, A M; Spadafora, A L; Strolger, L -G; Strovink, M; Suntzeff, N B; Suzuki, N; Thomas, R C; Walton, N A; Wang, L; Wood-Vasey, W M; Yun, J L

    2008-01-01

    We present a new compilation of Type Ia supernovae (SNe Ia), a new dataset of low-redshift nearby-Hubble-flow SNe, and new analysis procedures to work with these heterogeneous compilations. This "Union" compilation of 414 SNe Ia, which reduces to 307 SNe after selection cuts, includes the recent large samples of SNe Ia from the Supernova Legacy Survey and ESSENCE Survey, the older datasets, as well as the recently extended dataset of distant supernovae observed with HST. A single, consistent and blind analysis procedure is used for all the various SN Ia subsamples, and a new procedure is implemented that consistently weights the heterogeneous data sets and rejects outliers. We present the latest results from this Union compilation and discuss the cosmological constraints from this new compilation and its combination with other cosmological measurements (CMB and BAO). The constraint we obtain from supernovae on the dark energy density is $\Omega_\Lambda = 0.713^{+0.027}_{-0.029}\,(\mathrm{stat})^{+0.036}_{-0.039}\,(\mathrm{sys})$...

  12. New public dataset for spotting patterns in medieval document images

    Science.gov (United States)

    En, Sovann; Nicolas, Stéphane; Petitjean, Caroline; Jurie, Frédéric; Heutte, Laurent

    2017-01-01

    With advances in technology, a large part of our cultural heritage is becoming digitally available. In particular, in the field of historical document image analysis, there is now a growing need for indexing and data mining tools, thus allowing us to spot and retrieve the occurrences of an object of interest, called a pattern, in a large database of document images. Patterns may present some variability in terms of color, shape, or context, making the spotting of patterns a challenging task. Pattern spotting is a relatively new field of research, still hampered by the lack of available annotated resources. We present a new publicly available dataset named DocExplore dedicated to spotting patterns in historical document images. The dataset contains 1500 images and 1464 queries, and allows the evaluation of two tasks: image retrieval and pattern localization. A standardized benchmark protocol along with ad hoc metrics is provided for a fair comparison of the submitted approaches. We also provide some first results obtained with our baseline system on this new dataset, which show that there is room for improvement and that should encourage researchers of the document image analysis community to design new systems and submit improved results.

  13. Strategies for analyzing highly enriched IP-chip datasets

    Directory of Open Access Journals (Sweden)

    Tavaré Simon

    2009-09-01

    Background: Chromatin immunoprecipitation on tiling arrays (ChIP-chip) has been employed to examine features such as protein binding and histone modifications on a genome-wide scale in a variety of cell types. Array data from the latter studies typically have a high proportion of enriched probes whose signals vary considerably (due to heterogeneity in the cell population), and this makes their normalization and downstream analysis difficult. Results: Here we present strategies for analyzing such experiments, focusing our discussion on the analysis of Bromodeoxyuridine (BrdU) immunoprecipitation on tiling array (BrdU-IP-chip) datasets. BrdU-IP-chip experiments map large, recently replicated genomic regions and have similar characteristics to histone modification/location data. To prepare such data for downstream analysis we employ a dynamic programming algorithm that identifies a set of putative unenriched probes, which we use for both within-array and between-array normalization. We also introduce a second dynamic programming algorithm that incorporates a priori knowledge to identify and quantify positive signals in these datasets. Conclusion: Highly enriched IP-chip datasets are often difficult to analyze with traditional array normalization and analysis strategies. Here we present and test a set of analytical tools for their normalization and quantification that allows for accurate identification and analysis of enriched regions.

  14. Igloo-Plot: a tool for visualization of multidimensional datasets.

    Science.gov (United States)

    Kuntal, Bhusan K; Ghosh, Tarini Shankar; Mande, Sharmila S

    2014-01-01

    Advances in science and technology have resulted in an exponential growth of multivariate (or multi-dimensional) datasets which are being generated from various research areas especially in the domain of biological sciences. Visualization and analysis of such data (with the objective of uncovering the hidden patterns therein) is an important and challenging task. We present a tool, called Igloo-Plot, for efficient visualization of multidimensional datasets. The tool addresses some of the key limitations of contemporary multivariate visualization and analysis tools. The visualization layout, not only facilitates an easy identification of clusters of data-points having similar feature compositions, but also the 'marker features' specific to each of these clusters. The applicability of the various functionalities implemented herein is demonstrated using several well studied multi-dimensional datasets. Igloo-Plot is expected to be a valuable resource for researchers working in multivariate data mining studies. Igloo-Plot is available for download from: http://metagenomics.atc.tcs.com/IglooPlot/.

  15. Filtergraph: An Interactive Web Application for Visualization of Astronomy Datasets

    CERN Document Server

    Burger, Dan; Pepper, Joshua; Siverd, Robert J; Paegert, Martin; De Lee, Nathan M

    2013-01-01

    Filtergraph is a web application being developed and maintained by the Vanderbilt Initiative in Data-intensive Astrophysics (VIDA) to flexibly and rapidly visualize a large variety of astronomy datasets of various formats and sizes. The user loads a flat-file dataset into Filtergraph which automatically generates an interactive data portal that can be easily shared with others. From this portal, the user can immediately generate scatter plots of up to 5 dimensions as well as histograms and tables based on the dataset. Key features of the portal include intuitive controls with auto-completed variable names, the ability to filter the data in real time through user-specified criteria, the ability to select data by dragging on the screen, and the ability to perform arithmetic operations on the data in real time. To enable seamless data visualization and exploration, changes are quickly rendered on screen and visualizations can be exported as high quality graphics files. The application is optimized for speed in t...

  16. Image segmentation evaluation for very-large datasets

    Science.gov (United States)

    Reeves, Anthony P.; Liu, Shuang; Xie, Yiting

    2016-03-01

    With the advent of modern machine learning methods and fully automated image analysis there is a need for very large image datasets having documented segmentations for both computer algorithm training and evaluation. Current approaches of visual inspection and manual markings do not scale well to big data. We present a new approach that depends on fully automated algorithm outcomes for segmentation documentation, requires no manual marking, and provides quantitative evaluation for computer algorithms. The documentation of new image segmentations and new algorithm outcomes are achieved by visual inspection. The burden of visual inspection on large datasets is minimized by (a) customized visualizations for rapid review and (b) reducing the number of cases to be reviewed through analysis of quantitative segmentation evaluation. This method has been applied to a dataset of 7,440 whole-lung CT images for 6 different segmentation algorithms designed to fully automatically facilitate the measurement of a number of very important quantitative image biomarkers. The results indicate that we could achieve 93% to 99% successful segmentation for these algorithms on this relatively large image database. The presented evaluation method may be scaled to much larger image databases.

  17. Multiresolution persistent homology for excessively large biomolecular datasets

    Science.gov (United States)

    Xia, Kelin; Zhao, Zhixiong; Wei, Guo-Wei

    2015-01-01

    Although persistent homology has emerged as a promising tool for the topological simplification of complex data, it is computationally intractable for large datasets. We introduce multiresolution persistent homology to handle excessively large datasets. We match the resolution with the scale of interest so as to represent large scale datasets with appropriate resolution. We utilize flexibility-rigidity index to access the topological connectivity of the data set and define a rigidity density for the filtration analysis. By appropriately tuning the resolution of the rigidity density, we are able to focus the topological lens on the scale of interest. The proposed multiresolution topological analysis is validated by a hexagonal fractal image which has three distinct scales. We further demonstrate the proposed method for extracting topological fingerprints from DNA molecules. In particular, the topological persistence of a virus capsid with 273 780 atoms is successfully analyzed which would otherwise be inaccessible to the normal point cloud method and unreliable by using coarse-grained multiscale persistent homology. The proposed method has also been successfully applied to the protein domain classification, which is the first time that persistent homology is used for practical protein domain analysis, to our knowledge. The proposed multiresolution topological method has potential applications in arbitrary data sets, such as social networks, biological networks, and graphs. PMID:26450288

  18. Multiresolution persistent homology for excessively large biomolecular datasets

    Energy Technology Data Exchange (ETDEWEB)

    Xia, Kelin; Zhao, Zhixiong [Department of Mathematics, Michigan State University, East Lansing, Michigan 48824 (United States); Wei, Guo-Wei, E-mail: wei@math.msu.edu [Department of Mathematics, Michigan State University, East Lansing, Michigan 48824 (United States); Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824 (United States); Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824 (United States)

    2015-10-07

    Although persistent homology has emerged as a promising tool for the topological simplification of complex data, it is computationally intractable for large datasets. We introduce multiresolution persistent homology to handle excessively large datasets. We match the resolution with the scale of interest so as to represent large scale datasets with appropriate resolution. We utilize flexibility-rigidity index to access the topological connectivity of the data set and define a rigidity density for the filtration analysis. By appropriately tuning the resolution of the rigidity density, we are able to focus the topological lens on the scale of interest. The proposed multiresolution topological analysis is validated by a hexagonal fractal image which has three distinct scales. We further demonstrate the proposed method for extracting topological fingerprints from DNA molecules. In particular, the topological persistence of a virus capsid with 273 780 atoms is successfully analyzed which would otherwise be inaccessible to the normal point cloud method and unreliable by using coarse-grained multiscale persistent homology. The proposed method has also been successfully applied to the protein domain classification, which is the first time that persistent homology is used for practical protein domain analysis, to our knowledge. The proposed multiresolution topological method has potential applications in arbitrary data sets, such as social networks, biological networks, and graphs.

  19. Airborne flux measurements of Biogenic Isoprene over California

    Energy Technology Data Exchange (ETDEWEB)

    Misztal, P.; Karl, Thomas G.; Weber, Robin; Jonsson, H. H.; Guenther, Alex B.; Goldstein, Allen H.

    2014-10-10

    Biogenic Volatile Organic Compound (BVOC) fluxes were measured onboard the CIRPAS Twin Otter aircraft as part of the California Airborne BVOC Emission Research in Natural Ecosystem Transects (CABERNET) campaign during June 2011. The airborne virtual disjunct eddy covariance (AvDEC) approach used measurements from a PTR-MS and a wind radome probe to directly determine fluxes of isoprene, MVK+MAC, methanol, monoterpenes, and MBO over ~10,000 km of flight paths focusing on areas of California predicted to have the largest emissions of isoprene. The Fast Fourier Transform (FFT) approach was used to calculate fluxes over long transects of more than 15 km, most commonly between 50 and 150 km. The Continuous Wavelet Transformation (CWT) approach was used over the same transects to also calculate "instantaneous" fluxes with localization of both frequency and time independent of non-stationarities. Vertical flux divergence of isoprene is expected due to its relatively short lifetime and was measured directly using "racetrack" profiles at multiple altitudes. It was found to be linear and in the range 5% to 30% depending on the ratio of aircraft altitude to PBL height (z/zi). Fluxes were generally measured by flying consistently at 400 ±50 m (a.g.l.) altitude, and extrapolated to the surface according to the determined flux divergence. The wavelet-derived surface fluxes of isoprene averaged to 2 km spatial resolution showed good correspondence to Basal Emission Factor (BEF) landcover datasets used to drive biogenic VOC (BVOC) emission models. The surface flux of isoprene was close to zero over Central Valley crops and desert shrublands, but was very high (up to 15 mg m-2 h-1) above oak woodlands, with clear dependence of emissions on temperature and oak density. Isoprene concentrations of up to 8 ppb were observed at aircraft height on the hottest days and over the dominant source regions. While isoprene emissions from agricultural crop regions, shrublands, and

  20. Airborne flux measurements of biogenic volatile organic compounds over California

    Directory of Open Access Journals (Sweden)

    P. K. Misztal

    2014-03-01

    Biogenic Volatile Organic Compound (BVOC) fluxes were measured onboard the CIRPAS Twin Otter aircraft as part of the California Airborne BVOC Emission Research in Natural Ecosystem Transects (CABERNET) campaign during June 2011. The airborne virtual disjunct eddy covariance (AvDEC) approach used measurements from a PTR-MS and a wind radome probe to directly determine fluxes of isoprene, MVK + MAC, methanol, monoterpenes, and MBO over ∼10 000 km of flight paths focusing on areas of California predicted to have the largest emissions of isoprene. The Fast Fourier Transform (FFT) approach was used to calculate fluxes over long transects of more than 15 km, most commonly between 50 and 150 km. The Continuous Wavelet Transformation (CWT) approach was used over the same transects to also calculate "instantaneous" fluxes with localization of both frequency and time independent of non-stationarities. Vertical flux divergence of isoprene is expected due to its relatively short lifetime and was measured directly using "racetrack" profiles at multiple altitudes. It was found to be linear and in the range 5% to 30% depending on the ratio of aircraft altitude to PBL height (z / zi). Fluxes were generally measured by flying consistently at 400 ± 50 m (a.g.l.) altitude, and extrapolated to the surface according to the determined flux divergence. The wavelet-derived surface fluxes of isoprene averaged to 2 km spatial resolution showed good correspondence to Basal Emission Factor (BEF) landcover datasets used to drive biogenic VOC (BVOC) emission models. The surface flux of isoprene was close to zero over Central Valley crops and desert shrublands, but was very high (up to 15 mg m−2 h−1) above oak woodlands, with clear dependence of emissions on temperature and oak density. Isoprene concentrations of up to 8 ppb were observed at aircraft height on the hottest days and over the dominant source regions. While isoprene emissions from agricultural crop regions

  1. A Reconnecting Flux Rope Dynamo

    OpenAIRE

    Baggaley, Andrew W.; Barenghi, Carlo F.; Shukurov, Anvar; Subramanian, Kandaswamy

    2009-01-01

    We develop a new model of the fluctuation dynamo in which the magnetic field is confined in thin flux ropes advected by a multi-scale flow modeling turbulence. Magnetic dissipation occurs only via reconnection of the flux ropes. We investigate the kinetic energy release into heat, mediated by the dynamo action, both in our model and by solving the induction equation with the same flow. We find that a flux rope dynamo is an order of magnitude more efficient at converting mechanical energy into...

  2. Generalised Geometry and Flux Vacua

    CERN Document Server

    Larfors, Magdalena

    2015-01-01

    This note discusses the connection between generalised geometry and flux compactifications of string theory. Firstly, we explain in a pedestrian manner how the supersymmetry constraints of type II ${\mathcal{N}}=1$ flux compactifications can be restated as integrability constraints on certain generalised complex structures. This reformulation uses generalised complex geometry, a mathematical framework that geometrizes the B-field. Secondly, we discuss how exceptional generalised geometry may provide a similar geometrization of the RR fields. Thirdly, we examine the connection between generalised geometry and non-geometry, and finally we present recent developments where generalised geometry is used to construct explicit examples of flux compactifications to flat space.

  3. Imposing strong constraints on tropical terrestrial CO2 fluxes using passenger aircraft based measurements

    Science.gov (United States)

    Niwa, Y.; Machida, T.; Sawa, Y.; Matsueda, H.; Schuck, T. J.; Brenninkmeijer, C. A.; Imasu, R.; Satoh, M.

    2011-12-01

    Better understanding of the global and regional carbon budget is needed to perform reliable predictions of future climate with an earth system model. However, the reliability of CO2 source/sink estimation by inverse modeling, one of the most promising methods for estimating regional carbon budgets, is limited by sparse observational data coverage. Very few observational data are available in the tropics; consequently, the reconstruction of tropical terrestrial fluxes carries considerable uncertainty. In this study, regional CO2 fluxes for 2006-2008 are estimated by inverse modeling using the Comprehensive Observation Network for Trace gases by Airliner (CONTRAIL) in addition to the surface measurement dataset of GLOBALVIEW-CO2. CONTRAIL is a recently established CO2 measurement network using in-situ instruments on board commercial aircraft. Five CONTRAIL aircraft travel back and forth between Japan and many areas: Europe, North America, Southeast Asia, South Asia, and Australia. The Bayesian synthesis approach is used to estimate monthly fluxes for 42 regions using NICAM-TM simulations with existing CO2 flux datasets and monthly mean observational data. It is demonstrated that the aircraft data have a great impact on the estimated tropical terrestrial fluxes. By adding the aircraft data to the surface data, the analyzed uncertainty of tropical fluxes is reduced by 15%, and uncertainty reduction rates of more than 30% are found in Southeast and South Asia. Specifically, for annual net CO2 fluxes, the nearly neutral fluxes of Indonesia estimated using the surface dataset alone turn into positive fluxes, i.e. carbon sources. In Indonesia, a remarkable carbon release during the severe drought period of October-December 2006 is estimated, which suggests that biosphere respiration or biomass burning was larger than in the prior fluxes. Comparison of the optimized atmospheric CO2 with independent aircraft measurements from CARIBIC tends to validate
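
    A minimal sketch of the Gaussian Bayesian-synthesis inversion machinery described above. The transport (Jacobian) matrix H would come from a transport model such as NICAM-TM; here all arrays are illustrative placeholders, and the formulas are the standard Gaussian update, not the paper's exact configuration.

```python
# Bayesian synthesis inversion of regional fluxes from CO2 observations.
import numpy as np

def bayesian_synthesis(H, y, x_prior, B, R):
    """
    H: (n_obs, n_flux) transport matrix mapping fluxes to observations
    y: observed CO2 (minus any fixed model background)
    x_prior, B: prior flux vector and its error covariance
    R: observation-error covariance
    Returns the posterior flux estimate and its covariance.
    """
    K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)   # Kalman-type gain
    x_post = x_prior + K @ (y - H @ x_prior)
    A_post = (np.eye(len(x_prior)) - K @ H) @ B    # posterior covariance
    return x_post, A_post

# Per-region uncertainty reduction, as quoted in the abstract:
# 1 - np.sqrt(np.diag(A_post) / np.diag(B))
```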

  4. Validation of a Meteosat Second Generation solar radiation dataset over the northeastern Iberian Peninsula

    Directory of Open Access Journals (Sweden)

    J. Cristóbal

    2013-01-01

    Full Text Available Solar radiation plays a key role in the Earth's energy balance and is an essential input to radiation-based evapotranspiration (ET) models. Accurate gridded solar radiation data at high spatial and temporal resolution are needed to retrieve ET over large domains. In this work we present an evaluation, at hourly, daily and monthly time steps and at regional scale (Catalonia, NE Iberian Peninsula), of a satellite-based solar radiation product developed by the Land Surface Analysis Satellite Application Facility (LSA SAF) using data from the Meteosat Second Generation (MSG) Spinning Enhanced Visible and Infrared Imager (SEVIRI). Product performance and accuracy were evaluated for datasets segmented into two terrain classes (flat and hilly areas) and two atmospheric conditions (clear and cloudy sky), as well as for the full dataset as a whole. Evaluation against measurements made with ground-based pyranometers yielded good results in flat areas, with an averaged model root mean square error (RMSE) of 65 W m−2 (19%), 34 W m−2 (9.7%) and 21 W m−2 (5.6%) for hourly, daily and monthly-averaged solar radiation, respectively, including clear and cloudy sky conditions and snow or ice cover. Hilly areas yielded intermediate results, with an averaged model RMSE of 89 W m−2 (27%), 48 W m−2 (14.5%) and 32 W m−2 (9.3%) for hourly, daily and monthly time steps, suggesting the need for further improvements (e.g., terrain corrections) to retrieve localized variability in solar radiation in these areas. According to the literature, the LSA SAF solar radiation product appears to have sufficient accuracy to serve as a useful and operational input to evaporative flux retrieval models.

  5. Winter climate change and sea ice-atmosphere interaction at high northern latitudes in ERA40 dataset

    Institute of Scientific and Technical Information of China (English)

    Liu Xiying

    2006-01-01

    Based on the ERA40 reanalysis dataset of the European Centre for Medium-Range Weather Forecasts (ECMWF), winter climate change and the characteristics of sea ice-atmosphere interaction at high northern latitudes over the past several decades are analyzed. Superposed on the background of global warming, the amplitude of the winter temperature increase at high northern latitudes is larger, and it exhibits different features in different regions. Since the end of the 1970s, the Greenland Sea, the Barents Sea and most of the Eurasian and North American continents have been getting warmer, whereas the Labrador Sea, Greenland and the area around the Bering Strait have been getting colder. Meanwhile, the sea level pressure in the central part of the northern polar region and over the climatological Icelandic low has decreased, while it has increased farther south. Since the 1970s, the sensible and latent heat fluxes from the Greenland Sea and the Barents Sea to the atmosphere have increased, mainly because rising air temperatures have reduced the sea ice concentration and thereby weakened the insulating and shielding effect of the solid ice. In the ice-free area of the Norwegian Sea, the sensible and latent heat fluxes to the atmosphere have decreased because increased air temperature and humidity have reduced the air-sea temperature and humidity differences. In the Labrador Sea, decreased air temperature and humidity have increased the air-sea temperature and humidity differences, so the sea gives more sensible and latent heat to the air; this will lead to the growth of sea ice extent there. The features of the linear regression of sea level pressure, sea ice concentration and the sum of sensible and latent heat flux onto the time series of the leading EOF mode of surface air temperature are close to those of
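
    The regression analysis mentioned at the end, fields regressed onto the time series of the leading EOF mode, reduces to plain linear algebra. A minimal sketch on synthetic stand-in fields (array sizes and values are invented for illustration):

      import numpy as np

      # X: surface air temperature anomalies, shape (n_winters, n_gridpoints)
      rng = np.random.default_rng(1)
      X = rng.normal(size=(40, 500))
      X -= X.mean(axis=0)                      # remove the time mean

      # Leading EOF via SVD; PC1 is the leading-mode time series
      U, s, Vt = np.linalg.svd(X, full_matrices=False)
      pc1 = U[:, 0] * s[0]
      pc1 = (pc1 - pc1.mean()) / pc1.std()     # standardize PC1

      # Regress another field (e.g. SLP or heat flux anomalies) onto PC1;
      # the result is a map in units of Y per standard deviation of PC1.
      Y = rng.normal(size=(40, 500))           # stand-in for SLP anomalies
      Y -= Y.mean(axis=0)
      regression_map = Y.T @ pc1 / len(pc1)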

  6. Adaptation of a resistive model to pesticide volatilization from plants at the field scale: Comparison with a dataset

    Science.gov (United States)

    Lichiheb, Nebila; Personne, Erwan; Bedos, Carole; Barriuso, Enrique

    2014-02-01

    Volatilization from plants is known to contribute greatly to pesticide emission into the atmosphere. Modeling would allow this contribution to be estimated, but few models are actually available, both because of our poor understanding of the processes occurring at the leaf surface that compete with volatilization and because datasets for validating models are lacking. The SURFATM-Pesticides model was developed to predict pesticide volatilization from plants. It is based on the concept of resistances and takes into account two processes competing with volatilization (leaf penetration and photodegradation). The model is presented here and simulated results are compared with an experimental dataset obtained at the field scale for two fungicides applied on wheat, fenpropidin and chlorothalonil. These fungicides were chosen because they are widely used and because of their contrasting vapor pressures. Using climatic variables as inputs, the model simulates the energy balance and surface temperature in good agreement with the experimental data. The model also satisfactorily simulates the volatilization fluxes of chlorothalonil. By integrating rate coefficients of leaf penetration and photodegradation for chlorothalonil given in the literature, the volatilization flux was estimated at 24.8 ng m-2 s-1, compared to 23.6 ng m-2 s-1 measured by the aerodynamic profile method during the first hours after application. At six days, the cumulated volatilization flux was estimated by the model at 19 g ha-1, compared to 17.5 g ha-1 measured by the inverse modeling approach. However, due to the lack of data for estimating the processes competing with volatilization for fenpropidin, the volatilization of this compound is not yet well modeled. The model thus confirms that processes competing with volatilization are an important factor affecting pesticide volatilization from plants.
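
    The resistance concept on which the model is based can be caricatured in a few lines: the volatilization flux is a concentration difference divided by a sum of resistances, while leaf penetration and photodegradation act as competing first-order sinks. A rough sketch with entirely hypothetical parameter values (not the SURFATM-Pesticides code):

      # Resistance-analogy volatilization flux (all values hypothetical)
      R_a, R_b = 40.0, 160.0        # aerodynamic / boundary-layer resistances (s m-1)
      chi_leaf, chi_air = 5.0, 0.1  # gas-phase concentrations (ng m-3)
      k_pen, k_photo = 2e-6, 1e-6   # competing first-order loss rates (s-1)

      F_vol = (chi_leaf - chi_air) / (R_a + R_b)   # ng m-2 s-1
      print(f"volatilization flux: {F_vol:.4f} ng m-2 s-1")

      # Share of a surface pool M (ng m-2) lost to volatilization versus the
      # competing processes over one hour:
      M = 1e4
      loss_vol = F_vol * 3600.0
      loss_competing = (k_pen + k_photo) * M * 3600.0
      print(loss_vol / (loss_vol + loss_competing))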

  7. Physics of Magnetic Flux Ropes

    CERN Document Server

    Priest, E R; Lee, L C

    1990-01-01

    The American Geophysical Union Chapman Conference on the Physics of Magnetic Flux Ropes was held at the Hamilton Princess Hotel, Hamilton, Bermuda on March 27–31, 1989. Topics discussed ranged from solar flux ropes, such as photospheric flux tubes, coronal loops and prominences, to flux ropes in the solar wind, in planetary ionospheres, at the Earth's magnetopause, in the geomagnetic tail and deep in the Earth's magnetosphere. Papers presented at that conference form the nucleus of this book, but the book is more than just a proceedings of the conference: we solicited articles from all those interested in this topic. Thus, there is some material in the book that was not discussed at the conference. Even for papers presented at the conference, the book generally offers a much more detailed and rigorous presentation than was possible in the time allowed for the oral and poster presentations.

  8. High Flux Isotope Reactor (HFIR)

    Data.gov (United States)

    Federal Laboratory Consortium — The HFIR at Oak Ridge National Laboratory is a light-water cooled and moderated reactor that is the United States’ highest flux reactor-based neutron source. HFIR...

  9. What is flux balance analysis?

    OpenAIRE

    Orth, Jeffrey D.; Thiele, Ines; Palsson, Bernhard Ø

    2010-01-01

    Flux balance analysis is a mathematical approach for analyzing the flow of metabolites through a metabolic network. This primer covers the theoretical basis of the approach, several practical examples and a software toolbox for performing the calculations.
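
    At its core, flux balance analysis is a linear program: maximize an objective flux subject to the steady-state mass balance S v = 0 and bounds on each reaction. A toy sketch (the three-reaction network, bounds and objective are invented for illustration, not taken from the primer):

      import numpy as np
      from scipy.optimize import linprog

      # Stoichiometric matrix: metabolite A is made by v1 and consumed by v2,
      # B is made by v2 and consumed by the "biomass" reaction v3.
      S = np.array([[1, -1,  0],
                    [0,  1, -1]])
      bounds = [(0, 10), (0, 10), (0, 10)]   # flux bounds for v1..v3
      c = [0, 0, -1]                         # maximize v3 (linprog minimizes)

      res = linprog(c, A_eq=S, b_eq=np.zeros(2), bounds=bounds)
      print(res.x)                           # optimal fluxes, here [10, 10, 10]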

  10. Conical electromagnetic radiation flux concentrator

    Science.gov (United States)

    Miller, E. R.

    1972-01-01

    Concentrator provides method of concentrating a beam of electromagnetic radiation into a smaller beam, presenting a higher flux density. Smaller beam may be made larger by sending radiation through the device in the reverse direction.

  11. Specification of ROP flux shape

    Energy Technology Data Exchange (ETDEWEB)

    Min, Byung Joo [Korea Atomic Energy Research Institute, Taejon (Korea, Republic of); Gray, A. [Atomic Energy of Canada Ltd., Chalk River, ON (Canada)

    1997-06-01

    The CANDU 9 480/SEU core uses 0.9% SEU (Slightly Enriched Uranium) fuel. The use of SEU fuel enables the reactor to increase the radial power form factor from 0.865, which is typical of current natural uranium CANDU reactors, to 0.97 in the nominal CANDU 9 480/SEU core. The difference is a 12% increase in reactor power. An additional 5% increase can be achieved due to a reduced refuelling ripple. The channel power limits were also increased by 3%, for a total reactor power increase of 20%. This report describes the calculation of neutron flux distributions in the CANDU 9 480/SEU core under conditions specified by the C and I engineers. The RFSP code was used to calculate neutron flux shapes for ROP analysis. Detailed flux values at numerous potential detector sites were calculated for each flux shape. (author). 6 tabs., 70 figs., 4 refs.

  12. Flux Emergence at the Photosphere

    Science.gov (United States)

    Cheung, M. C. M.; Schüssler, M.; Moreno-Insertis, F.

    2006-12-01

    To model the emergence of magnetic fields at the photosphere, we carried out 3D magneto-hydrodynamics (MHD) simulations using the MURaM code. Our simulations take into account the effects of compressibility, energy exchange via radiative transfer and partial ionization in the equation of state. All these physical ingredients are essential for a proper treatment of the problem. In the simulations, an initially buoyant magnetic flux tube is embedded in the upper layers of the convection zone. We find that the interaction between the flux tube and the external flow field has an important influence on the emergent morphology of the magnetic field. Depending on the initial properties of the flux tube (e.g. field strength, twist, entropy etc.), the emergence process can also modify the local granulation pattern. The inclusion of radiative transfer allows us to directly compare the simulation results with real observations of emerging flux.

  13. Periodicities in photospheric magnetic flux

    Institute of Scientific and Technical Information of China (English)

    SONG; Wenbin; WANG; Jingxiu

    2006-01-01

    Magnetic field plays an important role in solar structure and activity. In principle, the determination of magnetic flux would provide the best general-purpose index of solar activity. Periodicity studies of the photospheric magnetic flux (PMF) are currently very few, possibly due to the absence of a uniform flux sequence. In this paper, using 383 NSO/Kitt Peak magnetic synoptic charts, we reconstruct a flux sequence from February 1975 to August 2003 and perform a relatively systematic periodicity analysis with two methods, the Scargle periodogram and the Morlet wavelet transform. As a result, four periods are found, at around 1050, 500, 300 and 160 days. We analyze the temporal variability of these periods in detail and briefly discuss their respective origins.
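
    The Scargle periodogram used here is suited to unevenly sampled records such as a reconstructed flux sequence. A minimal sketch with SciPy on a synthetic stand-in series (the injected periods mimic two of the reported ones; everything else is invented):

      import numpy as np
      from scipy.signal import lombscargle

      rng = np.random.default_rng(2)
      t = np.sort(rng.uniform(0, 10000, 800))            # observation days
      y = (np.sin(2 * np.pi * t / 160) + 0.5 * np.sin(2 * np.pi * t / 1050)
           + rng.normal(scale=0.5, size=t.size))

      periods = np.linspace(50, 1500, 2000)              # days
      power = lombscargle(t, y - y.mean(), 2 * np.pi / periods, normalize=True)
      print(periods[np.argmax(power)])                   # close to 160 days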

  14. UVIS G280 Flux Calibration

    Science.gov (United States)

    Bushouse, Howard

    2009-07-01

    Flux calibration, image displacement, and spectral trace of the UVIS G280 grism will be established using observations of the HST flux standard star GD71. Accompanying direct exposures will provide the image displacement measurements and wavelength zeropoints for dispersed exposures. The calibrations will be obtained at the central position of each CCD chip and at the center of the UVIS field. No additional field-dependent variations will be derived.

  15. Boundary fluxes for nonlocal diffusion

    Science.gov (United States)

    Cortazar, Carmen; Elgueta, Manuel; Rossi, Julio D.; Wolanski, Noemi

    We study a nonlocal diffusion operator in a bounded smooth domain prescribing the flux through the boundary. This problem may be seen as a generalization of the usual Neumann problem for the heat equation. First, we prove existence, uniqueness and a comparison principle. Next, we study the behavior of solutions for some prescribed boundary data including blowing up ones. Finally, we look at a nonlinear flux boundary condition.
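
    For orientation, the standard formulation used in this line of work (a sketch from the surrounding literature, not a quotation of the paper) reads

      u_t(x,t) = \int_{\Omega} J(x-y)\,\big(u(y,t) - u(x,t)\big)\,dy + g(x,t), \qquad x \in \Omega,\ t > 0,

    where $J$ is a smooth, compactly supported, radially symmetric probability kernel and the source $g$, supported near $\partial\Omega$, plays the role of the flux prescribed through the boundary; taking $g \equiv 0$ gives the nonlocal analogue of the homogeneous Neumann problem.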

  16. Sources of uncertainty in eddy covariance ozone flux measurements made by dry chemiluminescence fast response analysers

    Directory of Open Access Journals (Sweden)

    J. B. A. Muller

    2009-09-01

    Full Text Available Eddy covariance ozone flux measurements are the most direct way to estimate ozone removal near the surface. Over vegetated surfaces, high quality ozone fluxes are required to probe the underlying processes, for which it is necessary to separate the flux into the components of stomatal and non-stomatal deposition. Detailed knowledge of the processes that control non-stomatal deposition is limited, and more accurate ozone flux measurements are needed to quantify this component of the deposited flux. We present a systematic intercomparison study of eddy covariance ozone flux measurements made using two fast response dry chemiluminescence analysers. Ozone deposition was measured over a well characterised managed grassland near Edinburgh, Scotland, during August 2007. A data quality control procedure specific to these analysers is introduced. Absolute ozone fluxes were calculated based on the relative signals of the dry chemiluminescence analysers using three different calibration methods, and the results are compared for both analysers. It is shown that the error in the fitted parameters required for the flux calculations is a substantial source of uncertainty in the fluxes. The choice of the calculation method itself can also constitute an uncertainty in the flux, as the fluxes calculated by the three methods do not agree within error at all times. This finding highlights the need for a consistent and rigorous approach for comparable datasets, for example in flux networks. Ozone fluxes calculated by one of the methods were then used to compare the two analysers in more detail. This systematic analyser comparison reveals half-hourly flux values differing by up to a factor of two at times, with the difference in mean hourly flux ranging from 0 to 23% and an error in the mean daily flux of ±12%. The comparison of analysers shows that the agreement in fluxes is excellent for some days but that there is an underlying uncertainty as a result of
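
    The eddy covariance flux itself is the time-averaged covariance of vertical wind fluctuations and scalar concentration fluctuations over an averaging block. A minimal sketch on synthetic data (sampling rate, block length and numbers are illustrative only):

      import numpy as np

      rng = np.random.default_rng(3)
      n = 10 * 60 * 30                      # 10 Hz over a 30-minute block
      w = rng.normal(scale=0.3, size=n)     # vertical wind (m s-1)
      c = 30.0 - 5.0 * w + rng.normal(scale=2.0, size=n)   # ozone (ppb), correlated stand-in

      w_prime = w - w.mean()                # fluctuations about the block mean
      c_prime = c - c.mean()
      flux = np.mean(w_prime * c_prime)     # ppb m s-1; negative = deposition
      print(flux)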

  17. iMS2Flux – a high–throughput processing tool for stable isotope labeled mass spectrometric data used for metabolic flux analysis

    Directory of Open Access Journals (Sweden)

    Poskar C Hart

    2012-11-01

    Full Text Available Abstract Background Metabolic flux analysis has become an established method in systems biology and functional genomics. The most common approach for determining intracellular metabolic fluxes is to utilize mass spectrometry in combination with stable isotope labeling experiments. However, before the mass spectrometric data can be used they have to be corrected for biases caused by naturally occurring stable isotopes, by the analytical technique(s) employed, or by the biological sample itself. Finally the MS data and the labeling information they contain have to be assembled into a data format usable by flux analysis software (of which several dedicated packages exist). Currently the processing of mass spectrometric data is time-consuming and error-prone, requiring peak-by-peak cut-and-paste analysis and manual curation. In order to facilitate high-throughput metabolic flux analysis, the automation of multiple steps in the analytical workflow is necessary. Results Here we describe iMS2Flux, software developed to automate, standardize and connect the data flow between mass spectrometric measurements and flux analysis programs. This tool streamlines the transfer of data from extraction via correction tools to 13C-Flux software by processing MS data from stable isotope labeling experiments. It allows the correction of large and heterogeneous MS datasets for the presence of naturally occurring stable isotopes, initial biomass and several mass spectrometry effects. Before and after data correction, several checks can be performed to ensure accurate data. The corrected data may be returned in a variety of formats including those used by metabolic flux analysis software such as 13CFLUX, OpenFLUX and 13CFLUX2. Conclusion iMS2Flux is a versatile, easy to use tool for the automated processing of mass spectrometric data containing isotope labeling information. It represents the core framework for a standardized workflow and data processing. Due to its flexibility
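
    The correction for naturally occurring stable isotopes that such tools automate can be viewed as a linear inversion with a binomial correction matrix. A deliberately simplified carbon-only sketch (the fragment size and measured intensities are invented; real corrections also treat other elements and instrument effects):

      import numpy as np
      from scipy.special import comb

      def carbon_correction_matrix(n_atoms, p13c=0.0107):
          """Column j holds the natural 13C mass shifts of the n_atoms - j
          carbons that are not tracer-labeled (simplified, carbon only)."""
          C = np.zeros((n_atoms + 1, n_atoms + 1))
          for j in range(n_atoms + 1):
              for k in range(n_atoms - j + 1):
                  C[j + k, j] = (comb(n_atoms - j, k)
                                 * p13c**k * (1 - p13c)**(n_atoms - j - k))
          return C

      measured = np.array([0.55, 0.18, 0.17, 0.10])   # raw M+0..M+3, hypothetical
      measured /= measured.sum()
      corrected, *_ = np.linalg.lstsq(carbon_correction_matrix(3), measured, rcond=None)
      print(corrected / corrected.sum())              # labeling after correction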

  18. P fluxes and exotic branes

    Science.gov (United States)

    Lombardo, Davide M.; Riccioni, Fabio; Risoli, Stefano

    2016-12-01

    We consider the N = 1 superpotential generated in type-II orientifold models by non-geometric fluxes. In particular, we focus on the family of P fluxes, which are related by T-duality transformations to the S-dual of the Q flux. We determine the general rule that transforms a given flux in this family under a single T-duality transformation. This rule allows one to derive a complete expression for the superpotential for both the IIA and the IIB theory for the particular case of a {T}^6/[{Z}_2× {Z}_2] orientifold. We then consider how these fluxes modify the generalised Bianchi identities. In particular, we derive a fully consistent set of quadratic constraints coming from the NS-NS Bianchi identities. On the other hand, the P flux Bianchi identities induce tadpoles, and we determine a set of exotic branes that can be consistently included in order to cancel them. This is achieved by determining a universal transformation rule under T-duality satisfied by all the branes in string theory.

  19. $P$ fluxes and exotic branes

    CERN Document Server

    Lombardo, Davide M; Risoli, Stefano

    2016-01-01

    We consider the ${\\cal N}=1$ superpotential generated in type-II orientifold models by non-geometric fluxes. In particular, we focus on the family of $P$ fluxes, which are related by T-duality transformations to the S-dual of the $Q$ flux. We determine the general rule that transforms a given flux in this family under a single T-duality transformation. This rule allows one to derive a complete expression for the superpotential for both the IIA and the IIB theory for the particular case of a $T^6/[\\mathbb{Z}_2 \\times \\mathbb{Z}_2 ]$ orientifold. We then consider how these fluxes modify the generalised Bianchi identities. In particular, we derive a fully consistent set of quadratic constraints coming from the NS-NS Bianchi identities. On the other hand, the $P$ flux Bianchi identities induce tadpoles, and we determine a set of exotic branes that can be consistently included in order to cancel them. This is achieved by determining a universal transformation rule under T-duality satisfied by all the branes in string t...

  20. Data Mining Solar X-Ray Flares Triggered by Emerging Magnetic Flux

    Science.gov (United States)

    Loftus, Kaitlyn; Saar, Steven H.; Schanche, Nicole

    2017-01-01

    We investigate the association between emerging magnetic flux and solar X-ray flares to identify, and if possible quantify, distinguishing physical properties of flares triggered by flux emergence versus those triggered by other sources. Our study uses as its basis GOES-classified solar flares from March 2011 through June 2016 that were identified by the Space Weather Prediction Center's flare detection algorithm. The basic X-ray flare data are then enriched with data about related EUV-spectrum flares, emerging fluxes, active regions, eruptions, and sigmoids, which are all characterized by event-specific keywords, identified via SDO feature finding tools, and archived in the Heliophysics Events Knowledgebase (HEK). Using appropriate spatial and temporal parameters for each event type to determine association, we create a catalogue of solar events associated with each GOES-classified flare. After accounting for the primitive state of many of these event detection algorithms, we statistically analyze the compiled dataset to determine the effects of an emerging flux trigger on flare properties. A two-sample Kolmogorov-Smirnov test confirms with 99.9% confidence that flares triggered by emerging flux have a different peak flux distribution than flares not associated with emerging flux. We observe no linear or logarithmic correlations between the individual properties of flares and their associated emerging fluxes, and we find that flares triggered by emerging flux are ~10% more likely to cause an eruption inside an active region; outside of an active region, a flare's association with emerging flux has no effect on its likelihood of causing an eruption. We also compare the morphologies of flares triggered by emerging flux and flares that were not, via a superposed epoch analysis of light curves. Our results will be of interest for predicting flare behavior as a function of magnetic activity (where we can use enhanced rates of emerging flux as a proxy for heightened stellar
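
    The two-sample Kolmogorov-Smirnov comparison of peak flux distributions can be reproduced in outline as follows (the synthetic samples merely stand in for the two flare populations; GOES peak fluxes are treated as roughly log-normal purely for illustration):

      import numpy as np
      from scipy.stats import ks_2samp

      rng = np.random.default_rng(4)
      flux_efr = rng.lognormal(mean=-12.5, sigma=1.0, size=400)    # W m-2
      flux_other = rng.lognormal(mean=-13.0, sigma=1.0, size=2000)

      stat, p = ks_2samp(flux_efr, flux_other)
      print(f"KS statistic = {stat:.3f}, p = {p:.2e}")
      # p < 0.001 would support, at >99.9% confidence, distinct distributions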

  1. Understanding COS Fluxes in a Boreal Forest: Towards COS-Based GPP Estimates.

    Science.gov (United States)

    Chen, H.; Kooijmans, L.; Franchin, A.; Keskinen, H.; Levula, J.; Mammarella, I.; Maseyk, K. S.; Pihlatie, M.; Praplan, A. P.; Seibt, U.; Sun, W.; Vesala, T.

    2015-12-01

    Carbonyl sulfide (COS) is a promising new tracer that can be used to partition the Net Ecosystem Exchange into gross primary production (GPP) and respiration. COS and CO2 vegetation fluxes are closely related, as these gases share the same diffusion pathway into stomata, which makes COS a potentially powerful tracer for GPP. While vegetative uptake is the largest sink of COS, its environmental drivers are poorly understood, and soil fluxes represent an important but relatively unconstrained component. Therefore, the realization of the COS tracer method requires proper characterization of both soil and ecosystem fluxes. A campaign to provide better constrained soil and ecosystem COS flux data for boreal forests took place in the summer of 2015 at the SMEAR II site in Hyytiälä, Finland. Eddy covariance flux measurements were made above the forest canopy with an Aerodyne continuous-wave quantum cascade laser (QCL) system capable of measuring COS, CO2, CO and H2O. Soil COS fluxes were obtained using modified LI-COR LI-8100 chambers together with high accuracy concentration measurements from another Aerodyne QCL instrument. The same instrument alternately measured concentrations in and above the canopy, cycling through 4 heights; these profiles will be used to calculate ecosystem fluxes with the Radon-tracer method, providing estimates under low-turbulence conditions. We will compare ecosystem fluxes from both eddy covariance and profile measurements and show estimates of the fraction of ecosystem fluxes attributed to the soil component. With the better understanding of ecosystem and soil COS fluxes obtained with this dataset, we will be able to derive COS-based GPP estimates for the Hyytiälä site.

  2. Geospatial datasets for watershed delineation and characterization used in the Hawaii StreamStats web application

    Science.gov (United States)

    Rea, Alan; Skinner, Kenneth D.

    2012-01-01

    The U.S. Geological Survey Hawaii StreamStats application uses an integrated suite of raster and vector geospatial datasets to delineate and characterize watersheds. The geospatial datasets used to delineate and characterize watersheds on the StreamStats website, and the methods used to develop the datasets are described in this report. The datasets for Hawaii were derived primarily from 10 meter resolution National Elevation Dataset (NED) elevation models, and the National Hydrography Dataset (NHD), using a set of procedures designed to enforce the drainage pattern from the NHD into the NED, resulting in an integrated suite of elevation-derived datasets. Additional sources of data used for computing basin characteristics include precipitation, land cover, soil permeability, and elevation-derivative datasets. The report also includes links for metadata and downloads of the geospatial datasets.

  3. Long-term Trend of Cold Air Mass Amount below a Designated Potential Temperature in Northern and Southern Hemisphere Winters with 7 Different Reanalysis Datasets

    Science.gov (United States)

    Kanno, Y.; Abdillah, M. R.; Iwasaki, T.

    2015-12-01

    This study shows that the hemispheric total cold air mass amount defined below a threshold potential temperature of 280 K is a good indicator of the long-term trend of climate change in the polar regions. We present quantitative analyses of the warming trend in the Northern Hemisphere (NH) and Southern Hemisphere (SH) winters, using 7 different reanalysis datasets (JRA-55, JRA-55C, JRA-55AMIP, ERA-Interim, CFSR, JRA-25, NCEP-NCAR). The hemispheric total cold air mass amount in the NH winter exhibits a statistically significant decreasing trend in all reanalysis datasets, at a rate of about -1.37 to -0.77% per decade over the period 1959-2012 and about -1.57 to -0.82% per decade over 1980-2012. There is no statistically significant trend in the equatorward cold air mass flux across 45N, an indicator of hemispheric-scale cold air outbreaks, over the period 1980-2012, except in the NCEP-NCAR reanalysis, which shows a substantial decreasing trend of about -3.28% per decade. The spatial distribution of the long-term trend of cold air mass amount in the NH winter is largely consistent among the reanalysis datasets over 1980-2012, except for JRA-55AMIP. Cold air mass amount increases over Central Siberia, the Kamchatka Peninsula, and the Bering Sea, while it decreases over the Norwegian Sea, the Barents Sea, the Kara Sea, Greenland, Canada, the northern United States, and East Asia. In the SH winter, on the other hand, there is a large discrepancy among the reanalysis datasets in the hemispheric total cold air mass amount and the equatorward cold air mass flux across 50S over the period 1980-2010. This result indicates that there is large uncertainty in the long-term trend of cold air mass amount in the SH winter.
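
    Decadal trend estimates of this kind typically come from a linear fit to the winter-mean series, with the slope expressed as a percentage of the mean per decade. A minimal sketch on a synthetic series (all numbers invented):

      import numpy as np
      from scipy.stats import linregress

      rng = np.random.default_rng(5)
      years = np.arange(1959, 2013)
      cam = 100.0 * (1 - 0.001 * (years - years[0])) \
            + rng.normal(scale=0.8, size=years.size)   # cold air mass index

      fit = linregress(years, cam)
      trend = 10.0 * fit.slope / cam.mean() * 100.0    # % per decade
      print(f"{trend:+.2f}% per decade, p = {fit.pvalue:.3f}")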

  4. Designing the colorectal cancer core dataset in Iran

    Directory of Open Access Journals (Sweden)

    Sara Dorri

    2017-01-01

    Full Text Available Background: There is no need to explain the importance of collecting, recording and analyzing disease information in any health organization. In this regard, the systematic design of standard data sets can help record uniform and consistent information and can create interoperability between health care systems. The main purpose of this study was to design a core dataset to record colorectal cancer information in Iran. Methods: For the design of the colorectal cancer core data set, a combination of literature review and expert consensus was used. In the first phase, a draft of the data set was designed based on a colorectal cancer literature review and comparative studies. In the second phase, this data set was evaluated by experts from different disciplines, such as medical informatics, oncology and surgery, and their comments and opinions were collected. In the third phase the refined data set was evaluated again by experts, and the final data set was proposed. Results: In the first phase, based on the literature review, a draft set of 85 data elements was designed. In the second phase this data set was evaluated by experts and supplementary information was offered by professionals in subgroups, especially in the treatment part; the total number of elements reached 93. In the third phase, a final evaluation was conducted by experts and the data set was organized into five main parts: demographic information, diagnostic information, treatment information, clinical status assessment information, and clinical trial information. Conclusion: In this study a comprehensive core data set for colorectal cancer was designed. By facilitating the exchange of health information, this dataset can be useful for collecting colorectal cancer information. Designing such data sets for similar diseases can help providers collect standard data from patients and can accelerate retrieval from storage systems.

  5. FTSPlot: fast time series visualization for large datasets.

    Directory of Open Access Journals (Sweden)

    Michael Riss

    Full Text Available The analysis of electrophysiological recordings often involves visual inspection of time series data to locate specific experiment epochs, mask artifacts, and verify the results of signal processing steps, such as filtering or spike detection. Long-term experiments with continuous data acquisition generate large amounts of data. Rapid browsing through these massive datasets poses a challenge to conventional data plotting software because the plotting time increases proportionately to the increase in the volume of data. This paper presents FTSPlot, which is a visualization concept for large-scale time series datasets using techniques from the field of high performance computer graphics, such as hierarchic level of detail and out-of-core data handling. In a preprocessing step, time series data, event, and interval annotations are converted into an optimized data format, which then permits fast, interactive visualization. The preprocessing step has a computational complexity of O(n log N); the visualization itself can be done with a complexity of O(1) and is therefore independent of the amount of data. A demonstration prototype has been implemented and benchmarks show that the technology is capable of displaying large amounts of time series data, events, and interval annotations lag-free, with latencies below 20 ms. The current 64-bit implementation theoretically supports datasets with up to 2^64 bytes; on the x86_64 architecture currently up to 2^48 bytes are supported, and benchmarks have been conducted with 2^40 bytes (1 TiB), or 1.3 x 10^11 double precision samples. The presented software is freely available and can be included as a Qt GUI component in future software projects, providing a standard visualization method for long-term electrophysiological experiments.
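
    The hierarchic level-of-detail idea can be illustrated with a per-bin min/max pyramid: each coarser level stores only the extremes of fixed-size bins, so rendering any zoom level touches a bounded number of samples. A rough sketch of the principle (not FTSPlot's actual data format or API):

      import numpy as np

      def minmax_pyramid(x, factor=64, levels=3):
          """Build coarser and coarser min/max envelopes of a trace."""
          pyramid = [x]
          for _ in range(levels):
              x = pyramid[-1]
              n = (len(x) // factor) * factor
              bins = x[:n].reshape(-1, factor)
              # interleave min and max to preserve the visual envelope
              reduced = np.empty(2 * bins.shape[0], dtype=x.dtype)
              reduced[0::2] = bins.min(axis=1)
              reduced[1::2] = bins.max(axis=1)
              pyramid.append(reduced)
          return pyramid

      levels = minmax_pyramid(np.random.default_rng(6).normal(size=1_000_000))
      print([len(v) for v in levels])   # e.g. [1000000, 31250, 976, 30]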

  6. Determining similarity of scientific entities in annotation datasets.

    Science.gov (United States)

    Palma, Guillermo; Vidal, Maria-Esther; Haag, Eric; Raschid, Louiqa; Thor, Andreas

    2015-01-01

    Linked Open Data initiatives have made available a diversity of scientific collections where scientists have annotated entities in the datasets with controlled vocabulary terms from ontologies. Annotations encode scientific knowledge, which is captured in annotation datasets. Determining relatedness between annotated entities becomes a building block for pattern mining, e.g. identifying drug-drug relationships may depend on the similarity of the targets that interact with each drug. A diversity of similarity measures has been proposed in the literature to compute relatedness between a pair of entities. Each measure exploits some knowledge including the name, function, relationships with other entities, taxonomic neighborhood and semantic knowledge. We propose a novel general-purpose annotation similarity measure called 'AnnSim' that measures the relatedness between two entities based on the similarity of their annotations. We model AnnSim as a 1-1 maximum weight bipartite match and exploit properties of existing solvers to provide an efficient solution. We empirically study the performance of AnnSim on real-world datasets of drugs and disease associations from clinical trials and relationships between drugs and (genomic) targets. Using baselines that include a variety of measures, we identify where AnnSim can provide a deeper understanding of the semantics underlying the relatedness of a pair of entities or where it could lead to predicting new links or identifying potential novel patterns. Although AnnSim does not exploit knowledge or properties of a particular domain, its performance compares well with a variety of state-of-the-art domain-specific measures. Database URL: http://www.yeastgenome.org/
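
    A 1-1 maximum-weight bipartite match between two annotation sets can be computed with the Hungarian algorithm. A toy sketch (the similarity matrix is invented, and the final normalization is one plausible choice rather than necessarily AnnSim's exact definition):

      import numpy as np
      from scipy.optimize import linear_sum_assignment

      sim = np.array([[0.9, 0.1, 0.3],     # annotations of entity 1 (rows)
                      [0.2, 0.8, 0.4],     # vs annotations of entity 2 (cols)
                      [0.5, 0.3, 0.7]])

      rows, cols = linear_sum_assignment(-sim)   # negate: the solver minimizes
      score = 2 * sim[rows, cols].sum() / (sim.shape[0] + sim.shape[1])
      print(list(zip(rows, cols)), score)        # matched pairs and similarity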

  7. Can atmospheric reanalysis datasets be used to reproduce flood characteristics?

    Science.gov (United States)

    Andreadis, K.; Schumann, G.; Stampoulis, D.

    2014-12-01

    Floods are one of the costliest natural disasters and the ability to understand their characteristics and their interactions with population, land cover and climate changes is of paramount importance. In order to accurately reproduce flood characteristics such as water inundation and heights both in the river channels and floodplains, hydrodynamic models are required. Most of these models operate at very high resolutions and are computationally very expensive, making their application over large areas very difficult. However, a need exists for such models to be applied at regional to global scales so that the effects of climate change with regards to flood risk can be examined. We use the LISFLOOD-FP hydrodynamic model to simulate a 40-year history of flood characteristics at the continental scale, particularly over Australia. LISFLOOD-FP is a 2-D hydrodynamic model that solves the approximate Saint-Venant equations at large scales (on the order of 1 km) using a sub-grid representation of the river channel. This implementation is part of an effort towards a global 1-km flood modeling framework that will allow the reconstruction of a long-term flood climatology. The components of this framework include a hydrologic model (the widely-used Variable Infiltration Capacity model) and a meteorological dataset that forces it. In order to extend the simulated flood climatology to 50-100 years in a consistent manner, reanalysis datasets have to be used. The objective of this study is the evaluation of multiple atmospheric reanalysis datasets (ERA, NCEP, MERRA, JRA) as inputs to the VIC/LISFLOOD-FP model. Comparisons of the simulated flood characteristics are made with both satellite observations of inundation and a benchmark simulation of LISFLOOD-FP being forced by observed flows. Finally, the implications of the availability of a global flood modeling framework for producing flood hazard maps and disseminating disaster information are discussed.

  8. Rapid global fitting of large fluorescence lifetime imaging microscopy datasets.

    Directory of Open Access Journals (Sweden)

    Sean C Warren

    Full Text Available Fluorescence lifetime imaging (FLIM is widely applied to obtain quantitative information from fluorescence signals, particularly using Förster Resonant Energy Transfer (FRET measurements to map, for example, protein-protein interactions. Extracting FRET efficiencies or population fractions typically entails fitting data to complex fluorescence decay models but such experiments are frequently photon constrained, particularly for live cell or in vivo imaging, and this leads to unacceptable errors when analysing data on a pixel-wise basis. Lifetimes and population fractions may, however, be more robustly extracted using global analysis to simultaneously fit the fluorescence decay data of all pixels in an image or dataset to a multi-exponential model under the assumption that the lifetime components are invariant across the image (dataset. This approach is often considered to be prohibitively slow and/or computationally expensive but we present here a computationally efficient global analysis algorithm for the analysis of time-correlated single photon counting (TCSPC or time-gated FLIM data based on variable projection. It makes efficient use of both computer processor and memory resources, requiring less than a minute to analyse time series and multiwell plate datasets with hundreds of FLIM images on standard personal computers. This lifetime analysis takes account of repetitive excitation, including fluorescence photons excited by earlier pulses contributing to the fit, and is able to accommodate time-varying backgrounds and instrument response functions. We demonstrate that this global approach allows us to readily fit time-resolved fluorescence data to complex models including a four-exponential model of a FRET system, for which the FRET efficiencies of the two species of a bi-exponential donor are linked, and polarisation-resolved lifetime data, where a fluorescence intensity and bi-exponential anisotropy decay model is applied to the analysis

  9. Spatially-based quality control for daily precipitation datasets

    Science.gov (United States)

    Serrano-Notivoli, Roberto; de Luis, Martín; Beguería, Santiago; Ángel Saz, Miguel

    2016-04-01

    There are many reasons why wrong data can appear in original precipitation datasets, but their common characteristic is that none of them correspond to the natural variability of the climate variable. For this reason, a comprehensive analysis of the data of each station on each day is necessary to be certain that the final dataset will be consistent and reliable. Most quality control techniques applied to daily precipitation are based on the comparison of each observed value with the rest of the values in the same series or in reference series built from the nearest stations. These methods are inherited from monthly precipitation studies, but at the daily scale the variability is larger and the methods have to be different. A common characteristic shared by all of these approaches is that they make reconstructions based on the best-correlated reference series, which can be a biased decision because, for example, an extreme precipitation value observed on the same day at more than one station could be flagged as erroneous. We propose a method based on the specific conditions of the day and location to determine the reliability of each observation. This method keeps the local variance of the variable and the independence of its time structure. To do so, individually for each daily value, we first compute the probability of precipitation occurrence through a multivariate logistic regression using the 10 nearest observations in binomial form (0 = dry; 1 = wet), which produces a binomial prediction (PB) between 0 and 1. Then, we compute a prediction of precipitation magnitude (PM) with the raw data of the same 10 nearest observations. Through these predictions we examine the original data for each day and location against five criteria: 1) suspect data; 2) suspect zero; 3) suspect outlier; 4) suspect wet; and 5) suspect dry. Tests on different datasets showed that the flagged data depend mainly on the number of available data and on their homogeneous distribution.
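
    The occurrence step, predicting a wet/dry probability PB from the 10 nearest observations, can be sketched with an off-the-shelf logistic regression (the synthetic data, and the dependence built into them, are purely illustrative):

      import numpy as np
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(7)
      neighbours = rng.binomial(1, 0.4, size=(5000, 10))  # wet/dry at 10 nearest stations
      p_true = 1 / (1 + np.exp(-(neighbours.sum(axis=1) - 4)))  # hypothetical dependence
      target = rng.binomial(1, p_true)                    # wet/dry at the candidate station

      model = LogisticRegression().fit(neighbours, target)
      pb = model.predict_proba(neighbours)[:, 1]          # PB in [0, 1]
      # A recorded wet day with PB near 0, or a dry day with PB near 1,
      # would be flagged as suspect and examined against the five criteria.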

  10. Scientific Datasets: Discovery and Aggregation for Semantic Interpretation.

    Science.gov (United States)

    Lopez, L. A.; Scott, S.; Khalsa, S. J. S.; Duerr, R.

    2015-12-01

    One of the biggest challenges that interdisciplinary researchers face is finding suitable datasets in order to advance their science; this problem remains consistent across multiple disciplines. A surprising number of scientists, when asked what tool they use for data discovery, reply "Google", which is an acceptable solution in some cases, but not even Google can find (or cares to compile) all the data that is relevant for science, and particularly for the geosciences. If a dataset is not discoverable through a well known search provider, it will remain dark data to the scientific world. For the past year, BCube, an EarthCube Building Block project, has been developing, testing and deploying a technology stack capable of data discovery at web scale using the ultimate dataset: the Internet. This stack has two principal components, a web-scale crawling infrastructure and a semantic aggregator. The web crawler is a modified version of Apache Nutch (the originator of Hadoop and other big data technologies) that has been improved and tailored for data and data service discovery. The second component is semantic aggregation, carried out by a Python-based workflow that extracts valuable metadata and stores it in the form of triples through the use of semantic technologies. While implementing the BCube stack we have run into several challenges, such as a) scaling the project to cover large portions of the Internet at a reasonable cost, b) making sense of very diverse and non-homogeneous data, and lastly, c) extracting facts about these datasets using semantic technologies in order to make them usable for the geosciences community. Despite all these challenges we have proven that we can discover and characterize data that otherwise would have remained in the dark corners of the Internet. Having all this data indexed and 'triplelized' will enable scientists to access a trove of information relevant to their work in a more natural way. An important characteristic of the BCube stack is that all

  11. A Validation Dataset for CryoSat Sea Ice Investigators

    DEFF Research Database (Denmark)

    Julia, Gaudelli,; Baker, Steve; Haas, Christian

    Since its launch in April 2010, CryoSat has been collecting valuable sea ice data over the Arctic region. Over the same period, ESA's CryoVEx and NASA IceBridge validation campaigns have been collecting a unique set of coincident airborne measurements in the Arctic. The CryoVal-SI project has colla...... community. In this talk we will describe the composition of the validation dataset, summarising how it was processed and how to understand the content and format of the data. We will also explain how to access the data and the supporting documentation.

  12. Conformations of Macromolecules and their Complexes from Heterogeneous Datasets

    CERN Document Server

    Schwander, P; Ourmazd, A

    2014-01-01

    We describe a new generation of algorithms capable of mapping the structure and conformations of macromolecules and their complexes from large ensembles of heterogeneous snapshots, and demonstrate the feasibility of determining both discrete and continuous macromolecular conformational spectra. These algorithms naturally incorporate conformational heterogeneity without resort to sorting and classification, or prior knowledge of the type of heterogeneity present. They are applicable to single-particle diffraction and image datasets produced by X-ray lasers and cryo-electron microscopy, respectively, and particularly suitable for systems not easily amenable to purification or crystallization.

  13. Gene set analysis of the EADGENE chicken data-set

    DEFF Research Database (Denmark)

    Skarman, Axel; Jiang, Li; Hornshøj, Henrik

    2009-01-01

    Abstract Background: Gene set analysis is considered to be a way of improving our biological interpretation of the observed expression patterns. This paper describes different methods applied to analyse expression data from a chicken DNA microarray dataset. Results: Applying different gene set analyses to the chicken expression data led to different rankings of the Gene Ontology terms tested. A method for prediction of possible annotations was applied. Conclusion: Biological interpretation based on gene set analyses depended on the statistical method used. Methods for predicting the possible...

  14. Agile data management for curation of genomes to watershed datasets

    Science.gov (United States)

    Varadharajan, C.; Agarwal, D.; Faybishenko, B.; Versteeg, R.

    2015-12-01

    A software platform is being developed for data management and assimilation [DMA] as part of the U.S. Department of Energy's Genomes to Watershed Sustainable Systems Science Focus Area 2.0. The DMA components and capabilities are driven by the project science priorities, and the development is based on agile development techniques. The goal of the DMA software platform is to enable users to integrate and synthesize diverse and disparate field, laboratory, and simulation datasets, including geological, geochemical, geophysical, microbiological, hydrological, and meteorological data across a range of spatial and temporal scales. The DMA objectives are (a) developing an integrated interface to the datasets, (b) storing field monitoring data and laboratory analytical results of water and sediment samples in a database, (c) providing automated QA/QC analysis of data, and (d) working with data providers to modify high-priority field and laboratory data collection and reporting procedures as needed. The first three objectives are driven by user needs, while the last is driven by data management needs. The project needs and priorities are reassessed regularly with the users, and after each user session we identify development priorities to match the identified user priorities. For instance, data QA/QC and collection activities have focused on the data and products needed for on-going scientific analyses (e.g. water level and geochemistry). We have also developed, tested and released a broker and portal that integrate diverse datasets from the two different databases used for curation of project data. The development of the user interface was based on a user-centered design process involving several user interviews and constant interaction with data providers. The initial version focuses on the most requested feature, i.e. finding the data needed for analyses through an intuitive interface. Once the data are found, the user can immediately plot and download data

  15. Visualization and data sharing of COSMIC radio occultation dataset

    Science.gov (United States)

    Ho, Y.; Weber, W. J.; Chastang, J.; Murray, D.; McWhirter, J.; Integrated Data Viewer

    2010-12-01

    A capability for visualizing the trajectories and sounding profiles of the COSMIC netCDF dataset, and their evolution through time, has been developed in Unidata's Integrated Data Viewer (IDV). The COSMIC radio occultation data are located on a remote data server called RAMADDA, a content management system for earth science data. The combination of these two software packages provides powerful visualization and analysis tools for sharing real-time and archived data for research and education. In this presentation we demonstrate the development and usage of these two software packages.

  16. SDCLIREF - A sub-daily gridded reference dataset

    Science.gov (United States)

    Wood, Raul R.; Willkofer, Florian; Schmid, Franz-Josef; Trentini, Fabian; Komischke, Holger; Ludwig, Ralf

    2017-04-01

    Climate change is expected to impact the intensity and frequency of hydrometeorological extreme events. In order to adequately capture and analyze extreme rainfall events, in particular when assessing flood and flash flood situations, data are required at high spatial and sub-daily resolution, which are often not available in sufficient density and over extended time periods. The ClimEx project (Climate Change and Hydrological Extreme Events) addresses the alteration of hydrological extreme events under climate change conditions. In order to differentiate between a clear climate change signal and the limits of natural variability, unique single-model regional climate model ensembles (CRCM5 driven by CanESM2, RCP8.5) were created for a European and a North American domain, each comprising 50 members of 150 years (1951-2100). In combination with the CORDEX database, this newly created ClimEx ensemble is a one-of-a-kind model dataset for analyzing changes in sub-daily extreme events. For the purpose of bias-correcting the regional climate model ensembles, as well as for the baseline calibration and validation of hydrological catchment models, a new sub-daily (3h) high-resolution (500m) gridded reference dataset (SDCLIREF) was created for a domain covering the Upper Danube and Main watersheds (~100,000 km2). As the sub-daily observations lack a continuous time series for the reference period 1980-2010, a suitable method was needed to bridge the gaps in the discontinuous time series. The Method of Fragments (Sharma and Srikanthan (2006); Westra et al. (2012)) was applied to transform daily observations into sub-daily rainfall events, extending the time series and densifying the station network. Prior to applying the Method of Fragments and creating the gridded dataset using rigorous interpolation routines, observations operated by several institutions in three countries (Germany, Austria, Switzerland) were collected, followed by quality control of the observations
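
    The Method of Fragments itself is simple: a donor day's sub-daily pattern, normalized to sum to one, is scaled by the target day's observed daily total. A minimal sketch (in practice donor days are selected from nearby stations by season and rainfall class; the values below are invented):

      import numpy as np

      def disaggregate_daily(daily_total, donor_subdaily):
          """Scale a donor day's 3-hourly fragments by the target daily total."""
          fragments = donor_subdaily / donor_subdaily.sum()
          return daily_total * fragments

      donor = np.array([0.0, 0.2, 1.5, 3.1, 1.0, 0.4, 0.0, 0.0])  # mm per 3 h
      print(disaggregate_daily(12.0, donor))   # 3-hourly series summing to 12 mm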

  17. Dynamic Mode Decomposition for Large and Streaming Datasets

    CERN Document Server

    Hemati, Maziar S; Rowley, Clarence W

    2014-01-01

    We formulate a low-storage method for performing dynamic mode decomposition that can be updated inexpensively as new data become available; this formulation allows dynamical information to be extracted from large datasets and data streams. We present two algorithms: the first is mathematically equivalent to a standard "batch-processed" formulation; the second introduces a compression step that maintains computational efficiency, while enhancing the ability to isolate pertinent dynamical information from noisy measurements. Both algorithms reliably capture dominant fluid dynamic behaviors, as demonstrated on cylinder wake data collected from both direct numerical simulations and particle image velocimetry experiments

  18. Integrated dataset of screening hits against multiple neglected disease pathogens.

    Directory of Open Access Journals (Sweden)

    Solomon Nwaka

    2011-12-01

    Full Text Available New chemical entities are desperately needed that overcome the limitations of existing drugs for neglected diseases. Screening a diverse library of 10,000 drug-like compounds against 7 neglected disease pathogens resulted in an integrated dataset of 744 hits. We discuss the prioritization of these hits for each pathogen and the strong correlation observed between compounds active against more than two pathogens and mammalian cell toxicity. Our work suggests that the efficiency of early drug discovery for neglected diseases can be enhanced through a collaborative, multi-pathogen approach.

  19. Dataset concerning the analytical approximation of the Ae3 temperature

    Directory of Open Access Journals (Sweden)

    B.L. Ennis

    2017-02-01

    The dataset includes the terms of the function and the values of the polynomial coefficients for the major alloying elements in steel. A short description of the approximation method used to derive and validate the coefficients has also been included. For discussion and application of this model, please refer to the full length article entitled "The role of aluminium in chemical and phase segregation in a TRIP-assisted dual phase steel", 10.1016/j.actamat.2016.05.046 (Ennis et al., 2016 [1]).

  20. Landfills as critical infrastructures: analysis of observational datasets after 12 years of non-invasive monitoring

    Science.gov (United States)

    Scozzari, Andrea; Raco, Brunella; Battaglini, Raffaele

    2016-04-01

    This work presents the results of more than ten years of observations, performed on a regular basis, at a municipal solid waste disposal site located in Italy. Observational data are generated by the combination of non-invasive techniques involving the direct measurement of biogas release to the atmosphere and thermal infrared imaging. In fact, part of the generated biogas tends to escape from the landfill surface even when collecting systems are installed and properly working; thus, methodologies for estimating the behaviour of a landfill system by means of direct and/or indirect measurement systems have been developed over recent decades. It is now known that these infrastructures produce more than 20% of the total anthropogenic methane released to the atmosphere, justifying the need for systematic and efficient monitoring of such infrastructures. During the last 12 years, observational data regarding a solid waste disposal site located in Tuscany (Italy) have been collected on a regular basis. The collected datasets consist of direct measurements of gas flux with the accumulation chamber method, combined with the detection of thermal anomalies by infrared radiometry. This work discusses the evolution of the estimated performance of the landfill system, its trends, and the benefits and critical aspects of such a relatively long-term monitoring activity.
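
    With the accumulation chamber method, the flux follows from the initial rate of concentration rise after chamber closure, F = (dC/dt) V/A. A minimal sketch with an invented chamber geometry and synthetic readings:

      import numpy as np

      t = np.arange(0, 120, 10)                 # s since chamber closure
      c = 450 + 0.8 * t + np.random.default_rng(8).normal(scale=3, size=t.size)  # ppm, synthetic

      slope = np.polyfit(t, c, 1)[0]            # dC/dt in ppm s-1
      V, A = 0.006, 0.031                       # chamber volume (m3) and area (m2), hypothetical
      flux = slope * V / A                      # ppm m s-1; multiply by the molar
      print(flux)                               # gas density for a mass flux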

  1. Construction and Analysis of Long-Term Surface Temperature Dataset in Fujian Province

    Science.gov (United States)

    Li, W. E.; Wang, X. Q.; Su, H.

    2017-09-01

    Land surface temperature (LST) is a key parameter of land surface physical processes on global and regional scales, linking the heat fluxes and interactions between the ground and atmosphere. Based on MODIS 8-day LST products (MOD11A2) from the split-window algorithms, we constructed monthly and annual LST datasets for Fujian Province from 2000 to 2015. We then analyzed the monthly and yearly LST time series and further investigated the LST distribution and its evolution. The average LST of Fujian Province reaches its highest value in July and its lowest in January. The monthly and annual LST time series present significant periodic features (annual and interannual) from 2000 to 2015. The spatial distribution shows that the LST in the north and west of Fujian Province is lower than in the south and east. With the rapid development and urbanization of the coastal area of Fujian Province, the LST in coastal urban regions is significantly higher than in mountainous rural regions. The LST distribution may be affected by climate, topography and land cover types. The spatio-temporal distribution characteristics of LST could provide good references for agricultural planning and environmental monitoring in Fujian Province.

  2. Monitoring and simulation of water, heat,and CO2 fluxes in terrestrial ecosystems based on the APEIS-FLUX system

    Institute of Scientific and Technical Information of China (English)

    Watanabe, Masataka; Wang, Qinxue; Hayashi, Seiji; Murakami, Shogo; Liu, Jiyuan; Ouyang, Zhu; Li, Yan; Li, Yingnian; Wang, Kelin

    2005-01-01

    The Integrated Environmental Monitoring (IEM) project, part of the Asia-Pacific Environmental Innovation Strategy (APEIS) project, developed an integrated environmental monitoring system that can be used to detect, monitor, and assess environmental disasters, degradation, and their impacts in the Asia-Pacific region. The system primarily employs data from the Moderate Resolution Imaging Spectroradiometer (MODIS) sensor on the Earth Observing System (EOS) Terra/Aqua satellites, as well as ground observations at five sites in different ecological systems in China. From the preliminary analysis of both annual and daily variations of water, heat and CO2 fluxes, we can confirm that this system has basically been working well. The results show that both the latent heat flux and the CO2 flux are much greater in the crop field than in the grassland and the saline desert, whereas the sensible heat flux shows the opposite trend. Different MODIS data products correspond to ground measurements to very different degrees: MODIS-derived land surface temperature correlates closely with measured values, but LAI and NPP differ considerably from ground measurements, which suggests that the algorithms used to process MODIS data need to be revised using local datasets. We are now using the APEIS-FLUX data to develop an integrated model that can simulate regional water, heat, and carbon fluxes. Finally, we expect to use this model to develop more precise high-order MODIS products for the Asia-Pacific region.

  3. Physics of magnetic flux tubes

    CERN Document Server

    Ryutova, Margarita

    2015-01-01

    This book is the first account of the physics of magnetic flux tubes, from their fundamental properties to collective phenomena in ensembles of flux tubes. The physics of magnetic flux tubes is absolutely vital for understanding fundamental physical processes in the solar atmosphere shaped and governed by magnetic fields. High-resolution and high-cadence observations from recent space- and ground-based instruments, taken simultaneously at different heights and temperatures, not only show the ubiquity of filamentary structure formation but also allow one to study how various events are interconnected by systems of magnetic flux tubes. The book covers both theory and observations. Theoretical models, presented in analytical and phenomenological forms, are tailored for practical applications. These are welded with state-of-the-art observations, from early decisive ones to the most recent data that open a new phase-space for exploring the Sun and Sun-like stars. The concept of magnetic flux tubes is central to various magn...

  4. Parallel Framework for Dimensionality Reduction of Large-Scale Datasets

    Directory of Open Access Journals (Sweden)

    Sai Kiranmayee Samudrala

    2015-01-01

    Full Text Available Dimensionality reduction refers to a set of mathematical techniques used to reduce complexity of the original high-dimensional data, while preserving its selected properties. Improvements in simulation strategies and experimental data collection methods are resulting in a deluge of heterogeneous and high-dimensional data, which often makes dimensionality reduction the only viable way to gain qualitative and quantitative understanding of the data. However, existing dimensionality reduction software often does not scale to datasets arising in real-life applications, which may consist of thousands of points with millions of dimensions. In this paper, we propose a parallel framework for dimensionality reduction of large-scale data. We identify key components underlying the spectral dimensionality reduction techniques, and propose their efficient parallel implementation. We show that the resulting framework can be used to process datasets consisting of millions of points when executed on a 16,000-core cluster, which is beyond the reach of currently available methods. To further demonstrate applicability of our framework we perform dimensionality reduction of 75,000 images representing morphology evolution during manufacturing of organic solar cells in order to identify how processing parameters affect morphology evolution.
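
    The heavy kernels in such a framework are easy to see in a serial sketch of a spectral embedding. The following minimal Python example is illustrative only, not the authors' implementation (the function and variable names are assumptions): it computes a classical PCA/MDS-style embedding via eigendecomposition of the Gram matrix, and the two steps it highlights are precisely the ones a parallel framework would distribute.

```python
# Serial core of a spectral embedding; the O(n^2 d) Gram computation and the
# symmetric eigensolve are the kernels a parallel framework would distribute.
import numpy as np

def spectral_embed(X, n_components=2):
    """Embed the rows of X via eigendecomposition of the centered Gram matrix."""
    Xc = X - X.mean(axis=0)               # center the data
    G = Xc @ Xc.T                         # Gram matrix: O(n^2 d), parallelizable
    vals, vecs = np.linalg.eigh(G)        # eigensolve: the other hot spot
    idx = np.argsort(vals)[::-1][:n_components]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

Y = spectral_embed(np.random.rand(500, 1000))
print(Y.shape)  # (500, 2)
```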

  5. Using Benford's law to investigate Natural Hazard dataset homogeneity.

    Science.gov (United States)

    Joannes-Boyau, Renaud; Bodin, Thomas; Scheffers, Anja; Sambridge, Malcolm; May, Simon Matthias

    2015-07-09

    Working with a large temporal dataset spanning several decades often represents a challenging task, especially when the record is heterogeneous and incomplete. The use of statistical laws could potentially overcome these problems. Here we apply Benford's Law (also called the "First-Digit Law") to the traveled distances of tropical cyclones since 1842. The record of tropical cyclones has been extensively impacted by improvements in detection capabilities over the past decades. We have found that, while the first-digit distribution for the entire record follows Benford's Law prediction, specific changes such as satellite detection have had serious impacts on the dataset. The least-square misfit measure is used as a proxy to observe temporal variations, allowing us to assess data quality and homogeneity over the entire record, and at the same time over specific periods. Such information is crucial when running climatic models and Benford's Law could potentially be used to overcome and correct for data heterogeneity and/or to select the most appropriate part of the record for detailed studies.
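
    As a rough illustration of the method, the empirical first-digit frequencies of a record can be compared against Benford's prediction P(d) = log10(1 + 1/d) with a least-squares misfit, computed over the whole record or over sub-periods. A minimal sketch with synthetic data (not the authors' code; the lognormal stand-in for cyclone travel distances is an assumption):

```python
# Benford's-Law misfit for a set of positive values.
import numpy as np

def first_digits(values):
    v = np.abs(np.asarray(values, dtype=float))
    v = v[v > 0]
    return (v / 10 ** np.floor(np.log10(v))).astype(int)   # digits 1..9

def benford_misfit(values):
    d = first_digits(values)
    observed = np.array([(d == k).mean() for k in range(1, 10)])
    expected = np.log10(1 + 1 / np.arange(1, 10))          # Benford P(d)
    return np.sum((observed - expected) ** 2)              # least-squares misfit

distances = np.random.lognormal(mean=6.0, sigma=1.0, size=5000)  # synthetic
print(benford_misfit(distances))   # near zero for Benford-conforming data
```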

  6. Publicly Releasing a Large Simulation Dataset with NDS Labs

    Science.gov (United States)

    Goldbaum, Nathan

    2016-03-01

    Optimally, all publicly funded research should be accompanied by the tools, code, and data necessary to fully reproduce the analysis performed in journal articles describing the research. This ideal can be difficult to attain, particularly when dealing with large (>10 TB) simulation datasets. In this lightning talk, we describe the process of publicly releasing a large simulation dataset to accompany the submission of a journal article. The simulation was performed using Enzo, an open source, community-developed N-body/hydrodynamics code, and was analyzed using a wide range of community-developed tools in the scientific Python ecosystem. Although the simulation was performed and analyzed using an ecosystem of sustainably developed tools, we enable sustainable science using our data by making it publicly available. Combining the data release with the NDS Labs infrastructure allows a substantial amount of added value, including web-based access to analysis and visualization using the yt analysis package through an IPython notebook interface. In addition, we are able to accompany the paper submission to the arXiv preprint server with links to the raw simulation data as well as interactive real-time data visualizations that readers can explore on their own or share with colleagues during journal club discussions. It is our hope that the value added by these services will substantially increase the impact and readership of the paper.

  7. Predicting weather regime transitions in Northern Hemisphere datasets

    Energy Technology Data Exchange (ETDEWEB)

    Kondrashov, D. [University of California, Department of Atmospheric and Oceanic Sciences and Institute of Geophysics and Planetary Physics, Los Angeles, CA (United States); Shen, J. [UCLA, Department of Statistics, Los Angeles, CA (United States); Berk, R. [UCLA, Department of Statistics, Los Angeles, CA (United States); University of Pennsylvania, Department of Criminology, Philadelphia, PA (United States); D' Andrea, F.; Ghil, M. [Ecole Normale Superieure, Departement Terre-Atmosphere-Ocean and Laboratoire de Meteorologie Dynamique (CNRS and IPSL), Paris Cedex 05 (France)

    2007-10-15

    A statistical learning method called random forests is applied to the prediction of transitions between weather regimes of wintertime Northern Hemisphere (NH) atmospheric low-frequency variability. A dataset composed of 55 winters of NH 700-mb geopotential height anomalies is used in the present study. A mixture model finds that the three Gaussian components that were statistically significant in earlier work are robust; they are the Pacific-North American (PNA) regime, its approximate reverse (the reverse PNA, or RNA), and the blocked phase of the North Atlantic Oscillation (BNAO). The most significant and robust transitions in the Markov chain generated by these regimes are PNA → BNAO, PNA → RNA and BNAO → PNA. The break of a regime and subsequent onset of another one is forecast for these three transitions. Taking the relative costs of false positives and false negatives into account, the random-forests method shows useful forecasting skill. The calculations are carried out in the phase space spanned by a few leading empirical orthogonal functions of dataset variability. Plots of estimated response functions to a given predictor confirm the crucial influence of the exit angle on a preferred transition path. This result points to the dynamic origin of the transitions. (orig.)
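
    A schematic of this forecasting setup might look as follows; the features and labels here are hypothetical stand-ins (the real predictors are the leading EOF coordinates of the height anomalies), and the class weights mimic the unequal costs of false positives and false negatives mentioned above.

```python
# Hedged sketch of regime-transition forecasting with a random forest;
# synthetic stand-ins for the EOF features and the transition label.
from sklearn.ensemble import RandomForestClassifier
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))                 # stand-in for EOF amplitudes
y = (X[:, 0] + 0.5 * X[:, 1] > 1).astype(int)  # stand-in transition label

clf = RandomForestClassifier(
    n_estimators=500,
    class_weight={0: 1.0, 1: 5.0},   # penalize missed transitions more heavily
    oob_score=True,                  # out-of-bag estimate of skill
    random_state=0,
).fit(X, y)
print(clf.oob_score_, clf.feature_importances_)
```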

  8. Datasets for radiation network algorithm development and testing

    Energy Technology Data Exchange (ETDEWEB)

    Rao, Nageswara S [ORNL; Sen, Satyabrata [ORNL; Berry, M. L. [New Jersey Institute of Technology; Wu, Qishi [University of Memphis; Grieme, M. [New Jersey Institute of Technology; Brooks, Richard R [ORNL; Cordone, G. [Clemson University

    2016-01-01

    The Domestic Nuclear Detection Office's (DNDO) Intelligence Radiation Sensors Systems (IRSS) program supported the development of networks of commercial-off-the-shelf (COTS) radiation counters for detecting, localizing, and identifying low-level radiation sources. Under this program, a series of indoor and outdoor tests was conducted with multiple source strengths and types, different background profiles, and various types of source and detector movements. Following the tests, network algorithms were replayed in various reconstructed scenarios using sub-networks. These measurements and algorithm traces together provide a rich collection of highly valuable datasets for testing current and next-generation radiation network algorithms, including the ones (to be) developed by broader R&D communities such as distributed detection, information fusion, and sensor networks. From this multi-terabyte IRSS database, we distilled and packaged the first batch of canonical datasets for public release. They include measurements from ten indoor and two outdoor tests which represent increasingly challenging baseline scenarios for robustly testing radiation network algorithms.

  9. Influence of reanalysis datasets on dynamically downscaling the recent past

    Science.gov (United States)

    Moalafhi, Ditiro B.; Evans, Jason P.; Sharma, Ashish

    2017-08-01

    Multiple reanalysis datasets currently exist that can provide boundary conditions for dynamic downscaling and simulating local hydro-climatic processes at finer spatial and temporal resolutions. Previous work has suggested two reanalyses that provide the best lateral boundary conditions for downscaling over southern Africa. This study dynamically downscales these reanalyses (ERA-I and MERRA) over southern Africa to a high-resolution (10 km) grid using the WRF model. Simulations cover the period 1981-2010. Multiple observation datasets were used for both surface temperature and precipitation to account for observational uncertainty when assessing results. Generally, temperature is simulated quite well, except over the Namibian coastal plain, where the simulations show anomalously warm temperatures related to the failure to propagate the influence of the cold Benguela current inland. Precipitation tends to be overestimated in high-altitude areas and most of southern Mozambique. This could be attributed to challenges in handling complex topography and capturing large-scale circulation patterns. While MERRA-driven WRF exhibits slightly less bias in temperature, especially for La Niña years, ERA-I-driven simulations are on average superior in terms of RMSE. When considering multiple variables and metrics, ERA-I is found to produce the best simulation of the climate over the domain. The influence of the regional model appears to be large enough to overcome the small difference in relative errors present in the lateral boundary conditions derived from these two reanalyses.

  10. Benchmarking undedicated cloud computing providers for analysis of genomic datasets.

    Directory of Open Access Journals (Sweden)

    Seyhan Yazar

    Full Text Available A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR) on Amazon EC2 instances and Google Compute Engine (GCE), using publicly available genomic datasets (E.coli CC102 strain and a Han Chinese male genome) and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5-78.2) for E.coli and 53.5% (95% CI: 34.4-72.6) for the human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with the costs on EMR being 257.3% (95% CI: 211.5-303.1) and 173.9% (95% CI: 134.6-213.1) more expensive for the E.coli and human assemblies, respectively. Thus, GCE was found to outperform EMR both in terms of cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we present available ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE.

  11. Testing the Neutral Theory of Biodiversity with Human Microbiome Datasets.

    Science.gov (United States)

    Li, Lianwei; Ma, Zhanshan Sam

    2016-08-16

    The Human Microbiome Project (HMP) has made it possible to test important ecological theories for arguably the most important ecosystem for human health: the human microbiome. The limited number of existing studies have reported conflicting evidence in the case of the neutral theory; the present study aims to comprehensively test the neutral theory with extensive HMP datasets covering all five major body sites inhabited by the human microbiome. Utilizing 7437 datasets of bacterial community samples, we discovered that only 49 communities (less than 1%) satisfied the neutral theory, and concluded that human microbial communities are not neutral in general. The 49 positive cases, although only a tiny minority, do demonstrate the existence of neutral processes. We realize that the traditional doctrine of microbial biogeography, "Everything is everywhere, but the environment selects," first proposed by Baas-Becking, resolves the apparent contradiction. The first part of the Baas-Becking doctrine states that microbes are not dispersal-limited and are therefore prone to neutrality, and the second part reiterates that the freely dispersed microbes must endure selection by the environment. Therefore, in most cases, it is the host environment that ultimately shapes the community assembly and tips the human microbiome toward the niche regime.

  12. BLAST-EXPLORER helps you building datasets for phylogenetic analysis

    Directory of Open Access Journals (Sweden)

    Claverie Jean-Michel

    2010-01-01

    Full Text Available Abstract Background The right sampling of homologous sequences for phylogenetic or molecular evolution analyses is a crucial step, the quality of which can have a significant impact on the final interpretation of the study. There is no single way to construct datasets suitable for phylogenetic analysis, because this task intimately depends on the scientific question we want to address. Moreover, database mining software such as BLAST, which is routinely used for searching homologous sequences, is not specifically optimized for this task. Results To fill this gap, we designed BLAST-Explorer, an original and friendly web-based application that combines a BLAST search with a suite of tools that allows interactive, phylogenetic-oriented exploration of the BLAST results and flexible selection of homologous sequences among the BLAST hits. Once the selection of the BLAST hits is done using BLAST-Explorer, the corresponding sequences can be imported locally for external analysis or passed to the phylogenetic tree reconstruction pipelines available on the Phylogeny.fr platform. Conclusions BLAST-Explorer provides a simple, intuitive and interactive graphical representation of the BLAST results and allows selection and retrieval of the BLAST hit sequences based on a wide range of criteria. Although BLAST-Explorer primarily aims at helping the construction of sequence datasets for further phylogenetic study, it can also be used as a standard BLAST server with enriched output. BLAST-Explorer is available at http://www.phylogeny.fr

  13. Digital Astronaut Photography: A Discovery Dataset for Archaeology

    Science.gov (United States)

    Stefanov, William L.

    2010-01-01

    Astronaut photography acquired from the International Space Station (ISS) using commercial off-the-shelf cameras offers a freely-accessible source for high to very high resolution (4-20 m/pixel) visible-wavelength digital data of Earth. Since ISS Expedition 1 in 2000, over 373,000 images of the Earth-Moon system (including land surface, ocean, atmospheric, and lunar images) have been added to the Gateway to Astronaut Photography of Earth online database (http://eol.jsc.nasa.gov ). Handheld astronaut photographs vary in look angle, time of acquisition, solar illumination, and spatial resolution. These attributes of digital astronaut photography result from a unique combination of ISS orbital dynamics, mission operations, camera systems, and the individual skills of the astronaut. The variable nature of astronaut photography makes the dataset uniquely useful for archaeological applications in comparison with more traditional nadir-viewing multispectral datasets acquired from unmanned orbital platforms. For example, surface features such as trenches, walls, ruins, urban patterns, and vegetation clearing and regrowth patterns may be accentuated by low sun angles and oblique viewing conditions (Fig. 1). High spatial resolution digital astronaut photographs can also be used with sophisticated land cover classification and spatial analysis approaches like Object Based Image Analysis, increasing the potential for use in archaeological characterization of landscapes and specific sites.

  14. Multiresolution comparison of precipitation datasets for large-scale models

    Science.gov (United States)

    Chun, K. P.; Sapriza Azuri, G.; Davison, B.; DeBeer, C. M.; Wheater, H. S.

    2014-12-01

    Gridded precipitation datasets are crucial for driving large-scale models used in weather forecasting and climate research. However, the quality of precipitation products is usually validated individually. Comparisons between gridded precipitation products along with ground observations provide another avenue for investigating how precipitation uncertainty would affect the performance of large-scale models. In this study, using data from a set of precipitation gauges over British Columbia and Alberta, we evaluate several widely used North American gridded products, including the Canadian Gridded Precipitation Anomalies (CANGRD), the National Center for Environmental Prediction (NCEP) reanalysis, the Water and Global Change (WATCH) project, the thin-plate spline smoothing algorithms (ANUSPLIN) and the Canadian Precipitation Analysis (CaPA). Based on verification criteria for various temporal and spatial scales, the results provide an assessment of possible applications for the various precipitation datasets. For long-term climate variation studies (~100 years), CANGRD, NCEP, WATCH and ANUSPLIN have different comparative advantages in terms of their resolution and accuracy. For synoptic and mesoscale precipitation patterns, CaPA provides appealing spatial coherence. In addition to the product comparison, various downscaling methods are also surveyed to explore new verification and bias-reduction methods for improving gridded precipitation outputs for large-scale models.

  15. Standardized dataset health services: Part 2--top to bottom.

    Science.gov (United States)

    Galemore, Cynthia A; Maughan, Erin D

    2014-07-01

    It is critical for school nurses to promote and educate others on what they do. Data can help shape the message into understandable language across education and health. While Part 1 of this article discusses NASN's progress on identifying a standardized dataset for school health services, Part 2 focuses on the analysis and sharing of data at the local level. Examples of how to use the data to improve practice and create change are included. Guidance is provided in creating and sharing data as part of an annual report as a final step in advocating for school health services commensurate with student health needs. As the work on an evidence-based uniform dataset continues at the national level, what should be the response at the local level? Do we wait, or do we continue to collect certain data? The purpose of Part 2 of this article is to describe how data being collected locally illustrate health trends, benchmarking, and school nursing outcomes and can be compiled and shared in an annual report.

  16. Comprehensive comparison of large-scale tissue expression datasets

    Directory of Open Access Journals (Sweden)

    Alberto Santos

    2015-06-01

    Full Text Available For tissues to carry out their functions, they rely on the right proteins to be present. Several high-throughput technologies have been used to map out which proteins are expressed in which tissues; however, the data have not previously been systematically compared and integrated. We present a comprehensive evaluation of tissue expression data from a variety of experimental techniques and show that these agree surprisingly well with each other and with results from literature curation and text mining. We further found that most datasets support the assumed but not demonstrated distinction between tissue-specific and ubiquitous expression. By developing comparable confidence scores for all types of evidence, we show that it is possible to improve both quality and coverage by combining the datasets. To facilitate use and visualization of our work, we have developed the TISSUES resource (http://tissues.jensenlab.org, which makes all the scored and integrated data available through a single user-friendly web interface.

  17. Unsupervised verification of laser-induced breakdown spectroscopy dataset clustering

    Science.gov (United States)

    Wójcik, Michał R.; Zdunek, Rafał; Antończak, Arkadiusz J.

    2016-12-01

    Laser-induced breakdown spectroscopy is a versatile optical technique used in a wide range of qualitative and quantitative analyses conducted with various chemometric techniques. The aim of this research is to demonstrate the possibility of unsupervised clustering of an unknown dataset using the K-means clustering algorithm, and of verifying its input parameters by investigating the generalized eigenvalues derived with linear discriminant analysis. In all cases, principal component analysis was applied to reduce data dimensionality and shorten the computation time of the whole operation. The experiment was conducted on a dataset collected from twenty-four different materials divided into six groups: metals, semiconductors, ceramics, rocks, metal alloys and others, with the use of a three-channel spectrometer (298.02-628.73 nm overall spectral range) and a UV (248 nm) excimer laser. Additionally, two more complex groups, containing all specimens and all specimens excluding rocks, were created. The resulting spaces of eigenvalues were calculated for every group and for three different distances in the multidimensional space (cosine, squared Euclidean and L1). As expected, the correct numbers of specimens within groups were obtained with small deviations, and the validity of the unsupervised method has thus been proven.
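
    A simplified version of this verification loop, with assumed array names and parameter choices, can be sketched with scikit-learn: reduce the spectra with PCA, run K-means for candidate cluster counts, then fit an LDA on the resulting labels and inspect its discriminant (eigenvalue) spectrum. This is an illustrative stand-in, not the authors' pipeline.

```python
# PCA -> K-means -> LDA eigenvalue inspection, on synthetic stand-in spectra.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

spectra = np.random.rand(240, 3000)        # stand-in for 240 LIBS spectra
X = PCA(n_components=20).fit_transform(spectra)

for k in (4, 6, 8):                        # candidate cluster counts
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    lda = LinearDiscriminantAnalysis().fit(X, labels)
    # Sharp drops in the discriminant spectrum hint at the true group count.
    print(k, np.round(lda.explained_variance_ratio_, 3))
```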

  18. The Path from Large Earth Science Datasets to Information

    Science.gov (United States)

    Vicente, G. A.

    2013-12-01

    The NASA Goddard Earth Sciences (GES) Data and Information Services Center (DISC) is one of the Science Mission Directorate's (SMD) major centers for archiving and distribution of Earth Science remote sensing data, products and services. This virtual portal provides convenient access to Atmospheric Composition and Dynamics, Hydrology, Precipitation, Ozone, and model-derived datasets (generated by GSFC's Global Modeling and Assimilation Office), as well as the North American Land Data Assimilation System (NLDAS) and the Global Land Data Assimilation System (GLDAS) data products (both generated by GSFC's Hydrological Sciences Branch). This presentation demonstrates various tools and computational technologies developed at the GES DISC to manage the huge volume of data and products acquired from various missions and programs over the years. It explores approaches to archive, document, distribute, access and analyze Earth Science data and information, and addresses the technical and scientific issues, governance, and user-support problems faced by scientists in need of multi-disciplinary datasets. It also discusses data and product metrics, user distribution profiles and lessons learned through interactions with the science communities around the world. Finally it demonstrates some of the most-used data and product visualization and analysis tools developed and maintained by the GES DISC.

  19. Investigating uncertainties in global gridded datasets of climate extremes

    Directory of Open Access Journals (Sweden)

    R. J. H. Dunn

    2014-05-01

    Full Text Available We assess the effects of different methodological choices made during the construction of gridded datasets of climate extremes, focusing primarily on HadEX2. Using global time series of the indices and their coverage, as well as uncertainty maps, we show that the choices which have the greatest effect are those relating to the station network used or those which drastically change the values for individual grid boxes. The latter are most affected by the number of stations required in or around a grid box and the gridding method used. Most parametric changes have a small impact, on global and on grid-box scales, whereas structural changes to the methods or input station networks may have large effects. On grid-box scales, trends in temperature indices are very robust to most choices, especially in areas which have high station density (e.g. North America, Europe and Asia). Precipitation trends, being less spatially coherent, can be more susceptible to methodological changes, but are still clear in regions of high station density. Regional trends from all indices derived from areas with few stations should be treated with care. On a global scale, the linear trends over 1951-2010 from almost all choices fall within the statistical range of trends from HadEX2. This demonstrates the robustness of HadEX2 and related datasets to choices in the creation method.

  20. Parton distributions based on a maximally consistent dataset

    CERN Document Server

    Rojo, Juan

    2014-01-01

    The choice of data that enters a global QCD analysis can have a substantial impact on the resulting parton distributions and their predictions for collider observables. One of the main reasons for this has to do with the possible presence of inconsistencies, either internal within an experiment or external between different experiments. In order to assess the robustness of the global fit, different definitions of a conservative PDF set, that is, a PDF set based on a maximally consistent dataset, have been introduced. However, these approaches are typically affected by theory biases in the selection of the dataset. In this contribution, after a brief overview of recent NNPDF developments, we propose a new, fully objective, definition of a conservative PDF set, based on the Bayesian reweighting approach. Using the new NNPDF3.0 framework, we produce various conservative sets, which turn out to be mutually in agreement within the respective PDF uncertainties, as well as with the global fit. We explore some of the...

  1. Reliability of brain volume measurements: a test-retest dataset.

    Science.gov (United States)

    Maclaren, Julian; Han, Zhaoying; Vos, Sjoerd B; Fischbein, Nancy; Bammer, Roland

    2014-01-01

    Evaluation of neurodegenerative disease progression may be assisted by quantification of the volume of structures in the human brain using magnetic resonance imaging (MRI). Automated segmentation software has improved the feasibility of this approach, but often the reliability of measurements is uncertain. We have established a unique dataset to assess the repeatability of brain segmentation and analysis methods. We acquired 120 T1-weighted volumes from 3 subjects (40 volumes/subject) in 20 sessions spanning 31 days, using the protocol recommended by the Alzheimer's Disease Neuroimaging Initiative (ADNI). Each subject was scanned twice within each session, with repositioning between the two scans, allowing determination of test-retest reliability both within a single session (intra-session) and from day to day (inter-session). To demonstrate the application of the dataset, all 3D volumes were processed using FreeSurfer v5.1. The coefficient of variation of volumetric measurements was between 1.6% (caudate) and 6.1% (thalamus). Inter-session variability exceeded intra-session variability for lateral ventricle volume (P<0.0001), indicating that ventricle volume in the subjects varied between days.
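
    The repeatability statistics reported above reduce to a simple computation once volumes are tabulated per session. A minimal sketch with synthetic numbers (the 20-session, two-scans-per-session layout follows the design described; the values themselves are made up):

```python
# Intra- vs inter-session coefficient of variation for repeated volumes.
import numpy as np

def cov_percent(v):
    return 100.0 * np.std(v, ddof=1) / np.mean(v)

vols = np.random.normal(1500, 30, size=(20, 2))   # 20 sessions, 2 scans each
intra = np.mean([cov_percent(s) for s in vols])   # within-session scatter
inter = cov_percent(vols.mean(axis=1))            # day-to-day scatter
print(f"intra-session CV {intra:.2f}%, inter-session CV {inter:.2f}%")
```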

  2. Benchmarking undedicated cloud computing providers for analysis of genomic datasets.

    Science.gov (United States)

    Yazar, Seyhan; Gooden, George E C; Mackey, David A; Hewitt, Alex W

    2014-01-01

    A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR) on Amazon EC2 instances and Google Compute Engine (GCE), using publicly available genomic datasets (E.coli CC102 strain and a Han Chinese male genome) and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5-78.2) for E.coli and 53.5% (95% CI: 34.4-72.6) for human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with the costs on EMR being 257.3% (95% CI: 211.5-303.1) and 173.9% (95% CI: 134.6-213.1) more expensive for E.coli and human assemblies respectively. Thus, GCE was found to outperform EMR both in terms of cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we present available ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE.

  3. FTSPlot: fast time series visualization for large datasets.

    Science.gov (United States)

    Riss, Michael

    2014-01-01

    The analysis of electrophysiological recordings often involves visual inspection of time series data to locate specific experiment epochs, mask artifacts, and verify the results of signal processing steps, such as filtering or spike detection. Long-term experiments with continuous data acquisition generate large amounts of data. Rapid browsing through these massive datasets poses a challenge to conventional data plotting software because the plotting time increases proportionately with the volume of data. This paper presents FTSPlot, a visualization concept for large-scale time series datasets using techniques from the field of high-performance computer graphics, such as hierarchic level of detail and out-of-core data handling. In a preprocessing step, time series data, event, and interval annotations are converted into an optimized data format, which then permits fast, interactive visualization. The preprocessing step has a computational complexity of O(n log N); the visualization itself can be done with a complexity of O(1) and is therefore independent of the amount of data. A demonstration prototype has been implemented, and benchmarks show that the technology is capable of displaying large amounts of time series data, event, and interval annotations lag-free, making it a suitable visualization method for long-term electrophysiological experiments.

  4. Comparison of LDA and SPRT on Clinical Dataset Classifications.

    Science.gov (United States)

    Lee, Chih; Nkounkou, Brittany; Huang, Chun-Hsi

    2011-04-19

    In this work, we investigate the well-known classification algorithm LDA as well as its close relative SPRT. SPRT affords many theoretical advantages over LDA. It allows specification of desired classification error rates α and β and is expected to be faster in predicting the class label of a new instance. However, SPRT is not as widely used as LDA in the pattern recognition and machine learning community. For this reason, we investigate LDA, SPRT and a modified SPRT (MSPRT) empirically using clinical datasets from Parkinson's disease, colon cancer, and breast cancer. We assume the same normality assumption as LDA and propose variants of the two SPRT algorithms based on the order in which the components of an instance are sampled. Leave-one-out cross-validation is used to assess and compare the performance of the methods. The results indicate that two variants, SPRT-ordered and MSPRT-ordered, are superior to LDA in terms of prediction accuracy. Moreover, on average SPRT-ordered and MSPRT-ordered examine less components than LDA before arriving at a decision. These advantages imply that SPRT-ordered and MSPRT-ordered are the preferred algorithms over LDA when the normality assumption can be justified for a dataset.
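
    For concreteness, a minimal two-class SPRT under the same normality assumption can be written as follows: the thresholds A and B come from the desired error rates α and β (Wald's bounds), and components are examined in a fixed order, as in the SPRT-ordered variant described above. All parameter values are illustrative, and this is a generic sketch rather than the authors' MSPRT.

```python
# Two-class SPRT with Gaussian class-conditional densities, components
# examined one at a time until the log-likelihood ratio crosses a threshold.
import numpy as np
from scipy.stats import norm

def sprt_classify(x, mu0, mu1, sigma, alpha=0.05, beta=0.05):
    A = np.log((1 - beta) / alpha)        # Wald's upper threshold
    B = np.log(beta / (1 - alpha))        # Wald's lower threshold
    llr = 0.0
    for i, xi in enumerate(x):            # components sampled in order
        llr += norm.logpdf(xi, mu1[i], sigma[i]) - norm.logpdf(xi, mu0[i], sigma[i])
        if llr >= A:
            return 1, i + 1               # class 1, components examined
        if llr <= B:
            return 0, i + 1
    return int(llr > 0), len(x)           # fall back to the sign of the LLR

label, used = sprt_classify(np.r_[1.2, 0.8, 1.1],
                            mu0=[0, 0, 0], mu1=[1, 1, 1], sigma=[1, 1, 1])
print(label, used)   # decision may be reached before all components are seen
```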

  5. Efficient Reachability Query Evaluation in Large Spatiotemporal Contact Datasets

    CERN Document Server

    Shirani-Mehr, Houtan; Shahabi, Cyrus

    2012-01-01

    With the advent of reliable positioning technologies and prevalence of location-based services, it is now feasible to accurately study the propagation of items such as infectious viruses, sensitive information pieces, and malwares through a population of moving objects, e.g., individuals, mobile devices, and vehicles. In such application scenarios, an item passes between two objects when the objects are sufficiently close (i.e., when they are, so-called, in contact), and hence once an item is initiated, it can penetrate the object population through the evolving network of contacts among objects, termed contact network. In this paper, for the first time we define and study reachability queries in large (i.e., disk-resident) contact datasets which record the movement of a (potentially large) set of objects moving in a spatial environment over an extended time period. A reachability query verifies whether two objects are "reachable" through the evolving contact network represented by such contact datasets. We p...

  6. Utilizing the Antarctic Master Directory to find orphan datasets

    Science.gov (United States)

    Bonczkowski, J.; Carbotte, S. M.; Arko, R. A.; Grebas, S. K.

    2011-12-01

    While most Antarctic data are housed at an established disciplinary-specific data repository, there are data types for which no suitable repository exists. In some cases, these "orphan" data, without an appropriate national archive, are served from local servers by the principal investigators who produced the data. There are many pitfalls with data served privately, including the frequent lack of adequate documentation to ensure the data can be understood by others for re-use and the impermanence of personal web sites. For example, if an investigator leaves an institution and the data moves, the link published is no longer accessible. To ensure continued availability of data, submission to long-term national data repositories is needed. As stated in the National Science Foundation Office of Polar Programs (NSF/OPP) Guidelines and Award Conditions for Scientific Data, investigators are obligated to submit their data for curation and long-term preservation; this includes the registration of a dataset description into the Antarctic Master Directory (AMD), http://gcmd.nasa.gov/Data/portals/amd/. The AMD is a Web-based, searchable directory of thousands of dataset descriptions, known as DIF records, submitted by scientists from over 20 countries. It serves as a node of the International Directory Network/Global Change Master Directory (IDN/GCMD). The US Antarctic Program Data Coordination Center (USAP-DCC), http://www.usap-data.org/, funded through NSF/OPP, was established in 2007 to help streamline the process of data submission and DIF record creation. When data does not quite fit within any existing disciplinary repository, it can be registered within the USAP-DCC as the fallback data repository. Within the scope of the USAP-DCC we undertook the challenge of discovering and "rescuing" orphan datasets currently registered within the AMD. In order to find which DIF records led to data served privately, all records relating to US data within the AMD were parsed. After

  7. Disequilibrium of 13CO2 fluxes between photosynthesis and respiration in North American temperate forest biomes

    Science.gov (United States)

    Lai, C.; Ehleringer, J.; Schauer, A.; Tans, P.; Hollinger, D.; Paw U, K.; Wofsy, S.

    2003-12-01

    We report the first weekly dataset of seasonal and interannual variability in δ13C of CO2 fluxes from dominant forest ecosystems in the US. We observed large variations in the δ13C of respired biosphere-atmosphere fluxes (δ13CR) across three temperate coniferous and deciduous forest ecosystems (-24.9 ± 0.4 to -31.3 ± 0.6 per mil). Values of δ13CR were significantly correlated with growing-season soil water availability. By analyzing daytime flask measurements collected at the top of canopies, we estimated an annual mean, flux-weighted δ13C of net ecosystem CO2 exchange fluxes (δ13Cnet). Combining δ13CR and δ13Cnet, along with eddy-covariance measured fluxes, we estimated regional discrimination against 13C during photosynthesis (ΔA) for these three forest ecosystems. Our approach allows for examination of the interannual correlations between gross primary production fluxes and ΔA that could potentially modulate the atmospheric 13C budget. The results showed that C3 forests in temperate regions of the U.S. exhibited a slight isotopic disequilibrium (~3 per mil). Such subtle isotopic disequilibrium, however, when associated with enormous one-way gross fluxes, can effectively affect the atmospheric 13C budget.

  8. A Reconnecting Flux Rope Dynamo

    CERN Document Server

    Baggaley, Andrew W; Shukurov, Anvar; Subramanian, Kandaswamy

    2009-01-01

    We develop a new model of the fluctuation dynamo in which the magnetic field is confined in thin flux ropes advected by a multi-scale flow modeling turbulence. Magnetic dissipation occurs only via reconnection of the flux ropes. We investigate the kinetic energy release into heat, mediated by the dynamo action, both in our model and by solving the induction equation with the same flow. We find that a flux rope dynamo is an order of magnitude more efficient at converting mechanical energy into heat. The probability density of the magnetic energy release in reconnections has a power-law form with the slope -3, consistent with the solar corona heating by nanoflares.

  9. Reconnecting flux-rope dynamo

    Science.gov (United States)

    Baggaley, Andrew W.; Barenghi, Carlo F.; Shukurov, Anvar; Subramanian, Kandaswamy

    2009-11-01

    We develop a model of the fluctuation dynamo in which the magnetic field is confined to thin flux ropes advected by a multiscale model of turbulence. Magnetic dissipation occurs only via reconnection of the flux ropes. This model can be viewed as an implementation of the asymptotic limit Rm→∞ for a continuous magnetic field, where magnetic dissipation is strongly localized to small regions of strong-field gradients. We investigate the kinetic-energy release into heat mediated by the dynamo action, both in our model and by solving the induction equation with the same flow. We find that a flux-rope dynamo is an order of magnitude more efficient at converting mechanical energy into heat. The probability density of the magnetic energy release in reconnections has a power-law form with the slope -3, consistent with the solar corona heating by nanoflares.

  10. Reconnecting flux-rope dynamo.

    Science.gov (United States)

    Baggaley, Andrew W; Barenghi, Carlo F; Shukurov, Anvar; Subramanian, Kandaswamy

    2009-11-01

    We develop a model of the fluctuation dynamo in which the magnetic field is confined to thin flux ropes advected by a multiscale model of turbulence. Magnetic dissipation occurs only via reconnection of the flux ropes. This model can be viewed as an implementation of the asymptotic limit Rm → ∞ for a continuous magnetic field, where magnetic dissipation is strongly localized to small regions of strong-field gradients. We investigate the kinetic-energy release into heat mediated by the dynamo action, both in our model and by solving the induction equation with the same flow. We find that a flux-rope dynamo is an order of magnitude more efficient at converting mechanical energy into heat. The probability density of the magnetic energy release in reconnections has a power-law form with the slope -3, consistent with the solar corona heating by nanoflares.

  11. BVOC fluxes above mountain grassland

    Directory of Open Access Journals (Sweden)

    I. Bamberger

    2010-05-01

    Full Text Available Grasslands, ranging from natural tropical savannah through managed temperate fields to tundra, cover one quarter of the Earth's land surface. Plant growth, maintenance, and decay result in emissions of volatile organic compounds (VOCs) to the atmosphere. Furthermore, biogenic VOCs (BVOCs) are emitted as a consequence of various environmental stresses, including cutting and drying during harvesting. Fluxes of BVOCs were measured with a proton-transfer-reaction mass spectrometer (PTR-MS) over temperate mountain grassland in Stubai Valley (Tyrol, Austria) over one growing season (2008). VOC fluxes were calculated from the disjunct PTR-MS data using the virtual disjunct eddy covariance method and the gap-filling method. Methanol fluxes obtained with the two independent flux calculation methods were highly correlated (y = 0.95x − 0.12, R2 = 0.92). Methanol showed strong daytime emissions throughout the growing season; with maximal values of 9.7 nmol m−2 s−1, methanol fluxes from the growing grassland were considerably higher at the beginning of the growing season in June than those measured during October (2.5 nmol m−2 s−1). Methanol was the only compound that exhibited consistent fluxes during the entire growing period of the grass. The cutting and drying of the grass increased methanol emissions to up to 78.4 nmol m−2 s−1. In addition, emissions of acetaldehyde (up to 11.0 nmol m−2 s−1) and hexenal (leaf aldehyde, up to 8.6 nmol m−2 s−1) were detected during and after harvesting.

  12. Where is the Open Flux?

    Science.gov (United States)

    Linker, Jon A.; Downs, Cooper; Caplan, Ronald M.; Lionello, Roberto; Mikic, Zoran; Riley, Pete; Henney, Carl John; Arge, Charles; Owens, Matthew

    2017-08-01

    The Sun's magnetic field has been observed in the photosphere from ground- and space-based observatories for many years. Global maps of the solar magnetic field based on full-disk magnetograms (either built up over a solar rotation or evolved using flux transport models) are commonly used as boundary conditions for coronal and solar wind models. Maps from different observatories typically agree qualitatively but often disagree quantitatively. Estimation of the coronal/solar wind physics can range from potential field source surface (PFSS) models with empirical prescriptions to magnetohydrodynamic (MHD) models with realistic energy transport and sub-grid-scale descriptions of heating and acceleration. Two primary observational constraints on the models are that (1) the open-field regions in the model should approximately correspond to coronal holes observed in emission, and (2) the magnitude of the open magnetic flux in the model should match that inferred from in situ spacecraft measurements. We have investigated the July 2010 time period, using PFSS and MHD models computed using several available magnetic maps, coronal hole boundaries detected from STEREO and SDO EUV observations, and estimates of the interplanetary magnetic flux from in situ ACE measurements. We show that, for all the model/map combinations, models that satisfy (1) underestimate the interplanetary magnetic flux, or, conversely, for models to match (2), the modeled open-field regions must be larger than observed coronal holes. Alternatively, we estimate the open magnetic flux entirely from solar observations by combining detected coronal hole boundaries with observatory synoptic magnetic maps, and show that this method also underestimates the interplanetary magnetic flux. We discuss possible resolutions. Research supported by NASA, AFOSR, and NSF.

  13. Flux attenuation at NREL's High-Flux Solar Furnace

    Science.gov (United States)

    Bingham, Carl E.; Scholl, Kent L.; Lewandowski, Allan A.

    1994-10-01

    The High-Flux Solar Furnace (HFSF) at the National Renewable Energy Laboratory (NREL) has a faceted primary concentrator and a long focal-length-to-diameter ratio (due to its off-axis design). Each primary facet can be aimed individually to produce different flux distributions at the target plane. Two different types of attenuators are used, depending on the flux distribution. A sliding-plate attenuator is used primarily when the facets are aimed at the same target point. The alternate attenuator resembles a venetian blind. Both attenuators are located between the concentrator and the focal point. The venetian-blind attenuator is primarily used to control the levels of sunlight falling on a target when the primary concentrators are not focused to a single point. This paper demonstrates the problem of using the sliding-plate attenuator with a faceted concentrator when the facets are not aimed at the same target point. We show that although the alternate attenuator necessarily blocks a certain amount of incoming sunlight, even when fully open, it provides a more even attenuation of the flux for alternate aiming strategies.

  14. Estimating Annual CO2 Flux for Lutjewad Station Using Three Different Gap-Filling Techniques

    Directory of Open Access Journals (Sweden)

    Carmelia M. Dragomir

    2012-01-01

    Full Text Available Long-term measurements of CO2 flux can be obtained using the eddy covariance technique, but these datasets are affected by gaps which hinder the estimation of robust long-term means and annual ecosystem exchanges. We compare results obtained using three gap-filling techniques: multiple regression (MR), multiple imputation (MI), and artificial neural networks (ANNs), applied to a one-year dataset of hourly CO2 flux measurements collected in Lutjewad, over a flat agricultural area near the Wadden Sea dike in the north of the Netherlands. The dataset was separated into two subsets: a learning set and a validation set. The performances of the gap-filling techniques were analysed by calculating statistical criteria: coefficient of determination (R2), root mean square error (RMSE), mean absolute error (MAE), maximum absolute error (MaxAE), and mean square bias (MSB). The gap-fill accuracy is seasonally dependent, with better results in cold seasons. The highest accuracy is obtained using the ANN technique, which is also less sensitive to environmental/seasonal conditions. We argue that filling gaps directly on measured CO2 fluxes is more advantageous than the common method of filling gaps on the calculated net ecosystem exchange, because ANN is an empirical method and smaller scatter is expected when gap-filling is applied directly to measurements.
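
    As a sketch of this evaluation protocol (not the authors' code; the predictor set, network size, and synthetic flux series are all assumptions), one can fill artificial gaps with a small neural network trained on the learning subset and score the validation subset with criteria like those listed above:

```python
# Train an ANN gap-filler on a learning subset of hourly fluxes and score
# the held-out "gaps" with R2, RMSE and MAE.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

rng = np.random.default_rng(1)
drivers = rng.normal(size=(8760, 4))        # e.g. radiation, T, wind, season
flux = drivers @ np.array([2.0, -1.0, 0.5, 0.3]) + rng.normal(0, 0.5, 8760)

train = rng.random(8760) < 0.7              # learning set; the rest are "gaps"
ann = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                   random_state=0).fit(drivers[train], flux[train])
pred = ann.predict(drivers[~train])
print("R2  ", r2_score(flux[~train], pred))
print("RMSE", np.sqrt(mean_squared_error(flux[~train], pred)))
print("MAE ", mean_absolute_error(flux[~train], pred))
```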

  15. Charm production in flux tubes

    CERN Document Server

    Aguiar, C E; Nazareth, R A M S; Pech, G

    1996-01-01

    We argue that the non-perturbative Schwinger mechanism may play an important role in the hadronic production of charm. We present a flux tube model which assumes that the colliding hadrons become color charged because of gluon exchange, and that a single non-elementary flux tube is built up as they recede. The strong chromoelectric field inside this tube creates quark pairs (including charmed ones) and the ensuing color screening breaks the tube into excited hadronic clusters. In their turn, these clusters, or 'fireballs', decay statistically into the final hadrons. The model is able to account for the soft production of charmed, strange and lighter hadrons within a unified framework.

  16. Charm production in flux tubes

    Science.gov (United States)

    Aguiar, C. E.; Kodama, T.; Nazareth, R. A. M. S.; Pech, G.

    1996-01-01

    We argue that the nonperturbative Schwinger mechanism may play an important role in the hadronic production of charm. We present a flux tube model which assumes that the colliding hadrons become color charged because of gluon exchange, and that a single nonelementary flux tube is built up as they recede. The strong chromoelectric field inside this tube creates quark pairs (including charmed ones) and the ensuing color screening breaks the tube into excited hadronic clusters. In their turn these clusters, or ``fireballs,'' decay statistically into the final hadrons. The model is able to account for the soft production of charmed, strange, and lighter hadrons within a unified framework.

  17. Initiation of CMEs by Magnetic Flux Emergence

    Indian Academy of Sciences (India)

    Govind Dubey; Bart van der Holst; Stefaan Poedts

    2006-06-01

    The initiation of solar Coronal Mass Ejections (CMEs) is studied in the framework of numerical magnetohydrodynamics (MHD). The initial CME model includes a magnetic flux rope in spherical, axisymmetric geometry. The initial configuration consists of a magnetic flux rope embedded in a gravitationally stratified solar atmosphere with a background dipole magnetic field. The flux rope is in equilibrium due to an image current below the photosphere. An emerging-flux triggering mechanism is used to make this equilibrium system unstable. When magnetic flux emerges within the filament below the flux rope, the result is a catastrophic behavior similar to that of previous models. As a result, the flux rope rises and a current sheet forms below it. It is shown that magnetic reconnection in the current sheet below the flux rope, in combination with the outward curvature forces, results in a fast ejection of the flux rope as observed for solar CMEs. We have performed a parametric study of the emerging flux rate.

  18. Dataset for Probabilistic estimation of residential air exchange rates for population-based exposure modeling

    Data.gov (United States)

    U.S. Environmental Protection Agency — This dataset provides the city-specific air exchange rates (measured, modeled, and literature-based) as well as housing characteristics. This dataset is associated with...

  19. County and Parish Boundaries, Created when parcel dataset was developed, Published in 2000, Eureka County.

    Data.gov (United States)

    NSGIC GIS Inventory (aka Ramona) — This County and Parish Boundaries dataset is current as of 2000. It is described as 'Created when parcel dataset was developed'. Data by this publisher are often provided in...

  20. Watershed Boundary Dataset; 12-Digit Watersheds Dissolved to 8-Digit Watersheds

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This dataset is the digital hydrologic unit boundary layer for the 8-digit subwatershed boundaries for the conterminous United States. This dataset is intended to be...

  1. Active Semisupervised Clustering Algorithm with Label Propagation for Imbalanced and Multidensity Datasets

    Directory of Open Access Journals (Sweden)

    Mingwei Leng

    2013-01-01

    Full Text Available The accuracy of most existing semisupervised clustering algorithms that are based on a small labeled dataset is low when dealing with multidensity and imbalanced datasets, and labeling data is quite expensive and time-consuming in many real-world applications. This paper focuses on active data selection and semisupervised clustering in multidensity and imbalanced datasets, and proposes an active semisupervised clustering algorithm. The proposed algorithm uses an active mechanism for data selection to minimize the amount of labeled data, and it uses multiple thresholds to expand the labeled dataset on multidensity and imbalanced datasets. Three standard datasets and one synthetic dataset are used to demonstrate the proposed algorithm, and the experimental results show that the proposed semisupervised clustering algorithm has higher accuracy and more stable performance than other clustering and semisupervised clustering algorithms, especially when the datasets are multidensity and imbalanced.
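
    The label-propagation ingredient can be illustrated generically with scikit-learn's LabelSpreading on a synthetic imbalanced two-cluster dataset; this is a stand-in for, not a reproduction of, the proposed algorithm, and all parameter values are illustrative.

```python
# Spread a handful of labels through an imbalanced synthetic dataset.
import numpy as np
from sklearn.semi_supervised import LabelSpreading
from sklearn.datasets import make_blobs

X, y_true = make_blobs(n_samples=[500, 50], centers=[[0, 0], [4, 4]],
                       cluster_std=[1.5, 0.4], random_state=0)
y = np.full(len(X), -1)                 # -1 marks unlabeled points
labeled = np.r_[0:5, 500:505]           # a few labels from each class
y[labeled] = y_true[labeled]

model = LabelSpreading(kernel='knn', n_neighbors=7).fit(X, y)
print((model.transduction_ == y_true).mean())   # transductive accuracy
```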

  2. Gridded 5km GHCN-Daily Temperature and Precipitation Dataset, Version 1

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The Gridded 5km GHCN-Daily Temperature and Precipitation Dataset (nClimGrid) consists of four climate variables derived from the GHCN-D dataset: maximum temperature,...

  3. An Affinity Propagation Clustering Algorithm for Mixed Numeric and Categorical Datasets

    Directory of Open Access Journals (Sweden)

    Kang Zhang

    2014-01-01

    Full Text Available Clustering has been widely used in different fields of science, technology, social science, and so forth. In the real world, numeric as well as categorical features are usually used to describe data objects. Accordingly, many clustering methods can process datasets that are either purely numeric or purely categorical. Recently, algorithms that can handle mixed-data clustering problems have been developed. The affinity propagation (AP) algorithm is an exemplar-based clustering method which has demonstrated good performance on a wide variety of datasets. However, it has limitations in processing mixed datasets. In this paper, we propose a novel similarity measure for mixed-type datasets, and an adaptive AP clustering algorithm is proposed to cluster the mixed datasets. Several real-world datasets are studied to evaluate the performance of the proposed algorithm. Comparisons with other clustering algorithms demonstrate that the proposed method works well not only on mixed datasets but also on pure numeric and categorical datasets.
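
    The key ingredient is a similarity defined over both feature types. A minimal sketch (the combination and weighting below are assumptions, not the paper's measure) adds a negative squared Euclidean term for numeric attributes to a matching proportion for categorical ones, and feeds the result to affinity propagation as a precomputed affinity:

```python
# Mixed numeric/categorical similarity fed to affinity propagation.
import numpy as np
from sklearn.cluster import AffinityPropagation

def mixed_similarity(num, cat, w=1.0):
    # numeric part: negative squared Euclidean distance (AP's usual choice)
    d2 = ((num[:, None, :] - num[None, :, :]) ** 2).sum(-1)
    # categorical part: proportion of matching attributes
    match = (cat[:, None, :] == cat[None, :, :]).mean(-1)
    return -d2 + w * match

num = np.random.rand(100, 3)
cat = np.random.randint(0, 4, size=(100, 2))
S = mixed_similarity(num, cat)
labels = AffinityPropagation(affinity='precomputed',
                             random_state=0).fit_predict(S)
print(len(set(labels)), "clusters")
```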

  4. Estimating renewable water flux from landscape features

    Science.gov (United States)

    Peterson, Heidi; Nieber, John; Kanivetsky, Roman; Shmagin, Boris

    2010-05-01

    Water level fluctuations are not always an indicator of ground water recharge or discharge. Fluctuations occurring over a period of decades can be attributed to naturally occurring climatic changes or to anthropogenic activities including land use changes, pumping, irrigation, and other engineering modifications. When long-term ground water extraction exceeds recharge, impacts on the natural hydrodynamics of the ground water system, including decreases in hydrologic unit storage and hydraulic head, may occur. If extraction limitations are set, these ground water units can still take decades to centuries to recover. The complexity of vadose zone, ground water and surface water interactions presents hydrological research challenges for the development of operational hierarchies and for up-scaling from reaches to watersheds. Multiple techniques for quantifying ground and surface water exchanges, specifically Geographic Information System (GIS) technology, numerical models and statistical analyses, must be applied to overcome these challenges. These techniques promote a multidisciplinary and multi-scale approach to hydrologic system research. By using watershed boundaries as the quantification unit and applying the three types of flow systems resulting from Laplace equation solutions (Tóth, 1963), regional, intermediate and local systems could be addressed. Multivariate exploratory data analysis techniques were used to establish watershed interconnections based on the spatio-temporal structure of annual, seasonal, monthly and minimal monthly runoff. The analysis of an initial dataset of 129 watersheds spanning the State of Minnesota, USA, split into three varying time periods, resulted in five hydrologic regimes with either a positive trend, a negative trend, or no trend in annual stream discharge. At a given point on the earth's surface, a combination of different layers, each representing fundamental landscape components, yields unique features related to

  5. NERIES: Seismic Data Gateways and User Composed Datasets Metadata Management

    Science.gov (United States)

    Spinuso, Alessandro; Trani, Luca; Kamb, Linus; Frobert, Laurent

    2010-05-01

    One of the main objectives of the NERIES EC project is to establish and improve the networking of seismic waveform data exchange and access among four main data centers in Europe: INGV, GFZ, ORFEUS and IPGP. Besides the implementation of the data backbone, several investigations and developments have been conducted in order to offer users the data available from this network, either programmatically or interactively. One of the challenges is to understand how to enable users' activities such as discovering, aggregating, describing and sharing datasets, so as to decrease the replication of similar data queries towards the network and spare the data centers from having to guess at and create useful pre-packaged products. We have started to transfer this task more and more towards the user community, where user-composed data products can be extensively re-used. The main link to the data is represented by a centralized web service (SeismoLink) acting as a single access point to the whole data network. Users can download either waveform data or seismic station inventories directly from their own software routines by connecting to this web service, which routes the request to the data centers. The provenance of the data is maintained and transferred to the users in the form of URIs that identify the dataset and implicitly refer to the data provider. SeismoLink, combined with other web services (e.g. the EMSC-QuakeML earthquake catalog service), is used from a community gateway such as the NERIES web portal (http://www.seismicportal.eu). Here the user interacts with a map-based portlet which allows the dynamic composition of a data product, binding seismic event parameters with a set of seismic stations. The requested data are collected by the back-end processes of the portal, preserved, and offered to the user in a personal data cart, where metadata can be generated interactively on demand. The metadata, expressed in RDF, can also be remotely ingested. They offer rating

  6. Accuracy assessment of seven global land cover datasets over China

    Science.gov (United States)

    Yang, Yongke; Xiao, Pengfeng; Feng, Xuezhi; Li, Haixing

    2017-03-01

    Land cover (LC) is a vital foundation of Earth science. Up to now, several global LC datasets have arisen through the efforts of many scientific communities. To provide guidelines for data usage over China, nine LC maps from seven global LC datasets (IGBP DISCover, UMD, GLC, MCD12Q1, GLCNMO, CCI-LC, and GlobeLand30) were evaluated in this study. First, we compared their similarities and discrepancies in both area and spatial patterns, and analysed their inherent relations to data sources, classification schemes and methods. Next, five sets of validation sample units (VSUs) were collected to quantify their accuracy. Further, we built a spatial analysis model and mapped their spatial variation in accuracy based on the five sets of VSUs. The results show that there are evident discrepancies among these LC maps in both area and spatial patterns. For LC maps produced by different institutes, GLC 2000 and CCI-LC 2000 have the highest overall spatial agreement (53.8%). For LC maps produced by the same institute, the overall spatial agreement of CCI-LC 2000 and 2010, and of MCD12Q1 2001 and 2010, reaches 99.8% and 73.2%, respectively; however, more effort is still needed before these LC maps can be used as time-series data for model input, since both CCI-LC and MCD12Q1 fail to represent the rapid changes of several key LC classes in the early 21st century, in particular urban and built-up, snow and ice, water bodies, and permanent wetlands. With the highest spatial resolution, the overall accuracy of GlobeLand30 2010 is 82.39%. For the other six LC datasets with coarse resolution, CCI-LC 2010/2000 has the highest overall accuracy, followed by MCD12Q1 2010/2001, GLC 2000, GLCNMO 2008, IGBP DISCover, and UMD in turn. Although all maps exhibit high accuracy in homogeneous regions, local accuracies in other regions differ considerably, particularly in the Farming-Pastoral Zone of North China, the mountains of Northeast China, and the Southeast Hills. Special
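
    The headline metric above, overall spatial agreement, is simply the share of valid co-registered pixels that carry the same class label in two maps. A minimal sketch follows, assuming both maps are numpy arrays on a common grid with a shared legend; the function name and the toy rasters are ours, not the paper's.

    ```python
    import numpy as np

    def overall_agreement(map_a, map_b, nodata=0):
        """Fraction of valid co-registered pixels assigned the same class."""
        valid = (map_a != nodata) & (map_b != nodata)
        return float(np.mean(map_a[valid] == map_b[valid]))

    # Toy example: two 4x4 class rasters (classes 1-3, 0 = no data)
    lc_2000 = np.array([[1, 1, 2, 2],
                        [1, 3, 2, 2],
                        [3, 3, 3, 0],
                        [1, 1, 3, 3]])
    lc_2010 = np.array([[1, 1, 2, 3],
                        [1, 3, 2, 2],
                        [3, 1, 3, 0],
                        [1, 1, 3, 3]])
    print(f"overall spatial agreement: {overall_agreement(lc_2000, lc_2010):.1%}")
    ```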

  7. Approximate Nearest Neighbor Search for a Dataset of Normalized Vectors

    Science.gov (United States)

    Terasawa, Kengo; Tanaka, Yuzuru

    This paper describes a novel algorithm for approximate nearest neighbor searching. For solving this problem, especially in high-dimensional spaces, one of the best-known algorithms is Locality-Sensitive Hashing (LSH). This paper presents a variant of the LSH algorithm that outperforms previously proposed methods when the dataset consists of vectors normalized to unit length, which is often the case in pattern recognition. The LSH scheme is based on a family of hash functions that preserves the locality of points. This paper points out that for this special case we can design efficient hash functions that map a point on the hypersphere to the closest vertex of a randomly rotated regular polytope. The computational analysis confirmed that the proposed method improves the exponent ρ, the main indicator of the performance of the LSH algorithm. The practical experiments also supported the efficiency of our algorithm in both time and space.
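
    The construction can be illustrated for the cross-polytope (vertices ±e_i), where the nearest vertex after a random rotation is simply the signed coordinate of largest magnitude. The sketch below is our illustrative reading of that idea, not the authors' implementation; class and variable names are invented.

    ```python
    import numpy as np

    class CrossPolytopeLSH:
        """Hash a unit vector to the nearest vertex of a randomly rotated
        cross-polytope (the signed coordinate axis of largest magnitude)."""

        def __init__(self, dim, seed=0):
            rng = np.random.default_rng(seed)
            # Random rotation via QR decomposition of a Gaussian matrix
            q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
            self.rotation = q

        def hash(self, x):
            y = self.rotation @ x          # rotate the point on the sphere
            i = int(np.argmax(np.abs(y)))  # nearest polytope vertex is +/- e_i
            return (i, int(np.sign(y[i])))

    # Nearby unit vectors tend to fall into the same hash bucket
    h = CrossPolytopeLSH(dim=8)
    v = np.random.default_rng(1).standard_normal(8)
    v /= np.linalg.norm(v)
    w = v + 0.05 * np.random.default_rng(2).standard_normal(8)
    w /= np.linalg.norm(w)
    print(h.hash(v), h.hash(w))  # likely identical for close vectors
    ```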

  8. Comparison of Supernovae Datasets Constraints on Dark Energy

    Institute of Scientific and Technical Information of China (English)

    ZHANG Cheng-Wu; XU Li-Xin; CHANG Bao-Rong; LIU Hong-Ya

    2007-01-01

    Cosmological measurements suggest that our universe contains a dark energy component. In order to study the dark energy evolution, we constrain a parameterized dark energy equation of state ω(z) = ω0 + ω1 z/(1+z) using recent observational datasets, the 157 Gold-type Ia supernovae and the newly released 182 Gold-type Ia supernovae, by the maximum likelihood method. It is found that the best-fit ω(z) crosses -1 in the past, and the present best-fit value ω(0) < -1 is obtained from the 157 Gold-type Ia supernovae. For the 182 Gold-type Ia supernovae, the crossing of -1 is not realized and ω0 = -1 is not ruled out at the 1σ confidence level. It is also found that the range of the parameter ω0 is wide even at the 1σ confidence level and that the best-fit ω(z) is sensitive to the prior on Ωm.
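
    The quoted equation of state is the standard CPL parametrization, for which the dark-energy density evolves analytically. The sketch below shows, under common textbook assumptions (flat universe, fiducial Ωm and H0), how ω0 and ω1 enter the luminosity distance and a supernova χ²; it is not the paper's fitting pipeline, and all function names are ours.

    ```python
    import numpy as np
    from scipy.integrate import quad

    def w_de(z, w0, w1):
        """CPL equation of state: w(z) = w0 + w1 * z / (1 + z)."""
        return w0 + w1 * z / (1.0 + z)

    def E(z, w0, w1, Om=0.3):
        """Dimensionless Hubble rate for a flat universe; the CPL dark-energy
        density integrates analytically to
        (1+z)^{3(1+w0+w1)} * exp(-3 w1 z / (1+z))."""
        rho_de = (1 + z) ** (3 * (1 + w0 + w1)) * np.exp(-3 * w1 * z / (1 + z))
        return np.sqrt(Om * (1 + z) ** 3 + (1 - Om) * rho_de)

    def distance_modulus(z, w0, w1, H0=70.0):
        """mu = 5 log10(d_L / Mpc) + 25, with d_L for a flat FRW universe."""
        c = 299792.458  # km/s
        integral, _ = quad(lambda zp: 1.0 / E(zp, w0, w1), 0.0, z)
        d_L = (1 + z) * c / H0 * integral
        return 5 * np.log10(d_L) + 25

    def chi2(params, z_obs, mu_obs, sigma):
        """Chi-square of a supernova sample against the CPL model."""
        w0, w1 = params
        mu_model = np.array([distance_modulus(z, w0, w1) for z in z_obs])
        return np.sum(((mu_obs - mu_model) / sigma) ** 2)

    print(distance_modulus(0.5, -1.0, 0.0))  # reduces to LambdaCDM for (-1, 0)
    ```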

  9. Robust Machine Learning Applied to Terascale Astronomical Datasets

    CERN Document Server

    Ball, Nicholas M; Myers, Adam D

    2007-01-01

    We present recent results from the Laboratory for Cosmological Data Mining (http://lcdm.astro.uiuc.edu) at the National Center for Supercomputing Applications (NCSA) to provide robust classifications and photometric redshifts for objects in the terascale-class Sloan Digital Sky Survey (SDSS). Through a combination of machine learning in the form of decision trees, k-nearest neighbor, and genetic algorithms, the use of supercomputing resources at NCSA, and the cyberenvironment Data-to-Knowledge, we are able to provide improved classifications for over 100 million objects in the SDSS, improved photometric redshifts, and a full exploitation of the powerful k-nearest neighbor algorithm. This work is the first to apply the full power of these algorithms to contemporary terascale astronomical datasets, and the improvement over existing results is demonstrable. We discuss issues that we have encountered in dealing with data on the terascale, and possible solutions that can be implemented to deal with upcoming petasc...

  10. The INGV tectonomagnetic network: 2004-2005 preliminary dataset analysis

    Science.gov (United States)

    Masci, F.; Palangio, P.; Meloni, A.

    2006-09-01

    It is well established that earthquakes and volcanic eruptions can produce small variations in the local geomagnetic field. The tectonomagnetic network of the Italian Istituto Nazionale di Geofisica e Vulcanologia (INGV) has been operating in Central Italy since 1989 to investigate possible effects of earthquake occurrences on the local geomagnetic field. At present, total geomagnetic field intensity data are collected at four stations using proton precession magnetometers. We report the complete dataset for the period 2004-2005. The data of each station are differenced with respect to the data of the other stations in order to detect local field anomalies, removing contributions from other sources external and internal to the Earth. No correlation between geomagnetic anomalies and the local seismic activity recorded in Central Italy by the INGV Italian Seismic National Network was found in this period. Some deceptive structures present in the differenced data are pointed out.
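
    The differencing step is easy to sketch: subtracting a reference station's total-field series from the others cancels contributions common to all sites, so only local anomalies survive. A toy illustration with invented numbers, assuming time-aligned series:

    ```python
    import numpy as np

    def difference_stations(F, reference=0):
        """Subtract one station's total-field series from the others.
        F: array of shape (n_stations, n_samples); field contributions common
        to all stations cancel in the differences."""
        return F - F[reference]

    # Toy example: 3 stations sharing a common diurnal variation,
    # with station 2 carrying an extra local anomaly
    t = np.arange(200)
    common = 10 * np.sin(2 * np.pi * t / 50)         # shared external signal
    F = np.tile(common, (3, 1)) + np.array([[47000.0], [47120.0], [46950.0]])
    F[2, 100:120] += 2.5                              # local anomaly at station 2
    diff = difference_stations(F, reference=0)
    print(diff[2, 95:125].round(1))                   # anomaly survives differencing
    ```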

  11. The INGV tectonomagnetic network: 2004–2005 preliminary dataset analysis

    Directory of Open Access Journals (Sweden)

    F. Masci

    2006-01-01

    Full Text Available It is well established that earthquakes and volcanic eruptions can produce small variations in the local geomagnetic field. The tectonomagnetic network of the Italian Istituto Nazionale di Geofisica e Vulcanologia (INGV) has been operating in Central Italy since 1989 to investigate possible effects of earthquake occurrences on the local geomagnetic field. At present, total geomagnetic field intensity data are collected at four stations using proton precession magnetometers. We report the complete dataset for the period 2004–2005. The data of each station are differenced with respect to the data of the other stations in order to detect local field anomalies, removing contributions from other sources external and internal to the Earth. No correlation between geomagnetic anomalies and the local seismic activity recorded in Central Italy by the INGV Italian Seismic National Network was found in this period. Some deceptive structures present in the differenced data are pointed out.

  12. Reconstructing flaw image using dataset of full matrix capture technique

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Tae Hun; Kim, Yong Sik; Lee, Jeong Seok [KHNP Central Research Institute, Daejeon (Korea, Republic of)

    2017-02-15

    A conventional phased array ultrasonic system offers the ability to steer an ultrasonic beam by applying independent time delays to the individual elements in the array, producing an ultrasonic image. In contrast, full matrix capture (FMC) is a data acquisition process that collects a complete matrix of A-scans from every possible independent transmit-receive combination in a phased array transducer. With post-processing, it makes it possible to reconstruct images equivalent to those of a conventional phased array, as well as various images that a conventional phased array cannot produce. In this paper, a basic algorithm based on the LLL-mode total focusing method (TFM), capable of imaging crack-type flaws, is described. This technique was then applied to reconstruct flaw images from FMC datasets obtained from experiments and from ultrasonic simulation.
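
    The core of a basic direct-contact TFM is a delay-and-sum over the full matrix: for each image pixel, every transmit-receive A-scan is sampled at that pair's round-trip time of flight and the results are summed. A minimal sketch under simplifying assumptions (single medium, constant sound speed, elements on the line z = 0; argument names ours); the LLL through-mode delays used in the paper are not modelled:

    ```python
    import numpy as np

    def tfm_image(fmc, elem_x, grid_x, grid_z, c, fs):
        """Delay-and-sum total focusing method over an FMC dataset.
        fmc:    A-scans, shape (n_tx, n_rx, n_samples)
        elem_x: element x-positions, elements assumed on the line z = 0
        c, fs:  sound speed and sampling frequency"""
        n_tx, n_rx, n_s = fmc.shape
        tx_idx = np.arange(n_tx)[:, None]
        rx_idx = np.arange(n_rx)[None, :]
        image = np.zeros((len(grid_z), len(grid_x)))
        for iz, z in enumerate(grid_z):
            for ix, x in enumerate(grid_x):
                d = np.hypot(elem_x - x, z)          # element-to-pixel distances
                tof = (d[:, None] + d[None, :]) / c  # round-trip time per tx-rx pair
                idx = np.clip((tof * fs).astype(int), 0, n_s - 1)
                image[iz, ix] = np.abs(fmc[tx_idx, rx_idx, idx].sum())
        return image

    # Tiny synthetic check: one point scatterer seen by a 4-element array
    fs, c = 50e6, 5900.0
    elem_x = np.linspace(-3e-3, 3e-3, 4)
    fmc = np.zeros((4, 4, 2000))
    sx, sz = 0.5e-3, 10e-3
    for i in range(4):
        for j in range(4):
            t = (np.hypot(elem_x[i] - sx, sz) + np.hypot(elem_x[j] - sx, sz)) / c
            fmc[i, j, int(t * fs)] = 1.0
    img = tfm_image(fmc, elem_x,
                    np.linspace(-5e-3, 5e-3, 21), np.linspace(5e-3, 15e-3, 21), c, fs)
    print(np.unravel_index(img.argmax(), img.shape))  # peaks at the scatterer pixel
    ```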

  13. Geocoding and stereo display of tropical forest multisensor datasets

    Science.gov (United States)

    Welch, R.; Jordan, T. R.; Luvall, J. C.

    1990-01-01

    Concern about the future of tropical forests has led to a demand for geocoded multisensor databases that can be used to assess forest structure, deforestation, thermal response, evapotranspiration, and other parameters linked to climate change. In response to studies being conducted at Braulio Carrillo National Park, Costa Rica, digital satellite and aircraft images recorded by Landsat TM, SPOT HRV, Thermal Infrared Multispectral Scanner, and Calibrated Airborne Multispectral Scanner sensors were placed in register using the Landsat TM image as the reference map. Despite problems caused by relief, multitemporal datasets, and geometric distortions in the aircraft images, registration was accomplished to within ±20 m (±1 data pixel). A digital elevation model constructed from a multisensor Landsat TM/SPOT stereopair proved useful for generating perspective views of the rugged, forested terrain.

  14. xarray: N-D labeled Arrays and Datasets in Python

    Directory of Open Access Journals (Sweden)

    Stephan Hoyer

    2017-04-01

    Full Text Available xarray is an open source project and Python package that provides a toolkit and data structures for N-dimensional labeled arrays. Our approach combines an application programming interface (API) inspired by pandas with the Common Data Model for self-described scientific data. Key features of the xarray package include label-based indexing and arithmetic, interoperability with the core scientific Python packages (e.g., pandas, NumPy, Matplotlib), out-of-core computation on datasets that don't fit into memory, a wide range of serialization and input/output (I/O) options, and advanced multi-dimensional data manipulation tools such as group-by and resampling. xarray, as a data model and analytics toolkit, has been widely adopted in the geoscience community but is also used more broadly for multi-dimensional data analysis in physics, machine learning and finance.
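
    A brief usage sketch of the features named in the abstract (label-based selection, arithmetic, resampling, group-by), using synthetic data; the coordinate names and values are invented for illustration:

    ```python
    import numpy as np
    import pandas as pd
    import xarray as xr

    # A year of synthetic daily values on a small lat/lon grid
    times = pd.date_range("2016-01-01", "2016-12-31", freq="D")
    data = xr.DataArray(
        np.random.rand(len(times), 3, 4),
        coords={"time": times,
                "lat": [10.0, 20.0, 30.0],
                "lon": [100.0, 110.0, 120.0, 130.0]},
        dims=("time", "lat", "lon"),
        name="flux",
    )

    # Label-based indexing and arithmetic
    summer_at_20n = data.sel(lat=20.0, time=slice("2016-06-01", "2016-08-31"))
    anomaly = data - data.mean("time")

    # Resampling and group-by, as described in the abstract
    monthly_mean = data.resample(time="1MS").mean()
    seasonal_clim = data.groupby("time.season").mean("time")
    print(monthly_mean.sizes, seasonal_clim.sizes)
    ```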

  15. Dataset concerning the analytical approximation of the Ae3 temperature.

    Science.gov (United States)

    Ennis, B L; Jimenez-Melero, E; Mostert, R; Santillana, B; Lee, P D

    2017-02-01

    In this paper we present a new polynomial function for calculating the local phase transformation temperature (Ae3) between the austenite+ferrite and the fully austenitic phase fields during heating and cooling of steel: [Formula: see text]. The dataset includes the terms of the function and the values of the polynomial coefficients for the major alloying elements in steel. A short description of the approximation method used to derive and validate the coefficients has also been included. For discussion and application of this model, please refer to the full-length article entitled "The role of aluminium in chemical and phase segregation in a TRIP-assisted dual phase steel" (doi:10.1016/j.actamat.2016.05.046) (Ennis et al., 2016) [1].

  16. Original Dataset - dbQSNP | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available List Contact us dbQSNP Original Dataset Data detail Data name Original Dataset DOI 10.18908/lsdba.nbdc00042-...This Database Site Policy | Contact Us Original Dataset - dbQSNP | LSDB Archive ... ...switchLanguage; BLAST Search Image Search Home About Archive Update History Data

  17. Analysis of Public Datasets for Wearable Fall Detection Systems

    Directory of Open Access Journals (Sweden)

    Eduardo Casilari

    2017-06-01

    Full Text Available Due to the boom of wireless handheld devices such as smartwatches and smartphones, wearable Fall Detection Systems (FDSs) have become a major focus of attention in the research community in recent years. The effectiveness of a wearable FDS must be contrasted against a wide variety of measurements obtained from inertial sensors during the occurrence of falls and Activities of Daily Living (ADLs). In this regard, access to public databases constitutes the basis for an open and systematic assessment of fall detection techniques. This paper reviews and appraises twelve existing publicly available data repositories containing measurements of ADLs and emulated falls envisaged for the evaluation of fall detection algorithms in wearable FDSs. The analysis of the identified datasets is performed in a comprehensive way, taking into account the multiple factors involved in the definition of the testbeds deployed for the generation of the mobility samples. The study of the traces brings to light the lack of a common experimental benchmarking procedure and, consequently, the large heterogeneity of the datasets from a number of perspectives (length and number of samples, typology of the emulated falls and ADLs, characteristics of the test subjects, features and positions of the sensors, etc.). In this regard, the statistical analysis of the samples reveals the impact of the sensor range on the reliability of the traces. In addition, the study demonstrates the importance of the selection of the ADLs and the need to categorize the ADLs by the intensity of the movements in order to evaluate the capability of a given detection algorithm to discriminate falls from ADLs.

  18. Automatic identification of variables in epidemiological datasets using logic regression.

    Science.gov (United States)

    Lorenz, Matthias W; Abdi, Negin Ashtiani; Scheckenbach, Frank; Pflug, Anja; Bülbül, Alpaslan; Catapano, Alberico L; Agewall, Stefan; Ezhov, Marat; Bots, Michiel L; Kiechl, Stefan; Orth, Andreas

    2017-04-13

    For an individual participant data (IPD) meta-analysis, multiple datasets must be transformed into a consistent format, e.g. using uniform variable names. When large numbers of datasets have to be processed, this can be a time-consuming and error-prone task. Automated or semi-automated identification of variables can help reduce the workload and improve data quality. For semi-automation, high sensitivity in the recognition of matching variables is particularly important, because it allows software to present, for a target variable, a choice of source variables from which a user can pick the matching one, with only a low risk of having missed a correct source variable. For each variable in a set of target variables, a number of simple rules were manually created. With logic regression, an optimal Boolean combination of these rules was searched for each target variable, using a random subset of a large database of epidemiological and clinical cohort data (construction subset). In a second subset of this database (validation subset), these optimal combination rules were validated. In the construction sample, the 41 target variables were allocated with an average positive predictive value (PPV) of 34% and a negative predictive value (NPV) of 95%. In the validation sample, the PPV was 33%, whereas the NPV remained at 94%. In the construction sample, the PPV was 50% or less for 63% of all variables, and in the validation sample for 71% of all variables. We demonstrated that the application of logic regression to a complex data management task in large epidemiological IPD meta-analyses is feasible. However, the performance of the algorithm is poor, which may require backup strategies.
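
    The two reported validation measures follow the standard confusion-matrix definitions; a tiny sketch, with counts invented to roughly reproduce the reported ~33% PPV and ~95% NPV:

    ```python
    def ppv_npv(tp, fp, tn, fn):
        """Positive and negative predictive value from confusion counts."""
        ppv = tp / (tp + fp) if (tp + fp) else float("nan")
        npv = tn / (tn + fn) if (tn + fn) else float("nan")
        return ppv, npv

    # Example: a rule set flags 120 candidate source variables, 40 correctly;
    # of 400 variables left unflagged, 380 are true non-matches
    print(ppv_npv(tp=40, fp=80, tn=380, fn=20))   # -> (0.33..., 0.95)
    ```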

  19. Black branes in flux compactifications

    Energy Technology Data Exchange (ETDEWEB)

    Torroba, Gonzalo; Wang, Huajia

    2013-10-01

    We construct charged black branes in type IIA flux compactifications that are dual to (2 + 1)-dimensional field theories at finite density. The internal space is a general Calabi-Yau manifold with fluxes, with internal dimensions much smaller than the AdS radius. Gauge fields descend from the 3-form RR potential evaluated on harmonic forms of the Calabi-Yau, and Kaluza-Klein modes decouple. Black branes are described by a four-dimensional effective field theory that includes only a few light fields and is valid over a parametrically large range of scales. This effective theory determines the low energy dynamics, stability and thermodynamic properties. Tools from flux compactifications are also used to construct holographic CFTs with no relevant scalar operators, which can lead to symmetric phases of condensed matter systems stable down to very low temperatures. The general formalism is illustrated with simple examples such as toroidal compactifications and manifolds with a single size modulus. We initiate the classification of holographic phases of matter described by flux compactifications, which include generalized Reissner-Nordström branes, nonsupersymmetric AdS2×R2 and hyperscaling-violating solutions.

  20. Temporal and spatial changes in mixed layer properties and atmospheric net heat flux in the Nordic Seas

    Energy Technology Data Exchange (ETDEWEB)

    Smirnov, A; Alekseev, G [SI 'Arctic and Antarctic Research Institute', St. Petersburg (Russian Federation); Korablev, A; Esau, I, E-mail: avsmir@aari.nw.r [Nansen Environmental and Remote Sensing Centre, Bergen (Norway)

    2010-08-15

    The Nordic Seas are an important area of the World Ocean where warm Atlantic waters penetrate far north, forming the mild climate of Northern Europe. These waters represent the northern rim of the global thermohaline circulation. Estimates of the relationships between the net heat flux and mixed layer properties in the Nordic Seas are examined. Oceanographic data are derived from the Oceanographic Data Base (ODB) compiled at the Arctic and Antarctic Research Institute. Ocean weather ship 'Mike' (OWS) data are used to calculate the radiative and turbulent components of the net heat flux. The net shortwave flux was calculated using a satellite albedo dataset and the EPA model. The net longwave flux was estimated by the Southampton Oceanography Centre (SOC) method. Turbulent fluxes at the air-sea interface were calculated using the COARE 3.0 algorithm. The net heat flux was calculated using the oceanographic and meteorological data of the OWS 'Mike'. The mixed layer depth was likewise estimated from the 'Mike' data for the period from 2002 to 2009. A good correlation between these two parameters has been found. Sensible and latent heat fluxes, controlled by the surface air temperature/sea surface temperature gradient, are the main contributors to the net heat flux. Significant correlation was found between heat flux variations at the OWS 'Mike' location and sea ice export from the Arctic Ocean.
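
    The bookkeeping that combines the components is straightforward: the net air-sea heat flux is the sum of the net shortwave, net longwave, sensible and latent terms. The sketch below does only that bookkeeping; the COARE 3.0 bulk algorithm itself is not reimplemented, and the sign convention and winter values are illustrative assumptions:

    ```python
    def net_heat_flux(sw_net, lw_net, q_sensible, q_latent):
        """Net air-sea heat flux (W m-2), positive into the ocean.
        sw_net:     net shortwave radiation absorbed by the ocean
        lw_net:     net longwave radiation (usually a loss, negative)
        q_sensible: sensible heat flux (negative when the ocean loses heat)
        q_latent:   latent heat flux (negative when the ocean loses heat)"""
        return sw_net + lw_net + q_sensible + q_latent

    # Illustrative winter values for the Nordic Seas (W m-2)
    print(net_heat_flux(sw_net=20.0, lw_net=-60.0, q_sensible=-90.0, q_latent=-70.0))
    ```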

  1. Improved 3D density modelling of the Central Andes from combining terrestrial datasets with satellite based datasets

    Science.gov (United States)

    Schaller, Theresa; Sobiesiak, Monika; Götze, Hans-Jürgen; Ebbing, Jörg

    2015-04-01

    As horizontal gravity gradients are proxies for large stresses, the uniquely high gravity gradients of the South American continental margin seem to be indicative of the frequently occurring large earthquakes at this plate boundary. It has been observed that these earthquakes can repeatedly break the same segment but can also combine to form M>9 earthquakes at the end of longer seismic cycles. A large seismic gap left behind by the 1877 M~9 earthquake existed in the northernmost part of Chile. This gap has been partially ruptured by the Mw 7.7 2007 Tocopilla earthquake and the Mw 8.2 2014 Pisagua earthquake. The nature of this seismological segmentation and the distribution of energy release in an earthquake is part of ongoing research. It can be assumed that both features are related to thickness variations of high-density bodies located in the continental crust of the coastal area. These batholiths produce a clear maximum in the gravity signal. Those maxima also show a good spatial correlation with seismic asperity structures and seismological segment boundaries. Understanding of the tectonic situation can be improved through 3D forward density modelling of the gravity field. Problems arise in areas with few ground measurements. Especially in the high Andes, severe gaps exist due to the inaccessibility of some regions. The transition zone between onshore and offshore data also presents significant problems, particularly since this is the area that is most interesting in terms of seismic hazard. We modelled the continental and oceanic crust and upper mantle using different gravity datasets. The first one includes terrestrial data measured at a station spacing of 5 km or less along all passable roads, combined with satellite altimetry data offshore. The second dataset is the newly released EIGEN-6C4, which combines the latest satellite data with ground measurements. The spherical harmonics maximum degree of EIGEN-6C4 is 2190, which corresponds to a

  2. Multidecadal Fluvial Sediment Fluxes to Deltas under Environmental Change Scenarios

    Science.gov (United States)

    Dunn, Frances; Darby, Stephen; Nicholls, Robert

    2016-04-01

    Sediment delivery is vital to sustain delta environments on which over half a billion people live worldwide. Due to factors such as subsidence and sea level rise, deltas sink relative to sea level if sediment is not delivered to and retained on their surfaces. Deltas which sink relative to sea level experience flooding, land degradation and loss, which endangers anthropogenic activities and populations. The future of fluvial sediment fluxes, a key mechanism for sediment delivery to deltas, is uncertain due to complex environmental changes which are predicted to occur over the coming decades. This research investigates fluvial sediment fluxes under environmental changes in order to assess the sustainability of delta environments under potential future scenarios up to 2100. Global datasets of climate change, reservoir construction, and population and GDP as proxies for anthropogenic influence through land use changes are used to drive the catchment numerical model WBMsed, which is being used to investigate the effects of these environmental changes on fluvial sediment delivery. This process produces fluvial sediment fluxes under multiple future scenarios which will be used to assess the future sustainability of a selection of 8 vulnerable deltas, although the approach can be applied to deltas worldwide. By modelling potential future scenarios of fluvial sediment flux, this research contributes to the prognosis for delta environments. The future scenarios will inform management at multiple temporal scales, and indicate the potential consequences for deltas of various anthropogenic activities. This research will both forewarn managers of potentially unsustainable deltas and indicate those anthropogenic activities which encourage or hinder the creation of sustainable delta environments.

  3. Uncertainty in acquiring elemental fluxes from subtropical mountainous rivers

    Directory of Open Access Journals (Sweden)

    T. Y. Lee

    2009-12-01

    Full Text Available Mountainous watersheds on the high-standing islands of the western tropical and subtropical Pacific have received great international attention for their high physical and chemical weathering rates caused by cyclone invasion, friable lithology and high tectonic activity. Since mountainous regions are usually difficult to access, particularly during severe weather conditions, hydrological responses of elements over the full range of water discharge (often >2 orders of magnitude) are rarely reported. In this study, we conducted discrete sampling (~3-day intervals) throughout four seasons and intensive sampling (hourly) during typhoon floods in three upstream watersheds in Taiwan during 2002–2005. These observations, revealing various elemental responses, are taken as the complete dataset (i.e. reference flux) to evaluate the flux uncertainty among constituents caused by different sampling frequencies, sample sizes and estimators. Five constituents are analyzed: nitrate (NO3), sulfate (SO4), dissolved organic carbon (DOC), calcium (Ca), and silicate (Si). Each has specific environmental and geological implications. Direct average, flow-weighted and rating curve methods were applied. Based on statistical analyses, the flow-weighted method is the most conservative, and is recommended for all constituents if few samples are available. The rating curve method is suggested only when water samples during high flows are available. The direct average method is appropriate only for stable constituents, such as Si. These findings, covering concentration-discharge variation, sampling frequency effects, and flux estimator assessment, offer fundamental knowledge for estimating geochemical fluxes from small mountainous rivers in the Oceania region.
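
    The three estimators compared in the study can be stated compactly. A hedged sketch under simplifying assumptions (paired concentration/discharge samples plus a continuous daily discharge record; toy data and function names are ours):

    ```python
    import numpy as np

    def direct_average_flux(c, q_total):
        """Mean sampled concentration times total discharge."""
        return np.mean(c) * q_total

    def flow_weighted_flux(c, q_sampled, q_total):
        """Discharge-weighted mean concentration times total discharge."""
        return np.sum(c * q_sampled) / np.sum(q_sampled) * q_total

    def rating_curve_flux(c, q_sampled, q_continuous):
        """Fit log C = a + b log Q on the samples, apply to the full record."""
        b, a = np.polyfit(np.log(q_sampled), np.log(c), 1)
        return np.sum(np.exp(a) * q_continuous ** b * q_continuous)

    # Toy records: sparse (~monthly) samples plus a continuous daily discharge series
    rng = np.random.default_rng(0)
    q_cont = rng.lognormal(mean=2.0, sigma=1.0, size=365)   # daily discharge
    q_samp = q_cont[::30]
    c_samp = 5.0 * q_samp ** 0.3 * rng.lognormal(0.0, 0.1, q_samp.size)
    q_total = q_cont.sum()
    print(direct_average_flux(c_samp, q_total),
          flow_weighted_flux(c_samp, q_samp, q_total),
          rating_curve_flux(c_samp, q_samp, q_cont))
    ```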

  4. Determinants of Seasonality of Planktonic Foraminifera Shell Flux: Consequences for Paleoproxies

    Science.gov (United States)

    Jonkers, L.; Kucera, M.

    2014-12-01

    Planktonic foraminifera are widely used proxy carriers in paleoceanography. The flux of foraminiferal shells to the sea floor is not even throughout the year, creating a seasonal bias in the surface conditions recorded by an average fossil sample. This bias, and its changes through time, may account for an important part of the variability in paleoclimate records, but it is often ignored because of limited knowledge of the determinants of flux seasonality. To address this issue we have compiled a global dataset on shell flux seasonality from sediment traps. The database contains 38 globally distributed time series of at least one year and covers >20 species. We use periodic regression to objectively determine peak flux timing and amplitude. Significant seasonality is observed in 80% of the cases studied, and we distinguish three distinct groups of foraminifera with different modes of seasonality. This division is independent of ocean basin or upwelling and appears to reflect three principal patterns of phenology. Warm-water and symbiont-bearing species change flux seasonality by concentrating a larger proportion of the annual flux into a shorter period in colder water. Peak flux timing appears random at high temperatures and shifts towards autumn at lower temperatures. Seasonal flux variability is small at high temperatures (within their optimal range), resulting in a negligible seasonal bias. In colder waters the timing appears constant, and the strength of the peak flux can be predicted from temperature. Temperate and cold-water dwellers adjust their peak timing with average temperature. Peak flux in these species occurs later in the year at lower temperatures and follows chlorophyll maxima by approximately a month. The strength of the peak flux is similar across the temperature range, but the association with productivity allows prediction of the timing of peak flux. Peak flux of deep-dwelling species seems to occur in spring independent of temperature, which may agree
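
    Periodic regression of the kind used to determine peak timing and amplitude can be sketched as a least-squares fit of annual sine and cosine terms, from which phase (peak day) and amplitude follow. This is a generic single-harmonic version, not necessarily the exact model of the study:

    ```python
    import numpy as np

    def peak_timing_amplitude(day_of_year, flux):
        """Fit flux ~ a + b sin(2 pi t/365) + c cos(2 pi t/365); return the
        day of peak flux and the amplitude of the annual harmonic."""
        t = 2 * np.pi * np.asarray(day_of_year, dtype=float) / 365.0
        X = np.column_stack([np.ones_like(t), np.sin(t), np.cos(t)])
        a, b, c = np.linalg.lstsq(X, np.asarray(flux, dtype=float), rcond=None)[0]
        amplitude = np.hypot(b, c)
        phase = np.arctan2(c, b)                  # b sin t + c cos t = R sin(t + phase)
        peak_day = ((np.pi / 2 - phase) % (2 * np.pi)) * 365.0 / (2 * np.pi)
        return peak_day, amplitude

    # Toy two-year trap series peaking around day 120
    days = np.arange(0, 730, 14)
    flux = (10 + 6 * np.cos(2 * np.pi * (days - 120) / 365)
            + np.random.default_rng(3).normal(0, 0.5, days.size))
    print(peak_timing_amplitude(days % 365, flux))   # ~ (120, 6)
    ```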

  5. A framework to utilize turbulent flux measurements for mesoscale models and remote sensing applications

    Directory of Open Access Journals (Sweden)

    W. Babel

    2011-05-01

    Full Text Available Meteorologically measured fluxes of energy and matter between the surface and the atmosphere originate from a source area of a certain extent, located in the upwind sector of the device. The spatial representativeness of such measurements is strongly influenced by the heterogeneity of the landscape. The footprint concept is capable of linking observed data with spatial heterogeneity. This study aims at upscaling eddy-covariance-derived fluxes to a grid size of 1 km edge length, which is typical of mesoscale models or low-resolution remote sensing data.

    Here an upscaling strategy is presented, utilizing footprint modelling and SVAT modelling as well as observations from a target land-use area. The general idea of this scheme is to model fluxes from adjacent land-use types and combine them with the measured flux data to yield a grid-representative flux according to the land-use distribution within the grid cell. The performance of the upscaling routine is evaluated with real datasets, which are considered to be land-use-specific fluxes in a grid cell. The measurements above rye and maize fields stem from the LITFASS experiment 2003 in Lindenberg, Germany, and the respective modelled time series were derived with the SVAT model SEWAB. Contributions from each land-use type to the observations are estimated using a forward Lagrangian stochastic model. A representation error is defined as the error made in flux estimates when accepting the measurements unchanged as the grid-representative flux, ignoring flux contributions from other land-use types within the respective grid cell.

    Results show that this representation error can be reduced by up to 56% when applying the spatial integration. This shows the potential for further application of this strategy, although the absolute differences between flux observations from rye and maize were so small that the spatial integration would be rejected in a real situation. Corresponding thresholds for
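
    The combination step at the heart of the scheme, weighting land-use-specific fluxes by their areal fractions within the grid cell, can be sketched as below; the footprint and SVAT modelling that produce the inputs are omitted, and the fractions and flux values are invented:

    ```python
    def grid_representative_flux(fractions, fluxes):
        """Area-weighted combination of land-use-specific fluxes.
        fractions: dict mapping land-use type -> areal fraction in the grid cell
        fluxes:    dict mapping land-use type -> flux (e.g. W m-2)"""
        assert abs(sum(fractions.values()) - 1.0) < 1e-6, "fractions must sum to 1"
        return sum(fractions[lu] * fluxes[lu] for lu in fractions)

    # Hypothetical 1 km cell: measured flux over maize, modelled fluxes elsewhere
    fractions = {"maize": 0.45, "rye": 0.35, "forest": 0.20}
    fluxes = {"maize": 180.0, "rye": 150.0, "forest": 220.0}  # sensible heat, W m-2
    print(grid_representative_flux(fractions, fluxes))
    ```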

  6. Compiling a Comprehensive EVA Training Dataset for NASA Astronauts

    Science.gov (United States)

    Laughlin, M. S.; Murray, J. D.; Lee, L. R.; Wear, M. L.; Van Baalen, M.

    2016-01-01

    Training for a spacewalk or extravehicular activity (EVA) is considered hazardous duty for NASA astronauts. It places astronauts at risk of decompression sickness as well as various musculoskeletal disorders from working in the spacesuit. As a result, the operational and research communities have over the years requested access to EVA training data to supplement their studies. The purpose of this paper is to document the comprehensive EVA training dataset that was compiled from multiple sources by the Lifetime Surveillance of Astronaut Health (LSAH) epidemiologists to investigate musculoskeletal injuries. The EVA training dataset does not contain any medical data; rather, it documents when EVA training was performed, by whom, and other details about the session. The first activities practicing EVA maneuvers in water were performed at the Neutral Buoyancy Simulator (NBS) at the Marshall Spaceflight Center in Huntsville, Alabama. This facility opened in 1967 and was used for EVA training until the early Space Shuttle program days. Although several photographs show astronauts performing EVA training in the NBS, records detailing who performed the training and the frequency of training are unavailable. Paper training records were stored within the NBS after it was designated a National Historic Landmark in 1985 and closed in 1997, but significant resources would be needed to identify and secure these records, and at this time LSAH has not pursued acquisition of these early training records. Training in the NBS decreased when the Johnson Space Center in Houston, Texas, opened the Weightless Environment Training Facility (WETF) in 1980. Early training records from the WETF consist of 11 hand-written dive logbooks compiled by individual workers that were digitized at the request of LSAH. The WETF was integral to training for Space Shuttle EVAs until its closure in 1998. The Neutral Buoyancy Laboratory (NBL) at the Sonny Carter Training Facility near JSC

  7. Determination of Energy Fluxes Over Agricultural Surfaces

    OpenAIRE

    Josefina Argete

    1994-01-01

    An energy budget was conducted over two kinds of surfaces: grass and a corn canopy. The net radiative flux and the soil heat flux were directly measured, while the latent and sensible heat fluxes were calculated from the vertical profiles of wet- and dry-bulb temperature and wind speed. The crop storage flux was also estimated. Using the gradient or aerodynamic equations, the calculated fluxes, when compared to the measured fluxes in the context of an energy budget, gave an SEE = 63 Wm-2 over grass a...
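
    A standard way to close such a budget from profile measurements is the Bowen-ratio method, which partitions the available energy Rn - G between sensible and latent heat using the dry-bulb temperature and vapour pressure gradients. A hedged sketch of that generic method (not necessarily the exact equations of this paper; values illustrative):

    ```python
    def bowen_ratio_fluxes(rn, g, dT, de, pressure=101.3):
        """Partition available energy Rn - G with the Bowen-ratio method.
        dT: dry-bulb temperature difference between two heights (K)
        de: vapour pressure difference between the same heights (kPa)
        pressure: air pressure (kPa); gamma ~ 0.066 kPa/K at sea level"""
        gamma = 0.000665 * pressure          # psychrometric constant, kPa/K
        beta = gamma * dT / de               # Bowen ratio H / LE
        le = (rn - g) / (1.0 + beta)         # latent heat flux (W m-2)
        h = rn - g - le                      # sensible heat flux (W m-2)
        return h, le

    print(bowen_ratio_fluxes(rn=450.0, g=50.0, dT=1.2, de=0.25))
    ```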

  8. BanglaLekha-Isolated: A multi-purpose comprehensive dataset of Handwritten Bangla Isolated characters

    Directory of Open Access Journals (Sweden)

    Mithun Biswas

    2017-06-01

    Full Text Available BanglaLekha-Isolated, a Bangla handwritten isolated-character dataset, is presented in this article. This dataset contains 84 different characters, comprising 50 Bangla basic characters, 10 Bangla numerals and 24 selected compound characters. 2000 handwriting samples for each of the 84 characters were collected, digitized and pre-processed. After discarding mistakes and scribbles, 166,105 handwritten character images were included in the final dataset. The dataset also includes labels indicating the age and the gender of the subjects from whom the samples were collected. This dataset could be used not only for optical handwriting recognition research but also to explore the influence of gender and age on handwriting. The dataset is publicly available at https://data.mendeley.com/datasets/hf6sf8zrkc/2.

  9. BanglaLekha-Isolated: A multi-purpose comprehensive dataset of Handwritten Bangla Isolated characters.

    Science.gov (United States)

    Biswas, Mithun; Islam, Rafiqul; Shom, Gautam Kumar; Shopon, Md; Mohammed, Nabeel; Momen, Sifat; Abedin, Anowarul

    2017-06-01

    BanglaLekha-Isolated, a Bangla handwritten isolated-character dataset, is presented in this article. This dataset contains 84 different characters, comprising 50 Bangla basic characters, 10 Bangla numerals and 24 selected compound characters. 2000 handwriting samples for each of the 84 characters were collected, digitized and pre-processed. After discarding mistakes and scribbles, 166,105 handwritten character images were included in the final dataset. The dataset also includes labels indicating the age and the gender of the subjects from whom the samples were collected. This dataset could be used not only for optical handwriting recognition research but also to explore the influence of gender and age on handwriting. The dataset is publicly available at https://data.mendeley.com/datasets/hf6sf8zrkc/2.

  10. Inorganic carbon dominates total dissolved carbon concentrations and fluxes in British rivers: Application of the THINCARB model - Thermodynamic modelling of inorganic carbon in freshwaters.

    Science.gov (United States)

    Jarvie, Helen P; King, Stephen M; Neal, Colin

    2017-01-01

    River water-quality studies rarely include routine measurements of dissolved inorganic carbon (DIC), and there is a gap in our knowledge of the contributions of DIC to aquatic carbon fluxes and cycling processes. Here, we present the THINCARB model (THermodynamic modelling of INorganic CARBon), which uses widely measured determinands (pH, alkalinity and temperature) to calculate DIC concentrations, speciation (bicarbonate, HCO3(-); carbonate, CO3(2-); and dissolved carbon dioxide, H2CO3(*)) and excess partial pressures of carbon dioxide (EpCO2) in freshwaters. If calcium concentration measurements are available, THINCARB also calculates calcite saturation. THINCARB was applied to the 39-year Harmonised Monitoring Scheme (HMS) dataset, encompassing all the major British rivers discharging to the coastal zone. Model outputs were combined with the HMS dissolved organic carbon (DOC) datasets, and with spatial land use, geology, digital elevation and hydrological datasets. We provide a first national-scale evaluation of: the spatial and temporal variability in DIC concentrations and fluxes in British rivers; the contributions of DIC and DOC to total dissolved carbon (TDC); and the contributions to DIC from HCO3(-) and CO3(2-) from weathering sources and from H2CO3(*) from microbial respiration. DIC accounted for >50% of TDC concentrations in 87% of the HMS samples. In the seven largest British rivers, DIC accounted for an average of 80% of the TDC flux (ranging from 57% in the upland River Tay to 91% in the lowland River Thames). DIC fluxes exceeded DOC fluxes even under high-flow conditions, including in the Rivers Tay and Tweed, which drain upland peaty catchments. Given that particulate organic carbon fluxes from UK rivers are consistently lower than DOC fluxes, DIC fluxes are therefore also the major component of total carbon fluxes to the coastal zone. These results demonstrate the importance of accounting for DIC concentrations and fluxes in quantifying carbon transfers from land
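
    A greatly simplified version of the kind of calculation THINCARB performs, going from pH and alkalinity to DIC speciation via the carbonate equilibria, is sketched below. It assumes fixed freshwater equilibrium constants at 25 °C and ignores the temperature and ionic-strength corrections the full model applies:

    ```python
    def dic_speciation(pH, alkalinity_meq):
        """Approximate DIC speciation (mol/L) from pH and alkalinity (meq/L).
        Freshwater constants at 25 C: pK1 ~ 6.35, pK2 ~ 10.33, pKw ~ 14."""
        K1, K2, Kw = 10.0 ** -6.35, 10.0 ** -10.33, 1e-14
        h = 10.0 ** -pH
        oh = Kw / h
        alk = alkalinity_meq * 1e-3                # eq/L
        # Alkalinity ~ [HCO3-] + 2 [CO3--] + [OH-] - [H+]
        hco3 = (alk - oh + h) / (1.0 + 2.0 * K2 / h)
        co3 = hco3 * K2 / h
        co2 = hco3 * h / K1                        # H2CO3* (dissolved CO2)
        return {"HCO3-": hco3, "CO3--": co3, "H2CO3*": co2,
                "DIC": hco3 + co3 + co2}

    print(dic_speciation(pH=7.8, alkalinity_meq=2.5))
    ```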

  11. Privacy preserving data anonymization of spontaneous ADE reporting system dataset.

    Science.gov (United States)

    Lin, Wen-Yang; Yang, Duen-Chuan; Wang, Jie-Teng

    2016-07-18

    To facilitate long-term safety surveillance of marketed drugs, many spontaneous reporting systems (SRSs) for ADR events have been established worldwide. Since the data collected by SRSs contain sensitive personal health information that should be protected to prevent the identification of individuals, this raises the issue of privacy-preserving data publishing (PPDP), that is, how to sanitize (anonymize) raw data before publishing. Although much work has been done on PPDP, very few studies have focused on protecting the privacy of SRS data, and none of the existing anonymization methods is suitable for SRS datasets, which exhibit characteristics such as rare events, multiple records per individual, and multi-valued sensitive attributes. We propose a new privacy model called MS(k, θ*)-bounding for protecting published spontaneous ADE reporting data from privacy attacks. Our model has the flexibility of varying privacy thresholds, i.e., θ*, for different sensitive values, and takes the characteristics of SRS data into consideration. We also propose an anonymization algorithm for sanitizing the raw data to meet the requirements specified through the proposed model. Our algorithm adopts a greedy clustering strategy to group the records into clusters, conforming to an innovative anonymization metric that aims to minimize the privacy risk while maintaining the data utility for ADR detection. An empirical study was conducted using the FAERS dataset from 2004Q1 to 2011Q4. We compared our model with four prevailing methods, including k-anonymity, (X, Y)-anonymity, multi-sensitive l-diversity, and (α, k)-anonymity, evaluated via two measures, Danger Ratio (DR) and Information Loss (IL), and considered three different scenarios of threshold setting for θ*: uniform, level-wise and frequency-based. We also conducted experiments to inspect the impact of the anonymized data on the strength of discovered ADR signals. With all three
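
    The clustering stage of such a greedy anonymization algorithm can be sketched generically: seed a cluster with an unassigned record and add its nearest unassigned neighbours until the minimum cluster size is met. This is a generic k-member clustering sketch, not the paper's algorithm; the MS(k, θ*)-bounding checks and the generalization of quasi-identifiers are omitted:

    ```python
    import random

    def greedy_k_clusters(records, k, distance, seed=0):
        """Greedy k-member clustering: repeatedly seed a cluster with an
        unassigned record and add its k-1 nearest unassigned neighbours."""
        rng = random.Random(seed)
        unassigned = list(range(len(records)))
        clusters = []
        while len(unassigned) >= k:
            seed_idx = unassigned.pop(rng.randrange(len(unassigned)))
            unassigned.sort(key=lambda j: distance(records[seed_idx], records[j]))
            clusters.append([seed_idx] + unassigned[: k - 1])
            unassigned = unassigned[k - 1:]
        if clusters and unassigned:          # fold leftovers into the last cluster
            clusters[-1].extend(unassigned)
        return clusters

    # Toy numeric quasi-identifiers (age, weight)
    recs = [(34, 70), (36, 72), (51, 80), (49, 78), (22, 60), (24, 61), (50, 82)]
    dist = lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1])
    print(greedy_k_clusters(recs, k=3, distance=dist))
    ```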

  12. Mining hydrogeological data from existing AEM datasets for mineral Mining

    Science.gov (United States)

    Menghini, Antonio; Viezzoli, Andrea; Teatini, Pietro; Cattarossi, Andrea

    2017-04-01

    Large amounts of existing Airborne Electromagnetic (AEM) data are potentially available all over the world. Originally acquired for mining purposes, AEM data traditionally do not get processed in detail and inverted: most orebodies can be detected simply by analyzing the peak anomaly directly evidenced by the voltage values (the so-called "bump detection"). However, AEM acquisitions can be accurately re-processed and inverted to provide detailed 3D models of resistivity: a first step towards hydrogeological studies and modelling. This is a great opportunity especially for the African continent, where the detection of exploitable groundwater resources is a crucial issue. In many cases, some time after AEM data have been acquired by a mining company, governments become owners of those datasets and have the opportunity to develop detailed hydrogeological characterizations at very low cost. We report a case in which existing VTEM (Versatile Time Domain Electromagnetic - Geotech Ltd) data, originally acquired to detect gold deposits, are used to improve the hydrogeological knowledge of a roughly 50 km2 pilot-test area in Sierra Leone. Thanks to an accurate processing workflow and an advanced data inversion based on the Spatially Constrained Inversion (SCI) algorithm, we have been able to resolve the thickness of the regolith aquifer and the top of the granitic-gneiss or greenstone belt bedrock. Moreover, the occurrence within the regolith of different lithological units (more or less conductive) directly related to groundwater flow, sometimes also having high chargeability (e.g. in the case of lateritic units), has been detailed. The most promising areas for drilling new productive wells have been recognized where the bedrock is deeper and the regolith thickness is larger. A further piece of information considered in the hydrogeological mapping is the resistivity of the regolith, since the most permeable layers coincide with the most resistive units. The

  13. Constrained Allocation Flux Balance Analysis

    CERN Document Server

    Mori, Matteo; Martin, Olivier C; De Martino, Andrea; Marinari, Enzo

    2016-01-01

    New experimental results on bacterial growth inspire a novel top-down approach to studying cell metabolism, combining mass balance and proteomic constraints to extend and complement Flux Balance Analysis. We introduce here Constrained Allocation Flux Balance Analysis (CAFBA), in which the biosynthetic costs associated with growth are accounted for in an effective way through a single additional genome-wide constraint. Its roots lie in the experimentally observed pattern of proteome allocation for metabolic functions, allowing regulation and metabolism to be bridged in a transparent way under the principle of growth-rate maximization. We provide a simple method to solve CAFBA efficiently and propose an "ensemble averaging" procedure to account for unknown protein costs. Applying this approach to modeling E. coli metabolism, we find that, as the growth rate increases, CAFBA solutions cross over from respiratory, growth-yield maximizing states (preferred at slow growth) to fermentative states with carbon overflow (preferr...
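
    The structure of such a model is easy to sketch as a linear program: maximize the biomass flux subject to mass balance (S v = 0) and, in the spirit of CAFBA, a single additional allocation constraint sum_i w_i v_i <= phi. The toy network, proteome costs and bounds below are invented for illustration and are not the paper's E. coli model:

    ```python
    import numpy as np
    from scipy.optimize import linprog

    # Toy network: v1 uptake, v2 respiration, v3 fermentation, vb biomass.
    # Steady state S v = 0 for carbon (row 1) and "energy" (row 2):
    #   carbon: v1 - v2 - v3 = 0;   energy: 10*v2 + 2*v3 - vb = 0
    S = np.array([[1.0, -1.0, -1.0, 0.0],
                  [0.0, 10.0, 2.0, -1.0]])

    c = np.array([0.0, 0.0, 0.0, -1.0])     # maximize vb -> minimize -vb

    # CAFBA-style single genome-wide allocation constraint: sum_i w_i v_i <= phi
    w = np.array([0.01, 0.05, 0.01, 0.0])   # invented proteome costs per unit flux
    phi = 0.4                                # invented available proteome fraction

    res = linprog(c, A_ub=w[None, :], b_ub=[phi], A_eq=S, b_eq=[0.0, 0.0],
                  bounds=[(0, 10.0), (0, None), (0, None), (0, None)],
                  method="highs")
    print("growth rate:", -res.fun, "fluxes:", np.round(res.x, 3))
    ```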

  14. Holonomy-flux spinfoam amplitude

    CERN Document Server

    Perini, Claudio

    2012-01-01

    We introduce a holomorphic representation for the Lorentzian EPRL spinfoam on arbitrary 2-complexes. The representation is obtained via the Ashtekar-Lewandowski-Marolf-Mourão-Thiemann heat kernel coherent state transform. The new variables are classical holonomy-flux phase space variables $(h,X)\simeq \mathcal{T}^*SU(2)$ of Hamiltonian loop quantum gravity, prescribing the holonomies of the Ashtekar connection $A=\Gamma + \gamma K$ and their conjugate gravitational fluxes. For small heat kernel 'time' the spinfoam amplitude is peaked on classical space-time geometries, where at most countably many curvatures are allowed for non-zero Barbero-Immirzi parameter. We briefly comment on the possibility of using the alternative flipped classical limit.

  15. Flavor mixings in flux compactifications

    Science.gov (United States)

    Buchmuller, Wilfried; Schweizer, Julian

    2017-04-01

    A multiplicity of quark-lepton families can naturally arise as zero modes in flux compactifications. The flavor structure of the quark and lepton mass matrices is then determined by the wave function profiles of the zero modes. We consider a supersymmetric SO(10)×U(1) model in six dimensions compactified on the orbifold T2/Z2 with Abelian magnetic flux. A bulk 16-plet charged under the U(1) provides the quark-lepton generations, whereas two uncharged 10-plets yield two Higgs doublets. Bulk anomaly cancellation requires the presence of additional 16- and 10-plets. The corresponding zero modes form vectorlike split multiplets that are needed to obtain a successful flavor phenomenology. We analyze the pattern of flavor mixings for the two heaviest families of the Standard Model and discuss possible generalizations to three and more generations.

  16. Structural control of metabolic flux.

    Directory of Open Access Journals (Sweden)

    Max Sajitz-Hermstein

    Full Text Available Organisms have to continuously adapt to changing environmental conditions or undergo developmental transitions. To meet the accompanying change in metabolic demands, the molecular mechanisms of adaptation involve concerted interactions which ultimately induce a modification of the metabolic state, which is characterized by reaction fluxes and metabolite concentrations. These state transitions are the effect of simultaneously manipulating fluxes through several reactions. While metabolic control analysis has provided a powerful framework for elucidating the principles governing this orchestrated action, its applications are restricted by the limited availability of kinetic information. Here, we introduce structural metabolic control as a framework to examine the potential of individual reactions to control metabolic functions, such as biomass production, based on structural modeling. The capability to carry out a metabolic function is determined using flux balance analysis (FBA). We examine structural metabolic control on the example of the central carbon metabolism of Escherichia coli through the recently introduced framework of functional centrality (FC). This framework is based on the Shapley value from cooperative game theory and FBA, and we demonstrate its superior ability to assign a "share of control" to individual reactions with respect to metabolic functions and environmental conditions. A comparative analysis of various scenarios illustrates the usefulness of FC and its relations to other structural approaches to metabolic control. We propose a Monte Carlo algorithm to estimate FCs for large networks, based on the enumeration of elementary flux modes. We further give a detailed biological interpretation of FCs for the production of lactate and ATP under various respiratory conditions.
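
    The proposed Monte Carlo estimation of functional centralities can be illustrated generically: Shapley values are approximated by sampling random orderings of reactions and averaging each reaction's marginal contribution to a capability function. In the sketch below the capability function is a toy stand-in for an FBA call, and all names are ours:

    ```python
    import random

    def shapley_monte_carlo(players, value, n_samples=2000, seed=0):
        """Estimate Shapley values by sampling random permutations.
        players: list of reaction identifiers
        value:   function mapping a frozenset of reactions to a payoff,
                 e.g. attainable biomass flux from an FBA call (stubbed here)."""
        rng = random.Random(seed)
        phi = {p: 0.0 for p in players}
        for _ in range(n_samples):
            order = players[:]
            rng.shuffle(order)
            coalition, prev = set(), value(frozenset())
            for p in order:
                coalition.add(p)
                v = value(frozenset(coalition))
                phi[p] += v - prev        # marginal contribution of p
                prev = v
        return {p: s / n_samples for p, s in phi.items()}

    # Toy capability: "growth" requires r1 AND (r2 OR r3)
    def toy_value(coalition):
        return 1.0 if "r1" in coalition and ({"r2", "r3"} & coalition) else 0.0

    print(shapley_monte_carlo(["r1", "r2", "r3"], toy_value))  # ~ {r1: 2/3, r2: 1/6, r3: 1/6}
    ```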

  17. Flux tubes at Finite Temperature

    CERN Document Server

    Bicudo, Pedro; Cardoso, Marco

    2016-01-01

    We show the flux tubes produced by static quark-antiquark, quark-quark and quark-gluon charges at finite temperature. The sources are placed on the lattice with fundamental and adjoint Polyakov loops. We compute the squared densities of the chromomagnetic and chromoelectric fields above and below the phase transition. Our results are gauge invariant and produced in pure SU(3) gauge theory. The codes are written in CUDA and the computations are performed on GPUs.

  18. Classical Transitions for Flux Vacua

    CERN Document Server

    Deskins, J Tate; Yang, I-Sheng

    2012-01-01

    We present the simplest model for classical transitions in flux vacua. A complex field with a spontaneously broken U(1) symmetry is embedded in $M_2 \times S_1$. We numerically construct different winding number vacua, the vortices interpolating between them, and simulate the collisions of these vortices. We show that classical transitions are generic at large boosts, independent of whether or not the vortices miss each other in the compact $S_1$.

  19. Surface fluxes in heterogeneous landscape

    Energy Technology Data Exchange (ETDEWEB)

    Bay Hasager, C.

    1997-01-01

    The surface fluxes in homogeneous landscapes are calculated by similarity scaling principles. The methodology is well established. In heterogeneous landscapes with spatial changes in the micro-scale range, i.e. from 100 m to 10 km, advective effects are significant. The present work focuses on these effects in an agricultural countryside typical of the midlatitudes. Meteorological and satellite data from a highly heterogeneous landscape in the Rhine Valley, Germany, were collected in the large-scale field experiment TRACT (Transport of pollutants over complex terrain) in 1992. Classified satellite images, Landsat TM and ERS SAR, are used as the basis for roughness maps. The roughnesses were measured at meteorological masts in the various cover classes and assigned pixel by pixel to the images. The roughness maps are aggregated, i.e. spatially averaged, into so-called effective roughness lengths. This calculation is performed by a micro-scale aggregation model. The model solves the linearized atmospheric flow equations by a numerical (Fast Fourier Transform) method. This model also calculates maps of friction velocity and momentum flux pixel-wise in heterogeneous landscapes. It is indicated how the aggregation methodology can be used to calculate the heat fluxes based on the relevant satellite data, i.e. temperature and soil moisture information. (au) 10 tabs., 49 ills., 223 refs.
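
    One simple aggregation rule that conveys the idea of an effective roughness length is logarithmic averaging of a pixel-wise roughness map; the study's micro-scale flow model is considerably more elaborate, so the sketch below is only a first-order stand-in with invented values:

    ```python
    import numpy as np

    def effective_roughness(z0_map):
        """Effective roughness length via logarithmic averaging of a
        pixel-wise roughness map (a simple aggregation rule only)."""
        return float(np.exp(np.mean(np.log(z0_map))))

    # Pixel roughness lengths (m) for a mixed agricultural/forest cell
    z0 = np.array([[0.03, 0.03, 0.10, 0.50],
                   [0.03, 0.05, 0.10, 0.50],
                   [0.05, 0.05, 0.50, 1.00],
                   [0.03, 0.05, 0.50, 1.00]])
    print(f"effective z0 ~ {effective_roughness(z0):.3f} m")
    ```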

  20. A dataset from bottom trawl survey around Taiwan.

    Science.gov (United States)

    Shao, Kwang-Tsao; Lin, Jack; Wu, Chung-Han; Yeh, Hsin-Ming; Cheng, Tun-Yuan

    2012-01-01

    Bottom trawl fishery is one of the most important coastal fisheries in Taiwan in both production and economic value. However, its annual production began to decline in the 1980s due to overfishing. Bycatch has also seriously damaged the fishery resource. Thus, the government banned bottom trawling within 3 nautical miles of the shoreline in 1989. To evaluate the effectiveness of this policy, a four-year survey was conducted from 2000 to 2003 in the waters around Taiwan and the Penghu (Pescadores) Islands, covering one region each year. All fish specimens collected from trawling were brought back to the lab for identification, counting and body weight measurement. These raw data have been integrated into the Taiwan Fish Database (http://fishdb.sinica.edu.tw). They have also been published through TaiBIF (http://taibif.tw), FishBase and GBIF (website see below). This dataset contains 631 fish species and 3,529 records, making it the most complete record of demersal fish fauna and their temporal and spatial distribution on soft marine habitats in Taiwan.