salvelinus alpinus dataset: Topics by WorldWideScience.org

Sample records for salvelinus alpinus dataset

Characterization of a Neochlamydia-like Bacterium Associated with Epitheliocystis in Cultured Arctic Char Salvelinus alpinus

Science.gov (United States)

Infections of branchial epithelium by intracellular gram-negative bacteria, termed epitheliocystis, have limited culture of Arctic char (Salvelinus alpinus). To characterize a bacterium associated with epitheliocystis in cultured char, gills were sampled for histopathologic examination, conventional...
If Arctic charr Salvelinus alpinus is “the most diverse vertebrate,” what is the lake charr Salvelinus namaycush?

Science.gov (United States)

Muir, Andrew M.; Hansen, Michael J.; Bronte, Charles R.; Krueger, Charles C.

2016-01-01

Teleost fishes are prominent vertebrate models of evolution, illustrated among old-world radiations by the Cichlidae of East African Great Lakes and new-world radiations by the circumpolar Arctic charr Salvelinus alpinus. Herein, we describe variation in lake charr S. namaycush morphology, life history, physiology, and ecology, as another example of radiation. The lake charr is restricted to northern North America, where it originated from glacial refugia and diversified in large lakes. Shallow and deepwater morphs arose in multiple lakes, with a large-bodied shallow-water ‘lean’ morph in shallow inshore depths, a small-bodied mid-water ‘humper’ morph on offshore shoals or banks, and a large-bodied deep-water ‘siscowet’ morph at depths > 100 m. Eye position, gape size, and gillraker length and spacing adapted for feeding on different-sized prey, with piscivorous morphs (leans and siscowets) reaching larger asymptotic size than invertivorous morphs (humpers). Lean morphs are light in color, whereas deepwater morphs are drab and dark, although the pattern is reversed in dark tannic lakes. Morphs shift from benthic to pelagic feeding at a length of 400–490 mm. Phenotypic differences in locomotion, buoyancy, and lipid metabolism evolved into different mechanisms for buoyancy regulation, with lean morphs relying on hydrodynamic lift and siscowet morphs relying on hydrostatic lift. We suggest that the Salvelinus genus, rather than the species S. alpinus, is a diverse genus that should be the subject of comparative studies of processes causing divergence and adaptation among member species that may lead to a more complete evolutionary conceptual model.
Impaired swimming performance of acid-exposed Arctic charr, Salvelinus alpinus L

Energy Technology Data Exchange (ETDEWEB)

Hunter, L.A. (North/South Consultants Inc., Winnipeg, MB (Canada)); Scherer, E. (Dept. of Fisheries and Oceans, Freshwater Inst. Science Lab., Winnipeg, MB (Canada))

1988-01-01

Effects of increased ambient acidity are of particular interest, as the formation of metabolic and respiratory acids and acceleration of branchial ion loss during vigorous swimming duplicates or compounds effects of exposure to environmental acidity. Three-year-old Arctic charr (Salvelinus alpinus L.) were exposed to five levels of acidity between pH 6 and pH 3.8. Swimming performance as determined by critical swimming speeds was 67.5 cm·s⁻¹ or 4.4 body lengths per second for untreated fish (pH 7.8). Performance declined sharply below pH 4.5; at pH 3.8 it was reduced by 35% after 7 days of exposure. Tailbeat frequencies and ventilation rates showed no dose-response effects. This would support the assumption that afferent and efferent neuromuscular functions may have remained unimpaired under increased ambient acidity so that the stimulus of directed water current continued to elicit forced swimming, causing (forcing) the fish to use the entire scope for activity available at the various pH levels. At swimming speeds between 20 and 50 cm·s⁻¹, ventilation rates at all levels of acidity were higher than at the control level. Based on this, spontaneous, i.e., non-forced swimming activity may show a lower response threshold. 19 refs., 3 figs., 1 tab.
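The two speed units quoted in this abstract can be related with a short calculation. The sketch below is illustrative only: the implied body length is derived from the two quoted figures rather than reported in the abstract, the 35% reduction at pH 3.8 is assumed to be relative to the untreated value, and the function names are hypothetical.

```python
# Illustrative arithmetic only; function names are hypothetical, not from the study.

def implied_body_length_cm(speed_cm_s: float, speed_bl_s: float) -> float:
    """Body length implied by one speed expressed in cm/s and in body lengths/s."""
    return speed_cm_s / speed_bl_s


def reduced_speed(speed: float, reduction_fraction: float) -> float:
    """Speed after a fractional reduction (assumed to be relative to the control)."""
    return speed * (1.0 - reduction_fraction)


control_cm_s = 67.5  # critical swimming speed of untreated fish (from the abstract)
control_bl_s = 4.4   # the same speed in body lengths per second (from the abstract)

body_length = implied_body_length_cm(control_cm_s, control_bl_s)
speed_ph38 = reduced_speed(control_cm_s, 0.35)  # assumption: 35% below the control

print(f"implied body length: {body_length:.1f} cm")
print(f"critical swimming speed at pH 3.8: {speed_ph38:.1f} cm/s")
```

Under these assumptions the quoted figures imply fish of roughly 15 cm and a critical swimming speed of roughly 44 cm·s⁻¹ at pH 3.8.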
Adjustments of Protein Metabolism in Fasting Arctic Charr, Salvelinus alpinus.

Directory of Open Access Journals (Sweden)

Alicia A Cassidy

Protein metabolism, including the interrelated processes of synthesis and degradation, mediates the growth of an animal. In ectothermic animals, protein metabolism is responsive to changes in both biotic and abiotic conditions. This study aimed to characterise responses of protein metabolism to food deprivation that occur in the coldwater salmonid, Arctic charr, Salvelinus alpinus. We compared two groups of Arctic charr: one fed continuously and the other deprived of food for 36 days. We measured the fractional rate of protein synthesis (KS) in individuals from the fed and fasted groups using a flooding dose technique modified for the use of deuterium-labelled phenylalanine. The enzyme activities of the three major protein degradation pathways (ubiquitin proteasome, lysosomal cathepsins and the calpain systems) were measured in the same fish. This study is the first to measure both KS and the enzymatic activity of protein degradation in the same fish, allowing us to examine the apparent contribution of different protein degradation pathways to protein turnover in various tissues (red and white muscle, liver, heart and gills). KS was lower in the white muscle and in liver of the fasted fish compared to the fed fish. There were no observable effects of food deprivation on the protease activities in any of the tissues with the exception of liver, where the ubiquitin proteasome pathway seemed to be activated during fasting conditions. Lysosomal proteolysis appears to be the primary degradation pathway for muscle protein, while the ubiquitin proteasome pathway seems to predominate in the liver. We speculate that Arctic charr regulate protein metabolism during food deprivation to conserve proteins.
Fish and crustaceans in northeast Greenland lakes with special emphasis on interactions between Arctic charr (Salvelinus alpinus), Lepidurus arcticus and benthic chydorids

DEFF Research Database (Denmark)

Jeppesen, E.; Christoffersen, K.; Landkildehus, F.

2001-01-01

We studied the trophic structure in the pelagial and crustacean remains in the surface 1 cm of the sediment of 13 shallow, high arctic lakes in northeast Greenland (74 N). Seven lakes were fishless, while the remaining six hosted a dwarf form of Arctic charr (Salvelinus alpinus). In fishless lakes...... sp. in lakes with Lepidurus, while they were abundant in lakes with fish. The low abundance in fishless lakes could not be explained by damage of crustacean remains caused by Lepidurus feeding in the sediment, because remains of the more soft-shelled, pelagic-living Daphnia were abundant...... in the sediment of these lakes. No significant differences between lakes with and without fish were found in chlorophyll a, total phosphorus, total nitrogen, conductivity or temperature, suggesting that the observed link between Lepidurus arcticus and the benthic crustacean community is causal. Consequently...
Biochemical characterization of the Arctic char (Salvelinus alpinus) ovarian progestin membrane receptor

Directory of Open Access Journals (Sweden)

Thomas Peter

2005-11-01

Membrane progestin receptors are involved in oocyte maturation in teleosts. However, the maturation-inducing steroid (MIS) does not appear to be conserved among species and several progestins may fulfill this function. So far, complete biochemical characterization has only been performed on a few species. In the present study we have characterized the membrane progestin receptor in Arctic char (Salvelinus alpinus) and show that the 17,20beta-dihydroxy-4-pregnen-3-one (17,20beta-P) receptor also binds several xenobiotics, thus rendering oocyte maturation sensitive to environmental pollutants. We identified a single class of high affinity (Kd, 13.8 ± 1.1 nM), low capacity (Bmax, 1.6 ± 0.6 pmol/g ovary) binding sites by saturation and Scatchard analyses. Receptor binding displayed rapid association and dissociation kinetics typical of steroid membrane receptors, with t1/2s of less than 1 minute. The 17,20beta-P binding also displayed tissue specificity with high, saturable, and specific 17,20beta-P binding detected in ovaries, heart and gills while no specific binding was observed in muscle, brain or liver. Changes in 17,20beta-P binding during oocyte maturation were consistent with its identity as the oocyte MIS membrane receptor. Incubation of fully-grown ovarian follicles with gonadotropin induced oocyte maturation, which was accompanied by a five-fold increase in 17,20beta-P receptor binding. In addition, competition studies with a variety of steroids revealed that receptor binding is highly specific for 17,20beta-P, the likely maturation-inducing steroid (MIS) in Arctic char. The relative-binding affinities of all the other progestogens and steroids tested were less than 5% of that of 17,20beta-P for the receptor. Several ortho, para derivatives of DDT also showed weak binding affinity for the 17,20beta-P receptor supporting the hypothesis that xenobiotics may bind steroid receptors on the oocyte's surface and might thereby interfere
Radiocaesium turnover in Arctic charr (Salvelinus alpinus) and brown trout (Salmo trutta) in a Norwegian lake

International Nuclear Information System (INIS)

Forseth, T.; Ugedal, O.; Jonsson, B.; Langeland, A.; Njaastad, O.

1991-01-01

The radioactivity of brown trout (Salmo trutta L.) and Arctic charr (Salvelinus alpinus (L.)) was monitored in a Norwegian lake from 1986 to 1989. A distinct difference was observed between brown trout and Arctic charr in the accumulation of radiocaesium (¹³⁴Cs and ¹³⁷Cs) from the Chernobyl fallout, and the study focused on the understanding of this difference. Brown trout had a large food consumption and a corresponding high intake of radiocaesium. Excretion was 20% faster in brown trout than Arctic charr as brown trout lived at high temperatures in epilimnic water. Arctic charr had a lower food consumption (less than one-third of trout) and lived in colder meta- and hypolimnic water. Arctic charr therefore had a lower intake and slower excretion of radiocaesium. Brown trout and Arctic charr had different diets. For brown trout zoobenthos was the dominant food item, whereas Arctic charr mainly fed on zooplankton. The radioactivity in the stomach contents of the two species was different in 1986, but similar for the rest of the period. Higher levels of radiocaesium in brown trout than Arctic charr in 1986 were due to a higher food consumption and more radioactive food items in its diet. The parallel development in accumulated radiocaesium through summer 1987 was probably formed by brown trout balancing a higher intake with a faster excretion. The ecological half-lives of radiocaesium in brown trout (357 days) and Arctic charr (550 days) from Lake Hoeysjoeen indicated a slow removal of the isotopes from the food webs. (author)
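To give a feel for what the two ecological half-lives imply, the following minimal sketch computes the fraction of radiocaesium remaining after a given time. It assumes a simple single-exponential decline, which the abstract does not state explicitly; the function name and the one-year horizon are hypothetical choices for illustration.

```python
# Minimal sketch, assuming a single-exponential decline; names are hypothetical.

def fraction_remaining(days: float, half_life_days: float) -> float:
    """Fraction of the initial activity left after `days`, given a half-life."""
    return 0.5 ** (days / half_life_days)


# Ecological half-lives quoted in the abstract (days).
half_lives = {"brown trout": 357.0, "Arctic charr": 550.0}

horizon_days = 365.0  # illustrative horizon, not taken from the study
for species, t_half in half_lives.items():
    frac = fraction_remaining(horizon_days, t_half)
    print(f"{species}: {frac:.2f} of initial radiocaesium left after {horizon_days:.0f} days")
```

Under this assumption, the longer half-life means Arctic charr retain a larger share of their radiocaesium than brown trout over the same period.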
Distribution of Po-210 and Pb-210 in Arctic Char (Salvelinus alpinus) from an Arctic freshwater lake

Energy Technology Data Exchange (ETDEWEB)

Gwynn, J.P.; Rudolfsen, G. [Norwegian Radiation Protection Authority, The Fram Centre, Tromsoe (Norway)]

2014-07-01

There is little information available with regard to the accumulation of Po-210 and Pb-210 by freshwater fish in natural freshwater systems despite the potential for relevant ingestion doses to man. This may be of particular pertinence for certain population groups where freshwater fish are an important dietary food item. Equally, it is important to understand the body distributions of these naturally occurring radionuclides to quantify the resulting doses to different tissues and organs of freshwater fish. With regard to the latter, it is important to consider not only the doses arising from bio-accumulated Po-210 and Pb-210 in various body compartments but additionally the internal dose from unabsorbed Po-210 and Pb-210 in the digestive tract. In this study, activity concentrations of Po-210 and Pb-210 were determined in muscle and various internal organs of Arctic Charr (Salvelinus alpinus) sampled from a lake in the Norwegian Arctic (69° 4' N, 19° 20' E). Observed activity concentrations of Po-210 and Pb-210 in different tissues will be discussed in relation to physiological parameters and ambient lake water activity concentrations. Results from this study will be compared to two similar studies conducted in freshwater systems where elevated activity concentrations of these radionuclides have been observed. Ingestion dose rates to man and effective absorbed dose rates to different tissues and organs of Arctic Charr from Po-210 and Pb-210 will be derived and compared to those from observed activity concentrations of the anthropogenic radionuclide Cs-137. (authors)
17beta-estradiol induced vitellogenesis is inhibited by cortisol at the post-transcriptional level in Arctic char (Salvelinus alpinus)

Directory of Open Access Journals (Sweden)

Modig Carina

2004-09-01

This study was performed to investigate stress effects on the synthesis of egg yolk precursor, vitellogenin (Vtg), in Arctic char (Salvelinus alpinus). In particular the effect of cortisol (F) was determined since this stress hormone has been suggested to interfere with vitellogenesis and is upregulated during sexual maturation in teleosts. Arctic char Vtg was purified and polyclonal antibodies were produced in order to develop tools to study regulation of vitellogenesis. The Vtg antibodies were used to develop an enzyme-linked immunosorbent assay. The corresponding Vtg cDNA was cloned from a hepatic cDNA library in order to obtain DNA probes to measure Vtg mRNA expression. Analysis of plasma from juvenile Arctic char, of both sexes, exposed to different steroids showed that production of Vtg was induced in a dose-dependent fashion by 17β-estradiol (E2), estrone and estriol. Apart from estrogens, a high dose of F also upregulated Vtg. In addition, F, progesterone (P) and tamoxifen were tested to determine these compounds' ability to modulate E2-induced Vtg synthesis at both the mRNA and protein level. Tamoxifen was found to inhibit E2-induced Vtg mRNA and protein upregulation. P did not alter the Vtg induction while F reduced the Vtg protein levels without affecting the Vtg mRNA levels. Furthermore, the inhibition of Vtg protein was found to be dose dependent. Thus, the inhibitory effect of F on Vtg appears to be mediated at the post-transcriptional level.
The developmental transcriptome of contrasting Arctic charr (Salvelinus alpinus) morphs [version 2; referees: 1 approved, 2 approved with reservations]

Directory of Open Access Journals (Sweden)

Johannes Gudbrandsson

2016-04-01

Species and populations with parallel evolution of specific traits can help illuminate how predictable adaptations and divergence are at the molecular and developmental level. Following the last glacial period, dwarfism and specialized bottom feeding morphology evolved rapidly in several landlocked Arctic charr Salvelinus alpinus populations in Iceland. To study the genetic divergence between small benthic morphs and limnetic morphs, we conducted RNA-sequencing of charr embryos at four stages in early development. We studied two stocks with contrasting morphologies: the small benthic (SB) charr from Lake Thingvallavatn and Holar aquaculture (AC) charr. The data reveal significant differences in expression of several biological pathways during charr development. There was also an expression difference between SB- and AC-charr in genes involved in energy metabolism and blood coagulation genes. We confirmed differing expression of five genes in whole embryos with qPCR, including lysozyme and natterin-like which was previously identified as a fish-toxin of a lectin family that may be a putative immunopeptide. We also verified differential expression of 7 genes in the developing head that associated consistently with benthic vs. limnetic morphology (studied in 4 morphs). Comparison of single nucleotide polymorphism (SNP) frequencies reveals extensive genetic differentiation between the SB and AC-charr (~1300 with more than 50% frequency difference). Curiously, three derived alleles in the otherwise conserved 12s and 16s mitochondrial ribosomal RNA genes are found in benthic charr. The data implicate multiple genes and molecular pathways in divergence of small benthic charr and/or the response of aquaculture charr to domestication. Functional, genetic and population genetic studies on more freshwater and anadromous populations are needed to confirm the specific loci and mutations relating to specific ecological traits in Arctic charr.
The developmental transcriptome of contrasting Arctic charr (Salvelinus alpinus) morphs [version 3; referees: 1 approved, 2 approved with reservations]

Directory of Open Access Journals (Sweden)

Johannes Gudbrandsson

2016-12-01

Species and populations with parallel evolution of specific traits can help illuminate how predictable adaptations and divergence are at the molecular and developmental level. Following the last glacial period, dwarfism and specialized bottom feeding morphology evolved rapidly in several landlocked Arctic charr Salvelinus alpinus populations in Iceland. To study the genetic divergence between small benthic morphs and limnetic morphs, we conducted RNA-sequencing of charr embryos at four stages in early development. We studied two stocks with contrasting morphologies: the small benthic (SB) charr from Lake Thingvallavatn and Holar aquaculture (AC) charr. The data reveal significant differences in expression of several biological pathways during charr development. There was also an expression difference between SB- and AC-charr in genes involved in energy metabolism and blood coagulation genes. We confirmed differing expression of five genes in whole embryos with qPCR, including lysozyme and natterin-like which was previously identified as a fish-toxin of a lectin family that may be a putative immunopeptide. We also verified differential expression of 7 genes in the developing head that associated consistently with benthic vs. limnetic morphology (studied in 4 morphs). Comparison of single nucleotide polymorphism (SNP) frequencies reveals extensive genetic differentiation between the SB and AC-charr (~1300 with more than 50% frequency difference). Curiously, three derived alleles in the otherwise conserved 12s and 16s mitochondrial ribosomal RNA genes are found in benthic charr. The data implicate multiple genes and molecular pathways in divergence of small benthic charr and/or the response of aquaculture charr to domestication. Functional, genetic and population genetic studies on more freshwater and anadromous populations are needed to confirm the specific loci and mutations relating to specific ecological traits in Arctic charr.
A SNP Based Linkage Map of the Arctic Charr (Salvelinus alpinus) Genome Provides Insights into the Diploidization Process After Whole Genome Duplication

Directory of Open Access Journals (Sweden)

Cameron M. Nugent

2017-02-01

Diploidization, which follows whole genome duplication events, does not occur evenly across the genome. In salmonid fishes, certain pairs of homeologous chromosomes preserve tetraploid loci in higher frequencies toward the telomeres due to residual tetrasomic inheritance. Research suggests this occurs only in homeologous pairs where one chromosome arm has undergone a fusion event. We present a linkage map for Arctic charr (Salvelinus alpinus), a salmonid species with relatively fewer chromosome fusions. Genotype by sequencing identified 19,418 SNPs, and a linkage map consisting of 4508 markers was constructed from a subset of high quality SNPs and microsatellite markers that were used to anchor the new map to previous versions. Both male- and female-specific linkage maps contained the expected number of 39 linkage groups. The chromosome type associated with each linkage group was determined, and 10 stable metacentric chromosomes were identified, along with a chromosome polymorphism involving the sex chromosome AC04. Two instances of a weak form of pseudolinkage were detected in the telomeric regions of homeologous chromosome arms in both female and male linkage maps. Chromosome arm homologies within the Atlantic salmon (Salmo salar) and rainbow trout (Oncorhynchus mykiss) genomes were determined. Paralogous sequence variants (PSVs) were identified, and their comparative BLASTn hit locations showed that duplicate markers exist in higher numbers on seven pairs of homeologous arms, previously identified as preserving tetrasomy in salmonid species. Homeologous arm pairs where neither arm has been part of a fusion event in Arctic charr had fewer PSVs, suggesting faster diploidization rates in these regions.
Metabolic interactions between low doses of benzo[a]pyrene and tributyltin in Arctic charr (Salvelinus alpinus): a long-term in vivo study

International Nuclear Information System (INIS)

Padros, Jaime; Pelletier, Emilien; Ribeiro, Ciro Oliveira

2003-01-01

We have previously reported that short-term, single exposure to a high dose of tributyltin (TBT), a widely used antifouling biocide, inhibited both the in vivo metabolism and metabolic activation of the carcinogenic polycyclic aromatic hydrocarbon benzo[a]pyrene (BaP) in fish; BaP, in turn, stimulated TBT metabolism. Here, we provide further mechanistic evidence of mutual metabolic interactions between BaP and TBT in response to long-term, repeated exposures to low doses. Juvenile Arctic charr (Salvelinus alpinus) received 10 separate ip injections (a single injection every 6 days) of BaP (3 mg/kg), TBT (0.3 mg/kg), or both in combination; control fish received corn oil vehicle only. Two days after the 2nd (Day 8), 6th (Day 32), and 10th dose (Day 56), blood, bile, and liver samples were collected and analyzed for a suite of biomarkers. HPLC/fluorescence analysis indicated that TBT cotreatment inhibited the formation of (+)-anti-BaP diol-epoxide adducts with plasma albumin (53%, Day 8), hepatic DNA (27%, Day 32), or both albumin and globin (50 and 58%, Day 56) compared to BaP alone. This antagonistic interaction was associated with a time-dependent modulation (inhibition at Day 8, enhancement at Day 32) of both cytochrome P450 (P450)1A-mediated ethoxyresorufin O-deethylase (EROD) activity and biliary BaP metabolite formation. TBT cotreatment also inhibited (39%) the BaP-mediated induction of hepatic glutathione S-transferase (GST) activity observed at Day 8. Treatment with TBT alone increased EROD activity (60%) at Day 32, but decreased both GST activity (70 and 37%) and glutathione content (24% and 16%) at Days 32 and 56, respectively. GC/MS analysis revealed that, at Day 56, BaP cotreatment increased (200%) the levels of biliary butyltin compounds, including mono- and dibutyltin metabolites. This potentiative interaction was associated with a protective effect of BaP cotreatment against the TBT-mediated decreases in GST activity and glutathione content. The
Biological flora of the Central Europe: Rumex alpinus

Czech Academy of Sciences Publication Activity Database

Šťastná, P.; Klimeš, Leoš; Klimešová, Jitka

2010-01-01

Vol. 12, No. 1 (2010), pp. 67-79. ISSN 1433-8319. R&D Projects: GA ČR GA526/07/0808. Institutional research plan: CEZ:AV0Z60050516. Keywords: biology; Rumex alpinus; distribution area. Subject RIV: EF - Botanics. Impact factor: 4.488, year: 2010
What are the toxicological effects of mercury in Arctic biota?

DEFF Research Database (Denmark)

Dietz, Rune; Sonne, Christian; Basu, Niladri

2013-01-01

effects. Species whose concentrations exceed threshold values include the polar bears (Ursus maritimus), beluga whale (Delphinapterus leucas), pilot whale (Globicephala melas), hooded seal (Cystophora cristata), a few seabird species, and landlocked Arctic char (Salvelinus alpinus). Toothed whales appear...
Fish hydroacoustic survey standardization: A step forward based on comparisons of methods and systems from vertical surveys of a large deep lake

Czech Academy of Sciences Publication Activity Database

Draštík, Vladislav; Godlewska, M.; Balk, H.; Clabburn, P.; Kubečka, Jan; Morrissey, E.; Hateley, J.; Winfield, I. J.; Mrkvička, Tomáš; Guillard, J.

2017-01-01

Vol. 15, No. 10 (2017), pp. 836-846. ISSN 1541-5856. Institutional support: RVO:60077344. Keywords: charr salvelinus-alpinus; great-lakes; biomass; uk; populations. Subject RIV: EH - Ecology, Behaviour. OECD field: Marine biology, freshwater biology, limnology. Impact factor: 1.992, year: 2016
Risk analysis of exotic fish species included in the Dutch Fisheries Act and their hybrids

NARCIS (Netherlands)

Schiphouwer, M.E.; Kessel, van N.; Matthews, J.; Leuven, R.S.E.W.; Koppel, S.; Kranenbarg, J.; Haenen, O.L.M.; Lenders, H.J.R.; Nagelkerke, L.A.J.; Velde, van der G.; Crombaghs, B.; Zollinger, R.

2014-01-01

This report analyses the risks of exotic fish species included in the Dutch Fisheries Act and their hybrids. The following species and one specific hybrid were included in the analysis: Arctic charr (Salvelinus alpinus); asp (Leuciscus aspius); common carp (Cyprinus carpio);
Size-dependent resource limitation and foraging-predation risk trade-offs: growth and habitat use in young arctic char

NARCIS (Netherlands)

Byström, P.; Andersson, J.; Persson, L.; de Roos, A.M.

2004-01-01

Variation in growth and habitat use is closely connected to individual responses to habitat specific resource levels and predation risk. In three mountain lakes which differed in the density of young-of-the-year (YOY) arctic char (Salvelinus alpinus), we studied the growth, diets and habitat use of
Size-dependent resource limitation and foraging-predation risk trade-offs: growth and habitat use in young arctic char

NARCIS (Netherlands)

Bystrom, P.; Persson, L.; de Roos, A.M.; Andersson, J.A.

2004-01-01

Variation in growth and habitat use is closely connected to individual responses to habitat specific resource levels and predation risk. In three mountain lakes which differed in the density of young-of-the-year (YOY) arctic char (Salvelinus alpinus), we studied the growth, diets and habitat use of
Amount and qualities of carotenoids in fillets of fish species fed with ...

African Journals Online (AJOL)

umb

2013-03-20

Mar 20, 2013 ... the Baltic Sea (Czeczuga and Klyszejko, 1996), Black Sea (Czeczuga, 1973), fishing areas of the ..... (in females and eggs) through the low level of astaxanthin and thiamine (Fitzsimons et al., .... salar L.) and Arctic charr (Salvelinus alpinus L.) specimens from an ocean ranching farm. Ocean. Hydrobiol.

Temperature, pressure and light data collected by attached Archival Transmitting Tags to Salvelinus malma (Dolly Varden trout) in the Wulik River, Alaska, during 2012-06 to 2013-10 (NODC Accession 0119954)

Data.gov (United States)

National Oceanic and Atmospheric Administration, Department of Commerce — This dataset contains data collected by electronic tags (Pop-up Satellite Archival Transmitting) attached to Salvelinus malma (Dolly Varden trout) in the Wulik...
Fatty acid composition of fish species with different feeding habits from an Arctic Lake.

Science.gov (United States)

Gladyshev, M I; Sushchik, N N; Glushchenko, L A; Zadelenov, V A; Rudchenko, A E; Dgebuadze, Y Y

2017-05-01

We compared the composition and content of fatty acids (FAs) in fish with different feeding habits (sardine (least) cisco Coregonus sardinella, goggle-eyed charr (pucheglazka) form of Salvelinus alpinus complex, humpback whitefish Coregonus pidschian, broad whitefish Coregonus nasus, boganid charr Salvelinus boganidae, and northern pike Esox lucius) from an Arctic lake. Feeding habits of the studied fish (planktivore, benthivore, or piscivore) significantly affected the composition of biomarker fatty acids and the ratio of stable isotopes of carbon and nitrogen in their biomass. The hypothesis on a higher content of eicosapentaenoic and docosahexaenoic acids in the fish of higher trophic level (piscivores) when compared within the same taxonomic group (order Salmoniformes) was confirmed.
Nucleotide variation in the mitochondrial genome provides evidence for dual routes of postglacial recolonization and genetic recombination in the northeastern brook trout (Salvelinus fontinalis).

Science.gov (United States)

Pilgrim, B L; Perry, R C; Barron, J L; Marshall, H D

2012-09-26

Levels and patterns of mitochondrial DNA (mtDNA) variation were examined to investigate the population structure and possible routes of postglacial recolonization of the world's northernmost native populations of brook trout (Salvelinus fontinalis), which are found in Labrador, Canada. We analyzed the sequence diversity of a 1960-bp portion of the mitochondrial genome (NADH dehydrogenase 1 gene and part of cytochrome oxidase 1) of 126 fish from 32 lakes distributed throughout seven regions of northeastern Canada. These populations were found to have low levels of mtDNA diversity, a characteristic trait of populations at northern extremes, with significant structuring at the level of the watershed. Upon comparison of northeastern brook trout sequences to the publicly available brook trout whole mitochondrial genome (GenBank AF154850), we infer that the GenBank sequence is from a fish whose mtDNA has recombined with that of Arctic charr (S. alpinus). The haplotype distribution provides evidence of two different postglacial founding groups contributing to present-day brook trout populations in the northernmost part of their range; the evolution of the majority of the haplotypes coincides with the timing of glacier retreat from Labrador. Our results exemplify the strong influence that historical processes such as glaciations have had on shaping the current genetic structure of northern species such as the brook trout.
Invasion of top and intermediate consumers in a size structured fish community

OpenAIRE

Ask, Per

2010-01-01

In this thesis I have investigated the effects of invading top and intermediate consumers in a size-structured fish community, using a combination of field studies, a lake invasion experiment and smaller scale pond and aquaria experiments. The lake invasion experiment was based on introductions of an intermediate consumer, ninespine stickleback (Pungitius pungitius L.), into allopatric populations of an omnivorous top predator, Arctic char (Salvelinus alpinus L.). The invasion experiment was...
Lake trout (Salvelinus namaycush) suppression for bull trout (Salvelinus confluentus) recovery in Flathead Lake, Montana, North America

Science.gov (United States)

Hansen, Michael J.; Hansen, Barry S; Beauchamp, David A.

2016-01-01

Non-native lake trout Salvelinus namaycush displaced native bull trout Salvelinus confluentus in Flathead Lake, Montana, USA, after 1984, when Mysis diluviana became abundant following its introduction in upstream lakes in 1968–1976. We developed a simulation model to determine the fishing mortality rate on lake trout that would enable bull trout recovery. Model simulations indicated that suppression of adult lake trout by 75% from current abundance would reduce predation on bull trout by 90%. Current removals of lake trout through incentivized fishing contests have not been sufficient to suppress lake trout abundance estimated by mark-recapture or indexed by stratified-random gill netting. In contrast, size structure, body condition, mortality, and maturity are changing consistent with a density-dependent reduction in lake trout abundance. Population modeling indicated total fishing effort would need to increase 3-fold to reduce adult lake trout population density by 75%. We conclude that increased fishing effort would suppress lake trout population density and predation on juvenile bull trout, and thereby enable higher abundance of adult bull trout in Flathead Lake and its tributaries.
The avoidance strategy of environmental constraints by an aquatic plant Potamogeton alpinus in running waters.

Science.gov (United States)

Robionek, Alicja; Banaś, Krzysztof; Chmara, Rafał; Szmeja, Józef

2015-08-01

Aquatic plants anchored in streams are under pressure from various constraints linked to the water flow and display strategies to prevent their damage or destruction. We assume that the responses of aquatic plants to fast-water flow are a manifestation of a trade-off consisting in either maximizing the resistance to damage (tolerance strategy) or minimizing the hydrodynamic forces (avoidance strategy), or both. Our main hypothesis was that Potamogeton alpinus demonstrates the avoidance strategy. We analyzed architecture traits of the modules of this clonal plant from slow- and fast-flowing streams. In fast-flowing waters, the avoidance strategy of P. alpinus is reflected by the following: (1) the presence of floating leaves that stabilize the vertical position of the stem and protect the inflorescence against immersion; (2) elongation of submerged leaves (weakens the pressure of water); and (3) shoot diameter reduction and increase in shoot density (weakens the pressure of water, increases shoot elasticity). By contrast, the responses in slow-water flow include the following: (4) the absence of floating leaves in high intensity of light (avoiding unnecessary outlays on a redundant organ); (5) the presence of floating leaves in low intensity of light (avoidance of stress caused by an insufficient assimilation area of submerged leaves).
Use of cover habitat by bull trout Salvelinus confluentus and lake trout Salvelinus namaycush in a laboratory environment

Science.gov (United States)

Meeuwig, Michael H.; Guy, Christopher S.; Fredenberg, Wade A.

2011-01-01

Lacustrine-adfluvial bull trout, Salvelinus confluentus, migrate from spawning and rearing streams to lacustrine environments as early as age 0. Within lacustrine environments, cover habitat provides refuge from potential predators and is a resource that is competed for if limiting. Competitive interactions between bull trout and other species could result in bull trout being displaced from cover habitat, and bull trout may lack evolutionary adaptations to compete with introduced species, such as lake trout, Salvelinus namaycush. A laboratory experiment was performed to examine habitat use and interactions for cover by juvenile (i.e., habitat, with bull trout using cover and bottom habitats more than lake trout. Habitat selection ratios indicated that bull trout avoided water column habitat in the presence of lake trout and that lake trout avoided bottom habitat. Intraspecific and interspecific agonistic interactions were infrequent, but approximately 10 times greater for intraspecific interactions between lake trout. Results from this study provide little evidence that juvenile bull trout and lake trout compete for cover, and that species-specific differences in habitat use and selection likely result in habitat partitioning between these species.
The fate of mercury in Arctic terrestrial and aquatic ecosystems, a review

DEFF Research Database (Denmark)

Douglas, Thomas A.; Loseto, Lisa L.; MacDonald, Robie W.

2012-01-01

the fate of Hg in most ecosystems, and the role of trophic processes in controlling Hg in higher order animals are also included. Case studies on Eastern Beaufort Sea beluga (Delphinapterus leucas) and landlocked Arctic char (Salvelinus alpinus) are presented as examples of the relationship between...... into non-biological archives is also addressed. The review concludes by identifying major knowledge gaps in our understanding, including: (1) the rates of Hg entry into marine and terrestrial ecosystems and the rates of inorganic and MeHg uptake by Arctic microbial and algal communities; (2...
Evidence of sound production by spawning lake trout (Salvelinus namaycush) in lakes Huron and Champlain

Science.gov (United States)

Johnson, Nicholas S.; Higgs, Dennis; Binder, Thomas R.; Marsden, J. Ellen; Buchinger, Tyler John; Brege, Linnea; Bruning, Tyler; Farha, Steve A.; Krueger, Charles C.

2018-01-01

Two sounds associated with spawning lake trout (Salvelinus namaycush) in lakes Huron and Champlain were characterized by comparing sound recordings to behavioral data collected using acoustic telemetry and video. These sounds were named growls and snaps, and were heard on lake trout spawning reefs, but not on a non-spawning reef, and were more common at night than during the day. Growls also occurred more often during the spawning period than the pre-spawning period, while the trend for snaps was reversed. In a laboratory flume, sounds occurred when male lake trout were displaying spawning behaviors; growls when males were quivering and parallel swimming, and snaps when males moved their jaw. Combining our results with the observation of possible sound production by spawning splake (Salvelinus fontinalis × Salvelinus namaycush hybrid), provides rare evidence for spawning-related sound production by a salmonid, or any other fish in the superorder Protacanthopterygii. Further characterization of these sounds could be useful for lake trout assessment, restoration, and control.
Using Linkage Maps as a Tool To Determine Patterns of Chromosome Synteny in the Genus Salvelinus

Directory of Open Access Journals (Sweden)

Matthew C. Hale

2017-11-01

Next generation sequencing techniques have revolutionized the collection of genome and transcriptome data from nonmodel organisms. This manuscript details the application of restriction site-associated DNA sequencing (RADseq) to generate a marker-dense genetic map for Brook Trout (Salvelinus fontinalis). The consensus map was constructed from three full-sib families totaling 176 F1 individuals. The map consisted of 42 linkage groups with a total female map size of 2502.5 cM, and a total male map size of 1863.8 cM. Synteny was confirmed with Atlantic Salmon for 38 linkage groups, with Rainbow Trout for 37 linkage groups, with Arctic Char for 36 linkage groups, and with a previously published Brook Trout linkage map for 39 linkage groups. Comparative mapping confirmed the presence of 8 metacentric and 34 acrocentric chromosomes in Brook Trout. Six metacentric chromosomes seem to be conserved with Arctic Char, suggesting there have been at least two species-specific fusion and fission events within the genus Salvelinus. In addition, the sex marker (sdY; sexually dimorphic on the Y chromosome) was mapped to Brook Trout BC35, which is homologous with Atlantic Salmon Ssa09qa, Rainbow Trout Omy25, and Arctic Char AC04q. Ultimately, this linkage map will be a useful resource for studies on the genome organization of Salvelinus, and facilitates comparisons of the Salvelinus genome with Salmo and Oncorhynchus.
Pressure shock triploidization of Salmo trutta f. lacustris and Salvelinus umbla eggs and its impact on fish development.

Science.gov (United States)

Lahnsteiner, Franz; Kletzl, Manfred

2018-07-15

The study tested the efficiency of hydrostatic pressure triploidization methods for Salmo trutta f. lacustris and Salvelinus umbla and investigated the effects on survival rate, skeletal malformation, and on morphometrics and cellular composition of gills, spleen, liver, kidney, intestine, and blood. In Salmo trutta f. lacustris a 100% triploidy rate in combination with high larvae survival rate (80% in comparison to control) was obtained when treating eggs with a pressure of 66 × 10³ kPa 360 °C temperature minutes (CTM) post fertilization for 5 min, in Salvelinus umbla with a similar pressure after 270 CTM. Juvenile triploid Salmo trutta f. lacustris and Salvelinus umbla (145 days post hatch) had neither an increased rate of mortality, nor an increased rate of malformations. In triploid Salmo trutta f. lacustris and Salvelinus umbla the erythrocyte volume was 50% higher and the erythrocyte concentration in peripheral blood 25-35% lower relative to diploids. In triploids the erythrocyte surface area:volume ratio was also reduced. Gills of triploid Salmo trutta f. lacustris and Salvelinus umbla had increased width of primary lamellae and increased length of secondary lamellae, which might compensate for the unfavorable erythrocyte surface area:volume ratio. Length of the digestive tract and histology of kidney, liver, spleen, and gills were only investigated in Salmo trutta f. lacustris. In triploids the hematopoietic tissue of the kidney was decreased by 12%, the spleen index by 53%, and the erythroblast concentrations of the spleen by 42% relative to diploids, possibly indicating reduced erythropoiesis. Length of the digestive tract and cellular arrangement of intestine, liver, and gills were not affected. In summary, the used triploidization procedure seems a reliable method not counteracting the principles of animal welfare. Copyright © 2018 Elsevier Inc. All rights reserved.
Development of spinal deformities in Atlantic salmon and Arctic charr fed diets supplemented with oxytetracycline

International Nuclear Information System (INIS)

Toften, H.; Jobling, M.

1996-01-01

Some individuals within populations of Atlantic salmon Salmo salar and Arctic charr Salvelinus alpinus fed diets supplemented with oxytetracycline (OTC) developed spinal deformations. Possible differences in feed intake and growth of spinally deformed fish relative to fish without any deformities were investigated. Amongst Atlantic salmon, 17% of the fish fed OTC-supplemented feed developed spinal fractures, whereas none of the fish receiving the basic feed did so. Despite deformation of the spinal column, the injured fish continued to feed and grow, but at lower rates than unaffected individuals. In contrast to Atlantic salmon, Arctic charr showed no signs of spinal fractures at any time during the 65-day experiment.
Development of spinal deformities in Atlantic salmon and Arctic charr fed diets supplemented with oxytetracycline

Energy Technology Data Exchange (ETDEWEB)

Toften, H.; Jobling, M. [Norwegian Institute of Fisheries and Aquaculture, N-9005 Tromsoe (Norway)]

1996-07-01

Some individuals within populations of Atlantic salmon Salmo salar and Arctic charr Salvelinus alpinus fed diets supplemented with oxytetracycline (OTC) developed spinal deformations. Possible differences in feed intake and growth of spinally deformed fish relative to fish without any deformities were investigated. Amongst Atlantic salmon, 17% of the fish fed OTC-supplemented feed developed spinal fractures, whereas none of the fish receiving the basic feed did so. Despite deformation of the spinal column, the injured fish continued to feed and grow, but at lower rates than unaffected individuals. In contrast to Atlantic salmon, Arctic charr showed no signs of spinal fractures at any time during the 65-day experiment.
Complete genome sequence of Arthrobacter alpinus ERGS4:06, a yellow pigmented bacterium tolerant to cold and radiations isolated from Sikkim Himalaya.

Science.gov (United States)

Kumar, Rakshak; Singh, Dharam; Swarnkar, Mohit Kumar; Singh, Anil Kumar; Kumar, Sanjay

2016-02-20

Arthrobacter alpinus ERGS4:06, a yellow pigmented bacterium which exhibited tolerance to cold and UV radiations was isolated from the glacial stream of East Rathong glacier in Sikkim Himalaya. Here we report the 4.3Mb complete genome assembly that has provided the basis for potential role of pigments as a survival strategy to combat stressed environment of cold and high UV-radiation and additionally the ability to produce cold active industrial enzymes. Copyright © 2016 Elsevier B.V. All rights reserved.
Assessment of salmonid introductions in the high-altitude lakes and streams of the Hautes-Pyrénées

Directory of Open Access Journals (Sweden)

DELACOSTE M.

1997-01-01

Salmonid introductions have been substantial over the past 60 years in the high-altitude lakes and streams of the Hautes-Pyrénées. Six salmonid species were introduced into environments that were, for the most part, devoid of fish populations: brown trout (Salmo trutta L.), rainbow trout (Oncorhynchus mykiss Walbaum), brook trout (Salvelinus fontinalis Mitchill), Arctic char (Salvelinus alpinus L.), lake trout (Salvelinus namaycush Walbaum) and splake (Salvelinus fontinalis x Salvelinus namaycush). In very many cases, these introductions resulted in acclimatization. Naturalization, by contrast, is much rarer. Only the lacustrine species (lake trout and Arctic char) became naturalized in the majority of the lakes where they were introduced. Reproductive conditions are the key factor explaining the naturalization of species. In streams, competition with the native species (brown trout), fishing pressure and very harsh winter conditions must also be taken into account. The ecological impacts of the introductions on native brown trout populations are small. They are, however, not negligible for amphibian populations. This introduction policy contributed greatly to the development of recreational fishing in these high-altitude environments. In this respect, the introductions fully met the fishery objectives that had been set for them. Acquiring knowledge across the whole Pyrenean chain is now an essential step toward a comprehensive management policy for introductions.
The effect of UV on photosynthesis and growth in dependence of mineral nutrition (Lactuca sativa L. and Rumex alpinus L.)

International Nuclear Information System (INIS)

Bogenrieder, A.; Doute, Y.

1982-01-01

A clear increase in relative sensitivity of photosynthesis to UV-B occurs with increasing mineral concentration. As expected, the chlorophyll content (a + b) per leaf area of control plants initially also increases (as can easily be noticed visually). It remains, however, the same from the third to the highest concentration. The sensitivity of Rumex alpinus is higher but less influenced by mineral supply. In the case of long-term experiments, however, the relative differences between control and irradiated plants decreased clearly with increasing mineral supply. In Lactuca sativa, this is the opposite effect to the situation with short-term irradiation. (orig./AJ)
Climate change and vulnerability of bull trout (Salvelinus confluentus) in a fire-prone landscape

Science.gov (United States)

Jeffrey A. Falke; Rebecca L. Flitcroft; Jason B. Dunham; Kristina M. McNyset; Paul F. Hessburg; Gordon H. Reeves; C. Tara Marshall

2015-01-01

Linked atmospheric and wildfire changes will complicate future management of native coldwater fishes in fire-prone landscapes, and new approaches to management that incorporate uncertainty are needed to address this challenge. We used a Bayesian network (BN) approach to evaluate population vulnerability of bull trout (Salvelinus confluentus) in the Wenatchee River...
Development of a multichemical food web model: application to PBDEs in Lake Ellasjoen, Bear Island, Norway.

Science.gov (United States)

Gandhi, Nilima; Bhavsar, Satyendra P; Gewurtz, Sarah B; Diamond, Miriam L; Evenset, Anita; Christensen, Guttorm N; Gregor, Dennis

2006-08-01

A multichemical food web model has been developed to estimate the biomagnification of interconverting chemicals in aquatic food webs. We extended a fugacity-based food web model for single chemicals to account for reversible and irreversible biotransformation among a parent chemical and transformation products, by simultaneously solving mass balance equations of the chemicals using a matrix solution. The model can be applied to any number of chemicals and organisms or taxonomic groups in a food web. The model was illustratively applied to four PBDE congeners, BDE-47, -99, -100, and -153, in the food web of Lake Ellasjøen, Bear Island, Norway. In Ellasjøen arctic char (Salvelinus alpinus), the multichemical model estimated PBDE biotransformation from higher to lower brominated congeners and improved the correspondence between estimated and measured concentrations in comparison to estimates from the single-chemical food web model. The underestimation of BDE-47, even after considering bioformation due to biotransformation of the other three congeners, suggests its formation from additional biotransformation pathways not considered in this application. The model estimates approximate values for congener-specific biotransformation half-lives of 5.7, 0.8, 1.14, and 0.45 years for BDE-47, -99, -100, and -153, respectively, in large arctic char (S. alpinus) of Lake Ellasjøen.
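
As a rough illustration of what the reported half-lives imply, the sketch below converts them into first-order rate constants using k = ln(2) / t½. The assumption of simple first-order kinetics and the names `HALF_LIVES_YEARS`, `rate_constant`, and `RATE_CONSTANTS` are ours for illustration; the abstract does not describe how the model parameterizes biotransformation.

```python
import math

# Biotransformation half-lives (years) as reported in the abstract
# for large arctic char of Lake Ellasjøen.
HALF_LIVES_YEARS = {
    "BDE-47": 5.7,
    "BDE-99": 0.8,
    "BDE-100": 1.14,
    "BDE-153": 0.45,
}


def rate_constant(half_life_years: float) -> float:
    """Return k (per year), assuming first-order decay: k = ln(2) / t_half."""
    return math.log(2) / half_life_years


# Hypothetical helper table: congener -> first-order rate constant (1/year).
RATE_CONSTANTS = {name: rate_constant(t) for name, t in HALF_LIVES_YEARS.items()}

if __name__ == "__main__":
    for name, k in RATE_CONSTANTS.items():
        print(f"{name}: k = {k:.2f} per year")
```

Under this assumption, the shorter the half-life, the larger the rate constant, so BDE-153 would be transformed fastest and BDE-47 slowest among the four congeners.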
Potential Sources of High Frequency and Biphonic Vocalization in the Dhole (Cuon alpinus).

Directory of Open Access Journals (Sweden)

Roland Frey

Biphonation, i.e. two independent fundamental frequencies in a call spectrum, is a prominent feature of vocal activity in dog-like canids. Dog-like canids can produce a low (f0) and a high (g0) fundamental frequency simultaneously. In contrast, fox-like canids are only capable of producing the low fundamental frequency (f0). Using a comparative anatomical approach for revealing macroscopic structures potentially responsible for canid biphonation, we investigated the vocal anatomy for 4 (1 male, 3 female) captive dholes (Cuon alpinus) and for 2 (1 male, 1 female) wild red foxes (Vulpes vulpes). In addition, we analyzed the acoustic structure of vocalizations in the same dholes that served postmortem as specimens for the anatomical investigation. All study dholes produced both high-frequency and biphonic calls. The anatomical reconstructions revealed that the vocal morphologies of the dhole are very similar to those of the red fox. These results suggest that the high-frequency and biphonic calls in dog-like canids can be produced without specific anatomical adaptations of the sound-producing structures. We discuss possible production modes for the high-frequency and biphonic calls involving laryngeal and nasal structures.
Draft Genome Sequence of a Picorna-Like Virus Associated with Gill Tissue in Clinically Normal Brook Trout, Salvelinus fontinalis

OpenAIRE

Iwanowicz, Luke R.; Iwanowicz, Deborah D.; Adams, Cynthia R.; Galbraith, Heather; Aunins, Aaron; Cornman, Robert S.

2017-01-01

Here, we report a draft genome sequence of a picorna-like virus associated with brook trout, Salvelinus fontinalis, gill tissue. The draft genome comprises 8,681 nucleotides, excluding the poly(A) tract, and contains two open reading frames. It is most similar to picorna-like viruses that infect invertebrates.

Draft Genome Sequence of a Picorna-Like Virus Associated with Gill Tissue in Clinically Normal Brook Trout, Salvelinus fontinalis.

Science.gov (United States)

Iwanowicz, Luke R; Iwanowicz, Deborah D; Adams, Cynthia R; Galbraith, Heather; Aunins, Aaron; Cornman, Robert S

2017-10-12

Here, we report a draft genome sequence of a picorna-like virus associated with brook trout, Salvelinus fontinalis, gill tissue. The draft genome comprises 8,681 nucleotides, excluding the poly(A) tract, and contains two open reading frames. It is most similar to picorna-like viruses that infect invertebrates.
Short- and long term niche segregation and individual specialization of brown trout (Salmo trutta) in species poor Faroese lakes

DEFF Research Database (Denmark)

Brodersen, Jakob; Malmquist, Hilmar J.; Landkildehus, Frank

2012-01-01

fidelity to a niche may be variable both between and within populations. In order to study this complexity, relatively simple systems with few species are needed. In this paper, we study how competitor presence affects the resource use of brown trout (Salmo trutta) in 11 species-poor Faroese lakes...... by comparing relative abundance, stable isotope ratios and diet in multiple habitats. In the presence of three-spined sticklebacks (Gasterosteus aculeatus), a higher proportion of the trout population was found in the pelagic habitat, and trout in general relied on a more pelagic diet base as compared to trout...... living in allopatry or in sympatry with Arctic charr (Salvelinus alpinus). Diet analyses revealed, however, that niche-segregation may be more complex than described on a one-dimensional pelagic-littoral axis. Trout from both littoral and offshore benthic habitats had in the presence of sticklebacks...
Landlocked Arctic charr (Salvelinus alpinus) population structure and lake morphometry in Greenland - is there a connection?

DEFF Research Database (Denmark)

Riget, F.; Jeppesen, E.; Landkildehus, F.

2000-01-01

correlated with lake volume. Our study indicates that the charr population structure became more complex with increasing lake size. Moreover, the population structure seemed to be influenced by lake-water transparency and the presence or absence of three-spined stickleback (Gasterosteus aculeatus)...
Chromosomal characteristics and distribution of rDNA sequences in the brook trout Salvelinus fontinalis (Mitchill, 1814).

Science.gov (United States)

Śliwińska-Jewsiewicka, A; Kuciński, M; Kirtiklis, L; Dobosz, S; Ocalewicz, K; Jankun, Malgorzata

2015-08-01

Brook trout Salvelinus fontinalis (Mitchill, 1814) chromosomes have been analyzed using conventional and molecular cytogenetic techniques enabling characteristics and chromosomal location of heterochromatin, nucleolus organizer regions (NORs), ribosomal RNA-encoding genes and telomeric DNA sequences. The C-banding and chromosome digestion with the restriction endonucleases demonstrated distribution and heterogeneity of the heterochromatin in the brook trout genome. DNA sequences of the ribosomal RNA genes, namely the nucleolus-forming 28S (major) and non-nucleolus-forming 5S (minor) rDNAs, were physically mapped using fluorescence in situ hybridization (FISH) and primed in situ labelling. The minor rDNA locus was located on the subtelo-acrocentric chromosome pair No. 9, whereas the major rDNA loci were dispersed on 14 chromosome pairs, showing a considerable inter-individual variation in the number and location. The major and minor rDNA loci were located at different chromosomes. Multichromosomal location (3-6 sites) of the NORs was demonstrated by silver nitrate (AgNO3) impregnation. All Ag-positive i.e. active NORs corresponded to the GC-rich blocks of heterochromatin. FISH with telomeric probe showed the presence of the interstitial telomeric site (ITS) adjacent to the NOR/28S rDNA site on the chromosome 11. This ITS was presumably remnant of the chromosome rearrangement(s) leading to the genomic redistribution of the rDNA sequences. Comparative analysis of the cytogenetic data among several related salmonid species confirmed huge variation in the number and the chromosomal location of rRNA gene clusters in the Salvelinus genome.
The influence of enhanced UV-B radiation on Batrachium trichophyllum and Potamogeton alpinus -- aquatic macrophytes with amphibious character.

Science.gov (United States)

Germ, Mateja; Mazej, Zdenka; Gaberscik, Alenka; Häder, Donat P

2002-02-01

The responses of two amphibious species, Batrachium trichophyllum and Potamogeton alpinus to different UV-B environments were studied. Plant material from natural environments, as well as from outdoor treatments was examined. In long-term outdoor experiments plants were grown under three different levels of UV-B radiation: reduced and ambient UV-B levels, and a UV-B level simulating 17% ozone depletion. The following parameters were monitored: contents of total methanol soluble UV-absorbing compounds and chlorophyll a, terminal electron transport system (ETS) activity and optimal and effective quantum yield of photosystem II. No effect of the different UV-B levels on the measured parameters was observed. The amount of UV-B absorbing compounds seems to be saturated, since no differences were observed between treatments and no increase was found in peak season, when natural UV-B levels were the highest. Physiological measurements revealed no harmful effects; neither on potential and actual photochemical efficiency, nor on terminal ETS activity. The contents of UV-B absorbing compounds were examined also in plant material sampled in low and high altitude environments during the growth season. Both species exhibited no seasonal dynamics of production of UV-absorbing compounds. The contents were variable and showed no significant differences between high and low altitude populations.
Genetic Diversity and Hybridisation between Native and Introduced Salmonidae Fishes in a Swedish Alpine Lake.

Directory of Open Access Journals (Sweden)

Leanne Faulks

Understanding the processes underlying diversification can aid in formulating appropriate conservation management plans that help maintain the evolutionary potential of taxa, particularly under human-induced activities and climate change. Here we assessed the microsatellite genetic diversity and structure of three salmonid species, two native (Arctic charr, Salvelinus alpinus, and brown trout, Salmo trutta) and one introduced (brook charr, Salvelinus fontinalis), from an alpine lake in sub-arctic Sweden, Lake Ånn. The genetic diversity of the three species was similar and sufficiently high from a conservation genetics perspective: corrected total heterozygosity, H'T = 0.54, 0.66, 0.60 and allelic richness, AR = 4.93, 5.53 and 5.26 for Arctic charr, brown trout and brook charr, respectively. There were indications of elevated inbreeding coefficients in brown trout (GIS = 0.144) and brook charr (GIS = 0.129), although sibling relationships were likely a confounding factor, as a high proportion of siblings were observed in all species within and among sampling locations. Overall genetic structure differed between species, Fst = 0.01, 0.02 and 0.04 in Arctic charr, brown trout and brook charr respectively, and there was differentiation at only a few specific locations. There was clear evidence of hybridisation between the native Arctic charr and the introduced brook charr, with 6% of individuals being hybrids, all of which were sampled in tributary streams. The ecological and evolutionary consequences of the observed hybridisation are priorities for further research and the conservation of the evolutionary potential of native salmonid species.
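
To make the Fst statistic in this abstract more concrete, here is a minimal sketch of the textbook (uncorrected) definition, Fst = (HT − HS) / HT, computed for a single locus from allele frequencies. The allele frequencies below are hypothetical and are not data from the study, and the estimator the authors actually used (the abstract mentions corrected heterozygosity) may differ from this simple form.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity at one locus: 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def fst(subpops):
    """Uncorrected single-locus Fst = (HT - HS) / HT.

    `subpops` is a list of allele-frequency lists, one per sampling location,
    all listing the same alleles in the same order. Equal weighting of
    locations is assumed.
    """
    n = len(subpops)
    # HS: mean within-location expected heterozygosity.
    hs = sum(expected_heterozygosity(f) for f in subpops) / n
    # HT: expected heterozygosity of the pooled (mean) allele frequencies.
    n_alleles = len(subpops[0])
    mean_freqs = [sum(f[i] for f in subpops) / n for i in range(n_alleles)]
    ht = expected_heterozygosity(mean_freqs)
    return (ht - hs) / ht


if __name__ == "__main__":
    # Hypothetical two-allele frequencies at two sampling locations.
    example = [[0.7, 0.3], [0.5, 0.5]]
    print(f"Fst = {fst(example):.4f}")
```

Values near zero, such as those reported in the abstract, indicate that most genetic variation is found within sampling locations rather than between them.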
Use of wild trout for PBDE assessment in freshwater environments: Review and summary of critical factors

Directory of Open Access Journals (Sweden)

Juan M. Ríos

2015-11-01

Certain wild animals represent sentinels to address issues related to environmental pollution, since they can provide integrative data on both pollutant exposure and biological effects. Despite their technological benefits, PBDEs are considered a threat to environmental health due to their persistence, toxicity, and capacity to be accumulated. These pollutants have been found geographically widespread in fish, particularly in predator species such as trout. The aim of this work is to critically review the applicability and usefulness of wild trout for assessing PBDEs in freshwater environments. Reviewed reports include data from highly industrialized areas as well as areas from remote regions with relatively low human activity, including European and North American great lakes and freshwater environments in Europe, Greenland, subarctic areas and Patagonia, respectively. Relevant factors were grouped into organism-specific factors (food habits, age, size, lipid content, sex and reproduction, tissue type, mechanism of contaminant uptake and metabolism) and PBDE levels in the surrounding environment (sediment). Five wild trout species [rainbow trout (Oncorhynchus mykiss), brown trout (Salmo trutta), lake trout (Salvelinus namaycush), arctic char (Salvelinus alpinus), and brook trout (Salvelinus fontinalis)], collected worldwide within the 1994 to present time frame, were considered. Multivariate techniques (principal component analysis, PCA, and a mapping approach) showed clear differences in geographic distribution patterns of PBDE levels in trout depending on the region studied: wild trout from European and North American great lakes have the highest PBDE loads. This pattern could be due to high industrial activity at these locations. A correlational approach used to explore intraspecific relationships between PBDE levels and morphometry showed positive relationships only for brown trout. Further, brown trout showed the highest trout
Dwarf char, a new form of chars (the genus Salvelinus) in Lake Kronotskoe

Science.gov (United States)

Pavlov, S.D.; Pivovarov, E.A.; Ostberg, C.O.

2012-01-01

Lake Kronotskoe is situated in the Kronotskii State Nature Reserve and is a unique natural heritage of Kamchatka. The lake–river system of the reserve includes numerous springs and small streams and three large inflowing rivers, Listvennichnaya, Unana, and Uzon, which form the main bays of Lake Kronotskoe; one river (Kronotskaya) flows from the lake. This river is characterized by several rapids, which are assumed to be unsurmountable barriers for fish migration. The ichthyofauna of the lake has been isolated for a long time, and some endemic fishes appeared, including char of the genus Salvelinus and the residential form of red salmon Oncorhynchus nerka (the local name is kokanee). These species are perfect model objects to study microevolution processes. Char of Lake Kronotskoe are characterized by significant polymorphism and plasticity [1–3]; therefore, they are extremely valuable for studying the processes of speciation and form development. That is why the populations of char in Lake Kronotskoe are unique and attract special attention of researchers.
Allantoinase in lake trout (Salvelinus namaycush): In vitro effects of PCBs, DDT and metals

Science.gov (United States)

Passino, Dora R. May; Cotant, Carol A.

1979-01-01

1. Allantoinase, an enzyme in the purine-urea cycle, was found in livers of Salvelinus namaycush (Osteichthyes: Salmoniformes). 2. The enzyme was active from pH 6.6 to 8.2 at 37°C and from pH 7.4 to 9.0 at 10°C and had an Arrhenius energy of activation of 11.0 kcal/mol and a temperature quotient of 2.0. The Km of the enzyme homogenate was 8.4 mM allantoin. 3. The concentrations of inorganic metals at which 50% inhibition occurred during in vitro exposure were 6.0 mg/l Cu2+, 6.7 mg/l Cd2+, 34 mg/l Hg2+ and 52 mg/l Pb2+. The in vitro sensitivity to PCBs, DDT and DDE and kinetics in the presence of metals were determined. 4. Allantoinase activity was negatively correlated with body length for fish from Lake Michigan but not from Lake Superior or the laboratory.
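
The activation energy and temperature quotient reported here can be related through the Arrhenius equation, where Q10 = exp(Ea · 10 / (R · T1 · T2)) for a 10 K interval from T1 to T2. The sketch below checks that relationship for the reported Ea of 11.0 kcal/mol. The choice of temperature intervals is our assumption, since the abstract does not say over which range the quotient was measured, and the function name `q10_from_activation_energy` is hypothetical.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)
EA_KCAL_PER_MOL = 11.0          # Arrhenius activation energy from the abstract


def q10_from_activation_energy(ea_kcal_per_mol: float, t_low_celsius: float) -> float:
    """Q10 implied by the Arrhenius equation for the interval [t_low, t_low + 10 C]."""
    t1 = t_low_celsius + 273.15  # lower temperature in kelvin
    t2 = t1 + 10.0               # upper temperature in kelvin
    return math.exp(ea_kcal_per_mol * 10.0 / (R_KCAL_PER_MOL_K * t1 * t2))


if __name__ == "__main__":
    # Assumed intervals: starting at 10 C (one assay temperature in the abstract)
    # and ending at 37 C (the other assay temperature).
    for t_low in (10.0, 27.0):
        q10 = q10_from_activation_energy(EA_KCAL_PER_MOL, t_low)
        print(f"{t_low:.0f}-{t_low + 10:.0f} C: Q10 = {q10:.2f}")
```

For the interval starting at 10°C the implied Q10 is a little under 2, which is in line with the temperature quotient of 2.0 given in the abstract; the value falls somewhat at higher temperatures.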
Comparative analysis of total mercury concentrations in anadromous and non-anadromous Arctic charr (Salvelinus alpinus) from eastern Canada

International Nuclear Information System (INIS)

Velden, S. van der; Evans, M.S.; Dempson, J.B.; Muir, D.C.G.; Power, M.

2013-01-01

Previous research has documented that total mercury concentrations ([THg]) are lower in anadromous Arctic charr than in non-anadromous conspecifics, but the two life-history forms have rarely been studied together. Here, data from nine pairs of closely-located anadromous and non-anadromous Arctic charr populations were used to explore the impact of biological and life-history factors on individual [THg] across a range of latitudes (49–81° N) in eastern Canada. Unadjusted mean [THg] ranged from 20 to 114 ng/g wet weight (ww) in anadromous populations, and was significantly higher in non-anadromous populations, ranging from 111 to 227 ng/g ww. Within-population variations in [THg] were best explained by fish age, and were often positively related to fork-length and δ15N-inferred trophic level. Differences in [THg] were not related to differences in length-at-age (i.e., average somatic growth rate) among populations of either life-history type. Mercury concentrations were not related to site latitude in either the anadromous or non-anadromous fish. We conclude that the difference in Arctic charr [THg] with life-history type could not be explained by differences in fish age, fork-length, trophic position, or length-at-age, and discuss possible factors contributing to low mercury concentrations in anadromous, relative to freshwater, fish. - Highlights: ► Total mercury concentrations ([THg]) were measured in 9 co-located anadromous and non-anadromous Arctic charr populations. ► Mean [THg] in non-anadromous populations exceeded mean [THg] in spatially paired anadromous populations. ► Among-individual variation in [THg] was best explained by fish age. ► The lower [THg] in anadromous fish could not be explained by differences in age, fork-length, trophic level, or growth rate. ► Variations in Arctic charr [THg] were independent of latitude (49–81° N) in eastern Canada.
The effects of inbreeding on sperm quality traits in captive‐bred lake trout, Salvelinus namaycush (Walbaum, 1972)

DEFF Research Database (Denmark)

Johnson, K.; Butts, I. A. E.; Smith, J. L.

2015-01-01

The effects of inbreeding in both captive and wild‐caught species and populations have been reported to affect a wide variety of life history traits. Recently, the effects of inbreeding on reproductive traits such as sperm quality have become a subject of particular interest for conservation...... biology, evolutionary ecology, and management of captive populations. This study investigated the effects of inbreeding on sperm quality in a captive population of experimentally inbred and outbred lake trout, Salvelinus namaycush. It was found for moderately to highly inbred males (males with half......‐sib and full‐sib parents, respectively), that sperm quality traits (velocity, motility, linearity, longevity, spermatocrit and morphology) showed no apparent inbreeding depression. The apparent lack of inbreeding effects on sperm quality traits may be due to several factors including (i) no inbreeding...
Preference and avoidance pH of brook trout Salvelinus fontinalis and brown trout Salmo trutta exposed to different holding pH.

Science.gov (United States)

Fost, B A; Ferreri, C P

2015-08-01

The goal of this study was to determine if short-term exposure of brook trout Salvelinus fontinalis and brown trout Salmo trutta to a lower pH than found in their source stream results in a shift in preference or avoidance pH. The lack of a shift in preference or avoidance pH of adult S. fontinalis and S. trutta suggests that these species can be held at a pH different from the source waterbody for a short period of time without altering preference or avoidance pH behaviour. © 2015 The Fisheries Society of the British Isles.
Climate change and vulnerability of bull trout (Salvelinus confluentus) in a fire-prone landscape.

Science.gov (United States)

Falke, Jeffrey A.; Flitcroft, Rebecca L; Dunham, Jason B.; McNyset, Kristina M.; Hessburg, Paul F.; Reeves, Gordon H.

2015-01-01

Linked atmospheric and wildfire changes will complicate future management of native coldwater fishes in fire-prone landscapes, and new approaches to management that incorporate uncertainty are needed to address this challenge. We used a Bayesian network (BN) approach to evaluate population vulnerability of bull trout (Salvelinus confluentus) in the Wenatchee River basin, Washington, USA, under current and future climate and fire scenarios. The BN was based on modeled estimates of wildfire, water temperature, and physical habitat prior to, and following, simulated fires throughout the basin. We found that bull trout population vulnerability depended on the extent to which climate effects can be at least partially offset by managing factors such as habitat connectivity and fire size. Moreover, our analysis showed that local management can significantly reduce the vulnerability of bull trout to climate change given appropriate management actions. Tools such as our BN that explicitly integrate the linked nature of climate and wildfire, and incorporate uncertainty in both input data and vulnerability estimates, will be vital in effective future management to conserve native coldwater fishes.
Summer temperature metrics for predicting brook trout (Salvelinus fontinalis) distribution in streams

Science.gov (United States)

Parrish, Donna; Butryn, Ryan S.; Rizzo, Donna M.

2012-01-01

We developed a methodology to predict brook trout (Salvelinus fontinalis) distribution using summer temperature metrics as predictor variables. Our analysis used long-term fish and hourly water temperature data from the Dog River, Vermont (USA). Commonly used metrics (e.g., mean, maximum, maximum 7-day maximum) tend to smooth the data so information on temperature variation is lost. Therefore, we developed a new set of metrics (called event metrics) to capture temperature variation by describing the frequency, area, duration, and magnitude of events that exceeded a user-defined temperature threshold. We used 16, 18, 20, and 22°C. We built linear discriminant models and tested and compared the event metrics against the commonly used metrics. Correct classification of the observations was 66% with event metrics and 87% with commonly used metrics. However, combined event and commonly used metrics correctly classified 92%. Of the four individual temperature thresholds, it was difficult to assess which threshold had the “best” accuracy. The 16°C threshold had slightly fewer misclassifications; however, the 20°C threshold had the fewest extreme misclassifications. Our method leveraged the volumes of existing long-term data and provided a simple, systematic, and adaptable framework for monitoring changes in fish distribution, specifically in the case of irregular, extreme temperature events.
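The event metrics described in this abstract can be illustrated with a short sketch. This is not the authors' implementation: it assumes that an event is a run of consecutive hourly readings above the threshold, that magnitude is the peak exceedance in °C, and that area is the summed exceedance in degree-hours. The names `Event` and `find_events` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Event:
    start: int        # index of the first hourly reading above the threshold
    duration: int     # number of consecutive hours above the threshold
    magnitude: float  # peak exceedance over the threshold (deg C), assumed definition
    area: float       # summed exceedance (degree-hours), assumed definition


def _summarize(temps: Sequence[float], start: int, end: int, threshold: float) -> Event:
    """Summarize the run temps[start:end], all of which exceed the threshold."""
    excess = [t - threshold for t in temps[start:end]]
    return Event(start=start, duration=end - start,
                 magnitude=max(excess), area=sum(excess))


def find_events(temps: Sequence[float], threshold: float) -> List[Event]:
    """Return one Event per run of consecutive readings above the threshold."""
    events: List[Event] = []
    start = None
    for i, t in enumerate(temps):
        if t > threshold:
            if start is None:
                start = i              # a new event begins
        elif start is not None:
            events.append(_summarize(temps, start, i, threshold))
            start = None               # the event has ended
    if start is not None:              # series ended while still above threshold
        events.append(_summarize(temps, start, len(temps), threshold))
    return events


if __name__ == "__main__":
    hourly = [15, 17, 19, 17, 15, 21, 15]   # made-up readings, deg C
    for threshold in (16, 18, 20, 22):      # thresholds named in the abstract
        found = find_events(hourly, threshold)
        print(threshold, len(found), found)
```

Event frequency is then simply `len(find_events(...))` for a given threshold, and the per-event fields could feed a classifier such as the linear discriminant models mentioned above.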
Sexual difference in PCB concentrations of lake trout (Salvelinus namaycush) from Lake Ontario

Science.gov (United States)

Madenjian, Charles P.; Keir, Michael J.; Whittle, D. Michael; Noguchi, George E.

2010-01-01

We determined polychlorinated biphenyl (PCB) concentrations in 61 female lake trout (Salvelinus namaycush) and 71 male lake trout from Lake Ontario (Ontario, Canada and New York, United States). To estimate the expected change in PCB concentration due to spawning, PCB concentrations in gonads and in somatic tissue of lake trout were also determined. In addition, bioenergetics modeling was applied to investigate whether gross growth efficiency (GGE) differed between the sexes. Results showed that, on average, males were 22% higher in PCB concentration than females in Lake Ontario. Results from the PCB determinations of the gonads and somatic tissues revealed that shedding of the gametes led to 3% and 14% increases in PCB concentration for males and females, respectively. Therefore, shedding of the gametes could not explain the higher PCB concentration in male lake trout. According to the bioenergetics modeling results, GGE of males was about 2% higher than adult female GGE, on average. Thus, bioenergetics modeling could not explain the higher PCB concentrations exhibited by the males. Nevertheless, a sexual difference in GGE remained a plausible explanation for the sexual difference in PCB concentrations of the lake trout.
EPA Nanorelease Dataset

Data.gov (United States)

U.S. Environmental Protection Agency — EPA Nanorelease Dataset. This dataset is associated with the following publication: Wohlleben, W., C. Kingston, J. Carter, E. Sahle-Demessie, S. Vazquez-Campos, B....
Comparison of organotin accumulation on the white-spotted charr Salvelinus leucomaenis between sea-run and freshwater-resident types

Science.gov (United States)

Ohji, Madoka; Harino, Hiroya; Arai, Takaomi

2011-01-01

To examine the accumulation pattern of organotin compounds (OTs) in relation to the migration of diadromous fish, tributyltin (TBT) and triphenyltin (TPT) compounds and their derivatives were determined in the muscle tissue of both sea-run (anadromous) and freshwater-resident (nonanadromous) types of the white-spotted charr Salvelinus leucomaenis. There were generally no significant correlations between TBT and TPT accumulation and biological characteristics such as total length (TL), body weight (BW), age and sex in S. leucomaenis. It is noteworthy that the TBT and TPT concentrations in sea-run white-spotted charr were significantly higher than in freshwater-resident individuals, even though both types belong to the same species. These results suggest that sea-run S. leucomaenis have a higher ecological risk of TBT and TPT exposure than freshwater residents during their life history.
Land-locked Arctic charr (Salvelinus alpinus) population structure and lake morphometry in Greenland - is there a connection?

DEFF Research Database (Denmark)

Riget, F.; Jeppesen, E.; Landkildehus, F.

2000-01-01

with lake volume. Our study indicates that the charr population structure became more complex with increasing lake size. Moreover, the population structure seemed to be influenced by lake-water transparency and the presence or absence of three-spined stickleback (Gasterosteus aculeatus)....
Optimum temperature of a northern population of Arctic charr (Salvelinus alpinus) using heart rate Arrhenius breakpoint analysis

DEFF Research Database (Denmark)

Hansen, Aslak Kappel; Byriel, David Bille; R. Jensen, Mads

2017-01-01

± 0.4). The Q10 breakpoint occurred at an average of 7.1 °C ± 0.3. There was no significant difference between the breakpoint temperature found using Q10 and Arrhenius [two-sample t test, df = 16; p > 0.1]. The highest fHmax was found at 12.8 °C ± 1.0 reaching an average of 61.8 BPM ± 3.1. Arrhythmia...
At the forefront: evidence of the applicability of using environmental DNA to quantify the abundance of fish populations in natural lentic waters with additional sampling considerations

Science.gov (United States)

Klobucar, Stephen L.; Rodgers, Torrey W.; Budy, Phaedra

2017-01-01

Environmental DNA (eDNA) sampling has proven to be a valuable tool for detecting species in aquatic ecosystems. Within this rapidly evolving field, a promising application is the ability to obtain quantitative estimates of relative species abundance based on eDNA concentration rather than traditionally labor-intensive methods. We investigated the relationship between eDNA concentration and Arctic char (Salvelinus alpinus) abundance in five well-studied natural lakes; additionally, we examined the effects of different temporal (e.g., season) and spatial (e.g., depth) scales on eDNA concentration. Concentrations of eDNA were linearly correlated with char population estimates (r² = 0.78) and exponentially correlated with char densities (r² = 0.96 by area; 0.82 by volume). Across lakes, eDNA concentrations were greater and more homogeneous in the water column during mixis; however, when stratified, eDNA concentrations were greater in the hypolimnion. Overall, our findings demonstrate that eDNA techniques can produce effective estimates of relative fish abundance in natural lakes. These findings can guide future studies to improve and expand eDNA methods while informing research and management using rapid and minimally invasive sampling.

Use of the Rigor Mortis Process as a Tool for Better Understanding of Skeletal Muscle Physiology: Effect of the Ante-Mortem Stress on the Progression of Rigor Mortis in Brook Charr (Salvelinus fontinalis).

Science.gov (United States)

Diouf, Boucar; Rioux, Pierre

1999-01-01

Presents the rigor mortis process in brook charr (Salvelinus fontinalis) as a tool for better understanding skeletal muscle metabolism. Describes an activity that demonstrates how rigor mortis is related to the post-mortem decrease of muscular glycogen and ATP, how glycogen degradation produces lactic acid that lowers muscle pH, and how…
Proteomics dataset

DEFF Research Database (Denmark)

Bennike, Tue Bjerg; Carlsen, Thomas Gelsing; Ellingsen, Torkell

2017-01-01

The datasets presented in this article are related to the research articles entitled “Neutrophil Extracellular Traps in Ulcerative Colitis: A Proteome Analysis of Intestinal Biopsies” (Bennike et al., 2015 [1]), and “Proteome Analysis of Rheumatoid Arthritis Gut Mucosa” (Bennike et al., 2017 [2])...... been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifiers PXD001608 for ulcerative colitis and control samples, and PXD003082 for rheumatoid arthritis samples....
Multiscale hydrogeomorphic influences on bull trout (Salvelinus confluentus) spawning habitat

Science.gov (United States)

Bean, Jared R; Wilcox, Andrew C.; Woessner, William W.; Muhlfeld, Clint C.

2015-01-01

We investigated multiscale hydrogeomorphic influences on the distribution and abundance of bull trout (Salvelinus confluentus) spawning in snowmelt-dominated streams of the upper Flathead River basin, northwestern Montana. Within our study reaches, bull trout tended to spawn in the finest available gravel substrates. Analysis of the mobility of these substrates, based on one-dimensional hydraulic modeling and calculation of dimensionless shear stresses, indicated that bed materials in spawning reaches would be mobilized at moderate (i.e., 2-year recurrence interval) high-flow conditions, although the asynchronous timing of the fall–winter egg incubation period and typical late spring – early summer snowmelt high flows in our study area may limit susceptibility to redd scour under current hydrologic regimes. Redd occurrence also tended to be associated with concave-up bedforms (pool tailouts) with downwelling intragravel flows. Streambed temperatures tracked stream water diurnal temperature cycles to a depth of at least 25 cm, averaging 6.1–8.1 °C in different study reaches during the spawning period. Ground water provided thermal moderation of stream water for several high-density spawning reaches. Bull trout redds were more frequent in unconfined alluvial valley reaches (8.5 versus 5.0 redds·km−1 in confined valley reaches), which were strongly influenced by hyporheic and groundwater – stream water exchange. A considerable proportion of redds were patchily distributed in confined valley reaches, however, emphasizing the influence of local physical conditions in supporting bull trout spawning habitat. Moreover, narrowing or “bounding” of these alluvial valley segments did not appear to be important. Our results suggest that geomorphic, thermal, and hydrological factors influence bull trout spawning occurrence at multiple spatial scales.
Dynamics of individual growth in a recovering population of lake trout (Salvelinus namaycush)

Science.gov (United States)

Fabrizio, Mary C.; Dorazio, Robert M.; Schram, Stephen T.

2001-01-01

In 1976, the Wisconsin Department of Natural Resources established a refuge for a nearly depleted population of lake trout (Salvelinus namaycush) at Gull Island Shoal, Lake Superior. The refuge was intended to reduce fishing mortality by protecting adult lake trout. We examined the growth dynamics of these lake trout during the period of recovery by comparing estimates of individual growth before and after the refuge was established. Our estimates are based on an annual mark-recapture survey conducted at the spawning area since 1969. We developed a model that allowed mean growth rates to differ among individuals of different sizes and that accommodated variation in growth rates of individuals of the same size. Likelihood ratio tests were used to determine if the mean growth increments of lake trout changed after the refuge was established. Our results suggest that growth of mature lake trout (particularly wild fish) decreased significantly in the postrefuge period. This decreased growth may have been associated with a reduction in food availability. We also observed reductions in growth as wild fish grew older and larger, which suggests that the growth of these fish may be adequately approximated by a von Bertalanffy growth model if it becomes possible to obtain accurate ages.
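For readers unfamiliar with the von Bertalanffy growth model mentioned at the end of this abstract, the sketch below shows its standard length-at-age form and the Fabens increment form, which is commonly used with mark-recapture data when ages are unknown. It is an illustration only and is not the model fitted by the authors; the function names and all parameter values are hypothetical.

```python
import math


def vb_length_at_age(age: float, l_inf: float, k: float, t0: float = 0.0) -> float:
    """Standard von Bertalanffy curve: L(t) = L_inf * (1 - exp(-k * (t - t0)))."""
    return l_inf * (1.0 - math.exp(-k * (age - t0)))


def fabens_increment(length_at_marking: float, years_at_large: float,
                     l_inf: float, k: float) -> float:
    """Expected growth increment between marking and recapture (Fabens form).

    dL = (L_inf - L1) * (1 - exp(-k * dt)); no age is required, and the
    expected increment shrinks as the fish approaches L_inf.
    """
    return (l_inf - length_at_marking) * (1.0 - math.exp(-k * years_at_large))


if __name__ == "__main__":
    # Hypothetical parameters chosen only to show the shape of the curve.
    l_inf, k = 900.0, 0.15
    for length in (500.0, 600.0, 700.0, 800.0):
        print(length, round(fabens_increment(length, 1.0, l_inf, k), 1))
```

The increment form reproduces the pattern reported in the abstract, in which growth declines as fish grow larger, without needing the accurate ages that the authors note were unavailable.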
Seasonal Variations in Relative Weight of Lake Trout (Salvelinus namaycush), Kokanee Salmon (Oncorhynchus nerka), Rainbow Trout (Oncorhynchus mykiss), and Brown Trout (Salmo trutta) in Blue Mesa Reservoir, Colorado

OpenAIRE

Midas, Madeline; Williams, Asia; Cooper, Cindy; Courtney, Michael

2013-01-01

Blue Mesa Reservoir is the largest body of water in Colorado and is located on the western slope of the Rocky Mountains at an elevation of 7520 feet. Blue Mesa Reservoir contains recreationally important populations of lake trout (Salvelinus namaycush), kokanee salmon (Oncorhynchus nerka), rainbow trout (Oncorhynchus mykiss), and brown trout (Salmo trutta). A management challenge in recent years has been the overpopulation of lake trout, which has led to a steep decline in abundance of kokan...
RARD: The Related-Article Recommendation Dataset

OpenAIRE

Beel, Joeran; Carevic, Zeljko; Schaible, Johann; Neusch, Gabor

2017-01-01

Recommender-system datasets are used for recommender-system evaluations, training machine-learning algorithms, and exploring user behavior. While there are many datasets for recommender systems in the domains of movies, books, and music, there are rather few datasets from research-paper recommender systems. In this paper, we introduce RARD, the Related-Article Recommendation Dataset, from the digital library Sowiport and the recommendation-as-a-service provider Mr. DLib. The dataset contains ...
Isfahan MISP Dataset.

Science.gov (United States)

Kashefpur, Masoud; Kafieh, Rahele; Jorjandi, Sahar; Golmohammadi, Hadis; Khodabande, Zahra; Abbasi, Mohammadreza; Teifuri, Nilufar; Fakharzadeh, Ali Akbar; Kashefpoor, Maryam; Rabbani, Hossein

2017-01-01

An online depository was introduced to share clinical ground truth with the public and provide open access for researchers to evaluate their computer-aided algorithms. PHP was used for web programming and MySQL for database managing. The website was entitled "biosigdata.com." It was a fast, secure, and easy-to-use online database for medical signals and images. Freely registered users could download the datasets and could also share their own supplementary materials while maintaining their privacies (citation and fee). Commenting was also available for all datasets, and automatic sitemap and semi-automatic SEO indexing have been set for the site. A comprehensive list of available websites for medical datasets is also presented as a Supplementary (http://journalonweb.com/tempaccess/4800.584.JMSS_55_16I3253.pdf).
Hypervariability of ribosomal DNA at multiple chromosomal sites in lake trout (Salvelinus namaycush).

Science.gov (United States)

Zhuo, L; Reed, K M; Phillips, R B

1995-06-01

Variation in the intergenic spacer (IGS) of the ribosomal DNA (rDNA) of lake trout (Salvelinus namaycush) was examined. Digestion of genomic DNA with restriction enzymes showed that almost every individual had a unique combination of length variants with most of this variation occurring within rather than between populations. Sequence analysis of a 2.3 kilobase (kb) EcoRI-DraI fragment spanning the 3' end of the 28S coding region and approximately 1.8 kb of the IGS revealed two blocks of repetitive DNA. Putative transcriptional termination sites were found approximately 220 bases (b) downstream from the end of the 28S coding region. Comparison of the 2.3-kb fragments with two longer (3.1 kb) fragments showed that the major difference in length resulted from variation in the number of short (89 b) repeats located 3' to the putative terminator. Repeat units within a single nucleolus organizer region (NOR) appeared relatively homogeneous and genetic analysis found variants to be stably inherited. A comparison of the number of spacer-length variants with the number of NORs found that the number of length variants per individual was always less than the number of NORs. Examination of spacer variants in five populations showed that populations with more NORs had more spacer variants, indicating that variants are present at different rDNA sites on nonhomologous chromosomes.
Hiding in Plain Sight: A Case for Cryptic Metapopulations in Brook Trout (Salvelinus fontinalis).

Directory of Open Access Journals (Sweden)

David C Kazyak

A fundamental issue in the management and conservation of biodiversity is how to define a population. Spatially contiguous fish occupying a stream network have often been considered to represent a single, homogenous population. However, they may also represent multiple discrete populations, a single population with genetic isolation-by-distance, or a metapopulation. We used microsatellite DNA and a large-scale mark-recapture study to assess population structure in a spatially contiguous sample of Brook Trout (Salvelinus fontinalis), a species of conservation concern. We found evidence for limited genetic exchange across small spatial scales and in the absence of barriers to physical movement. Mark-recapture and stationary passive integrated transponder antenna records demonstrated that fish from two tributaries very seldom moved into the opposite tributary, but movements between the tributaries and mainstem were more common. Using Bayesian genetic clustering, we identified two genetic groups that exhibited significantly different growth rates over three years of study, yet survival rates were very similar. Our study highlights the importance of considering the possibility of multiple genetically distinct populations occurring within spatially contiguous habitats, and suggests the existence of a cryptic metapopulation: a spatially continuous distribution of organisms exhibiting metapopulation-like behaviors.
Tests of size and growth effects on Arctic charr (Salvelinus alpinus) otolith δ18O and δ13C values.

Science.gov (United States)

Burbank, J; Kelly, B; Nilsson, J; Power, M

2018-06-06

Otolith δ18O and δ13C values have been used extensively to reconstruct thermal and diet histories. Researchers have suggested that individual growth rate and size may have an effect on otolith isotope ratios and subsequently confound otolith based thermal and diet reconstructions. As few explicit tests of the effect of fish in freshwater environments exist, here we determine experimentally the potential for related growth rate and size effects on otolith δ18O and δ13C values. Fifty Arctic charr were raised in identical conditions for two years after which their otoliths were removed and analyzed for their δ18O and δ13C values. The potential effects of final length and the Thermal Growth Coefficient (TGC) on otolith isotope ratios were tested using correlation and regression analysis to determine if significant effects were present and to quantify effects when present. The analyses indicated that TGC and size had significant and similar positive non-linear relationships with δ13C values and explained 35% and 42% of the variability, respectively. Conversely, both TGC and size were found to have no significant correlation with otolith δ18O values. There was no significant correlation between δ18O and δ13C values. The investigation indicated the presence of linked growth rate and size effects on otolith δ13C values, the nature of which requires further study. Otolith δ18O values were unaffected by individual growth rate and size, confirming the applicability of applying these values to thermal reconstructions of fish habitat.
Open University Learning Analytics dataset.

Science.gov (United States)

Kuzilek, Jakub; Hlosta, Martin; Zdrahal, Zdenek

2017-11-28

Learning Analytics focuses on the collection and analysis of learners' data to improve their learning experience by providing informed guidance and to optimise learning materials. To support the research in this area we have developed a dataset, containing data from courses presented at the Open University (OU). What makes the dataset unique is the fact that it contains demographic data together with aggregated clickstream data of students' interactions in the Virtual Learning Environment (VLE). This enables the analysis of student behaviour, represented by their actions. The dataset contains the information about 22 courses, 32,593 students, their assessment results, and logs of their interactions with the VLE represented by daily summaries of student clicks (10,655,280 entries). The dataset is freely available at https://analyse.kmi.open.ac.uk/open_dataset under a CC-BY 4.0 license.
Relationship of otolith strontium-to-calcium ratios and salinity: Experimental validation for juvenile salmonids

Science.gov (United States)

Zimmerman, C.E.

2005-01-01

Analysis of otolith strontium (Sr) or strontium-to-calcium (Sr:Ca) ratios provides a powerful tool to reconstruct the chronology of migration among salinity environments for diadromous salmonids. Although use of this method has been validated by examination of known individuals and translocation experiments, it has never been validated under controlled experimental conditions. In this study, incorporation of otolith Sr was tested across a range of salinities and resulting levels of ambient Sr and Ca concentrations in juvenile chinook salmon (Oncorhynchus tshawytscha), coho salmon (Oncorhynchus kisutch), sockeye salmon (Oncorhynchus nerka), rainbow trout (Oncorhynchus mykiss), and Arctic char (Salvelinus alpinus). Experimental water was mixed, using stream water and seawater as end members, to create experimental salinities of 0.1, 6.3, 12.7, 18.6, 25.5, and 33.0 psu. Otolith Sr and Sr:Ca ratios were significantly related to salinity for all species (r² range: 0.80-0.91) but provide only enough predictive resolution to discriminate among fresh water, brackish water, and saltwater residency. These results validate the use of otolith Sr:Ca ratios to broadly discriminate salinity histories encountered by salmonids but highlight the need for further research concerning the influence of osmoregulation and physiological changes associated with smolting on otolith microchemistry.
Net trophic transfer efficiencies of polychlorinated biphenyl congeners to lake trout (Salvelinus namaycush) from its prey

Science.gov (United States)

Madenjian, Charles P.; David, Solomon R.; Rediske, Richard R.; O’Keefe, James P.

2012-01-01

Lake trout (Salvelinus namaycush) were fed bloater (Coregonus hoyi) in eight laboratory tanks over a 135-d experiment. At the start of the experiment, four to nine fish in each tank were sacrificed, and the concentrations of 75 polychlorinated biphenyl (PCB) congeners within these fish were determined. Polychlorinated biphenyl congener concentrations were also determined in the 10 lake trout remaining in each of the eight tanks at the end of the experiment as well as in the bloater fed to the lake trout. Each lake trout was weighed at the start and the end of the experiment, and the amount of food eaten by the lake trout was recorded. Using these measurements, net trophic transfer efficiency (γ) from the bloater to the lake trout in each of the eight tanks was calculated for each of the 75 congeners. Results showed that γ did not vary significantly with the degree of chlorination of the PCB congeners, and γ averaged 0.66 across all congeners. However, γ did show a slight, but significant, decrease as log KOW increased from 6.0 to 8.2. Activity level of the lake trout did not have a significant effect on γ.
Bioenergetic evaluation of diel vertical migration by bull trout (Salvelinus confluentus) in a thermally stratified reservoir

Science.gov (United States)

Eckmann, Madeleine; Dunham, Jason B.; Connor, Edward J.; Welch, Carmen A.

2018-01-01

Many species living in deeper lentic ecosystems exhibit daily movements that cycle through the water column, generally referred to as diel vertical migration (DVM). In this study, we applied bioenergetics modelling to evaluate growth as a hypothesis to explain DVM by bull trout (Salvelinus confluentus) in a thermally stratified reservoir (Ross Lake, WA, USA) during the peak of thermal stratification in July and August. Bioenergetics model parameters were derived from observed vertical distributions of temperature, prey and bull trout. Field sampling confirmed that bull trout prey almost exclusively on recently introduced redside shiner (Richardsonius balteatus). Model predictions revealed that deeper (>25 m) DVMs commonly exhibited by bull trout during peak thermal stratification cannot be explained by maximising growth. Survival, another common explanation for DVM, may have influenced bull trout depth use, but observations suggest there may be additional drivers of DVM. We propose these deeper summertime excursions may be partly explained by an alternative hypothesis: the importance of colder water for gametogenesis. In Ross Lake, reliance of bull trout on warm water prey (redside shiner) for consumption and growth poses a potential trade-off with the need for colder water for gametogenesis.
Mridangam stroke dataset

OpenAIRE

CompMusic

2014-01-01

The audio examples were recorded from a professional Carnatic percussionist in semi-anechoic studio conditions by Akshay Anantapadmanabhan using SM-58 microphones and an H4n ZOOM recorder. The audio was sampled at 44.1 kHz and stored as 16 bit wav files. The dataset can be used for training models for each Mridangam stroke. A detailed description of the Mridangam and its strokes can be found in the paper below. A part of the dataset was used in the following paper. Akshay Anantapadman...
Understanding environmental DNA detection probabilities: A case study using a stream-dwelling char Salvelinus fontinalis

Science.gov (United States)

Wilcox, Taylor M; Mckelvey, Kevin S.; Young, Michael K.; Sepulveda, Adam; Shepard, Bradley B.; Jane, Stephen F; Whiteley, Andrew R.; Lowe, Winsor H.; Schwartz, Michael K.

2016-01-01

Environmental DNA sampling (eDNA) has emerged as a powerful tool for detecting aquatic animals. Previous research suggests that eDNA methods are substantially more sensitive than traditional sampling. However, the factors influencing eDNA detection and the resulting sampling costs are still not well understood. Here we use multiple experiments to derive independent estimates of eDNA production rates and downstream persistence from brook trout (Salvelinus fontinalis) in streams. We use these estimates to parameterize models comparing the false negative detection rates of eDNA sampling and traditional backpack electrofishing. We find that using the protocols in this study eDNA had reasonable detection probabilities at extremely low animal densities (e.g., probability of detection 0.18 at densities of one fish per stream kilometer) and very high detection probabilities at population-level densities (e.g., probability of detection > 0.99 at densities of ≥ 3 fish per 100 m). This is substantially more sensitive than traditional electrofishing for determining the presence of brook trout and may translate into important cost savings when animals are rare. Our findings are consistent with a growing body of literature showing that eDNA sampling is a powerful tool for the detection of aquatic species, particularly those that are rare and difficult to sample using traditional methods.
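A per-sample detection probability such as the 0.18 quoted in this abstract can be turned into a sampling-effort estimate. The sketch below assumes that replicate samples are independent and share the same detection probability, which is a simplification that the abstract does not assert. The function names are hypothetical.

```python
import math


def cumulative_detection(p_single: float, n_samples: int) -> float:
    """Probability of at least one detection in n independent samples."""
    return 1.0 - (1.0 - p_single) ** n_samples


def samples_needed(p_single: float, target: float) -> int:
    """Smallest number of independent samples whose cumulative detection
    probability reaches the target."""
    if not (0.0 < p_single < 1.0 and 0.0 < target < 1.0):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_single))


if __name__ == "__main__":
    p = 0.18  # per-sample detection probability at one fish per stream kilometer
    n = samples_needed(p, 0.95)
    print(n, round(cumulative_detection(p, n), 3))
```

Under these assumptions the false negative rate after n samples is (1 - p)^n, which is the quantity that would be compared between eDNA sampling and electrofishing when weighing sampling costs.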
2008 TIGER/Line Nationwide Dataset

Data.gov (United States)

California Natural Resource Agency — This dataset contains a nationwide build of the 2008 TIGER/Line datasets from the US Census Bureau downloaded in April 2009. The TIGER/Line Shapefiles are an extract...
Design of an audio advertisement dataset

Science.gov (United States)

Fu, Yutao; Liu, Jihong; Zhang, Qi; Geng, Yuting

2015-12-01

Since more and more advertisements swarm into radios, it is necessary to establish an audio advertising dataset which could be used to analyze and classify the advertisement. A method of how to establish a complete audio advertising dataset is presented in this paper. The dataset is divided into four different kinds of advertisements. Each advertisement's sample is given in *.wav file format, and annotated with a txt file which contains its file name, sampling frequency, channel number, broadcasting time and its class. The classifying rationality of the advertisements in this dataset is proved by clustering the different advertisements based on Principal Component Analysis (PCA). The experimental results show that this audio advertisement dataset offers a reliable set of samples for correlative audio advertisement experimental studies.
Background qualitative analysis of the European reference life cycle database (ELCD) energy datasets - part II: electricity datasets.

Science.gov (United States)

Garraín, Daniel; Fazio, Simone; de la Rúa, Cristina; Recchioni, Marco; Lechón, Yolanda; Mathieux, Fabrice

2015-01-01

The aim of this paper is to identify areas of potential improvement of the European Reference Life Cycle Database (ELCD) electricity datasets. The revision is based on the data quality indicators described by the International Life Cycle Data system (ILCD) Handbook, applied on a sectorial basis. These indicators evaluate the technological, geographical and time-related representativeness of the dataset and its appropriateness in terms of completeness, precision and methodology. Results show that the ELCD electricity datasets are of very good quality in general terms; nevertheless, some findings and recommendations for improving the quality of Life-Cycle Inventories have been derived. Moreover, these results assure any LCA practitioner of the quality of the electricity-related datasets, and provide insights into the limitations and assumptions underlying the dataset modelling. Given this information, the LCA practitioner will be able to decide whether the use of the ELCD electricity datasets is appropriate based on the goal and scope of the analysis to be conducted. The methodological approach would also be useful for dataset developers and reviewers, in order to improve the overall Data Quality Requirements of databases.
The GTZAN dataset

DEFF Research Database (Denmark)

Sturm, Bob L.

2013-01-01

The GTZAN dataset appears in at least 100 published works, and is the most-used public dataset for evaluation in machine listening research for music genre recognition (MGR). Our recent work, however, shows GTZAN has several faults (repetitions, mislabelings, and distortions), which challenge...... of GTZAN, and provide a catalog of its faults. We review how GTZAN has been used in MGR research, and find few indications that its faults have been known and considered. Finally, we rigorously study the effects of its faults on evaluating five different MGR systems. The lesson is not to banish GTZAN...

Basal mercury concentrations and biomagnification rates in freshwater and marine food webs: Effects on Arctic charr (Salvelinus alpinus) from eastern Canada

International Nuclear Information System (INIS)

Velden, S. van der; Dempson, J.B.; Evans, M.S.; Muir, D.C.G.; Power, M.

2013-01-01

Patterns of total Hg (THg) and methyl Hg (MeHg) biomagnification were investigated in six pairs of co-located lacustrine and marine food webs supporting a common predator, Arctic charr. Mercury biomagnification rates (the slope of log Hg concentration versus δ15N-inferred trophic level) did not differ significantly between the two feeding habitats for either THg or MeHg, but THg and MeHg concentrations at the base of the food web were higher in the lacustrine environment than in the marine environment. The proportion of THg as MeHg was related to trophic level, and the relationship was statistically similar in the lacustrine and marine habitats. The biomagnification rate of MeHg exceeded that of THg in both habitats. We conclude that the known difference in Hg concentration between anadromous and non-anadromous Arctic charr is driven by differential Hg concentrations at the base of the lacustrine and marine foodwebs, and not by differential biomagnification rates. - Highlights: ► Concentrations of total mercury ([THg]) and methylmercury ([MeHg]) were measured in 6 paired lacustrine and marine food webs. ► Biomagnification rates (slopes of [THg] or [MeHg] versus δ15N-inferred trophic level) were similar in the two habitat types. ► Mercury concentrations at the base of the food web were higher in lacustrine than in marine food webs. ► The percentage of methylated mercury increased with trophic level similarly in the two habitat types. ► The biomagnification rate of MeHg exceeded that of THg in both habitats
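The definition of biomagnification rate used in this abstract (the slope of log Hg concentration against trophic level) can be sketched as an ordinary least-squares fit. This is a minimal sketch, not the authors' analysis: the base-10 logarithm, the function name, and all numeric values are assumptions for illustration, and trophic levels are taken as given rather than derived from δ15N.

```python
import math


def biomagnification_slope(trophic_levels, hg_concentrations):
    """Return (slope, intercept) of log10(Hg) regressed on trophic level."""
    if len(trophic_levels) != len(hg_concentrations) or len(trophic_levels) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    log_hg = [math.log10(c) for c in hg_concentrations]
    n = len(trophic_levels)
    mean_x = sum(trophic_levels) / n
    mean_y = sum(log_hg) / n
    sxx = sum((x - mean_x) ** 2 for x in trophic_levels)
    if sxx == 0:
        raise ValueError("trophic levels must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(trophic_levels, log_hg))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical concentrations for illustration only (not data from the study).
levels = [2.0, 3.0, 4.0]
marine_hg = [0.01, 0.1, 1.0]
lacustrine_hg = [0.03, 0.3, 3.0]

marine_slope, marine_base = biomagnification_slope(levels, marine_hg)
lake_slope, lake_base = biomagnification_slope(levels, lacustrine_hg)
```

In this toy case the two slopes are equal while the lacustrine intercept is higher, which mirrors the pattern the abstract describes: similar biomagnification rates, different concentrations at the base of the food web.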
Ontogenetic dynamics of infection with Diphyllobothrium spp. cestodes in sympatric Arctic charr Salvelinus alpinus (L.) and brown trout Salmo trutta L.

Science.gov (United States)

Henrickson, Eirik H.; Knudsen, Rune; Kristoffersen, Roar; Kuris, Armand M.; Lafferty, Kevin D.; Siwertsson, Anna; Amundsen, Per-Arne

2016-01-01

The trophic niches of Arctic charr and brown trout differ when the species occur in sympatry. Their trophically transmitted parasites are expected to reflect these differences. Here, we investigate how the infections of Diphyllobothrium dendriticum and D. ditremum differ between charr and trout. These tapeworms use copepods as their first intermediate hosts and fish can become infected as second intermediate hosts by consuming either infected copepods or infected fish. We examined 767 charr and 368 trout for Diphyllobothrium plerocercoids in a subarctic lake. The prevalence of D. ditremum was higher in charr (61.5%) than in trout (39.5%), but the prevalence of D. dendriticum was higher in trout (31.2%) than in charr (19.3%). Diphyllobothrium spp. intensities were elevated in trout compared to charr, particularly for D. dendriticum. Large fish with massive parasite burdens were responsible for the high Diphyllobothrium spp. loads in trout. We hypothesize that fish prey may be the most important source for the Diphyllobothrium spp. infections in trout, whereas charr predominantly acquire Diphyllobothrium spp. by feeding on copepods. Our findings support previous suggestions that the ability to establish in a second piscine host is greater for D. dendriticum than for D. ditremum.
Breaking the speed limit--comparative sprinting performance of brook trout (Salvelinus fontinalis) and brown trout (Salmo trutta)

Science.gov (United States)

Castro-Santos, Theodore; Sanz-Ronda, Francisco Javier; Ruiz-Legazpi, Jorge

2013-01-01

Sprinting behavior of free-ranging fish has long been thought to exceed that of captive fish. Here we present data from wild-caught brook trout (Salvelinus fontinalis) and brown trout (Salmo trutta), volitionally entering and sprinting against high-velocity flows in an open-channel flume. Performance of the two species was nearly identical, with the species attaining absolute speeds > 25 body lengths·s−1. These speeds far exceed previously published observations for any salmonid species and contribute to the mounting evidence that commonly accepted estimates of swimming performance are low. Brook trout demonstrated two distinct modes in the relationship between swim speed and fatigue time, similar to the shift from prolonged to sprint mode described by other authors, but in this case occurring at speeds > 19 body lengths·s−1. This is the first demonstration of multiple modes of sprint swimming at such high swim speeds. Neither species optimized for distance maximization, however, indicating that physiological limits alone are poor predictors of swimming performance. By combining distributions of volitional swim speeds with endurance, we were able to account for >80% of the variation in distance traversed by both species.
Bull trout (Salvelinus confluentus) telemetry and associated habitat data collected in a geodatabase from the upper Boise River, southwestern Idaho

Science.gov (United States)

MacCoy, Dorene E.; Shephard, Zachary M.; Benjamin, Joseph R.; Vidergar, Dmitri T.; Prisciandaro, Anthony F.

2017-03-23

Bull trout (Salvelinus confluentus), listed as threatened under the Endangered Species Act, are among the more thermally sensitive of coldwater species in North America. The Boise River upstream of Arrowrock Dam in southwestern Idaho (including Arrowrock Reservoir) provides habitat for one of the southernmost populations of bull trout. The presence of the species in Arrowrock Reservoir poses implications for dam and reservoir operations. From 2011 to 2014, the Bureau of Reclamation and the U.S. Geological Survey collected fish telemetry data to improve understanding of bull trout distribution and movement in Arrowrock Reservoir and in the upper Boise River tributaries. The U.S. Geological Survey compiled the telemetry (fish location) data, along with reservoir elevation, river discharge, precipitation, and water-quality data in a geodatabase. The geodatabase includes metadata compliant with Federal Geographic Data Committee content standards. The Bureau of Reclamation plans to incorporate the data in a decision‑support tool for reservoir management.
Editorial: Datasets for Learning Analytics

NARCIS (Netherlands)

Dietze, Stefan; George, Siemens; Davide, Taibi; Drachsler, Hendrik

2018-01-01

The European LinkedUp and LACE (Learning Analytics Community Exchange) projects have been responsible for setting up a series of data challenges at the LAK conferences 2013 and 2014 around the LAK dataset. The LAK dataset consists of a rich collection of full text publications in the domain of
Differences in organotin accumulation in relation to life history in the white-spotted charr Salvelinus leucomaenis

International Nuclear Information System (INIS)

Ohji, Madoka; Harino, Hiroya; Arai, Takaomi

2011-01-01

Research highlights: → Otolith Sr:Ca ratios in sea-run type were higher than those in freshwater-residents. → TBT and TPT concentrations in sea-run type were higher than those in freshwater-residents. → Sea-run type have higher risk of TBT and TPT than freshwater-residents in white-spotted charr. - Abstract: To examine the accumulation pattern of organotins (OTs) in relation to the migration of diadromous fish, tributyltin (TBT) and triphenyltin (TPT) and their derivatives were determined in the muscle tissue of both sea-run (anadromous) and freshwater-resident (nonanadromous) types of the white-spotted charr Salvelinus leucomaenis. Ontogenic changes in otolith strontium (Sr) and calcium (Ca) concentrations were examined along a life history transect to discriminate migration type. The mean Sr:Ca ratio from the core to the edge of the otolith in sea-run individuals was significantly higher than that in freshwater-resident ones. There were no significant correlations in S. leucomaenis between OT accumulation and various biological characteristics. It is noteworthy that TBT and TPT concentrations in sea-run individuals were significantly higher than those in freshwater-resident individuals, although both belong to the same species. These results suggest that sea-run S. leucomaenis have a higher ecological risk of OT exposure than freshwater-residents during their life histories.
The Geometry of Finite Equilibrium Datasets

DEFF Research Database (Denmark)

Balasko, Yves; Tvede, Mich

We investigate the geometry of finite datasets defined by equilibrium prices, income distributions, and total resources. We show that the equilibrium condition imposes no restrictions if total resources are collinear, a property that is robust to small perturbations. We also show that the set...... of equilibrium datasets is pathconnected when the equilibrium condition does impose restrictions on datasets, as for example when total resources are widely non collinear....
Foraging mechanisms of siscowet lake trout (Salvelinus namaycush siscowet) on pelagic prey

Science.gov (United States)

Keyler, Trevor D.; Hrabik, Thomas R.; Austin, C. Lee; Gorman, Owen T.; Mensinger, Allen F.

2015-01-01

The reaction distance, angle of attack, and foraging success were determined for siscowet lake trout (Salvelinus namaycush siscowet) during laboratory trials under lighting conditions that approximated downwelling spectral irradiance and intensity (9.00 × 10^8–1.06 × 10^14 photons m−2 s−1) at daytime depths. Siscowet reaction distance in response to golden shiners (Notemigonus crysoleucas) was directly correlated with increasing light intensity until saturation at 1.86 × 10^11 photons m−2 s−1, above which reaction distance was constant within the range of tested light intensities. At the lowest tested light intensity, sensory detection was sufficient to locate prey at 25 ± 2 cm, while increasing light intensities increased reaction distance up to 59 ± 2 cm at 1.06 × 10^14 photons m−2 s−1. Larger prey elicited higher reaction distances than smaller prey at all light intensities while moving prey elicited higher reaction distances than stationary prey at the higher light intensities (6.00 × 10^9 to 1.06 × 10^14 photons m−2 s−1). The capture and consumption of prey similarly increased with increasing light intensity while time to capture decreased with increasing light intensity. The majority of orientations toward prey occurred within 120° of the longitudinal axis of the siscowet's eyes, although reaction distances among 30° increments along the entire axis were not significantly different. The developed predictive model will help determine reaction distances for siscowet in various photic environments and will help identify the mechanisms and behavior that allow for low light intensity foraging within freshwater systems.
An Annotated Dataset of 14 Meat Images

DEFF Research Database (Denmark)

Stegmann, Mikkel Bille

2002-01-01

This note describes a dataset consisting of 14 annotated images of meat. Points of correspondence are placed on each image. As such, the dataset can be readily used for building statistical models of shape. Further, format specifications and terms of use are given.
Extensive feeding on sockeye salmon Oncorhynchus nerka smolts by bull trout Salvelinus confluentus during initial outmigration into a small, unregulated and inland British Columbia river

Science.gov (United States)

Furey, Nathan B.; Hinch, Scott G.; Lotto, A.G.; Beauchamp, David A.

2015-01-01

Stomach contents were collected and analysed from 22 bull trout Salvelinus confluentus at the edge of the Chilko Lake and Chilko River in British Columbia, Canada, during spring outmigration of sockeye salmon Oncorhynchus nerka smolts. Twenty of the 22 (>90%) stomachs contained prey items; virtually all identifiable prey items were outmigrant O. nerka smolts, and stomach contents represented a large portion (0·0–12·6%) of estimated S. confluentus mass. The results demonstrate nearly exclusive and intense feeding by S. confluentus on outmigrant smolts, and support recent telemetry observations of high disappearance rates of O. nerka smolts leaving large natural lake systems prior to entering high-order unregulated river systems.
Comparison of recent SnIa datasets

International Nuclear Information System (INIS)

Sanchez, J.C. Bueno; Perivolaropoulos, L.; Nesseris, S.

2009-01-01

We rank the six latest Type Ia supernova (SnIa) datasets (Constitution (C), Union (U), ESSENCE (Davis) (E), Gold06 (G), SNLS 1yr (S) and SDSS-II (D)) in the context of the Chevalier-Polarski-Linder (CPL) parametrization w(a) = w_0 + w_1 (1 − a), according to their Figure of Merit (FoM), their consistency with the cosmological constant (ΛCDM), their consistency with standard rulers (Cosmic Microwave Background (CMB) and Baryon Acoustic Oscillations (BAO)) and their mutual consistency. We find a significant improvement of the FoM (defined as the inverse area of the 95.4% parameter contour) with the number of SnIa of these datasets ((C) highest FoM, (U), (G), (D), (E), (S) lowest FoM). Standard rulers (CMB+BAO) have a better FoM by about a factor of 3, compared to the highest FoM SnIa dataset (C). We also find that the ranking sequence based on consistency with ΛCDM is identical with the corresponding ranking based on consistency with standard rulers ((S) most consistent, (D), (C), (E), (U), (G) least consistent). The ranking sequence of the datasets however changes when we consider the consistency with an expansion history corresponding to evolving dark energy (w_0, w_1) = (−1.4, 2) crossing the phantom divide line w = −1 (it is practically reversed to (G), (U), (E), (S), (D), (C)). The SALT2 and MLCS2k2 fitters are also compared and some peculiar features of the SDSS-II dataset when standardized with the MLCS2k2 fitter are pointed out. Finally, we construct a statistic to estimate the internal consistency of a collection of SnIa datasets. We find that even though there is good consistency among most samples taken from the above datasets, this consistency decreases significantly when the Gold06 (G) dataset is included in the sample.
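The CPL parametrization quoted in this abstract, w(a) = w_0 + w_1 (1 − a), is simple enough to evaluate directly. The sketch below is illustrative only: the function names are ours, and the conversion a = 1/(1 + z) is the standard relation between scale factor and redshift rather than something stated in the abstract. It locates where the evolving dark energy model (w_0, w_1) = (−1.4, 2) crosses the phantom divide w = −1.

```python
def cpl_w(a: float, w0: float, w1: float) -> float:
    """Dark energy equation of state in the CPL parametrization."""
    return w0 + w1 * (1.0 - a)


def scale_factor(z: float) -> float:
    """Scale factor corresponding to redshift z (a = 1 today)."""
    return 1.0 / (1.0 + z)


def phantom_crossing(w0: float, w1: float):
    """Scale factor at which w(a) = -1, or None if there is no
    crossing in the interval 0 < a <= 1."""
    if w1 == 0:
        return None  # w is constant, so it never crosses a fixed value
    a_cross = 1.0 - (-1.0 - w0) / w1
    return a_cross if 0.0 < a_cross <= 1.0 else None


# Parameter values taken from the abstract.
a_cross = phantom_crossing(-1.4, 2.0)   # scale factor of the crossing
z_cross = 1.0 / a_cross - 1.0           # corresponding redshift
```

Solving −1 = −1.4 + 2(1 − a) by hand gives a = 0.8, that is z = 0.25, so this model is phantom (w < −1) today and crosses the divide at modest redshift. For ΛCDM, (w_0, w_1) = (−1, 0), the function returns w = −1 at every scale factor.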
SIMADL: Simulated Activities of Daily Living Dataset

Directory of Open Access Journals (Sweden)

Talal Alshammari

2018-04-01

With the realisation of the Internet of Things (IoT) paradigm, the analysis of the Activities of Daily Living (ADLs) in a smart home environment is becoming an active research domain. The existence of representative datasets is a key requirement to advance the research in smart home design. Such datasets are an integral part of the visualisation of new smart home concepts as well as the validation and evaluation of emerging machine learning models. Machine learning techniques that can learn ADLs from sensor readings are used to classify, predict and detect anomalous patterns. Such techniques require data that represent relevant smart home scenarios, for training, testing and validation. However, the development of such machine learning techniques is limited by the lack of real smart home datasets, due to the excessive cost of building real smart homes. This paper provides two datasets for classification and anomaly detection. The datasets are generated using OpenSHS (Open Smart Home Simulator), which is a simulation software for dataset generation. OpenSHS records the daily activities of a participant within a virtual environment. Seven participants simulated their ADLs for different contexts, e.g., weekdays, weekends, mornings and evenings. Eighty-four files in total were generated, representing approximately 63 days' worth of activities. Forty-two files of classification of ADLs were simulated in the classification dataset and the other forty-two files are for anomaly detection problems in which anomalous patterns were simulated and injected into the anomaly detection dataset.
The NOAA Dataset Identifier Project

Science.gov (United States)

de la Beaujardiere, J.; Mccullough, H.; Casey, K. S.

2013-12-01

The US National Oceanic and Atmospheric Administration (NOAA) initiated a project in 2013 to assign persistent identifiers to datasets archived at NOAA and to create informational landing pages about those datasets. The goals of this project are to enable the citation of datasets used in products and results in order to help provide credit to data producers, to support traceability and reproducibility, and to enable tracking of data usage and impact. A secondary goal is to encourage the submission of datasets for long-term preservation, because only archived datasets will be eligible for a NOAA-issued identifier. A team was formed with representatives from the National Geophysical, Oceanographic, and Climatic Data Centers (NGDC, NODC, NCDC) to resolve questions including which identifier scheme to use (answer: Digital Object Identifier - DOI), whether or not to embed semantics in identifiers (no), the level of granularity at which to assign identifiers (as coarsely as reasonable), how to handle ongoing time-series data (do not break into chunks), creation mechanism for the landing page (stylesheet from formal metadata record preferred), and others. Decisions made and implementation experience gained will inform the writing of a Data Citation Procedural Directive to be issued by the Environmental Data Management Committee in 2014. Several identifiers have been issued as of July 2013, with more on the way. NOAA is now reporting the number as a metric to federal Open Government initiatives. This paper will provide further details and status of the project.
Control Measure Dataset

Data.gov (United States)

U.S. Environmental Protection Agency — The EPA Control Measure Dataset is a collection of documents describing air pollution control available to regulated facilities for the control and abatement of air...
The Kinetics Human Action Video Dataset

OpenAIRE

Kay, Will; Carreira, Joao; Simonyan, Karen; Zhang, Brian; Hillier, Chloe; Vijayanarasimhan, Sudheendra; Viola, Fabio; Green, Tim; Back, Trevor; Natsev, Paul; Suleyman, Mustafa; Zisserman, Andrew

2017-01-01

We describe the DeepMind Kinetics human action video dataset. The dataset contains 400 human action classes, with at least 400 video clips for each action. Each clip lasts around 10s and is taken from a different YouTube video. The actions are human focussed and cover a broad range of classes including human-object interactions such as playing instruments, as well as human-human interactions such as shaking hands. We describe the statistics of the dataset, how it was collected, and give some ...
Comparison of CORA and EN4 in-situ datasets validation methods, toward a better quality merged dataset.

Science.gov (United States)

Szekely, Tanguy; Killick, Rachel; Gourrion, Jerome; Reverdin, Gilles

2017-04-01

CORA and EN4 are both global delayed time mode validated in-situ ocean temperature and salinity datasets distributed by the Met Office (http://www.metoffice.gov.uk/) and Copernicus (www.marine.copernicus.eu). A large part of the profiles distributed by CORA and EN4 in recent years are Argo profiles from the ARGO DAC, but profiles are also extracted from the World Ocean Database and TESAC profiles from GTSPP. In the case of CORA, data coming from the EUROGOOS Regional Operational Observing System (ROOS) operated by European institutes not managed by National Data Centres and other datasets of profiles provided by scientific sources can also be found (sea mammal profiles from MEOP, XBT datasets from cruises ...). (EN4 also takes data from the ASBO dataset to supplement observations in the Arctic.) The first advantage of this new merged product is enhanced space and time coverage at global and European scales for the period from 1950 until a year before the current year. This product is updated once a year, and T&S gridded fields are also generated for the period 1990 to year n-1. The enhancement compared to the previous CORA product will be presented. Despite the fact that the profiles distributed by both datasets are mostly the same, the quality control procedures developed by the Met Office and Copernicus teams differ, sometimes leading to different quality control flags for the same profile. A new study, started in 2016, aims to compare both validation procedures to move towards a Copernicus Marine Service dataset with the best features of CORA and EN4 validation. A reference dataset composed of the full set of in-situ temperature and salinity measurements collected by Coriolis during 2015 is used. These measurements were made with a wide range of instruments (XBTs, CTDs, Argo floats, instrumented sea mammals, ...), covering the global ocean. The reference dataset has been validated simultaneously by both teams. An exhaustive comparison of the
New parasites and predators follow the introduction of two fish species to a subarctic lake: implications for food-web structure and functioning

Science.gov (United States)

Amundsen, Per-Arne; Lafferty, Kevin D.; Knudsen, Rune; Primicerio, Raul; Kristoffersen, Roar; Klemetsen, Anders; Kuris, Armand M.

2012-01-01

Introduced species can alter the topology of food webs. For instance, an introduction can aid the arrival of free-living consumers using the new species as a resource, while new parasites may also arrive with the introduced species. Food-web responses to species additions can thus be far more complex than anticipated. In a subarctic pelagic food web with free-living and parasitic species, two fish species (arctic charr Salvelinus alpinus and three-spined stickleback Gasterosteus aculeatus) have known histories as deliberate introductions. The effects of these introductions on the food web were explored by comparing the current pelagic web with a heuristic reconstruction of the pre-introduction web. Extinctions caused by these introductions could not be evaluated by this approach. The introduced fish species have become important hubs in the trophic network, interacting with numerous parasites, predators and prey. In particular, five parasite species and four predatory bird species depend on the two introduced species as obligate trophic resources in the pelagic web and could therefore not have been present in the pre-introduction network. The presence of the two introduced fish species and the arrival of their associated parasites and predators increased biodiversity, mean trophic level, linkage density, and nestedness; altering both the network structure and functioning of the pelagic web. Parasites, in particular trophically transmitted species, had a prominent role in the network alterations that followed the introductions.
Seasonal Variations in the Use of Profundal Habitat among Freshwater Fishes in Lake Norsjø, Southern Norway, and Subsequent Effects on Fish Mercury Concentrations

Directory of Open Access Journals (Sweden)

Tom Robin Olk

2016-11-01

This study is based on monthly sampling of fish from grates mounted at an industrial water intake, located at a depth of 50 m in Lake Norsjø (Southern Norway), during the year 2014, to investigate seasonal variations in the use of the profundal habitat and subsequent variations in total Hg concentrations in profundal fish. Data on various fish present in a cold and dark hypolimnion of a large, deep, dimictic lake within the upper temperate zone of the Northern Hemisphere are rare. While predominant species such as A. charr (Salvelinus alpinus) and E. smelt (Osmerus eperlanus) were continuously present in this habitat, whitefish (Coregonus lavaretus) occupied this habitat primarily during wintertime, while other common species like brown trout (Salmo trutta), perch (Perca fluviatilis) and northern pike (Esox lucius) were almost absent. Besides stomach analyses (diet) and biometry, stable isotope analyses (δ15N and δ13C) and total mercury (Tot-Hg) analyses were carried out on the caught fish. The δ13C signature and stomach analyses revealed a combined profundal-pelagic diet for all three species, with A. charr having the most profundal-based diet. Length was the strongest predictor for Hg in whitefish and A. charr, while age was the strongest explanatory variable for Hg in E. smelt. A. charr was the only species exhibiting seasonal variation in Hg, which was highest during winter and spring.
Fish community status in Norwegian lakes in relation to acidification: a comparison between interviews and actual catches by test fishing

Energy Technology Data Exchange (ETDEWEB)

Hesthagen, T.; Berger, H.M.; Larsen, B.M. (Norwegian Inst. for Nature Research, Trondheim (Norway)); Rosseland, B.O. (Norwegian Inst. for Water Research, Oslo (Norway))

1993-01-01

Inquiries are used to obtain information on fish community status in terms of unchanged, reduced and lost communities, to assess the effects of acidification in lakes. The aim of this paper was to investigate the validity of this method by comparing fish status with actual catches on standard gill net series (CPUE). Data from 230 test fishing incidents comprising 357 stocks of 7 different fish species are presented. We found significant differences in CPUE between perceived fish status categories for brown trout (Salmo trutta), arctic char (Salvelinus alpinus) and perch (Perca fluviatilis), for which sufficient data were available. A discriminant analysis revealed that for stocks reported as unchanged and lost, the predicted membership ranged between 60.0-72.1 % respectively. However, a dominant fraction (50.0-66.7 %) of stocks reported as reduced were assigned as lost. Stocks which have been declining for less than 10 years, had a significantly higher CPUE than stocks which have declined for a longer period of time. Another apparent change in population characteristics was an acidification induced increase in mean weight for fish affected stocks. It is suggested that interviews tend to overestimate the current fish status. This is discussed in relation to a time lag between the damage and the time when it became apparent to fishermen, and rapid decline in population numbers. 26 refs, 2 figs, 4 tabs
The uranium in Kvanefjeld

International Nuclear Information System (INIS)

Nielsen, J.S.

1983-08-01

The report is a final thesis at the study of biology at the University of Copenhagen. It examines on a theoretical basis a number of possible environmental effects from a uranium mining and milling project under consideration at the Kvanefjeld site near Narssaq in South Greenland. An introductory description and discussion of the advantages and limitations of ecological baseline studies and dose commitment assessments as a tool for planning and decision making is given. The leaching and atmospheric dispersion of particles, heavy metals, radionuclides and other elements from future waste rock and ore piles as well as from mill tailings at the Kvanefjeld site are analysed and discussed. Also, the mobility, transport and accumulation of potentially toxic elements in local terrestrial and aquatic ecosystems and food chains are examined. The resulting human burden is discussed, with special attention given to the impact on the local population from eating lamb and seafood. A special quantitative analysis of the dispersion and biological uptake of fluoride, which is found in high concentrations in the ore, is given, focusing on the possible human intake of fluoride-polluted arctic char (Salvelinus alpinus) caught in Narssaq River. The report concludes by considering the long term problems of controlling mill tailings, discussing among other things the long term human exposure to radon and thoron daughters. (author)

Fluxnet Synthesis Dataset Collaboration Infrastructure

Energy Technology Data Exchange (ETDEWEB)

Agarwal, Deborah A. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Humphrey, Marty [Univ. of Virginia, Charlottesville, VA (United States); van Ingen, Catharine [Microsoft. San Francisco, CA (United States); Beekwilder, Norm [Univ. of Virginia, Charlottesville, VA (United States); Goode, Monte [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Jackson, Keith [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Rodriguez, Matt [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Weber, Robin [Univ. of California, Berkeley, CA (United States)

2008-02-06

The Fluxnet synthesis dataset originally compiled for the La Thuile workshop contained approximately 600 site years. Since the workshop, several additional site years have been added and the dataset now contains over 920 site years from over 240 sites. A data refresh update is expected to increase those numbers in the next few months. The ancillary data describing the sites continues to evolve as well. There are on the order of 120 site contacts and 60 proposals have been approved to use the data. These proposals involve around 120 researchers. The size and complexity of the dataset and collaboration have led to a new approach to providing access to the data and collaboration support. The support team attended the workshop and worked closely with the attendees and the Fluxnet project office to define the requirements for the support infrastructure. As a result of this effort, a new website (http://www.fluxdata.org) has been created to provide access to the Fluxnet synthesis dataset. This new web site is based on a scientific data server which enables browsing of the data on-line, data download, and version tracking. We leverage database and data analysis tools such as OLAP data cubes and web reports to enable browser and Excel pivot table access to the data.
Simulation of Smart Home Activity Datasets

Directory of Open Access Journals (Sweden)

Jonathan Synnott

2015-06-01

A globally ageing population is resulting in an increased prevalence of chronic conditions which affect older adults. Such conditions require long-term care and management to maximize quality of life, placing an increasing strain on healthcare resources. Intelligent environments such as smart homes facilitate long-term monitoring of activities in the home through the use of sensor technology. Access to sensor datasets is necessary for the development of novel activity monitoring and recognition approaches. Access to such datasets is limited due to issues such as sensor cost, availability and deployment time. The use of simulated environments and sensors may address these issues and facilitate the generation of comprehensive datasets. This paper provides a review of existing approaches for the generation of simulated smart home activity datasets, including model-based approaches and interactive approaches which implement virtual sensors, environments and avatars. The paper also provides recommendations for future work in intelligent environment simulation.
Simulation of Smart Home Activity Datasets.

Science.gov (United States)

Synnott, Jonathan; Nugent, Chris; Jeffers, Paul

2015-06-16

A globally ageing population is resulting in an increased prevalence of chronic conditions which affect older adults. Such conditions require long-term care and management to maximize quality of life, placing an increasing strain on healthcare resources. Intelligent environments such as smart homes facilitate long-term monitoring of activities in the home through the use of sensor technology. Access to sensor datasets is necessary for the development of novel activity monitoring and recognition approaches. Access to such datasets is limited due to issues such as sensor cost, availability and deployment time. The use of simulated environments and sensors may address these issues and facilitate the generation of comprehensive datasets. This paper provides a review of existing approaches for the generation of simulated smart home activity datasets, including model-based approaches and interactive approaches which implement virtual sensors, environments and avatars. The paper also provides recommendations for future work in intelligent environment simulation.
Solar Integration National Dataset Toolkit | Grid Modernization | NREL

Science.gov (United States)

NREL is working on a Solar Integration National Dataset (SIND) Toolkit to enable researchers to perform U.S. regional solar generation integration studies. It will provide modeled, coherent subhourly solar power data
PROVIDING GEOGRAPHIC DATASETS AS LINKED DATA IN SDI

Directory of Open Access Journals (Sweden)

E. Hietanen

2016-06-01

In this study, a prototype service to provide data from a Web Feature Service (WFS) as linked data is implemented. At first, persistent and unique Uniform Resource Identifiers (URIs) are created for all spatial objects in the dataset. The objects are available from those URIs in Resource Description Framework (RDF) data format. Next, a Web Ontology Language (OWL) ontology is created to describe the dataset information content using the Open Geospatial Consortium’s (OGC) GeoSPARQL vocabulary. The existing data model is modified in order to take into account the linked data principles. The implemented service produces an HTTP response dynamically. The data for the response is first fetched from the existing WFS. Then the Geography Markup Language (GML) format output of the WFS is transformed on-the-fly to the RDF format. Content negotiation is used to serve the data in different RDF serialization formats. This solution facilitates the use of a dataset in different applications without replicating the whole dataset. In addition, individual spatial objects in the dataset can be referred to by URIs. Furthermore, the needed information content of the objects can be easily extracted from the RDF serializations available from those URIs. A solution for linking data objects to the dataset URI is also introduced by using the Vocabulary of Interlinked Datasets (VoID). The dataset is divided into subsets and each subset is given its persistent and unique URI. This enables the whole dataset to be explored with a web browser and all individual objects to be indexed by search engines.
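The content negotiation step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`parse_accept`, `negotiate`), the set of supported media types, and the Turtle default are all assumptions chosen for illustration. The sketch only shows how a service might pick an RDF serialization from an HTTP `Accept` header.

```python
# Hypothetical sketch: choose an RDF serialization from an HTTP Accept header.
# The supported types and the default below are assumptions, not taken from
# the paper.
SUPPORTED = {
    "text/turtle": "Turtle",
    "application/rdf+xml": "RDF/XML",
    "application/ld+json": "JSON-LD",
    "application/n-triples": "N-Triples",
}
DEFAULT = "text/turtle"


def parse_accept(header):
    """Return (media_type, q) pairs sorted by descending q value."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs.append((media_type, q))
    # sorted() is stable, so entries with equal q keep their header order
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def negotiate(accept_header):
    """Return the media type to serve, or None if nothing acceptable."""
    for media_type, q in parse_accept(accept_header or "*/*"):
        if q <= 0:
            continue
        if media_type in SUPPORTED:
            return media_type
        if media_type == "*/*":
            return DEFAULT
    return None


if __name__ == "__main__":
    print(negotiate("application/rdf+xml;q=0.8, text/turtle"))
    print(negotiate("text/html"))
```

A real service would also handle partial wildcards such as `application/*` and would answer `406 Not Acceptable` when `negotiate` returns `None`; both are left out to keep the sketch short.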
Genetically based population divergence in overwintering energy mobilization in brook charr (Salvelinus fontinalis).

Science.gov (United States)

Crespel, Amélie; Bernatchez, Louis; Garant, Dany; Audet, Céline

2013-03-01

Investigating the nature of physiological traits potentially related to fitness is important towards a better understanding of how species and/or populations may respond to selective pressures imposed by contrasting environments. In northern species in particular, the ability to mobilize energy reserves to compensate for the low external energy intake during winter is crucial. However, the phenotypic and genetic bases of energy reserve accumulation and mobilization have rarely been investigated, especially pertaining to variation in strategy adopted by different populations. In the present study, we documented variation in several energy reserve variables and estimated their quantitative genetic basis to test the null hypothesis of no difference in variation at those traits among three strains of brook charr (Salvelinus fontinalis) and their reciprocal hybrids. Our results indicate that the strategy of winter energy preparation and mobilization was specific to each strain, whereby (1) domestic fish accumulated a higher amount of energy reserves before winter and kept accumulating liver glycogen during winter despite lower feeding; (2) Laval fish used liver glycogen and lipids during winter and experienced a significant decrease in condition factor; (3) Rupert fish had relatively little energy reserves accumulated at the end of fall and preferentially mobilized visceral fat during winter. Significant heritability for traits related to the accumulation and use of energy reserves was found in the domestic and Laval but not in the Rupert strain. Genetic and phenotypic correlations also varied among strains, which suggested population-specific genetic architecture underlying the expression of these traits. Hybrids showed limited evidence of non-additive effects. Overall, this study provides the first evidence of a genetically based, and likely adaptive, population-specific strategy for energy mobilization related to overwinter survival.
Wind Integration National Dataset Toolkit | Grid Modernization | NREL

Science.gov (United States)

The Wind Integration National Dataset (WIND) Toolkit is an update and expansion of the Eastern Wind Integration Data Set and Western Wind Integration Data Set. It supports the next generation of wind integration studies. WIND
A New Outlier Detection Method for Multidimensional Datasets

KAUST Repository

Abdel Messih, Mario A.

2012-07-01

This study develops a novel hybrid method for outlier detection (HMOD) that combines the ideas of distance-based and density-based methods. The proposed method has two main advantages over most other outlier detection methods. The first advantage is that it works well on both dense and sparse datasets. The second advantage is that, unlike most other outlier detection methods that require careful parameter setting and prior knowledge of the data, HMOD is not very sensitive to small changes in parameter values within certain parameter ranges. The only parameter that must be set is the number of nearest neighbors. In addition, we made a fully parallelized implementation of HMOD that made it very efficient in applications. Moreover, we proposed a new way of using outlier detection for redundancy reduction in datasets, in which users can specify the confidence level that measures how accurately the less redundant dataset represents the original dataset. HMOD is evaluated on synthetic datasets (dense and mixed “dense and sparse”) and on a bioinformatics problem of redundancy reduction of a dataset of position weight matrices (PWMs) of transcription factor binding sites. In addition, in the process of assessing the performance of our redundancy reduction method, we developed a simple tool that can be used to evaluate the confidence level of a reduced dataset representing the original dataset. The evaluation of the results shows that our method can be used in a wide range of problems.
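The abstract does not spell out the HMOD algorithm, so the following is not HMOD. It is a generic sketch, under stated assumptions, of how a distance-based score and a density-based score can both be derived from a single parameter `k` (the number of nearest neighbors). The function name `knn_scores` and both score definitions are hypothetical and chosen only for illustration.

```python
import math


def knn_scores(points, k):
    """Return (distance_scores, density_ratios) for a list of points.

    distance score: mean distance from a point to its k nearest neighbors.
    density ratio:  that mean distance divided by the average of the same
                    quantity over the point's k nearest neighbors (a
                    simplified, LOF-like ratio). Values well above 1 suggest
                    the point lies in a sparser region than its neighbors.
    These definitions are illustrative assumptions, not the HMOD method.
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")

    neighbors = []
    mean_dist = []
    for i, p in enumerate(points):
        # Brute-force neighbor search; fine for small illustrative inputs.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        nearest = dists[:k]
        neighbors.append([j for _, j in nearest])
        mean_dist.append(sum(d for d, _ in nearest) / k)

    ratios = []
    for i in range(n):
        local = sum(mean_dist[j] for j in neighbors[i]) / k
        if local > 0:
            ratios.append(mean_dist[i] / local)
        elif mean_dist[i] > 0:
            ratios.append(float("inf"))
        else:
            ratios.append(1.0)  # point and neighbors are all duplicates
    return mean_dist, ratios


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    dist_scores, density_ratios = knn_scores(pts, k=2)
    print(dist_scores)
    print(density_ratios)
```

In this toy input the isolated point `(10, 10)` receives the largest value under both scores, while the four clustered points all have a density ratio of 1.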
Spatial and temporal movement dynamics of brook Salvelinus fontinalis and brown trout Salmo trutta

Science.gov (United States)

Davis, L.A.; Wagner, Tyler; Barton, Meredith L.

2015-01-01

Native eastern brook trout Salvelinus fontinalis and naturalized brown trout Salmo trutta occur sympatrically in many streams across the brook trout’s native range in the eastern United States. Understanding within- and among-species variability in movement, including correlates of movement, has implications for management and conservation. We radio tracked 55 brook trout and 45 brown trout in five streams in a north-central Pennsylvania, USA watershed to quantify the movement of brook trout and brown trout during the fall and early winter to (1) evaluate the late-summer, early winter movement patterns of brook trout and brown trout, (2) determine correlates of movement and if movement patterns varied between brook trout and brown trout, and (3) evaluate genetic diversity of brook trout within and among study streams, and relate findings to telemetry-based observations of movement. Average total movement was greater for brown trout (mean ± SD = 2,924 ± 4,187 m) than for brook trout (mean ± SD = 1,769 ± 2,194 m). Although there was a large amount of among-fish variability in the movement of both species, the majority of movement coincided with the onset of the spawning season, and a threshold effect was detected between stream flow and movement, with movement increasing abruptly for both species during positive flow events. Microsatellite analysis of brook trout revealed findings consistent with those from radio-tracking, indicating a moderate to high degree of gene flow among brook trout populations. Seasonal movement patterns and the potential for relatively large movements of brook and brown trout highlight the importance of considering stream connectivity when restoring and protecting fish populations and their habitats.
NP-PAH Interaction Dataset

Data.gov (United States)

U.S. Environmental Protection Agency — Dataset presents concentrations of organic pollutants, such as polyaromatic hydrocarbon compounds, in water samples. Water samples of known volume and concentration...
A dataset on tail risk of commodities markets.

Science.gov (United States)

Powell, Robert J; Vo, Duc H; Pham, Thach N; Singh, Abhay K

2017-12-01

This article contains the datasets related to the research article "The long and short of commodity tails and their relationship to Asian equity markets" (Powell et al., 2017) [1]. The datasets contain the daily prices (and price movements) of 24 different commodities decomposed from the S&P GSCI index and the daily prices (and price movements) of three share market indices including World, Asia, and South East Asia for the period 2004-2015. The dataset is then divided into annual periods, showing the worst 5% of price movements for each year. The datasets are convenient for examining the tail risk of different commodities as measured by Conditional Value at Risk (CVaR), as well as how that risk changes over time. The datasets can also be used to investigate the association between commodity markets and share markets.
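As a rough illustration of the tail measure named in this abstract, the sketch below computes a historical CVaR as the mean of the worst 5% of a series of daily price movements. The function name `cvar`, the rounding rule for the tail size, and the sample values are assumptions; the article's own calculation is not described in the abstract and may differ (for example in sign convention).

```python
def cvar(returns, alpha=0.05):
    """Historical Conditional Value at Risk.

    Returns the mean of the worst `alpha` share of observations, where
    "worst" means the most negative price movements. The tail always holds
    at least one observation. This is a generic textbook estimator and an
    assumption here, not necessarily the procedure used in the dataset
    article.
    """
    if not returns:
        raise ValueError("returns must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(returns)  # most negative first
    n_tail = max(1, round(len(ordered) * alpha))
    tail = ordered[:n_tail]
    return sum(tail) / n_tail


if __name__ == "__main__":
    # Made-up daily price movements: 40 observations, so the 5% tail holds 2.
    sample = [-0.10, -0.06] + [0.01] * 38
    print(cvar(sample))
```

Applied year by year, this would give one tail-risk figure per commodity per annual period, which matches the way the abstract describes the data as being split.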
Proteomics dataset

DEFF Research Database (Denmark)

Bennike, Tue Bjerg; Carlsen, Thomas Gelsing; Ellingsen, Torkell

2017-01-01

patients (Morgan et al., 2012; Abraham and Medzhitov, 2011; Bennike, 2014) [8–10]. Therefore, we characterized the proteome of colon mucosa biopsies from 10 inflammatory bowel disease ulcerative colitis (UC) patients, 11 gastrointestinal healthy rheumatoid arthritis (RA) patients, and 10 controls. We...... been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifiers PXD001608 for ulcerative colitis and control samples, and PXD003082 for rheumatoid arthritis samples....
Comparison of Shallow Survey 2012 Multibeam Datasets

Science.gov (United States)

Ramirez, T. M.

2012-12-01

The purpose of the Shallow Survey common dataset is a comparison of the different technologies utilized for data acquisition in the shallow survey marine environment. The common dataset consists of a series of surveys conducted over a common area of seabed using a variety of systems. It provides equipment manufacturers the opportunity to showcase their latest systems while giving hydrographic researchers and scientists a chance to test their latest algorithms on the dataset so that rigorous comparisons can be made. Five companies (Kongsberg, Reson, R2Sonic, GeoAcoustics, and Applied Acoustics) collected data for the common dataset in the Wellington Harbor area in New Zealand between May 2010 and May 2011. Wellington Harbor and the surrounding coastal area were selected since they have a number of well-defined features, including the HMNZS South Seas and HMNZS Wellington wrecks, an armored seawall constructed of Tetrapods and Akmons, aquifers, wharves and marinas. The seabed inside the harbor basin is largely fine-grained sediment, with gravel and reefs around the coast. The area outside the harbor on the southern coast is an active environment, with moving sand and exposed reefs. A marine reserve is also in this area. For consistency between datasets, the coastal research vessel R/V Ikatere and crew were used for all surveys conducted for the common dataset. Using Triton's Perspective processing software, multibeam datasets collected for the Shallow Survey were processed for detailed analysis. Datasets from each sonar manufacturer were processed using the CUBE algorithm developed by the Center for Coastal and Ocean Mapping/Joint Hydrographic Center (CCOM/JHC). Each dataset was gridded at 0.5 and 1.0 meter resolutions for cross comparison and compliance with International Hydrographic Organization (IHO) requirements. Detailed comparisons were made of equipment specifications (transmit frequency, number of beams, beam width), data density, total uncertainty, and
National Hydrography Dataset (NHD)

Data.gov (United States)

Kansas Data Access and Support Center — The National Hydrography Dataset (NHD) is a feature-based database that interconnects and uniquely identifies the stream segments or reaches that comprise the...
The Harvard organic photovoltaic dataset.

Science.gov (United States)

Lopez, Steven A; Pyzer-Knapp, Edward O; Simm, Gregor N; Lutzow, Trevor; Li, Kewei; Seress, Laszlo R; Hachmann, Johannes; Aspuru-Guzik, Alán

2016-09-27

The Harvard Organic Photovoltaic Dataset (HOPV15) presented in this work is a collation of experimental photovoltaic data from the literature, and corresponding quantum-chemical calculations performed over a range of conformers, each with quantum chemical results using a variety of density functionals and basis sets. It is anticipated that this dataset will be of use in both relating electronic structure calculations to experimental observations through the generation of calibration schemes, as well as for the creation of new semi-empirical methods and the benchmarking of current and future model chemistries for organic electronic applications.
Tables and figure datasets

Data.gov (United States)

U.S. Environmental Protection Agency — Soil and air concentrations of asbestos in Sumas study. This dataset is associated with the following publication: Wroble, J., T. Frederick, A. Frame, and D....
PHYSICS PERFORMANCE AND DATASET (PPD)

CERN Multimedia

L. Silvestris

2013-01-01

The first part of the Long Shutdown period has been dedicated to the preparation of the samples for the analysis targeting the summer conferences. In particular, the 8 TeV data acquired in 2012, including most of the “parked datasets”, have been reconstructed profiting from improved alignment and calibration conditions for all the sub-detectors. Careful planning of the resources was essential in order to deliver the datasets well in time to the analysts, and to schedule the update of all the conditions and calibrations needed at the analysis level. The newly reprocessed data have undergone detailed scrutiny by the Dataset Certification team, making it possible to recover some of the data for analysis usage and further improving the certification efficiency, which is now at 91% of the recorded luminosity. With the aim of delivering a consistent dataset for 2011 and 2012, both in terms of conditions and release (53X), the PPD team is now working to set up a data re-reconstruction and a new MC pro...
Integrated Surface Dataset (Global)

Data.gov (United States)

National Oceanic and Atmospheric Administration, Department of Commerce — The Integrated Surface Dataset (ISD) is composed of worldwide surface weather observations from over 35,000 stations, though the best spatial coverage is...
Aaron Journal article datasets

Data.gov (United States)

U.S. Environmental Protection Agency — All figures used in the journal article are in netCDF format. This dataset is associated with the following publication: Sims, A., K. Alapaty, and S. Raman....
Market Squid Ecology Dataset

Data.gov (United States)

National Oceanic and Atmospheric Administration, Department of Commerce — This dataset contains ecological information collected on the major adult spawning and juvenile habitats of market squid off California and the US Pacific Northwest....

ATLAS File and Dataset Metadata Collection and Use

CERN Document Server

Albrand, S; The ATLAS collaboration; Lambert, F; Gallas, E J

2012-01-01

The ATLAS Metadata Interface (“AMI”) was designed as a generic cataloguing system, and as such it has found many uses in the experiment including software release management, tracking of reconstructed event sizes and control of dataset nomenclature. The primary use of AMI is to provide a catalogue of datasets (file collections) which is searchable using physics criteria. In this paper we discuss the various mechanisms used for filling the AMI dataset and file catalogues. By correlating information from different sources we can derive aggregate information which is important for physics analysis; for example, the total number of events contained in a dataset, and possible reasons for missing events such as a lost file. Finally we will describe some specialized interfaces which were developed for the Data Preparation and reprocessing coordinators. These interfaces manipulate information from both the dataset domain held in AMI, and the run-indexed information held in the ATLAS COMA application (Conditions and ...
Norwegian Hydrological Reference Dataset for Climate Change Studies

Energy Technology Data Exchange (ETDEWEB)

Magnussen, Inger Helene; Killingland, Magnus; Spilde, Dag

2012-07-01

Based on the Norwegian hydrological measurement network, NVE has selected a Hydrological Reference Dataset for studies of hydrological change. The dataset meets international standards with high data quality. It is suitable for monitoring and studying the effects of climate change on the hydrosphere and cryosphere in Norway. The dataset includes streamflow, groundwater, snow, glacier mass balance and length change, lake ice and water temperature in rivers and lakes. (Author)
The Harvard organic photovoltaic dataset

Science.gov (United States)

Lopez, Steven A.; Pyzer-Knapp, Edward O.; Simm, Gregor N.; Lutzow, Trevor; Li, Kewei; Seress, Laszlo R.; Hachmann, Johannes; Aspuru-Guzik, Alán

2016-01-01

The Harvard Organic Photovoltaic Dataset (HOPV15) presented in this work is a collation of experimental photovoltaic data from the literature, and corresponding quantum-chemical calculations performed over a range of conformers, each with quantum chemical results using a variety of density functionals and basis sets. It is anticipated that this dataset will be of use in both relating electronic structure calculations to experimental observations through the generation of calibration schemes, as well as for the creation of new semi-empirical methods and the benchmarking of current and future model chemistries for organic electronic applications. PMID:27676312
Sex Chromosome Evolution, Heterochiasmy, and Physiological QTL in the Salmonid Brook Charr Salvelinus fontinalis

Directory of Open Access Journals (Sweden)

Ben J.G. Sutherland

2017-08-01

Whole-genome duplication (WGD) can have large impacts on genome evolution, and much remains unknown about these impacts. This includes the mechanisms of coping with a duplicated sex determination system and whether this has an impact on increasing the diversity of sex determination mechanisms. Other impacts include sexual conflict, where alleles having different optimums in each sex can result in sequestration of genes into nonrecombining sex chromosomes. Sex chromosome development itself may involve sex-specific recombination rate (i.e., heterochiasmy), which is also poorly understood. The family Salmonidae is a model system for these phenomena, having undergone autotetraploidization and subsequent rediploidization in most of the genome at the base of the lineage. The salmonid master sex determining gene is known, and many species have nonhomologous sex chromosomes, putatively due to transposition of this gene. In this study, we identify the sex chromosome of Brook Charr Salvelinus fontinalis and compare sex chromosome identities across the lineage (eight species and four genera). Although nonhomology is frequent, homologous sex chromosomes and other consistencies are present in distantly related species, indicating probable convergence on specific sex and neo-sex chromosomes. We also characterize strong heterochiasmy with 2.7-fold more crossovers in maternal than paternal haplotypes with paternal crossovers biased to chromosome ends. When considering only rediploidized chromosomes, the overall heterochiasmy trend remains, although with only 1.9-fold more recombination in the female than the male. Y chromosome crossovers are restricted to a single end of the chromosome, and this chromosome contains a large interspecific inversion, although its status between males and females remains unknown. Finally, we identify quantitative trait loci (QTL) for 21 unique growth, reproductive, and stress-related phenotypes to improve knowledge of the genetic
Synthetic and Empirical Capsicum Annuum Image Dataset

NARCIS (Netherlands)

Barth, R.

2016-01-01

This dataset consists of per-pixel annotated synthetic (10500) and empirical images (50) of Capsicum annuum, also known as sweet or bell pepper, situated in a commercial greenhouse. Furthermore, the source models to generate the synthetic images are included. The aim of the datasets are to
EEG datasets for motor imagery brain-computer interface.

Science.gov (United States)

Cho, Hohyun; Ahn, Minkyu; Ahn, Sangtae; Kwon, Moonyoung; Jun, Sung Chan

2017-07-01

Most investigators of brain-computer interface (BCI) research believe that BCI can be achieved through induced neuronal activity from the cortex, but not by evoked neuronal activity. Motor imagery (MI)-based BCI is one of the standard concepts of BCI, in that the user can generate induced activity by imagining motor movements. However, variations in performance over sessions and subjects are too severe to overcome easily; therefore, a basic understanding and investigation of BCI performance variation is necessary to find critical evidence of performance variation. Here we present not only EEG datasets for MI BCI from 52 subjects, but also the results of a psychological and physiological questionnaire, EMG datasets, the locations of 3D EEG electrodes, and EEGs for non-task-related states. We validated our EEG datasets by using the percentage of bad trials, event-related desynchronization/synchronization (ERD/ERS) analysis, and classification analysis. After conventional rejection of bad trials, we showed contralateral ERD and ipsilateral ERS in the somatosensory area, which are well-known patterns of MI. Finally, we showed that 73.08% of datasets (38 subjects) included reasonably discriminative information. Our EEG datasets included the information necessary to determine statistical significance; they consisted of well-discriminated datasets (38 subjects) and less-discriminative datasets. These may provide researchers with opportunities to investigate human factors related to MI BCI performance variation, and may also achieve subject-to-subject transfer by using metadata, including a questionnaire, EEG coordinates, and EEGs for non-task-related states. © The Authors 2017. Published by Oxford University Press.
A high-resolution European dataset for hydrologic modeling

Science.gov (United States)

Ntegeka, Victor; Salamon, Peter; Gomes, Goncalo; Sint, Hadewij; Lorini, Valerio; Thielen, Jutta

2013-04-01

There is an increasing demand for large scale hydrological models not only in the field of modeling the impact of climate change on water resources but also for disaster risk assessments and flood or drought early warning systems. These large scale models need to be calibrated and verified against large amounts of observations in order to judge their capabilities to predict the future. However, the creation of large scale datasets is challenging for it requires collection, harmonization, and quality checking of large amounts of observations. For this reason, only a limited number of such datasets exist. In this work, we present a pan European, high-resolution gridded dataset of meteorological observations (EFAS-Meteo) which was designed with the aim to drive a large scale hydrological model. Similar European and global gridded datasets already exist, such as the HadGHCND (Caesar et al., 2006), the JRC MARS-STAT database (van der Goot and Orlandi, 2003) and the E-OBS gridded dataset (Haylock et al., 2008). However, none of those provide similarly high spatial resolution and/or a complete set of variables to force a hydrologic model. EFAS-Meteo contains daily maps of precipitation, surface temperature (mean, minimum and maximum), wind speed and vapour pressure at a spatial grid resolution of 5 x 5 km for the time period 1 January 1990 - 31 December 2011. It furthermore contains calculated radiation, which is calculated by using a staggered approach depending on the availability of sunshine duration, cloud cover and minimum and maximum temperature, and evapotranspiration (potential evapotranspiration, bare soil and open water evapotranspiration). The potential evapotranspiration was calculated using the Penman-Monteith equation with the above-mentioned meteorological variables. The dataset was created as part of the development of the European Flood Awareness System (EFAS) and has been continuously updated throughout the last years. The dataset variables are used as
ASSISTments Dataset from Multiple Randomized Controlled Experiments

Science.gov (United States)

Selent, Douglas; Patikorn, Thanaporn; Heffernan, Neil

2016-01-01

In this paper, we present a dataset consisting of data generated from 22 previously and currently running randomized controlled experiments inside the ASSISTments online learning platform. This dataset provides data mining opportunities for researchers to analyze ASSISTments data in a convenient format across multiple experiments at the same time.…
Would the ‘real’ observed dataset stand up? A critical examination of eight observed gridded climate datasets for China

International Nuclear Information System (INIS)

Sun, Qiaohong; Miao, Chiyuan; Duan, Qingyun; Kong, Dongxian; Ye, Aizhong; Di, Zhenhua; Gong, Wei

2014-01-01

This research compared and evaluated the spatio-temporal similarities and differences of eight widely used gridded datasets. The datasets include daily precipitation over East Asia (EA), the Climate Research Unit (CRU) product, the Global Precipitation Climatology Centre (GPCC) product, the University of Delaware (UDEL) product, Precipitation Reconstruction over Land (PREC/L), the Asian Precipitation Highly Resolved Observational (APHRO) product, the Institute of Atmospheric Physics (IAP) dataset from the Chinese Academy of Sciences, and the National Meteorological Information Center dataset from the China Meteorological Administration (CN05). The meteorological variables focus on surface air temperature (SAT) or precipitation (PR) in China. All datasets presented general agreement on the whole spatio-temporal scale, but some differences appeared for specific periods and regions. On a temporal scale, EA shows the highest amount of PR, while APHRO shows the lowest. CRU and UDEL show higher SAT than IAP or CN05. On a spatial scale, the most significant differences occur in western China for PR and SAT. For PR, the difference between EA and CRU is the largest. When compared with CN05, CRU shows higher SAT in the central and southern Northwest river drainage basin, UDEL exhibits higher SAT over the Southwest river drainage system, and IAP has lower SAT in the Tibetan Plateau. The differences in annual mean PR and SAT primarily come from summer and winter, respectively. Finally, potential factors impacting agreement among gridded climate datasets are discussed, including raw data sources, quality control (QC) schemes, orographic correction, and interpolation techniques. The implications and challenges of these results for climate research are also briefly addressed. (paper)
Estimating parameters for probabilistic linkage of privacy-preserved datasets.

Science.gov (United States)

Brown, Adrian P; Randall, Sean M; Ferrante, Anna M; Semmens, James B; Boyd, James H

2017-07-10

Probabilistic record linkage is a process used to bring together person-based records from within the same dataset (de-duplication) or from disparate datasets using pairwise comparisons and matching probabilities. The linkage strategy and associated match probabilities are often estimated through investigations into data quality and manual inspection. However, as privacy-preserved datasets comprise encrypted data, such methods are not possible. In this paper, we present a method for estimating the probabilities and threshold values for probabilistic privacy-preserved record linkage using Bloom filters. Our method was tested through a simulation study using synthetic data, followed by an application using real-world administrative data. Synthetic datasets were generated with error rates from zero to 20%. Our method was used to estimate parameters (probabilities and thresholds) for de-duplication linkages. Linkage quality was determined by F-measure. Each dataset was privacy-preserved using separate Bloom filters for each field. Match probabilities were estimated using the expectation-maximisation (EM) algorithm on the privacy-preserved data. Threshold cut-off values were determined by an extension to the EM algorithm allowing linkage quality to be estimated for each possible threshold. De-duplication linkages of each privacy-preserved dataset were performed using both estimated and calculated probabilities. Linkage quality using the F-measure at the estimated threshold values was also compared to the highest F-measure. Three large administrative datasets were used to demonstrate the applicability of the probability and threshold estimation technique on real-world data. Linkage of the synthetic datasets using the estimated probabilities produced an F-measure that was comparable to the F-measure using calculated probabilities, even with up to 20% error. Linkage of the administrative datasets using estimated probabilities produced an F-measure that was higher
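The field-level Bloom filter encoding that this abstract relies on can be sketched as follows. This is a generic illustration of a common privacy-preserving comparison technique (character bigrams hashed into a bit array, compared with the Dice coefficient), not the authors' code. The filter size, number of hash functions, hash choice and function names are all assumptions, and the EM-based parameter estimation described in the abstract is not shown.

```python
import hashlib


def bigrams(value):
    """Split a padded, lower-cased string into its set of character bigrams."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value, size=1024, num_hashes=4):
    """Encode one field value as the set of bit positions set in a Bloom filter.

    Uses double hashing (SHA-1 and SHA-256) to derive `num_hashes` positions
    per bigram. `size` and `num_hashes` are illustrative assumptions. A real
    deployment would use keyed hashes (e.g. HMAC with a shared secret) so the
    filters cannot be rebuilt by anyone who knows the scheme.
    """
    bits = set()
    for gram in bigrams(value):
        data = gram.encode("utf-8")
        h1 = int(hashlib.sha1(data).hexdigest(), 16)
        h2 = int(hashlib.sha256(data).hexdigest(), 16)
        for i in range(num_hashes):
            bits.add((h1 + i * h2) % size)
    return bits


def dice(bits_a, bits_b):
    """Dice coefficient between two Bloom filters given as sets of set bits."""
    if not bits_a and not bits_b:
        return 1.0
    return 2 * len(bits_a & bits_b) / (len(bits_a) + len(bits_b))


if __name__ == "__main__":
    smith = bloom_encode("Smith")
    print(dice(smith, bloom_encode("Smyth")))  # similar spelling, high score
    print(dice(smith, bloom_encode("Jones")))  # unrelated name, low score
```

Because similar strings share most of their bigrams, their filters share most of their set bits, so a per-field similarity can be computed without revealing the underlying values. Scores of this kind are what a probabilistic linkage model would then turn into match weights and compare against a threshold.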
Viking Seismometer PDS Archive Dataset

Science.gov (United States)

Lorenz, R. D.

2016-12-01

The Viking Lander 2 seismometer operated successfully for over 500 Sols on the Martian surface, recording at least one likely candidate Marsquake. The Viking mission, in an era when data handling hardware (both on board and on the ground) was limited in capability, predated modern planetary data archiving, and ad-hoc repositories of the data, and the very low-level record at NSSDC, were neither convenient to process nor well-known. In an effort supported by the NASA Mars Data Analysis Program, we have converted the bulk of the Viking dataset (namely the 49,000 and 270,000 records made in High- and Event- modes at 20 and 1 Hz respectively) into a simple ASCII table format. Additionally, since wind-generated lander motion is a major component of the signal, contemporaneous meteorological data are included in summary records to facilitate correlation. These datasets are being archived at the PDS Geosciences Node. In addition to brief instrument and dataset descriptions, the archive includes code snippets in the freely-available language 'R' to demonstrate plotting and analysis. Further, we present examples of lander-generated noise, associated with the sampler arm, instrument dumps and other mechanical operations.
Genetic and phenotypic variation along an ecological gradient in lake trout Salvelinus namaycush

Science.gov (United States)

Baillie, Shauna M.; Muir, Andrew M.; Hansen, Michael J.; Krueger, Charles C.; Bentzen, Paul

2016-01-01

Background: Adaptive radiation involving a colonizing phenotype that rapidly evolves into at least one other ecological variant, or ecotype, has been observed in a variety of freshwater fishes in post-glacial environments. However, few studies consider how phenotypic traits vary with regard to neutral genetic partitioning along ecological gradients. Here, we present the first detailed investigation of lake trout Salvelinus namaycush that considers variation as a cline rather than discriminatory among ecotypes. Genetic and phenotypic traits organized along common ecological gradients of water depth and geographic distance provide important insights into diversification processes in a lake with high levels of human disturbance from over-fishing. Results: Four putative lake trout ecotypes could not be distinguished using population genetic methods, despite morphological differences. Neutral genetic partitioning in lake trout was stronger along a gradient of water depth, than by locality or ecotype. Contemporary genetic migration patterns were consistent with isolation-by-depth. Historical gene flow patterns indicated colonization from shallow to deep water. Comparison of phenotypic (Pst) and neutral genetic variation (Fst) revealed that morphological traits related to swimming performance (e.g., buoyancy, pelvic fin length) departed more strongly from neutral expectations along a depth gradient than craniofacial feeding traits. Elevated phenotypic variance with increasing water depth in pelvic fin length indicated possible ongoing character release and diversification. Finally, differences in early growth rate and asymptotic fish length across depth strata may be associated with limiting factors attributable to cold deep-water environments. Conclusion: We provide evidence of reductions in gene flow and divergent natural selection associated with water depth in Lake Superior. Such information is relevant for documenting intraspecific biodiversity in the largest freshwater lake
Homogenised Australian climate datasets used for climate change monitoring

International Nuclear Information System (INIS)

Trewin, Blair; Jones, David; Collins, Dean; Jovanovic, Branislava; Braganza, Karl

2007-01-01

Full text: The Australian Bureau of Meteorology has developed a number of datasets for use in climate change monitoring. These datasets typically cover 50-200 stations distributed as evenly as possible over the Australian continent, and have been subject to detailed quality control and homogenisation. The time period over which data are available for each element is largely determined by the availability of data in digital form. Whilst nearly all Australian monthly and daily precipitation data have been digitised, a significant quantity of pre-1957 data (for temperature and evaporation) or pre-1987 data (for some other elements) remains to be digitised, and is not currently available for use in the climate change monitoring datasets. In the case of temperature and evaporation, the start date of the datasets is also determined by major changes in instruments or observing practices for which no adjustment is feasible at the present time. The datasets currently available cover: Monthly and daily precipitation (most stations commence 1915 or earlier, with many extending back to the late 19th century, and a few to the mid-19th century); Annual temperature (commences 1910); Daily temperature (commences 1910, with limited station coverage pre-1957); Twice-daily dewpoint/relative humidity (commences 1957); Monthly pan evaporation (commences 1970); Cloud amount (commences 1957) (Jovanovic et al. 2007). As well as the station-based datasets listed above, an additional dataset being developed for use in climate change monitoring (and other applications) covers tropical cyclones in the Australian region. This is described in more detail in Trewin (2007). The datasets already developed are used in analyses of observed climate change, which are available through the Australian Bureau of Meteorology website (http://www.bom.gov.au/silo/products/cli_chg/). They are also used as a basis for routine climate monitoring, and in the datasets used for the development of seasonal
Introduction of a simple-model-based land surface dataset for Europe

Science.gov (United States)

Orth, Rene; Seneviratne, Sonia I.

2015-04-01

Land surface hydrology can play a crucial role during extreme events such as droughts, floods and even heat waves. We introduce in this study a new hydrological dataset for Europe that consists of soil moisture, runoff and evapotranspiration (ET). It is derived with a simple water balance model (SWBM) forced with precipitation, temperature and net radiation. The SWBM dataset extends over the period 1984-2013 with a daily time step and 0.5° × 0.5° resolution. We employ a novel calibration approach, in which we consider 300 random parameter sets chosen from an observation-based range. Using several independent validation datasets representing soil moisture (or terrestrial water content), ET and streamflow, we identify the best performing parameter set and hence the new dataset. To illustrate its usefulness, the SWBM dataset is compared against several state-of-the-art datasets (ERA-Interim/Land, MERRA-Land, GLDAS-2-Noah, simulations of the Community Land Model Version 4), using all validation datasets as reference. For soil moisture dynamics it outperforms the benchmarks. Therefore the SWBM soil moisture dataset constitutes a reasonable alternative to sparse measurements, little validated model results, or proxy data such as precipitation indices. Also in terms of runoff the SWBM dataset performs well, whereas the evaluation of the SWBM ET dataset is overall satisfactory, but the dynamics are less well captured for this variable. This highlights the limitations of the dataset, as it is based on a simple model that uses uniform parameter values. Hence some processes impacting ET dynamics may not be captured, and quality issues may occur in regions with complex terrain. Even though the SWBM is well calibrated, it cannot replace more sophisticated models; but as their calibration is a complex task the present dataset may serve as a benchmark in future. In addition we investigate the sources of skill of the SWBM dataset and find that the parameter set has a similar
Data Mining for Imbalanced Datasets: An Overview

Science.gov (United States)

Chawla, Nitesh V.

A dataset is imbalanced if the classification categories are not approximately equally represented. Recent years brought increased interest in applying machine learning techniques to difficult "real-world" problems, many of which are characterized by imbalanced data. Additionally the distribution of the testing data may differ from that of the training data, and the true misclassification costs may be unknown at learning time. Predictive accuracy, a popular choice for evaluating performance of a classifier, might not be appropriate when the data is imbalanced and/or the costs of different errors vary markedly. In this Chapter, we discuss some of the sampling techniques used for balancing the datasets, and the performance measures more appropriate for mining imbalanced datasets.
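The point about predictive accuracy being misleading on imbalanced data can be shown with a small sketch. The labels below are synthetic and the `binary_metrics` helper is a hypothetical name introduced for illustration; it is not taken from the chapter.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Synthetic example: 10 positives among 1000 cases, and a trivial
    # classifier that always predicts the majority (negative) class.
    y_true = [1] * 10 + [0] * 990
    y_pred = [0] * 1000
    print(binary_metrics(y_true, y_pred))
```

In this synthetic case the always-negative classifier reaches high accuracy while finding none of the minority cases, which is why recall, precision or F-measure are usually reported alongside or instead of accuracy.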
Consistency in trophic magnification factors of cyclic methyl siloxanes in pelagic freshwater food webs leading to brown trout.

Science.gov (United States)

Borgå, Katrine; Fjeld, Eirik; Kierkegaard, Amelie; McLachlan, Michael S

2013-12-17

Cyclic volatile methyl siloxanes (cVMS) concentrations were analyzed in the pelagic food web of two Norwegian lakes (Mjøsa, Randsfjorden), and in brown trout (Salmo trutta) and Arctic char (Salvelinus alpinus) collected in a reference lake (Femunden), in 2012. Lakes receiving discharge from wastewater treatment plants (Mjøsa and Randsfjorden) had cVMS concentrations in trout that were up to 2 orders of magnitude higher than those in Femunden, where most samples were close to the limit of quantification (LOQ). Food web biomagnification of cVMS in Mjøsa and Randsfjorden was quantified by estimation of trophic magnification factors (TMFs). TMF for legacy persistent organic pollutants (POPs) were analyzed for comparison. Both decamethylcyclopentasiloxane (D5) and dodecamethylcyclohexasiloxane (D6) biomagnified with TMFs of 2.9 (2.1-4.0) and 2.3 (1.8-3.0), respectively. Octamethylcyclotetrasiloxane (D4) was below the LOQ in the majority of samples and had substantially lower biomagnification than for D5 and D6. The cVMS TMFs did not differ between the lakes, whereas the legacy POP TMFs were higher in Mjøsa than in Randsfjorden. Whitefish had lower cVMS bioaccumulation compared to legacy POPs, and affected the TMF significance for cVMS, but not for POPs. TMFs of D5 and legacy contaminants in Lake Mjøsa were consistent with those previously measured in Mjøsa.
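A trophic magnification factor is conventionally obtained as 10 raised to the slope of a regression of log10 concentration on trophic level. The sketch below assumes that convention and uses ordinary least squares; the abstract does not state which regression model the authors used, and the input values are synthetic, not data from the study.

```python
import math


def trophic_magnification_factor(trophic_levels, concentrations):
    """Estimate a TMF as 10**slope of log10(concentration) versus trophic level."""
    logs = [math.log10(c) for c in concentrations]
    n = len(trophic_levels)
    mean_x = sum(trophic_levels) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in trophic_levels)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(trophic_levels, logs))
    slope = sxy / sxx  # ordinary least-squares slope
    return 10 ** slope


if __name__ == "__main__":
    # Synthetic food web in which concentration doubles per trophic level.
    levels = [2.0, 2.5, 3.0, 3.5, 4.0]
    conc = [5.0 * 2.0 ** (tl - 2.0) for tl in levels]
    print(trophic_magnification_factor(levels, conc))
```

A value above 1 indicates that concentrations increase with trophic level (biomagnification), while a value below 1 indicates trophic dilution.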
Toxaphene in the aquatic environment of Greenland

International Nuclear Information System (INIS)

Vorkamp, Katrin; Rigét, Frank F.; Dietz, Rune

2015-01-01

The octa- and nonachlorinated bornanes (toxaphene) CHBs 26, 40, 41, 44, 50 and 62 were analysed in Arctic char (Salvelinus alpinus), shorthorn sculpin (Myoxocephalus scorpius), ringed seal (Pusa hispida) and black guillemot eggs (Cepphus grylle) from Greenland. Despite their high trophic level, ringed seals had the lowest concentrations of these species, with a Σ 6 Toxaphene median concentration of 13–20 ng/g lipid weight (lw), suggesting metabolisation. The congener composition also suggests transformation of nona- to octachlorinated congeners. Black guillemot eggs had the highest concentrations (Σ 6 Toxaphene median concentration of 971 ng/g lw). Although concentrations were higher in East than in West Greenland differences were smaller than for other persistent organic pollutants. In a circumpolar context, toxaphene had the highest concentrations in the Canadian Arctic. Time trend analyses showed significant decreases for black guillemot eggs and juvenile ringed seals, with annual rates of −5 to −7% for Σ 6 Toxaphene. The decreases were generally steepest for CHBs 40, 41 and 44. - Highlights: • Toxaphene was detected in freshwater and marine species of Greenland. • Relatively low concentrations in ringed seal suggest metabolisation. • The concentrations in Greenland appear lower than those in the Canadian Arctic. • Significant decreases were found in juvenile ringed seals and black guillemot eggs. - The banned insecticide toxaphene is widely present in the aquatic environment of Greenland, but concentrations are decreasing
Double-digest RAD sequencing using Ion Proton semiconductor platform (ddRADseq-ion) with nonmodel organisms.

Science.gov (United States)

Recknagel, Hans; Jacobs, Arne; Herzyk, Pawel; Elmer, Kathryn R

2015-11-01

Research in evolutionary biology involving nonmodel organisms is rapidly shifting from using traditional molecular markers such as mtDNA and microsatellites to higher throughput SNP genotyping methodologies to address questions in population genetics, phylogenetics and genetic mapping. Restriction site associated DNA sequencing (RAD sequencing or RADseq) has become an established method for SNP genotyping on Illumina sequencing platforms. Here, we developed a protocol and adapters for double-digest RAD sequencing for Ion Torrent (Life Technologies; Ion Proton, Ion PGM) semiconductor sequencing. We sequenced thirteen genomic libraries of three different nonmodel vertebrate species on Ion Proton with PI chips: Arctic charr Salvelinus alpinus, European whitefish Coregonus lavaretus and common lizard Zootoca vivipara. This resulted in ~962 million single-end reads overall and a mean of ~74 million reads per library. We filtered the genomic data using Stacks, a bioinformatic tool to process RAD sequencing data. On average, we obtained ~11,000 polymorphic loci per library of 6-30 individuals. We validate our new method by technical and biological replication, by reconstructing phylogenetic relationships, and using a hybrid genetic cross to track genomic variants. Finally, we discuss the differences between using the different sequencing platforms in the context of RAD sequencing, assessing possible advantages and disadvantages. We show that our protocol can be used for Ion semiconductor sequencing platforms for the rapid and cost-effective generation of variable and reproducible genetic markers. © 2015 John Wiley & Sons Ltd.
A hybrid organic-inorganic perovskite dataset

Science.gov (United States)

Kim, Chiho; Huan, Tran Doan; Krishnan, Sridevi; Ramprasad, Rampi

2017-05-01

Hybrid organic-inorganic perovskites (HOIPs) have been attracting a great deal of attention due to their versatility of electronic properties and fabrication methods. We prepare a dataset of 1,346 HOIPs, which features 16 organic cations, 3 group-IV cations and 4 halide anions. Using a combination of an atomic structure search method and density functional theory calculations, the optimized structures, the bandgap, the dielectric constant, and the relative energies of the HOIPs are uniformly prepared and validated by comparing with relevant experimental and/or theoretical data. We make the dataset available at Dryad Digital Repository, NoMaD Repository, and Khazana Repository (http://khazana.uconn.edu/), hoping that it could be useful for future data-mining efforts that can explore possible structure-property relationships and phenomenological models. Progressive extension of the dataset is expected as new organic cations become appropriate within the HOIP framework, and as additional properties are calculated for the new compounds found.
Genomics dataset of unidentified disclosed isolates

Directory of Open Access Journals (Sweden)

Bhagwan N. Rekadwad

2016-09-01

Analysis of DNA sequences is necessary for higher hierarchical classification of organisms. It gives clues about the characteristics of organisms and their taxonomic position. This dataset was chosen to find complexities in the unidentified DNA in the disclosed patents. A total of 17 unidentified DNA sequences were thoroughly analyzed. Quick response (QR) codes were generated, and the AT/GC content of the DNA sequences was analyzed. The QR codes are helpful for quick identification of isolates. AT/GC content is helpful for studying their stability at different temperatures. Additionally, a dataset on cleavage codes and enzyme codes from the restriction digestion study, which is helpful for performing studies using short DNA sequences, is reported. The dataset disclosed here is new revelatory data for exploration of unique DNA sequences for evaluation, identification, comparison and analysis. Keywords: BioLABs, Blunt ends, Genomics, NEB cutter, Restriction digestion, Short DNA sequences, Sticky ends
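The AT/GC content analysis mentioned above amounts to counting bases. A minimal sketch follows; the function name and the example sequence are hypothetical and are not drawn from the dataset.

```python
def at_gc_content(sequence):
    """Return (AT fraction, GC fraction) of a DNA sequence.

    Characters other than A, T, G and C (for example N) are ignored.
    """
    seq = sequence.upper()
    at = sum(seq.count(base) for base in "AT")
    gc = sum(seq.count(base) for base in "GC")
    total = at + gc
    if total == 0:
        raise ValueError("sequence contains no A, T, G or C bases")
    return at / total, gc / total


if __name__ == "__main__":
    at, gc = at_gc_content("ATGCGCGTTA")  # hypothetical sequence
    print(f"AT = {at:.2f}, GC = {gc:.2f}")
```

A higher GC fraction generally corresponds to a higher melting temperature of the duplex, which is the link to thermal stability that the abstract refers to.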

IPCC Socio-Economic Baseline Dataset

Data.gov (United States)

National Aeronautics and Space Administration — The Intergovernmental Panel on Climate Change (IPCC) Socio-Economic Baseline Dataset consists of population, human development, economic, water resources, land...
The LANDFIRE Refresh strategy: updating the national dataset

Science.gov (United States)

Nelson, Kurtis J.; Connot, Joel A.; Peterson, Birgit E.; Martin, Charley

2013-01-01

The LANDFIRE Program provides comprehensive vegetation and fuel datasets for the entire United States. As with many large-scale ecological datasets, vegetation and landscape conditions must be updated periodically to account for disturbances, growth, and natural succession. The LANDFIRE Refresh effort was the first attempt to consistently update these products nationwide. It incorporated a combination of specific systematic improvements to the original LANDFIRE National data, remote sensing based disturbance detection methods, field collected disturbance information, vegetation growth and succession modeling, and vegetation transition processes. This resulted in the creation of two complete datasets for all 50 states: LANDFIRE Refresh 2001, which includes the systematic improvements, and LANDFIRE Refresh 2008, which includes the disturbance and succession updates to the vegetation and fuel data. The new datasets are comparable for studying landscape changes in vegetation type and structure over a decadal period, and provide the most recent characterization of fuel conditions across the country. The applicability of the new layers is discussed and the effects of using the new fuel datasets are demonstrated through a fire behavior modeling exercise using the 2011 Wallow Fire in eastern Arizona as an example.
Omicseq: a web-based search engine for exploring omics datasets

Science.gov (United States)

Sun, Xiaobo; Pittard, William S.; Xu, Tianlei; Chen, Li; Zwick, Michael E.; Jiang, Xiaoqian; Wang, Fusheng

2017-01-01

The development and application of high-throughput genomics technologies has resulted in massive quantities of diverse omics data that continue to accumulate rapidly. These rich datasets offer unprecedented and exciting opportunities to address long-standing questions in biomedical research. However, our ability to explore and query the content of diverse omics data is very limited. Existing dataset search tools rely almost exclusively on the metadata. A text-based query for gene name(s) does not work well on datasets wherein the vast majority of their content is numeric. To overcome this barrier, we have developed Omicseq, a novel web-based platform that facilitates the easy interrogation of omics datasets holistically to improve ‘findability’ of relevant data. The core component of Omicseq is trackRank, a novel algorithm for ranking omics datasets that fully uses the numerical content of the dataset to determine relevance to the query entity. The Omicseq system is supported by a scalable and elastic, NoSQL database that hosts a large collection of processed omics datasets. In the front end, a simple, web-based interface allows users to enter queries and instantly receive search results as a list of ranked datasets deemed to be the most relevant. Omicseq is freely available at http://www.omicseq.org. PMID:28402462
Nanoparticle-organic pollutant interaction dataset

Data.gov (United States)

U.S. Environmental Protection Agency — Dataset presents concentrations of organic pollutants, such as polyaromatic hydrocarbon compounds, in water samples. Water samples of known volume and concentration...
Framework for Interactive Parallel Dataset Analysis on the Grid

Energy Technology Data Exchange (ETDEWEB)

Alexander, David A.; Ananthan, Balamurali; /Tech-X Corp.; Johnson, Tony; Serbo, Victor; /SLAC

2007-01-10

We present a framework for use at a typical Grid site to facilitate custom interactive parallel dataset analysis targeting terabyte-scale datasets of the type typically produced by large multi-institutional science experiments. We summarize the needs for interactive analysis and show a prototype solution that satisfies those needs. The solution consists of a desktop client tool and a set of Web Services that allow scientists to sign onto a Grid site, compose analysis script code to carry out physics analysis on datasets, distribute the code and datasets to worker nodes, collect the results back to the client, and construct professional-quality visualizations of the results.
Large-scale Labeled Datasets to Fuel Earth Science Deep Learning Applications

Science.gov (United States)

Maskey, M.; Ramachandran, R.; Miller, J.

2017-12-01

Deep learning has revolutionized computer vision and natural language processing with various algorithms scaled using high-performance computing. However, generic large-scale labeled datasets such as the ImageNet are the fuel that drives the impressive accuracy of deep learning results. Large-scale labeled datasets already exist in domains such as medical science, but creating them in the Earth science domain is a challenge. While there are ways to apply deep learning using limited labeled datasets, there is a need in the Earth sciences for creating large-scale labeled datasets for benchmarking and scaling deep learning applications. At the NASA Marshall Space Flight Center, we are using deep learning for a variety of Earth science applications where we have encountered the need for large-scale labeled datasets. We will discuss our approaches for creating such datasets and why these datasets are just as valuable as deep learning algorithms. We will also describe successful usage of these large-scale labeled datasets with our deep learning based applications.
An Affinity Propagation Clustering Algorithm for Mixed Numeric and Categorical Datasets

Directory of Open Access Journals (Sweden)

Kang Zhang

2014-01-01

Clustering has been widely used in different fields of science, technology, social science, and so forth. In the real world, numeric as well as categorical features are usually used to describe the data objects. Accordingly, many clustering methods can process datasets that are either numeric or categorical. Recently, algorithms that can handle the mixed data clustering problems have been developed. The affinity propagation (AP) algorithm is an exemplar-based clustering method which has demonstrated good performance on a wide variety of datasets. However, it has limitations on processing mixed datasets. In this paper, we propose a novel similarity measure for mixed-type datasets and an adaptive AP clustering algorithm to cluster the mixed datasets. Several real-world datasets are studied to evaluate the performance of the proposed algorithm. Comparisons with other clustering algorithms demonstrate that the proposed method works well not only on mixed datasets but also on pure numeric and categorical datasets.
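The abstract does not define the proposed similarity measure, so the sketch below substitutes a generic Gower-style measure to illustrate what a similarity over mixed numeric and categorical features can look like. It is explicitly not the paper's measure; the function name, the record layout and the range normalisation are assumptions. The result is negated so that larger values mean more similar objects, which is the orientation AP-style algorithms typically expect.

```python
def mixed_similarity(a, b, numeric_idx, categorical_idx, ranges):
    """Negative Gower-style dissimilarity between two mixed-type records.

    numeric_idx / categorical_idx list the positions of each feature type;
    ranges maps a numeric position to that feature's (max - min) over the data.
    Returns 0.0 for identical records and -1.0 for maximally different ones.
    """
    total = 0.0
    for i in numeric_idx:
        # Range-normalised absolute difference, in [0, 1].
        total += abs(a[i] - b[i]) / ranges[i] if ranges[i] else 0.0
    for i in categorical_idx:
        # Simple matching: 0 if the categories agree, 1 otherwise.
        total += 0.0 if a[i] == b[i] else 1.0
    return -total / (len(numeric_idx) + len(categorical_idx))


if __name__ == "__main__":
    # Hypothetical records: (age, colour); age spans a range of 20 in the data.
    x = (30, "red")
    y = (40, "blue")
    print(mixed_similarity(x, y, [0], [1], {0: 20}))
```

A pairwise matrix built from such a function could then be passed to any clustering routine that accepts precomputed similarities.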
Behavioural and physiological responses of brook trout Salvelinus fontinalis to midwinter flow reduction in a small ice-free mountain stream.

Science.gov (United States)

Krimmer, A N; Paul, A J; Hontela, A; Rasmussen, J B

2011-09-01

This study presents an experimental analysis of the effects of midwinter flow reduction (50-75%, reduction in discharge in 4 h daily pulses) on the physical habitat and on behaviour and physiology of overwintering brook trout Salvelinus fontinalis in a small mountain stream. Flow reduction did not result in significant lowering of temperature or formation of surface or subsurface ice. The main findings were (1) daily movement by S. fontinalis increased (c. 2·5-fold) during flow reduction, but was limited to small-scale relocations (reduced during flow reduction. (3) Although both experimental and reference fish did lose mass and condition during the experiment, no effects of flow reduction on stress indicators (blood cortisol or glucose) or bioenergetics (total body fat, water content or mass loss) were detected, probably because access to the preferred type of cover remained available. Like other salmonids, S. fontinalis moves little and seeks physical cover during winter. Unlike many of the more studied salmonids, however, this species overwinters successfully in small groundwater-rich streams that often remain ice-free, and this study identifies undercut banks as the critical winter habitat rather than substratum cover. © 2011 The Authors. Journal of Fish Biology © 2011 The Fisheries Society of the British Isles.
Feeding habits of the alien brook trout Salvelinus fontinalis and the native brown trout Salmo trutta in Czech mountain streams

Directory of Open Access Journals (Sweden)

Horká Petra

2017-01-01

Quantifying patterns of prey resource use is fundamental to identify mechanisms enabling the coexistence of related fish species. Trophic interactions between the native brown trout, Salmo trutta, and the introduced brook trout, Salvelinus fontinalis, were studied monthly from May to October in three mountain streams in Central Europe (Czech Republic). To evaluate whether the feeding habits differ between separated and coexisting populations of these species, one locality where both species coexist, and two allopatric populations of either species were studied. Across the study period, the mean stomach fullness of fish varied, being highest in spring and declining through autumn. The diet overlap (Schoener's overlap index) between the species increased through the studied season (from 54.5% in July to 81.5% in October). In allopatry, both species had nearly the same feeding habits. However, in sympatry, brook trout consumed a higher proportion of terrestrial invertebrates, while brown trout showed no changes either in the proportions of aquatic and terrestrial prey utilized or in the selectivity for prey categories in comparison to allopatric conditions. The dietary shift observed for brook trout, but not for brown trout, suggests that brown trout is a stronger competitor in the studied sympatric locality, leading the brook trout to change its feeding habits to reduce interspecific competition.
Using Multiple Big Datasets and Machine Learning to Produce a New Global Particulate Dataset: A Technology Challenge Case Study

Science.gov (United States)

Lary, D. J.

2013-12-01

A BigData case study is described where multiple datasets from several satellites, high-resolution global meteorological data, social media and in-situ observations are combined using machine learning on a distributed cluster using an automated workflow. The global particulate dataset is relevant to global public health studies and would not be possible to produce without the use of the multiple big datasets, in-situ data and machine learning. To greatly reduce the development time and enhance the functionality, a high-level language capable of parallel processing has been used (Matlab). A key consideration for the system is high-speed access due to the large data volume, persistence of the large data volumes and a precise process time scheduling capability.
Chemical product and function dataset

Data.gov (United States)

U.S. Environmental Protection Agency — Merged product weight fraction and chemical function data. This dataset is associated with the following publication: Isaacs, K., M. Goldsmith, P. Egeghy, K....
General Purpose Multimedia Dataset - GarageBand 2008

DEFF Research Database (Denmark)

Meng, Anders

This document describes a general purpose multimedia data-set to be used in cross-media machine learning problems. In more detail we describe the genre taxonomy applied at http://www.garageband.com, from where the data-set was collected, and how the taxonomy has been fused into a more human...... understandable taxonomy. Finally, a description of various features extracted from both the audio and text is presented....
Omicseq: a web-based search engine for exploring omics datasets.

Science.gov (United States)

Sun, Xiaobo; Pittard, William S; Xu, Tianlei; Chen, Li; Zwick, Michael E; Jiang, Xiaoqian; Wang, Fusheng; Qin, Zhaohui S

2017-07-03

The development and application of high-throughput genomics technologies has resulted in massive quantities of diverse omics data that continue to accumulate rapidly. These rich datasets offer unprecedented and exciting opportunities to address long standing questions in biomedical research. However, our ability to explore and query the content of diverse omics data is very limited. Existing dataset search tools rely almost exclusively on the metadata. A text-based query for gene name(s) does not work well on datasets wherein the vast majority of their content is numeric. To overcome this barrier, we have developed Omicseq, a novel web-based platform that facilitates the easy interrogation of omics datasets holistically to improve 'findability' of relevant data. The core component of Omicseq is trackRank, a novel algorithm for ranking omics datasets that fully uses the numerical content of the dataset to determine relevance to the query entity. The Omicseq system is supported by a scalable and elastic, NoSQL database that hosts a large collection of processed omics datasets. In the front end, a simple, web-based interface allows users to enter queries and instantly receive search results as a list of ranked datasets deemed to be the most relevant. Omicseq is freely available at http://www.omicseq.org. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Quantifying uncertainty in observational rainfall datasets

Science.gov (United States)

Lennard, Chris; Dosio, Alessandro; Nikulin, Grigory; Pinto, Izidine; Seid, Hussen

2015-04-01

The CO-ordinated Regional Downscaling Experiment (CORDEX) has to date seen the publication of at least ten journal papers that examine the African domain during 2012 and 2013. Five of these papers consider Africa generally (Nikulin et al. 2012, Kim et al. 2013, Hernandes-Dias et al. 2013, Laprise et al. 2013, Panitz et al. 2013) and five have regional foci: Tramblay et al. (2013) on Northern Africa, Mariotti et al. (2014) and Gbobaniyi et al. (2013) on West Africa, Endris et al. (2013) on East Africa and Kalagnoumou et al. (2013) on southern Africa. There also are a further three papers that the authors know about under review. These papers all use observed rainfall and/or temperature data to evaluate/validate the regional model output and often proceed to assess projected changes in these variables due to climate change in the context of these observations. The most popular reference rainfall data used are the CRU, GPCP, GPCC, TRMM and UDEL datasets. However, as Kalagnoumou et al. (2013) point out, there are many other rainfall datasets available for consideration, for example, CMORPH, FEWS, TAMSAT & RIANNAA, TAMORA and the WATCH & WATCH-DEI data. They, with others (Nikulin et al. 2012, Sylla et al. 2012), show that the observed datasets can have a very wide spread at a particular space-time coordinate. As more ground, space and reanalysis-based rainfall products become available, all of which use different methods to produce precipitation data, the selection of reference data is becoming an important factor in model evaluation. A number of factors can contribute to uncertainty in terms of the reliability and validity of the datasets, such as radiance conversion algorithms, the quantity and quality of available station data, interpolation techniques and blending methods used to combine satellite and gauge-based products. However, to date no comprehensive study has been performed to evaluate the uncertainty in these observational datasets. We assess 18 gridded
Turkey Run Landfill Emissions Dataset

Data.gov (United States)

U.S. Environmental Protection Agency — Landfill emissions measurements for the Turkey Run landfill in Georgia. This dataset is associated with the following publication: De la Cruz, F., R. Green, G....
Topic modeling for cluster analysis of large biological and medical datasets.

Science.gov (United States)

Zhao, Weizhong; Zou, Wen; Chen, James J

2014-01-01

The big data moniker is nowhere better deserved than to describe the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracy and effectiveness of traditional clustering methods diminish for large and hyper-dimensional datasets. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or overcoming clustering difficulties in large biological and medical datasets. In this study, three topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, are proposed and tested on the cluster analysis of three large datasets: Salmonella pulsed-field gel electrophoresis (PFGE) dataset, lung cancer dataset, and breast cancer dataset, which represent various types of large biological or medical datasets. All three methods are shown to improve the effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, yield clustering improvements for the three different data types. Clusters more efficaciously represent truthful groupings and subgroupings in the data than traditional methods, suggesting
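As a rough illustration of the first method named in the abstract above, "highest probable topic assignment" can be read as assigning each sample to the topic with the largest weight in its topic distribution. The sketch below assumes a sample-by-topic probability matrix has already been produced by some topic model; the matrix values and the `assign_clusters` helper are hypothetical and not taken from the paper.

```python
def assign_clusters(doc_topic):
    """Assign each sample to the index of its most probable topic."""
    return [max(range(len(row)), key=row.__getitem__) for row in doc_topic]


# Hypothetical sample-by-topic probabilities (each row sums to 1).
doc_topic = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
]

clusters = assign_clusters(doc_topic)
print(clusters)
```

Here the number of clusters equals the number of topics, so the topic count chosen when fitting the model also fixes the clustering granularity.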
An Analysis of the GTZAN Music Genre Dataset

DEFF Research Database (Denmark)

Sturm, Bob L.

2012-01-01

Most research in automatic music genre recognition has used the dataset assembled by Tzanetakis et al. in 2001. The composition and integrity of this dataset, however, have never been formally analyzed. For the first time, we provide an analysis of its composition, and create a machine...
Dataset definition for CMS operations and physics analyses

Science.gov (United States)

Franzoni, Giovanni; Compact Muon Solenoid Collaboration

2016-04-01

Data recorded at the CMS experiment are funnelled into streams, integrated in the HLT menu, and further organised in a hierarchical structure of primary datasets and secondary datasets/dedicated skims. Datasets are defined according to the final-state particles reconstructed by the high level trigger, the data format and the use case (physics analysis, alignment and calibration, performance studies). During the first LHC run, new workflows have been added to this canonical scheme, to best exploit the flexibility of the CMS trigger and data acquisition systems. The concepts of data parking and data scouting have been introduced to extend the physics reach of CMS, offering the opportunity of defining physics triggers with extremely loose selections (e.g. dijet resonance trigger collecting data at 1 kHz). In this presentation, we review the evolution of the dataset definition during the LHC run I, and we discuss the plans for the run II.
Dataset definition for CMS operations and physics analyses

CERN Document Server

AUTHOR|(CDS)2051291

2016-01-01

Data recorded at the CMS experiment are funnelled into streams, integrated in the HLT menu, and further organised in a hierarchical structure of primary datasets, secondary datasets, and dedicated skims. Datasets are defined according to the final-state particles reconstructed by the high level trigger, the data format and the use case (physics analysis, alignment and calibration, performance studies). During the first LHC run, new workflows have been added to this canonical scheme, to best exploit the flexibility of the CMS trigger and data acquisition systems. The concepts of data parking and data scouting have been introduced to extend the physics reach of CMS, offering the opportunity of defining physics triggers with extremely loose selections (e.g. dijet resonance trigger collecting data at 1 kHz). In this presentation, we review the evolution of the dataset definition during the first run, and we discuss the plans for the second LHC run.
Dataset of NRDA emission data

Data.gov (United States)

U.S. Environmental Protection Agency — Emissions data from open air oil burns. This dataset is associated with the following publication: Gullett, B., J. Aurell, A. Holder, B. Mitchell, D. Greenwell, M....

Medical Image Data and Datasets in the Era of Machine Learning-Whitepaper from the 2016 C-MIMI Meeting Dataset Session.

Science.gov (United States)

Kohli, Marc D; Summers, Ronald M; Geis, J Raymond

2017-08-01

At the first annual Conference on Machine Intelligence in Medical Imaging (C-MIMI), held in September 2016, a conference session on medical image data and datasets for machine learning identified multiple issues. The common theme from attendees was that everyone participating in medical image evaluation with machine learning is data starved. There is an urgent need to find better ways to collect, annotate, and reuse medical imaging data. Unique domain issues with medical image datasets require further study, development, and dissemination of best practices and standards, and a coordinated effort among medical imaging domain experts, medical imaging informaticists, government and industry data scientists, and interested commercial, academic, and government entities. High-level attributes of reusable medical image datasets suitable to train, test, validate, verify, and regulate ML products should be better described. NIH and other government agencies should promote and, where applicable, enforce, access to medical image datasets. We should improve communication among medical imaging domain experts, medical imaging informaticists, academic clinical and basic science researchers, government and industry data scientists, and interested commercial entities.
Discovery and Reuse of Open Datasets: An Exploratory Study

Directory of Open Access Journals (Sweden)

Sara

2016-07-01

Objective: This article analyzes twenty cited or downloaded datasets and the repositories that house them, in order to produce insights that can be used by academic libraries to encourage discovery and reuse of research data in institutional repositories. Methods: Using Thomson Reuters’ Data Citation Index and repository download statistics, we identified twenty cited/downloaded datasets. We documented the characteristics of the cited/downloaded datasets and their corresponding repositories in a self-designed rubric. The rubric includes six major categories: basic information; funding agency and journal information; linking and sharing; factors to encourage reuse; repository characteristics; and data description. Results: Our small-scale study suggests that cited/downloaded datasets generally comply with basic recommendations for facilitating reuse: data are documented well; formatted for use with a variety of software; and shared in established, open access repositories. Three significant factors also appear to contribute to dataset discovery: publishing in discipline-specific repositories; indexing in more than one location on the web; and using persistent identifiers. The cited/downloaded datasets in our analysis came from a few specific disciplines, and tended to be funded by agencies with data publication mandates. Conclusions: The results of this exploratory research provide insights that can inform academic librarians as they work to encourage discovery and reuse of institutional datasets. Our analysis also suggests areas in which academic librarians can target open data advocacy in their communities in order to begin to build open data success stories that will fuel future advocacy efforts.
Visualization of conserved structures by fusing highly variable datasets.

Science.gov (United States)

Silverstein, Jonathan C; Chhadia, Ankur; Dech, Fred

2002-01-01

Skill, effort, and time are required to identify and visualize anatomic structures in three-dimensions from radiological data. Fundamentally, automating these processes requires a technique that uses symbolic information not in the dynamic range of the voxel data. We were developing such a technique based on mutual information for automatic multi-modality image fusion (MIAMI Fuse, University of Michigan). This system previously demonstrated facility at fusing one voxel dataset with integrated symbolic structure information to a CT dataset (different scale and resolution) from the same person. The next step of development of our technique was aimed at accommodating the variability of anatomy from patient to patient by using warping to fuse our standard dataset to arbitrary patient CT datasets. A standard symbolic information dataset was created from the full color Visible Human Female by segmenting the liver parenchyma, portal veins, and hepatic veins and overwriting each set of voxels with a fixed color. Two arbitrarily selected patient CT scans of the abdomen were used for reference datasets. We used the warping functions in MIAMI Fuse to align the standard structure data to each patient scan. The key to successful fusion was the focused use of multiple warping control points that place themselves around the structure of interest automatically. The user assigns only a few initial control points to align the scans. Fusion 1 and 2 transformed the atlas with 27 points around the liver to CT1 and CT2 respectively. Fusion 3 transformed the atlas with 45 control points around the liver to CT1 and Fusion 4 transformed the atlas with 5 control points around the portal vein. The CT dataset is augmented with the transformed standard structure dataset, such that the warped structure masks are visualized in combination with the original patient dataset. This combined volume visualization is then rendered interactively in stereo on the ImmersaDesk in an immersive Virtual
Fine-scale acoustic telemetry reveals unexpected lake trout, Salvelinus namaycush, spawning habitats in northern Lake Huron, North America

Science.gov (United States)

Binder, Thomas; Farha, Steve A.; Thompson, Henry T.; Holbrook, Christopher; Bergstedt, Roger A.; Riley, Stephen; Bronte, Charles R.; He, Ji; Krueger, Charles C.

2018-01-01

Previous studies of lake trout, Salvelinus namaycush, spawning habitat in the Laurentian Great Lakes have used time- and labour-intensive survey methods and have focused on areas with historic observations of spawning aggregations and on habitats prejudged by researchers to be suitable for spawning. As an alternative, we used fine-scale acoustic telemetry to locate, describe and compare lake trout spawning habitats. Adult lake trout were implanted with acoustic transmitters and tracked during five consecutive spawning seasons in a 19–27 km² region of the Drummond Island Refuge, Lake Huron, using the VEMCO Positioning System. Acoustic telemetry revealed discrete areas of aggregation on at least five reefs in the study area, subsequently confirmed by divers to contain deposited eggs. Notably, several identified spawning sites would likely not have been discovered using traditional methods because either they were too small and obscure to stand out on a bathymetric map or because they did not conform to the conceptual model of spawning habitat held by many biologists. Our most unique observation was egg deposition in gravel and rubble substrates located at the base of and beneath overhanging edges of large boulders. Spawning sites typically comprised <10% of the reef area and were used consistently over the 5-year study. Evaluation of habitat selection from the perspective of fish behaviour through use of acoustic transmitters offers potential to expand current conceptual models of critical spawning habitat.
An Annotated Dataset of 14 Cardiac MR Images

DEFF Research Database (Denmark)

Stegmann, Mikkel Bille

2002-01-01

This note describes a dataset consisting of 14 annotated cardiac MR images. Points of correspondence are placed on each image at the left ventricle (LV). As such, the dataset can be readily used for building statistical models of shape. Further, format specifications and terms of use are given....
Dataset - Adviesregel PPL 2010

NARCIS (Netherlands)

Evert, van F.K.; Schans, van der D.A.; Geel, van W.C.A.; Slabbekoorn, J.J.; Booij, R.; Jukema, J.N.; Meurs, E.J.J.; Uenk, D.

2011-01-01

This dataset contains experimental data from a number of field experiments with potato in The Netherlands (Van Evert et al., 2011). The data are presented as an SQL dump of a PostgreSQL database (version 8.4.4). An outline of the entity-relationship diagram of the database is given in an
Tension in the recent Type Ia supernovae datasets

International Nuclear Information System (INIS)

Wei, Hao

2010-01-01

In the present work, we investigate the tension in the recent Type Ia supernovae (SNIa) datasets Constitution and Union. We show that they are in tension not only with the observations of the cosmic microwave background (CMB) anisotropy and the baryon acoustic oscillations (BAO), but also with other SNIa datasets such as Davis and SNLS. Then, we find the main sources responsible for the tension. Further, we make this more robust by employing the method of random truncation. Based on the results of this work, we suggest two truncated versions of the Union and Constitution datasets, namely the UnionT and ConstitutionT SNIa samples, whose behaviors are more regular.
Viability of Controlling Prosthetic Hand Utilizing Electroencephalograph (EEG) Dataset Signal

Science.gov (United States)

Miskon, Azizi; A/L Thanakodi, Suresh; Raihan Mazlan, Mohd; Mohd Haziq Azhar, Satria; Nooraya Mohd Tawil, Siti

2016-11-01

This project presents the development of an artificial hand controlled by Electroencephalograph (EEG) signal datasets for prosthetic applications. The EEG signal datasets were used to improve the control of the prosthetic hand compared with the Electromyograph (EMG). The EMG has disadvantages for a person who has not used the muscle for a long time and for persons with age-related degenerative issues. Thus, the EEG datasets were found to be an alternative to EMG. The datasets used in this work were taken from the Brain Computer Interface (BCI) Project. The datasets were already classified for open, close and combined movement operations. They served as input to control the prosthetic hand through an interface system between Microsoft Visual Studio and Arduino. The obtained results reveal the prosthetic hand to be more efficient and faster in response to the EEG datasets with an additional LiPo (Lithium Polymer) battery attached to the prosthetic. Some limitations were also identified in terms of the hand movements and the weight of the prosthetic, and suggestions for improvement are given in this paper. Overall, the objective of this paper was achieved, as the prosthetic hand was found to be feasible in operation utilizing the EEG datasets.
Technical note: An inorganic water chemistry dataset (1972–2011 ...

African Journals Online (AJOL)

A national dataset of inorganic chemical data of surface waters (rivers, lakes, and dams) in South Africa is presented and made freely available. The dataset comprises more than 500 000 complete water analyses from 1972 up to 2011, collected from more than 2 000 sample monitoring stations in South Africa. The dataset ...
Wind and wave dataset for Matara, Sri Lanka

Science.gov (United States)

Luo, Yao; Wang, Dongxiao; Priyadarshana Gamage, Tilak; Zhou, Fenghua; Madusanka Widanage, Charith; Liu, Taiwei

2018-01-01

We present a continuous in situ hydro-meteorology observational dataset from a set of instruments first deployed in December 2012 in the south of Sri Lanka, facing toward the north Indian Ocean. In these waters, simultaneous records of wind and wave data are sparse due to difficulties in deploying measurement instruments, although the area hosts one of the busiest shipping lanes in the world. This study describes the survey, deployment, and measurements of wind and waves, with the aim of offering future users of the dataset the most comprehensive and as much information as possible. This dataset advances our understanding of the nearshore hydrodynamic processes and wave climate, including sea waves and swells, in the north Indian Ocean. Moreover, it is a valuable resource for ocean model parameterization and validation. The archived dataset (Table 1) is examined in detail, including wave data at two locations with water depths of 20 and 10 m comprising synchronous time series of wind, ocean astronomical tide, air pressure, etc. In addition, we use these wave observations to evaluate the ERA-Interim reanalysis product. Based on Buoy 2 data, the swells are the main component of waves year-round, although monsoons can markedly alter the proportion between swell and wind sea. The dataset (Luo et al., 2017) is publicly available from Science Data Bank (https://doi.org/10.11922/sciencedb.447).
Wind and wave dataset for Matara, Sri Lanka

Directory of Open Access Journals (Sweden)

Y. Luo

2018-01-01

We present a continuous in situ hydro-meteorology observational dataset from a set of instruments first deployed in December 2012 in the south of Sri Lanka, facing toward the north Indian Ocean. In these waters, simultaneous records of wind and wave data are sparse due to difficulties in deploying measurement instruments, although the area hosts one of the busiest shipping lanes in the world. This study describes the survey, deployment, and measurements of wind and waves, with the aim of offering future users of the dataset the most comprehensive and as much information as possible. This dataset advances our understanding of the nearshore hydrodynamic processes and wave climate, including sea waves and swells, in the north Indian Ocean. Moreover, it is a valuable resource for ocean model parameterization and validation. The archived dataset (Table 1) is examined in detail, including wave data at two locations with water depths of 20 and 10 m comprising synchronous time series of wind, ocean astronomical tide, air pressure, etc. In addition, we use these wave observations to evaluate the ERA-Interim reanalysis product. Based on Buoy 2 data, the swells are the main component of waves year-round, although monsoons can markedly alter the proportion between swell and wind sea. The dataset (Luo et al., 2017) is publicly available from Science Data Bank (https://doi.org/10.11922/sciencedb.447).
Perturbation in protein expression of the sterile salmonid hybrids between female brook trout Salvelinus fontinalis and male masu salmon Oncorhynchus masou during early spermatogenesis.

Science.gov (United States)

Zheng, Liang; Senda, Yoshie; Abe, Syuiti

2013-05-01

Most males and females of intergeneric hybrid (BM) between female brook trout (Bt) Salvelinus fontinalis and male masu salmon (Ms) Oncorhynchus masou had undeveloped gonads, with abnormal germ cell development shown by histological examination. To understand the cause of this hybrid sterility, expression profiles of testicular proteins in the BM and parental species were examined with 2-DE coupled with MALDI-TOF/TOF MS. Compared with the parental species, more than 60% of differentially expressed protein spots were down-regulated in BM. A total of 16 up-regulated and 48 down-regulated proteins were identified in BM. Up-regulated were transferrin and other somatic cell-predominant proteins, whereas down-regulated were some germ cell-specific proteins such as DEAD box RNA helicase Vasa. Other pronouncedly down-regulated proteins included tubulins and heat shock proteins that are supposed to have roles in spermatogenesis. The present findings suggest direct association of the observed perturbation in protein expression with the failure of spermatogenesis and the sterility in the examined salmonid hybrids. Copyright © 2013 Elsevier B.V. All rights reserved.
Heuristics for Relevancy Ranking of Earth Dataset Search Results

Science.gov (United States)

Lynnes, Christopher; Quinn, Patrick; Norton, James

2016-01-01

As the variety of Earth science datasets increases, science researchers find it more challenging to discover and select the datasets that best fit their needs. The most common way for search providers to address this problem is to rank the datasets returned for a query by their likely relevance to the user. Large web page search engines typically use text matching supplemented with reverse link counts, semantic annotations and user intent modeling. However, this produces uneven results when applied to dataset metadata records simply externalized as a web page. Fortunately, data and search providers have decades of experience in serving data user communities, allowing them to form heuristics that leverage the structure in the metadata together with knowledge about the user community. Some of these heuristics include specific ways of matching the user input to the essential measurements in the dataset and determining overlaps of time range and spatial areas. Heuristics based on the novelty of the datasets can prioritize later, better versions of data over similar predecessors. And knowledge of how different user types and communities use data can be brought to bear in cases where characteristics of the user (discipline, expertise) or their intent (applications, research) can be divined. The Earth Observing System Data and Information System has begun implementing some of these heuristics in the relevancy algorithm of its Common Metadata Repository search engine.
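One of the heuristics named in the abstract above, overlap of time range, can be sketched as follows. This is a minimal illustration under stated assumptions, not the Common Metadata Repository algorithm: the `temporal_overlap` function, the dataset names and their date ranges are all hypothetical, and the score is simply the fraction of the query period that a dataset covers.

```python
from datetime import date


def temporal_overlap(query_start, query_end, data_start, data_end):
    """Fraction of the query time range covered by the dataset (0.0 to 1.0)."""
    query_days = (query_end - query_start).days + 1
    start = max(query_start, data_start)
    end = min(query_end, data_end)
    # A negative day count means the two ranges do not intersect.
    overlap_days = max(0, (end - start).days + 1)
    return overlap_days / query_days


# Hypothetical datasets with their temporal coverage.
datasets = {
    "dataset_a": (date(2000, 1, 1), date(2005, 12, 31)),
    "dataset_b": (date(2004, 1, 1), date(2015, 12, 31)),
    "dataset_c": (date(1990, 1, 1), date(1999, 12, 31)),
}
query = (date(2004, 1, 1), date(2006, 12, 31))

# Rank datasets so that those covering more of the query period come first.
ranked = sorted(
    datasets,
    key=lambda name: temporal_overlap(*query, *datasets[name]),
    reverse=True,
)
print(ranked)
```

In a fuller ranking, a score like this would presumably be combined with other signals such as spatial overlap and text matching; how those are weighted is not described in the abstract.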
QSAR ligand dataset for modelling mutagenicity, genotoxicity, and rodent carcinogenicity

Directory of Open Access Journals (Sweden)

Davy Guan

2018-04-01

Five datasets were constructed from ligand and bioassay result data from the literature. These datasets include bioassay results from the Ames mutagenicity assay, Greenscreen GADD-45a-GFP assay, Syrian Hamster Embryo (SHE) assay, and 2-year rat carcinogenicity assay results. These datasets provide information about chemical mutagenicity, genotoxicity and carcinogenicity.
The Dataset of Countries at Risk of Electoral Violence

OpenAIRE

Birch, Sarah; Muchlinski, David

2017-01-01

Electoral violence is increasingly affecting elections around the world, yet researchers have been limited by a paucity of granular data on this phenomenon. This paper introduces and describes a new dataset of electoral violence – the Dataset of Countries at Risk of Electoral Violence (CREV) – that provides measures of 10 different types of electoral violence across 642 elections held around the globe between 1995 and 2013. The paper provides a detailed account of how and why the dataset was ...
Towards interoperable and reproducible QSAR analyses: Exchange of datasets.

Science.gov (United States)

Spjuth, Ola; Willighagen, Egon L; Guha, Rajarshi; Eklund, Martin; Wikberg, Jarl Es

2010-06-30

QSAR is a widely used method to relate chemical structures to responses or properties based on experimental observations. Much effort has been made to evaluate and validate the statistical modeling in QSAR, but these analyses treat the dataset as fixed. An overlooked but highly important issue is the validation of the setup of the dataset, which comprises addition of chemical structures as well as selection of descriptors and software implementations prior to calculations. This process is hampered by the lack of standards and exchange formats in the field, which makes it virtually impossible to reproduce and validate analyses and drastically constrains collaborations and re-use of data. We present a step towards standardizing QSAR analyses by defining interoperable and reproducible QSAR datasets, consisting of an open XML format (QSAR-ML) which builds on an open and extensible descriptor ontology. The ontology provides an extensible way of uniquely defining descriptors for use in QSAR experiments, and the exchange format supports multiple versioned implementations of these descriptors. Hence, a dataset described by QSAR-ML makes its setup completely reproducible. We also provide a reference implementation as a set of plugins for Bioclipse which simplifies setup of QSAR datasets, and allows for exporting in QSAR-ML as well as old-fashioned CSV formats. The implementation facilitates addition of new descriptor implementations from locally installed software and remote Web services; the latter is demonstrated with REST and XMPP Web services. Standardized QSAR datasets open up new ways to store, query, and exchange data for subsequent analyses. QSAR-ML supports completely reproducible creation of datasets, solving the problems of defining which software components were used and their versions, and the descriptor ontology eliminates confusions regarding descriptors by defining them crisply. This makes it easy to join, extend, combine datasets and hence work collectively, but
Towards interoperable and reproducible QSAR analyses: Exchange of datasets

Directory of Open Access Journals (Sweden)

Spjuth Ola

2010-06-01

Background: QSAR is a widely used method to relate chemical structures to responses or properties based on experimental observations. Much effort has been made to evaluate and validate the statistical modeling in QSAR, but these analyses treat the dataset as fixed. An overlooked but highly important issue is the validation of the setup of the dataset, which comprises addition of chemical structures as well as selection of descriptors and software implementations prior to calculations. This process is hampered by the lack of standards and exchange formats in the field, which makes it virtually impossible to reproduce and validate analyses and drastically constrains collaborations and re-use of data. Results: We present a step towards standardizing QSAR analyses by defining interoperable and reproducible QSAR datasets, consisting of an open XML format (QSAR-ML) which builds on an open and extensible descriptor ontology. The ontology provides an extensible way of uniquely defining descriptors for use in QSAR experiments, and the exchange format supports multiple versioned implementations of these descriptors. Hence, a dataset described by QSAR-ML makes its setup completely reproducible. We also provide a reference implementation as a set of plugins for Bioclipse which simplifies setup of QSAR datasets, and allows for exporting in QSAR-ML as well as old-fashioned CSV formats. The implementation facilitates addition of new descriptor implementations from locally installed software and remote Web services; the latter is demonstrated with REST and XMPP Web services. Conclusions: Standardized QSAR datasets open up new ways to store, query, and exchange data for subsequent analyses. QSAR-ML supports completely reproducible creation of datasets, solving the problems of defining which software components were used and their versions, and the descriptor ontology eliminates confusions regarding descriptors by defining them crisply. This makes it easy to join
VideoWeb Dataset for Multi-camera Activities and Non-verbal Communication

Science.gov (United States)

Denina, Giovanni; Bhanu, Bir; Nguyen, Hoang Thanh; Ding, Chong; Kamal, Ahmed; Ravishankar, Chinya; Roy-Chowdhury, Amit; Ivers, Allen; Varda, Brenda

Human-activity recognition is one of the most challenging problems in computer vision. Researchers from around the world have tried to solve this problem and have come a long way in recognizing simple motions and atomic activities. As the computer vision community heads toward fully recognizing human activities, a challenging and labeled dataset is needed. To respond to that need, we collected a dataset of realistic scenarios in a multi-camera network environment (VideoWeb) involving multiple persons performing dozens of different repetitive and non-repetitive activities. This chapter describes the details of the dataset. We believe that this VideoWeb Activities dataset is unique and it is one of the most challenging datasets available today. The dataset is publicly available online at http://vwdata.ee.ucr.edu/ along with the data annotation.
Toward computational cumulative biology by combining models of biological datasets.

Science.gov (United States)

Faisal, Ali; Peltonen, Jaakko; Georgii, Elisabeth; Rung, Johan; Kaski, Samuel

2014-01-01

A main challenge of data-driven sciences is how to make maximal use of the progressively expanding databases of experimental datasets in order to keep research cumulative. We introduce the idea of a modeling-based dataset retrieval engine designed for relating a researcher's experimental dataset to earlier work in the field. The search is (i) data-driven to enable new findings, going beyond the state of the art of keyword searches in annotations, (ii) modeling-driven, to include both biological knowledge and insights learned from data, and (iii) scalable, as it is accomplished without building one unified grand model of all data. Assuming each dataset has been modeled beforehand, by the researchers or automatically by database managers, we apply a rapidly computable and optimizable combination model to decompose a new dataset into contributions from earlier relevant models. By using the data-driven decomposition, we identify a network of interrelated datasets from a large annotated human gene expression atlas. While tissue type and disease were major driving forces for determining relevant datasets, the found relationships were richer, and the model-based search was more accurate than the keyword search; moreover, it recovered biologically meaningful relationships that are not straightforwardly visible from annotations-for instance, between cells in different developmental stages such as thymocytes and T-cells. Data-driven links and citations matched to a large extent; the data-driven links even uncovered corrections to the publication data, as two of the most linked datasets were not highly cited and turned out to have wrong publication entries in the database.
3DSEM: A 3D microscopy dataset

Directory of Open Access Journals (Sweden)

Ahmad P. Tafti

2016-03-01

The Scanning Electron Microscope (SEM) as a 2D imaging instrument has been widely used in many scientific disciplines including biological, mechanical, and materials sciences to determine the surface attributes of microscopic objects. However, the SEM micrographs still remain 2D images. To effectively measure and visualize the surface properties, we need to truly restore the 3D shape model from 2D SEM images. Having 3D surfaces would provide anatomic shape of micro-samples which allows for quantitative measurements and informative visualization of the specimens being investigated. The 3DSEM is a dataset for 3D microscopy vision which is freely available at [1] for any academic, educational, and research purposes. The dataset includes both 2D images and 3D reconstructed surfaces of several real microscopic samples. Keywords: 3D microscopy dataset, 3D microscopy vision, 3D SEM surface reconstruction, Scanning Electron Microscope (SEM)

Active Semisupervised Clustering Algorithm with Label Propagation for Imbalanced and Multidensity Datasets

Directory of Open Access Journals (Sweden)

Mingwei Leng

2013-01-01

Full Text Available The accuracy of most of the existing semisupervised clustering algorithms based on small size of labeled dataset is low when dealing with multidensity and imbalanced datasets, and labeling data is quite expensive and time consuming in many real-world applications. This paper focuses on active data selection and semisupervised clustering algorithm in multidensity and imbalanced datasets and proposes an active semisupervised clustering algorithm. The proposed algorithm uses an active mechanism for data selection to minimize the amount of labeled data, and it utilizes multithreshold to expand labeled datasets on multidensity and imbalanced datasets. Three standard datasets and one synthetic dataset are used to demonstrate the proposed algorithm, and the experimental results show that the proposed semisupervised clustering algorithm has a higher accuracy and a more stable performance in comparison to other clustering and semisupervised clustering algorithms, especially when the datasets are multidensity and imbalanced.
A reanalysis dataset of the South China Sea

Science.gov (United States)

Zeng, Xuezhi; Peng, Shiqiu; Li, Zhijin; Qi, Yiquan; Chen, Rongyu

2014-01-01

Ocean reanalysis provides a temporally continuous and spatially gridded four-dimensional estimate of the ocean state for a better understanding of the ocean dynamics and its spatial/temporal variability. Here we present a 19-year (1992–2010) high-resolution ocean reanalysis dataset of the upper ocean in the South China Sea (SCS) produced from an ocean data assimilation system. A wide variety of observations, including in-situ temperature/salinity profiles, ship-measured and satellite-derived sea surface temperatures, and sea surface height anomalies from satellite altimetry, are assimilated into the outputs of an ocean general circulation model using a multi-scale incremental three-dimensional variational data assimilation scheme, yielding a daily high-resolution reanalysis dataset of the SCS. Comparisons between the reanalysis and independent observations support the reliability of the dataset. The presented dataset provides the research community of the SCS an important data source for studying the thermodynamic processes of the ocean circulation and meso-scale features in the SCS, including their spatial and temporal variability. PMID:25977803
A dataset of forest biomass structure for Eurasia.

Science.gov (United States)

Schepaschenko, Dmitry; Shvidenko, Anatoly; Usoltsev, Vladimir; Lakyda, Petro; Luo, Yunjian; Vasylyshyn, Roman; Lakyda, Ivan; Myklush, Yuriy; See, Linda; McCallum, Ian; Fritz, Steffen; Kraxner, Florian; Obersteiner, Michael

2017-05-16

The most comprehensive dataset of in situ destructive sampling measurements of forest biomass in Eurasia has been compiled from a combination of experiments undertaken by the authors and from scientific publications. Biomass is reported as four components: live trees (stem, bark, branches, foliage, roots); understory (above- and below ground); green forest floor (above- and below ground); and coarse woody debris (snags, logs, dead branches of living trees and dead roots), consisting of 10,351 unique records of sample plots and 9,613 sample trees from ca 1,200 experiments for the period 1930-2014 where there is overlap between these two datasets. The dataset also contains other forest stand parameters such as tree species composition, average age, tree height, growing stock volume, etc., when available. Such a dataset can be used for the development of models of biomass structure, biomass extension factors, change detection in biomass structure, investigations into biodiversity and species distribution and the biodiversity-productivity relationship, as well as the assessment of the carbon pool and its dynamics, among many others.
A Dataset for Visual Navigation with Neuromorphic Methods

Directory of Open Access Journals (Sweden)

Francisco Barranco

2016-02-01

Full Text Available Standardized benchmarks in Computer Vision have greatly contributed to the advance of approaches to many problems in the field. If we want to enhance the visibility of event-driven vision and increase its impact, we will need benchmarks that allow comparison among different neuromorphic methods as well as comparison to conventional Computer Vision approaches. We present datasets to evaluate the accuracy of frame-free and frame-based approaches for tasks of visual navigation. Similar to conventional Computer Vision datasets, we provide synthetic and real scenes, with the synthetic data created with graphics packages, and the real data recorded using a mobile robotic platform carrying a dynamic and active pixel vision sensor (DAVIS) and an RGB+Depth sensor. For both datasets the cameras move with a rigid motion in a static scene, and the data includes the images, events, optic flow, 3D camera motion, and the depth of the scene, along with calibration procedures. Finally, we also provide simulated event data generated synthetically from well-known frame-based optical flow datasets.
Sparse Group Penalized Integrative Analysis of Multiple Cancer Prognosis Datasets

Science.gov (United States)

Liu, Jin; Huang, Jian; Xie, Yang; Ma, Shuangge

2014-01-01

SUMMARY In cancer research, high-throughput profiling studies have been extensively conducted, searching for markers associated with prognosis. Because of the “large d, small n” characteristic, results generated from the analysis of a single dataset can be unsatisfactory. Recent studies have shown that integrative analysis, which simultaneously analyzes multiple datasets, can be more effective than single-dataset analysis and classic meta-analysis. In most existing integrative analyses, the homogeneity model has been assumed, which postulates that different datasets share the same set of markers. Several approaches have been designed to reinforce this assumption. In practice, different datasets may differ in terms of patient selection criteria, profiling techniques, and many other aspects. Such differences may make the homogeneity model too restrictive. In this study, we assume the heterogeneity model, under which different datasets are allowed to have different sets of markers. With multiple cancer prognosis datasets, we adopt the AFT (accelerated failure time) model to describe survival. This model may have the lowest computational cost among popular semiparametric survival models. For marker selection, we adopt a sparse group MCP (minimax concave penalty) approach. This approach has an intuitive formulation and can be computed using an effective group coordinate descent algorithm. A simulation study shows that it outperforms the existing approaches under both the homogeneity and heterogeneity models. Data analysis further demonstrates the merit of the heterogeneity model and the proposed approach. PMID:23938111
Three dwarf spiders new to Germany from the Bavarian Alpine region (Araneae: Linyphiidae, Erigoninae)

Directory of Open Access Journals (Sweden)

Muster, Christoph

2001-10-01

Full Text Available During the survey of epigeous spider communities in the Bavarian Alps (Germany, Upper Bavaria), three species of Erigoninae were recorded from Germany for the first time. Micrargus alpinus and Silometopus rosemariae are endemic species of the Alps; Panamomops palmgreni is endemic to the Alpine mountain system. For each species, present knowledge on distribution, habitat and phenology is summarized. As Micrargus alpinus was described in 1997, faunistic and ecological data are still very poor. Niche differentiation between the closely related species of the Micrargus herbigradus-group is discussed.
New insight into the spawning behavior of lake trout, Salvelinus namaycush, from a recovering population in the Laurentian Great Lakes

Science.gov (United States)

Binder, Thomas R.; Thompson, Henry T.; Muir, Andrew M.; Riley, Stephen C.; Marsden, J. Ellen; Bronte, Charles R.; Krueger, Charles C.

2015-01-01

Spawning behavior of lake trout, Salvelinus namaycush, is poorly understood, relative to stream-dwelling salmonines. Underwater video records of spawning in a recovering population from the Drummond Island Refuge (Lake Huron) represent the first reported direct observations of lake trout spawning in the Laurentian Great Lakes. These observations provide new insight into lake trout spawning behavior and expand the current conceptual model. Lake trout spawning consisted of at least four distinct behaviors: hovering, traveling, sinking, and gamete release. Hovering is a new courtship behavior that has not been previously described. The apparent concentration of hovering near the margin of the spawning grounds suggests that courtship and mate selection might be isolated from the spawning act (i.e., traveling, sinking, and gamete release). Moreover, we interpret jockeying for position displayed by males during traveling as a unique form of male-male competition that likely evolved in concert with the switch from redd-building to itinerant spawning in lake trout. Unlike previous models, which suggested that intra-sexual competition and mate selection do not occur in lake trout, our model includes both and is therefore consistent with evolutionary theory, given that the sex ratio on spawning grounds is skewed heavily towards males. The model presented in this paper is intended as a working hypothesis, and further revision may become necessary as we gain a more complete understanding of lake trout spawning behavior.
An Analysis on Better Testing than Training Performances on the Iris Dataset

NARCIS (Netherlands)

Schutten, Marten; Wiering, Marco

2016-01-01

The Iris dataset is a well known dataset containing information on three different types of Iris flowers. A typical and popular method for solving classification problems on datasets such as the Iris set is the support vector machine (SVM). In order to do so the dataset is separated in a set used
Exposure-related effects of Zequanox on juvenile lake sturgeon (Acipenser fulvescens) and lake trout (Salvelinus namaycush)

Science.gov (United States)

Luoma, James A.; Severson, Todd J.; Wise, Jeremy K.; Barbour, Matthew

2018-01-01

The environmental fate, persistence, and non-target animal impacts of traditional molluscicides for zebra, Dreissena polymorpha, and quagga, D. bugensis, mussel control led to the development of the biomolluscicide Zequanox. Although previous research has demonstrated the specificity of Zequanox, one study indicated sensitivity of salmonids and lake sturgeon, Acipenser fulvescens, following non-label compliant exposures to Zequanox. This study was conducted to evaluate sublethal and lethal impacts of Zequanox exposure on juvenile lake sturgeon and lake trout, Salvelinus namaycush, following applications that were conducted in a manner consistent with the Zequanox product label. Fish were exposed to 50 or 100 mg/L of Zequanox as active ingredient for 8 h and then held for 33 d to evaluate latent impacts. No acute mortality was observed in either species; however, significant latent mortality (P < 0.01, df = 9; 46.2%) was observed in lake trout that were exposed to the highest dose of Zequanox. Statistically significant (P < 0.03, df = 9), but biologically minimal differences were observed in the weight (range 20.17 to 21.49 g) of surviving lake sturgeon at the termination of the 33 d post-exposure observation period. Statistically significant (P < 0.05, df = 9) and biologically considerable differences were observed in the weight (range 6.19 to 9.55 g) of surviving lake trout at the termination of the 33 d post-exposure observation period. Histologic evaluation of lake trout gastrointestinal tracts suggests that the mode of action in lake trout is different from the mode of action that induces zebra and quagga mussel mortality. Further research could determine the sensitivity of other salmonid species to Zequanox and determine if native fish will avoid Zequanox treated water.
Interactive visualization and analysis of multimodal datasets for surgical applications.

Science.gov (United States)

Kirmizibayrak, Can; Yim, Yeny; Wakid, Mike; Hahn, James

2012-12-01

Surgeons use information from multiple sources when making surgical decisions. These include volumetric datasets (such as CT, PET, MRI, and their variants), 2D datasets (such as endoscopic videos), and vector-valued datasets (such as computer simulations). Presenting all the information to the user in an effective manner is a challenging problem. In this paper, we present a visualization approach that displays the information from various sources in a single coherent view. The system allows the user to explore and manipulate volumetric datasets, display analysis of dataset values in local regions, combine 2D and 3D imaging modalities and display results of vector-based computer simulations. Several interaction methods are discussed: in addition to traditional interfaces including mouse and trackers, gesture-based natural interaction methods are shown to control these visualizations with real-time performance. An example of a medical application (medialization laryngoplasty) is presented to demonstrate how the combination of different modalities can be used in a surgical setting with our approach.
Something From Nothing (There): Collecting Global IPv6 Datasets from DNS

NARCIS (Netherlands)

Fiebig, T.; Borgolte, Kevin; Hao, Shuang; Kruegel, Christopher; Vigna, Giovanny; Spring, Neil; Riley, George F.

2017-01-01

Current large-scale IPv6 studies mostly rely on non-public datasets, as most public datasets are domain specific. For instance, traceroute-based datasets are biased toward network equipment. In this paper, we present a new methodology to collect IPv6 address datasets that does not require access to
Automatic processing of multimodal tomography datasets.

Science.gov (United States)

Parsons, Aaron D; Price, Stephen W T; Wadeson, Nicola; Basham, Mark; Beale, Andrew M; Ashton, Alun W; Mosselmans, J Frederick W; Quinn, Paul D

2017-01-01

With the development of fourth-generation high-brightness synchrotrons on the horizon, the already large volume of data that will be collected on imaging and mapping beamlines is set to increase by orders of magnitude. As such, an easy and accessible way of dealing with such large datasets as quickly as possible is required in order to be able to address the core scientific problems during the experimental data collection. Savu is an accessible and flexible big data processing framework that is able to deal with both the variety and the volume of data of multimodal and multidimensional scientific datasets output such as those from chemical tomography experiments on the I18 microfocus scanning beamline at Diamond Light Source.
GUDM: Automatic Generation of Unified Datasets for Learning and Reasoning in Healthcare.

Science.gov (United States)

Ali, Rahman; Siddiqi, Muhammad Hameed; Idris, Muhammad; Ali, Taqdir; Hussain, Shujaat; Huh, Eui-Nam; Kang, Byeong Ho; Lee, Sungyoung

2015-07-02

A wide array of biomedical data are generated and made available to healthcare experts. However, due to the diverse nature of the data, it is difficult to predict outcomes from them. It is therefore necessary to combine these diverse data sources into a single unified dataset. This paper proposes a global unified data model (GUDM) to provide a global unified data structure for all data sources and generate a unified dataset by a "data modeler" tool. The proposed tool implements a user-centric, priority-based approach which can easily resolve the problems of unified data modeling and overlapping attributes across multiple datasets. The tool is illustrated using sample diabetes mellitus data. The diverse data sources to generate the unified dataset for diabetes mellitus include clinical trial information, a social media interaction dataset and physical activity data collected using different sensors. To realize the significance of the unified dataset, we adopted a well-known rough set theory based rules creation process to create rules from the unified dataset. The evaluation of the tool on six different sets of locally created diverse datasets shows that the tool, on average, reduces the time effort of the experts and knowledge engineer by 94.1% while creating unified datasets.
Role of cortisol in stocking density-induced changes in growth and metabolism of brook charr (Salvelinus fontinalis)

Energy Technology Data Exchange (ETDEWEB)

Vijayan, M.M.

1990-01-01

Brook charr (Salvelinus fontinalis) held at high stocking density (SD) (120 kg/m³) had a lower growth rate, food consumption and food conversion efficiency, and plasma thyroxine (T4) than those held at low SD (30 kg/m³). SD had no effect on plasma triiodothyronine (T3) levels. Plasma cortisol levels in fish maintained at high SD were variable, being either lower with increased SD or not different from the low SD group. Head kidney tissue (containing the interrenal cells) preparations of brook charr held at high SD showed a higher spontaneous cortisol secretion rate. There was no difference in the clearance rate of (³H)-cortisol from plasma, but liver from fish held at high SD showed higher cortisol uptake and catabolism, indicative of altered hepatic metabolic activity. High SD appears to alter the energy metabolism of brook charr. This was evident from significant changes between densities in levels of metabolites (plasma glucose and liver glycogen) and activities of key hepatic enzymes (PFK, HK, FBPase, G6PDH, HOAD, GK and G3PDH). These results suggest that high SD has the effect of mobilizing triglycerides, and promoting gluconeogenesis from glycerol, but has little effect on protein metabolism. When cortisol was administered to brook charr in the form of slow release intraperitoneal implants, the metabolic changes evident were similar to those observed in fish held at high SD. There was no consistent increase in plasma cortisol levels of cortisol-implanted fish over a 90 day period. Nevertheless, there were significant effects, apparently cortisol-related, on certain metabolite levels (plasma glycerol, plasma glucose, hepatic glycogen), and activities of key hepatic enzymes.
A Research Graph dataset for connecting research data repositories using RD-Switchboard.

Science.gov (United States)

Aryani, Amir; Poblet, Marta; Unsworth, Kathryn; Wang, Jingbo; Evans, Ben; Devaraju, Anusuriya; Hausstein, Brigitte; Klas, Claus-Peter; Zapilko, Benjamin; Kaplun, Samuele

2018-05-29

This paper describes the open access graph dataset that shows the connections between Dryad, CERN, ANDS and other international data repositories to publications and grants across multiple research data infrastructures. The graph dataset was created using the Research Graph data model and the Research Data Switchboard (RD-Switchboard), a collaborative project by the Research Data Alliance DDRI Working Group (DDRI WG) with the aim to discover and connect the related research datasets based on publication co-authorship or jointly funded grants. The graph dataset allows researchers to trace and follow the paths to understanding a body of work. By mapping the links between research datasets and related resources, the graph dataset improves both their discovery and visibility, while avoiding duplicate efforts in data creation. Ultimately, the linked datasets may spur novel ideas, facilitate reproducibility and re-use in new applications, stimulate combinatorial creativity, and foster collaborations across institutions.
Evolution and origin of sympatric shallow-water morphotypes of Lake Trout, Salvelinus namaycush, in Canada's Great Bear Lake.

Science.gov (United States)

Harris, L N; Chavarie, L; Bajno, R; Howland, K L; Wiley, S H; Tonn, W M; Taylor, E B

2015-01-01

Range expansion in north-temperate fishes subsequent to the retreat of the Wisconsinan glaciers has resulted in the rapid colonization of previously unexploited, heterogeneous habitats and, in many situations, secondary contact among conspecific lineages that were once previously isolated. Such ecological opportunity coupled with reduced competition likely promoted morphological and genetic differentiation within and among post-glacial fish populations. Discrete morphological forms existing in sympatry, for example, have now been described in many species, yet few studies have directly assessed the association between morphological and genetic variation. Morphotypes of Lake Trout, Salvelinus namaycush, are found in several large-lake systems including Great Bear Lake (GBL), Northwest Territories, Canada, where several shallow-water forms are known. Here, we assess microsatellite and mitochondrial DNA variation among four morphotypes of Lake Trout from the five distinct arms of GBL, and also from locations outside of this system to evaluate several hypotheses concerning the evolution of morphological variation in this species. Our data indicate that morphotypes of Lake Trout from GBL are genetically differentiated from one another, yet the morphotypes are still genetically more similar to one another compared with populations from outside of this system. Furthermore, our data suggest that Lake Trout colonized GBL following dispersal from a single glacial refugium (the Mississippian) and support an intra-lake model of divergence. Overall, our study provides insights into the origins of morphological and genetic variation in post-glacial populations of fishes and provides benchmarks important for monitoring Lake Trout biodiversity in a region thought to be disproportionately susceptible to impacts from climate change.
Process mining in oncology using the MIMIC-III dataset

Science.gov (United States)

Prima Kurniati, Angelina; Hall, Geoff; Hogg, David; Johnson, Owen

2018-03-01

Process mining is a data analytics approach to discover and analyse process models based on the real activities captured in information systems. There is a growing body of literature on process mining in healthcare, including oncology, the study of cancer. In earlier work we found 37 peer-reviewed papers describing process mining research in oncology with a regular complaint being the limited availability and accessibility of datasets with suitable information for process mining. Publicly available datasets are one option and this paper describes the potential to use MIMIC-III, for process mining in oncology. MIMIC-III is a large open access dataset of de-identified patient records. There are 134 publications listed as using the MIMIC dataset, but none of them have used process mining. The MIMIC-III dataset has 16 event tables which are potentially useful for process mining and this paper demonstrates the opportunities to use MIMIC-III for process mining in oncology. Our research applied the L* lifecycle method to provide a worked example showing how process mining can be used to analyse cancer pathways. The results and data quality limitations are discussed along with opportunities for further work and reflection on the value of MIMIC-III for reproducible process mining research.
Veterans Affairs Suicide Prevention Synthetic Dataset

Data.gov (United States)

Department of Veterans Affairs — The VA's Veteran Health Administration, in support of the Open Data Initiative, is providing the Veterans Affairs Suicide Prevention Synthetic Dataset (VASPSD). The...
Identification of Nematodirus species (Nematoda: Molineidae) from wild ruminants in Italy using ribosomal DNA markers.

Science.gov (United States)

Gasser, R B; Rossi, L; Zhu, X

1999-11-01

The sequence of the second internal transcribed spacer of ribosomal DNA was determined for four species of Nematodirus (Nematodirus rupicaprae, Nematodirus oiratianus, Nematodirus davtiani alpinus and Nematodirus europaeus) from roe deer or alpine chamois. The second internal transcribed spacer of the four species varied in length from 228 to 236 bp, and the G + C contents ranged from 41 to 44%. While no intraspecific sequence variation was detected among multiple samples representing three of the taxa, sequence differences of 5.9-9.7% were detected among the four species. Nematodirus davtiani alpinus and N. rupicaprae were genetically most similar (94.1%), followed by N. oiratianus, N. europaeus and N. rupicaprae (91.1-91.5%), whereas N. oiratianus was genetically most different from N. davtiani alpinus. The interspecific sequence differences were exploited for the delineation of the four species by PCR-based restriction fragment length polymorphism (using two enzymes) and single-strand conformation polymorphism. The results have implications for diagnosis, epidemiology and for studying the systematics of the Nematodirinae.
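
The two summary statistics quoted in this abstract, G + C content and pairwise percent sequence difference, can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names `gc_content` and `percent_difference` are hypothetical, the sequences are made-up toy strings rather than real ITS-2 data, and the inputs are assumed to be already aligned to equal length.

```python
def gc_content(seq: str) -> float:
    """Return the G + C percentage of a DNA sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def percent_difference(a: str, b: str) -> float:
    """Percent of mismatching columns between two pre-aligned sequences.

    A gap ('-') opposite a base counts as a difference; columns where
    both sequences have a gap are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper())
               if not (x == "-" and y == "-")]
    mismatches = sum(x != y for x, y in columns)
    return 100.0 * mismatches / len(columns)


# Toy sequences for illustration only (assumption: not real data).
seq_a = "ATGCGTTAGC"
seq_b = "ATGCATTAGC"

print(gc_content(seq_a))                 # 5 of 10 bases are G or C
print(percent_difference(seq_a, seq_b))  # 1 of 10 columns differs
```

Percent similarity is simply 100 minus the percent difference, which is how the 94.1% similarity and the 5.9% lower bound on sequence difference in the abstract relate to each other.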
SAR image classification based on CNN in real and simulation datasets

Science.gov (United States)

Peng, Lijiang; Liu, Ming; Liu, Xiaohua; Dong, Liquan; Hui, Mei; Zhao, Yuejin

2018-04-01

Convolutional neural networks (CNNs) have achieved great success in image classification tasks. Even in the field of synthetic aperture radar automatic target recognition (SAR-ATR), state-of-the-art results have been obtained by learning deep representations of features on the MSTAR benchmark. However, the raw MSTAR data have shortcomings for training a SAR-ATR model because of the high similarity in background among the SAR images of each kind. This indicates that the CNN would learn the hierarchies of features of the backgrounds as well as of the targets. To validate the influence of the background, additional SAR image datasets have been made which contain simulated SAR images of 10 manufactured targets such as tanks and fighter aircraft, with the backgrounds of the simulated SAR images sampled from the whole original MSTAR data. The simulation datasets comprise one dataset in which the backgrounds of each kind of image correspond to one kind of background of MSTAR targets or clutter, and one dataset in which each image has a random background drawn from all MSTAR targets or clutter. In addition, mixed datasets of MSTAR and simulation data were made for use in the experiments. The CNN architecture proposed in this paper is trained on all datasets mentioned above. The experimental results show that the architecture achieves high performance on all datasets even when the backgrounds of the images are miscellaneous, which indicates that the architecture can learn a good representation of the targets despite drastic changes in background.

On sample size and different interpretations of snow stability datasets

Science.gov (United States)

Schirmer, M.; Mitterer, C.; Schweizer, J.

2009-04-01

Interpretations of snow stability variations need an assessment of the stability itself, independent of the scale investigated in the study. Studies on stability variations at a regional scale have often chosen stability tests such as the Rutschblock test or combinations of various tests in order to detect differences in aspect and elevation. The question arose: ‘how capable are such stability interpretations in drawing conclusions?’ There are at least three possible error sources: (i) the variance of the stability test itself; (ii) the stability variance at an underlying slope scale, and (iii) that the stability interpretation might not be directly related to the probability of skier triggering. Various stability interpretations have been proposed in the past that provide partly different results. We compared a subjective one based on expert knowledge with a more objective one based on a measure derived from comparing skier-triggered slopes vs. slopes that have been skied but not triggered. In this study, the uncertainties are discussed and their effects on regional scale stability variations will be quantified in a pragmatic way. An existing dataset with very large sample sizes was revisited. This dataset contained the variance of stability at a regional scale for several situations. The stability in this dataset was determined using the subjective interpretation scheme based on expert knowledge. The question to be answered was how many measurements were needed to obtain similar results (mainly stability differences in aspect or elevation) as with the complete dataset. The optimal sample size was obtained in several ways: (i) assuming a nominal data scale the sample size was determined with a given test, significance level and power, and by calculating the mean and standard deviation of the complete dataset. With this method it can also be determined if the complete dataset consists of an appropriate sample size. (ii) Smaller subsets were created with similar
Really big data: Processing and analysis of large datasets

Science.gov (United States)

Modern animal breeding datasets are large and getting larger, due in part to the recent availability of DNA data for many animals. Computational methods for efficiently storing and analyzing those data are under development. The amount of storage space required for such datasets is increasing rapidl...
A robust dataset-agnostic heart disease classifier from Phonocardiogram.

Science.gov (United States)

Banerjee, Rohan; Dutta Choudhury, Anirban; Deshpande, Parijat; Bhattacharya, Sakyajit; Pal, Arpan; Mandana, K M

2017-07-01

Automatic classification of normal and abnormal heart sounds is a popular area of research. However, building a robust algorithm unaffected by signal quality and patient demography is a challenge. In this paper we have analysed a wide list of Phonocardiogram (PCG) features in time and frequency domain along with morphological and statistical features to construct a robust and discriminative feature set for dataset-agnostic classification of normal and cardiac patients. The large and open access database, made available in Physionet 2016 challenge was used for feature selection, internal validation and creation of training models. A second dataset of 41 PCG segments, collected using our in-house smart phone based digital stethoscope from an Indian hospital was used for performance evaluation. Our proposed methodology yielded sensitivity and specificity scores of 0.76 and 0.75 respectively on the test dataset in classifying cardiovascular diseases. The methodology also outperformed three popular prior art approaches, when applied on the same dataset.
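
For readers unfamiliar with the two scores reported above, the following Python sketch shows how sensitivity and specificity are derived from binary predictions. It is an illustration under stated assumptions, not the authors' evaluation code: the function name `sensitivity_specificity` is hypothetical, labels are assumed to be coded as 1 (abnormal) and 0 (normal), and the example labels are invented.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Assumed coding: 1 = abnormal (positive class), 0 = normal (negative class).
    Both classes must be present in y_true, otherwise a division by zero occurs.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # abnormal, flagged abnormal
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # abnormal, missed
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # normal, flagged normal
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # normal, false alarm
    sensitivity = tp / (tp + fn)  # fraction of abnormal recordings detected
    specificity = tn / (tn + fp)  # fraction of normal recordings cleared
    return sensitivity, specificity


# Invented labels for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
print(sensitivity_specificity(y_true, y_pred))
```

Reporting both numbers matters on small or imbalanced test sets, where plain accuracy can hide a classifier that favours one class.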
A Comparative Analysis of Classification Algorithms on Diverse Datasets

Directory of Open Access Journals (Sweden)

M. Alghobiri

2018-04-01

Full Text Available Data mining involves the computational process to find patterns from large data sets. Classification, one of the main domains of data mining, involves known structure generalizing to apply to a new dataset and predict its class. There are various classification algorithms being used to classify various data sets. They are based on different methods such as probability, decision tree, neural network, nearest neighbor, boolean and fuzzy logic, kernel-based etc. In this paper, we apply three diverse classification algorithms on ten datasets. The datasets have been selected based on their size and/or number and nature of attributes. Results have been discussed using some performance evaluation measures like precision, accuracy, F-measure, Kappa statistics, mean absolute error, relative absolute error, ROC Area etc. Comparative analysis has been carried out using the performance evaluation measures of accuracy, precision, and F-measure. We specify features and limitations of the classification algorithms for the diverse nature datasets.
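
Of the evaluation measures listed in this abstract, the Kappa statistic is perhaps the least self-explanatory. A minimal Python sketch of Cohen's kappa follows; it assumes that "Kappa statistics" here means Cohen's kappa between true and predicted class labels, which the abstract does not state explicitly. The function name `cohen_kappa` and the example labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between two labelings, corrected for chance.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the agreement expected if both labelings were independent.
    Undefined (division by zero) when p_e == 1, i.e. a single class throughout.
    """
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Counter returns 0 for classes that never appear in the predictions.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (observed - expected) / (1 - expected)


# Invented labels for illustration only.
y_true = ["a", "a", "b", "b"]
y_pred = ["a", "a", "b", "a"]
print(cohen_kappa(y_true, y_pred))
```

In this toy case the observed agreement is 0.75 and the chance agreement is 0.5, so kappa is about 0.5: the classifier recovers half of the agreement that was not already expected by chance.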
An assessment of differences in gridded precipitation datasets in complex terrain

Science.gov (United States)

Henn, Brian; Newman, Andrew J.; Livneh, Ben; Daly, Christopher; Lundquist, Jessica D.

2018-01-01

Hydrologic modeling and other geophysical applications are sensitive to precipitation forcing data quality, and there are known challenges in spatially distributing gauge-based precipitation over complex terrain. We conduct a comparison of six high-resolution, daily and monthly gridded precipitation datasets over the Western United States. We compare the long-term average spatial patterns, and interannual variability of water-year total precipitation, as well as multi-year trends in precipitation across the datasets. We find that the greatest absolute differences among datasets occur in high-elevation areas and in the maritime mountain ranges of the Western United States, while the greatest percent differences among datasets relative to annual total precipitation occur in arid and rain-shadowed areas. Differences between datasets in some high-elevation areas exceed 200 mm yr⁻¹ on average, and relative differences range from 5 to 60% across the Western United States. In areas of high topographic relief, true uncertainties and biases are likely higher than the differences among the datasets; we present evidence of this based on streamflow observations. Precipitation trends in the datasets differ in magnitude and sign at smaller scales, and are sensitive to how temporal inhomogeneities in the underlying precipitation gauge data are handled.
Strontium removal jar test dataset for all figures and tables.

Data.gov (United States)

U.S. Environmental Protection Agency — The datasets were used to generate data to demonstrate strontium removal under various water quality and treatment conditions. This dataset is associated with the...
Development of a SPARK Training Dataset

Energy Technology Data Exchange (ETDEWEB)

Sayre, Amanda M. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Olson, Jarrod R. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

2015-03-01

In its first five years, the National Nuclear Security Administration’s (NNSA) Next Generation Safeguards Initiative (NGSI) sponsored more than 400 undergraduate, graduate, and post-doctoral students in internships and research positions (Wyse 2012). In the past seven years, the NGSI program has, and continues to produce a large body of scientific, technical, and policy work in targeted core safeguards capabilities and human capital development activities. Not only does the NGSI program carry out activities across multiple disciplines, but also across all U.S. Department of Energy (DOE)/NNSA locations in the United States. However, products are not readily shared among disciplines and across locations, nor are they archived in a comprehensive library. Rather, knowledge of NGSI-produced literature is localized to the researchers, clients, and internal laboratory/facility publication systems such as the Electronic Records and Information Capture Architecture (ERICA) at the Pacific Northwest National Laboratory (PNNL). There is also no incorporated way of analyzing existing NGSI literature to determine whether the larger NGSI program is achieving its core safeguards capabilities and activities. A complete library of NGSI literature could prove beneficial to a cohesive, sustainable, and more economical NGSI program. The Safeguards Platform for Automated Retrieval of Knowledge (SPARK) has been developed to be a knowledge storage, retrieval, and analysis capability to capture safeguards knowledge to exist beyond the lifespan of NGSI. During the development process, it was necessary to build a SPARK training dataset (a corpus of documents) for initial entry into the system and for demonstration purposes. We manipulated these data to gain new information about the breadth of NGSI publications, and they evaluated the science-policy interface at PNNL as a practical demonstration of SPARK’s intended analysis capability. The analysis demonstration sought to answer the
Benchmarking of Typical Meteorological Year datasets dedicated to Concentrated-PV systems

Science.gov (United States)

Realpe, Ana Maria; Vernay, Christophe; Pitaval, Sébastien; Blanc, Philippe; Wald, Lucien; Lenoir, Camille

2016-04-01

Accurate analysis of meteorological and pyranometric data for long-term analysis is the basis of decision-making for banks and investors regarding solar energy conversion systems. This has led to the development of methodologies for the generation of Typical Meteorological Year (TMY) datasets. The most used method for solar energy conversion systems was proposed in 1978 by the Sandia Laboratory (Hall et al., 1978), considering a specific weighted combination of different meteorological variables, notably global, diffuse horizontal and direct normal irradiances, air temperature, wind speed, and relative humidity. In 2012, a new approach was proposed in the framework of the European project FP7 ENDORSE. It introduced the concept of a "driver" that is defined by the user as an explicit function of the relevant pyranometric and meteorological variables, to improve the representativeness of the TMY datasets with respect to the specific solar energy conversion system of interest. The present study aims at comparing and benchmarking different TMY datasets considering a specific Concentrated-PV (CPV) system as the solar energy conversion system of interest. Using long-term (15+ years) time-series of high quality meteorological and pyranometric ground measurements, three types of TMY datasets were generated by the following methods: the Sandia method, a simplified driver with DNI as the only representative variable, and a more sophisticated driver. The latter takes into account the sensitivities of the CPV system with respect to the spectral distribution of the solar irradiance and wind speed. Different TMY datasets from the three methods have been generated considering different numbers of years in the historical dataset, ranging from 5 to 15 years. The comparisons and benchmarking of these TMY datasets are conducted considering the long-term time series of simulated CPV electric production as a reference. The results of this benchmarking clearly show that the Sandia method is not
The diet of introduced brook trout (Salvelinus fontinalis; Mitchill, 1814) in an alpine area and a literature review on its feeding ecology

Directory of Open Access Journals (Sweden)

Rocco Tiberti

2016-05-01

Introduced fish are a major threat to high-altitude aquatic habitats, and Salvelinus fontinalis has been widely used throughout the Alps for stocking lakes and rivers. Understanding its feeding ecology is a basic but essential tool for interpreting its impact. To assess which factors determine the diet of S. fontinalis, we analyzed more than 500 stomachs from several introduced populations in the Gran Paradiso National Park (GPNP, Western Italian Alps) and measured the availability of several prey groups (zooplankton, aquatic invertebrates, terrestrial invertebrates). We complemented the study with a short but exhaustive literature review on the feeding ecology of S. fontinalis. In general, the food composition reflected the availability of prey, confirming that S. fontinalis is an opportunistic predator, and was influenced by habitat type (stream vs. lake), fish size, and seasonality. The results are discussed in the light of the existing literature on the feeding ecology and ecological impact of S. fontinalis. Large benthonic insects account for a substantial part of the diet of stream-dwelling brook trout, while they are almost absent from both the diet and the prey species pool of lake-dwelling brook trout, probably reflecting a stronger ecological impact in the lakes.
SIAM 2007 Text Mining Competition dataset

Data.gov (United States)

National Aeronautics and Space Administration — Subject Area: Text Mining Description: This is the dataset used for the SIAM 2007 Text Mining competition. This competition focused on developing text mining...
Environmental Dataset Gateway (EDG) REST Interface

Data.gov (United States)

U.S. Environmental Protection Agency — Use the Environmental Dataset Gateway (EDG) to find and access EPA's environmental resources. Many options are available for easily reusing EDG content in other...
Geoseq: a tool for dissecting deep-sequencing datasets

Directory of Open Access Journals (Sweden)

Homann Robert

2010-10-01

Background: Datasets generated on deep-sequencing platforms have been deposited in various public repositories such as the Gene Expression Omnibus (GEO), the Sequence Read Archive (SRA) hosted by the NCBI, or the DNA Data Bank of Japan (DDBJ). Despite being rich data sources, they have not been used much due to the difficulty of locating and analyzing datasets of interest. Results: Geoseq (http://geoseq.mssm.edu) provides a new method of analyzing short reads from deep-sequencing experiments. Instead of mapping the reads to reference genomes or sequences, Geoseq maps a reference sequence against the sequencing data. It is web-based and holds pre-computed data from public libraries. The analysis reduces the input sequence to tiles and measures the coverage of each tile in a sequence library through the use of suffix arrays. The user can upload custom target sequences or use gene/miRNA names for the search and get back results as plots and spreadsheet files. Geoseq organizes the public sequencing data using a controlled vocabulary, allowing identification of relevant libraries by organism, tissue and type of experiment. Conclusions: Analysis of small sets of sequences against deep-sequencing datasets, as well as identification of public datasets of interest, is simplified by Geoseq. We applied Geoseq to (a) identify differential isoform expression in mRNA-seq datasets, (b) identify miRNAs (microRNAs) in libraries and identify mature and star sequences in miRNAs, and (c) identify potentially mis-annotated miRNAs. The ease of using Geoseq for these analyses suggests its utility and uniqueness as an analysis tool.
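The tile-and-count idea described in this abstract can be sketched as follows. This is a minimal illustration and not Geoseq's implementation: the function names, the `$` read separator, the tile length, and the naive suffix-array construction are all assumptions made for the example, and production tools use far more efficient index construction.

```python
def build_suffix_array(text):
    """Naive suffix array: start positions sorted by the suffix they begin."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def count_occurrences(text, suffix_array, pattern):
    """Count occurrences of pattern in text by binary search on the suffix array."""
    m = len(pattern)
    lo, hi = 0, len(suffix_array)
    while lo < hi:  # first suffix whose m-character prefix is >= pattern
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(suffix_array)
    while lo < hi:  # first suffix whose m-character prefix is > pattern
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo - start


def tile_coverage(reference, reads, tile_len, step):
    """Split the reference into tiles and count each tile's hits in the read library."""
    # '$' never occurs in a tile, so no match can span two reads.
    library = "$".join(reads)
    suffix_array = build_suffix_array(library)
    tiles = [reference[i:i + tile_len]
             for i in range(0, len(reference) - tile_len + 1, step)]
    return [(tile, count_occurrences(library, suffix_array, tile)) for tile in tiles]


# Hypothetical reference and reads.
coverage = tile_coverage("ACGTACGG", ["ACGTAC", "GTACGT", "TTTT"], tile_len=4, step=2)
```

Indexing the library once and querying it with tiles is what makes it practical to precompute public libraries and then search them with arbitrary user-supplied sequences.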
Understanding how lake populations of arctic char are structured and function with special consideration of the potential effects of climate change: a multi-faceted approach.

Science.gov (United States)

Budy, Phaedra; Luecke, Chris

2014-09-01

Size dimorphism in fish populations, both its causes and consequences, has been an area of considerable focus; however, uncertainty remains whether size dimorphism is dynamic or stabilizing and about the role of exogenous factors. Here, we explored patterns among empirical vital rates, population structure, abundance and trend, and predicted the effects of climate change on populations of arctic char (Salvelinus alpinus) in two lakes. Both populations cycle dramatically between dominance by small (≤300 mm) and large (>300 mm) char. Apparent survival (Φ) and specific growth rates (SGR) were relatively high (40-96%; SGR range 0.03-1.5%) and comparable to those of conspecifics at lower latitudes. Climate change scenarios mimicked observed patterns of warming and resulted in temperatures closer to optimal for char growth (15.15 °C) and a longer growing season. An increase in consumption rates (28-34%) under climate change scenarios led to much greater growth rates (23-34%). Higher growth rates predicted under climate change resulted in an even greater predicted amplitude of cycles in population structure as well as an increase in reproductive output (Ro) and decrease in generation time (Go). Collectively, these results indicate arctic char populations (not just individuals) are extremely sensitive to small changes in the number of ice-free days. We hypothesize years with a longer growing season, predicted to occur more often under climate change, produce elevated growth rates of small char and act in a manner similar to a "resource pulse," allowing a sub-set of small char to "break through," thus setting the cycle in population structure.
On the relative effect of spawning asynchrony, sperm quantity and sperm quality on paternity under sperm competition in an external fertilizer

Directory of Open Access Journals (Sweden)

Torvald Blikra Egeland

2015-07-01

How much of a fitness benefit is obtained by dominant males of external fertilizers from releasing ejaculates in synchrony with female egg-release when engaging in sperm competition, and what is the most important sperm trait for paternity in these situations? The Arctic charr (Salvelinus alpinus is an external fertilizer experiencing intense male-male competition over reproductive opportunities including sperm competition. To compensate for their disadvantage the sneaker males, which often spawn out of synchrony with the female, produce more and faster sperm than the guarding males. We used controlled in vitro fertilization trials with experimentally produced dominant and subordinate, sneaker males to test what effect relative synchrony in gamete release, sperm quality (i.e., motility and velocity and sperm quantity have on a male’s fertilization success in pair-wise sperm competitions. When the sneaker males released ejaculates after the guarding male there was no overall difference in fertilization success. The quality (i.e., motility and velocity of a male’s sperm relative to that of the competing male was the best predictor of male fertilization success regardless of their mating tactic and spawning synchrony. The relative number of sperm cells also had an effect on fertilization success, but mainly when the dominant and sneaker male ejaculated synchronously. Our close imitation of natural sperm competition in charr shows that the sneaker males of external fertilizing species may fully compensate for their disadvantaged mating role by producing ejaculates of higher quality - an adjustment strangely not met by dominants.
Studies on the levels of Cs-137 originating from the Chernobyl accident in salmonid fish, its prey organisms and environment, in some alpine lakes of northern Sweden

International Nuclear Information System (INIS)

Hammar, J.; Neumann, G.; Notter, M.

1988-01-01

Fallout from the Chernobyl accident resulted in heavy surface contamination in areas of middle and northern Sweden. One of the most heavily affected regions was an alpine district of approx. 5000 km² at the border with Norway. Shortly after the accident, the levels of Cs-137 and other radionuclides began to rise in the two dominant fish species of the region, the brown trout (Salmo trutta L.) and the Arctic char (Salvelinus alpinus L.). In July 1986 a research program started with the ultimate goal of understanding the mechanisms of transport of radionuclides through the food chain to fish and, if possible, of giving a forecast of the future development of the Cs-137 contamination. It comprises measurements of mainly Cs-137, Cs-134 and K-40 in 8 lakes within the area. The measurements were performed on five occasions between July 1986 and October 1987. The preliminary results concerning Cs-137 show: a large increase in levels in bottom sediments; a decrease in levels in water; a strong decrease in levels in phytoplankton, zooplankton, Gammarus and Mysis during autumn 1986, and a stabilization during 1987 (in some lakes, however, another increase was recorded during 1987); that Mysis originally had the highest levels of all food items; that the levels in brown trout and Arctic char increased steadily during 1986 and showed a tendency toward stabilization or a slight decrease during 1987; and that brown trout generally show higher levels than Arctic char and reach their highest levels in lakes with introduced Mysis
Leptin and ghrelin in anadromous Arctic charr: cloning and change in expressions during a seasonal feeding cycle.

Science.gov (United States)

Frøiland, Eirik; Murashita, Koji; Jørgensen, Even Hjalmar; Kurokawa, Tadahide

2010-01-01

Anadromous (sea-migrating) Arctic charr (Salvelinus alpinus) display pronounced seasonal variations in food intake and growth and is an interesting model for studying mechanisms of appetite regulation. In this study cDNAs encoding for ghrelin (GHRL) and leptin (LEP) in Arctic charr were cloned, after which stomach GHRL and liver LEP mRNA expressions were examined by qPCR during a seasonal feeding cycle of semi-wild anadromous Arctic charr. The fish were captured as they returned from summer feeding in seawater and transferred to an indoor tank where they were fed in excess until October the year after. Growth rate was low in late winter, increased in late spring and reached a peak during summer, and then declined during autumn, when the fish became sexually mature. The changes in growth rate were associated with corresponding changes in the proportion of fish that had been eating at each sampling date, and whole body lipid status. Stomach GHRL mRNA expression was high in late winter, decreased to a nadir in mid-summer and increased again to a high level in early autumn. Liver LEP mRNA remained low during winter, spring and early summer, after which there was a gradual, 7-fold increase until October. The seasonal changes in ghrelin and leptin support a role of these hormones in the long-term regulation of energy homeostasis in the anadromous Arctic charr. It cannot be excluded, however, that the increase in liver leptin expression during autumn is related to sexual maturation.
Harvard Aging Brain Study: Dataset and accessibility.

Science.gov (United States)

Dagley, Alexander; LaPoint, Molly; Huijbers, Willem; Hedden, Trey; McLaren, Donald G; Chatwal, Jasmeer P; Papp, Kathryn V; Amariglio, Rebecca E; Blacker, Deborah; Rentz, Dorene M; Johnson, Keith A; Sperling, Reisa A; Schultz, Aaron P

2017-01-01

The Harvard Aging Brain Study is sharing its data with the global research community. The longitudinal dataset consists of a 284-subject cohort with the following modalities acquired: demographics, clinical assessment, comprehensive neuropsychological testing, clinical biomarkers, and neuroimaging. To promote more extensive analyses, imaging data was designed to be compatible with other publicly available datasets. A cloud-based system enables access to interested researchers with blinded data available contingent upon completion of a data usage agreement and administrative approval. Data collection is ongoing and currently in its fifth year. Copyright © 2015 Elsevier Inc. All rights reserved.
Sensitivity of a numerical wave model on wind re-analysis datasets

Science.gov (United States)

Lavidas, George; Venugopal, Vengatesan; Friedrich, Daniel

2017-03-01

Wind is the dominant process for wave generation. Detailed evaluation of metocean conditions strengthens our understanding of issues concerning potential offshore applications. However, the scarcity of buoys and the high cost of monitoring systems pose a barrier to properly defining offshore conditions. Through the use of numerical wave models, metocean conditions can be hindcasted and forecasted, providing reliable characterisations. This study reports the sensitivity of a numerical wave model for the Scottish region to its wind inputs. Two re-analysis wind datasets with different spatio-temporal characteristics are used: the ERA-Interim Re-Analysis and the CFSR-NCEP Re-Analysis dataset. Different wind products alter results, affecting the accuracy obtained. The scope of this study is to assess different available wind databases and provide information concerning the most appropriate wind dataset for the specific region, based on temporal, spatial and geographic terms for wave modelling and offshore applications. Both wind input datasets delivered results from the numerical wave model with good correlation. Wave results from the 1-h dataset have higher peaks and lower biases, at the expense of a high scatter index. On the other hand, the 6-h dataset has lower scatter but higher biases. The study shows how the wind dataset affects numerical wave modelling performance, and that depending on location and study needs, different wind inputs should be considered.
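The bias and scatter index mentioned in this abstract are commonly computed from paired model and buoy values. The sketch below uses one widespread convention (scatter index as RMSE divided by the observed mean); the abstract does not state which definition the authors used, and the wave heights are invented.

```python
import math


def bias(model, observed):
    """Mean of (model - observed); positive means the model overestimates."""
    return sum(m - o for m, o in zip(model, observed)) / len(observed)


def rmse(model, observed):
    """Root-mean-square error between paired model and observed values."""
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, observed)) / len(observed))


def scatter_index(model, observed):
    """RMSE normalised by the observed mean (one common convention, assumed here)."""
    return rmse(model, observed) / (sum(observed) / len(observed))


# Hypothetical significant wave heights in metres.
buoy = [1.0, 2.0, 3.0]
hindcast = [1.5, 2.5, 3.5]
```

Reporting bias and scatter index side by side separates a systematic offset from random scatter, which is the trade-off the abstract describes between the 1-h and 6-h wind datasets.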
Querying Large Biological Network Datasets

Science.gov (United States)

Gulsoy, Gunhan

2013-01-01

New experimental methods have resulted in an increasing amount of genetic interaction data being generated every day. Biological networks are used to store the genetic interaction data gathered. The increasing amount of available data requires fast, large-scale analysis methods. Therefore, we address the problem of querying large biological network datasets.…
BanglaLekha-Isolated: A multi-purpose comprehensive dataset of Handwritten Bangla Isolated characters

Directory of Open Access Journals (Sweden)

Mithun Biswas

2017-06-01

BanglaLekha-Isolated, a Bangla handwritten isolated character dataset, is presented in this article. This dataset contains 84 different characters comprising 50 Bangla basic characters, 10 Bangla numerals and 24 selected compound characters. 2000 handwriting samples for each of the 84 characters were collected, digitized and pre-processed. After discarding mistakes and scribbles, 166,105 handwritten character images were included in the final dataset. The dataset also includes labels indicating the age and the gender of the subjects from whom the samples were collected. This dataset could be used not only for optical handwriting recognition research but also to explore the influence of gender and age on handwriting. The dataset is publicly available at https://data.mendeley.com/datasets/hf6sf8zrkc/2.

A dataset of human decision-making in teamwork management

Science.gov (United States)

Yu, Han; Shen, Zhiqi; Miao, Chunyan; Leung, Cyril; Chen, Yiqiang; Fauvel, Simon; Lin, Jun; Cui, Lizhen; Pan, Zhengxiang; Yang, Qiang

2017-01-01

Today, most endeavours require teamwork by people with diverse skills and characteristics. In managing teamwork, decisions are often made under uncertainty and resource constraints. The strategies and the effectiveness of the strategies different people adopt to manage teamwork under different situations have not yet been fully explored, partially due to a lack of detailed large-scale data. In this paper, we describe a multi-faceted large-scale dataset to bridge this gap. It is derived from a game simulating complex project management processes. It presents the participants with different conditions in terms of team members' capabilities and task characteristics for them to exhibit their decision-making strategies. The dataset contains detailed data reflecting the decision situations, decision strategies, decision outcomes, and the emotional responses of 1,144 participants from diverse backgrounds. To our knowledge, this is the first dataset simultaneously covering these four facets of decision-making. With repeated measurements, the dataset may help establish baseline variability of decision-making in teamwork management, leading to more realistic decision theoretic models and more effective decision support approaches.
EVALUATION OF LAND USE/LAND COVER DATASETS FOR URBAN WATERSHED MODELING

International Nuclear Information System (INIS)

S.J. BURIAN; M.J. BROWN; T.N. MCPHERSON

2001-01-01

Land use/land cover (LULC) data are a vital component for nonpoint source pollution modeling. Most watershed hydrology and pollutant loading models use, in some capacity, LULC information to generate runoff and pollutant loading estimates. Simple equation methods predict runoff and pollutant loads using runoff coefficients or pollutant export coefficients that are often correlated to LULC type. Complex models use input variables and parameters to represent watershed characteristics and pollutant buildup and washoff rates as a function of LULC type. Whether using simple or complex models an accurate LULC dataset with an appropriate spatial resolution and level of detail is paramount for reliable predictions. The study presented in this paper compared and evaluated several LULC dataset sources for application in urban environmental modeling. The commonly used USGS LULC datasets have coarser spatial resolution and lower levels of classification than other LULC datasets. In addition, the USGS datasets do not accurately represent the land use in areas that have undergone significant land use change during the past two decades. We performed a watershed modeling analysis of three urban catchments in Los Angeles, California, USA to investigate the relative difference in average annual runoff volumes and total suspended solids (TSS) loads when using the USGS LULC dataset versus using a more detailed and current LULC dataset. When the two LULC datasets were aggregated to the same land use categories, the relative differences in predicted average annual runoff volumes and TSS loads from the three catchments were 8 to 14% and 13 to 40%, respectively. The relative differences did not have a predictable relationship with catchment size
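The "simple equation" approach in this abstract, where runoff is predicted from coefficients tied to LULC type, can be sketched as below. The runoff coefficients, land-use areas, and rainfall depth are invented for illustration and are not values from the study.

```python
# Illustrative runoff coefficients per land-use class (assumed, not from the paper).
RUNOFF_COEFF = {"residential": 0.5, "commercial": 0.85, "open_space": 0.15}


def annual_runoff_volume_m3(areas_ha, precip_mm, coeffs=RUNOFF_COEFF):
    """Runoff volume = sum over classes of C * rainfall depth (m) * area (m^2)."""
    depth_m = precip_mm / 1000.0
    return sum(coeffs[land_use] * depth_m * area_ha * 10_000.0
               for land_use, area_ha in areas_ha.items())


def relative_difference_pct(value, reference):
    """Absolute difference expressed as a percent of the reference value."""
    return 100.0 * abs(value - reference) / reference


# The same hypothetical 200 ha catchment described by a coarse and a detailed dataset.
coarse_lulc = {"residential": 100.0, "open_space": 100.0}
detailed_lulc = {"residential": 90.0, "commercial": 20.0, "open_space": 90.0}

coarse_volume = annual_runoff_volume_m3(coarse_lulc, precip_mm=400.0)
detailed_volume = annual_runoff_volume_m3(detailed_lulc, precip_mm=400.0)
difference_pct = relative_difference_pct(coarse_volume, detailed_volume)
```

Because the coarse dataset has no commercial class, its area-weighted coefficient is lower, so the predicted volume changes even though the catchment area and rainfall are identical.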
Sharing Video Datasets in Design Research

DEFF Research Database (Denmark)

Christensen, Bo; Abildgaard, Sille Julie Jøhnk

2017-01-01

This paper examines how design researchers, design practitioners and design education can benefit from sharing a dataset. We present the Design Thinking Research Symposium 11 (DTRS11) as an exemplary project that implied sharing video data of design processes and design activity in natural settings with a large group of fellow academics from the international community of Design Thinking Research, for the purpose of facilitating research collaboration and communication within the field of Design and Design Thinking. This approach emphasizes the social and collaborative aspects of design research, where a multitude of appropriate perspectives and methods may be utilized in analyzing and discussing the singular dataset. The shared data is, from this perspective, understood as a design object in itself, which facilitates new ways of working, collaborating, studying, learning and educating within the expanding...
Interpolation of diffusion weighted imaging datasets

DEFF Research Database (Denmark)

Dyrby, Tim B; Lundell, Henrik; Burke, Mark W

2014-01-01

Diffusion weighted imaging (DWI) is used to study white-matter fibre organisation, orientation and structural connectivity by means of fibre reconstruction algorithms and tractography. For clinical settings, limited scan time compromises the possibilities to achieve high image resolution for finer anatomical details and signal-to-noise-ratio for reliable fibre reconstruction. We assessed the potential benefits of interpolating DWI datasets to a higher image resolution before fibre reconstruction using a diffusion tensor model. Simulations of straight and curved crossing tracts smaller than or equal... interpolation methods fail to disentangle fine anatomical details if PVE is too pronounced in the original data. As for validation we used ex-vivo DWI datasets acquired at various image resolutions as well as Nissl-stained sections. Increasing the image resolution by a factor of eight yielded finer geometrical...
Development of a SPARK Training Dataset

International Nuclear Information System (INIS)

Sayre, Amanda M.; Olson, Jarrod R.

2015-01-01

In its first five years, the National Nuclear Security Administration's (NNSA) Next Generation Safeguards Initiative (NGSI) sponsored more than 400 undergraduate, graduate, and post-doctoral students in internships and research positions (Wyse 2012). In the past seven years, the NGSI program has, and continues to produce a large body of scientific, technical, and policy work in targeted core safeguards capabilities and human capital development activities. Not only does the NGSI program carry out activities across multiple disciplines, but also across all U.S. Department of Energy (DOE)/NNSA locations in the United States. However, products are not readily shared among disciplines and across locations, nor are they archived in a comprehensive library. Rather, knowledge of NGSI-produced literature is localized to the researchers, clients, and internal laboratory/facility publication systems such as the Electronic Records and Information Capture Architecture (ERICA) at the Pacific Northwest National Laboratory (PNNL). There is also no incorporated way of analyzing existing NGSI literature to determine whether the larger NGSI program is achieving its core safeguards capabilities and activities. A complete library of NGSI literature could prove beneficial to a cohesive, sustainable, and more economical NGSI program. The Safeguards Platform for Automated Retrieval of Knowledge (SPARK) has been developed to be a knowledge storage, retrieval, and analysis capability to capture safeguards knowledge to exist beyond the lifespan of NGSI. During the development process, it was necessary to build a SPARK training dataset (a corpus of documents) for initial entry into the system and for demonstration purposes. We manipulated these data to gain new information about the breadth of NGSI publications, and they evaluated the science-policy interface at PNNL as a practical demonstration of SPARK's intended analysis capability. The analysis demonstration sought to answer
ClimateNet: A Machine Learning dataset for Climate Science Research

Science.gov (United States)

Prabhat, M.; Biard, J.; Ganguly, S.; Ames, S.; Kashinath, K.; Kim, S. K.; Kahou, S.; Maharaj, T.; Beckham, C.; O'Brien, T. A.; Wehner, M. F.; Williams, D. N.; Kunkel, K.; Collins, W. D.

2017-12-01

Deep Learning techniques have revolutionized commercial applications in Computer vision, speech recognition and control systems. The key for all of these developments was the creation of a curated, labeled dataset ImageNet, for enabling multiple research groups around the world to develop methods, benchmark performance and compete with each other. The success of Deep Learning can be largely attributed to the broad availability of this dataset. Our empirical investigations have revealed that Deep Learning is similarly poised to benefit the task of pattern detection in climate science. Unfortunately, labeled datasets, a key pre-requisite for training, are hard to find. Individual research groups are typically interested in specialized weather patterns, making it hard to unify, and share datasets across groups and institutions. In this work, we are proposing ClimateNet: a labeled dataset that provides labeled instances of extreme weather patterns, as well as associated raw fields in model and observational output. We develop a schema in NetCDF to enumerate weather pattern classes/types, store bounding boxes, and pixel-masks. We are also working on a TensorFlow implementation to natively import such NetCDF datasets, and are providing a reference convolutional architecture for binary classification tasks. Our hope is that researchers in Climate Science, as well as ML/DL, will be able to use (and extend) ClimateNet to make rapid progress in the application of Deep Learning for Climate Science research.
Resampling Methods Improve the Predictive Power of Modeling in Class-Imbalanced Datasets

Directory of Open Access Journals (Sweden)

Paul H. Lee

2014-09-01

In the medical field, many outcome variables are dichotomized, and the two possible values of a dichotomized variable are referred to as classes. A dichotomized dataset is class-imbalanced if it consists mostly of one class, and the performance of common classification models on this type of dataset tends to be suboptimal. To tackle such a problem, resampling methods, including oversampling and undersampling, can be used. This paper aims at illustrating the effect of resampling methods using the National Health and Nutrition Examination Survey (NHANES) wave 2009–2010 dataset. A total of 4677 participants aged ≥20 without self-reported diabetes and with valid blood test results were analyzed. The Classification and Regression Tree (CART) procedure was used to build a classification model on undiagnosed diabetes. A participant demonstrated evidence of diabetes according to WHO diabetes criteria. Exposure variables included demographics and socio-economic status. CART models were fitted using a randomly selected 70% of the data (training dataset), and area under the receiver operating characteristic curve (AUC) was computed using the remaining 30% of the sample for evaluation (testing dataset). CART models were fitted using the training dataset, the oversampled training dataset, the weighted training dataset, and the undersampled training dataset. In addition, resampling case-to-control ratios of 1:1, 1:2, and 1:4 were examined. The effects of resampling methods on the performance of other extensions of CART (random forests and generalized boosted trees) were also examined. CARTs fitted on the oversampled (AUC = 0.70) and undersampled (AUC = 0.74) training data yielded a better classification power than that on the training data (AUC = 0.65). Resampling could also improve the classification power of random forests and generalized boosted trees. To conclude, applying resampling methods in a class-imbalanced dataset improved the classification power of CART, random forests
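Resampling a training set to a chosen case-to-control ratio, as described in this abstract, might look like the sketch below. It is a simple random-resampling illustration with hypothetical function names and data, not the procedure used in the paper; as in the abstract, only the training portion would be resampled, with the test portion left untouched.

```python
import random


def undersample(cases, controls, controls_per_case, seed=0):
    """Randomly drop controls until there are at most controls_per_case per case."""
    rng = random.Random(seed)
    n_keep = min(len(controls), controls_per_case * len(cases))
    return list(cases), rng.sample(controls, n_keep)


def oversample(cases, controls, controls_per_case, seed=0):
    """Randomly duplicate cases until the 1:controls_per_case ratio is reached."""
    rng = random.Random(seed)
    n_target = -(-len(controls) // controls_per_case)  # ceiling division
    n_extra = max(0, n_target - len(cases))
    return list(cases) + rng.choices(cases, k=n_extra), list(controls)


# Hypothetical imbalanced training set: 10 cases and 100 controls.
train_cases = [("case", i) for i in range(10)]
train_controls = [("control", i) for i in range(100)]

under_cases, under_controls = undersample(train_cases, train_controls, controls_per_case=2)
over_cases, over_controls = oversample(train_cases, train_controls, controls_per_case=4)
```

Undersampling discards information from the majority class, while oversampling repeats minority records, so comparing both, as the abstract does, is a reasonable design choice.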
BASE MAP DATASET, INYO COUNTY, OKLAHOMA

Data.gov (United States)

Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...
BASE MAP DATASET, JACKSON COUNTY, OKLAHOMA

Data.gov (United States)

Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...
BASE MAP DATASET, KINGFISHER COUNTY, OKLAHOMA

Data.gov (United States)

Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...
Current status, between-year comparisons and maternal transfer of organohalogenated compounds (OHCs) in Arctic char (Salvelinus alpinus) from Bjørnøya, Svalbard (Norway)

International Nuclear Information System (INIS)

Bytingsvik, J.; Frantzen, M.; Götsch, A.; Heimstad, E.S.; Christensen, G.; Evenset, A.

2015-01-01

High levels of organohalogenated compounds (OHCs) have been found in Arctic char from Lake Ellasjøen at Bjørnøya (Svalbard, Norway) compared to char from other arctic lakes. The first aim of the study was to investigate the OHC status, contaminant profile, and partitioning of OHCs between muscle and ovary tissue in spawning female char from the high-polluted Lake Ellasjøen and the low-polluted Lake Laksvatn. The second aim was to investigate if OHC levels in muscle tissue have changed over time. Between-lake comparisons show that the muscle levels (lipid weight) of hexachlorobenzene (HCB), chlordanes (∑ CHLs), mirex, dichlorodiphenyltrichloroethanes (∑ DDTs) and polychlorinated biphenyls (∑ PCBs) were up to 36 times higher in char from Ellasjøen than in Laksvatn, and confirm that the char from Ellasjøen are still heavily exposed compared to char from neighboring lake. A higher proportion of persistent OHCs were found in Ellasjøen compared to Laksvatn, while the proportion of the less persistent OHCs was highest in Laksvatn. A between-year comparison of OHC levels (i.e., HCB, DDTs, PCBs) in female and male char shows higher levels of HCB in female char from Ellasjøen in 2009/2012 compared to in 1999/2001. No other between-year differences in OHC levels were found. Due to small study groups, findings associated with between-year differences in OHC levels should be interpreted with caution. OHCs accumulate in the lipid rich ovaries of spawning females, resulting in up to six times higher levels of OHCs in ovaries compared to in muscle (wet weight). The toxic equivalent (TEQ)-value for the dioxin-like PCBs (PCB-105 and -118) in ovaries of the Ellasjøen char exceeded levels associated with increased egg mortality in rainbow trout (Oncorhynchus mykiss). Hence, we suggest that future studies should focus on the reproductive health and performance abilities of the high-exposed population of char inhabiting Lake Ellasjøen. 
- Highlights: • Examine levels, profile, time-trends and maternal transfer of OHCs in Arctic char • Char from Lake Ellasjøen (Bjørnøya, Norway) are known to be highly contaminated. • PCB-levels (2012): 36 times higher in char from Ellasjøen than in Laksvatn (ref.) • Higher HCB levels in female char from Ellasjøen in 2009/2012 than in 1999/2001 • OHC-levels were up to six times higher in ovaries than in muscle tissue
Current status, between-year comparisons and maternal transfer of organohalogenated compounds (OHCs) in Arctic char (Salvelinus alpinus) from Bjørnøya, Svalbard (Norway)

Energy Technology Data Exchange (ETDEWEB)

Bytingsvik, J., E-mail: jenny.bytingsvik@akvaplan.niva.no [Akvaplan-niva AS, The Fram Centre, N-9296 Tromsø Norway (Norway); Frantzen, M. [Akvaplan-niva AS, The Fram Centre, N-9296 Tromsø Norway (Norway); Götsch, A.; Heimstad, E.S. [NILU (Norwegian Institute for Air Research), The Fram Centre, N-9296 Tromsø Norway (Norway); Christensen, G. [Akvaplan-niva AS, The Fram Centre, N-9296 Tromsø Norway (Norway); Evenset, A. [Akvaplan-niva AS, The Fram Centre, N-9296 Tromsø Norway (Norway); University of Tromsø, The Arctic University of Norway, Pb 6050 Langnes, N-9037 Tromsø (Norway)

2015-07-15

High levels of organohalogenated compounds (OHCs) have been found in Arctic char from Lake Ellasjøen at Bjørnøya (Svalbard, Norway) compared to char from other arctic lakes. The first aim of the study was to investigate the OHC status, contaminant profile, and partitioning of OHCs between muscle and ovary tissue in spawning female char from the high-polluted Lake Ellasjøen and the low-polluted Lake Laksvatn. The second aim was to investigate if OHC levels in muscle tissue have changed over time. Between-lake comparisons show that the muscle levels (lipid weight) of hexachlorobenzene (HCB), chlordanes (∑ CHLs), mirex, dichlorodiphenyltrichloroethanes (∑ DDTs) and polychlorinated biphenyls (∑ PCBs) were up to 36 times higher in char from Ellasjøen than in Laksvatn, and confirm that the char from Ellasjøen are still heavily exposed compared to char from neighboring lake. A higher proportion of persistent OHCs were found in Ellasjøen compared to Laksvatn, while the proportion of the less persistent OHCs was highest in Laksvatn. A between-year comparison of OHC levels (i.e., HCB, DDTs, PCBs) in female and male char shows higher levels of HCB in female char from Ellasjøen in 2009/2012 compared to in 1999/2001. No other between-year differences in OHC levels were found. Due to small study groups, findings associated with between-year differences in OHC levels should be interpreted with caution. OHCs accumulate in the lipid rich ovaries of spawning females, resulting in up to six times higher levels of OHCs in ovaries compared to in muscle (wet weight). The toxic equivalent (TEQ)-value for the dioxin-like PCBs (PCB-105 and -118) in ovaries of the Ellasjøen char exceeded levels associated with increased egg mortality in rainbow trout (Oncorhynchus mykiss). Hence, we suggest that future studies should focus on the reproductive health and performance abilities of the high-exposed population of char inhabiting Lake Ellasjøen. 
- Highlights: • Examine levels, profile, time-trends and maternal transfer of OHCs in Arctic char • Char from Lake Ellasjøen (Bjørnøya, Norway) are known to be highly contaminated. • PCB-levels (2012): 36 times higher in char from Ellasjøen than in Laksvatn (ref.) • Higher HCB levels in female char from Ellasjøen in 2009/2012 than in 1999/2001 • OHC-levels were up to six times higher in ovaries than in muscle tissue.
pH preference and avoidance responses of adult brook trout Salvelinus fontinalis and brown trout Salmo trutta.

Science.gov (United States)

Fost, B A; Ferreri, C P

2015-03-01

The pH preferred and avoided by wild, adult brook trout Salvelinus fontinalis and brown trout Salmo trutta was examined in a series of laboratory tests using gradual and steep-gradient flow-through aquaria. The results were compared with those published for the observed segregation patterns of juvenile S. fontinalis and S. trutta in Pennsylvania streams. The adult S. trutta tested showed a preference for pH 4·0 while adult S. fontinalis did not prefer any pH within the range tested. Salmo trutta are not found in Pennsylvania streams with a base-flow pH < 5·8 which suggests that S. trutta prefer pH well above 4·0. Adult S. trutta displayed a lack of avoidance at pH below 5·0, as also reported earlier for juveniles. The avoidance pH of wild, adult S. fontinalis (between pH 5·5 and 6·0) and S. trutta (between pH 6·5 and 7·0) did not differ appreciably from earlier study results for the avoidance pH of juvenile S. fontinalis and S. trutta. A comparison of c.i. around these avoidance estimates indicates that avoidance pH is similar among adult S. fontinalis and S. trutta in this study. The limited overlap of c.i. for avoidance pH values for the two species, however, suggests that some S. trutta will display avoidance at a higher pH when S. fontinalis will not. The results of this study indicate that segregation patterns of adult S. fontinalis and S. trutta in Pennsylvania streams could be related to pH and that competition with S. trutta could be mediating the occurrence of S. fontinalis at some pH levels. © 2015 The Fisheries Society of the British Isles.
Image segmentation evaluation for very-large datasets

Science.gov (United States)

Reeves, Anthony P.; Liu, Shuang; Xie, Yiting

2016-03-01

With the advent of modern machine learning methods and fully automated image analysis there is a need for very large image datasets having documented segmentations for both computer algorithm training and evaluation. Current approaches of visual inspection and manual markings do not scale well to big data. We present a new approach that depends on fully automated algorithm outcomes for segmentation documentation, requires no manual marking, and provides quantitative evaluation for computer algorithms. The documentation of new image segmentations and new algorithm outcomes is achieved by visual inspection. The burden of visual inspection on large datasets is minimized by (a) customized visualizations for rapid review and (b) reducing the number of cases to be reviewed through analysis of quantitative segmentation evaluation. This method has been applied to a dataset of 7,440 whole-lung CT images for 6 different segmentation algorithms designed to fully automatically facilitate the measurement of a number of very important quantitative image biomarkers. The results indicate that we could achieve 93% to 99% successful segmentation for these algorithms on this relatively large image database. The presented evaluation method may be scaled to much larger image databases.
A New Dataset Size Reduction Approach for PCA-Based Classification in OCR Application

Directory of Open Access Journals (Sweden)

Mohammad Amin Shayegan

2014-01-01

A major problem of pattern recognition systems is due to the large volume of training datasets including duplicate and similar training samples. In order to overcome this problem, some dataset size reduction and also dimensionality reduction techniques have been introduced. The algorithms presently used for dataset size reduction usually remove samples near to the centers of classes or support vector samples between different classes. However, the samples near to a class center include valuable information about the class characteristics and the support vector is important for evaluating system efficiency. This paper reports on the use of Modified Frequency Diagram technique for dataset size reduction. In this new proposed technique, a training dataset is rearranged and then sieved. The sieved training dataset along with automatic feature extraction/selection operation using Principal Component Analysis is used in an OCR application. The experimental results obtained when using the proposed system on one of the biggest handwritten Farsi/Arabic numeral standard OCR datasets, Hoda, show about 97% accuracy in the recognition rate. The recognition speed increased by 2.28 times, while the accuracy decreased only by 0.7%, when a sieved version of the dataset, which is only half the size of the initial training dataset, was used.
Fine-scale population structure and riverscape genetics of brook trout (Salvelinus fontinalis) distributed continuously along headwater channel networks

Science.gov (United States)

Kanno, Yoichiro; Vokoun, Jason C.; Letcher, Benjamin H.

2011-01-01

Linear and heterogeneous habitat makes headwater stream networks an ideal ecosystem in which to test the influence of environmental factors on spatial genetic patterns of obligatory aquatic species. We investigated fine-scale population structure and influence of stream habitat on individual-level genetic differentiation in brook trout (Salvelinus fontinalis) by genotyping eight microsatellite loci in 740 individuals in two headwater channel networks (7.7 and 4.4 km) in Connecticut, USA. A weak but statistically significant isolation-by-distance pattern was common in both sites. In the field, many tagged individuals were recaptured in the same 50-m reaches within a single field season (summer to fall). One study site was characterized with a hierarchical population structure, where seasonal barriers (natural falls of 1.5–2.5 m in height during summer base-flow condition) greatly reduced gene flow and perceptible spatial patterns emerged because of the presence of tributaries, each with a group of genetically distinguishable individuals. Genetic differentiation increased when pairs of individuals were separated by high stream gradient (steep channel slope) or warm stream temperature in this site, although the evidence of their influence was equivocal. In a second site, evidence for genetic clusters was weak at best, but genetic differentiation between individuals was positively correlated with number of tributary confluences. We concluded that the population-level movement of brook trout was limited in the study headwater stream networks, resulting in the fine-scale population structure (genetic clusters and clines) even at distances of a few kilometres, and gene flow was mitigated by ‘riverscape’ variables, particularly by physical barriers, waterway distance (i.e. isolation-by-distance) and the presence of tributaries.
The CMS dataset bookkeeping service

Science.gov (United States)

Afaq, A.; Dolgert, A.; Guo, Y.; Jones, C.; Kosyakov, S.; Kuznetsov, V.; Lueking, L.; Riley, D.; Sekhri, V.

2008-07-01

The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, Command Line, and a Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems.
The CMS dataset bookkeeping service

Energy Technology Data Exchange (ETDEWEB)

Afaq, A; Guo, Y; Kosyakov, S; Lueking, L; Sekhri, V [Fermilab, Batavia, Illinois 60510 (United States); Dolgert, A; Jones, C; Kuznetsov, V; Riley, D [Cornell University, Ithaca, New York 14850 (United States)

2008-07-15

The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, Command Line, and a Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems.
The CMS dataset bookkeeping service

International Nuclear Information System (INIS)

Afaq, A; Guo, Y; Kosyakov, S; Lueking, L; Sekhri, V; Dolgert, A; Jones, C; Kuznetsov, V; Riley, D

2008-01-01

The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, Command Line, and a Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems
The CMS dataset bookkeeping service

International Nuclear Information System (INIS)

Afaq, Anzar; Dolgert, Andrew; Guo, Yuyi; Jones, Chris; Kosyakov, Sergey; Kuznetsov, Valentin; Lueking, Lee; Riley, Dan; Sekhri, Vijay

2007-01-01

The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, Command Line, and a Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems

A cross-country Exchange Market Pressure (EMP) dataset

Directory of Open Access Journals (Sweden)

Mohit Desai

2017-06-01

The data presented in this article are related to the research article titled - “An exchange market pressure measure for cross country analysis” (Patnaik et al. [1]). In this article, we present the dataset for Exchange Market Pressure values (EMP) for 139 countries along with their conversion factors, ρ (rho). Exchange Market Pressure, expressed in percentage change in exchange rate, measures the change in exchange rate that would have taken place had the central bank not intervened. The conversion factor ρ can be interpreted as the change in exchange rate associated with $1 billion of intervention. Estimates of conversion factor ρ allow us to calculate a monthly time series of EMP for 139 countries. Additionally, the dataset contains the 68% confidence interval (high and low values) for the point estimates of ρ’s. Using the standard errors of estimates of ρ’s, we obtain one sigma intervals around mean estimates of EMP values. These values are also reported in the dataset.
A cross-country Exchange Market Pressure (EMP) dataset.

Science.gov (United States)

Desai, Mohit; Patnaik, Ila; Felman, Joshua; Shah, Ajay

2017-06-01

The data presented in this article are related to the research article titled - "An exchange market pressure measure for cross country analysis" (Patnaik et al. [1]). In this article, we present the dataset for Exchange Market Pressure values (EMP) for 139 countries along with their conversion factors, ρ (rho). Exchange Market Pressure, expressed in percentage change in exchange rate, measures the change in exchange rate that would have taken place had the central bank not intervened. The conversion factor ρ can be interpreted as the change in exchange rate associated with $1 billion of intervention. Estimates of conversion factor ρ allow us to calculate a monthly time series of EMP for 139 countries. Additionally, the dataset contains the 68% confidence interval (high and low values) for the point estimates of ρ's. Using the standard errors of estimates of ρ's, we obtain one sigma intervals around mean estimates of EMP values. These values are also reported in the dataset.
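One way to read the description above is that EMP adds the observed percentage change in the exchange rate to the change that intervention absorbed, ρ times the intervention in $ billion. The sketch below follows that reading only; the function names, argument names, and sign convention are assumptions for illustration and are not taken from the dataset or the underlying article.

```python
def exchange_market_pressure(pct_change_fx, intervention_bn, rho):
    """EMP in percent: observed exchange-rate change plus the change
    attributed to intervention (rho = percent change per $1 billion).
    The sign convention for intervention_bn is an assumption."""
    return pct_change_fx + rho * intervention_bn


def emp_interval(pct_change_fx, intervention_bn, rho_low, rho_high):
    """Band around EMP obtained by plugging in the low and high values of
    rho (for example, the bounds of a 68% interval)."""
    a = exchange_market_pressure(pct_change_fx, intervention_bn, rho_low)
    b = exchange_market_pressure(pct_change_fx, intervention_bn, rho_high)
    return (min(a, b), max(a, b))


# Hypothetical month: 1% observed change, $2 billion of intervention,
# rho estimated at 0.5 with a band of 0.25 to 0.75.
emp_point = exchange_market_pressure(1.0, 2.0, 0.5)
emp_band = emp_interval(1.0, 2.0, 0.25, 0.75)
```

The numbers in the usage lines are invented inputs chosen to keep the arithmetic easy to follow; they are not values from the dataset.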
The NASA Subsonic Jet Particle Image Velocimetry (PIV) Dataset

Science.gov (United States)

Bridges, James; Wernet, Mark P.

2011-01-01

Many tasks in fluids engineering require prediction of turbulence of jet flows. This report documents the single-point statistics of velocity, mean and variance, of cold and hot jet flows. The jet velocities ranged from 0.5 to 1.4 times the ambient speed of sound, and temperatures ranged from unheated to static temperature ratio 2.7. Further, the report assesses the accuracies of the data, e.g., by establishing uncertainties for the data. This paper covers the following five tasks: (1) Document acquisition and processing procedures used to create the particle image velocimetry (PIV) datasets. (2) Compare PIV data with hotwire and laser Doppler velocimetry (LDV) data published in the open literature. (3) Compare different datasets acquired at the same flow conditions in multiple tests to establish uncertainties. (4) Create a consensus dataset for a range of hot jet flows, including uncertainty bands. (5) Analyze this consensus dataset for self-consistency and compare jet characteristics to those of the open literature. The final objective was fulfilled by using the potential core length and the spread rate of the half-velocity radius to collapse the mean and turbulent velocity fields over the first 20 jet diameters.
Knowledge Mining from Clinical Datasets Using Rough Sets and Backpropagation Neural Network

Directory of Open Access Journals (Sweden)

Kindie Biredagn Nahato

2015-01-01

The availability of clinical datasets and knowledge mining methodologies encourages the researchers to pursue research in extracting knowledge from clinical datasets. Different data mining techniques have been used for mining rules, and mathematical models have been developed to assist the clinician in decision making. The objective of this research is to build a classifier that will predict the presence or absence of a disease by learning from the minimal set of attributes that has been extracted from the clinical dataset. In this work rough set indiscernibility relation method with backpropagation neural network (RS-BPNN) is used. This work has two stages. The first stage is handling of missing values to obtain a smooth data set and selection of appropriate attributes from the clinical dataset by indiscernibility relation method. The second stage is classification using backpropagation neural network on the selected reducts of the dataset. The classifier has been tested with hepatitis, Wisconsin breast cancer, and Statlog heart disease datasets obtained from the University of California at Irvine (UCI) machine learning repository. The accuracy obtained from the proposed method is 97.3%, 98.6%, and 90.4% for hepatitis, breast cancer, and heart disease, respectively. The proposed system provides an effective classification model for clinical datasets.
Analysis of trade-offs between threats of invasion by nonnative brook trout (Salvelinus fontinalis) and intentional isolation for native westslope cutthroat trout (Oncorhynchus clarkii lewisi)

Science.gov (United States)

Peterson, D.P.; Rieman, B.E.; Dunham, J.B.; Fausch, K.D.; Young, M.K.

2008-01-01

Native salmonid fishes often face simultaneous threats from habitat fragmentation and invasion by nonnative trout species. Unfortunately, management actions to address one may create or exacerbate the other. A consistent decision process would include a systematic analysis of when and where intentional use or removal of barriers is the most appropriate action. We developed a Bayesian belief network as a tool for such analyses. We focused on native westslope cutthroat trout (Oncorhynchus clarkii lewisi) and nonnative brook trout (Salvelinus fontinalis) and considered the environmental factors influencing both species, their potential interactions, and the effects of isolation on the persistence of local cutthroat trout populations. The trade-offs between isolation and invasion were strongly influenced by size and habitat quality of the stream network to be isolated and existing demographic linkages within and among populations. An application of the model in several sites in western Montana (USA) showed the process could help clarify management objectives and options and prioritize conservation actions among streams. The approach can also facilitate communication among parties concerned with native salmonids, nonnative fish invasions, barriers and intentional isolation, and management of the associated habitats and populations. © 2008 NRC.
Spatially-explicit estimation of geographical representation in large-scale species distribution datasets.

Science.gov (United States)

Kalwij, Jesse M; Robertson, Mark P; Ronk, Argo; Zobel, Martin; Pärtel, Meelis

2014-01-01

Much ecological research relies on existing multispecies distribution datasets. Such datasets, however, can vary considerably in quality, extent, resolution or taxonomic coverage. We provide a framework for a spatially-explicit evaluation of geographical representation within large-scale species distribution datasets, using the comparison of an occurrence atlas with a range atlas dataset as a working example. Specifically, we compared occurrence maps for 3773 taxa from the widely-used Atlas Florae Europaeae (AFE) with digitised range maps for 2049 taxa of the lesser-known Atlas of North European Vascular Plants. We calculated the level of agreement at a 50-km spatial resolution using average latitudinal and longitudinal species range, and area of occupancy. Agreement in species distribution was calculated and mapped using Jaccard similarity index and a reduced major axis (RMA) regression analysis of species richness between the entire atlases (5221 taxa in total) and between co-occurring species (601 taxa). We found no difference in distribution ranges or in the area of occupancy frequency distribution, indicating that atlases were sufficiently overlapping for a valid comparison. The similarity index map showed high levels of agreement for central, western, and northern Europe. The RMA regression confirmed that geographical representation of AFE was low in areas with a sparse data recording history (e.g., Russia, Belarus and the Ukraine). For co-occurring species in south-eastern Europe, however, the Atlas of North European Vascular Plants showed remarkably higher richness estimations. Geographical representation of atlas data can be much more heterogeneous than often assumed. Level of agreement between datasets can be used to evaluate geographical representation within datasets. Merging atlases into a single dataset is worthwhile in spite of methodological differences, and helps to fill gaps in our knowledge of species distribution ranges. Species distribution
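The Jaccard similarity index named in the record above compares two sets by dividing the size of their intersection by the size of their union. A minimal sketch follows, assuming each atlas is reduced to the set of grid-cell identifiers where a taxon is recorded; the cell IDs and the choice to return 0.0 for two empty sets are illustrative assumptions, not details from the study.

```python
def jaccard(cells_a, cells_b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two collections of
    grid-cell identifiers. Returns 0.0 when both are empty (a convention
    chosen here, not one stated in the source)."""
    a, b = set(cells_a), set(cells_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical 50-km grid cells occupied by one taxon in each atlas.
occurrence_atlas = {"c01", "c02", "c03"}
range_atlas = {"c02", "c03", "c04"}
agreement = jaccard(occurrence_atlas, range_atlas)  # 2 shared / 4 total
```

Computed per grid cell over taxa rather than per taxon over cells, the same function would give a mappable agreement value of the kind the abstract describes.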
The Global Precipitation Climatology Project (GPCP) Combined Precipitation Dataset

Science.gov (United States)

Huffman, George J.; Adler, Robert F.; Arkin, Philip; Chang, Alfred; Ferraro, Ralph; Gruber, Arnold; Janowiak, John; McNab, Alan; Rudolf, Bruno; Schneider, Udo

1997-01-01

The Global Precipitation Climatology Project (GPCP) has released the GPCP Version 1 Combined Precipitation Data Set, a global, monthly precipitation dataset covering the period July 1987 through December 1995. The primary product in the dataset is a merged analysis incorporating precipitation estimates from low-orbit-satellite microwave data, geosynchronous-orbit-satellite infrared data, and rain gauge observations. The dataset also contains the individual input fields, a combination of the microwave and infrared satellite estimates, and error estimates for each field. The data are provided on 2.5 deg x 2.5 deg latitude-longitude global grids. Preliminary analyses show general agreement with prior studies of global precipitation and extend prior studies of El Nino-Southern Oscillation precipitation patterns. At the regional scale there are systematic differences with standard climatologies.
A new dataset and algorithm evaluation for mood estimation in music

OpenAIRE

Godec, Primož

2014-01-01

This thesis presents a new dataset of perceived and induced emotions for 200 audio clips. The gathered dataset provides users' perceived and induced emotions for each clip, the association of color, along with demographic and personal data, such as user's emotion state and emotion ratings, genre preference, music experience, among others. With an online survey we collected more than 7000 responses for a dataset of 200 audio excerpts, thus providing about 37 user responses per clip. The foc...
A Large-Scale 3D Object Recognition dataset

DEFF Research Database (Denmark)

Sølund, Thomas; Glent Buch, Anders; Krüger, Norbert

2016-01-01

geometric groups; concave, convex, cylindrical and flat 3D object models. The object models have varying amount of local geometric features to challenge existing local shape feature descriptors in terms of descriptiveness and robustness. The dataset is validated in a benchmark which evaluates the matching...... performance of 7 different state-of-the-art local shape descriptors. Further, we validate the dataset in a 3D object recognition pipeline. Our benchmark shows as expected that local shape feature descriptors without any global point relation across the surface have a poor matching performance with flat...
The Wind Integration National Dataset (WIND) toolkit (Presentation)

Energy Technology Data Exchange (ETDEWEB)

Caroline Draxl: NREL

2014-01-01

Regional wind integration studies require detailed wind power output data at many locations to perform simulations of how the power system will operate under high penetration scenarios. The wind datasets that serve as inputs into the study must realistically reflect the ramping characteristics, spatial and temporal correlations, and capacity factors of the simulated wind plants, as well as being time synchronized with available load profiles. As described in this presentation, the WIND Toolkit fulfills these requirements by providing a state-of-the-art national (US) wind resource, power production and forecast dataset.
An integrated pan-tropical biomass map using multiple reference datasets

NARCIS (Netherlands)

Avitabile, V.; Herold, M.; Heuvelink, G.B.M.; Lewis, S.L.; Phillips, O.L.; Asner, G.P.; Armston, J.; Asthon, P.; Banin, L.F.; Bayol, N.; Berry, N.; Boeckx, P.; Jong, De B.; Devries, B.; Girardin, C.; Kearsley, E.; Lindsell, J.A.; Lopez-gonzalez, G.; Lucas, R.; Malhi, Y.; Morel, A.; Mitchard, E.; Nagy, L.; Qie, L.; Quinones, M.; Ryan, C.M.; Slik, F.; Sunderland, T.; Vaglio Laurin, G.; Valentini, R.; Verbeeck, H.; Wijaya, A.; Willcock, S.

2016-01-01

We combined two existing datasets of vegetation aboveground biomass (AGB) (Proceedings of the National Academy of Sciences of the United States of America, 108, 2011, 9899; Nature Climate Change, 2, 2012, 182) into a pan-tropical AGB map at 1-km resolution using an independent reference dataset of
Comparison of global 3-D aviation emissions datasets

Directory of Open Access Journals (Sweden)

S. C. Olsen

2013-01-01

Aviation emissions are unique from other transportation emissions, e.g., from road transportation and shipping, in that they occur at higher altitudes as well as at the surface. Aviation emissions of carbon dioxide, soot, and water vapor have direct radiative impacts on the Earth's climate system while emissions of nitrogen oxides (NO_x), sulfur oxides, carbon monoxide (CO), and hydrocarbons (HC) impact air quality and climate through their effects on ozone, methane, and clouds. The most accurate estimates of the impact of aviation on air quality and climate utilize three-dimensional chemistry-climate models and gridded four-dimensional (space and time) aviation emissions datasets. We compare five available aviation emissions datasets currently and historically used to evaluate the impact of aviation on climate and air quality: NASA-Boeing 1992, NASA-Boeing 1999, QUANTIFY 2000, Aero2k 2002, and AEDT 2006, and aviation fuel usage estimates from the International Energy Agency. Roughly 90% of all aviation emissions are in the Northern Hemisphere and nearly 60% of all fuelburn and NO_x emissions occur at cruise altitudes in the Northern Hemisphere. While these datasets were created by independent methods and are thus not strictly suitable for analyzing trends, they suggest that commercial aviation fuelburn and NO_x emissions increased over the last two decades while HC emissions likely decreased and CO emissions did not change significantly. The bottom-up estimates compared here are consistently lower than International Energy Agency fuelburn statistics, although the gap is significantly smaller in the more recent datasets. Overall the emissions distributions are quite similar for fuelburn and NO_x, with regional peaks over the populated land masses of North America, Europe, and East Asia. For CO and HC there are relatively larger differences. There are however some distinct differences in the altitude distribution
Global Human Built-up And Settlement Extent (HBASE) Dataset From Landsat

Data.gov (United States)

National Aeronautics and Space Administration — The Global Human Built-up And Settlement Extent (HBASE) Dataset from Landsat is a global map of HBASE derived from the Global Land Survey (GLS) Landsat dataset for...
Passive Containment DataSet

Science.gov (United States)

This data is for Figures 6 and 7 in the journal article. The data also includes the two EPANET input files used for the analysis described in the paper, one for the looped system and one for the block system. This dataset is associated with the following publication: Grayman, W., R. Murray, and D. Savic. Redesign of Water Distribution Systems for Passive Containment of Contamination. JOURNAL OF THE AMERICAN WATER WORKS ASSOCIATION. American Water Works Association, Denver, CO, USA, 108(7): 381-391, (2016).
The Lunar Source Disk: Old Lunar Datasets on a New CD-ROM

Science.gov (United States)

Hiesinger, H.

1998-01-01

A compilation of previously published datasets on CD-ROM is presented. This Lunar Source Disk is intended to be a first step in the improvement/expansion of the Lunar Consortium Disk, in order to create an "image-cube"-like data pool that can be easily accessed and might be useful for a variety of future lunar investigations. All datasets were transformed to a standard map projection that allows direct comparison of different types of information on a pixel-by-pixel basis. Lunar observations have a long history and have been important to mankind for centuries, notably since the work of Plutarch and Galileo. As a consequence of centuries of lunar investigations, knowledge of the characteristics and properties of the Moon has accumulated over time. However, a side effect of this accumulation is that it has become more and more complicated for scientists to review all the datasets obtained through different techniques, to interpret them properly, to recognize their weaknesses and strengths in detail, and to combine them synoptically in geologic interpretations. Such synoptic geologic interpretations are crucial for the study of planetary bodies through remote-sensing data in order to avoid misinterpretation. In addition, many of the modern datasets, derived from Earth-based telescopes as well as from spacecraft missions, are acquired at different geometric and radiometric conditions. These differences make it challenging to compare or combine datasets directly or to extract information from different datasets on a pixel-by-pixel basis. Also, as there is no convention for the presentation of lunar datasets, different authors choose different map projections, depending on the location of the investigated areas and their personal interests. Insufficient or incomplete information on the map parameters used by different authors further complicates the reprojection of these datasets to a standard geometry. The goal of our efforts was to transfer previously published lunar
Gridded 5km GHCN-Daily Temperature and Precipitation Dataset, Version 1

Data.gov (United States)

National Oceanic and Atmospheric Administration, Department of Commerce — The Gridded 5km GHCN-Daily Temperature and Precipitation Dataset (nClimGrid) consists of four climate variables derived from the GHCN-D dataset: maximum temperature,...
ENHANCED DATA DISCOVERABILITY FOR IN SITU HYPERSPECTRAL DATASETS

Directory of Open Access Journals (Sweden)

B. Rasaiah

2016-06-01

Field spectroscopic metadata is a central component in the quality assurance, reliability, and discoverability of hyperspectral data and the products derived from it. Cataloguing, mining, and interoperability of these datasets rely upon the robustness of metadata protocols for field spectroscopy, and on the software architecture to support the exchange of these datasets. Currently no standard for in situ spectroscopy data or metadata protocols exists. This inhibits the effective sharing of growing volumes of in situ spectroscopy datasets and prevents them from benefiting from integration with the evolving range of data sharing platforms. A core metadataset for field spectroscopy was introduced by Rasaiah et al. (2011-2015) with extended support for specific applications. This paper presents a prototype model for an OGC and ISO compliant platform-independent metadata discovery service aligned to the specific requirements of field spectroscopy. In this study, a proof-of-concept metadata catalogue has been described and deployed in a cloud-based architecture as a demonstration of an operationalized field spectroscopy metadata standard and web-based discovery service.
Environmental Dataset Gateway (EDG) CS-W Interface

Data.gov (United States)

U.S. Environmental Protection Agency — Use the Environmental Dataset Gateway (EDG) to find and access EPA's environmental resources. Many options are available for easily reusing EDG content in other...
Annotating spatio-temporal datasets for meaningful analysis in the Web

Science.gov (United States)

Stasch, Christoph; Pebesma, Edzer; Scheider, Simon

2014-05-01

More and more environmental datasets that vary in space and time are available on the Web. This brings the advantage that the data can be used for purposes other than those originally foreseen, but also the danger that users may apply inappropriate analysis procedures because important assumptions made during the data collection process are not known to them. In order to guide users towards a meaningful (statistical) analysis of spatio-temporal datasets available on the Web, we have developed a Higher-Order-Logic formalism that captures some relevant assumptions in our previous work [1]. It allows the meaningfulness of spatial prediction and aggregation to be proved in a semi-automated fashion. In this poster presentation, we will present a concept for annotating spatio-temporal datasets available on the Web with concepts defined in our formalism. To this end, we have defined a subset of the formalism as a Web Ontology Language (OWL) pattern. It allows capturing the distinction between the different spatio-temporal variable types, i.e. point patterns, fields, lattices and trajectories, that in turn determine whether a particular dataset can be interpolated or aggregated in a meaningful way using a certain procedure. The actual annotations that link spatio-temporal datasets with the concepts in the ontology pattern are provided as Linked Data. In order to allow data producers to add the annotations to their datasets, we have implemented a Web portal that uses a triple store at the backend to store the annotations and to make them available in the Linked Data cloud. Furthermore, we have implemented functions in the statistical environment R to retrieve the RDF annotations and, based on these annotations, to support a stronger typing of spatio-temporal datatypes guiding towards a meaningful analysis in R. [1] Stasch, C., Scheider, S., Pebesma, E., Kuhn, W. (2014): "Meaningful spatial prediction and aggregation", Environmental Modelling & Software, 51, 149-165.
Evolving hard problems: Generating human genetics datasets with a complex etiology

Directory of Open Access Journals (Sweden)

Himmelstein Daniel S

2011-07-01

Background: A goal of human genetics is to discover genetic factors that influence individuals' susceptibility to common diseases. Most common diseases are thought to result from the joint failure of two or more interacting components instead of single component failures. This greatly complicates both the task of selecting informative genetic variants and the task of modeling interactions between them. We and others have previously developed algorithms to detect and model the relationships between these genetic factors and disease. Previously these methods have been evaluated with datasets simulated according to pre-defined genetic models. Results: Here we develop and evaluate a model-free evolution strategy to generate datasets which display a complex relationship between individual genotype and disease susceptibility. We show that this model-free approach is capable of generating a diverse array of datasets with distinct gene-disease relationships for an arbitrary interaction order and sample size. We specifically generate eight hundred Pareto fronts, one for each independent run of our algorithm. In each run the predictiveness of single genetic variants and pairs of genetic variants has been minimized, while the predictiveness of third-, fourth-, or fifth-order combinations is maximized. Two hundred runs of the algorithm are further dedicated to creating datasets with predictive fourth- or fifth-order interactions and minimized lower-level effects. Conclusions: This method and the resulting datasets will allow the capabilities of novel methods to be tested without pre-specified genetic models. This allows researchers to evaluate which methods will succeed on human genetics problems where the model is not known in advance. We further make freely available to the community the entire Pareto-optimal front of datasets from each run so that novel methods may be rigorously evaluated. These 76,600 datasets are available from http://discovery.dartmouth.edu/model_free_data/.
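The notion of a Pareto front used in this abstract can be illustrated with a minimal sketch. It assumes each candidate dataset is scored by two numbers: a lower-order predictiveness to be minimized and a higher-order predictiveness to be maximized. The scores and function names below are hypothetical and are not taken from the paper, whose evolution strategy is not reproduced here.

```python
def dominates(a, b):
    """Return True if candidate a is at least as good as b on both
    objectives and strictly better on at least one.
    Each candidate is (low_order_score, high_order_score):
    the first is minimized, the second is maximized."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical scores for five candidate datasets.
candidates = [(0.10, 0.90), (0.05, 0.80), (0.20, 0.95),
              (0.15, 0.85), (0.05, 0.70)]
front = pareto_front(candidates)
print(front)
```

Each retained candidate represents a different trade-off between the two objectives, which is why a whole front, rather than a single best dataset, is kept per run.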

A Dataset from TIMSS to Examine the Relationship between Computer Use and Mathematics Achievement

Science.gov (United States)

Kadijevich, Djordje M.

2015-01-01

Because the relationship between computer use and achievement is still puzzling, there is a need to prepare and analyze good quality datasets on computer use and achievement. Such a dataset can be derived from TIMSS data. This paper describes how this dataset can be prepared. It also gives an example of how the dataset may be analyzed. The…
Sibship reconstruction for inferring mating systems, dispersal and effective population size in headwater brook trout (Salvelinus fontinalis) populations

Science.gov (United States)

Kanno, Yoichiro; Vokoun, Jason C.; Letcher, Benjamin H.

2011-01-01

Brook trout Salvelinus fontinalis populations have declined in much of the native range in eastern North America and populations are typically relegated to small headwater streams in Connecticut, USA. We used sibship reconstruction to infer mating systems, dispersal and effective population size of resident (non-anadromous) brook trout in two headwater stream channel networks in Connecticut. Brook trout were captured via backpack electrofishing using spatially continuous sampling in the two headwaters (channel network lengths of 4.4 and 7.7 km). Eight microsatellite loci were genotyped in a total of 740 individuals (80–140 mm) subsampled in a stratified random design from all 50 m-reaches in which trout were captured. Sibship reconstruction indicated that males and females were both mostly polygamous although single pair matings were also inferred. Breeder sex ratio was inferred to be nearly 1:1. Few large-sized fullsib families (>3 individuals) were inferred and the majority of individuals were inferred to have no fullsibs among those fish genotyped (family size = 1). The median stream channel distance between pairs of individuals belonging to the same large-sized fullsib families (>3 individuals) was 100 m (range: 0–1,850 m) and 250 m (range: 0–2,350 m) in the two study sites, indicating limited dispersal at least for the size class of individuals analyzed. Using a sibship assignment method, the effective population size for the two streams was estimated at 91 (95%CI: 67–123) and 210 (95%CI: 172–259), corresponding to the ratio of effective-to-census population size of 0.06 and 0.12, respectively. Both-sex polygamy, low variation in reproductive success, and a balanced sex ratio may help maintain genetic diversity of brook trout populations with small breeder sizes persisting in headwater channel networks.
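As a rough arithmetic check, the effective population sizes and effective-to-census ratios quoted above imply approximate census sizes for the two streams. The sketch below only rearranges the ratio Ne/N; because the abstract reports rounded ratios, the back-calculated values are approximate, and the function name is illustrative.

```python
def implied_census_size(ne, ratio):
    """Back-calculate census size N from effective size Ne and the ratio Ne/N."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return ne / ratio


# Point estimates of Ne and the Ne/N ratios quoted in the abstract.
streams = {"stream 1": (91, 0.06), "stream 2": (210, 0.12)}
implied = {name: implied_census_size(ne, ratio)
           for name, (ne, ratio) in streams.items()}

for name, n in implied.items():
    print(f"{name}: implied census size of roughly {n:.0f}")
```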
Laboratory estimation of net trophic transfer efficiencies of PCB congeners to lake trout (Salvelinus namaycush) from its prey

Science.gov (United States)

Madenjian, Charles P.; Rediske, Richard R.; O'Keefe, James P.; David, Solomon R.

2014-01-01

A technique for laboratory estimation of net trophic transfer efficiency (γ) of polychlorinated biphenyl (PCB) congeners to piscivorous fish from their prey is described herein. During a 135-day laboratory experiment, we fed bloater (Coregonus hoyi) that had been caught in Lake Michigan to lake trout (Salvelinus namaycush) kept in eight laboratory tanks. Bloater is a natural prey for lake trout. In four of the tanks, a relatively high flow rate was used to ensure relatively high activity by the lake trout, whereas a low flow rate was used in the other four tanks, allowing for low lake trout activity. On a tank-by-tank basis, the amount of food eaten by the lake trout on each day of the experiment was recorded. Each lake trout was weighed at the start and end of the experiment. Four to nine lake trout from each of the eight tanks were sacrificed at the start of the experiment, and all 10 lake trout remaining in each of the tanks were euthanized at the end of the experiment. We determined concentrations of 75 PCB congeners in the lake trout at the start of the experiment, in the lake trout at the end of the experiment, and in bloaters fed to the lake trout during the experiment. Based on these measurements, γ was calculated for each of 75 PCB congeners in each of the eight tanks. Mean γ was calculated for each of the 75 PCB congeners for both active and inactive lake trout. Because the experiment was replicated in eight tanks, the standard error about mean γ could be estimated. Results from this type of experiment are useful in risk assessment models to predict future risk to humans and wildlife eating contaminated fish under various scenarios of environmental contamination.
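The abstract does not state the formula for γ. A minimal sketch, assuming the usual mass-balance definition (gain in congener body burden divided by the amount of congener ingested), is given below. All function names and numbers are hypothetical illustrations, not data from the experiment.

```python
import math
from statistics import mean, stdev


def net_trophic_transfer_efficiency(conc_start, weight_start,
                                    conc_end, weight_end,
                                    conc_prey, food_eaten):
    """Assumed definition: gamma = (final burden - initial burden) / ingested.
    Concentrations in ng/g, weights and food eaten in g."""
    burden_gain = conc_end * weight_end - conc_start * weight_start
    ingested = conc_prey * food_eaten
    return burden_gain / ingested


def mean_and_standard_error(values):
    """Mean across replicate tanks and the standard error about that mean."""
    return mean(values), stdev(values) / math.sqrt(len(values))


# Hypothetical single-tank example for one congener.
gamma_one_tank = net_trophic_transfer_efficiency(
    conc_start=10.0, weight_start=700.0,
    conc_end=20.0, weight_end=1000.0,
    conc_prey=15.0, food_eaten=1500.0)

# Hypothetical gamma values from four replicate tanks.
gamma_mean, gamma_se = mean_and_standard_error([0.5, 0.6, 0.7, 0.6])
print(gamma_one_tank, gamma_mean, gamma_se)
```

Replicating the experiment across tanks is what makes the standard error computable; with a single tank only a point estimate of γ would be available.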
A new dataset validation system for the Planetary Science Archive

Science.gov (United States)

Manaud, N.; Zender, J.; Heather, D.; Martinez, S.

2007-08-01

The Planetary Science Archive is the official archive for the Mars Express mission. It received its first data at the end of 2004. These data are delivered by the PI teams to the PSA team as datasets, which are formatted in conformance with the Planetary Data System (PDS) standards. The PI teams are responsible for analyzing and calibrating the instrument data as well as for producing reduced and calibrated data. They are also responsible for the scientific validation of these data. ESA is responsible for long-term data archiving and distribution to the scientific community and must ensure, in this regard, that all archived products meet quality requirements. To do so, an archive peer review is used to control the quality of the Mars Express science data archiving process. However, a full validation of its content is missing. An independent review board recently recommended that the completeness of the archive as well as the consistency of the delivered data should be validated following well-defined procedures. A new validation software tool is being developed to complete the overall data quality control system functionality. This new tool aims to improve the quality of data and services provided to the scientific community through the PSA, and shall make it possible to track anomalies in datasets and to control their completeness. It shall ensure that PSA end-users (1) can rely on the results of their queries, (2) will get data products that are suitable for scientific analysis, and (3) can find all science data acquired during a mission. We defined dataset validation as the verification and assessment process to check the dataset content against pre-defined top-level criteria, which represent the general characteristics of good quality datasets. The dataset content that is checked includes the data and all types of information that are essential in the process of deriving scientific results and those interfacing with the PSA database. The validation software tool is a multi-mission tool that
Data Recommender: An Alternative Way to Discover Open Scientific Datasets

Science.gov (United States)

Klump, J. F.; Devaraju, A.; Williams, G.; Hogan, D.; Davy, R.; Page, J.; Singh, D.; Peterson, N.

2017-12-01

Over the past few years, institutions and government agencies have adopted policies to openly release their data, which has resulted in huge amounts of open data becoming available on the web. When trying to discover the data, users face two challenges: an overload of choice and the limitations of the existing data search tools. On the one hand, there are too many datasets to choose from, and therefore, users need to spend considerable effort to find the datasets most relevant to their research. On the other hand, data portals commonly offer keyword and faceted search, which depend fully on the user queries to search and rank relevant datasets. Consequently, keyword and faceted search may return loosely related or irrelevant results, even though the results contain the query terms. They may also return highly specific results that depend more on how well metadata was authored. They do not account well for variance in metadata due to variance in author styles and preferences. The top-ranked results may also come from the same data collection, and users are unlikely to discover new and interesting datasets. These search modes mainly suit users who can express their information needs in terms of the structure and terminology of the data portals, but may pose a challenge otherwise. The above challenges reflect that we need a solution that delivers the most relevant (i.e., similar and serendipitous) datasets to users, beyond the existing search functionalities on the portals. A recommender system is an information filtering system that presents users with relevant and interesting contents based on users' context and preferences. Delivering data recommendations to users can make data discovery easier, and as a result may enhance user engagement with the portal. We developed a hybrid data recommendation approach for the CSIRO Data Access Portal. The approach leverages existing recommendation techniques (e.g., content-based filtering and item co-occurrence) to produce
Data assimilation and model evaluation experiment datasets

Science.gov (United States)

Lai, Chung-Cheng A.; Qian, Wen; Glenn, Scott M.

1994-01-01

The Institute for Naval Oceanography, in cooperation with Naval Research Laboratories and universities, executed the Data Assimilation and Model Evaluation Experiment (DAMEE) for the Gulf Stream region during fiscal years 1991-1993. Enormous effort has gone into the preparation of several high-quality and consistent datasets for model initialization and verification. This paper describes the preparation process, the temporal and spatial scopes, the contents, the structure, etc., of these datasets. The goal of DAMEE and the need of data for the four phases of experiment are briefly stated. The preparation of DAMEE datasets consisted of a series of processes: (1) collection of observational data; (2) analysis and interpretation; (3) interpolation using the Optimum Thermal Interpolation System package; (4) quality control and re-analysis; and (5) data archiving and software documentation. The data products from these processes included a time series of 3D fields of temperature and salinity, 2D fields of surface dynamic height and mixed-layer depth, analysis of the Gulf Stream and rings system, and bathythermograph profiles. To date, these are the most detailed and high-quality data for mesoscale ocean modeling, data assimilation, and forecasting research. Feedback from ocean modeling groups who tested this data was incorporated into its refinement. Suggestions for DAMEE data usages include (1) ocean modeling and data assimilation studies, (2) diagnosis and theoretical studies, and (3) comparisons with locally detailed observations.
Artificial intelligence (AI) systems for interpreting complex medical datasets.

Science.gov (United States)

Altman, R B

2017-05-01

Advances in machine intelligence have created powerful capabilities in algorithms that find hidden patterns in data, classify objects based on their measured characteristics, and associate similar patients/diseases/drugs based on common features. However, artificial intelligence (AI) applications in medical data have several technical challenges: complex and heterogeneous datasets, noisy medical datasets, and explaining their output to users. There are also social challenges related to intellectual property, data provenance, regulatory issues, economics, and liability. © 2017 ASCPT.
Full-Scale Approximations of Spatio-Temporal Covariance Models for Large Datasets

KAUST Repository

Zhang, Bohai; Sang, Huiyan; Huang, Jianhua Z.

2014-01-01

of dataset and application of such models is not feasible for large datasets. This article extends the full-scale approximation (FSA) approach by Sang and Huang (2012) to the spatio-temporal context to reduce computational complexity. A reversible jump Markov
PERFORMANCE COMPARISON FOR INTRUSION DETECTION SYSTEM USING NEURAL NETWORK WITH KDD DATASET

Directory of Open Access Journals (Sweden)

S. Devaraju

2014-04-01

Intrusion detection systems face the challenging task of classifying each user of an organization's information systems as a normal user or an attacker. An intrusion detection system is an effective way to deal with this kind of problem in networks. Different classifiers are used to detect the different kinds of attacks in networks. In this paper, the performance of intrusion detection is compared across various neural network classifiers. The four types of classifiers used in the proposed research are the Feed Forward Neural Network (FFNN), Generalized Regression Neural Network (GRNN), Probabilistic Neural Network (PNN) and Radial Basis Neural Network (RBNN). The performance on the full-featured KDD Cup 1999 dataset is compared with that on the reduced-feature KDD Cup 1999 dataset. MATLAB is used to train and test on the dataset, and the efficiency and False Alarm Rate are measured. The results show that the reduced dataset performs better than the full-featured dataset.
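The abstract does not define its metrics. The sketch below assumes the conventional confusion-matrix definitions: detection rate as TP / (TP + FN) and false alarm rate as FP / (FP + TN). The labels are toy values rather than KDD Cup 1999 records, and the function name is illustrative.

```python
def detection_metrics(y_true, y_pred, attack_label="attack"):
    """Compute detection rate, false alarm rate and accuracy for a
    binary normal-versus-attack labelling."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == attack_label:
            if pred == attack_label:
                tp += 1  # attack correctly flagged
            else:
                fn += 1  # attack missed
        else:
            if pred == attack_label:
                fp += 1  # normal traffic flagged as attack (false alarm)
            else:
                tn += 1  # normal traffic correctly passed
    total = tp + fp + tn + fn
    return {
        "detection_rate": tp / (tp + fn) if (tp + fn) else 0.0,
        "false_alarm_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }


# Toy labels for six connections.
y_true = ["normal", "attack", "attack", "normal", "normal", "attack"]
y_pred = ["normal", "attack", "normal", "attack", "normal", "attack"]
metrics = detection_metrics(y_true, y_pred)
print(metrics)
```

Reporting the false alarm rate alongside accuracy matters because a classifier can score high accuracy on attack-heavy data while still flagging a large share of normal traffic.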
Review of ATLAS Open Data 8 TeV datasets, tools and activities

CERN Document Server

The ATLAS collaboration

2018-01-01

The ATLAS Collaboration has released two 8 TeV datasets and relevant simulated samples to the public for educational use. A number of groups within ATLAS have used these ATLAS Open Data 8 TeV datasets, developing tools and educational material to promote particle physics. The general aim of these activities is to provide simple and user-friendly interactive interfaces to simulate the procedures used by high-energy physics researchers. International Masterclasses introduce particle physics to high school students and have been studying 8 TeV ATLAS Open Data since 2015. Inspired by this success, a new ATLAS Open Data initiative was launched in 2016 for university students. A comprehensive educational platform was thus developed featuring a second 8 TeV dataset and a new set of educational tools. The 8 TeV datasets and associated tools are presented and discussed here, as well as a selection of activities studying the ATLAS Open Data 8 TeV datasets.
Recent Development on the NOAA's Global Surface Temperature Dataset

Science.gov (United States)

Zhang, H. M.; Huang, B.; Boyer, T.; Lawrimore, J. H.; Menne, M. J.; Rennie, J.

2016-12-01

Global Surface Temperature (GST) is one of the most widely used indicators for climate trend and extreme analyses. A widely used GST dataset is the NOAA merged land-ocean surface temperature dataset known as NOAAGlobalTemp (formerly MLOST). NOAAGlobalTemp was recently updated from version 3.5.4 to version 4. The update includes a significant improvement in the ocean surface component (Extended Reconstructed Sea Surface Temperature or ERSST, from version 3b to version 4), which resulted in increased temperature trends in recent decades. Since then, advancements in both the ocean component (ERSST) and land component (GHCN-Monthly) have been made, including the inclusion of Argo float SSTs and expanded EOT modes in ERSST, and the use of the ISTI databank in GHCN-Monthly. In this presentation, we describe the impact of those improvements on the merged global temperature dataset, in terms of global trends and other aspects.
The OXL format for the exchange of integrated datasets

Directory of Open Access Journals (Sweden)

Taubert Jan

2007-12-01

A prerequisite for systems biology is the integration and analysis of heterogeneous experimental data stored in hundreds of life-science databases and millions of scientific publications. Several standardised formats for the exchange of specific kinds of biological information exist. Such exchange languages facilitate the integration process; however, they are not designed to transport integrated datasets. A format for exchanging integrated datasets needs to (i) cover data from a broad range of application domains, (ii) be flexible and extensible to combine many different complex data structures, (iii) include metadata and semantic definitions, (iv) include inferred information, (v) identify the original data source for integrated entities and (vi) transport large integrated datasets. Unfortunately, none of the exchange formats from the biological domain (e.g. BioPAX, MAGE-ML, PSI-MI, SBML) or the generic approaches (RDF, OWL) fulfil these requirements in a systematic way.
Developing a Data-Set for Stereopsis

Directory of Open Access Journals (Sweden)

D.W Hunter

2014-08-01

Current research on binocular stereopsis in humans and non-human primates has been limited by a lack of available data-sets. Current data-sets fall into two categories: stereo-image sets with vergence but no ranging information (Hibbard, 2008, Vision Research, 48(12), 1427-1439) or combinations of depth information with binocular images and video taken from cameras in fixed fronto-parallel configurations exhibiting neither vergence nor focus effects (Hirschmuller & Scharstein, 2007, IEEE Conf. Computer Vision and Pattern Recognition). The techniques for generating depth information are also imperfect. Depth information is normally inaccurate or simply missing near edges and on partially occluded surfaces. For many areas of vision research these are the most interesting parts of the image (Goutcher, Hunter, Hibbard, 2013, i-Perception, 4(7), 484; Scarfe & Hibbard, 2013, Vision Research). Using state-of-the-art open-source ray-tracing software (PBRT) as a back-end, our intention is to release a set of tools that will allow researchers in this field to generate artificial binocular stereoscopic data-sets. Although not as realistic as photographs, computer generated images have significant advantages in terms of control over the final output, and ground-truth information about scene depth is easily calculated at all points in the scene, even partially occluded areas. While individual researchers have been developing similar stimuli by hand for many decades, we hope that our software will greatly reduce the time and difficulty of creating naturalistic binocular stimuli. Our intention in making this presentation is to elicit feedback from the vision community about what sort of features would be desirable in such software.
BASE MAP DATASET, MAYES COUNTY, OKLAHOMA, USA

Data.gov (United States)

Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications: cadastral, geodetic control,...
APPLICATION OF THE BAGGING TECHNIQUE TO CLASSIFICATION ALGORITHMS TO OVERCOME CLASS IMBALANCE IN MEDICAL DATASETS

Directory of Open Access Journals (Sweden)

Rizki Tri Prasetio

2016-03-01

The class imbalance problem has been reported to severely hinder the classification performance of many standard learning algorithms and has attracted a great deal of attention from researchers in different fields. Therefore, a number of methods, such as sampling methods, cost-sensitive learning methods, and bagging- and boosting-based ensemble methods, have been proposed to solve this problem. Some medical datasets have two classes (binomial) and suffer from an imbalance that reduces classification accuracy. This research proposes combining the bagging technique with classification algorithms to improve accuracy on medical datasets. The bagging technique is used to address the class imbalance problem. The proposed method is applied to three classification algorithms: naïve Bayes, decision tree and k-nearest neighbor. This research uses five medical datasets obtained from the UCI Machine Learning repository: breast-cancer, liver-disorder, heart-disease, pima-diabetes and vertebral-column. The results indicate that the proposed method yields a significant improvement for two classification algorithms, decision tree (t-test p value 0.0184) and k-nearest neighbor (t-test p value 0.0292), but not for naïve Bayes (t-test p value 0.9236). After bagging was applied to the five medical datasets, naïve Bayes had the highest accuracy for the breast-cancer dataset (96.14%, AUC 0.984), heart-disease (84.44%, AUC 0.911) and pima-diabetes (74.73%, AUC 0.806), while k-nearest neighbor had the best accuracy for liver-disorder (62.03%, AUC 0.632) and vertebral-column (82.26%, AUC 0.867). Keywords: ensemble technique, bagging, imbalanced class, medical dataset.
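Bagging (bootstrap aggregating) can be sketched in a few lines: train one base classifier per bootstrap resample of the training data and combine their predictions by majority vote. The sketch below uses a 1-nearest-neighbor base learner as a simple stand-in; the toy data, function names and parameter values are hypothetical, and the sketch does not reproduce the study's classifiers, datasets or results.

```python
import random
from collections import Counter


def nearest_neighbor_predict(train, x):
    """Predict the label of x from the closest training point (1-NN)."""
    _, label = min(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)))
    return label


def bagging_fit(data, n_estimators=15, seed=0):
    """Draw one bootstrap resample (sampling with replacement) per estimator."""
    rng = random.Random(seed)
    return [[rng.choice(data) for _ in data] for _ in range(n_estimators)]


def bagging_predict(bags, x):
    """Majority vote over the base learners, one per bootstrap resample."""
    votes = Counter(nearest_neighbor_predict(bag, x) for bag in bags)
    return votes.most_common(1)[0][0]


# Toy imbalanced data: eight "healthy" points and three "sick" points.
data = [((0.0, 0.1), "healthy"), ((0.2, 0.0), "healthy"),
        ((0.1, 0.3), "healthy"), ((0.3, 0.2), "healthy"),
        ((0.4, 0.1), "healthy"), ((0.0, 0.4), "healthy"),
        ((0.2, 0.2), "healthy"), ((0.3, 0.4), "healthy"),
        ((5.0, 5.1), "sick"), ((5.2, 4.9), "sick"), ((4.9, 5.0), "sick")]

bags = bagging_fit(data)
prediction_near_majority = bagging_predict(bags, (0.2, 0.1))
prediction_near_minority = bagging_predict(bags, (5.1, 4.9))
print(prediction_near_majority, prediction_near_minority)
```

Using an odd number of estimators avoids tied votes in the two-class case. Plain bootstrap resampling does not rebalance the classes by itself, so a stratified or balanced resampling step is a common variation when the minority class is very small.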
CERC Dataset (Full Hadza Data)

DEFF Research Database (Denmark)

2016-01-01

The dataset includes demographic, behavioral, and religiosity data from eight different populations from around the world. The samples were drawn from: (1) Coastal and (2) Inland Tanna, Vanuatu; (3) Hadzaland, Tanzania; (4) Lovu, Fiji; (5) Pointe aux Piment, Mauritius; (6) Pesqueiro, Brazil; (7) Kyzyl, Tyva Republic; and (8) Yasawa, Fiji. Related publication: Purzycki, et al. (2016). Moralistic Gods, Supernatural Punishment and the Expansion of Human Sociality. Nature, 530(7590): 327-330.
Error characterisation of global active and passive microwave soil moisture datasets

Directory of Open Access Journals (Sweden)

W. A. Dorigo

2010-12-01

Full Text Available Understanding the error structures of remotely sensed soil moisture observations is essential for correctly interpreting observed variations and trends in the data or assimilating them in hydrological or numerical weather prediction models. Nevertheless, a spatially coherent assessment of the quality of the various globally available datasets is often hampered by the limited availability over space and time of reliable in-situ measurements. As an alternative, this study explores the triple collocation error estimation technique for assessing the relative quality of several globally available soil moisture products from active (ASCAT) and passive (AMSR-E and SSM/I) microwave sensors. The triple collocation is a powerful statistical tool to estimate the root mean square error while simultaneously solving for systematic differences in the climatologies of a set of three linearly related data sources with independent error structures. Prerequisite for this technique is the availability of a sufficiently large number of timely corresponding observations. In addition to the active and passive satellite-based datasets, we used the ERA-Interim and GLDAS-NOAH reanalysis soil moisture datasets as a third, independent reference. The prime objective is to reveal trends in uncertainty related to different observation principles (passive versus active), the use of different frequencies (C-, X-, and Ku-band) for passive microwave observations, and the choice of the independent reference dataset (ERA-Interim versus GLDAS-NOAH). The results suggest that the triple collocation method provides realistic error estimates. Observed spatial trends agree well with the existing theory and studies on the performance of different observation principles and frequencies with respect to land cover and vegetation density. In addition, if all theoretical prerequisites are fulfilled (e.g. a sufficiently large number of common observations is available and errors of the different
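The triple collocation idea summarized above can be sketched in a few lines of Python. This is a minimal illustration using the commonly cited covariance formulation, which may differ in detail from the study's own procedure; the function names and the synthetic data (with assumed error standard deviations of 0.3, 0.2 and 0.4) are hypothetical. For three collocated series that are linearly related to the same truth and have mutually independent errors, the error variance of `x` is estimated as `C_xx - C_xy * C_xz / C_yz`, expressed in the units of `x`.

```python
import random


def covariance(a, b):
    """Unbiased sample covariance of two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (n - 1)


def triple_collocation_rmse(x, y, z):
    """Estimate the random-error standard deviation of each of three series.

    Assumes each series is linearly related to the same truth and that
    the three error terms are independent of each other and of the truth.
    """
    c_xy, c_xz, c_yz = covariance(x, y), covariance(x, z), covariance(y, z)
    var_x = covariance(x, x) - c_xy * c_xz / c_yz
    var_y = covariance(y, y) - c_xy * c_yz / c_xz
    var_z = covariance(z, z) - c_xz * c_yz / c_xy
    # Sampling noise can push a small variance slightly below zero.
    return tuple(max(v, 0.0) ** 0.5 for v in (var_x, var_y, var_z))


def synthetic_triplet(n=20000, seed=42):
    """Three noisy, differently scaled views of one synthetic 'truth' series."""
    rng = random.Random(seed)
    truth = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = [t + rng.gauss(0.0, 0.3) for t in truth]
    y = [0.5 + 0.8 * t + rng.gauss(0.0, 0.2) for t in truth]
    z = [-0.2 + 1.2 * t + rng.gauss(0.0, 0.4) for t in truth]
    return x, y, z


x, y, z = synthetic_triplet()
print(triple_collocation_rmse(x, y, z))  # estimates should lie near 0.3, 0.2, 0.4
```

The estimates are only meaningful when many collocated observations are available, which matches the prerequisite stated in the abstract.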
Synthetic ALSPAC longitudinal datasets for the Big Data VR project.

Science.gov (United States)

Avraam, Demetris; Wilson, Rebecca C; Burton, Paul

2017-01-01

Three synthetic datasets - of observation size 15,000, 155,000 and 1,555,000 participants, respectively - were created by simulating eleven cardiac and anthropometric variables from nine collection ages of the ALSPAC birth cohort study. The synthetic datasets retain similar data properties to the ALSPAC study data they are simulated from (covariance matrices, as well as the mean and variance values of the variables) without including the original data itself or disclosing participant information. In this instance, the three synthetic datasets have been utilised in an academia-industry collaboration to build prototype virtual reality data analysis software, but they could have a broader use in method and software development projects where sensitive data cannot be freely shared.
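One simple way to generate synthetic records that preserve a target mean vector and covariance matrix is to draw from a multivariate normal distribution via a Cholesky factor. The sketch below assumes normality and uses two made-up variables (height in cm and weight in kg, with invented means and covariances); it is not the procedure used to build the ALSPAC synthetic datasets, and the names `cholesky` and `simulate` are hypothetical.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular L with L * L^T == matrix (symmetric positive definite)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def simulate(means, cov, n, seed=0):
    """Draw n synthetic rows with the given mean vector and covariance matrix."""
    rng = random.Random(seed)
    lower = cholesky(cov)
    dim = len(means)
    rows = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        rows.append(
            [means[i] + sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(dim)]
        )
    return rows


# Hypothetical summary statistics; no real participant data are involved.
target_means = [140.0, 35.0]
target_cov = [[49.0, 21.0], [21.0, 36.0]]
synthetic_rows = simulate(target_means, target_cov, n=20000)
```

Only the summary statistics enter the simulation, so the synthetic rows can be shared without exposing the original records.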
Correction of elevation offsets in multiple co-located lidar datasets

Science.gov (United States)

Thompson, David M.; Dalyander, P. Soupy; Long, Joseph W.; Plant, Nathaniel G.

2017-04-07

Introduction: Topographic elevation data collected with airborne light detection and ranging (lidar) can be used to analyze short- and long-term changes to beach and dune systems. Analysis of multiple lidar datasets at Dauphin Island, Alabama, revealed systematic, island-wide elevation differences on the order of tens of centimeters (cm) that were not attributable to real-world change and, therefore, were likely to represent systematic sampling offsets. These offsets vary between the datasets, but appear spatially consistent within a given survey. This report describes a method that was developed to identify and correct offsets between lidar datasets collected over the same site at different times so that true elevation changes over time, associated with sediment accumulation or erosion, can be analyzed.
BASE MAP DATASET, HONOLULU COUNTY, HAWAII, USA

Data.gov (United States)

Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

BASE MAP DATASET, LOS ANGELES COUNTY, CALIFORNIA

Data.gov (United States)

Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...
BASE MAP DATASET, CHEROKEE COUNTY, SOUTH CAROLINA

Data.gov (United States)

Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...
BASE MAP DATASET, EDGEFIELD COUNTY, SOUTH CAROLINA

Data.gov (United States)

Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...
BASE MAP DATASET, SANTA CRUZ COUNTY, CALIFORNIA

Data.gov (United States)

Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...
Genomic arrangement of salinity tolerance QTLs in salmonids: A comparative analysis of Atlantic salmon (Salmo salar) with Arctic charr (Salvelinus alpinus) and rainbow trout (Oncorhynchus mykiss)

Directory of Open Access Journals (Sweden)

Norman Joseph D

2012-08-01

Full Text Available Abstract Background Quantitative trait locus (QTL) studies show that variation in salinity tolerance in Arctic charr and rainbow trout has a genetic basis, even though both these species have low to moderate salinity tolerance capacities. QTL were observed to localize to homologous linkage group segments within putative chromosomal regions possessing multiple candidate genes. We compared salinity tolerance QTL in rainbow trout and Arctic charr to those detected in a higher salinity tolerant species, Atlantic salmon. The highly derived karyotype of Atlantic salmon allows for the assessment of whether disparity in salinity tolerance in salmonids is associated with differences in genetic architecture. To facilitate these comparisons, we examined the genomic synteny patterns of key candidate genes in the other model teleost fishes that have experienced three whole-genome duplication (3R) events which preceded a fourth (4R) whole genome duplication event common to all salmonid species. Results Nine linkage groups contained chromosome-wide significant QTL (AS-2, -4p, -4q, -5, -9, -12p, -12q, -14q, -17q, -22, and -23), while a single genome-wide significant QTL was located on AS-4q. Salmonid genomes shared the greatest marker homology with the genome of three-spined stickleback. All linkage group arms in Atlantic salmon were syntenic with at least one stickleback chromosome, while 18 arms had multiple affinities. Arm fusions in Atlantic salmon were often between multiple regions bearing salinity tolerance QTL. Nine linkage groups in Arctic charr and six linkage group arms in rainbow trout currently have no synteny alignments with stickleback chromosomes, while eight rainbow trout linkage group arms were syntenic with multiple stickleback chromosomes. Rearrangements in the stickleback lineage involving fusions of ancestral arm segments could account for the 21 chromosome pairs observed in the stickleback karyotype.
Conclusions Salinity tolerance in salmonids from three genera is to some extent controlled by the same loci. Synteny between QTL in salmonids and candidate genes in stickleback suggests genetic variation at candidate gene loci could affect salinity tolerance in all three salmonids investigated. Candidate genes often occur in pairs on chromosomes, and synteny patterns indicate these pairs are generally conserved in 2R, 3R, and 4R genomes. Synteny maps also suggest that the Atlantic salmon genome contains three larger syntenic combinations of candidate genes that are not evident in any of the other 2R, 3R, or 4R genomes examined. These larger synteny tracts appear to have resulted from ancestral arm fusions that occurred in the Atlantic salmon ancestor. We hypothesize that the superior hypo-osmoregulatory efficiency that is characteristic of Atlantic salmon may be related to these clusters.
Satellite-Based Precipitation Datasets

Science.gov (United States)

Munchak, S. J.; Huffman, G. J.

2017-12-01

Of the possible sources of precipitation data, those based on satellites provide the greatest spatial coverage. There is a wide selection of datasets, algorithms, and versions from which to choose, which can be confusing to non-specialists wishing to use the data. The International Precipitation Working Group (IPWG) maintains tables of the major publicly available, long-term, quasi-global precipitation data sets (http://www.isac.cnr.it/ ipwg/data/datasets.html), and this talk briefly reviews the various categories. As examples, NASA provides two sets of quasi-global precipitation data sets: the older Tropical Rainfall Measuring Mission (TRMM) Multi-satellite Precipitation Analysis (TMPA) and current Integrated Multi-satellitE Retrievals for Global Precipitation Measurement (GPM) mission (IMERG). Both provide near-real-time and post-real-time products that are uniformly gridded in space and time. The TMPA products are 3-hourly 0.25°x0.25° on the latitude band 50°N-S for about 16 years, while the IMERG products are half-hourly 0.1°x0.1° on 60°N-S for over 3 years (with plans to go to 16+ years in Spring 2018). In addition to the precipitation estimates, each data set provides fields of other variables, such as the satellite sensor providing estimates and estimated random error. The discussion concludes with advice about determining suitability for use, the necessity of being clear about product names and versions, and the need for continued support for satellite- and surface-based observation.
FASTQSim: platform-independent data characterization and in silico read generation for NGS datasets.

Science.gov (United States)

Shcherbina, Anna

2014-08-15

High-throughput next generation sequencing technologies have enabled rapid characterization of clinical and environmental samples. Consequently, the largest bottleneck to actionable data has become sample processing and bioinformatics analysis, creating a need for accurate and rapid algorithms to process genetic data. Perfectly characterized in silico datasets are a useful tool for evaluating the performance of such algorithms. Background contaminating organisms are observed in sequenced mixtures of organisms. In silico samples provide exact truth. To create the best value for evaluating algorithms, in silico data should mimic actual sequencer data as closely as possible. FASTQSim is a tool that provides the dual functionality of NGS dataset characterization and metagenomic data generation. FASTQSim is sequencing platform-independent, and computes distributions of read length, quality scores, indel rates, single point mutation rates, indel size, and similar statistics for any sequencing platform. To create training or testing datasets, FASTQSim has the ability to convert target sequences into in silico reads with specific error profiles obtained in the characterization step. FASTQSim enables users to assess the quality of NGS datasets. The tool provides information about read length, read quality, repetitive and non-repetitive indel profiles, and single base pair substitutions. FASTQSim allows the user to simulate individual read datasets that can be used as standardized test scenarios for planning sequencing projects or for benchmarking metagenomic software. In this regard, in silico datasets generated with the FASTQSim tool hold several advantages over natural datasets: they are sequencing platform independent, extremely well characterized, and less expensive to generate. Such datasets are valuable in a number of applications, including the training of assemblers for multiple platforms, benchmarking bioinformatics algorithm performance, and creating challenge
Se-SAD serial femtosecond crystallography datasets from selenobiotinyl-streptavidin

Science.gov (United States)

Yoon, Chun Hong; Demirci, Hasan; Sierra, Raymond G.; Dao, E. Han; Ahmadi, Radman; Aksit, Fulya; Aquila, Andrew L.; Batyuk, Alexander; Ciftci, Halilibrahim; Guillet, Serge; Hayes, Matt J.; Hayes, Brandon; Lane, Thomas J.; Liang, Meng; Lundström, Ulf; Koglin, Jason E.; Mgbam, Paul; Rao, Yashas; Rendahl, Theodore; Rodriguez, Evan; Zhang, Lindsey; Wakatsuki, Soichi; Boutet, Sébastien; Holton, James M.; Hunter, Mark S.

2017-04-01

We provide a detailed description of selenobiotinyl-streptavidin (Se-B SA) co-crystal datasets recorded using the Coherent X-ray Imaging (CXI) instrument at the Linac Coherent Light Source (LCLS) for selenium single-wavelength anomalous diffraction (Se-SAD) structure determination. Se-B SA was chosen as the model system for its high affinity between biotin and streptavidin where the sulfur atom in the biotin molecule (C10H16N2O3S) is substituted with selenium. The dataset was collected at three different transmissions (100, 50, and 10%) using a serial sample chamber setup which allows for two sample chambers, a front chamber and a back chamber, to operate simultaneously. Diffraction patterns from Se-B SA were recorded to a resolution of 1.9 Å. The dataset is publicly available through the Coherent X-ray Imaging Data Bank (CXIDB) and also on LCLS compute nodes as a resource for research and algorithm development.
Dataset of transcriptional landscape of B cell early activation

Directory of Open Access Journals (Sweden)

Alexander S. Garruss

2015-09-01

Full Text Available Signaling via B cell receptors (BCR) and Toll-like receptors (TLRs) results in activation of B cells with distinct physiological outcomes, but transcriptional regulatory mechanisms that drive activation and distinguish these pathways remain unknown. At early time points after BCR and TLR ligand exposure, 0.5 and 2 h, RNA-seq was performed allowing observations on rapid transcriptional changes. At 2 h, ChIP-seq was performed to allow observations on important regulatory mechanisms potentially driving transcriptional change. The dataset includes RNA-seq, ChIP-seq of control (Input), RNA Pol II, H3K4me3, H3K27me3, and a separate RNA-seq for miRNA expression, which can be found at Gene Expression Omnibus Dataset GSE61608. Here, we provide details on the experimental and analysis methods used to obtain and analyze this dataset and to examine the transcriptional landscape of B cell early activation.
U.S. Climate Divisional Dataset (Version Superseded)

Data.gov (United States)

National Oceanic and Atmospheric Administration, Department of Commerce — This data has been superseded by a newer version of the dataset. Please refer to NOAA's Climate Divisional Database for more information. The U.S. Climate Divisional...
UK surveillance: provision of quality assured information from combined datasets.

Science.gov (United States)

Paiba, G A; Roberts, S R; Houston, C W; Williams, E C; Smith, L H; Gibbens, J C; Holdship, S; Lysons, R

2007-09-14

Surveillance information is most useful when provided within a risk framework, which is achieved by presenting results against an appropriate denominator. Often the datasets are captured separately and for different purposes, and will have inherent errors and biases that can be further confounded by the act of merging. The United Kingdom Rapid Analysis and Detection of Animal-related Risks (RADAR) system contains data from several sources and provides both data extracts for research purposes and reports for wider stakeholders. Considerable efforts are made to optimise the data in RADAR during the Extraction, Transformation and Loading (ETL) process. Despite efforts to ensure data quality, the final dataset inevitably contains some data errors and biases, most of which cannot be rectified during subsequent analysis. So, in order for users to establish the 'fitness for purpose' of data merged from more than one data source, Quality Statements are produced as defined within the overarching surveillance Quality Framework. These documents detail identified data errors and biases following ETL and report construction as well as relevant aspects of the datasets from which the data originated. This paper illustrates these issues using RADAR datasets, and describes how they can be minimised.
Climate Prediction Center IR 4km Dataset

Data.gov (United States)

National Oceanic and Atmospheric Administration, Department of Commerce — CPC IR 4km dataset was created from all available individual geostationary satellite data which have been merged to form nearly seamless global (60N-60S) IR...
Multivariate Analysis of Multiple Datasets: a Practical Guide for Chemical Ecology.

Science.gov (United States)

Hervé, Maxime R; Nicolè, Florence; Lê Cao, Kim-Anh

2018-03-01

Chemical ecology has strong links with metabolomics, the large-scale study of all metabolites detectable in a biological sample. Consequently, chemical ecologists are often challenged by the statistical analyses of such large datasets. This holds especially true when the purpose is to integrate multiple datasets to obtain a holistic view and a better understanding of a biological system under study. The present article provides a comprehensive resource to analyze such complex datasets using multivariate methods. It starts from the necessary pre-treatment of data including data transformations and distance calculations, to the application of both gold standard and novel multivariate methods for the integration of different omics data. We illustrate the process of analysis along with detailed results interpretations for six issues representative of the different types of biological questions encountered by chemical ecologists. We provide the necessary knowledge and tools with reproducible R codes and chemical-ecological datasets to practice and teach multivariate methods.
Harvard Aging Brain Study: Dataset and accessibility

NARCIS (Netherlands)

Dagley, Alexander; LaPoint, Molly; Huijbers, Willem; Hedden, Trey; McLaren, Donald G.; Chatwal, Jasmeer P.; Papp, Kathryn V.; Amariglio, Rebecca E.; Blacker, Deborah; Rentz, Dorene M.; Johnson, Keith A.; Sperling, Reisa A.; Schultz, Aaron P.

2017-01-01

The Harvard Aging Brain Study is sharing its data with the global research community. The longitudinal dataset consists of a 284-subject cohort with the following modalities acquired: demographics, clinical assessment, comprehensive neuropsychological testing, clinical biomarkers, and neuroimaging.
Large Scale Flood Risk Analysis using a New Hyper-resolution Population Dataset

Science.gov (United States)

Smith, A.; Neal, J. C.; Bates, P. D.; Quinn, N.; Wing, O.

2017-12-01

Here we present the first national scale flood risk analyses, using high resolution Facebook Connectivity Lab population data and data from a hyper resolution flood hazard model. In recent years the field of large scale hydraulic modelling has been transformed by new remotely sensed datasets, improved process representation, highly efficient flow algorithms and increases in computational power. These developments have allowed flood risk analysis to be undertaken in previously unmodeled territories and from continental to global scales. Flood risk analyses are typically conducted via the integration of modelled water depths with an exposure dataset. Over large scales and in data poor areas, these exposure data typically take the form of a gridded population dataset, estimating population density using remotely sensed data and/or locally available census data. The local nature of flooding dictates that for robust flood risk analysis to be undertaken both hazard and exposure data should sufficiently resolve local scale features. Global flood frameworks are enabling flood hazard data to be produced at 90 m resolution, resulting in a mismatch with available population datasets, which are typically more coarsely resolved. Moreover, these exposure data are typically focused on urban areas and struggle to represent rural populations. In this study we integrate a new population dataset with a global flood hazard model. The population dataset was produced by the Connectivity Lab at Facebook, providing gridded population data at 5 m resolution, representing a resolution increase over previous countrywide data sets of multiple orders of magnitude. Flood risk analyses undertaken over a number of developing countries are presented, along with a comparison of flood risk analyses undertaken using pre-existing population datasets.
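To make the resolution mismatch described above concrete, the following Python sketch overlays a modelled water-depth grid on a population grid and counts the people in wet cells, once with a fine population grid and once with a coarse grid spread uniformly over the hazard cells. The grids, the zero-depth threshold and the function names are invented for illustration and do not reproduce the study's workflow.

```python
def exposed_population(depth_grid, population_grid, depth_threshold=0.0):
    """Sum the population of cells whose water depth exceeds the threshold."""
    return sum(
        people
        for depth_row, people_row in zip(depth_grid, population_grid)
        for depth, people in zip(depth_row, people_row)
        if depth > depth_threshold
    )


def coarsen(grid, factor):
    """Aggregate a fine grid by summing factor x factor blocks."""
    rows, cols = len(grid), len(grid[0])
    return [
        [
            sum(grid[r + dr][c + dc] for dr in range(factor) for dc in range(factor))
            for c in range(0, cols, factor)
        ]
        for r in range(0, rows, factor)
    ]


def spread_uniformly(coarse_grid, factor):
    """Disaggregate a coarse grid by sharing each cell equally among its sub-cells."""
    share = factor * factor
    return [
        [coarse_grid[r // factor][c // factor] / share
         for c in range(len(coarse_grid[0]) * factor)]
        for r in range(len(coarse_grid) * factor)
    ]


# Hypothetical 4 x 4 hazard grid (water depth in metres) and fine population grid.
depth = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [1.2, 1.0, 0.3, 0.0],
]
fine_population = [
    [10, 0, 0, 0],
    [0, 30, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 20],
]

coarse_population = coarsen(fine_population, 2)  # [[40, 0], [0, 20]]
print(exposed_population(depth, fine_population))                         # 0
print(exposed_population(depth, spread_uniformly(coarse_population, 2)))  # 5.0
```

In this toy case nobody lives in a flooded cell, yet the coarse population grid reports five exposed people because it smears the settlement across neighbouring wet cells.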
Fasting augments PCB impact on liver metabolism in anadromous Arctic Char

Science.gov (United States)

Vijayan, M.M.; Aluru, N.; Maule, A.G.; Jorgensen, E.H.

2006-01-01

Anadromous arctic char (Salvelinus alpinus) undertake short feeding migrations to seawater every summer and accumulate lipids, while the rest of the year is spent in fresh water where the accumulated lipid reserves are mobilized. We tested the hypothesis that winter fasting and the associated polychlorinated biphenyls' (PCBs) redistribution from lipid depots to critical tissues impair the liver metabolic capacity in these animals. Char were administered Aroclor 1254 (0, 1, 10, and 100 mg/kg body mass) orally and maintained for 4 months without feeding to mimic seasonal winter fasting, while fed groups (0 and 100 mg Aroclor 1254/kg) were maintained for comparison. A clear dose-related increase in PCB accumulation and cytochrome P4501A (CYP1A) protein content was observed in the livers of fasted fish. This PCB concentration and CYP1A response with the high dose of Aroclor were 1.5-fold and 3-fold greater in the fasted than in the fed fish, respectively. In fed fish, PCB exposure lowered liver glycogen content, whereas none of the other metabolic indicators were significantly affected. In fasted fish, PCB exposure depressed liver glycogen content and activities of glucose-6-phosphate dehydrogenase, alanine aminotransferase, lactate dehydrogenase, and phosphoenolpyruvate carboxykinase and elevated 3-hydroxyacyl-CoA dehydrogenase activity and glucocorticoid receptor protein expression. There were no significant impacts of PCB on heat shock protein 70 (hsp70) and hsp90 contents in either fed or fasted fish. Collectively, our study demonstrates that winter emaciation associated with the anadromous lifestyle predisposes arctic char to PCB impact on hepatic metabolism including disruption of the adaptive metabolic responses to extended fasting. © 2006 Oxford University Press.
Understanding how lake populations of arctic char are structured and function with special consideration of the potential effects of climate change: A multi-faceted approach.

Science.gov (United States)

Budy, Phaedra; Luecke, Chris

2014-01-01

Size dimorphism in fish populations, both its causes and consequences, has been an area of considerable focus; however, uncertainty remains whether size dimorphism is dynamic or stabilizing and about the role of exogenous factors. Here, we explored patterns among empirical vital rates, population structure, abundance and trend, and predicted the effects of climate change on populations of arctic char (Salvelinus alpinus) in two lakes. Both populations cycle dramatically between dominance by small (≤300 mm) and large (>300 mm) char. Apparent survival (Φ) and specific growth rates (SGR) were relatively high (40–96 %; SGR range 0.03–1.5 %) and comparable to those of conspecifics at lower latitudes. Climate change scenarios mimicked observed patterns of warming and resulted in temperatures closer to optimal for char growth (15.15 °C) and a longer growing season. An increase in consumption rates (28–34 %) under climate change scenarios led to much greater growth rates (23–34 %). Higher growth rates predicted under climate change resulted in an even greater predicted amplitude of cycles in population structure as well as an increase in reproductive output (Ro) and decrease in generation time (Go). Collectively, these results indicate arctic char populations (not just individuals) are extremely sensitive to small changes in the number of ice-free days. We hypothesize years with a longer growing season, predicted to occur more often under climate change, produce elevated growth rates of small char and act in a manner similar to a “resource pulse,” allowing a sub-set of small char to “break through,” thus setting the cycle in population structure.
Persistent organic pollutants in biota samples collected during the Ymer-80 expedition to the Arctic

Directory of Open Access Journals (Sweden)

Henrik Kylin

2015-10-01

Full Text Available During the 1980 expedition to the Arctic with the icebreaker Ymer, a number of vertebrate species were sampled for determination of persistent organic pollutants. Samples of Arctic char (Salvelinus alpinus, n=34), glaucous gull (Larus hyperboreus, n=8), common eider (Somateria mollissima, n=10), Brünnich's guillemot (Uria lomvia, n=9), ringed seal (Pusa hispida, n=2) and polar bear (Ursus maritimus, n=2) were collected. With the exception of Brünnich's guillemot, there was a marked contamination difference of birds from western as compared to eastern/northern Svalbard. Samples in the west contained a larger number of polychlorinated biphenyl (PCB) congeners and also polychlorinated terphenyls, indicating local sources. Brünnich's guillemots had similar pollutant concentrations in the west and east/north; possibly younger birds were sampled in the west. In Arctic char, pollutant profiles from lake Linnévatn (n=5), the lake closest to the main economic activities in Svalbard, were similar to profiles in Arctic char from the Shetland Islands (n=5), but differed from lakes to the north and east in Svalbard (n=30). Arctic char samples had higher concentrations of hexachlorocyclohexanes (HCHs) than the marine species of birds and mammals, possibly due to accumulation via snowmelt. Compared to the Baltic Sea, comparable species collected in Svalbard had lower concentrations of PCB and dichlorodiphenyltrichloroethane (DDT), but similar concentrations indicating long-range transport of hexachlorobenzene, HCHs and cyclodiene pesticides. In samples collected in Svalbard in 1971, the concentrations of PCB and DDT in Brünnich's guillemot (n=7), glaucous gull (n=2) and polar bear (n=2) were similar to the concentrations found in 1980.
Comparing the accuracy of food outlet datasets in an urban environment

Directory of Open Access Journals (Sweden)

Michelle S. Wong

2017-05-01

Full Text Available Studies that investigate the relationship between the retail food environment and health outcomes often use geospatial datasets. Prior studies have identified challenges of using the most common data sources. Retail food environment datasets created through academic-government partnership present an alternative, but their validity (retail existence, type, location) has not been assessed yet. In our study, we used ground-truth data to compare the validity of two datasets, a 2015 commercial dataset (InfoUSA) and data collected from 2012 to 2014 through the Maryland Food Systems Mapping Project (MFSMP), an academic-government partnership, on the retail food environment in two low-income, inner city neighbourhoods in Baltimore City. We compared sensitivity and positive predictive value (PPV) of the commercial and academic-government partnership data to ground-truth data for two broad categories of unhealthy food retailers: small food retailers and quick-service restaurants. Ground-truth data was collected in 2015 and analysed in 2016. Compared to the ground-truth data, MFSMP and InfoUSA generally had similar sensitivity that was greater than 85%. MFSMP had higher PPV compared to InfoUSA for both small food retailers (MFSMP: 56.3% vs InfoUSA: 40.7%) and quick-service restaurants (MFSMP: 58.6% vs InfoUSA: 36.4%). We conclude that data from academic-government partnerships like MFSMP might be an attractive alternative option and improvement to relying only on commercial data. Other research institutes or cities might consider efforts to create and maintain such an environmental dataset. Even if these datasets cannot be updated on an annual basis, they are likely more accurate than commercial data.
Comparing the accuracy of food outlet datasets in an urban environment.

Science.gov (United States)

Wong, Michelle S; Peyton, Jennifer M; Shields, Timothy M; Curriero, Frank C; Gudzune, Kimberly A

2017-05-11

Studies that investigate the relationship between the retail food environment and health outcomes often use geospatial datasets. Prior studies have identified challenges of using the most common data sources. Retail food environment datasets created through academic-government partnership present an alternative, but their validity (retail existence, type, location) has not been assessed yet. In our study, we used ground-truth data to compare the validity of two datasets, a 2015 commercial dataset (InfoUSA) and data collected from 2012 to 2014 through the Maryland Food Systems Mapping Project (MFSMP), an academic-government partnership, on the retail food environment in two low-income, inner city neighbourhoods in Baltimore City. We compared sensitivity and positive predictive value (PPV) of the commercial and academic-government partnership data to ground-truth data for two broad categories of unhealthy food retailers: small food retailers and quick-service restaurants. Ground-truth data was collected in 2015 and analysed in 2016. Compared to the ground-truth data, MFSMP and InfoUSA generally had similar sensitivity that was greater than 85%. MFSMP had higher PPV compared to InfoUSA for both small food retailers (MFSMP: 56.3% vs InfoUSA: 40.7%) and quick-service restaurants (MFSMP: 58.6% vs InfoUSA: 36.4%). We conclude that data from academic-government partnerships like MFSMP might be an attractive alternative option and improvement to relying only on commercial data. Other research institutes or cities might consider efforts to create and maintain such an environmental dataset. Even if these datasets cannot be updated on an annual basis, they are likely more accurate than commercial data.
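The two validity measures used in this comparison follow directly from counts of matches between a listing and the ground truth: sensitivity is TP / (TP + FN) and positive predictive value is TP / (TP + FP). The short Python sketch below works through one made-up example; the outlet identifiers and the function names are hypothetical and are not taken from the study.

```python
def sensitivity(true_positives, false_negatives):
    """Share of real outlets that the dataset lists."""
    return true_positives / (true_positives + false_negatives)


def positive_predictive_value(true_positives, false_positives):
    """Share of listed outlets that really exist on the ground."""
    return true_positives / (true_positives + false_positives)


def validate_listing(listed_outlets, ground_truth_outlets):
    """Compare listed outlet IDs with outlets confirmed by ground-truthing."""
    listed, truth = set(listed_outlets), set(ground_truth_outlets)
    tp = len(listed & truth)   # listed and present on the ground
    fp = len(listed - truth)   # listed but not found on the ground
    fn = len(truth - listed)   # present on the ground but not listed
    return {
        "sensitivity": sensitivity(tp, fn),
        "ppv": positive_predictive_value(tp, fp),
    }


# Hypothetical example: four outlets exist; the dataset lists five, three correctly.
result = validate_listing(
    listed_outlets={"A", "B", "C", "X", "Y"},
    ground_truth_outlets={"A", "B", "C", "D"},
)
print(result)  # sensitivity 0.75, ppv 0.6
```

A dataset can therefore score well on sensitivity while still having a low PPV if it lists many outlets that have closed or never existed, which is the pattern reported for both datasets here.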

Global-scale evaluation of 22 precipitation datasets using gauge observations and hydrological modeling

Directory of Open Access Journals (Sweden)

H. E. Beck

2017-12-01

Full Text Available We undertook a comprehensive evaluation of 22 gridded (quasi-)global (sub-)daily precipitation (P) datasets for the period 2000–2016. Thirteen non-gauge-corrected P datasets were evaluated using daily P gauge observations from 76 086 gauges worldwide. Another nine gauge-corrected datasets were evaluated using hydrological modeling, by calibrating the HBV conceptual model against streamflow records for each of 9053 small to medium-sized (< 50 000 km²) catchments worldwide, and comparing the resulting performance. Marked differences in spatio-temporal patterns and accuracy were found among the datasets. Among the uncorrected P datasets, the satellite- and reanalysis-based MSWEP-ng V1.2 and V2.0 datasets generally showed the best temporal correlations with the gauge observations, followed by the reanalyses (ERA-Interim, JRA-55, and NCEP-CFSR) and the satellite- and reanalysis-based CHIRP V2.0 dataset, the estimates based primarily on passive microwave remote sensing of rainfall (CMORPH V1.0, GSMaP V5/6, and TMPA 3B42RT V7) or near-surface soil moisture (SM2RAIN-ASCAT), and finally, estimates based primarily on thermal infrared imagery (GridSat V1.0, PERSIANN, and PERSIANN-CCS). Two of the three reanalyses (ERA-Interim and JRA-55) unexpectedly obtained lower trend errors than the satellite datasets. Among the corrected P datasets, the ones directly incorporating daily gauge data (CPC Unified, and MSWEP V1.2 and V2.0) generally provided the best calibration scores, although the good performance of the fully gauge-based CPC Unified is unlikely to translate to sparsely or ungauged regions. Next best results were obtained with P estimates directly incorporating temporally coarser gauge data (CHIRPS V2.0, GPCP-1DD V1.2, TMPA 3B42 V7, and WFDEI-CRU), which in turn outperformed the one indirectly incorporating gauge data through another multi-source dataset (PERSIANN-CDR V1R1). Our results highlight large differences in estimation accuracy
Creation of the Naturalistic Engagement in Secondary Tasks (NEST) distracted driving dataset.

Science.gov (United States)

Owens, Justin M; Angell, Linda; Hankey, Jonathan M; Foley, James; Ebe, Kazutoshi

2015-09-01

Distracted driving has become a topic of critical importance to driving safety research over the past several decades. Naturalistic driving data offer a unique opportunity to study how drivers engage with secondary tasks in real-world driving; however, the complexities involved with identifying and coding relevant epochs of naturalistic data have limited its accessibility to the general research community. This project was developed to help address this problem by creating an accessible dataset of driver behavior and situational factors observed during distraction-related safety-critical events and baseline driving epochs, using the Strategic Highway Research Program 2 (SHRP2) naturalistic dataset. The new NEST (Naturalistic Engagement in Secondary Tasks) dataset was created using crashes and near-crashes from the SHRP2 dataset that were identified as including secondary task engagement as a potential contributing factor. Data coding included frame-by-frame video analysis of secondary task and hands-on-wheel activity, as well as summary event information. In addition, information about each secondary task engagement within the trip prior to the crash/near-crash was coded at a higher level. Data were also coded for four baseline epochs and trips per safety-critical event. 1,180 events and baseline epochs were coded, and a dataset was constructed. The project team is currently working to determine the most useful way to allow broad public access to the dataset. We anticipate that the NEST dataset will be extraordinarily useful in allowing qualified researchers access to timely, real-world data concerning how drivers interact with secondary tasks during safety-critical events and baseline driving. The coded dataset developed for this project will allow future researchers to have access to detailed data on driver secondary task engagement in the real world. It will be useful for standalone research, as well as for integration with additional SHRP2 data to enable the
A multimodal dataset for authoring and editing multimedia content: The MAMEM project

Directory of Open Access Journals (Sweden)

Spiros Nikolopoulos

2017-12-01

Full Text Available We present a dataset that combines multimodal biosignals and eye tracking information gathered under a human-computer interaction framework. The dataset was developed in the vein of the MAMEM project that aims to endow people with motor disabilities with the ability to edit and author multimedia content through mental commands and gaze activity. The dataset includes EEG, eye-tracking, and physiological (GSR and heart rate) signals collected from 34 individuals (18 able-bodied and 16 motor-impaired). Data were collected during interaction with a specifically designed interface for web browsing and multimedia content manipulation and during imaginary movement tasks. The presented dataset will contribute towards the development and evaluation of modern human-computer interaction systems that would foster the integration of people with severe motor impairments back into society.
Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Metadata, Usage Metrics, and User Feedback to Improve Data Discovery and Access

Data.gov (United States)

National Aeronautics and Space Administration — We propose to mine and utilize the combination of Earth Science dataset, metadata with usage metrics and user feedback to objectively extract relevance for improved...
An integrated dataset for in silico drug discovery

Directory of Open Access Journals (Sweden)

Cockell Simon J

2010-12-01

Full Text Available Drug development is expensive and prone to failure. It is potentially much less risky and expensive to reuse a drug developed for one condition for treating a second disease, than it is to develop an entirely new compound. Systematic approaches to drug repositioning are needed to increase throughput and find candidates more reliably. Here we address this need with an integrated systems biology dataset, developed using the Ondex data integration platform, for the in silico discovery of new drug repositioning candidates. We demonstrate that the information in this dataset allows known repositioning examples to be discovered. We also propose a means of automating the search for new treatment indications of existing compounds.
Probabilistic and machine learning-based retrieval approaches for biomedical dataset retrieval

Science.gov (United States)

Karisani, Payam; Qin, Zhaohui S; Agichtein, Eugene

2018-01-01

The bioCADDIE dataset retrieval challenge brought together different approaches to retrieval of biomedical datasets relevant to a user’s query, expressed as a text description of a needed dataset. We describe experiments in applying a data-driven, machine learning-based approach to biomedical dataset retrieval as part of this challenge. We report on a series of experiments carried out to evaluate the performance of both probabilistic and machine learning-driven techniques from information retrieval, as applied to this challenge. Our experiments with probabilistic information retrieval methods, such as query term weight optimization, automatic query expansion and simulated user relevance feedback, demonstrate that automatically boosting the weights of important keywords in a verbose query is more effective than other methods. We also show that although there is a rich space of potential representations and features available in this domain, machine learning-based re-ranking models are not able to improve on probabilistic information retrieval techniques with the currently available training data. The models and algorithms presented in this paper can serve as a viable implementation of a search engine to provide access to biomedical datasets. The retrieval performance is expected to be further improved by using additional training data that is created by expert annotation, or gathered through usage logs, clicks and other processes during natural operation of the system. Database URL: https://github.com/emory-irlab/biocaddie
An innovative privacy preserving technique for incremental datasets on cloud computing.

Science.gov (United States)

Aldeen, Yousra Abdul Alsahib S; Salleh, Mazleena; Aljeroudi, Yazan

2016-08-01

Cloud computing (CC) is a service-based delivery model offering vast computer processing power and data storage across connected communications channels. It has given overwhelming technological impetus to the internet (web) mediated IT industry, where users can easily share private data for further analysis and mining. Furthermore, user-friendly CC services make it possible to deploy sundry applications economically. Meanwhile, simple data sharing has invited various phishing attacks and malware-assisted security threats. Some privacy-sensitive applications, such as health services on the cloud, which are built with several economic and operational benefits, necessitate enhanced security. Thus, absolute cyberspace security and mitigation against phishing attacks became mandatory to protect overall data privacy. Typically, diverse application datasets are anonymized with better privacy for owners without providing all secrecy requirements to the newly added records. Some proposed techniques addressed this issue by re-anonymizing the datasets from scratch. The utmost privacy protection over incremental datasets on CC is far from being achieved. Certainly, the distribution of huge dataset volumes across multiple storage nodes limits privacy preservation. In this view, we propose a new anonymization technique to attain better privacy protection with high data utility over distributed and incremental datasets on CC. The proficiency of data privacy preservation and improved confidentiality requirements is demonstrated through performance evaluation. Copyright © 2016 Elsevier Inc. All rights reserved.
TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild

KAUST Repository

Müller, Matthias; Bibi, Adel Aamer; Giancola, Silvio; Al-Subaihi, Salman; Ghanem, Bernard

2018-01-01

Despite the numerous developments in object tracking, further development of current tracking algorithms is limited by small and mostly saturated datasets. As a matter of fact, data-hungry trackers based on deep-learning currently rely on object detection datasets due to the scarcity of dedicated large-scale tracking datasets. In this work, we present TrackingNet, the first large-scale dataset and benchmark for object tracking in the wild. We provide more than 30K videos with more than 14 million dense bounding box annotations. Our dataset covers a wide selection of object classes in broad and diverse context. By releasing such a large-scale dataset, we expect deep trackers to further improve and generalize. In addition, we introduce a new benchmark composed of 500 novel videos, modeled with a distribution similar to our training dataset. By sequestering the annotation of the test set and providing an online evaluation server, we provide a fair benchmark for future development of object trackers. Deep trackers fine-tuned on a fraction of our dataset improve their performance by up to 1.6% on OTB100 and up to 1.7% on TrackingNet Test. We provide an extensive benchmark on TrackingNet by evaluating more than 20 trackers. Our results suggest that object tracking in the wild is far from being solved.
Parton Distributions based on a Maximally Consistent Dataset

Science.gov (United States)

Rojo, Juan

2016-04-01

The choice of data that enters a global QCD analysis can have a substantial impact on the resulting parton distributions and their predictions for collider observables. One of the main reasons for this has to do with the possible presence of inconsistencies, either internal within an experiment or external between different experiments. In order to assess the robustness of the global fit, different definitions of a conservative PDF set, that is, a PDF set based on a maximally consistent dataset, have been introduced. However, these approaches are typically affected by theory biases in the selection of the dataset. In this contribution, after a brief overview of recent NNPDF developments, we propose a new, fully objective, definition of a conservative PDF set, based on the Bayesian reweighting approach. Using the new NNPDF3.0 framework, we produce various conservative sets, which turn out to be mutually in agreement within the respective PDF uncertainties, as well as with the global fit. We explore some of their implications for LHC phenomenology, finding also good consistency with the global fit result. These results provide a non-trivial validation test of the new NNPDF3.0 fitting methodology, and indicate that possible inconsistencies in the fitted dataset do not affect substantially the global fit PDFs.
Decoys Selection in Benchmarking Datasets: Overview and Perspectives

Science.gov (United States)

Réau, Manon; Langenfeld, Florent; Zagury, Jean-François; Lagarde, Nathalie; Montes, Matthieu

2018-01-01

Virtual Screening (VS) is designed to prospectively help identify potential hits, i.e., compounds capable of interacting with a given target and potentially modulating its activity, out of large compound collections. Among the variety of methodologies, it is crucial to select the protocol that is best adapted to the query/target system under study and that yields the most reliable output. To this aim, the performance of VS methods is commonly evaluated and compared by computing their ability to retrieve active compounds in benchmarking datasets. The benchmarking datasets contain a subset of known active compounds together with a subset of decoys, i.e., assumed non-active molecules. The composition of both the active and the decoy compound subsets is critical to limit the biases in the evaluation of the VS methods. In this review, we focus on the selection of decoy compounds, which has changed considerably over the years, from randomly selected compounds to highly customized or experimentally validated negative compounds. We first outline the evolution of decoys selection in benchmarking databases as well as current benchmarking databases that tend to minimize the introduction of biases, and secondly, we propose recommendations for the selection and the design of benchmarking datasets. PMID:29416509
Multiresolution persistent homology for excessively large biomolecular datasets

Energy Technology Data Exchange (ETDEWEB)

Xia, Kelin; Zhao, Zhixiong [Department of Mathematics, Michigan State University, East Lansing, Michigan 48824 (United States); Wei, Guo-Wei, E-mail: wei@math.msu.edu [Department of Mathematics, Michigan State University, East Lansing, Michigan 48824 (United States); Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824 (United States); Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824 (United States)

2015-10-07

Although persistent homology has emerged as a promising tool for the topological simplification of complex data, it is computationally intractable for large datasets. We introduce multiresolution persistent homology to handle excessively large datasets. We match the resolution with the scale of interest so as to represent large scale datasets with appropriate resolution. We utilize flexibility-rigidity index to access the topological connectivity of the data set and define a rigidity density for the filtration analysis. By appropriately tuning the resolution of the rigidity density, we are able to focus the topological lens on the scale of interest. The proposed multiresolution topological analysis is validated by a hexagonal fractal image which has three distinct scales. We further demonstrate the proposed method for extracting topological fingerprints from DNA molecules. In particular, the topological persistence of a virus capsid with 273 780 atoms is successfully analyzed which would otherwise be inaccessible to the normal point cloud method and unreliable by using coarse-grained multiscale persistent homology. The proposed method has also been successfully applied to the protein domain classification, which is the first time that persistent homology is used for practical protein domain analysis, to our knowledge. The proposed multiresolution topological method has potential applications in arbitrary data sets, such as social networks, biological networks, and graphs.
Cross-Cultural Concept Mapping of Standardized Datasets

DEFF Research Database (Denmark)

Kano Glückstad, Fumiko

2012-01-01

This work compares four feature-based similarity measures derived from cognitive sciences. The purpose of the comparative analysis is to verify the potentially most effective model that can be applied for mapping independent ontologies in a culturally influenced domain [1]. Here, datasets based...
dataTEL - Datasets for Technology Enhanced Learning

NARCIS (Netherlands)

Drachsler, Hendrik; Verbert, Katrien; Sicilia, Miguel-Angel; Wolpers, Martin; Manouselis, Nikos; Vuorikari, Riina; Lindstaedt, Stefanie; Fischer, Frank

2011-01-01

Drachsler, H., Verbert, K., Sicilia, M. A., Wolpers, M., Manouselis, N., Vuorikari, R., Lindstaedt, S., & Fischer, F. (2011). dataTEL - Datasets for Technology Enhanced Learning. STELLAR Alpine Rendez-Vous White Paper. Alpine Rendez-Vous 2011 White paper collection, Nr. 13., France (2011)
Tissue-Based MRI Intensity Standardization: Application to Multicentric Datasets

Directory of Open Access Journals (Sweden)

Nicolas Robitaille

2012-01-01

Full Text Available Intensity standardization in MRI aims at correcting scanner-dependent intensity variations. Existing simple and robust techniques aim at matching the input image histogram onto a standard, while we think that standardization should aim at matching spatially corresponding tissue intensities. In this study, we present a novel automatic technique, called STI for STandardization of Intensities, which not only shares the simplicity and robustness of histogram-matching techniques, but also incorporates tissue spatial intensity information. STI uses joint intensity histograms to determine intensity correspondence in each tissue between the input and standard images. We compared STI to an existing histogram-matching technique on two multicentric datasets, Pilot E-ADNI and ADNI, by measuring the intensity error with respect to the standard image after performing nonlinear registration. The Pilot E-ADNI dataset consisted of 3 subjects, each scanned at 7 different sites. The ADNI dataset consisted of 795 subjects scanned at more than 50 different sites. STI was superior to the histogram-matching technique, showing significantly better intensity matching for the brain white matter with respect to the standard image.
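As a rough illustration of the baseline that the abstract above compares against, the following sketch implements plain histogram matching with NumPy. It is not STI itself: it ignores tissue classes and spatial correspondence entirely. The function name `match_histogram` and the CDF-interpolation approach are assumptions chosen for illustration, not taken from the paper.

```python
import numpy as np

def match_histogram(source, reference):
    """Remap source intensities so their empirical CDF follows the reference CDF.

    Hypothetical helper for illustration only; plain histogram matching,
    not the tissue-based STI technique.
    """
    src = np.asarray(source, dtype=float)
    ref = np.asarray(reference, dtype=float)

    # Unique intensities, the index of each source voxel into them, and counts.
    src_values, src_idx, src_counts = np.unique(
        src.ravel(), return_inverse=True, return_counts=True
    )
    ref_values, ref_counts = np.unique(ref.ravel(), return_counts=True)

    # Empirical cumulative distribution functions of both images.
    src_cdf = np.cumsum(src_counts) / src.size
    ref_cdf = np.cumsum(ref_counts) / ref.size

    # For each source quantile, look up the reference intensity at that quantile.
    mapped_values = np.interp(src_cdf, ref_cdf, ref_values)
    return mapped_values[src_idx].reshape(src.shape)

# Usage: a source image that is a rescaled copy of the reference.
reference = np.arange(12.0).reshape(3, 4)
source = 2.0 * reference + 10.0
matched = match_histogram(source, reference)
```

Because the mapping depends only on global intensity ranks, two tissues with overlapping intensities cannot be separated this way, which is the limitation the abstract says STI addresses with joint intensity histograms.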
Exploring massive, genome scale datasets with the genometricorr package

KAUST Repository

Favorov, Alexander; Mularoni, Loris; Cope, Leslie M.; Medvedeva, Yulia; Mironov, Andrey A.; Makeev, Vsevolod J.; Wheelan, Sarah J.

2012-01-01

We have created a statistically grounded tool for determining the correlation of genomewide data with other datasets or known biological features, intended to guide biological exploration of high-dimensional datasets, rather than providing immediate answers. The software enables several biologically motivated approaches to these data and here we describe the rationale and implementation for each approach. Our models and statistics are implemented in an R package that efficiently calculates the spatial correlation between two sets of genomic intervals (data and/or annotated features), for use as a metric of functional interaction. The software handles any type of pointwise or interval data and instead of running analyses with predefined metrics, it computes the significance and direction of several types of spatial association; this is intended to suggest potentially relevant relationships between the datasets. Availability and implementation: The package, GenometriCorr, can be freely downloaded at http://genometricorr.sourceforge.net/. Installation guidelines and examples are available from the sourceforge repository. The package is pending submission to Bioconductor. © 2012 Favorov et al.
Principal Component Analysis of Process Datasets with Missing Values

Directory of Open Access Journals (Sweden)

Kristen A. Severson

2017-07-01

Full Text Available Datasets with missing values arising from causes such as sensor failure, inconsistent sampling rates, and merging data from different systems are common in the process industry. Methods for handling missing data typically operate during data pre-processing, but can also occur during model building. This article considers missing data within the context of principal component analysis (PCA), which is a method originally developed for complete data that has widespread industrial application in multivariate statistical process control. Due to the prevalence of missing data and the success of PCA for handling complete data, several PCA algorithms that can act on incomplete data have been proposed. Here, algorithms for applying PCA to datasets with missing values are reviewed. A case study is presented to demonstrate the performance of the algorithms and suggestions are made with respect to choosing which algorithm is most appropriate for particular settings. An alternating algorithm based on the singular value decomposition achieved the best results in the majority of test cases involving process datasets.
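The abstract above does not spell out the alternating algorithm, so the following is only a minimal sketch of one common variant of the idea, assuming missing entries are marked as NaN: fill the gaps, fit a rank-k model by SVD, refill the gaps from the model, and repeat. The function name `pca_impute_svd` and its parameters are hypothetical and not taken from the article.

```python
import numpy as np

def pca_impute_svd(X, n_components, n_iter=500, tol=1e-8):
    """Alternate between a rank-k SVD fit and refilling the missing entries.

    Illustrative sketch only; assumes every column has at least one
    observed value and that missing entries are encoded as NaN.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Initial guess: replace each missing entry with its column mean.
    filled = np.where(missing, np.nanmean(X, axis=0), X)

    for _ in range(n_iter):
        # Fit: centre the current completed matrix and truncate its SVD.
        mean = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mean, full_matrices=False)
        k = n_components
        reconstruction = (U[:, :k] * s[:k]) @ Vt[:k] + mean

        # Refill: keep observed values, overwrite only the missing ones.
        updated = np.where(missing, reconstruction, X)
        change = np.linalg.norm(updated - filled)
        filled = updated
        if change < tol:
            break

    return filled, Vt[:n_components]

# Usage: a rank-one matrix plus an offset, with four entries removed.
a = np.linspace(-1.0, 1.0, 30)
b = np.array([1.0, -2.0, 0.5, 3.0, 1.5, -1.0])
X_true = np.outer(a, b) + 3.0
X_obs = X_true.copy()
for i, j in [(2, 1), (10, 3), (20, 0), (25, 4)]:
    X_obs[i, j] = np.nan
X_filled, loadings = pca_impute_svd(X_obs, n_components=1)
```

Observed values are never modified; only the missing positions are updated from the low-rank reconstruction, so the number of components controls how much structure is borrowed from the rest of the data.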
Testing the Neutral Theory of Biodiversity with Human Microbiome Datasets

OpenAIRE

Li, Lianwei; Ma, Zhanshan (Sam)

2016-01-01

The human microbiome project (HMP) has made it possible to test important ecological theories for arguably the most important ecosystem to human health: the human microbiome. The limited number of existing studies has reported conflicting evidence in the case of the neutral theory; the present study aims to comprehensively test the neutral theory with extensive HMP datasets covering all five major body sites inhabited by the human microbiome. Utilizing 7437 datasets of bacterial community samples...
Self-Reported Juvenile Firesetting: Results from Two National Survey Datasets

OpenAIRE

Howell Bowling, Carrie; Merrick, Joav; Omar, Hatim A.

2013-01-01

The main purpose of this study was to address gaps in existing research by examining the relationship between academic performance and attention problems with juvenile firesetting. Two datasets from the Achenbach System for Empirically Based Assessment (ASEBA) were used. The Factor Analysis Dataset (N = 975) was utilized and results indicated that adolescents who report lower academic performance are more likely to set fires. Additionally, adolescents who report a poor attitude toward school ...

A high quality finger vascular pattern dataset collected using a custom designed capturing device

NARCIS (Netherlands)

Ton, B.T.; Veldhuis, Raymond N.J.

2013-01-01

The number of finger vascular pattern datasets available to the research community is small; therefore, a new finger vascular pattern dataset containing 1440 images is presented. This dataset is unique of its kind, as the images are of high resolution and have a known pixel density. Furthermore this
RetroTransformDB: A Dataset of Generic Transforms for Retrosynthetic Analysis

Directory of Open Access Journals (Sweden)

Svetlana Avramova

2018-04-01

Full Text Available Presently, software tools for retrosynthetic analysis are widely used by organic, medicinal, and computational chemists. Rule-based systems extensively use collections of retro-reactions (transforms). While there are many public datasets with reactions in synthetic direction (usually non-generic reactions), there are no publicly-available databases with generic reactions in computer-readable format which can be used for the purposes of retrosynthetic analysis. Here we present RetroTransformDB, a dataset of transforms, compiled and coded in SMIRKS line notation by us. The collection is comprised of more than 100 records, with each one including the reaction name, SMIRKS linear notation, the functional group to be obtained, and the transform type classification. All SMIRKS transforms were tested syntactically, semantically, and from a chemical point of view in different software platforms. The overall dataset design and the retrosynthetic fitness were analyzed and curated by organic chemistry experts. The RetroTransformDB dataset may be used by open-source and commercial software packages, as well as chemoinformatics tools.
A multi-environment dataset for activity of daily living recognition in video streams.

Science.gov (United States)

Borreo, Alessandro; Onofri, Leonardo; Soda, Paolo

2015-08-01

Public datasets have played a key role in the increasing level of interest that vision-based human action recognition has attracted in recent years. While the production of such datasets has been influenced by the variability introduced by various actors performing the actions, the different modalities of interaction with the environment introduced by the variation of the scenes around the actors have scarcely been taken into account. As a consequence, public datasets do not provide a proper test-bed for recognition algorithms that aim at achieving high accuracy, irrespective of the environment where actions are performed. This is all the more so when systems are designed to recognize activities of daily living (ADL), which are characterized by a high level of human-environment interaction. For that reason, we present in this manuscript the MEA dataset, a new multi-environment ADL dataset, which permitted us to show how a change of scenario can affect the performance of state-of-the-art approaches for action recognition.
Visual Comparison of Multiple Gene Expression Datasets in a Genomic Context

Directory of Open Access Journals (Sweden)

Borowski Krzysztof

2008-06-01

Full Text Available The need for novel methods of visualizing microarray data is growing. New perspectives are beneficial to finding patterns in expression data. The Bluejay genome browser provides an integrative way of visualizing gene expression datasets in a genomic context. We have now developed the functionality to display multiple microarray datasets simultaneously in Bluejay, in order to provide researchers with a comprehensive view of their datasets linked to a graphical representation of gene function. This will enable biologists to obtain valuable insights on expression patterns, by allowing them to analyze the expression values in relation to the gene locations as well as to compare expression profiles of related genomes or of different experiments for the same genome.
AFSC/REFM: Seabird Necropsy dataset of North Pacific

Data.gov (United States)

National Oceanic and Atmospheric Administration, Department of Commerce — The seabird necropsy dataset contains information on seabird specimens that were collected under salvage and scientific collection permits primarily by...
Publishing datasets with eSciDoc and panMetaDocs

Science.gov (United States)

Ulbricht, D.; Klump, J.; Bertelmann, R.

2012-04-01

Currently several research institutions worldwide undertake considerable efforts to have their scientific datasets published and to syndicate them to data portals as extensively described objects identified by a persistent identifier. This is done to foster the reuse of data, to make scientific work more transparent, and to create a citable entity that can be referenced unambiguously in written publications. GFZ Potsdam established a publishing workflow for file-based research datasets. Key software components are an eSciDoc infrastructure [1] and multiple instances of the data curation tool panMetaDocs [2]. The eSciDoc repository holds data objects and their associated metadata in container objects, called eSciDoc items. A key metadata element in this context is the publication status of the referenced dataset. PanMetaDocs, which is based on panMetaWorks [3], is a PHP-based web application that allows data to be described with any XML-based metadata schema. The metadata fields can be filled with static or dynamic content to reduce the number of fields that require manual entries to a minimum and make use of contextual information in a project setting. Access rights can be applied to set the visibility of datasets to other project members, and the application supports collaboration on datasets, notifications about them (RSS), and interaction with the internal messaging system that was inherited from panMetaWorks. When a dataset is to be published, panMetaDocs allows the publication status of the eSciDoc item to be changed from "private" to "submitted" and the dataset to be prepared for verification by an external reviewer. After quality checks, the item publication status can be changed to "published". This makes the data and metadata available through the internet worldwide. PanMetaDocs is developed as an eSciDoc application. It is an easy-to-use graphical user interface to eSciDoc items, their data and metadata. It is also an application supporting a DOI publication agent during the process of
Random Coefficient Logit Model for Large Datasets

NARCIS (Netherlands)

C. Hernández-Mireles (Carlos); D. Fok (Dennis)

2010-01-01

We present an approach for analyzing market shares and products price elasticities based on large datasets containing aggregate sales data for many products, several markets and for relatively long time periods. We consider the recently proposed Bayesian approach of Jiang et al [Jiang,
NOAA Global Surface Temperature Dataset, Version 4.0

Data.gov (United States)

National Oceanic and Atmospheric Administration, Department of Commerce — The NOAA Global Surface Temperature Dataset (NOAAGlobalTemp) is derived from two independent analyses: the Extended Reconstructed Sea Surface Temperature (ERSST)...
A multimodal MRI dataset of professional chess players.

Science.gov (United States)

Li, Kaiming; Jiang, Jing; Qiu, Lihua; Yang, Xun; Huang, Xiaoqi; Lui, Su; Gong, Qiyong

2015-01-01

Chess is a good model to study high-level human brain functions such as spatial cognition, memory, planning, learning and problem solving. Recent studies have demonstrated that non-invasive MRI techniques are valuable for researchers to investigate the underlying neural mechanism of playing chess. For professional chess players (e.g., chess grand masters and masters, or GM/Ms), the structural and functional alterations due to long-term professional practice, and how these alterations relate to behavior, remain largely unknown. Here, we report a multimodal MRI dataset from 29 professional Chinese chess players (most of whom are GM/Ms) and 29 age-matched novices. We hope that this dataset will provide researchers with new materials to further explore high-level human brain functions.
REM-3D Reference Datasets: Reconciling large and diverse compilations of travel-time observations

Science.gov (United States)

Moulik, P.; Lekic, V.; Romanowicz, B. A.

2017-12-01

A three-dimensional Reference Earth model (REM-3D) should ideally represent the consensus view of long-wavelength heterogeneity in the Earth's mantle through the joint modeling of large and diverse seismological datasets. This requires reconciliation of datasets obtained using various methodologies and identification of consistent features. The goal of REM-3D datasets is to provide a quality-controlled and comprehensive set of seismic observations that would not only enable construction of REM-3D, but also allow identification of outliers and assist in more detailed studies of heterogeneity. The community response to data solicitation has been enthusiastic with several groups across the world contributing recent measurements of normal modes, (fundamental mode and overtone) surface waves, and body waves. We present results from ongoing work with body and surface wave datasets analyzed in consultation with a Reference Dataset Working Group. We have formulated procedures for reconciling travel-time datasets that include: (1) quality control for salvaging missing metadata; (2) identification of and reasons for discrepant measurements; (3) homogenization of coverage through the construction of summary rays; and (4) inversions of structure at various wavelengths to evaluate inter-dataset consistency. In consultation with the Reference Dataset Working Group, we retrieved the station and earthquake metadata in several legacy compilations and codified several guidelines that would facilitate easy storage and reproducibility. We find strong agreement between the dispersion measurements of fundamental-mode Rayleigh waves, particularly when made using supervised techniques. The agreement deteriorates substantially in surface-wave overtones, for which discrepancies vary with frequency and overtone number. A half-cycle band of discrepancies is attributed to reversed instrument polarities at a limited number of stations, which are not reflected in the instrument response history
An integrated pan-tropical biomass map using multiple reference datasets

OpenAIRE

Avitabile, V.; Herold, M.; Heuvelink, G. B. M.; Lewis, S. L.; Phillips, O. L.; Asner, G. P.; Armston, J.; Ashton, P. S.; Banin, L.; Bayol, N.; Berry, N. J.; Boeckx, P.; de Jong, B. H. J.; DeVries, B.; Girardin, C. A. J.

2016-01-01

We combined two existing datasets of vegetation aboveground biomass (AGB) (Proceedings of the National Academy of Sciences of the United States of America, 108, 2011, 9899; Nature Climate Change, 2, 2012, 182) into a pan-tropical AGB map at 1-km resolution using an independent reference dataset of field observations and locally calibrated high-resolution biomass maps, harmonized and upscaled to 14 477 1-km AGB estimates. Our data fusion approach uses bias removal and weighted linear averaging...
USGS National Hydrography Dataset from The National Map

Data.gov (United States)

U.S. Geological Survey, Department of the Interior — USGS The National Map - National Hydrography Dataset (NHD) is a comprehensive set of digital spatial data that encodes information about naturally occurring and...
Newton SSANTA Dr Water using POU filters dataset

Data.gov (United States)

U.S. Environmental Protection Agency — This dataset contains information about all the features extracted from the raw data files, the formulas that were assigned to some of these features, and the...
Full-Scale Approximations of Spatio-Temporal Covariance Models for Large Datasets

KAUST Repository

Zhang, Bohai

2014-01-01

Various continuously-indexed spatio-temporal process models have been constructed to characterize spatio-temporal dependence structures, but the computational complexity for model fitting and predictions grows in a cubic order with the size of dataset and application of such models is not feasible for large datasets. This article extends the full-scale approximation (FSA) approach by Sang and Huang (2012) to the spatio-temporal context to reduce computational complexity. A reversible jump Markov chain Monte Carlo (RJMCMC) algorithm is proposed to select knots automatically from a discrete set of spatio-temporal points. Our approach is applicable to nonseparable and nonstationary spatio-temporal covariance models. We illustrate the effectiveness of our method through simulation experiments and application to an ozone measurement dataset.
USGS National Boundary Dataset (NBD) Downloadable Data Collection

Data.gov (United States)

U.S. Geological Survey, Department of the Interior — The USGS Governmental Unit Boundaries dataset from The National Map (TNM) represents major civil areas for the Nation, including States or Territories, counties (or...
Thesaurus Dataset of Educational Technology in Chinese

Science.gov (United States)

Wu, Linjing; Liu, Qingtang; Zhao, Gang; Huang, Huan; Huang, Tao

2015-01-01

The thesaurus dataset of educational technology is a knowledge description of educational technology in Chinese. The aims of this thesaurus were to collect the subject terms in the domain of educational technology, facilitate the standardization of terminology and promote the communication between Chinese researchers and scholars from various…
BASE MAP DATASET, LE FLORE COUNTY, OKLAHOMA, USA

Data.gov (United States)

Federal Emergency Management Agency, Department of Homeland Security — Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme, orthographic...
Exudate-based diabetic macular edema detection in fundus images using publicly available datasets

Energy Technology Data Exchange (ETDEWEB)

Giancardo, Luca [ORNL; Meriaudeau, Fabrice [ORNL; Karnowski, Thomas Paul [ORNL; Li, Yaquin [University of Tennessee, Knoxville (UTK); Garg, Seema [University of North Carolina; Tobin Jr, Kenneth William [ORNL; Chaum, Edward [University of Tennessee, Knoxville (UTK)

2011-01-01

Diabetic macular edema (DME) is a common vision-threatening complication of diabetic retinopathy. In a large-scale screening environment DME can be assessed by detecting exudates (a type of bright lesion) in fundus images. In this work, we introduce a new methodology for diagnosis of DME using a novel set of features based on colour, wavelet decomposition and automatic lesion segmentation. These features are employed to train a classifier able to automatically diagnose DME through the presence of exudation. We present a new publicly available dataset with ground-truth data containing 169 patients from various ethnic groups and levels of DME. This and two other publicly available datasets are employed to evaluate our algorithm. We are able to achieve diagnosis performance comparable to retina experts on MESSIDOR (an independently labelled dataset with 1200 images) with cross-dataset testing (e.g., the classifier was trained on an independent dataset and tested on MESSIDOR). Our algorithm obtained an AUC between 0.88 and 0.94 depending on the dataset/features used. Additionally, it does not need ground truth at lesion level to reject false positives and is computationally efficient, as it generates a diagnosis in an average of 4.4 s (9.3 s, considering the optic nerve localization) per image on a 2.6 GHz platform with an unoptimized Matlab implementation.
A conceptual prototype for the next-generation national elevation dataset

Science.gov (United States)

Stoker, Jason M.; Heidemann, Hans Karl; Evans, Gayla A.; Greenlee, Susan K.

2013-01-01

In 2012 the U.S. Geological Survey's (USGS) National Geospatial Program (NGP) funded a study to develop a conceptual prototype for a new National Elevation Dataset (NED) design with expanded capabilities to generate and deliver a suite of bare earth and above ground feature information over the United States. This report details the research on identifying operational requirements based on prior research, evaluation of what is needed for the USGS to meet these requirements, and development of a possible conceptual framework that could potentially deliver the kinds of information that are needed to support NGP's partners and constituents. This report provides an initial proof-of-concept demonstration using an existing dataset, and recommendations for the future, to inform NGP's ongoing and future elevation program planning and management decisions. The demonstration shows that this type of functional process can robustly create derivatives from lidar point cloud data; however, more research needs to be done to see how well it extends to multiple datasets.

Exploring massive, genome scale datasets with the GenometriCorr package.

Directory of Open Access Journals (Sweden)

Alexander Favorov

2012-05-01

We have created a statistically grounded tool for determining the correlation of genomewide data with other datasets or known biological features, intended to guide biological exploration of high-dimensional datasets, rather than providing immediate answers. The software enables several biologically motivated approaches to these data and here we describe the rationale and implementation for each approach. Our models and statistics are implemented in an R package that efficiently calculates the spatial correlation between two sets of genomic intervals (data and/or annotated features), for use as a metric of functional interaction. The software handles any type of pointwise or interval data and instead of running analyses with predefined metrics, it computes the significance and direction of several types of spatial association; this is intended to suggest potentially relevant relationships between the datasets. The package, GenometriCorr, can be freely downloaded at http://genometricorr.sourceforge.net/. Installation guidelines and examples are available from the sourceforge repository. The package is pending submission to Bioconductor.
Kernel-based discriminant feature extraction using a representative dataset

Science.gov (United States)

Li, Honglin; Sancho Gomez, Jose-Luis; Ahalt, Stanley C.

2002-07-01

Discriminant Feature Extraction (DFE) is widely recognized as an important pre-processing step in classification applications. Most DFE algorithms are linear and thus can only explore the linear discriminant information among the different classes. Recently, there have been several promising attempts to develop nonlinear DFE algorithms, among which is Kernel-based Feature Extraction (KFE). The efficacy of KFE has been experimentally verified on both synthetic data and real problems. However, KFE has some known limitations. First, KFE does not work well for strongly overlapped data. Second, KFE employs all of the training set samples during the feature extraction phase, which can result in significant computation when applied to very large datasets. Finally, KFE can result in overfitting. In this paper, we propose a substantial improvement to KFE that overcomes the above limitations by using a representative dataset, which consists of critical points that are generated from data-editing techniques and centroid points that are determined by using the Frequency Sensitive Competitive Learning (FSCL) algorithm. Experiments show that this new KFE algorithm performs well on significantly overlapped datasets, and it also reduces computational complexity. Further, by controlling the number of centroids, the overfitting problem can be effectively alleviated.
Topographical effects of climate dataset and their impacts on the estimation of regional net primary productivity

Science.gov (United States)

Sun, L. Qing; Feng, Feng X.

2014-11-01

In this study, we first built and compared two different climate datasets for the Wuling mountainous area in 2010: one that considered topographical effects during the ANUSPLIN interpolation, referred to as the terrain-based climate dataset, and one that did not, called the ordinary climate dataset. Then, we quantified the topographical effects of climatic inputs on NPP estimation by inputting the two climate datasets to the same ecosystem model, the Boreal Ecosystem Productivity Simulator (BEPS), to evaluate the importance of considering relief when estimating NPP. Finally, we identified the primary contributing variables to the topographical effects through a series of experiments, given an overall accuracy of the model output for NPP. The results showed that: (1) The terrain-based climate dataset presented more reliable topographic information and agreed more closely with the station dataset than the ordinary climate dataset over a continuous time series of 365 days in terms of daily mean values. (2) On average, the ordinary climate dataset underestimated NPP by 12.5% compared with the terrain-based climate dataset over the whole study area. (3) The primary climate variables contributing to the topographical effects of climatic inputs for the Wuling mountainous area were temperatures, which suggests that it is necessary to correct temperature differences to estimate NPP accurately in such complex terrain.
USGS Watershed Boundary Dataset (WBD) Overlay Map Service from The National Map - National Geospatial Data Asset (NGDA) Watershed Boundary Dataset (WBD)

Data.gov (United States)

U.S. Geological Survey, Department of the Interior — The Watershed Boundary Dataset (WBD) from The National Map (TNM) defines the perimeter of drainage areas formed by the terrain and other landscape characteristics....
Assessment of NASA's Physiographic and Meteorological Datasets as Input to HSPF and SWAT Hydrological Models

Science.gov (United States)

Alacron, Vladimir J.; Nigro, Joseph D.; McAnally, William H.; OHara, Charles G.; Engman, Edwin Ted; Toll, David

2011-01-01

This paper documents the use of simulated Moderate Resolution Imaging Spectroradiometer land use/land cover (MODIS-LULC), NASA-LIS generated precipitation and evapo-transpiration (ET), and Shuttle Radar Topography Mission (SRTM) datasets (in conjunction with standard land use, topographical and meteorological datasets) as input to hydrological models routinely used by the watershed hydrology modeling community. The study focuses on coastal watersheds in the Mississippi Gulf Coast, although one of the test cases focuses on an inland watershed located in the northeastern part of the State of Mississippi, USA. The decision support tools (DSTs) into which the NASA datasets were assimilated were the Soil and Water Assessment Tool (SWAT) and the Hydrological Simulation Program FORTRAN (HSPF). These DSTs are endorsed by several US government agencies (EPA, FEMA, USGS) for water resources management strategies. These models use physiographic and meteorological data extensively. Precipitation gages and USGS gage stations in the region were used to calibrate several HSPF and SWAT model applications. Land use and topographical datasets were swapped to assess model output sensitivities. NASA-LIS meteorological data were introduced in the calibrated model applications for simulation of watershed hydrology for a time period in which no weather data were available (1997-2006). The performance of the NASA datasets in the context of hydrological modeling was assessed through comparison of measured and model-simulated hydrographs. Overall, NASA datasets were as useful as standard land use, topographical, and meteorological datasets. Moreover, NASA datasets were used for performing analyses that the standard datasets could not make possible, e.g., introduction of land use dynamics into hydrological simulations
A novel dataset for real-life evaluation of facial expression recognition methodologies

NARCIS (Netherlands)

Siddiqi, Muhammad Hameed; Ali, Maqbool; Idris, Muhammad; Banos Legran, Oresti; Lee, Sungyoung; Choo, Hyunseung

2016-01-01

One limitation seen among most of the previous methods is that they were evaluated under settings that are far from real-life scenarios. The reason is that the existing facial expression recognition (FER) datasets are mostly pose-based and assume a predefined setup. The expressions in these datasets
Creating a Regional MODIS Satellite-Driven Net Primary Production Dataset for European Forests

Directory of Open Access Journals (Sweden)

Mathias Neumann

2016-06-01

Net primary production (NPP) is an important ecological metric for studying forest ecosystems and their carbon sequestration, for assessing the potential supply of food or timber and quantifying the impacts of climate change on ecosystems. The global MODIS NPP dataset using the MOD17 algorithm provides valuable information for monitoring NPP at 1-km resolution. Since coarse-resolution global climate data are used, the global dataset may contain uncertainties for Europe. We used a 1-km daily gridded European climate data set with the MOD17 algorithm to create the regional NPP dataset MODIS EURO. For evaluation of this new dataset, we compare MODIS EURO with terrestrial driven NPP from analyzing and harmonizing forest inventory data (NFI) from 196,434 plots in 12 European countries as well as the global MODIS NPP dataset for the years 2000 to 2012. Comparing these three NPP datasets, we found that the global MODIS NPP dataset differs from NFI NPP by 26%, while MODIS EURO only differs by 7%. MODIS EURO also agrees with NFI NPP across scales (from continental, regional to country) and gradients (elevation, location, tree age, dominant species, etc.). The agreement is particularly good for elevation, dominant species or tree height. This suggests that using improved climate data allows the MOD17 algorithm to provide realistic NPP estimates for Europe. Local discrepancies between MODIS EURO and NFI NPP can be related to differences in stand density due to forest management and the national carbon estimation methods. With this study, we provide a consistent, temporally continuous and spatially explicit productivity dataset for the years 2000 to 2012 on a 1-km resolution, which can be used to assess climate change impacts on ecosystems or the potential biomass supply of the European forests for an increasing bio-based economy. MODIS EURO data are made freely available at ftp://palantir.boku.ac.at/Public/MODIS_EURO.
Boundary expansion algorithm of a decision tree induction for an imbalanced dataset

Directory of Open Access Journals (Sweden)

Kesinee Boonchuay

2017-10-01

A decision tree is a well-known classifier based on a recursive partitioning algorithm. This paper introduces the Boundary Expansion Algorithm (BEA) to improve decision tree induction on an imbalanced dataset. BEA utilizes all attributes to define non-splittable ranges. The computed means of all attributes for minority instances are used to find the nearest minority instance, which is then expanded along all attributes to cover a minority region. As a result, BEA can successfully cope with an imbalanced dataset compared with C4.5, Gini, asymmetric entropy, top-down tree, and Hellinger distance decision tree on 25 imbalanced datasets from the UCI Repository.
The StreamCat Dataset: Accumulated Attributes for NHDPlusV2 (Version 2.1) Catchments for the Conterminous United States: National Elevation Dataset

Data.gov (United States)

U.S. Environmental Protection Agency — This dataset represents the elevation values within individual local NHDPlusV2 catchments and upstream, contributing watersheds based on the National Elevation...
Socioeconomic Data and Applications Center (SEDAC) Treaty Status Dataset

Data.gov (United States)

National Aeronautics and Space Administration — The Socioeconomic Data and Application Center (SEDAC) Treaty Status Dataset contains comprehensive treaty information for multilateral environmental agreements,...
Karna Particle Size Dataset for Tables and Figures

Data.gov (United States)

U.S. Environmental Protection Agency — This dataset contains 1) table of bulk Pb-XAS LCF results, 2) table of bulk As-XAS LCF results, 3) figure data of particle size distribution, and 4) figure data for...
Quality Controlling CMIP datasets at GFDL

Science.gov (United States)

Horowitz, L. W.; Radhakrishnan, A.; Balaji, V.; Adcroft, A.; Krasting, J. P.; Nikonov, S.; Mason, E. E.; Schweitzer, R.; Nadeau, D.

2017-12-01

As GFDL makes the switch from model development to production in light of the Coupled Model Intercomparison Project (CMIP), GFDL's efforts are shifting to testing and, more importantly, to establishing guidelines and protocols for quality control and semi-automated data publishing. Every CMIP cycle introduces key challenges and the upcoming CMIP6 is no exception. The new CMIP experimental design comprises multiple MIPs facilitating research in different focus areas. This paradigm has implications not only for the groups that develop the models and conduct the runs, but also for the groups that monitor, analyze and quality control the datasets before data publishing, before their knowledge makes its way into reports like the IPCC (Intergovernmental Panel on Climate Change) Assessment Reports. In this talk, we discuss some of the paths taken at GFDL to quality control the CMIP-ready datasets, including Jupyter notebooks, PrePARE, and a LAMP (Linux, Apache, MySQL, PHP/Python/Perl) technology-driven tracker system that monitors the status of experiments qualitatively and quantitatively and provides additional metadata and analysis services along with some built-in controlled-vocabulary validations in the workflow. In addition, we discuss the integration of community-based model evaluation software (ESMValTool, PCMDI Metrics Package, and ILAMB) as part of our CMIP6 workflow.
Automatic registration method for multisensor datasets adopted for dimensional measurements on cutting tools

International Nuclear Information System (INIS)

Shaw, L; Mehari, F; Weckenmann, A; Ettl, S; Häusler, G

2013-01-01

Multisensor systems with optical 3D sensors are frequently employed to capture complete surface information by measuring workpieces from different views. During coarse and fine registration the resulting datasets are afterward transformed into one common coordinate system. Automatic fine registration methods are well established in dimensional metrology, whereas there is a deficit in automatic coarse registration methods. The advantage of a fully automatic registration procedure is twofold: it enables a fast and contact-free alignment and further a flexible application to datasets of any kind of optical 3D sensor. In this paper, an algorithm adapted for a robust automatic coarse registration is presented. The method was originally developed for the field of object reconstruction or localization. It is based on a segmentation of planes in the datasets to calculate the transformation parameters. The rotation is defined by the normals of three corresponding segmented planes of two overlapping datasets, while the translation is calculated via the intersection point of the segmented planes. First results have shown that the translation is strongly shape dependent: 3D data of objects with non-orthogonal planar flanks cannot be registered with the current method. In the novel supplement for the algorithm, the translation is additionally calculated via the distance between centroids of corresponding segmented planes, which results in more than one option for the transformation. A newly introduced measure considering the distance between the datasets after coarse registration evaluates the best possible transformation. Results of the robust automatic registration method are presented on the example of datasets taken from a cutting tool with a fringe-projection system and a focus-variation system. The successful application in dimensional metrology is proven with evaluations of shape parameters based on the registered datasets of a calibrated workpiece. (paper)
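The plane-based coarse registration described in this abstract can be illustrated with a minimal sketch. It assumes the three planes have already been segmented and matched between the two datasets, and that each plane is given as a unit normal n and offset d with n · x = d. The rotation is estimated here with an SVD-based least-squares fit over the three normal pairs (the Kabsch approach), which is a choice made for this illustration and not necessarily the authors' procedure; the function names are hypothetical.

```python
import numpy as np

def plane_intersection(normals, offsets):
    """Return the point x with normals[i] @ x == offsets[i] for three planes.

    Assumes the three normals are linearly independent (no parallel planes).
    """
    return np.linalg.solve(normals, offsets)

def rotation_from_normals(normals_a, normals_b):
    """Least-squares rotation R with R @ normals_a[i] ~= normals_b[i].

    Illustrative choice: Kabsch/SVD fit over the three normal pairs.
    """
    h = normals_a.T @ normals_b              # 3x3 cross-covariance of normals
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))   # guard against a reflection
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T

def coarse_register(normals_a, offsets_a, normals_b, offsets_b):
    """Rigid transform (R, t) mapping dataset A onto dataset B.

    Rotation comes from the plane normals; translation comes from the
    intersection points of the three planes in each dataset.
    """
    r = rotation_from_normals(normals_a, normals_b)
    p_a = plane_intersection(normals_a, offsets_a)
    p_b = plane_intersection(normals_b, offsets_b)
    return r, p_b - r @ p_a
```

A point x in dataset A is then mapped to `r @ x + t` in dataset B. The centroid-based translation and the distance measure for ranking candidate transformations mentioned in the abstract are not covered by this sketch.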
Analysis of Naïve Bayes Algorithm for Email Spam Filtering across Multiple Datasets

Science.gov (United States)

Fitriah Rusland, Nurul; Wahid, Norfaradilla; Kasim, Shahreen; Hafit, Hanayanti

2017-08-01

E-mail spam continues to be a problem on the Internet. Spam e-mail may contain many copies of the same message, commercial advertisements or other irrelevant posts such as pornographic content. In previous research, different filtering techniques have been used to detect these e-mails, such as Random Forest, Naïve Bayesian, Support Vector Machine (SVM) and Neural Network. In this research, we test the Naïve Bayes algorithm for e-mail spam filtering on two datasets, i.e., the Spam Data and SPAMBASE datasets [8], and evaluate its performance. Performance on the datasets is evaluated based on accuracy, recall, precision and F-measure. Our research uses the WEKA tool for the evaluation of the Naïve Bayes algorithm for e-mail spam filtering on both datasets. The results show that the type of e-mail and the number of instances in the dataset influence the performance of Naïve Bayes.
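The four evaluation measures named in this abstract follow directly from the counts of true and false positives and negatives. The sketch below is a generic illustration of those standard definitions, not the authors' WEKA setup; the function name and the label strings "spam" and "ham" are assumptions made for the example.

```python
def classification_metrics(y_true, y_pred, positive="spam"):
    """Accuracy, precision, recall and F-measure for a binary spam filter.

    `positive` is the label treated as the positive class (hypothetical
    default "spam"); every other label counts as negative.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}
```

Calling `classification_metrics(["spam", "ham"], ["spam", "spam"])` would report a recall of 1.0 but a precision of 0.5, which shows why the abstract reports several measures rather than accuracy alone.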
Outlier Removal in Model-Based Missing Value Imputation for Medical Datasets

Directory of Open Access Journals (Sweden)

Min-Wei Huang

2018-01-01

Many real-world medical datasets contain some proportion of missing (attribute) values. In general, missing value imputation can be performed to solve this problem, which is to provide estimations for the missing values by a reasoning process based on the (complete) observed data. However, if the observed data contain some noisy information or outliers, the estimations of the missing values may not be reliable or may even be quite different from the real values. The aim of this paper is to examine whether a combination of instance selection from the observed data and missing value imputation offers better performance than performing missing value imputation alone. In particular, three instance selection algorithms, DROP3, GA, and IB3, and three imputation algorithms, KNNI, MLP, and SVM, are used in order to find out the best combination. The experimental results show that performing instance selection can have a positive impact on missing value imputation over the numerical data type of medical datasets, and specific combinations of instance selection and imputation methods can improve the imputation results over the mixed data type of medical datasets. However, instance selection does not have a definitely positive impact on the imputation result for categorical medical datasets.
Discovery of Teleconnections Using Data Mining Technologies in Global Climate Datasets

Directory of Open Access Journals (Sweden)

Fan Lin

2007-10-01

In this paper, we apply data mining technologies to a 100-year global land precipitation dataset and a 100-year Sea Surface Temperature (SST) dataset. Some interesting teleconnections are discovered, including well-known patterns and unknown patterns (to the best of our knowledge), such as teleconnections between the abnormally low temperature events of the North Atlantic and floods in Northern Bolivia, abnormally low temperatures of the Venezuelan Coast and floods in Northern Algeria and Tunisia, etc. In particular, we use a high dimensional clustering method and a method that mines episode association rules in event sequences. The former is used to cluster the original time series datasets into higher spatial granularity, and the latter is used to discover teleconnection patterns among event sequences that are generated by the clustering method. In order to verify our method, we also do experiments on the SOI index and a 100-year global land precipitation dataset and find many well-known teleconnections, such as teleconnections between SOI lower events and drought events of Eastern Australia, South Africa, and North Brazil; SOI lower events and flood events of the middle-lower reaches of Yangtze River; etc. We also do explorative experiments to help domain scientists discover new knowledge.
Document Questionnaires and Datasets with DDI: A Hands-On Introduction with Colectica

OpenAIRE

Iverson, Jeremy; Smith, Dan

2018-01-01

This workshop offers a hands-on, practical approach to creating and documenting both surveys and datasets with DDI and Colectica. Participants will build and field a DDI-driven survey using their own questions or samples provided in the workshop. They will then ingest, annotate, and publish DDI dataset descriptions using the collected survey data.
Automatic Diabetic Macular Edema Detection in Fundus Images Using Publicly Available Datasets

Energy Technology Data Exchange (ETDEWEB)

Giancardo, Luca [ORNL; Meriaudeau, Fabrice [ORNL; Karnowski, Thomas Paul [ORNL; Li, Yaquin [University of Tennessee, Knoxville (UTK); Garg, Seema [University of North Carolina; Tobin Jr, Kenneth William [ORNL; Chaum, Edward [University of Tennessee, Knoxville (UTK)

2011-01-01

Diabetic macular edema (DME) is a common vision-threatening complication of diabetic retinopathy. In a large-scale screening environment DME can be assessed by detecting exudates (a type of bright lesion) in fundus images. In this work, we introduce a new methodology for diagnosis of DME using a novel set of features based on colour, wavelet decomposition and automatic lesion segmentation. These features are employed to train a classifier able to automatically diagnose DME. We present a new publicly available dataset with ground-truth data containing 169 patients from various ethnic groups and levels of DME. This and two other publicly available datasets are employed to evaluate our algorithm. We are able to achieve diagnosis performance comparable to retina experts on MESSIDOR (an independently labelled dataset with 1200 images) with cross-dataset testing. Our algorithm is robust to segmentation uncertainties, does not need ground truth at lesion level, and is very fast, generating a diagnosis in an average of 4.4 seconds per image on a 2.6 GHz platform with an unoptimised Matlab implementation.
A Unified Framework for Measuring Stewardship Practices Applied to Digital Environmental Datasets

Directory of Open Access Journals (Sweden)

Ge Peng

2015-01-01

This paper presents a stewardship maturity assessment model in the form of a matrix for digital environmental datasets. Nine key components are identified based on requirements imposed on digital environmental data and information that are cared for and disseminated by U.S. Federal agencies by U.S. law, i.e., Information Quality Act of 2001, agencies’ guidance, expert bodies’ recommendations, and users. These components include: preservability, accessibility, usability, production sustainability, data quality assurance, data quality control/monitoring, data quality assessment, transparency/traceability, and data integrity. A five-level progressive maturity scale is then defined for each component associated with measurable practices applied to individual datasets, representing Ad Hoc, Minimal, Intermediate, Advanced, and Optimal stages. The rationale for each key component and its maturity levels is described. This maturity model, leveraging community best practices and standards, provides a unified framework for assessing scientific data stewardship. It can be used to create a stewardship maturity scoreboard of dataset(s) and a roadmap for scientific data stewardship improvement or to provide data quality and usability information to users, stakeholders, and decision makers.
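The matrix described in this abstract (nine components, each rated on a five-level scale) maps naturally onto a small lookup structure. The sketch below is only an illustration of how a per-dataset scoreboard might be represented. The component and level names are taken from the abstract, while the numeric 1 to 5 encoding, the function name, and any ratings passed in are assumptions, not part of the published model.

```python
# Component and level names as listed in the abstract.
COMPONENTS = (
    "preservability", "accessibility", "usability",
    "production sustainability", "data quality assurance",
    "data quality control/monitoring", "data quality assessment",
    "transparency/traceability", "data integrity",
)
LEVELS = ("Ad Hoc", "Minimal", "Intermediate", "Advanced", "Optimal")

def maturity_scoreboard(ratings):
    """Turn numeric ratings into a scoreboard of named maturity stages.

    `ratings` maps every component to an integer from 1 to 5; this
    numbering is a hypothetical encoding chosen for the sketch.
    """
    missing = [c for c in COMPONENTS if c not in ratings]
    if missing:
        raise ValueError(f"unrated components: {missing}")
    board = {}
    for component in COMPONENTS:
        level = ratings[component]
        if not 1 <= level <= len(LEVELS):
            raise ValueError(f"{component}: level {level} is out of range")
        board[component] = LEVELS[level - 1]
    return board
```

Keeping the components in a fixed tuple means every dataset is scored against the same nine rows, so scoreboards for different datasets can be compared row by row.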
Dataset Preservation for the Long Term: Results of the DareLux Project

Directory of Open Access Journals (Sweden)

Eugène Dürr

2008-08-01

The purpose of the DareLux (Data Archiving River Environment Luxembourg) Project was the preservation of unique and irreplaceable datasets, for which we chose hydrology data that will be required in future climatic models. The results are: an operational archive built with XML containers, the OAI-PMH protocol and an architecture based upon web services. Major conclusions are: quality control on ingest is important; digital rights management demands attention; and cost aspects of ingest and retrieval cannot be underestimated. We propose a new paradigm for information retrieval of this type of dataset. We recommend research into visualisation tools for the search and retrieval of this type of dataset.

Enhancing Conservation with High Resolution Productivity Datasets for the Conterminous United States

Science.gov (United States)

Robinson, Nathaniel Paul

Human driven alteration of the earth's terrestrial surface is accelerating through land use changes, intensification of human activity, climate change, and other anthropogenic pressures. These changes occur at broad spatio-temporal scales, challenging our ability to effectively monitor and assess the impacts and subsequent conservation strategies. While satellite remote sensing (SRS) products enable monitoring of the earth's terrestrial surface continuously across space and time, the practical applications for conservation and management of these products are limited. Often the processes driving ecological change occur at fine spatial resolutions and are undetectable given the resolution of available datasets. Additionally, the links between SRS data and ecologically meaningful metrics are weak. Recent advances in cloud computing technology along with the growing record of high resolution SRS data enable the development of SRS products that quantify ecologically meaningful variables at relevant scales applicable for conservation and management. The focus of my dissertation is to improve the applicability of terrestrial gross and net primary productivity (GPP/NPP) datasets for the conterminous United States (CONUS). In chapter one, I develop a framework for creating high resolution datasets of vegetation dynamics. I use the entire archive of Landsat 5, 7, and 8 surface reflectance data and a novel gap filling approach to create spatially continuous 30 m, 16-day composites of the normalized difference vegetation index (NDVI) from 1986 to 2016. In chapter two, I integrate this with other high resolution datasets and the MOD17 algorithm to create the first high resolution GPP and NPP datasets for CONUS. I demonstrate the applicability of these products for conservation and management, showing the improvements beyond currently available products. In chapter three, I utilize this dataset to evaluate the relationships between land ownership and terrestrial production
CLARA-A1: a cloud, albedo, and radiation dataset from 28 yr of global AVHRR data

Directory of Open Access Journals (Sweden)

K.-G. Karlsson

2013-05-01

A new satellite-derived climate dataset – denoted CLARA-A1 ("The CM SAF cLoud, Albedo and RAdiation dataset from AVHRR data") – is described. The dataset covers the 28 yr period from 1982 until 2009 and consists of cloud, surface albedo, and radiation budget products derived from the AVHRR (Advanced Very High Resolution Radiometer) sensor carried by polar-orbiting operational meteorological satellites. Its content, anticipated accuracies, limitations, and potential applications are described. The dataset is produced by the EUMETSAT Climate Monitoring Satellite Application Facility (CM SAF) project. The dataset has its strengths in the long duration, its foundation upon a homogenized AVHRR radiance data record, and in some unique features, e.g. the availability of 28 yr of summer surface albedo and cloudiness parameters over the polar regions. Quality characteristics are also well investigated and particularly useful results can be found over the tropics, mid to high latitudes and over nearly all oceanic areas. Being the first CM SAF dataset of its kind, an intensive evaluation of the quality of the datasets was performed and major findings with regard to merits and shortcomings of the datasets are reported. However, the CM SAF's long-term commitment to perform two additional reprocessing events within the time frame 2013–2018 will allow proper handling of limitations as well as upgrading the dataset with new features (e.g. uncertainty estimates and extension of the temporal coverage).
SPICE: exploration and analysis of post-cytometric complex multivariate datasets.

Science.gov (United States)

Roederer, Mario; Nozzi, Joshua L; Nason, Martha C

2011-02-01

Polychromatic flow cytometry results in complex, multivariate datasets. To date, tools for the aggregate analysis of these datasets across multiple specimens grouped by different categorical variables, such as demographic information, have not been optimized. Often, the exploration of such datasets is accomplished by visualization of patterns with pie charts or bar charts, without easy access to statistical comparisons of measurements that comprise multiple components. Here we report on algorithms and a graphical interface we developed for these purposes. In particular, we discuss thresholding necessary for accurate representation of data in pie charts, the implications for display and comparison of normalized versus unnormalized data, and the effects of averaging when samples with significant background noise are present. Finally, we define a statistic for the nonparametric comparison of complex distributions to test for difference between groups of samples based on multi-component measurements. While originally developed to support the analysis of T cell functional profiles, these techniques are amenable to a broad range of datatypes. Published 2011 Wiley-Liss, Inc.
ORBDA: An openEHR benchmark dataset for performance assessment of electronic health record servers.

Directory of Open Access Journals (Sweden)

Douglas Teodoro

The openEHR specifications are designed to support implementation of flexible and interoperable Electronic Health Record (EHR) systems. Despite the increasing number of solutions based on the openEHR specifications, it is difficult to find publicly available healthcare datasets in the openEHR format that can be used to test, compare and validate different data persistence mechanisms for openEHR. To foster research on openEHR servers, we present the openEHR Benchmark Dataset, ORBDA, a very large healthcare benchmark dataset encoded using the openEHR formalism. To construct ORBDA, we extracted and cleaned a de-identified dataset from the Brazilian National Healthcare System (SUS) containing hospitalisation and high complexity procedures information and formalised it using a set of openEHR archetypes and templates. Then, we implemented a tool to enrich the raw relational data and convert it into the openEHR model using the openEHR Java reference model library. The ORBDA dataset is available in composition, versioned composition and EHR openEHR representations in XML and JSON formats. In total, the dataset contains more than 150 million composition records. We describe the dataset and provide means to access it. Additionally, we demonstrate the usage of ORBDA for evaluating inserting throughput and query latency performances of some NoSQL database management systems. We believe that ORBDA is a valuable asset for assessing storage models for openEHR-based information systems during the software engineering process. It may also be a suitable component in future standardised benchmarking of available openEHR storage platforms.
ORBDA: An openEHR benchmark dataset for performance assessment of electronic health record servers

Science.gov (United States)

Sundvall, Erik; João Junior, Mario; Ruch, Patrick; Miranda Freire, Sergio

2018-01-01

The openEHR specifications are designed to support implementation of flexible and interoperable Electronic Health Record (EHR) systems. Despite the increasing number of solutions based on the openEHR specifications, it is difficult to find publicly available healthcare datasets in the openEHR format that can be used to test, compare and validate different data persistence mechanisms for openEHR. To foster research on openEHR servers, we present the openEHR Benchmark Dataset, ORBDA, a very large healthcare benchmark dataset encoded using the openEHR formalism. To construct ORBDA, we extracted and cleaned a de-identified dataset from the Brazilian National Healthcare System (SUS) containing hospitalisation and high complexity procedures information and formalised it using a set of openEHR archetypes and templates. Then, we implemented a tool to enrich the raw relational data and convert it into the openEHR model using the openEHR Java reference model library. The ORBDA dataset is available in composition, versioned composition and EHR openEHR representations in XML and JSON formats. In total, the dataset contains more than 150 million composition records. We describe the dataset and provide means to access it. Additionally, we demonstrate the usage of ORBDA for evaluating inserting throughput and query latency performances of some NoSQL database management systems. We believe that ORBDA is a valuable asset for assessing storage models for openEHR-based information systems during the software engineering process. It may also be a suitable component in future standardised benchmarking of available openEHR storage platforms. PMID:29293556
Dataset for Probabilistic estimation of residential air exchange rates for population-based exposure modeling

Data.gov (United States)

U.S. Environmental Protection Agency — This dataset provides the city-specific air exchange rate measurements, modeled, literature-based as well as housing characteristics. This dataset is associated with...
Anonymising the Sparse Dataset: A New Privacy Preservation Approach while Predicting Diseases

Directory of Open Access Journals (Sweden)

V. Shyamala Susan

2016-09-01

Data mining techniques analyze medical datasets with the intention of enhancing patients’ health and privacy. Most existing techniques are suited to low-dimensional medical datasets. The proposed methodology designs a model for representing sparse, high-dimensional medical datasets, with the aim of protecting the patient’s privacy from an adversary and, additionally, predicting the disease’s threat degree. In a sparse dataset, many non-zero values are randomly spread across the entire data space. Hence, the challenge is to cluster correlated patient records so as to predict the risk degree of a disease before it occurs in patients while preserving privacy. The first phase converts the sparse dataset into a band matrix through a Genetic algorithm combined with Cuckoo Search (GCS). This groups correlated patient records together and arranges them close to the diagonal. The next phase dissociates the patient’s disease, which is a sensitive value (SA), from the parameters that determine the disease, normally the Quasi Identifier (QI). Finally, a density-based clustering technique is applied to the underlying data to create anonymized groups that maintain privacy and to predict the risk level of the disease. Empirical assessments on actual health care data, the V.A. Medical Centre heart disease dataset, demonstrate the efficiency of this model with respect to information loss, utility and privacy.
CoVennTree: A new method for the comparative analysis of large datasets

Directory of Open Access Journals (Sweden)

Steffen C. Lott

2015-02-01

The visualization of massive datasets, such as those resulting from comparative metatranscriptome analyses or the analysis of microbial population structures using ribosomal RNA sequences, is a challenging task. We developed a new method called CoVennTree (Comparative weighted Venn Tree) that simultaneously compares up to three multifarious datasets by aggregating and propagating information from the bottom to the top level and produces a graphical output in Cytoscape. With the introduction of weighted Venn structures, the contents and relationships of various datasets can be correlated and simultaneously aggregated without losing information. We demonstrate the suitability of this approach using a dataset of 16S rDNA sequences obtained from microbial populations at three different depths of the Gulf of Aqaba in the Red Sea. CoVennTree has been integrated into the Galaxy ToolShed and can be directly downloaded and integrated into the user instance.
Dataset of Passerine bird communities in a Mediterranean high mountain (Sierra Nevada, Spain).

Science.gov (United States)

Pérez-Luque, Antonio Jesús; Barea-Azcón, José Miguel; Álvarez-Ruiz, Lola; Bonet-García, Francisco Javier; Zamora, Regino

2016-01-01

In this data paper, a dataset of passerine bird communities is described in Sierra Nevada, a Mediterranean high mountain located in southern Spain. The dataset includes occurrence data from bird surveys conducted in four representative ecosystem types of Sierra Nevada from 2008 to 2015. For each visit, bird species numbers as well as distance to the transect line were recorded. A total of 27847 occurrence records were compiled with accompanying measurements on distance to the transect and animal counts. All records are of species in the order Passeriformes. Records of 16 different families and 44 genera were collected. Some of the taxa in the dataset are included in the European Red List. This dataset belongs to the Sierra Nevada Global-Change Observatory (OBSNEV), a long-term research project designed to compile socio-ecological information on the major ecosystem types in order to identify the impacts of global change in this area.
Vector Nonlinear Time-Series Analysis of Gamma-Ray Burst Datasets on Heterogeneous Clusters

Directory of Open Access Journals (Sweden)

Ioana Banicescu

2005-01-01

The simultaneous analysis of a number of related datasets using a single statistical model is an important problem in statistical computing. A parameterized statistical model is to be fitted on multiple datasets and tested for goodness of fit within a fixed analytical framework. Definitive conclusions are hopefully achieved by analyzing the datasets together. This paper proposes a strategy for the efficient execution of this type of analysis on heterogeneous clusters. Based on partitioning processors into groups for efficient communications and a dynamic loop scheduling approach for load balancing, the strategy addresses the variability of the computational loads of the datasets, as well as the unpredictable irregularities of the cluster environment. Results from preliminary tests of using this strategy to fit gamma-ray burst time profiles with vector functional coefficient autoregressive models on 64 processors of a general purpose Linux cluster demonstrate the effectiveness of the strategy.
Scientific Datasets: Discovery and Aggregation for Semantic Interpretation.

Science.gov (United States)

Lopez, L. A.; Scott, S.; Khalsa, S. J. S.; Duerr, R.

2015-12-01

One of the biggest challenges that interdisciplinary researchers face is finding suitable datasets in order to advance their science; this problem remains consistent across multiple disciplines. A surprising number of scientists, when asked what tool they use for data discovery, reply "Google", which is an acceptable solution in some cases, but not even Google can find, or cares to compile, all the data that is relevant for science and particularly the geosciences. If a dataset is not discoverable through a well-known search provider, it will remain dark data to the scientific world. For the past year, BCube, an EarthCube Building Block project, has been developing, testing and deploying a technology stack capable of data discovery at web scale using the ultimate dataset: the Internet. This stack has two principal components, a web-scale crawling infrastructure and a semantic aggregator. The web crawler is a modified version of Apache Nutch (the originator of Hadoop and other big data technologies) that has been improved and tailored for data and data service discovery. The second component is semantic aggregation, carried out by a Python-based workflow that extracts valuable metadata and stores it in the form of triples through the use of semantic technologies. While implementing the BCube stack we have run into several challenges such as a) scaling the project to cover big portions of the Internet at a reasonable cost, b) making sense of very diverse and non-homogeneous data, and lastly, c) extracting facts about these datasets using semantic technologies in order to make them usable for the geosciences community. Despite all these challenges we have proven that we can discover and characterize data that otherwise would have remained in the dark corners of the Internet. Having all this data indexed and 'triplelized' will enable scientists to access a trove of information relevant to their work in a more natural way. An important characteristic of the BCube stack is that all
Evaluation of Uncertainty in Precipitation Datasets for New Mexico, USA

Science.gov (United States)

Besha, A. A.; Steele, C. M.; Fernald, A.

2014-12-01

Climate change, population growth and other factors are endangering water availability and sustainability in semiarid/arid areas, particularly in the southwestern United States. Wide spatial and temporal coverage of precipitation measurements is key for regional water budget analysis and hydrological operations, which are themselves valuable tools for water resource planning and management. Rain gauge measurements are usually reliable and accurate at a point. They measure rainfall continuously, but spatial sampling is limited. Ground-based radar and satellite remotely sensed precipitation have wide spatial and temporal coverage. However, these measurements are indirect and subject to errors because of equipment, meteorological variability, the heterogeneity of the land surface itself and lack of regular recording. This study seeks to understand precipitation uncertainty and, in doing so, lessen uncertainty propagation into hydrological applications and operations. We reviewed, compared and evaluated the TRMM (Tropical Rainfall Measuring Mission) precipitation products, NOAA's (National Oceanic and Atmospheric Administration) Global Precipitation Climatology Centre (GPCC) monthly precipitation dataset, PRISM (Parameter elevation Regression on Independent Slopes Model) data and data from individual climate stations including Cooperative Observer Program (COOP), Remote Automated Weather Stations (RAWS), Soil Climate Analysis Network (SCAN) and Snowpack Telemetry (SNOTEL) stations. Though not yet finalized, this study finds that the uncertainty within precipitation estimate datasets is influenced by regional topography, season, climate and precipitation rate. Ongoing work aims to further evaluate precipitation datasets based on the relative influence of these phenomena so that we can identify the optimum datasets for input to statewide water budget analysis.
Soil chemistry in lithologically diverse datasets: the quartz dilution effect

Science.gov (United States)

Bern, Carleton R.

2009-01-01

National- and continental-scale soil geochemical datasets are likely to move our understanding of broad soil geochemistry patterns forward significantly. Patterns of chemistry and mineralogy delineated from these datasets are strongly influenced by the composition of the soil parent material, which itself is largely a function of lithology and particle size sorting. Such controls present a challenge by obscuring subtler patterns arising from subsequent pedogenic processes. Here the effect of quartz concentration is examined in moist-climate soils from a pilot dataset of the North American Soil Geochemical Landscapes Project. Due to variable and high quartz contents (6.2–81.7 wt.%), and its residual and inert nature in soil, quartz is demonstrated to influence broad patterns in soil chemistry. A dilution effect is observed whereby concentrations of various elements are significantly and strongly negatively correlated with quartz. Quartz content drives artificial positive correlations between concentrations of some elements and obscures negative correlations between others. Unadjusted soil data show the highly mobile base cations Ca, Mg, and Na to be often strongly positively correlated with intermediately mobile Al or Fe, and generally uncorrelated with the relatively immobile high-field-strength elements (HFS) Ti and Nb. Both patterns are contrary to broad expectations for soils being weathered and leached. After transforming bulk soil chemistry to a quartz-free basis, the base cations are generally uncorrelated with Al and Fe, and negative correlations generally emerge with the HFS elements. Quartz-free element data may be a useful tool for elucidating patterns of weathering or parent-material chemistry in large soil datasets.
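To make the quartz-free transformation mentioned above concrete, here is a minimal sketch. It assumes quartz acts as a pure inert diluent, so a bulk concentration is rescaled by the non-quartz mass fraction; the formula, the function name and the sample values are illustrative assumptions, not taken from the paper.

```python
def quartz_free(concentration, quartz_wt_pct):
    """Rescale a bulk concentration to a quartz-free basis.

    Assumes quartz contributes none of the element and acts purely as
    a diluent, so the non-quartz fraction is (1 - quartz_wt_pct / 100).
    """
    if not 0 <= quartz_wt_pct < 100:
        raise ValueError("quartz content must be in [0, 100) wt.%")
    return concentration / (1.0 - quartz_wt_pct / 100.0)


# Hypothetical samples with different quartz contents.
low_quartz = quartz_free(4.0, 20.0)   # 4.0 wt.% of an element, 20 wt.% quartz
high_quartz = quartz_free(1.0, 80.0)  # 1.0 wt.% of an element, 80 wt.% quartz
print(low_quartz, high_quartz)
```

In this made-up case the two bulk values differ by a factor of four, yet both correspond to roughly the same concentration once quartz is removed, which is the kind of dilution artefact the abstract describes.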
Validity and reliability of stillbirth data using linked self-reported and administrative datasets.

Science.gov (United States)

Hure, Alexis J; Chojenta, Catherine L; Powers, Jennifer R; Byles, Julie E; Loxton, Deborah

2015-01-01

A high rate of stillbirth was previously observed in the Australian Longitudinal Study of Women's Health (ALSWH). Our primary objective was to test the validity and reliability of self-reported stillbirth data linked to state-based administrative datasets. Self-reported data, collected as part of the ALSWH cohort born in 1973-1978, were linked to three administrative datasets for women in New South Wales, Australia (n = 4374): the Midwives Data Collection; Admitted Patient Data Collection; and Perinatal Death Review Database. Linkages were obtained from the Centre for Health Record Linkage for the period 1996-2009. True cases of stillbirth were defined by being consistently recorded in two or more independent data sources. Sensitivity, specificity, positive predictive value, negative predictive value, percent agreement, and kappa statistics were calculated for each dataset. Forty-nine women reported 53 stillbirths. No dataset was 100% accurate. The administrative datasets performed better than self-reported data, with high accuracy and agreement. Self-reported data showed high sensitivity (100%) but low specificity (30%), meaning women who had a stillbirth always reported it, but there was also over-reporting of stillbirths. About half of the misreported cases in the ALSWH were able to be removed by identifying inconsistencies in longitudinal data. Data linkage provides great opportunity to assess the validity and reliability of self-reported study data. Conversely, self-reported study data can help to resolve inconsistencies in administrative datasets. Quantifying the strengths and limitations of both self-reported and administrative data can improve epidemiological research, especially by guiding methods and interpretation of findings.
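The validity measures listed above can all be derived from a 2x2 table of a dataset's classification against the reference standard. The sketch below shows one common way to compute them; the function name and the counts are invented for illustration and are not the study's data.

```python
def diagnostic_stats(tp, fp, fn, tn):
    """Return validity and agreement measures for a 2x2 table.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives and true negatives against the reference standard.
    """
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "percent_agreement": 100.0 * observed,
        "kappa": (observed - expected) / (1.0 - expected),
    }


# Made-up counts for illustration only.
stats = diagnostic_stats(tp=40, fp=10, fn=0, tn=50)
for name, value in stats.items():
    print(name, round(value, 3))
```

With no false negatives the sensitivity is 1.0 regardless of how many false positives occur, which is why a source can be fully sensitive and still over-report.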
Introducing a Web API for Dataset Submission into a NASA Earth Science Data Center

Science.gov (United States)

Moroni, D. F.; Quach, N.; Francis-Curley, W.

2016-12-01

As the landscape of data becomes increasingly more diverse in the domain of Earth Science, the challenges of managing and preserving data become more onerous and complex, particularly for data centers on fixed budgets and limited staff. Many solutions already exist to ease the cost burden for the downstream component of the data lifecycle, yet most archive centers are still racing to keep up with the influx of new data that still needs to find a quasi-permanent resting place. For instance, having well-defined metadata that is consistent across the entire data landscape provides for well-managed and preserved datasets throughout the latter end of the data lifecycle. Translators between different metadata dialects are already in operational use, and facilitate keeping older datasets relevant in today's world of rapidly evolving metadata standards. However, very little is done to address the first phase of the lifecycle, which deals with the entry of both data and the corresponding metadata into a system that is traditionally opaque and closed off to external data producers, thus resulting in a significant bottleneck to the dataset submission process. The ATRAC system was the NOAA NCEI's answer to this previously obfuscated barrier to scientists wishing to find a home for their climate data records, providing a web-based entry point to submit timely and accurate metadata and information about a very specific dataset. A couple of NASA's Distributed Active Archive Centers (DAACs) have implemented their own versions of a web-based dataset and metadata submission form including the ASDC and the ORNL DAAC. The Physical Oceanography DAAC is the most recent in the list of NASA-operated DAACs who have begun to offer their own web-based dataset and metadata submission services to data producers. What makes the PO.DAAC dataset and metadata submission service stand out from these pre-existing services is the option of utilizing both a web browser GUI and a RESTful API to
Brook trout (Salvelinus fontinalis) extinction in small boreal lakes revealed by ephippia pigmentation: a preliminary analysis

Directory of Open Access Journals (Sweden)

Alexandre Bérubé Tellier

2016-12-01

Ephippium pigmentation is a plastic trait which can be related to a trade-off between visual predation pressure and better protection of cladoceran eggs against different types of stress. Experimental studies showed that planktivorous fish exert a greater predation pressure on individuals carrying darker ephippia, but little is known about the variation of ephippium pigmentation along gradients of fish predation pressure in natural conditions. For this study, our experimental design included four small boreal lakes with known fish assemblages. Two of the lakes have viable brook trout (Salvelinus fontinalis) populations, whereas the other two lakes experienced brook trout extinctions during the 20th century. Cladoceran ephippia were extracted from sediment cores at layers corresponding to the documented post-extinction phase (1990's) and from an older layer (1950's) for which the brook trout population status is not known precisely. Our first objective was to determine whether brook trout extinction has a direct effect on both ephippium pigmentation and size. Our second objective was to give a preliminary assessment of the status of brook trout populations in the 1950's by comparing the variation in ephippia traits measured from this layer to those measured in the 1990's, for which the extinction patterns are well known. Cost-effective image analysis was used to assess variation in pigmentation levels in ephippia. This approach provided a proxy for the amount of melanin invested in each ephippium analysed. Our study clearly shows that ephippium pigmentation may represent a better indicator of the presence of fish predators than ephippium size, a trait that showed a less clear pattern of variation between lakes with and without fish. For the 1990's period, ephippia from fishless lakes were darker and showed a slight tendency to be larger than ephippia from lakes with brook trout. However, no clear differences in either ephippium size or pigmentation
Traffic sign classification with dataset augmentation and convolutional neural network

Science.gov (United States)

Tang, Qing; Kurnianggoro, Laksono; Jo, Kang-Hyun

2018-04-01

This paper presents a method for traffic sign classification using a convolutional neural network (CNN). In this method, we first convert a color image to grayscale and then normalize it to the range (-1,1) as the preprocessing step. To increase the robustness of the classification model, we apply a dataset augmentation algorithm and create new images to train the model. To avoid overfitting, we utilize a dropout module before the last fully connected layer. To assess the performance of the proposed method, the German traffic sign recognition benchmark (GTSRB) dataset is utilized. Experimental results show that the method is effective in classifying traffic signs.
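The preprocessing step described above (grayscale conversion followed by scaling to the range -1 to 1) can be sketched in a few lines. The luma weights and the 127.5 scaling constant are common conventions assumed here; the abstract does not state which ones the authors used, and the function name is hypothetical.

```python
def preprocess(image):
    """Convert an RGB image to grayscale values scaled to [-1, 1].

    image is a list of rows, each row a list of (r, g, b) tuples with
    channel values in 0..255.
    """
    out = []
    for row in image:
        gray_row = []
        for r, g, b in row:
            # ITU-R BT.601 luma weights (an assumed, common choice).
            gray = 0.299 * r + 0.587 * g + 0.114 * b
            # Map 0..255 onto -1..1.
            gray_row.append(gray / 127.5 - 1.0)
        out.append(gray_row)
    return out


# A tiny 2x2 test image: black, white, pure red, pure blue.
tiny = [[(0, 0, 0), (255, 255, 255)], [(255, 0, 0), (0, 0, 255)]]
result = preprocess(tiny)
print(result)
```

A real pipeline would apply the same mapping to whole image arrays with a numerical library; the pure-Python loop is used here only to keep the example dependency-free.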
MicroRNA Array Normalization: An Evaluation Using a Randomized Dataset as the Benchmark

Science.gov (United States)

Qin, Li-Xuan; Zhou, Qin

2014-01-01

MicroRNA arrays possess a number of unique data features that challenge the assumption key to many normalization methods. We assessed the performance of existing normalization methods using two microRNA array datasets derived from the same set of tumor samples: one dataset was generated using a blocked randomization design when assigning arrays to samples and hence was free of confounding array effects; the second dataset was generated without blocking or randomization and exhibited array effects. The randomized dataset was assessed for differential expression between two tumor groups and treated as the benchmark. The non-randomized dataset was assessed for differential expression after normalization and compared against the benchmark. Normalization improved the true positive rate significantly in the non-randomized data but still possessed a false discovery rate as high as 50%. Adding a batch adjustment step before normalization further reduced the number of false positive markers while maintaining a similar number of true positive markers, which resulted in a false discovery rate of 32% to 48%, depending on the specific normalization method. We concluded the paper with some insights on possible causes of false discoveries to shed light on how to improve normalization for microRNA arrays. PMID:24905456
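The comparison against a benchmark described above amounts to counting which called markers also appear in the benchmark list. The sketch below computes the true positive rate and the empirical false discovery rate for such a comparison; the function name and marker names are hypothetical, and the benchmark is simply treated as ground truth.

```python
def compare_to_benchmark(called, benchmark):
    """Score markers called significant against a benchmark set.

    The benchmark is treated as ground truth, so the false discovery
    rate here is the observed share of called markers not in it.
    """
    called, benchmark = set(called), set(benchmark)
    true_pos = called & benchmark
    false_pos = called - benchmark
    return {
        "true_positive_rate": len(true_pos) / len(benchmark) if benchmark else 0.0,
        "false_discovery_rate": len(false_pos) / len(called) if called else 0.0,
    }


# Hypothetical marker names, for illustration only.
benchmark = {"miR-a", "miR-b", "miR-c", "miR-d"}
called = ["miR-a", "miR-b", "miR-x", "miR-y"]
scores = compare_to_benchmark(called, benchmark)
print(scores)
```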
A global gridded dataset of daily precipitation going back to 1950, ideal for analysing precipitation extremes

Science.gov (United States)

Contractor, S.; Donat, M.; Alexander, L. V.

2017-12-01

Reliable observations of precipitation are necessary to determine past changes in precipitation and validate models, allowing for reliable future projections. Existing gauge-based gridded datasets of daily precipitation and satellite-based observations contain artefacts and have a short length of record, making them unsuitable for analysing precipitation extremes. The largest limiting factor for the gauge-based datasets is the need for a dense and reliable station network. Currently, there are two major data archives of global in situ daily rainfall data: the first is the Global Historical Climatology Network (GHCN-Daily), hosted by the National Oceanic and Atmospheric Administration (NOAA), and the other is hosted by the Global Precipitation Climatology Centre (GPCC), part of the Deutscher Wetterdienst (DWD). We combine the two data archives and use automated quality control techniques to create a reliable long-term network of raw station data, which we then interpolate using block kriging to create a global gridded dataset of daily precipitation going back to 1950. We compare our interpolated dataset with existing global gridded data of daily precipitation: NOAA Climate Prediction Centre (CPC) Global V1.0 and GPCC Full Data Daily Version 1.0, as well as various regional datasets. We find that our raw station density is much higher than that of other datasets. To avoid artefacts due to station network variability, we provide multiple versions of our dataset based on various completeness criteria, and we provide the standard deviation, kriging error and number of stations for each grid cell and timestep to encourage responsible use of our dataset. Despite our efforts to increase the raw data density, the in situ station network remains sparse in India after the 1960s and in Africa throughout the timespan of the dataset. Our dataset would allow for more reliable global analyses of rainfall, including its extremes, and pave the way for better global precipitation observations with lower and more transparent uncertainties.

Global Man-made Impervious Surface (GMIS) Dataset From Landsat

Data.gov (United States)

National Aeronautics and Space Administration — The Global Man-made Impervious Surface (GMIS) Dataset From Landsat consists of global estimates of fractional impervious cover derived from the Global Land Survey...
Dataset: Multi Sensor-Orientation Movement Data of Goats

NARCIS (Netherlands)

Kamminga, Jacob Wilhelm

2018-01-01

This is a labeled dataset. Motion data were collected from six sensor nodes that were fixed with different orientations to a collar around the neck of goats. These six sensor nodes simultaneously, with different orientations, recorded various activities performed by the goat. We recorded the
ProDaMa: an open source Python library to generate protein structure datasets.

Science.gov (United States)

Armano, Giuliano; Manconi, Andrea

2009-10-02

The huge difference between the number of known sequences and known tertiary structures has justified the use of automated methods for protein analysis. Although a general methodology to solve these problems has not yet been devised, researchers are engaged in developing more accurate techniques and algorithms whose training plays a relevant role in determining their performance. From this perspective, particular importance is given to the training data used in experiments, and researchers are often engaged in the generation of specialized datasets that meet their requirements. To facilitate the task of generating specialized datasets we devised and implemented ProDaMa, an open source Python library that provides classes for retrieving, organizing, updating, analyzing, and filtering protein data. ProDaMa has been used to generate specialized datasets useful for secondary structure prediction and to develop a collaborative web application aimed at generating and sharing protein structure datasets. The library, the related database, and the documentation are freely available at the URL http://iasc.diee.unica.it/prodama.
Dataset of Passerine bird communities in a Mediterranean high mountain (Sierra Nevada, Spain)

Science.gov (United States)

Pérez-Luque, Antonio Jesús; Barea-Azcón, José Miguel; Álvarez-Ruiz, Lola; Bonet-García, Francisco Javier; Zamora, Regino

2016-01-01

In this data paper, a dataset of passerine bird communities is described in Sierra Nevada, a Mediterranean high mountain located in southern Spain. The dataset includes occurrence data from bird surveys conducted in four representative ecosystem types of Sierra Nevada from 2008 to 2015. For each visit, bird species numbers as well as distance to the transect line were recorded. A total of 27847 occurrence records were compiled with accompanying measurements on distance to the transect and animal counts. All records are of species in the order Passeriformes. Records of 16 different families and 44 genera were collected. Some of the taxa in the dataset are included in the European Red List. This dataset belongs to the Sierra Nevada Global-Change Observatory (OBSNEV), a long-term research project designed to compile socio-ecological information on the major ecosystem types in order to identify the impacts of global change in this area. PMID:26865820
Structured decision making for conservation of bull trout (Salvelinus confluentus) in Long Creek, Klamath River Basin, south-central Oregon

Science.gov (United States)

Benjamin, Joseph R.; McDonnell, Kevin; Dunham, Jason B.; Brignon, William R.; Peterson, James T.

2017-06-21

With the decline of bull trout (Salvelinus confluentus), managers face multiple, and sometimes contradictory, management alternatives for species recovery. Moreover, effective decision-making involves all stakeholders influenced by the decisions (such as Tribal, State, Federal, private, and non-governmental organizations) because they represent diverse objectives, jurisdictions, policy mandates, and opinions of the best management strategy. The process of structured decision making is explicitly designed to address these elements of the decision making process. Here we report on an application of structured decision making to a population of bull trout believed threatened by high densities of nonnative brook trout (S. fontinalis) and habitat fragmentation in Long Creek, a tributary to the Sycan River in the Klamath River Basin, south-central Oregon. This involved engaging stakeholders to identify (1) their fundamental objectives for the conservation of bull trout, (2) feasible management alternatives to achieve their objectives, and (3) biological information and assumptions to incorporate in a decision model. Model simulations suggested an overarching theme among the top decision alternatives, which was a need to simultaneously control brook trout and ensure that the migratory tactic of bull trout can be expressed. More specifically, the optimal management decision, based on the estimated adult abundance at year 10, was to combine the eradication of brook trout from Long Creek with improvement of downstream conditions (for example, connectivity or habitat conditions). Other top decisions included these actions independently, as well as electrofishing removal of brook trout. In contrast, translocating bull trout to a different stream or installing a barrier to prevent upstream spread of brook trout had minimal or negative effects on the bull trout population. Moreover, sensitivity analyses suggested that these actions were consistently identified as optimal across
The StreamCat Dataset: Accumulated Attributes for NHDPlusV2 (Version 2.1) Catchments for the Conterminous United States: National Anthropogenic Barrier Dataset

Data.gov (United States)

U.S. Environmental Protection Agency — This dataset represents the dam density and storage volumes within individual, local NHDPlusV2 catchments and upstream, contributing watersheds based on the National...
Evaluation of Modified Categorical Data Fuzzy Clustering Algorithm on the Wisconsin Breast Cancer Dataset

Directory of Open Access Journals (Sweden)

Amir Ahmad

2016-01-01

The early diagnosis of breast cancer is an important step in the fight against the disease. Machine learning techniques have shown promise in improving our understanding of the disease. As medical datasets consist of data points which cannot be precisely assigned to a class, fuzzy methods have been useful for studying these datasets. Sometimes breast cancer datasets are described by categorical features. Many fuzzy clustering algorithms have been developed for categorical datasets. However, in most of these methods Hamming distance is used to define the distance between the two categorical feature values. In this paper, we use a probabilistic distance measure for the distance computation among a pair of categorical feature values. Experiments demonstrate that the distance measure performs better than Hamming distance for Wisconsin breast cancer data.
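For readers unfamiliar with the baseline mentioned in this abstract, the following is a minimal Python sketch of Hamming distance between two categorical records: the number of features on which the records disagree. The function name and the sample records are hypothetical and do not come from the paper. The abstract does not specify the probabilistic distance measure it proposes, so that measure is not reproduced here.

```python
def hamming_distance(record_a, record_b):
    """Count the features on which two categorical records disagree."""
    if len(record_a) != len(record_b):
        raise ValueError("records must have the same number of features")
    return sum(a != b for a, b in zip(record_a, record_b))


# Hypothetical categorical records (feature values are made up for illustration).
sample_1 = ("small", "regular", "low")
sample_2 = ("small", "irregular", "low")
distance = hamming_distance(sample_1, sample_2)
```

Because every mismatch counts as exactly 1, Hamming distance treats all pairs of differing category values as equally far apart, which is the limitation the abstract sets out to address.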
Trophic transfer of persistent organochlorine contaminants (OCs) within an Arctic marine food web from the southern Beaufort-Chukchi Seas

International Nuclear Information System (INIS)

Hoekstra, P.F.; O'Hara, T.M.; Fisk, A.T.; Borgaa, K.; Solomon, K.R.; Muir, D.C.G.

2003-01-01

The trophic status and biomagnification of persistent OCs within the near-shore Beaufort-Chukchi Seas food web from Barrow, AK is discussed. Stable isotope values (δ13C, δ15N) and concentrations of persistent organochlorine contaminants (OCs) were determined to evaluate the near-shore marine trophic status of biota and biomagnification of OCs from the southern Beaufort-Chukchi Seas (1999-2000) near Barrow, AK. The biota examined included zooplankton (Calanus spp.), fish species such as arctic cod (Boreogadus saida), arctic char (Salvelinus alpinus), pink salmon (Oncorhynchus gorbuscha), and fourhorn sculpin (Myoxocephalus quadricornis), along with marine mammals, including bowhead whales (Balaena mysticetus), beluga whales (Delphinapterus leucas), ringed seals (Phoca hispida) and bearded seals (Erignathus barbatus). The isotopically derived trophic position of biota from the Beaufort-Chukchi Seas marine food web, avian fauna excluded, is similar to other coastal food webs in the Arctic. Concentrations of OCs in marine mammals were significantly greater than in fish and corresponded with determined trophic level. In general, OCs with the greatest food web magnification factors (FWMFs) were those either formed due to biotransformation (e.g. p,p'-DDE, oxychlordane) or considered recalcitrant (e.g. β-HCH, 2,4,5-Cl substituted PCBs) in most biota, whereas concentrations of OCs that are considered to be readily eliminated (e.g. γ-HCH) did not correlate with trophic level. Differences in physical-chemical properties of OCs, feeding strategy and possible biotransformation were reflected in the variable biomagnification between fish and marine mammals. The FWMFs in the Beaufort-Chukchi Seas region were consistent with reported values in the Canadian Arctic and temperate food webs, but were statistically different than FWMFs from the Barents and White Seas, indicating that the spatial variability of OC contamination in top-level marine Arctic predators is
Morphological divergence between three Arctic charr morphs - the significance of the deep-water environment.

Science.gov (United States)

Skoglund, Sigrid; Siwertsson, Anna; Amundsen, Per-Arne; Knudsen, Rune

2015-08-01

Morphological divergence was evident among three sympatric morphs of Arctic charr (Salvelinus alpinus (L.)) that are ecologically diverged along the shallow-, deep-water resource axis in a subarctic postglacial lake (Norway). The two deep-water (profundal) spawning morphs, a benthivore (PB-morph) and a piscivore (PP-morph), have evolved under identical abiotic conditions with constant low light and temperature levels in their deep-water habitat, and were morphologically most similar. However, they differed in important head traits (e.g., eye and mouth size) related to their different diet specializations. The small-sized PB-morph had a paedomorphic appearance with a blunt head shape, large eyes, and a deep body shape adapted to their profundal lifestyle feeding on submerged benthos from soft, deep-water sediments. The PP-morph had a robust head, large mouth with numerous teeth, and an elongated body shape strongly related to their piscivorous behavior. The littoral spawning omnivore morph (LO-morph) predominantly utilizes the shallow benthic-pelagic habitat and food resources. Compared to the deep-water morphs, the LO-morph had smaller head relative to body size. The LO-morph exhibited traits typical for both shallow-water benthic feeding (e.g., large body depths and small eyes) and planktivorous feeding in the pelagic habitat (e.g., streamlined body shape and small mouth). The development of morphological differences within the same deep-water habitat for the PB- and PP-morphs highlights the potential of biotic factors and ecological interactions to promote further divergence in the evolution of polymorphism in a tentative incipient speciation process. The diversity of deep-water charr in this study represents a novelty in the Arctic charr polymorphism as a truly deep-water piscivore morph has to our knowledge not been described elsewhere.
Cesium in Arctic char lakes - effects of the Chernobyl accident

International Nuclear Information System (INIS)

Hammar, J.; Notter, M.; Neumann, G.

1991-01-01

Fallout radiocesium from the Chernobyl accident caused extensive contamination in a region of previously well studied alpine lake ecosystems in northern Sweden. Levels of Cs-137 in the barren catchment basins reached 20-50 kBq/m² during 1986. The distribution, pathways and major transport mechanisms of radiocesium through the lake ecosystems were studied during 1986-1990. Levels of Cs-137, Cs-134 and K-40 in water, surface sediment, detritus (sediment traps) and different trophic levels of the food chains of Arctic char (Salvelinus alpinus) and brown trout (Salmo trutta) were monitored in a series of lakes forming a matrix of 4 natural lakes and 3 lake reservoirs, with or without the introduced new fish food organism, Mysis relicta. The reservoirs were found to act as sinks for radiocesium with extensive accumulation recorded in water, detritus, sediment, invertebrates and salmonids. Whereas concentrations in water and biota have declined from the extreme peak levels in 1986-1987, the levels in surface sediment increased extensively until fall of 1988. Concentrations of Cs-137 in fish populations feeding on benthic invertebrates, i.e. mysids and amphipods, were significantly higher than in planktivorous fish. During the first three winters a significant increase in levels of Cs-137 in winter active Arctic char was recorded, whereas the levels declined during the succeeding summers. The introduced Mysis relicta were found to enhance the transport of Cs-137 from zooplankton and settling particles to Arctic char and brown trout. The results suggest a successive change in transport of radiocesium from water via zooplankton to planktivorous fish during the early summer of 1986 to post-depositional mobilization via benthic organisms to benthic fish in successive years. (213 refs.) (au)
Preference for cannibalism and ontogenetic constraints in competitive ability of piscivorous top predators.

Directory of Open Access Journals (Sweden)

Pär Byström

Occurrence of cannibalism and inferior competitive ability of predators compared to their prey have been suggested to promote coexistence in size-structured intraguild predation (IGP) systems. The intrinsic size-structure of fish provides the necessary prerequisites to test whether the above mechanisms are general features of species interactions in fish communities where IGP is common. We first experimentally tested whether Arctic char (Salvelinus alpinus) were more efficient as a cannibal than as an interspecific predator on the prey fish ninespine stickleback (Pungitius pungitius) and whether ninespine stickleback were a more efficient competitor on the shared zooplankton prey than its predator, Arctic char. Secondly, we performed a literature survey to evaluate if piscivores in general are more efficient as cannibals than as interspecific predators and whether piscivores are inferior competitors on shared resources compared to their prey fish species. Both controlled pool experiments and outdoor pond experiments showed that char imposed a higher mortality on YOY char than on ninespine sticklebacks, suggesting that piscivorous char is a more efficient cannibal than interspecific predator. Estimates of size dependent attack rates on zooplankton further showed a consistently higher attack rate of ninespine sticklebacks compared to similar sized char on zooplankton, suggesting that ninespine stickleback is a more efficient competitor than char on zooplankton resources. The literature survey showed that piscivorous top consumers generally selected conspecifics over interspecific prey, and that prey species are competitively superior compared to juvenile piscivorous species in the zooplankton niche. We suggest that the observed selectivity for cannibal prey over interspecific prey and the competitive advantage of prey species over juvenile piscivores are common features in fish communities and that the observed selectivity for cannibalism over
Preference for cannibalism and ontogenetic constraints in competitive ability of piscivorous top predators.

Science.gov (United States)

Byström, Pär; Ask, Per; Andersson, Jens; Persson, Lennart

2013-01-01

Occurrence of cannibalism and inferior competitive ability of predators compared to their prey have been suggested to promote coexistence in size-structured intraguild predation (IGP) systems. The intrinsic size-structure of fish provides the necessary prerequisites to test whether the above mechanisms are general features of species interactions in fish communities where IGP is common. We first experimentally tested whether Arctic char (Salvelinus alpinus) were more efficient as a cannibal than as an interspecific predator on the prey fish ninespine stickleback (Pungitius pungitius) and whether ninespine stickleback were a more efficient competitor on the shared zooplankton prey than its predator, Arctic char. Secondly, we performed a literature survey to evaluate if piscivores in general are more efficient as cannibals than as interspecific predators and whether piscivores are inferior competitors on shared resources compared to their prey fish species. Both controlled pool experiments and outdoor pond experiments showed that char imposed a higher mortality on YOY char than on ninespine sticklebacks, suggesting that piscivorous char is a more efficient cannibal than interspecific predator. Estimates of size dependent attack rates on zooplankton further showed a consistently higher attack rate of ninespine sticklebacks compared to similar sized char on zooplankton, suggesting that ninespine stickleback is a more efficient competitor than char on zooplankton resources. The literature survey showed that piscivorous top consumers generally selected conspecifics over interspecific prey, and that prey species are competitively superior compared to juvenile piscivorous species in the zooplankton niche. We suggest that the observed selectivity for cannibal prey over interspecific prey and the competitive advantage of prey species over juvenile piscivores are common features in fish communities and that the observed selectivity for cannibalism over interspecific prey has
Stocking activities for the Arctic charr in Lake Geneva: Genetic effects in space and time.

Science.gov (United States)

Savary, Romain; Dufresnes, Christophe; Champigneulle, Alexis; Caudron, Arnaud; Dubey, Sylvain; Perrin, Nicolas; Fumagalli, Luca

2017-07-01

Artificial stocking practices are widely used by resource managers worldwide, in order to sustain fish populations exploited by both recreational and commercial activities, but their benefits are controversial. Former practices involved exotic strains, although current programs rather consider artificial breeding of local fishes (supportive breeding). Understanding the complex genetic effects of these management strategies is an important challenge with economic and conservation implications, especially in the context of population declines. In this study, we focus on the declining Arctic charr (Salvelinus alpinus) population from Lake Geneva (Switzerland and France), which was initially restocked with allochthonous fishes in the early eighties, followed by supportive breeding. In this context, we conducted a genetic survey to document the evolution of the genetic diversity and structure throughout the last 50 years, before and after the initiation of hatchery supplementation, using contemporary and historical samples. We show that the introduction of exotic fishes was associated with a genetic bottleneck in the 1980-1990s, a break of Hardy-Weinberg Equilibrium (HWE), a reduction in genetic diversity, an increase in genetic structure among spawning sites, and a change in their genetic composition. Together with better environmental conditions, three decades of subsequent supportive breeding using local fishes made it possible to re-establish HWE and the initial levels of genetic variation. However, current spawning sites have not fully recovered their original genetic composition and were extensively homogenized across the lake. Our study demonstrates the drastic genetic consequences of different restocking tactics in a comprehensive spatiotemporal framework and suggests that genetic alteration by nonlocal stocking may be partly reversible through supportive breeding. We recommend that conservation-based programs consider local diversity and implement adequate
NEW WEB-BASED ACCESS TO NUCLEAR STRUCTURE DATASETS.

Energy Technology Data Exchange (ETDEWEB)

WINCHELL,D.F.

2004-09-26

As part of an effort to migrate the National Nuclear Data Center (NNDC) databases to a relational platform, a new web interface has been developed for the dissemination of the nuclear structure datasets stored in the Evaluated Nuclear Structure Data File and Experimental Unevaluated Nuclear Data List.
Cross-Dataset Analysis and Visualization Driven by Expressive Web Services

Science.gov (United States)

Alexandru Dumitru, Mircea; Catalin Merticariu, Vlad

2015-04-01

The deluge of data that is hitting us every day from satellite and airborne sensors is changing the workflow of environmental data analysts and modelers. Web geo-services now play a fundamental role: it is no longer necessary to download and store the data beforehand, because the services interact in real time with GIS applications. Due to the very large amount of data that is curated and made available by web services, it is crucial to deploy smart solutions for optimizing network bandwidth, reducing duplication of data and moving the processing closer to the data. In this context we have created a visualization application for analysis and cross-comparison of aerosol optical thickness datasets. The application aims to help researchers identify and visualize discrepancies between datasets coming from various sources, having different spatial and time resolutions. It also acts as a proof of concept for integration of OGC Web Services under a user-friendly interface that provides beautiful visualizations of the explored data. The tool was built on top of the World Wind engine, a Java based virtual globe built by NASA and the open source community. For data retrieval and processing we exploited the OGC Web Coverage Service potential: the most exciting aspect being its processing extension, a.k.a. the OGC Web Coverage Processing Service (WCPS) standard. A WCPS-compliant service allows a client to execute a processing query on any coverage offered by the server. By exploiting a full grammar, several different kinds of information can be retrieved from one or more datasets together: scalar condensers, cross-sectional profiles, comparison maps and plots, etc. This combination of technology made the application versatile and portable. As the processing is done on the server-side, we ensured that the minimal amount of data is transferred and that the processing is done on a fully-capable server, leaving the client hardware resources to be used for rendering the visualization
The Role of Datasets on Scientific Influence within Conflict Research.

Science.gov (United States)

Van Holt, Tracy; Johnson, Jeffery C; Moates, Shiloh; Carley, Kathleen M

2016-01-01

We inductively tested if a coherent field of inquiry in human conflict research emerged in an analysis of published research involving "conflict" in the Web of Science (WoS) over a 66-year period (1945-2011). We created a citation network that linked the 62,504 WoS records and their cited literature. We performed a critical path analysis (CPA), a specialized social network analysis on this citation network (~1.5 million works), to highlight the main contributions in conflict research and to test if research on conflict has in fact evolved to represent a coherent field of inquiry. Out of this vast dataset, 49 academic works were highlighted by the CPA suggesting a coherent field of inquiry, which means that researchers in the field acknowledge seminal contributions and share a common knowledge base. Other conflict concepts that were also analyzed (such as interpersonal conflict or conflict among pharmaceuticals, for example) did not form their own CP. A single path formed, meaning that there was a cohesive set of ideas that built upon previous research. This is in contrast to a main path analysis of conflict from 1957-1971 where ideas didn't persist in that multiple paths existed and died or emerged reflecting lack of scientific coherence (Carley, Hummon, and Harty, 1993). The critical path consisted of a number of key features: 1) Concepts that built throughout include the notion that resource availability drives conflict, which emerged in the 1960s-1990s and continued on until 2011. More recent intrastate studies that focused on inequalities emerged from interstate studies on the democracy of peace earlier on the path. 2) Recent research on the path focused on forecasting conflict, which depends on well-developed metrics and theories to model. 3) We used keyword analysis to independently show how the CP was topically linked (i.e., through democracy, modeling, resources, and geography). Publicly available conflict datasets developed early on helped shape the
SatelliteDL: a Toolkit for Analysis of Heterogeneous Satellite Datasets

Science.gov (United States)

Galloy, M. D.; Fillmore, D.

2014-12-01

SatelliteDL is an IDL toolkit for the analysis of satellite Earth observations from a diverse set of platforms and sensors. The core function of the toolkit is the spatial and temporal alignment of satellite swath and geostationary data. The design features an abstraction layer that allows for easy inclusion of new datasets in a modular way. Our overarching objective is to create utilities that automate the mundane aspects of satellite data analysis, are extensible and maintainable, and do not place limitations on the analysis itself. IDL has a powerful suite of statistical and visualization tools that can be used in conjunction with SatelliteDL. Toward this end we have constructed SatelliteDL to include (1) HTML and LaTeX API document generation, (2) a unit test framework, (3) automatic message and error logs, (4) HTML and LaTeX plot and table generation, and (5) several real world examples with bundled datasets available for download. For ease of use, datasets, variables and optional workflows may be specified in a flexible format configuration file. Configuration statements may specify, for example, a region and date range, and the creation of images, plots and statistical summary tables for a long list of variables. SatelliteDL enforces data provenance; all data should be traceable and reproducible. The output NetCDF file metadata holds a complete history of the original datasets and their transformations, and a method exists to reconstruct a configuration file from this information. Release 0.1.0 distributes with ingest methods for GOES, MODIS, VIIRS and CERES radiance data (L1) as well as select 2D atmosphere products (L2) such as aerosol and cloud (MODIS and VIIRS) and radiant flux (CERES). Future releases will provide ingest methods for ocean and land surface products, gridded and time averaged datasets (L3 Daily, Monthly and Yearly), and support for 3D products such as temperature and water vapor profiles. Emphasis will be on NPP Sensor, Environmental and
Augmented Reality Prototype for Visualizing Large Sensors’ Datasets

Directory of Open Access Journals (Sweden)

Folorunso Olufemi A.

2011-04-01

This paper addressed the development of an augmented reality (AR) based scientific visualization system prototype that supports identification, localisation, and 3D visualisation of oil leakage sensor datasets. Sensors generate a significant amount of multivariate data during normal and leak situations, which makes data exploration and visualisation daunting tasks. Therefore, a model to manage such data and enhance the computational support needed for effective exploration is developed in this paper. A challenge of this approach is to reduce the data inefficiency. This paper presented a model for computing the information gain of each data attribute and determining a lead attribute. The computed lead attribute is then used for the development of an AR-based scientific visualization interface which automatically identifies, localises and visualizes all necessary data relevant to a particular selected region of interest (ROI) on the network. Necessary architectural system supports and the interface requirements for such visualizations are also presented.
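As an illustration of the idea summarized in this abstract, the sketch below computes the standard Shannon information gain of each attribute with respect to a class label and picks the attribute with the highest gain as the "lead attribute". This is only an assumption about how such a selection could work: the abstract does not give the paper's actual model, and the attribute names, values, and function names here are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of `target` minus its expected entropy after splitting on `attribute`."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical sensor readings; attribute names and values are made up.
readings = [
    {"pressure": "low", "flow": "high", "state": "leak"},
    {"pressure": "low", "flow": "low", "state": "leak"},
    {"pressure": "high", "flow": "high", "state": "normal"},
    {"pressure": "high", "flow": "low", "state": "normal"},
]

gains = {a: information_gain(readings, a, "state") for a in ("pressure", "flow")}
# Assumed selection rule: the lead attribute is the one with the highest gain.
lead_attribute = max(gains, key=gains.get)
```

In this toy data, "pressure" separates the two states perfectly while "flow" carries no information about them, so "pressure" is selected.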
Associating uncertainty with datasets using Linked Data and allowing propagation via provenance chains

Science.gov (United States)

Car, Nicholas; Cox, Simon; Fitch, Peter

2015-04-01

With earth-science datasets increasingly being published to enable re-use in projects disassociated from the original data acquisition or generation, there is an urgent need for associated metadata to be connected, in order to guide their application. In particular, provenance traces should support the evaluation of data quality and reliability. However, while standards for describing provenance are emerging (e.g. PROV-O), these do not include the necessary statistical descriptors and confidence assessments. UncertML has a mature conceptual model that may be used to record uncertainty metadata. However, by itself UncertML does not support the representation of uncertainty of multi-part datasets, and provides no direct way of associating the uncertainty information - metadata in relation to a dataset - with dataset objects. We present a method to address both these issues by combining UncertML with PROV-O, and delivering resulting uncertainty-enriched provenance traces through the Linked Data API. UncertProv extends the PROV-O provenance ontology with an RDF formulation of the UncertML conceptual model elements, adds further elements to support uncertainty representation without a conceptual model and the integration of UncertML through links to documents. The Linked ID API provides a systematic way of navigating from dataset objects to their UncertProv metadata and back again. The Linked Data API's 'views' capability enables access to UncertML and non-UncertML uncertainty metadata representations for a dataset. With this approach, it is possible to access and navigate the uncertainty metadata associated with a published dataset using standard semantic web tools, such as SPARQL queries. Where the uncertainty data follows the UncertML model it can be automatically interpreted and may also support automatic uncertainty propagation. Repositories wishing to enable uncertainty propagation for all datasets must ensure that all elements that are associated with uncertainty
Datasets collected in general practice: an international comparison using the example of obesity.

Science.gov (United States)

Sturgiss, Elizabeth; van Boven, Kees

2018-06-04

International datasets from general practice enable the comparison of how conditions are managed within consultations in different primary healthcare settings. The Australian Bettering the Evaluation and Care of Health (BEACH) and TransHIS from the Netherlands collect in-consultation general practice data that have been used extensively to inform local policy and practice. Obesity is a global health issue with different countries applying varying approaches to management. The objective of the present paper is to compare the primary care management of obesity in Australia and the Netherlands using data collected from consultations. Despite the different prevalence in obesity in the two countries, the number of patients per 1000 patient-years seen with obesity is similar. Patients in Australia with obesity are referred to allied health practitioners more often than Dutch patients. Without quality general practice data, primary care researchers will not have data about the management of conditions within consultations. We use obesity to highlight the strengths of these general practice data sources and to compare their differences. What is known about the topic? Australia had one of the longest-running consecutive datasets about general practice activity in the world, but it has recently lost government funding. The Netherlands has a longitudinal general practice dataset of information collected within consultations since 1985. What does this paper add? We discuss the benefits of general practice-collected data in two countries. Using obesity as a case example, we compare management in general practice between Australia and the Netherlands. This type of analysis should start all international collaborations of primary care management of any health condition. Having a national general practice dataset allows international comparisons of the management of conditions with primary care. Without a current, quality general practice dataset, primary care researchers will not

A Dataset of Three Educational Technology Experiments on Differentiation, Formative Testing and Feedback

Science.gov (United States)

Haelermans, Carla; Ghysels, Joris; Prince, Fernao

2015-01-01

This paper describes a dataset with data from three individually randomized educational technology experiments on differentiation, formative testing and feedback during one school year for a group of 8th grade students in the Netherlands, using administrative data and the online motivation questionnaire of Boekaerts. The dataset consists of pre-…
New public dataset for spotting patterns in medieval document images

Science.gov (United States)

En, Sovann; Nicolas, Stéphane; Petitjean, Caroline; Jurie, Frédéric; Heutte, Laurent

2017-01-01

With advances in technology, a large part of our cultural heritage is becoming digitally available. In particular, in the field of historical document image analysis, there is now a growing need for indexing and data mining tools, thus allowing us to spot and retrieve the occurrences of an object of interest, called a pattern, in a large database of document images. Patterns may present some variability in terms of color, shape, or context, making the spotting of patterns a challenging task. Pattern spotting is a relatively new field of research, still hampered by the lack of available annotated resources. We present a new publicly available dataset named DocExplore dedicated to spotting patterns in historical document images. The dataset contains 1500 images and 1464 queries, and allows the evaluation of two tasks: image retrieval and pattern localization. A standardized benchmark protocol along with ad hoc metrics is provided for a fair comparison of the submitted approaches. We also provide some first results obtained with our baseline system on this new dataset, which show that there is room for improvement and that should encourage researchers of the document image analysis community to design new systems and submit improved results.
Common integration sites of published datasets identified using a graph-based framework

Directory of Open Access Journals (Sweden)

Alessandro Vasciaveo

2016-01-01

With next-generation sequencing, the genomic data available for the characterization of integration sites (IS) has dramatically increased. At present, in a single experiment, several thousand viral integration genome targets can be investigated to define genomic hot spots. In a previous article, we reworked a formal CIS analysis based on a rigid fixed window demarcation into a more flexible definition grounded on graphs. Here, we present a selection of supporting data related to the graph-based framework (GBF) from our previous article, in which a collection of common integration sites (CIS) was identified on six published datasets. In this work, we will focus on two datasets, ISRTCGD and ISHIV, which have been previously discussed. Moreover, we show in more detail the workflow design that originates the datasets.
Advanced Neuropsychological Diagnostics Infrastructure (ANDI): A Normative Database Created from Control Datasets.

Science.gov (United States)

de Vent, Nathalie R; Agelink van Rentergem, Joost A; Schmand, Ben A; Murre, Jaap M J; Huizenga, Hilde M

2016-01-01

In the Advanced Neuropsychological Diagnostics Infrastructure (ANDI), datasets of several research groups are combined into a single database, containing scores on neuropsychological tests from healthy participants. For most popular neuropsychological tests the quantity and range of these data surpass those of traditional normative data, thereby enabling more accurate neuropsychological assessment. Because of the unique structure of the database, it facilitates normative comparison methods that were not feasible before, in particular those in which entire profiles of scores are evaluated. In this article, we describe the steps that were necessary to combine the separate datasets into a single database. These steps involve matching variables from multiple datasets, removing outlying values, determining the influence of demographic variables, and finding appropriate transformations to normality. Also, a brief description of the current contents of the ANDI database is given.
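The outlier-removal and normative-comparison steps named in this record can be illustrated with a minimal sketch. It assumes a plain z-score against a control sample after trimming values beyond a fixed number of standard deviations; the function names, the cutoff, and the control scores are hypothetical and are not taken from ANDI.

```python
from statistics import mean, stdev

def trim_outliers(scores, cutoff=3.0):
    """Drop scores more than `cutoff` sample SDs from the mean (hypothetical rule)."""
    m, s = mean(scores), stdev(scores)
    return [x for x in scores if abs(x - m) <= cutoff * s]

def normative_z(patient_score, control_scores, cutoff=3.0):
    """z-score of one patient score relative to the trimmed control sample."""
    clean = trim_outliers(control_scores, cutoff)
    return (patient_score - mean(clean)) / stdev(clean)

# Hypothetical control scores on a single test.
controls = [48, 50, 52, 49, 51, 50, 47, 53, 50, 50]
z = normative_z(44, controls)
print(f"z = {z:.2f}")
```

A real normative comparison would also adjust for demographic variables and transform scores towards normality first, as the abstract notes; both steps are omitted here.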
Fulltext PDF

Indian Academy of Sciences (India)

holding a quivering mouse deer (Tragulus memmina), Asia's smallest deer, in its mouth. .... pack members waiting in the open area. A frequent manoeuvre ... The Whistling Hunters: Field Studies of the Asiatic Wild Dog. (Cuon alpinus).
GLEAM version 3: Global Land Evaporation Datasets and Model

Science.gov (United States)

Martens, B.; Miralles, D. G.; Lievens, H.; van der Schalie, R.; de Jeu, R.; Fernandez-Prieto, D.; Verhoest, N.

2015-12-01

Terrestrial evaporation links energy, water and carbon cycles over land and is therefore a key variable of the climate system. However, the global-scale magnitude and variability of the flux, and the sensitivity of the underlying physical process to changes in environmental factors, are still poorly understood due to limitations in in situ measurements. As a result, several methods have risen to estimate global patterns of land evaporation from satellite observations. However, these algorithms generally differ in their approach to model evaporation, resulting in large differences in their estimates. One of these methods is GLEAM, the Global Land Evaporation: the Amsterdam Methodology. GLEAM estimates terrestrial evaporation based on daily satellite observations of meteorological variables, vegetation characteristics and soil moisture. Since the publication of the first version of the algorithm (2011), the model has been widely applied to analyse trends in the water cycle and land-atmospheric feedbacks during extreme hydrometeorological events. A third version of the GLEAM global datasets is foreseen by the end of 2015. Given the relevance of having a continuous and reliable record of global-scale evaporation estimates for climate and hydrological research, the establishment of an online data portal to host these data to the public is also foreseen. In this new release of the GLEAM datasets, different components of the model have been updated, with the most significant change being the revision of the data assimilation algorithm. In this presentation, we will highlight the most important changes of the methodology and present three new GLEAM datasets and their validation against in situ observations and an alternative dataset of terrestrial evaporation (ERA-Land). Results of the validation exercise indicate that the magnitude and the spatiotemporal variability of the modelled evaporation agree reasonably well with the estimates of ERA-Land and the in situ
Dataset of herbarium specimens of threatened vascular plants in Catalonia.

Science.gov (United States)

Nualart, Neus; Ibáñez, Neus; Luque, Pere; Pedrol, Joan; Vilar, Lluís; Guàrdia, Roser

2017-01-01

This data paper describes a dataset of specimens of Catalonian threatened vascular plants conserved in five public Catalonian herbaria (BC, BCN, HGI, HBIL and MTTE). Catalonia is an administrative region of Spain that hosts a large diversity of autochthonous plants and 199 taxa in IUCN threatened categories (EX, EW, RE, CR, EN and VU). This dataset includes 1,618 records collected from the 17th century to the present. For each specimen, the species name, locality indication, collection date, collector, ecology and revision label are recorded. More than 94% of the taxa are represented in the herbaria, which demonstrates the role of botanical collections as an essential source of occurrence data.
A robust post-processing workflow for datasets with motion artifacts in diffusion kurtosis imaging.

Science.gov (United States)

Li, Xianjun; Yang, Jian; Gao, Jie; Luo, Xue; Zhou, Zhenyu; Hu, Yajie; Wu, Ed X; Wan, Mingxi

2014-01-01

The aim of this study was to develop a robust post-processing workflow for motion-corrupted datasets in diffusion kurtosis imaging (DKI). The proposed workflow consisted of brain extraction, rigid registration, distortion correction, artifact rejection, spatial smoothing and tensor estimation. Rigid registration was utilized to correct misalignments. Motion artifacts were rejected by using the local Pearson correlation coefficient (LPCC). The performance of LPCC in characterizing relative differences between artifacts and artifact-free images was compared with that of the conventional correlation coefficient in 10 randomly selected DKI datasets. The influence of rejected artifacts with information of gradient directions and b values for the parameter estimation was investigated by using mean square error (MSE). The variance of noise was used as the criterion for MSEs. The clinical practicality of the proposed workflow was evaluated by the image quality and measurements in regions of interest on 36 DKI datasets, including 18 artifact-free (18 pediatric subjects) and 18 motion-corrupted datasets (15 pediatric subjects and 3 essential tremor patients). The relative difference between artifacts and artifact-free images calculated by LPCC was larger than that of the conventional correlation coefficient. The workflow improved the image quality and reduced the measurement biases significantly on motion-corrupted datasets. The workflow reliably improved the image quality and the measurement precision of the derived parameters on motion-corrupted DKI datasets. The workflow provided an effective post-processing method for clinical applications of DKI in subjects with involuntary movements.
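The artifact-rejection step in this record can be illustrated with a minimal sketch of correlation-based rejection. The abstract does not define LPCC, so the patch-wise averaging, the threshold of 0.8, and all function names below are assumptions made for illustration, with images represented as flattened lists.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # constant patch: correlation undefined, treated as 0 here
    return cov / sqrt(va * vb)

def local_pearson(img, ref, patch=4):
    """Mean of patch-wise correlations between a flattened image and a reference."""
    scores = [pearson(img[i:i + patch], ref[i:i + patch])
              for i in range(0, len(img) - patch + 1, patch)]
    return sum(scores) / len(scores)

def reject_artifacts(volumes, ref, threshold=0.8):
    """Return indices of volumes whose local correlation falls below the threshold."""
    return [k for k, v in enumerate(volumes) if local_pearson(v, ref) < threshold]

ref = [1, 2, 3, 4, 2, 4, 6, 8]
good = [2, 4, 6, 8, 1, 2, 3, 4]  # same local pattern, different intensity scale
bad = [4, 3, 2, 1, 8, 6, 4, 2]   # reversed pattern within each patch
rejected = reject_artifacts([good, bad], ref)
print(rejected)
```

Averaging correlations over patches rather than computing one global coefficient keeps the score insensitive to intensity scaling that differs between regions, which is one plausible reason a local measure separates artifacts more clearly.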
Using Real Datasets for Interdisciplinary Business/Economics Projects

Science.gov (United States)

Goel, Rajni; Straight, Ronald L.

2005-01-01

The workplace's global and dynamic nature allows and requires improved approaches for providing business and economics education. In this article, the authors explore ways of enhancing students' understanding of course material by using nontraditional, real-world datasets of particular interest to them. Teaching at a historically Black university,…
The Most Common Geometric and Semantic Errors in CityGML Datasets

Science.gov (United States)

Biljecki, F.; Ledoux, H.; Du, X.; Stoter, J.; Soon, K. H.; Khoo, V. H. S.

2016-10-01

To be used as input in most simulation and modelling software, 3D city models should be geometrically and topologically valid, and semantically rich. We investigate in this paper what is the quality of currently available CityGML datasets, i.e. we validate the geometry/topology of the 3D primitives (Solid and MultiSurface), and we validate whether the semantics of the boundary surfaces of buildings is correct or not. We have analysed all the CityGML datasets we could find, both from portals of cities and on different websites, plus a few that were made available to us. We have thus validated 40M surfaces in 16M 3D primitives and 3.6M buildings found in 37 CityGML datasets originating from 9 countries, and produced by several companies with diverse software and acquisition techniques. The results indicate that CityGML datasets without errors are rare, and those that are nearly valid are mostly simple LOD1 models. We report on the most common errors we have found, and analyse them. One main observation is that many of these errors could be automatically fixed or prevented with simple modifications to the modelling software. Our principal aim is to highlight the most common errors so that these are not repeated in the future. We hope that our paper and the open-source software we have developed will help raise awareness for data quality among data providers and 3D GIS software producers.
Spatially continuous dataset at local scale of Taita Hills in Kenya and Mount Kilimanjaro in Tanzania

Directory of Open Access Journals (Sweden)

Sizah Mwalusepo

2016-09-01

Climate change is a global concern, requiring local-scale, spatially continuous datasets and modeling of meteorological variables. This dataset article provides interpolated temperature, rainfall and relative humidity datasets at local scale along the Taita Hills and Mount Kilimanjaro altitudinal gradients in Kenya and Tanzania, respectively. The temperature and relative humidity were recorded hourly using automatic onset THHOBO data loggers and rainfall was recorded daily using GENERALR wireless rain gauges. Thin plate spline (TPS) was used to interpolate, with the degree of data smoothing determined by minimizing the generalized cross validation. The dataset provides information on the status of the current climatic conditions along the two mountainous altitudinal gradients in Kenya and Tanzania. The dataset will, thus, enhance future research. Keywords: Spatial climate data, Climate change, Modeling, Local scale
The impact of the resolution of meteorological datasets on catchment-scale drought studies

Science.gov (United States)

Hellwig, Jost; Stahl, Kerstin

2017-04-01

Gridded meteorological datasets provide the basis to study drought at a range of scales, including catchment scale drought studies in hydrology. They are readily available to study past weather conditions and often serve real time monitoring as well. As these datasets differ in spatial/temporal coverage and spatial/temporal resolution, for most studies there is a tradeoff between these features. Our investigation examines whether biases occur when studying drought on catchment scale with low resolution input data. For that, a comparison among the datasets HYRAS (covering Central Europe, 1x1 km grid, daily data, 1951 - 2005), E-OBS (Europe, 0.25° grid, daily data, 1950-2015) and GPCC (whole world, 0.5° grid, monthly data, 1901 - 2013) is carried out. Generally, biases in precipitation increase with decreasing resolution. The most important variations are found during summer. In the low mountain ranges of Central Europe, the datasets of sparse resolution (E-OBS, GPCC) overestimate dry days and underestimate total precipitation since they are not able to describe high spatial variability. However, relative measures like the correlation coefficient reveal good consistencies of dry and wet periods, both for absolute precipitation values and standardized indices like the Standardized Precipitation Index (SPI) or Standardized Precipitation Evaporation Index (SPEI). In particular, the most severe droughts derived from the different datasets match very well. These results indicate that absolute values of sparse resolution datasets applied to catchment scale might be critical to use for an assessment of the hydrological drought at catchment scale, whereas relative measures for determining periods of drought are more trustworthy. Therefore, studies on drought that downscale meteorological data should carefully consider their data needs and focus on relative measures for dry periods if sufficient for the task.
Handling limited datasets with neural networks in medical applications: A small-data approach.

Science.gov (United States)

Shaikhina, Torgyn; Khovanova, Natalia A

2017-01-01

Single-centre studies in the medical domain are often characterised by limited samples due to the complexity and high costs of patient data collection. Machine learning methods for regression modelling of small datasets (less than 10 observations per predictor variable) remain scarce. Our work bridges this gap by developing a novel framework for application of artificial neural networks (NNs) for regression tasks involving small medical datasets. In order to address the sporadic fluctuations and validation issues that appear in regression NNs trained on small datasets, the method of multiple runs and surrogate data analysis were proposed in this work. The approach was compared to the state-of-the-art ensemble NNs; the effect of dataset size on NN performance was also investigated. The proposed framework was applied for the prediction of compressive strength (CS) of femoral trabecular bone in patients suffering from severe osteoarthritis. The NN model was able to estimate the CS of osteoarthritic trabecular bone from its structural and biological properties with a standard error of 0.85 MPa. When evaluated on independent test samples, the NN achieved accuracy of 98.3%, outperforming an ensemble NN model by 11%. We reproduce this result on CS data of another porous solid (concrete) and demonstrate that the proposed framework allows for an NN modelled with as few as 56 samples to generalise on 300 independent test samples with 86.5% accuracy, which is comparable to the performance of an NN developed with an 18 times larger dataset (1030 samples). The significance of this work is two-fold: the practical application allows for non-destructive prediction of bone fracture risk, while the novel methodology extends beyond the task considered in this study and provides a general framework for application of regression NNs to medical problems characterised by limited dataset sizes. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
ProDaMa: an open source Python library to generate protein structure datasets

Directory of Open Access Journals (Sweden)

Manconi Andrea

2009-10-01

Abstract Background The huge difference between the number of known sequences and known tertiary structures has justified the use of automated methods for protein analysis. Although a general methodology to solve these problems has not yet been devised, researchers are engaged in developing more accurate techniques and algorithms whose training plays a relevant role in determining their performance. From this perspective, particular importance is given to the training data used in experiments, and researchers are often engaged in the generation of specialized datasets that meet their requirements. Findings To facilitate the task of generating specialized datasets we devised and implemented ProDaMa, an open source Python library that provides classes for retrieving, organizing, updating, analyzing, and filtering protein data. Conclusion ProDaMa has been used to generate specialized datasets useful for secondary structure prediction and to develop a collaborative web application aimed at generating and sharing protein structure datasets. The library, the related database, and the documentation are freely available at the URL http://iasc.diee.unica.it/prodama.
Elsevier’s approach to the bioCADDIE 2016 Dataset Retrieval Challenge

Science.gov (United States)

Scerri, Antony; Kuriakose, John; Deshmane, Amit Ajit; Stanger, Mark; Moore, Rebekah; Naik, Raj; de Waard, Anita

2017-01-01

Abstract We developed a two-stream, Apache Solr-based information retrieval system in response to the bioCADDIE 2016 Dataset Retrieval Challenge. One stream was based on the principle of word embeddings, the other was rooted in ontology based indexing. Despite encountering several issues in the data, the evaluation procedure and the technologies used, the system performed quite well. We provide some pointers towards future work: in particular, we suggest that more work in query expansion could benefit future biomedical search engines. Database URL: https://data.mendeley.com/datasets/zd9dxpyybg/1 PMID:29220454
Multimedia Content Development as a Facial Expression Datasets for Recognition of Human Emotions

Science.gov (United States)

Mamonto, N. E.; Maulana, H.; Liliana, D. Y.; Basaruddin, T.

2018-02-01

Previously developed datasets contain facial expressions from foreign people. The development of multimedia content aims to address the problems experienced by the research team and by other researchers who will conduct similar research. The multimedia content was developed as a facial expression dataset for human emotion recognition using the Villamil-Molina version of the multimedia development method. The content was developed with 10 subjects (talents); each talent performed 3 shots and had to demonstrate 19 facial expressions. After editing and rendering, tests were carried out, leading to the conclusion that the multimedia content can be used as a facial expression dataset for recognition of human emotions.
Microscopy Image Browser: A Platform for Segmentation and Analysis of Multidimensional Datasets.

Directory of Open Access Journals (Sweden)

Ilya Belevich

2016-01-01

Understanding the structure-function relationship of cells and organelles in their natural context requires multidimensional imaging. As techniques for multimodal 3-D imaging have become more accessible, effective processing, visualization, and analysis of large datasets are posing a bottleneck for the workflow. Here, we present a new software package for high-performance segmentation and image processing of multidimensional datasets that improves and facilitates the full utilization and quantitative analysis of acquired data, which is freely available from a dedicated website. The open-source environment enables modification and insertion of new plug-ins to customize the program for specific needs. We provide practical examples of program features used for processing, segmentation and analysis of light and electron microscopy datasets, and detailed tutorials to enable users to rapidly and thoroughly learn how to use the program.
Advanced Neuropsychological Diagnostics Infrastructure (ANDI): A Normative Database Created from Control Datasets.

Directory of Open Access Journals (Sweden)

Nathalie R. de Vent

2016-10-01

In the Advanced Neuropsychological Diagnostics Infrastructure (ANDI), datasets of several research groups are combined into a single database, containing scores on neuropsychological tests from healthy participants. For most popular neuropsychological tests the quantity and range of these data surpasses that of traditional normative data, thereby enabling more accurate neuropsychological assessment. Because of the unique structure of the database, it facilitates normative comparison methods that were not feasible before, in particular those in which entire profiles of scores are evaluated. In this article, we describe the steps that were necessary to combine the separate datasets into a single database. These steps involve matching variables from multiple datasets, removing outlying values, determining the influence of demographic variables, and finding appropriate transformations to normality. Also, a brief description of the current contents of the ANDI database is given.
Resolution testing and limitations of geodetic and tsunami datasets for finite fault inversions along subduction zones

Science.gov (United States)

Williamson, A.; Newman, A. V.

2017-12-01

Finite fault inversions utilizing multiple datasets have become commonplace for large earthquakes pending data availability. The mixture of geodetic datasets such as Global Navigational Satellite Systems (GNSS) and InSAR, seismic waveforms, and when applicable, tsunami waveforms from Deep-Ocean Assessment and Reporting of Tsunami (DART) gauges, provide slightly different observations that when incorporated together lead to a more robust model of fault slip distribution. The merging of different datasets is of particular importance along subduction zones where direct observations of seafloor deformation over the rupture area are extremely limited. Instead, instrumentation measures related ground motion from tens to hundreds of kilometers away. The distance from the event and dataset type can lead to a variable degree of resolution, affecting the ability to accurately model the spatial distribution of slip. This study analyzes the spatial resolution attained individually from geodetic and tsunami datasets as well as in a combined dataset. We constrain the importance of distance between estimated parameters and observed data and how that varies between land-based and open ocean datasets. Analysis focuses on accurately scaled subduction zone synthetic models as well as analysis of the relationship between slip and data in recent large subduction zone earthquakes. This study shows that seafloor deformation sensitive datasets, like open-ocean tsunami waveforms or seafloor geodetic instrumentation, can provide unique offshore resolution for understanding most large and particularly tsunamigenic megathrust earthquake activity. In most environments, we simply lack the capability to resolve static displacements using land-based geodetic observations.
Dataset of Phenology of Mediterranean high-mountain meadows flora (Sierra Nevada, Spain)

OpenAIRE

Antonio Jesús Pérez-Luque; Cristina Patricia Sánchez-Rojas; Regino Zamora; Ramón Pérez-Pérez; Francisco Javier Bonet

2015-01-01

Abstract Sierra Nevada mountain range (southern Spain) hosts a high number of endemic plant species, being one of the most important biodiversity hotspots in the Mediterranean basin. The high-mountain meadow ecosystems (borreguiles) harbour a large number of endemic and threatened plant species. In this data paper, we describe a dataset of the flora inhabiting this threatened ecosystem in this Mediterranean mountain. The dataset includes occurrence data for flora collected in those ecosystems...

An integrated pan-tropical biomass map using multiple reference datasets.

Science.gov (United States)

Avitabile, Valerio; Herold, Martin; Heuvelink, Gerard B M; Lewis, Simon L; Phillips, Oliver L; Asner, Gregory P; Armston, John; Ashton, Peter S; Banin, Lindsay; Bayol, Nicolas; Berry, Nicholas J; Boeckx, Pascal; de Jong, Bernardus H J; DeVries, Ben; Girardin, Cecile A J; Kearsley, Elizabeth; Lindsell, Jeremy A; Lopez-Gonzalez, Gabriela; Lucas, Richard; Malhi, Yadvinder; Morel, Alexandra; Mitchard, Edward T A; Nagy, Laszlo; Qie, Lan; Quinones, Marcela J; Ryan, Casey M; Ferry, Slik J W; Sunderland, Terry; Laurin, Gaia Vaglio; Gatti, Roberto Cazzolla; Valentini, Riccardo; Verbeeck, Hans; Wijaya, Arief; Willcock, Simon

2016-04-01

We combined two existing datasets of vegetation aboveground biomass (AGB) (Proceedings of the National Academy of Sciences of the United States of America, 108, 2011, 9899; Nature Climate Change, 2, 2012, 182) into a pan-tropical AGB map at 1-km resolution using an independent reference dataset of field observations and locally calibrated high-resolution biomass maps, harmonized and upscaled to 14 477 1-km AGB estimates. Our data fusion approach uses bias removal and weighted linear averaging that incorporates and spatializes the biomass patterns indicated by the reference data. The method was applied independently in areas (strata) with homogeneous error patterns of the input (Saatchi and Baccini) maps, which were estimated from the reference data and additional covariates. Based on the fused map, we estimated AGB stock for the tropics (23.4 N-23.4 S) of 375 Pg dry mass, 9-18% lower than the Saatchi and Baccini estimates. The fused map also showed differing spatial patterns of AGB over large areas, with higher AGB density in the dense forest areas in the Congo basin, Eastern Amazon and South-East Asia, and lower values in Central America and in most dry vegetation areas of Africa than either of the input maps. The validation exercise, based on 2118 estimates from the reference dataset not used in the fusion process, showed that the fused map had a RMSE 15-21% lower than that of the input maps and, most importantly, nearly unbiased estimates (mean bias 5 Mg dry mass ha⁻¹ vs. 21 and 28 Mg ha⁻¹ for the input maps). The fusion method can be applied at any scale including the policy-relevant national level, where it can provide improved biomass estimates by integrating existing regional biomass maps as input maps and additional, country-specific reference datasets. © 2015 John Wiley & Sons Ltd.
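The fusion idea in this record (bias removal followed by weighted linear averaging) can be illustrated with a minimal sketch for a single map cell. The abstract does not state how the weights are derived, so inverse-error-variance weights are assumed here; the function name and all numeric values are hypothetical.

```python
def fuse(estimates, biases, error_variances):
    """Bias-correct each input estimate, then average with inverse-variance weights.

    estimates, biases and error_variances are parallel lists, one entry per
    input map, all referring to the same cell and stratum.
    """
    corrected = [e - b for e, b in zip(estimates, biases)]
    weights = [1.0 / v for v in error_variances]  # assumed weighting scheme
    total = sum(weights)
    return sum(w * c for w, c in zip(weights, corrected)) / total

# Hypothetical AGB values (Mg/ha) for one 1-km cell from two input maps.
fused = fuse(estimates=[220.0, 180.0],
             biases=[20.0, -10.0],
             error_variances=[400.0, 100.0])
print(round(fused, 1))
```

In this sketch the biases and error variances would be estimated per stratum from the reference data, so the map that is more reliable in a given stratum dominates the fused value there.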
Accounting for inertia in modal choices: some new evidence using a RP/SP dataset

DEFF Research Database (Denmark)

Cherchi, Elisabetta; Manca, Francesco

2011-01-01

effect is stable along the SP experiments. Inertia has been studied more extensively with panel datasets, but few investigations have used RP/SP datasets. In this paper we extend previous work in several ways. We test and compare several ways of measuring inertia, including measures that have been proposed for both short and long RP panel datasets. We also explore new measures of inertia to test for the effect of “learning” (in the sense of acquiring experience or getting more familiar with) along the SP experiment and we disentangle this effect from the pure inertia effect. A mixed logit model is used that allows us to account for both systematic and random taste variations in the inertia effect and for correlations among RP and SP observations. Finally we explore the relation between the utility specification (especially in the SP dataset) and the role of inertia in explaining current choices....
New fuzzy support vector machine for the class imbalance problem in medical datasets classification.

Science.gov (United States)

Gu, Xiaoqing; Ni, Tongguang; Wang, Hongyuan

2014-01-01

In medical dataset classification, the support vector machine (SVM) is considered to be one of the most successful methods. However, most real-world medical datasets contain some outliers/noise and often have class imbalance problems. In this paper, a fuzzy support vector machine (FSVM) for the class imbalance problem (called FSVM-CIP) is presented, which can be seen as a modified class of FSVM by extending manifold regularization and assigning two misclassification costs for two classes. The proposed FSVM-CIP can be used to handle the class imbalance problem in the presence of outliers/noise, and enhance the locality maximum margin. Five real-world medical datasets, breast, heart, hepatitis, BUPA liver, and pima diabetes, from the UCI medical database are employed to illustrate the method presented in this paper. Experimental results on these datasets show that FSVM-CIP achieves superior or comparable effectiveness.
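The idea of assigning two misclassification costs and down-weighting outliers can be illustrated with a minimal sketch of a weighted hinge loss for a linear classifier. This is not the FSVM-CIP formulation: the margin and manifold regularization terms are omitted, and the function name, cost values, and memberships are hypothetical.

```python
def weighted_hinge_loss(w, b, samples, cost_pos, cost_neg):
    """Sum of membership- and class-weighted hinge losses for a linear classifier.

    samples: list of (features, label, membership) with label in {+1, -1} and
    membership in (0, 1], where a low membership down-weights a suspected outlier.
    """
    total = 0.0
    for x, y, m in samples:
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        cost = cost_pos if y == 1 else cost_neg  # one cost per class
        total += cost * m * max(0.0, 1.0 - margin)
    return total

samples = [
    ([2.0, 0.0], 1, 1.0),    # minority class, correctly classified
    ([0.5, 0.0], 1, 1.0),    # minority class, inside the margin
    ([-2.0, 0.0], -1, 1.0),  # majority class, correctly classified
    ([1.0, 0.0], -1, 0.2),   # majority class, likely outlier (low membership)
]
loss = weighted_hinge_loss([1.0, 0.0], 0.0, samples, cost_pos=5.0, cost_neg=1.0)
print(round(loss, 2))
```

Setting the minority-class cost higher than the majority-class cost makes errors on the rare class more expensive, while the membership term limits how much a single noisy point can move the decision boundary.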
New Fuzzy Support Vector Machine for the Class Imbalance Problem in Medical Datasets Classification

Directory of Open Access Journals (Sweden)

Xiaoqing Gu

2014-01-01

In medical dataset classification, the support vector machine (SVM) is considered to be one of the most successful methods. However, most real-world medical datasets contain some outliers/noise and often have class imbalance problems. In this paper, a fuzzy support vector machine (FSVM) for the class imbalance problem (called FSVM-CIP) is presented, which can be seen as a modified class of FSVM by extending manifold regularization and assigning two misclassification costs for two classes. The proposed FSVM-CIP can be used to handle the class imbalance problem in the presence of outliers/noise, and enhance the locality maximum margin. Five real-world medical datasets, breast, heart, hepatitis, BUPA liver, and pima diabetes, from the UCI medical database are employed to illustrate the method presented in this paper. Experimental results on these datasets show that FSVM-CIP achieves superior or comparable effectiveness.
The StreamCat Dataset: Accumulated Attributes for NHDPlusV2 Catchments (Version 2.1) for the Conterminous United States: National Coal Resource Dataset System

Data.gov (United States)

U.S. Environmental Protection Agency — This dataset represents the coal mine density and storage volumes within individual, local NHDPlusV2 catchments and upstream, contributing watersheds based on the...
Investigations of bull trout (Salvelinus confluentus), steelhead trout (Oncorhynchus mykiss), and spring chinook salmon (O. tshawytscha) interactions in Southeast Washington streams. Final report 1992

International Nuclear Information System (INIS)

Underwood, K.D.; Martin, S.W.; Schuck, M.L.; Scholz, A.T.

1995-01-01

The goal of this two-year study was to determine if supplementation with hatchery-reared steelhead trout (Oncorhynchus mykiss) and spring chinook salmon (O. tshawytscha) negatively impacted wild native bull trout (Salvelinus confluentus) through competitive interactions. Four streams with varying levels of fish supplementation activity were sampled in Southeast Washington. Tasks performed during this study were population density, relative abundance, microhabitat utilization, habitat availability, diet analysis, bull trout spawning ground surveys, radio telemetry of adult bull trout, and growth analysis. Results indicate that bull trout overlapped geographically with the supplemented species in each of the study streams, suggesting competition among species was possible. Within a stream, bull trout and the supplemented species utilized dissimilar microhabitats, and microhabitat utilization by each species was the same among streams, suggesting that there were no shifts in microhabitat utilization among streams. The diet of bull trout and O. mykiss significantly overlapped in each of the study streams. The stream most intensely supplemented contained bull trout with the slowest growth and the non-supplemented stream contained bull trout with the fastest growth. Conversely, the stream most intensely supplemented contained steelhead with the fastest growth and the non-supplemented stream contained steelhead with the slowest growth. Growth indicated that bull trout may have been negatively impacted from supplementation, although other factors may have contributed. At current population levels, and current habitat quantity and quality, no impacts to bull trout as a result of supplementation with hatchery-reared steelhead trout and spring chinook salmon were detected. Project limitations and future research recommendations are discussed.
Investigations of Bull Trout (Salvelinus Confluentus), Steelhead Trout (Oncorhynchus Mykiss), and Spring Chinook Salmon (O. Tshawytscha) Interactions in Southeast Washington Streams. Final Report 1992.

Energy Technology Data Exchange (ETDEWEB)

Underwood, Keith D.

1995-01-01

The goal of this two-year study was to determine whether supplementation with hatchery-reared steelhead trout (Oncorhynchus mykiss) and spring chinook salmon (O. tshawytscha) negatively impacted wild native bull trout (Salvelinus confluentus) through competitive interactions. Four streams with varying levels of fish supplementation activity were sampled in Southeast Washington. Tasks performed during this study were population density, relative abundance, microhabitat utilization, habitat availability, diet analysis, bull trout spawning ground surveys, radio telemetry of adult bull trout, and growth analysis. Results indicate that bull trout overlapped geographically with the supplemented species in each of the study streams, suggesting that competition among species was possible. Within a stream, bull trout and the supplemented species utilized dissimilar microhabitats, and microhabitat utilization by each species was the same among streams, suggesting that there were no shifts in microhabitat utilization among streams. The diet of bull trout and O. mykiss significantly overlapped in each of the study streams. The stream most intensely supplemented contained bull trout with the slowest growth and the non-supplemented stream contained bull trout with the fastest growth. Conversely, the stream most intensely supplemented contained steelhead with the fastest growth and the non-supplemented stream contained steelhead with the slowest growth. Growth indicated that bull trout may have been negatively impacted by supplementation, although other factors may have contributed. At current population levels, and current habitat quantity and quality, no impacts to bull trout as a result of supplementation with hatchery-reared steelhead trout and spring chinook salmon were detected. Project limitations and future research recommendations are discussed.
BIA Indian Lands Dataset (Indian Lands of the United States)

Data.gov (United States)

Federal Geographic Data Committee — The American Indian Reservations / Federally Recognized Tribal Entities dataset depicts feature location, selected demographics and other associated data for the 561...
Wehmas et al. 94-04 Toxicol Sci: Datasets for manuscript

Data.gov (United States)

U.S. Environmental Protection Agency — Dataset includes overview text document (accepted version of manuscript) and tables, figures, and supplementary materials. Supplementary tables provide summary data...
Climate Forcing Datasets for Agricultural Modeling: Merged Products for Gap-Filling and Historical Climate Series Estimation

Science.gov (United States)

Ruane, Alex C.; Goldberg, Richard; Chryssanthacopoulos, James

2014-01-01

The AgMERRA and AgCFSR climate forcing datasets provide daily, high-resolution, continuous, meteorological series over the 1980-2010 period designed for applications examining the agricultural impacts of climate variability and climate change. These datasets combine daily resolution data from retrospective analyses (the Modern-Era Retrospective Analysis for Research and Applications, MERRA, and the Climate Forecast System Reanalysis, CFSR) with in situ and remotely-sensed observational datasets for temperature, precipitation, and solar radiation, leading to substantial reductions in bias in comparison to a network of 2324 agricultural-region stations from the Hadley Integrated Surface Dataset (HadISD). Results compare favorably against the original reanalyses as well as the leading climate forcing datasets (Princeton, WFD, WFD-EI, and GRASP), and AgMERRA distinguishes itself with substantially improved representation of daily precipitation distributions and extreme events owing to its use of the MERRA-Land dataset. These datasets also peg relative humidity to the maximum temperature time of day, allowing for more accurate representation of the diurnal cycle of near-surface moisture in agricultural models. AgMERRA and AgCFSR enable a number of ongoing investigations in the Agricultural Model Intercomparison and Improvement Project (AgMIP) and related research networks, and may be used to fill gaps in historical observations as well as a basis for the generation of future climate scenarios.
Knowledge discovery with classification rules in a cardiovascular dataset.

Science.gov (United States)

Podgorelec, Vili; Kokol, Peter; Stiglic, Milojka Molan; Hericko, Marjan; Rozman, Ivan

2005-12-01

In this paper we study an evolutionary machine learning approach to data mining and knowledge discovery based on the induction of classification rules. A method for automatic rules induction called AREX using evolutionary induction of decision trees and automatic programming is introduced. The proposed algorithm is applied to a cardiovascular dataset consisting of different groups of attributes which should possibly reveal the presence of some specific cardiovascular problems in young patients. A case study is presented that shows the use of AREX for the classification of patients and for discovering possible new medical knowledge from the dataset. The defined knowledge discovery loop comprises a medical expert's assessment of induced rules to drive the evolution of rule sets towards more appropriate solutions. The final result is the discovery of a possible new medical knowledge in the field of pediatric cardiology.
BLOND, a building-level office environment dataset of typical electrical appliances

Science.gov (United States)

Kriechbaumer, Thomas; Jacobsen, Hans-Arno

2018-03-01

Energy metering has gained popularity as conventional meters are replaced by electronic smart meters that promise energy savings and higher comfort levels for occupants. Achieving these goals requires a deeper understanding of consumption patterns to reduce the energy footprint: load profile forecasting, power disaggregation, appliance identification, startup event detection, etc. Publicly available datasets are used to test, verify, and benchmark possible solutions to these problems. For this purpose, we present the BLOND dataset: continuous energy measurements of a typical office environment at high sampling rates with common appliances and load profiles. We provide voltage and current readings for aggregated circuits and matching fully-labeled ground truth data (individual appliance measurements). The dataset contains 53 appliances (16 classes) in a 3-phase power grid. BLOND-50 contains 213 days of measurements sampled at 50kSps (aggregate) and 6.4kSps (individual appliances). BLOND-250 consists of the same setup: 50 days, 250kSps (aggregate), 50kSps (individual appliances). These are the longest continuous measurements at such high sampling rates and fully-labeled ground truth we are aware of.
BLOND, a building-level office environment dataset of typical electrical appliances.

Science.gov (United States)

Kriechbaumer, Thomas; Jacobsen, Hans-Arno

2018-03-27

Energy metering has gained popularity as conventional meters are replaced by electronic smart meters that promise energy savings and higher comfort levels for occupants. Achieving these goals requires a deeper understanding of consumption patterns to reduce the energy footprint: load profile forecasting, power disaggregation, appliance identification, startup event detection, etc. Publicly available datasets are used to test, verify, and benchmark possible solutions to these problems. For this purpose, we present the BLOND dataset: continuous energy measurements of a typical office environment at high sampling rates with common appliances and load profiles. We provide voltage and current readings for aggregated circuits and matching fully-labeled ground truth data (individual appliance measurements). The dataset contains 53 appliances (16 classes) in a 3-phase power grid. BLOND-50 contains 213 days of measurements sampled at 50kSps (aggregate) and 6.4kSps (individual appliances). BLOND-250 consists of the same setup: 50 days, 250kSps (aggregate), 50kSps (individual appliances). These are the longest continuous measurements at such high sampling rates and fully-labeled ground truth we are aware of.
Geochemical Fingerprinting of Coltan Ores by Machine Learning on Uneven Datasets

International Nuclear Information System (INIS)

Savu-Krohn, Christian; Rantitsch, Gerd; Auer, Peter; Melcher, Frank; Graupner, Torsten

2011-01-01

Two modern machine learning techniques, Linear Programming Boosting (LPBoost) and Support Vector Machines (SVMs), are introduced and applied to a geochemical dataset of niobium–tantalum (“coltan”) ores from Central Africa to demonstrate how such information may be used to distinguish ore provenance, i.e., place of origin. The compositional data used include uni- and multivariate outliers and elemental distributions are not described by parametric frequency distribution functions. The “soft margin” techniques of LPBoost and SVMs can be applied to such data. Optimization of their learning parameters results in an average accuracy of up to c. 92%, if spot measurements are assessed to estimate the provenance of ore samples originating from two geographically defined source areas. A parameterized performance measure, together with common methods for its optimization, was evaluated to account for the presence of uneven datasets. Optimization of the classification function threshold improves the performance, as class importance is shifted towards one of those classes. For this dataset, the average performance of the SVMs is significantly better compared to that of LPBoost.
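The threshold optimization mentioned in this abstract can be illustrated with a small sketch. It assumes balanced accuracy (the mean of the per-class recalls) as the performance measure, which is a stand-in and not necessarily the parameterized measure the authors evaluated. The scores and labels are made up for illustration, and the function names are hypothetical.

```python
def balanced_accuracy(labels, scores, threshold):
    """Mean of the per-class recalls when predicting +1 for score >= threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == -1 and s < threshold)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


def best_threshold(labels, scores):
    """Try the midpoints between consecutive distinct scores, plus both extremes."""
    ordered = sorted(set(scores))
    candidates = [ordered[0] - 1.0]
    candidates += [(a + b) / 2.0 for a, b in zip(ordered, ordered[1:])]
    candidates.append(ordered[-1] + 1.0)
    return max(candidates, key=lambda t: balanced_accuracy(labels, scores, t))


# Made-up classifier decision scores for an uneven problem (8 vs. 2 samples).
labels = [-1, -1, -1, -1, -1, -1, -1, -1, 1, 1]
scores = [-2.0, -1.5, -1.2, -1.0, -0.9, -0.8, -0.7, -0.6, -0.3, 0.4]

default_score = balanced_accuracy(labels, scores, 0.0)   # threshold at zero
tuned = best_threshold(labels, scores)                   # shifted threshold
tuned_score = balanced_accuracy(labels, scores, tuned)
```

In this toy case the default threshold of zero misses one of the two minority-class samples, while the shifted threshold recovers it. In practice the threshold would be chosen on held-out data rather than on the samples being scored.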
MULTI-LABEL ASRS DATASET CLASSIFICATION USING SEMI-SUPERVISED SUBSPACE CLUSTERING

Data.gov (United States)

National Aeronautics and Space Administration — MULTI-LABEL ASRS DATASET CLASSIFICATION USING SEMI-SUPERVISED SUBSPACE CLUSTERING MOHAMMAD SALIM AHMED, LATIFUR KHAN, NIKUNJ OZA, AND MANDAVA RAJESWARI Abstract....
A method for generating large datasets of organ geometries for radiotherapy treatment planning studies

International Nuclear Information System (INIS)

Hu, Nan; Cerviño, Laura; Segars, Paul; Lewis, John; Shan, Jinlu; Jiang, Steve; Zheng, Xiaolin; Wang, Ge

2014-01-01

With the rapidly increasing application of adaptive radiotherapy, large datasets of organ geometries based on the patient’s anatomy are desired to support clinical application or research work, such as image segmentation, re-planning, and organ deformation analysis. Sometimes only limited datasets are available in clinical practice. In this study, we propose a new method to generate large datasets of organ geometries to be utilized in adaptive radiotherapy. Given a training dataset of organ shapes derived from daily cone-beam CT, we align them into a common coordinate frame and select one of the training surfaces as the reference surface. A statistical shape model of the organs is constructed, based on the establishment of point correspondence between surfaces and a non-uniform rational B-spline (NURBS) representation. A principal component analysis is performed on the sampled surface points to capture the major variation modes of each organ. A set of principal components and their respective coefficients, which represent organ surface deformation, is obtained, and a statistical analysis of the coefficients is performed. New sets of statistically equivalent coefficients can be constructed and assigned to the principal components, resulting in a larger geometry dataset for the patient’s organs. These generated organ geometries are realistic and statistically representative.
Sparse multivariate measures of similarity between intra-modal neuroimaging datasets

Directory of Open Access Journals (Sweden)

Maria J. Rosa

2015-10-01

An increasing number of neuroimaging studies are now based on either combining more than one data modality (inter-modal) or combining more than one measurement from the same modality (intra-modal). To date, most intra-modal studies using multivariate statistics have focused on differences between datasets, for instance relying on classifiers to differentiate between effects in the data. However, to fully characterize these effects, multivariate methods able to measure similarities between datasets are needed. One classical technique for estimating the relationship between two datasets is canonical correlation analysis (CCA). However, in the context of high-dimensional data the application of CCA is extremely challenging. A recent extension of CCA, sparse CCA (SCCA), overcomes this limitation by regularizing the model parameters while yielding a sparse solution. In this work, we modify SCCA with the aim of facilitating its application to high-dimensional neuroimaging data and finding meaningful multivariate image-to-image correspondences in intra-modal studies. In particular, we show how the optimal subset of variables can be estimated independently and we look at the information encoded in more than one set of SCCA transformations. We illustrate our framework using Arterial Spin Labelling data to investigate multivariate similarities between the effects of two antipsychotic drugs on cerebral blood flow.
Spatio-Temporal Data Model for Integrating Evolving Nation-Level Datasets

Science.gov (United States)

Sorokine, A.; Stewart, R. N.

2017-10-01

The ability to easily combine data from diverse sources in a single analytical workflow is one of the greatest promises of Big Data technologies. However, such integration is often challenging because datasets originate from different vendors, governments, and research communities, which results in multiple incompatibilities in data representations, formats, and semantics. Semantic differences are the hardest to handle: different communities often use different attribute definitions and associate the records with different sets of evolving geographic entities. Analysis of global socioeconomic variables across multiple datasets over prolonged periods is often complicated by differences in how the boundaries and histories of countries or other geographic entities are represented. Here we propose an event-based data model for depicting and tracking the histories of evolving geographic units (countries, provinces, etc.) and their representations in disparate data. The model addresses the semantic challenge of preserving the identity of geographic entities over time by defining criteria for entity existence, a set of events that may affect that existence, and rules for mapping between different representations (datasets). The proposed model is used for maintaining an evolving compound database of global socioeconomic and environmental data harvested from multiple sources. A practical implementation of our model is demonstrated using the PostgreSQL object-relational database with temporal, geospatial, and NoSQL database extensions.
Ecohydrological Index, Native Fish, and Climate Trends and Relationships in the Kansas River Basin_dataset

Data.gov (United States)

U.S. Environmental Protection Agency — The dataset is an Excel file that contains data for the figures in the manuscript. This dataset is associated with the following publication: Sinnathamby, S., K....
Integrated remotely sensed datasets for disaster management

Science.gov (United States)

McCarthy, Timothy; Farrell, Ronan; Curtis, Andrew; Fotheringham, A. Stewart

2008-10-01

Video imagery can be acquired from aerial, terrestrial and marine based platforms and has been exploited for a range of remote sensing applications over the past two decades. Examples include coastal surveys using aerial video, route corridor infrastructure surveys using vehicle-mounted video cameras, aerial surveys over forestry and agriculture, underwater habitat mapping and disaster management. Many of these video systems are based on interlaced television standards, such as North America's NTSC and the European SECAM and PAL television systems, that are then recorded using various video formats. This technology has recently been employed as a front-line remote sensing technology for post-disaster damage assessment. This paper traces the development of spatial video as a remote sensing tool from the early 1980s to the present day. The background to a new spatial-video research initiative based at National University of Ireland, Maynooth (NUIM), is described. New improvements are proposed and include low-cost encoders, easy-to-use software decoders, timing issues and interoperability. These developments will enable specialists and non-specialists to collect, process and integrate these datasets with minimal support. This integrated approach will enable decision makers to access relevant remotely sensed datasets quickly and so carry out rapid damage assessment during and after a disaster.

FTSPlot: fast time series visualization for large datasets.

Directory of Open Access Journals (Sweden)

Michael Riss

The analysis of electrophysiological recordings often involves visual inspection of time series data to locate specific experiment epochs, mask artifacts, and verify the results of signal processing steps, such as filtering or spike detection. Long-term experiments with continuous data acquisition generate large amounts of data. Rapid browsing through these massive datasets poses a challenge to conventional data plotting software because the plotting time increases proportionately to the increase in the volume of data. This paper presents FTSPlot, which is a visualization concept for large-scale time series datasets using techniques from the field of high performance computer graphics, such as hierarchic level of detail and out-of-core data handling. In a preprocessing step, time series data, event, and interval annotations are converted into an optimized data format, which then permits fast, interactive visualization. The preprocessing step has a computational complexity of O(n × log(N)); the visualization itself can be done with a complexity of O(1) and is therefore independent of the amount of data. A demonstration prototype has been implemented and benchmarks show that the technology is capable of displaying large amounts of time series data, event, and interval annotations lag-free with < 20 ms. The current 64-bit implementation theoretically supports datasets with up to 2^64 bytes; on the x86_64 architecture currently up to 2^48 bytes are supported, and benchmarks have been conducted with 2^40 bytes (1 TiB) or 1.3 × 10^11 double precision samples. The presented software is freely available and can be included as a Qt GUI component in future software projects, providing a standard visualization method for long-term electrophysiological experiments.
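The idea of a hierarchic level of detail can be sketched as a min/max pyramid: each coarser level stores the minimum and maximum of a fixed number of entries from the level below, so a viewer only has to fetch about as many entries as there are pixels. This is a generic illustration under that assumption, not FTSPlot's actual data format or its on-disk layout; the function names `build_pyramid` and `pick_level` are hypothetical.

```python
def build_pyramid(samples, factor=2):
    """Return a list of levels of (min, max) pairs; level 0 holds the raw samples."""
    levels = [[(x, x) for x in samples]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        nxt = []
        for i in range(0, len(prev), factor):
            chunk = prev[i:i + factor]
            # Each coarse entry covers the full value range of its children.
            nxt.append((min(lo for lo, _ in chunk), max(hi for _, hi in chunk)))
        levels.append(nxt)
    return levels


def pick_level(levels, start, stop, max_points, factor=2):
    """Choose the finest level whose slice of samples [start, stop) fits in max_points."""
    for depth, level in enumerate(levels):
        step = factor ** depth          # raw samples covered by one entry
        lo = start // step
        hi = -(-stop // step)           # ceiling division
        if hi - lo <= max_points:
            return depth, level[lo:hi]
    return len(levels) - 1, levels[-1]


samples = [3, 1, 4, 1, 5, 9, 2, 6]
levels = build_pyramid(samples)
depth, view = pick_level(levels, 0, len(samples), max_points=2)
```

Here the whole series is requested for a display that is only two points wide, so the sketch returns the level where each entry summarizes four raw samples. The number of entries handed to the renderer is bounded by `max_points` regardless of how long the series is.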
Designing the colorectal cancer core dataset in Iran

Directory of Open Access Journals (Sweden)

Sara Dorri

2017-01-01

Background: Collecting, recording, and analyzing disease information is important in any health organization. In this regard, the systematic design of standard data sets can help to record uniform and consistent information and can create interoperability between health care systems. The main purpose of this study was to design a core dataset for recording colorectal cancer information in Iran. Methods: A combination of literature review and expert consensus was used to design the colorectal cancer core data set. In the first phase, a draft data set was designed based on a review of the colorectal cancer literature and comparative studies. In the second phase, this data set was evaluated by experts from different disciplines such as medical informatics, oncology, and surgery, and their comments and opinions were collected. In the third phase, the refined data set was evaluated again by experts, and the final data set was proposed. Results: In the first phase, a draft set of 85 data elements was designed based on the literature review. In the second phase, this data set was evaluated by experts, and professionals in subgroups offered supplementary information, especially for the treatment part; the number of elements rose to 93. In the third phase, experts evaluated the data set, and it was finally organized into five main parts: demographic information, diagnostic information, treatment information, clinical status assessment information, and clinical trial information. Conclusion: In this study, a comprehensive core data set for colorectal cancer was designed. This dataset can be useful for collecting colorectal cancer information by facilitating the exchange of health information. Designing such data sets for similar diseases can help providers collect standard data from patients and can accelerate retrieval from storage systems.
The Role of Datasets on Scientific Influence within Conflict Research.

Directory of Open Access Journals (Sweden)

Tracy Van Holt

We inductively tested if a coherent field of inquiry in human conflict research emerged in an analysis of published research involving "conflict" in the Web of Science (WoS) over a 66-year period (1945-2011). We created a citation network that linked the 62,504 WoS records and their cited literature. We performed a critical path analysis (CPA), a specialized social network analysis on this citation network (~1.5 million works), to highlight the main contributions in conflict research and to test if research on conflict has in fact evolved to represent a coherent field of inquiry. Out of this vast dataset, 49 academic works were highlighted by the CPA suggesting a coherent field of inquiry; which means that researchers in the field acknowledge seminal contributions and share a common knowledge base. Other conflict concepts that were also analyzed, such as interpersonal conflict or conflict among pharmaceuticals, did not form their own CP. A single path formed, meaning that there was a cohesive set of ideas that built upon previous research. This is in contrast to a main path analysis of conflict from 1957-1971 where ideas didn't persist in that multiple paths existed and died or emerged reflecting lack of scientific coherence (Carley, Hummon, and Harty, 1993). The critical path consisted of a number of key features: 1) Concepts that built throughout include the notion that resource availability drives conflict, which emerged in the 1960s-1990s and continued on until 2011. More recent intrastate studies that focused on inequalities emerged from interstate studies on the democracy of peace earlier on the path. 2) Recent research on the path focused on forecasting conflict, which depends on well-developed metrics and theories to model. 3) We used keyword analysis to independently show how the CP was topically linked (i.e., through democracy, modeling, resources, and geography). Publically available conflict datasets developed early on helped
The Role of Datasets on Scientific Influence within Conflict Research

Science.gov (United States)

Van Holt, Tracy; Johnson, Jeffery C.; Moates, Shiloh; Carley, Kathleen M.

2016-01-01

We inductively tested if a coherent field of inquiry in human conflict research emerged in an analysis of published research involving “conflict” in the Web of Science (WoS) over a 66-year period (1945–2011). We created a citation network that linked the 62,504 WoS records and their cited literature. We performed a critical path analysis (CPA), a specialized social network analysis on this citation network (~1.5 million works), to highlight the main contributions in conflict research and to test if research on conflict has in fact evolved to represent a coherent field of inquiry. Out of this vast dataset, 49 academic works were highlighted by the CPA suggesting a coherent field of inquiry; which means that researchers in the field acknowledge seminal contributions and share a common knowledge base. Other conflict concepts that were also analyzed—such as interpersonal conflict or conflict among pharmaceuticals, for example, did not form their own CP. A single path formed, meaning that there was a cohesive set of ideas that built upon previous research. This is in contrast to a main path analysis of conflict from 1957–1971 where ideas didn’t persist in that multiple paths existed and died or emerged reflecting lack of scientific coherence (Carley, Hummon, and Harty, 1993). The critical path consisted of a number of key features: 1) Concepts that built throughout include the notion that resource availability drives conflict, which emerged in the 1960s-1990s and continued on until 2011. More recent intrastate studies that focused on inequalities emerged from interstate studies on the democracy of peace earlier on the path. 2) Recent research on the path focused on forecasting conflict, which depends on well-developed metrics and theories to model. 3) We used keyword analysis to independently show how the CP was topically linked (i.e., through democracy, modeling, resources, and geography). Publically available conflict datasets developed early on helped
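One simple way to picture a path analysis over a citation network is to look for the longest chain of works in which each cites the previous one. The sketch below does exactly that on a tiny made-up network. It is an assumption-laden illustration only: the abstract does not specify how the authors weighted or selected their critical path, and the work identifiers `w1` to `w5` and the function name `longest_path` are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def longest_path(cites):
    """cites maps each work to the earlier works it cites (must be acyclic)."""
    # static_order() yields cited works before the works that cite them.
    order = list(TopologicalSorter(cites).static_order())
    best_len = {}
    best_prev = {}
    for work in order:
        best_len[work] = 0
        best_prev[work] = None
        for cited in cites.get(work, ()):
            if best_len[cited] + 1 > best_len[work]:
                best_len[work] = best_len[cited] + 1
                best_prev[work] = cited
    # Walk back from the work that ends the longest chain.
    end = max(order, key=lambda w: best_len[w])
    path = []
    while end is not None:
        path.append(end)
        end = best_prev[end]
    return path[::-1]  # oldest work first


# Made-up network: w2 cites w1, w3 cites w2, w4 cites w3, w5 cites w2.
cites = {"w2": ["w1"], "w3": ["w2"], "w4": ["w3"], "w5": ["w2"]}
path = longest_path(cites)
```

Citation networks are acyclic in principle because a work can only cite earlier works, which is what makes the single topological pass sufficient here.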
Do Higher Government Wages Reduce Corruption? Evidence Based on a Novel Dataset

OpenAIRE

Le, Van-Ha; de Haan, Jakob; Dietzenbacher, Erik

2013-01-01

This paper employs a novel dataset on government wages to investigate the relationship between government remuneration policy and corruption. Our dataset, as derived from national household or labor surveys, is more reliable than the data on government wages as used in previous research. When the relationship between government wages and corruption is modeled to vary with the level of income, we find that the impact of government wages on corruption is strong at relatively low-income levels.
Sex reversal of brook trout (Salvelinus fontinalis) by 17α-methyltestosterone exposure: A serial experimental approach to determine optimal timing and delivery regimes.

Science.gov (United States)

Fatima, Shafaq; Adams, Mark; Wilkinson, Ryan

2016-12-01

Commercial culture of brook trout (Salvelinus fontinalis) in Tasmania was partly abandoned due to sexual maturation of male fish early on during the estuarine rearing phase. Maturation adversely affects body mass, flesh quality and immunocompetency effectively. Sex reversal techniques such as the in-feed addition of a synthetic androgen have proven difficult to adapt in brook trout. An appropriate timing, duration and delivery vehicle for administration of 17α-methyltestosterone (MT) to produce phenotypic males (neomales) from genotypically female brook trout required further investigation. In this study, groups of brook trout eggs (n = 1000) maintained at 9.5 ± 0.15 to 10 ± 0.14 °C were immersed in MT (400 μg L⁻¹) for four hours on two alternate days (two immersions/group) staggered over a two-week period surrounding the hatch of embryos (control groups excluded). The groups were then split and half received MT-supplemented feed for 60 days and the other a standard diet. Following an 11-month on-growing period, sex phenotypes were determined by gross and histological gonad morphology. The highest proportion of male phenotypes (75%) was found in fish immersed six and four days pre-hatch and subsequently fed a normal diet. Fish fed an MT-supplemented diet and immersed in MT showed significantly higher proportions of sterile fish. These data indicate that a pre-hatch immersion-only regime (4-6 days pre-hatch at 9.5 °C) should be pursued as a target for optimization studies to further refine the effective concentration and duration of exposure to MT for the successful production of neomale brook trout. Copyright © 2016 Elsevier B.V. All rights reserved.
Exotic "Gill Lice" Species (Copepoda: Lernaeopodidae: Salmincola SPP.) Infect Rainbow Trout (Oncorhynchus mykiss) and Brook Trout (Salvelinus fontinalis) in the Southeastern United States.

Science.gov (United States)

Ruiz, Carlos F; Rash, Jacob M; Besler, Doug A; Roberts, Jackson R; Warren, Micah B; Arias, Cova R; Bullard, Stephen A

2017-08-01

Salmincola californiensis infected 25 of 31 (prevalence 0.8; intensity 2-35 [mean 6.6 ± standard deviation 7.7; n = 25]) rainbow trout, Oncorhynchus mykiss, from a private trout farm connected to the Watauga River, North Carolina. Salmincola edwardsii infected all of 9 (1.0; 2-43 [9.3 ± 13.0; 9]) brook trout, Salvelinus fontinalis, from Big Norton Prong, a tributary of the Little Tennessee River, North Carolina. Both lernaeopodids are well-known salmonid pathogens, but neither is native to, nor has been previously taxonomically confirmed from, the southeastern United States. Herein, we (1) use light and scanning electron microscopy to identify and provide supplemental morphological observations of these lernaeopodids, (2) furnish complementary molecular sequence data from the 28S rDNA (28S), and (3) document the pathological effects of gill infections. We identified and differentiated these lernaeopodids by the second antenna (exopod tip with large [S. californiensis] vs. slender [S. edwardsii] spines; endopod terminal segment with subequal ventral processes shorter than [S. californiensis] vs. longer than or equal to [S. edwardsii] dorsal hook), maxilliped palp (length typically ≤1/3 [S. californiensis] vs. 1/3-1/2 [S. edwardsii] subchela length exclusive of claw), and bulla (sub-circular and concave on manubrium's side [S. californiensis] vs. non-stellate [S. edwardsii]). Analysis of the 28S rDNA sequences confirmed our taxonomic assignments as demonstrated by 100% sequence similarity among the sympatric, morphologically-conspecific isolates. Histopathology revealed focal gill epithelial hyperplasia, obstruction of interlamellar water channels, lamellar fusion, and crypting of gill filaments. High intensity infections by either lernaeopodid are surveillance-worthy because they are potentially pathogenic to trout in the southeastern United States.
Dataset-driven research for improving recommender systems for learning

NARCIS (Netherlands)

Verbert, Katrien; Drachsler, Hendrik; Manouselis, Nikos; Wolpers, Martin; Vuorikari, Riina; Duval, Erik

2011-01-01

Verbert, K., Drachsler, H., Manouselis, N., Wolpers, M., Vuorikari, R., & Duval, E. (2011). Dataset-driven research for improving recommender systems for learning. In Ph. Long, & G. Siemens (Eds.), Proceedings of 1st International Conference Learning Analytics & Knowledge (pp. 44-53). February,
A Hybrid Neuro-Fuzzy Model For Integrating Large Earth-Science Datasets

Science.gov (United States)

Porwal, A.; Carranza, J.; Hale, M.

2004-12-01

A GIS-based hybrid neuro-fuzzy approach to integration of large earth-science datasets for mineral prospectivity mapping is described. It implements a Takagi-Sugeno type fuzzy inference system in the framework of a four-layered feed-forward adaptive neural network. Each unique combination of the datasets is considered a feature vector whose components are derived by knowledge-based ordinal encoding of the constituent datasets. A subset of feature vectors with a known output target vector (i.e., unique conditions known to be associated with either a mineralized or a barren location) is used for the training of an adaptive neuro-fuzzy inference system. Training involves iterative adjustment of parameters of the adaptive neuro-fuzzy inference system using a hybrid learning procedure for mapping each training vector to its output target vector with minimum sum of squared error. The trained adaptive neuro-fuzzy inference system is used to process all feature vectors. The output for each feature vector is a value that indicates the extent to which a feature vector belongs to the mineralized class or the barren class. These values are used to generate a prospectivity map. The procedure is demonstrated by an application to regional-scale base metal prospectivity mapping in a study area located in the Aravalli metallogenic province (western India). A comparison of the hybrid neuro-fuzzy approach with pure knowledge-driven fuzzy and pure data-driven neural network approaches indicates that the former offers a superior method for integrating large earth-science datasets for predictive spatial mathematical modelling.
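The inference step of a Takagi-Sugeno fuzzy system, as named in this abstract, can be sketched as a weighted average of rule consequents, with each weight given by how strongly the input matches the rule. The sketch below assumes Gaussian membership functions, a product for combining memberships, and two hand-written rules. None of these choices or parameter values come from the paper, and the training of the parameters by the hybrid learning procedure is not shown.

```python
import math


def gaussian(x, center, width):
    """Gaussian membership value of x in a fuzzy set."""
    return math.exp(-0.5 * ((x - center) / width) ** 2)


def ts_output(x, rules):
    """First-order Takagi-Sugeno inference.

    rules: list of (memberships, coeffs, bias), where memberships holds one
    (center, width) pair per input and the consequent is bias + coeffs . x.
    """
    num = 0.0
    den = 0.0
    for memberships, coeffs, bias in rules:
        strength = 1.0
        for xi, (center, width) in zip(x, memberships):
            strength *= gaussian(xi, center, width)   # rule firing strength
        consequent = bias + sum(a * xi for a, xi in zip(coeffs, x))
        num += strength * consequent
        den += strength
    return num / den


# Two hypothetical rules over two encoded evidence layers:
# low evidence maps to 0 ("barren"), high evidence maps to 1 ("mineralized").
rules = [
    ([(0.0, 1.0), (0.0, 1.0)], [0.0, 0.0], 0.0),
    ([(1.0, 1.0), (1.0, 1.0)], [0.0, 0.0], 1.0),
]
low = ts_output([0.0, 0.0], rules)
mid = ts_output([0.5, 0.5], rules)
high = ts_output([1.0, 1.0], rules)
```

With constant consequents of 0 and 1, the output falls between 0 and 1 and can be read as a degree of membership in the mineralized class, which mirrors the kind of value the abstract says is mapped to produce a prospectivity map.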
GRIP: A web-based system for constructing Gold Standard datasets for protein-protein interaction prediction

Directory of Open Access Journals (Sweden)

Zheng Huiru

2009-01-01

Full Text Available Abstract Background: Information about protein interaction networks is fundamental to understanding protein function and cellular processes. Interaction patterns among proteins can suggest new drug targets and aid in the design of new therapeutic interventions. Efforts have been made to map interactions on a proteomic-wide scale using both experimental and computational techniques. Reference datasets that contain known interacting proteins (positive cases) and non-interacting proteins (negative cases) are essential to support computational prediction and validation of protein-protein interactions. Information on known interacting and non-interacting proteins is usually stored within databases. Extraction of these data can be both complex and time consuming. Although the automatic construction of reference datasets for classification would be a useful resource for researchers, no public resource currently exists to perform this task. Results: GRIP (Gold Reference dataset constructor from Information on Protein complexes) is a web-based system that provides researchers with the functionality to create reference datasets for protein-protein interaction prediction in Saccharomyces cerevisiae. Both positive and negative cases for a reference dataset can be extracted, organised and downloaded by the user. GRIP also provides an upload facility whereby users can submit proteins to determine protein complex membership. A search facility is provided where a user can search for protein complex information in Saccharomyces cerevisiae. Conclusion: GRIP was developed to retrieve information on protein complexes, cellular localisation, and physical and genetic interactions in Saccharomyces cerevisiae. Manual construction of reference datasets can be a time-consuming process requiring programming knowledge. GRIP simplifies and speeds up this process by allowing users to automatically construct reference datasets.
GRIP is free to access at http://rosalind.infj.ulst.ac.uk/GRIP/.
Developing predictive imaging biomarkers using whole-brain classifiers: Application to the ABIDE I dataset

Directory of Open Access Journals (Sweden)

Swati Rane

2017-03-01

Full Text Available We designed a modular machine learning program that uses functional magnetic resonance imaging (fMRI) data in order to distinguish individuals with autism spectrum disorders from neurodevelopmentally normal individuals. Data were selected from the Autism Brain Imaging Data Exchange (ABIDE I) Preprocessed Dataset.
Research Report Non-invasive DNA-based species and sex ...

Indian Academy of Sciences (India)

Shrushti Modi

Non-invasive DNA-based species and sex identification of Asiatic wild dog (Cuon alpinus) .... We did not find any cross-gender amplification with any of the reference or field-collected samples. Success rate for sex discrimination for all field-.
Utilizing the Antarctic Master Directory to find orphan datasets

Science.gov (United States)

Bonczkowski, J.; Carbotte, S. M.; Arko, R. A.; Grebas, S. K.

2011-12-01

While most Antarctic data are housed at an established disciplinary-specific data repository, there are data types for which no suitable repository exists. In some cases, these "orphan" data, without an appropriate national archive, are served from local servers by the principal investigators who produced the data. There are many pitfalls with data served privately, including the frequent lack of adequate documentation to ensure the data can be understood by others for re-use and the impermanence of personal web sites. For example, if an investigator leaves an institution and the data moves, the link published is no longer accessible. To ensure continued availability of data, submission to long-term national data repositories is needed. As stated in the National Science Foundation Office of Polar Programs (NSF/OPP) Guidelines and Award Conditions for Scientific Data, investigators are obligated to submit their data for curation and long-term preservation; this includes the registration of a dataset description into the Antarctic Master Directory (AMD), http://gcmd.nasa.gov/Data/portals/amd/. The AMD is a Web-based, searchable directory of thousands of dataset descriptions, known as DIF records, submitted by scientists from over 20 countries. It serves as a node of the International Directory Network/Global Change Master Directory (IDN/GCMD). The US Antarctic Program Data Coordination Center (USAP-DCC), http://www.usap-data.org/, funded through NSF/OPP, was established in 2007 to help streamline the process of data submission and DIF record creation. When data does not quite fit within any existing disciplinary repository, it can be registered within the USAP-DCC as the fallback data repository. Within the scope of the USAP-DCC we undertook the challenge of discovering and "rescuing" orphan datasets currently registered within the AMD. In order to find which DIF records led to data served privately, all records relating to US data within the AMD were parsed. After
[Research on developing the spectral dataset for Dunhuang typical colors based on color constancy].

Science.gov (United States)

Liu, Qiang; Wan, Xiao-Xia; Liu, Zhen; Li, Chan; Liang, Jin-Xing

2013-11-01

The present paper aims at developing a method to reasonably set up the typical spectral color dataset for different kinds of Chinese cultural heritage in the color rendering process. The world-famous wall paintings dating from more than 1700 years ago in the Dunhuang Mogao Grottoes were taken as a typical case in this research. In order to maintain color constancy during the color rendering workflow of Dunhuang cultural relics, a chromatic-adaptation-based method for developing the spectral dataset of typical colors for those wall paintings was proposed from the viewpoint of human visual perception. With the help and guidance of researchers in the art-research institution and protection-research institution of the Dunhuang Academy, and according to the existing research achievements of Dunhuang research in past years, 48 typical known Dunhuang pigments were chosen and 240 representative color samples were made, whose reflectance spectra ranging from 360 to 750 nm were acquired by a spectrometer. In order to find the typical colors of the above-mentioned color samples, the original dataset was divided into several subgroups by clustering analysis. The number of groups, together with the most typical samples for each subgroup, which made up the first-built typical color dataset, was determined by the Wilcoxon signed rank test according to the color inconstancy index comprehensively calculated under 6 typical illuminating conditions. Considering the completeness of the gamut of Dunhuang wall paintings, 8 complementary colors were determined, and finally the typical spectral color dataset was built up, which contains 100 representative spectral colors. The analytical calculation results show that the median color inconstancy index of the built dataset at the 99% confidence level by the Wilcoxon signed rank test was 3.28 and that the 100 colors are distributed uniformly across the whole gamut, which ensures that this dataset can provide reasonable reference for choosing the color with highest
Statistical exploration of dataset examining key indicators influencing housing and urban infrastructure investments in megacities

Directory of Open Access Journals (Sweden)

Adedeji O. Afolabi

2018-06-01

Full Text Available Lagos, by UN standards, has attained megacity status, with the attendant challenges of living up to that titanic position; regrettably, it struggles with its present stock of housing and infrastructural facilities to match its new status. The dataset was gathered with a questionnaire instrument, based on a survey of the perceptions of construction professionals residing within the state. The statistical exploration contains data on the state of the housing and urban infrastructural deficit, key indicators spurring investment by government to reverse the deficit, and improvement mechanisms to tackle the infrastructural dearth. Descriptive and inferential statistics were used to present the dataset. The dataset, when analyzed, can be useful for policy makers, local and international governments, world funding bodies, researchers and infrastructural investors. Keywords: Construction, Housing, Megacities, Population, Urban infrastructures
Toxics Release Inventory Chemical Hazard Information Profiles (TRI-CHIP) Dataset

Data.gov (United States)

U.S. Environmental Protection Agency — The Toxics Release Inventory (TRI) Chemical Hazard Information Profiles (TRI-CHIP) dataset contains hazard information about the chemicals reported in TRI. Users can...
Risk behaviours among internet-facilitated sex workers: evidence from two new datasets.

Science.gov (United States)

Cunningham, Scott; Kendall, Todd D

2010-12-01

Sex workers have historically played a central role in STI outbreaks by forming a core group for transmission and due to their higher rates of concurrency and inconsistent condom usage. Over the past 15 years, North American commercial sex markets have been radically reorganised by internet technologies that channelled a sizeable share of the marketplace online. These changes may have had a meaningful impact on the role that sex workers play in STI epidemics. In this study, two new datasets documenting the characteristics and practices of internet-facilitated sex workers are presented and analysed. The first dataset comes from a ratings website where clients share detailed information on over 94,000 sex workers in over 40 cities between 1999 and 2008. The second dataset reflects a year-long field survey of 685 sex workers who advertise online. Evidence from these datasets suggests that internet-facilitated sex workers are dissimilar from the street-based workers who largely populated the marketplace in earlier eras. Differences in characteristics and practices were found which suggest a lower potential for the spread of STIs among internet-facilitated sex workers. The internet-facilitated population appears to include a high proportion of sex workers who are well-educated, hold health insurance and operate only part time. They also engage in relatively low levels of risky sexual practices.
Extraction of drainage networks from large terrain datasets using high throughput computing

Science.gov (United States)

Gong, Jianya; Xie, Jibo

2009-02-01

Advanced digital photogrammetry and remote sensing technology produces large terrain datasets (LTD). How to process and use these LTD has become a big challenge for GIS users. Extracting drainage networks, which are basic for hydrological applications, from LTD is one of the typical applications of digital terrain analysis (DTA) in geographical information applications. Existing serial drainage algorithms cannot deal with large data volumes in a timely fashion, and few GIS platforms can process LTD beyond the GB size. High throughput computing (HTC), a distributed parallel computing mode, is proposed to improve the efficiency of drainage network extraction from LTD. Drainage network extraction using HTC involves two key issues: (1) how to decompose the large DEM datasets into independent computing units and (2) how to merge the separate outputs into a final result. A new decomposition method is presented in which the large datasets are partitioned into independent computing units using natural watershed boundaries instead of using regular 1-dimensional (strip-wise) and 2-dimensional (block-wise) decomposition. Because the distribution of drainage networks is strongly related to watershed boundaries, the new decomposition method is more effective and natural. The method to extract natural watershed boundaries was improved by using multi-scale DEMs instead of single-scale DEMs. An HTC environment is employed to test the proposed methods with real datasets.
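The decompose-then-merge pattern named in this abstract can be sketched as follows. This only illustrates the control flow. The cell tuples, the function names, and the flow-accumulation threshold used as the per-unit job are assumptions standing in for the paper's actual drainage algorithm, and the watershed labels and accumulation values are assumed to be precomputed.

```python
from collections import defaultdict


def partition_by_watershed(cells):
    """Group (row, col, watershed_id, accumulation) cells by watershed."""
    units = defaultdict(list)
    for row, col, watershed_id, accumulation in cells:
        units[watershed_id].append((row, col, accumulation))
    return dict(units)


def extract_channels(unit, threshold):
    """Per-unit job: keep cells whose accumulation meets the threshold."""
    return [(row, col) for row, col, acc in unit if acc >= threshold]


def merge(results):
    """Merge step: combine the per-watershed outputs into one sorted list."""
    merged = []
    for watershed_id in sorted(results):
        merged.extend(results[watershed_id])
    return sorted(merged)


# Hypothetical 2 x 2 grid split across two watersheds, "A" and "B".
cells = [(0, 0, "A", 5), (0, 1, "A", 12), (1, 0, "B", 20), (1, 1, "B", 3)]
units = partition_by_watershed(cells)
# Each unit is independent, so each call below could run as a separate job.
results = {ws: extract_channels(unit, threshold=10) for ws, unit in units.items()}
network = merge(results)
```

Because no unit reads another unit's cells, the dictionary comprehension could be replaced by any job scheduler without changing the merge step.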
Dataset on predictive compressive strength model for self-compacting concrete.

Science.gov (United States)

Ofuyatan, O M; Edeki, S O

2018-04-01

The determination of compressive strength is affected by many variables such as the water cement (WC) ratio, the superplasticizer (SP), the aggregate combination, and the binder combination. In this dataset article, 7, 28, and 90-day compressive strength models are derived using statistical analysis. The response surface methodology is used to investigate the effect of the parameters (varying percentages of ash, cement, WC, and SP) on a hardened property, compressive strength, at 7, 28 and 90 days. The levels of the independent parameters are determined based on preliminary experiments. The experimental values for compressive strength at 7, 28 and 90 days and modulus of elasticity under different treatment conditions are also discussed and presented. These data can effectively be used for modelling and prediction in concrete production settings.
The effects of spatial population dataset choice on estimates of population at risk of disease

Directory of Open Access Journals (Sweden)

Gething Peter W

2011-02-01

Full Text Available Abstract Background: The spatial modeling of infectious disease distributions and dynamics is increasingly being undertaken for health services planning and disease control monitoring, implementation, and evaluation. Where risks are heterogeneous in space or dependent on person-to-person transmission, spatial data on human population distributions are required to estimate infectious disease risks, burdens, and dynamics. Several different modeled human population distribution datasets are available and widely used, but the disparities among them and the implications for enumerating disease burdens and populations at risk have not been considered systematically. Here, we quantify some of these effects using global estimates of populations at risk (PAR) of P. falciparum malaria as an example. Methods: The recent construction of a global map of P. falciparum malaria endemicity enabled the testing of different gridded population datasets for providing estimates of PAR by endemicity class. The estimated population numbers within each class were calculated for each country using four different global gridded human population datasets: GRUMP (~1 km spatial resolution), LandScan (~1 km), UNEP Global Population Databases (~5 km), and GPW3 (~5 km). More detailed assessments of PAR variation and accuracy were conducted for three African countries where census data were available at a higher administrative-unit level than used by any of the four gridded population datasets. Results: The estimates of PAR based on the datasets varied by more than 10 million people for some countries, even accounting for the fact that estimates of population totals made by different agencies are used to correct national totals in these datasets and can vary by more than 5% for many low-income countries. In many cases, these variations in PAR estimates comprised more than 10% of the total national population. The detailed country-level assessments suggested that none of the datasets was

AFSC/REFM: Seabird food habits dataset of the North Pacific

Data.gov (United States)

National Oceanic and Atmospheric Administration, Department of Commerce — The seabird food habits dataset contains information on the stomach contents from seabird specimens that were collected under salvage and scientific collection...
Dataset of Phenology of Mediterranean high-mountain meadows flora (Sierra Nevada, Spain)

Science.gov (United States)

Pérez-Luque, Antonio Jesús; Sánchez-Rojas, Cristina Patricia; Zamora, Regino; Pérez-Pérez, Ramón; Bonet, Francisco Javier

2015-01-01

The Sierra Nevada mountain range (southern Spain) hosts a high number of endemic plant species, being one of the most important biodiversity hotspots in the Mediterranean basin. The high-mountain meadow ecosystems (borreguiles) harbour a large number of endemic and threatened plant species. In this data paper, we describe a dataset of the flora inhabiting this threatened ecosystem in this Mediterranean mountain range. The dataset includes occurrence data for flora collected in those ecosystems in two periods: 1988–1990 and 2009–2013. A total of 11002 occurrence records belonging to 19 orders, 28 families, and 52 genera were collected. In total, 73 taxa were recorded, 29 of them threatened. We also included cover-abundance data and phenology attributes for the records. The dataset is included in the Sierra Nevada Global-Change Observatory (OBSNEV), a long-term research project designed to compile socio-ecological information on the major ecosystem types in order to identify the impacts of global change in this area. PMID:25878552
Dataset on information strategies for energy conservation: A field experiment in India.

Science.gov (United States)

Chen, Victor L; Delmas, Magali A; Locke, Stephen L; Singh, Amarjeet

2018-02-01

The data presented in this article are related to the research article entitled: "Information strategies for energy conservation: a field experiment in India" (Chen et al., 2017) [1]. The availability of high-resolution electricity data offers benefits to both utilities and consumers in understanding the dynamics of energy consumption, for example, between billing periods or at times of peak demand. However, few public datasets with high temporal resolution on electricity use have been available to researchers, especially at the appliance level. This article describes data collected in a residential field experiment for 19 apartments at an Indian faculty housing complex during the period from August 1, 2013 to May 12, 2014. The dataset includes detailed information about electricity consumption. It also includes information on apartment characteristics and hourly weather variation to enable further studies of energy performance. These data can be used by researchers as training datasets to evaluate electricity consumption.
Valuation of large variable annuity portfolios: Monte Carlo simulation and synthetic datasets

Directory of Open Access Journals (Sweden)

Gan Guojun

2017-12-01

Full Text Available Metamodeling techniques have recently been proposed to address the computational issues related to the valuation of large portfolios of variable annuity contracts. However, it is extremely difficult, if not impossible, for researchers to obtain real datasets from insurance companies in order to test their metamodeling techniques on such real datasets and publish the results in academic journals. To facilitate the development and dissemination of research related to the efficient valuation of large variable annuity portfolios, this paper creates a large synthetic portfolio of variable annuity contracts based on the properties of real portfolios of variable annuities and implements a simple Monte Carlo simulation engine for valuing the synthetic portfolio. In addition, this paper presents fair market values and Greeks for the synthetic portfolio of variable annuity contracts that are important quantities for managing the financial risks associated with variable annuities. The resulting datasets can be used by researchers to test and compare the performance of various metamodeling techniques.
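To make the idea of a Monte Carlo valuation engine concrete, here is a minimal sketch that values one stylized guarantee. It is not the paper's engine. It assumes a single fund following risk-neutral geometric Brownian motion, no fees, no mortality or lapses, and a guaranteed minimum maturity benefit paying max(guarantee - account, 0) at maturity. The function name and all parameter values are hypothetical.

```python
import math
import random


def mc_gmmb_value(account, guarantee, rate, vol, years, n_paths, seed=0):
    """Monte Carlo value of a stylized guaranteed minimum maturity benefit.

    Assumes one fund under risk-neutral geometric Brownian motion and a
    payoff of max(guarantee - terminal account value, 0) at maturity.
    """
    rng = random.Random(seed)
    drift = (rate - 0.5 * vol ** 2) * years
    diffusion = vol * math.sqrt(years)
    total_payoff = 0.0
    for _ in range(n_paths):
        # Simulate the terminal account value in a single step.
        terminal = account * math.exp(drift + diffusion * rng.gauss(0.0, 1.0))
        total_payoff += max(guarantee - terminal, 0.0)
    # Discount the average payoff back to today.
    return math.exp(-rate * years) * total_payoff / n_paths


# Hypothetical contract: 100 invested, 100 guaranteed after 10 years.
value = mc_gmmb_value(account=100.0, guarantee=100.0, rate=0.03,
                      vol=0.2, years=10.0, n_paths=20000, seed=1)
```

Under these assumptions the payoff is that of a European put, so the estimate can be checked against the Black-Scholes formula. Greeks could be approximated by revaluing with a bumped input and the same seed, and valuing a portfolio would repeat this per contract, which is the computational cost that metamodeling aims to reduce.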
A public dataset of overground and treadmill walking kinematics and kinetics in healthy individuals

Directory of Open Access Journals (Sweden)

Claudiane A. Fukuchi

2018-04-01

Full Text Available In a typical clinical gait analysis, the gait patterns of pathological individuals are commonly compared with the typically faster, comfortable pace of healthy subjects. However, due to potential bias related to gait speed, this comparison may not be valid. Publicly available gait datasets have failed to address this issue. Therefore, the goal of this study was to present a publicly available dataset of 42 healthy volunteers (24 young adults and 18 older adults) who walked both overground and on a treadmill at a range of gait speeds. Their lower-extremity and pelvis kinematics were measured using a three-dimensional (3D) motion-capture system. The external forces during both overground and treadmill walking were collected using force plates and an instrumented treadmill, respectively. The results include both raw and processed kinematic and kinetic data in different file formats: c3d and ASCII files. In addition, a metadata file is provided that contains demographic and anthropometric data and data related to each file in the dataset. All data are available at Figshare (DOI: 10.6084/m9.figshare.5722711). We foresee several applications of this public dataset, including to examine the influences of speed, age, and environment (overground vs. treadmill) on gait biomechanics, to meet educational needs, and, with the inclusion of additional participants, to use as a normative dataset.
Integration of geophysical datasets by a conjoint probability tomography approach: application to Italian active volcanic areas

Directory of Open Access Journals (Sweden)

D. Patella

2008-06-01

Full Text Available We expand the theory of probability tomography to the integration of different geophysical datasets. The aim of the new method is to improve the information quality using a conjoint occurrence probability function addressed to highlight the existence of common sources of anomalies. The new method is tested on gravity, magnetic and self-potential datasets collected in the volcanic area of Mt. Vesuvius (Naples), and on gravity and dipole geoelectrical datasets collected in the volcanic area of Mt. Etna (Sicily). The application demonstrates that, from a probabilistic point of view, the integrated analysis can delineate the signature of some important volcanic targets better than the analysis of the tomographic image of each dataset considered separately.
Data-Driven Decision Support for Radiologists: Re-using the National Lung Screening Trial Dataset for Pulmonary Nodule Management

OpenAIRE

Morrison, James J.; Hostetter, Jason; Wang, Kenneth; Siegel, Eliot L.

2014-01-01

Real-time mining of large research trial datasets enables development of case-based clinical decision support tools. Several applicable research datasets exist including the National Lung Screening Trial (NLST), a dataset unparalleled in size and scope for studying population-based lung cancer screening. Using these data, a clinical decision support tool was developed which matches patient demographics and lung nodule characteristics to a cohort of similar patients. The NLST dataset was conve...
Comparison of analysis of the QTLMAS XII common dataset

DEFF Research Database (Denmark)

Lund, Mogens Sandø; Sahana, Goutam; de Koning, Dirk-Jan

2009-01-01

A dataset was simulated and distributed to participants of the QTLMAS XII workshop who were invited to develop genomic selection models. Each contributing group was asked to describe the model development and validation as well as to submit genomic predictions for three generations of individuals...
TerraClimate, a high-resolution global dataset of monthly climate and climatic water balance from 1958-2015

Science.gov (United States)

Abatzoglou, John T.; Dobrowski, Solomon Z.; Parks, Sean A.; Hegewisch, Katherine C.

2018-01-01

We present TerraClimate, a dataset of high-spatial resolution (1/24°, ~4-km) monthly climate and climatic water balance for global terrestrial surfaces from 1958-2015. TerraClimate uses climatically aided interpolation, combining high-spatial resolution climatological normals from the WorldClim dataset, with coarser resolution time varying (i.e., monthly) data from other sources to produce a monthly dataset of precipitation, maximum and minimum temperature, wind speed, vapor pressure, and solar radiation. TerraClimate additionally produces monthly surface water balance datasets using a water balance model that incorporates reference evapotranspiration, precipitation, temperature, and interpolated plant extractable soil water capacity. These data provide important inputs for ecological and hydrological studies at global scales that require high spatial resolution and time varying climate and climatic water balance data. We validated spatiotemporal aspects of TerraClimate using annual temperature, precipitation, and calculated reference evapotranspiration from station data, as well as annual runoff from streamflow gauges. TerraClimate datasets showed noted improvement in overall mean absolute error and increased spatial realism relative to coarser resolution gridded datasets.
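Climatically aided interpolation, as named in this abstract, can be illustrated in one dimension: a coarse time-varying anomaly is interpolated to the fine grid and added to the high-resolution climatological normal. The sketch below is not the TerraClimate code. The function names, the linear interpolation, the additive anomaly, and the toy numbers are assumptions. Ratio anomalies are commonly used for variables such as precipitation.

```python
def interpolate(coarse_x, coarse_vals, x):
    """Piecewise-linear interpolation of coarse values, clamped at the ends."""
    if x <= coarse_x[0]:
        return coarse_vals[0]
    if x >= coarse_x[-1]:
        return coarse_vals[-1]
    for i in range(1, len(coarse_x)):
        if x <= coarse_x[i]:
            x0, x1 = coarse_x[i - 1], coarse_x[i]
            t = (x - x0) / (x1 - x0)
            return coarse_vals[i - 1] + t * (coarse_vals[i] - coarse_vals[i - 1])


def climatically_aided(fine_x, fine_normals, coarse_x, coarse_anomalies):
    """Add interpolated coarse anomalies to fine-resolution normals."""
    return [
        normal + interpolate(coarse_x, coarse_anomalies, x)
        for x, normal in zip(fine_x, fine_normals)
    ]


# Hypothetical transect: five fine cells, two coarse anomaly nodes.
fine_x = [0.0, 1.0, 2.0, 3.0, 4.0]
fine_normals = [10.0, 12.0, 9.0, 11.0, 10.0]   # e.g. monthly normals, deg C
coarse_x = [0.0, 4.0]
coarse_anomalies = [1.0, 3.0]                  # this month's departures
monthly = climatically_aided(fine_x, fine_normals, coarse_x, coarse_anomalies)
```

The fine-scale spatial pattern comes entirely from the normals, while the month-to-month variation comes from the smoothly varying anomaly field.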
Gridded precipitation dataset for the Rhine basin made with the genRE interpolation method

NARCIS (Netherlands)

Osnabrugge, van B.; Uijlenhoet, R.

2017-01-01

A high resolution (1.2x1.2km) gridded precipitation dataset with hourly time step that covers the whole Rhine basin for the period 1997-2015. Made from gauge data with the genRE interpolation scheme. See "genRE: A method to extend gridded precipitation climatology datasets in near real-time for
Computational Methods for Large Spatio-temporal Datasets and Functional Data Ranking

KAUST Repository

Huang, Huang

2017-07-16

This thesis focuses on two topics, computational methods for large spatial datasets and functional data ranking. Both are tackling the challenges of big and high-dimensional data. The first topic is motivated by the prohibitive computational burden in fitting Gaussian process models to large and irregularly spaced spatial datasets. Various approximation methods have been introduced to reduce the computational cost, but many rely on unrealistic assumptions about the process and retaining statistical efficiency remains an issue. We propose a new scheme to approximate the maximum likelihood estimator and the kriging predictor when the exact computation is infeasible. The proposed method provides different types of hierarchical low-rank approximations that are both computationally and statistically efficient. We explore the improvement of the approximation theoretically and investigate the performance by simulations. For real applications, we analyze a soil moisture dataset with 2 million measurements with the hierarchical low-rank approximation and apply the proposed fast kriging to fill gaps for satellite images. The second topic is motivated by rank-based outlier detection methods for functional data. Compared to magnitude outliers, it is more challenging to detect shape outliers as they are often masked among samples. We develop a new notion of functional data depth by taking the integration of a univariate depth function. Having a form of the integrated depth, it shares many desirable features. Furthermore, the novel formation leads to a useful decomposition for detecting both shape and magnitude outliers. Our simulation studies show the proposed outlier detection procedure outperforms competitors in various outlier models. We also illustrate our methodology using real datasets of curves, images, and video frames. Finally, we introduce the functional data ranking technique to spatio-temporal statistics for visualizing and assessing covariance properties, such as
SPATIO-TEMPORAL DATA MODEL FOR INTEGRATING EVOLVING NATION-LEVEL DATASETS

Directory of Open Access Journals (Sweden)

A. Sorokine

2017-10-01

Full Text Available The ability to easily combine data from diverse sources in a single analytical workflow is one of the greatest promises of Big Data technologies. However, such integration is often challenging, as datasets originate from different vendors, governments, and research communities, which results in multiple incompatibilities, including in data representations, formats, and semantics. Semantic differences are the hardest to handle: different communities often use different attribute definitions and associate the records with different sets of evolving geographic entities. Analysis of global socioeconomic variables across multiple datasets over prolonged time is often complicated by differences in how boundaries and histories of countries or other geographic entities are represented. Here we propose an event-based data model for depicting and tracking histories of evolving geographic units (countries, provinces, etc.) and their representations in disparate data. The model addresses the semantic challenge of preserving the identity of geographic entities over time by defining criteria for the entity's existence, a set of events that may affect its existence, and rules for mapping between different representations (datasets). The proposed model is used for maintaining an evolving compound database of global socioeconomic and environmental data harvested from multiple sources. Practical implementation of our model is demonstrated using the PostgreSQL object-relational database with the use of temporal, geospatial, and NoSQL database extensions.
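An event-based model of this kind can be sketched in a few lines. This is not the authors' schema, which is implemented in PostgreSQL. The `Event` class, the two event kinds, and the `exists` and `resolve` helpers are hypothetical, chosen only to show how existence and mapping rules can be derived from a list of dated events.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    when: date
    kind: str               # "created" or "dissolved" (hypothetical kinds)
    entity: str
    successors: tuple = ()  # entities that take over after a dissolution


def exists(events, entity, on):
    """True if the entity's latest event on or before `on` created it."""
    alive = False
    for event in sorted(events, key=lambda e: e.when):
        if event.entity == entity and event.when <= on:
            alive = event.kind == "created"
    return alive


def resolve(events, entity, on):
    """Map an entity to the unit or units that represent it on a date."""
    if exists(events, entity, on):
        return [entity]
    resolved = []
    for event in events:
        if (event.entity == entity and event.kind == "dissolved"
                and event.when <= on):
            for successor in event.successors:
                resolved.extend(resolve(events, successor, on))
    return resolved


EVENTS = [
    Event(date(1918, 10, 28), "created", "Czechoslovakia"),
    Event(date(1993, 1, 1), "dissolved", "Czechoslovakia",
          ("Czech Republic", "Slovakia")),
    Event(date(1993, 1, 1), "created", "Czech Republic"),
    Event(date(1993, 1, 1), "created", "Slovakia"),
]

units_in_2000 = resolve(EVENTS, "Czechoslovakia", date(2000, 1, 1))
```

A record tagged with a dissolved entity can then be attributed to its successors for any analysis date. Mergers and boundary changes would need additional event kinds.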
National Hydrography Dataset (NHD) - USGS National Map Downloadable Data Collection

Data.gov (United States)

U.S. Geological Survey, Department of the Interior — The USGS National Hydrography Dataset (NHD) Downloadable Data Collection from The National Map (TNM) is a comprehensive set of digital spatial data that encodes...
Watershed Boundary Dataset (WBD) - USGS National Map Downloadable Data Collection

Data.gov (United States)

U.S. Geological Survey, Department of the Interior — The Watershed Boundary Dataset (WBD) from The National Map (TNM) defines the perimeter of drainage areas formed by the terrain and other landscape characteristics....
Investigations of bull trout (Salvelinus confluentus), steelhead trout (Oncorhynchus mykiss), and spring chinook salmon (O. tshawytscha) interactions in Southeast Washington streams. Final report 1992; FINAL

International Nuclear Information System (INIS)

Underwood, K.D.; Martin, S.W.; Schuck, M.L.; Scholz, A.T.

1995-01-01

The goal of this two-year study was to determine if supplementation with hatchery-reared steelhead trout (Oncorhynchus mykiss) and spring chinook salmon (O. tshawytscha) negatively impacted wild native bull trout (Salvelinus confluentus) through competitive interactions. Four streams with varying levels of fish supplementation activity were sampled in Southeast Washington. Tasks performed during this study were population density, relative abundance, microhabitat utilization, habitat availability, diet analysis, bull trout spawning ground surveys, radio telemetry of adult bull trout, and growth analysis. Results indicate that bull trout overlapped geographically with the supplemented species in each of the study streams, suggesting that competition among species was possible. Within a stream, bull trout and the supplemented species utilized dissimilar microhabitats, and microhabitat utilization by each species was the same among streams, suggesting that there were no shifts in microhabitat utilization among streams. The diets of bull trout and O. mykiss significantly overlapped in each of the study streams. The stream most intensely supplemented contained bull trout with the slowest growth and the non-supplemented stream contained bull trout with the fastest growth. Conversely, the stream most intensely supplemented contained steelhead with the fastest growth and the non-supplemented stream contained steelhead with the slowest growth. Growth indicated that bull trout may have been negatively impacted by supplementation, although other factors may have contributed. At current population levels, and current habitat quantity and quality, no impacts to bull trout as a result of supplementation with hatchery-reared steelhead trout and spring chinook salmon were detected. Project limitations and future research recommendations are discussed.
Dataset of anomalies and malicious acts in a cyber-physical subsystem.

Science.gov (United States)

Laso, Pedro Merino; Brosset, David; Puentes, John

2017-10-01

This article presents a dataset produced to investigate how data and information quality estimations enable the detection of anomalies and malicious acts in cyber-physical systems. Data were acquired making use of a cyber-physical subsystem consisting of liquid containers for fuel or water, along with its automated control and data acquisition infrastructure. Described data consist of temporal series representing five operational scenarios - normal, anomalies, breakdown, sabotages, and cyber-attacks - corresponding to 15 different real situations. The dataset is publicly available in the .zip file published with the article, to investigate and compare faulty operation detection and characterization methods for cyber-physical systems.
Datasets of mung bean proteins and metabolites from four different cultivars

Directory of Open Access Journals (Sweden)

Akiko Hashiguchi

2017-08-01

Plants produce a wide array of nutrients that exert synergistic interaction among whole combinations of nutrients. Therefore comprehensive nutrient profiling is required to evaluate their nutritional/nutraceutical value and health promoting effect. In order to obtain such datasets for mung bean, which is known as a medicinal plant with heat alleviating effect, proteomic and metabolomic analyses were performed using four cultivars from China, Thailand, and Myanmar. In total, 449 proteins and 210 metabolic compounds were identified in seed coat; whereas 480 proteins and 217 metabolic compounds were detected in seed flesh, establishing the first comprehensive dataset of mung bean for nutraceutical evaluation.
participatory development of a minimum dataset for the khayelitsha ...

African Journals Online (AJOL)

This dataset was integrated with data requirements at ... model for defining health information needs at district level. This participatory process has enabled health workers to appraise their .... of reproductive health, mental health, disability and community ... each chose a facilitator and met in between the forum meetings.
Re-inspection of small RNA sequence datasets reveals several novel human miRNA genes.

Directory of Open Access Journals (Sweden)

Thomas Birkballe Hansen

BACKGROUND: miRNAs are key players in gene expression regulation. To fully understand the complex nature of cellular differentiation or initiation and progression of disease, it is important to assess the expression patterns of as many miRNAs as possible. Thereby, identifying novel miRNAs is an essential prerequisite to make possible a comprehensive and coherent understanding of cellular biology. METHODOLOGY/PRINCIPAL FINDINGS: Based on two extensive, but previously published, small RNA sequence datasets from human embryonic stem cells and human embryoid bodies, respectively [1], we identified 112 novel miRNA-like structures and were able to validate miRNA processing in 12 out of 17 investigated cases. Several miRNA candidates were furthermore substantiated by including additional available small RNA datasets, thereby demonstrating the power of combining datasets to identify miRNAs that otherwise may be assigned as experimental noise. CONCLUSIONS/SIGNIFICANCE: Our analysis highlights that existing datasets are not yet exhaustively studied and continuous re-analysis of the available data is important to uncover all features of small RNA sequencing.

An enhanced topologically significant directed random walk in cancer classification using gene expression datasets

Directory of Open Access Journals (Sweden)

Choon Sen Seah

2017-12-01

Microarray technology has become one of the elementary tools for researchers to study the genome of organisms. As the complexity and heterogeneity of cancer is being increasingly appreciated through genomic analysis, cancer classification is an emerging important trend. Significant directed random walk is proposed as a cancer classification approach that has higher sensitivity of risk gene prediction and higher accuracy of cancer classification. In this paper, the methodology and material used for the experiment are presented. A tuning parameter selection method and weight as a parameter are applied in the proposed approach. A gene expression dataset is used as the input dataset, while a pathway dataset is used to build a directed graph, as the reference dataset, to complete the bias process in the random walk approach. In addition, we demonstrate that our approach can improve sensitive predictions with higher accuracy and biologically meaningful classification results. Significant directed random walk is compared with directed random walk to show the improvement in terms of sensitivity of prediction and accuracy of cancer classification.
Merged SAGE II, Ozone_cci and OMPS ozone profile dataset and evaluation of ozone trends in the stratosphere

Directory of Open Access Journals (Sweden)

V. F. Sofieva

2017-10-01

In this paper, we present a merged dataset of ozone profiles from several satellite instruments: SAGE II on ERBS, GOMOS, SCIAMACHY and MIPAS on Envisat, OSIRIS on Odin, ACE-FTS on SCISAT, and OMPS on Suomi-NPP. The merged dataset is created in the framework of the European Space Agency Climate Change Initiative (Ozone_cci) with the aim of analyzing stratospheric ozone trends. For the merged dataset, we used the latest versions of the original ozone datasets. The datasets from the individual instruments have been extensively validated and intercompared; only those datasets which are in good agreement, and do not exhibit significant drifts with respect to collocated ground-based observations and with respect to each other, are used for merging. The long-term SAGE–CCI–OMPS dataset is created by computation and merging of deseasonalized anomalies from individual instruments. The merged SAGE–CCI–OMPS dataset consists of deseasonalized anomalies of ozone in 10° latitude bands from 90° S to 90° N and from 10 to 50 km in steps of 1 km covering the period from October 1984 to July 2016. This newly created dataset is used for evaluating ozone trends in the stratosphere through multiple linear regression. Negative ozone trends in the upper stratosphere are observed before 1997 and positive trends are found after 1997. The upper stratospheric trends are statistically significant at midlatitudes and indicate ozone recovery, as expected from the decrease of stratospheric halogens that started in the middle of the 1990s and stratospheric cooling.
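
The idea of a deseasonalized anomaly used in the record above can be illustrated with a minimal sketch. It assumes a monthly series and the simplest possible definition (each value minus the mean of all values for the same calendar month); the merged dataset's actual procedure, such as the use of relative anomalies or a specific reference period, may differ. The function name and the synthetic series are hypothetical.

```python
import math
from collections import defaultdict

def deseasonalized_anomalies(values, months):
    """Subtract each calendar month's mean from the values of that month.

    Assumed definition for illustration only; not taken from the paper.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, m in zip(values, months):
        sums[m] += v
        counts[m] += 1
    monthly_mean = {m: sums[m] / counts[m] for m in sums}
    return [v - monthly_mean[m] for v, m in zip(values, months)]

# Synthetic example: 3 years of monthly data = seasonal cycle + linear trend.
months = [(i % 12) + 1 for i in range(36)]
series = [5.0 + math.sin(2 * math.pi * (m - 1) / 12) + 0.01 * i
          for i, m in enumerate(months)]
anom = deseasonalized_anomalies(series, months)
# The seasonal cycle cancels; only the trend (0.12 per year) remains.
```

Removing the seasonal cycle first is what lets a subsequent regression, such as the multiple linear regression mentioned in the abstract, pick up the slow trend rather than the annual oscillation.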
Supervised Variational Relevance Learning, An Analytic Geometric Feature Selection with Applications to Omic Datasets.

Science.gov (United States)

Boareto, Marcelo; Cesar, Jonatas; Leite, Vitor B P; Caticha, Nestor

2015-01-01

We introduce Supervised Variational Relevance Learning (Suvrel), a variational method to determine metric tensors to define distance-based similarity in pattern classification, inspired by relevance learning. The variational method is applied to a cost function that penalizes large intraclass distances and favors small interclass distances. We find analytically the metric tensor that minimizes the cost function. Preprocessing the patterns by doing linear transformations using the metric tensor yields a dataset which can be more efficiently classified. We test our methods using publicly available datasets, for some standard classifiers. Among these datasets, two were tested by the MAQC-II project and, even without the use of further preprocessing, our results improve on their performance.
Genome-wide gene expression dataset used to identify potential therapeutic targets in androgenetic alopecia

Directory of Open Access Journals (Sweden)

R. Dey-Rao

2017-08-01

The microarray dataset attached to this report is related to the research article with the title: “A genomic approach to susceptibility and pathogenesis leads to identifying potential novel therapeutic targets in androgenetic alopecia” (Dey-Rao and Sinha, 2017) [1]. Male-pattern hair loss that is induced by androgens (testosterone) in genetically predisposed individuals is known as androgenetic alopecia (AGA). The raw dataset is being made publicly available to enable critical and/or extended analyses. Our related research paper utilizes the attached raw dataset, for genome-wide gene-expression associated investigations. Combined with several in silico bioinformatics-based analyses we were able to delineate five strategic molecular elements as potential novel targets towards future AGA-therapy.
Semi-supervised tracking of extreme weather events in global spatio-temporal climate datasets

Science.gov (United States)

Kim, S. K.; Prabhat, M.; Williams, D. N.

2017-12-01

Deep neural networks have been successfully applied to detect extreme weather events in large-scale climate datasets and attain superior performance that overshadows all previous hand-crafted methods. Recent work has shown that a multichannel spatiotemporal encoder-decoder CNN architecture is able to localize events using semi-supervised bounding boxes. Motivated by this work, we propose a new learning metric based on Variational Auto-Encoders (VAE) and Long-Short-Term-Memory (LSTM) to track extreme weather events in spatio-temporal datasets. We consider spatio-temporal object tracking problems as learning the probabilistic distribution of continuous latent features of an auto-encoder using stochastic variational inference. For this, we assume that our datasets are i.i.d. and that the latent features can be modeled by a Gaussian distribution. In the proposed metric, we first train the VAE to generate an approximate posterior given multichannel climate input with an extreme climate event at a fixed time. Then, we predict the bounding box, location and class of extreme climate events using convolutional layers given input concatenating three features including embedding, sampled mean and standard deviation. Lastly, we train the LSTM with concatenated input to learn temporal information of the dataset by recurrently feeding output back to the next time-step's input of the VAE. Our contribution is two-fold. First, we show the first semi-supervised end-to-end architecture based on VAE to track extreme weather events, which can apply to massive-scale unlabeled climate datasets. Second, the information on temporal movement of events is considered for bounding box prediction using LSTM, which can improve accuracy of localization. To our knowledge, this technique has been explored neither in the climate community nor in the machine learning community.
Software ion scan functions in analysis of glycomic and lipidomic MS/MS datasets.

Science.gov (United States)

Haramija, Marko

2018-03-01

Hardware ion scan functions unique to tandem mass spectrometry (MS/MS) mode of data acquisition, such as precursor ion scan (PIS) and neutral loss scan (NLS), are important for selective extraction of key structural data from complex MS/MS spectra. However, their software counterparts, software ion scan (SIS) functions, are still not regularly available. Software ion scan functions can be easily coded for additional functionalities, such as software multiple precursor ion scan, software no ion scan, and software variable ion scan functions. These are often necessary, since they allow more efficient analysis of complex MS/MS datasets, often encountered in glycomics and lipidomics. Software ion scan functions can be easily coded by using modern scripting languages and can be independent of instrument manufacturer. Here we demonstrate the utility of SIS functions on a medium-size glycomic MS/MS dataset. Knowledge of sample properties, as well as of diagnostic and conditional diagnostic ions crucial for data analysis, was needed. Based on the tables constructed with the output data from the SIS functions performed, a detailed analysis of a complex MS/MS glycomic dataset could be carried out in a quick, accurate, and efficient manner. Glycomic research is progressing slowly, and with respect to the MS experiments, one of the key obstacles for moving forward is the lack of appropriate bioinformatic tools necessary for fast analysis of glycomic MS/MS datasets. Adding novel SIS functionalities to the glycomic MS/MS toolbox has a potential to significantly speed up the glycomic data analysis process. Similar tools are useful for analysis of lipidomic MS/MS datasets as well, as will be discussed briefly. Copyright © 2017 John Wiley & Sons, Ltd.
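
As a rough illustration of how software counterparts of PIS and NLS might be scripted, the sketch below filters a list of already-parsed MS/MS spectra. It is not the authors' implementation: the `Spectrum` structure, the function names, the tolerance, the assumption of singly charged ions, and the example m/z values are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Spectrum:
    scan_id: int
    precursor_mz: float
    fragments: List[float]  # fragment ion m/z values (hypothetical layout)

def software_precursor_ion_scan(spectra, diagnostic_mz, tol=0.01):
    """Keep spectra that contain a given diagnostic fragment ion."""
    return [s for s in spectra
            if any(abs(f - diagnostic_mz) <= tol for f in s.fragments)]

def software_neutral_loss_scan(spectra, loss, tol=0.01):
    """Keep spectra where precursor minus some fragment equals the loss.

    Assumes singly charged precursor and fragment ions.
    """
    return [s for s in spectra
            if any(abs((s.precursor_mz - f) - loss) <= tol for f in s.fragments)]

# Illustrative spectra with made-up peak lists.
spectra = [
    Spectrum(1, 528.19, [204.087, 366.14]),
    Spectrum(2, 690.24, [528.19, 366.14]),
    Spectrum(3, 445.12, [300.00, 120.04]),
]
pis_hits = [s.scan_id for s in software_precursor_ion_scan(spectra, 204.087)]
nls_hits = [s.scan_id for s in software_neutral_loss_scan(spectra, 162.05)]
```

Because the filters operate on parsed peak lists rather than on the instrument, the same code would apply to data from any vendor once the spectra are exported to a common format.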
On standardization of basic datasets of electronic medical records in traditional Chinese medicine.

Science.gov (United States)

Zhang, Hong; Ni, Wandong; Li, Jing; Jiang, Youlin; Liu, Kunjing; Ma, Zhaohui

2017-12-24

Standardization of electronic medical records, so as to enable resource-sharing and information exchange among medical institutions, has become inevitable in view of the ever-increasing volume of medical information. The current research is an effort towards the standardization of basic datasets of electronic medical records in traditional Chinese medicine. In this work, an outpatient clinical information model and an inpatient clinical information model are created to adequately depict the diagnosis processes and treatment procedures of traditional Chinese medicine. To be backward compatible with the existing dataset standard created for western medicine, the new standard shall be a superset of the existing standard. Thus, the two models are checked against the existing standard in conjunction with 170,000 medical record cases. If a case cannot be covered by the existing standard due to the particularity of Chinese medicine, then either an existing data element is expanded with some Chinese medicine contents or a new data element is created. Some dataset subsets are also created to group and record Chinese medicine special diagnoses and treatments such as acupuncture. The outcome of this research is a proposal of standardized traditional Chinese medicine medical records datasets. The proposal has been verified successfully in three medical institutions with hundreds of thousands of medical records. A new dataset standard for traditional Chinese medicine is proposed in this paper. The proposed standard, covering traditional Chinese medicine as well as western medicine, is expected to be soon approved by the authority. A widespread adoption of this proposal will enable traditional Chinese medicine hospitals and institutions to easily exchange information and share resources. Copyright © 2017. Published by Elsevier B.V.
A large-scale dataset of solar event reports from automated feature recognition modules

Science.gov (United States)

Schuh, Michael A.; Angryk, Rafal A.; Martens, Petrus C.

2016-05-01

The massive repository of images of the Sun captured by the Solar Dynamics Observatory (SDO) mission has ushered in the era of Big Data for Solar Physics. In this work, we investigate the entire public collection of events reported to the Heliophysics Event Knowledgebase (HEK) from automated solar feature recognition modules operated by the SDO Feature Finding Team (FFT). With the SDO mission recently surpassing five years of operations, and over 280,000 event reports for seven types of solar phenomena, we present the broadest and most comprehensive large-scale dataset of the SDO FFT modules to date. We also present numerous statistics on these modules, providing valuable contextual information for better understanding and validating of the individual event reports and the entire dataset as a whole. After extensive data cleaning through exploratory data analysis, we highlight several opportunities for knowledge discovery from data (KDD). Through these important prerequisite analyses presented here, the results of KDD from Solar Big Data will be overall more reliable and better understood. As the SDO mission remains operational over the coming years, these datasets will continue to grow in size and value. Future versions of this dataset will be analyzed in the general framework established in this work and maintained publicly online for easy access by the community.
Megastudies, crowdsourcing, and large datasets in psycholinguistics: An overview of recent developments.

Science.gov (United States)

Keuleers, Emmanuel; Balota, David A

2015-01-01

This paper introduces and summarizes the special issue on megastudies, crowdsourcing, and large datasets in psycholinguistics. We provide a brief historical overview and show how the papers in this issue have extended the field by compiling new databases and making important theoretical contributions. In addition, we discuss several studies that use text corpora to build distributional semantic models to tackle various interesting problems in psycholinguistics. Finally, as is the case across the papers, we highlight some methodological issues that are brought forth via the analyses of such datasets.
Estimation of Missed Statin Prescription Use in an Administrative Claims Dataset.

Science.gov (United States)

Wade, Rolin L; Patel, Jeetvan G; Hill, Jerrold W; De, Ajita P; Harrison, David J

2017-09-01

Nonadherence to statin medications is associated with increased risk of cardiovascular disease and poses a challenge to lipid management in patients who are at risk for atherosclerotic cardiovascular disease. Numerous studies have examined statin adherence based on administrative claims data; however, these data may underestimate statin use in patients who participate in generic drug discount programs or who have alternative coverage. To estimate the proportion of patients with missing statin claims in a claims database and determine how missing claims affect commonly used utilization metrics. This retrospective cohort study used pharmacy data from the PharMetrics Plus (P+) claims dataset linked to the IMS longitudinal pharmacy point-of-sale prescription database (LRx) from January 1, 2012, through December 31, 2014. Eligible patients were represented in the P+ and LRx datasets, had ≥1 claim for a statin (index claim) in either database, and had ≥ 24 months of continuous enrollment in P+. Patients were linked between P+ and LRx using a deterministic method. Duplicate claims between LRx and P+ were removed to produce a new dataset comprised of P+ claims augmented with LRx claims. Statin use was then compared between P+ and the augmented P+ dataset. Utilization metrics that were evaluated included percentage of patients with ≥ 1 missing statin claim over 12 months in P+; the number of patients misclassified as new users in P+; the number of patients misclassified as nonstatin users in P+; the change in 12-month medication possession ratio (MPR) and proportion of days covered (PDC) in P+; the comparison between P+ and LRx of classifications of statin treatment patterns (statin intensity and patients with treatment modifications); and the payment status for missing statin claims. Data from 965,785 patients with statin claims in P+ were analyzed (mean age 56.6 years; 57% male). In P+, 20.1% had ≥ 1 missing statin claim post-index; 13.7% were misclassified as
Benchmarking Deep Learning Models on Large Healthcare Datasets.

Science.gov (United States)

Purushotham, Sanjay; Meng, Chuizheng; Che, Zhengping; Liu, Yan

2018-06-04

Deep learning models (aka Deep Neural Networks) have revolutionized many fields including computer vision, natural language processing, speech recognition, and are being increasingly used in clinical healthcare applications. However, few works exist which have benchmarked the performance of the deep learning models with respect to the state-of-the-art machine learning models and prognostic scoring systems on publicly available healthcare datasets. In this paper, we present the benchmarking results for several clinical prediction tasks such as mortality prediction, length of stay prediction, and ICD-9 code group prediction using Deep Learning models, ensemble of machine learning models (Super Learner algorithm), SAPS II and SOFA scores. We used the Medical Information Mart for Intensive Care III (MIMIC-III) (v1.4) publicly available dataset, which includes all patients admitted to an ICU at the Beth Israel Deaconess Medical Center from 2001 to 2012, for the benchmarking tasks. Our results show that deep learning models consistently outperform all the other approaches especially when the 'raw' clinical time series data is used as input features to the models. Copyright © 2018 Elsevier Inc. All rights reserved.
Comparison of analysis of the QTLMAS XII common dataset

DEFF Research Database (Denmark)

Crooks, Lucy; Sahana, Goutam; de Koning, Dirk-Jan

2009-01-01

As part of the QTLMAS XII workshop, a simulated dataset was distributed and participants were invited to submit analyses of the data based on genome-wide association, fine mapping and genomic selection. We have evaluated the findings from the groups that reported fine mapping and genome-wide asso...
A Bayesian trans-dimensional approach for the fusion of multiple geophysical datasets

Science.gov (United States)

JafarGandomi, Arash; Binley, Andrew

2013-09-01

We propose a Bayesian fusion approach to integrate multiple geophysical datasets with different coverage and sensitivity. The fusion strategy is based on the capability of various geophysical methods to provide enough resolution to identify either subsurface material parameters or subsurface structure, or both. We focus on electrical resistivity as the target material parameter and electrical resistivity tomography (ERT), electromagnetic induction (EMI), and ground penetrating radar (GPR) as the set of geophysical methods. However, extending the approach to different sets of geophysical parameters and methods is straightforward. Different geophysical datasets are entered into a trans-dimensional Markov chain Monte Carlo (McMC) search-based joint inversion algorithm. The trans-dimensional property of the McMC algorithm allows dynamic parameterisation of the model space, which in turn helps to avoid bias of the post-inversion results towards a particular model. Given that we are attempting to develop an approach that has practical potential, we discretize the subsurface into an array of one-dimensional earth-models. Accordingly, the ERT data that are collected by using two-dimensional acquisition geometry are recast as a set of equivalent vertical electric soundings. Different data are inverted either individually or jointly to estimate one-dimensional subsurface models at discrete locations. We use Shannon's information measure to quantify the information obtained from the inversion of different combinations of geophysical datasets. Information from multiple methods is brought together by introducing a joint likelihood function and/or constraining the prior information. A Bayesian maximum entropy approach is used for spatial fusion of spatially dispersed estimated one-dimensional models and mapping of the target parameter. We illustrate the approach with a synthetic dataset and then apply it to a field dataset. We show that the proposed fusion strategy is
SAR image dataset of military ground targets with multiple poses for ATR

Science.gov (United States)

Belloni, Carole; Balleri, Alessio; Aouf, Nabil; Merlet, Thomas; Le Caillec, Jean-Marc

2017-10-01

Automatic Target Recognition (ATR) is the task of automatically detecting and classifying targets. Recognition using Synthetic Aperture Radar (SAR) images is interesting because SAR images can be acquired at night and under any weather conditions, whereas optical sensors operating in the visible band do not have this capability. Existing SAR ATR algorithms have mostly been evaluated using the MSTAR dataset [1]. The problem with the MSTAR is that some of the proposed ATR methods have shown good classification performance even when targets were hidden [2], suggesting the presence of a bias in the dataset. Evaluations of SAR ATR techniques are currently challenging due to the lack of publicly available data in the SAR domain. In this paper, we present a high resolution SAR dataset consisting of images of a set of ground military target models taken at various aspect angles. The dataset can be used for a fair evaluation and comparison of SAR ATR algorithms. We applied the Inverse Synthetic Aperture Radar (ISAR) technique to echoes from targets rotating on a turntable and illuminated with a stepped frequency waveform. The targets in the database consist of four variants of two 1.7m-long models of T-64 and T-72 tanks. The gun, the turret position and the depression angle are varied to form 26 different sequences of images. The emitted signal spanned the frequency range from 13 GHz to 18 GHz to achieve a bandwidth of 5 GHz sampled with 4001 frequency points. The resolution obtained with respect to the size of the model targets is comparable to typical values obtained using SAR airborne systems. Single polarized images (Horizontal-Horizontal) are generated using the backprojection algorithm [3]. A total of 1480 images are produced using a 20° integration angle. The images in the dataset are organized in a suggested training and testing set to facilitate a standard evaluation of SAR ATR algorithms.
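
The acquisition parameters quoted in the record above (13 to 18 GHz, 4001 frequency points, 20° integration angle) can be turned into approximate resolution figures with standard textbook relations. The sketch below is an illustration only: the formulas are generic unweighted approximations for stepped-frequency turntable imaging, not values or methods reported by the authors, and all variable names are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in m/s

f_low_hz, f_high_hz = 13e9, 18e9   # from the abstract
n_freq = 4001                      # from the abstract
integration_angle_deg = 20.0       # from the abstract

bandwidth_hz = f_high_hz - f_low_hz

# Slant-range resolution of an unweighted waveform: c / (2 B).
range_resolution_m = C / (2 * bandwidth_hz)

# Frequency step and the resulting unambiguous range window: c / (2 df).
freq_step_hz = bandwidth_hz / (n_freq - 1)
unambiguous_range_m = C / (2 * freq_step_hz)

# Cross-range resolution at the centre frequency for an integration
# angle theta: lambda / (4 sin(theta / 2)), which reduces to
# lambda / (2 theta) for small angles.
center_wavelength_m = C / ((f_low_hz + f_high_hz) / 2)
cross_range_resolution_m = center_wavelength_m / (
    4 * math.sin(math.radians(integration_angle_deg) / 2)
)

print(f"range resolution       ~ {range_resolution_m * 100:.1f} cm")
print(f"frequency step         ~ {freq_step_hz / 1e6:.2f} MHz")
print(f"unambiguous range      ~ {unambiguous_range_m:.0f} m")
print(f"cross-range resolution ~ {cross_range_resolution_m * 100:.1f} cm")
```

Under these assumptions both resolutions come out at roughly 3 cm, which is small relative to a 1.7 m model; windowing applied during image formation would coarsen these figures somewhat.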
Internationally coordinated glacier monitoring: strategy and datasets

Science.gov (United States)

Hoelzle, Martin; Armstrong, Richard; Fetterer, Florence; Gärtner-Roer, Isabelle; Haeberli, Wilfried; Kääb, Andreas; Kargel, Jeff; Nussbaumer, Samuel; Paul, Frank; Raup, Bruce; Zemp, Michael

2014-05-01

(c) the Randolph Glacier Inventory (RGI), a new and globally complete digital dataset of outlines from about 180,000 glaciers with some meta-information, which has been used for many applications relating to the IPCC AR5 report. Concerning glacier changes, a database (Fluctuations of Glaciers) exists containing information about mass balance, front variations including past reconstructed time series, geodetic changes and special events. Annual mass balance reporting contains information for about 125 glaciers with a subset of 37 glaciers with continuous observational series since 1980 or earlier. Front variation observations of around 1800 glaciers are available from most of the mountain ranges world-wide. This database was recently updated with 26 glaciers having an unprecedented dataset of length changes from reconstructions of well-dated historical evidence going back as far as the 16th century. Geodetic observations of about 430 glaciers are available. The database is completed by a dataset containing information on special events including glacier surges, glacier lake outbursts, ice avalanches, eruptions of ice-clad volcanoes, etc. related to about 200 glaciers. A special database of glacier photographs contains 13,000 pictures from around 500 glaciers, some of them dating back to the 19th century. A key challenge is to combine and extend the traditional observations with fast evolving datasets from new technologies.
BayesMotif: de novo protein sorting motif discovery from impure datasets.

Science.gov (United States)

Hu, Jianjun; Zhang, Fan

2010-01-18

Protein sorting is the process by which newly synthesized proteins are transported to their target locations within or outside of the cell. This process is precisely regulated by protein sorting signals in different forms. A major category of sorting signals is amino acid sub-sequences usually located at the N-terminals or C-terminals of protein sequences. Genome-wide experimental identification of protein sorting signals is extremely time-consuming and costly. Effective computational algorithms for de novo discovery of protein sorting signals are needed to improve the understanding of protein sorting mechanisms. We formulated the protein sorting motif discovery problem as a classification problem and proposed a Bayesian classifier based algorithm (BayesMotif) for de novo identification of a common type of protein sorting motif in which a highly conserved anchor is present along with less conserved motif regions. A false positive removal procedure is developed to iteratively remove sequences that are unlikely to contain true motifs so that the algorithm can identify motifs from impure input sequences. Experiments on both implanted motif datasets and real-world datasets showed that the enhanced BayesMotif algorithm can identify anchored sorting motifs from pure or impure protein sequence datasets. They also show that the false positive removal procedure can help to identify true motifs even when only 20% of the input sequences contain true motif instances. We proposed BayesMotif, a novel Bayesian classification based algorithm for de novo discovery of a special category of anchored protein sorting motifs from impure datasets. Compared to conventional motif discovery algorithms such as MEME, our algorithm can find less-conserved motifs with short highly conserved anchors. Our algorithm also has the advantage of easy incorporation of additional meta-sequence features such as hydrophobicity or charge of the motifs which may help to overcome the limitations of
Relative Error Evaluation to Typical Open Global dem Datasets in Shanxi Plateau of China

Science.gov (United States)

Zhao, S.; Zhang, S.; Cheng, W.

2018-04-01

Produced from radar data or stereo remote sensing image pairs, global DEM datasets are one of the most important types of DEM data. Relative error relates to the surface quality created by DEM data, so it relates to geomorphology and hydrologic applications using DEM data. Taking the Shanxi Plateau of China as the study area, this research evaluated the relative error of typical open global DEM datasets including Shuttle Radar Topography Mission (SRTM) data with 1 arc second resolution (SRTM1), SRTM data with 3 arc second resolution (SRTM3), ASTER global DEM data in the second version (GDEM-v2) and ALOS world 3D-30m (AW3D) data. Through processing and selection, more than 300,000 ICESat/GLA14 points were used as the GCP data, and the vertical error was computed and compared among the four typical global DEM datasets. Then, more than 2,600,000 ICESat/GLA14 point pairs were acquired using a distance threshold between 100 m and 500 m. Meanwhile, the horizontal distance between every point pair was computed, so the relative error was obtained using slope values based on the vertical error difference and the horizontal distance of the point pairs. Finally, the false slope ratio (FSR) index was computed by analyzing the difference between DEM and ICESat/GLA14 values for every point pair. Both relative error and FSR index were categorically compared for the four DEM datasets under different slope classes. Research results show: Overall, AW3D has the lowest relative error values in mean error, mean absolute error, root mean square error and standard deviation error; then the SRTM1 data, whose values are a little higher than those of the AW3D data; the SRTM3 and GDEM-v2 data have the highest relative error values, and the values for the two datasets are similar. Considering different slope conditions, all four DEM datasets have better performance in flat areas but worse performance in sloping regions; AW3D has the best performance in all the slope classes, a little better than SRTM1; with slope increasing
Spatial and temporal distribution of bull trout (Salvelinus confluentus)-size fish near the floating surface collector in the North Fork Reservoir, Oregon, 2016

Science.gov (United States)

Adams, Noah S.; Smith, Collin D.

2017-06-26

Acoustic cameras were used to assess the behavior and abundance of bull trout (Salvelinus confluentus)-size fish at the entrance to the North Fork Reservoir juvenile fish floating surface collector (FSC). The purpose of the FSC is to collect downriver migrating juvenile salmonids at the North Fork Dam, and safely route them around the hydroelectric projects. The objective of the acoustic camera component of this study was to assess the behaviors of bull trout-size fish observed near the FSC, and to determine if the presence of bull trout-size fish influenced the collection or abundance of juvenile salmonids. Acoustic cameras were deployed near the surface and floor of the entrance to the FSC. The acoustic camera technology was an informative tool for assessing abundance and spatial and temporal behaviors of bull trout-size fish near the entrance of the FSC. Bull trout-size fish were regularly observed near the entrance, with greater abundances on the deep camera than on the shallow camera. Additionally, greater abundances were observed during the hours of sunlight than were observed during the night. Behavioral differences also were observed at the two depths, with surface fish traveling faster and straighter with more directed movement, and fish observed on the deep camera generally showing more milling behavior. Modeling potential predator-prey interactions and influences using collected passive integrated transponder (PIT) -tagged juvenile salmonids proved largely unpredictable, although these fish provided relevant timing and collection information. Overall, the results indicate that bull trout-size fish are present near the entrance of the FSC, concomitant with juvenile salmonids, and their abundances and behaviors indicate that they may be drawn to the entrance of the FSC because of the abundance of prey-sized fish.
Mapping Global Ocean Surface Albedo from Satellite Observations: Models, Algorithms, and Datasets

Science.gov (United States)

Li, X.; Fan, X.; Yan, H.; Li, A.; Wang, M.; Qu, Y.

2018-04-01

Ocean surface albedo (OSA) is one of the important parameters in the surface radiation budget (SRB). It is usually considered a controlling factor of the heat exchange between the atmosphere and ocean. The temporal and spatial dynamics of OSA determine the energy absorption of upper-level ocean water, and influence oceanic currents, atmospheric circulation, and the transport of material and energy in the hydrosphere. Therefore, various parameterizations and models have been developed for describing the dynamics of OSA. However, it has been demonstrated that the currently available OSA datasets cannot fulfill the requirements of global climate change studies. In this study, we present a literature review on mapping global OSA from satellite observations. The models (parameterizations, the coupled ocean-atmosphere radiative transfer (COART), and the three component ocean water albedo (TCOWA)), algorithms (the estimation method based on reanalysis data, and the direct-estimation algorithm), and datasets (the cloud, albedo and radiation (CLARA) surface albedo product, the dataset derived by the TCOWA model, and the global land surface satellite (GLASS) phase-2 surface broadband albedo product) of OSA are discussed separately.
A Novel Technique for Time-Centric Analysis of Massive Remotely-Sensed Datasets

Directory of Open Access Journals (Sweden)

Glenn E. Grant

2015-04-01

Analyzing massive remotely-sensed datasets presents formidable challenges. The volume of satellite imagery collected often outpaces analytical capabilities; however, thorough analyses of complete datasets may provide new insights into processes that would otherwise be unseen. In this study we present a novel, object-oriented approach to storing, retrieving, and analyzing large remotely-sensed datasets. The objective is to provide a new structure for scalable storage and rapid, Internet-based analysis of climatology data. The concept of a “data rod” is introduced: a conceptual data object that organizes time-series information into a temporally-oriented vertical column at any given location. To demonstrate one possible use, we ingest 25 years of Greenland imagery into a series of pure-object databases, then retrieve and analyze the data. The results provide a basis for evaluating the database performance and scientific analysis capabilities. The project succeeds in demonstrating the effectiveness of the prototype database architecture and analysis approach, not because new scientific information is discovered, but because quality control issues are revealed in the source data that had gone undetected for years.
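
The "data rod" idea in the abstract above can be illustrated with a minimal sketch. It assumes the imagery is already available as small in-memory grids; the names `build_data_rods` and `rod_mean` and the sample values are hypothetical, and a plain dictionary stands in for the pure-object databases the record describes. It is not the author's implementation, only one way to picture regrouping per-image data into per-location time series.

```python
from statistics import mean


def build_data_rods(frames):
    """Regroup time-stamped 2D grids into one time series per location.

    frames: list of (timestamp, grid) pairs; grid is a list of rows.
    Returns a dict mapping (row, col) -> list of (timestamp, value),
    ordered by timestamp. Each dict entry plays the role of one "rod".
    """
    rods = {}
    # Sort by timestamp so every rod is in temporal order.
    for timestamp, grid in sorted(frames, key=lambda f: f[0]):
        for r, row in enumerate(grid):
            for c, value in enumerate(row):
                rods.setdefault((r, c), []).append((timestamp, value))
    return rods


def rod_mean(rods, location):
    """Average all values in the rod at the given (row, col) location."""
    return mean(value for _, value in rods[location])


# Hypothetical sample data: three 2x2 "images", deliberately out of order.
frames = [
    (2001, [[1.0, 2.0], [3.0, 4.0]]),
    (2000, [[0.0, 1.0], [2.0, 3.0]]),
    (2002, [[2.0, 3.0], [4.0, 5.0]]),
]
rods = build_data_rods(frames)
```

With this layout, a time-centric query such as `rod_mean(rods, (1, 1))` reads a single rod instead of opening every image, which is the access pattern the abstract motivates.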
