WorldWideScience

Sample records for data-driven neighborhood selection

  1. Selection of the Sample for Data-Driven $Z \to \nu\bar{\nu}$ Background Estimation

    CERN Document Server

    Krauss, Martin

    2009-01-01

    The topic of this study was to improve the selection of the sample for data-driven Z → νν̄ background estimation, which is a major contribution in supersymmetric searches in a no-lepton search mode. The data is based on Z → ℓ⁺ℓ⁻ samples created with ATLAS simulation software. This method works if two leptons are reconstructed, but with cuts that are typical for SUSY searches the reconstruction efficiency for electrons and muons is rather low. For this reason, an attempt was made to enlarge the data sample by also considering events in which only one electron was reconstructed. In this case the invariant mass of the electron and each jet was computed, and the jet giving the best match to the Z boson mass was selected as the unreconstructed electron. This way the sample can be extended, but it loses significant purity because background events are also reconstructed. To improve this method, other variables that were not available for this study have to be considered. Applying a similar method to muons using ...
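
The jet-matching step described in this abstract can be illustrated with a minimal sketch. It assumes four-vectors given as (E, px, py, pz) tuples in GeV and uses the nominal Z mass of 91.1876 GeV; the function names `invariant_mass` and `best_jet_for_z` are hypothetical and are not taken from the study or from ATLAS software.

```python
import math

Z_MASS_GEV = 91.1876  # nominal Z boson mass in GeV (assumed reference value)


def invariant_mass(p1, p2):
    """Invariant mass of the sum of two four-vectors (E, px, py, pz) in GeV."""
    e = p1[0] + p2[0]
    px = p1[1] + p2[1]
    py = p1[2] + p2[2]
    pz = p1[3] + p2[3]
    m2 = e * e - (px * px + py * py + pz * pz)
    # Clamp small negative values that can arise from rounding.
    return math.sqrt(max(m2, 0.0))


def best_jet_for_z(electron, jets):
    """Return (index, mass) of the jet whose electron-jet mass is closest to the Z mass."""
    masses = [invariant_mass(electron, jet) for jet in jets]
    idx = min(range(len(jets)), key=lambda i: abs(masses[i] - Z_MASS_GEV))
    return idx, masses[idx]


if __name__ == "__main__":
    # Illustrative, made-up kinematics: one electron and two candidate jets.
    electron = (45.0, 45.0, 0.0, 0.0)
    jets = [(20.0, 0.0, 20.0, 0.0), (46.0, -46.0, 0.0, 0.0)]
    print(best_jet_for_z(electron, jets))
```

As the abstract notes, a mass match alone also admits background events, so a real selection would likely need further variables.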

  2. Practical options for selecting data-driven or physics-based prognostics algorithms with reviews

    International Nuclear Information System (INIS)

    An, Dawn; Kim, Nam H.; Choi, Joo-Ho

    2015-01-01

    This paper provides practical options for prognostics so that beginners can select appropriate methods for their fields of application. To achieve this goal, several popular algorithms are first reviewed in the data-driven and physics-based prognostics methods. Each algorithm’s attributes and pros and cons are analyzed in terms of model definition, model parameter estimation and ability to handle noise and bias in data. Fatigue crack growth examples are then used to illustrate the characteristics of different algorithms. In order to suggest a suitable algorithm, several studies are made based on the number of data sets, the level of noise and bias, availability of loading and physical models, and complexity of the damage growth behavior. Based on the study, it is concluded that the Gaussian process is easy and fast to implement, but works well only when the covariance function is properly defined. The neural network has the advantage in the case of large noise and complex models but only with many training data sets. The particle filter and Bayesian method are superior to the former methods because they are less affected by noise and model complexity, but work only when physical model and loading conditions are available. - Highlights: • A practical review of data-driven and physics-based prognostics is provided. • As common prognostics algorithms, NN, GP, PF and BM are introduced. • Algorithms’ attributes, pros and cons, and applicable conditions are discussed. • This will be helpful to choose the best algorithm for different applications

  3. Input variable selection for data-driven models of Coriolis flowmeters for two-phase flow measurement

    International Nuclear Information System (INIS)

    Wang, Lijuan; Yan, Yong; Wang, Xue; Wang, Tao

    2017-01-01

    Input variable selection is an essential step in the development of data-driven models for environmental, biological and industrial applications. Through input variable selection to eliminate the irrelevant or redundant variables, a suitable subset of variables is identified as the input of a model. Meanwhile, through input variable selection the complexity of the model structure is simplified and the computational efficiency is improved. This paper describes the procedures of the input variable selection for the data-driven models for the measurement of liquid mass flowrate and gas volume fraction under two-phase flow conditions using Coriolis flowmeters. Three advanced input variable selection methods, including partial mutual information (PMI), genetic algorithm-artificial neural network (GA-ANN) and tree-based iterative input selection (IIS) are applied in this study. Typical data-driven models incorporating support vector machine (SVM) are established individually based on the input candidates resulting from the selection methods. The validity of the selection outcomes is assessed through an output performance comparison of the SVM based data-driven models and sensitivity analysis. The validation and analysis results suggest that the input variables selected from the PMI algorithm provide more effective information for the models to measure liquid mass flowrate while the IIS algorithm provides fewer but more effective variables for the models to predict gas volume fraction. (paper)

  4. On the selection of user-defined parameters in data-driven stochastic subspace identification

    Science.gov (United States)

    Priori, C.; De Angelis, M.; Betti, R.

    2018-02-01

    The paper focuses on the time domain output-only technique called Data-Driven Stochastic Subspace Identification (DD-SSI); in order to identify modal models (frequencies, damping ratios and mode shapes), the role of its user-defined parameters is studied, and rules to determine their minimum values are proposed. Such investigation is carried out using, first, the time histories of structural responses to stationary excitations, with a large number of samples, satisfying the hypothesis on the input imposed by DD-SSI. Then, the case of non-stationary seismic excitations with a reduced number of samples is considered. In this paper, partitions of the data matrix different from the one proposed in the SSI literature are investigated, together with the influence of different choices of the weighting matrices. The study is carried out considering two different applications: (1) data obtained from vibration tests on a scaled structure and (2) in-situ tests on a reinforced concrete building. Referring to the former, the identification of a steel frame structure tested on a shaking table is performed using its responses in terms of absolute accelerations to a stationary (white noise) base excitation and to non-stationary seismic excitations of low intensity. Black-box and modal models are identified in both cases and the results are compared with those from an input-output subspace technique. With regard to the latter, the identification of a complex hospital building is conducted using data obtained from ambient vibration tests.

  5. Redefining the Practice of Peer Review Through Intelligent Automation Part 2: Data-Driven Peer Review Selection and Assignment.

    Science.gov (United States)

    Reiner, Bruce I

    2017-12-01

    In conventional radiology peer review practice, a small number of exams (routinely 5% of the total volume) is randomly selected, which may significantly underestimate the true error rate within a given radiology practice. An alternative and preferable approach would be to create a data-driven model which mathematically quantifies a peer review risk score for each individual exam and uses this data to identify high risk exams and readers, and selectively target these exams for peer review. An analogous model can also be created to assist in the assignment of these peer review cases in keeping with specific priorities of the service provider. An additional option to enhance the peer review process would be to assign the peer review cases in a truly blinded fashion. In addition to eliminating traditional peer review bias, this approach has the potential to better define exam-specific standard of care, particularly when multiple readers participate in the peer review process.

  6. Regional regression models of percentile flows for the contiguous United States: Expert versus data-driven independent variable selection

    Directory of Open Access Journals (Sweden)

    Geoffrey Fouad

    2018-06-01

    New hydrological insights for the region: A set of three variables selected based on an expert assessment of factors that influence percentile flows performed similarly to larger sets of variables selected using a data-driven method. Expert assessment variables included mean annual precipitation, potential evapotranspiration, and baseflow index. Larger sets of up to 37 variables contributed little, if any, additional predictive information. Variables used to describe the distribution of basin data (e.g. standard deviation) were not useful, and average values were sufficient to characterize physical and climatic basin conditions. Effectiveness of the expert assessment variables may be due to the high degree of multicollinearity (i.e. cross-correlation) among additional variables. A tool is provided in the Supplementary material to predict percentile flows based on the three expert assessment variables. Future work should develop new variables with a strong understanding of the processes related to percentile flows.

  7. Multivariate modeling of complications with data driven variable selection: Guarding against overfitting and effects of data set size

    International Nuclear Information System (INIS)

    Schaaf, Arjen van der; Xu Chengjian; Luijk, Peter van; Veld, Aart A. van’t; Langendijk, Johannes A.; Schilstra, Cornelis

    2012-01-01

    Purpose: Multivariate modeling of complications after radiotherapy is frequently used in conjunction with data driven variable selection. This study quantifies the risk of overfitting in a data driven modeling method using bootstrapping for data with typical clinical characteristics, and estimates the minimum amount of data needed to obtain models with relatively high predictive power. Materials and methods: To facilitate repeated modeling and cross-validation with independent datasets for the assessment of true predictive power, a method was developed to generate simulated data with statistical properties similar to real clinical data sets. Characteristics of three clinical data sets from radiotherapy treatment of head and neck cancer patients were used to simulate data with set sizes between 50 and 1000 patients. A logistic regression method using bootstrapping and forward variable selection was used for complication modeling, resulting for each simulated data set in a selected number of variables and an estimated predictive power. The true optimal number of variables and true predictive power were calculated using cross-validation with very large independent data sets. Results: For all simulated data set sizes the number of variables selected by the bootstrapping method was on average close to the true optimal number of variables, but showed considerable spread. Bootstrapping is more accurate in selecting the optimal number of variables than the AIC and BIC alternatives, but this did not translate into a significant difference of the true predictive power. The true predictive power asymptotically converged toward a maximum predictive power for large data sets, and the estimated predictive power converged toward the true predictive power. More than half of the potential predictive power is gained after approximately 200 samples. Our simulations demonstrated severe overfitting (a predictive power lower than that of predicting 50% probability) in a number of small

  8. A data-driven multi-model methodology with deep feature selection for short-term wind forecasting

    International Nuclear Information System (INIS)

    Feng, Cong; Cui, Mingjian; Hodge, Bri-Mathias; Zhang, Jie

    2017-01-01

    Highlights: • An ensemble model is developed to produce both deterministic and probabilistic wind forecasts. • A deep feature selection framework is developed to optimally determine the inputs to the forecasting methodology. • The developed ensemble methodology has improved the forecasting accuracy by up to 30%. - Abstract: With the growing wind penetration into the power system worldwide, improving wind power forecasting accuracy is becoming increasingly important to ensure continued economic and reliable power system operations. In this paper, a data-driven multi-model wind forecasting methodology is developed with a two-layer ensemble machine learning technique. The first layer is composed of multiple machine learning models that generate individual forecasts. A deep feature selection framework is developed to determine the most suitable inputs to the first layer machine learning models. Then, a blending algorithm is applied in the second layer to create an ensemble of the forecasts produced by first layer models and generate both deterministic and probabilistic forecasts. This two-layer model seeks to utilize the statistically different characteristics of each machine learning algorithm. A number of machine learning algorithms are selected and compared in both layers. This developed multi-model wind forecasting methodology is compared to several benchmarks. The effectiveness of the proposed methodology is evaluated to provide 1-hour-ahead wind speed forecasting at seven locations of the Surface Radiation network. Numerical results show that, compared with the single-algorithm models, the developed multi-model framework with deep feature selection procedure has improved the forecasting accuracy by up to 30%.
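
The two-layer idea in this abstract (individual forecasts in the first layer, a blending step in the second) can be illustrated with a small sketch. The abstract does not specify the blending algorithm, so the code below swaps in a simple inverse-MSE weighting as a stand-in; the function names and the toy numbers are assumptions for illustration only.

```python
def mse(pred, obs):
    """Mean squared error between a forecast series and observations."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)


def blend_weights(first_layer_preds, obs):
    """Inverse-MSE weights, normalised to sum to 1 (a stand-in blending rule)."""
    inv = [1.0 / (mse(p, obs) + 1e-12) for p in first_layer_preds]
    total = sum(inv)
    return [v / total for v in inv]


def blend(first_layer_preds, weights):
    """Second layer: weighted combination of the first-layer forecasts."""
    n = len(first_layer_preds[0])
    return [
        sum(w * p[t] for w, p in zip(weights, first_layer_preds))
        for t in range(n)
    ]


if __name__ == "__main__":
    # Made-up wind speeds (m/s) and two hypothetical first-layer forecasts.
    observed = [5.0, 6.0, 7.0, 8.0]
    model_a = [5.5, 6.5, 7.5, 8.5]   # biased high by 0.5
    model_b = [4.0, 5.0, 6.0, 7.0]   # biased low by 1.0
    weights = blend_weights([model_a, model_b], observed)
    print(weights, blend([model_a, model_b], weights))
```

In practice the weights would be fitted on a hold-out period and then applied to new forecasts; fitting and evaluating on the same data, as in the demo above, overstates the gain.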

  9. An optimal baseline selection methodology for data-driven damage detection and temperature compensation in acousto-ultrasonics

    International Nuclear Information System (INIS)

    Torres-Arredondo, M-A; Sierra-Pérez, Julián; Cabanes, Guénaël

    2016-01-01

    The process of measuring and analysing the data from a distributed sensor network all over a structural system in order to quantify its condition is known as structural health monitoring (SHM). For the design of a trustworthy health monitoring system, a vast amount of information regarding the inherent physical characteristics of the sources and their propagation and interaction across the structure is crucial. Moreover, any SHM system which is expected to transition to field operation must take into account the influence of environmental and operational changes which cause modifications in the stiffness and damping of the structure and consequently modify its dynamic behaviour. On that account, special attention is paid in this paper to the development of an efficient SHM methodology where robust signal processing and pattern recognition techniques are integrated for the correct interpretation of complex ultrasonic waves within the context of damage detection and identification. The methodology is based on an acousto-ultrasonics technique where the discrete wavelet transform is evaluated for feature extraction and selection, linear principal component analysis for data-driven modelling and self-organising maps for a two-level clustering under the principle of local density. At the end, the methodology is experimentally demonstrated and results show that all the damages were detectable and identifiable. (paper)

  10. An optimal baseline selection methodology for data-driven damage detection and temperature compensation in acousto-ultrasonics

    Science.gov (United States)

    Torres-Arredondo, M.-A.; Sierra-Pérez, Julián; Cabanes, Guénaël

    2016-05-01

    The process of measuring and analysing the data from a distributed sensor network all over a structural system in order to quantify its condition is known as structural health monitoring (SHM). For the design of a trustworthy health monitoring system, a vast amount of information regarding the inherent physical characteristics of the sources and their propagation and interaction across the structure is crucial. Moreover, any SHM system which is expected to transition to field operation must take into account the influence of environmental and operational changes which cause modifications in the stiffness and damping of the structure and consequently modify its dynamic behaviour. On that account, special attention is paid in this paper to the development of an efficient SHM methodology where robust signal processing and pattern recognition techniques are integrated for the correct interpretation of complex ultrasonic waves within the context of damage detection and identification. The methodology is based on an acousto-ultrasonics technique where the discrete wavelet transform is evaluated for feature extraction and selection, linear principal component analysis for data-driven modelling and self-organising maps for a two-level clustering under the principle of local density. At the end, the methodology is experimentally demonstrated and results show that all the damages were detectable and identifiable.

  11. Data-Driven Derivation of an "Informer Compound Set" for Improved Selection of Active Compounds in High-Throughput Screening.

    Science.gov (United States)

    Paricharak, Shardul; IJzerman, Adriaan P; Jenkins, Jeremy L; Bender, Andreas; Nigsch, Florian

    2016-09-26

    Despite the usefulness of high-throughput screening (HTS) in drug discovery, for some systems, low assay throughput or high screening cost can prohibit the screening of large numbers of compounds. In such cases, iterative cycles of screening involving active learning (AL) are employed, creating the need for smaller "informer sets" that can be routinely screened to build predictive models for selecting compounds from the screening collection for follow-up screens. Here, we present a data-driven derivation of an informer compound set with improved predictivity of active compounds in HTS, and we validate its benefit over randomly selected training sets on 46 PubChem assays comprising at least 300,000 compounds and covering a wide range of assay biology. The informer compound set showed improvement in BEDROC(α = 100), PRAUC, and ROCAUC values averaged over all assays of 0.024, 0.014, and 0.016, respectively, compared to randomly selected training sets, all with paired t-test p-values agnostic fashion. This approach led to a consistent improvement in hit rates in follow-up screens without compromising scaffold retrieval. The informer set is adjustable in size depending on the number of compounds one intends to screen, as performance gains are realized for sets with more than 3,000 compounds, and this set is therefore applicable to a variety of situations. Finally, our results indicate that random sampling may not adequately cover descriptor space, drawing attention to the importance of the composition of the training set for predicting actives.

  12. Data-driven batch scheduling

    Energy Technology Data Exchange (ETDEWEB)

    Bent, John [Los Alamos National Laboratory; Denehy, Tim [GOOGLE; Arpaci-Dusseau, Remzi [UNIV OF WISCONSIN; Livny, Miron [UNIV OF WISCONSIN; Arpaci-Dusseau, Andrea C [NON LANL

    2009-01-01

    In this paper, we develop data-driven strategies for batch computing schedulers. Current CPU-centric batch schedulers ignore the data needs within workloads and execute them by linking them transparently and directly to their needed data. When scheduled on remote computational resources, this elegant solution of direct data access can incur an order of magnitude performance penalty for data-intensive workloads. Adding data-awareness to batch schedulers allows a careful coordination of data and CPU allocation thereby reducing the cost of remote execution. We offer here new techniques by which batch schedulers can become data-driven. Such systems can use our analytical predictive models to select one of the four data-driven scheduling policies that we have created. Through simulation, we demonstrate the accuracy of our predictive models and show how they can reduce time to completion for some workloads by as much as 80%.

  13. Data-driven storytelling

    CERN Document Server

    Hurter, Christophe; Diakopoulos, Nicholas ed.; Carpendale, Sheelagh

    2018-01-01

    This book is an accessible introduction to data-driven storytelling, resulting from discussions between data visualization researchers and data journalists. This book will be the first to define the topic, present compelling examples and existing resources, as well as identify challenges and new opportunities for research.

  14. Ecological neighborhoods as a framework for umbrella species selection

    Science.gov (United States)

    Stuber, Erica F.; Fontaine, Joseph J.

    2018-01-01

    Umbrella species are typically chosen because they are expected to confer protection for other species assumed to have similar ecological requirements. Despite its popularity and substantial history, the value of the umbrella species concept has come into question because umbrella species chosen using heuristic methods, such as body or home range size, are not acting as adequate proxies for the metrics of interest: species richness or population abundance in a multi-species community for which protection is sought. How species associate with habitat across ecological scales has important implications for understanding population size and species richness, and therefore may be a better proxy for choosing an umbrella species. We determined the spatial scales of ecological neighborhoods important for predicting abundance of 8 potential umbrella species breeding in Nebraska using Bayesian latent indicator scale selection in N-mixture models accounting for imperfect detection. We compare the conservation value measured as collective avian abundance under different umbrella species selected following commonly used criteria and selected based on identifying spatial land cover characteristics within ecological neighborhoods that maximize collective abundance. Using traditional criteria to select an umbrella species resulted in sub-maximal expected collective abundance in 86% of cases compared to selecting an umbrella species based on land cover characteristics that maximized collective abundance directly. We conclude that directly assessing the expected quantitative outcomes, rather than ecological proxies, is likely the most efficient method to maximize the potential for conservation success under the umbrella species concept.

  15. On Lack of Robustness in Hydrological Model Development Due to Absence of Guidelines for Selecting Calibration and Evaluation Data: Demonstration for Data-Driven Models

    Science.gov (United States)

    Zheng, Feifei; Maier, Holger R.; Wu, Wenyan; Dandy, Graeme C.; Gupta, Hoshin V.; Zhang, Tuqiao

    2018-02-01

    Hydrological models are used for a wide variety of engineering purposes, including streamflow forecasting and flood-risk estimation. To develop such models, it is common to allocate the available data to calibration and evaluation data subsets. Surprisingly, the issue of how this allocation can affect model evaluation performance has been largely ignored in the research literature. This paper discusses the evaluation performance bias that can arise from how available data are allocated to calibration and evaluation subsets. As a first step to assessing this issue in a statistically rigorous fashion, we present a comprehensive investigation of the influence of data allocation on the development of data-driven artificial neural network (ANN) models of streamflow. Four well-known formal data splitting methods are applied to 754 catchments from Australia and the U.S. to develop 902,483 ANN models. Results clearly show that the choice of the method used for data allocation has a significant impact on model performance, particularly for runoff data that are more highly skewed, highlighting the importance of considering the impact of data splitting when developing hydrological models. The statistical behavior of the data splitting methods investigated is discussed and guidance is offered on the selection of the most appropriate data splitting methods to achieve representative evaluation performance for streamflow data with different statistical properties. Although our results are obtained for data-driven models, they highlight the fact that this issue is likely to have a significant impact on all types of hydrological models, especially conceptual rainfall-runoff models.

  16. A critical review of seven selected neighborhood sustainability assessment tools

    Energy Technology Data Exchange (ETDEWEB)

    Sharifi, Ayyoob, E-mail: sharifi.ayyoob@a.mbox.nagoya-u.ac.jp; Murayama, Akito, E-mail: murayama@corot.nuac.nagoya-u.ac.jp

    2013-01-15

    Neighborhood sustainability assessment tools have become widespread since the turn of the 21st century and many communities, mainly in the developed world, are utilizing these tools to measure their success in approaching sustainable development goals. In this study, seven tools from Australia, Europe, Japan, and the United States are selected and analyzed with the aim of providing insights into the current situations; highlighting the strengths, weaknesses, successes, and failures; and making recommendations for future improvements. Using a content analysis, the issues of sustainability coverage, pre-requisites, local adaptability, scoring and weighting, participation, reporting, and applicability are discussed in this paper. The results of this study indicate that most of the tools are not doing well regarding the coverage of social, economic, and institutional aspects of sustainability; there are ambiguities and shortcomings in the weighting, scoring, and rating; in most cases, there is no mechanism for local adaptability and participation; and, only those tools which are embedded within the broader planning framework are doing well with regard to applicability. - Highlights: ► Seven widely used assessment tools were analyzed. ► There is a lack of balanced assessment of sustainability dimensions. ► Tools are not doing well regarding the applicability. ► Refinements are needed to make the tools more effective. ► Assessment tools must be integrated into the planning process.

  17. A critical review of seven selected neighborhood sustainability assessment tools

    International Nuclear Information System (INIS)

    Sharifi, Ayyoob; Murayama, Akito

    2013-01-01

    Neighborhood sustainability assessment tools have become widespread since the turn of the 21st century and many communities, mainly in the developed world, are utilizing these tools to measure their success in approaching sustainable development goals. In this study, seven tools from Australia, Europe, Japan, and the United States are selected and analyzed with the aim of providing insights into the current situations; highlighting the strengths, weaknesses, successes, and failures; and making recommendations for future improvements. Using a content analysis, the issues of sustainability coverage, pre-requisites, local adaptability, scoring and weighting, participation, reporting, and applicability are discussed in this paper. The results of this study indicate that most of the tools are not doing well regarding the coverage of social, economic, and institutional aspects of sustainability; there are ambiguities and shortcomings in the weighting, scoring, and rating; in most cases, there is no mechanism for local adaptability and participation; and, only those tools which are embedded within the broader planning framework are doing well with regard to applicability. - Highlights: ► Seven widely used assessment tools were analyzed. ► There is a lack of balanced assessment of sustainability dimensions. ► Tools are not doing well regarding the applicability. ► Refinements are needed to make the tools more effective. ► Assessment tools must be integrated into the planning process.

  18. Health selection into neighborhoods among patients enrolled in a clinical trial

    Directory of Open Access Journals (Sweden)

    Mariana C. Arcaya

    2017-12-01

    Health selection into neighborhoods may contribute to geographic health disparities. We demonstrate the potential for clinical trial data to help clarify the causal role of health on locational attainment. We used data from the 20-year United Kingdom Prospective Diabetes Study (UKPDS) to explore whether random assignment to intensive blood-glucose control therapy, which improved long-term health outcomes after median 10 years follow-up, subsequently affected what neighborhoods patients lived in. We extracted postcode-level deprivation indices for the 2710 surviving participants of UKPDS living in England at study end in 1996/1997. We observed small neighborhood advantages in the intensive versus conventional therapy group, although these differences were not statistically significant. This analysis failed to show conclusive evidence of health selection into neighborhoods, but data suggest the hypothesis may be worthy of exploration in other clinical trials or in a meta-analysis. Keywords: Neighborhoods, Self-selection, Health, Equity, Socioeconomic status

  19. Challenges of Data-driven Healthcare Management

    DEFF Research Database (Denmark)

    Bossen, Claus; Danholt, Peter; Ubbesen, Morten Bonde

    This paper describes the new kind of data-work involved in developing data-driven healthcare based on two cases from Denmark: The first case concerns a governance infrastructure based on Diagnose-Related Groups (DRG), which was introduced in Denmark in the 1990s. The DRG-system links healthcare...... activity and financing and relies on extensive data entry, reporting and calculations. This has required the development of new skills, work and work roles. The second case concerns a New Governance project aimed at developing new performance indicators for healthcare delivery as an alternative to DRG....... Here, a core challenge is to select indicators and actually be able to acquire data on them. The two cases point out that data-driven healthcare requires more and new kinds of work for which new skills, functions and work roles have to be developed....

  20. Data driven marketing for dummies

    CERN Document Server

    Semmelroth, David

    2013-01-01

    Embrace data and use it to sell and market your products. Data is everywhere and it keeps growing and accumulating. Companies need to embrace big data and make it work harder to help them sell and market their products. Successful data analysis can help marketing professionals spot sales trends, develop smarter marketing campaigns, and accurately predict customer loyalty. Data Driven Marketing For Dummies helps companies use all the data at their disposal to make current customers more satisfied, reach new customers, and sell to their most important customer segments more efficiently. Identifyi

  1. Maximum relevance, minimum redundancy band selection based on neighborhood rough set for hyperspectral data classification

    International Nuclear Information System (INIS)

    Liu, Yao; Chen, Yuehua; Tan, Kezhu; Xie, Hong; Wang, Liguo; Xie, Wu; Yan, Xiaozhen; Xu, Zhen

    2016-01-01

    Band selection is considered to be an important processing step in handling hyperspectral data. In this work, we selected informative bands according to the maximal relevance minimal redundancy (MRMR) criterion based on neighborhood mutual information. Two measures, MRMR difference and MRMR quotient, were defined and a forward greedy search for band selection was constructed. The performance of the proposed algorithm, along with a comparison with other methods (neighborhood dependency measure based algorithm, genetic algorithm and uninformative variable elimination algorithm), was studied using the classification accuracy of extreme learning machine (ELM) and random forests (RF) classifiers on soybeans’ hyperspectral datasets. The results show that the proposed MRMR algorithm leads to promising improvement in band selection and classification accuracy. (paper)
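
The forward greedy MRMR search mentioned in this abstract can be sketched as follows. This is a simplified illustration, not the authors' method: it uses ordinary mutual information on discretized values as a stand-in for the neighborhood mutual information of the paper, and the names `mutual_information` and `mrmr_select` are hypothetical. The "difference" score is relevance minus mean redundancy; the "quotient" score is relevance divided by mean redundancy.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def mrmr_select(features, labels, k, criterion="difference"):
    """Greedy forward MRMR selection over a dict of name -> discrete sequence."""
    relevance = {
        name: mutual_information(col, labels) for name, col in features.items()
    }
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:

        def score(name):
            if not selected:
                return relevance[name]  # first pick: relevance only
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            if criterion == "difference":
                return relevance[name] - redundancy
            return relevance[name] / (redundancy + 1e-12)  # quotient form

        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Toy, made-up discretized bands; band_a_copy duplicates band_a.
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    bands = {
        "band_a": [0, 0, 0, 1, 1, 1, 1, 1],
        "band_a_copy": [0, 0, 0, 1, 1, 1, 1, 1],
        "band_b": [0, 1, 0, 0, 1, 0, 1, 1],
    }
    print(mrmr_select(bands, labels, k=2))
```

In the toy data the duplicated band is skipped on the second pick because its redundancy with the already selected band outweighs its relevance, which is the behaviour the MRMR criterion is meant to produce.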

  2. Mutation Accumulation, Soft Selection and the Middle-Class Neighborhood

    Science.gov (United States)

    Moorad, Jacob A.; Hall, David W.

    2009-01-01

    The “middle-class neighborhood” is a breeding design intended to allow new mutations to accumulate by lessening the effects of purifying selection through the elimination of among-line fitness variation. We show that this design effectively applies soft selection to the experimental population, potentially causing biased estimates of mutational effects if social effects contribute to fitness. PMID:19448272

  3. Hyperspectral band selection based on consistency-measure of neighborhood rough set theory

    International Nuclear Information System (INIS)

    Liu, Yao; Xie, Hong; Wang, Liguo; Tan, Kezhu; Chen, Yuehua; Xu, Zhen

    2016-01-01

    Band selection is a well-known approach for reducing dimensionality in hyperspectral imaging. In this paper, a band selection method based on consistency-measure of neighborhood rough set theory (CMNRS) was proposed to select informative bands from hyperspectral images. A decision-making information system was established by the reflection spectrum of soybeans’ hyperspectral data between 400 nm and 1000 nm wavelengths. The neighborhood consistency-measure, which reflects not only the size of the decision positive region, but also the sample distribution in the boundary region, was used as the evaluation function of band significance. The optimal band subset was selected by a forward greedy search algorithm. A post-pruning strategy was employed to overcome the over-fitting problem and find the minimum subset. To assess the effectiveness of the proposed band selection technique, two classification models (extreme learning machine (ELM) and random forests (RF)) were built. The experimental results showed that the proposed algorithm can effectively select key bands and obtain satisfactory classification accuracy. (paper)

  4. The Difference-in-Difference Method: Assessing the Selection Bias in the Effects of Neighborhood Environment on Health

    Science.gov (United States)

    Grafova, Irina; Freedman, Vicki; Lurie, Nicole; Kumar, Rizie; Rogowski, Jeannette

    2013-01-01

    This paper uses the difference-in-difference estimation approach to explore the self-selection bias in estimating the effect of neighborhood economic environment on self-assessed health among older adults. The results indicate that there is evidence of downward bias in the conventional estimates of the effect of neighborhood economic disadvantage on self-reported health, representing a lower bound of the true effect. PMID:23623818
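
    As a reminder of the arithmetic behind the estimator, the following sketch computes a basic two-group, two-period difference-in-difference estimate. The group labels and health scores are hypothetical illustrations, not data or results from the paper, and the paper's actual specification may include covariates not shown here.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group's mean minus change in the control group's mean."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical self-rated health scores (higher is better).
treated_pre = [3, 4, 3, 4]    # before exposure to a disadvantaged neighborhood
treated_post = [2, 3, 3, 2]   # after exposure
control_pre = [4, 4, 3, 5]    # comparison group, first period
control_post = [4, 3, 3, 4]   # comparison group, second period

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
print(effect)
```

    Subtracting the control group's change removes any trend shared by both groups, which is why the estimate is less exposed to time-invariant selection than a simple cross-sectional comparison.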

  5. Data-driven architectural production and operation

    NARCIS (Netherlands)

    Bier, H.H.; Mostafavi, S.

    2014-01-01

    Data-driven architectural production and operation as explored within Hyperbody rely heavily on system thinking implying that all parts of a system are to be understood in relation to each other. These relations are increasingly established bi-directionally so that data-driven architecture is not

  6. Data Driven Economic Model Predictive Control

    Directory of Open Access Journals (Sweden)

    Masoud Kheradmandi

    2018-04-01

    This manuscript addresses the problem of data-driven, model-based economic model predictive control (MPC) design. To this end, first, a data-driven Lyapunov-based MPC is designed, and shown to be capable of stabilizing a system at an unstable equilibrium point. The data-driven Lyapunov-based MPC utilizes a linear time invariant (LTI) model, cognizant of the fact that the training data, owing to the unstable nature of the equilibrium point, has to be obtained from closed-loop operation or experiments. Simulation results are first presented demonstrating closed-loop stability under the proposed data-driven Lyapunov-based MPC. The underlying data-driven model is then utilized as the basis to design an economic MPC. The economic improvements yielded by the proposed method are illustrated through simulations on a nonlinear chemical process system example.

  7. Data-Driven Problems in Elasticity

    Science.gov (United States)

    Conti, S.; Müller, S.; Ortiz, M.

    2018-01-01

    We consider a new class of problems in elasticity, referred to as Data-Driven problems, defined on the space of strain-stress field pairs, or phase space. The problem consists of minimizing the distance between a given material data set and the subspace of compatible strain fields and stress fields in equilibrium. We find that the classical solutions are recovered in the case of linear elasticity. We identify conditions for convergence of Data-Driven solutions corresponding to sequences of approximating material data sets. Specialization to constant material data set sequences in turn establishes an appropriate notion of relaxation. We find that relaxation within this Data-Driven framework is fundamentally different from the classical relaxation of energy functions. For instance, we show that in the Data-Driven framework the relaxation of a bistable material leads to material data sets that are not graphs.

  8. Consistent data-driven computational mechanics

    Science.gov (United States)

    González, D.; Chinesta, F.; Cueto, E.

    2018-05-01

    We present a novel method, within the realm of data-driven computational mechanics, to obtain reliable and thermodynamically sound simulation from experimental data. We thus avoid the need to fit any phenomenological model in the construction of the simulation model. This kind of technique opens unprecedented possibilities in the framework of data-driven application systems and, particularly, in the paradigm of industry 4.0.

  9. Data-driven regionalization of housing markets

    NARCIS (Netherlands)

    Helbich, M.; Brunauer, W.; Hagenauer, J.; Leitner, M.

    2013-01-01

    This article presents a data-driven framework for housing market segmentation. Local marginal house price surfaces are investigated by means of mixed geographically weighted regression and are reduced to a set of principal component maps, which in turn serve as input for spatial regionalization. The

  10. Data Driven Constraints for the SVM

    DEFF Research Database (Denmark)

    Darkner, Sune; Clemmensen, Line Katrine Harder

    2012-01-01

    We propose a generalized data driven constraint for support vector machines exemplified by classification of paired observations in general and specifically on the human ear canal. This is particularly interesting in dynamic cases such as tissue movement or pathologies developing over time. Assum...

  11. Data Driven Tuning of Inventory Controllers

    DEFF Research Database (Denmark)

    Huusom, Jakob Kjøbsted; Santacoloma, Paloma Andrade; Poulsen, Niels Kjølstad

    2007-01-01

    A systematic method for criterion based tuning of inventory controllers based on data-driven iterative feedback tuning is presented. This tuning method circumvents problems with modeling bias. The process model used for the design of the inventory control is utilized in the tuning...... as an approximation to reduce time required on experiments. The method is illustrated in an application with a multivariable inventory control implementation on a four tank system....

  12. Finding candidate locations for aerosol pollution monitoring at street level using a data-driven methodology

    Science.gov (United States)

    Moosavi, V.; Aschwanden, G.; Velasco, E.

    2015-09-01

    Finding the number and best locations of fixed air quality monitoring stations at street level is challenging because of the complexity of the urban environment and the large number of factors affecting pollutant concentrations. Data sets of such urban parameters as land use, building morphology and street geometry in high-resolution grid cells in combination with direct measurements of airborne pollutants at high frequency (1-10 s) along a reasonable number of streets can be used to interpolate pollutant concentrations in a whole gridded domain and determine the optimum number of monitoring sites and best locations for a network of fixed monitors at ground level. In this context, a data-driven modeling methodology is developed based on the application of a Self-Organizing Map (SOM) to approximate the nonlinear relations between urban parameters (80 in this work) and aerosol pollution data, such as mass and number concentrations measured along streets of a commercial/residential neighborhood of Singapore. Cross-validations between measured and predicted aerosol concentrations based on the urban parameters at each individual grid cell showed satisfying results. This proof of concept study showed that the selected urban parameters are an appropriate indirect measure of aerosol concentrations within the studied area. The potential locations for fixed air quality monitors are identified through clustering of areas (i.e., group of cells) with similar urban patterns. The typological center of each cluster corresponds to the most representative cell for all other cells in the cluster. In the studied neighborhood four different clusters were identified and for each cluster potential sites for air quality monitoring at ground level are identified.

  13. Durham Neighborhood Compass Neighborhoods

    Data.gov (United States)

    City and County of Durham, North Carolina — The Durham Neighborhood Compass is a quantitative indicators project with qualitative values, integrating data from local government, the Census Bureau and other...

  14. Data-driven workflows for microservices

    DEFF Research Database (Denmark)

    Safina, Larisa; Mazzara, Manuel; Montesi, Fabrizio

    2016-01-01

    Microservices is an architectural style inspired by service-oriented computing that has recently started gaining popularity. Jolie is a programming language based on the microservices paradigm: the main building blocks of Jolie systems are services, in contrast to, e.g., functions or objects....... The primitives offered by the Jolie language elicit many of the recurring patterns found in microservices, like load balancers and structured processes. However, Jolie still lacks some useful constructs for dealing with message types and data manipulation that are present in service-oriented computing......). We show the impact of our implementation on some of the typical scenarios found in microservice systems. This shows how computation can move from a process-driven to a data-driven approach, and leads to the preliminary identification of recurring communication patterns that can be shaped as design...

  15. Data-Driven Security-Constrained OPF

    DEFF Research Database (Denmark)

    Thams, Florian; Halilbasic, Lejla; Pinson, Pierre

    2017-01-01

    considerations, while being less conservative than current approaches. Our approach can be scalable for large systems, accounts explicitly for power system security, and enables the electricity market to identify a cost-efficient dispatch avoiding redispatching actions. We demonstrate the performance of our......In this paper we unify electricity market operations with power system security considerations. Using data-driven techniques, we address both small signal stability and steady-state security, derive tractable decision rules in the form of line flow limits, and incorporate the resulting constraints...... in market clearing algorithms. Our goal is to minimize redispatching actions, and instead allow the market to determine the most cost-efficient dispatch while considering all security constraints. To maintain tractability of our approach we perform our security assessment offline, examining large datasets...

  16. Social causation and neighborhood selection underlie associations of neighborhood factors with illicit drug-using social networks and illicit drug use among adults relocated from public housing.

    Science.gov (United States)

    Linton, Sabriya L; Haley, Danielle F; Hunter-Jones, Josalin; Ross, Zev; Cooper, Hannah L F

    2017-07-01

    Theories of social causation and social influence, which posit that neighborhood and social network characteristics are distal causes of substance use, are frequently used to interpret associations among neighborhood characteristics, social network characteristics and substance use. These associations are also hypothesized to result from selection processes, in which substance use determines where people live and who they interact with. The potential for these competing selection mechanisms to co-occur has been underexplored among adults. This study utilizes path analysis to determine the paths that relate census tract characteristics (e.g., economic deprivation), social network characteristics (i.e., having ≥ 1 illicit drug-using network member) and illicit drug use, among 172 African American adults relocated from public housing in Atlanta, Georgia and followed from 2009 to 2014 (7 waves). Individual and network-level characteristics were captured using surveys. Census tract characteristics were created using administrative data. Waves 1 (pre-relocation), 2 (1st wave post-relocation), and 7 were analyzed. When controlling for individual-level sociodemographic factors, residing in census tracts with prior economic disadvantage was significantly associated with illicit drug use at wave 1; illicit drug use at wave 1 was significantly associated with living in economically-disadvantaged census tracts at wave 2; and violent crime at wave 2 was associated with illicit drug-using social network members at wave 7. Findings from this study support theories that describe social causation and neighborhood selection processes as explaining relationships of neighborhood characteristics with illicit drug use and illicit drug-using social networks. Policies that improve local economic and social conditions of neighborhoods may discourage substance use. Future studies should further identify the barriers that prevent substance users from obtaining housing in less

  17. Combining engineering and data-driven approaches

    DEFF Research Database (Denmark)

    Fischer, Katharina; De Sanctis, Gianluca; Kohler, Jochen

    2015-01-01

    Two general approaches may be followed for the development of a fire risk model: statistical models based on observed fire losses can support simple cost-benefit studies but are usually not detailed enough for engineering decision-making. Engineering models, on the other hand, require many assumptions that may result in a biased risk assessment. In two related papers we show how engineering and data-driven modelling can be combined by developing generic risk models that are calibrated to statistical data on observed fire events. The focus of the present paper is on the calibration procedure...... to the calibration of a generic fire risk model for single family houses to Swiss insurance data. The example demonstrates that the bias in the risk estimation can be strongly reduced by model calibration.

  18. Data driven modelling of vertical atmospheric radiation

    International Nuclear Information System (INIS)

    Antoch, Jaromir; Hlubinka, Daniel

    2011-01-01

    In the Czech Hydrometeorological Institute (CHMI) there exists a unique set of meteorological measurements consisting of the values of vertical atmospheric levels of beta and gamma radiation. In this paper a stochastic data-driven model based on nonlinear regression and on nonhomogeneous Poisson process is suggested. In the first part of the paper, growth curves were used to establish an appropriate nonlinear regression model. For comparison we considered a nonhomogeneous Poisson process with its intensity based on growth curves. In the second part both approaches were applied to the real data and compared. Computational aspects are briefly discussed as well. The primary goal of this paper is to present an improved understanding of the distribution of environmental radiation as obtained from the measurements of the vertical radioactivity profiles by the radioactivity sonde system. - Highlights: → We model vertical atmospheric levels of beta and gamma radiation. → We suggest appropriate nonlinear regression model based on growth curves. → We compare nonlinear regression modelling with Poisson process based modeling. → We apply both models to the real data.

  19. Data driven innovations in structural health monitoring

    Science.gov (United States)

    Rosales, M. J.; Liyanapathirana, R.

    2017-05-01

    At present, substantial investments are being allocated to civil infrastructures, which are also considered valuable assets at a national or global scale. Structural Health Monitoring (SHM) is an indispensable tool required to ensure the performance and safety of these structures based on measured response parameters. The research to date on damage assessment has tended to focus on the utilization of wireless sensor networks (WSN) as it proves to be the best alternative over the traditional visual inspections and tethered or wired counterparts. Over the last decade, the structural health and behaviour of innumerable infrastructure has been measured and evaluated owing to several successful ventures of implementing these sensor networks. Various monitoring systems have the capability to rapidly transmit, measure, and store large volumes of data. The amount of data collected from these networks has eventually become unmanageable, which paved the way to other relevant issues such as data quality, relevance, re-use, and decision support. There is an increasing need to integrate new technologies in order to automate the evaluation processes as well as to enhance the objectivity of data assessment routines. This paper aims to identify feasible methodologies towards the application of time-series analysis techniques to judiciously exploit the vast amount of readily available and upcoming data resources. It continues the momentum of a greater effort to collect and archive SHM approaches that will serve as data-driven innovations for the assessment of damage through efficient algorithms and data analytics.

  20. Rolling Bearing Fault Diagnosis Using Modified Neighborhood Preserving Embedding and Maximal Overlap Discrete Wavelet Packet Transform with Sensitive Features Selection

    Directory of Open Access Journals (Sweden)

    Fei Dong

    2018-01-01

    In order to enhance the performance of bearing fault diagnosis and classification, feature extraction and feature dimensionality reduction have become more important. The original statistical feature set was calculated from single branch reconstruction vibration signals obtained by using maximal overlap discrete wavelet packet transform (MODWPT). In order to reduce redundant information in the original statistical feature set, feature selection by adjusted Rand index and sum of within-class mean deviations (FSASD) was proposed to select fault sensitive features. Furthermore, a modified feature dimensionality reduction method, supervised neighborhood preserving embedding with label information (SNPEL), was proposed to realize low-dimensional representations for high-dimensional feature space. Finally, vibration signals collected from two experimental test rigs were employed to evaluate the performance of the proposed procedure. The results show that the proposed procedure is effective, adaptable, and superior, and can serve as an intelligent bearing fault diagnosis system.

  1. Objective, Quantitative, Data-Driven Assessment of Chemical Probes.

    Science.gov (United States)

    Antolin, Albert A; Tym, Joseph E; Komianou, Angeliki; Collins, Ian; Workman, Paul; Al-Lazikani, Bissan

    2018-02-15

    Chemical probes are essential tools for understanding biological systems and for target validation, yet selecting probes for biomedical research is rarely based on objective assessment of all potential compounds. Here, we describe the Probe Miner: Chemical Probes Objective Assessment resource, capitalizing on the plethora of public medicinal chemistry data to empower quantitative, objective, data-driven evaluation of chemical probes. We assess >1.8 million compounds for their suitability as chemical tools against 2,220 human targets and dissect the biases and limitations encountered. Probe Miner represents a valuable resource to aid the identification of potential chemical probes, particularly when used alongside expert curation. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.

  2. Helioseismic and neutrino data-driven reconstruction of solar properties

    Science.gov (United States)

    Song, Ningqiang; Gonzalez-Garcia, M. C.; Villante, Francesco L.; Vinyoles, Nuria; Serenelli, Aldo

    2018-06-01

    In this work, we use Bayesian inference to quantitatively reconstruct the solar properties most relevant to the solar composition problem using as inputs the information provided by helioseismic and solar neutrino data. In particular, we use a Gaussian process to model the functional shape of the opacity uncertainty to gain flexibility and become as free as possible from prejudice in this regard. With these tools we first readdress the statistical significance of the solar composition problem. Furthermore, starting from a composition unbiased set of standard solar models (SSMs) we are able to statistically select those with solar chemical composition and other solar inputs which better describe the helioseismic and neutrino observations. In particular, we are able to reconstruct the solar opacity profile in a data-driven fashion, independently of any reference opacity tables, obtaining a 4 per cent uncertainty at the base of the convective envelope and 0.8 per cent at the solar core. When systematic uncertainties are included, results are 7.5 per cent and 2 per cent, respectively. In addition, we find that the values of most of the other inputs of the SSMs required to better describe the helioseismic and neutrino data are in good agreement with those adopted as the standard priors, with the exception of the astrophysical factor S11 and the microscopic diffusion rates, for which data suggests a 1 per cent and 30 per cent reduction, respectively. As an output of the study we derive the corresponding data-driven predictions for the solar neutrino fluxes.

  3. Data-driven architectural design to production and operation

    NARCIS (Netherlands)

    Bier, H.H.; Mostafavi, S.

    2015-01-01

    Data-driven architectural production and operation explored within Hyperbody rely heavily on system thinking implying that all parts of a system are to be understood in relation to each other. These relations are established bi-directionally so that data-driven architecture is not only produced

  4. Data-Driven Methods to Diversify Knowledge of Human Psychology

    OpenAIRE

    Jack, Rachael E.; Crivelli, Carlos; Wheatley, Thalia

    2017-01-01

    Psychology aims to understand real human behavior. However, cultural biases in the scientific process can constrain knowledge. We describe here how data-driven methods can relax these constraints to reveal new insights that theories can overlook. To advance knowledge we advocate a symbiotic approach that better combines data-driven methods with theory.

  5. Data driven smooth tests for composite hypotheses

    NARCIS (Netherlands)

    Inglot, Tadeusz; Kallenberg, Wilbert C.M.; Ledwina, Teresa

    1997-01-01

    The classical problem of testing goodness-of-fit of a parametric family is reconsidered. A new test for this problem is proposed and investigated. The new test statistic is a combination of the smooth test statistic and Schwarz's selection rule. More precisely, as the sample size increases, an

  6. Data-driven approach for auditory profiling

    DEFF Research Database (Denmark)

    Sanchez Lopez, Raul; Bianchi, Federica; Fereczkowski, Michal

    2017-01-01

    Nowadays, the pure-tone audiogram is the main tool used to characterize hearing loss and to fit hearing aids. However, the perceptual consequences of hearing loss are typically not only associated with a loss of sensitivity, but also with a clarity loss that is not captured by the audiogram. A detai......-in-noise perception. The current approach is promising for analyzing other existing data sets in order to select the most relevant tests for auditory profiling....

  7. Dynamic Data-Driven UAV Network for Plume Characterization

    Science.gov (United States)

    2016-05-23

    Report AFRL-AFOSR-VA-TR-2016-0203: Dynamic Data-Driven UAV Network for Plume Characterization. Kamran Mohseni, University of Florida. Final Report, 05/23/2016. Grant number FA9550-13-1-0090. ...studied a dynamic data driven (DDD) approach to operation of a heterogeneous team of unmanned aerial vehicles (UAVs) or micro/miniature aerial

  8. Data-Driven Exercises for Chemistry: A New Digital Collection

    Science.gov (United States)

    Grubbs, W. Tandy

    2007-01-01

    The analysis presents a new digital collection for various data-driven exercises that are used for teaching chemistry to the students. Such methods are expected to help the students to think in a more scientific manner.

  9. Data-Driven Model Order Reduction for Bayesian Inverse Problems

    KAUST Repository

    Cui, Tiangang; Youssef, Marzouk; Willcox, Karen

    2014-01-01

    One of the major challenges in using MCMC for the solution of inverse problems is the repeated evaluation of computationally expensive numerical models. We develop a data-driven projection-based model order reduction technique to reduce

  10. Dynamically adaptive data-driven simulation of extreme hydrological flows

    KAUST Repository

    Kumar Jain, Pushkar; Mandli, Kyle; Hoteit, Ibrahim; Knio, Omar; Dawson, Clint

    2017-01-01

    evacuation in real-time and through the development of resilient infrastructure based on knowledge of how systems respond to extreme events. Data-driven computational modeling is a critical technology underpinning these efforts. This investigation focuses

  11. Neighborhood cohesion, neighborhood disorder, and cardiometabolic risk.

    Science.gov (United States)

    Robinette, Jennifer W; Charles, Susan T; Gruenewald, Tara L

    2018-02-01

    Perceptions of neighborhood disorder (trash, vandalism) and cohesion (neighbors trust one another) are related to residents' health. Affective and behavioral factors have been identified, but often in studies using geographically select samples. We use a nationally representative sample (n = 9032) of United States older adults from the Health and Retirement Study to examine cardiometabolic risk in relation to perceptions of neighborhood cohesion and disorder. Lower cohesion is significantly related to greater cardiometabolic risk in 2006/2008 and predicts greater risk four years later (2010/2012). The longitudinal relation is partially accounted for by anxiety and physical activity. Copyright © 2017 Elsevier Ltd. All rights reserved.

  12. NEIGHBORHOOD CHOICE AND NEIGHBORHOOD CHANGE

    OpenAIRE

    Bruch, Elizabeth; Mare, Robert D.

    2006-01-01

    This paper examines the relationships between the residential choices of individuals and aggregate patterns of neighborhood change. We investigate the conditions under which individuals’ preferences for the race-ethnic composition of their neighborhoods produce high levels of segregation. Using computational models, we find that high levels of segregation occur only when individuals’ preferences follow a threshold function. If individuals make finer-grained distinctions among neighborhoods th...

  13. Incorporating Neighborhood Choice in a Model of Neighborhood Effects on Income.

    Science.gov (United States)

    van Ham, Maarten; Boschman, Sanne; Vogel, Matt

    2018-05-09

    Studies of neighborhood effects often attempt to identify causal effects of neighborhood characteristics on individual outcomes, such as income, education, employment, and health. However, selection looms large in this line of research, and it has been argued that estimates of neighborhood effects are biased because people nonrandomly select into neighborhoods based on their preferences, income, and the availability of alternative housing. We propose a two-step framework to disentangle selection processes in the relationship between neighborhood deprivation and earnings. We model neighborhood selection using a conditional logit model, from which we derive correction terms. Driven by the recognition that most households prefer certain types of neighborhoods rather than specific areas, we employ a principal components analysis to reduce these terms into eight correction components. We use these to adjust parameter estimates from a model of subsequent neighborhood effects on individual income for the unequal probability that a household chooses to live in a particular type of neighborhood. We apply this technique to administrative data from the Netherlands. After we adjust for the differential sorting of households into certain types of neighborhoods, the effect of neighborhood income on individual income diminishes but remains significant. These results further emphasize that researchers need to be attuned to the role of selection bias when assessing the role of neighborhood effects on individual outcomes. Perhaps more importantly, the persistent effect of neighborhood deprivation on subsequent earnings suggests that neighborhood effects reflect more than the shared characteristics of neighborhood residents: place of residence partially determines economic well-being.
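
    For readers unfamiliar with the first step, a conditional logit model assigns each alternative j in a choice set the probability exp(x_j · β) / Σ_k exp(x_k · β). The sketch below computes these probabilities only; the attribute values and coefficients are hypothetical, and the derivation of the correction terms used in the paper is not shown.

```python
from math import exp


def choice_probabilities(alternatives, beta):
    """Conditional logit: softmax of the linear utilities x_j . beta over one choice set."""
    utilities = [sum(x * b for x, b in zip(attrs, beta)) for attrs in alternatives]
    shift = max(utilities)  # subtract the max for numerical stability
    weights = [exp(u - shift) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical neighborhood types described by (deprivation share, relative housing supply).
neighborhood_types = [(0.2, 1.0), (0.5, 0.5), (0.8, 0.1)]
beta = (-1.0, 0.5)  # assumed coefficients, for illustration only

probs = choice_probabilities(neighborhood_types, beta)
print(probs)
```

    Because the probabilities depend only on differences in utility across alternatives, household characteristics enter such a model through interactions with neighborhood attributes rather than on their own.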

  14. Data-driven classification of patients with primary progressive aphasia.

    Science.gov (United States)

    Hoffman, Paul; Sajjadi, Seyed Ahmad; Patterson, Karalyn; Nestor, Peter J

    2017-11-01

    Current diagnostic criteria classify primary progressive aphasia into three variants, semantic (sv), nonfluent (nfv) and logopenic (lv) PPA, though the adequacy of this scheme is debated. This study took a data-driven approach, applying k-means clustering to data from 43 PPA patients. The algorithm grouped patients based on similarities in language, semantic and non-linguistic cognitive scores. The optimum solution consisted of three groups. One group, almost exclusively those diagnosed as svPPA, displayed a selective semantic impairment. A second cluster, with impairments to speech production, repetition and syntactic processing, contained a majority of patients with nfvPPA but also some lvPPA patients. The final group exhibited more severe deficits to speech, repetition and syntax as well as semantic and other cognitive deficits. These results suggest that, amongst cases of non-semantic PPA, differentiation mainly reflects overall degree of language/cognitive impairment. The observed patterns were scarcely affected by inclusion/exclusion of non-linguistic cognitive scores. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
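
    The clustering step can be sketched with a plain implementation of Lloyd's k-means algorithm. The two-dimensional scores and the initial centers below are hypothetical; the study's actual features, preprocessing and initialization are not described in the abstract.

```python
def kmeans(points, centers, iterations=20):
    """Lloyd's algorithm: alternate nearest-center assignment and centroid update."""
    labels = []
    for _ in range(iterations):
        # Assign each point to its nearest current center (squared Euclidean distance).
        labels = [
            min(
                range(len(centers)),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centers[c])),
            )
            for point in points
        ]
        # Move each center to the mean of its members; keep it if the cluster is empty.
        new_centers = []
        for c, old in enumerate(centers):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centers.append(old)
        if new_centers == centers:  # converged: assignments can no longer change
            break
        centers = new_centers
    return labels, centers


# Hypothetical (language score, semantic score) pairs for six patients.
scores = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.8), (9.0, 1.0), (9.1, 1.2)]
labels, centers = kmeans(scores, [scores[0], scores[2], scores[4]])
print(labels)
```

    In practice the number of clusters is chosen by comparing solutions for several values of k, which is how an "optimum solution" of three groups would be identified.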

  15. Data driven fault detection and isolation: a wind turbine scenario

    Directory of Open Access Journals (Sweden)

    Rubén Francisco Manrique Piramanrique

    2015-04-01

    One of the greatest drawbacks in wind energy generation is the high maintenance cost associated with mechanical faults. This problem becomes more evident in utility scale wind turbines, where the increased size and nominal capacity comes with additional problems associated with structural vibrations and aeroelastic effects in the blades. Due to the increased operation capability, it is imperative to detect system degradation and faults in an efficient manner, maintaining system integrity, reliability and reducing operation costs. This paper presents a comprehensive comparison of four different Fault Detection and Isolation (FDI) filters based on “Data Driven” (DD) techniques. In order to enhance FDI performance, a multi-level strategy is used where the first level detects the occurrence of any given fault (detection), while the second identifies the source of the fault (isolation). Four different DD classification techniques (namely Support Vector Machines, Artificial Neural Networks, K Nearest Neighbors and Gaussian Mixture Models) were studied and compared for each of the proposed classification levels. The best strategy at each level could be selected to build the final data driven FDI system. The performance of the proposed scheme is evaluated on a benchmark model of a commercial wind turbine.

  16. A ROBUST CLUSTER HEAD SELECTION BASED ON NEIGHBORHOOD CONTRIBUTION AND AVERAGE MINIMUM POWER FOR MANETs

    Directory of Open Access Journals (Sweden)

    S.Balaji

    2015-06-01

    Full Text Available A mobile ad hoc network (MANET) is an instantaneous wireless network that is dynamic in nature. It supports single-hop and multihop communication. In this infrastructure-less network, clustering is a significant model for maintaining the topology of the network. The clustering process includes different phases such as cluster formation, cluster head selection and cluster maintenance. Choosing the cluster head is important, as the stability of the network depends on a well-organized and resourceful cluster head. When a node has a larger number of neighbors, it can act as a link between the neighbor nodes, which in turn reduces the number of hops in multihop communication. The node with more neighbors should also have enough energy to provide stability in the network. Hence these aspects demand focus. In weight-based cluster head selection, closeness and the average minimum power required are considered for purging the ineligible nodes. The optimal set of nodes selected after purging will compete to become cluster head. The node with maximum weight is selected as cluster head. A mathematical formulation is developed to show that the proposed method provides an optimum result. It is also suggested that the weight factor in calculating the node weight should give precise importance to energy and node stability.
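
    The purge-then-compete procedure can be sketched as follows. This is a simplified illustration under stated assumptions: the paper's "average minimum power" criterion is replaced by a plain energy threshold, and the weight formula, the field names and the equal weight factors are hypothetical rather than taken from the paper.

```python
def select_cluster_head(nodes, min_energy, w_degree=0.5, w_energy=0.5):
    """Return the id of the eligible node with the highest weight, or None.

    Each node is a dict with hypothetical keys "id", "neighbors" and "energy".
    """
    # Purging step: drop nodes that cannot sustain the cluster head role.
    eligible = [n for n in nodes if n["energy"] >= min_energy]
    if not eligible:
        return None

    # Normalise both terms so neither dominates because of its units.
    max_degree = max(n["neighbors"] for n in eligible) or 1
    max_energy = max(n["energy"] for n in eligible) or 1

    def weight(node):
        return (w_degree * node["neighbors"] / max_degree
                + w_energy * node["energy"] / max_energy)

    # Competition step: the node with maximum weight becomes cluster head.
    return max(eligible, key=weight)["id"]


candidates = [
    {"id": "a", "neighbors": 6, "energy": 0.2},  # well connected but nearly drained
    {"id": "b", "neighbors": 5, "energy": 0.8},
    {"id": "c", "neighbors": 2, "energy": 0.9},
]
```

    With a threshold of 0.3, node "a" is purged despite having the most neighbors, which reflects the abstract's point that connectivity alone is not enough.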

  17. The Structural Consequences of Big Data-Driven Education.

    Science.gov (United States)

    Zeide, Elana

    2017-06-01

    Educators and commenters who evaluate big data-driven learning environments focus on specific questions: whether automated education platforms improve learning outcomes, invade student privacy, and promote equality. This article puts aside separate unresolved (and perhaps unresolvable) issues regarding the concrete effects of specific technologies. It instead examines how big data-driven tools alter the structure of schools' pedagogical decision-making, and, in doing so, change fundamental aspects of America's education enterprise. Technological mediation and data-driven decision-making have a particularly significant impact in learning environments because the education process primarily consists of dynamic information exchange. In this overview, I highlight three significant structural shifts that accompany school reliance on data-driven instructional platforms that perform core school functions: teaching, assessment, and credentialing. First, virtual learning environments create information technology infrastructures featuring constant data collection, continuous algorithmic assessment, and possibly infinite record retention. This undermines the traditional intellectual privacy and safety of classrooms. Second, these systems displace pedagogical decision-making from educators serving public interests to private, often for-profit, technology providers. They constrain teachers' academic autonomy, obscure student evaluation, and reduce parents' and students' ability to participate or challenge education decision-making. Third, big data-driven tools define what "counts" as education by mapping the concepts, creating the content, determining the metrics, and setting desired learning outcomes of instruction. These shifts cede important decision-making to private entities without public scrutiny or pedagogical examination. In contrast to the public and heated debates that accompany textbook choices, schools often adopt education technologies ad hoc. Given education

  18. Neighborhood spaces

    OpenAIRE

    D. C. Kent; Won Keun Min

    2002-01-01

    Neighborhood spaces, pretopological spaces, and closure spaces are topological space generalizations which can be characterized by means of their associated interior (or closure) operators. The category NBD of neighborhood spaces and continuous maps contains PRTOP as a bicoreflective subcategory and CLS as a bireflective subcategory, whereas TOP is bireflectively embedded in PRTOP and bicoreflectively embedded in CLS. Initial and final structures are described in these categories, and it is s...

  19. Temporal Data-Driven Sleep Scheduling and Spatial Data-Driven Anomaly Detection for Clustered Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Gang Li

    2016-09-01

    Full Text Available The spatial–temporal correlation is an important feature of sensor data in wireless sensor networks (WSNs). Most of the existing works based on the spatial–temporal correlation can be divided into two parts: redundancy reduction and anomaly detection. These two parts are pursued separately in existing works. In this work, the combination of temporal data-driven sleep scheduling (TDSS) and spatial data-driven anomaly detection is proposed, where TDSS can reduce data redundancy. The TDSS model is inspired by transmission control protocol (TCP) congestion control. Based on the long, linear cluster structure in the tunnel monitoring system, cooperative TDSS and spatial data-driven anomaly detection are then proposed. To realize synchronous acquisition in the same ring for analyzing the situation of every ring, TDSS is implemented in a cooperative way in the cluster. To keep the precision of sensor data, spatial data-driven anomaly detection based on the spatial correlation and Kriging method is realized to generate an anomaly indicator. The experimental results show that cooperative TDSS can realize non-uniform sensing effectively to reduce the energy consumption. In addition, spatial data-driven anomaly detection is quite significant for maintaining and improving the precision of sensor data.
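
    One way to read "inspired by TCP congestion control" is an additive-increase/multiplicative-decrease (AIMD) rule applied to the sleep interval. The sketch below is an assumption about the general idea, not the TDSS model from the paper; the function name, thresholds and limits are all hypothetical.

```python
def next_sleep_interval(interval, prev_reading, reading, threshold,
                        step=1.0, factor=0.5,
                        min_interval=1.0, max_interval=60.0):
    """AIMD-style update of a sensor's sleep interval (in seconds).

    Stable readings lengthen the sleep interval slowly (additive increase);
    a change larger than `threshold` shortens it sharply (multiplicative
    decrease), so the node samples more often while the signal is changing.
    """
    if abs(reading - prev_reading) > threshold:
        interval *= factor
    else:
        interval += step
    # Keep the interval inside the allowed range.
    return max(min_interval, min(max_interval, interval))
```

    For example, a node sleeping 10 s that sees a nearly unchanged reading would move to 11 s, while a jump beyond the threshold would cut the interval to 5 s, giving the non-uniform sensing the abstract mentions.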

  20. Analysis and modeling of safety parameters in the selection of optimal routes for emergency evacuation after the earthquake (Case study: 13 Aban neighborhood of Tehran)

    Directory of Open Access Journals (Sweden)

    Sajad Ganjehi

    2013-08-01

    Full Text Available Introduction: Earthquakes are imminent threats to urban areas of Iran, especially Tehran. They can cause extensive destruction and lead to heavy casualties. One of the most important aspects of disaster management after an earthquake is the rapid transfer of casualties to emergency shelters. To expedite the emergency evacuation process, the optimal safe path method should be considered. To examine the safety of road networks and to determine the optimal route in the pre-earthquake phase, a series of parameters should be taken into account. Methods: In this study, we employed a multi-criteria decision-making approach to determine and evaluate the effective safety parameters for selection of optimal routes in emergency evacuation after an earthquake. Results: The relationship between the parameters was analyzed and the effect of each parameter was listed. A process model was described and a case study was implemented in the 13th Aban neighborhood (Tehran’s 20th municipal district). Then, an optimal path to safe places in an emergency evacuation after an earthquake in the 13th Aban neighborhood was selected. Conclusion: The analytic hierarchy process (AHP), as the main model, was employed. Each parameter of the model was described. Also, the capabilities of GIS software such as layer coverage were used. Keywords: Earthquake, emergency evacuation, Analytic Hierarchy Process (AHP), crisis management, optimization, 13th Aban neighborhood of Tehran
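
    The core AHP step, turning a pairwise comparison matrix into priority weights, can be sketched as below. The row geometric-mean method used here is a common approximation to the principal-eigenvector weights; whether the study used it is not stated. The three criteria implied by the example matrix are hypothetical stand-ins for the paper's safety parameters.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights by the row geometric-mean method.

    `pairwise[i][j]` states how much more important criterion i is than
    criterion j, so `pairwise[j][i]` should be its reciprocal.
    """
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


# Hypothetical comparison of three route-safety criteria.
# Criterion 0 is judged 2x as important as criterion 1 and 6x as criterion 2.
comparison = [
    [1.0,       2.0,       6.0],
    [1.0 / 2.0, 1.0,       3.0],
    [1.0 / 6.0, 1.0 / 3.0, 1.0],
]
weights = ahp_weights(comparison)
```

    Because this example matrix is perfectly consistent, the weights come out in the ratio 6:3:1. Real judgement matrices are rarely consistent, so a consistency check would normally accompany this step.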

  1. Choice Neighborhood Grantees

    Data.gov (United States)

    Department of Housing and Urban Development — Choice Neighborhoods grants transform distressed neighborhoods, public and assisted projects into viable and sustainable mixed-income neighborhoods by linking...

  2. Data-Driven Learning: Reasonable Fears and Rational Reassurance

    Science.gov (United States)

    Boulton, Alex

    2009-01-01

    Computer corpora have many potential applications in teaching and learning languages, the most direct of which--when the learners explore a corpus themselves--has become known as data-driven learning (DDL). Despite considerable enthusiasm in the research community and interest in higher education, the approach has not made major inroads to…

  3. Data-driven Regulation and Governance in Smart Cities

    NARCIS (Netherlands)

    Ranchordás, Sofia; Klop, Abram; Mak, Vanessa; Berlee, Anna; Tjong Tjin Tai, Eric

    2018-01-01

    This chapter discusses the concept of data-driven regulation and governance in the context of smart cities by describing how these urban centres harness these technologies to collect and process information about citizens, traffic, urban planning or waste production. It describes how several smart

  4. Data-Driven Planning: Using Assessment in Strategic Planning

    Science.gov (United States)

    Bresciani, Marilee J.

    2010-01-01

    Data-driven planning or evidence-based decision making represents nothing new in its concept. For years, business leaders have claimed they have implemented planning informed by data that have been strategically and systematically gathered. Within higher education and student affairs, there may be less evidence of the actual practice of…

  5. Data-Driven Model Order Reduction for Bayesian Inverse Problems

    KAUST Repository

    Cui, Tiangang

    2014-01-06

    One of the major challenges in using MCMC for the solution of inverse problems is the repeated evaluation of computationally expensive numerical models. We develop a data-driven projection-based model order reduction technique to reduce the computational cost of numerical PDE evaluations in this context.

  6. Data mining, knowledge discovery and data-driven modelling

    NARCIS (Netherlands)

    Solomatine, D.P.; Velickov, S.; Bhattacharya, B.; Van der Wal, B.

    2003-01-01

    The project was aimed at exploring the possibilities of a new paradigm in modelling - data-driven modelling, often referred to as "data mining". Several application areas were considered: sedimentation problems in the Port of Rotterdam, automatic soil classification on the basis of cone penetration

  7. Scalable data-driven short-term traffic prediction

    NARCIS (Netherlands)

    Friso, K.; Wismans, L. J.J.; Tijink, M. B.

    2017-01-01

    Short-term traffic prediction has a lot of potential for traffic management. However, most research has traditionally focused on either traffic models (which do not scale very well to large networks, computationally) or on data-driven methods for freeways, leaving out urban arterials completely. Urban

  8. Data-driven analysis of blood glucose management effectiveness

    NARCIS (Netherlands)

    Nannings, B.; Abu-Hanna, A.; Bosman, R. J.

    2005-01-01

    The blood-glucose-level (BGL) of Intensive Care (IC) patients requires close monitoring and control. In this paper we describe a general data-driven analytical method for studying the effectiveness of BGL management. The method is based on developing and studying a clinical outcome reflecting the

  9. Data-Driven Learning of Q-Matrix

    Science.gov (United States)

    Liu, Jingchen; Xu, Gongjun; Ying, Zhiliang

    2012-01-01

    The recent surge of interest in cognitive assessment has led to the development of novel statistical models for diagnostic classification. Central to many such models is the well-known "Q"-matrix, which specifies the item-attribute relationships. This article proposes a data-driven approach to identification of the "Q"-matrix and estimation of…

  10. Knowledge-Driven Versus Data-Driven Logics

    Czech Academy of Sciences Publication Activity Database

    Dubois, D.; Hájek, Petr; Prade, H.

    2000-01-01

    Vol. 9, No. 1 (2000), pp. 65-89. ISSN 0925-8531. R&D Projects: GA AV ČR IAA1030601. Grant - others: CNRS(FR) 4008. Institutional research plan: AV0Z1030915. Keywords: epistemic logic * possibility theory * data-driven reasoning * deontic logic. Subject RIV: BA - General Mathematics

  11. Developing Annotation Solutions for Online Data Driven Learning

    Science.gov (United States)

    Perez-Paredes, Pascual; Alcaraz-Calero, Jose M.

    2009-01-01

    Although "annotation" is a widely-researched topic in Corpus Linguistics (CL), its potential role in Data Driven Learning (DDL) has not been addressed in depth by Foreign Language Teaching (FLT) practitioners. Furthermore, most of the research in the use of DDL methods pays little attention to annotation in the design and implementation…

  12. Data-driven modelling of LTI systems using symbolic regression

    NARCIS (Netherlands)

    Khandelwal, D.; Toth, R.; Van den Hof, P.M.J.

    2017-01-01

    The aim of this project is to automate the task of data-driven identification of dynamical systems. The underlying goal is to develop an identification tool that models a physical system without distinguishing between classes of systems such as linear, nonlinear or possibly even hybrid systems. Such

  13. Data-Driven Controller Design The H2 Approach

    CERN Document Server

    Sanfelice Bazanella, Alexandre; Eckhard, Diego

    2012-01-01

    Data-driven methodologies have recently emerged as an important paradigm alternative to model-based controller design and several such methodologies are formulated as an H2 performance optimization. This book presents a comprehensive theoretical treatment of the H2 approach to data-driven control design. The fundamental properties implied by the H2 problem formulation are analyzed in detail, so that common features to all solutions are identified. Direct methods (VRFT) and iterative methods (IFT, DFT, CbT) are put under a common theoretical framework. The choice of the reference model, the experimental conditions, the optimization method to be used, and several other designer’s choices are crucial to the quality of the final outcome, and firm guidelines for all these choices are derived from the theoretical analysis presented. The practical application of the concepts in the book is illustrated with a large number of practical designs performed for different classes of processes: thermal, fluid processing a...

  14. Data-driven importance distributions for articulated tracking

    DEFF Research Database (Denmark)

    Hauberg, Søren; Pedersen, Kim Steenstrup

    2011-01-01

    We present two data-driven importance distributions for particle filter-based articulated tracking; one based on background subtraction, another on depth information. In order to keep the algorithms efficient, we represent human poses in terms of spatial joint positions. To ensure constant bone le...... filter, where they improve both accuracy and efficiency of the tracker. In fact, they triple the effective number of samples compared to the most commonly used importance distribution at little extra computational cost....

  15. A data-driven framework for investigating customer retention

    OpenAIRE

    Mgbemena, Chidozie Simon

    2016-01-01

    This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University London. This study presents a data-driven simulation framework in order to understand customer behaviour and therefore improve customer retention. The overarching system design methodology used for this study is aligned with the design science paradigm. The Social Media Domain Analysis (SoMeDoA) approach is adopted and evaluated to build a model on the determinants of customer satisfaction ...

  16. Retrospective data-driven respiratory gating for PET/CT

    International Nuclear Information System (INIS)

    Schleyer, Paul J; O'Doherty, Michael J; Barrington, Sally F; Marsden, Paul K

    2009-01-01

    Respiratory motion can adversely affect both PET and CT acquisitions. Respiratory gating allows an acquisition to be divided into a series of motion-reduced bins according to the respiratory signal, which is typically hardware acquired. In order that the effects of motion can potentially be corrected for, we have developed a novel, automatic, data-driven gating method which retrospectively derives the respiratory signal from the acquired PET and CT data. PET data are acquired in listmode and analysed in sinogram space, and CT data are acquired in cine mode and analysed in image space. Spectral analysis is used to identify regions within the CT and PET data which are subject to respiratory motion, and the variation of counts within these regions is used to estimate the respiratory signal. Amplitude binning is then used to create motion-reduced PET and CT frames. The method was demonstrated with four patient datasets acquired on a 4-slice PET/CT system. To assess the accuracy of the data-derived respiratory signal, a hardware-based signal was acquired for comparison. Data-driven gating was successfully performed on PET and CT datasets for all four patients. Gated images demonstrated respiratory motion throughout the bin sequences for all PET and CT series, and image analysis and direct comparison of the traces derived from the data-driven method with the hardware-acquired traces indicated accurate recovery of the respiratory signal.
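
    The amplitude binning step can be illustrated with a short sketch. It assumes equal-width bins spanning the observed amplitude range of the respiratory signal; the abstract does not say how bin boundaries were chosen, and the function name is hypothetical.

```python
def amplitude_bins(signal, n_bins):
    """Assign each sample of a respiratory signal to an amplitude bin.

    Returns one bin index (0 .. n_bins - 1) per sample. Samples in the same
    bin were acquired at a similar respiratory displacement, so the data
    they index can be combined into one motion-reduced frame.
    """
    lo, hi = min(signal), max(signal)
    # Fall back to a width of 1.0 for a flat signal to avoid dividing by zero.
    width = (hi - lo) / n_bins or 1.0
    # The maximum sample would land in bin n_bins, so clamp it to the last bin.
    return [min(int((s - lo) / width), n_bins - 1) for s in signal]
```

    Unlike phase-based gating, this groups data by displacement rather than by time within the breathing cycle, so irregular breathing changes how many samples fall in each bin rather than blurring the bins themselves.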

  17. Authoring Data-Driven Videos with DataClips.

    Science.gov (United States)

    Amini, Fereshteh; Riche, Nathalie Henry; Lee, Bongshin; Monroy-Hernandez, Andres; Irani, Pourang

    2017-01-01

    Data videos, or short data-driven motion graphics, are an increasingly popular medium for storytelling. However, creating data videos is difficult as it involves pulling together a unique combination of skills. We introduce DataClips, an authoring tool aimed at lowering the barriers to crafting data videos. DataClips allows non-experts to assemble data-driven "clips" together to form longer sequences. We constructed the library of data clips by analyzing the composition of over 70 data videos produced by reputable sources such as The New York Times and The Guardian. We demonstrate that DataClips can reproduce over 90% of our data videos corpus. We also report on a qualitative study comparing the authoring process and outcome achieved by (1) non-experts using DataClips, and (2) experts using Adobe Illustrator and After Effects to create data-driven clips. Results indicated that non-experts are able to learn and use DataClips with a short training period. In the span of one hour, they were able to produce more videos than experts using a professional editing tool, and their clips were rated similarly by an independent audience.

  18. Data-Driven H∞ Control for Nonlinear Distributed Parameter Systems.

    Science.gov (United States)

    Luo, Biao; Huang, Tingwen; Wu, Huai-Ning; Yang, Xiong

    2015-11-01

    The data-driven H∞ control problem of nonlinear distributed parameter systems is considered in this paper. An off-policy learning method is developed to learn the H∞ control policy from real system data rather than the mathematical model. First, Karhunen-Loève decomposition is used to compute the empirical eigenfunctions, which are then employed to derive a reduced-order model (ROM) of the slow subsystem based on the singular perturbation theory. The H∞ control problem is reformulated based on the ROM, which can be transformed to solve the Hamilton-Jacobi-Isaacs (HJI) equation, theoretically. To learn the solution of the HJI equation from real system data, a data-driven off-policy learning approach is proposed based on the simultaneous policy update algorithm and its convergence is proved. For implementation purposes, a neural network (NN)-based action-critic structure is developed, where a critic NN and two action NNs are employed to approximate the value function, control, and disturbance policies, respectively. Subsequently, a least-squares NN weight-tuning rule is derived with the method of weighted residuals. Finally, the developed data-driven off-policy learning approach is applied to a nonlinear diffusion-reaction process, and the obtained results demonstrate its effectiveness.

  19. Neighborhood Quality and Labor Market Outcomes

    DEFF Research Database (Denmark)

    Damm, Anna Piil

    of men living in the neighborhood, but positively affected by the employment rate of non-Western immigrant men and co-national men living in the neighborhood. This is strong evidence that immigrants find jobs in part through their employed immigrant and co-ethnic contacts in the neighborhood of residence...... successfully addresses the methodological problem of endogenous neighborhood selection. Taking account of location sorting, living in a socially deprived neighborhood does not affect labor market outcomes of refugee men. Furthermore, their labor market outcomes are not affected by the overall employment rate...

  20. General Purpose Data-Driven Monitoring for Space Operations

    Science.gov (United States)

    Iverson, David L.; Martin, Rodney A.; Schwabacher, Mark A.; Spirkovska, Liljana; Taylor, William McCaa; Castle, Joseph P.; Mackey, Ryan M.

    2009-01-01

    As modern space propulsion and exploration systems improve in capability and efficiency, their designs are becoming increasingly sophisticated and complex. Determining the health state of these systems, using traditional parameter limit checking, model-based, or rule-based methods, is becoming more difficult as the number of sensors and component interactions grows. Data-driven monitoring techniques have been developed to address these issues by analyzing system operations data to automatically characterize normal system behavior. System health can be monitored by comparing real-time operating data with these nominal characterizations, providing detection of anomalous data signatures indicative of system faults or failures. The Inductive Monitoring System (IMS) is a data-driven system health monitoring software tool that has been successfully applied to several aerospace applications. IMS uses a data mining technique called clustering to analyze archived system data and characterize normal interactions between parameters. The scope of IMS-based data-driven monitoring applications continues to expand with current development activities. Successful IMS deployment in the International Space Station (ISS) flight control room to monitor ISS attitude control systems has led to applications in other ISS flight control disciplines, such as thermal control. It has also generated interest in data-driven monitoring capability for Constellation, NASA's program to replace the Space Shuttle with new launch vehicles and spacecraft capable of returning astronauts to the moon, and then on to Mars. Several projects are currently underway to evaluate and mature the IMS technology and complementary tools for use in the Constellation program. These include an experiment on board the Air Force TacSat-3 satellite, and ground systems monitoring for NASA's Ares I-X and Ares I launch vehicles. The TacSat-3 Vehicle System Management (TVSM) project is a software experiment to integrate fault
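
    The idea of characterizing nominal behavior by clustering archived data, then scoring live data against those clusters, can be sketched as follows. This is a simplified illustration and not the IMS implementation: representing each cluster as a box of per-parameter minimum and maximum values, the single `tolerance` parameter and all function names are assumptions made for the example.

```python
def box_distance(vector, box):
    """Euclidean distance from a vector to a box; 0.0 if it lies inside."""
    lo, hi = box
    return sum(
        max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(vector, lo, hi)
    ) ** 0.5


def learn_boxes(training, tolerance):
    """Build nominal clusters from archived vectors of sensor values."""
    boxes = []
    for vector in training:
        best = min(boxes, key=lambda b: box_distance(vector, b), default=None)
        if best is not None and box_distance(vector, best) <= tolerance:
            # Close enough to a known cluster: stretch it to cover the vector.
            lo, hi = best
            for i, x in enumerate(vector):
                lo[i] = min(lo[i], x)
                hi[i] = max(hi[i], x)
        else:
            # Otherwise start a new cluster containing just this vector.
            boxes.append((list(vector), list(vector)))
    return boxes


def deviation(vector, boxes):
    """Distance from a live vector to the nearest nominal cluster."""
    return min(box_distance(vector, b) for b in boxes)


# Hypothetical archived data: (current, temperature) in two operating modes.
archive = [(1.0, 10.0), (1.2, 10.5), (5.0, 50.0), (5.1, 49.5)]
nominal = learn_boxes(archive, tolerance=1.0)
```

    A deviation of zero means the live vector matches previously seen behavior; a large deviation flags a combination of parameter values that never occurred in the archive, even if each value is within its individual limits.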

  1. Data-driven algorithm to estimate friction in automobile engine

    DEFF Research Database (Denmark)

    Stotsky, Alexander A.

    2010-01-01

    Algorithms based on the oscillations of the engine angular rotational speed under fuel cutoff and no-load were proposed for estimation of the engine friction torque. The recursive algorithm to restore the periodic signal is used to calculate the amplitude of the engine speed signal at fuel cutoff....... The values of the friction torque in the corresponding table entries are updated at acquiring new measurements of the friction moment. A new, data-driven algorithm for table adaptation on the basis of stepwise regression was developed and verified using the six-cylinder Volvo engine....

  2. Data driven information system for supervision of judicial open

    Directory of Open Access Journals (Sweden)

    Ming LI

    2016-08-01

    Full Text Available To address four outstanding problems in the information-based supervision of judicial publicity, judicial public data are classified using a data-driven approach to extract the data of final value. Then, the functional structure, technical structure and business structure of the data processing system are put forward, including a data collection module, data reduction module, data analysis module, data application module and data security module. Developing the data processing system on these structures can effectively reduce the work intensity of judicial open information management, summarize the work state, find problems, and raise the level of judicial publicity.

  3. Product design pattern based on big data-driven scenario

    OpenAIRE

    Conggang Yu; Lusha Zhu

    2016-01-01

    This article discusses new product design patterns in the big data era, gives designers a new way of rational thinking, and offers a new way to understand product design. Based on the key criteria of the product design process, category, element, and product are used to input the data, which comprises concrete data and abstract data as an enlargement of the criteria of product design process for the establishment of a big data-driven product design pattern’s model. Moreover, an exper...

  4. Controller synthesis for negative imaginary systems: a data driven approach

    KAUST Repository

    Mabrok, Mohamed

    2016-02-17

    The negative imaginary (NI) property occurs in many important applications. For instance, flexible structure systems with collocated force actuators and position sensors can be modelled as negative imaginary systems. In this study, a data-driven controller synthesis methodology for NI systems is presented. In this approach, measured frequency response data of the plant is used to construct the controller frequency response at every frequency by minimising a cost function. Then, this controller response is used to identify the controller transfer function using system identification methods. © The Institution of Engineering and Technology 2016.

  5. Data-Driven Model Reduction and Transfer Operator Approximation

    Science.gov (United States)

    Klus, Stefan; Nüske, Feliks; Koltai, Péter; Wu, Hao; Kevrekidis, Ioannis; Schütte, Christof; Noé, Frank

    2018-06-01

    In this review paper, we will present different data-driven dimension reduction techniques for dynamical systems that are based on transfer operator theory as well as methods to approximate transfer operators and their eigenvalues, eigenfunctions, and eigenmodes. The goal is to point out similarities and differences between methods developed independently by the dynamical systems, fluid dynamics, and molecular dynamics communities such as time-lagged independent component analysis, dynamic mode decomposition, and their respective generalizations. As a result, extensions and best practices developed for one particular method can be carried over to other related methods.

  6. Data-Driven Healthcare: Challenges and Opportunities for Interactive Visualization.

    Science.gov (United States)

    Gotz, David; Borland, David

    2016-01-01

    The healthcare industry's widespread digitization efforts are reshaping one of the largest sectors of the world's economy. This transformation is enabling systems that promise to use ever-improving data-driven evidence to help doctors make more precise diagnoses, institutions identify at-risk patients for intervention, clinicians develop more personalized treatment plans, and researchers better understand medical outcomes within complex patient populations. Given the scale and complexity of the data required to achieve these goals, advanced data visualization tools have the potential to play a critical role. This article reviews a number of visualization challenges unique to the healthcare discipline.

  7. Data-driven execution of fast multipole methods

    KAUST Repository

    Ltaief, Hatem

    2013-09-17

    Fast multipole methods (FMMs) have O(N) complexity, are compute bound, and require very little synchronization, which makes them a favorable algorithm on next-generation supercomputers. Their most common application is to accelerate N-body problems, but they can also be used to solve boundary integral equations. When the particle distribution is irregular and the tree structure is adaptive, load balancing becomes a non-trivial question. A common strategy for load balancing FMMs is to use the work load from the previous step as weights to statically repartition the next step. The authors discuss in the paper another approach based on data-driven execution to efficiently tackle this challenging load balancing problem. The core idea consists of breaking the most time-consuming stages of the FMMs into smaller tasks. The algorithm can then be represented as a directed acyclic graph where nodes represent tasks and edges represent dependencies among them. The execution of the algorithm is performed by asynchronously scheduling the tasks using the queueing and runtime for kernels runtime environment, in a way such that data dependencies are not violated for numerical correctness purposes. This asynchronous scheduling results in an out-of-order execution. The performance results of the data-driven FMM execution outperform the previous strategy and show linear speedup on a quad-socket quad-core Intel Xeon system. Copyright © 2013 John Wiley & Sons, Ltd.
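
    The directed-acyclic-graph view of the computation can be illustrated with a small dependency-driven scheduler. This sketch runs tasks one at a time in a valid order, whereas a real runtime would dispatch ready tasks to worker threads concurrently. The task names are the standard FMM kernel abbreviations, but the dependency graph is a deliberately coarse assumption (one task per stage), not the fine-grained graph the paper builds.

```python
from collections import deque


def schedule(deps):
    """Return one valid execution order for a task dependency graph.

    `deps` maps each task to the set of tasks that must finish before it.
    A task becomes ready as soon as its last dependency completes.
    """
    remaining = {task: set(d) for task, d in deps.items()}
    ready = deque(sorted(t for t, d in remaining.items() if not d))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)  # "execute" the task
        for other in sorted(remaining):
            waiting_on = remaining[other]
            if task in waiting_on:
                waiting_on.remove(task)
                if not waiting_on:
                    ready.append(other)  # all inputs available: release it
    if len(order) != len(remaining):
        raise ValueError("dependency cycle: some tasks can never run")
    return order


# Coarse, assumed stage-level dependencies between FMM kernels.
fmm_deps = {
    "P2M": set(),      # particles to multipole expansions
    "P2P": set(),      # direct near-field interactions, independent of the tree
    "M2M": {"P2M"},    # upward pass
    "M2L": {"M2M"},    # multipole to local translation
    "L2L": {"M2L"},    # downward pass
    "L2P": {"L2L"},    # local expansions back to particles
}
order = schedule(fmm_deps)
```

    Note that "P2P" is ready from the start, so it can fill idle time while the tree passes wait on each other; this is the kind of out-of-order execution that makes load balancing less sensitive to an irregular particle distribution.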

  8. A data driven nonlinear stochastic model for blood glucose dynamics.

    Science.gov (United States)

    Zhang, Yan; Holt, Tim A; Khovanova, Natalia

    2016-03-01

    The development of adequate mathematical models for blood glucose dynamics may improve early diagnosis and control of diabetes mellitus (DM). We have developed a stochastic nonlinear second order differential equation to describe the response of blood glucose concentration to food intake using continuous glucose monitoring (CGM) data. A variational Bayesian learning scheme was applied to define the number and values of the system's parameters by iterative optimisation of free energy. The model has the minimal order and number of parameters to successfully describe blood glucose dynamics in people with and without DM. The model accounts for the nonlinearity and stochasticity of the underlying glucose-insulin dynamic process. Being data-driven, it takes full advantage of available CGM data and, at the same time, reflects the intrinsic characteristics of the glucose-insulin system without detailed knowledge of the physiological mechanisms. We have shown that the dynamics of some postprandial blood glucose excursions can be described by a reduced (linear) model, previously seen in the literature. A comprehensive analysis demonstrates that deterministic system parameters belong to different ranges for diabetes and controls. Implications for clinical practice are discussed. This is the first study introducing a continuous data-driven nonlinear stochastic model capable of describing both DM and non-DM profiles. Copyright © 2015 The Authors. Published by Elsevier Ireland Ltd. All rights reserved.

  9. Data driven CAN node reliability assessment for manufacturing system

    Science.gov (United States)

    Zhang, Leiming; Yuan, Yong; Lei, Yong

    2017-01-01

    The reliability of the Controller Area Network (CAN) is critical to the performance and safety of the system. However, direct bus-off time assessment tools are lacking in practice due to inaccessibility of the node information and the complexity of the node interactions upon errors. In order to measure the mean time to bus-off (MTTB) of all the nodes, a novel data driven node bus-off time assessment method for CAN network is proposed by directly using network error information. First, the corresponding network error event sequence for each node is constructed using multiple-layer network error information. Then, the generalized zero inflated Poisson process (GZIP) model is established for each node based on the error event sequence. Finally, the stochastic model is constructed to predict the MTTB of the node. The accelerated case studies with different error injection rates are conducted on a laboratory network to demonstrate the proposed method, where the network errors are generated by a computer controlled error injection system. Experiment results show that the MTTB of nodes predicted by the proposed method agree well with observations in the case studies. The proposed data driven node time to bus-off assessment method for CAN networks can successfully predict the MTTB of nodes by directly using network error event data.
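
    The zero-inflation idea behind the model can be illustrated with the plain zero-inflated Poisson (ZIP) distribution. This is a simpler relative of the generalized ZIP process used in the paper, shown only to make the idea concrete; the parameter names `pi` (probability of a structural zero) and `lam` (Poisson rate) are the usual textbook ones, not the paper's notation.

```python
import math


def zip_pmf(k, pi, lam):
    """Probability of observing k error events in one interval.

    With probability `pi` the interval is a structural zero (no errors can
    occur); otherwise the count follows a Poisson distribution with rate `lam`.
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def zip_mean(pi, lam):
    """Expected number of error events per interval."""
    return (1.0 - pi) * lam
```

    Error counts on a healthy bus are mostly zero with occasional bursts, which an ordinary Poisson model fits poorly; the extra mass at zero is what the zero-inflated form adds.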

  10. Locative media and data-driven computing experiments

    Directory of Open Access Journals (Sweden)

    Sung-Yueh Perng

    2016-06-01

    Over the past two decades urban social life has undergone a rapid and pervasive geocoding, becoming mediated, augmented and anticipated by location-sensitive technologies and services that generate and utilise big, personal, locative data. The production of these data has prompted the development of exploratory data-driven computing experiments that seek to find ways to extract value and insight from them. These projects often start from the data, rather than from a question or theory, and try to imagine and identify their potential utility. In this paper, we explore the desires and mechanics of data-driven computing experiments. We demonstrate how both locative media data and computing experiments are ‘staged’ to create new values and computing techniques, which in turn are used to try and derive possible futures that are ridden with unintended consequences. We argue that using computing experiments to imagine potential urban futures produces effects that often have little to do with creating new urban practices. Instead, these experiments promote Big Data science and the prospect that data produced for one purpose can be recast for another and act as alternative mechanisms of envisioning urban futures.

  11. Dynamically adaptive data-driven simulation of extreme hydrological flows

    Science.gov (United States)

    Kumar Jain, Pushkar; Mandli, Kyle; Hoteit, Ibrahim; Knio, Omar; Dawson, Clint

    2018-02-01

    Hydrological hazards such as storm surges, tsunamis, and rainfall-induced flooding are physically complex events that are costly in loss of human life and economic productivity. Many such disasters could be mitigated through improved emergency evacuation in real-time and through the development of resilient infrastructure based on knowledge of how systems respond to extreme events. Data-driven computational modeling is a critical technology underpinning these efforts. This investigation focuses on the novel combination of methodologies in forward simulation and data assimilation. The forward geophysical model utilizes adaptive mesh refinement (AMR), a process by which a computational mesh can adapt in time and space based on the current state of a simulation. The forward solution is combined with ensemble based data assimilation methods, whereby observations from an event are assimilated into the forward simulation to improve the veracity of the solution, or used to invert for uncertain physical parameters. The novelty in our approach is the tight two-way coupling of AMR and ensemble filtering techniques. The technology is tested using actual data from the Chile tsunami event of February 27, 2010. These advances offer the promise of significantly transforming data-driven, real-time modeling of hydrological hazards, with potentially broader applications in other science domains.

  12. Product design pattern based on big data-driven scenario

    Directory of Open Access Journals (Sweden)

    Conggang Yu

    2016-07-01

    This article discusses new product design patterns in the big data era, offers designers a new rational way of thinking, and presents a new way to understand product design. Based on the key criteria of the product design process, category, element, and product are used to input the data, which comprise concrete data and abstract data, as an extension of the criteria of the product design process in order to establish a model of a big data-driven product design pattern. Moreover, an experiment and a product design case are conducted to verify the feasibility of the new pattern. Ultimately, we conclude that data-driven product design has two patterns: one in which concrete data support the product design, namely the “product–data–product” pattern, and a second based on the value of abstract data for product design, namely the “data–product–data” pattern. Through the data, users involve themselves in the design development process. Data and product form a huge network, in which data play the role of a connection or node. So the essence of the design is to find a new connection based on element, and to find a new node based on category.

  13. Dynamically adaptive data-driven simulation of extreme hydrological flows

    KAUST Repository

    Kumar Jain, Pushkar

    2017-12-27

    Hydrological hazards such as storm surges, tsunamis, and rainfall-induced flooding are physically complex events that are costly in loss of human life and economic productivity. Many such disasters could be mitigated through improved emergency evacuation in real-time and through the development of resilient infrastructure based on knowledge of how systems respond to extreme events. Data-driven computational modeling is a critical technology underpinning these efforts. This investigation focuses on the novel combination of methodologies in forward simulation and data assimilation. The forward geophysical model utilizes adaptive mesh refinement (AMR), a process by which a computational mesh can adapt in time and space based on the current state of a simulation. The forward solution is combined with ensemble based data assimilation methods, whereby observations from an event are assimilated into the forward simulation to improve the veracity of the solution, or used to invert for uncertain physical parameters. The novelty in our approach is the tight two-way coupling of AMR and ensemble filtering techniques. The technology is tested using actual data from the Chile tsunami event of February 27, 2010. These advances offer the promise of significantly transforming data-driven, real-time modeling of hydrological hazards, with potentially broader applications in other science domains.

  14. Data driven model generation based on computational intelligence

    Science.gov (United States)

    Gemmar, Peter; Gronz, Oliver; Faust, Christophe; Casper, Markus

    2010-05-01

    The simulation of discharges at a local gauge or the modeling of large-scale river catchments is involved in estimation and decision tasks of hydrological research and practical applications such as flood prediction or water resource management. However, modeling such processes using analytical or conceptual approaches is made difficult by both the complexity of process relations and the heterogeneity of processes. It has been shown many times that unknown or assumed process relations can in principle be described by computational methods, and that system models can automatically be derived from observed behavior or measured process data. This study describes the development of hydrological process models using computational methods including fuzzy logic and artificial neural networks (ANN) in a comprehensive and automated manner. Methods: We consider a closed concept for data-driven development of hydrological models based on measured (experimental) data. The concept is centered on a fuzzy system using rules of Takagi-Sugeno-Kang type, which formulate the input-output relation in a generic structure like $R_i$: IF $q(t)$ = low AND ... THEN $q(t+\Delta t) = a_{i0} + a_{i1} q(t) + a_{i2} p(t-\Delta t_{i1}) + a_{i3} p(t+\Delta t_{i2}) + \ldots$. The rule's premise part (IF) describes process states involving available process information, e.g. actual outlet $q(t)$ is low, where low is one of several fuzzy sets defined over the variable $q(t)$. The rule's conclusion (THEN) estimates the expected outlet $q(t+\Delta t)$ by a linear function over selected system variables, e.g. actual outlet $q(t)$ and previous and/or forecasted precipitation $p(t \pm \Delta t_{ik})$. In the case of river catchment modeling we use head gauges, tributary and upriver gauges in the conclusion part as well. In addition, we consider temperature and temporal (season) information in the premise part. By creating a set of rules $R = \{R_i \mid i = 1, \ldots, N\}$ the space of process states can be covered as concisely as necessary. Model adaptation is achieved by finding an optimal set $A = (a_{ij})$ of conclusion
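
    To make the rule structure concrete, here is a minimal sketch of Takagi-Sugeno-Kang inference with two rules. The triangular membership functions, the fuzzy set boundaries, the coefficient values and the function names are all illustrative assumptions; the abstract does not give the authors' actual sets or fitted coefficients.

```python
# Minimal Takagi-Sugeno-Kang (TSK) inference sketch.
# Membership shapes, coefficients and inputs below are hypothetical.

def triangular(x, left, peak, right):
    """Triangular membership: 0 outside (left, right), 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Hypothetical fuzzy sets defined over the current discharge q(t).
FUZZY_SETS = {
    "low": lambda q: triangular(q, -1.0, 0.0, 10.0),
    "high": lambda q: triangular(q, 0.0, 10.0, 21.0),
}

# Each rule: (fuzzy set used in the IF part, coefficients a_i0..a_i3 of the THEN part).
RULES = [
    ("low", (0.1, 0.9, 0.05, 0.02)),
    ("high", (0.5, 0.8, 0.30, 0.10)),
]

def tsk_predict(q_now, p_past, p_forecast):
    """Estimate q(t + dt) as the firing-strength-weighted mean of the rule outputs."""
    weighted_sum = 0.0
    total_weight = 0.0
    for set_name, (a0, a1, a2, a3) in RULES:
        weight = FUZZY_SETS[set_name](q_now)  # degree to which the premise holds
        output = a0 + a1 * q_now + a2 * p_past + a3 * p_forecast  # linear conclusion
        weighted_sum += weight * output
        total_weight += weight
    if total_weight == 0.0:
        raise ValueError("no rule fires for this input")
    return weighted_sum / total_weight
```

    For a discharge halfway between the two set peaks both rules fire equally, so the prediction is the plain average of the two linear conclusions.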

  15. Longitudinal association of neighborhood variables with body mass index in Dutch school-age children: The KOALA birth cohort study

    NARCIS (Netherlands)

    Schmidt, S.C.; Sleddens, E.F.C.; Vries, S.I. de; Gubbels, J.; Thijs, C.

    2015-01-01

    Changes in the neighborhood environment may explain part of the rapid increase in childhood overweight and obesity during the last decades. To date few theory-driven rather than data-driven studies have explored longitudinal associations between multiple neighborhood characteristics and child body

  16. Neighborhood Quality and Labor Market Outcomes: Evidence from Quasi-Random Neighborhood Assignment of Immigrants

    DEFF Research Database (Denmark)

    Damm, Anna Piil

    2012-01-01

    of men living in the neighborhood, but positively affected by the employment rate of non-Western immigrant men and co-national men living in the neighborhood. This is strong evidence that immigrants find jobs in part through their employed immigrant and co-ethnic contacts in the neighborhood of residence...... successfully addresses the methodological problem of endogenous neighborhood selection. Taking account of location sorting, living in a socially deprived neighborhood does not affect labor market outcomes of refugee men. Furthermore, their labor market outcomes are not affected by the overall employment rate...

  17. Data-driven simulation methodology using DES 4-layer architecture

    Directory of Open Access Journals (Sweden)

    Aida Saez

    2016-05-01

    In this study, we present a methodology to build data-driven simulation models of manufacturing plants. We go further than other research proposals and suggest organizing simulation model development under a 4-layer architecture (network, logic, database and visual reality). The Network layer includes system infrastructure. The Logic layer covers the operations planning and control system and the material handling equipment system. The Database layer holds all the information needed to perform the simulation, the results used for analysis and the values that the Logic layer uses to manage the plant. Finally, the Visual Reality layer displays an augmented reality system including not only the machinery and the movement but also blackboards and other Andon elements. This architecture provides numerous advantages, as it helps to build a simulation model that consistently considers the internal logistics in a very flexible way.

  18. Data driven approaches for diagnostics and optimization of NPP operation

    International Nuclear Information System (INIS)

    Pliska, J.; Machat, Z.

    2014-01-01

    Efficiency and heat rate are important indicators of both the health of the power plant equipment and the quality of power plant operation. A powerful tool for meeting these challenges is the statistical processing of the large data sets stored in data historians. These large data sets contain useful information about process quality and about equipment and sensor health. The paper discusses data-driven approaches for model building of main power plant equipment, such as the condenser and cooling tower, as well as the overall thermal cycle, using multivariate regression techniques based on the so-called regression triplet: data, model and method. Regression models form a base for diagnostics and optimization tasks. Diagnostics and optimization tasks are demonstrated on practical cases: diagnostics of main power plant equipment for early identification of equipment faults, and optimization of the cooling circuit by cooling water flow control to achieve the highest power output for given boundary conditions. (authors)

  19. Data-driven RBE parameterization for helium ion beams

    CERN Document Server

    Mairani, A; Dokic, I; Valle, S M; Tessonnier, T; Galm, R; Ciocca, M; Parodi, K; Ferrari, A; Jäkel, O; Haberer, T; Pedroni, P; Böhlen, T T

    2016-01-01

    Helium ion beams are expected to be available again in the near future for clinical use. A suitable formalism to obtain relative biological effectiveness (RBE) values for treatment planning (TP) studies is needed. In this work we developed a data-driven RBE parameterization based on published in vitro experimental values. The RBE parameterization has been developed within the framework of the linear-quadratic (LQ) model as a function of the helium linear energy transfer (LET), dose and the tissue specific parameter $(\alpha/\beta)_{\text{ph}}$ of the LQ model for the reference radiation. Analytic expressions are provided, derived from the collected database, describing the $\text{RBE}_{\alpha} = \alpha_{\text{He}}/\alpha_{\text{ph}}$ and $\text{R}_{\beta} = \beta_{\text{He}}/\beta_{\text{ph}}$ ratios as a function of LET. Calculated RBE values at 2 Gy photon dose and at 10% survival ($\text{RBE}_{10}$) are compared with the experimental ones. Pearson's correlati...
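
    As a sketch of how the two ratios translate into an RBE value, the snippet below equates the LQ effect of a helium dose with that of a photon dose and solves the resulting quadratic for the helium dose. The function name and the numerical inputs are hypothetical; the LET-dependent expressions for the ratios come from the paper and are not reproduced here.

```python
import math

def rbe_from_lq(rbe_alpha, r_beta, ab_ph, d_ph):
    """RBE = D_ph / D_He at equal LQ effect.

    rbe_alpha : alpha_He / alpha_ph (assumed given, e.g. from an LET fit)
    r_beta    : beta_He / beta_ph, must be > 0
    ab_ph     : (alpha/beta) of the reference photon radiation, in Gy
    d_ph      : photon dose in Gy

    Iso-effect condition divided by beta_ph:
        rbe_alpha * ab_ph * D_He + r_beta * D_He^2 = ab_ph * D_ph + D_ph^2
    """
    effect_over_beta = ab_ph * d_ph + d_ph ** 2
    b = rbe_alpha * ab_ph
    # Positive root of r_beta * D^2 + b * D - effect_over_beta = 0
    d_he = (-b + math.sqrt(b * b + 4.0 * r_beta * effect_over_beta)) / (2.0 * r_beta)
    return d_ph / d_he

# Hypothetical inputs: ratios of 2.0 and 1.0, (alpha/beta)_ph = 2 Gy, 2 Gy photon dose.
example_rbe = rbe_from_lq(2.0, 1.0, 2.0, 2.0)
```

    When both ratios equal one the helium and photon doses coincide and the function returns an RBE of exactly 1, which is a quick sanity check on the algebra.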

  20. Data-driven forward model inference for EEG brain imaging

    DEFF Research Database (Denmark)

    Hansen, Sofie Therese; Hauberg, Søren; Hansen, Lars Kai

    2016-01-01

    Electroencephalography (EEG) is a flexible and accessible tool with excellent temporal resolution but with a spatial resolution hampered by volume conduction. Reconstruction of the cortical sources of measured EEG activity partly alleviates this problem and effectively turns EEG into a brain......-of-concept study, we show that, even when anatomical knowledge is unavailable, a suitable forward model can be estimated directly from the EEG. We propose a data-driven approach that provides a low-dimensional parametrization of head geometry and compartment conductivities, built using a corpus of forward models....... Combined with only a recorded EEG signal, we are able to estimate both the brain sources and a person-specific forward model by optimizing this parametrization. We thus not only solve an inverse problem, but also optimize over its specification. Our work demonstrates that personalized EEG brain imaging...

  1. Data-Driven Predictive Direct Load Control of Refrigeration Systems

    DEFF Research Database (Denmark)

    Shafiei, Seyed Ehsan; Knudsen, Torben; Wisniewski, Rafal

    2015-01-01

    A predictive control using subspace identification is applied for the smart grid integration of refrigeration systems under a direct load control scheme. A realistic demand response scenario based on regulation of the electrical power consumption is considered. A receding horizon optimal control...... is proposed to fulfil two important objectives: to secure high coefficient of performance and to participate in power consumption management. Moreover, a new method for design of input signals for system identification is put forward. The control method is fully data driven without an explicit use of model...... against real data. The performance improvement results in a 22% reduction in the energy consumption. A comparative simulation is accomplished showing the superiority of the method over the existing approaches in terms of the load following performance....

  2. Data-Driven Assistance Functions for Industrial Automation Systems

    International Nuclear Information System (INIS)

    Windmann, Stefan; Niggemann, Oliver

    2015-01-01

    The increasing amount of data in industrial automation systems overburdens the user in process control and diagnosis tasks. One possibility to cope with these challenges consists of using smart assistance systems that automatically monitor and optimize processes. This article deals with aspects of data-driven assistance systems such as assistance functions, process models and data acquisition. The paper describes novel approaches for self-diagnosis and self-optimization, and shows how these assistance functions can be integrated in different industrial environments. The considered assistance functions are based on process models that are automatically learned from process data. Fault detection and isolation is based on the comparison of observations of the real system with predictions obtained by application of the process models. The process models are further employed for energy efficiency optimization of industrial processes. Experimental results are presented for fault detection and energy efficiency optimization of a drive system. (paper)
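
    The comparison of observations with model predictions can be sketched as a simple residual test. The function names, the threshold rule (mean plus k standard deviations of fault-free residual magnitudes) and the sample values are assumptions for illustration, not the authors' method.

```python
import statistics

def threshold_from_residuals(residuals, k=3.0):
    """Derive an alarm threshold from residuals recorded during fault-free operation."""
    magnitudes = [abs(r) for r in residuals]
    return statistics.mean(magnitudes) + k * statistics.pstdev(magnitudes)

def detect_faults(observations, predictions, threshold):
    """Return the sample indices where the observation deviates from the prediction by more than the threshold."""
    return [
        index
        for index, (observed, predicted) in enumerate(zip(observations, predictions))
        if abs(observed - predicted) > threshold
    ]

# Hypothetical drive-system readings compared with a learned model's predictions.
fault_indices = detect_faults([1.0, 1.1, 3.0, 0.9], [1.0, 1.0, 1.0, 1.0], threshold=0.5)
```

    A learned process model would supply the predictions; the detector itself only needs the residuals.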

  3. Data-driven discovery of Koopman eigenfunctions using deep learning

    Science.gov (United States)

    Lusch, Bethany; Brunton, Steven L.; Kutz, J. Nathan

    2017-11-01

    Koopman operator theory transforms any autonomous non-linear dynamical system into an infinite-dimensional linear system. Since linear systems are well-understood, a mapping of non-linear dynamics to linear dynamics provides a powerful approach to understanding and controlling fluid flows. However, finding the correct change of variables remains an open challenge. We present a strategy to discover an approximate mapping using deep learning. Our neural networks find this change of variables, its inverse, and a finite-dimensional linear dynamical system defined on the new variables. Our method is completely data-driven and only requires measurements of the system, i.e. it does not require derivatives or knowledge of the governing equations. We find a minimal set of approximate Koopman eigenfunctions that are sufficient to reconstruct and advance the system to future states. We demonstrate the method on several dynamical systems.
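
    The idea of a linearizing change of variables can be illustrated on a small textbook system that is not taken from this abstract: dx1/dt = mu*x1, dx2/dt = lam*(x2 - x1^2). Adding the observable y3 = x1^2 gives a closed linear system in (x1, x2, x1^2). The sketch below, with assumed parameter values, integrates the nonlinear system numerically and compares it with the closed-form solution of the lifted linear system; it does not reproduce the neural network approach described above.

```python
import math

MU, LAM = -0.05, -1.0  # assumed parameters; the closed form requires LAM != 2 * MU

def nonlinear_rhs(x1, x2):
    """Right-hand side of the original nonlinear system."""
    return MU * x1, LAM * (x2 - x1 * x1)

def rk4_step(x1, x2, dt):
    """One classical fourth-order Runge-Kutta step of the nonlinear system."""
    k1 = nonlinear_rhs(x1, x2)
    k2 = nonlinear_rhs(x1 + 0.5 * dt * k1[0], x2 + 0.5 * dt * k1[1])
    k3 = nonlinear_rhs(x1 + 0.5 * dt * k2[0], x2 + 0.5 * dt * k2[1])
    k4 = nonlinear_rhs(x1 + dt * k3[0], x2 + dt * k3[1])
    x1_next = x1 + dt / 6.0 * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0])
    x2_next = x2 + dt / 6.0 * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1])
    return x1_next, x2_next

def simulate(x1, x2, t_end, steps):
    """Integrate the nonlinear system from t = 0 to t_end."""
    dt = t_end / steps
    for _ in range(steps):
        x1, x2 = rk4_step(x1, x2, dt)
    return x1, x2

def lifted_solution(x1_0, x2_0, t):
    """Closed-form solution of the linear system in y = (x1, x2, x1^2):
    y1' = MU*y1, y2' = LAM*y2 - LAM*y3, y3' = 2*MU*y3."""
    y3_0 = x1_0 ** 2
    b = LAM * y3_0 / (LAM - 2.0 * MU)  # particular-solution coefficient
    x1 = x1_0 * math.exp(MU * t)
    x2 = (x2_0 - b) * math.exp(LAM * t) + b * math.exp(2.0 * MU * t)
    return x1, x2
```

    Here the lifting is known analytically; the approach in the abstract instead learns such coordinates from measurements alone.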

  4. Data-driven identification of potential Zika virus vectors

    Science.gov (United States)

    Evans, Michelle V; Dallas, Tad A; Han, Barbara A; Murdock, Courtney C; Drake, John M

    2017-01-01

    Zika is an emerging virus whose rapid spread is of great public health concern. Knowledge about transmission remains incomplete, especially concerning potential transmission in geographic areas in which it has not yet been introduced. To identify unknown vectors of Zika, we developed a data-driven model linking vector species and the Zika virus via vector-virus trait combinations that confer a propensity toward associations in an ecological network connecting flaviviruses and their mosquito vectors. Our model predicts that thirty-five species may be able to transmit the virus, seven of which are found in the continental United States, including Culex quinquefasciatus and Cx. pipiens. We suggest that empirical studies prioritize these species to confirm predictions of vector competence, enabling the correct identification of populations at risk for transmission within the United States. DOI: http://dx.doi.org/10.7554/eLife.22053.001 PMID:28244371

  5. Data-driven sensor placement from coherent fluid structures

    Science.gov (United States)

    Manohar, Krithika; Kaiser, Eurika; Brunton, Bingni W.; Kutz, J. Nathan; Brunton, Steven L.

    2017-11-01

    Optimal sensor placement is a central challenge in the prediction, estimation and control of fluid flows. We reinterpret sensor placement as optimizing discrete samples of coherent fluid structures for full state reconstruction. This permits a drastic reduction in the number of sensors required for faithful reconstruction, since complex fluid interactions can often be described by a small number of coherent structures. Our work optimizes point sensors using the pivoted matrix QR factorization to sample coherent structures directly computed from flow data. We apply this sampling technique in conjunction with various data-driven modal identification methods, including the proper orthogonal decomposition (POD) and dynamic mode decomposition (DMD). In contrast to POD-based sensors, DMD demonstrably enables the optimization of sensors for prediction in systems exhibiting multiple scales of dynamics. Finally, reconstruction accuracy from pivot sensors is shown to be competitive with sensors obtained using traditional computationally prohibitive optimization methods.

  6. Data-driven system to predict academic grades and dropout

    Science.gov (United States)

    Rovira, Sergi; Puertas, Eloi

    2017-01-01

    Nowadays, the role of a tutor is more important than ever to prevent student dropout and improve academic performance. This work proposes a data-driven system to extract relevant information hidden in student academic data and, thus, help tutors to offer their pupils more proactive personal guidance. In particular, our system, based on machine learning techniques, predicts students' dropout intention and course grades, and provides personalized course recommendations. Moreover, we present different visualizations which help in the interpretation of the results. In the experimental validation, we show that the system obtains promising results with data from the degree studies in Law, Computer Science and Mathematics of the Universitat de Barcelona. PMID:28196078

  7. Using Shape Memory Alloys: A Dynamic Data Driven Approach

    KAUST Repository

    Douglas, Craig C.

    2013-06-01

    Shape Memory Alloys (SMAs) are capable of changing their crystallographic structure due to changes of either stress or temperature. SMAs are used in a number of aerospace devices and are required in some devices in exotic environments. We are developing dynamic data driven application system (DDDAS) tools to monitor and change SMAs in real time for delivering payloads by aerospace vehicles. We must be able to turn on and off the sensors and heating units, change the stress on the SMA, monitor on-line data streams, change scales based on incoming data, and control what type of data is generated. The application must have the capability to be run and steered remotely as an unmanned feedback control loop.

  8. Facilitating Data Driven Business Model Innovation - A Case study

    DEFF Research Database (Denmark)

    Bjerrum, Torben Cæsar Bisgaard; Andersen, Troels Christian; Aagaard, Annabeth

    2016-01-01

    . The businesses interdisciplinary capabilities come into play in the BMI process, where knowledge from the facilitation strategy and knowledge from phases of the BMI process needs to be present to create new knowledge, hence new BMs and innovations. Depending on the environment and shareholders, this also exposes......This paper aims to understand the barriers that businesses meet in understanding their current business models (BM) and in their attempt at innovating new data driven business models (DDBM) using data. The interdisciplinary challenge of knowledge exchange occurring outside and/or inside businesses......, that gathers knowledge is of great importance. The SMEs have little, if no experience, within data handling, data analytics, and working with structured Business Model Innovation (BMI), that relates to both new and conventional products, processes and services. This new frontier of data and BMI will have...

  9. Econophysics and Data Driven Modelling of Market Dynamics

    CERN Document Server

    Aoyama, Hideaki; Chakrabarti, Bikas; Chakraborti, Anirban; Ghosh, Asim

    2015-01-01

    This book presents the works and research findings of physicists, economists, mathematicians, statisticians, and financial engineers who have undertaken data-driven modelling of market dynamics and other empirical studies in the field of Econophysics. During recent decades, the financial market landscape has changed dramatically with the deregulation of markets and the growing complexity of products. The ever-increasing speed and decreasing costs of computational power and networks have led to the emergence of huge databases. The availability of these data should permit the development of models that are better founded empirically, and econophysicists have accordingly been advocating that one should rely primarily on the empirical observations in order to construct models and validate them. The recent turmoil in financial markets and the 2008 crash appear to offer a strong rationale for new models and approaches. The Econophysics community accordingly has an important future role to play in market modelling....

  10. A Transition Towards a Data-Driven Business Model (DDBM)

    DEFF Research Database (Denmark)

    Zaki, Mohamed; Bøe-Lillegraven, Tor; Neely, Andy

    2016-01-01

    Nettavisen is a Norwegian online start-up that experienced a boost after the financial crisis of 2009. Since then, the firm has been able to increase its market share and profitability through the use of highly disruptive business models, allowing the relatively small staff to outcompete powerhouse...... legacy-publishing companies and new media players such as Facebook and Google. These disruptive business models have been successful, as Nettavisen captured a large market share in Norway early on, and was consistently one of the top-three online news sites in Norway. Capitalising on media data explosion...... and the recent acquisition of blogger network ‘Blog.no’, Nettavisen is moving towards a data-driven business model (DDBM). In particular, the firm aims to analyse huge volumes of user Web browsing and purchasing habits....

  11. Active living neighborhoods: is neighborhood walkability a key element for Belgian adolescents?

    Science.gov (United States)

    De Meester, Femke; Van Dyck, Delfien; De Bourdeaudhuij, Ilse; Deforche, Benedicte; Sallis, James F; Cardon, Greet

    2012-01-04

    In adult research, neighborhood walkability has been acknowledged as an important construct among the built environmental correlates of physical activity. Research into this association has only recently been extended to adolescents and the current empirical evidence is not consistent. This study investigated whether neighborhood walkability and neighborhood socioeconomic status (SES) are associated with physical activity among Belgian adolescents and whether the association between neighborhood walkability and physical activity is moderated by neighborhood SES and gender. In Ghent (Belgium), 32 neighborhoods were selected based on GIS-based walkability and SES derived from census data. In total, 637 adolescents (aged 13-15 years, 49.6% male) participated in the study. Physical activity was assessed using accelerometers and the Flemish Physical Activity Questionnaire. To analyze the associations between neighborhood walkability, neighborhood SES and individual physical activity, multivariate multi-level regression analyses were conducted. Only in low-SES neighborhoods was neighborhood walkability positively associated with accelerometer-based moderate to vigorous physical activity and the average activity level expressed in counts/minute. For active transport to and from school, cycling for transport during leisure time and sport during leisure time, no association with either neighborhood walkability or neighborhood SES was found. For walking for transport during leisure time, a negative association with neighborhood SES was found. Gender did not moderate the associations of neighborhood walkability and SES with adolescent physical activity. Neighborhood walkability was related to accelerometer-based physical activity only among adolescent boys and girls living in low-SES neighborhoods. The relation of built environment to adolescent physical activity may depend on the context.

  12. Active living neighborhoods: is neighborhood walkability a key element for Belgian adolescents?

    Directory of Open Access Journals (Sweden)

    De Meester Femke

    2012-01-01

    Background: In adult research, neighborhood walkability has been acknowledged as an important construct among the built environmental correlates of physical activity. Research into this association has only recently been extended to adolescents and the current empirical evidence is not consistent. This study investigated whether neighborhood walkability and neighborhood socioeconomic status (SES) are associated with physical activity among Belgian adolescents and whether the association between neighborhood walkability and physical activity is moderated by neighborhood SES and gender. Methods: In Ghent (Belgium), 32 neighborhoods were selected based on GIS-based walkability and SES derived from census data. In total, 637 adolescents (aged 13-15 years, 49.6% male) participated in the study. Physical activity was assessed using accelerometers and the Flemish Physical Activity Questionnaire. To analyze the associations between neighborhood walkability, neighborhood SES and individual physical activity, multivariate multi-level regression analyses were conducted. Results: Only in low-SES neighborhoods was neighborhood walkability positively associated with accelerometer-based moderate to vigorous physical activity and the average activity level expressed in counts/minute. For active transport to and from school, cycling for transport during leisure time and sport during leisure time, no association with either neighborhood walkability or neighborhood SES was found. For walking for transport during leisure time, a negative association with neighborhood SES was found. Gender did not moderate the associations of neighborhood walkability and SES with adolescent physical activity. Conclusions: Neighborhood walkability was related to accelerometer-based physical activity only among adolescent boys and girls living in low-SES neighborhoods. The relation of built environment to adolescent physical activity may depend on the context.

  13. Ensemble of data-driven prognostic algorithms for robust prediction of remaining useful life

    International Nuclear Information System (INIS)

    Hu Chao; Youn, Byeng D.; Wang Pingfeng; Taek Yoon, Joung

    2012-01-01

    Prognostics aims at determining whether a failure of an engineered system (e.g., a nuclear power plant) is impending and estimating the remaining useful life (RUL) before the failure occurs. The traditional data-driven prognostic approach is to construct multiple candidate algorithms using a training data set, evaluate their respective performance using a testing data set, and select the one with the best performance while discarding all the others. This approach has three shortcomings: (i) the selected standalone algorithm may not be robust; (ii) it wastes the resources for constructing the algorithms that are discarded; (iii) it requires the testing data in addition to the training data. To overcome these drawbacks, this paper proposes an ensemble data-driven prognostic approach which combines multiple member algorithms with a weighted-sum formulation. Three weighting schemes, namely the accuracy-based weighting, diversity-based weighting and optimization-based weighting, are proposed to determine the weights of member algorithms. The k-fold cross validation (CV) is employed to estimate the prediction error required by the weighting schemes. The results obtained from three case studies suggest that the ensemble approach with any weighting scheme gives more accurate RUL predictions compared to any sole algorithm when member algorithms producing diverse RUL predictions have comparable prediction accuracy and that the optimization-based weighting scheme gives the best overall performance among the three weighting schemes.
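
    The weighted-sum formulation can be sketched as follows. Weighting each member by its normalized inverse cross-validation error is one common form of accuracy-based weighting and is assumed here; the function names and the error and prediction values are hypothetical rather than taken from the paper.

```python
def accuracy_based_weights(cv_errors):
    """Weights proportional to the inverse of each member's k-fold CV error, normalized to sum to 1."""
    inverse_errors = [1.0 / error for error in cv_errors]
    total = sum(inverse_errors)
    return [value / total for value in inverse_errors]

def ensemble_rul(member_predictions, weights):
    """Weighted-sum remaining-useful-life prediction of the ensemble."""
    return sum(w * p for w, p in zip(weights, member_predictions))

# Hypothetical CV errors and RUL predictions (in cycles) for three member algorithms.
weights = accuracy_based_weights([1.0, 2.0, 4.0])
rul = ensemble_rul([100.0, 110.0, 120.0], weights)
```

    Because the weights come from cross-validation on the training data, no separate testing data set is needed, which addresses the third shortcoming listed above.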

  14. Large Neighborhood Search

    DEFF Research Database (Denmark)

    Pisinger, David; Røpke, Stefan

    2010-01-01

    Heuristics based on large neighborhood search have recently shown outstanding results in solving various transportation and scheduling problems. Large neighborhood search methods explore a complex neighborhood by use of heuristics. Using large neighborhoods makes it possible to find better...... candidate solutions in each iteration and hence traverse a more promising search path. Starting from the large neighborhood search method, we give an overview of very large scale neighborhood search methods and discuss recent variants and extensions like variable depth search and adaptive large neighborhood...
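
    A minimal sketch of the destroy-and-repair loop behind large neighborhood search, applied to a toy travelling-salesman instance. The problem, the random-removal destroy operator, the cheapest-insertion repair operator, the improve-only acceptance rule and all parameter values are illustrative assumptions.

```python
import math
import random

def tour_length(tour, points):
    """Length of the closed tour visiting points in the given order."""
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

def destroy(tour, k, rng):
    """Remove k randomly chosen cities from the tour."""
    removed = rng.sample(tour, k)
    partial = [city for city in tour if city not in removed]
    return partial, removed

def repair(partial, removed, points):
    """Reinsert each removed city at the position that yields the shortest tour."""
    tour = list(partial)
    for city in removed:
        best_position, best_length = 0, math.inf
        for position in range(len(tour) + 1):
            candidate = tour[:position] + [city] + tour[position:]
            length = tour_length(candidate, points)
            if length < best_length:
                best_position, best_length = position, length
        tour.insert(best_position, city)
    return tour

def large_neighborhood_search(points, iterations=200, k=3, seed=0):
    """Repeatedly destroy and repair the incumbent, keeping only improvements."""
    rng = random.Random(seed)
    best = list(range(len(points)))
    rng.shuffle(best)  # random starting tour
    best_length = tour_length(best, points)
    for _ in range(iterations):
        partial, removed = destroy(best, k, rng)
        candidate = repair(partial, removed, points)
        candidate_length = tour_length(candidate, points)
        if candidate_length < best_length:
            best, best_length = candidate, candidate_length
    return best, best_length
```

    An adaptive variant would additionally choose among several destroy and repair operators according to their past success.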

  15. Data-driven non-Markovian closure models

    Science.gov (United States)

    Kondrashov, Dmitri; Chekroun, Mickaël D.; Ghil, Michael

    2015-03-01

    This paper has two interrelated foci: (i) obtaining stable and efficient data-driven closure models by using a multivariate time series of partial observations from a large-dimensional system; and (ii) comparing these closure models with the optimal closures predicted by the Mori-Zwanzig (MZ) formalism of statistical physics. Multilayer stochastic models (MSMs) are introduced as both a generalization and a time-continuous limit of existing multilevel, regression-based approaches to closure in a data-driven setting; these approaches include empirical model reduction (EMR), as well as more recent multi-layer modeling. It is shown that the multilayer structure of MSMs can provide a natural Markov approximation to the generalized Langevin equation (GLE) of the MZ formalism. A simple correlation-based stopping criterion for an EMR-MSM model is derived to assess how well it approximates the GLE solution. Sufficient conditions are derived on the structure of the nonlinear cross-interactions between the constitutive layers of a given MSM to guarantee the existence of a global random attractor. This existence ensures that no blow-up can occur for a broad class of MSM applications, a class that includes non-polynomial predictors and nonlinearities that do not necessarily preserve quadratic energy invariants. The EMR-MSM methodology is first applied to a conceptual, nonlinear, stochastic climate model of coupled slow and fast variables, in which only slow variables are observed. It is shown that the resulting closure model with energy-conserving nonlinearities efficiently captures the main statistical features of the slow variables, even when there is no formal scale separation and the fast variables are quite energetic. Second, an MSM is shown to successfully reproduce the statistics of a partially observed, generalized Lotka-Volterra model of population dynamics in its chaotic regime. The challenges here include the rarity of strange attractors in the model's parameter

  16. Perceived neighborhood partner availability, partner selection, and risk for sexually transmitted infections within a cohort of adolescent females.

    Science.gov (United States)

    Matson, Pamela A; Chung, Shang-En; Ellen, Jonathan M

    2014-07-01

    This research examined the association between a novel measure of perceived partner availability and discordance between ideal and actual partner characteristics as well as trajectories of ideal partner preferences and perceptions of partner availability over time. A clinic-recruited cohort of adolescent females (N = 92), aged 16-19 years, were interviewed quarterly for 12 months using audio computer-assisted self-interview. Participants ranked the importance of characteristics for their ideal main sex partner and then reported on these characteristics for their current main partner. Participants reported on perceptions of availability of ideal sex partners in their neighborhood. Paired t-tests examined discordance between ideal and actual partner characteristics. Random-intercept regression models examined repeated measures. Actual partner ratings were lower than ideal partner preferences for fidelity, equaled ideal preferences for emotional support and exceeded ideal preferences for social/economic status and physical attractiveness. Discordance on emotional support and social/economic status was associated with sex partner concurrency. Participants perceived low availability of ideal sex partners. Those who perceived more availability were less likely to be ideal/actual discordant on fidelity [OR = .88, 95% CI: .78, 1.0]. Neither ideal partner preferences nor perceptions of partner availability changed over 12 months. Current main sex partners met or exceeded ideal partner preferences in all domains except fidelity. If emotional needs are met, adolescents may tolerate partner concurrency in areas of limited partner pools. Urban adolescent females who perceive low availability may be at increased risk for sexually transmitted infection (STI) because they may be more likely to have nonmonogamous partners. Copyright © 2014 Society for Adolescent Health and Medicine. Published by Elsevier Inc. All rights reserved.

  17. A Data-Driven Approach to Realistic Shape Morphing

    KAUST Repository

    Gao, Lin; Lai, Yu-Kun; Huang, Qi-Xing; Hu, Shi-Min

    2013-01-01

    Morphing between 3D objects is a fundamental technique in computer graphics. Traditional methods of shape morphing focus on establishing meaningful correspondences and finding smooth interpolation between shapes. Such methods, however, only take geometric information as input and thus cannot in general avoid producing unnatural interpolation, in particular for large-scale deformations. This paper proposes a novel data-driven approach for shape morphing. Given a database with various models belonging to the same category, we treat them as data samples in the plausible deformation space. These models are then clustered to form local shape spaces of plausible deformations. We use a simple metric to reasonably represent the closeness between pairs of models. Given source and target models, the morphing problem is cast as a global optimization problem of finding a minimal distance path within the local shape spaces connecting these models. Under the guidance of intermediate models in the path, an extended as-rigid-as-possible interpolation is used to produce the final morphing. By exploiting the knowledge of plausible models, our approach produces realistic morphing for challenging cases as demonstrated by various examples in the paper. © 2013 The Eurographics Association and Blackwell Publishing Ltd.

  18. Data driven parallelism in experimental high energy physics applications

    International Nuclear Information System (INIS)

    Pohl, M.

    1987-01-01

    I present global design principles for the implementation of high energy physics data analysis code on sequential and parallel processors with mixed shared and local memory. Potential parallelism in the structure of high energy physics tasks is identified with granularity varying from a few times 10^8 instructions all the way down to a few times 10^4 instructions. It follows the hierarchical structure of detector and data acquisition systems. To take advantage of this, while preserving the necessary portability of the code, I propose a computational model with purely data driven concurrency in Single Program Multiple Data (SPMD) mode. The task granularity is defined by varying the granularity of the central data structure manipulated. Concurrent processes coordinate themselves asynchronously using simple lock constructs on parts of the data structure. Load balancing among processes occurs naturally. The scheme allows the internal layout of the data structure to be mapped closely onto the layout of local and shared memory in a parallel architecture. It thus allows the application to be optimized with respect to synchronization as well as data transport overheads. I present a coarse top level design for a portable implementation of this scheme on sequential machines, multiprocessor mainframes (e.g. IBM 3090), tightly coupled multiprocessors (e.g. RP-3) and loosely coupled processor arrays (e.g. LCAP, Emulating Processor Farms). (orig.)
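    The coordination scheme described above, in which identical processes synchronize only through locks on parts of a shared data structure, can be sketched roughly as follows. This is a toy illustration using Python threads, not the design presented in the paper. The `Event` record, its fields, and the "reconstruction" step (a plain sum) are hypothetical.

```python
import threading


class Event:
    """One element of the central data structure, guarded by its own lock."""

    def __init__(self, hits):
        self.hits = hits
        self.result = None
        self.claimed = False
        self.times_processed = 0
        self.lock = threading.Lock()


def worker(events):
    """Every worker runs this same program over the same data (SPMD style)."""
    for ev in events:
        with ev.lock:              # lock only this part of the structure
            if ev.claimed:
                continue           # another worker already took this event
            ev.claimed = True
        # Work happens outside the lock, so other events stay available.
        ev.result = sum(ev.hits)   # stand-in for real reconstruction code
        ev.times_processed += 1


def run(events, n_workers=4):
    threads = [threading.Thread(target=worker, args=(events,)) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events


if __name__ == "__main__":
    data = run([Event([i, i + 1]) for i in range(20)])
    print([ev.result for ev in data])
```

    Changing what one `Event` holds (a whole event, a sub-detector, a single track) changes the task granularity without touching the worker code. Load balancing falls out of the claiming step, since a worker that finishes early simply claims the next unclaimed element.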

  19. A data-driven approach to quality risk management.

    Science.gov (United States)

    Alemayehu, Demissie; Alvir, Jose; Levenstein, Marcia; Nickerson, David

    2013-10-01

    An effective clinical trial strategy to ensure patient safety as well as trial quality and efficiency involves an integrated approach, including prospective identification of risk factors, mitigation of the risks through proper study design and execution, and assessment of quality metrics in real-time. Such an integrated quality management plan may also be enhanced by using data-driven techniques to identify risk factors that are most relevant in predicting quality issues associated with a trial. In this paper, we illustrate such an approach using data collected from actual clinical trials. Several statistical methods were employed, including the Wilcoxon rank-sum test and logistic regression, to identify the presence of association between risk factors and the occurrence of quality issues, applied to data on quality of clinical trials sponsored by Pfizer. Only a subset of the risk factors had a significant association with quality issues, and included: whether the study used placebo, whether an agent was a biologic, unusual packaging label, complex dosing, and over 25 planned procedures. Proper implementation of the strategy can help to optimize resource utilization without compromising trial integrity and patient safety.

  20. A data-driven approach to quality risk management

    Directory of Open Access Journals (Sweden)

    Demissie Alemayehu

    2013-01-01

    Aim: An effective clinical trial strategy to ensure patient safety as well as trial quality and efficiency involves an integrated approach, including prospective identification of risk factors, mitigation of the risks through proper study design and execution, and assessment of quality metrics in real-time. Such an integrated quality management plan may also be enhanced by using data-driven techniques to identify risk factors that are most relevant in predicting quality issues associated with a trial. In this paper, we illustrate such an approach using data collected from actual clinical trials. Materials and Methods: Several statistical methods were employed, including the Wilcoxon rank-sum test and logistic regression, to identify the presence of association between risk factors and the occurrence of quality issues, applied to data on quality of clinical trials sponsored by Pfizer. Results: Only a subset of the risk factors had a significant association with quality issues, and included: whether the study used placebo, whether an agent was a biologic, unusual packaging label, complex dosing, and over 25 planned procedures. Conclusion: Proper implementation of the strategy can help to optimize resource utilization without compromising trial integrity and patient safety.

  1. ATLAS job transforms: a data driven workflow engine

    International Nuclear Information System (INIS)

    Stewart, G A; Breaden-Madden, W B; Maddocks, H J; Harenberg, T; Sandhoff, M; Sarrazin, B

    2014-01-01

    The need to run complex workflows for a high energy physics experiment such as ATLAS has always been present. However, as computing resources have become even more constrained, compared to the wealth of data generated by the LHC, the need to use resources efficiently and manage complex workflows within a single grid job has increased. In ATLAS, a new Job Transform framework has been developed that we describe in this paper. This framework manages the multiple execution steps needed to 'transform' one data type into another (e.g., RAW data to ESD to AOD to final ntuple) and also provides a consistent interface for the ATLAS production system. The new framework uses a data driven workflow definition which is both easy to manage and powerful. After a transform is defined, jobs are expressed simply by specifying the input data and the desired output data. The transform infrastructure then executes only the necessary substeps to produce the final data products. The global execution cost of running the job is minimised and the transform can adapt to scenarios where data can be produced along different execution paths. Transforms for specific physics tasks which support up to 60 individual substeps have been successfully run. As the new transforms infrastructure has been deployed in production, many features have been added to the framework which improve reliability, quality of error reporting and also provide support for multi-process jobs.
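    The data-driven workflow idea, where a job names only its input and desired output data and the infrastructure works out which substeps to run, can be illustrated with a small graph search. This is a hedged sketch and not the ATLAS Job Transform API: the substep table, the substep names, the `NTUP` label and the `plan` function are all hypothetical, and only the RAW to ESD to AOD to ntuple chain comes from the abstract.

```python
from collections import deque

# Hypothetical substep table: substep name -> (input data type, output data type).
SUBSTEPS = {
    "RAWtoESD": ("RAW", "ESD"),
    "ESDtoAOD": ("ESD", "AOD"),
    "AODtoNTUP": ("AOD", "NTUP"),
}


def plan(input_type, output_type, substeps=SUBSTEPS):
    """Return the shortest chain of substeps turning input_type into output_type.

    Breadth-first search over data types, so substeps that are not needed
    for the requested output are never scheduled.
    """
    queue = deque([(input_type, [])])
    seen = {input_type}
    while queue:
        current, path = queue.popleft()
        if current == output_type:
            return path
        for name, (src, dst) in substeps.items():
            if src == current and dst not in seen:
                seen.add(dst)
                queue.append((dst, path + [name]))
    raise ValueError(f"no execution path from {input_type} to {output_type}")


if __name__ == "__main__":
    print(plan("RAW", "NTUP"))
    print(plan("ESD", "AOD"))
```

    Starting the same request from ESD instead of RAW skips the first substep automatically. If several execution paths produce the same data type, the search picks the one with the fewest substeps, which is one simple stand-in for minimising execution cost.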

  2. Human body segmentation via data-driven graph cut.

    Science.gov (United States)

    Li, Shifeng; Lu, Huchuan; Shao, Xingqing

    2014-11-01

    Human body segmentation is a challenging and important problem in computer vision. Existing methods usually entail a time-consuming training phase for prior knowledge learning with complex shape matching for body segmentation. In this paper, we propose a data-driven method that integrates top-down body pose information and bottom-up low-level visual cues for segmenting humans in static images within the graph cut framework. The key idea of our approach is first to exploit human kinematics to search for body part candidates via dynamic programming for high-level evidence. Then, by using the body parts classifiers, obtaining bottom-up cues of human body distribution for low-level evidence. All the evidence collected from top-down and bottom-up procedures are integrated in a graph cut framework for human body segmentation. Qualitative and quantitative experiment results demonstrate the merits of the proposed method in segmenting human bodies with arbitrary poses from cluttered backgrounds.

  3. A Data-Driven Approach to Realistic Shape Morphing

    KAUST Repository

    Gao, Lin

    2013-05-01

    Morphing between 3D objects is a fundamental technique in computer graphics. Traditional methods of shape morphing focus on establishing meaningful correspondences and finding smooth interpolation between shapes. Such methods, however, only take geometric information as input and thus cannot in general avoid producing unnatural interpolation, in particular for large-scale deformations. This paper proposes a novel data-driven approach for shape morphing. Given a database with various models belonging to the same category, we treat them as data samples in the plausible deformation space. These models are then clustered to form local shape spaces of plausible deformations. We use a simple metric to reasonably represent the closeness between pairs of models. Given source and target models, the morphing problem is cast as a global optimization problem of finding a minimal distance path within the local shape spaces connecting these models. Under the guidance of intermediate models in the path, an extended as-rigid-as-possible interpolation is used to produce the final morphing. By exploiting the knowledge of plausible models, our approach produces realistic morphing for challenging cases as demonstrated by various examples in the paper. © 2013 The Eurographics Association and Blackwell Publishing Ltd.

  4. Data driven parallelism in experimental high energy physics applications

    Science.gov (United States)

    Pohl, Martin

    1987-08-01

    I present global design principles for the implementation of High Energy Physics data analysis code on sequential and parallel processors with mixed shared and local memory. Potential parallelism in the structure of High Energy Physics tasks is identified with granularity varying from a few times 10^8 instructions all the way down to a few times 10^4 instructions. It follows the hierarchical structure of detector and data acquisition systems. To take advantage of this, while preserving the necessary portability of the code, I propose a computational model with purely data driven concurrency in Single Program Multiple Data (SPMD) mode. The task granularity is defined by varying the granularity of the central data structure manipulated. Concurrent processes coordinate themselves asynchronously using simple lock constructs on parts of the data structure. Load balancing among processes occurs naturally. The scheme allows the internal layout of the data structure to be mapped closely onto the layout of local and shared memory in a parallel architecture. It thus allows the application to be optimized with respect to synchronization as well as data transport overheads. I present a coarse top level design for a portable implementation of this scheme on sequential machines, multiprocessor mainframes (e.g. IBM 3090), tightly coupled multiprocessors (e.g. RP-3) and loosely coupled processor arrays (e.g. LCAP, Emulating Processor Farms).

  5. Data driven profiting from your most important business asset

    CERN Document Server

    Redman, Thomas C

    2008-01-01

    Your company's data has the potential to add enormous value to every facet of the organization -- from marketing and new product development to strategy to financial management. Yet if your company is like most, it's not using its data to create strategic advantage. Data sits around unused -- or incorrect data fouls up operations and decision making. In Data Driven, Thomas Redman, the "Data Doc," shows how to leverage and deploy data to sharpen your company's competitive edge and enhance its profitability. The author reveals: · The special properties that make data such a powerful asset · The hidden costs of flawed, outdated, or otherwise poor-quality data · How to improve data quality for competitive advantage · Strategies for exploiting your data to make better business decisions · The many ways to bring data to market · Ideas for dealing with political struggles over data and concerns about privacy rights Your company's data is a key business asset, and you need to manage it aggressively and professi...

  6. Data driven processor 'Vertex Trigger' for B experiments

    International Nuclear Information System (INIS)

    Hartouni, E.P.

    1993-01-01

    Data Driven Processors (DDP's) are specialized computation engines configured to solve specific numerical problems, such as vertex reconstruction. The architecture of the DDP which is the subject of this talk was designed and implemented by W. Sippach and B.C. Knapp at Nevis Lab. in the early 1980's. This particular implementation allows multiple parallel streams of data to provide input to a heterogeneous collection of simple operators whose interconnections form an algorithm. The local data flow control allows this device to execute algorithms extremely quickly provided that care is taken in the layout of the algorithm. I/O rates of several hundred megabytes/second are routinely achieved, thus making DDP's attractive candidates for complex online calculations. The original question was "can a DDP reconstruct tracks in a Silicon Vertex Detector, find events with a separated vertex and do it fast enough to be used as an online trigger?" Restating this inquiry as three questions and describing the answers to the questions will be the subject of this talk. The three specific questions are: (1) Can an algorithm be found which reconstructs tracks in a planar geometry and no magnetic field; (2) Can separated vertices be recognized in some way; (3) Can the algorithm be implemented in the Nevis-UMass and DDP and execute in 10-20 μs?

  7. EXPLORING DATA-DRIVEN SPECTRAL MODELS FOR APOGEE M DWARFS

    Science.gov (United States)

    Lua Birky, Jessica; Hogg, David; Burgasser, Adam J.; Jessica Birky

    2018-01-01

    The Cannon (Ness et al. 2015; Casey et al. 2016) is a flexible, data-driven spectral modeling and parameter inference framework, demonstrated on high-resolution Apache Point Galactic Evolution Experiment (APOGEE; λ/Δλ~22,500, 1.5-1.7µm) spectra of giant stars to estimate stellar labels (Teff, logg, [Fe/H], and chemical abundances) to precisions higher than the model-grid pipeline. The lack of reliable stellar parameters reported by the APOGEE pipeline for temperatures less than ~3550K motivates extension of this approach to M dwarf stars. Using a training set of 51 M dwarfs with spectral types ranging M0-M9 obtained from SDSS optical spectra, we demonstrate that the Cannon can infer spectral types to a precision of +/-0.6 types, making it an effective tool for classifying high-resolution near-infrared spectra. We discuss the potential for extending this work to determine the physical stellar labels Teff, logg, and [Fe/H]. This work is supported by the SDSS Faculty and Student (FAST) initiative.

  8. Data-driven approach for creating synthetic electronic medical records.

    Science.gov (United States)

    Buczak, Anna L; Babin, Steven; Moniz, Linda

    2010-10-14

    New algorithms for disease outbreak detection are being developed to take advantage of full electronic medical records (EMRs) that contain a wealth of patient information. However, due to privacy concerns, even anonymized EMRs cannot be shared among researchers, resulting in great difficulty in comparing the effectiveness of these algorithms. To bridge the gap between novel bio-surveillance algorithms operating on full EMRs and the lack of non-identifiable EMR data, a method for generating complete and synthetic EMRs was developed. This paper describes a novel methodology for generating complete synthetic EMRs both for an outbreak illness of interest (tularemia) and for background records. The method developed has three major steps: 1) synthetic patient identity and basic information generation; 2) identification of care patterns that the synthetic patients would receive based on the information present in real EMR data for similar health problems; 3) adaptation of these care patterns to the synthetic patient population. We generated EMRs, including visit records, clinical activity, laboratory orders/results and radiology orders/results for 203 synthetic tularemia outbreak patients. Validation of the records by a medical expert revealed problems in 19% of the records; these were subsequently corrected. We also generated background EMRs for over 3000 patients in the 4-11 yr age group. Validation of those records by a medical expert revealed problems in fewer than 3% of these background patient EMRs and the errors were subsequently rectified. A data-driven method was developed for generating fully synthetic EMRs. The method is general and can be applied to any data set that has similar data elements (such as laboratory and radiology orders and results, clinical activity, prescription orders). The pilot synthetic outbreak records were for tularemia but our approach may be adapted to other infectious diseases. The pilot synthetic background records were in the 4

  9. Evidence-based and data-driven road safety management

    Directory of Open Access Journals (Sweden)

    Fred Wegman

    2015-07-01

    Over the past decades, road safety in highly-motorised countries has made significant progress. Although we have a fair understanding of the reasons for this progress, we don't have conclusive evidence for this. A new generation of road safety management approaches has entered road safety, starting when countries decided to guide themselves by setting quantitative targets (e.g. 50% fewer casualties in ten years' time). Setting realistic targets, designing strategies and action plans to achieve these targets and monitoring progress have resulted in more scientific research to support decision-making on these topics. Three subjects are key in this new approach of evidence-based and data-driven road safety management: ex-post and ex-ante evaluation of both individual interventions and intervention packages in road safety strategies, and transferability (external validity) of the research results. In this article, we explore these subjects based on recent experiences in four jurisdictions (Western Australia, the Netherlands, Sweden and Switzerland). All four apply similar approaches and tools; differences are considered marginal. It is concluded that policy-making and political decisions were influenced to a great extent by the results of analysis and research. Nevertheless, to compensate for a relatively weak theoretical basis and to improve the power of this new approach, a number of issues will need further research. This includes ex-post and ex-ante evaluation, a better understanding of extrapolation of historical trends and the transferability of research results. This new approach cannot be realized without high-quality road safety data. Good data and knowledge are indispensable for this new and very promising approach.

  10. Data-Driven Model Uncertainty Estimation in Hydrologic Data Assimilation

    Science.gov (United States)

    Pathiraja, S.; Moradkhani, H.; Marshall, L.; Sharma, A.; Geenens, G.

    2018-02-01

    The increasing availability of earth observations necessitates mathematical methods to optimally combine such data with hydrologic models. Several algorithms exist for such purposes, under the umbrella of data assimilation (DA). However, DA methods are often applied in a suboptimal fashion for complex real-world problems, due largely to several practical implementation issues. One such issue is error characterization, which is known to be critical for a successful assimilation. Mischaracterized errors lead to suboptimal forecasts, and in the worst case, to degraded estimates even compared to the no assimilation case. Model uncertainty characterization has received little attention relative to other aspects of DA science. Traditional methods rely on subjective, ad hoc tuning factors or parametric distribution assumptions that may not always be applicable. We propose a novel data-driven approach (named SDMU) to model uncertainty characterization for DA studies where (1) the system states are partially observed and (2) minimal prior knowledge of the model error processes is available, except that the errors display state dependence. It includes an approach for estimating the uncertainty in hidden model states, with the end goal of improving predictions of observed variables. The SDMU is therefore suited to DA studies where the observed variables are of primary interest. Its efficacy is demonstrated through a synthetic case study with low-dimensional chaotic dynamics and a real hydrologic experiment for one-day-ahead streamflow forecasting. In both experiments, the proposed method leads to substantial improvements in the hidden states and observed system outputs over a standard method involving perturbation with Gaussian noise.

  11. Data-driven motion correction in brain SPECT

    International Nuclear Information System (INIS)

    Kyme, A.Z.; Hutton, B.F.; Hatton, R.L.; Skerrett, D.W.

    2002-01-01

    Patient motion can cause image artifacts in SPECT despite restraining measures. Data-driven detection and correction of motion can be achieved by comparison of acquired data with the forward-projections. By optimising the orientation of the reconstruction, parameters can be obtained for each misaligned projection and applied to update this volume using a 3D reconstruction algorithm. Digital and physical phantom validation was performed to investigate this approach. Noisy projection data simulating at least one fully 3D patient head movement during acquisition were constructed by projecting the digital Huffman brain phantom at various orientations. Motion correction was applied to the reconstructed studies. The importance of including attenuation effects in the estimation of motion and the need for implementing an iterated correction were assessed in the process. Correction success was assessed visually for artifact reduction, and quantitatively using a mean square difference (MSD) measure. Physical Huffman phantom studies with deliberate movements introduced during the acquisition were also acquired and motion corrected. Effective artifact reduction in the simulated corrupt studies was achieved by motion correction. Typically the MSD ratio between the corrected and reference studies compared to the corrupted and reference studies was > 2. Motion correction could be achieved without inclusion of attenuation effects in the motion estimation stage, providing simpler implementation and greater efficiency. Moreover the additional improvement with multiple iterations of the approach was small. Improvement was also observed in the physical phantom data, though the technique appeared limited here by an object symmetry. Copyright (2002) The Australian and New Zealand Society of Nuclear Medicine Inc

  12. Architectural Strategies for Enabling Data-Driven Science at Scale

    Science.gov (United States)

    Crichton, D. J.; Law, E. S.; Doyle, R. J.; Little, M. M.

    2017-12-01

    architectural strategies, including a 2015-2016 NASA AIST Study on Big Data, for evolving scientific research towards massively distributed data-driven discovery. It will include example use cases across earth science, planetary science, and other disciplines.

  13. SIDEKICK: Genomic data driven analysis and decision-making framework

    Directory of Open Access Journals (Sweden)

    Yoon Kihoon

    2010-12-01

    Background: Scientists striving to unlock mysteries within complex biological systems face myriad barriers in effectively integrating available information to enhance their understanding. While experimental techniques and available data sources are rapidly evolving, useful information is dispersed across a variety of sources, and sources of the same information often do not use the same format or nomenclature. To harness these expanding resources, scientists need tools that bridge nomenclature differences and allow them to integrate, organize, and evaluate the quality of information without extensive computation. Results: Sidekick, a genomic data driven analysis and decision making framework, is a web-based tool that provides a user-friendly intuitive solution to the problem of information inaccessibility. Sidekick enables scientists without training in computation and data management to pursue answers to research questions like "What are the mechanisms for disease X" or "Does the set of genes associated with disease X also influence other diseases." Sidekick enables the process of combining heterogeneous data, finding and maintaining the most up-to-date data, evaluating data sources, quantifying confidence in results based on evidence, and managing the multi-step research tasks needed to answer these questions. We demonstrate Sidekick's effectiveness by showing how to accomplish a complex published analysis in a fraction of the original time with no computational effort using Sidekick. Conclusions: Sidekick is an easy-to-use web-based tool that organizes and facilitates complex genomic research, allowing scientists to explore genomic relationships and formulate hypotheses without computational effort. Possible analysis steps include gene list discovery, gene-pair list discovery, various enrichments for both types of lists, and convenient list manipulation. Further, Sidekick's ability to characterize pairs of genes offers new ways to

  14. Neighborhood choices, neighborhood effects and housing vouchers

    OpenAIRE

    Davis, Morris A.; Gregory, Jesse; Hartley, Daniel A.; Tan, Kegon T. K.

    2017-01-01

    We study how households choose neighborhoods, how neighborhoods affect child ability, and how housing vouchers influence neighborhood choices and child outcomes. We use two new panel data sets with tract-level detail for Los Angeles county to estimate a dynamic model of optimal tract-level location choice for renting households and, separately, the impact of living in a given tract on child test scores (which we call "child ability" throughout). We simulate optimal location choices and change...

  15. The test of data driven TDC application in high energy physics experiment

    International Nuclear Information System (INIS)

    Liu Shubin; Guo Jianhua; Zhang Yanli; Zhao Long; An Qi

    2006-01-01

    In high energy physics there is a trend toward using integrated, high-resolution, multi-hit time-to-digital converters (TDCs) for time measurement, among which the data driven TDC is an important direction. Studying how to test the characteristics of a high-performance TDC and how to improve them helps in selecting the proper TDC. The authors have studied the testing of a new high-resolution TDC, which is planned for use in the third modification project of the Beijing Spectrometer (BESIII). This paper introduces the test platform built for the TDC and the methods used to test its nonlinearity, resolution, double-pulse resolution, etc. The paper also gives the test results and introduces the compensation method used to achieve a very high resolution (24.4 ps). (authors)

  16. Sensor fault analysis using decision theory and data-driven modeling of pressurized water reactor subsystems

    International Nuclear Information System (INIS)

    Upadhyaya, B.R.; Skorska, M.

    1984-01-01

    Instrument fault detection and estimation is important for process surveillance, control, and safety functions of a power plant. The method incorporates the dual-hypotheses decision procedure and system characterization using data-driven time-domain models of signals representing the system. The multivariate models can be developed on-line and can be adapted to changing system conditions. For the method to be effective, specific subsystems of pressurized water reactors were considered, and signal selection was made such that a strong causal relationship exists among the measured variables. The technique is applied to the reactor core subsystem of the loss-of-fluid test reactor using in-core neutron detector and core-exit thermocouple signals. Thermocouple anomalies such as bias error, noise error, and slow drift in the sensor are detected and estimated using appropriate measurement models

  17. Data-driven process decomposition and robust online distributed modelling for large-scale processes

    Science.gov (United States)

    Shu, Zhang; Lijuan, Li; Lijuan, Yao; Shipin, Yang; Tao, Zou

    2018-02-01

    With increasing attention to networked control, system decomposition and distributed models are of significant importance in implementing model-based control strategies. In this paper, a data-driven system decomposition and online distributed subsystem modelling algorithm is proposed for large-scale chemical processes. The key controlled variables are first partitioned into several clusters by the affinity propagation clustering algorithm. Each cluster can be regarded as a subsystem. The inputs of each subsystem are then selected by offline canonical correlation analysis between all process variables and the subsystem's controlled variables. Process decomposition is complete once the input and output variables have been screened. After the system decomposition is finished, online subsystem modelling is carried out by recursively renewing the samples block by block. The proposed algorithm was applied to the Tennessee Eastman process and its validity was verified.

  18. Data-driven approach for creating synthetic electronic medical records

    Directory of Open Access Journals (Sweden)

    Moniz Linda

    2010-10-01

    Background: New algorithms for disease outbreak detection are being developed to take advantage of full electronic medical records (EMRs) that contain a wealth of patient information. However, due to privacy concerns, even anonymized EMRs cannot be shared among researchers, resulting in great difficulty in comparing the effectiveness of these algorithms. To bridge the gap between novel bio-surveillance algorithms operating on full EMRs and the lack of non-identifiable EMR data, a method for generating complete and synthetic EMRs was developed. Methods: This paper describes a novel methodology for generating complete synthetic EMRs both for an outbreak illness of interest (tularemia) and for background records. The method developed has three major steps: 1) synthetic patient identity and basic information generation; 2) identification of care patterns that the synthetic patients would receive based on the information present in real EMR data for similar health problems; 3) adaptation of these care patterns to the synthetic patient population. Results: We generated EMRs, including visit records, clinical activity, laboratory orders/results and radiology orders/results for 203 synthetic tularemia outbreak patients. Validation of the records by a medical expert revealed problems in 19% of the records; these were subsequently corrected. We also generated background EMRs for over 3000 patients in the 4-11 yr age group. Validation of those records by a medical expert revealed problems in fewer than 3% of these background patient EMRs and the errors were subsequently rectified. Conclusions: A data-driven method was developed for generating fully synthetic EMRs. The method is general and can be applied to any data set that has similar data elements (such as laboratory and radiology orders and results, clinical activity, prescription orders). The pilot synthetic outbreak records were for tularemia but our approach may be adapted to other infectious

  19. Data-driven modeling of nano-nose gas sensor arrays

    DEFF Research Database (Denmark)

    Alstrøm, Tommy Sonne; Larsen, Jan; Nielsen, Claus Højgård

    2010-01-01

    We present a data-driven approach to classification of Quartz Crystal Microbalance (QCM) sensor data. The sensor is a nano-nose gas sensor that detects concentrations of analytes down to ppm levels using plasma polymerized coatings. Each sensor experiment takes approximately one hour hence...... the number of available training data is limited. We suggest a data-driven classification model which works from few examples. The paper compares a number of data-driven classification and quantification schemes able to detect the gas and the concentration level. The data-driven approaches are based on state...

  20. A Data-Driven Control Design Approach for Freeway Traffic Ramp Metering with Virtual Reference Feedback Tuning

    Directory of Open Access Journals (Sweden)

    Shangtai Jin

    2014-01-01

    ALINEA is a simple, efficient, and easily implemented ramp metering strategy. Virtual reference feedback tuning (VRFT) is well suited to many practical systems since it is a “one-shot” data-driven control design methodology. This paper presents an application of VRFT to a ramp metering problem in a freeway traffic system. When there is not enough prior knowledge of the controlled system to select a proper ALINEA parameter, the VRFT approach is used to optimize that parameter using only a batch of input and output data collected from the freeway traffic system. Extensive simulations on both the macroscopic MATLAB platform and the microscopic PARAMICS platform show the effectiveness and applicability of the proposed data-driven controller tuning approach.
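
    As a rough illustration of the idea in this abstract, the sketch below tunes the single gain of an ALINEA-style integral law, r(k) = r(k-1) + K (o_target - o(k)), from one batch of input/output data using the basic VRFT least-squares step. The toy plant y(k+1) = g u(k), the first-order reference model with pole `a`, the omission of the usual VRFT prefilter, and all function and variable names are assumptions made for this example; none of them come from the paper.

```python
def alinea_step(prev_rate, gain, target_occ, measured_occ):
    """One ALINEA update: integrate the occupancy error into the ramp flow."""
    return prev_rate + gain * (target_occ - measured_occ)


def vrft_integral_gain(u, y, a):
    """Estimate K for u(k) = u(k-1) + K*e(k) from data, VRFT style.

    u[k] is the applied ramp flow and y[k] the measured occupancy.
    Assumed reference model: y(k+1) = a*y(k) + (1-a)*r(k), with 0 <= a < 1.
    """
    num = 0.0
    den = 0.0
    for k in range(1, len(y) - 1):
        # Virtual reference: the r(k) that would produce y(k+1) under the model.
        r_virtual = (y[k + 1] - a * y[k]) / (1.0 - a)
        e_virtual = r_virtual - y[k]
        du = u[k] - u[k - 1]
        # Least-squares fit of du = K * e_virtual.
        num += du * e_virtual
        den += e_virtual * e_virtual
    return num / den


# Hypothetical open-loop batch from a toy plant y(k+1) = g*u(k).
g = 2.0
u_data = [0.0, 1.0, 0.5, 2.0, 1.5, 3.0]
y_data = [0.0] + [g * v for v in u_data]

K = vrft_integral_gain(u_data, y_data, a=0.5)
print(K)  # 0.25, which equals (1 - a) / g for this toy plant
```

    For this noise-free toy plant the closed loop with K = (1 - a) / g matches the reference model exactly, which is why the estimate is exact; with real traffic data the fit would only be approximate.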

  1. Data-Driven Zero-Sum Neuro-Optimal Control for a Class of Continuous-Time Unknown Nonlinear Systems With Disturbance Using ADP.

    Science.gov (United States)

    Wei, Qinglai; Song, Ruizhuo; Yan, Pengfei

    2016-02-01

    This paper is concerned with a new data-driven zero-sum neuro-optimal control problem for continuous-time unknown nonlinear systems with disturbance. According to the input-output data of the nonlinear system, an effective recurrent neural network is introduced to reconstruct the dynamics of the nonlinear system. Considering the system disturbance as a control input, a two-player zero-sum optimal control problem is established. Adaptive dynamic programming (ADP) is developed to obtain the optimal control under the worst case of the disturbance. Three single-layer neural networks, including one critic and two action networks, are employed to approximate the performance index function, the optimal control law, and the disturbance, respectively, for facilitating the implementation of the ADP method. Convergence properties of the ADP method are developed to show that the system state will converge to a finite neighborhood of the equilibrium. The weight matrices of the critic and the two action networks are also convergent to finite neighborhoods of their optimal ones. Finally, the simulation results will show the effectiveness of the developed data-driven ADP methods.

  2. Walkable new urban LEED_Neighborhood-Development (LEED-ND) community design and children's physical activity: selection, environmental, or catalyst effects?

    Directory of Open Access Journals (Sweden)

    Stevens, Robert B

    2011-12-01

    Background: Interest is growing in physical activity-friendly community designs, but few tests exist of communities explicitly designed to be walkable. We test whether students living in a new urbanist community that is also a pilot LEED_ND (Leadership in Energy and Environmental Design-Neighborhood Development) community have greater accelerometer-measured moderate-to-vigorous physical activity (MVPA) across particular time periods compared to students from other communities. We test various time/place periods to see if the data best conform to one of three explanations for MVPA. Environmental effects suggest that MVPA occurs when individuals are exposed to activity-friendly settings; selection effects suggest that walkable community residents prefer MVPA, which leads to both their choice of a walkable community and their high levels of MVPA; catalyst effects occur when walking to school creates more MVPA, beyond the school commute, on schooldays but not weekends. Methods: Fifth graders (n = 187) were sampled from two schools representing three communities: (1) a walkable community, Daybreak, designed with new urbanist and LEED-ND pilot design standards; (2) a mixed community (where students lived in a less walkable community but attended the walkable school so that part of the route to school was walkable), and (3) a less walkable community. Selection threats were addressed through controlling for parental preferences for their child to walk to school as well as comparing in-school MVPA for the walkable and mixed groups. Results: Minutes of MVPA were tested with 3 × 2 (Community by Gender) analyses of covariance (ANCOVAs). Community walkability related to more MVPA during the half hour before and after school and, among boys only, more MVPA after school. Boys were more active than girls, except during the half hour after school. Students from the mixed and walkable communities--who attended the same school--had similar in-school MVPA levels, and

  3. Walkable new urban LEED_Neighborhood-Development (LEED-ND) community design and children's physical activity: selection, environmental, or catalyst effects?

    Science.gov (United States)

    2011-01-01

    Background Interest is growing in physical activity-friendly community designs, but few tests exist of communities explicitly designed to be walkable. We test whether students living in a new urbanist community that is also a pilot LEED_ND (Leadership in Energy and Environmental Design-Neighborhood Development) community have greater accelerometer-measured moderate-to-vigorous physical activity (MVPA) across particular time periods compared to students from other communities. We test various time/place periods to see if the data best conform to one of three explanations for MVPA. Environmental effects suggest that MVPA occurs when individuals are exposed to activity-friendly settings; selection effects suggest that walkable community residents prefer MVPA, which leads to both their choice of a walkable community and their high levels of MVPA; catalyst effects occur when walking to school creates more MVPA, beyond the school commute, on schooldays but not weekends. Methods Fifth graders (n = 187) were sampled from two schools representing three communities: (1) a walkable community, Daybreak, designed with new urbanist and LEED-ND pilot design standards; (2) a mixed community (where students lived in a less walkable community but attended the walkable school so that part of the route to school was walkable), and (3) a less walkable community. Selection threats were addressed through controlling for parental preferences for their child to walk to school as well as comparing in-school MVPA for the walkable and mixed groups. Results Minutes of MVPA were tested with 3 × 2 (Community by Gender) analyses of covariance (ANCOVAs). Community walkability related to more MVPA during the half hour before and after school and, among boys only, more MVPA after school. Boys were more active than girls, except during the half hour after school. Students from the mixed and walkable communities--who attended the same school--had similar in-school MVPA levels, and community groups

  4. Neighborhood Environmental Watch Network

    International Nuclear Information System (INIS)

    Sanders, L.D.

    1993-01-01

    The Neighborhood Environmental Watch Network (NEWNET) is a regional network of environmental monitoring stations and a data archival center that supports collaboration between communities, industry, and government agencies to solve environmental problems. The stations provide local displays of measurements for the public and transmit measurements via satellite to a central site for archival and analysis. Station managers are selected from the local community and trained to support the stations. Archived data and analysis tools are available to researchers, educational institutions, industrial collaborators, and the public across the nation through a communications network. Los Alamos National Laboratory and the Environmental Protection Agency have developed a NEWNET pilot program for the Department of Energy. The pilot program supports monitoring stations in Nevada, Arizona, Utah, Wyoming, and California. Additional stations are being placed in Colorado and New Mexico. Pilot stations take radiological and meteorological measurements. Other measurements are possible by exchanging sensors

  5. Data-Driven Property Estimation for Protective Clothing

    Science.gov (United States)

    2014-09-01

    reliable predictions falls under the rubric “machine learning”. Inspired by the applications of machine learning in pharmaceutical drug design and...using genetic algorithms, for instance— descriptor selection can be automated as well. A well-known structured learning technique—Artificial Neural...descriptors automatically, by iteration, e.g., using a genetic algorithm [49]. 4.2.4 Avoiding Overfitting A peril of all regression—least squares as

  6. Data-driven smooth tests of the proportional hazards assumption

    Czech Academy of Sciences Publication Activity Database

    Kraus, David

    2007-01-01

    Vol. 13, No. 1 (2007), pp. 1-16. ISSN 1380-7870. R&D Projects: GA AV ČR(CZ) IAA101120604; GA ČR(CZ) GD201/05/H007. Institutional research plan: CEZ:AV0Z10750506. Keywords: Cox model; Neyman's smooth test; proportional hazards assumption; Schwarz's selection rule. Subject RIV: BA - General Mathematics. Impact factor: 0.491 (2007).

  7. Data-driven discovery of partial differential equations.

    Science.gov (United States)

    Rudy, Samuel H; Brunton, Steven L; Proctor, Joshua L; Kutz, J Nathan

    2017-04-01

    We propose a sparse regression method capable of discovering the governing partial differential equation(s) of a given system by time series measurements in the spatial domain. The regression framework relies on sparsity-promoting techniques to select the nonlinear and partial derivative terms of the governing equations that most accurately represent the data, bypassing a combinatorially large search through all possible candidate models. The method balances model complexity and regression accuracy by selecting a parsimonious model via Pareto analysis. Time series measurements can be made in an Eulerian framework, where the sensors are fixed spatially, or in a Lagrangian framework, where the sensors move with the dynamics. The method is computationally efficient, robust, and demonstrated to work on a variety of canonical problems spanning a number of scientific domains including Navier-Stokes, the quantum harmonic oscillator, and the diffusion equation. Moreover, the method is capable of disambiguating between potentially nonunique dynamical terms by using multiple time series taken with different initial data. Thus, for a traveling wave, the method can distinguish between a linear wave equation and the Korteweg-de Vries equation, for instance. The method provides a promising new technique for discovering governing equations and physical laws in parameterized spatiotemporal systems, where first-principles derivations are intractable.
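
    The core idea, regressing the time derivative onto a library of candidate terms and keeping only the few that matter, can be sketched with sequentially thresholded least squares. This is a simplified stand-in and not necessarily the sparsity-promoting scheme or the Pareto analysis used by the authors; the synthetic columns, the threshold, and every name below are assumptions made for illustration.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def least_squares(columns, target):
    """Least-squares coefficients via the normal equations."""
    gram = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    rhs = [sum(p * t for p, t in zip(ci, target)) for ci in columns]
    return solve_linear(gram, rhs)


def sparse_regression(columns, target, threshold, iterations=10):
    """Fit, drop terms with small coefficients, and refit on the survivors."""
    active = list(range(len(columns)))
    coeffs = [0.0] * len(columns)
    for _ in range(iterations):
        if not active:
            break
        fit = least_squares([columns[i] for i in active], target)
        coeffs = [0.0] * len(columns)
        for i, c in zip(active, fit):
            coeffs[i] = c
        kept = [i for i in active if abs(coeffs[i]) >= threshold]
        if kept == active:
            break
        active = kept
    return [c if i in active else 0.0 for i, c in enumerate(coeffs)]


# Synthetic samples standing in for u, u_x, u_xx measured on a grid.
n = 40
u = [math.sin(0.3 * i) for i in range(n)]
u_x = [math.cos(0.7 * i) for i in range(n)]
u_xx = [math.sin(1.1 * i + 0.5) for i in range(n)]
u_ux = [p * q for p, q in zip(u, u_x)]

# Target built from a Burgers-like law: u_t = -u*u_x + 0.1*u_xx.
u_t = [-1.0 * a + 0.1 * b for a, b in zip(u_ux, u_xx)]

library = [u, u_x, u_xx, u_ux]  # candidate terms, in this order
coeffs = sparse_regression(library, u_t, threshold=0.05)
print(coeffs)  # only the u_xx and u*u_x entries are nonzero
```

    In practice the derivative columns would come from numerical differentiation of noisy measurements, which is where most of the difficulty lies; the sketch skips that step entirely.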

  8. Neighborhood Disparities in the Restaurant Food Environment.

    Science.gov (United States)

    Martinez-Donate, Ana P; Espino, Jennifer Valdivia; Meinen, Amy; Escaron, Anne L; Roubal, Anne; Nieto, Javier; Malecki, Kristen

    2016-11-01

    Restaurant meals account for a significant portion of the American diet. Investigating disparities in the restaurant food environment can inform targeted interventions to increase opportunities for healthy eating among those who need them most. To examine neighborhood disparities in restaurant density and the nutrition environment within restaurants among a statewide sample of Wisconsin households. Households (N = 259) were selected from the 2009-2010 Survey of the Health of Wisconsin (SHOW), a population-based survey of Wisconsin adults. Restaurants in the household neighborhood were enumerated and audited using the Nutrition Environment Measures Survey for Restaurants (NEMS-R). Neighborhoods were defined as a 2- and 5-mile street-distance buffer around households in urban and non-urban areas, respectively. Adjusted linear regression models identified independent associations between sociodemographic household characteristics and neighborhood restaurant density and nutrition environment scores. On average, each neighborhood contained approximately 26 restaurants. On average, restaurants obtained 36.1% of the total nutrition environment points. After adjusting for household characteristics, higher restaurant density was associated with both younger and older household average age (P restaurant food environment in Wisconsin neighborhoods varies by age, race, and urbanicity, but offers ample room for improvement across socioeconomic groups and urbanicity levels. Future research must identify policy and environmental interventions to promote healthy eating in all restaurants, especially in young and/or rural neighborhoods in Wisconsin.

  9. Geoscience Meets Social Science: A Flexible Data Driven Approach for Developing High Resolution Population Datasets at Global Scale

    Science.gov (United States)

    Rose, A.; McKee, J.; Weber, E.; Bhaduri, B. L.

    2017-12-01

    Leveraging decades of expertise in population modeling, and in response to growing demand for higher resolution population data, Oak Ridge National Laboratory is now generating LandScan HD at global scale. LandScan HD is conceived as a 90m resolution population distribution where modeling is tailored to the unique geography and data conditions of individual countries or regions by combining social, cultural, physiographic, and other information with novel geocomputation methods. Similarities among these areas are exploited in order to leverage existing training data and machine learning algorithms to rapidly scale development. Drawing on ORNL's unique set of capabilities, LandScan HD adapts highly mature population modeling methods developed for LandScan Global and LandScan USA, settlement mapping research and production in high-performance computing (HPC) environments, land use and neighborhood mapping through image segmentation, and facility-specific population density models. Adopting a flexible methodology to accommodate different geographic areas, LandScan HD accounts for the availability, completeness, and level of detail of relevant ancillary data. Beyond core population and mapped settlement inputs, these factors determine the model complexity for an area, requiring that for any given area, a data-driven model could support either a simple top-down approach, a more detailed bottom-up approach, or a hybrid approach.

  10. Data-Driven Visualization and Group Analysis of Multichannel EEG Coherence with Functional Units

    NARCIS (Netherlands)

    Caat, Michael ten; Maurits, Natasha M.; Roerdink, Jos B.T.M.

    2008-01-01

    A typical data-driven visualization of electroencephalography (EEG) coherence is a graph layout, with vertices representing electrodes and edges representing significant coherences between electrode signals. A drawback of this layout is its visual clutter for multichannel EEG. To reduce clutter,

  11. Estimating the Probability of Wind Ramping Events: A Data-driven Approach

    OpenAIRE

    Wang, Cheng; Wei, Wei; Wang, Jianhui; Qiu, Feng

    2016-01-01

    This letter proposes a data-driven method for estimating the probability of wind ramping events without exploiting the exact probability distribution function (PDF) of wind power. Actual wind data validates the proposed method.

  12. Autonomous Soil Assessment System: A Data-Driven Approach to Planetary Mobility Hazard Detection

    Science.gov (United States)

    Raimalwala, K.; Faragalli, M.; Reid, E.

    2018-04-01

    The Autonomous Soil Assessment System predicts mobility hazards for rovers. Its development and performance are presented, with focus on its data-driven models, machine learning algorithms, and real-time sensor data fusion for predictive analytics.

  13. Designing Data-Driven Battery Prognostic Approaches for Variable Loading Profiles: Some Lessons Learned

    Data.gov (United States)

    National Aeronautics and Space Administration — Among various approaches for implementing prognostic algorithms data-driven algorithms are popular in the industry due to their intuitive nature and relatively fast...

  14. Short-term stream flow forecasting at Australian river sites using data-driven regression techniques

    CSIR Research Space (South Africa)

    Steyn, Melise

    2017-09-01

    This study proposes a computationally efficient solution to stream flow forecasting for river basins where historical time series data are available. Two data-driven modeling techniques are investigated, namely support vector regression...

  15. Service and Data Driven Multi Business Model Platform in a World of Persuasive Technologies

    DEFF Research Database (Denmark)

    Andersen, Troels Christian; Bjerrum, Torben Cæsar Bisgaard

    2016-01-01

    companies in establishing a service organization that delivers, creates and captures value through service and data driven business models by utilizing their network, resources and customers and/or users. Furthermore, based on literature and collaboration with the case company, the suggestion of a new...... framework provides the necessary construction of how the manufacturing companies can evolve their current business to provide multi service and data driven business models, using the same resources, networks and customers....

  16. Data-Driven Cyber-Physical Systems via Real-Time Stream Analytics and Machine Learning

    OpenAIRE

    Akkaya, Ilge

    2016-01-01

    Emerging distributed cyber-physical systems (CPSs) integrate a wide range of heterogeneous components that need to be orchestrated in a dynamic environment. While model-based techniques are commonly used in CPS design, they become inadequate in capturing the complexity as systems become larger and extremely dynamic. The adaptive nature of the systems makes data-driven approaches highly desirable, if not necessary. Traditionally, data-driven systems utilize large volumes of static data sets t...

  17. Data-driven mapping of the potential mountain permafrost distribution.

    Science.gov (United States)

    Deluigi, Nicola; Lambiel, Christophe; Kanevski, Mikhail

    2017-07-15

    Existing mountain permafrost distribution models generally offer a good overview of the potential extent of this phenomenon at a regional scale. They are, however, not always able to reproduce the high spatial discontinuity of permafrost at the micro-scale (the scale of a specific landform; ten to several hundreds of meters). To overcome this limitation, we tested an alternative modelling approach using three classification algorithms from statistics and machine learning: Logistic regression, Support Vector Machines and Random forests. These supervised learning techniques infer a classification function from labelled training data (pixels of permafrost absence and presence) with the aim of predicting permafrost occurrence where it is unknown. The research was carried out in a 588 km² area of the Western Swiss Alps. Permafrost evidence was mapped from ortho-image interpretation (rock glacier inventorying) and field data (mainly geoelectrical and thermal data). The relationship between the selected permafrost evidence and permafrost controlling factors was computed with the mentioned techniques. Classification performance, assessed with AUROC, was 0.81 for Logistic regression, 0.85 for Support Vector Machines and 0.88 for Random forests. The adopted machine learning algorithms proved efficient for permafrost distribution modelling, giving results consistent with the field reality. The high resolution of the input dataset (10 m) allows maps to be produced at the micro-scale, with a modelled permafrost spatial distribution less optimistic than that of classic spatial models. Moreover, the probability output of the adopted algorithms offers a more precise overview of the potential distribution of mountain permafrost than simple indexes of permafrost favorability. These encouraging results also open the way to new possibilities of permafrost data analysis and mapping. Copyright © 2017 Elsevier B.V. All rights reserved.
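
    The AUROC figure used to compare the three classifiers can be computed directly from predicted probabilities and presence/absence labels: it is the fraction of (presence, absence) pixel pairs in which the presence pixel receives the higher score, with ties counted as one half. The function below is a minimal sketch of that definition; the example labels and scores are made up and are not the study's data.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0   # presence pixel ranked above absence pixel
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical pixels: 1 = permafrost present, 0 = absent.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))  # 0.75
```

    The pairwise form is quadratic in the number of pixels; for a full raster a rank-based formula would be the more practical choice.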

  18. Data-Driven Optimization of Incentive-based Demand Response System with Uncertain Responses of Customers

    Directory of Open Access Journals (Sweden)

    Jimyung Kang

    2017-10-01

    Demand response is nowadays considered as another type of generator, beyond just a simple peak reduction mechanism. A demand response service provider (DRSP) can, through its subcontracts with many energy customers, virtually generate electricity with actual load reduction. However, in this type of virtual generator, the amount of load reduction includes inevitable uncertainty, because it consists of a very large number of independent energy customers. While they may reduce energy today, they might not tomorrow. In this circumstance, a DRSP must choose a proper set of these uncertain customers to achieve the exact preferred amount of load curtailment. In this paper, the customer selection problem for a service provider that consists of uncertain responses of customers is defined and solved. The uncertainty of energy reduction is fully considered in the formulation with data-driven probability distribution modeling and stochastic programming technique. The proposed optimization method that utilizes only the observed load data provides a realistic and applicable solution to a demand response system. The performance of the proposed optimization is verified with real demand response event data in Korea, and the results show increased and stabilized performance from the service provider’s perspective.
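
    One simple way to picture the customer selection problem is a sample-average formulation: treat each past demand response event as a scenario, and pick the subset of customers whose total reduction is, on average over the scenarios, closest to the target. The brute-force sketch below does exactly that for a handful of customers. It is an assumption-laden toy, not the paper's formulation or solver; the objective, the scenario data, and the names are all hypothetical.

```python
from itertools import combinations


def select_customers(scenarios, target):
    """Pick the customer subset minimizing mean |total reduction - target|.

    scenarios[s][c] is the observed reduction of customer c in past event s.
    Exhaustive search, so only usable for a small number of customers.
    """
    n = len(scenarios[0])
    best_subset, best_cost = (), float("inf")
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            cost = sum(
                abs(sum(event[c] for c in subset) - target) for event in scenarios
            ) / len(scenarios)
            if cost < best_cost:
                best_subset, best_cost = subset, cost
    return best_subset, best_cost


# Hypothetical reductions (kWh) of four customers in three past events.
# Customers 1 and 3 respond only some of the time.
events = [
    [10, 0, 5, 8],
    [10, 6, 5, 0],
    [10, 6, 5, 8],
]

subset, cost = select_customers(events, target=15)
print(subset, cost)  # (0, 2) 0.0
```

    Here the two reliable customers are preferred over larger but erratic ones, which mirrors the abstract's point that the provider needs a set of customers that hits the preferred curtailment despite uncertain responses.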

  19. Data-driven quantification of the robustness and sensitivity of cell signaling networks

    International Nuclear Information System (INIS)

    Mukherjee, Sayak; Seok, Sang-Cheol; Vieland, Veronica J; Das, Jayajit

    2013-01-01

    Robustness and sensitivity of responses generated by cell signaling networks have been associated with survival and evolvability of organisms. However, existing methods for analyzing robustness and sensitivity of signaling networks ignore the experimentally observed cell-to-cell variations of protein abundances and cell functions, or contain ad hoc assumptions. We propose and apply a data-driven maximum entropy based method to quantify robustness and sensitivity of the Escherichia coli (E. coli) chemotaxis signaling network. Our analysis correctly rank orders different models of E. coli chemotaxis based on their robustness and suggests that parameters regulating cell signaling are evolutionarily selected to vary in individual cells according to their abilities to perturb cell functions. Furthermore, predictions from our approach regarding the distribution of protein abundances and properties of chemotactic responses in individual cells, based on cell population averaged data, are in excellent agreement with their experimental counterparts. Our approach is general and can be used to evaluate robustness as well as generate predictions of single cell properties based on population averaged experimental data in a wide range of cell signaling systems. (paper)

  20. Internet Bad Neighborhoods Aggregation

    NARCIS (Netherlands)

    Moreira Moura, Giovane; Sadre, R.; Sperotto, Anna; Pras, Aiko; Paschoal Gaspary, L.; De Turk, Filip

    Internet Bad Neighborhoods have proven to be an innovative approach for fighting spam. They have also helped to understand how spammers are distributed on the Internet. In our previous works, the size of each bad neighborhood was fixed to a /24 subnetwork. In this paper, however, we investigate if

  1. Does Physical Activity Mediate the Association Between Perceived Neighborhood Aesthetics and Overweight/Obesity Among South African Adults Living in Selected Urban and Rural Communities?

    Science.gov (United States)

    Malambo, Pasmore; Kengne, Andre P; Lambert, Estelle V; De Villiers, Anniza; Puoane, Thandi

    2017-12-01

    To investigate the mediation effects of physical activity (PA) on the relationship between the perceived neighborhood aesthetic environment and overweight/obesity in free-living South Africans. A cross-sectional study of 671 adults aged ≥ 35 years was analyzed. PA was assessed using the validated International Physical Activity Questionnaire. Perceived neighborhood aesthetics was assessed using the Neighborhood Environment Walkability Scale Questionnaire. Of 671 participants, 76.0% were women, 34.1% aged 45-54 years, and 69.2% were overweight or obese. In adjusted logistic regression models, overweight/obesity was significantly associated with neighborhood aesthetics [odds ratio (OR) = 0.68; 95% confidence interval (CI), 0.50-0.93] and PA (OR = 0.65; 95% CI, 0.65-0.90). In expanded multivariable models, overweight/obesity was associated with age 45-55 years (OR = 1.59; 95% CI, 1.05-2.40), female gender (OR = 6.24; 95% CI, 3.95-9.86), tertiary education (OR = 4.05; 95% CI, 1.19-13.86), and urban residence (OR = 2.46; 95% CI, 1.66-3.65). Aesthetics was positively associated with PA; both aesthetics and PA were negatively associated with overweight and obesity. There was no evidence to support a significant mediating effect of PA on the relationship between aesthetics and overweight/obesity. Future studies should consider objective assessment of aesthetics and PA. In addition, future studies should consider using longitudinal design to evaluate food-related environments, which are related to overweight or obesity.

  2. Experimental investigation of the predictive capabilities of data driven modeling techniques in hydrology - Part 2: Application

    Directory of Open Access Journals (Sweden)

    A. Elshorbagy

    2010-10-01

    In this second part of the two-part paper, the data driven modeling (DDM) experiment, presented and explained in the first part, is implemented. Inputs for the five case studies (half-hourly actual evapotranspiration, daily peat soil moisture, daily till soil moisture, and two daily rainfall-runoff datasets) are identified, either based on previous studies or using the mutual information content. Twelve groups (realizations) were randomly generated from each dataset by randomly sampling without replacement from the original dataset. Neural networks (ANNs), genetic programming (GP), evolutionary polynomial regression (EPR), Support vector machines (SVM), M5 model trees (M5), K-nearest neighbors (K-nn), and multiple linear regression (MLR) techniques are implemented and applied to each of the 12 realizations of each case study. The predictive accuracy and uncertainties of the various techniques are assessed using multiple average overall error measures, scatter plots, frequency distribution of model residuals, and the deterioration rate of prediction performance during the testing phase. Gamma test is used as a guide to assist in selecting the appropriate modeling technique. Unlike two nonlinear soil moisture case studies, the results of the experiment conducted in this research study show that ANNs were a sub-optimal choice for the actual evapotranspiration and the two rainfall-runoff case studies. GP is the most successful technique due to its ability to adapt the model complexity to the modeled data. EPR performance could be close to GP with datasets that are more linear than nonlinear. SVM is sensitive to the kernel choice and if appropriately selected, the performance of SVM can improve. M5 performs very well with linear and semi linear data, which cover wide range of hydrological situations. In highly nonlinear case studies, ANNs, K-nn, and GP could be more successful than other modeling techniques. K-nn is also successful in linear situations, and it

  3. Experimental investigation of the predictive capabilities of data driven modeling techniques in hydrology - Part 2: Application

    Science.gov (United States)

    Elshorbagy, A.; Corzo, G.; Srinivasulu, S.; Solomatine, D. P.

    2010-10-01

    In this second part of the two-part paper, the data-driven modeling (DDM) experiment, presented and explained in the first part, is implemented. Inputs for the five case studies (half-hourly actual evapotranspiration, daily peat soil moisture, daily till soil moisture, and two daily rainfall-runoff datasets) are identified, either based on previous studies or using the mutual information content. Twelve groups (realizations) were randomly generated from each dataset by randomly sampling without replacement from the original dataset. Neural networks (ANNs), genetic programming (GP), evolutionary polynomial regression (EPR), support vector machines (SVM), M5 model trees (M5), K-nearest neighbors (K-nn), and multiple linear regression (MLR) techniques are implemented and applied to each of the 12 realizations of each case study. The predictive accuracy and uncertainties of the various techniques are assessed using multiple average overall error measures, scatter plots, frequency distribution of model residuals, and the deterioration rate of prediction performance during the testing phase. The Gamma test is used as a guide to assist in selecting the appropriate modeling technique. Unlike for the two nonlinear soil moisture case studies, the results of the experiment conducted in this research study show that ANNs were a sub-optimal choice for the actual evapotranspiration and the two rainfall-runoff case studies. GP is the most successful technique due to its ability to adapt the model complexity to the modeled data. EPR performance could be close to GP with datasets that are more linear than nonlinear. SVM is sensitive to the kernel choice and if appropriately selected, the performance of SVM can improve. M5 performs very well with linear and semi-linear data, which cover a wide range of hydrological situations. In highly nonlinear case studies, ANNs, K-nn, and GP could be more successful than other modeling techniques. K-nn is also successful in linear situations, and it should
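
    The resampling step mentioned in this abstract (twelve realizations drawn without replacement from each dataset) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' procedure: the function name `make_realizations`, the 80% sample fraction, the seed, and the toy dataset are all hypothetical; only the count of twelve and the use of sampling without replacement come from the abstract.

```python
import random

def make_realizations(records, n_realizations=12, sample_fraction=0.8, seed=0):
    """Draw n_realizations subsets, each sampled without replacement.

    sample_fraction and seed are illustrative assumptions; the abstract
    does not state the subset size.
    """
    rng = random.Random(seed)
    subset_size = int(len(records) * sample_fraction)
    # random.sample never repeats an element within one draw.
    return [rng.sample(records, subset_size) for _ in range(n_realizations)]

# Toy stand-in for one case-study dataset (hypothetical values).
dataset = list(range(100))
realizations = make_realizations(dataset)
print(len(realizations), len(realizations[0]))
```

    Each modeling technique would then be fitted to every realization, so that the spread of its error measures across the twelve subsets gives a rough picture of predictive uncertainty.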

  4. Data-driven design of fault diagnosis and fault-tolerant control systems

    CERN Document Server

    Ding, Steven X

    2014-01-01

    Data-driven Design of Fault Diagnosis and Fault-tolerant Control Systems presents basic statistical process monitoring, fault diagnosis, and control methods, and introduces advanced data-driven schemes for the design of fault diagnosis and fault-tolerant control systems catering to the needs of dynamic industrial processes. With ever increasing demands for reliability, availability and safety in technical processes and assets, process monitoring and fault-tolerance have become important issues surrounding the design of automatic control systems. This text shows the reader how, thanks to the rapid development of information technology, key techniques of data-driven and statistical process monitoring and control can now become widely used in industrial practice to address these issues. To allow for self-contained study and facilitate implementation in real applications, important mathematical and control theoretical knowledge and tools are included in this book. Major schemes are presented in algorithm form and...

  5. Observer and data-driven model based fault detection in Power Plant Coal Mills

    DEFF Research Database (Denmark)

    Fogh Odgaard, Peter; Lin, Bao; Jørgensen, Sten Bay

    2008-01-01

    This paper presents and compares model-based and data-driven fault detection approaches for coal mill systems. The first approach detects faults with an optimal unknown input observer developed from a simplified energy balance model. Due to the time-consuming effort in developing a first principles model with motor power as the controlled variable, data-driven methods for fault detection are also investigated. Regression models that represent normal operating conditions (NOCs) are developed with both static and dynamic principal component analysis and partial least squares methods. The residual between process measurement and the NOC model prediction is used for fault detection. A hybrid approach, where a data-driven model is employed to derive an optimal unknown input observer, is also implemented. The three methods are evaluated with case studies on coal mill data, which includes a fault...
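
    The residual-based idea in this abstract (fit a model of normal operating conditions, then flag samples the model cannot explain) can be illustrated with a very small static PCA sketch. Everything specific here is an assumption made for illustration: the synthetic two-variable data, the single retained component, the power-iteration solver, and the alarm limit taken as the largest training residual. It is not the paper's coal mill model.

```python
import math
import random

def mean_center(rows):
    """Return column means and the mean-centered rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return means, [[r[j] - means[j] for j in range(d)] for r in rows]

def first_component(centered, iterations=100):
    """Leading principal direction via power iteration on X^T X."""
    d = len(centered[0])
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        scores = [sum(x[j] * v[j] for j in range(d)) for x in centered]
        w = [sum(s * x[j] for s, x in zip(scores, centered)) for j in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v

def spe(sample, means, v):
    """Squared prediction error: the part of a sample the NOC model leaves unexplained."""
    x = [a - m for a, m in zip(sample, means)]
    t = sum(a * b for a, b in zip(x, v))  # score on the retained component
    return sum((a - t * b) ** 2 for a, b in zip(x, v))

# Synthetic "normal operation" data (assumption): x2 tracks 2 * x1 with small noise.
rng = random.Random(42)
normal = []
for _ in range(200):
    x1 = rng.uniform(0.0, 10.0)
    normal.append([x1, 2.0 * x1 + rng.gauss(0.0, 0.1)])

means, centered = mean_center(normal)
v = first_component(centered)
# Alarm limit (assumption): the largest residual seen during normal operation.
threshold = max(spe(row, means, v) for row in normal)

def is_fault(sample):
    return spe(sample, means, v) > threshold

print(is_fault([5.0, 10.05]))  # sample consistent with the normal relation
print(is_fault([5.0, 14.0]))   # sample that breaks the normal relation
```

    In practice the alarm limit would more likely come from a statistical confidence bound than from the training maximum, and a dynamic PCA variant would stack lagged measurements into each sample before fitting.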

  6. Data-driven remaining useful life prognosis techniques stochastic models, methods and applications

    CERN Document Server

    Si, Xiao-Sheng; Hu, Chang-Hua

    2017-01-01

    This book introduces data-driven remaining useful life prognosis techniques, and shows how to utilize the condition monitoring data to predict the remaining useful life of stochastic degrading systems and to schedule maintenance and logistics plans. It is also the first book that describes the basic data-driven remaining useful life prognosis theory systematically and in detail. The emphasis of the book is on the stochastic models, methods and applications employed in remaining useful life prognosis. It includes a wealth of degradation monitoring experiment data, practical prognosis methods for remaining useful life in various cases, and a series of applications incorporated into prognostic information in decision-making, such as maintenance-related decisions and ordering spare parts. It also highlights the latest advances in data-driven remaining useful life prognosis techniques, especially in the contexts of adaptive prognosis for linear stochastic degrading systems, nonlinear degradation modeling based pro...

  7. Neighborhood Mapping Tool

    Data.gov (United States)

    Department of Housing and Urban Development — This tool assists the public and Choice Neighborhoods applicants to prepare data to submit with their grant application by allowing applicants to draw the exact...

  8. Data-driven analysis of simultaneous EEG/fMRI reveals neurophysiological phenotypes of impulse control.

    Science.gov (United States)

    Schmüser, Lena; Sebastian, Alexandra; Mobascher, Arian; Lieb, Klaus; Feige, Bernd; Tüscher, Oliver

    2016-09-01

    Response inhibition is the ability to suppress inadequate but prepotent or ongoing response tendencies. A fronto-striatal network is involved in these processes. Between-subject differences in the intra-individual variability have been suggested to constitute a key to pathological processes underlying impulse control disorders. Single-trial EEG/fMRI analysis makes it possible to increase sensitivity for inter-individual differences by incorporating intra-individual variability. Thirty-eight healthy subjects performed a visual Go/Nogo task during simultaneous EEG/fMRI. Of 38 healthy subjects, 21 subjects reliably showed Nogo-related ICs (Nogo-IC-positive) while 17 subjects (Nogo-IC-negative) did not. Comparing both groups revealed differences on various levels: on the trait level, Nogo-IC-negative subjects scored higher on questionnaires regarding attention deficit/hyperactivity disorder; on the behavioral level, they displayed slower response times (RT) and higher intra-individual RT variability, while both groups did not differ in their inhibitory performance. On the neurophysiological level, Nogo-IC-negative subjects showed a hyperactivation of left inferior frontal cortex/insula and left putamen as well as significantly reduced P3 amplitudes. Thus, a data-driven approach for IC classification and the resulting presence or absence of early Nogo-specific ICs as criterion for group selection revealed group differences at behavioral and neurophysiological levels. This may indicate electrophysiological phenotypes characterized by inter-individual variations of neural and behavioral correlates of impulse control. We demonstrated that the inter-individual difference in an electrophysiological correlate of response inhibition is correlated with distinct, potentially compensatory neural activity. This may suggest the existence of electrophysiologically dissociable phenotypes of behavioral and neural motor response inhibition with the Nogo-IC-positive phenotype possibly providing

  9. Data-driven, Interpretable Photometric Redshifts Trained on Heterogeneous and Unrepresentative Data

    Energy Technology Data Exchange (ETDEWEB)

    Leistedt, Boris; Hogg, David W., E-mail: boris.leistedt@nyu.edu, E-mail: david.hogg@nyu.edu [Center for Cosmology and Particle Physics, Department of Physics, New York University, New York, NY 10003 (United States)

    2017-03-20

    We present a new method for inferring photometric redshifts in deep galaxy and quasar surveys, based on a data-driven model of latent spectral energy distributions (SEDs) and a physical model of photometric fluxes as a function of redshift. This conceptually novel approach combines the advantages of both machine learning methods and template fitting methods by building template SEDs directly from the spectroscopic training data. This is made computationally tractable with Gaussian processes operating in flux–redshift space, encoding the physics of redshifts and the projection of galaxy SEDs onto photometric bandpasses. This method alleviates the need to acquire representative training data or to construct detailed galaxy SED models; it requires only that the photometric bandpasses and calibrations be known or have parameterized unknowns. The training data can consist of a combination of spectroscopic and deep many-band photometric data with reliable redshifts, which do not need to entirely spatially overlap with the target survey of interest or even involve the same photometric bands. We showcase the method on the i-magnitude-selected, spectroscopically confirmed galaxies in the COSMOS field. The model is trained on the deepest bands (from SUBARU and HST) and photometric redshifts are derived using the shallower SDSS optical bands only. We demonstrate that we obtain accurate redshift point estimates and probability distributions despite the training and target sets having very different redshift distributions, noise properties, and even photometric bands. Our model can also be used to predict missing photometric fluxes or to simulate populations of galaxies with realistic fluxes and redshifts, for example.

  10. Data-driven, Interpretable Photometric Redshifts Trained on Heterogeneous and Unrepresentative Data

    International Nuclear Information System (INIS)

    Leistedt, Boris; Hogg, David W.

    2017-01-01

    We present a new method for inferring photometric redshifts in deep galaxy and quasar surveys, based on a data-driven model of latent spectral energy distributions (SEDs) and a physical model of photometric fluxes as a function of redshift. This conceptually novel approach combines the advantages of both machine learning methods and template fitting methods by building template SEDs directly from the spectroscopic training data. This is made computationally tractable with Gaussian processes operating in flux–redshift space, encoding the physics of redshifts and the projection of galaxy SEDs onto photometric bandpasses. This method alleviates the need to acquire representative training data or to construct detailed galaxy SED models; it requires only that the photometric bandpasses and calibrations be known or have parameterized unknowns. The training data can consist of a combination of spectroscopic and deep many-band photometric data with reliable redshifts, which do not need to entirely spatially overlap with the target survey of interest or even involve the same photometric bands. We showcase the method on the i-magnitude-selected, spectroscopically confirmed galaxies in the COSMOS field. The model is trained on the deepest bands (from SUBARU and HST) and photometric redshifts are derived using the shallower SDSS optical bands only. We demonstrate that we obtain accurate redshift point estimates and probability distributions despite the training and target sets having very different redshift distributions, noise properties, and even photometric bands. Our model can also be used to predict missing photometric fluxes or to simulate populations of galaxies with realistic fluxes and redshifts, for example.

  11. STUDY OF THE POYNTING FLUX IN ACTIVE REGION 10930 USING DATA-DRIVEN MAGNETOHYDRODYNAMIC SIMULATION

    International Nuclear Information System (INIS)

    Fan, Y. L.; Wang, H. N.; He, H.; Zhu, X. S.

    2011-01-01

    Powerful solar flares are closely related to the evolution of magnetic field configuration on the photosphere. We choose the Poynting flux as a parameter in the study of magnetic field changes. We use time-dependent multidimensional MHD simulations around a flare occurrence to generate the results, with the temporal variation of the bottom boundary conditions being deduced from the projected normal characteristic method. By this method, the photospheric magnetogram could be incorporated self-consistently as the bottom condition of data-driven simulations. The model is first applied to a simulation datum produced by an emerging magnetic flux rope as a test case. Then, the model is used to study NOAA AR 10930, which has an X3.4 flare, the data of which has been obtained by the Hinode/Solar Optical Telescope on 2006 December 13. We compute the magnitude of Poynting flux (S_total), radial Poynting flux (S_z), a proxy for ideal radial Poynting flux (S_proxy), Poynting flux due to plasma surface motion (S_sur), and Poynting flux due to plasma emergence (S_emg) and analyze their extensive properties in four selected areas: the whole sunspot, the positive sunspot, the negative sunspot, and the strong-field polarity inversion line (SPIL) area. It is found that (1) the S_total, S_z, and S_proxy parameters show similar behaviors in the whole sunspot area and in the negative sunspot area. The evolutions of these three parameters in the positive area and the SPIL area are more volatile because of the effect of sunspot rotation and flux emergence. (2) The evolution of S_sur is largely influenced by the process of sunspot rotation, especially in the positive sunspot. The evolution of S_emg is greatly affected by flux emergence, especially in the SPIL area.

  12. Central focused convolutional neural networks: Developing a data-driven model for lung nodule segmentation.

    Science.gov (United States)

    Wang, Shuo; Zhou, Mu; Liu, Zaiyi; Liu, Zhenyu; Gu, Dongsheng; Zang, Yali; Dong, Di; Gevaert, Olivier; Tian, Jie

    2017-08-01

    Accurate lung nodule segmentation from computed tomography (CT) images is of great importance for image-driven lung cancer analysis. However, the heterogeneity of lung nodules and the presence of similar visual characteristics between nodules and their surroundings make robust nodule segmentation difficult. In this study, we propose a data-driven model, termed the Central Focused Convolutional Neural Networks (CF-CNN), to segment lung nodules from heterogeneous CT images. Our approach combines two key insights: 1) the proposed model captures a diverse set of nodule-sensitive features from both 3-D and 2-D CT images simultaneously; 2) when classifying an image voxel, the effects of its neighbor voxels can vary according to their spatial locations. We describe this phenomenon by proposing a novel central pooling layer that retains much of the information at the voxel patch center, followed by a multi-scale patch learning strategy. Moreover, we design a weighted sampling scheme to facilitate the model training, where training samples are selected according to their degree of segmentation difficulty. The proposed method has been extensively evaluated on the public LIDC dataset including 893 nodules and an independent dataset with 74 nodules from Guangdong General Hospital (GDGH). We showed that CF-CNN achieved superior segmentation performance with average dice scores of 82.15% and 80.02% for the two datasets respectively. Moreover, we compared our results with the inter-radiologist consistency on the LIDC dataset, showing a difference in average dice score of only 1.98%. Copyright © 2017. Published by Elsevier B.V.
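
    The dice score reported in this abstract is the standard overlap measure 2|A ∩ B| / (|A| + |B|) between a predicted mask A and a reference mask B. A minimal sketch of that computation is given below; the function name `dice_score`, the flat-list mask representation, and the toy masks are illustrative assumptions and are unrelated to the LIDC or GDGH data.

```python
def dice_score(pred, truth):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) for two binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention (assumption): two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total

# Toy flattened masks: 1 marks a nodule voxel, 0 marks background.
predicted = [0, 1, 1, 1, 0, 0]
reference = [0, 1, 1, 0, 1, 0]
score = dice_score(predicted, reference)
print(f"{score:.4f}")
```

    A 3-D segmentation would be flattened to one list per mask before calling the function; the score ranges from 0 (no overlap) to 1 (identical masks).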

  13. Probing the dynamics of identified neurons with a data-driven modeling approach.

    Directory of Open Access Journals (Sweden)

    Thomas Nowotny

    2008-07-01

    In controlling animal behavior the nervous system has to perform within the operational limits set by the requirements of each specific behavior. The implications for the corresponding range of suitable network, single neuron, and ion channel properties have remained elusive. In this article we approach the question of how well-constrained properties of neuronal systems may be on the neuronal level. We used large data sets of the activity of isolated invertebrate identified cells and built an accurate conductance-based model for this cell type using customized automated parameter estimation techniques. By direct inspection of the data we found that the variability of the neurons is larger when they are isolated from the circuit than when in the intact system. Furthermore, the responses of the neurons to perturbations appear to be more consistent than their autonomous behavior under stationary conditions. In the developed model, the constraints on different parameters that enforce appropriate model dynamics vary widely from some very tightly controlled parameters to others that are almost arbitrary. The model also allows predictions for the effect of blocking selected ionic currents and to prove that the origin of irregular dynamics in the neuron model is proper chaoticity and that this chaoticity is typical in an appropriate sense. Our results indicate that data driven models are useful tools for the in-depth analysis of neuronal dynamics. The better consistency of responses to perturbations, in the real neurons as well as in the model, suggests a paradigm shift away from measuring autonomous dynamics alone towards protocols of controlled perturbations. Our predictions for the impact of channel blockers on the neuronal dynamics and the proof of chaoticity underscore the wide scope of our approach.

  14. Weather models as virtual sensors to data-driven rainfall predictions in urban watersheds

    Science.gov (United States)

    Cozzi, Lorenzo; Galelli, Stefano; Pascal, Samuel Jolivet De Marc; Castelletti, Andrea

    2013-04-01

    Weather and climate predictions are a key element of urban hydrology, where they are used to inform water management and assist in delivering flood warnings. Indeed, the modelling of the very fast dynamics of urbanized catchments can be substantially improved by the use of weather/rainfall predictions. For example, in the Singapore Marina Reservoir catchment, runoff processes have a very short time of concentration (roughly one hour); observational data are thus nearly useless for runoff predictions, and weather predictions are required. Unfortunately, radar nowcasting methods do not allow long-term weather predictions to be carried out, whereas numerical models are limited by their coarse spatial scale. Moreover, numerical models are usually poorly reliable because of the fast motion and limited spatial extension of rainfall events. In this study we investigate the combined use of data-driven modelling techniques and weather variables observed/simulated with a numerical model as a way to improve rainfall prediction accuracy and lead time in the Singapore metropolitan area. To explore the feasibility of the approach, we use a Weather Research and Forecast (WRF) model as a virtual sensor network for the input variables (the states of the WRF model) to a machine learning rainfall prediction model. More precisely, we combine an input variable selection method and a non-parametric tree-based model to characterize the empirical relation between the rainfall measured at the catchment level and all possible weather input variables provided by the WRF model. We explore different lead times to evaluate the model reliability for different long-term predictions, as well as different time lags to see how past information could improve results. Results show that the proposed approach allows a significant improvement of the prediction accuracy of the WRF model on the Singapore urban area.

  15. Neighborhood Disorder and the Sense of Personal Control: Which Factors Moderate the Association?

    Science.gov (United States)

    Kim, Joongbaeck; Conley, Meghan E.

    2011-01-01

    This study examines whether and how select individual characteristics moderate the relationship between neighborhood disorder and a sense of personal control. Our findings show that neighborhood disorder is associated with a decreased sense of control. However, regression analyses including interaction terms of neighborhood disorder and some…

  16. Robust Data-Driven Inference for Density-Weighted Average Derivatives

    DEFF Research Database (Denmark)

    Cattaneo, Matias D.; Crump, Richard K.; Jansson, Michael

    This paper presents a new data-driven bandwidth selector compatible with the small bandwidth asymptotics developed in Cattaneo, Crump, and Jansson (2009) for density-weighted average derivatives. The new bandwidth selector is of the plug-in variety, and is obtained based on a mean squared error...

  17. Ability Grouping and Differentiated Instruction in an Era of Data-Driven Decision Making

    Science.gov (United States)

    Park, Vicki; Datnow, Amanda

    2017-01-01

    Despite data-driven decision making being a ubiquitous part of policy and school reform efforts, little is known about how teachers use data for instructional decision making. Drawing on data from a qualitative case study of four elementary schools, we examine the logic and patterns of teacher decision making about differentiation and ability…

  18. Data-driven diagnostics of terrestrial carbon dynamics over North America

    Science.gov (United States)

    Jingfeng Xiao; Scott V. Ollinger; Steve Frolking; George C. Hurtt; David Y. Hollinger; Kenneth J. Davis; Yude Pan; Xiaoyang Zhang; Feng Deng; Jiquan Chen; Dennis D. Baldocchi; Beverly E. Law; M. Altaf Arain; Ankur R. Desai; Andrew D. Richardson; Ge Sun; Brian Amiro; Hank Margolis; Lianhong Gu; Russell L. Scott; Peter D. Blanken; Andrew E. Suyker

    2014-01-01

    The exchange of carbon dioxide is a key measure of ecosystem metabolism and a critical intersection between the terrestrial biosphere and the Earth's climate. Despite the general agreement that the terrestrial ecosystems in North America provide a sizeable carbon sink, the size and distribution of the sink remain uncertain. We use a data-driven approach to upscale...

  19. Data-driven haemodynamic response function extraction using Fourier-wavelet regularised deconvolution

    NARCIS (Netherlands)

    Wink, Alle Meije; Hoogduin, Hans; Roerdink, Jos B.T.M.

    2008-01-01

    Background: We present a simple, data-driven method to extract haemodynamic response functions (HRF) from functional magnetic resonance imaging (fMRI) time series, based on the Fourier-wavelet regularised deconvolution (ForWaRD) technique. HRF data are required for many fMRI applications, such as

  20. Data-driven haemodynamic response function extraction using Fourier-wavelet regularised deconvolution

    NARCIS (Netherlands)

    Wink, Alle Meije; Hoogduin, Hans; Roerdink, Jos B.T.M.

    2010-01-01

    Background: We present a simple, data-driven method to extract haemodynamic response functions (HRF) from functional magnetic resonance imaging (fMRI) time series, based on the Fourier-wavelet regularised deconvolution (ForWaRD) technique. HRF data are required for many fMRI applications, such as

  1. Perspectives of data-driven LPV modeling of high-purity distillation columns

    NARCIS (Netherlands)

    Bachnas, A.A.; Toth, R.; Mesbah, A.; Ludlage, J.H.A.

    2013-01-01

    This paper investigates data-driven, Linear-Parameter-Varying (LPV) modeling of a high-purity distillation column. Two LPV modeling approaches are studied: a local approach, corresponding to the interpolation of Linear Time-Invariant (LTI) models identified at steady-state purity levels,

  2. The Role of Guided Induction in Paper-Based Data-Driven Learning

    Science.gov (United States)

    Smart, Jonathan

    2014-01-01

    This study examines the role of guided induction as an instructional approach in paper-based data-driven learning (DDL) in the context of an ESL grammar course during an intensive English program at an American public university. Specifically, it examines whether corpus-informed grammar instruction is more effective through inductive, data-driven…

  3. Design and evaluation of a data-driven scenario generation framework for game-based training

    NARCIS (Netherlands)

    Luo, L.; Yin, H.; Cai, W.; Zhong, J.; Lees, M.

    Generating suitable game scenarios that can cater for individual players has become an emerging challenge in procedural content generation. In this paper, we propose a data-driven scenario generation framework for game-based training. An evolutionary scenario generation process is designed with a

  4. Data-driven Development of ROTEM and TEG Algorithms for the Management of Trauma Hemorrhage

    DEFF Research Database (Denmark)

    Baksaas-Aasen, Kjersti; Van Dieren, Susan; Balvers, Kirsten

    2018-01-01

    for ROTEM, TEG, and CCTs to be used in addition to ratio driven transfusion and tranexamic acid. CONCLUSIONS: We describe a systematic approach to define threshold parameters for ROTEM and TEG. These parameters were incorporated into algorithms to support data-driven adjustments of resuscitation...

  5. Teacher Talk about Student Ability and Achievement in the Era of Data-Driven Decision Making

    Science.gov (United States)

    Datnow, Amanda; Choi, Bailey; Park, Vicki; St. John, Elise

    2018-01-01

    Background: Data-driven decision making continues to be a common feature of educational reform agendas across the globe. In many U.S. schools, the teacher team meeting is a key setting in which data use is intended to take place, with the aim of planning instruction to address students' needs. However, most prior research has not examined how the…

  6. Big-Data-Driven Stem Cell Science and Tissue Engineering: Vision and Unique Opportunities.

    Science.gov (United States)

    Del Sol, Antonio; Thiesen, Hans J; Imitola, Jaime; Carazo Salas, Rafael E

    2017-02-02

    Achieving the promises of stem cell science to generate precise disease models and designer cell samples for personalized therapeutics will require harnessing pheno-genotypic cell-level data quantitatively and predictively in the lab and clinic. Those requirements could be met by developing a Big-Data-driven stem cell science strategy and community. Copyright © 2017 Elsevier Inc. All rights reserved.

  7. Exploring Techniques of Developing Writing Skill in IELTS Preparatory Courses: A Data-Driven Study

    Science.gov (United States)

    Ostovar-Namaghi, Seyyed Ali; Safaee, Seyyed Esmail

    2017-01-01

    Being driven by the hypothetico-deductive mode of inquiry, previous studies have tested the effectiveness of theory-driven interventions under controlled experimental conditions to come up with universally applicable generalizations. To make a case in the opposite direction, this data-driven study aims at uncovering techniques and strategies…

  8. A framework for the automated data-driven constitutive characterization of composites

    Science.gov (United States)

    J.G. Michopoulos; John Hermanson; T. Furukawa; A. Iliopoulos

    2010-01-01

    We present advances on the development of a mechatronically and algorithmically automated framework for the data-driven identification of constitutive material models based on energy density considerations. These models can capture both the linear and nonlinear constitutive response of multiaxially loaded composite materials in a manner that accounts for progressive...

  9. Writing through Big Data: New Challenges and Possibilities for Data-Driven Arguments

    Science.gov (United States)

    Beveridge, Aaron

    2017-01-01

    As multimodal writing continues to shift and expand in the era of Big Data, writing studies must confront the new challenges and possibilities emerging from data mining, data visualization, and data-driven arguments. Often collected under the broad banner of "data literacy," students' experiences of data visualization and data-driven…

  10. Data-driven directions for effective footwear provision for the high-risk diabetic foot

    NARCIS (Netherlands)

    Arts, M. L. J.; de Haart, M.; Waaijman, R.; Dahmen, R.; Berendsen, H.; Nollet, F.; Bus, S. A.

    2015-01-01

    Custom-made footwear is used to offload the diabetic foot to prevent plantar foot ulcers. This prospective study evaluates the offloading effects of modifying custom-made footwear and aims to provide data-driven directions for the provision of effectively offloading footwear in clinical practice.

  11. Toward Data-Driven Design of Educational Courses: A Feasibility Study

    Science.gov (United States)

    Agrawal, Rakesh; Golshan, Behzad; Papalexakis, Evangelos

    2016-01-01

    A study plan is the choice of concepts and the organization and sequencing of the concepts to be covered in an educational course. While a good study plan is essential for the success of any course offering, the design of study plans currently remains largely a manual task. We present a novel data-driven method, which given a list of concepts can…

  12. Retesting the Limits of Data-Driven Learning: Feedback and Error Correction

    Science.gov (United States)

    Crosthwaite, Peter

    2017-01-01

    An increasing number of studies have looked at the value of corpus-based data-driven learning (DDL) for second language (L2) written error correction, with generally positive results. However, a potential conundrum for language teachers involved in the process is how to provide feedback on students' written production for DDL. The study looks at…

  13. Articulatory Distinctiveness of Vowels and Consonants: A Data-Driven Approach

    Science.gov (United States)

    Wang, Jun; Green, Jordan R.; Samal, Ashok; Yunusova, Yana

    2013-01-01

    Purpose: To quantify the articulatory distinctiveness of 8 major English vowels and 11 English consonants based on tongue and lip movement time series data using a data-driven approach. Method: Tongue and lip movements of 8 vowels and 11 consonants from 10 healthy talkers were

  14. Data-Driven Hint Generation in Vast Solution Spaces: A Self-Improving Python Programming Tutor

    Science.gov (United States)

    Rivers, Kelly; Koedinger, Kenneth R.

    2017-01-01

    To provide personalized help to students who are working on code-writing problems, we introduce a data-driven tutoring system, ITAP (Intelligent Teaching Assistant for Programming). ITAP uses state abstraction, path construction, and state reification to automatically generate personalized hints for students, even when given states that have not…

  15. Neighborhoods, US, 2017, Zillow, SEGS

    Data.gov (United States)

    U.S. Environmental Protection Agency — This web service depicts nearly 17,000 neighborhood boundaries in over 650 U.S. cities. Zillow created the neighborhood boundaries and is sharing them with the...

  16. NeighborHood

    OpenAIRE

    Corominola Ocaña, Víctor

    2015-01-01

    NeighborHood is a cloud-based application that adapts to any device (mobile, tablet, desktop). The aim of the application is to let users add the people in their immediate surroundings and to make those people visible to the other users...

  17. Community, Democracy, and Neighborhood News.

    Science.gov (United States)

    Hindman, Elizabeth Blanks

    1998-01-01

    Contributes to scholarship on democracy, community, and journalism by examining the interplay between communication, democracy, and community at an inner-city neighborhood newspaper. Concludes that, through its focus on neighborhood culture, acknowledgment of conflict, and attempts to provide a forum for the neighborhood's self-definition, the…

  18. An Open Framework for Dynamic Big-data-driven Application Systems (DBDDAS) Development

    KAUST Repository

    Douglas, Craig

    2014-01-01

    In this paper, we outline the key features of dynamic data-driven application systems (DDDAS). A DDDAS is an application with data assimilation that can change the models and/or scales of the computation, and in which the application controls the data collection based on the computational results. The term Big Data (BD), which has come into being in recent years, is highly applicable to most DDDAS, since most applications use networks of sensors that generate an overwhelming amount of data over the lifespan of the application runs. We describe what a dynamic big-data-driven application system (DBDDAS) toolkit must have in order to provide all of the essential building blocks that are necessary to easily create new DDDAS without re-inventing the building blocks.

  19. Data-Driven Iterative Vibration Signal Enhancement Strategy Using Alpha Stable Distribution

    Directory of Open Access Journals (Sweden)

    Grzegorz Żak

    2017-01-01

    The authors propose a novel procedure for enhancing the signal-to-noise ratio in vibration data acquired from machines working in a mining industry environment. The proposed method performs data-driven reduction of the deterministic, high-energy, low-frequency components. Furthermore, it provides a way to enhance the signal of interest. The procedure incorporates time-frequency decomposition, α-stable distribution based signal modeling, and the stability parameter in the time domain as a stopping criterion for the iterative part of the procedure. An advantage of the proposed algorithm is the data-driven, automatic detection of the informative frequency band, as well as of the band with high energy, owing to the properties of the distribution used. Furthermore, no knowledge of kinematics, speed, and so on is needed. The proposed algorithm is applied to real data acquired from the gearbox of a belt conveyor pulley drive.

  20. Data Driven Modelling of the Dynamic Wake Between Two Wind Turbines

    DEFF Research Database (Denmark)

    Knudsen, Torben; Bak, Thomas

    2012-01-01

    Wind turbines in a wind farm influence each other through the wind flow. Downwind turbines are in the wake of upwind turbines and the wind speed experienced at downwind turbines is hence a function of the wind speeds at upwind turbines but also the momentum extracted from the wind by the upwind turbine. This paper establishes flow models relating the wind speeds at turbines in a farm. So far, research in this area has been mainly based on first principles static models and the data driven modelling done has not included the loading of the upwind turbine and its impact on the wind speed downwind. This paper is the first where modern commercial mega watt turbines are used for data driven modelling including the upwind turbine loading by changing power reference. Obtaining the necessary data is difficult and data is therefore limited. A simple dynamic extension to the Jensen wake model is tested...
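
    For orientation, the static Jensen wake model named in the abstract can be sketched as below. This is a minimal sketch of the textbook top-hat form only; the dynamic extension tested in the paper is not described in this record and is not reproduced here. The function name, the parameter names and the default decay constant of 0.075 are assumptions made for illustration.

```python
from math import sqrt


def jensen_wake_speed(u_upwind, thrust_coeff, distance, rotor_radius, decay=0.075):
    """Wind speed in the wake of one upwind turbine (static Jensen model).

    u_upwind     -- free-stream wind speed at the upwind turbine (m/s)
    thrust_coeff -- thrust coefficient C_T of the upwind turbine, in [0, 1]
    distance     -- downwind distance from the rotor (m)
    rotor_radius -- rotor radius of the upwind turbine (m)
    decay        -- wake decay constant k (assumed value, site dependent)
    """
    if not 0.0 <= thrust_coeff <= 1.0:
        raise ValueError("thrust coefficient must lie in [0, 1]")
    if distance < 0 or rotor_radius <= 0:
        raise ValueError("distance must be >= 0 and rotor radius > 0")
    # Fractional velocity deficit: largest at the rotor, decaying as the
    # wake expands linearly with downwind distance.
    deficit = (1.0 - sqrt(1.0 - thrust_coeff)) / (1.0 + decay * distance / rotor_radius) ** 2
    return u_upwind * (1.0 - deficit)


# Hypothetical numbers: 10 m/s inflow, C_T = 0.75, 40 m rotor radius.
print(jensen_wake_speed(10.0, 0.75, 400.0, 40.0, decay=0.1))
```

    The thrust coefficient is where the loading of the upwind turbine enters: lowering the power reference lowers C_T and therefore the deficit seen downwind.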

  1. Pipe break prediction based on evolutionary data-driven methods with brief recorded data

    International Nuclear Information System (INIS)

    Xu Qiang; Chen Qiuwen; Li Weifeng; Ma Jinfeng

    2011-01-01

    Pipe breaks often occur in water distribution networks, imposing great pressure on utility managers to secure stable water supply. However, pipe breaks are hard to detect by the conventional method. It is therefore necessary to develop reliable and robust pipe break models to assess the pipe's probability to fail and then to optimize the pipe break detection scheme. In the absence of deterministic physical models for pipe break, data-driven techniques provide a promising approach to investigate the principles underlying pipe break. In this paper, two data-driven techniques, namely Genetic Programming (GP) and Evolutionary Polynomial Regression (EPR) are applied to develop pipe break models for the water distribution system of Beijing City. The comparison with the recorded pipe break data from 1987 to 2005 showed that the models have great capability to obtain reliable predictions. The models can be used to prioritize pipes for break inspection and then improve detection efficiency.

  2. Data-driven modeling and real-time distributed control for energy efficient manufacturing systems

    International Nuclear Information System (INIS)

    Zou, Jing; Chang, Qing; Arinez, Jorge; Xiao, Guoxian

    2017-01-01

    As manufacturers face the challenges of increasing global competition and energy saving requirements, it is imperative to seek out opportunities to reduce energy waste and overall cost. In this paper, a novel data-driven stochastic manufacturing system modeling method is proposed to identify and predict energy saving opportunities and their impact on production. A real-time distributed feedback production control policy, which integrates the current and predicted system performance, is established to improve the overall profit and energy efficiency. A case study is presented to demonstrate the effectiveness of the proposed control policy. - Highlights: • A data-driven stochastic manufacturing system model is proposed. • Real-time system performance and energy saving opportunity identification method is developed. • Prediction method for future potential system performance and energy saving opportunity is developed. • A real-time distributed feedback control policy is established to improve energy efficiency and overall system profit.

  3. An Open Framework for Dynamic Big-data-driven Application Systems (DBDDAS) Development

    KAUST Repository

    Douglas, Craig

    2014-06-06

    In this paper, we outline the key features of dynamic data-driven application systems (DDDAS). A DDDAS is an application with data assimilation that can change the models and/or scales of the computation, and in which the application controls the data collection based on the computational results. The term Big Data (BD), which has come into being in recent years, is highly applicable to most DDDAS, since most applications use networks of sensors that generate an overwhelming amount of data over the lifespan of the application runs. We describe what a dynamic big-data-driven application system (DBDDAS) toolkit must have in order to provide all of the essential building blocks that are necessary to easily create new DDDAS without re-inventing the building blocks.

  4. A Novel Online Data-Driven Algorithm for Detecting UAV Navigation Sensor Faults

    OpenAIRE

    Rui Sun; Qi Cheng; Guanyu Wang; Washington Yotto Ochieng

    2017-01-01

    The use of Unmanned Aerial Vehicles (UAVs) has increased significantly in recent years. On-board integrated navigation sensors are a key component of UAVs’ flight control systems and are essential for flight safety. In order to ensure flight safety, timely and effective navigation sensor fault detection capability is required. In this paper, a novel data-driven Adaptive Neuron Fuzzy Inference System (ANFIS)-based approach is presented for the detection of on-board navigation sensor faults in ...

  5. Data Driven Exploratory Attacks on Black Box Classifiers in Adversarial Domains

    OpenAIRE

    Sethi, Tegjyot Singh; Kantardzic, Mehmed

    2017-01-01

    While modern day web applications aim to create impact at the civilization level, they have become vulnerable to adversarial activity, where the next cyber-attack can take any shape and can originate from anywhere. The increasing scale and sophistication of attacks has prompted the need for a data driven solution, with machine learning forming the core of many cybersecurity systems. Machine learning was not designed with security in mind, and the essential assumption of stationarity, requiri...

  6. Data Driven Marketing in Apple and Back to School Campaign 2011

    OpenAIRE

    Bernátek, Martin

    2011-01-01

    The most important finding of the campaign analysis is that Data-Driven Marketing makes sense only once it is already part of the marketing plan. The team preparing the marketing plan therefore defines the goals and sets the proper measurement matrix according to those goals. This makes it possible to adjust the marketing plan to extract more value, to monitor the execution and make adjustments if necessary, and to evaluate the results at the end of the campaign.

  7. Data-driven automatic parking constrained control for four-wheeled mobile vehicles

    OpenAIRE

    Wenxu Yan; Jing Deng; Dezhi Xu

    2016-01-01

    In this article, a novel data-driven constrained control scheme is proposed for automatic parking systems. The design of the proposed scheme only depends on the steering angle and the orientation angle of the car, and it does not involve any model information of the car. Therefore, the proposed scheme-based automatic parking system is applicable to different kinds of cars. In order to further reduce the desired trajectory coordinate tracking errors, a coordinates compensation algorithm is als...

  8. Extension of a data-driven gating technique to 3D, whole body PET studies

    International Nuclear Information System (INIS)

    Schleyer, Paul J; O'Doherty, Michael J; Marsden, Paul K

    2011-01-01

    Respiratory gating can be used to separate a PET acquisition into a series of near motion-free bins. This is typically done using additional gating hardware; however, software-based methods can derive the respiratory signal from the acquired data itself. The aim of this work was to extend a data-driven respiratory gating method to acquire gated, 3D, whole body PET images of clinical patients. The existing method, previously demonstrated with 2D, single bed-position data, uses a spectral analysis to find regions in raw PET data which are subject to respiratory motion. The change in counts over time within these regions is then used to estimate the respiratory signal of the patient. In this work, the gating method was adapted to only accept lines of response from a reduced set of axial angles, and the respiratory frequency derived from the lung bed position was used to help identify the respiratory frequency in all other bed positions. As the respiratory signal does not identify the direction of motion, a registration-based technique was developed to align the direction for all bed positions. Data from 11 clinical FDG PET patients were acquired, and an optical respiratory monitor was used to provide a hardware-based signal for comparison. All data were gated using both the data-driven and hardware methods, and reconstructed. The centre of mass of manually defined regions on gated images was calculated, and the overall displacement was defined as the change in the centre of mass between the first and last gates. The mean displacement was 10.3 mm for the data-driven gated images and 9.1 mm for the hardware gated images. No significant difference was found between the two gating methods when comparing the displacement values. The adapted data-driven gating method was demonstrated to successfully produce respiratory gated, 3D, whole body, clinical PET acquisitions.
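
    The displacement measure described in this abstract can be sketched as follows. This is a minimal illustration, assuming an intensity-weighted centre of mass and a Euclidean distance between the first and last gates; the abstract does not state either choice explicitly, and the function names and toy voxel data are hypothetical.

```python
from math import sqrt


def centre_of_mass(voxels):
    """Intensity-weighted centre of mass of a region.

    `voxels` is a list of ((x, y, z), intensity) pairs with positions in mm.
    """
    total = sum(weight for _, weight in voxels)
    if total <= 0:
        raise ValueError("region has no positive intensity")
    return tuple(sum(pos[i] * weight for pos, weight in voxels) / total for i in range(3))


def displacement(first_gate, last_gate):
    """Distance (mm) between the centres of mass of the same region in two gates."""
    a = centre_of_mass(first_gate)
    b = centre_of_mass(last_gate)
    return sqrt(sum((bi - ai) ** 2 for ai, bi in zip(a, b)))


# Toy region that moves 6 mm along the axial (z) direction between gates.
first_gate = [((0.0, 0.0, 0.0), 1.0), ((0.0, 0.0, 4.0), 1.0)]
last_gate = [((0.0, 0.0, 6.0), 1.0), ((0.0, 0.0, 10.0), 1.0)]
print(displacement(first_gate, last_gate))
```

    Computing this value once per region for the data-driven gates and once for the hardware gates gives the paired values that the study compares.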

  9. A data-driven approach for retrieving temperatures and abundances in brown dwarf atmospheres

    OpenAIRE

    Line, MR; Fortney, JJ; Marley, MS; Sorahana, S

    2014-01-01

    Brown dwarf spectra contain a wealth of information about their molecular abundances, temperature structure, and gravity. We present a new data driven retrieval approach, previously used in planetary atmosphere studies, to extract the molecular abundances and temperature structure from brown dwarf spectra. The approach makes few a priori physical assumptions about the state of the atmosphere. The feasibility of the approach is fi...

  10. Using Two Different Approaches to Assess Dietary Patterns: Hypothesis-Driven and Data-Driven Analysis

    Directory of Open Access Journals (Sweden)

    Ágatha Nogueira Previdelli

    2016-09-01

    The use of dietary patterns to assess dietary intake has become increasingly common in nutritional epidemiology studies due to the complexity and multidimensionality of the diet. Currently, two main approaches have been widely used to assess dietary patterns: data-driven and hypothesis-driven analysis. Since the methods explore different angles of dietary intake, using both approaches simultaneously might yield complementary and useful information; thus, we aimed to use both approaches to gain knowledge of adolescents’ dietary patterns. Food intake from a cross-sectional survey with 295 adolescents was assessed by 24 h dietary recall (24HR). In hypothesis-driven analysis, based on the American National Cancer Institute method, the usual intake of Brazilian Healthy Eating Index Revised components was estimated. In the data-driven approach, the usual intake of foods/food groups was estimated by the Multiple Source Method. In the results, hypothesis-driven analysis showed low scores for Whole grains, Total vegetables, Total fruit and Whole fruits, while, in data-driven analysis, fruits and whole grains were not present in any pattern. High intakes of sodium, fats and sugars were observed in hypothesis-driven analysis, with low total scores for the Sodium, Saturated fat and SoFAA (calories from solid fat, alcohol and added sugar) components in agreement, while the data-driven approach showed the intake of several foods/food groups rich in these nutrients, such as butter/margarine, cookies, chocolate powder, whole milk, cheese, processed meat/cold cuts and candies. In this study, using both approaches at the same time provided consistent and complementary information with regard to assessing the overall dietary habits that will be important in order to drive public health programs, and improve their efficiency to monitor and evaluate the dietary patterns of populations.

  11. Data-Driven and Expectation-Driven Discovery of Empirical Laws.

    Science.gov (United States)

    1982-10-10

    ...occurred in small integer proportions to each other. In 1809, Joseph Gay-Lussac found evidence for his law of combining volumes, which stated that a... Patrick W. Langley, Gary L. Bradshaw, Herbert A. Simon. The Robotics Institute, Carnegie-Mellon University, Pittsburgh, Pennsylvania... Interim Report, 2/82-10/82...

  12. Data-driven non-linear elasticity: constitutive manifold construction and problem discretization

    Science.gov (United States)

    Ibañez, Ruben; Borzacchiello, Domenico; Aguado, Jose Vicente; Abisset-Chavanne, Emmanuelle; Cueto, Elias; Ladeveze, Pierre; Chinesta, Francisco

    2017-11-01

    The use of constitutive equations calibrated from data has been implemented into standard numerical solvers for successfully addressing a variety of problems encountered in simulation-based engineering sciences (SBES). However, the complexity keeps increasing due to the need for increasingly detailed models as well as the use of engineered materials. Data-driven simulation constitutes a potential change of paradigm in SBES. Standard simulation in computational mechanics is based on the use of two very different types of equations. The first one, of axiomatic character, is related to balance laws (momentum, mass, energy, ...), whereas the second one consists of models that scientists have extracted from collected data, either natural or synthetic. Data-driven (or data-intensive) simulation consists of directly linking experimental data to computers in order to perform numerical simulations. These simulations will employ laws, universally recognized as epistemic, while minimizing the need for explicit, often phenomenological, models. The main drawback of such an approach is the large amount of required data, some of which is inaccessible with today's testing facilities. This difficulty can be circumvented in many cases, and in any case alleviated, by considering complex tests, collecting as much data as possible and then using a data-driven inverse approach in order to generate the whole constitutive manifold from a few complex experimental tests, as discussed in the present work.

  13. Data-Driven Anomaly Detection Performance for the Ares I-X Ground Diagnostic Prototype

    Science.gov (United States)

    Martin, Rodney A.; Schwabacher, Mark A.; Matthews, Bryan L.

    2010-01-01

    In this paper, we will assess the performance of a data-driven anomaly detection algorithm, the Inductive Monitoring System (IMS), which can be used to detect simulated Thrust Vector Control (TVC) system failures. However, the ability of IMS to detect these failures in a true operational setting may be related to the realistic nature of how they are simulated. As such, we will investigate both a low fidelity and high fidelity approach to simulating such failures, with the latter based upon the underlying physics. Furthermore, the ability of IMS to detect anomalies that were previously unknown and not previously simulated will be studied in earnest, as well as apparent deficiencies or misapplications that result from using the data-driven paradigm. Our conclusions indicate that robust detection performance of simulated failures using IMS is not appreciably affected by the use of a high fidelity simulation. However, we have found that the inclusion of a data-driven algorithm such as IMS into a suite of deployable health management technologies does add significant value.

  14. Data-Driven User Feedback: An Improved Neurofeedback Strategy considering the Interindividual Variability of EEG Features

    Directory of Open Access Journals (Sweden)

    Chang-Hee Han

    2016-01-01

    It has frequently been reported that some users of conventional neurofeedback systems can experience only a small portion of the total feedback range due to the large interindividual variability of EEG features. In this study, we proposed a data-driven neurofeedback strategy considering the individual variability of electroencephalography (EEG) features to permit users of the neurofeedback system to experience a wider range of auditory or visual feedback without a customization process. The main idea of the proposed strategy is to adjust the ranges of each feedback level using the density in the offline EEG database acquired from a group of individuals. Twenty-two healthy subjects participated in offline experiments to construct an EEG database, and five subjects participated in online experiments to validate the performance of the proposed data-driven user feedback strategy. Using the optimized bin sizes, the number of feedback levels that each individual experienced was significantly increased to 139% and 144% of the original results with uniform bin sizes in the offline and online experiments, respectively. Our results demonstrated that the use of our data-driven neurofeedback strategy could effectively increase the overall range of feedback levels that each individual experienced during neurofeedback training.
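
    One way to read "adjusting the ranges of each feedback level using the density" is equal-frequency binning of the pooled offline feature values, so that each feedback level covers the same share of the database. The sketch below illustrates only that reading; the paper's actual bin-size optimization is not given in this record, and the function names and sample values are hypothetical.

```python
from bisect import bisect_right


def density_based_edges(offline_values, n_levels):
    """Interior bin edges placed at equally spaced quantiles of the pooled data."""
    if n_levels < 2:
        raise ValueError("need at least two feedback levels")
    ordered = sorted(offline_values)
    n = len(ordered)
    if n < n_levels:
        raise ValueError("need at least one sample per feedback level")
    # Simple index rule: the k-th edge is the sample at position k*n/n_levels.
    return [ordered[(k * n) // n_levels] for k in range(1, n_levels)]


def feedback_level(value, edges):
    """Map a new feature value to a feedback level 0 .. len(edges)."""
    return bisect_right(edges, value)


# Skewed toy "database": uniform bins over [1, 20] would leave most samples
# in the lowest level, whereas density-based edges spread them out.
offline = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9, 12, 20]
edges = density_based_edges(offline, 4)
print(edges, [feedback_level(v, edges) for v in (2.5, 3, 4, 10)])
```

    With uniform bin sizes, a user whose feature values cluster in a narrow range reaches few levels; narrower bins in dense regions are what let each individual experience more of the feedback range.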

  15. Parameterized data-driven fuzzy model based optimal control of a semi-batch reactor.

    Science.gov (United States)

    Kamesh, Reddi; Rani, K Yamuna

    2016-09-01

    A parameterized data-driven fuzzy (PDDF) model structure is proposed for semi-batch processes, and its application for optimal control is illustrated. The orthonormally parameterized input trajectories, initial states and process parameters are the inputs to the model, which predicts the output trajectories in terms of Fourier coefficients. Fuzzy rules are formulated based on the signs of a linear data-driven model, while the defuzzification step incorporates a linear regression model to shift the domain from input to output domain. The fuzzy model is employed to formulate an optimal control problem for single rate as well as multi-rate systems. A simulation study on a multivariable semi-batch reactor system reveals that the proposed PDDF modeling approach captures the nonlinear and time-varying behavior inherent in the semi-batch system fairly accurately. The results of operating trajectory optimization using the proposed model are comparable to those obtained using the exact first principles model, and comparable to or better than those of parameterized data-driven artificial neural network model based optimization.

  16. KNMI DataLab experiences in serving data-driven innovations

    Science.gov (United States)

    Noteboom, Jan Willem; Sluiter, Raymond

    2016-04-01

    Climate change research and innovations in weather forecasting rely more and more on (Big) data. Besides increasing data from traditional sources (such as observation networks, radars and satellites), the use of open data, crowd sourced data and the Internet of Things (IoT) is emerging. To deploy these sources of data optimally in our services and products, KNMI has established a DataLab to serve data-driven innovations in collaboration with public and private sector partners. Big data management, data integration, data analytics including machine learning and data visualization techniques are playing an important role in the DataLab. Cross-domain data-driven innovations that arise from public-private collaborative projects and research programmes can be explored, experimented and/or piloted by the KNMI DataLab. Furthermore, advice can be requested on (Big) data techniques and data sources. In support of collaborative (Big) data science activities, scalable environments are offered with facilities for data integration, data analysis and visualization. In addition, Data Science expertise is provided directly or from a pool of internal and external experts. At the EGU conference, gained experiences and best practices are presented in operating the KNMI DataLab to serve data-driven innovations for weather and climate applications optimally.

  17. Data-Driven User Feedback: An Improved Neurofeedback Strategy considering the Interindividual Variability of EEG Features.

    Science.gov (United States)

    Han, Chang-Hee; Lim, Jeong-Hwan; Lee, Jun-Hak; Kim, Kangsan; Im, Chang-Hwan

    2016-01-01

    It has frequently been reported that some users of conventional neurofeedback systems can experience only a small portion of the total feedback range due to the large interindividual variability of EEG features. In this study, we proposed a data-driven neurofeedback strategy considering the individual variability of electroencephalography (EEG) features to permit users of the neurofeedback system to experience a wider range of auditory or visual feedback without a customization process. The main idea of the proposed strategy is to adjust the ranges of each feedback level using the density in the offline EEG database acquired from a group of individuals. Twenty-two healthy subjects participated in offline experiments to construct an EEG database, and five subjects participated in online experiments to validate the performance of the proposed data-driven user feedback strategy. Using the optimized bin sizes, the number of feedback levels that each individual experienced was significantly increased to 139% and 144% of the original results with uniform bin sizes in the offline and online experiments, respectively. Our results demonstrated that the use of our data-driven neurofeedback strategy could effectively increase the overall range of feedback levels that each individual experienced during neurofeedback training.

  18. A Neighborhood Story

    Science.gov (United States)

    Gerrish, Michael

    2009-01-01

    Blue collar doesn't have to mean drab and dull. At least, not to Troy, New York, historian Mike Esposito, who is a member of a neighborhood revitalization movement seeking to celebrate the people and events that brought diversity, prosperity, and vitality to this upstate New York community more than 100 years ago. Esposito and others invited…

  19. Reacting to Neighborhood Cues?

    DEFF Research Database (Denmark)

    Danckert, Bolette; Dinesen, Peter Thisted; Sønderskov, Kim Mannemar

    2017-01-01

    is founded on politically sophisticated individuals having a greater comprehension of news and other mass-mediated sources, which makes them less likely to rely on neighborhood cues as sources of information relevant for political attitudes. Based on a unique panel data set with fine-grained information...

  20. External radioactive markers for PET data-driven respiratory gating in positron emission tomography.

    Science.gov (United States)

    Büther, Florian; Ernst, Iris; Hamill, James; Eich, Hans T; Schober, Otmar; Schäfers, Michael; Schäfers, Klaus P

    2013-04-01

    Respiratory gating is an established approach to overcoming respiration-induced image artefacts in PET. Of special interest in this respect are raw PET data-driven gating methods which do not require additional hardware to acquire respiratory signals during the scan. However, these methods rely heavily on the quality of the acquired PET data (statistical properties, data contrast, etc.). We therefore combined external radioactive markers with data-driven respiratory gating in PET/CT. The feasibility and accuracy of this approach was studied for [(18)F]FDG PET/CT imaging in patients with malignant liver and lung lesions. PET data from 30 patients with abdominal or thoracic [(18)F]FDG-positive lesions (primary tumours or metastases) were included in this prospective study. The patients underwent a 10-min list-mode PET scan with a single bed position following a standard clinical whole-body [(18)F]FDG PET/CT scan. During this scan, one to three radioactive point sources (either (22)Na or (18)F, 50-100 kBq) in a dedicated holder were attached to the patient's abdomen. The list-mode data acquired were retrospectively analysed for respiratory signals using established data-driven gating approaches and additionally by tracking the motion of the point sources in sinogram space. Gated reconstructions were examined qualitatively, in terms of the amount of respiratory displacement and in respect of changes in local image intensity in the gated images. The presence of the external markers did not affect whole-body PET/CT image quality. Tracking of the markers led to characteristic respiratory curves in all patients. Applying these curves for gated reconstructions resulted in images in which motion was well resolved. Quantitatively, the performance of the external marker-based approach was similar to that of the best intrinsic data-driven methods. Overall, the gain in measured tumour uptake from the nongated to the gated images indicating successful removal of respiratory motion...

  1. Data driven analysis of rain events: feature extraction, clustering, microphysical /macro physical relationship

    Science.gov (United States)

    Djallel Dilmi, Mohamed; Mallet, Cécile; Barthes, Laurent; Chazottes, Aymeric

    2017-04-01

    ...that a rain time series can be considered as an alternation of independent rain events and no-rain periods. The five selected features are used to perform a hierarchical clustering of the events. The well-known division between stratiform and convective events appears clearly. This classification into two classes is then refined into 5 fairly homogeneous subclasses. The data driven analysis performed on whole rain events instead of fixed length samples allows identifying strong relationships between macrophysical (based on rain rate) and microphysical (based on raindrops) features. We show that some of the 5 identified subclasses have specific microphysical characteristics. Obtaining information on the microphysical characteristics of rainfall events from rain gauge measurements has many implications for the development of quantitative precipitation estimation (QPE) and for the improvement of rain rate retrieval algorithms in a remote sensing context.

  2. Walks on SPR neighborhoods.

    Science.gov (United States)

    Caceres, Alan Joseph J; Castillo, Juan; Lee, Jinnie; St John, Katherine

    2013-01-01

    A nearest-neighbor-interchange (NNI)-walk is a sequence of unrooted phylogenetic trees, T1, T2, . . . , T(k) where each consecutive pair of trees differs by a single NNI move. We give tight bounds on the length of the shortest NNI-walks that visit all trees in a subtree-prune-and-regraft (SPR) neighborhood of a given tree. For any unrooted, binary tree, T, on n leaves, the shortest walk takes Θ(n²) additional steps more than the number of trees in the SPR neighborhood. This answers Bryant’s Second Combinatorial Challenge from the Phylogenetics Challenges List, the Isaac Newton Institute, 2011, and the Penny Ante Problem List, 2009.

  3. Data-driven Inference and Investigation of Thermosphere Dynamics and Variations

    Science.gov (United States)

    Mehta, P. M.; Linares, R.

    2017-12-01

    This paper presents a methodology for data-driven inference and investigation of thermosphere dynamics and variations. The approach uses data-driven modal analysis to extract the most energetic modes of variations for neutral thermospheric species using proper orthogonal decomposition, where the time-independent modes or basis represent the dynamics and the time-dependent coefficients or amplitudes represent the model parameters. The data-driven modal analysis approach combined with sparse, discrete observations is used to infer amplitudes for the dynamic modes and to calibrate the energy content of the system. In this work, two different data types, namely the number density measurements from TIMED/GUVI and the mass density measurements from CHAMP/GRACE, are simultaneously ingested for an accurate and self-consistent specification of the thermosphere. The assimilation process is achieved with a non-linear least squares solver and allows estimation/tuning of the model parameters or amplitudes rather than the driver. In this work, we use the Naval Research Lab's MSIS model to derive the most energetic modes for six different species: He, O, N2, O2, H, and N. We examine the dominant drivers of variations for helium in MSIS and observe that seasonal latitudinal variation accounts for about 80% of the dynamic energy with a strong preference of helium for the winter hemisphere. We also observe enhanced helium presence near the poles at GRACE altitudes during periods of low solar activity (Feb 2007) as previously deduced. We will also examine the storm-time response of helium derived from observations. The results are expected to be useful in tuning/calibration of the physics-based models.

  4. Data-driven criteria to assess fear remission and phenotypic variability of extinction in rats.

    Science.gov (United States)

    Shumake, Jason; Jones, Carolyn; Auchter, Allison; Monfils, Marie-Hélène

    2018-03-19

    Fear conditioning is widely employed to examine the mechanisms that underlie dysregulations of the fear system. Various manipulations are often used following fear acquisition to attenuate fear memories. In rodent studies, freezing is often the main output measure to quantify 'fear'. Here, we developed data-driven criteria for defining a standard benchmark that indicates remission from conditioned fear and for identifying subgroups with differential treatment responses. These analyses will enable a better understanding of individual differences in treatment responding. This article is part of a discussion meeting issue 'Of mice and mental health: facilitating dialogue between basic and clinical neuroscientists'.

  5. Applying Data-driven Imaging Biomarker in Mammography for Breast Cancer Screening: Preliminary Study

    OpenAIRE

    Kim, Eun-Kyung; Kim, Hyo-Eun; Han, Kyunghwa; Kang, Bong Joo; Sohn, Yu-Mee; Woo, Ok Hee; Lee, Chan Wha

    2018-01-01

    We assessed the feasibility of a data-driven imaging biomarker based on weakly supervised learning (DIB; an imaging biomarker derived from large-scale medical image data with deep learning technology) in mammography (DIB-MG). A total of 29,107 digital mammograms from five institutions (4,339 cancer cases and 24,768 normal cases) were included. After matching patients’ age, breast density, and equipment, 1,238 and 1,238 cases were chosen as validation and test sets, respectively, and the remai...

  6. Building Data-Driven Pathways From Routinely Collected Hospital Data: A Case Study on Prostate Cancer

    Science.gov (United States)

    Clark, Jeremy; Cooper, Colin S; Mills, Robert; Rayward-Smith, Victor J; de la Iglesia, Beatriz

    2015-01-01

    Background Routinely collected data in hospitals is complex, typically heterogeneous, and scattered across multiple Hospital Information Systems (HIS). This big data, created as a byproduct of health care activities, has the potential to provide a better understanding of diseases, unearth hidden patterns, and improve services and cost. The extent and uses of such data rely on its quality, which is not consistently checked, nor fully understood. Nevertheless, using routine data for the construction of data-driven clinical pathways, describing processes and trends, is a key topic receiving increasing attention in the literature. Traditional algorithms do not cope well with unstructured processes or data, and do not produce clinically meaningful visualizations. Supporting systems that provide additional information, context, and quality assurance inspection are needed. Objective The objective of the study is to explore how routine hospital data can be used to develop data-driven pathways that describe the journeys that patients take through care, and their potential uses in biomedical research; it proposes a framework for the construction, quality assessment, and visualization of patient pathways for clinical studies and decision support using a case study on prostate cancer. Methods Data pertaining to prostate cancer patients were extracted from a large UK hospital from eight different HIS, validated, and complemented with information from the local cancer registry. Data-driven pathways were built for each of the 1904 patients and an expert knowledge base, containing rules on the prostate cancer biomarker, was used to assess the completeness and utility of the pathways for a specific clinical study. Software components were built to provide meaningful visualizations for the constructed pathways. Results The proposed framework and pathway formalism enable the summarization, visualization, and querying of complex patient-centric clinical information, as well as the

  7. PHYCAA: Data-driven measurement and removal of physiological noise in BOLD fMRI

    DEFF Research Database (Denmark)

    Churchill, Nathan W.; Yourganov, Grigori; Spring, Robyn

    2012-01-01

    , autocorrelated physiological noise sources with reproducible spatial structure, using an adaptation of Canonical Correlation Analysis performed in a split-half resampling framework. The technique is able to identify physiological effects with vascular-linked spatial structure, and an intrinsic dimensionality...... with physiological noise, and real data-driven model prediction and reproducibility, for both block and event-related task designs. This is demonstrated compared to no physiological noise correction, and to the widely used RETROICOR (Glover et al., 2000) physiological denoising algorithm, which uses externally...

  8. Classification Systems, their Digitization and Consequences for Data-Driven Decision Making

    DEFF Research Database (Denmark)

    Stein, Mari-Klara; Newell, Sue; Galliers, Robert D.

    2013-01-01

    Classification systems are foundational in many standardized software tools. This digitization of classification systems gives them a new ‘materiality’ that, jointly with the social practices of information producers/consumers, has significant consequences on the representational quality of such ...... and the foundational role of representational quality in understanding the success and consequences of data-driven decision-making.......-narration and meta-narration), and three different information production/consumption situations. We contribute to the relational theorization of representational quality and extend classification systems research by drawing explicit attention to the importance of ‘materialization’ of classification systems...

  9. Data-driven Discovery: A New Era of Exploiting the Literature and Data

    Directory of Open Access Journals (Sweden)

    Ying Ding

    2016-11-01

    In the current data-intensive era, the traditional hands-on method of conducting scientific research by exploring related publications to generate a testable hypothesis is well on its way to becoming obsolete within just a year or two. Analyzing the literature and data to automatically generate a hypothesis might become the de facto approach to inform the core research efforts of those trying to master the exponentially rapid expansion of publications and datasets. Here, viewpoints are provided and discussed to help the understanding of challenges of data-driven discovery.

  10. A data driven method to measure electron charge mis-identification rate

    CERN Document Server

    Bakhshiansohi, Hamed

    2009-01-01

    Electron charge mis-measurement is an important challenge in analyses that depend on the charge of the electron. To estimate the probability of electron charge mis-measurement, a data-driven method is introduced, and good agreement with MC-based methods is achieved. The third moment of the $\phi$ distribution of hits in the electron SuperCluster is studied. The correlation between this variable and the electron charge is also investigated. Using this 'new' variable and some other variables, the electron charge measurement is improved by two different approaches.

  11. A data-driven soft sensor for needle deflection in heterogeneous tissue using just-in-time modelling.

    Science.gov (United States)

    Rossa, Carlos; Lehmann, Thomas; Sloboda, Ronald; Usmani, Nawaid; Tavakoli, Mahdi

    2017-08-01

    Global modelling has traditionally been the approach taken to estimate needle deflection in soft tissue. In this paper, we propose a new method based on local data-driven modelling of needle deflection. External measurements of needle-tissue interactions are collected from several insertions in ex vivo tissue to form a cloud of data. Inputs to the system are the needle insertion depth, axial rotations, and the forces and torques measured at the needle base by a force sensor. When a new insertion is performed, the just-in-time learning method estimates the model outputs given the current inputs to the needle-tissue system and the historical database. The query is compared to every observation in the database, and each observation is weighted according to a similarity criterion. Only the subset of historical data that is most relevant to the query is selected, and a local linear model is fit to the selected points to estimate the query output. The model outputs the 3D deflection of the needle tip and the needle insertion force. The proposed approach is validated in ex vivo multilayered biological tissue in different needle insertion scenarios. Experimental results in five different case studies indicate an accuracy in predicting needle deflection of 0.81 and 1.24 mm in the horizontal and vertical planes, respectively, and an accuracy of 0.5 N in predicting the needle insertion force over 216 needle insertions.
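
    The just-in-time steps in this abstract (compare the query with the database, weight by similarity, keep the most relevant subset, fit a local linear model) can be sketched as follows. This is a minimal sketch under simplifying assumptions: a single scalar input stands in for the full input vector, and the function name `jit_predict`, the inverse-square similarity weight, and the parameters `k` and `bandwidth` are hypothetical choices, not taken from the paper.

```python
def jit_predict(database, query, k=5, bandwidth=1.0):
    """Predict the output at `query` from (input, output) pairs in `database`.

    Hypothetical scalar-input illustration of just-in-time learning.
    """
    # Keep only the k historical observations closest to the query.
    nearest = sorted(database, key=lambda obs: abs(obs[0] - query))[:k]
    # Similarity weights: closer observations count more (assumed weight form).
    weights = [1.0 / (1.0 + ((x - query) / bandwidth) ** 2) for x, _ in nearest]
    total = sum(weights)
    x_mean = sum(w * x for w, (x, _) in zip(weights, nearest)) / total
    y_mean = sum(w * y for w, (_, y) in zip(weights, nearest)) / total
    sxx = sum(w * (x - x_mean) ** 2 for w, (x, _) in zip(weights, nearest))
    if sxx == 0.0:
        # All selected inputs coincide: fall back to the weighted mean output.
        return y_mean
    sxy = sum(w * (x - x_mean) * (y - y_mean)
              for w, (x, y) in zip(weights, nearest))
    # Weighted local linear model evaluated at the query point.
    return y_mean + (sxy / sxx) * (query - x_mean)

# Toy database lying exactly on the line y = 2x + 1.
history = [(float(x), 2.0 * x + 1.0) for x in range(10)]
estimate = jit_predict(history, 2.5, k=4)
```

    Because the model is refit for every query, no global model has to be maintained; the cost is a database search on each prediction.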

  12. Neighborhood preference, walkability and walking in overweight/obese men.

    Science.gov (United States)

    Norman, Gregory J; Carlson, Jordan A; O'Mara, Stephanie; Sallis, James F; Patrick, Kevin; Frank, Lawrence D; Godbole, Suneeta V

    2013-03-01

    To investigate whether self-selection moderated the effects of walkability on walking in overweight and obese men. 240 overweight and obese men completed measures on importance of walkability when choosing a neighborhood (selection) and preference for walkable features in general (preference). IPAQ measured walking. A walkability index was derived from geographic information systems (GIS). Walkability was associated with walking for transportation (p = .027) and neighborhood selection was associated with walking for transportation (p = .002) and total walking (p = .001). Preference was associated with leisure walking (p = .045) and preference moderated the relationship between walkability and total walking (p = .059). Walkability and self-selection are both important to walking behavior.

  13. A data-driven approach to reverse engineering customer engagement models: towards functional constructs.

    Directory of Open Access Journals (Sweden)

    Natalie Jane de Vries

    Online consumer behavior in general and online customer engagement with brands in particular, has become a major focus of research activity fuelled by the exponential increase of interactive functions of the internet and social media platforms and applications. Current research in this area is mostly hypothesis-driven and much debate about the concept of Customer Engagement and its related constructs remains existent in the literature. In this paper, we aim to propose a novel methodology for reverse engineering a consumer behavior model for online customer engagement, based on a computational and data-driven perspective. This methodology could be generalized and prove useful for future research in the fields of consumer behaviors using questionnaire data or studies investigating other types of human behaviors. The method we propose contains five main stages: symbolic regression analysis, graph building, community detection, evaluation of results and finally, investigation of directed cycles and common feedback loops. The 'communities' of questionnaire items that emerge from our community detection method form possible 'functional constructs' inferred from data rather than assumed from literature and theory. Our results show consistent partitioning of questionnaire items into such 'functional constructs' suggesting the method proposed here could be adopted as a new data-driven way of human behavior modeling.

  14. A Data-driven Concept Schema for Defining Clinical Research Data Needs

    Science.gov (United States)

    Hruby, Gregory W.; Hoxha, Julia; Ravichandran, Praveen Chandar; Mendonça, Eneida A.; Hanauer, David A; Weng, Chunhua

    2016-01-01

    OBJECTIVES The Patient, Intervention, Control/Comparison, and Outcome (PICO) framework is an effective technique for framing a clinical question. We aim to develop the counterpart of PICO to structure clinical research data needs. METHODS We use a data-driven approach to abstracting key concepts representing clinical research data needs by adapting and extending an expert-derived framework originally developed for defining cancer research data needs. We annotated clinical trial eligibility criteria, EHR data request logs, and data queries to electronic health records (EHR), to extract and harmonize concept classes representing clinical research data needs. We evaluated the class coverage, class preservation from the original framework, schema generalizability, schema understandability, and schema structural correctness through a semi-structured interview with eight multidisciplinary domain experts. We iteratively refined the schema based on the evaluations. RESULTS Our data-driven schema preserved 68% of the 63 classes from the original framework and covered 88% (73/82) of the classes proposed by evaluators. Class coverage for participants of different backgrounds ranged from 60% to 100% with a median value of 95% agreement among the individual evaluators. The schema was found understandable and structurally sound. CONCLUSIONS Our proposed schema may serve as the counterpart to PICO for improving the research data needs communication between researchers and informaticians. PMID:27185504

  15. A copula-based sampling method for data-driven prognostics

    International Nuclear Information System (INIS)

    Xi, Zhimin; Jing, Rong; Wang, Pingfeng; Hu, Chao

    2014-01-01

    This paper develops a Copula-based sampling method for data-driven prognostics. The method essentially consists of an offline training process and an online prediction process: (i) the offline training process builds a statistical relationship between the failure time and the time realizations at specified degradation levels on the basis of off-line training data sets; and (ii) the online prediction process identifies probable failure times for online testing units based on the statistical model constructed in the offline process and the online testing data. Our contributions in this paper are three-fold, namely the definition of a generic health index system to quantify the health degradation of an engineering system, the construction of a Copula-based statistical model to learn the statistical relationship between the failure time and the time realizations at specified degradation levels, and the development of a simulation-based approach for the prediction of remaining useful life (RUL). Two engineering case studies, namely the electric cooling fan health prognostics and the 2008 IEEE PHM challenge problem, are employed to demonstrate the effectiveness of the proposed methodology. - Highlights: • We develop a novel mechanism for data-driven prognostics. • A generic health index system quantifies health degradation of engineering systems. • Off-line training model is constructed based on the Bayesian Copula model. • Remaining useful life is predicted from a simulation-based approach

  16. Data-Driven Engineering of Social Dynamics: Pattern Matching and Profit Maximization.

    Science.gov (United States)

    Peng, Huan-Kai; Lee, Hao-Chih; Pan, Jia-Yu; Marculescu, Radu

    2016-01-01

    In this paper, we define a new problem related to social media, namely, the data-driven engineering of social dynamics. More precisely, given a set of observations from the past, we aim at finding the best short-term intervention that can lead to predefined long-term outcomes. Toward this end, we propose a general formulation that covers two useful engineering tasks as special cases, namely, pattern matching and profit maximization. By incorporating a deep learning model, we derive a solution using convex relaxation and quadratic-programming transformation. Moreover, we propose a data-driven evaluation method in place of the expensive field experiments. Using a Twitter dataset, we demonstrate the effectiveness of our dynamics engineering approach for both pattern matching and profit maximization, and study the multifaceted interplay among several important factors of dynamics engineering, such as solution validity, pattern-matching accuracy, and intervention cost. Finally, the method we propose is general enough to work with multi-dimensional time series, so it can potentially be used in many other applications.

  17. General Purpose Data-Driven Online System Health Monitoring with Applications to Space Operations

    Science.gov (United States)

    Iverson, David L.; Spirkovska, Lilly; Schwabacher, Mark

    2010-01-01

    Modern space transportation and ground support system designs are becoming increasingly sophisticated and complex. Determining the health state of these systems using traditional parameter limit checking, or model-based or rule-based methods is becoming more difficult as the number of sensors and component interactions grows. Data-driven monitoring techniques have been developed to address these issues by analyzing system operations data to automatically characterize normal system behavior. System health can be monitored by comparing real-time operating data with these nominal characterizations, providing detection of anomalous data signatures indicative of system faults, failures, or precursors of significant failures. The Inductive Monitoring System (IMS) is a general purpose, data-driven system health monitoring software tool that has been successfully applied to several aerospace applications and is under evaluation for anomaly detection in vehicle and ground equipment for next generation launch systems. After an introduction to IMS application development, we discuss these NASA online monitoring applications, including the integration of IMS with complementary model-based and rule-based methods. Although the examples presented in this paper are from space operations applications, IMS is a general-purpose health-monitoring tool that is also applicable to power generation and transmission system monitoring.

  18. A data-driven approach to reverse engineering customer engagement models: towards functional constructs.

    Science.gov (United States)

    de Vries, Natalie Jane; Carlson, Jamie; Moscato, Pablo

    2014-01-01

    Online consumer behavior in general and online customer engagement with brands in particular, has become a major focus of research activity fuelled by the exponential increase of interactive functions of the internet and social media platforms and applications. Current research in this area is mostly hypothesis-driven and much debate about the concept of Customer Engagement and its related constructs remains existent in the literature. In this paper, we aim to propose a novel methodology for reverse engineering a consumer behavior model for online customer engagement, based on a computational and data-driven perspective. This methodology could be generalized and prove useful for future research in the fields of consumer behaviors using questionnaire data or studies investigating other types of human behaviors. The method we propose contains five main stages; symbolic regression analysis, graph building, community detection, evaluation of results and finally, investigation of directed cycles and common feedback loops. The 'communities' of questionnaire items that emerge from our community detection method form possible 'functional constructs' inferred from data rather than assumed from literature and theory. Our results show consistent partitioning of questionnaire items into such 'functional constructs' suggesting the method proposed here could be adopted as a new data-driven way of human behavior modeling.

  19. Data-driven risk identification in phase III clinical trials using central statistical monitoring.

    Science.gov (United States)

    Timmermans, Catherine; Venet, David; Burzykowski, Tomasz

    2016-02-01

    Our interest lies in quality control for clinical trials, in the context of risk-based monitoring (RBM). We specifically study the use of central statistical monitoring (CSM) to support RBM. Under an RBM paradigm, we claim that CSM has a key role to play in identifying the "risks to the most critical data elements and processes" that will drive targeted oversight. In order to support this claim, we first see how to characterize the risks that may affect clinical trials. We then discuss how CSM can be understood as a tool for providing a set of data-driven key risk indicators (KRIs), which help to organize adaptive targeted monitoring. Several case studies are provided where issues in a clinical trial have been identified thanks to targeted investigation after the identification of a risk using CSM. Using CSM to build data-driven KRIs helps to identify different kinds of issues in clinical trials. This ability is directly linked with the exhaustiveness of the CSM approach and its flexibility in the definition of the risks that are searched for when identifying the KRIs. In practice, a CSM assessment of the clinical database seems essential to ensure data quality. The atypical data patterns found in some centers and variables are seen as KRIs under a RBM approach. Targeted monitoring or data management queries can be used to confirm whether the KRIs point to an actual issue or not.

  20. Data-driven integration of genome-scale regulatory and metabolic network models

    Science.gov (United States)

    Imam, Saheed; Schäuble, Sascha; Brooks, Aaron N.; Baliga, Nitin S.; Price, Nathan D.

    2015-01-01

    Microbes are diverse and extremely versatile organisms that play vital roles in all ecological niches. Understanding and harnessing microbial systems will be key to the sustainability of our planet. One approach to improving our knowledge of microbial processes is through data-driven and mechanism-informed computational modeling. Individual models of biological networks (such as metabolism, transcription, and signaling) have played pivotal roles in driving microbial research through the years. These networks, however, are highly interconnected and function in concert—a fact that has led to the development of a variety of approaches aimed at simulating the integrated functions of two or more network types. Though the task of integrating these different models is fraught with new challenges, the large amounts of high-throughput data sets being generated, and algorithms being developed, means that the time is at hand for concerted efforts to build integrated regulatory-metabolic networks in a data-driven fashion. In this perspective, we review current approaches for constructing integrated regulatory-metabolic models and outline new strategies for future development of these network models for any microbial system. PMID:25999934

  1. Data-driven HR: how to use analytics and metrics to drive performance

    CERN Document Server

    Marr, Bernard

    2018-01-01

    Traditionally seen as a purely people function unconcerned with numbers, HR is now uniquely placed to use company data to drive performance, both of the people in the organization and the organization as a whole. Data-driven HR is a practical guide which enables HR practitioners to leverage the value of the vast amount of data available at their fingertips. Covering how to identify the most useful sources of data, how to collect information in a transparent way that is in line with data protection requirements and how to turn this data into tangible insights, this book marks a turning point for the HR profession. Covering all the key elements of HR including recruitment, employee engagement, performance management, wellbeing and training, Data-driven HR examines the ways data can contribute to organizational success by, among other things, optimizing processes, driving performance and improving HR decision making. Packed with case studies and real-life examples, this is essential reading for all HR profession...

  2. Data-driven integration of genome-scale regulatory and metabolic network models

    Directory of Open Access Journals (Sweden)

    Saheed Imam

    2015-05-01

    Microbes are diverse and extremely versatile organisms that play vital roles in all ecological niches. Understanding and harnessing microbial systems will be key to the sustainability of our planet. One approach to improving our knowledge of microbial processes is through data-driven and mechanism-informed computational modeling. Individual models of biological networks (such as metabolism, transcription and signaling) have played pivotal roles in driving microbial research through the years. These networks, however, are highly interconnected and function in concert – a fact that has led to the development of a variety of approaches aimed at simulating the integrated functions of two or more network types. Though the task of integrating these different models is fraught with new challenges, the large amounts of high-throughput data sets being generated, and algorithms being developed, means that the time is at hand for concerted efforts to build integrated regulatory-metabolic networks in a data-driven fashion. In this perspective, we review current approaches for constructing integrated regulatory-metabolic models and outline new strategies for future development of these network models for any microbial system.

  3. Data-driven CT protocol review and management—experience from a large academic hospital.

    Science.gov (United States)

    Zhang, Da; Savage, Cristy A; Li, Xinhua; Liu, Bob

    2015-03-01

    Protocol review plays a critical role in CT quality assurance, but large numbers of protocols and inconsistent protocol names on scanners and in exam records make thorough protocol review formidable. In this investigation, we report on a data-driven cataloging process that can be used to assist in the reviewing and management of CT protocols. We collected lists of scanner protocols, as well as 18 months of recent exam records, for 10 clinical scanners. We developed computer algorithms to automatically deconstruct the protocol names on the scanner and in the exam records into core names and descriptive components. Based on the core names, we were able to group the scanner protocols into a much smaller set of "core protocols," and to easily link exam records with the scanner protocols. We calculated the percentage of usage for each core protocol, from which the most heavily used protocols were identified. From the percentage-of-usage data, we found that, on average, 18, 33, and 49 core protocols per scanner covered 80%, 90%, and 95%, respectively, of all exams. These numbers are one order of magnitude smaller than the typical numbers of protocols that are loaded on a scanner (200-300, as reported in the literature). Duplicated, outdated, and rarely used protocols on the scanners were easily pinpointed in the cataloging process. The data-driven cataloging process can facilitate the task of protocol review. Copyright © 2015 American College of Radiology. Published by Elsevier Inc. All rights reserved.
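
    The percentage-of-usage analysis in this abstract can be illustrated with a short sketch: given one core protocol name per exam record, count how many of the most heavily used core protocols are needed to cover a target share of all exams. The function name `protocols_needed` and the toy exam list are hypothetical, and the name-deconstruction step that produces core names is not reproduced here because the abstract does not give its rules.

```python
from collections import Counter

def protocols_needed(exam_core_names, coverage):
    """Smallest number of most-used core protocols covering `coverage` of exams.

    exam_core_names holds one core protocol name per exam record
    (hypothetical input format); coverage is a fraction such as 0.9.
    """
    counts = Counter(exam_core_names)
    total = sum(counts.values())
    covered = 0
    # Walk through core protocols from most to least used.
    for rank, (_, uses) in enumerate(counts.most_common(), start=1):
        covered += uses
        if covered >= coverage * total:
            return rank
    return len(counts)

# Toy exam records: 6 head, 3 chest, 1 abdomen exam.
exams = ["head"] * 6 + ["chest"] * 3 + ["abdomen"]
needed_for_85_percent = protocols_needed(exams, 0.85)
```

    Core protocols that fall outside the chosen coverage level are natural candidates for review as rarely used or outdated entries.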

  4. Data-driven approach for assessing utility of medical tests using electronic medical records.

    Science.gov (United States)

    Skrøvseth, Stein Olav; Augestad, Knut Magne; Ebadollahi, Shahram

    2015-02-01

    To precisely define the utility of tests in a clinical pathway through data-driven analysis of the electronic medical record (EMR). The information content was defined in terms of the entropy of the expected value of the test related to a given outcome. A kernel density classifier was used to estimate the necessary distributions. To validate the method, we used data from the EMR of the gastrointestinal department at a university hospital. Blood tests from patients undergoing gastrointestinal surgery were analyzed with respect to second surgery within 30 days of the index surgery. The information content is clearly reflected in the patient pathway for certain combinations of tests and outcomes. C-reactive protein tests coupled to anastomosis leakage, a severe complication, show a clear pattern of information gain through the patient trajectory, where the greatest gain from the test is 3-4 days post index surgery. We have defined the information content in a data-driven and information-theoretic way such that the utility of a test can be precisely defined. The results reflect clinical knowledge. In the case we used, the tests carry little negative impact. The general approach can be expanded to cases that carry a substantial negative impact, such as in certain radiological techniques. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.
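
    One common way to make "information gain from a test" concrete is the reduction in outcome entropy once the test result is known. The sketch below computes that quantity from discretized test results. It is an assumption-laden illustration: the study estimated the required distributions with a kernel density classifier, whereas this sketch swaps in simple counting over categories, and the names `entropy` and `information_gain` are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits of a list of categorical labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(test_results, outcomes):
    """Outcome entropy minus expected outcome entropy given the test result."""
    n = len(outcomes)
    groups = {}
    # Group outcomes by the (discretized) test result observed with them.
    for result, outcome in zip(test_results, outcomes):
        groups.setdefault(result, []).append(outcome)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(outcomes) - conditional

# Toy data: 1 = second surgery within 30 days, 0 = none.
outcomes = [0, 0, 1, 1]
informative = information_gain(["low", "low", "high", "high"], outcomes)
uninformative = information_gain(["low", "high", "low", "high"], outcomes)
```

    A test whose result separates the outcomes perfectly gains the full outcome entropy, while a test unrelated to the outcome gains nothing.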

  5. Data-Driven Engineering of Social Dynamics: Pattern Matching and Profit Maximization.

    Directory of Open Access Journals (Sweden)

    Huan-Kai Peng

    In this paper, we define a new problem related to social media, namely, the data-driven engineering of social dynamics. More precisely, given a set of observations from the past, we aim at finding the best short-term intervention that can lead to predefined long-term outcomes. Toward this end, we propose a general formulation that covers two useful engineering tasks as special cases, namely, pattern matching and profit maximization. By incorporating a deep learning model, we derive a solution using convex relaxation and quadratic-programming transformation. Moreover, we propose a data-driven evaluation method in place of the expensive field experiments. Using a Twitter dataset, we demonstrate the effectiveness of our dynamics engineering approach for both pattern matching and profit maximization, and study the multifaceted interplay among several important factors of dynamics engineering, such as solution validity, pattern-matching accuracy, and intervention cost. Finally, the method we propose is general enough to work with multi-dimensional time series, so it can potentially be used in many other applications.

  6. Data-Driven Engineering of Social Dynamics: Pattern Matching and Profit Maximization

    Science.gov (United States)

    Peng, Huan-Kai; Lee, Hao-Chih; Pan, Jia-Yu; Marculescu, Radu

    2016-01-01

    In this paper, we define a new problem related to social media, namely, the data-driven engineering of social dynamics. More precisely, given a set of observations from the past, we aim at finding the best short-term intervention that can lead to predefined long-term outcomes. Toward this end, we propose a general formulation that covers two useful engineering tasks as special cases, namely, pattern matching and profit maximization. By incorporating a deep learning model, we derive a solution using convex relaxation and quadratic-programming transformation. Moreover, we propose a data-driven evaluation method in place of the expensive field experiments. Using a Twitter dataset, we demonstrate the effectiveness of our dynamics engineering approach for both pattern matching and profit maximization, and study the multifaceted interplay among several important factors of dynamics engineering, such as solution validity, pattern-matching accuracy, and intervention cost. Finally, the method we propose is general enough to work with multi-dimensional time series, so it can potentially be used in many other applications. PMID:26771830

  7. Data-driven directions for effective footwear provision for the high-risk diabetic foot.

    Science.gov (United States)

    Arts, M L J; de Haart, M; Waaijman, R; Dahmen, R; Berendsen, H; Nollet, F; Bus, S A

    2015-06-01

    Custom-made footwear is used to offload the diabetic foot to prevent plantar foot ulcers. This prospective study evaluates the offloading effects of modifying custom-made footwear and aims to provide data-driven directions for the provision of effectively offloading footwear in clinical practice. Eighty-five people with diabetic neuropathy and a recently healed plantar foot ulcer, who participated in a clinical trial on footwear effectiveness, had their custom-made footwear evaluated with in-shoe plantar pressure measurements at three-monthly intervals. Footwear was modified when peak pressure was ≥ 200 kPa. The effect of single and combined footwear modifications on in-shoe peak pressure at these high-pressure target locations was assessed. All footwear modifications significantly reduced peak pressure at the target locations compared with pre-modification levels (range -6.7% to -24.0%, P diabetic neuropathy and a recently healed plantar foot ulcer, significant offloading can be achieved at high-risk foot regions by modifying custom-made footwear. These results provide data-driven directions for the design and evaluation of custom-made footwear for high-risk people with diabetes, and essentially mean that each shoe prescribed should incorporate those design features that effectively offload the foot. © 2015 The Authors. Diabetic Medicine © 2015 Diabetes UK.

  8. Microenvironment temperature prediction between body and seat interface using autoregressive data-driven model.

    Science.gov (United States)

    Liu, Zhuofu; Wang, Lin; Luo, Zhongming; Heusch, Andrew I; Cascioli, Vincenzo; McCarthy, Peter W

    2015-11-01

    There is a need to develop a greater understanding of temperature at the skin-seat interface during prolonged seating from the perspectives of both industrial design (comfort/discomfort) and medical care (skin ulcer formation). Here we test the concept of predicting temperature at the seat surface and skin interface during prolonged sitting (such as is required of wheelchair users). As caregivers are usually busy, such a method would give them warning ahead of a problem. This paper describes a data-driven model capable of predicting thermal changes and thus having the potential to provide an early warning (15- to 25-min ahead prediction) of an impending temperature that may increase the risk of skin damage for those subject to enforced sitting and who have little or no sensory feedback from this area. Initially, the oscillations of the original signal are suppressed using the reconstruction strategy of empirical mode decomposition (EMD). Consequently, the autoregressive data-driven model can be used to predict future thermal trends based on a shorter period of acquisition, which reduces the possibility of introducing human errors and artefacts associated with longer duration "enforced" sitting by volunteers. In this study, the method had a maximum predictive error of [...] body insensitivity and disability requiring them to be immobile in seats for prolonged periods. Copyright © 2015 Tissue Viability Society. Published by Elsevier Ltd. All rights reserved.
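    The autoregressive prediction step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' model: it assumes the simplest possible case, a first-order model x[t+1] = c + phi * x[t] fitted by ordinary least squares, and it skips the EMD smoothing stage entirely. The function names `fit_ar1` and `forecast_ar1` are hypothetical.

```python
def fit_ar1(series):
    """Fit x[t+1] = c + phi * x[t] by ordinary least squares.

    Assumption for illustration only: a first-order model; the model
    order used in the paper is not stated in the abstract.
    """
    x = series[:-1]          # predictors: x[t]
    y = series[1:]           # targets:    x[t+1]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx          # slope; fails if the series is constant
    c = mean_y - phi * mean_x
    return c, phi


def forecast_ar1(last_value, steps, c, phi):
    """Iterate the fitted recursion to predict `steps` samples ahead."""
    predictions = []
    value = last_value
    for _ in range(steps):
        value = c + phi * value
        predictions.append(value)
    return predictions


# Hypothetical temperature-like samples, not data from the study.
samples = [0.0, 2.0, 3.0, 3.5, 3.75, 3.875]
c, phi = fit_ar1(samples)
ahead = forecast_ar1(samples[-1], 2, c, phi)
```

    With a fixed sampling interval, the number of forecast steps maps directly to a prediction horizon, which is how a multi-step forecast of this kind could provide advance warning.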

  9. NOvA Event Building, Buffering and Data-Driven Triggering From Within the DAQ System

    Energy Technology Data Exchange (ETDEWEB)

    Fischler, M. [Fermilab]; Green, C. [Fermilab]; Kowalkowski, J. [Fermilab]; Norman, A. [Fermilab]; Paterno, M. [Fermilab]; Rechenmacher, R. [Fermilab]

    2012-06-22

    To make its core measurements, the NOvA experiment needs to make real-time data-driven decisions involving beam-spill time correlation and other triggering issues. NOvA-DDT is a prototype Data-Driven Triggering system, built using the Fermilab artdaq generic DAQ/Event-building toolset. This provides the advantages of sharing online software infrastructure with other Intensity Frontier experiments, and of being able to use any offline analysis module, unchanged, as a component of the online triggering decisions. The NOvA-artdaq architecture chosen has significant advantages, including graceful degradation if the triggering decision software fails or cannot be done quickly enough for some fraction of the time-slice "events." We have tested and measured the performance and overhead of NOvA-DDT using an actual Hough transform based trigger decision module taken from the NOvA offline software. The results of these tests (98 ms mean time per event on only 1/16 of the available processing power of a node, and overheads of about 2 ms per event) provide a proof of concept: NOvA-DDT is a viable strategy for data acquisition, event building, and trigger processing at the NOvA far detector.

  10. The effects of data-driven learning activities on EFL learners' writing development.

    Science.gov (United States)

    Luo, Qinqin

    2016-01-01

    Data-driven learning has been proved as an effective approach in helping learners solve various writing problems such as correcting lexical or grammatical errors, improving the use of collocations and generating ideas in writing, etc. This article reports on an empirical study in which data-driven learning was accomplished with the assistance of the user-friendly BNCweb, and presents the evaluation of the outcome by comparing the effectiveness of BNCweb and a search engine Baidu which is most commonly used as reference resource by Chinese learners of English as a foreign language. The quantitative results about 48 Chinese college students revealed that the experimental group which used BNCweb performed significantly better in the post-test in terms of writing fluency and accuracy, as compared with the control group which used the search engine Baidu. However, no significant difference was found between the two groups in terms of writing complexity. The qualitative results about the interview revealed that learners generally showed a positive attitude toward the use of BNCweb but there were still some problems of using corpora in the writing process, thus the combined use of corpora and other types of reference resource was suggested as a possible way to counter the potential barriers for Chinese learners of English.

  11. Positive Neighborhood Norms Buffer Ethnic Diversity Effects on Neighborhood Dissatisfaction, Perceived Neighborhood Disadvantage, and Moving Intentions.

    Science.gov (United States)

    Van Assche, Jasper; Asbrock, Frank; Roets, Arne; Kauff, Mathias

    2018-05-01

    Positive neighborhood norms, such as strong local networks, are critical to people's satisfaction with, perceived disadvantage of, and intentions to stay in their neighborhood. At the same time, local ethnic diversity is said to be detrimental for these community outcomes. Integrating both frameworks, we tested whether the negative consequences of diversity occur even when perceived social norms are positive. Study 1 ( N = 1,760 German adults) showed that perceptions of positive neighborhood norms buffered against the effects of perceived diversity on moving intentions via neighborhood satisfaction and perceived neighborhood disadvantage. Study 2 ( N = 993 Dutch adults) replicated and extended this moderated mediation model using other characteristics of diversity (i.e., objective and estimated minority proportions). Multilevel analyses again revealed consistent buffering effects of positive neighborhood norms. Our findings are discussed in light of the ongoing public and political debate concerning diversity and social and communal life.

  12. Health, Safety and Environment (HSE) assessment of neighborhoods: A case study in Tehran Municipality

    Directory of Open Access Journals (Sweden)

    Narmin Hassanzadeh- Rangi

    2014-09-01

    Urbanization has grown rapidly in recent centuries. This phenomenon can cause many changes in various aspects of human life, including the economy, education and public health. This study was conducted to assess the Health, Safety and Environment (HSE) problems in Tehran neighborhoods. A new instrument was developed based on the results of a literature review and formulated during a pilot study. Through cluster sampling, 10 neighborhoods were selected from the 374 neighborhoods of Tehran. Six observers completed observational items during the field studies. Secondary data were used to obtain non-observational characteristics. Standard descriptive statistics were used to compare the HSE characteristics in the sampled neighborhoods. Furthermore, a control chart was used as a decision rule to identify specific variation among the sampled neighborhoods. The Niavaran neighborhood had the best HSE status (52.80%±25.03), whereas the Khak Sefid neighborhood had the worst (20.09%±27.51). Standard deviations of HSE characteristics were high in different parts of a neighborhood. Statistical analysis indicated that significant differences in HSE characteristics exist among the sampled neighborhoods. HSE status was at a warning level in both rich and poor neighborhoods. Community-based interventions were suggested as health promotion programs to involve and empower people in neighborhoods.

  13. Examining public open spaces by neighborhood-level walkability and deprivation.

    Science.gov (United States)

    Badland, Hannah M; Keam, Rosanna; Witten, Karen; Kearns, Robin

    2010-11-01

    Public open spaces (POS) are recognized as important to promote physical activity engagement. However, it is unclear how POS attributes, such as activities available, environmental quality, amenities present, and safety, are associated with neighborhood-level walkability and deprivation. Twelve neighborhoods were selected within 1 constituent city of Auckland, New Zealand based on higher (n = 6) or lower (n = 6) walkability characteristics. Neighborhoods were dichotomized as more (n = 7) or less (n = 5) socioeconomically deprived. POS (n = 69) were identified within these neighborhoods and audited using the New Zealand-Public Open Space Tool. Unpaired 1-way analysis of variance tests were applied to compare differences in attributes and overall score of POS by neighborhood walkability and deprivation. POS located in more walkable neighborhoods have significantly higher overall scores when compared with less walkable neighborhoods. Deprivation comparisons identified POS located in less deprived communities have better quality environments, but fewer activities and safety features present when compared with more deprived neighborhoods. A positive relationship existed between presence of POS attributes and neighborhood walkability, but the relationship between POS and neighborhood-level deprivation was less clear. Variation in neighborhood POS quality alone is unlikely to explain poorer health outcomes for residents in more deprived areas.

  14. Fault Detection for Nonlinear Process With Deterministic Disturbances: A Just-In-Time Learning Based Data Driven Method.

    Science.gov (United States)

    Yin, Shen; Gao, Huijun; Qiu, Jianbin; Kaynak, Okyay

    2017-11-01

    Data-driven fault detection plays an important role in industrial systems due to its applicability when physical models are unknown. In fault detection, disturbances must be taken into account as an inherent characteristic of processes. Nevertheless, fault detection for nonlinear processes with deterministic disturbances still receives little attention, especially in the data-driven field. To solve this problem, a just-in-time learning-based data-driven (JITL-DD) fault detection method for nonlinear processes with deterministic disturbances is proposed in this paper. JITL-DD employs a JITL scheme for process description with local model structures to cope with process dynamics and nonlinearity. The proposed method provides a data-driven fault detection solution for nonlinear processes with deterministic disturbances, and offers inherent online adaptation and high fault detection accuracy. Two nonlinear systems, i.e., a numerical example and a sewage treatment process benchmark, are employed to show the effectiveness of the proposed method.

  15. China’s Neighborhood Environment and Options for Neighborhood Strategy

    Institute of Scientific and Technical Information of China (English)

    ZHOU FANGYIN

    2016-01-01

    Since the 18th CPC National Congress, especially since the Central Conference on Work Relating to Neighborhood Diplomacy held in October 2013, China’s neighborhood diplomacy has been energetic, proactive and promising, achieving important results in several aspects. At the same time, it is also in face of challenges

  16. Global retrieval of soil moisture and vegetation properties using data-driven methods

    Science.gov (United States)

    Rodriguez-Fernandez, Nemesio; Richaume, Philippe; Kerr, Yann

    2017-04-01

    Data-driven methods such as neural networks (NNs) are a powerful tool to retrieve soil moisture from multi-wavelength remote sensing observations at global scale. In this presentation we will review a number of recent results regarding the retrieval of soil moisture with the Soil Moisture and Ocean Salinity (SMOS) satellite, either using SMOS brightness temperatures as input data for the retrieval or using SMOS soil moisture retrievals as reference dataset for the training. The presentation will discuss several possibilities for both the input datasets and the datasets to be used as reference for the supervised learning phase. Regarding the input datasets, it will be shown that NNs take advantage of the synergy of SMOS data and data from other sensors such as the Advanced Scatterometer (ASCAT, active microwaves) and MODIS (visible and infrared). NNs have also been successfully used to construct long time series of soil moisture from the Advanced Microwave Scanning Radiometer - Earth Observing System (AMSR-E) and SMOS. A NN with input data from AMSR-E observations and SMOS soil moisture as reference for the training was used to construct a dataset sharing a similar climatology and without a significant bias with respect to SMOS soil moisture. Regarding the reference data to train the data-driven retrievals, we will show different possibilities depending on the application. Using actual in situ measurements is challenging at global scale due to the scarce distribution of sensors. In contrast, in situ measurements have been successfully used to retrieve SM at continental scale in North America, where the density of in situ measurement stations is high. Using global land surface models to train the NN constitutes an interesting alternative to implement new remote sensing surface datasets. In addition, these datasets can be used to perform data assimilation into the model used as reference for the training. This approach has recently been tested at the European Centre

  17. PubChemQC Project: A Large-Scale First-Principles Electronic Structure Database for Data-Driven Chemistry.

    Science.gov (United States)

    Nakata, Maho; Shimazaki, Tomomi

    2017-06-26

    Large-scale molecular databases play an essential role in the investigation of various subjects such as the development of organic materials, in silico drug design, and data-driven studies with machine learning. We have developed a large-scale quantum chemistry database based on first-principles methods. Our database currently contains the ground-state electronic structures of 3 million molecules based on density functional theory (DFT) at the B3LYP/6-31G* level, and we successively calculated 10 low-lying excited states of over 2 million molecules via time-dependent DFT with the B3LYP functional and the 6-31+G* basis set. To select the molecules calculated in our project, we referred to the PubChem Project, which was used as the source of the molecular structures in short strings using the InChI and SMILES representations. Accordingly, we have named our quantum chemistry database project "PubChemQC" ( http://pubchemqc.riken.jp/ ) and placed it in the public domain. In this paper, we show the fundamental features of the PubChemQC database and discuss the techniques used to construct the data set for large-scale quantum chemistry calculations. We also present a machine learning approach to predict the electronic structure of molecules as an example to demonstrate the suitability of the large-scale quantum chemistry database.

  18. Data-Driven Machine-Learning Model in District Heating System for Heat Load Prediction: A Comparison Study

    Directory of Open Access Journals (Sweden)

    Fisnik Dalipi

    2016-01-01

    We present our data-driven supervised machine-learning (ML) model to predict heat load for buildings in a district heating system (DHS). Even though ML has been used as an approach to heat load prediction in the literature, it is hard to select an approach that will qualify as a solution for our case, as existing solutions are quite problem specific. For that reason, we compared and evaluated three ML algorithms within a framework on operational data from a DH system in order to generate the required prediction model. The algorithms examined are Support Vector Regression (SVR), Partial Least Squares (PLS), and random forest (RF). We use the data collected from buildings at several locations for a period of 29 weeks. Concerning the accuracy of predicting the heat load, we evaluate the performance of the proposed algorithms using mean absolute error (MAE), mean absolute percentage error (MAPE), and correlation coefficient. In order to determine which algorithm had the best accuracy, we conducted a performance comparison among these ML algorithms. The comparison of the algorithms indicates that, for DH heat load prediction, the SVR method presented in this paper is the most efficient of the three, and also compared to other methods found in the literature.
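    The three accuracy measures named in the abstract above (MAE, MAPE and correlation coefficient) have standard definitions, sketched below in plain Python. The function names and the sample numbers are hypothetical, and the correlation coefficient is assumed to be Pearson's r, which the abstract does not specify.

```python
import math


def mean_absolute_error(actual, predicted):
    """MAE: average magnitude of the prediction errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percentage_error(actual, predicted):
    """MAPE in percent; undefined if any actual value is zero."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def pearson_correlation(actual, predicted):
    """Pearson's r between observed and predicted values (assumed variant)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Hypothetical heat-load values, not data from the study.
observed = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
mae = mean_absolute_error(observed, forecast)
mape = mean_absolute_percentage_error(observed, forecast)
r = pearson_correlation(observed, forecast)
```

    MAE is expressed in the units of the load itself, while MAPE is scale-free, so reporting both makes results comparable across buildings with very different demand levels.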

  19. Data-Driven Analysis of Virtual 3D Exploration of a Large Sculpture Collection in Real-World Museum Exhibitions

    KAUST Repository

    Agus, Marco

    2018-01-29

    We analyze use of an interactive system for the exploration of highly detailed three-dimensional (3D) models of a collection of protohistoric Mediterranean sculptures. In this system, when the object of interest is selected, its detailed 3D model and associated information are presented at high resolution on a large display controlled by a touch-enabled horizontal surface at a suitable distance. The user interface combines an object-aware interactive camera controller with an interactive point-of-interest selector and is implemented within a scalable implementation based on multiresolution structures shared between the rendering and user interaction subsystems. The system was installed in several temporary and permanent exhibitions and was extensively used by tens of thousands of visitors. We provide a data-driven analysis of usage experience based on logs gathered during a 27-month period at four exhibitions in archeological museums for a total of more than 75K exploration sessions. We focus on discerning the main visitor behaviors during 3D exploration by employing tools for deriving interest measures on surfaces and tools for clustering and knowledge discovery from high-dimensional data. The results highlight the main trends in visitor behavior during the interactive sessions. These results provide useful insights for the design of 3D exploration user interfaces in future digital installations. © 2017 ACM 1556-4673/2017/12-ART2 $15.00.

  20. Data-Driven Analysis of Virtual 3D Exploration of a Large Sculpture Collection in Real-World Museum Exhibitions

    KAUST Repository

    Agus, Marco; Marton, Fabio; Bettio, Fabio; Hadwiger, Markus; Gobbetti, Enrico

    2018-01-01

    We analyze use of an interactive system for the exploration of highly detailed three-dimensional (3D) models of a collection of protohistoric Mediterranean sculptures. In this system, when the object of interest is selected, its detailed 3D model and associated information are presented at high resolution on a large display controlled by a touch-enabled horizontal surface at a suitable distance. The user interface combines an object-aware interactive camera controller with an interactive point-of-interest selector and is implemented within a scalable implementation based on multiresolution structures shared between the rendering and user interaction subsystems. The system was installed in several temporary and permanent exhibitions and was extensively used by tens of thousands of visitors. We provide a data-driven analysis of usage experience based on logs gathered during a 27-month period at four exhibitions in archeological museums for a total of more than 75K exploration sessions. We focus on discerning the main visitor behaviors during 3D exploration by employing tools for deriving interest measures on surfaces and tools for clustering and knowledge discovery from high-dimensional data. The results highlight the main trends in visitor behavior during the interactive sessions. These results provide useful insights for the design of 3D exploration user interfaces in future digital installations. © 2017 ACM 1556-4673/2017/12-ART2 $15.00.

  1. A perspective on bridging scales and design of models using low-dimensional manifolds and data-driven model inference

    KAUST Repository

    Tegner, Jesper; Zenil, Hector; Kiani, Narsis A.; Ball, Gordon; Gomez-Cabrero, David

    2016-01-01

    Systems in nature capable of collective behaviour are nonlinear, operating across several scales. Yet our ability to account for their collective dynamics differs in physics, chemistry and biology. Here, we briefly review the similarities and differences between mathematical modelling of adaptive living systems versus physico-chemical systems. We find that physics-based chemistry modelling and computational neuroscience have a shared interest in developing techniques for model reductions aiming at the identification of a reduced subsystem or slow manifold, capturing the effective dynamics. By contrast, as relations and kinetics between biological molecules are less characterized, current quantitative analysis under the umbrella of bioinformatics focuses on signal extraction, correlation, regression and machine-learning analysis. We argue that model reduction analysis and the ensuing identification of manifolds bridges physics and biology. Furthermore, modelling living systems presents deep challenges as how to reconcile rich molecular data with inherent modelling uncertainties (formalism, variables selection and model parameters). We anticipate a new generative data-driven modelling paradigm constrained by identified governing principles extracted from low-dimensional manifold analysis. The rise of a new generation of models will ultimately connect biology to quantitative mechanistic descriptions, thereby setting the stage for investigating the character of the model language and principles driving living systems.

  2. A perspective on bridging scales and design of models using low-dimensional manifolds and data-driven model inference

    KAUST Repository

    Tegner, Jesper

    2016-10-04

    Systems in nature capable of collective behaviour are nonlinear, operating across several scales. Yet our ability to account for their collective dynamics differs in physics, chemistry and biology. Here, we briefly review the similarities and differences between mathematical modelling of adaptive living systems versus physico-chemical systems. We find that physics-based chemistry modelling and computational neuroscience have a shared interest in developing techniques for model reductions aiming at the identification of a reduced subsystem or slow manifold, capturing the effective dynamics. By contrast, as relations and kinetics between biological molecules are less characterized, current quantitative analysis under the umbrella of bioinformatics focuses on signal extraction, correlation, regression and machine-learning analysis. We argue that model reduction analysis and the ensuing identification of manifolds bridges physics and biology. Furthermore, modelling living systems presents deep challenges as how to reconcile rich molecular data with inherent modelling uncertainties (formalism, variables selection and model parameters). We anticipate a new generative data-driven modelling paradigm constrained by identified governing principles extracted from low-dimensional manifold analysis. The rise of a new generation of models will ultimately connect biology to quantitative mechanistic descriptions, thereby setting the stage for investigating the character of the model language and principles driving living systems.

  3. Neighborhood Poverty and Adolescent Development

    Science.gov (United States)

    McBride Murry, Velma; Berkel, Cady; Gaylord-Harden, Noni K.; Copeland-Linder, Nikeea; Nation, Maury

    2011-01-01

    This article provides a comprehensive review of studies conducted over the past decade on the effects of neighborhood and poverty on adolescent normative and nonnormative development. Our review includes a summary of studies examining the associations between neighborhood poverty and adolescent identity development followed by a review of studies…

  4. Automatic translation of MPI source into a latency-tolerant, data-driven form

    International Nuclear Information System (INIS)

    Nguyen, Tan; Cicotti, Pietro; Bylaska, Eric; Quinlan, Dan; Baden, Scott

    2017-01-01

    Hiding communication behind useful computation is an important performance programming technique but remains an inscrutable programming exercise even for the expert. We present Bamboo, a code transformation framework that can realize communication overlap in applications written in MPI without the need to intrusively modify the source code. We reformulate MPI source into a task dependency graph representation, which partially orders the tasks, enabling the program to execute in a data-driven fashion under the control of an external runtime system. Experimental results demonstrate that Bamboo significantly reduces communication delays while requiring only modest amounts of programmer annotation for a variety of applications and platforms, including those employing co-processors and accelerators. Moreover, Bamboo’s performance meets or exceeds that of labor-intensive hand coding. As a result, the translator is more than a means of hiding communication costs automatically; it demonstrates the utility of semantic level optimization against a well-known library.

  5. Data-driven techniques to estimate parameters in a rate-dependent ferromagnetic hysteresis model

    International Nuclear Information System (INIS)

    Hu Zhengzheng; Smith, Ralph C.; Ernstberger, Jon M.

    2012-01-01

    The quantification of rate-dependent ferromagnetic hysteresis is important in a range of applications including high speed milling using Terfenol-D actuators. There exist a variety of frameworks for characterizing rate-dependent hysteresis including the magnetic model in Ref. , the homogenized energy framework, Preisach formulations that accommodate after-effects, and Prandtl-Ishlinskii models. A critical issue when using any of these models to characterize physical devices concerns the efficient estimation of model parameters through least squares data fits. A crux of this issue is the determination of initial parameter estimates based on easily measured attributes of the data. In this paper, we present data-driven techniques to efficiently and robustly estimate parameters in the homogenized energy model. This framework was chosen due to its physical basis and its applicability to ferroelectric, ferromagnetic and ferroelastic materials.

  6. Data-driven fault detection for industrial processes canonical correlation analysis and projection based methods

    CERN Document Server

    Chen, Zhiwen

    2017-01-01

    Zhiwen Chen aims to develop advanced fault detection (FD) methods for the monitoring of industrial processes. With the ever increasing demands on reliability and safety in industrial processes, fault detection has become an important issue. Although the model-based fault detection theory has been well studied in the past decades, its applications are limited to large-scale industrial processes because it is difficult to build accurate models. Furthermore, motivated by the limitations of existing data-driven FD methods, novel canonical correlation analysis (CCA) and projection-based methods are proposed from the perspectives of process input and output data, less engineering effort and wide application scope. For performance evaluation of FD methods, a new index is also developed. Contents A New Index for Performance Evaluation of FD Methods CCA-based FD Method for the Monitoring of Stationary Processes Projection-based FD Method for the Monitoring of Dynamic Processes Benchmark Study and Real-Time Implementat...

  7. Combining engineering and data-driven approaches: Development of a generic fire risk model facilitating calibration

    DEFF Research Database (Denmark)

    De Sanctis, G.; Fischer, K.; Kohler, J.

    2014-01-01

    Fire risk models support decision making for engineering problems under the consistent consideration of the associated uncertainties. Empirical approaches can be used for cost-benefit studies when enough data about the decision problem are available. But often the empirical approaches are not detailed enough. Engineering risk models, on the other hand, may be detailed but typically involve assumptions that may result in a biased risk assessment and make a cost-benefit study problematic. In two related papers it is shown how engineering and data-driven modeling can be combined by developing a generic risk model that is calibrated to observed fire loss data. Generic risk models assess the risk of buildings based on specific risk indicators and support risk assessment at a portfolio level. After an introduction to the principles of generic risk assessment, the focus of the present paper...

  8. A new data-driven controllability measure with application in intelligent buildings

    DEFF Research Database (Denmark)

    Shaker, Hamid Reza; Lazarova-Molnar, Sanja

    2017-01-01

    ...and instrumentation within today's intelligent buildings enable collecting high quality data which could be used directly in data-based analysis and control methods. The area of data-based systems analysis and control is concentrating on developing analysis and control methods that rely on data collected from meters and sensors, and information obtained by data processing. This differs from the traditional model-based approaches that are based on mathematical models of systems. We propose and describe a data-driven controllability measure for discrete-time linear systems. The concept is developed within a data-based system analysis and control framework. Therefore, only measured data is used to obtain the proposed controllability measure. The proposed controllability measure not only shows if the system is controllable or not, but also reveals the level of controllability, which is the information its previous...

  9. Beyond Crowd Judgments: Data-driven Estimation of Market Value in Association Football

    DEFF Research Database (Denmark)

    Müller, Oliver; Simons, Alexander; Weinmann, Markus

    2017-01-01

    Association football is a popular sport, but it is also a big business. From a managerial perspective, the most important decisions that team managers make concern player transfers, so issues related to player valuation, especially the determination of transfer fees and market values, are of major concern. Market values can be understood as estimates of transfer fees, that is, prices that could be paid for a player on the football market, so they play an important role in transfer negotiations. These values have traditionally been estimated by football experts, but crowdsourcing has emerged... ...’ market values using multilevel regression analysis. The regression results suggest that data-driven estimates of market value can overcome several of the crowd's practical limitations while producing comparably accurate numbers. Our results have important implications for football managers and scouts...

  10. Data-driven design of fault diagnosis systems nonlinear multimode processes

    CERN Document Server

    Haghani Abandan Sari, Adel

    2014-01-01

    In many industrial applications early detection and diagnosis of abnormal behavior of the plant is of great importance. During the last decades, the complexity of process plants has been drastically increased, which imposes great challenges in development of model-based monitoring approaches and it sometimes becomes unrealistic for modern large-scale processes. The main objective of Adel Haghani Abandan Sari is to study efficient fault diagnosis techniques for complex industrial systems using process historical data and considering the nonlinear behavior of the process. To this end, different methods are presented to solve the fault diagnosis problem based on the overall behavior of the process and its dynamics. Moreover, a novel technique is proposed for fault isolation and determination of the root-cause of the faults in the system, based on the fault impacts on the process measurements. Contents Process monitoring Fault diagnosis and fault-tolerant control Data-driven approaches and decision making Target...

  11. Data-driven modeling, control and tools for cyber-physical energy systems

    Science.gov (United States)

    Behl, Madhur

    Energy systems are experiencing a gradual but substantial change in moving away from being non-interactive and manually controlled systems to utilizing tight integration of both cyber (computation, communications, and control) and physical representations guided by first-principles-based models, at all scales and levels. Furthermore, peak power reduction programs like demand response (DR) are becoming increasingly important as the volatility on the grid continues to increase due to regulation, integration of renewables and extreme weather conditions. In order to shield themselves from the risk of price volatility, end-user electricity consumers must monitor electricity prices and be flexible in the ways they choose to use electricity. This requires the use of control-oriented predictive models of an energy system's dynamics and energy consumption. Such models are needed for understanding and improving the overall energy efficiency and operating costs. However, learning dynamical models using grey/white box approaches is prohibitive in cost and time, since it often requires significant financial investments in retrofitting the system with several sensors and hiring domain experts to build the model. We present the use of data-driven methods for making model capture easy and efficient for cyber-physical energy systems. We develop Model-IQ, a methodology for analysis of uncertainty propagation for building inverse modeling and controls. Given a grey-box model structure and real input data from a temporary set of sensors, Model-IQ evaluates the effect of the uncertainty propagation from sensor data to model accuracy and to closed-loop control performance. We also developed a statistical method to quantify the bias in the sensor measurement and to determine near-optimal sensor placement and density for accurate data collection for model training and control. Using a real building test-bed, we show how performing an uncertainty analysis can reveal trends about

  12. Automatic sleep classification using a data-driven topic model reveals latent sleep states

    DEFF Research Database (Denmark)

    Koch, Henriette; Christensen, Julie Anja Engelhard; Frandsen, Rune

    2014-01-01

    Background: The gold standard for sleep classification uses manual scoring of polysomnography despite points of criticism such as oversimplification, low inter-rater reliability and the standard being designed on young and healthy subjects. New method: To meet the criticism and reveal the latent sleep states, this study developed a general and automatic sleep classifier using a data-driven approach. Spectral EEG and EOG measures and eye correlation in 1 s windows were calculated and each sleep epoch was expressed as a mixture of probabilities of latent sleep states by using the topic model Latent Dirichlet Allocation. Model application was tested on control subjects and patients with periodic leg movements (PLM) representing a non-neurodegenerative group, and patients with idiopathic REM sleep behavior disorder (iRBD) and Parkinson's Disease (PD) representing a neurodegenerative group...

  13. Data-driven outbreak forecasting with a simple nonlinear growth model.

    Science.gov (United States)

    Lega, Joceline; Brown, Heidi E

    2016-12-01

    Recent events have thrown the spotlight on infectious disease outbreak response. We developed a data-driven method, EpiGro, which can be applied to cumulative case reports to estimate the order of magnitude of the duration, peak and ultimate size of an ongoing outbreak. It is based on a surprisingly simple mathematical property of many epidemiological data sets, does not require knowledge or estimation of disease transmission parameters, is robust to noise and to small data sets, and runs quickly due to its mathematical simplicity. Using data from historic and ongoing epidemics, we present the model. We also provide modeling considerations that justify this approach and discuss its limitations. In the absence of other information or in conjunction with other models, EpiGro may be useful to public health responders. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  14. A Data-Driven Frequency-Domain Approach for Robust Controller Design via Convex Optimization

    CERN Document Server

    AUTHOR|(CDS)2092751; Martino, Michele

    The objective of this dissertation is to develop data-driven frequency-domain methods for designing robust controllers through the use of convex optimization algorithms. Many of today's industrial processes are becoming more complex, and deriving accurate physical models for these plants from first principles may be impossible. Even when a model is available, it may be too complex to use for controller design. With the increased developments in the computing world, large amounts of measured data can be easily collected and stored for processing purposes. Data can also be collected and used in an on-line fashion. Thus it would be very sensible to make full use of this data for controller design, performance evaluation, and stability analysis. The design methods proposed in this work ensure that the dynamics of a system are captured in an experiment and avoid the problem of unmodeled dynamics associated with parametric models. The devised methods consider robust designs...

  15. Data-Driven Based Asynchronous Motor Control for Printing Servo Systems

    Science.gov (United States)

    Bian, Min; Guo, Qingyun

    Modern digital printing equipment aims to be environmentally friendly, with high dynamic performance and control precision and low vibration and abrasion. This requires a high-performance motion control system for printing servo systems. A control system for asynchronous motors based on data acquisition is proposed, and the iterative learning control (ILC) algorithm is studied. PID control is widely used in motion control, but it is sensitive to disturbances and to variation in model parameters. ILC uses the historical error data and the present control signals to approximate the control signal directly, so that the expected trajectory can be tracked fully without knowledge of the system model or structure. A motor control algorithm based on ILC and PID was constructed, and simulation results are given. The results show that the data-driven control method is effective in dealing with bounded disturbances in the motion control of printing servo systems.
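
    The ILC idea in the abstract above, reusing the previous trial's error to correct the next trial's control signal, can be illustrated with a minimal sketch. This is not the authors' combined ILC and PID controller: it is a plain P-type ILC update, u_{k+1}(t) = u_k(t) + gain * e_k(t), and the first-order plant, its coefficients, the learning gain, and the function names `simulate_plant` and `run_ilc` are all assumptions made for illustration.

```python
import numpy as np


def simulate_plant(u, a=0.3, b=1.0):
    """Hypothetical first-order plant y[t+1] = a*y[t] + b*u[t] (assumed, not from the paper)."""
    y = np.zeros(len(u) + 1)
    for t in range(len(u)):
        y[t + 1] = a * y[t] + b * u[t]
    # Drop the initial state so that output sample t is the response to u[t].
    return y[1:]


def run_ilc(reference, gain=0.8, trials=30):
    """P-type ILC: repeat the same task and add gain * (last trial's error) to the input."""
    u = np.zeros_like(reference)
    errors = []
    for _ in range(trials):
        e = reference - simulate_plant(u)      # tracking error of this trial
        errors.append(float(np.max(np.abs(e))))
        u = u + gain * e                        # learning update; uses only measured data
    return u, errors


# Usage: track one period of a sine wave over repeated trials.
reference = np.sin(np.linspace(0.0, 2.0 * np.pi, 50))
u_final, errors = run_ilc(reference)
print("max error, first trial:", errors[0])
print("max error, last trial:", errors[-1])
```

    The update never uses the plant coefficients, only the measured error, which is what makes the scheme data-driven. For this assumed plant the chosen gain makes each trial's worst-case error smaller than the previous one; other plants would need a different gain or a filtered update.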

  16. DOE High Performance Computing Operational Review (HPCOR): Enabling Data-Driven Scientific Discovery at HPC Facilities

    Energy Technology Data Exchange (ETDEWEB)

    Gerber, Richard; Allcock, William; Beggio, Chris; Campbell, Stuart; Cherry, Andrew; Cholia, Shreyas; Dart, Eli; England, Clay; Fahey, Tim; Foertter, Fernanda; Goldstone, Robin; Hick, Jason; Karelitz, David; Kelly, Kaki; Monroe, Laura; Prabhat; Skinner, David; White, Julia

    2014-10-17

    U.S. Department of Energy (DOE) High Performance Computing (HPC) facilities are on the verge of a paradigm shift in the way they deliver systems and services to science and engineering teams. Research projects are producing a wide variety of data at unprecedented scale and level of complexity, with community-specific services that are part of the data collection and analysis workflow. On June 18-19, 2014, representatives from six DOE HPC centers met in Oakland, CA at the DOE High Performance Computing Operational Review (HPCOR) to discuss how they can best provide facilities and services to enable large-scale data-driven scientific discovery at the DOE national laboratories. The report contains findings from that review.

  17. A data-driven fault-tolerant control design of linear multivariable systems with performance optimization.

    Science.gov (United States)

    Li, Zhe; Yang, Guang-Hong

    2017-09-01

    In this paper, an integrated data-driven fault-tolerant control (FTC) design scheme is proposed under the configuration of the Youla parameterization for multiple-input multiple-output (MIMO) systems. With unknown system model parameters, the canonical form identification technique is first applied to design the residual observer in the fault-free case. In the faulty case, the Youla parameters are tuned online from the system data via a gradient-based algorithm, so that the fault influence is attenuated while system performance is optimized. In addition, to improve the robustness of the residual generator to a class of system deviations, a novel adaptive scheme is proposed for the residual generator to prevent its over-activation. Simulation results of a two-tank flow system demonstrate the optimized performance and effect of the proposed FTC scheme. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.

  18. Sequencing quality assessment tools to enable data-driven informatics for high throughput genomics

    Directory of Open Access Journals (Sweden)

    Richard Mark Leggett

    2013-12-01

    The processes of quality assessment and control are an active area of research at The Genome Analysis Centre (TGAC). Unlike other sequencing centres that often concentrate on a certain species or technology, TGAC applies expertise in genomics and bioinformatics to a wide range of projects, often requiring bespoke wet lab and in silico workflows. TGAC is fortunate to have access to a diverse range of sequencing and analysis platforms, and we are at the forefront of investigations into library quality and sequence data assessment. We have developed and implemented a number of algorithms, tools, pipelines and packages to ascertain, store, and expose quality metrics across a number of next-generation sequencing platforms, allowing rapid and in-depth cross-platform QC bioinformatics. In this review, we describe these tools as a vehicle for data-driven informatics, offering the potential to provide richer context for downstream analysis and to inform experimental design.

  19. A data-driven multiplicative fault diagnosis approach for automation processes.

    Science.gov (United States)

    Hao, Haiyang; Zhang, Kai; Ding, Steven X; Chen, Zhiwen; Lei, Yaguo

    2014-09-01

    This paper presents a new data-driven method for diagnosing multiplicative key performance degradation in automation processes. Different from the well-established additive fault diagnosis approaches, the proposed method aims at identifying those low-level components which increase the variability of process variables and cause performance degradation. Based on process data, features of multiplicative fault are extracted. To identify the root cause, the impact of fault on each process variable is evaluated in the sense of contribution to performance degradation. Then, a numerical example is used to illustrate the functionalities of the method and Monte-Carlo simulation is performed to demonstrate the effectiveness from the statistical viewpoint. Finally, to show the practical applicability, a case study on the Tennessee Eastman process is presented. Copyright © 2013. Published by Elsevier Ltd.

  20. Data-driven gradient algorithm for high-precision quantum control

    Science.gov (United States)

    Wu, Re-Bing; Chu, Bing; Owens, David H.; Rabitz, Herschel

    2018-04-01

    In the quest to achieve scalable quantum information processing technologies, gradient-based optimal control algorithms (e.g., GRAPE) are broadly used for implementing high-precision quantum gates, but their performance is often hindered by deterministic or random errors in the system model and the control electronics. In this paper, we show that GRAPE can be taught to be more effective by jointly learning from the design model and the experimental data obtained from process tomography. The resulting data-driven gradient optimization algorithm (d-GRAPE) can in principle correct all deterministic gate errors, with a mild efficiency loss. The d-GRAPE algorithm may become more powerful with broadband controls that involve a large number of control parameters, while other algorithms usually slow down due to the increased size of the search space. These advantages are demonstrated by simulating the implementation of a two-qubit controlled-NOT gate.

  1. USACM Thematic Workshop On Uncertainty Quantification And Data-Driven Modeling.

    Energy Technology Data Exchange (ETDEWEB)

    Stewart, James R. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2017-05-01

    The USACM Thematic Workshop on Uncertainty Quantification and Data-Driven Modeling was held on March 23-24, 2017, in Austin, TX. The organizers of the technical program were James R. Stewart of Sandia National Laboratories and Krishna Garikipati of University of Michigan. The administrative organizer was Ruth Hengst, who serves as Program Coordinator for the USACM. The organization of this workshop was coordinated through the USACM Technical Thrust Area on Uncertainty Quantification and Probabilistic Analysis. The workshop website (http://uqpm2017.usacm.org) includes the presentation agenda as well as links to several of the presentation slides (permission to access the presentations was granted by each of those speakers, respectively). Herein, this final report contains the complete workshop program that includes the presentation agenda, the presentation abstracts, and the list of posters.

  2. Linear dynamical modes as new variables for data-driven ENSO forecast

    Science.gov (United States)

    Gavrilov, Andrey; Seleznev, Aleksei; Mukhin, Dmitry; Loskutov, Evgeny; Feigin, Alexander; Kurths, Juergen

    2018-05-01

    A new data-driven model for analysis and prediction of spatially distributed time series is proposed. The model is based on a linear dynamical mode (LDM) decomposition of the observed data which is derived from a recently developed nonlinear dimensionality reduction approach. The key point of this approach is its ability to take into account simple dynamical properties of the observed system by means of revealing the system's dominant time scales. The LDMs are used as new variables for empirical construction of a nonlinear stochastic evolution operator. The method is applied to the sea surface temperature anomaly field in the tropical belt where the El Niño Southern Oscillation (ENSO) is the main mode of variability. The advantage of LDMs versus traditionally used empirical orthogonal function decomposition is demonstrated for this data. Specifically, it is shown that the new model has a competitive ENSO forecast skill in comparison with the other existing ENSO models.

  3. Impact of neighborhood separation on the spatial reciprocity in the prisoner’s dilemma game

    International Nuclear Information System (INIS)

    Xia, Chengyi; Miao, Qin; Zhang, Juanjuan

    2013-01-01

    Highlights: • We present a novel game model in which the interaction and learning neighborhoods are not identical. • The separation between interaction and learning neighborhoods can largely influence cooperative behavior. • Monte Carlo simulations are used to verify the evolution of cooperation. • When IN is fixed at 4, the medium-sized LN = 8 is the optimal size for promoting cooperation. • When LN is fixed at 4, cooperation can also be highly enhanced when IN > 4. -- Abstract: Evolutionary game theory is a very powerful tool for understanding collective cooperation behavior in many real-world systems. In the spatial game model, the payoff is often first obtained within a specific neighborhood (i.e., the interaction neighborhood), and then the focal player imitates or learns the behavior of a randomly selected player inside another neighborhood, termed the learning neighborhood. However, most studies assume that the interaction neighborhood is identical to the learning neighborhood. Going beyond this assumption, we present a spatial prisoner’s dilemma game model to discuss the impact of the separation between interaction neighborhood and learning neighborhood on the cooperative behavior of players on a square lattice. Extensive numerical simulations demonstrate that separating the interaction neighborhood from the learning neighborhood can dramatically affect the density of cooperators (ρ_C) in the population at the stationary state. In particular, compared to the standard case, we find that a medium-sized learning (interaction) neighborhood allows the cooperators to thrive and substantially favors the evolution of cooperation, and ρ_C can be greatly elevated when the interaction (learning) neighborhood is fixed; that is, too little or too much information does not help players contribute to collective cooperation. Current results are conducive to further analyzing and understanding the emergence of
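
    The separation of interaction and learning neighborhoods described in this record can be sketched as follows. This is a simplified illustration, not the authors' model: the weak prisoner's dilemma payoffs (1 for mutual cooperation, b for defecting against a cooperator, 0 otherwise), the synchronous update, the Fermi imitation rule with a `noise` parameter, and all function names are assumptions. Payoffs are collected over the interaction neighborhood (here 4 nearest neighbors), while the imitated player is drawn from the learning neighborhood (here 8 neighbors).

```python
import numpy as np

# Neighborhoods as lattice offsets: IN = 4 (von Neumann), LN = 8 (Moore).
VON_NEUMANN = [(-1, 0), (1, 0), (0, -1), (0, 1)]
MOORE = VON_NEUMANN + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def payoffs(s, b, interaction):
    """Total payoff of each site (1 = cooperate, 0 = defect) over its interaction neighborhood."""
    total = np.zeros(s.shape)
    for dx, dy in interaction:
        nb = np.roll(s, shift=(dx, dy), axis=(0, 1))  # periodic boundaries
        # Against a cooperating neighbor: a cooperator earns 1, a defector earns b.
        # Against a defecting neighbor (nb == 0): everyone earns 0.
        total += nb * np.where(s == 1, 1.0, b)
    return total


def step(s, b, interaction, learning, rng, noise=0.1):
    """One synchronous update: each site may imitate a random learning-neighborhood site."""
    p = payoffs(s, b, interaction)
    pick = rng.integers(len(learning), size=s.shape)  # which learning neighbor to compare with
    draw = rng.random(s.shape)
    new = s.copy()
    for k, (dx, dy) in enumerate(learning):
        nb_s = np.roll(s, shift=(dx, dy), axis=(0, 1))
        nb_p = np.roll(p, shift=(dx, dy), axis=(0, 1))
        # Fermi rule: imitation is more likely when the neighbor earned more.
        prob = 1.0 / (1.0 + np.exp((p - nb_p) / noise))
        mask = (pick == k) & (draw < prob)
        new[mask] = nb_s[mask]
    return new


# Usage: cooperator density after 50 steps on a 20 x 20 lattice.
rng = np.random.default_rng(0)
lattice = rng.integers(0, 2, size=(20, 20))
for _ in range(50):
    lattice = step(lattice, 1.2, VON_NEUMANN, MOORE, rng)
print("cooperator density:", lattice.mean())
```

    Passing different offset lists for `interaction` and `learning` is the only change needed to compare the identical-neighborhood case with the separated one.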

  4. NOvA Event Building, Buffering and Data-Driven Triggering From Within the DAQ System

    International Nuclear Information System (INIS)

    Fischler, M; Rechenmacher, R; Green, C; Kowalkowski, J; Norman, A; Paterno, M

    2012-01-01

    The NOvA experiment is a long-baseline neutrino experiment designed to make precision probes of the structure of neutrino mixing. The experiment features a unique deadtimeless data acquisition system that is capable of acquiring and building an event data stream from the continuous readout of the more than 360,000 far detector channels. In order to achieve its physics goals, the experiment must be able to buffer, correlate and extract the data in this stream with the beam spills that occur at Fermilab. In addition, the NOvA experiment seeks to enhance its data collection efficiencies for rare classes of event topologies that are valuable for calibration through the use of data-driven triggering. The NOvA-DDT is a prototype Data-Driven Triggering system. NOvA-DDT has been developed using the Fermilab artdaq generic DAQ/Event-building toolkit. This toolkit provides the advantages of sharing online software infrastructure with other Intensity Frontier experiments, and of being able to use any offline analysis module, unchanged, as a component of the online triggering decisions. We have measured the performance and overhead of the NOvA-DDT framework using a Hough-transform-based trigger decision module developed for the NOvA detector to identify cosmic rays. The results of these tests, which were run on the NOvA prototype near detector, yielded a mean processing time of 98 ms per event, while consuming only 1/16th of the available processing capacity. These results provide a proof of concept that a NOvA-DDT based processing system is a viable strategy for data acquisition and triggering for the NOvA far detector.

  5. Testing the Accuracy of Data-driven MHD Simulations of Active Region Evolution

    Energy Technology Data Exchange (ETDEWEB)

    Leake, James E.; Linton, Mark G. [U.S. Naval Research Laboratory, 4555 Overlook Avenue, SW, Washington, DC 20375 (United States); Schuck, Peter W., E-mail: james.e.leake@nasa.gov [NASA Goddard Space Flight Center, 8800 Greenbelt Road, Greenbelt, MD 20771 (United States)

    2017-04-01

    Models for the evolution of the solar coronal magnetic field are vital for understanding solar activity, yet the best measurements of the magnetic field lie at the photosphere, necessitating the development of coronal models which are “data-driven” at the photosphere. We present an investigation to determine the feasibility and accuracy of such methods. Our validation framework uses a simulation of active region (AR) formation, modeling the emergence of magnetic flux from the convection zone to the corona, as a ground-truth data set, to supply both the photospheric information and to perform the validation of the data-driven method. We focus our investigation on how the accuracy of the data-driven model depends on the temporal frequency of the driving data. The Helioseismic and Magnetic Imager on NASA’s Solar Dynamics Observatory produces full-disk vector magnetic field measurements at a 12-minute cadence. Using our framework we show that ARs that emerge over 25 hr can be modeled by the data-driving method with only ∼1% error in the free magnetic energy, assuming the photospheric information is specified every 12 minutes. However, for rapidly evolving features, under-sampling of the dynamics at this cadence leads to a strobe effect, generating large electric currents and incorrect coronal morphology and energies. We derive a sampling condition for the driving cadence based on the evolution of these small-scale features, and show that higher-cadence driving can lead to acceptable errors. Future work will investigate the source of errors associated with deriving plasma variables from the photospheric magnetograms as well as other sources of errors, such as reduced resolution, instrument bias, and noise.

  6. Data-Driven Design of Intelligent Wireless Networks: An Overview and Tutorial

    Directory of Open Access Journals (Sweden)

    Merima Kulin

    2016-06-01

    Data science or “data-driven research” is a research approach that uses real-life data to gain insight about the behavior of systems. It enables the analysis of small, simple as well as large and more complex systems in order to assess whether they function according to the intended design and as seen in simulation. Data science approaches have been successfully applied to analyze networked interactions in several research areas such as large-scale social networks, advanced business and healthcare processes. Wireless networks can exhibit unpredictable interactions between algorithms from multiple protocol layers, interactions between multiple devices, and hardware specific influences. These interactions can lead to a difference between real-world functioning and design time functioning. Data science methods can help to detect the actual behavior and possibly help to correct it. Data science is increasingly used in wireless research. To support data-driven research in wireless networks, this paper illustrates the step-by-step methodology that has to be applied to extract knowledge from raw data traces. To this end, the paper (i) clarifies when, why and how to use data science in wireless network research; (ii) provides a generic framework for applying data science in wireless networks; (iii) gives an overview of existing research papers that utilized data science approaches in wireless networks; (iv) illustrates the overall knowledge discovery process through an extensive example in which device types are identified based on their traffic patterns; (v) provides the reader the necessary datasets and scripts to go through the tutorial steps themselves.

  7. First-principles data-driven discovery of transition metal oxides for artificial photosynthesis

    Science.gov (United States)

    Yan, Qimin

    We develop a first-principles data-driven approach for rapid identification of transition metal oxide (TMO) light absorbers and photocatalysts for artificial photosynthesis using the Materials Project. Initially focusing on Cr, V, and Mn-based ternary TMOs in the database, we design a broadly-applicable multiple-layer screening workflow automating density functional theory (DFT) and hybrid functional calculations of bulk and surface electronic and magnetic structures. We further assess the electrochemical stability of TMOs in aqueous environments from computed Pourbaix diagrams. Several promising earth-abundant low band-gap TMO compounds with desirable band edge energies and electrochemical stability are identified by our computational efforts and then synergistically evaluated using high-throughput synthesis and photoelectrochemical screening techniques by our experimental collaborators at Caltech. Our joint theory-experiment effort has successfully identified new earth-abundant copper and manganese vanadate complex oxides that meet highly demanding requirements for photoanodes, substantially expanding the known space of such materials. By integrating theory and experiment, we validate our approach and develop important new insights into structure-property relationships for TMOs for oxygen evolution photocatalysts, paving the way for use of first-principles data-driven techniques in future applications. This work is supported by the Materials Project Predictive Modeling Center and the Joint Center for Artificial Photosynthesis through the U.S. Department of Energy, Office of Basic Energy Sciences, Materials Sciences and Engineering Division, under Contract No. DE-AC02-05CH11231. Computational resources also provided by the Department of Energy through the National Energy Supercomputing Center.

  8. Modeling and Predicting Carbon and Water Fluxes Using Data-Driven Techniques in a Forest Ecosystem

    Directory of Open Access Journals (Sweden)

    Xianming Dou

    2017-12-01

    Accurate estimation of carbon and water fluxes of forest ecosystems is of particular importance for addressing the problems originating from global environmental change, and providing helpful information about carbon and water content for analyzing and diagnosing past and future climate change. The main focus of the current work was to investigate the feasibility of four comparatively new methods, including generalized regression neural network, group method of data handling (GMDH), extreme learning machine and adaptive neuro-fuzzy inference system (ANFIS), for elucidating the carbon and water fluxes in a forest ecosystem. A comparison was made between these models and two widely used data-driven models, artificial neural network (ANN) and support vector machine (SVM). All the models were evaluated based on the following statistical indices: coefficient of determination, Nash-Sutcliffe efficiency, root mean square error and mean absolute error. Results indicated that the data-driven models are capable of accounting for most variance in each flux with the limited meteorological variables. The ANN model provided the best estimates for gross primary productivity (GPP) and net ecosystem exchange (NEE), while the ANFIS model achieved the best for ecosystem respiration (R), indicating that no single model was consistently superior to others for the carbon flux prediction. In addition, the GMDH model consistently produced somewhat worse results for all the carbon flux and evapotranspiration (ET) estimations. On the whole, among the carbon and water fluxes, all the models produced similar highly satisfactory accuracy for GPP, R and ET fluxes, and did a reasonable job of reproducing the eddy covariance NEE. Based on these findings, it was concluded that these advanced models are promising alternatives to ANN and SVM for estimating the terrestrial carbon and water fluxes.
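
    The four statistical indices named in this abstract have standard definitions, sketched below for reference. The function name `evaluate` is hypothetical, and the coefficient of determination is computed here as the squared Pearson correlation between observed and predicted values, which is one common convention; the abstract does not say which definition the authors used.

```python
import numpy as np


def evaluate(observed, predicted):
    """Return R^2 (squared Pearson r), Nash-Sutcliffe efficiency, RMSE and MAE."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    resid = obs - pred
    r = np.corrcoef(obs, pred)[0, 1]
    return {
        "r2": r ** 2,
        # NSE compares the model's squared error with that of predicting the observed mean.
        "nse": 1.0 - np.sum(resid ** 2) / np.sum((obs - obs.mean()) ** 2),
        "rmse": np.sqrt(np.mean(resid ** 2)),
        "mae": np.mean(np.abs(resid)),
    }


# Usage: a prediction with a constant bias of +0.5.
scores = evaluate([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])
print(scores)
```

    In this usage example the squared correlation stays at 1 because the bias does not change the linear relationship, while the Nash-Sutcliffe efficiency drops to 0.8 and both error measures equal 0.5. This is one reason such studies report several indices together.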

  9. Preface [HD3-2015: International meeting on high-dimensional data-driven science]

    International Nuclear Information System (INIS)

    2016-01-01

    A never-ending series of innovations in measurement technology and evolutions in information and communication technologies have led to the ongoing generation and accumulation of large quantities of high-dimensional data every day. While detailed data-centric approaches have been pursued in respective research fields, situations have been encountered where the same mathematical framework of high-dimensional data analysis can be found in a wide variety of seemingly unrelated research fields, such as estimation on the basis of undersampled Fourier transform in nuclear magnetic resonance spectroscopy in chemistry, in magnetic resonance imaging in medicine, and in astronomical interferometry in astronomy. In such situations, bringing diverse viewpoints together therefore becomes a driving force for the creation of innovative developments in various different research fields. This meeting focuses on “Sparse Modeling” (SpM) as a methodology for creation of innovative developments through the incorporation of a wide variety of viewpoints in various research fields. The objective of this meeting is to offer a forum where researchers with interest in SpM can assemble and exchange information on the latest results and newly established methodologies, and discuss future directions of the interdisciplinary studies for High-Dimensional Data-Driven science (HD3). The meeting was held in Kyoto from 14-17 December 2015. We are pleased to publish 22 papers contributed by invited speakers in this volume of Journal of Physics: Conference Series. We hope that this volume will promote further development of High-Dimensional Data-Driven science.

  10. A data-driven weighting scheme for multivariate phenotypic endpoints recapitulates zebrafish developmental cascades

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, Guozhu, E-mail: gzhang6@ncsu.edu [Bioinformatics Research Center, North Carolina State University, Raleigh, NC (United States); Roell, Kyle R., E-mail: krroell@ncsu.edu [Bioinformatics Research Center, North Carolina State University, Raleigh, NC (United States); Truong, Lisa, E-mail: lisa.truong@oregonstate.edu [Department of Environmental and Molecular Toxicology, Sinnhuber Aquatic Research Laboratory, Oregon State University, Corvallis, OR (United States); Tanguay, Robert L., E-mail: robert.tanguay@oregonstate.edu [Department of Environmental and Molecular Toxicology, Sinnhuber Aquatic Research Laboratory, Oregon State University, Corvallis, OR (United States); Reif, David M., E-mail: dmreif@ncsu.edu [Bioinformatics Research Center, North Carolina State University, Raleigh, NC (United States); Department of Biological Sciences, Center for Human Health and the Environment, North Carolina State University, Raleigh, NC (United States)

    2017-01-01

    Zebrafish have become a key alternative model for studying health effects of environmental stressors, partly due to their genetic similarity to humans, fast generation time, and the efficiency of generating high-dimensional systematic data. Studies aiming to characterize adverse health effects in zebrafish typically include several phenotypic measurements (endpoints). While there is a solid biomedical basis for capturing a comprehensive set of endpoints, making summary judgments regarding health effects requires thoughtful integration across endpoints. Here, we introduce a Bayesian method to quantify the informativeness of 17 distinct zebrafish endpoints as a data-driven weighting scheme for a multi-endpoint summary measure, called weighted Aggregate Entropy (wAggE). We implement wAggE using high-throughput screening (HTS) data from zebrafish exposed to five concentrations of all 1060 ToxCast chemicals. Our results show that our empirical weighting scheme provides better performance in terms of the Receiver Operating Characteristic (ROC) curve for identifying significant morphological effects and improves robustness over traditional curve-fitting approaches. From a biological perspective, our results suggest that developmental cascade effects triggered by chemical exposure can be recapitulated by analyzing the relationships among endpoints. Thus, wAggE offers a powerful approach for analysis of multivariate phenotypes that can reveal underlying etiological processes. - Highlights: • Introduced a data-driven weighting scheme for multiple phenotypic endpoints. • Weighted Aggregate Entropy (wAggE) implies differential importance of endpoints. • Endpoint relationships reveal developmental cascade effects triggered by exposure. • wAggE is generalizable to multi-endpoint data of different shapes and scales.

  11. A data-driven prediction method for fast-slow systems

    Science.gov (United States)

    Groth, Andreas; Chekroun, Mickael; Kondrashov, Dmitri; Ghil, Michael

    2016-04-01

    In this work, we present a prediction method for processes that exhibit a mixture of variability on slow and fast scales. The method relies on combining empirical model reduction (EMR) with singular spectrum analysis (SSA). EMR is a data-driven methodology for constructing stochastic low-dimensional models that account for nonlinearity and serial correlation in the estimated noise, while SSA provides a decomposition of the complex dynamics into low-order components that capture spatio-temporal behavior on different time scales. Our study focuses on the data-driven modeling of partial observations from dynamical systems that exhibit power spectra with broad peaks. The main result in this talk is that the combination of SSA pre-filtering with EMR modeling improves, under certain circumstances, the modeling and prediction skill of such a system, as compared to a standard EMR prediction based on raw data. Specifically, it is the separation into "fast" and "slow" temporal scales by the SSA pre-filtering that achieves the improvement. We show, in particular, that the resulting EMR-SSA emulators help predict intermittent behavior such as rapid transitions between specific regions of the system's phase space. This capability of the EMR-SSA prediction will be demonstrated on two low-dimensional models: the Rössler system and a Lotka-Volterra model for interspecies competition. In either case, the chaotic dynamics is produced through a Shilnikov-type mechanism and we argue that the latter seems to be an important ingredient for the good prediction skills of EMR-SSA emulators. Shilnikov-type behavior has been shown to arise in various complex geophysical fluid models, such as baroclinic quasi-geostrophic flows in the mid-latitude atmosphere and wind-driven double-gyre ocean circulation models. This pervasiveness of the Shilnikov mechanism of fast-slow transition opens interesting perspectives for the extension of the proposed EMR-SSA approach to more realistic situations.

  12. Association between neighborhood safety and overweight status among urban adolescents

    Directory of Open Access Journals (Sweden)

    Johnson Renee M

    2009-08-01

    Background: Neighborhood safety may be an important social environmental determinant of overweight. We examined the relationship between perceived neighborhood safety and overweight status, and assessed the validity of reported neighborhood safety among a representative community sample of urban adolescents (who were racially and ethnically diverse). Methods: Data come from the 2006 Boston Youth Survey, a cross-sectional study in which public high school students in Boston, MA completed a pencil-and-paper survey. The study used a two-stage, stratified sampling design whereby schools and then 9th–12th grade classrooms within schools were selected (the analytic sample included 1,140 students). Students reported their perceptions of neighborhood safety and several associated dimensions. With self-reported height and weight data, we computed body mass index (BMI, kg/m2) for the adolescents based on CDC growth charts. Chi-square statistics and corresponding p-values were computed to compare perceived neighborhood safety by the several associated dimensions. Prevalence ratios (PRs) and 95% confidence intervals (CI) were calculated to examine the association between perceived neighborhood safety and the prevalence of overweight status controlling for relevant covariates and school site. Results: More than one-third (35.6%) of students said they always felt safe in their neighborhood, 43.9% said they sometimes felt safe, 11.6% rarely felt safe, and 8.9% never felt safe. Those students who reported that they rarely or never feel safe in their neighborhoods were more likely than those who said they always or sometimes feel safe to believe that gang violence was a serious problem in their neighborhood or school (68.0% vs. 44.1%, p p = 0.025. In the fully adjusted model (including grade and school) stratified by race/ethnicity, we found a statistically significant association between feeling unsafe in one's own neighborhood and overweight status among

  13. Fork-join and data-driven execution models on multi-core architectures: Case study of the FMM

    KAUST Repository

    Amer, Abdelhalim

    2013-01-01

    Extracting maximum performance of multi-core architectures is a difficult task primarily due to bandwidth limitations of the memory subsystem and its complex hierarchy. In this work, we study the implications of fork-join and data-driven execution models on this type of architecture at the level of task parallelism. For this purpose, we use a highly optimized fork-join based implementation of the FMM and extend it to a data-driven implementation using a distributed task scheduling approach. This study exposes some limitations of the conventional fork-join implementation in terms of synchronization overheads. We find that these are not negligible and their elimination by the data-driven method, with a careful data locality strategy, was beneficial. Experimental evaluation of both methods on state-of-the-art multi-socket multi-core architectures showed up to 22% speed-ups of the data-driven approach compared to the original method. We demonstrate that a data-driven execution of FMM not only improves performance by avoiding global synchronization overheads but also reduces the memory-bandwidth pressure caused by memory-intensive computations. © 2013 Springer-Verlag.

  14. Tracking Invasive Alien Species (TrIAS): Building a data-driven framework to inform policy

    Directory of Open Access Journals (Sweden)

    Sonia Vanderhoeven

    2017-05-01

    Imagine a future where dynamically, from year to year, we can track the progression of alien species (AS), identify emerging problem species, assess their current and future risk and inform policy in a timely manner through a seamless data-driven workflow, one that is built on open science and open data infrastructures. By using international biodiversity standards and facilities, we would ensure interoperability, repeatability and sustainability. This would make the process adaptable to future requirements in an evolving AS policy landscape both locally and internationally. In recent years, Belgium has developed decision support tools to inform invasive alien species (IAS) policy, including information systems, early warning initiatives and risk assessment protocols. However, the current workflows from biodiversity observations to IAS science and policy are slow, not easily repeatable, and their scope is often taxonomically, spatially and temporally limited. This is mainly caused by the diversity of actors involved and the closed, fragmented nature of the sources of these biodiversity data, which leads to considerable knowledge gaps for IAS research and policy. We will leverage expertise and knowledge from nine former and current BELSPO projects and initiatives: Alien Alert, Invaxen, Diars, INPLANBEL, Alien Impact, Ensis, CORDEX.be, Speedy and the Belgian Biodiversity Platform. The project will be built on two components: 1) the establishment of a data mobilization framework for AS data from diverse data sources and 2) the development of data-driven procedures for risk evaluation based on risk modelling, risk mapping and risk assessment. We will use facilities from the Global Biodiversity Information Facility (GBIF), standards from the Biodiversity Information Standards organization (TDWG) and expertise from Lifewatch to create and facilitate a systematic workflow. Alien species data will be gathered from a large set of regional, national and international

  15. Neighborhood Context and Immigrant Young Children's Development

    Science.gov (United States)

    Leventhal, Tama; Shuey, Elizabeth A.

    2014-01-01

    This study explored how neighborhood social processes and resources, relevant to immigrant families and immigrant neighborhoods, contribute to young children's behavioral functioning and achievement across diverse racial/ethnic groups. Data were drawn from the Project on Human Development in Chicago Neighborhoods, a neighborhood-based,…

  16. Towards Data-Driven Simulations of Wildfire Spread using Ensemble-based Data Assimilation

    Science.gov (United States)

    Rochoux, M. C.; Bart, J.; Ricci, S. M.; Cuenot, B.; Trouvé, A.; Duchaine, F.; Morel, T.

    2012-12-01

    Real-time predictions of a propagating wildfire remain a challenging task because the problem involves both multi-physics and multi-scales. The propagation speed of wildfires, also called the rate of spread (ROS), is indeed determined by complex interactions between pyrolysis, combustion and flow dynamics, atmospheric dynamics occurring at vegetation, topographical and meteorological scales. Current operational fire spread models are mainly based on a semi-empirical parameterization of the ROS in terms of vegetation, topographical and meteorological properties. For the fire spread simulation to be predictive and compatible with operational applications, the uncertainty on the ROS model should be reduced. As recent progress made in remote sensing technology provides new ways to monitor the fire front position, a promising approach to overcome the difficulties found in wildfire spread simulations is to integrate fire modeling and fire sensing technologies using data assimilation (DA). For this purpose we have developed a prototype data-driven wildfire spread simulator in order to provide optimal estimates of poorly known model parameters [*]. The data-driven simulation capability is adapted for more realistic wildfire spread: it considers a regional-scale fire spread model that is informed by observations of the fire front location. An Ensemble Kalman Filter algorithm (EnKF) based on a parallel computing platform (OpenPALM) was implemented in order to perform a multi-parameter sequential estimation where wind magnitude and direction are in addition to vegetation properties (see attached figure). The EnKF algorithm shows its good ability to track a small-scale grassland fire experiment and ensures a good accounting for the sensitivity of the simulation outcomes to the control parameters. As a conclusion, it was shown that data assimilation is a promising approach to more accurately forecast time-varying wildfire spread conditions as new airborne-like observations of

  17. A non-linear dimension reduction methodology for generating data-driven stochastic input models

    Science.gov (United States)

    Ganapathysubramanian, Baskar; Zabaras, Nicholas

    2008-06-01

    Stochastic analysis of random heterogeneous media (polycrystalline materials, porous media, functionally graded materials) provides information of significance only if realistic input models of the topology and property variations are used. This paper proposes a framework to construct such input stochastic models for the topology and thermal diffusivity variations in heterogeneous media using a data-driven strategy. Given a set of microstructure realizations (input samples) generated from given statistical information about the medium topology, the framework constructs a reduced-order stochastic representation of the thermal diffusivity. This problem of constructing a low-dimensional stochastic representation of property variations is analogous to the problem of manifold learning and parametric fitting of hyper-surfaces encountered in image processing and psychology. Denote by M the set of microstructures that satisfy the given experimental statistics. A non-linear dimension reduction strategy is utilized to map M to a low-dimensional region, A. We first show that M is a compact manifold embedded in a high-dimensional input space R^n. An isometric mapping F from M to a low-dimensional, compact, connected set A ⊂ R^d (d ≪ n) is constructed. Given only a finite set of samples of the data, the methodology uses arguments from graph theory and differential geometry to construct the isometric transformation F: M → A. Asymptotic convergence of the representation of M by A is shown. This mapping F serves as an accurate, low-dimensional, data-driven representation of the property variations. The reduced-order model of the material topology and thermal diffusivity variations is subsequently used as an input in the solution of stochastic partial differential equations that describe the evolution of dependent variables. A sparse grid collocation strategy (Smolyak algorithm) is utilized to solve these stochastic equations efficiently. We showcase the methodology by constructing low

  18. Developing a Data Driven Process-Based Model for Remote Sensing of Ecosystem Production

    Science.gov (United States)

    Elmasri, B.; Rahman, A. F.

    2010-12-01

    Estimating ecosystem carbon fluxes at various spatial and temporal scales is essential for quantifying the global carbon cycle. Numerous models have been developed for this purpose using several environmental variables as well as vegetation indices derived from remotely sensed data. Here we present a data-driven modeling approach for gross primary production (GPP) that is based on a process-based model, BIOME-BGC. The proposed model was run using available remote sensing data and it does not depend on look-up tables. Furthermore, this approach combines the merits of both empirical and process models, and empirical models were used to estimate certain input variables such as light use efficiency (LUE). This was achieved by applying remotely sensed data to the mathematical equations that represent biophysical photosynthesis processes in the BIOME-BGC model. Moreover, a new spectral index for estimating maximum photosynthetic activity, the maximum photosynthetic rate index (MPRI), is also developed and presented here. This new index is based on the ratio between the near infrared and the green bands (ρ858.5/ρ555). The model was tested and validated against the MODIS GPP product and flux measurements from two eddy covariance flux towers located at Morgan Monroe State Forest (MMSF) in Indiana and Harvard Forest in Massachusetts. Satellite data acquired by the Advanced Microwave Scanning Radiometer (AMSR-E) and MODIS were used. The data-driven model showed a strong correlation between the predicted and measured GPP at the two eddy covariance flux tower sites. This methodology produced better predictions of GPP than did the MODIS GPP product. Moreover, the proportion of error in the predicted GPP for MMSF and Harvard Forest was dominated by unsystematic errors, suggesting that the results are unbiased. The analysis indicated that maintenance respiration is one of the main factors that dominate the overall model outcome errors and improvement in maintenance respiration estimation
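
    The band ratio described above can be illustrated with a minimal sketch. It assumes that ρ858.5 and ρ555 denote surface reflectances in the near-infrared and green bands (the subscripts are presumably band-centre wavelengths in nanometres); the function name `mpri` and the sample reflectance values are hypothetical and are not taken from the record.

```python
def mpri(nir_reflectance: float, green_reflectance: float) -> float:
    """Ratio of near-infrared to green reflectance (assumed form of the index)."""
    if green_reflectance <= 0.0:
        raise ValueError("green reflectance must be positive")
    return nir_reflectance / green_reflectance


# Hypothetical (NIR, green) reflectance pairs for three pixels.
pixels = [(0.40, 0.08), (0.30, 0.10), (0.22, 0.11)]
for nir, green in pixels:
    print(f"NIR={nir:.2f} green={green:.2f} MPRI={mpri(nir, green):.2f}")
```

    The sketch covers only the ratio itself; how the index feeds into the BIOME-BGC equations is not described in the record and is not attempted here.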

  19. Data-driven models of dominantly-inherited Alzheimer's disease progression.

    Science.gov (United States)

    Oxtoby, Neil P; Young, Alexandra L; Cash, David M; Benzinger, Tammie L S; Fagan, Anne M; Morris, John C; Bateman, Randall J; Fox, Nick C; Schott, Jonathan M; Alexander, Daniel C

    2018-03-22

    Dominantly-inherited Alzheimer's disease is widely hoped to hold the key to developing interventions for sporadic late onset Alzheimer's disease. We use emerging techniques in generative data-driven disease progression modelling to characterize dominantly-inherited Alzheimer's disease progression with unprecedented resolution, and without relying upon familial estimates of years until symptom onset. We retrospectively analysed biomarker data from the sixth data freeze of the Dominantly Inherited Alzheimer Network observational study, including measures of amyloid proteins and neurofibrillary tangles in the brain, regional brain volumes and cortical thicknesses, brain glucose hypometabolism, and cognitive performance from the Mini-Mental State Examination (all adjusted for age, years of education, sex, and head size, as appropriate). Data included 338 participants with known mutation status (211 mutation carriers in three subtypes: 163 PSEN1, 17 PSEN2, and 31 APP) and a baseline visit (age 19-66; up to four visits each, 1.1 ± 1.9 years in duration; spanning 30 years before, to 21 years after, parental age of symptom onset). We used an event-based model to estimate sequences of biomarker changes from baseline data across disease subtypes (mutation groups), and a differential equation model to estimate biomarker trajectories from longitudinal data (up to 66 mutation carriers, all subtypes combined). The two models concur that biomarker abnormality proceeds as follows: amyloid deposition in cortical then subcortical regions (∼24 ± 11 years before onset); phosphorylated tau (17 ± 8 years), tau and amyloid-β changes in cerebrospinal fluid; neurodegeneration first in the putamen and nucleus accumbens (up to 6 ± 2 years); then cognitive decline (7 ± 6 years), cerebral hypometabolism (4 ± 4 years), and further regional neurodegeneration. Our models predicted symptom onset more accurately than predictions that used familial estimates: root mean squared error of 1

  20. WIFIRE: A Scalable Data-Driven Monitoring, Dynamic Prediction and Resilience Cyberinfrastructure for Wildfires

    Science.gov (United States)

    Altintas, I.; Block, J.; Braun, H.; de Callafon, R. A.; Gollner, M. J.; Smarr, L.; Trouve, A.

    2013-12-01

    Recent studies confirm that climate change will cause wildfires to increase in frequency and severity in the coming decades especially for California and in much of the North American West. The most critical sustainability issue in the midst of these ever-changing dynamics is how to achieve a new social-ecological equilibrium of this fire ecology. Wildfire wind speeds and directions change in an instant, and first responders can only be effective when they take action as quickly as the conditions change. To deliver information needed for sustainable policy and management in this dynamically changing fire regime, we must capture these details to understand the environmental processes. We are building an end-to-end cyberinfrastructure (CI), called WIFIRE, for real-time and data-driven simulation, prediction and visualization of wildfire behavior. The WIFIRE integrated CI system supports social-ecological resilience to the changing fire ecology regime in the face of urban dynamics and climate change. Networked observations, e.g., heterogeneous satellite data and real-time remote sensor data, are integrated with computational techniques in signal processing, visualization, modeling and data assimilation to provide a scalable, technological, and educational solution to monitor weather patterns to predict a wildfire's Rate of Spread. Our collaborative WIFIRE team of scientists, engineers, technologists, government policy managers, private industry, and firefighters architects and implements CI pathways that enable joint innovation for wildfire management. Scientific workflows are used as an integrative distributed programming model and simplify the implementation of engineering modules for data-driven simulation, prediction and visualization while allowing integration with large-scale computing facilities. WIFIRE will be scalable to users with different skill-levels via specialized web interfaces and user-specified alerts for environmental events broadcasted to receivers before

  1. A non-linear dimension reduction methodology for generating data-driven stochastic input models

    International Nuclear Information System (INIS)

    Ganapathysubramanian, Baskar; Zabaras, Nicholas

    2008-01-01

    Stochastic analysis of random heterogeneous media (polycrystalline materials, porous media, functionally graded materials) provides information of significance only if realistic input models of the topology and property variations are used. This paper proposes a framework to construct such input stochastic models for the topology and thermal diffusivity variations in heterogeneous media using a data-driven strategy. Given a set of microstructure realizations (input samples) generated from given statistical information about the medium topology, the framework constructs a reduced-order stochastic representation of the thermal diffusivity. This problem of constructing a low-dimensional stochastic representation of property variations is analogous to the problem of manifold learning and parametric fitting of hyper-surfaces encountered in image processing and psychology. Denote by M the set of microstructures that satisfy the given experimental statistics. A non-linear dimension reduction strategy is utilized to map M to a low-dimensional region, A. We first show that M is a compact manifold embedded in a high-dimensional input space R^n. An isometric mapping F from M to a low-dimensional, compact, connected set A ⊂ R^d (d ≪ n) is constructed. Given only a finite set of samples of the data, the methodology uses arguments from graph theory and differential geometry to construct the isometric transformation F: M → A. Asymptotic convergence of the representation of M by A is shown. This mapping F serves as an accurate, low-dimensional, data-driven representation of the property variations. The reduced-order model of the material topology and thermal diffusivity variations is subsequently used as an input in the solution of stochastic partial differential equations that describe the evolution of dependent variables. A sparse grid collocation strategy (Smolyak algorithm) is utilized to solve these stochastic equations efficiently. We showcase the methodology

  2. Enabling Data-Driven Methodologies Across the Data Lifecycle and Ecosystem

    Science.gov (United States)

    Doyle, R. J.; Crichton, D.

    2017-12-01

    NASA has unlocked unprecedented scientific knowledge through exploration of the Earth, our solar system, and the larger universe. NASA is generating enormous amounts of data that are challenging traditional approaches to capturing, managing, analyzing and ultimately gaining scientific understanding from science data. New architectures, capabilities and methodologies are needed to span the entire observing system, from spacecraft to archive, while integrating data-driven discovery and analytic capabilities. NASA data have a definable lifecycle, from remote collection point to validated accessibility in multiple archives. Data challenges must be addressed across this lifecycle, to capture opportunities and avoid decisions that may limit or compromise what is achievable once data arrives at the archive. Data triage may be necessary when the collection capacity of the sensor or instrument overwhelms data transport or storage capacity. By migrating computational and analytic capability to the point of data collection, informed decisions can be made about which data to keep; in some cases, to close observational decision loops onboard, to enable attending to unexpected or transient phenomena. Along a different dimension than the data lifecycle, scientists and other end-users must work across an increasingly complex data ecosystem, where the range of relevant data is rarely owned by a single institution. To operate effectively, scalable data architectures and community-owned information models become essential. NASA's Planetary Data System is having success with this approach. Finally, there is the difficult challenge of reproducibility and trust. While data provenance techniques will be part of the solution, future interactive analytics environments must support an ability to provide a basis for a result: relevant data source and algorithms, uncertainty tracking, etc., to assure scientific integrity and to enable confident decision making. 
    Advances in data science offer

  3. Measuring physical neighborhood quality related to health.

    Science.gov (United States)

    Rollings, Kimberly A; Wells, Nancy M; Evans, Gary W

    2015-04-29

    Although sociodemographic factors are one aspect of understanding the effects of neighborhood environments on health, equating neighborhood quality with socioeconomic status ignores the important role of physical neighborhood attributes. Prior work on neighborhood environments and health has relied primarily on level of socioeconomic disadvantage as the indicator of neighborhood quality without attention to physical neighborhood quality. A small but increasing number of studies have assessed neighborhood physical characteristics. Findings generally indicate that there is an association between living in deprived neighborhoods and poor health outcomes, but rigorous evidence linking specific physical neighborhood attributes to particular health outcomes is lacking. This paper discusses the methodological challenges and limitations of measuring physical neighborhood environments relevant to health and concludes with proposed directions for future work.

  4. A Data-Driven Reliability Estimation Approach for Phased-Mission Systems

    Directory of Open Access Journals (Sweden)

    Hua-Feng He

    2014-01-01

    We attempt to address the issues associated with reliability estimation for phased-mission systems (PMS) and present a novel data-driven approach to reliability estimation for PMS using the condition monitoring information and degradation data of such systems under dynamic operating scenarios. In this sense, this paper differs from existing methods, which consider only the static scenario without using real-time information and aim to estimate the reliability for a population rather than for an individual. In the presented approach, to establish a linkage between the historical data and real-time information of the individual PMS, we adopt a stochastic filtering model to model the phase duration and obtain the updated estimate of the mission time by Bayes' law at each phase. Meanwhile, the lifetime of the PMS is estimated from degradation data, which are modeled by an adaptive Brownian motion. As such, the mission reliability can be obtained in real time through the estimated distribution of the mission time in conjunction with the estimated lifetime distribution. We demonstrate the usefulness of the developed approach via a numerical example.
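
    To make the link between a Brownian-motion degradation model and a lifetime distribution concrete, here is a minimal sketch. It is not the adaptive model of the record: it assumes a plain Brownian motion with constant drift, X(t) = drift·t + sigma·B(t), and defines failure as the first crossing of a fixed threshold, in which case the lifetime follows an inverse Gaussian distribution. All names and parameter values are hypothetical.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF, written with erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def reliability(t: float, drift: float, sigma: float, threshold: float) -> float:
    """P(the degradation path has not yet crossed `threshold` by time t).

    Closed form for the first-passage time of a drifted Brownian motion
    (inverse Gaussian lifetime); assumes threshold > 0 and sigma > 0.
    """
    if t <= 0.0:
        return 1.0
    root = sigma * math.sqrt(t)
    upper = std_normal_cdf((threshold - drift * t) / root)
    lower = math.exp(2.0 * drift * threshold / sigma**2) * std_normal_cdf(
        -(threshold + drift * t) / root
    )
    return upper - lower


# Hypothetical degradation parameters: unit drift, unit volatility, threshold 10.
for t in (1.0, 10.0, 20.0):
    print(f"t={t:5.1f}  R(t)={reliability(t, 1.0, 1.0, 10.0):.4f}")
```

    In the approach summarized above, a lifetime distribution of this kind would be combined with the estimated mission-time distribution; that combination step is not sketched here.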

  5. Data-driven classification of bipolar I disorder from longitudinal course of mood.

    Science.gov (United States)

    Cochran, A L; McInnis, M G; Forger, D B

    2016-10-11

    The Diagnostic and Statistical Manual of Mental Disorders (DSM) classification of bipolar disorder defines categories to reflect common understanding of mood symptoms rather than scientific evidence. This work aimed to determine whether bipolar I can be objectively classified from longitudinal mood data and whether resulting classes have clinical associations. Bayesian nonparametric hierarchical models with latent classes and patient-specific models of mood are fit to data from Longitudinal Interval Follow-up Evaluations (LIFE) of bipolar I patients (N=209). Classes are tested for clinical associations. No classes are justified using the time course of DSM-IV mood states. Three classes are justified using the course of subsyndromal mood symptoms. Classes differed in attempted suicides (P=0.017), disability status (P=0.012) and chronicity of affective symptoms (P=0.009). Thus, bipolar I disorder can be objectively classified from mood course, and individuals in the resulting classes share clinical features. Data-driven classification from mood course could be used to enrich sample populations for pharmacological and etiological studies.

  6. Data-driven automatic parking constrained control for four-wheeled mobile vehicles

    Directory of Open Access Journals (Sweden)

    Wenxu Yan

    2016-11-01

    In this article, a novel data-driven constrained control scheme is proposed for automatic parking systems. The design of the proposed scheme depends only on the steering angle and the orientation angle of the car, and it does not involve any model information of the car. Therefore, an automatic parking system based on the proposed scheme is applicable to different kinds of cars. In order to further reduce the tracking errors of the desired trajectory coordinates, a coordinate compensation algorithm is also proposed. In the design procedure of the controller, a novel dynamic anti-windup compensator is used to deal with the magnitude and rate saturations of the automatic parking control input. It is theoretically proven, based on Lyapunov stability analysis, that all the signals in the closed-loop system are uniformly ultimately bounded. Finally, a simulation comparison between the proposed scheme with coordinate compensation and a proportional-integral-derivative (PID) control algorithm is given. It is shown that the proposed scheme with coordinate compensation has smaller tracking errors and more rapid responses than the PID scheme.

  7. The Facilitation of a Sustainable Power System: A Practice from Data-Driven Enhanced Boiler Control

    Directory of Open Access Journals (Sweden)

    Zhenlong Wu

    2018-04-01

    An increasing penetration of renewable energy may bring significant challenges to a power system due to its inherent intermittency. To achieve a sustainable future for renewable energy, a conventional power plant is required to be able to change its power output rapidly for grid balancing. However, the rapid power change may result in the boiler operating in a dangerous manner. To this end, this paper aims to improve boiler control performance via a data-driven control strategy, namely Active Disturbance Rejection Control (ADRC). For practical implementation, a tuning method is developed for ADRC controller parameters to maximize its potential in controlling a boiler operating in different conditions. Based on a Monte Carlo simulation, a Probabilistic Robustness (PR) index is subsequently formulated to represent the controller’s sensitivity to the varying conditions. The stability region of the ADRC controller is depicted to provide the search space in which the optimal group of parameters is searched for based on the PR index. Illustrative simulations are performed to verify the efficacy of the proposed method. Finally, the proposed method is successfully applied in an experiment on a boiler’s secondary air control system. The results of the field application show that the proposed ADRC based on PR can ensure the expected control performance even though it works in a wider range of operating conditions. The field application depicts a promising future for the ADRC controller as an alternative solution in the power industry to integrate more renewable energy into the power grid.
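
    The record does not define the PR index, so the following is only a sketch of the general idea of a Monte Carlo robustness estimate: draw uncertain plant parameters at random and report the fraction of draws for which a performance specification is met. As a stand-in for the ADRC loop, it uses a proportional controller around a stable first-order plant, for which the unit-step steady-state error is 1 / (1 + controller_gain × plant_gain). The uncertainty range, the error specification and every name below are hypothetical.

```python
import random


def steady_state_error(plant_gain: float, controller_gain: float) -> float:
    """Unit-step steady-state error of a proportional loop around a stable first-order plant."""
    return 1.0 / (1.0 + controller_gain * plant_gain)


def probabilistic_robustness(
    controller_gain: float,
    n_samples: int = 10_000,
    max_error: float = 0.1,
    seed: int = 0,
) -> float:
    """Fraction of randomly drawn plants for which the error specification is met."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    passed = 0
    for _ in range(n_samples):
        plant_gain = rng.uniform(0.5, 2.0)  # hypothetical uncertainty range
        if steady_state_error(plant_gain, controller_gain) <= max_error:
            passed += 1
    return passed / n_samples


# Compare three candidate controller gains by their estimated robustness.
for gain in (2.0, 9.0, 20.0):
    print(f"controller gain {gain:4.1f}: PR estimate {probabilistic_robustness(gain):.3f}")
```

    A tuning procedure of the kind summarized above would search a set of stabilizing parameters for the candidate with the highest index; here the comparison is limited to three fixed gains.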

  8. Data-driven analysis of functional brain interactions during free listening to music and speech.

    Science.gov (United States)

    Fang, Jun; Hu, Xintao; Han, Junwei; Jiang, Xi; Zhu, Dajiang; Guo, Lei; Liu, Tianming

    2015-06-01

    Natural stimulus functional magnetic resonance imaging (N-fMRI), such as fMRI acquired when participants were watching video streams or listening to audio streams, has been increasingly used to investigate functional mechanisms of the human brain in recent years. One of the fundamental challenges in functional brain mapping based on N-fMRI is to model the brain's functional responses to continuous, naturalistic and dynamic natural stimuli. To address this challenge, in this paper we present a data-driven approach to exploring functional interactions in the human brain during free listening to music and speech streams. Specifically, we model the brain responses using N-fMRI by measuring the functional interactions on large-scale brain networks with intrinsically established structural correspondence, and perform music and speech classification tasks to guide the systematic identification of consistent and discriminative functional interactions when multiple subjects were listening to music and speech in multiple categories. The underlying premise is that the functional interactions derived from N-fMRI data of multiple subjects should exhibit both consistency and discriminability. Our experimental results show that a variety of brain systems including attention, memory, auditory/language, emotion, and action networks are among the most relevant brain systems involved in classical music, pop music and speech differentiation. Our study provides an alternative approach to investigating the human brain's mechanism in comprehension of complex natural music and speech.

  9. Forecasting success via early adoptions analysis: A data-driven study.

    Directory of Open Access Journals (Sweden)

    Giulio Rossetti

    Innovations are continuously launched over markets, such as new products over the retail market or new artists over the music scene. Some innovations become a success; others don't. Forecasting which innovations will succeed at the beginning of their lifecycle is hard. In this paper, we provide a data-driven, large-scale account of the existence of a special niche among early adopters, individuals that consistently tend to adopt successful innovations before they reach success: we will call them Hit-Savvy. Hit-Savvy can be discovered in very different markets and retain over time their ability to anticipate the success of innovations. As our second contribution, we devise a predictive analytical process, exploiting Hit-Savvy as signals, which achieves high accuracy in the early-stage prediction of successful innovations, far beyond the reach of state-of-the-art time series forecasting models. Indeed, our findings and predictive model can be fruitfully used to support marketing strategies and product placement.

  10. A Data-Driven Response Virtual Sensor Technique with Partial Vibration Measurements Using Convolutional Neural Network

    Science.gov (United States)

    Sun, Shan-Bin; He, Yuan-Yuan; Zhou, Si-Da; Yue, Zhen-Jiang

    2017-01-01

    Measurement of dynamic responses plays an important role in structural health monitoring, damage detection and other fields of research. However, in aerospace engineering, the physical sensors are limited in the operational conditions of spacecraft, due to the severe environment in outer space. This paper proposes a virtual sensor model with partial vibration measurements using a convolutional neural network. The transmissibility function is employed as prior knowledge. A four-layer neural network with two convolutional layers, one fully connected layer, and an output layer is proposed as the predicting model. Numerical examples of two different structural dynamic systems demonstrate the performance of the proposed approach. The excellence of the novel technique is further indicated using a simply supported beam experiment, compared with a modal-model-based virtual sensor, which uses modal parameters, such as mode shapes, for estimating the responses of the faulty sensors. The results show that the presented data-driven response virtual sensor technique can predict structural response with high accuracy. PMID:29231868

  11. Optimizing preventive maintenance policy: A data-driven application for a light rail braking system.

    Science.gov (United States)

    Corman, Francesco; Kraijema, Sander; Godjevac, Milinko; Lodewijks, Gabriel

    2017-10-01

    This article presents a case study determining the optimal preventive maintenance policy for a light rail rolling stock system in terms of reliability, availability, and maintenance costs. The maintenance policy defines one of the three predefined preventive maintenance actions at fixed time-based intervals for each of the subsystems of the braking system. Based on work, maintenance, and failure data, we model the reliability degradation of the system and its subsystems under the current maintenance policy by a Weibull distribution. We then analytically determine the relation between reliability, availability, and maintenance costs. We validate the model against recorded reliability and availability and get further insights by a dedicated sensitivity analysis. The model is then used in a sequential optimization framework determining preventive maintenance intervals to improve on the key performance indicators. We show the potential of data-driven modelling to determine optimal maintenance policy: same system availability and reliability can be achieved with 30% maintenance cost reduction, by prolonging the intervals and re-grouping maintenance actions.
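    The link between a Weibull reliability model and the choice of a preventive maintenance interval can be illustrated with a short sketch. It is not the article's model: it assumes the textbook age-replacement policy, in which a subsystem is renewed either at a fixed interval or at failure, and the function names, cost figures and candidate intervals are hypothetical.

```python
import math


def weibull_reliability(t, shape, scale):
    """Probability of surviving to age t under a Weibull model: exp(-(t/scale)**shape)."""
    return math.exp(-((t / scale) ** shape))


def mean_cycle_length(interval, shape, scale, steps=10_000):
    """Expected time between renewals: trapezoidal integral of R(t) over [0, interval]."""
    h = interval / steps
    total = 0.5 * (weibull_reliability(0.0, shape, scale)
                   + weibull_reliability(interval, shape, scale))
    for i in range(1, steps):
        total += weibull_reliability(i * h, shape, scale)
    return total * h


def cost_rate(interval, shape, scale, cost_preventive, cost_failure):
    """Expected maintenance cost per unit time (assumed age-replacement policy)."""
    r = weibull_reliability(interval, shape, scale)
    expected_cost = cost_preventive * r + cost_failure * (1.0 - r)
    return expected_cost / mean_cycle_length(interval, shape, scale)


def best_interval(candidates, shape, scale, cost_preventive, cost_failure):
    """Pick the candidate interval with the lowest expected cost rate."""
    return min(candidates,
               key=lambda t: cost_rate(t, shape, scale, cost_preventive, cost_failure))
```

    With a shape parameter of 1 the failure rate is constant, so preventive renewal brings no benefit and the longest candidate interval gives the lowest cost rate; a shape parameter above 1 (wear-out) is the case in which a finite optimal interval can exist.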

  12. Applying Data-driven Imaging Biomarker in Mammography for Breast Cancer Screening: Preliminary Study.

    Science.gov (United States)

    Kim, Eun-Kyung; Kim, Hyo-Eun; Han, Kyunghwa; Kang, Bong Joo; Sohn, Yu-Mee; Woo, Ok Hee; Lee, Chan Wha

    2018-02-09

    We assessed the feasibility of a data-driven imaging biomarker based on weakly supervised learning (DIB; an imaging biomarker derived from large-scale medical image data with deep learning technology) in mammography (DIB-MG). A total of 29,107 digital mammograms from five institutions (4,339 cancer cases and 24,768 normal cases) were included. After matching patients' age, breast density, and equipment, 1,238 and 1,238 cases were chosen as validation and test sets, respectively, and the remainder were used for training. The core algorithm of DIB-MG is a deep convolutional neural network, a deep learning algorithm specialized for images. Each sample (case) is an exam composed of 4-view images (RCC, RMLO, LCC, and LMLO). For each case in a training set, the cancer probability inferred from DIB-MG is compared with the per-case ground-truth label. Then the model parameters in DIB-MG are updated based on the error between the prediction and the ground-truth. At the operating point (threshold) of 0.5, sensitivity was 75.6% and 76.1% when specificity was 90.2% and 88.5%, and AUC was 0.903 and 0.906 for the validation and test sets, respectively. This research showed the potential of DIB-MG as a screening tool for breast cancer.
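    How sensitivity and specificity follow from an operating point can be shown with a minimal sketch. The scores and labels below are made up for illustration and are not the study's data; treating a score equal to the threshold as positive is also an assumption.

```python
def sensitivity_specificity(scores, labels, threshold=0.5):
    """Return (sensitivity, specificity) for per-case scores and 0/1 labels.

    A case is called positive when its score is >= threshold (assumed convention).
    """
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical example: two cancer cases and three normal cases.
example_scores = [0.9, 0.4, 0.6, 0.2, 0.1]
example_labels = [1, 1, 0, 0, 0]
sens, spec = sensitivity_specificity(example_scores, example_labels)
```

    Raising the threshold generally trades sensitivity for specificity, which is why both figures are reported at a stated operating point.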

  13. Using data-driven agent-based models for forecasting emerging infectious diseases

    Directory of Open Access Journals (Sweden)

    Srinivasan Venkatramanan

    2018-03-01

    Full Text Available Producing timely, well-informed and reliable forecasts for an ongoing epidemic of an emerging infectious disease is a huge challenge. Epidemiologists and policy makers have to deal with poor data quality, limited understanding of the disease dynamics, rapidly changing social environment and the uncertainty on effects of various interventions in place. Under this setting, detailed computational models provide a comprehensive framework for integrating diverse data sources into a well-defined model of disease dynamics and social behavior, potentially leading to better understanding and actions. In this paper, we describe one such agent-based model framework developed for forecasting the 2014–2015 Ebola epidemic in Liberia, and subsequently used during the Ebola forecasting challenge. We describe the various components of the model, the calibration process and summarize the forecast performance across scenarios of the challenge. We conclude by highlighting how such a data-driven approach can be refined and adapted for future epidemics, and share the lessons learned over the course of the challenge. Keywords: Emerging infectious diseases, Agent-based models, Simulation optimization, Bayesian calibration, Ebola

  14. Effective Data-Driven Calibration for a Galvanometric Laser Scanning System Using Binocular Stereo Vision.

    Science.gov (United States)

    Tu, Junchao; Zhang, Liyan

    2018-01-12

    A new solution to the problem of galvanometric laser scanning (GLS) system calibration is presented. Under the machine learning framework, we build a single-hidden layer feedforward neural network (SLFN) to represent the GLS system, which takes the digital control signal at the drives of the GLS system as input and the space vector of the corresponding outgoing laser beam as output. The training data set is obtained with the aid of a moving mechanism and a binocular stereo system. The parameters of the SLFN are efficiently solved in a closed form by using extreme learning machine (ELM). By quantitatively analyzing the regression precision with respect to the number of hidden neurons in the SLFN, we demonstrate that the proper number of hidden neurons can be safely chosen from a broad interval to guarantee good generalization performance. Compared to the traditional model-driven calibration, the proposed calibration method does not need a complex modeling process and is more accurate and stable. As the output of the network is the space vectors of the outgoing laser beams, it costs much less training time and can provide a uniform solution to both laser projection and 3D-reconstruction, in contrast with the existing data-driven calibration method which only works for the laser triangulation problem. Calibration experiment, projection experiment and 3D reconstruction experiment are respectively conducted to test the proposed method, and good results are obtained.
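    The closed-form solution mentioned above can be written out in the standard ELM notation. The symbols are assumptions, since the abstract does not define any: x_i are the N control-signal inputs, T stacks the corresponding target beam vectors row by row, g is the hidden activation, and the hidden weights w_j and biases b_j of the L hidden neurons are drawn at random and then kept fixed.

```latex
H_{ij} = g\!\left(\mathbf{w}_j^{\top}\mathbf{x}_i + b_j\right),
\qquad i = 1,\dots,N,\quad j = 1,\dots,L,
\\[4pt]
\hat{\beta} \;=\; \arg\min_{\beta}\,\lVert H\beta - T\rVert_F^{2} \;=\; H^{\dagger}\,T
```

    Here H^† is the Moore-Penrose pseudoinverse of the hidden-layer output matrix, so only the output weights β are fitted, by a single linear least-squares solve rather than by iterative training.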

  15. Effective Data-Driven Calibration for a Galvanometric Laser Scanning System Using Binocular Stereo Vision

    Directory of Open Access Journals (Sweden)

    Junchao Tu

    2018-01-01

    Full Text Available A new solution to the problem of galvanometric laser scanning (GLS) system calibration is presented. Under the machine learning framework, we build a single-hidden layer feedforward neural network (SLFN) to represent the GLS system, which takes the digital control signal at the drives of the GLS system as input and the space vector of the corresponding outgoing laser beam as output. The training data set is obtained with the aid of a moving mechanism and a binocular stereo system. The parameters of the SLFN are efficiently solved in a closed form by using extreme learning machine (ELM). By quantitatively analyzing the regression precision with respect to the number of hidden neurons in the SLFN, we demonstrate that the proper number of hidden neurons can be safely chosen from a broad interval to guarantee good generalization performance. Compared to the traditional model-driven calibration, the proposed calibration method does not need a complex modeling process and is more accurate and stable. As the output of the network is the space vectors of the outgoing laser beams, it costs much less training time and can provide a uniform solution to both laser projection and 3D-reconstruction, in contrast with the existing data-driven calibration method which only works for the laser triangulation problem. Calibration experiment, projection experiment and 3D reconstruction experiment are respectively conducted to test the proposed method, and good results are obtained.

  16. A priori data-driven multi-clustered reservoir generation algorithm for echo state network.

    Directory of Open Access Journals (Sweden)

    Xiumin Li

    Full Text Available Echo state networks (ESNs) with multi-clustered reservoir topology perform better in reservoir computing and robustness than those with random reservoir topology. However, these ESNs have a complex reservoir topology, which leads to difficulties in reservoir generation. This study focuses on the reservoir generation problem when an ESN is used in environments with sufficient a priori data available. Accordingly, an a priori data-driven multi-cluster reservoir generation algorithm is proposed. The a priori data in the proposed algorithm are used to evaluate reservoirs by calculating the precision and standard deviation of ESNs. The reservoirs are produced using the clustering method; only the reservoir with a better evaluation performance takes the place of a previous one. The final reservoir is obtained when its evaluation score reaches the preset requirement. The prediction experiment results obtained using the Mackey-Glass chaotic time series show that the proposed reservoir generation algorithm provides ESNs with extra prediction precision and increases the structure complexity of the network. Further experiments also reveal the appropriate values of the number of clusters and time window size to obtain optimal performance. The information entropy of the reservoir reaches the maximum when the ESN gains the greatest precision.

  17. Data-free and data-driven spectral perturbations for RANS UQ

    Science.gov (United States)

    Edeling, Wouter; Mishra, Aashwin; Iaccarino, Gianluca

    2017-11-01

    Despite recent developments in high-fidelity turbulent flow simulations, RANS modeling is still vastly used by industry, due to its inherent low cost. Since accuracy is a concern in RANS modeling, model-form UQ is an essential tool for assessing the impacts of this uncertainty on quantities of interest. Applying the spectral decomposition to the modeled Reynolds-Stress Tensor (RST) allows for the introduction of decoupled perturbations into the baseline intensity (kinetic energy), shape (eigenvalues), and orientation (eigenvectors). This constitutes a natural methodology to evaluate the model form uncertainty associated to different aspects of RST modeling. In a predictive setting, one frequently encounters an absence of any relevant reference data. To make data-free predictions with quantified uncertainty we employ physical bounds to a-priori define maximum spectral perturbations. When propagated, these perturbations yield intervals of engineering utility. High-fidelity data opens up the possibility of inferring a distribution of uncertainty, by means of various data-driven machine-learning techniques. We will demonstrate our framework on a number of flow problems where RANS models are prone to failure. This research was partially supported by the Defense Advanced Research Projects Agency under the Enabling Quantification of Uncertainty in Physical Systems (EQUiPS) project (technical monitor: Dr Fariba Fahroo), and the DOE PSAAP-II program.

  18. BMI cyberworkstation: enabling dynamic data-driven brain-machine interface research through cyberinfrastructure.

    Science.gov (United States)

    Zhao, Ming; Rattanatamrong, Prapaporn; DiGiovanna, Jack; Mahmoudi, Babak; Figueiredo, Renato J; Sanchez, Justin C; Príncipe, José C; Fortes, José A B

    2008-01-01

    Dynamic data-driven brain-machine interfaces (DDDBMI) have great potential to advance the understanding of neural systems and improve the design of brain-inspired rehabilitative systems. This paper presents a novel cyberinfrastructure that couples in vivo neurophysiology experimentation with massive computational resources to provide seamless and efficient support of DDDBMI research. Closed-loop experiments can be conducted with in vivo data acquisition, reliable network transfer, parallel model computation, and real-time robot control. Behavioral experiments with live animals are supported with real-time guarantees. Offline studies can be performed with various configurations for extensive analysis and training. A Web-based portal is also provided to allow users to conveniently interact with the cyberinfrastructure, conducting both experimentation and analysis. New motor control models are developed based on this approach, which include recursive least square based (RLS) and reinforcement learning based (RLBMI) algorithms. The results from an online RLBMI experiment show that the cyberinfrastructure can successfully support DDDBMI experiments and meet the desired real-time requirements.

  19. Data-driven modelling of structured populations: a practical guide to the integral projection model

    CERN Document Server

    Ellner, Stephen P; Rees, Mark

    2016-01-01

    This book is a “How To” guide for modeling population dynamics using Integral Projection Models (IPM) starting from observational data. It is written by a leading research team in this area and includes code in the R language (in the text and online) to carry out all computations. The intended audience are ecologists, evolutionary biologists, and mathematical biologists interested in developing data-driven models for animal and plant populations. IPMs may seem hard as they involve integrals. The aim of this book is to demystify IPMs, so they become the model of choice for populations structured by size or other continuously varying traits. The book uses real examples of increasing complexity to show how the life-cycle of the study organism naturally leads to the appropriate statistical analysis, which leads directly to the IPM itself. A wide range of model types and analyses are presented, including model construction, computational methods, and the underlying theory, with the more technical material in B...

  20. Lessons learned from a data-driven college access program: The National College Advising Corps.

    Science.gov (United States)

    Horng, Eileen L; Evans, Brent J; Antonio, Anthony L; Foster, Jesse D; Kalamkarian, Hoori S; Hurd, Nicole F; Bettinger, Eric P

    2013-01-01

    This chapter discusses the collaboration between a national college access program, the National College Advising Corps (NCAC), and its research and evaluation team at Stanford University. NCAC is currently active in almost four hundred high schools and through the placement of a recent college graduate to serve as a college adviser provides necessary information and support for students who may find it difficult to navigate the complex college admission process. The advisers also conduct outreach to underclassmen in an effort to improve the school-wide college-going culture. Analyses include examination of both quantitative and qualitative data from numerous sources and partners with every level of the organization from the national office to individual high schools. The authors discuss balancing the pursuit of evaluation goals with academic scholarship. In an effort to benefit other programs seeking to form successful data-driven interventions, the authors provide explicit examples of the partnership and present several examples of how the program has benefited from the data gathered by the evaluation team. © WILEY PERIODICALS, INC.

  1. A data-driven approach for evaluating multi-modal therapy in traumatic brain injury.

    Science.gov (United States)

    Haefeli, Jenny; Ferguson, Adam R; Bingham, Deborah; Orr, Adrienne; Won, Seok Joon; Lam, Tina I; Shi, Jian; Hawley, Sarah; Liu, Jialing; Swanson, Raymond A; Massa, Stephen M

    2017-02-16

    Combination therapies targeting multiple recovery mechanisms have the potential for additive or synergistic effects, but experimental design and analyses of multimodal therapeutic trials are challenging. To address this problem, we developed a data-driven approach to integrate and analyze raw source data from separate pre-clinical studies and evaluated interactions between four treatments following traumatic brain injury. Histologic and behavioral outcomes were measured in 202 rats treated with combinations of an anti-inflammatory agent (minocycline), a neurotrophic agent (LM11A-31), and physical therapy consisting of assisted exercise with or without botulinum toxin-induced limb constraint. Data was curated and analyzed in a linked workflow involving non-linear principal component analysis followed by hypothesis testing with a linear mixed model. Results revealed significant benefits of the neurotrophic agent LM11A-31 on learning and memory outcomes after traumatic brain injury. In addition, modulations of LM11A-31 effects by co-administration of minocycline and by the type of physical therapy applied reached statistical significance. These results suggest a combinatorial effect of drug and physical therapy interventions that was not evident by univariate analysis. The study designs and analytic techniques applied here form a structured, unbiased, internally validated workflow that may be applied to other combinatorial studies, both in animals and humans.

  2. Forecasting success via early adoptions analysis: A data-driven study.

    Science.gov (United States)

    Rossetti, Giulio; Milli, Letizia; Giannotti, Fosca; Pedreschi, Dino

    2017-01-01

    Innovations are continuously launched over markets, such as new products over the retail market or new artists over the music scene. Some innovations become a success; others don't. Forecasting which innovations will succeed at the beginning of their lifecycle is hard. In this paper, we provide a data-driven, large-scale account of the existence of a special niche among early adopters, individuals that consistently tend to adopt successful innovations before they reach success: we will call them Hit-Savvy. Hit-Savvy can be discovered in very different markets and retain over time their ability to anticipate the success of innovations. As our second contribution, we devise a predictive analytical process, exploiting Hit-Savvy as signals, which achieves high accuracy in the early-stage prediction of successful innovations, far beyond the reach of state-of-the-art time series forecasting models. Indeed, our findings and predictive model can be fruitfully used to support marketing strategies and product placement.

  3. Data-Driven Astrochemistry: One Step Further within the Origin of Life Puzzle.

    Science.gov (United States)

    Ruf, Alexander; d'Hendecourt, Louis L S; Schmitt-Kopplin, Philippe

    2018-06-01

    Astrochemistry, meteoritics and chemical analytics represent a manifold scientific field, including various disciplines. In this review, clarifications on astrochemistry, comet chemistry, laboratory astrophysics and meteoritic research with respect to organic and metalorganic chemistry will be given. The seemingly large number of observed astrochemical molecules necessarily requires explanations on molecular complexity and chemical evolution, which will be discussed. Special emphasis should be placed on data-driven analytical methods including ultrahigh-resolving instruments and their interplay with quantum chemical computations. These methods enable remarkable insights into the complex chemical spaces that exist in meteorites and maximize the level of information on the huge astrochemical molecular diversity. In addition, they allow one to study even yet undescribed chemistry as the one involving organomagnesium compounds in meteorites. Both targeted and non-targeted analytical strategies will be explained and may touch upon epistemological problems. In addition, implications of (metal)organic matter toward prebiotic chemistry leading to the emergence of life will be discussed. The precise description of astrochemical organic and metalorganic matter as seeds for life and their interactions within various astrophysical environments may appear essential to further study questions regarding the emergence of life on a most fundamental level that is within the molecular world and its self-organization properties.

  4. Big Data-Driven Based Real-Time Traffic Flow State Identification and Prediction

    Directory of Open Access Journals (Sweden)

    Hua-pu Lu

    2015-01-01

    Full Text Available With the rapid development of urban informatization, the era of big data is coming. To satisfy the demand of traffic congestion early warning, this paper studies the method of real-time traffic flow state identification and prediction based on big data-driven theory. Traffic big data holds several characteristics, such as temporal correlation, spatial correlation, historical correlation, and multistate. Traffic flow state quantification, the basis of traffic flow state identification, is achieved by a SAGA-FCM (simulated annealing genetic algorithm based fuzzy c-means) based traffic clustering model. Considering simple calculation and predictive accuracy, a bilevel optimization model for regional traffic flow correlation analysis is established to predict traffic flow parameters based on temporal-spatial-historical correlation. A two-stage model for correction coefficients optimization is put forward to simplify the bilevel optimization model. The first stage model is built to calculate the number of temporal-spatial-historical correlation variables. The second stage model is presented to calculate the basic model formulation of regional traffic flow correlation. A case study based on a real-world road network in Beijing, China, is implemented to test the efficiency and applicability of the proposed modeling and computing methods.

  5. A data-driven predictive approach for drug delivery using machine learning techniques.

    Directory of Open Access Journals (Sweden)

    Yuanyuan Li

    Full Text Available In drug delivery, there is often a trade-off between effective killing of the pathogen, and harmful side effects associated with the treatment. Due to the difficulty in testing every dosing scenario experimentally, a computational approach will be helpful to assist with the prediction of effective drug delivery methods. In this paper, we have developed a data-driven predictive system, using machine learning techniques, to determine, in silico, the effectiveness of drug dosing. The system framework is scalable, autonomous, robust, and has the ability to predict the effectiveness of the current drug treatment and the subsequent drug-pathogen dynamics. The system consists of a dynamic model incorporating both the drug concentration and pathogen population into distinct states. These states are then analyzed using a temporal model to describe the drug-cell interactions over time. The dynamic drug-cell interactions are learned in an adaptive fashion and used to make sequential predictions on the effectiveness of the dosing strategy. Incorporated into the system is the ability to adjust the sensitivity and specificity of the learned models based on a threshold level determined by the operator for the specific application. As a proof-of-concept, the system was validated experimentally using the pathogen Giardia lamblia and the drug metronidazole in vitro.

  6. Data-Driven Baseline Estimation of Residential Buildings for Demand Response

    Directory of Open Access Journals (Sweden)

    Saehong Park

    2015-09-01

    Full Text Available The advent of advanced metering infrastructure (AMI) generates a large volume of data related to energy service. This paper exploits a data mining approach for customer baseline load (CBL) estimation in demand response (DR) management. CBL plays a significant role in the measurement and verification process, which quantifies the amount of demand reduction and authenticates the performance. The proposed data-driven baseline modeling is based on the unsupervised learning technique. Specifically, we leverage both the self-organizing map (SOM) and K-means clustering for accurate estimation. This two-level approach efficiently reduces the large data set into representative weight vectors in SOM, and then these weight vectors are clustered by K-means clustering to find the load pattern that would be similar to the potential load pattern of the DR event day. To verify the proposed method, we conduct nationwide scale experiments where three major cities’ residential consumption is monitored by smart meters. Our evaluation compares the proposed solution with the various types of day matching techniques, showing that our approach outperforms the existing methods by up to a 68.5% lower error rate.
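    The second level of the two-level approach, clustering representative vectors with K-means, can be sketched as follows. This is only an illustration under stated assumptions: the SOM stage is omitted, the input vectors stand in for SOM weight vectors, the initial centroids are supplied by the caller, and all names and numbers are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, iterations=20):
    """Plain Lloyd's algorithm; returns (centroids, assignments)."""
    centroids = [list(c) for c in initial_centroids]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        assignments = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments


# Hypothetical two-value load profiles (e.g. morning and evening kWh).
profiles = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [4.8, 5.2]]
centers, groups = kmeans(profiles, initial_centroids=[[0.0, 0.0], [6.0, 6.0]])
```

    In a baseline setting, the centroid of the cluster closest to the conditions of the event day would then serve as the estimated load pattern; how that match is made is not specified in the abstract.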

  7. Data-driven model-independent searches for long-lived particles at the LHC

    Science.gov (United States)

    Coccaro, Andrea; Curtin, David; Lubatti, H. J.; Russell, Heather; Shelton, Jessie

    2016-12-01

    Neutral long-lived particles (LLPs) are highly motivated by many beyond the Standard Model scenarios, such as theories of supersymmetry, baryogenesis, and neutral naturalness, and present both tremendous discovery opportunities and experimental challenges for the LHC. A major bottleneck for current LLP searches is the prediction of Standard Model backgrounds, which are often impossible to simulate accurately. In this paper, we propose a general strategy for obtaining differential, data-driven background estimates in LLP searches, thereby notably extending the range of LLP masses and lifetimes that can be discovered at the LHC. We focus on LLPs decaying in the ATLAS muon system, where triggers providing both signal and control samples are available at LHC run 2. While many existing searches require two displaced decays, a detailed knowledge of backgrounds will allow for very inclusive searches that require just one detected LLP decay. As we demonstrate for the h → XX signal model of LLP pair production in exotic Higgs decays, this results in dramatic sensitivity improvements for proper lifetimes ≳ 10 m. In theories of neutral naturalness, this extends reach to glueball masses far below the b b̄ threshold. Our strategy readily generalizes to other signal models and other detector subsystems. This framework therefore lends itself to the development of a systematic, model-independent LLP search program, in analogy to the highly successful simplified-model framework of prompt searches.

  8. Clinical review: optimizing enteral nutrition for critically ill patients - a simple data-driven formula

    Science.gov (United States)

    2011-01-01

    In modern critical care, the paradigm of 'therapeutic nutrition' is replacing traditional 'supportive nutrition'. Standard enteral formulas meet basic macro- and micronutrient needs; therapeutic enteral formulas meet these basic needs and also contain specific pharmaconutrients that may attenuate hyperinflammatory responses, enhance the immune responses to infection, or improve gastrointestinal tolerance. Choosing the right enteral feeding formula may positively affect a patient's outcome; targeted use of therapeutic formulas can reduce the incidence of infectious complications, shorten lengths of stay in the ICU and in the hospital, and lower risk for mortality. In this paper, we review principles of how to feed (enteral, parenteral, or both) and when to feed (early versus delayed start) patients who are critically ill. We discuss what to feed these patients in the context of specific pharmaconutrients in specialized feeding formulations, that is, arginine, glutamine, antioxidants, certain ω-3 and ω-6 fatty acids, hydrolyzed proteins, and medium-chain triglycerides. We summarize current expert guidelines for nutrition in patients with critical illness, and we present specific clinical evidence on the use of enteral formulas supplemented with anti-inflammatory or immune-modulating nutrients, and gastrointestinal tolerance-promoting nutritional formulas. Finally, we introduce an algorithm to help bedside clinicians make data-driven feeding decisions for patients with critical illness. PMID:22136305

  9. A Novel Online Data-Driven Algorithm for Detecting UAV Navigation Sensor Faults.

    Science.gov (United States)

    Sun, Rui; Cheng, Qi; Wang, Guanyu; Ochieng, Washington Yotto

    2017-09-29

    The use of Unmanned Aerial Vehicles (UAVs) has increased significantly in recent years. On-board integrated navigation sensors are a key component of UAVs' flight control systems and are essential for flight safety. In order to ensure flight safety, timely and effective navigation sensor fault detection capability is required. In this paper, a novel data-driven Adaptive Neuron Fuzzy Inference System (ANFIS)-based approach is presented for the detection of on-board navigation sensor faults in UAVs. Contrary to the classic UAV sensor fault detection algorithms, based on predefined or modelled faults, the proposed algorithm combines an online data training mechanism with the ANFIS-based decision system. The main advantages of this algorithm are that it allows real-time model-free residual analysis from Kalman Filter (KF) estimates and the ANFIS to build a reliable fault detection system. In addition, it allows fast and accurate detection of faults, which makes it suitable for real-time applications. Experimental results have demonstrated the effectiveness of the proposed fault detection method in terms of accuracy and misdetection rate.

  10. A Data-Driven Response Virtual Sensor Technique with Partial Vibration Measurements Using Convolutional Neural Network.

    Science.gov (United States)

    Sun, Shan-Bin; He, Yuan-Yuan; Zhou, Si-Da; Yue, Zhen-Jiang

    2017-12-12

    Measurement of dynamic responses plays an important role in structural health monitoring, damage detection and other fields of research. However, in aerospace engineering, the physical sensors are limited in the operational conditions of spacecraft, due to the severe environment in outer space. This paper proposes a virtual sensor model with partial vibration measurements using a convolutional neural network. The transmissibility function is employed as prior knowledge. A four-layer neural network with two convolutional layers, one fully connected layer, and an output layer is proposed as the predicting model. Numerical examples of two different structural dynamic systems demonstrate the performance of the proposed approach. The excellence of the novel technique is further indicated using a simply supported beam experiment, compared with a modal-model-based virtual sensor, which uses modal parameters, such as mode shapes, for estimating the responses of the faulty sensors. The results show that the presented data-driven response virtual sensor technique can predict structural response with high accuracy.

  11. Data-Driven Handover Optimization in Next Generation Mobile Communication Networks

    Directory of Open Access Journals (Sweden)

    Po-Chiang Lin

    2016-01-01

    Full Text Available Network densification is regarded as one of the important ingredients to increase capacity for next generation mobile communication networks. However, it also leads to mobility problems since users are more likely to hand over to another cell in dense or even ultradense mobile communication networks. Therefore, supporting seamless and robust connectivity through such networks becomes a very important issue. In this paper, we investigate handover (HO) optimization in next generation mobile communication networks. We propose a data-driven handover optimization (DHO) approach, which aims to mitigate mobility problems including too-late HO, too-early HO, HO to wrong cell, ping-pong HO, and unnecessary HO. The key performance indicator (KPI) is defined as the weighted average of the ratios of these mobility problems. The DHO approach collects data from the mobile communication measurement results and provides a model to estimate the relationship between the KPI and features from the collected dataset. Based on the model, the handover parameters, including the handover margin and time-to-trigger, are optimized to minimize the KPI. Simulation results show that the proposed DHO approach could effectively mitigate mobility problems.
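    A KPI defined as a weighted average of mobility-problem ratios can be sketched in a few lines. The abstract gives neither the weights nor the exact definition of each ratio, so the sketch assumes each ratio is the count of one problem type divided by the total number of handovers; the key names, counts and parameter settings are hypothetical.

```python
PROBLEM_TYPES = ("too_late", "too_early", "wrong_cell", "ping_pong", "unnecessary")


def handover_kpi(problem_counts, total_handovers, weights):
    """Weighted average of per-problem ratios (count / total handovers)."""
    if total_handovers <= 0:
        raise ValueError("total_handovers must be positive")
    total_weight = sum(weights[p] for p in PROBLEM_TYPES)
    weighted_sum = sum(
        weights[p] * problem_counts[p] / total_handovers for p in PROBLEM_TYPES
    )
    return weighted_sum / total_weight


def best_setting(kpi_by_setting):
    """Return the (handover margin, time-to-trigger) pair with the lowest KPI."""
    return min(kpi_by_setting, key=kpi_by_setting.get)


# Hypothetical counts observed over 100 handovers, with equal weights.
counts = {"too_late": 5, "too_early": 3, "wrong_cell": 2, "ping_pong": 6, "unnecessary": 4}
equal_weights = {p: 1.0 for p in PROBLEM_TYPES}
kpi = handover_kpi(counts, 100, equal_weights)

# Hypothetical KPI values for three (margin in dB, time-to-trigger in ms) settings.
chosen = best_setting({(2, 160): 0.05, (3, 320): 0.03, (4, 480): 0.04})
```

    In the approach described above the KPI for each setting comes from a fitted model rather than a lookup table; the dictionary here only stands in for that model's predictions.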

  12. Design of a data-driven predictive controller for start-up process of AMT vehicles.

    Science.gov (United States)

    Lu, Xiaohui; Chen, Hong; Wang, Ping; Gao, Bingzhao

    2011-12-01

    In this paper, a data-driven predictive controller is designed for the start-up process of vehicles with automated manual transmissions (AMTs). It is obtained directly from the input-output data of a driveline simulation model constructed with the commercial software AMESim. In order to obtain offset-free control for the reference input, the predictor equation is derived with incremental inputs and outputs. Because of the physical characteristics, the input and output constraints are considered explicitly in the problem formulation. The conflicting requirements of lower friction losses and reduced driveline shock are included in the objective function. The designed controller is tested under nominal conditions and changed conditions. The simulation results show that, during the start-up process, the AMT clutch with the proposed controller works very well, and the process meets the control objectives: fast clutch lockup time, small friction losses, and the preservation of driver comfort, i.e., smooth acceleration of the vehicle. At the same time, the closed-loop system has the ability to reject uncertainties, such as the vehicle mass and road grade.

  13. Cloudweaver: Adaptive and Data-Driven Workload Manager for Generic Clouds

    Science.gov (United States)

    Li, Rui; Chen, Lei; Li, Wen-Syan

    Cloud computing denotes the latest trend in application development for parallel computing on massive data volumes. It relies on clouds of servers to handle tasks that used to be managed by an individual server. With cloud computing, software vendors can provide business intelligence and data analytic services for internet scale data sets. Many open source projects, such as Hadoop, offer various software components that are essential for building a cloud infrastructure. Current Hadoop (and many others) requires users to configure cloud infrastructures via programs and APIs, and such configuration is fixed during runtime. In this chapter, we propose a workload manager (WLM), called CloudWeaver, which provides automated configuration of a cloud infrastructure for runtime execution. The workload management is data-driven and can adapt to the dynamic nature of operator throughput during different execution phases. CloudWeaver works for a single job and for a workload consisting of multiple jobs running concurrently, and aims at maximum throughput using a minimum set of processors.

  14. VLAM-G: Interactive Data Driven Workflow Engine for Grid-Enabled Resources

    Directory of Open Access Journals (Sweden)

    Vladimir Korkhov

    2007-01-01

    Grid brings the power of many computers to scientists. However, the development of Grid-enabled applications requires knowledge about Grid infrastructure and low-level API to Grid services. In turn, workflow management systems provide a high-level environment for rapid prototyping of experimental computing systems. Coupling Grid and workflow paradigms is important for the scientific community: it makes the power of the Grid easily available to the end user. The paradigm of data driven workflow execution is one of the ways to enable distributed workflow on the Grid. The work presented in this paper is carried out in the context of the Virtual Laboratory for e-Science project. We present the VLAM-G workflow management system and its core component: the Run-Time System (RTS). The RTS is a dataflow driven workflow engine which utilizes Grid resources, hiding the complexity of the Grid from a scientist. Special attention is paid to the concept of dataflow and direct data streaming between distributed workflow components. We present the architecture and components of the RTS, describe the features of VLAM-G workflow execution, and evaluate the system by performance measurements and a real life use case.

  15. An asynchronous data-driven readout prototype for CEPC vertex detector

    Science.gov (United States)

    Yang, Ping; Sun, Xiangming; Huang, Guangming; Xiao, Le; Gao, Chaosong; Huang, Xing; Zhou, Wei; Ren, Weiping; Li, Yashu; Liu, Jianchao; You, Bihui; Zhang, Li

    2017-12-01

    The Circular Electron Positron Collider (CEPC) is proposed as a Higgs boson and/or Z boson factory for high-precision measurements on the Higgs boson. The precision of secondary vertex impact parameter plays an important role in such measurements which typically rely on flavor-tagging. Thus silicon CMOS Pixel Sensors (CPS) are the most promising technology candidate for a CEPC vertex detector, which can most likely feature a high position resolution, a low power consumption and a fast readout simultaneously. For the R&D of the CEPC vertex detector, we have developed a prototype MIC4 in the Towerjazz 180 nm CMOS Image Sensor (CIS) process. We have proposed and implemented a new architecture of asynchronous zero-suppression data-driven readout inside the matrix combined with a binary front-end inside the pixel. The matrix contains 128 rows and 64 columns with a small pixel pitch of 25 μm. The readout architecture has implemented the traditional OR-gate chain inside a super pixel combined with a priority arbiter tree between the super pixels, only reading out relevant pixels. The MIC4 architecture will be introduced in more detail in this paper. It will be taped out in May and will be characterized when the chip comes back.

  16. Dynamic model reduction using data-driven Loewner-framework applied to thermally morphing structures

    Science.gov (United States)

    Phoenix, Austin A.; Tarazaga, Pablo A.

    2017-05-01

    The work herein proposes the use of the data-driven Loewner-framework for reduced order modeling as applied to dynamic Finite Element Models (FEM) of thermally morphing structures. The Loewner-based modeling approach is computationally efficient and accurately constructs reduced models using analytical output data from a FEM. This paper details the two-step process proposed in the Loewner approach. First, a random vibration FEM simulation is used as the input for the development of a Single Input Single Output (SISO) data-based dynamic Loewner state space model. Second, an SVD-based truncation is used on the Loewner state space model, such that the minimal, dynamically representative, state space model is achieved. For this second part, varying levels of reduction are generated and compared. The work herein can be extended to model generation using experimental measurements by replacing the FEM output data in the first step and following the same procedure. This method will be demonstrated on two thermally morphing structures, a rigidly fixed hexapod in multiple geometric configurations and a low mass anisotropic morphing boom. This paper is working to detail the method and identify the benefits of the reduced model methodology.

  17. Outcomes from the GLEON fellowship program. Training graduate students in data driven network science.

    Science.gov (United States)

    Dugan, H.; Hanson, P. C.; Weathers, K. C.

    2016-12-01

    In the water sciences there is a massive need for graduate students who possess the analytical and technical skills to deal with large datasets and function in the new paradigm of open, collaborative science. The Global Lake Ecological Observatory Network (GLEON) graduate fellowship program (GFP) was developed as an interdisciplinary training program to supplement the intensive disciplinary training of traditional graduate education. The primary goal of the GFP was to train a diverse cohort of graduate students in network science, open-web technologies, collaboration, and data analytics, and importantly to provide the opportunity to use these skills to conduct collaborative research resulting in publishable scientific products. The GFP is run as a series of three week-long workshops over two years that brings together a cohort of twelve students. In addition, fellows are expected to attend and contribute to at least one international GLEON all-hands' meeting. Here, we provide examples of training modules in the GFP (model building, data QA/QC, information management, Bayesian modeling, open coding/version control, national data programs), the scientific outputs (manuscripts, software products, and new global datasets) produced by the fellows, and the process by which this team science was catalyzed. Data driven education that lets students apply learned skills to real research projects reinforces concepts, provides motivation, and can benefit their publication record. This program design is extendable to other institutions and networks.

  18. Automatic data-driven real-time segmentation and recognition of surgical workflow.

    Science.gov (United States)

    Dergachyova, Olga; Bouget, David; Huaulmé, Arnaud; Morandi, Xavier; Jannin, Pierre

    2016-06-01

    With the intention of extending the perception and action of surgical staff inside the operating room, the medical community has expressed a growing interest towards context-aware systems. Requiring an accurate identification of the surgical workflow, such systems make use of data from a diverse set of available sensors. In this paper, we propose a fully data-driven and real-time method for segmentation and recognition of surgical phases using a combination of video data and instrument usage signals, exploiting no prior knowledge. We also introduce new validation metrics for assessment of workflow detection. The segmentation and recognition are based on a four-stage process. Firstly, during the learning time, a Surgical Process Model is automatically constructed from data annotations to guide the following process. Secondly, data samples are described using a combination of low-level visual cues and instrument information. Then, in the third stage, these descriptions are employed to train a set of AdaBoost classifiers capable of distinguishing one surgical phase from others. Finally, AdaBoost responses are used as input to a Hidden semi-Markov Model in order to obtain a final decision. On the MICCAI EndoVis challenge laparoscopic dataset we achieved a precision and a recall of 91 % in classification of 7 phases. Compared to the analysis based on one data type only, a combination of visual features and instrument signals allows better segmentation, reduction of the detection delay and discovery of the correct phase order.

  19. A Data-Driven Air Transportation Delay Propagation Model Using Epidemic Process Models

    Directory of Open Access Journals (Sweden)

    B. Baspinar

    2016-01-01

    In air transport network management, in addition to defining the performance behavior of the system’s components, identification of their interaction dynamics is a delicate issue in both strategic and tactical decision-making process so as to decide which elements of the system are “controlled” and how. This paper introduces a novel delay propagation model utilizing epidemic spreading process, which enables the definition of novel performance indicators and interaction rates of the elements of the air transportation network. In order to understand the behavior of the delay propagation over the network at different levels, we have constructed two different data-driven epidemic models approximating the dynamics of the system: (a) flight-based epidemic model and (b) airport-based epidemic model. The flight-based epidemic model utilizing SIS epidemic model focuses on the individual flights where each flight can be in susceptible or infected states. The airport-centric epidemic model, in addition to the flight-to-flight interactions, allows us to define the collective behavior of the airports, which are modeled as metapopulations. In network model construction, we have utilized historical flight-track data of Europe and performed analysis for certain days involving certain disturbances. Through this effort, we have validated the proposed delay propagation models under disruptive events.
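
    For readers unfamiliar with the SIS (susceptible-infected-susceptible) model named in this abstract, the following minimal sketch integrates its standard mean-field form, di/dt = beta*i*(1 - i) - mu*i, with forward Euler steps. Reading "infected" as "delayed" is our interpretation of the abstract, and the rate values, step size and function name are hypothetical; the paper's flight-level and metapopulation models are not reproduced here.

```python
def simulate_sis(beta, mu, i0, dt, steps):
    """Forward-Euler integration of the mean-field SIS equation.

    i     : fraction of flights currently "infected" (read here as delayed)
    beta  : rate at which delay spreads to susceptible flights (assumed value)
    mu    : rate at which delayed flights recover (assumed value)
    """
    i = i0
    history = [i]
    for _ in range(steps):
        i += dt * (beta * i * (1.0 - i) - mu * i)
        history.append(i)
    return history


# Hypothetical rates: with beta > mu the delayed fraction settles at the
# endemic level 1 - mu/beta instead of dying out.
fractions = simulate_sis(beta=0.6, mu=0.2, i0=0.01, dt=0.1, steps=2000)
endemic_level = 1.0 - 0.2 / 0.6
```

    The ratio beta/mu acts as a threshold: below 1 an initial disturbance decays, above 1 it persists, which is why such rates are natural candidates for network performance indicators.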

  20. A transparent and data-driven global tectonic regionalization model for seismic hazard assessment

    Science.gov (United States)

    Chen, Yen-Shin; Weatherill, Graeme; Pagani, Marco; Cotton, Fabrice

    2018-05-01

    A key concept that is common to many assumptions inherent within seismic hazard assessment is that of tectonic similarity. This recognizes that certain regions of the globe may display similar geophysical characteristics, such as in the attenuation of seismic waves, the magnitude scaling properties of seismogenic sources or the seismic coupling of the lithosphere. Previous attempts at tectonic regionalization, particularly within a seismic hazard assessment context, have often been based on expert judgements; in most of these cases, the process for delineating tectonic regions is neither reproducible nor consistent from location to location. In this work, the regionalization process is implemented in a scheme that is reproducible, comprehensible from a geophysical rationale, and revisable when new relevant data are published. A spatial classification-scheme is developed based on fuzzy logic, enabling the quantification of concepts that are approximate rather than precise. Using the proposed methodology, we obtain a transparent and data-driven global tectonic regionalization model for seismic hazard applications as well as the subjective probabilities (e.g. degree of being active/degree of being cratonic) that indicate the degree to which a site belongs in a tectonic category.
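
    The fuzzy-logic idea in this abstract, that a site belongs to a tectonic category to a degree rather than absolutely, can be sketched with a piecewise-linear membership function and the usual min/complement operators. The indicator names, thresholds and sample values below are hypothetical and do not come from the paper, and the complement is labelled "stable" rather than "cratonic" because the paper's actual category definitions are not given in the abstract.

```python
def ramp_membership(x, low, high):
    """Piecewise-linear membership: 0 at or below low, 1 at or above high."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def fuzzy_and(*degrees):
    """Standard (Zadeh) fuzzy conjunction: the minimum of the degrees."""
    return min(degrees)


def fuzzy_not(degree):
    """Standard fuzzy complement."""
    return 1.0 - degree


# Hypothetical site indicators on arbitrary normalized scales.
deformation = ramp_membership(0.7, low=0.2, high=1.0)
seismicity = ramp_membership(0.5, low=0.1, high=0.9)

degree_active = fuzzy_and(deformation, seismicity)
degree_stable = fuzzy_not(degree_active)
```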

  1. Modern data-driven decision support systems: the role of computing with words and computational linguistics

    Science.gov (United States)

    Kacprzyk, Janusz; Zadrożny, Sławomir

    2010-05-01

    We present how the conceptually and numerically simple concept of a fuzzy linguistic database summary can be a very powerful tool for gaining much insight into the very essence of data. The use of linguistic summaries provides tools for the verbalisation of data analysis (mining) results which, in addition to the more commonly used visualisation, e.g. via a graphical user interface, can contribute to an increased human consistency and ease of use, notably for supporting decision makers via the data-driven decision support system paradigm. Two new relevant aspects of the analysis are also outlined which were first initiated by the authors. First, following Kacprzyk and Zadrożny, it is further considered how linguistic data summarisation is closely related to some types of solutions used in natural language generation (NLG). This can make it possible to use more and more effective and efficient tools and techniques developed in NLG. Second, similar remarks are given on relations to systemic functional linguistics. Moreover, following Kacprzyk and Zadrożny, comments are given on an extremely relevant aspect of scalability of linguistic summarisation of data, using a new concept of a conceptual scalability.

  2. Data-driven classification of ventilated lung tissues using electrical impedance tomography

    International Nuclear Information System (INIS)

    Gómez-Laberge, Camille; Hogan, Matthew J; Elke, Gunnar; Weiler, Norbert; Frerichs, Inéz; Adler, Andy

    2011-01-01

    Current methods for identifying ventilated lung regions utilizing electrical impedance tomography images rely on dividing the image into arbitrary regions of interest (ROI), manually delineating ROI, or forming ROI with pixels whose signal properties surpass an arbitrary threshold. In this paper, we propose a novel application of a data-driven classification method to identify ventilated lung ROI based on forming k clusters from pixels with correlated signals. A standard first-order model for lung mechanics is then applied to determine which ROI correspond to ventilated lung tissue. We applied the method in an experimental study of 16 mechanically ventilated swine in the supine position, which underwent changes in positive end-expiratory pressure (PEEP) and fraction of inspired oxygen (FIO2). In each stage of the experimental protocol, the method performed best with k = 4 and consistently identified 3 lung tissue ROI and 1 boundary tissue ROI in 15 of the 16 subjects. When testing for changes from baseline in lung position, tidal volume, and respiratory system compliance, we found that PEEP displaced the ventilated lung region dorsally by 2 cm, decreased tidal volume by 1.3%, and increased the respiratory system compliance time constant by 0.3 s. FIO2 decreased tidal volume by 0.7%. All effects were tested at p < 0.05 with n = 16. These findings suggest that the proposed ROI detection method is robust and sensitive to ventilation dynamics in the experimental setting.

  3. Data-driven asthma endotypes defined from blood biomarker and gene expression data.

    Directory of Open Access Journals (Sweden)

    Barbara Jane George

    The diagnosis and treatment of childhood asthma is complicated by its mechanistically distinct subtypes (endotypes) driven by genetic susceptibility and modulating environmental factors. Clinical biomarkers and blood gene expression were collected from a stratified, cross-sectional study of asthmatic and non-asthmatic children from Detroit, MI. This study describes four distinct asthma endotypes identified via a purely data-driven method. Our method was specifically designed to integrate blood gene expression and clinical biomarkers in a way that provides new mechanistic insights regarding the different asthma endotypes. For example, we describe metabolic syndrome-induced systemic inflammation as an associated factor in three of the four asthma endotypes. Context provided by the clinical biomarker data was essential in interpreting gene expression patterns and identifying putative endotypes, which emphasizes the importance of integrated approaches when studying complex disease etiologies. These synthesized patterns of gene expression and clinical markers from our research may lead to development of novel serum-based biomarker panels.

  4. Data Driven - Android based displays on data acquisition and system status

    CERN Document Server

    Canilho, Paulo

    2014-01-01

    For years, both hardware and software engineers have struggled to acquire device information in a flexible and fast way. Numerous devices cannot have their status tested quickly because of the time lost travelling to a computer terminal. For instance, to test a scintillator's status, one has to inject beam into the device and quickly return to a terminal to see the results. This is not only time-consuming but extremely inconvenient for the person responsible, and it takes time away from more pressing matters. With this in mind, an interface was proposed to bring a stable, flexible, user-friendly and data-driven solution to this problem. As the most common operating system for mobile displays, Android proved to be the most efficient choice both in cost, since it is based on open-source software, and in implementation difficulty, since its backend development relies on Java calls, with XML for visual representation...

  5. NERI PROJECT 99-119. TASK 2. DATA-DRIVEN PREDICTION OF PROCESS VARIABLES. FINAL REPORT

    Energy Technology Data Exchange (ETDEWEB)

    Upadhyaya, B.R.

    2003-04-10

    This report describes the detailed results for Task 2 of DOE-NERI project number 99-119, entitled "Automatic Development of Highly Reliable Control Architecture for Future Nuclear Power Plants". This project is a collaborative effort between the Oak Ridge National Laboratory (ORNL), The University of Tennessee, Knoxville (UTK) and the North Carolina State University (NCSU). UTK is the lead organization for Task 2 under contract number DE-FG03-99SF21906. Under Task 2 we completed the development of data-driven models for the characterization of sub-system dynamics for predicting state variables, control functions, and expected control actions. We have also developed the "Principal Component Analysis (PCA)" approach for mapping system measurements, and a nonlinear system modeling approach called the "Group Method of Data Handling (GMDH)" with rational functions, which includes temporal data information for transient characterization. The majority of the results are presented in detailed reports for Phases 1 through 3 of our research, which are attached to this report.

  6. Data-driven fault detection, isolation and estimation of aircraft gas turbine engine actuator and sensors

    Science.gov (United States)

    Naderi, E.; Khorasani, K.

    2018-02-01

    In this work, a data-driven fault detection, isolation, and estimation (FDI&E) methodology is proposed and developed specifically for monitoring the aircraft gas turbine engine actuator and sensors. The proposed FDI&E filters are directly constructed by using only the available system I/O data at each operating point of the engine. The healthy gas turbine engine is stimulated by a sinusoidal input containing a limited number of frequencies. First, the associated system Markov parameters are estimated by using the FFT of the input and output signals to obtain the frequency response of the gas turbine engine. These data are then used for direct design and realization of the fault detection, isolation and estimation filters. Our proposed scheme therefore does not require any a priori knowledge of the system linear model or its number of poles and zeros at each operating point. We have investigated the effects of the size of the frequency response data on the performance of our proposed schemes. We have shown through comprehensive case studies simulations that desirable fault detection, isolation and estimation performance metrics defined in terms of the confusion matrix criterion can be achieved by having access to only the frequency response of the system at only a limited number of frequencies.
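
    The first step mentioned in this abstract, obtaining the frequency response from the FFT of input and output signals under a multi-sine excitation, can be sketched as the ratio Y[k]/U[k] at the excited DFT bins. The first-order test system, its coefficients, the chosen bins and all function names below are hypothetical stand-ins for the engine; the subsequent Markov-parameter estimation and filter design from the paper are not shown.

```python
import cmath
import math


def simulate(a, b, u):
    """Toy plant y[n] = a*y[n-1] + b*u[n-1], starting from rest."""
    y = [0.0] * len(u)
    for n in range(1, len(u)):
        y[n] = a * y[n - 1] + b * u[n - 1]
    return y


def dft_bin(x, k):
    """Single DFT coefficient X[k] of the sequence x."""
    n_samples = len(x)
    return sum(
        v * cmath.exp(-2j * math.pi * k * n / n_samples)
        for n, v in enumerate(x)
    )


def estimate_frequency_response(u, y, bins):
    """Empirical response H[k] = Y[k] / U[k] at the excited bins only."""
    return {k: dft_bin(y, k) / dft_bin(u, k) for k in bins}


def true_response(a, b, k, n_samples):
    """Analytic response of the toy plant, H(z) = b / (z - a), on the unit circle."""
    z = cmath.exp(2j * math.pi * k / n_samples)
    return b / (z - a)


# Assumed plant coefficients, record length and excited bins.
A, B = 0.5, 1.0
N, WARMUP = 256, 256
BINS = (3, 10, 25)

# Multi-sine input whose frequencies sit exactly on DFT bins, so there is no leakage.
u_full = [
    sum(math.sin(2 * math.pi * k * n / N) for k in BINS)
    for n in range(WARMUP + N)
]
y_full = simulate(A, B, u_full)

# Discard the warm-up so only the steady-state response is analysed.
u, y = u_full[WARMUP:], y_full[WARMUP:]
H_est = estimate_frequency_response(u, y, BINS)
```

    Dividing only at the excited bins matters: elsewhere U[k] is numerically zero and the ratio would be meaningless, which is consistent with the abstract's emphasis on a limited number of frequencies.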

  7. A data-driven, mathematical model of mammalian cell cycle regulation.

    Directory of Open Access Journals (Sweden)

    Michael C Weis

    Few of the >150 published cell cycle modeling efforts use significant levels of data for tuning and validation. This reflects the difficulty of generating correlated quantitative data, and it points to a critical uncertainty in modeling efforts. To develop a data-driven model of cell cycle regulation, we used contiguous, dynamic measurements over two time scales (minutes and hours) calculated from static multiparametric cytometry data. The approach provided expression profiles of cyclin A2, cyclin B1, and phospho-S10-histone H3. The model was built by integrating and modifying two previously published models such that the model outputs for cyclins A and B fit cyclin expression measurements and the activation of B cyclin/Cdk1 coincided with phosphorylation of histone H3. The model depends on Cdh1-regulated cyclin degradation during G1, regulation of B cyclin/Cdk1 activity by cyclin A/Cdk via Wee1, and transcriptional control of the mitotic cyclins that reflects some of the current literature. We introduced autocatalytic transcription of E2F, E2F regulated transcription of cyclin B, Cdc20/Cdh1 mediated E2F degradation, enhanced transcription of mitotic cyclins during late S/early G2 phase, and the sustained synthesis of cyclin B during mitosis. These features produced a model with good correlation between state variable output and real measurements. Since the method of data generation is extensible, this model can be continually modified based on new correlated, quantitative data.

  8. A Dynamic Remote Sensing Data-Driven Approach for Oil Spill Simulation in the Sea

    Directory of Open Access Journals (Sweden)

    Jining Yan

    2015-05-01

    In view of the fact that oil spill remote sensing could only generate the oil slick information at a specific time and that traditional oil spill simulation models were not designed to deal with dynamic conditions, a dynamic data-driven application system (DDDAS) was introduced. The DDDAS entails both the ability to incorporate additional data into an executing application and, in reverse, the ability of applications to dynamically steer the measurement process. Based on the DDDAS, combining a remote sensor system that detects oil spills with a numerical simulation, an integrated data processing, analysis, forecasting and emergency response system was established. Once an oil spill accident occurs, the DDDAS-based oil spill model receives information about the oil slick extracted from the dynamic remote sensor data in the simulation. Through comparison, information fusion and feedback updates, continuous and more precise oil spill simulation results can be obtained. Then, the simulation results can provide help for disaster control and clean-up. The Penglai, Xingang and Suizhong oil spill results showed our simulation model could increase the prediction accuracy and reduce the error caused by empirical parameters in existing simulation systems. Therefore, the DDDAS-based detection and simulation system can effectively improve oil spill simulation and diffusion forecasting, as well as provide decision-making information and technical support for emergency responses to oil spills.

  9. A data-driven decomposition approach to model aerodynamic forces on flapping airfoils

    Science.gov (United States)

    Raiola, Marco; Discetti, Stefano; Ianiro, Andrea

    2017-11-01

    In this work, we exploit a data-driven decomposition of experimental data from a flapping airfoil experiment with the aim of isolating the main contributions to the aerodynamic force and obtaining a phenomenological model. Experiments are carried out on a NACA 0012 airfoil in forward flight with both heaving and pitching motion. Velocity measurements of the near field are carried out with Planar PIV while force measurements are performed with a load cell. The phase-averaged velocity fields are transformed into the wing-fixed reference frame, allowing for a description of the field in a domain with fixed boundaries. The decomposition of the flow field is performed by means of the POD applied on the velocity fluctuations and then extended to the phase-averaged force data by means of the Extended POD approach. This choice is justified by the simple consideration that aerodynamic forces determine the largest contributions to the energetic balance in the flow field. Only the first 6 modes have a relevant contribution to the force. A clear relationship can be drawn between the force and the flow field modes. Moreover, the force modes are closely related (yet slightly different) to the contributions of the classic potential models in literature, allowing for their correction. This work has been supported by the Spanish MINECO under Grant TRA2013-41103-P.

  10. A Novel Online Data-Driven Algorithm for Detecting UAV Navigation Sensor Faults

    Directory of Open Access Journals (Sweden)

    Rui Sun

    2017-09-01

    The use of Unmanned Aerial Vehicles (UAVs) has increased significantly in recent years. On-board integrated navigation sensors are a key component of UAVs’ flight control systems and are essential for flight safety. In order to ensure flight safety, timely and effective navigation sensor fault detection capability is required. In this paper, a novel data-driven Adaptive Neuron Fuzzy Inference System (ANFIS)-based approach is presented for the detection of on-board navigation sensor faults in UAVs. Contrary to the classic UAV sensor fault detection algorithms, based on predefined or modelled faults, the proposed algorithm combines an online data training mechanism with the ANFIS-based decision system. The main advantages of this algorithm are that it allows real-time model-free residual analysis from Kalman Filter (KF) estimates and the ANFIS to build a reliable fault detection system. In addition, it allows fast and accurate detection of faults, which makes it suitable for real-time applications. Experimental results have demonstrated the effectiveness of the proposed fault detection method in terms of accuracy and misdetection rate.

  11. Data-driven strategies for robust forecast of continuous glucose monitoring time-series.

    Science.gov (United States)

    Fiorini, Samuele; Martini, Chiara; Malpassi, Davide; Cordera, Renzo; Maggi, Davide; Verri, Alessandro; Barla, Annalisa

    2017-07-01

    Over the past decade, continuous glucose monitoring (CGM) has proven to be a very resourceful tool for diabetes management. To date, CGM devices are employed for both retrospective and online applications. Their use allows a better description of the patients' pathology as well as better control of patients' level of glycemia. The analysis of CGM sensor data makes it possible to observe a wide range of metrics, such as the glycemic variability during the day or the amount of time spent below or above certain glycemic thresholds. However, due to the high variability of the glycemic signals among sensors and individuals, CGM data analysis is a non-trivial task. Standard signal filtering solutions fall short when an appropriate model personalization is not applied. State-of-the-art data-driven strategies for online CGM forecasting rely upon the use of recursive filters. Each time a new sample is collected, such models need to adjust their parameters in order to predict the next glycemic level. In this paper we aim to demonstrate that the problem of online CGM forecasting can be successfully tackled by personalized machine learning models that do not need to recursively update their parameters.

  12. Pengembangan Data Warehouse Menggunakan Pendekatan Data-Driven untuk Membantu Pengelolaan SDM

    Directory of Open Access Journals (Sweden)

    Mujiono Mujiono

    2016-01-01

    The basis of bureaucratic reform is the reform of human resources management. One supporting factor is the development of an employee database. Supporting human resources management requires, among other things, a data warehouse and business intelligence tools. A data warehouse is an integrated, reliable data store that supports all data analysis needs. In this study, a data warehouse was developed using the data-driven approach, with source data drawn from SIMPEG, SAPK and electronic presence. The data warehouse is designed using the nine-step methodology and unified modeling language (UML) notation. Extract, transform, load (ETL) is done using Pentaho Data Integration by applying transformation maps. Furthermore, to help human resource management, the system is built to perform online analytical processing (OLAP) to facilitate web-based information. This study produced a BI application development framework with a Model-View-Controller (MVC) architecture, with OLAP operations built using dynamic query generation, PivotTable, and HighChart to present information about PNS, CPNS, Retirement, Kenpa and Presence.

  13. School Choice, Gentrification, and the Variable Significance of Racial Stratification in Urban Neighborhoods

    Science.gov (United States)

    Pearman, Francis A., III; Swain, Walker A.

    2017-01-01

    Racial and socioeconomic stratification have long governed patterns of residential sorting in the American metropolis. However, recent expansions of school choice policies that allow parents to select schools outside their neighborhood raise questions as to whether this weakening of the neighborhood-school connection might influence the…

  14. Abnormal Resting-State Functional Connectivity in Patients with Chronic Fatigue Syndrome: Results of Seed and Data-Driven Analyses.

    Science.gov (United States)

    Gay, Charles W; Robinson, Michael E; Lai, Song; O'Shea, Andrew; Craggs, Jason G; Price, Donald D; Staud, Roland

    2016-02-01

    Although altered resting-state functional connectivity (FC) is a characteristic of many chronic pain conditions, it has not yet been evaluated in patients with chronic fatigue. Our objective was to investigate the association between fatigue and altered resting-state FC in myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS). Thirty-six female subjects, 19 ME/CFS and 17 healthy controls, completed a fatigue inventory before undergoing functional magnetic resonance imaging. Two methods, (1) data driven and (2) model based, were used to estimate and compare the intraregional FC between both groups during the resting state (RS). The first approach using independent component analysis was applied to investigate five RS networks: the default mode network, salience network (SN), left frontoparietal networks (LFPN) and right frontoparietal networks, and the sensory motor network (SMN). The second approach used a priori selected seed regions demonstrating abnormal regional cerebral blood flow (rCBF) in ME/CFS patients at rest. In ME/CFS patients, Method-1 identified decreased intrinsic connectivity among regions within the LFPN. Furthermore, the FC of the left anterior midcingulate with the SMN and the connectivity of the left posterior cingulate cortex with the SN were significantly decreased. For Method-2, five distinct clusters within the right parahippocampus and occipital lobes, demonstrating significant rCBF reductions in ME/CFS patients, were used as seeds. The parahippocampal seed and three occipital lobe seeds showed altered FC with other brain regions. The degree of abnormal connectivity correlated with the level of self-reported fatigue. Our results confirm altered RS FC in patients with ME/CFS, which was significantly correlated with the severity of their chronic fatigue.

  15. Neighborhood perceptions and allostatic load

    DEFF Research Database (Denmark)

    van Deurzen, Ioana; Rod, Naja Hulvej; Christensen, Ulla

    2016-01-01

    An influential argument explaining why living in certain neighborhoods can become harmful to one's health maintains that individuals can perceive certain characteristics of the neighborhood as threatening and the prolonged exposure to a threatening environment could induce chronic stress. Following...... this line of argumentation, in the present study we test whether subjective perceptions of neighborhood characteristics relate to an objective measure of stress-related physiological functioning, namely allostatic load (AL). We use a large dataset of 5280 respondents living in different regions of Denmark...... and we account for two alternative mechanisms, i.e., the objective characteristics of the living environment and the socio-economic status of individuals. Our results support the chronic stress mechanisms linking neighborhood quality to health. Heightened perceptions of disorder and pollution were found...

  16. Durham Neighborhood Compass Block Groups

    Data.gov (United States)

    City and County of Durham, North Carolina — The Durham Neighborhood Compass is a quantitative indicators project with qualitative values, integrating data from local government, the Census Bureau and other...

  17. Conduct Disorder and Neighborhood Effects.

    Science.gov (United States)

    Jennings, Wesley G; Perez, Nicholas M; Reingle Gonzalez, Jennifer M

    2018-05-07

    There has been a considerable amount of scholarly attention to the relationship between neighborhood effects and conduct disorder, particularly in recent years. Having said this, it has been nearly two decades since a comprehensive synthesis of this literature has been conducted. Relying on a detailed and comprehensive search strategy and inclusion criteria, this article offers a systematic and interdisciplinary review of 47 empirical studies that have examined neighborhood effects and conduct disorder. Described results suggest that there are generally robust linkages between adverse neighborhood factors and conduct disorder and externalizing behavior problems, as 67 of the 93 (72.04%) effect sizes derived from these studies yielded statistically significant neighborhood effects. The review also identifies salient mediating and moderating influences. It discusses study limitations and directions for future research as well.

  18. Review of the Remaining Useful Life Prognostics of Vehicle Lithium-Ion Batteries Using Data-Driven Methodologies

    Directory of Open Access Journals (Sweden)

    Lifeng Wu

    2016-05-01

    Lithium-ion batteries are the primary power source in electric vehicles, and the prognosis of their remaining useful life is vital for ensuring the safety, stability, and long lifetime of electric vehicles. Accurately establishing a mechanism model of a vehicle lithium-ion battery involves a complex electrochemical process. Remaining useful life (RUL) prognostics based on data-driven methods has become a focus of research. Current research on data-driven methodologies is summarized in this paper. By analyzing the problems of vehicle lithium-ion batteries in practical applications, the problems that need to be solved in the future are identified.

  19. Development of a Data-Driven Predictive Model of Supply Air Temperature in an Air-Handling Unit for Conserving Energy

    Directory of Open Access Journals (Sweden)

    Goopyo Hong

    2018-02-01

    The purpose of this study was to develop a data-driven predictive model that can predict the supply air temperature (SAT) in an air-handling unit (AHU) by using a neural network. A case study was selected, and AHU operational data from December 2015 to November 2016 were collected. A data-driven predictive model was generated through an evolving process that consisted of an initial model, an optimal model, and an adaptive model. In order to develop the optimal model, input variables, the number of neurons and hidden layers, and the period of the training data set were considered. Since AHU data change over time, an adaptive model, which has the ability to actively cope with constantly changing data, was developed. This adaptive model determined the model with the lowest mean square error (MSE) of the 91 models, which had two hidden layers and set up a 12-hour test set at every prediction. The adaptive model used recently collected data as training data and utilized the sliding window technique rather than the accumulative data method. Furthermore, additional testing was performed to validate the adaptive model using AHU data from another building. The final adaptive model predicts SAT to a root mean square error (RMSE) of less than 0.6 °C.
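
    The difference between the sliding window technique and the accumulative data method mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `train_len` and `test_len` parameters, and the integer series are hypothetical, and no neural network is trained here.

```python
def sliding_window_splits(series, train_len, test_len):
    """Train on the most recent `train_len` samples only (hypothetical helper)."""
    splits = []
    start = train_len
    while start + test_len <= len(series):
        train = series[start - train_len:start]   # fixed-size, recent data
        test = series[start:start + test_len]     # next block to predict
        splits.append((train, test))
        start += test_len
    return splits


def accumulative_splits(series, train_len, test_len):
    """Train on everything observed so far (hypothetical helper)."""
    splits = []
    start = train_len
    while start + test_len <= len(series):
        train = series[:start]                    # grows with every prediction
        test = series[start:start + test_len]
        splits.append((train, test))
        start += test_len
    return splits


samples = list(range(10))  # stand-in for time-ordered AHU measurements
sliding = sliding_window_splits(samples, train_len=4, test_len=2)
accumulated = accumulative_splits(samples, train_len=4, test_len=2)
```

    With the sliding window, old samples drop out of the training set as new ones arrive, which is one way a model can follow data that change over time.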

  20. Collaborative Project: The problem of bias in defining uncertainty in computationally enabled strategies for data-driven climate model development. Final Technical Report.

    Energy Technology Data Exchange (ETDEWEB)

    Huerta, Gabriel [Univ. of New Mexico, Albuquerque, NM (United States)

    2016-05-10

    The objective of the project is to develop strategies for better representing scientific sensibilities within statistical measures of model skill that then can be used within a Bayesian statistical framework for data-driven climate model development and improved measures of model scientific uncertainty. One of the thorny issues in model evaluation is quantifying the effect of biases on climate projections. While any bias is not desirable, only those biases that affect feedbacks affect scatter in climate projections. The effort at the University of Texas is to analyze previously calculated ensembles of CAM3.1 with perturbed parameters to discover how biases affect projections of global warming. The hypothesis is that compensating errors in the control model can be identified by their effect on a combination of processes and that developing metrics that are sensitive to dependencies among state variables would provide a way to select versions of climate models that may reduce scatter in climate projections. Gabriel Huerta at the University of New Mexico is responsible for developing statistical methods for evaluating these field dependencies. The UT effort will incorporate these developments into MECS, which is a set of Python scripts being developed at the University of Texas for managing the workflow associated with data-driven climate model development over HPC resources. This report reflects the main activities at the University of New Mexico where the PI (Huerta) and the Postdocs (Nosedal, Hattab and Karki) worked on the project.

  1. Nearest neighbors by neighborhood counting.

    Science.gov (United States)

    Wang, Hui

    2006-06-01

    Finding nearest neighbors is a general idea that underlies many artificial intelligence tasks, including machine learning, data mining, natural language understanding, and information retrieval. This idea is explicitly used in the k-nearest neighbors algorithm (kNN), a popular classification method. In this paper, this idea is adopted in the development of a general methodology, neighborhood counting, for devising similarity functions. We turn our focus from neighbors to neighborhoods, a region in the data space covering the data point in question. To measure the similarity between two data points, we consider all neighborhoods that cover both data points. We propose to use the number of such neighborhoods as a measure of similarity. Neighborhood can be defined for different types of data in different ways. Here, we consider one definition of neighborhood for multivariate data and derive a formula for such similarity, called neighborhood counting measure or NCM. NCM was tested experimentally in the framework of kNN. Experiments show that NCM is generally comparable to VDM and its variants, the state-of-the-art distance functions for multivariate data, and, at the same time, is consistently better for relatively large k values. Additionally, NCM consistently outperforms HEOM (a mixture of Euclidean and Hamming distances), the "standard" and most widely used distance function for multivariate data. NCM has a computational complexity in the same order as the standard Euclidean distance function and NCM is task independent and works for numerical and categorical data in a conceptually uniform way. The neighborhood counting methodology is proven sound for multivariate data experimentally. We hope it will work for other types of data.
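
    The idea of counting the neighborhoods that cover both data points can be illustrated by brute force on small categorical data. The sketch below is not the closed-form NCM formula from the paper. It assumes, for illustration only, that a neighborhood is a product of non-empty subsets of each attribute's domain; the function names and the toy domains are hypothetical.

```python
from itertools import combinations, product


def nonempty_subsets(domain):
    """All non-empty subsets of one attribute's domain."""
    items = sorted(domain)
    return [frozenset(c)
            for r in range(1, len(items) + 1)
            for c in combinations(items, r)]


def neighborhood_count(x, y, domains):
    """Count neighborhoods (one subset per attribute) covering both x and y."""
    count = 0
    for hood in product(*(nonempty_subsets(d) for d in domains)):
        if all(xi in s and yi in s for xi, yi, s in zip(x, y, hood)):
            count += 1
    return count


# Two categorical attributes with three and two possible values.
domains = [{"a", "b", "c"}, {"p", "q"}]
same = neighborhood_count(("a", "p"), ("a", "p"), domains)
one_diff = neighborhood_count(("a", "p"), ("a", "q"), domains)
both_diff = neighborhood_count(("a", "p"), ("b", "q"), domains)
```

    Under this assumed definition, points that agree on more attributes are covered by more common neighborhoods, so the count behaves as a similarity: it is largest for identical points and shrinks as attributes differ.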

  2. Intelligent fault diagnosis of rolling bearing based on kernel neighborhood rough sets and statistical features

    Energy Technology Data Exchange (ETDEWEB)

    Zhu, Xiao Ran; Zhang, You Yun; Zhu, Yong Sheng [Xi' an Jiaotong Univ., Xi' an (China)

    2012-09-15

    Intelligent fault diagnosis benefits from efficient feature selection. Neighborhood rough sets are effective in feature selection. However, determining the neighborhood value accurately remains a challenge. The wrapper feature selection algorithm is designed by combining the kernel method and neighborhood rough sets to self-adaptively select sensitive features. The combination effectively solves the shortcomings in selecting the neighborhood value in the previous application process. The statistical features of time and frequency domains are used to describe the characteristic of the rolling bearing to make the intelligent fault diagnosis approach work. Three classification algorithms, namely, classification and regression tree (CART), commercial version 4.5 (C4.5), and radial basis function support vector machines (RBFSVM), are used to test UCI datasets and 10 fault datasets of rolling bearing. The results indicate that the diagnostic approach presented could effectively select the sensitive fault features and simultaneously identify the type and degree of the fault.

  3. Intelligent fault diagnosis of rolling bearing based on kernel neighborhood rough sets and statistical features

    International Nuclear Information System (INIS)

    Zhu, Xiao Ran; Zhang, You Yun; Zhu, Yong Sheng

    2012-01-01

    Intelligent fault diagnosis benefits from efficient feature selection. Neighborhood rough sets are effective in feature selection. However, determining the neighborhood value accurately remains a challenge. The wrapper feature selection algorithm is designed by combining the kernel method and neighborhood rough sets to self-adaptively select sensitive features. The combination effectively solves the shortcomings in selecting the neighborhood value in the previous application process. The statistical features of time and frequency domains are used to describe the characteristic of the rolling bearing to make the intelligent fault diagnosis approach work. Three classification algorithms, namely, classification and regression tree (CART), commercial version 4.5 (C4.5), and radial basis function support vector machines (RBFSVM), are used to test UCI datasets and 10 fault datasets of rolling bearing. The results indicate that the diagnostic approach presented could effectively select the sensitive fault features and simultaneously identify the type and degree of the fault

  4. A systematic review of relations between neighborhoods and mental health.

    Science.gov (United States)

    Truong, Khoa D; Ma, Sai

    2006-09-01

    The relationship between neighborhood characteristics and resident mental health has been widely investigated in individual studies in recent years, but this literature is not adequately reviewed. To systematically review relevant individual research of the relation between neighborhoods and adult mental health by identifying and synthesizing all relevant studies in this literature. We conducted an electronic search with PubMed and PsycINFO, and manual reference-checking, resulting in 8,562 screened studies of which 29 were selected. Studies were included in the main synthesis if they (i) were published in English in peer reviewed journals; (ii) had relevant definitions and measures of neighborhood characteristics; (iii) utilized standardized measures of adult mental health; (iv) controlled for individual characteristics; (v) reported quantitative results; and, (vi) studied a population in a developed country. We focused on two key areas within this literature: the methodologies utilized to study neighborhood effects and quantitative results. With regard to the former, we examined five major issues: (i) definitions and measures of neighborhoods; (ii) definitions and measures of mental health; (iii) controls for individual level characteristics; (iv) conceptual models; and (v) analytical models. As for quantitative results, the relation was reviewed by types of neighborhood characteristics. We summarized general quantitative findings and drew common conclusions across groups of studies. 27/29 studies found statistically significant association between mental health and at least one measure of neighborhood characteristics, after adjusting for individual factors. This association was evident for all types of neighborhood features, varying from sociodemographic characteristics to physical environment, and from objective to subjective measures. Neighborhood effects were weakened when adding individual-level characteristics into the regression models, and were generally

  5. Open-source chemogenomic data-driven algorithms for predicting drug-target interactions.

    Science.gov (United States)

    Hao, Ming; Bryant, Stephen H; Wang, Yanli

    2018-02-06

    While novel technologies such as high-throughput screening have advanced together with significant investment by pharmaceutical companies during the past decades, the success rate for drug development has not yet improved, prompting researchers to look for new strategies of drug discovery. Drug repositioning is a potential approach to solve this dilemma. However, experimental identification and validation of potential drug targets encoded by the human genome is both costly and time-consuming. Therefore, effective computational approaches have been proposed to facilitate drug repositioning, and these have proved successful in drug discovery. Doubtlessly, the availability of openly accessible data from basic chemical biology research and the success of human genome sequencing are crucial to developing effective in silico drug repositioning methods that allow the identification of potential targets for existing drugs. In this work, we review several chemogenomic data-driven computational algorithms with publicly accessible source code for predicting drug-target interactions (DTIs). We organize these algorithms by model properties and model evolutionary relationships. We re-implemented five representative algorithms in the R programming language, and compared these algorithms by means of mean percentile ranking, a new recall-based evaluation metric in the DTI prediction research field. We anticipate that this review will be objective and helpful to researchers who would like to further improve existing algorithms or need to choose appropriate algorithms to infer potential DTIs in their projects. The source code for DTI predictions is available at: https://github.com/minghao2016/chemogenomicAlg4DTIpred. Published by Oxford University Press 2018. This work is written by US Government employees and is in the public domain in the US.

  6. Access Control with Delegated Authorization Policy Evaluation for Data-Driven Microservice Workflows

    Directory of Open Access Journals (Sweden)

    Davy Preuveneers

    2017-09-01

    Microservices offer a compelling competitive advantage for building data flow systems as a choreography of self-contained data endpoints that each implement a specific data processing functionality. Such a ‘single responsibility principle’ design makes them well suited for constructing scalable and flexible data integration and real-time data flow applications. In this paper, we investigate microservice based data processing workflows from a security point of view, i.e., (1) how to constrain data processing workflows with respect to dynamic authorization policies granting or denying access to certain microservice results depending on the flow of the data; (2) how to let multiple microservices contribute to a collective data-driven authorization decision and (3) how to put adequate measures in place such that the data within each individual microservice is protected against illegitimate access from unauthorized users or other microservices. Due to this multifold objective, enforcing access control on the data endpoints to prevent information leakage or preserve one’s privacy becomes far more challenging, as authorization policies can have dependencies and decision outcomes cross-cutting data in multiple microservices. To address this challenge, we present and evaluate a workflow-oriented authorization framework that enforces authorization policies in a decentralized manner and where the delegated policy evaluation leverages feature toggles that are managed at runtime by software circuit breakers to secure the distributed data processing workflows. The benefit of our solution is that, on the one hand, authorization policies restrict access to the data endpoints of the microservices, and on the other hand, microservices can safely rely on other data endpoints to collectively evaluate cross-cutting access control decisions without having to rely on a shared storage backend holding all the necessary information for the policy evaluation.

  7. A data-driven approach for denoising GNSS position time series

    Science.gov (United States)

    Li, Yanyan; Xu, Caijun; Yi, Lei; Fang, Rongxin

    2017-12-01

    Global navigation satellite system (GNSS) datasets suffer from common mode error (CME) and other unmodeled errors. To decrease the noise level in GNSS positioning, we propose a new data-driven adaptive multiscale denoising method in this paper. Both synthetic and real-world long-term GNSS datasets were employed to assess the performance of the proposed method, and its results were compared with those of stacking filtering, principal component analysis (PCA) and the recently developed multiscale multiway PCA. It is found that the proposed method can significantly eliminate the high-frequency white noise and remove the low-frequency CME. Furthermore, the proposed method is more precise for denoising GNSS signals than the other denoising methods. For example, in the real-world example, our method reduces the mean standard deviation of the north, east and vertical components from 1.54 to 0.26, 1.64 to 0.21 and 4.80 to 0.72 mm, respectively. Noise analysis indicates that for the original signals, a combination of power-law plus white noise model can be identified as the best noise model. For the filtered time series using our method, the generalized Gauss-Markov model is the best noise model with the spectral indices close to -3, indicating that flicker walk noise can be identified. Moreover, the common mode error in the unfiltered time series is significantly reduced by the proposed method. After filtering with our method, a combination of power-law plus white noise model is the best noise model for the CMEs in the study region.
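
    For context, the stacking filtering baseline named in this abstract can be sketched as follows. This illustrates only the simple unweighted baseline, not the proposed multiscale method: the CME at each epoch is estimated as the mean of the position residuals across stations and then subtracted. The station names and residual values are hypothetical.

```python
def stacking_cme(residuals):
    """Estimate CME per epoch as the unweighted mean across stations.

    residuals: dict mapping station name -> list of residuals (mm),
    one value per epoch, all lists of equal length.
    """
    n_epochs = len(next(iter(residuals.values())))
    cme = []
    for t in range(n_epochs):
        values = [series[t] for series in residuals.values()]
        cme.append(sum(values) / len(values))
    return cme


def remove_cme(residuals):
    """Subtract the stacked CME from every station's residual series."""
    cme = stacking_cme(residuals)
    return {name: [v - c for v, c in zip(series, cme)]
            for name, series in residuals.items()}


# Hypothetical detrended residuals for two stations over three epochs.
network = {"STA1": [1.0, 2.0, 3.0], "STA2": [3.0, 2.0, 1.0]}
cme = stacking_cme(network)
filtered = remove_cme(network)
```

    Practical stacking filters often weight each station by its formal error; the unweighted mean is used here only to keep the sketch short.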

  8. Dynamic Data-Driven Prediction of Lean Blowout in a Swirl-Stabilized Combustor

    Directory of Open Access Journals (Sweden)

    Soumalya Sarkar

    2015-09-01

    This paper addresses dynamic data-driven prediction of lean blowout (LBO) phenomena in confined combustion processes, which are prevalent in many physical applications (e.g., land-based and aircraft gas-turbine engines). The underlying concept is built upon pattern classification and is validated for LBO prediction with time series of chemiluminescence sensor data from a laboratory-scale swirl-stabilized dump combustor. The proposed method of LBO prediction makes use of the theory of symbolic dynamics, where (finite-length) time series data are partitioned to produce symbol strings that, in turn, generate a special class of probabilistic finite state automata (PFSA). These PFSA, called D-Markov machines, have a deterministic algebraic structure and their states are represented by symbol blocks of length D or less, where D is a positive integer. The D-Markov machines are constructed in two steps: (i) state splitting, i.e., the states are split based on their information contents, and (ii) state merging, i.e., two or more states (of possibly different lengths) are merged together to form a new state without any significant loss of the embedded information. The modeling complexity (e.g., number of states) of a D-Markov machine model is observed to be drastically reduced as the combustor approaches LBO. An anomaly measure, based on Kullback-Leibler divergence, is constructed to predict the proximity of LBO. The problem of LBO prediction is posed in a pattern classification setting and the underlying algorithms have been tested on experimental data at different extents of fuel-air premixing and fuel/air ratio. It is shown that, over a wide range of fuel-air premixing, D-Markov machines with D > 1 perform better as predictors of LBO than those with D = 1.
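
    The pipeline described in this abstract (partition a time series into symbols, form states from symbol blocks, compare state statistics with a Kullback-Leibler divergence) can be sketched in a much simplified form. The sketch assumes fixed-length blocks of length D as states and omits the state splitting and state merging steps entirely. The partition boundary, the add-one smoothing, the function names and the toy signals are all illustrative assumptions, not the authors' algorithm.

```python
import math
from collections import Counter
from itertools import product


def symbolize(series, boundaries):
    """Map each sample to a symbol: the number of boundaries it reaches."""
    return [sum(v >= b for b in boundaries) for v in series]


def state_probabilities(symbols, depth, alphabet_size):
    """Smoothed empirical probability of every symbol block of length `depth`."""
    blocks = Counter(tuple(symbols[i:i + depth])
                     for i in range(len(symbols) - depth + 1))
    states = list(product(range(alphabet_size), repeat=depth))
    total = sum(blocks.values()) + len(states)      # add-one smoothing
    return {s: (blocks.get(s, 0) + 1) / total for s in states}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) over a shared state set."""
    return sum(p[s] * math.log(p[s] / q[s]) for s in p)


# Hypothetical sensor signals: a nominal one and one that stays high.
nominal = symbolize([0.1, 0.9] * 20, boundaries=[0.5])
shifted = symbolize([0.9] * 40, boundaries=[0.5])

p_nominal = state_probabilities(nominal, depth=2, alphabet_size=2)
p_shifted = state_probabilities(shifted, depth=2, alphabet_size=2)
anomaly = kl_divergence(p_shifted, p_nominal)
```

    Here `depth` plays the role of D: with `depth=2` each state is a pair of consecutive symbols, so the model sees temporal patterns that a `depth=1` model, which only tracks symbol frequencies, cannot.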

  9. Full field reservoir modeling of shale assets using advanced data-driven analytics

    Directory of Open Access Journals (Sweden)

    Soodabeh Esmaili

    2016-01-01

    Hydrocarbon production from shale has attracted much attention in recent years. When applied to these prolific and hydrocarbon-rich resource plays, our understanding of the complexities of the flow mechanism (sorption process and flow behavior in complex fracture systems, induced or natural) leaves much to be desired. In this paper, we present and discuss a novel approach to modeling and history matching of hydrocarbon production from a Marcellus shale asset in southwestern Pennsylvania using advanced data mining, pattern recognition and machine learning technologies. In this new approach, instead of imposing our understanding of the flow mechanism, the impact of multi-stage hydraulic fractures, and the production process on the reservoir model, we allow the production history, well log, completion and hydraulic fracturing data to guide our model and determine its behavior. The uniqueness of this technology is that it incorporates the so-called “hard data” directly into the reservoir model, so that the model can be used to optimize the hydraulic fracture process. The “hard data” refers to field measurements during the hydraulic fracturing process such as fluid and proppant type and amount, injection pressure and rate as well as proppant concentration. This novel approach contrasts with the current industry focus on the use of “soft data” (non-measured, interpretive data such as frac length, width, height and conductivity) in the reservoir models. The study focuses on a Marcellus shale asset that includes 135 wells with multiple pads, different landing targets, well length and reservoir properties. The full field history matching process was successfully completed using this data-driven approach, thus capturing the production behavior with acceptable accuracy for individual wells and for the entire asset.

  10. Assigning clinical codes with data-driven concept representation on Dutch clinical free text.

    Science.gov (United States)

    Scheurwegs, Elyne; Luyckx, Kim; Luyten, Léon; Goethals, Bart; Daelemans, Walter

    2017-05-01

    Clinical codes are used for public reporting purposes, are fundamental to determining public financing for hospitals, and form the basis for reimbursement claims to insurance providers. They are assigned to a patient stay to reflect the diagnosis and performed procedures during that stay. This paper aims to enrich algorithms for automated clinical coding by taking a data-driven approach and by using unsupervised and semi-supervised techniques for the extraction of multi-word expressions that convey a generalisable medical meaning (referred to as concepts). Several methods for extracting concepts from text are compared, two of which are constructed from a large unannotated corpus of clinical free text. A distributional semantic model (in this case, the word2vec skip-gram model) is used to generalize over concepts and retrieve relations between them. These methods are validated on three sets of patient stay data, in the disease areas of urology, cardiology, and gastroenterology. The datasets are in Dutch, which introduces a limitation on available concept definitions from expert-based ontologies (e.g. UMLS). The results show that when expert-based knowledge in ontologies is unavailable, concepts derived from raw clinical texts are a reliable alternative. Both concepts derived from raw clinical texts and concepts derived from expert-created dictionaries outperform a bag-of-words approach in clinical code assignment. Adding features based on tokens that appear in a semantically similar context has a positive influence for predicting diagnostic codes. Furthermore, the experiments indicate that a distributional semantics model can find relations between semantically related concepts in texts but also introduces erroneous and redundant relations, which can undermine clinical coding performance. Copyright © 2017. Published by Elsevier Inc.

  11. WaveSeq: a novel data-driven method of detecting histone modification enrichments using wavelets.

    Directory of Open Access Journals (Sweden)

    Apratim Mitra

    BACKGROUND: Chromatin immunoprecipitation followed by next-generation sequencing is a genome-wide analysis technique that can be used to detect various epigenetic phenomena such as transcription factor binding sites and histone modifications. Histone modification profiles can be either punctate or diffuse, which makes it difficult to distinguish regions of enrichment from background noise. With the discovery of histone marks having a wide variety of enrichment patterns, there is an urgent need for analysis methods that are robust to various data characteristics and capable of detecting a broad range of enrichment patterns. RESULTS: To address these challenges we propose WaveSeq, a novel data-driven method of detecting regions of significant enrichment in ChIP-Seq data. Our approach utilizes the wavelet transform, is free of distributional assumptions and is robust to diverse data characteristics such as low signal-to-noise ratios and broad enrichment patterns. Using publicly available datasets we showed that WaveSeq compares favorably with other published methods, exhibiting high sensitivity and precision for both punctate and diffuse enrichment regions even in the absence of a control data set. The application of our algorithm to a complex histone modification data set helped make novel functional discoveries which further underlined its utility in such an experimental setup. CONCLUSIONS: WaveSeq is a highly sensitive method capable of accurate identification of enriched regions in a broad range of data sets. WaveSeq can detect both narrow and broad peaks with a high degree of accuracy even in low signal-to-noise ratio data sets. WaveSeq is also suited for application in complex experimental scenarios, helping make biologically relevant functional discoveries.

  12. Data Science and its Relationship to Big Data and Data-Driven Decision Making.

    Science.gov (United States)

    Provost, Foster; Fawcett, Tom

    2013-03-01

    Companies have realized they need to hire data scientists, academic institutions are scrambling to put together data-science programs, and publications are touting data science as a hot (even "sexy") career choice. However, there is confusion about what exactly data science is, and this confusion could lead to disillusionment as the concept diffuses into meaningless buzz. In this article, we argue that there are good reasons why it has been hard to pin down exactly what data science is. One reason is that data science is intricately intertwined with other concepts of growing importance, such as big data and data-driven decision making. Another reason is the natural tendency to associate what a practitioner does with the definition of the practitioner's field; this can result in overlooking the fundamentals of the field. We believe that trying to define the boundaries of data science precisely is not of the utmost importance. We can debate the boundaries of the field in an academic setting, but in order for data science to serve business effectively, it is important (i) to understand its relationships to other important related concepts, and (ii) to begin to identify the fundamental principles underlying data science. Once we embrace (ii), we can much better understand and explain exactly what data science has to offer. Furthermore, only once we embrace (ii) should we be comfortable calling it data science. In this article, we present a perspective that addresses all these concepts. We close by offering, as examples, a partial list of fundamental principles underlying data science.

  13. Practical aspects of data-driven motion correction approach for brain SPECT

    International Nuclear Information System (INIS)

    Kyme, A.Z.; Hutton, B.F.; Hatton, R.L.; Skerrett, D.; Barnden, L.

    2002-01-01

    Full text: Patient motion can cause image artifacts in SPECT despite restraining measures. Data-driven detection and correction of motion can be achieved by comparison of acquired data with the forward-projections. By optimising the orientation of a partial reconstruction, parameters can be obtained for each misaligned projection and applied to update this volume using a 3D reconstruction algorithm. Phantom validation was performed to explore practical aspects of this approach. Noisy projection datasets simulating a patient undergoing at least one fully 3D movement during acquisition were compiled from various projections of the digital Hoffman brain phantom. Motion correction was then applied to the reconstructed studies. Correction success was assessed visually and quantitatively. Resilience with respect to subset order and missing data in the reconstruction and updating stages, detector geometry considerations, and the need for implementing an iterated correction were assessed in the process. Effective correction of the corrupted studies was achieved. Visually, artifactual regions in the reconstructed slices were suppressed and/or removed. Typically the ratio of mean square difference between the corrected and reference studies compared to that between the corrupted and reference studies was > 2. Although components of the motions are missed using a single-head implementation, improvement was still evident in the correction. The need for multiple iterations in the approach was small due to the bulk of misalignment errors being corrected in the first pass. Dispersion of subsets for reconstructing and updating the partial reconstruction appears to give optimal correction. Further validation is underway using triple-head physical phantom data. Copyright (2002) The Australian and New Zealand Society of Nuclear Medicine Inc

  14. Data-driven haemodynamic response function extraction using Fourier-wavelet regularised deconvolution

    Directory of Open Access Journals (Sweden)

    Roerdink Jos BTM

    2008-04-01

    Full Text Available Abstract Background We present a simple, data-driven method to extract haemodynamic response functions (HRF) from functional magnetic resonance imaging (fMRI) time series, based on the Fourier-wavelet regularised deconvolution (ForWaRD) technique. HRF data are required for many fMRI applications, such as defining region-specific HRFs, efficiently representing a general HRF, or comparing subject-specific HRFs. Results ForWaRD is applied to fMRI time signals, after removing low-frequency trends by a wavelet-based method, and the output of ForWaRD is a time series of volumes, containing the HRF in each voxel. Compared to more complex methods, this extraction algorithm requires few assumptions (separability of signal and noise in the frequency and wavelet domains and the general linear model) and it is fast (HRF extraction from a single fMRI data set takes about the same time as spatial resampling). The extraction method is tested on simulated event-related activation signals, contaminated with noise from a time series of real MRI images. An application for HRF data is demonstrated in a simple event-related experiment: data are extracted from a region with significant effects of interest in a first time series. A continuous-time HRF is obtained by fitting a nonlinear function to the discrete HRF coefficients, and is then used to analyse a later time series. Conclusion With the parameters used in this paper, the extraction method presented here is very robust to changes in signal properties. Comparison of analyses with fitted HRFs and with a canonical HRF shows that a subject-specific, regional HRF significantly improves detection power. Sensitivity and specificity increase not only in the region from which the HRFs are extracted, but also in other regions of interest.

  15. Management and Nonlinear Analysis of Disinfection System of Water Distribution Networks Using Data Driven Methods

    Directory of Open Access Journals (Sweden)

    Mohammad Zounemat-Kermani

    2018-03-01

    Full Text Available Chlorination units are widely used to supply safe drinking water and remove pathogens from water distribution networks. A data-driven approach is one appropriate method for analyzing the performance of chlorine in a water supply network. In this study, a multi-layer perceptron neural network (MLP) with three training algorithms (gradient descent, conjugate gradient and BFGS) and a support vector machine (SVM) with RBF kernel function were used to predict the concentration of residual chlorine in the water supply networks of Ahmadabad Dafeh and Ahruiyeh villages in Kerman Province. Daily data including discharge (flow), chlorine consumption and residual chlorine were employed from the beginning of 1391 Hijri until the end of 1393 Hijri (for 3 years). To assess the performance of the studied models, criteria such as Nash-Sutcliffe efficiency (NS), root mean square error (RMSE), mean absolute percentage error (MAPE) and correlation coefficient (CORR) were used; in the best modeling situation they were 0.9484, 0.0255, 1.081, and 0.974 respectively, which resulted from the BFGS algorithm. The criteria indicated that the MLP model with BFGS and conjugate gradient algorithms was better than all other models in 90 and 10 percent of cases respectively, while the MLP model based on the gradient descent algorithm and the SVM model were better in none of the cases. According to the results of this study, proper management of chlorine concentration can be implemented by predicted values of residual chlorine in the water supply network. Thus, decreased performance of the perceptron network and support vector machine in the water supply network of Ahruiyeh in comparison to Ahmadabad Dafeh can be inferred from improper management of chlorination.
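
    The four evaluation criteria named in this abstract have standard textbook definitions, sketched below in plain Python. This is only an illustration under stated assumptions: the function names are hypothetical, MAPE is assumed to be reported in percent, and CORR is assumed to be the Pearson correlation coefficient. The abstract does not confirm either convention.

```python
import math


def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of the squared error
    to the variance of the observations around their mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def rmse(obs, sim):
    """Root mean square error between observed and simulated values."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mape(obs, sim):
    """Mean absolute percentage error, in percent (assumed convention).
    Observations must be non-zero."""
    return 100.0 * sum(abs((o - s) / o) for o, s in zip(obs, sim)) / len(obs)


def pearson_corr(obs, sim):
    """Pearson correlation coefficient (assumed meaning of CORR)."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_s = sum(sim) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    norm_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs))
    norm_s = math.sqrt(sum((s - mean_s) ** 2 for s in sim))
    return cov / (norm_o * norm_s)
```

    A perfect prediction gives NS = 1, RMSE = 0 and MAPE = 0, while a model that always predicts the observed mean gives NS = 0, which is why NS is often read as skill relative to the mean.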

  16. Simulation of shallow groundwater levels: Comparison of a data-driven and a conceptual model

    Science.gov (United States)

    Fahle, Marcus; Dietrich, Ottfried; Lischeid, Gunnar

    2015-04-01

    Despite an abundance of models aimed at simulating shallow groundwater levels, application of such models is often hampered by a lack of appropriate input data. Difficulties especially arise with regard to soil data, which are typically hard to obtain and prone to spatial variability, eventually leading to uncertainties in the model results. Modelling approaches relying entirely on easily measured quantities are therefore an alternative to encourage the applicability of models. We present and compare two models for calculating 1-day-ahead predictions of the groundwater level that are only based on measurements of potential evapotranspiration, precipitation and groundwater levels. The first model is a newly developed conceptual model that is parametrized using the White method (which estimates the actual evapotranspiration on basis of diurnal groundwater fluctuations) and a rainfall-response ratio. Inverted versions of the two latter approaches are then used to calculate the predictions of the groundwater level. Furthermore, as a completely data-driven alternative, a simple feed-forward multilayer perceptron neural network was trained based on the same inputs and outputs. Data of 4 growing periods (April to October) from a study site situated in the Spreewald wetland in North-east Germany were taken to set-up the models and compare their performance. In addition, response surfaces that relate model outputs to combinations of different input variables are used to reveal those aspects in which the two approaches coincide and those in which they differ. Finally, it will be evaluated whether the conceptual approach can be enhanced by extracting knowledge of the neural network. This is done by replacing in the conceptual model the default function that relates groundwater recharge and groundwater level, which is assumed to be linear, by the non-linear function extracted from the neural network.

  17. Alaska/Yukon Geoid Improvement by a Data-Driven Stokes's Kernel Modification Approach

    Science.gov (United States)

    Li, Xiaopeng; Roman, Daniel R.

    2015-04-01

    Geoid modeling over Alaska (USA) and Yukon (Canada), being a trans-national issue, faces a great challenge primarily due to the inhomogeneous surface gravity data (Saleh et al, 2013) and the dynamic geology (Freymueller et al, 2008) as well as its complex geological rheology. A previous study (Roman and Li 2014) used updated satellite models (Bruinsma et al 2013) and newly acquired aerogravity data from the GRAV-D project (Smith 2007) to capture the gravity field changes in the target areas, primarily in the middle-to-long wavelength. In CONUS, the geoid model was largely improved. However, the precision of the resulting geoid model in Alaska was still at the decimeter level: 19cm at the 32 tide bench marks and 24cm on the 202 GPS/Leveling bench marks, which gives a total of 23.8cm at all of these calibrated surface control points, where the datum bias was removed. Conventional kernel modification methods in this area (Li and Wang 2011) had limited effects on improving the precision of the geoid models. To compensate for the geoid misfits, a new Stokes's kernel modification method based on a data-driven technique is presented in this study. First, the method was tested on simulated data sets (Fig. 1), where the geoid errors have been reduced by 2 orders of magnitude (Fig 2). For the real data sets, some iteration steps are required to overcome the rank deficiency problem caused by the limited control data that are irregularly distributed in the target area. For instance, after 3 iterations, the standard deviation dropped about 2.7cm (Fig 3). Modification at other critical degrees can further minimize the geoid model misfits caused either by the gravity error or the remaining datum error in the control points.

  18. Data Driven Trigger Design and Analysis for the NOvA Experiment

    Energy Technology Data Exchange (ETDEWEB)

    Kurbanov, Serdar [Univ. of Virginia, Charlottesville, VA (United States)]

    2016-01-01

    This thesis primarily describes analysis related to studying the Moon shadow with cosmic rays, an analysis using upward-going muons trigger data, and other work done as part of MSc thesis work conducted at Fermi National Laboratory. While at Fermilab I made hardware and software contributions to two experiments - NOvA and Mu2e. NOvA is a neutrino experiment with the primary goal of measuring parameters related to neutrino oscillation. This is a running experiment, so it's possible to provide analysis of real beam and cosmic data. Most of this work was related to the Data-Driven Trigger (DDT) system of NOvA. The results of the Upward-Going muon analysis were presented at ICHEP in August 2016. The analysis demonstrates the proof of principle for a low-mass dark matter search. Mu2e is an experiment currently being built at Fermilab. Its primary goal is to detect the hypothetical neutrinoless conversion from a muon into an electron. I contributed to the production and tests of Cathode Strip Chambers (CSCs) which are required for testing the Cosmic Ray Veto (CRV) system for the experiment. This contribution is described in the last chapter along with a short description of the technical work provided for the DDT system of the NOvA experiment. All of the work described in this thesis will be extended by the next generation of UVA graduate students and postdocs as new data is collected by the experiment. I hope my efforts have helped lay the foundation for many years of beautiful results from Mu2e and NOvA.

  19. Data-driven nutrient analysis and reality check: Human inputs, catchment delivery and management effects

    Science.gov (United States)

    Destouni, G.

    2017-12-01

    Measures for mitigating nutrient loads to aquatic ecosystems should have observable effects, e.g., in the Baltic region after joint first periods of nutrient management actions under the Baltic Sea Action Plan (BSAP; since 2007) and the EU Water Framework Directive (WFD; since 2009). Looking for such observable effects, all openly available water and nutrient monitoring data since 2003 are compiled and analyzed for Sweden as a case study. Results show that hydro-climatically driven water discharge dominates the determination of waterborne loads of both phosphorus and nitrogen. Furthermore, the nutrient loads and water discharge are all similarly well correlated with the ecosystem status classification of Swedish water bodies according to the WFD. Nutrient concentrations, which are hydro-climatically correlated and should thus reflect human effects better than loads, have changed only slightly over the study period (2003-2013) and even increased in moderate-to-bad status waters, where the WFD and BSAP jointly target nutrient decreases. These results indicate insufficient distinction and mitigation of human-driven nutrient components by the internationally harmonized applications of both the WFD and the BSAP. Aiming for better general identification of such components, nutrient data for the large transboundary catchments of the Baltic Sea and the Sava River are compared. The comparison shows cross-regional consistency in nutrient relationships to driving hydro-climatic conditions (water discharge) for nutrient loads, and socio-economic conditions (population density and farmland share) for nutrient concentrations. A data-driven screening methodology is further developed for estimating nutrient input and retention-delivery in catchments. Its first application to nested Sava River catchments identifies characteristic regional values of nutrient input per area and relative delivery, and hotspots of much larger inputs, related to urban high-population areas.

  20. A review on data-driven fault severity assessment in rolling bearings

    Science.gov (United States)

    Cerrada, Mariela; Sánchez, René-Vinicio; Li, Chuan; Pacheco, Fannia; Cabrera, Diego; Valente de Oliveira, José; Vásquez, Rafael E.

    2018-01-01

    Health condition monitoring of rotating machinery is a crucial task to guarantee reliability in industrial processes. In particular, bearings are mechanical components used in most rotating devices and they represent the main source of faults in such equipment, which is why research activities on detecting and diagnosing their faults have increased. Fault detection aims at identifying whether or not the device is in a fault condition, and diagnosis is commonly oriented towards identifying the fault mode of the device, after detection. An important step after fault detection and diagnosis is the analysis of the magnitude or the degradation level of the fault, because this supports the decision-making process in condition-based maintenance. However, no extensive works are devoted to analysing this problem, although some works tackle it from the fault diagnosis point of view. Roughly speaking, fault severity is associated with the magnitude of the fault. In bearings, fault severity can be related to the physical size of the fault or a general degradation of the component. Because the literature regarding the severity assessment of bearing damage is limited, this paper aims at discussing the recent methods and techniques used to achieve the fault severity evaluation in the main components of rolling bearings, such as the inner race, outer race, and ball. The review is mainly focused on data-driven approaches such as signal processing for extracting the proper fault signatures associated with the damage degradation, and learning approaches that are used to identify degradation patterns with regard to health conditions. Finally, new challenges are highlighted in order to develop new contributions in this field.

  1. Data-Driven Modeling of Complex Systems by means of a Dynamical ANN

    Science.gov (United States)

    Seleznev, A.; Mukhin, D.; Gavrilov, A.; Loskutov, E.; Feigin, A.

    2017-12-01

    Data-driven methods for modeling and prognosis of complex dynamical systems are becoming more and more popular in various fields due to the growth of high-resolution data. We distinguish two basic steps in such an approach: (i) determining the phase subspace of the system, or embedding, from available time series and (ii) constructing an evolution operator acting in this reduced subspace. In this work we suggest a novel approach combining these two steps by means of construction of an artificial neural network (ANN) with special topology. The proposed ANN-based model, on the one hand, projects the data onto a low-dimensional manifold, and, on the other hand, models a dynamical system on this manifold. Actually, this is a recurrent multilayer ANN which has internal dynamics and is capable of generating time series. A very important point of the proposed methodology is the optimization of the model, which allows us to avoid overfitting: we use a Bayesian criterion to optimize the ANN structure and estimate both the degree of evolution operator nonlinearity and the complexity of the nonlinear manifold which the data are projected on. The proposed modeling technique will be applied to the analysis of high-dimensional dynamical systems: the Lorenz'96 model of atmospheric turbulence, producing high-dimensional space-time chaos, and a quasi-geostrophic three-layer model of the Earth's atmosphere with the natural orography, describing the dynamics of synoptical vortexes as well as mesoscale blocking systems. The possibility of application of the proposed methodology to analyze real measured data is also discussed. The study was supported by the Russian Science Foundation (grant #16-12-10198).

  2. Data-Driven Method for Wind Turbine Yaw Angle Sensor Zero-Point Shifting Fault Detection

    Directory of Open Access Journals (Sweden)

    Yan Pei

    2018-03-01

    Full Text Available Wind turbine yaw control plays an important role in increasing the wind turbine production and also in protecting the wind turbine. Accurate measurement of yaw angle is the basis of an effective wind turbine yaw controller. The accuracy of yaw angle measurement is affected significantly by the problem of zero-point shifting. Hence, it is essential to evaluate the zero-point shifting error on wind turbines on-line in order to improve the reliability of yaw angle measurement in real time. Particularly, qualitative evaluation of the zero-point shifting error could be useful for wind farm operators to realize prompt and cost-effective maintenance on yaw angle sensors. With the aim of qualitatively evaluating the zero-point shifting error, the yaw angle sensor zero-point shifting fault is first defined in this paper. A data-driven method is then proposed to detect the zero-point shifting fault based on Supervisory Control and Data Acquisition (SCADA) data. The zero-point shifting fault is detected in the proposed method by analyzing the power performance under different yaw angles. The SCADA data are partitioned into different bins according to both wind speed and yaw angle in order to evaluate the power performance in depth. An indicator is proposed in this method for power performance evaluation under each yaw angle. The yaw angle with the largest indicator is considered as the yaw angle measurement error in our work. A zero-point shifting fault would trigger an alarm if the error is larger than a predefined threshold. Case studies from several actual wind farms proved the effectiveness of the proposed method in detecting zero-point shifting fault and also in improving the wind turbine performance. Results of the proposed method could be useful for wind farm operators to realize prompt adjustment if there exists a large error of yaw angle measurement.
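
    The binning procedure described in this abstract can be sketched as follows. The abstract does not define the power performance indicator, so this minimal Python sketch substitutes a hypothetical one: power normalised by the mean power of its wind-speed bin, averaged per yaw-angle bin. The function names, bin widths and record layout are likewise assumptions, not the authors' implementation.

```python
from collections import defaultdict


def estimate_zero_point_shift(records, ws_width=1.0, yaw_width=2.0):
    """records: iterable of (wind_speed, yaw_angle_deg, power) tuples.

    Returns (estimated_shift_deg, indicator_by_yaw_bin). The indicator is a
    hypothetical stand-in: mean normalised power per yaw-angle bin.
    """
    # Step 1: partition the records by wind-speed bin.
    by_wind_speed = defaultdict(list)
    for wind_speed, yaw, power in records:
        by_wind_speed[int(wind_speed // ws_width)].append((yaw, power))

    # Step 2: within each wind-speed bin, normalise power by the bin mean so
    # that different wind speeds are comparable, then accumulate per yaw bin.
    sums = defaultdict(float)
    counts = defaultdict(int)
    for samples in by_wind_speed.values():
        mean_power = sum(p for _, p in samples) / len(samples)
        if mean_power <= 0:
            continue  # skip bins with no useful production
        for yaw, power in samples:
            yaw_bin = int(yaw // yaw_width)
            sums[yaw_bin] += power / mean_power
            counts[yaw_bin] += 1

    # Step 3: the yaw bin with the largest indicator is taken as the
    # estimated measurement error (centre of that bin, in degrees).
    indicator = {b: sums[b] / counts[b] for b in sums}
    best_bin = max(indicator, key=indicator.get)
    return (best_bin + 0.5) * yaw_width, indicator


def shift_alarm(shift_deg, threshold=3.0):
    """Raise an alarm when the estimated shift exceeds a predefined threshold."""
    return abs(shift_deg) > threshold
```

    Normalising within each wind-speed bin is one way to keep high-wind records from dominating the comparison across yaw angles; the published method may weight the bins differently.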

  3. Enhancing Transparency and Control When Drawing Data-Driven Inferences About Individuals.

    Science.gov (United States)

    Chen, Daizhuo; Fraiberger, Samuel P; Moakler, Robert; Provost, Foster

    2017-09-01

    Recent studies show the remarkable power of fine-grained information disclosed by users on social network sites to infer users' personal characteristics via predictive modeling. Similar fine-grained data are being used successfully in other commercial applications. In response, attention is turning increasingly to the transparency that organizations provide to users as to what inferences are drawn and why, as well as to what sort of control users can be given over inferences that are drawn about them. In this article, we focus on inferences about personal characteristics based on information disclosed by users' online actions. As a use case, we explore personal inferences that are made possible from "Likes" on Facebook. We first present a means for providing transparency into the information responsible for inferences drawn by data-driven models. We then introduce the "cloaking device", a mechanism for users to inhibit the use of particular pieces of information in inference. Using these analytical tools we ask two main questions: (1) How much information must users cloak to significantly affect inferences about their personal traits? We find that usually users must cloak only a small portion of their actions to inhibit inference. We also find that, encouragingly, false-positive inferences are significantly easier to cloak than true-positive inferences. (2) Can firms change their modeling behavior to make cloaking more difficult? The answer is a definitive yes. We demonstrate a simple modeling change that requires users to cloak substantially more information to affect the inferences drawn. The upshot is that organizations can provide transparency and control even into complicated, predictive model-driven inferences, but they also can make control easier or harder for their users.
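
    To make the idea of cloaking concrete, the sketch below assumes a simple linear scoring model over binary "Like" features and a greedy removal order. Both are illustrative assumptions, as are the function name and parameters; the abstract does not state which model or search procedure the authors use.

```python
def cloak(likes, weights, bias=0.0, threshold=0.0):
    """Greedily hide Likes until a positive inference is no longer drawn.

    likes:     list of distinct Like identifiers the user has disclosed
    weights:   dict mapping Like identifier -> linear model weight (hypothetical)
    The inference is assumed positive when bias + sum(weights) >= threshold.

    Returns (cloaked_likes, success), where success is True if the score
    fell below the threshold.
    """
    score = bias + sum(weights.get(like, 0.0) for like in likes)

    # Only Likes with positive weight push the score towards the inference,
    # so they are the candidates for cloaking, strongest evidence first.
    candidates = sorted(
        (like for like in likes if weights.get(like, 0.0) > 0),
        key=lambda like: weights[like],
        reverse=True,
    )

    cloaked = []
    for like in candidates:
        if score < threshold:
            break  # the inference is already inhibited
        score -= weights[like]
        cloaked.append(like)
    return cloaked, score < threshold
```

    The length of the returned list relative to the number of disclosed Likes corresponds to the quantity the first research question asks about: how much a user must cloak to affect the inference.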

  4. Disruption of functional networks in dyslexia: A whole-brain, data-driven analysis of connectivity

    Science.gov (United States)

    Finn, Emily S.; Shen, Xilin; Holahan, John M.; Scheinost, Dustin; Lacadie, Cheryl; Papademetris, Xenophon; Shaywitz, Sally E.; Shaywitz, Bennett A.; Constable, R. Todd

    2013-01-01

    Background Functional connectivity analyses of fMRI data are a powerful tool for characterizing brain networks and how they are disrupted in neural disorders. However, many such analyses examine only one or a small number of a priori seed regions. Studies that consider the whole brain frequently rely on anatomic atlases to define network nodes, which may result in mixing distinct activation timecourses within a single node. Here, we improve upon previous methods by using a data-driven brain parcellation to compare connectivity profiles of dyslexic (DYS) versus non-impaired (NI) readers in the first whole-brain functional connectivity analysis of dyslexia. Methods Whole-brain connectivity was assessed in children (n = 75; 43 NI, 32 DYS) and adult (n = 104; 64 NI, 40 DYS) readers. Results Compared to NI readers, DYS readers showed divergent connectivity within the visual pathway and between visual association areas and prefrontal attention areas; increased right-hemisphere connectivity; reduced connectivity in the visual word-form area (part of the left fusiform gyrus specialized for printed words); and persistent connectivity to anterior language regions around the inferior frontal gyrus. Conclusions Together, findings suggest that NI readers are better able to integrate visual information and modulate their attention to visual stimuli, allowing them to recognize words based on their visual properties, while DYS readers recruit altered reading circuits and rely on laborious phonology-based “sounding out” strategies into adulthood. These results deepen our understanding of the neural basis of dyslexia and highlight the importance of synchrony between diverse brain regions for successful reading. PMID:24124929

  5. Improving Spoken Language Outcomes for Children With Hearing Loss: Data-driven Instruction.

    Science.gov (United States)

    Douglas, Michael

    2016-02-01

    To assess the effects of data-driven instruction (DDI) on spoken language outcomes of children with cochlear implants and hearing aids. Retrospective, matched-pairs comparison of post-treatment speech/language data of children who did and did not receive DDI. Private, spoken-language preschool for children with hearing loss. Eleven matched pairs of children with cochlear implants who attended the same spoken language preschool. Groups were matched for age of hearing device fitting, time in the program, degree of predevice fitting hearing loss, sex, and age at testing. Daily informal language samples were collected and analyzed over a 2-year period, per preschool protocol. Annual informal and formal spoken language assessments in articulation, vocabulary, and omnibus language were administered at the end of three time intervals: baseline, end of year one, and end of year two. The primary outcome measures were total raw score performance of spontaneous utterance sentence types and syntax element use as measured by the Teacher Assessment of Spoken Language (TASL). In addition, standardized assessments (the Clinical Evaluation of Language Fundamentals--Preschool Version 2 (CELF-P2), the Expressive One-Word Picture Vocabulary Test (EOWPVT), the Receptive One-Word Picture Vocabulary Test (ROWPVT), and the Goldman-Fristoe Test of Articulation 2 (GFTA2)) were also administered and compared with the control group. The DDI group demonstrated significantly higher raw scores on the TASL each year of the study. The DDI group also achieved statistically significant higher scores for total language on the CELF-P and expressive vocabulary on the EOWPVT, but not for articulation nor receptive vocabulary. Post-hoc assessment revealed that 78% of the students in the DDI group achieved scores in the average range compared with 59% in the control group. 
    The preliminary results of this study support further investigation regarding DDI to investigate whether this method can consistently

  6. Telling Anthropocene Tales: Localizing the impacts of global change using data-driven story maps

    Science.gov (United States)

    Mychajliw, A.; Hadly, E. A.

    2016-12-01

    Navigating the Anthropocene requires innovative approaches for generating scientific knowledge and for its communication outside academia. The global, synergistic nature of the environmental challenges we face - climate change, human population growth, biodiversity loss, pollution, invasive species and diseases - highlights the need for public outreach strategies that incorporate multiple scales and perspectives in an easily understandable and rapidly accessible format. Data-driven story-telling maps are optimal in that they can display variable geographic scales and their intersections with the environmental challenges relevant to both scientists and non-scientists. Maps are a powerful way to present complex data to all stakeholders. We present an overview of best practices in community-engaged scientific story-telling and data translation for policy-makers by reviewing three Story Map projects that map the geographic impacts of global change across multiple spatial and policy scales: the entire United States, the state of California, and the town of Pescadero, California. We document a chain of translation from a primary scientific manuscript to a policy document (Scientific Consensus Statement on Maintaining Humanity's Life Support Systems in the 21st Century) to a set of interactive ArcGIS Story Maps. We discuss the widening breadth of participants (students, community members) and audiences (White House, Governor's Office of California, California Congressional Offices, general public) involved. We highlight how scientists, through careful curation of popular news media articles and stakeholder interviews, can co-produce these communication modules with community partners such as non-governmental organizations and government agencies.
    The placement of scientific and citizen's everyday knowledge of global change into an appropriate geographic context allows for effective dissemination by political units such as congressional districts and agency management units

  7. Data-Driven Neural Network Model for Robust Reconstruction of Automobile Casting

    Science.gov (United States)

    Lin, Jinhua; Wang, Yanjie; Li, Xin; Wang, Lu

    2017-09-01

    In computer vision systems, it is a challenging task to robustly reconstruct complex 3D geometries of automobile castings. 3D scanning data are usually corrupted by noise and the scanning resolution is low; these effects normally lead to incomplete matching and drift. In order to solve these problems, a data-driven local geometric learning model is proposed to achieve robust reconstruction of automobile castings. In order to relieve the interference of sensor noise and to be compatible with incomplete scanning data, a 3D convolutional neural network is established to match the local geometric features of automobile castings. The proposed neural network combines the geometric feature representation with the correlation metric function to robustly match the local correspondence. We use the truncated distance field (TDF) around the key point to represent the 3D surface of the casting geometry, so that the model can be directly embedded into the 3D space to learn the geometric feature representation. Finally, the training labels are automatically generated for deep learning based on the existing RGB-D reconstruction algorithm, which has access to the same global key matching descriptor. The experimental results show that the matching accuracy of our network is 92.2% for automobile castings, and the closed loop rate is about 74.0% when the matching tolerance threshold τ is 0.2. The matching descriptors performed well and retained 81.6% matching accuracy at 95% closed loop. For the sparse geometric castings with initial matching failure, the 3D matching object can be reconstructed robustly by training the key descriptors. Our method performs 3D reconstruction robustly for complex automobile castings.

  8. The Application of Cyber Physical System for Thermal Power Plants: Data-Driven Modeling

    Directory of Open Access Journals (Sweden)

    Yongping Yang

    2018-03-01

    Full Text Available Optimal operation of energy systems plays an important role in enhancing their lifetime security and efficiency. The determination of optimal operating strategies requires intelligent utilization of massive data accumulated during operation or prediction. The investigation of these data solely without combining physical models may run the risk that the established relationships between inputs and outputs, the models which reproduce the behavior of the considered system/component in a wide range of boundary conditions, are invalid for certain boundary conditions, which never occur in the database employed. Therefore, combining big data with physical models via cyber physical systems (CPS) is of great importance to derive highly reliable and accurate models and is becoming more and more popular in practical applications. In this paper, we focus on the description of a systematic method to apply CPS to the performance analysis and decision making of thermal power plants. We propose a general procedure of CPS with both offline and online phases for its application to thermal power plants and discuss the corresponding methods employed to support each sub-procedure. As an example, a data-driven model of the turbine island of an existing air-cooling based thermal power plant is established with the proposed procedure and demonstrates its practicality, validity and flexibility. To establish such a model, the historical operating data are employed in the cyber layer for modeling and linking each physical component. The decision-making procedure for the optimal frequency of the air-cooling condenser is also illustrated to show its applicability for online use. It is concluded that the cyber physical system with the data mining technique is effective and promising to facilitate the real-time analysis and control of thermal power plants.

  9. Employment relations: A data driven analysis of job markets using online job boards and online professional networks

    CSIR Research Space (South Africa)

    Marivate, Vukosi N

    2017-08-01

    Full Text Available Data from online job boards and online professional networks present an opportunity to understand job markets as well as how professionals transition from one job/career to another. We propose a data driven approach to begin to understand a slice...

  10. Keys to success for data-driven decision making: Lessons from participatory monitoring and collaborative adaptive management

    Science.gov (United States)

    Recent years have witnessed a call for evidence-based decisions in conservation and natural resource management, including data-driven decision-making. Adaptive management (AM) is one prevalent model for integrating scientific data into decision-making, yet AM has faced numerous challenges and limit...

  11. The Effects of Data-Driven Learning upon Vocabulary Acquisition for Secondary International School Students in Vietnam

    Science.gov (United States)

    Karras, Jacob Nolen

    2016-01-01

    Within the field of computer assisted language learning (CALL), scant literature exists regarding the effectiveness and practicality for secondary students to utilize data-driven learning (DDL) for vocabulary acquisition. In this study, there were 100 participants, who had a mean age of thirteen years, and were attending an international school in…

  12. Data-driven drug safety signal detection methods in pharmacovigilance using electronic primary care records: A population based study

    Directory of Open Access Journals (Sweden)

    Shang-Ming Zhou

    2017-04-01

    Data-driven analytic methods are a valuable aid to signal detection of ADEs from large electronic health records for drug safety monitoring. This study finds the methods can detect known ADE and so could potentially be used to detect unknown ADE.

  13. How Instructional Coaches Support Data-Driven Decision Making: Policy Implementation and Effects in Florida Middle Schools

    Science.gov (United States)

    Marsh, Julie A.; McCombs, Jennifer Sloan; Martorell, Francisco

    2010-01-01

    This article examines the convergence of two popular school improvement policies: instructional coaching and data-driven decision making (DDDM). Drawing on a mixed methods study of a statewide reading coach program in Florida middle schools, the article examines how coaches support DDDM and how this support relates to student and teacher outcomes.…

  14. Analyzing the Discourse of Chais Conferences for the Study of Innovation and Learning Technologies via a Data-Driven Approach

    Science.gov (United States)

    Silber-Varod, Vered; Eshet-Alkalai, Yoram; Geri, Nitza

    2016-01-01

    The current rapid technological changes confront researchers of learning technologies with the challenge of evaluating them, predicting trends, and improving their adoption and diffusion. This study utilizes a data-driven discourse analysis approach, namely culturomics, to investigate changes over time in the research of learning technologies. The…

  15. The Use of Linking Adverbials in Academic Essays by Non-Native Writers: How Data-Driven Learning Can Help

    Science.gov (United States)

    Garner, James Robert

    2013-01-01

    Over the past several decades, the TESOL community has seen an increased interest in the use of data-driven learning (DDL) approaches. Most studies of DDL have focused on the acquisition of vocabulary items, including a wide range of information necessary for their correct usage. One type of vocabulary that has yet to be properly investigated has…

  16. Examining Data Driven Decision Making via Formative Assessment: A Confluence of Technology, Data Interpretation Heuristics and Curricular Policy

    Science.gov (United States)

    Swan, Gerry; Mazur, Joan

    2011-01-01

    Although the term data-driven decision making (DDDM) is relatively new (Moss, 2007), the underlying concept of DDDM is not. For example, the practices of formative assessment and computer-managed instruction have historically involved the use of student performance data to guide what happens next in the instructional sequence (Morrison, Kemp, &…

  17. Strength in Numbers: Data-Driven Collaboration May Not Sound Sexy, But it Could Save Your Job

    Science.gov (United States)

    Buzzeo, Toni

    2010-01-01

    This article describes a practical, sure-fire way for media specialists to boost student achievement. The method is called data-driven collaboration, and it's a practical, easy-to-use technique in which media specialists and teachers work together to pinpoint kids' instructional needs and improve their essential skills. The author discusses the…

  18. Fork-join and data-driven execution models on multi-core architectures: Case study of the FMM

    KAUST Repository

    Amer, Abdelhalim; Maruyama, Naoya; Pericàs, Miquel; Taura, Kenjiro; Yokota, Rio; Matsuoka, Satoshi

    2013-01-01

    Extracting maximum performance of multi-core architectures is a difficult task primarily due to bandwidth limitations of the memory subsystem and its complex hierarchy. In this work, we study the implications of fork-join and data-driven execution

  19. Development of a data-driven algorithm to determine the W+jets background in t anti t events in ATLAS

    Energy Technology Data Exchange (ETDEWEB)

    Mehlhase, Sascha

    2010-07-12

    The physics of the top quark is one of the key components in the physics programme of the ATLAS experiment at the Large Hadron Collider at CERN. In this thesis, general studies of the jet trigger performance for top quark events using fully simulated Monte Carlo samples are presented and two data-driven techniques to estimate the multi-jet trigger efficiency and the W+Jets background in top pair events are introduced to the ATLAS experiment. In a tag-and-probe based method, using a simple and common event selection and a high transverse momentum lepton as tag object, the possibility to estimate the multijet trigger efficiency from data in ATLAS is investigated and it is shown that the method is capable of estimating the efficiency without introducing any significant bias by the given tag selection. In the second data-driven analysis a new method to estimate the W+Jets background in a top-pair event selection is introduced to ATLAS. By defining signal and background dominated regions by means of the jet multiplicity and the pseudo-rapidity distribution of the lepton in the event, the W+Jets contribution is extrapolated from the background dominated into the signal dominated region. The method is found to estimate the given background contribution as a function of the jet multiplicity with an accuracy of about 25% for most of the top dominated region with an integrated luminosity of above 100 pb⁻¹ at √s = 10 TeV. This thesis also covers a study summarising the thermal behaviour and expected performance of the Pixel Detector of ATLAS. All measurements performed during the commissioning phase of 2008/09 yield results within the specification of the system and the performance is expected to stay within those even after several years of running under LHC conditions. (orig.)

  20. Development of a data-driven algorithm to determine the W+jets background in t anti t events in ATLAS

    International Nuclear Information System (INIS)

    Mehlhase, Sascha

    2010-01-01

    The physics of the top quark is one of the key components in the physics programme of the ATLAS experiment at the Large Hadron Collider at CERN. In this thesis, general studies of the jet trigger performance for top quark events using fully simulated Monte Carlo samples are presented and two data-driven techniques to estimate the multi-jet trigger efficiency and the W+Jets background in top pair events are introduced to the ATLAS experiment. In a tag-and-probe based method, using a simple and common event selection and a high transverse momentum lepton as tag object, the possibility to estimate the multijet trigger efficiency from data in ATLAS is investigated and it is shown that the method is capable of estimating the efficiency without introducing any significant bias by the given tag selection. In the second data-driven analysis a new method to estimate the W+Jets background in a top-pair event selection is introduced to ATLAS. By defining signal and background dominated regions by means of the jet multiplicity and the pseudo-rapidity distribution of the lepton in the event, the W+Jets contribution is extrapolated from the background dominated into the signal dominated region. The method is found to estimate the given background contribution as a function of the jet multiplicity with an accuracy of about 25% for most of the top dominated region with an integrated luminosity of above 100 pb⁻¹ at √s = 10 TeV. This thesis also covers a study summarising the thermal behaviour and expected performance of the Pixel Detector of ATLAS. All measurements performed during the commissioning phase of 2008/09 yield results within the specification of the system and the performance is expected to stay within those even after several years of running under LHC conditions. (orig.)

  1. DATA-DRIVEN RIGHTSIZING: INTEGRATING PRESERVATION INTO THE LEGACY CITIES LANDSCAPE

    Directory of Open Access Journals (Sweden)

    E. Evans

    2017-08-01

    Legacy cities, whose built environments are undergoing transformations due to population loss, are at a critical juncture in their urban history and the historic preservation field has an important role to play. Rapid mobile surveys provide an opportunity for data collection that expands beyond traditional historic criteria, and positions preservationists to be proactive decision-makers and to align with multi-disciplinary partners. Rapid mobile surveys are being utilized in conjunction with in-depth data analysis of comprehensive livability metrics at the parcel, neighborhood, and citywide levels to develop recommendations for reactivating vacant properties. Historic preservationists are spearheading these efforts through a tool called Relocal, which uses 70–85 distinct metrics and a community priority survey to generate parcel-level recommendations for every vacant lot and vacant building in the areas in which it is applied. Local volunteer-led rapid mobile surveys are key to gathering on-the-ground, real-time metrics that serve as Relocal’s foundation. These new survey techniques generate usable data sets for historic preservation practitioners, land banks, planners, and other entities to inform strategic rightsizing decisions across legacy cities.

  2. Unravelling abiotic and biotic controls on the seasonal water balance using data-driven dimensionless diagnostics

    Directory of Open Access Journals (Sweden)

    S. P. Seibert

    2017-06-01

    The baffling diversity of runoff generation processes, alongside our sketchy understanding of how physiographic characteristics control fundamental hydrological functions of water collection, storage, and release, continues to pose major research challenges in catchment hydrology. Here, we propose innovative data-driven diagnostic signatures for overcoming the prevailing status quo in catchment inter-comparison. More specifically, we present dimensionless double mass curves (dDMC), which allow inference of information on runoff generation and the water balance at the seasonal and annual timescales. By separating the vegetation and winter periods, dDMC furthermore provide information on the role of biotic and abiotic controls in seasonal runoff formation. A key aspect we address in this paper is the derivation of dimensionless expressions of fluxes which ensure the comparability of the signatures in space and time. We achieve this by using the limiting factors of a hydrological process as a scaling reference. We show that different references result in different diagnostics. As such we define two kinds of dDMC which allow us to derive seasonal runoff coefficients and to characterize dimensionless streamflow release as a function of the potential renewal rate of the soil storage. We expect these signatures for storage controlled seasonal runoff formation to remain invariant, as long as the ratios of release over supply and supply over storage capacity develop similarly in different catchments. We test the proposed methods by applying them to an operational data set comprising 22 catchments (12–166 km²) from different environments in southern Germany and hydrometeorological data from 4 hydrological years. The diagnostics are used to compare the sites and to reveal the dominant controls on runoff formation. The key findings are that dDMC are meaningful signatures for catchment runoff formation at the seasonal to annual scale and that the type of

  3. qPortal: A platform for data-driven biomedical research.

    Science.gov (United States)

    Mohr, Christopher; Friedrich, Andreas; Wojnar, David; Kenar, Erhan; Polatkan, Aydin Can; Codrea, Marius Cosmin; Czemmel, Stefan; Kohlbacher, Oliver; Nahnsen, Sven

    2018-01-01

    Modern biomedical research aims at drawing biological conclusions from large, highly complex biological datasets. It has become common practice to make extensive use of high-throughput technologies that produce large amounts of heterogeneous data. In addition to the ever-improving accuracy, methods are getting faster and cheaper, resulting in a steadily increasing need for scalable data management and easily accessible means of analysis. We present qPortal, a platform providing users with an intuitive way to manage and analyze quantitative biological data. The backend leverages a variety of concepts and technologies, such as relational databases, data stores, data models and means of data transfer, as well as front-end solutions to give users access to data management and easy-to-use analysis options. Users are empowered to conduct their experiments from the experimental design to the visualization of their results through the platform. Here, we illustrate the feature-rich portal by simulating a biomedical study based on publicly available data. We demonstrate the software's strength in supporting the entire project life cycle. The software supports the project design and registration, empowers users to do all-digital project management and finally provides means to perform analysis. We compare our approach to Galaxy, one of the most widely used scientific workflow and analysis platforms in computational biology. Application of both systems to a small case study shows the differences between a data-driven approach (qPortal) and a workflow-driven approach (Galaxy). qPortal, a one-stop-shop solution for biomedical projects, offers up-to-date analysis pipelines, quality control workflows, and visualization tools. Through intensive user interactions, appropriate data models have been developed. These models build the foundation of our biological data management system and provide possibilities to annotate data, query metadata for statistics and future re-analysis on

  4. Data-driven reverse engineering of signaling pathways using ensembles of dynamic models.

    Directory of Open Access Journals (Sweden)

    David Henriques

    2017-02-01

    Despite significant efforts and remarkable progress, the inference of signaling networks from experimental data remains very challenging. The problem is particularly difficult when the objective is to obtain a dynamic model capable of predicting the effect of novel perturbations not considered during model training. The problem is ill-posed due to the nonlinear nature of these systems, the fact that only a fraction of the involved proteins and their post-translational modifications can be measured, and limitations on the technologies used for growing cells in vitro, perturbing them, and measuring their variations. As a consequence, there is a pervasive lack of identifiability. To overcome these issues, we present a methodology called SELDOM (enSEmbLe of Dynamic lOgic-based Models), which builds an ensemble of logic-based dynamic models, trains them on experimental data, and combines their individual simulations into an ensemble prediction. It also includes a model reduction step to prune spurious interactions and mitigate overfitting. SELDOM is a data-driven method, in the sense that it does not require any prior knowledge of the system: the interaction networks that act as scaffolds for the dynamic models are inferred from data using mutual information. We have tested SELDOM on a number of experimental and in silico signal transduction case-studies, including the recent HPN-DREAM breast cancer challenge. We found that its performance is highly competitive compared to state-of-the-art methods for the purpose of recovering network topology. More importantly, the utility of SELDOM goes beyond basic network inference (i.e. uncovering static interaction networks): it builds dynamic (based on ordinary differential equations) models, which can be used for mechanistic interpretations and reliable dynamic predictions in new experimental conditions (i.e. not used in the training). For this task, SELDOM's ensemble prediction is not only consistently better

  5. Current Trends in the Detection of Sociocultural Signatures: Data-Driven Models

    Energy Technology Data Exchange (ETDEWEB)

    Sanfilippo, Antonio P.; Bell, Eric B.; Corley, Courtney D.

    2014-09-15

    available that are shaping social computing as a strongly data-driven experimental discipline with an increasingly stronger impact on the decision-making process of groups and individuals alike. In this chapter, we review current advances and trends in the detection of sociocultural signatures. Specific embodiments of the issues discussed are provided with respect to the assessment of violent intent and sociopolitical contention. We begin by reviewing current approaches to the detection of sociocultural signatures in these domains. Next, we turn to the review of novel data harvesting methods for social media content. Finally, we discuss the application of sociocultural models to social media content, and conclude by commenting on current challenges and future developments.

  6. The global financial crisis and neighborhood decline

    NARCIS (Netherlands)

    Zwiers, Merle; Bolt, Gideon; Van Ham, Maarten; Van Kempen, Ronald

    2016-01-01

    Neighborhood decline is a complex and multidimensional process. National and regional variations in economic and political structures (including varieties in national welfare state arrangements), combined with differences in neighborhood history, development, and population composition, make it

  7. Neighborhood size and local geographic variation of health and social determinants

    Directory of Open Access Journals (Sweden)

    Emch Michael

    2005-06-01

    Background: Spatial filtering using a geographic information system (GIS) is often used to smooth health and ecological data. Smoothing disease data can help us understand local (neighborhood) geographic variation and ecological risk of diseases. Analyses that use small neighborhood sizes yield individualistic patterns, whereas large sizes reveal the global structure of the data, where local variation is obscured. Therefore, choosing an optimal neighborhood size is important for understanding ecological associations with diseases. This paper uses Hartley's test of homogeneity of variance (Fmax) as a methodological solution for selecting optimal neighborhood sizes. The data from a study area in Vietnam are used to test the suitability of this method. Results: Hartley's Fmax test was applied to spatial variables for two enteric diseases and two socioeconomic determinants. Various neighborhood sizes were tested by using a two-step process to implement the Fmax test. First, the variance of each neighborhood was compared to the highest neighborhood variance (upper, Fmax1), and then it was compared with the lowest neighborhood variance (lower, Fmax2). A significant value of Fmax1 indicates that the neighborhood does not reveal the global structure of the data; in contrast, a significant value of Fmax2 implies that the neighborhood data are not individualistic. The neighborhoods that are between the lower and the upper limits are the optimal neighborhood sizes. Conclusion: The results of the tests provide different neighborhood sizes for different variables, suggesting that optimal neighborhood size is data dependent. In ecology, it is well known that observation scales may influence ecological inference. Therefore, selecting an optimal neighborhood size is essential for understanding disease ecologies. The optimal neighborhood selection method that is tested in this paper can be useful in health and ecological studies.
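
    The two-step variance comparison described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `variance_ratios`, the input layout (one list of smoothed values per neighborhood size) and the sample numbers are assumptions made here, and the critical values needed to judge significance (from Hartley's Fmax tables) are deliberately left out.

```python
from statistics import variance


def variance_ratios(values_by_size):
    """Return {size: (fmax1, fmax2)} for each neighborhood size.

    fmax1 = highest variance / this variance (comparison with the upper limit)
    fmax2 = this variance / lowest variance  (comparison with the lower limit)

    `values_by_size` is a hypothetical layout: it maps a neighborhood size
    to the list of smoothed values obtained with that size.
    """
    variances = {size: variance(vals) for size, vals in values_by_size.items()}
    v_max = max(variances.values())
    v_min = min(variances.values())
    return {size: (v_max / v, v / v_min) for size, v in variances.items()}


# Made-up smoothed values for three neighborhood sizes (illustration only).
example = {
    500: [0, 10, 20, 30, 40],
    1000: [10, 15, 20, 25, 30],
    2000: [18, 19, 20, 21, 22],
}
ratios = variance_ratios(example)
for size, (fmax1, fmax2) in sorted(ratios.items()):
    print(size, fmax1, fmax2)
```

    Each ratio would then be compared against the tabulated Fmax critical value for the relevant group count and degrees of freedom; sizes for which neither ratio is significant fall between the lower and upper limits.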

  8. Data-driven forecasting of high-dimensional chaotic systems with long short-term memory networks.

    Science.gov (United States)

    Vlachas, Pantelis R; Byeon, Wonmin; Wan, Zhong Y; Sapsis, Themistoklis P; Koumoutsakos, Petros

    2018-05-01

    We introduce a data-driven forecasting method for high-dimensional chaotic systems using long short-term memory (LSTM) recurrent neural networks. The proposed LSTM neural networks perform inference of high-dimensional dynamical systems in their reduced order space and are shown to be an effective set of nonlinear approximators of their attractor. We demonstrate the forecasting performance of the LSTM and compare it with Gaussian processes (GPs) in time series obtained from the Lorenz 96 system, the Kuramoto-Sivashinsky equation and a prototype climate model. The LSTM networks outperform the GPs in short-term forecasting accuracy in all applications considered. A hybrid architecture, extending the LSTM with a mean stochastic model (MSM-LSTM), is proposed to ensure convergence to the invariant measure. This novel hybrid method is fully data-driven and extends the forecasting capabilities of LSTM networks.

  9. Cognitive load privileges memory-based over data-driven processing, not group-level over person-level processing.

    Science.gov (United States)

    Skorich, Daniel P; Mavor, Kenneth I

    2013-09-01

    In the current paper, we argue that categorization and individuation, as traditionally discussed and as experimentally operationalized, are defined in terms of two confounded underlying dimensions: a person/group dimension and a memory-based/data-driven dimension. In a series of three experiments, we unconfound these dimensions and impose a cognitive load. Across the three experiments, two with laboratory-created targets and one with participants' friends as the target, we demonstrate that cognitive load privileges memory-based over data-driven processing, not group- over person-level processing. We discuss the results in terms of their implications for conceptualizations of the categorization/individuation distinction, for the equivalence of person and group processes, for the ultimate 'purpose' and meaningfulness of group-based perception and, fundamentally, for the process of categorization, broadly defined. © 2012 The British Psychological Society.

  10. Data-driven technology for engineering systems health management design approach, feature construction, fault diagnosis, prognosis, fusion and decisions

    CERN Document Server

    Niu, Gang

    2017-01-01

    This book introduces condition-based maintenance (CBM)/data-driven prognostics and health management (PHM) in detail, first explaining the PHM design approach from a systems engineering perspective, then summarizing and elaborating on the data-driven methodology for feature construction, as well as feature-based fault diagnosis and prognosis. The book includes a wealth of illustrations and tables to help explain the algorithms, as well as practical examples showing how to use this tool to solve situations for which analytic solutions are poorly suited. It equips readers to apply the concepts discussed in order to analyze and solve a variety of problems in PHM system design, feature construction, fault diagnosis and prognosis.

  11. A Data-Driven Stochastic Reactive Power Optimization Considering Uncertainties in Active Distribution Networks and Decomposition Method

    DEFF Research Database (Denmark)

    Ding, Tao; Yang, Qingrun; Yang, Yongheng

    2018-01-01

    To address the uncertain output of distributed generators (DGs) for reactive power optimization in active distribution networks, the stochastic programming model is widely used. The model is employed to find an optimal control strategy with minimum expected network loss while satisfying all......, in this paper, a data-driven modeling approach is introduced to assume that the probability distribution from the historical data is uncertain within a confidence set. Furthermore, a data-driven stochastic programming model is formulated as a two-stage problem, where the first-stage variables find the optimal...... control for discrete reactive power compensation equipment under the worst probability distribution of the second stage recourse. The second-stage variables are adjusted to uncertain probability distribution. In particular, this two-stage problem has a special structure so that the second-stage problem...

  12. Knowledge Based Cloud FE simulation - data-driven material characterization guidelines for the hot stamping of aluminium alloys

    Science.gov (United States)

    Wang, Ailing; Zheng, Yang; Liu, Jun; El Fakir, Omer; Masen, Marc; Wang, Liliang

    2016-08-01

    The Knowledge Based Cloud FEA (KBC-FEA) simulation technique allows multiobjective FE simulations to be conducted on a cloud-computing environment, which effectively reduces computation time and expands the capability of FE simulation software. In this paper, a novel functional module was developed for the data mining of experimentally verified FE simulation results for metal forming processes obtained from KBC-FE. Through this functional module, the thermo-mechanical characteristics of a metal forming process were deduced, enabling a systematic and data-driven guideline for mechanical property characterization to be developed, which will directly guide the material tests for a metal forming process towards the most efficient and effective scheme. Successful application of this data-driven guideline would reduce the efforts for material characterization, leading to the development of more accurate material models, which in turn enhance the accuracy of FE simulations.

  13. Schools, Neighborhood Risk Factors, and Crime

    Science.gov (United States)

    Willits, Dale; Broidy, Lisa; Denman, Kristine

    2013-01-01

    Prior research has identified a link between schools (particularly high schools) and neighborhood crime rates. However, it remains unclear whether the relationship between schools and crime is a reflection of other criminogenic dynamics at the neighborhood level or whether schools influence neighborhood crime patterns independently of other…

  14. Data-driven gating in PET: Influence of respiratory signal noise on motion resolution.

    Science.gov (United States)

    Büther, Florian; Ernst, Iris; Frohwein, Lynn Johann; Pouw, Joost; Schäfers, Klaus Peter; Stegger, Lars

    2018-05-21

    Data-driven gating (DDG) approaches for positron emission tomography (PET) are interesting alternatives to conventional hardware-based gating methods. In DDG, the measured PET data themselves are utilized to calculate a respiratory signal that is subsequently used for gating purposes. The success of gating is then highly dependent on the statistical quality of the PET data. In this study, we investigate how this quality determines signal noise and thus motion resolution in clinical PET scans using a center-of-mass-based (COM) DDG approach, specifically with regard to motion management of target structures in future radiotherapy planning applications. PET list mode datasets acquired in one bed position of 19 different radiotherapy patients undergoing pretreatment [18F]FDG PET/CT or [18F]FDG PET/MRI were included in this retrospective study. All scans were performed over a region with organs (myocardium, kidneys) or tumor lesions of high tracer uptake and under free breathing. Aside from the original list mode data, datasets with progressively decreasing PET statistics were generated. From these, COM DDG signals were derived for subsequent amplitude-based gating of the original list mode file. The apparent respiratory shift d from end-expiration to end-inspiration was determined from the gated images and expressed as a function of signal-to-noise ratio SNR of the determined gating signals. This relation was tested against 25 additional [18F]FDG PET/MRI list mode datasets where high-precision MR navigator-like respiratory signals were available as reference signal for respiratory gating of PET data, and data from a dedicated thorax phantom scan. All original 19 high-quality list mode datasets demonstrated the same behavior in terms of motion resolution when reducing the amount of list mode events for DDG signal generation. Ratios and directions of respiratory shifts between end-respiratory gates and the respective nongated image were constant over all
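
    As a rough illustration of the center-of-mass idea mentioned in this abstract, the sketch below bins list mode events into fixed-length time frames and takes the mean axial event position per frame as a raw respiratory signal. The function name `com_signal`, the use of a single axial coordinate, the fixed frame length and the sample events are all assumptions made here; the study's actual signal processing is not described in the abstract.

```python
from collections import defaultdict


def com_signal(event_times, event_z, frame_length):
    """Mean axial position (center of mass) of events per time frame.

    event_times : event timestamps in seconds
    event_z     : axial coordinate of each event (hypothetical units)
    frame_length: frame duration in seconds (an assumed parameter)
    Returns {frame_index: mean_z}; frames without events are omitted.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, z in zip(event_times, event_z):
        frame = int(t // frame_length)
        sums[frame] += z
        counts[frame] += 1
    return {frame: sums[frame] / counts[frame] for frame in sorted(counts)}


# Made-up events (illustration only).
times = [0.1, 0.2, 0.6, 0.7, 1.1]
z_pos = [1.0, 3.0, 2.0, 4.0, 5.0]
signal = com_signal(times, z_pos, frame_length=0.5)
print(signal)
```

    With fewer events per frame the per-frame mean becomes noisier, which is the dependence on PET statistics that the abstract investigates.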

  15. Duration and Timing of Exposure to Neighborhood Poverty and the Risk of Adolescent Parenthood

    Science.gov (United States)

    Wodtke, Geoffrey T.

    2013-01-01

    Theory suggests that the impact of neighborhood poverty depends on both the duration and timing of exposure. Previous research, however, does not properly analyze the sequence of neighborhoods to which children are exposed throughout the early life course. This study investigates the effects of different longitudinal patterns of exposure to disadvantaged neighborhoods on the risk of adolescent parenthood. It follows a cohort of children in the PSID from age 4 to 19 and uses novel methods for time-varying exposures that overcome critical limitations of conventional regression when selection processes are dynamic. Results indicate that sustained exposure to poor neighborhoods substantially increases the risk of becoming a teen parent and that exposure to neighborhood poverty during adolescence may be more consequential than exposure earlier during childhood. PMID:23720166

  16. Augmenting Bag-of-Words: Data-Driven Discovery of Temporal and Structural Information for Activity Recognition

    OpenAIRE

    Bettadapura, Vinay; Schindler, Grant; Plotz, Thomaz; Essa, Irfan

    2015-01-01

    We present data-driven techniques to augment Bag of Words (BoW) models, which allow for more robust modeling and recognition of complex long-term activities, especially when the structure and topology of the activities are not known a priori. Our approach specifically addresses the limitations of standard BoW approaches, which fail to represent the underlying temporal and causal information that is inherent in activity streams. In addition, we also propose the use of randomly sampled regular ...

  17. Experimental investigation of the predictive capabilities of data driven modeling techniques in hydrology - Part 1: Concepts and methodology

    Directory of Open Access Journals (Sweden)

    A. Elshorbagy

    2010-10-01

    A comprehensive data driven modeling experiment is presented in a two-part paper. In this first part, an extensive data-driven modeling experiment is proposed. The most important concerns regarding the way data driven modeling (DDM) techniques and data were handled, compared, and evaluated, and the basis on which findings and conclusions were drawn, are discussed. A concise review of key articles that presented comparisons among various DDM techniques is presented. Six DDM techniques, namely, neural networks, genetic programming, evolutionary polynomial regression, support vector machines, M5 model trees, and K-nearest neighbors, are proposed and explained. Multiple linear regression and naïve models are also suggested as baselines for comparison with the various techniques. Five datasets from Canada and Europe representing evapotranspiration, upper and lower layer soil moisture content, and rainfall-runoff process are described and proposed, in the second paper, for the modeling experiment. Twelve different realizations (groups) from each dataset are created by a procedure involving random sampling. Each group contains three subsets: training, cross-validation, and testing. Each modeling technique is proposed to be applied to each of the 12 groups of each dataset. This way, both prediction accuracy and uncertainty of the modeling techniques can be evaluated. The description of the datasets, the implementation of the modeling techniques, results and analysis, and the findings of the modeling experiment are deferred to the second part of this paper.
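
    The resampling design in this abstract (twelve random realizations, each divided into training, cross-validation and testing subsets) could be sketched as below. The abstract does not give the sampling procedure or the subset proportions, so the function name `make_realizations`, the 60/20/20 split and the fixed seed are assumptions chosen only for illustration.

```python
import random


def make_realizations(n_samples, n_groups=12, train_frac=0.6, val_frac=0.2, seed=0):
    """Create `n_groups` random partitions of the sample indices.

    Each partition is a (training, cross_validation, testing) tuple of
    index lists. The fractions are assumed values, not taken from the paper.
    """
    rng = random.Random(seed)
    n_train = round(train_frac * n_samples)
    n_val = round(val_frac * n_samples)
    groups = []
    for _ in range(n_groups):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # a fresh random ordering for every realization
        groups.append((idx[:n_train],
                       idx[n_train:n_train + n_val],
                       idx[n_train + n_val:]))
    return groups


groups = make_realizations(100)
train, val, test = groups[0]
print(len(groups), len(train), len(val), len(test))
```

    Fitting every technique on each of the twelve partitions gives a spread of test scores per technique, which is one way to look at prediction uncertainty alongside accuracy.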

  18. Minimization of energy consumption in HVAC systems with data-driven models and an interior-point method

    International Nuclear Information System (INIS)

    Kusiak, Andrew; Xu, Guanglin; Zhang, Zijun

    2014-01-01

    Highlights: • We study the energy saving of HVAC systems with a data-driven approach. • We conduct an in-depth analysis of the topology of developed Neural Network based HVAC model. • We apply interior-point method to solving a Neural Network based HVAC optimization model. • The uncertain building occupancy is incorporated in the minimization of HVAC energy consumption. • A significant potential of saving HVAC energy is discovered. - Abstract: In this paper, a data-driven approach is applied to minimize energy consumption of a heating, ventilating, and air conditioning (HVAC) system while maintaining the thermal comfort of a building with uncertain occupancy level. The uncertainty of arrival and departure rate of occupants is modeled by the Poisson and uniform distributions, respectively. The internal heating gain is calculated from the stochastic process of the building occupancy. Based on the observed and simulated data, a multilayer perceptron algorithm is employed to model and simulate the HVAC system. The data-driven models accurately predict future performance of the HVAC system based on the control settings and the observed historical information. An optimization model is formulated and solved with the interior-point method. The optimization results are compared with the results produced by the simulation models

  19. A data-driven approach for modeling post-fire debris-flow volumes and their uncertainty

    Science.gov (United States)

    Friedel, Michael J.

    2011-01-01

    This study demonstrates the novel application of genetic programming to evolve nonlinear post-fire debris-flow volume equations from variables associated with a data-driven conceptual model of the western United States. The search space is constrained using a multi-component objective function that simultaneously minimizes root-mean squared and unit errors for the evolution of fittest equations. An optimization technique is then used to estimate the limits of nonlinear prediction uncertainty associated with the debris-flow equations. In contrast to a published multiple linear regression three-variable equation, linking basin area with slopes greater or equal to 30 percent, burn severity characterized as area burned moderate plus high, and total storm rainfall, the data-driven approach discovers many nonlinear and several dimensionally consistent equations that are unbiased and have less prediction uncertainty. Of the nonlinear equations, the best performance (lowest prediction uncertainty) is achieved when using three variables: average basin slope, total burned area, and total storm rainfall. Further reduction in uncertainty is possible for the nonlinear equations when dimensional consistency is not a priority and by subsequently applying a gradient solver to the fittest solutions. The data-driven modeling approach can be applied to nonlinear multivariate problems in all fields of study.

  20. Internet Bad Neighborhoods temporal behavior

    NARCIS (Netherlands)

    Moreira Moura, Giovane; Sadre, R.; Pras, Aiko

    2014-01-01

    Malicious hosts tend to be concentrated in certain areas of the IP addressing space, forming the so-called Bad Neighborhoods. Knowledge about this concentration is valuable in predicting attacks from unseen IP addresses. This observation has been employed in previous works to filter out spam. In

  1. Internet Bad Neighborhoods Temporal Behavior

    NARCIS (Netherlands)

    Moreira Moura, G.C.; Sadre, R.; Pras, A.

    2014-01-01

    Malicious hosts tend to be concentrated in certain areas of the IP addressing space, forming the so-called Bad Neighborhoods. Knowledge about this concentration is valuable in predicting attacks from unseen IP addresses. This observation has been employed in previous works to filter out spam. In

  2. Bad Neighborhoods on the Internet

    NARCIS (Netherlands)

    Moreira Moura, G.C.; Sadre, R.; Pras, A.

    2014-01-01

    Analogous to the real world, sources of malicious activities on the Internet tend to be concentrated in certain networks instead of being evenly distributed. In this article, we formally define and frame such areas as Internet Bad Neighborhoods. By extending the reputation of malicious IP addresses

  3. Subjective neighborhood assessment and physical inactivity: An examination of neighborhood-level variance.

    Science.gov (United States)

    Prochaska, John D; Buschmann, Robert N; Jupiter, Daniel; Mutambudzi, Miriam; Peek, M Kristen

    2018-06-01

    Research suggests a linkage between perceptions of neighborhood quality and the likelihood of engaging in leisure-time physical activity. Often in these studies, intra-neighborhood variance is viewed as something to be controlled for statistically. However, we hypothesized that intra-neighborhood variance in perceptions of neighborhood quality may be contextually relevant. We examined the relationship between intra-neighborhood variance of subjective neighborhood quality and neighborhood-level reported physical inactivity across 48 neighborhoods within a medium-sized city, Texas City, Texas using survey data from 2706 residents collected between 2004 and 2006. Neighborhoods where the aggregated perception of neighborhood quality was poor also had a larger proportion of residents reporting being physically inactive. However, higher degrees of disagreement among residents within neighborhoods about their neighborhood quality was significantly associated with a lower proportion of residents reporting being physically inactive (p=0.001). Our results suggest that intra-neighborhood variability may be contextually relevant in studies seeking to better understand the relationship between neighborhood quality and behaviors sensitive to neighborhood environments, like physical activity. Copyright © 2017 Elsevier Inc. All rights reserved.

  4. The Financial and Non-Financial Aspects of Developing a Data-Driven Decision-Making Mindset in an Undergraduate Business Curriculum

    Science.gov (United States)

    Bohler, Jeffrey; Krishnamoorthy, Anand; Larson, Benjamin

    2017-01-01

    Making data-driven decisions is becoming more important for organizations faced with confusing and often contradictory information available to them from their operating environment. This article examines one college of business' journey of developing a data-driven decision-making mindset within its undergraduate curriculum. Lessons learned may be…

  5. Challenges and best practices for big data-driven healthcare innovations conducted by profit–non-profit partnerships – a quantitative prioritization

    NARCIS (Netherlands)

    Witjas-Paalberends, E. R.; van Laarhoven, L. P.M.; van de Burgwal, L. H.M.; Feilzer, J.; de Swart, J.; Claassen, H.J.H.M.; Jansen, W. T.M.

    2017-01-01

    Big data-driven innovations are key in improving healthcare system sustainability. Given the complexity, these are frequently conducted by public-private-partnerships (PPPs) between profit and non-profit parties. However, information on how to manage big data-driven healthcare innovations by PPPs is

  6. Conduct disorder in girls: neighborhoods, family characteristics, and parenting behaviors

    Directory of Open Access Journals (Sweden)

    Chang Chien-Ni

    2008-10-01

    Full Text Available Abstract Background: Little is known about the social context of girls with conduct disorder (CD), a question of increasing importance to clinicians and researchers. The purpose of this study was to examine the associations between three social context domains (neighborhood, family characteristics, and parenting behaviors) and CD in adolescent girls, additionally testing for race moderation effects. We predicted that disadvantaged neighborhoods, family characteristics such as parental marital status, and parenting behaviors such as negative discipline would characterize girls with CD. We also hypothesized that parenting behaviors would mediate the associations between neighborhood and family characteristics and CD. Methods: We recruited 93 15–17 year-old girls from the community and used a structured psychiatric interview to assign participants to a CD group (n = 52) or a demographically matched group with no psychiatric disorder (n = 41). Each girl and parent also filled out questionnaires about neighborhood, family characteristics, and parenting behaviors. Results: Neighborhood quality was not associated with CD in girls. Some family characteristics (parental antisociality) and parenting behaviors (levels of family activities and negative discipline) were characteristic of girls with CD, but not all. There was no moderation by race. Our hypothesis that the association between family characteristics and CD would be mediated by parenting behaviors was not supported. Conclusion: This study expanded upon previous research by investigating multiple social context domains in girls with CD and by selecting a comparison group who were not different in age, social class, or race. When these factors are thus controlled, CD in adolescent girls is not significantly associated with neighborhood, but is associated with some family characteristics and some types of parental behaviors. However, the mechanisms underlying these relationships need to be further

  7. Urbanism, Neighborhood Context, and Social Networks.

    Science.gov (United States)

    Cornwell, Erin York; Behler, Rachel L

    2015-09-01

    Theories of urbanism suggest that the urban context erodes individuals' strong social ties with friends and family. Recent research has narrowed focus to the neighborhood context, emphasizing how localized structural disadvantage affects community-level cohesion and social capital. In this paper, we argue that neighborhood context also shapes social ties with friends and family, particularly for community-dwelling seniors. We hypothesize that neighborhood disadvantage, residential instability, and disorder restrict residents' abilities to cultivate close relationships with neighbors and non-neighbor friends and family. Using data from the National Social Life, Health, and Aging Project (NSHAP), we find that older adults who live in disadvantaged neighborhoods have smaller social networks. Neighborhood disadvantage is also associated with less close network ties and less frequent interaction, but only among men. Furthermore, residents of disordered neighborhoods have smaller networks and weaker ties. We urge scholars to pay greater attention to how neighborhood context contributes to disparities in network-based access to resources.

  8. Data-Driven Jump Detection Thresholds for Application in Jump Regressions

    Directory of Open Access Journals (Sweden)

    Robert Davies

    2018-03-01

    Full Text Available This paper develops a method to select the threshold in threshold-based jump detection methods. The method is motivated by an analysis of threshold-based jump detection methods in the context of jump-diffusion models. We show that, over the range of sampling frequencies a researcher is most likely to encounter, the usual in-fill asymptotics provide a poor guide for selecting the jump threshold. Because of this, we develop a sample-based method. Our method estimates the number of jumps over a grid of thresholds and selects the optimal threshold at what we term the ‘take-off’ point in the estimated number of jumps. We show that this method consistently estimates the jumps and their indices as the sampling interval goes to zero. In several Monte Carlo studies we evaluate the performance of our method based on its ability to accurately locate jumps and its ability to distinguish between true jumps and large diffusive moves. In one of these Monte Carlo studies we evaluate the performance of our method in a jump regression context. Finally, we apply our method in two empirical studies. In one we estimate the number of jumps and report the jump threshold our method selects for three commonly used market indices. In the other empirical application we perform a series of jump regressions using our method to select the jump threshold.
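
    The first step named in this abstract, estimating the number of jumps over a grid of thresholds, might look like the sketch below. It assumes the simplest possible detector (a return counts as a jump when its absolute value exceeds the threshold); the function names and sample values are hypothetical, and the paper's rule for locating the 'take-off' point is not given in the abstract, so it is not reproduced here.

```python
def count_jumps(returns, threshold):
    """Count returns whose absolute value exceeds the threshold (assumed detector)."""
    return sum(1 for r in returns if abs(r) > threshold)


def jump_counts_over_grid(returns, thresholds):
    """Pair every candidate threshold with the number of jumps it flags."""
    return [(t, count_jumps(returns, t)) for t in thresholds]


# Illustrative values only: mostly small moves plus two large ones.
returns = [0.01, -0.02, 0.5, 0.015, -0.6, 0.005]
grid = [0.1, 0.3, 0.55, 1.0]
counts = jump_counts_over_grid(returns, grid)
```

    Plotting the count against the threshold gives the curve on which a selection rule such as the one described above would operate.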

  9. Congruence of Home, Social and Sex Neighborhoods among Men Who Have Sex with Men, NYCM2M Study.

    Science.gov (United States)

    Koblin, Beryl A; Egan, James E; Nandi, Vijay; Sang, Jordan M; Cerdá, Magdalena; Tieu, Hong-Van; Ompad, Danielle C; Hoover, Donald R; Frye, Victoria

    2017-06-01

    Substantial literature demonstrates the influence of the neighborhood environment on health behaviors and outcomes. But limited research examines how gay and bisexual men experience and exist in various geographic and virtual spaces and how this relates to their sexual behavior. New York City Men 2 Men (NYCM2M) was a cross-sectional study designed to identify neighborhood-level characteristics within the urban environment that influence sexual risk behaviors, substance use, and depression among men who have sex with men (MSM) living in NYC. The sample was recruited using a modified venue-based time-space sampling methodology and through select websites and mobile applications. Whether key neighborhoods of human activity, where a participant resided (termed home), socialized (termed social), or had sex most often (termed sex), were the same or different was evaluated. "Congruence" (or the sameness) of home, social, and most often sex neighborhood was reported by 17 % of men, while 30 % reported that none of their neighborhoods were the same. The largest group of men (39 %) reported that their home and sex neighborhoods were the same but their social neighborhood was different, while 10 % reported that their home neighborhood was different from their social and sex neighborhood; 5 % of men reported the same home and social neighborhoods with a different sex neighborhood. Complete neighborhood incongruence was highest among men who were Black and/or Latino, had lower education and personal income levels, and had greater financial insecurity. In adjusted analysis, serodiscordant condomless anal intercourse and condomless anal intercourse with partners from the Internet or mobile applications were significantly associated with having the same social and sex (but not home) neighborhoods. Understanding the complexity of how different spaces and places relate to the health and sexual behavior of MSM is essential for focusing interventions to best reach various populations

  10. Data-Driven Photovoltaic System Modeling Based on Nonlinear System Identification

    Directory of Open Access Journals (Sweden)

    Ayedh Alqahtani

    2016-01-01

    Full Text Available Solar photovoltaic (PV) energy sources are rapidly gaining popularity compared to conventional fossil fuel sources. As the merging of PV systems with existing power sources increases, reliable and accurate PV system identification is essential to address the highly nonlinear change in PV system dynamic and operational characteristics. This paper deals with the identification of a PV system characteristic with a switch-mode power converter. Measured input-output data are collected from a real PV panel to be used for the identification. The data are divided into estimation and validation sets. The identification methodology is discussed. A Hammerstein-Wiener model is identified and selected because it best captures the PV system dynamics, and results and discussion are provided to demonstrate the accuracy of the selected model structure.

  11. Impact of neighborhood design on energy performance and GHG emissions

    International Nuclear Information System (INIS)

    Hachem, Caroline

    2016-01-01

    Highlights: • Energy use and GHG emissions of different neighborhood designs are investigated. • Improving buildings energy performance reduces energy use and GHG emissions by 75%. • Density as isolated factor has limited effect on transport on per capita basis. • Distance to central business district impacts transport GHG emission significantly. - Abstract: This paper presents an innovative and holistic approach to the analysis of the impact of selected design parameters of a new solar community on its environmental performance, in terms of energy efficiency and carbon footprint (green-house gas (GHG) emissions). The design parameters include energy performance level of buildings, density, type of the neighborhood (mixed-use vs residential), location of the commercial center relative to residential areas and the design of the streets. Energy performance is measured as the balance between overall energy consumption for building operations (assuming an all-electric neighborhood) and electricity generation potential through integration of PV panels on available roof surfaces. Greenhouse gas emissions are those associated with building operations and transport. Results of simulations carried out on prototype neighborhoods located in the vicinity of Calgary, Alberta, Canada indicate that, while adopting high-energy efficiency measures can reduce the buildings’ impact by up to 75% in terms of energy consumption and GHG emissions, transport still has a large environmental impact. The parameters of highest impact on transport and its associated GHG emissions are the design of the neighborhood and the distance to the business center. Density, as isolated parameter, has a modest effect on the selected mode of transportation, in terms of using private or public transportation. While this study relates to a specific location and a range of design assumptions, the methodology employed can serve as a template for evaluating design alternatives of new sustainable

  12. Developing a Metadata Infrastructure to facilitate data driven science gateway and to provide Inspire/GEMINI compliance for CLIPC

    Science.gov (United States)

    Mihajlovski, Andrej; Plieger, Maarten; Som de Cerff, Wim; Page, Christian

    2016-04-01

    indicators Key is the availability of standardized metadata, describing indicator data and services. This will enable standardization and interoperability between the different distributed services of CLIPC. To disseminate CLIPC indicator data, transformed data products to enable impacts assessments and climate change impact indicators a standardized meta-data infrastructure is provided. The challenge is that compliance of existing metadata to INSPIRE ISO standards and GEMINI standards needs to be extended to further allow the web portal to be generated from the available metadata blueprint. The information provided in the headers of netCDF files available through multiple catalogues, allow us to generate ISO compliant meta data which is in turn used to generate web based interface content, as well as OGC compliant web services such as WCS and WMS for front end and WPS interactions for the scientific users to combine and generate new datasets. The goal of the metadata infrastructure is to provide a blueprint for creating a data driven science portal, generated from the underlying: GIS data, web services and processing infrastructure. In the presentation we will present the results and lessons learned.

  13. How Neighborhood Disadvantage Reduces Birth Weight

    Directory of Open Access Journals (Sweden)

    Emily Moiduddin

    2008-06-01

    Full Text Available In this analysis we connect structural neighborhood conditions to birth outcomes through their intermediate effects on mothers’ perceptions of neighborhood danger and their tendency to abuse substances during pregnancy. We hypothesize that neighborhood poverty and racial/ethnic concentration combine to produce environments that mothers perceive as unsafe, thereby increasing the likelihood of negative coping behaviors (substance abuse. We expect these behaviors, in turn, to produce lower birth weights. Using data from the Fragile Families and Child Wellbeing Study, a survey of a cohort of children born between 1998 and 2000 and their mothers in large cities in the United States, we find little evidence to suggest that neighborhood circumstances have strong, direct effects on birth weight. Living in a neighborhood with more foreigners had a positive effect on birth weight. To the extent that neighborhood conditions influence birth weight, the effect mainly occurs through an association with perceived neighborhood danger and subsequent negative coping behaviors. Poverty and racial/ethnic concentration increase a mother’s sense that her neighborhood is unsafe. The perception of an unsafe neighborhood, in turn, associates with a greater likelihood of smoking cigarettes and using illegal drugs, and these behaviors have strong and significant effects in reducing birth weight. However, demographic characteristics, rather than perceived danger or substance abuse, mediate the influence of neighborhood characteristics on birth weight.

  14. Data-Driven Identification of Risk Factors of Patient Satisfaction at a Large Urban Academic Medical Center.

    Science.gov (United States)

    Li, Li; Lee, Nathan J; Glicksberg, Benjamin S; Radbill, Brian D; Dudley, Joel T

    2016-01-01

    The Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) survey is the first publicly reported nationwide survey to evaluate and compare hospitals. Increasing patient satisfaction is an important goal as it aims to achieve a more effective and efficient healthcare delivery system. In this study, we develop and apply an integrative, data-driven approach to identify clinical risk factors that associate with patient satisfaction outcomes. We included 1,771 unique adult patients who completed the HCAHPS survey and were discharged from the inpatient Medicine service from 2010 to 2012. We collected 266 clinical features including patient demographics, lab measurements, medications, disease categories, and procedures. We developed and applied a data-driven approach to identify risk factors that associate with patient satisfaction outcomes. We identify 102 significant risk factors associating with 18 surveyed questions. The most significantly recurrent clinical risk factors were: self-evaluation of health, education level, Asian, White, treatment in BMT oncology division, being prescribed a new medication. Patients who were prescribed pregabalin were less satisfied particularly in relation to communication with nurses and pain management. Explanation of medication usage was associated with communication with nurses (q = 0.001); however, explanation of medication side effects was associated with communication with doctors (q = 0.003). Overall hospital rating was associated with hospital environment, communication with doctors, and communication about medicines. However, patient likelihood to recommend hospital was associated with hospital environment, communication about medicines, pain management, and communication with nurse. Our study identified a number of putatively novel clinical risk factors for patient satisfaction that suggest new opportunities to better understand and manage patient satisfaction. Hospitals can use a data-driven approach to

  15. A data-driven approach to identify controls on global fire activity from satellite and climate observations (SOFIA V1)

    Directory of Open Access Journals (Sweden)

    M. Forkel

    2017-12-01

    Full Text Available Vegetation fires affect human infrastructures, ecosystems, global vegetation distribution, and atmospheric composition. However, the climatic, environmental, and socioeconomic factors that control global fire activity in vegetation are only poorly understood, and in various complexities and formulations are represented in global process-oriented vegetation-fire models. Data-driven model approaches such as machine learning algorithms have successfully been used to identify and better understand controlling factors for fire activity. However, such machine learning models cannot be easily adapted or even implemented within process-oriented global vegetation-fire models. To overcome this gap between machine learning-based approaches and process-oriented global fire models, we introduce a new flexible data-driven fire modelling approach here (Satellite Observations to predict FIre Activity, SOFIA approach version 1). SOFIA models can use several predictor variables and functional relationships to estimate burned area that can be easily adapted with more complex process-oriented vegetation-fire models. We created an ensemble of SOFIA models to test the importance of several predictor variables. SOFIA models result in the highest performance in predicting burned area if they account for a direct restriction of fire activity under wet conditions and if they include a land cover-dependent restriction or allowance of fire activity by vegetation density and biomass. The use of vegetation optical depth data from microwave satellite observations, a proxy for vegetation biomass and water content, reaches higher model performance than commonly used vegetation variables from optical sensors. We further analyse spatial patterns of the sensitivity between anthropogenic, climate, and vegetation predictor variables and burned area. We finally discuss how multiple observational datasets on climate, hydrological, vegetation, and socioeconomic variables together with

  16. Limited angle CT reconstruction by simultaneous spatial and Radon domain regularization based on TV and data-driven tight frame

    Science.gov (United States)

    Zhang, Wenkun; Zhang, Hanming; Wang, Linyuan; Cai, Ailong; Li, Lei; Yan, Bin

    2018-02-01

    Limited angle computed tomography (CT) reconstruction is widely performed in medical diagnosis and industrial testing because of the size of objects, engine/armor inspection requirements, and limited scan flexibility. Limited angle reconstruction necessitates the use of optimization-based methods that utilize additional sparse priors. However, most conventional methods solely exploit sparsity priors of the spatial domain. When the CT projection suffers from serious data deficiency or various noises, obtaining reconstruction images that meet the quality requirement becomes difficult and challenging. To solve this problem, this paper developed an adaptive reconstruction method for the limited angle CT problem. The proposed method simultaneously uses a spatial and Radon domain regularization model based on total variation (TV) and a data-driven tight frame. The data-driven tight frame, derived from the wavelet transformation, aims at exploiting sparsity priors of the sinogram in the Radon domain. Unlike existing works that utilize a pre-constructed sparse transformation, the framelets of the data-driven regularization model can be adaptively learned from the latest projection data in the process of iterative reconstruction to provide optimal sparse approximations for the given sinogram. At the same time, an effective alternating direction method is designed to solve the simultaneous spatial and Radon domain regularization model. The experiments for both simulation and real data demonstrate that the proposed algorithm shows better performance in artifact suppression and detail preservation than the algorithms solely using a regularization model of the spatial domain. Quantitative evaluations of the results also indicate that the proposed algorithm applying the learning strategy performs better than the dual-domain algorithms without a learned regularization model

  17. A data-driven approach to identify controls on global fire activity from satellite and climate observations (SOFIA V1)

    Science.gov (United States)

    Forkel, Matthias; Dorigo, Wouter; Lasslop, Gitta; Teubner, Irene; Chuvieco, Emilio; Thonicke, Kirsten

    2017-12-01

    Vegetation fires affect human infrastructures, ecosystems, global vegetation distribution, and atmospheric composition. However, the climatic, environmental, and socioeconomic factors that control global fire activity in vegetation are only poorly understood, and in various complexities and formulations are represented in global process-oriented vegetation-fire models. Data-driven model approaches such as machine learning algorithms have successfully been used to identify and better understand controlling factors for fire activity. However, such machine learning models cannot be easily adapted or even implemented within process-oriented global vegetation-fire models. To overcome this gap between machine learning-based approaches and process-oriented global fire models, we introduce a new flexible data-driven fire modelling approach here (Satellite Observations to predict FIre Activity, SOFIA approach version 1). SOFIA models can use several predictor variables and functional relationships to estimate burned area that can be easily adapted with more complex process-oriented vegetation-fire models. We created an ensemble of SOFIA models to test the importance of several predictor variables. SOFIA models result in the highest performance in predicting burned area if they account for a direct restriction of fire activity under wet conditions and if they include a land cover-dependent restriction or allowance of fire activity by vegetation density and biomass. The use of vegetation optical depth data from microwave satellite observations, a proxy for vegetation biomass and water content, reaches higher model performance than commonly used vegetation variables from optical sensors. We further analyse spatial patterns of the sensitivity between anthropogenic, climate, and vegetation predictor variables and burned area. We finally discuss how multiple observational datasets on climate, hydrological, vegetation, and socioeconomic variables together with data-driven

  18. CEREF: A hybrid data-driven model for forecasting annual streamflow from a socio-hydrological system

    Science.gov (United States)

    Zhang, Hongbo; Singh, Vijay P.; Wang, Bin; Yu, Yinghao

    2016-09-01

    Hydrological forecasting is complicated by flow regime alterations in a coupled socio-hydrologic system, encountering increasingly non-stationary, nonlinear and irregular changes, which make decision support difficult for future water resources management. Currently, many hybrid data-driven models, based on the decomposition-prediction-reconstruction principle, have been developed to improve the ability to make predictions of annual streamflow. However, there exist many problems that require further investigation, chief among which is that the direction of trend components decomposed from annual streamflow series is always difficult to ascertain. In this paper, a hybrid data-driven model was proposed to address this issue, which combined empirical mode decomposition (EMD), radial basis function neural networks (RBFNN), and an external forces (EF) variable, also called the CEREF model. The hybrid model employed EMD for decomposition and RBFNN for intrinsic mode function (IMF) forecasting, and determined future trend component directions by regression with EF as basin water demand representing the social component in the socio-hydrologic system. The Wuding River basin was considered for the case study, and two standard statistical measures, root mean squared error (RMSE) and mean absolute error (MAE), were used to evaluate the performance of the CEREF model and compare it with other models: the autoregressive (AR), RBFNN and EMD-RBFNN. Results indicated that the CEREF model had lower RMSE and MAE statistics, 42.8% and 7.6%, respectively, than did other models, and provided a superior alternative for forecasting annual runoff in the Wuding River basin. Moreover, the CEREF model can enlarge the effective intervals of streamflow forecasting compared to the EMD-RBFNN model by introducing the water demand planned by the government department to improve long-term prediction accuracy. In addition, we considered the high-frequency component, a frequent subject of concern in EMD
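
    The two evaluation measures named in this abstract, RMSE and MAE, have standard definitions that can be written down directly. The sketch below uses those textbook definitions; the sample numbers are made up for illustration and are not taken from the Wuding River study.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error: square root of the mean squared difference."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error: mean of the absolute differences."""
    absolute = [abs(o - p) for o, p in zip(observed, predicted)]
    return sum(absolute) / len(absolute)


# Hypothetical annual streamflow values, for illustration only.
observed = [10.0, 12.0, 14.0]
predicted = [11.0, 12.0, 12.0]
```

    RMSE penalizes large individual errors more heavily than MAE does, which is why the two are often reported together when comparing forecasting models.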

  19. Feature Extraction for Digging Operation of Excavator Based on Data-Driven Skill-Based PID Controller

    Directory of Open Access Journals (Sweden)

    Kazushige Koiwai

    2017-11-01

    Full Text Available Improved work efficiency is demanded in the construction field because the working population is aging and shrinking, so automation technologies are being applied to construction equipment such as bulldozers and excavators. However, not only automation technologies but also expert skills are necessary to improve work efficiency. In this paper, a human skill evaluation based on the data-driven skill-based PID controller is proposed. The proposed method is applied to the excavator digging operation. As a result, the difference between the novice operation and the skilled operation is extracted. Moreover, the numerical difference is clarified based on the result.

  20. Functional Interpretation of Neighborhood Public Spaces in Terms of Identity

    Directory of Open Access Journals (Sweden)

    Hamid Majedi

    2015-03-01

    Full Text Available The aim of this article is to evaluate the effect of neighborhood public space transformation, due to rapid urbanization in Tehran since the 1960s, on the formation of neighborhood identity. In order to find the role of public spaces in enhancing neighborhood identities, two middle class neighborhoods with different spatial organizations are compared with each other: Nazi Abad, a planned neighborhood, and Mehran, a typical unplanned neighborhood which developed through rapid urbanization. Next, the effect of neighborhood public spaces on neighborhood inhabitants is evaluated from two perspectives: the perceptual dimension and the social dimension. The findings indicate that planned spatial organization and various neighborhood public spaces result in stronger neighborhood identity. It enhances both the perceptual dimension of neighborhood identity (place attachment) and its social dimension (sense of community). In contrast, unplanned spatial organization, which is the typical feature of Tehran neighborhoods, leads to weak neighborhood identity.

  1. Using data-driven approach for wind power prediction: A comparative study

    International Nuclear Information System (INIS)

    Taslimi Renani, Ehsan; Elias, Mohamad Fathi Mohamad; Rahim, Nasrudin Abd.

    2016-01-01

    Highlights: • Double exponential smoothing is the most accurate model in wind speed prediction. • A two-stage feature selection method is proposed to select the most important inputs. • Direct prediction shows better accuracy than indirect prediction. • Adaptive neuro fuzzy inference system outperforms data mining algorithms. • Random forest performs the worst compared to the other data mining algorithms. - Abstract: Although wind energy is intermittent and stochastic in nature, it is increasingly important in power generation due to its sustainability and pollution-free nature. Increased utilization of wind energy sources calls for more robust and efficient prediction models to mitigate uncertainties associated with wind power. This research compares two different approaches in wind power forecasting, namely indirect and direct prediction methods. In the indirect method, several time series models are applied to forecast the wind speed, and the logistic function with five parameters is then used to forecast the wind power. In this study, a backtracking search algorithm with novel crossover and mutation operators is employed to find the best parameters of the five-parameter logistic function. A new feature selection technique combining mutual information and a neural network is proposed in this paper to extract the most informative features with maximum relevancy and minimum redundancy. From the comparative study, the results demonstrate that, in the direct prediction approach where the historical weather data are used to predict the wind power generation directly, the adaptive neuro fuzzy inference system outperforms five data mining algorithms, namely random forest, M5Rules, k-nearest neighbor, support vector machine and multilayer perceptron. Moreover, it is also found that the mean absolute percentage error of the direct prediction method using adaptive neuro fuzzy inference system is 1.47% which is approximately less than half of the error obtained with the
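
    The record above names double exponential smoothing and the mean absolute percentage error (MAPE) without defining them. The following is a minimal sketch of both, assuming the Holt (level plus trend) form of double exponential smoothing; the function names, the smoothing constants and the initialization are illustrative assumptions and are not taken from the paper.

```python
def holt_forecast(series, alpha=0.5, beta=0.5, horizon=1):
    """Double (Holt) exponential smoothing forecast, `horizon` steps ahead.

    Assumed initialization: level = first value, trend = first difference.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = series[0]
    trend = series[1] - series[0]
    for value in series[1:]:
        previous_level = level
        # Update the smoothed level, then the smoothed trend.
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return level + horizon * trend


def mape(actual, forecast):
    """Mean absolute percentage error in percent (actual values must be non-zero)."""
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)
```

    On a perfectly linear series the Holt recursion reproduces the trend exactly, which makes a convenient sanity check before applying it to real wind speed data.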

  2. Data-driven battery product development: Turn battery performance into a competitive advantage.

    Energy Technology Data Exchange (ETDEWEB)

    Sholklapper, Tal [Voltaiq, Inc.

    2016-04-19

    Poor battery performance is a primary source of user dissatisfaction across a broad range of applications, and is a key bottleneck hindering the growth of mobile technology, wearables, electric vehicles, and grid energy storage. Engineering battery systems is difficult, requiring extensive testing for vendor selection, BMS programming, and application-specific lifetime testing. This work also generates huge quantities of data. This presentation will explain how to leverage this data to help ship quality products faster using fewer resources while ensuring safety and reliability in the field, ultimately turning battery performance into a competitive advantage.

  3. Neighborhood and Network Disadvantage among Urban Renters

    Directory of Open Access Journals (Sweden)

    Matthew Desmond

    2015-06-01

    Drawing on novel survey data, this study maps the distribution of neighborhood and network disadvantage in a population of Milwaukee renters and evaluates the relationship between each disadvantage and multiple social and health outcomes. We find that many families live in neighborhoods with above average disadvantage but are embedded in networks with below average disadvantage, and vice versa. Neighborhood (but not network) disadvantage is associated with lower levels of neighborly trust but also with higher levels of community support (e.g., providing neighbors with food). Network (but not neighborhood) disadvantage is associated with lower levels of civic engagement. Asthma and diabetes are associated exclusively with neighborhood disadvantage, but depression is associated exclusively with network disadvantage. These findings imply that some social problems may be better addressed by neighborhood interventions and others by network interventions.

  4. Neighborhood crime and transit station access mode choice - phase III of neighborhood crime and travel behavior.

    Science.gov (United States)

    2015-08-01

    This report provides the findings from the third phase of a three-part study about the influences of neighborhood crimes on travel mode choice. While previous phases found evidence that high levels of neighborhood crime discourage people from choos...

  5. Data-driven modeling and predictive control for boiler-turbine unit using fuzzy clustering and subspace methods.

    Science.gov (United States)

    Wu, Xiao; Shen, Jiong; Li, Yiguo; Lee, Kwang Y

    2014-05-01

    This paper develops a novel data-driven fuzzy modeling strategy and predictive controller for a boiler-turbine unit using fuzzy clustering and subspace identification (SID) methods. To deal with the nonlinear behavior of the boiler-turbine unit, fuzzy clustering is used to provide an appropriate division of the operation region and develop the structure of the fuzzy model. Then, by combining the input data with the corresponding fuzzy membership functions, the SID method is extended to extract the local state-space model parameters. Owing to the advantages of both methods, the resulting fuzzy model can represent the boiler-turbine unit very closely, and a fuzzy model predictive controller is designed based on this model. As an alternative approach, a direct data-driven fuzzy predictive control is also developed following the same clustering and subspace methods, where intermediate subspace matrices developed during the identification procedure are utilized directly as the predictor. Simulation results show the advantages and effectiveness of the proposed approach. Copyright © 2014 ISA. Published by Elsevier Ltd. All rights reserved.

  6. EEG-based functional networks evoked by acupuncture at ST 36: A data-driven thresholding study

    Science.gov (United States)

    Li, Huiyan; Wang, Jiang; Yi, Guosheng; Deng, Bin; Zhou, Hexi

    2017-10-01

    This paper investigates how acupuncture at ST 36 modulates the brain functional network. Twenty-channel EEG signals from 15 healthy subjects are recorded before, during and after acupuncture. The correlation between two EEG channels is calculated by using Pearson’s coefficient. A data-driven approach is applied to determine the threshold, which is performed by considering the connected set, connected edge and network connectivity. Based on this thresholding approach, the functional network in each acupuncture period is built with graph theory, and the associated functional connectivity is determined. We show that acupuncture at ST 36 increases the connectivity of the EEG-based functional network, especially for long-distance connections between the two hemispheres. The properties of the functional network in five EEG sub-bands are also characterized. It is found that the delta and gamma bands are affected more obviously by acupuncture than the other sub-bands. These findings highlight the modulatory effects of acupuncture on the EEG-based functional connectivity, which helps us to understand how it participates in cortical or subcortical activities. Further, the data-driven threshold provides an alternative approach to infer the functional connectivity under other physiological conditions.
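
    The pipeline described above (pairwise Pearson correlation, a data-driven threshold, then a graph) can be sketched as follows. The abstract does not give the exact thresholding rule, so this sketch assumes one simple criterion: keep the highest threshold at which the network is still connected. The function names and that criterion are assumptions for illustration, not the authors' procedure.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length signals (non-zero variance assumed)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def is_connected(n_nodes, edges):
    """Depth-first search from node 0; True if every node is reachable."""
    neighbors = {i: set() for i in range(n_nodes)}
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    seen, stack = {0}, [0]
    while stack:
        for nb in neighbors[stack.pop()]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return len(seen) == n_nodes


def connectivity_threshold(channels):
    """Return (threshold, edges) for the highest threshold keeping the graph connected."""
    n = len(channels)
    weights = {(i, j): abs(pearson(channels[i], channels[j]))
               for i, j in combinations(range(n), 2)}
    for threshold in sorted(set(weights.values()), reverse=True):
        edges = [pair for pair, w in weights.items() if w >= threshold]
        if is_connected(n, edges):
            return threshold, edges
    return 0.0, list(weights)
```

    Lowering the threshold step by step through the observed correlation values guarantees that the first connected graph found is also the sparsest one under this criterion.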

  7. Data-Driven Diffusion Of Innovations: Successes And Challenges In 3 Large-Scale Innovative Delivery Models.

    Science.gov (United States)

    Dorr, David A; Cohen, Deborah J; Adler-Milstein, Julia

    2018-02-01

    Failed diffusion of innovations may be linked to an inability to use and apply data, information, and knowledge to change perceptions of current practice and motivate change. Using qualitative and quantitative data from three large-scale health care delivery innovations (accountable care organizations, advanced primary care practice, and EvidenceNOW), we assessed where data-driven innovation is occurring and where challenges lie. We found that implementation of some technological components of innovation (for example, electronic health records) has occurred among health care organizations, but core functions needed to use data to drive innovation are lacking. Deficits include the inability to extract and aggregate data from the records; gaps in sharing data; and challenges in adopting advanced data functions, particularly those related to timely reporting of performance data. The unexpectedly high costs and burden incurred during implementation of the innovations have limited organizations' ability to address these and other deficits. Solutions that could help speed progress in data-driven innovation include facilitating peer-to-peer technical assistance, providing tailored feedback reports to providers from data aggregators, and using practice facilitators skilled in using data technology for quality improvement to help practices transform. Policy efforts that promote these solutions may enable more rapid uptake of and successful participation in innovative delivery system reforms.

  8. Protein engineering of Bacillus acidopullulyticus pullulanase for enhanced thermostability using in silico data driven rational design methods.

    Science.gov (United States)

    Chen, Ana; Li, Yamei; Nie, Jianqi; McNeil, Brian; Jeffrey, Laura; Yang, Yankun; Bai, Zhonghu

    2015-10-01

    Thermostability has been considered as a requirement in the starch processing industry to maintain high catalytic activity of pullulanase under high temperatures. Four data-driven rational design methods (B-FITTER, proline theory, PoPMuSiC-2.1, and sequence consensus approach) were adopted to identify key residues potentially linked with thermostability, and 39 residues of Bacillus acidopullulyticus pullulanase were chosen as mutagenesis targets. Single mutagenesis followed by combined mutagenesis resulted in the best mutant E518I-S662R-Q706P, which exhibited an 11-fold half-life improvement at 60 °C and a 9.5 °C increase in Tm. The optimum temperature of the mutant increased from 60 to 65 °C. Fluorescence spectroscopy results demonstrated that the tertiary structure of the mutant enzyme was more compact than that of the wild-type (WT) enzyme. Structural change analysis revealed that the increase in thermostability was most probably caused by a combination of lower stability free-energy and higher hydrophobicity of E518I, more hydrogen bonds of S662R, and higher rigidity of Q706P compared with the WT. The findings demonstrated the effectiveness of combined data-driven rational design approaches in engineering an industrial enzyme to improve thermostability. Copyright © 2015 Elsevier Inc. All rights reserved.

  9. The Orion GN and C Data-Driven Flight Software Architecture for Automated Sequencing and Fault Recovery

    Science.gov (United States)

    King, Ellis; Hart, Jeremy; Odegard, Ryan

    2010-01-01

    The Orion Crew Exploration Vehicle (CEV) is being designed to include significantly more automation capability than either the Space Shuttle or the International Space Station (ISS). In particular, the vehicle flight software has requirements to accommodate increasingly automated missions throughout all phases of flight. A data-driven flight software architecture will provide an evolvable automation capability to sequence through Guidance, Navigation & Control (GN&C) flight software modes and configurations while maintaining the required flexibility and human control over the automation. This flexibility is a key aspect needed to address the maturation of operational concepts, to permit ground and crew operators to gain trust in the system and mitigate unpredictability in human spaceflight. To allow for mission flexibility and reconfigurability, a data-driven approach is being taken to load the mission event plan as well as the flight software artifacts associated with the GN&C subsystem. A database of GN&C level sequencing data is presented which manages and tracks the mission specific and algorithm parameters to provide a capability to schedule GN&C events within mission segments. The flight software data schema for performing automated mission sequencing is presented with a concept of operations for interactions with ground and onboard crew members. A prototype architecture for fault identification, isolation and recovery interactions with the automation software is presented and discussed as a forward work item.

  10. A predictive estimation method for carbon dioxide transport by data-driven modeling with a physically-based data model

    Science.gov (United States)

    Jeong, Jina; Park, Eungyu; Han, Weon Shik; Kim, Kue-Young; Jun, Seong-Chun; Choung, Sungwook; Yun, Seong-Taek; Oh, Junho; Kim, Hyun-Jun

    2017-11-01

    In this study, a data-driven method for predicting CO2 leaks and associated concentrations from geological CO2 sequestration is developed. Several candidate models are compared based on their reproducibility and predictive capability for CO2 concentration measurements from the Environment Impact Evaluation Test (EIT) site in Korea. Based on the data mining results, a one-dimensional solution of the advective-dispersive equation for steady flow (i.e., Ogata-Banks solution) is found to be most representative for the test data, and this model is adopted as the data model for the developed method. In the validation step, the method is applied to estimate future CO2 concentrations with the reference estimation by the Ogata-Banks solution, where a part of earlier data is used as the training dataset. From the analysis, it is found that the ensemble mean of multiple estimations based on the developed method shows high prediction accuracy relative to the reference estimation. In addition, the majority of the data to be predicted are included in the proposed quantile interval, which suggests adequate representation of the uncertainty by the developed method. Therefore, the incorporation of a reasonable physically-based data model enhances the prediction capability of the data-driven model. The proposed method is not confined to estimations of CO2 concentration and may be applied to various real-time monitoring data from subsurface sites to develop automated control, management or decision-making systems.
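
    For readers unfamiliar with the Ogata-Banks solution named above, the following sketch evaluates its standard textbook form, C(x,t) = (C0/2)[erfc((x - vt)/(2√(Dt))) + exp(vx/D) erfc((x + vt)/(2√(Dt)))]. It assumes a semi-infinite domain that is initially solute-free and held at constant concentration C0 at the inlet x = 0; the symbols v (velocity) and D (dispersion coefficient) and the function name are choices made here, and the abstract does not state how the authors parameterized the model.

```python
from math import erfc, exp, sqrt


def ogata_banks(x, t, velocity, dispersion, c0=1.0):
    """Concentration at distance x and time t for steady 1-D flow.

    Assumes zero initial concentration and a constant inlet concentration c0.
    Note: exp(velocity * x / dispersion) can overflow for very large Peclet numbers.
    """
    if t <= 0:
        return c0 if x == 0 else 0.0
    spread = 2.0 * sqrt(dispersion * t)
    advective = erfc((x - velocity * t) / spread)
    correction = exp(velocity * x / dispersion) * erfc((x + velocity * t) / spread)
    return 0.5 * c0 * (advective + correction)
```

    At the inlet the two terms sum to 2, so the function returns c0, and at the advective front x = vt the value lies slightly above c0/2 because of the second term.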

  11. Analyzing the Discourse of Chais Conferences for the Study of Innovation and Learning Technologies via a Data-Driven Approach

    Directory of Open Access Journals (Sweden)

    Vered Silber-Varod

    2016-12-01

    The current rapid technological changes confront researchers of learning technologies with the challenge of evaluating them, predicting trends, and improving their adoption and diffusion. This study utilizes a data-driven discourse analysis approach, namely culturomics, to investigate changes over time in the research of learning technologies. The patterns and changes were examined on a corpus of articles published over the past decade (2006-2014) in the proceedings of Chais Conference for the Study of Innovation and Learning Technologies – the leading research conference on learning technologies in Israel. The interesting findings of the exhaustive process of analyzing all the words in the corpus were that the most commonly used terms (e.g., pupil, teacher, student) and the most commonly used phrases (e.g., face-to-face) in the field of learning technologies reflect a pedagogical rather than a technological aspect of learning technologies. The study also demonstrates two cases of change over time in prominent themes, such as “Facebook” and “the National Information and Communication Technology (ICT) program”. Methodologically, this research demonstrates the effectiveness of a data-driven approach for identifying discourse trends over time.

  12. A data-driven model for influenza transmission incorporating media effects.

    Science.gov (United States)

    Mitchell, Lewis; Ross, Joshua V

    2016-10-01

    Numerous studies have attempted to model the effect of mass media on the transmission of diseases such as influenza; however, quantitative data on media engagement has until recently been difficult to obtain. With the recent explosion of 'big data' coming from online social media and the like, large volumes of data on a population's engagement with mass media during an epidemic are becoming available to researchers. In this study, we combine an online dataset comprising millions of shared messages relating to influenza with traditional surveillance data on flu activity to suggest a functional form for the relationship between the two. Using this data, we present a simple deterministic model for influenza dynamics incorporating media effects, and show that such a model helps explain the dynamics of historical influenza outbreaks. Furthermore, through model selection we show that the proposed media function fits historical data better than other media functions proposed in earlier studies.

  13. A Data-Driven Approach to Responder Subgroup Identification after Paired Continuous Theta Burst Stimulation

    Directory of Open Access Journals (Sweden)

    Tonio Heidegger

    2017-08-01

    Background: Modulation of cortical excitability by transcranial magnetic stimulation (TMS) is used for investigating human brain functions. A common observation is the high variability of long-term depression (LTD)-like changes in human (motor) cortex excitability. This study aimed at analyzing the response subgroup distribution after paired continuous theta burst stimulation (cTBS) as a basis for subject selection. Methods: The effects of paired cTBS using 80% active motor threshold (AMT) in 31 healthy volunteers were assessed at the primary motor cortex (M1) corresponding to the representation of the first dorsal interosseous (FDI) muscle of the left hand, before and up to 50 min after plasticity induction. The changes in motor evoked potentials (MEPs) were analyzed using machine-learning derived methods implemented as Gaussian mixture modeling (GMM) and computed ABC analysis. Results: The probability density distribution of the MEP changes from baseline was tri-modal, showing a clear separation at 80.9%. Subjects displaying at least this degree of LTD-like changes were n = 6 responders. By contrast, n = 7 subjects displayed a paradox response with increase in MEP. Reassessment using ABC analysis as alternative approach led to the same n = 6 subjects as a distinct category. Conclusion: Depressive effects of paired cTBS using 80% AMT endure at least 50 min, however, only in a small subgroup of healthy subjects. Hence, plasticity induction by paired cTBS might not reflect a general mechanism in human motor cortex excitability. A mathematically supported criterion is proposed to select responders for enrolment in assessments of human brain functional networks using virtual brain lesions.

  14. The Influence of Neighborhood Aesthetics, Safety, and Social Cohesion on Perceived Stress in Disadvantaged Communities.

    Science.gov (United States)

    Henderson, Heather; Child, Stephanie; Moore, Spencer; Moore, Justin B; Kaczynski, Andrew T

    2016-09-01

    Limited research has explored how specific elements of physical and social environments influence mental health indicators such as perceived stress, or whether such associations are moderated by gender. This study examined the relationship between selected neighborhood characteristics and perceived stress levels within a primarily low-income, older, African-American population in a mid-sized city in the Southeastern U.S. Residents (n = 394; mean age=55.3 years, 70.9% female, 89.3% African American) from eight historically disadvantaged neighborhoods completed surveys measuring perceptions of neighborhood safety, social cohesion, aesthetics, and stress. Multivariate linear regression models examined the association between each of the three neighborhood characteristics and perceived stress. Greater perceived safety, improved neighborhood aesthetics, and social cohesion were significantly associated with lower perceived stress. These associations were not moderated by gender. These findings suggest that improving social attributes of neighborhoods may have positive impacts on stress and related benefits for population health. Future research should examine how neighborhood characteristics influence stress over time. © Society for Community Research and Action 2016.

  15. Neighborhood-Specific and General Social Support: Which Buffers the Effect of Neighborhood Disorder on Depression?

    Science.gov (United States)

    Kim, Joongbaeck; Ross, Catherine E.

    2009-01-01

    Is neighborhood-specific social support the most effective type of social support for buffering the effect of neighborhood disorder on depression? Matching theory suggests that it is. The authors extend the research on neighborhood disorder and adult depression by showing that individuals who have higher levels of both general and…

  16. Connecting Schools to Neighborhood Revitalization: The Case of the Maple Heights Neighborhood Association

    Science.gov (United States)

    Pesch, Lawrence P.

    2014-01-01

    This case study focuses on the way a neighborhood association connects schools to broad change in an urban neighborhood of a large Midwestern city. The first section provides a review of the literature on community involvement in school and neighborhood reform. It reviews the historical origins of the current school-community relationship, the…

  17. Neighborhood context and health: How neighborhood social capital affects individual health

    NARCIS (Netherlands)

    Mohnen, S.M.

    2012-01-01

    Does it matter for my health in which neighborhood I live? The fact is, health is determined not only by individual characteristics but also by the neighborhood in which someone lives. This thesis shows that health clusters in Dutch neighborhoods and that this is not only a composition effect (that

  18. A Data-Driven Analysis of the Rules Defining Bilateral Leg Movements during Sleep.

    Science.gov (United States)

    Ferri, Raffaele; Manconi, Mauro; Rundo, Francesco; Zucconi, Marco; Aricò, Debora; Bruni, Oliviero; Ferini-Strambi, Luigi; Fulda, Stephany

    2016-02-01

    The aim of this study was to describe and analyze the association between bilateral leg movements (LMs) during sleep in subjects with restless legs syndrome (RLS), in order to eventually support or challenge the current scoring rules defining bilateral LMs. Polysomnographic recordings of 100 untreated patients with RLS (57 women and 43 males, mean age 57 y) were included. In each recording, we selected as reference all LMs that occurred during sleep and that were separated from another ipsilateral LM by at least 10 sec of EMG inactivity. For each reference LM and an evaluation interval from 5 sec before the onset to 5 sec after the offset of the reference LM, we evaluated (1) the presence or absence of contralateral leg movement activity and (2) the distribution of the onset-to-onset and (3) the offset-to-onset differences between bilateral LMs. We selected a mean of 368 (± 222 standard deviation [SD]) reference LMs per subject. For 42% (± 22%) of the reference LMs no contralateral leg movement activity was observed within the evaluation interval. In 55% (± 22%) exactly one and in 3% (± 2%) more than one contralateral LM was observed. A further evaluation of events where exactly one contralateral LM was observed showed that in most (1) the two LMs were overlapping (93% ± 9% SD) and (2) were classified as bilateral according to the World Association of Sleep Medicine and the International Restless Legs Syndrome Study Group (WASM/ IRLSSG) (96% ± 6% SD) and (3) the American Academy of Sleep Medicine scoring rules (99% ± 2% SD). Although there was a systematic and statistically significant difference in standard LM indices during sleep based on the two different definitions of bilateral LMs, the size of the difference was not clinically meaningful (maximum individual, absolute difference in LM indices ± 2.5). In addition, we found that the duration of LMs within bilateral LM pairs was longer compared to monolateral LMs and that the duration of the single LMs in

  19. Individual and Neighborhood Stressors, Air Pollution and Cardiovascular Disease

    Science.gov (United States)

    Hazlehurst, Marnie F.; Nurius, Paula S.; Hajat, Anjum

    2018-01-01

    additive scales. Modest interaction was observed between NDI and air pollution, supporting prior literature on the importance of neighborhood-level stressors in cardiovascular health and reinforcing the importance of NDI on air pollution health effects. ACEs may exert health effects through selection into disadvantaged neighborhoods and more work is needed to understand the accumulation of risk in multiple domains across the life course. PMID:29518012

  20. Individual and Neighborhood Stressors, Air Pollution and Cardiovascular Disease.

    Science.gov (United States)

    Hazlehurst, Marnie F; Nurius, Paula S; Hajat, Anjum

    2018-03-08

    multiplicative and additive scales. Modest interaction was observed between NDI and air pollution, supporting prior literature on the importance of neighborhood-level stressors in cardiovascular health and reinforcing the importance of NDI on air pollution health effects. ACEs may exert health effects through selection into disadvantaged neighborhoods and more work is needed to understand the accumulation of risk in multiple domains across the life course.

  1. Data-driven weights and restrictions in the construction of composite indicators

    Directory of Open Access Journals (Sweden)

    Ana Perišić

    2015-03-01

    Composite indicators are increasingly recognized as a useful tool in policy analysis and public communication. However, if poorly constructed, they can send misleading policy messages. Perhaps the most difficult aspect of constructing a composite indicator is choosing weights for the components. The categorization of Croatian territorial units for development policy is based on the value of the composite indicator called the development index. The main goal of this paper is to propose an empirical approach for weight selection. In order to generate the set of non-subjective weights, principal component analysis and linear programming methods have been applied. An application of data envelopment analysis to the field of composite indicators, known as the Benefit-of-the-Doubt approach, has been demonstrated subject to proportional sub-indicator share restrictions. Additionally, the Monte Carlo simulation of weights was conducted, and confidence intervals for the values of the development index were estimated. Owing to the fact that the examined weighting schemes have resulted in the different categorization of territorial units, use of unit-specific weights and incorporating uncertainty in the construction of a composite indicator looks promising for further work.
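
    The Monte Carlo simulation of weights mentioned above can be illustrated with a small sketch. It assumes a linear composite (weighted sum of already normalized sub-indicators), weights drawn as normalized uniform random numbers, and a percentile interval over the simulated scores; none of these choices is stated in the abstract, and the function names are hypothetical.

```python
import random


def random_weights(n_components, rng):
    """Draw non-negative weights that sum to 1 (normalized uniform draws)."""
    raw = [rng.random() for _ in range(n_components)]
    total = sum(raw)
    return [value / total for value in raw]


def composite_interval(values, n_draws=2000, level=0.95, seed=0):
    """Percentile interval of the weighted-sum composite under random weights.

    `values` holds one unit's normalized sub-indicator scores.
    """
    rng = random.Random(seed)
    scores = sorted(
        sum(w * v for w, v in zip(random_weights(len(values), rng), values))
        for _ in range(n_draws)
    )
    lower = scores[int((1 - level) / 2 * (n_draws - 1))]
    upper = scores[int((1 + level) / 2 * (n_draws - 1))]
    return lower, upper
```

    A unit whose sub-indicators are all equal gets a degenerate interval, since its composite score does not depend on the weights; wide intervals flag units whose ranking is sensitive to the weighting scheme.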

  2. A data-driven wavelet-based approach for generating jumping loads

    Science.gov (United States)

    Chen, Jun; Li, Guo; Racic, Vitomir

    2018-06-01

    This paper suggests an approach to generate human jumping loads using wavelet transform and a database of individual jumping force records. A total of 970 individual jumping force records of various frequencies were first collected by three experiments from 147 test subjects. For each record, every jumping pulse was extracted and decomposed into seven levels by wavelet transform. All the decomposition coefficients were stored in an information database. Probability distributions of jumping cycle period, contact ratio and energy of the jumping pulse were statistically analyzed. Inspired by the theory of DNA recombination, an approach was developed by interchanging the wavelet coefficients between different jumping pulses. To generate a jumping force time history with N pulses, wavelet coefficients were first selected randomly from the database at each level. They were then used to reconstruct N pulses by the inverse wavelet transform. Jumping cycle periods and contact ratios were then generated randomly based on their probabilistic functions. These parameters were assigned to each of the N pulses, which were in turn scaled by the amplitude factors βi to account for the energy relationship between successive pulses. The final jumping force time history was obtained by linking all the N cycles end to end. This simulation approach can preserve the non-stationary features of the jumping load force in the time-frequency domain. Application indicates that this approach can be used to generate a jumping force time history due to a single person jumping and can also be extended further to stochastic jumping loads due to groups and crowds.
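
    The recombination step described above (decompose each pulse, pick coefficients level by level from random pulses, invert the transform) can be sketched with a hand-written Haar transform. The abstract does not name the wavelet family, so Haar is used here purely as a stand-in; the function names are hypothetical, and the period, contact ratio and amplitude scaling steps are omitted.

```python
import random
from math import sqrt


def haar_decompose(signal, levels):
    """Return (approximation, details); details[0] is the finest level.

    The signal length must be divisible by 2 ** levels.
    """
    approx, details = list(signal), []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / sqrt(2) for a, b in pairs])
        approx = [(a + b) / sqrt(2) for a, b in pairs]
    return approx, details


def haar_reconstruct(approx, details):
    """Invert haar_decompose, working from the coarsest level to the finest."""
    signal = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for s, d in zip(signal, detail):
            rebuilt.extend([(s + d) / sqrt(2), (s - d) / sqrt(2)])
        signal = rebuilt
    return signal


def recombine(database, rng):
    """Build a new pulse by drawing each level's coefficients from a random pulse.

    `database` is a list of (approximation, details) tuples of identical shape.
    """
    levels = len(database[0][1])
    approx = rng.choice(database)[0]
    details = [rng.choice(database)[1][level] for level in range(levels)]
    return haar_reconstruct(approx, details)
```

    Because every level is drawn independently, a database of M pulses with L detail levels can yield up to M to the power (L + 1) distinct synthetic pulses.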

  3. Does substance use moderate the association of neighborhood disadvantage with perceived stress and safety in the activity spaces of urban youth?

    Science.gov (United States)

    Mennis, Jeremy; Mason, Michael; Light, John; Rusby, Julie; Westling, Erika; Way, Thomas; Zahakaris, Nikola; Flay, Brian

    2016-08-01

    This study investigates the association of activity space-based exposure to neighborhood disadvantage with momentary perceived stress and safety, and the moderation of substance use on those associations, among a sample of 139 urban, primarily African American, adolescents. Geospatial technologies are integrated with Ecological Momentary Assessment (EMA) to capture exposure to neighborhood disadvantage and perceived stress and safety in the activity space. A relative neighborhood disadvantage measure for each subject is calculated by conditioning the neighborhood disadvantage observed at the EMA location on that of the home neighborhood. Generalized estimating equations (GEE) are used to model the effect of relative neighborhood disadvantage on momentary perceived stress and safety, and the extent to which substance use moderates those associations. Relative neighborhood disadvantage is significantly associated with higher perceived stress, lower perceived safety, and greater substance use involvement. The association of relative neighborhood disadvantage with stress is significantly stronger among those with greater substance use involvement. This research highlights the value of integrating geospatial technologies with EMA and developing personalized measures of environmental exposure for investigating neighborhood effects on substance use, and suggests substance use intervention strategies aimed at neighborhood conditions. Future research should seek to disentangle the causal pathways of influence and selection that relate neighborhood environment, stress, and substance use, while also accounting for the role of gender and family and peer social contexts. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  4. A better START for low-acuity victims: data-driven refinement of mass casualty triage.

    Science.gov (United States)

    Cross, Keith P; Petry, Michael J; Cicero, Mark X

    2015-01-01

    Methods currently used to triage patients from mass casualty events have a sparse evidence basis. The objective of this project was to assess gaps of the widely used Simple Triage and Rapid Transport (START) algorithm using a large database when it is used to triage low-acuity patients. Subsequently, we developed and tested evidenced-based improvements to START. Using the National Trauma Database (NTDB), a large set of trauma victims were assigned START triage levels, which were then compared to recorded patient mortality outcomes using area under the receiver-operator curve (AUC). Subjects assigned to the "Minor/Green" level who nevertheless died prior to hospital discharge were considered mistriaged. Recursive partitioning identified factors associated with of these mistriaged patients. These factors were then used to develop candidate START models of improved triage, whose overall performance was then re-evaluated using data from the NTDB. This process of evaluating performance, identifying errors, and further adjusting candidate models was repeated iteratively. The study included 322,162 subjects assigned to "Minor/Green" of which 2,046 died before hospital discharge. Age was the primary predictor of under-triage by START. Candidate models which re-assigned patients from the "Minor/Green" triage level to the "Delayed/Yellow" triage level based on age (either for patients >60 or >75), reduced mortality in the "Minor/Green" group from 0.6% to 0.1% and 0.3%, respectively. These candidate START models also showed net improvement in the AUC for predicting mortality overall and in select subgroups. In this research model using trauma registry data, most START under-triage errors occurred in elderly patients. Overall START accuracy was improved by placing elderly but otherwise minimally injured-mass casualty victims into a higher risk triage level. 
Alternatively, such patients would be candidates for closer monitoring at the scene or expedited transport ahead of other
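
    The evaluation loop in this abstract (score triage levels against mortality with AUC, then re-assign older "Minor/Green" patients and re-score) can be illustrated with a minimal sketch. The records below are hypothetical toy data, not NTDB values, and the numeric level coding, the function names, and the pairwise AUC formula are assumptions made for illustration only.

```python
from itertools import product

def auc(scores, outcomes):
    """AUC as the chance that a random death has a higher score than a random
    survivor, with ties counted as one half (the rank-based formulation)."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))

# Hypothetical toy records: (triage level, age, died before discharge).
# Assumed coding: 0 = Minor/Green, 1 = Delayed/Yellow, 2 = Immediate/Red.
records = [
    (0, 25, 0), (0, 34, 0), (0, 81, 1), (0, 67, 0),
    (1, 45, 0), (1, 52, 1), (2, 30, 1), (2, 77, 1),
]

def retriage(level, age, cutoff):
    """Move Minor/Green patients older than `cutoff` up to Delayed/Yellow."""
    return 1 if level == 0 and age > cutoff else level

died = [d for _, _, d in records]
baseline = auc([lvl for lvl, _, _ in records], died)
adjusted = auc([retriage(lvl, age, 75) for lvl, age, _ in records], died)
print(baseline, adjusted)
```

    On this toy set the age-adjusted rule scores higher than the baseline only because one elderly death was placed in the Minor/Green group by construction; the sketch shows the mechanics of the comparison, not the study's result.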

  5. Data-driven methodology illustrating mechanisms underlying word list recall: applications to clinical research.

    Science.gov (United States)

    Longenecker, Julia; Kohn, Philip; Liu, Stanley; Zoltick, Brad; Weinberger, Daniel R; Elvevåg, Brita

    2010-09-01

    Word list learning tasks such as the California Verbal Learning Test (CVLT; Delis, Kramer, Kaplan, & Ober, 1987) are widely used to investigate recall strategies. Participants who recall the most words generally employ semantic techniques, whereas those with poor recall (e.g., patients with schizophrenia) rely on serial techniques. However, these conclusions are based on formulas that assume that categories reflect semantic associations, bind strategy to overall performance, and neglect strategy changes over 5 trials. Therefore, we derived novel measures, independent of recall performance, to compute strategies across trials and identify whether diagnosis predicts recall strategy. Participants were included on the basis of performance on the CVLT (i.e., total words recalled over 5 trials). The 50 highest and 50 lowest performers among healthy volunteers (n = 100) and patients with schizophrenia (n = 100) were selected. Novel measures of recall and transition probability were calculated and analyzed by permutation tests. Recall patterns and strategies of patients resembled those of controls with similar performance levels: Regardless of diagnosis, low performers were more likely to recall the first 2 and last 4 items from the list; high performers increased engagement of semantically based transitions across the 5 trials, whereas low performers did not. Cognitive strategy must be considered independent of overall performance before attributing poor performance to degraded learning processes. Our results demonstrate the importance of departing from global scoring techniques, especially when working with clinical populations such as patients with schizophrenia for whom episodic memory deficits are a hallmark feature. Copyright 2010 APA, all rights reserved.

  6. Low-back electromyography (EMG) data-driven load classification for dynamic lifting tasks.

    Directory of Open Access Journals (Sweden)

    Deema Totah

    Numerous devices have been designed to support the back during lifting tasks. To improve the utility of such devices, this research explores the use of preparatory muscle activity to classify muscle loading and initiate appropriate device activation. The goal of this study was to determine the earliest time window that enabled accurate load classification during a dynamic lifting task. Nine subjects performed thirty symmetrical lifts, split evenly across three weight conditions (no-weight, 10-lbs and 24-lbs), while low-back muscle activity data was collected. Seven descriptive statistics features were extracted from 100 ms windows of data. A multinomial logistic regression (MLR) classifier was trained and tested, employing leave-one-subject-out cross-validation, to classify lifted load values. Dimensionality reduction was achieved through feature cross-correlation analysis and greedy feedforward selection. The time of full load support by the subject was defined as load-onset. Regions of highest average classification accuracy spanned from 200 ms before to 200 ms after load-onset, with average accuracies ranging from 80% (±10%) to 81% (±7%). The average recall for each class ranged from 69-92%. These inter-subject classification results indicate that preparatory muscle activity can be leveraged to identify the intent to lift a weight up to 100 ms prior to load-onset. The high accuracies shown indicate the potential to utilize intent classification for assistive device applications. Active assistive devices, e.g. exoskeletons, could prevent back injury by off-loading low-back muscles. Early intent classification allows more time for actuators to respond and integrate seamlessly with the user.
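
    The leave-one-subject-out scheme named in this abstract can be sketched as below. Only the fold construction is shown; the classifier itself is omitted. The function name, subject labels, and sample layout are hypothetical and are not taken from the study.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject.

    Every sample from the held-out subject goes to the test set, so the model
    is always evaluated on a person it has never seen during training.
    """
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test

# Hypothetical layout: three subjects with two feature windows each.
subject_ids = ["s1", "s1", "s2", "s2", "s3", "s3"]
folds = list(leave_one_subject_out(subject_ids))

for held_out, train, test in folds:
    print(held_out, "train:", train, "test:", test)
```

    Splitting by subject rather than by sample is what makes the reported accuracies inter-subject results: windows from one person never appear on both sides of a fold.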

  7. SPECT acquisition using dynamic projections: a novel approach for data-driven respiratory gating

    International Nuclear Information System (INIS)

    Hutton, B.F.; Hatton, R.L.; Yip, N.

    2002-01-01

    Movement of the heart due to respiration has been previously demonstrated to produce potentially serious artefacts. On-line respiratory gating is difficult, as it requires a high level of patient cooperation. We demonstrate that use of dynamic acquisition of projections permits identification of the respiratory dynamics, allowing retrospective selection of data corresponding to a fixed point in the respiratory cycle. To demonstrate the feasibility of the technique, a dynamic study was acquired just prior to myocardial perfusion SPECT acquisition, using 5 frames/sec for 20 seconds (64 × 64 matrix) in anterior and lateral projections (using a dual-head right-angled configuration). The dynamic study was processed (a) by compressing frames in the transverse direction so as to illustrate time dependence, and (b) by plotting the centre of mass in the axial direction as a function of time. Respiratory motion was enhanced by use of temporal smoothing and intensity thresholding. In ten patients studied, the cyclic pattern of motion due to respiratory dynamics was clearly visible in nine. Respiration typically resulted in around 1 cm axial translation, but in some individuals movements as large as 3 cm were identified. The respiration rate ranged from 12-18/min, in agreement with independent observation of the patient's breathing pattern. These results suggest that retrospective respiratory gating is feasible without the need for any external respiratory monitoring device, provided that dynamic acquisition of SPECT projections is implemented. Correction for respiratory motion may also be feasible using this technique. Copyright (2002) The Australian and New Zealand Society of Nuclear Medicine Inc
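
    The centre-of-mass trace described in this abstract can be sketched as follows. The sketch assumes each frame is a small 2D array of counts whose row index is the axial position; the frames are hypothetical toy data, and the smoothing and thresholding steps mentioned in the abstract are left out.

```python
def axial_centre_of_mass(frame):
    """Count-weighted mean row index of one projection frame.

    `frame` is a list of rows, each row a list of counts; the row index is
    taken to be the axial position (an assumption of this sketch).
    """
    total = sum(sum(row) for row in frame)
    if total == 0:
        raise ValueError("frame contains no counts")
    return sum(i * sum(row) for i, row in enumerate(frame)) / total

def motion_trace(frames):
    """Axial centre of mass for each frame, in acquisition order."""
    return [axial_centre_of_mass(frame) for frame in frames]

# Hypothetical 3 x 2 frames in which the activity shifts along the axial axis.
frames = [
    [[0, 0], [4, 4], [0, 0]],
    [[0, 0], [2, 2], [2, 2]],
    [[4, 4], [0, 0], [0, 0]],
]
trace = motion_trace(frames)
print(trace)
```

    Plotted against frame time, a trace of this kind would show the cyclic pattern from which a fixed point in the respiratory cycle can be selected retrospectively.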

  8. Data-driven sampling method for building 3D anatomical models from serial histology

    Science.gov (United States)

    Salunke, Snehal Ulhas; Ablove, Tova; Danforth, Theresa; Tomaszewski, John; Doyle, Scott

    2017-03-01

    In this work, we investigate the effect of slice sampling on 3D models of tissue architecture using serial histopathology. We present a method for using a single fully-sectioned tissue block as pilot data, whereby we build a fully-realized 3D model and then determine the optimal set of slices needed to reconstruct the salient features of the model objects under biological investigation. In our work, we are interested in the 3D reconstruction of microvessel architecture in the trigone region between the vagina and the bladder. This region serves as a potential avenue for drug delivery to treat bladder infection. We collect and co-register 23 serial sections of CD31-stained tissue images (6 μm thick sections), from which four microvessels are selected for analysis. To build each model, we perform semi-automatic segmentation of the microvessels. Subsampled meshes are then created by removing slices from the stack, interpolating the missing data, and re-constructing the mesh. We calculate the Hausdorff distance between the full and subsampled meshes to determine the optimal sampling rate for the modeled structures. In our application, we found that a sampling rate of 50% (corresponding to just 12 slices) was sufficient to recreate the structure of the microvessels without significant deviation from the fully rendered mesh. This pipeline effectively minimizes the number of histopathology slides required for 3D model reconstruction, and can be utilized to either (1) reduce the overall costs of a project, or (2) enable additional analysis on the intermediate slides.
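
    The Hausdorff distance used here to compare full and subsampled meshes can be sketched on plain point sets. Treating each mesh as a list of vertex coordinates is an assumption of this sketch, and the coordinates below are hypothetical; the abstract does not say how the authors computed the distance.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from any point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Hypothetical vertices: a "full" model and a version keeping every other point.
full = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
subsampled = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]

deviation = hausdorff(full, subsampled)
print(deviation)
```

    Both directions are needed because the measure is asymmetric: every subsampled point lies on the full model, yet some full-model points sit a whole unit away from the nearest retained point.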

  9. Spatial dimensions of the effect of neighborhood disadvantage on delinquency

    NARCIS (Netherlands)

    Vogel, M.S.; South, S.J.

    2016-01-01

    Research examining the relationship between neighborhood socioeconomic disadvantage and adolescent offending typically examines only the influence of residential neighborhoods. This strategy may be problematic as 1) neighborhoods are rarely spatially independent of each other and 2) adolescents spend

  10. Healthy and Unhealthy Food Prices across Neighborhoods and Their Association with Neighborhood Socioeconomic Status and Proportion Black/Hispanic.

    Science.gov (United States)

    Kern, David M; Auchincloss, Amy H; Robinson, Lucy F; Stehr, Mark F; Pham-Kanter, Genevieve

    2017-08-01

    This paper evaluates variation in food prices within and between neighborhoods to improve our understanding of access to healthy foods in urbanized areas and potential economic incentives and barriers to consuming a higher-quality diet. Prices of a selection of healthier foods (dairy, fruit juice, and frozen vegetables) and unhealthy foods (soda, sweets, and salty snacks) were obtained from 1953 supermarkets across the USA during 2009-2012 and were linked to census block group socio-demographics. Analyses evaluated associations between neighborhood SES and proportion Black/Hispanic and the prices of healthier and unhealthy foods, and the relative price of healthier foods compared with unhealthy foods (healthy-to-unhealthy price ratio). Linear hierarchical regression models were used to explore geospatial variation and adjust for confounders. Overall, the price of healthier foods was nearly twice as high as the price of unhealthy foods ($0.590 vs $0.298 per serving; healthy-to-unhealthy price ratio of 1.99). This trend was consistent across all neighborhood characteristics. After adjusting for covariates, no association was found between food prices (healthy, unhealthy, or the healthy-to-unhealthy ratio) and neighborhood SES. Similarly, there was no association between the proportion Black/Hispanic and healthier food price, a very small positive association with unhealthy price, and a modest negative association with the healthy-to-unhealthy ratio. No major differences were seen in food prices across levels of neighborhood SES and proportion Black/Hispanic; however, the price of healthier food was twice as expensive as unhealthy food per serving on average.

  11. Comprehensive Neighborhood Portraits and Child Asthma Disparities.

    Science.gov (United States)

    Kranjac, Ashley W; Kimbro, Rachel T; Denney, Justin T; Osiecki, Kristin M; Moffett, Brady S; Lopez, Keila N

    2017-07-01

    Objectives. Previous research has established links between child, family, and neighborhood disadvantages and child asthma. We add to this literature by first characterizing neighborhoods in Houston, TX by demographic, economic, and air quality characteristics to establish differences in pediatric asthma diagnoses across neighborhoods. Second, we identify the relative risk of social, economic, and environmental risk factors for child asthma diagnoses. Methods. We geocoded and linked electronic pediatric medical records to neighborhood-level social and economic indicators. Using latent profile modeling techniques, we identified Advantaged, Middle-class, and Disadvantaged neighborhoods. We then used a modified version of the Blinder-Oaxaca regression decomposition method to examine differences in asthma diagnoses across children in these different neighborhoods. Results. Both compositional (the characteristics of the children and the ambient air quality in the neighborhood) and associational (the relationship between child and air quality characteristics and asthma) differences within the distinctive neighborhood contexts influence asthma outcomes. For example, unequal exposure to PM2.5 and O3 among children in Disadvantaged and Middle-class neighborhoods contributes to asthma diagnosis disparities within these contexts. For children in Disadvantaged and Advantaged neighborhoods, associational differences between racial/ethnic and socioeconomic characteristics and asthma diagnoses explain a significant proportion of the gap. Conclusions for Practice. Our results provide evidence that differential exposure to pollution and protective factors associated with non-Hispanic White children and children from affluent families contribute to asthma disparities between neighborhoods. Future researchers should consider social and racial inequalities as more proximate drivers of, not merely as associated with, asthma disparities in children.

  12. Neighborhood Effects on Youth Crime

    DEFF Research Database (Denmark)

    Rotger, Gabriel Pons; Galster, George Charles

    We investigate the degree to which youth (ages 14-29) criminal offenses are influenced by neighbors, identifying causal effects with a natural experimental allocation of social housing in Copenhagen. We find that youth exposed to a one percentage point higher concentration of neighbors with drug...... criminal records are 6% more likely to be charged for criminal offenses (both drug and property crimes), and this impact manifests itself after six months of exposure. This neighborhood effect is stronger for previous offenders, and does not lead to criminal partnerships. Our exploration of alternative...

  13. Tensor Train Neighborhood Preserving Embedding

    Science.gov (United States)

    Wang, Wenqi; Aggarwal, Vaneet; Aeron, Shuchin

    2018-05-01

    In this paper, we propose a Tensor Train Neighborhood Preserving Embedding (TTNPE) to embed multi-dimensional tensor data into a low-dimensional tensor subspace. Novel approaches to solve the optimization problem in TTNPE are proposed. For this embedding, we evaluate a novel trade-off gain among classification, computation, and dimensionality reduction (storage) for supervised learning. It is shown that, compared to state-of-the-art tensor embedding methods, TTNPE achieves a superior trade-off in classification, computation, and dimensionality reduction on the MNIST handwritten digits and Weizmann face datasets.

  14. Data-driven prediction of adverse drug reactions induced by drug-drug interactions.

    Science.gov (United States)

    Liu, Ruifeng; AbdulHameed, Mohamed Diwan M; Kumar, Kamal; Yu, Xueping; Wallqvist, Anders; Reifman, Jaques

    2017-06-08

    via DDIs. This allowed us to identify potential DDI-induced ADRs not yet clinically reported. The ability of the models to quantify adverse effects between drug classes also suggests that we may be able to select drug combinations that minimize the risk of ADRs. Almost all information on DDI-induced ADRs is generated after drug approval. This situation poses significant health risks for vulnerable patient populations with comorbidities. To help mitigate the risks, we developed a robust probabilistic approach to prospectively predict DDI-induced ADRs. Based on this approach, we developed prediction models for 1,096 ADRs and used them to predict the propensity of all pairwise combinations of nearly 800 drugs to be associated with these ADRs via DDIs. We made the predictions publicly available via internet access.

  15. The associations between objectively-determined and self-reported urban form characteristics and neighborhood-based walking in adults.

    Science.gov (United States)

    Jack, Elizabeth; McCormack, Gavin R

    2014-06-04

    Self-reported and objectively-determined neighborhood built characteristics are associated with physical activity, yet little is known about their combined influence on walking. This study: 1) compared self-reported measures of the neighborhood built environment between objectively-determined low, medium, and high walkable neighborhoods; 2) estimated the relative associations between self-reported and objectively-determined neighborhood characteristics and walking and; 3) examined the extent to which the objectively-determined built environment moderates the association between self-reported measures of the neighborhood built environment and walking. A random cross-section of 1875 Canadian adults completed a telephone-interview and postal questionnaire capturing neighborhood walkability, neighborhood-based walking, socio-demographic characteristics, walking attitudes, and residential self-selection. Walkability of each respondent's neighborhood was objectively-determined (low [LW], medium [MW], and high walkable [HW]). Covariate-adjusted regression models estimated the associations between weekly participation and duration in transportation and recreational walking and self-reported and objectively-determined walkability. Compared with objectively-determined LW neighborhoods, respondents in HW neighborhoods positively perceived access to services, street connectivity, pedestrian infrastructure, and utilitarian and recreation destination mix, but negatively perceived motor vehicle traffic and crime related safety. Compared with residents of objectively-determined LW neighborhoods, residents of HW neighborhoods were more likely (p spend more time, per week (193 min/wk) transportation walking. Perceived access to services, street connectivity, motor vehicle safety, and mix of recreational destinations were also significantly associated with transportation walking. With regard to interactions, HW x utilitarian destination mix was positively associated with

  16. Association between neighborhood need and spatial access to food stores and fast food restaurants in neighborhoods of colonias.

    Science.gov (United States)

    Sharkey, Joseph R; Horel, Scott; Han, Daikwon; Huber, John C

    2009-02-16

    To determine the extent to which neighborhood needs (socioeconomic deprivation and vehicle availability) are associated with two criteria of food environment access: 1) distance to the nearest food store and fast food restaurant and 2) coverage (number) of food stores and fast food restaurants within a specified network distance of neighborhood areas of colonias, using ground-truthed methods. Data included locational points for 315 food stores and 204 fast food restaurants, and neighborhood characteristics from the 2000 U.S. Census for the 197 census block group (CBG) study area. Neighborhood deprivation and vehicle availability were calculated for each CBG. Minimum distance was determined by calculating network distance from the population-weighted center of each CBG to the nearest supercenter, supermarket, grocery, convenience store, dollar store, mass merchandiser, and fast food restaurant. Coverage was determined by calculating the number of each type of food store and fast food restaurant within a network distance of 1, 3, and 5 miles of each population-weighted CBG center. Neighborhood need and access were examined using Spearman ranked correlations, spatial autocorrelation, and multivariate regression models that adjusted for population density. Overall, neighborhoods had best access to convenience stores, fast food restaurants, and dollar stores. After adjusting for population density, residents in neighborhoods with increased deprivation had to travel a significantly greater distance to the nearest supercenter or supermarket, grocery store, mass merchandiser, dollar store, and pharmacy for food items. The results were quite different for association of need with the number of stores within 1 mile. Deprivation was only associated with fast food restaurants; greater deprivation was associated with fewer fast food restaurants within 1 mile. 
CBG with greater lack of vehicle availability had slightly better access to more supercenters or supermarkets, grocery

  17. Association between neighborhood need and spatial access to food stores and fast food restaurants in neighborhoods of Colonias

    Directory of Open Access Journals (Sweden)

    Han Daikwon

    2009-02-01

    Objective. To determine the extent to which neighborhood needs (socioeconomic deprivation and vehicle availability) are associated with two criteria of food environment access: 1) distance to the nearest food store and fast food restaurant and 2) coverage (number) of food stores and fast food restaurants within a specified network distance of neighborhood areas of colonias, using ground-truthed methods. Methods. Data included locational points for 315 food stores and 204 fast food restaurants, and neighborhood characteristics from the 2000 U.S. Census for the 197 census block group (CBG) study area. Neighborhood deprivation and vehicle availability were calculated for each CBG. Minimum distance was determined by calculating network distance from the population-weighted center of each CBG to the nearest supercenter, supermarket, grocery, convenience store, dollar store, mass merchandiser, and fast food restaurant. Coverage was determined by calculating the number of each type of food store and fast food restaurant within a network distance of 1, 3, and 5 miles of each population-weighted CBG center. Neighborhood need and access were examined using Spearman ranked correlations, spatial autocorrelation, and multivariate regression models that adjusted for population density. Results. Overall, neighborhoods had best access to convenience stores, fast food restaurants, and dollar stores. After adjusting for population density, residents in neighborhoods with increased deprivation had to travel a significantly greater distance to the nearest supercenter or supermarket, grocery store, mass merchandiser, dollar store, and pharmacy for food items. The results were quite different for association of need with the number of stores within 1 mile. Deprivation was only associated with fast food restaurants; greater deprivation was associated with fewer fast food restaurants within 1 mile. CBG with greater lack of vehicle availability had slightly better

  18. Study of the Influence of Age in 18F-FDG PET Images Using a Data-Driven Approach and Its Evaluation in Alzheimer’s Disease

    Directory of Open Access Journals (Sweden)

    Jiehui Jiang

    2018-01-01

    Objectives. 18F-FDG PET scan is one of the most frequently used neural imaging scans. However, the influence of age has proven to be the greatest interfering factor for many clinical dementia diagnoses when analyzing 18F-FDG PET images, since radiologists encounter difficulties when deciding whether the abnormalities in specific regions correlate with normal aging, disease, or both. In the present paper, the authors aimed to define specific brain regions and determine an age-correction mathematical model. Methods. A data-driven approach was used based on 255 healthy subjects. Results. The inferior frontal gyrus, the left medial part and the left medial orbital part of superior frontal gyrus, the right insula, the left anterior cingulate, the left median cingulate, and paracingulate gyri, and bilateral superior temporal gyri were found to have a strong negative correlation with age. For evaluation, an age-correction model was applied to 262 healthy subjects and 50 AD subjects selected from the ADNI database, and partial correlations between SUVR mean and three clinical results were carried out before and after age correction. Conclusion. All correlation coefficients were significantly improved after the age correction. The proposed model was effective in the age correction of both healthy and AD subjects.

  19. Forecasting monthly inflow discharge of the Iffezheim reservoir using data-driven models

    Science.gov (United States)

    Zhang, Qing; Aljoumani, Basem; Hillebrand, Gudrun; Hoffmann, Thomas; Hinkelmann, Reinhard

    2017-04-01

    River stream flow is an essential element in hydrology study fields, especially for reservoir management, since it defines input into reservoirs. Forecasting this stream flow plays an important role in short or long-term planning and management in the reservoir, e.g. optimized reservoir and hydroelectric operation or agricultural irrigation. Highly accurate flow forecasting can significantly reduce economic losses and is always pursued by reservoir operators. Therefore, hydrologic time series forecasting has received tremendous attention from researchers. Many models have been proposed to improve hydrological forecasting. Because most natural phenomena occurring in environmental systems appear to behave in random or probabilistic ways, different cases may need different methods to forecast the inflow and even a unique treatment to improve the forecast accuracy. The purpose of this study is to determine an appropriate model for forecasting monthly inflow to the Iffezheim reservoir in Germany, which is the last of the barrages in the Upper Rhine. Monthly time series of discharges, measured from 1946 to 2001 at the Plittersdorf station, which is located 6 km downstream of the Iffezheim reservoir, were applied. The accuracies of the stochastic models used, namely the Fiering model and Auto-Regressive Integrated Moving Average (ARIMA) models, are compared with those of Artificial Intelligence (AI) models, namely a single Artificial Neural Network (ANN) and Wavelet ANN (WANN) models. The Fiering model is a linear stochastic model and is used for generating synthetic monthly data. The basic idea in modeling time series using ARIMA is to identify a simple model with as few model parameters as possible in order to provide a good statistical fit to the data. To identify and fit the ARIMA models, a four-phase approach was used: identification, parameter estimation, diagnostic checking, and forecasting.
An automatic selection criterion, such as the Akaike information criterion, is utilized

  20. Classification of iRBD and Parkinson's patients using a general data-driven sleep staging model built on EEG

    DEFF Research Database (Denmark)

    Koch, Henriette; Christensen, Julie Anja Engelhard; Frandsen, Rune

    2013-01-01

    Sleep analysis is an important diagnostic tool for sleep disorders. However, the current manual sleep scoring is time-consuming as it is a crude discretization in time and stages. This study changes Esbroeck and Westover's [1] latent sleep staging model into a global model. The proposed data......-driven method trained a topic mixture model on 10 control subjects and was applied on 10 other control subjects, 10 iRBD patients and 10 Parkinson's patients. In that way 30 topic mixture diagrams were obtained from which features reflecting distinct sleep architectures between control subjects and patients...... were extracted. Two features calculated on basis of two latent sleep states classified subjects as “control” or “patient” by a simple clustering algorithm. The mean sleep staging accuracy compared to classical AASM scoring was 72.4% for control subjects and a clustering of the derived features resulted...

  1. Prognostic and health management for engineering systems: a review of the data-driven approach and algorithms

    Directory of Open Access Journals (Sweden)

    Thamo Sutharssan

    2015-07-01

    Prognostics and health management (PHM) has become an important component of many engineering systems and products, where algorithms are used to detect anomalies, diagnose faults and predict remaining useful lifetime (RUL). PHM can provide many advantages to users and maintainers. Although the primary goals are to ensure safety, provide the state of health and estimate the RUL of components and systems, there are also financial benefits such as operational and maintenance cost reductions and extended lifetime. This study aims at reviewing the current status of algorithms and methods used to underpin different existing PHM approaches. The focus is on providing a structured and comprehensive classification of the existing state-of-the-art PHM approaches, data-driven approaches and algorithms.

  2. An Interactive Platform to Visualize Data-Driven Clinical Pathways for the Management of Multiple Chronic Conditions.

    Science.gov (United States)

    Zhang, Yiye; Padman, Rema

    2017-01-01

    Patients with multiple chronic conditions (MCC) pose an increasingly complex health management challenge worldwide, particularly due to the significant gap in our understanding of how to provide coordinated care. Drawing on our prior research on learning data-driven clinical pathways from actual practice data, this paper describes a prototype, interactive platform for visualizing the pathways of MCC to support shared decision making. Created using Python web framework, JavaScript library and our clinical pathway learning algorithm, the visualization platform allows clinicians and patients to learn the dominant patterns of co-progression of multiple clinical events from their own data, and interactively explore and interpret the pathways. We demonstrate functionalities of the platform using a cluster of 36 patients, identified from a dataset of 1,084 patients, who are diagnosed with at least chronic kidney disease, hypertension, and diabetes. Future evaluation studies will explore the use of this platform to better understand and manage MCC.

  3. Data-driven modeling of sleep EEG and EOG reveals characteristics indicative of pre-Parkinson's and Parkinson's disease

    DEFF Research Database (Denmark)

    Christensen, Julie Anja Engelhard; Zoetmulder, Marielle; Koch, Henriette

    2014-01-01

    patients with idiopathic REM sleep behavior disorder (iRBD) and 36 patients with Parkinson's disease (PD). The data were divided into training and validation datasets and features reflecting EEG and EOG characteristics based on topics were computed. The most discriminative feature subset for separating i...... and the ability to maintain NREM and REM sleep have potential as early PD biomarkers. Data-driven analysis of sleep may contribute to the evaluation of neurodegenerative patients. (C) 2014 Elsevier B.V. All rights reserved.......Background: Manual scoring of sleep relies on identifying certain characteristics in polysomnograph (PSG) signals. However, these characteristics are disrupted in patients with neurodegenerative diseases. New method: This study evaluates sleep using a topic modeling and unsupervised learning...

  4. Big data-driven business: how to use big data to win customers, beat competitors, and boost profits

    CERN Document Server

    Glass, Russell

    2014-01-01

    Get the expert perspective and practical advice on big data. The Big Data-Driven Business: How to Use Big Data to Win Customers, Beat Competitors, and Boost Profits makes the case that big data is for real, and more than just big hype. The book uses real-life examples (from Nate Silver to Copernicus, and Apple to Blackberry) to demonstrate how the winners of the future will use big data to seek the truth. Written by a marketing journalist and the CEO of a multi-million-dollar B2B marketing platform that reaches more than 90% of the U.S. business population, this book is a comprehens

  5. A data-driven adaptive controller for a class of unknown nonlinear discrete-time systems with estimated PPD

    Directory of Open Access Journals (Sweden)

    Chidentree Treesatayapun

    2015-06-01

    An adaptive control scheme based on a data-driven controller (DDC) is proposed in this article. Unlike several DDC techniques, the proposed controller is constructed from an adaptive fuzzy rule emulated network (FREN), which is able to include human knowledge based on the controlled plant's input–output signals in the format of IF-THEN rules. Owing to this advantage, the on-line estimation of the pseudo partial derivative (PPD) and the resetting algorithms that are commonly used by DDC can be omitted here. Furthermore, a novel adaptive algorithm is introduced to minimize both tracking error and control effort, with stability analysis for the closed-loop system. An experimental system with brushed DC-motor current control is constructed to validate the performance of the proposed control scheme. Comparative results with conventional DDC and radial basis function (RBF) controllers demonstrate that the proposed controller can provide lower tracking error and minimize the control effort.

  6. Rural Neighborhood Walkability: Implications for Assessment.

    Science.gov (United States)

    Kegler, Michelle C; Alcantara, Iris; Haardörfer, Regine; Gemma, Alexandra; Ballard, Denise; Gazmararian, Julie

    2015-06-16

    Physical activity levels, including walking, are lower in the southern U.S., particularly in rural areas. This study investigated the concept of rural neighborhood walkability to aid in developing tools for assessing walkability and to identify intervention targets in rural communities. Semi-structured interviews were conducted with physically active adults (n = 29) in rural Georgia. Mean age of participants was 55.9 years; 66% were male, 76% were white, and 24% were African American. Participants drew maps of their neighborhoods and discussed the relevance of typical domains of walkability to their decisions to exercise. Comparative analyses were conducted to identify major themes. The majority felt the concept of neighborhood was applicable and viewed their neighborhood as small geographically (less than 0.5 square miles). Sidewalks were not viewed as essential for neighborhood-based physical activity and typical destinations for walking were largely absent. Destinations within walking distance included neighbors' homes and bodies of water. Views were mixed on whether shade, safety, dogs, and aesthetics affected decisions to exercise in their neighborhoods. Measures of neighborhood walkability in rural areas should acknowledge the small size of self-defined neighborhoods, that walking in rural areas is likely for leisure time exercise, and that some domains may not be relevant.

  7. Neighborhood quality and labor market outcomes

    DEFF Research Database (Denmark)

    Damm, Anna Piil

    2014-01-01

    of refugee men. Their labor market outcomes are also not affected by the overall employment rate and the overall average skill level in the neighborhood. However, an increase in the average skill level of non-Western immigrant men living in the neighborhood raises their employment probability, while...

  8. Community Gardening, Neighborhood Meetings, and Social Capital

    Science.gov (United States)

    Alaimo, Katherine; Reischl, Thomas M.; Allen, Julie Ober

    2010-01-01

    This study examined associations between participation in community gardening/beautification projects and neighborhood meetings with perceptions of social capital at both the individual and neighborhood levels. Data were analyzed from a cross-sectional stratified random telephone survey conducted in Flint, Michigan (N=1916). Hierarchical linear…

  9. Neighborhood social capital and individual health

    NARCIS (Netherlands)

    Mohnen, S.M.; Groenewegen, P.P.; Völker, B.G.M.; Flap, H.D.

    2010-01-01

    Neighborhood social capital is increasingly considered to be an important determinant of an individual’s health. Using data from the Netherlands we investigate the influence of neighborhood social capital on an individual’s self-reported health, while accounting for other conditions of health on

  10. Better Buildings Neighborhood Program Progress Stories

    Energy Technology Data Exchange (ETDEWEB)

    None

    2012-04-19

    In neighborhoods across the country, stories are emerging constantly of individuals, businesses, and organizations that are benefiting from energy efficiency. Included are the stories of real people making their homes, businesses, and communities better with the help of the Better Buildings Neighborhood Program.

  11. Perceived Neighborhood Safety and Adolescent School Functioning

    Science.gov (United States)

    Martin-Storey, Alexa; Crosnoe, Robert

    2014-01-01

    This study examined the association between adolescents' perceptions of their neighborhoods' safety and multiple elements of their functioning in school with data on 15 year olds from the NICHD Study of Early Child Care and Youth Development (n = 924). In general, perceived neighborhood safety was more strongly associated with aspects of schooling…

  12. Neighborhood social capital and individual health.

    NARCIS (Netherlands)

    Mohnen, S.M.; Groenewegen, P.P.; Völker, B.; Flap, H.

    2011-01-01

    Neighborhood social capital is increasingly considered to be an important determinant of an individual's health. Using data from the Netherlands we investigate the influence of neighborhood social capital on an individual's self-reported health, while accounting for other conditions of health on

  13. Data Driven Professional Development Design for Out-of-School Time Educators Using Planetary Science and Engineering Educational Materials

    Science.gov (United States)

    Clark, J.; Bloom, N.

    2017-12-01

    Data-driven design practices should be the basis for any effective educational product, particularly those used to support STEM learning and literacy. Planetary Learning that Advances the Nexus of Engineering, Technology, and Science (PLANETS) is a five-year NASA-funded (NNX16AC53A) interdisciplinary and cross-institutional partnership to develop and disseminate STEM out-of-school time (OST) curricular and professional development units that integrate planetary science, technology, and engineering. The Center for Science Teaching and Learning at Northern Arizona University, the U.S. Geological Survey Astrogeology Science Center, and the Museum of Science Boston are partners in developing, piloting, and researching the impact of three out-of-school time units. Two units are for middle grades youth and one is for upper elementary aged youth. The presentation will highlight the data-driven development process of the educational products used to provide support for educators teaching these curriculum units. This includes how data from the project needs assessment, curriculum pilot testing, and professional support product field tests are used in the design of products for out-of-school time educators. Based on data analysis, the project is developing and testing four tiers of professional support for OST educators. Tier 1 meets the immediate needs of OST educators to teach the curriculum and includes how-to videos and other direct support materials. Tier 2 provides additional content and pedagogical knowledge and includes short content videos designed to specifically address the content of the curriculum. Tier 3 elaborates on best practices in education and gives guidance on methods, for example, to develop cultural relevancy for underrepresented students. Tier 4 helps make connections to other NASA or educational products that support STEM learning in out-of-school settings. Examples of the tiers of support will be provided.

  14. Data-Driven Nonlinear Subspace Modeling for Prediction and Control of Molten Iron Quality Indices in Blast Furnace Ironmaking

    Energy Technology Data Exchange (ETDEWEB)

    Zhou, Ping; Song, Heda; Wang, Hong; Chai, Tianyou

    2017-09-01

    The blast furnace (BF) in ironmaking is a nonlinear dynamic process with complicated physical-chemical reactions, in which multi-phase and multi-field coupling and large time delays occur during operation. In BF operation, the molten iron temperature (MIT) and the Si, P and S contents of molten iron are the most essential molten iron quality (MIQ) indices, whose measurement, modeling and control have always been important issues in metallurgical engineering and automation. This paper develops a novel data-driven nonlinear state space modeling method for the prediction and control of multivariate MIQ indices by integrating hybrid modeling and control techniques. First, to improve modeling efficiency, a data-driven hybrid method combining canonical correlation analysis and correlation analysis is proposed to identify, from the multitudinous factors that affect the MIQ indices, the most influential controllable variables as the modeling inputs. Then, a Hammerstein model for the prediction of MIQ indices is established using the LS-SVM based nonlinear subspace identification method. This model is further simplified by using the piecewise cubic Hermite interpolating polynomial method to fit the complex nonlinear kernel function. Compared to the original Hammerstein model, the simplified model not only significantly reduces the computational complexity, but also has almost the same reliability and accuracy for a stable prediction of MIQ indices. Last, in order to verify the practicability of the developed model, it is applied in designing a genetic algorithm based nonlinear predictive controller for multivariate MIQ indices by directly taking the established model as a predictor. Industrial experiments show the advantages and effectiveness of the proposed approach.

  15. A data-driven emulation framework for representing water-food nexus in a changing cold region

    Science.gov (United States)

    Nazemi, A.; Zandmoghaddam, S.; Hatami, S.

    2017-12-01

    Water resource systems are under increasing pressure globally. Growing population along with competition between water demands and emerging effects of climate change have caused enormous vulnerabilities in water resource management across many regions. Diagnosing such vulnerabilities and providing effective adaptation strategies requires simulation tools that can adequately represent the interactions between competing water demands for limited water resources and inform decision makers about the critical vulnerability thresholds under a range of potential natural and anthropogenic conditions. Despite significant progress in integrated modeling of water resource systems, regional models are often unable to fully represent the contemplating dynamics within the key elements of water resource systems locally. Here we propose a data-driven approach to emulate a complex regional water resource system model developed for the Oldman River Basin in southern Alberta, Canada. The aim of the emulation is to provide a detailed understanding of the trade-offs and interactions at the Oldman Reservoir, which is the key to flood control and irrigated agriculture in this over-allocated semi-arid cold region. Different surrogate models are developed to represent the dynamics of irrigation demand and withdrawal as well as reservoir evaporation and release individually. The non-falsified offline models are then integrated through the water balance equation at the reservoir location to provide a coupled model for representing the dynamics of reservoir operation and water allocation at the local scale. The performance of the individual and integrated models is rigorously examined and sources of uncertainty are highlighted. To demonstrate the practical utility of such a surrogate modeling approach, we use the integrated data-driven model for examining the trade-off in irrigation water supply, reservoir storage and release under a range of changing climate, upstream

  16. Data-driven Development of ROTEM and TEG Algorithms for the Management of Trauma Hemorrhage: A Prospective Observational Multicenter Study.

    Science.gov (United States)

    Baksaas-Aasen, Kjersti; Van Dieren, Susan; Balvers, Kirsten; Juffermans, Nicole P; Næss, Pål A; Rourke, Claire; Eaglestone, Simon; Ostrowski, Sisse R; Stensballe, Jakob; Stanworth, Simon; Maegele, Marc; Goslings, Carel; Johansson, Pär I; Brohi, Karim; Gaarder, Christine

    2018-05-23

    Developing pragmatic data-driven algorithms for management of trauma induced coagulopathy (TIC) during trauma hemorrhage for viscoelastic hemostatic assays (VHAs). Admission data from conventional coagulation tests (CCT), rotational thrombelastometry (ROTEM) and thrombelastography (TEG) were collected prospectively at 6 European trauma centers during 2008 to 2013. To identify significant VHA parameters capable of detecting TIC (defined as INR > 1.2), hypofibrinogenemia (< 2.0 g/L), and thrombocytopenia (< 100 × 10^9/L), univariate regression models were constructed. Area under the curve (AUC) was calculated, and threshold values for TEG and ROTEM parameters with 70% sensitivity were included in the algorithms. A total of 2287 adult trauma patients (ROTEM: 2019 and TEG: 968) were enrolled. FIBTEM clot amplitude at 5 minutes (CA5) had the largest AUC and 10 mm detected hypofibrinogenemia with 70% sensitivity. The corresponding value for functional fibrinogen (FF) TEG maximum amplitude (MA) was 19 mm. Thrombocytopenia was similarly detected using the calculated threshold EXTEM-FIBTEM CA5 30 mm. The corresponding rTEG-FF TEG MA was 46 mm. TIC was identified by EXTEM CA5 41 mm, rTEG MA 64 mm (80% sensitivity). For hyperfibrinolysis, we examined the relationship between viscoelastic lysis parameters and clinical outcomes, with resulting threshold values of 85% for EXTEM Li30 and 10% for rTEG Ly30. Based on these analyses, we constructed algorithms for ROTEM, TEG, and CCTs to be used in addition to ratio driven transfusion and tranexamic acid. We describe a systematic approach to define threshold parameters for ROTEM and TEG. These parameters were incorporated into algorithms to support data-driven adjustments of resuscitation with therapeutics, to optimize damage control resuscitation practice in trauma.

  17. Migraine Subclassification via a Data-Driven Automated Approach Using Multimodality Factor Mixture Modeling of Brain Structure Measurements.

    Science.gov (United States)

    Schwedt, Todd J; Si, Bing; Li, Jing; Wu, Teresa; Chong, Catherine D

    2017-07-01

    The current subclassification of migraine is according to headache frequency and aura status. The variability in migraine symptoms, disease course, and response to treatment suggests the presence of additional heterogeneity or subclasses within migraine. The study objective was to subclassify migraine via a data-driven approach, identifying latent factors by jointly exploiting multiple sets of brain structural features obtained via magnetic resonance imaging (MRI). Migraineurs (n = 66) and healthy controls (n = 54) had brain MRI measurements of cortical thickness, cortical surface area, and volumes for 68 regions. A multimodality factor mixture model was used to subclassify MRIs and to determine the brain structural factors that most contributed to the subclassification. Clinical characteristics of subjects in each subgroup were compared. Automated MRI classification divided the subjects into two subgroups. Migraineurs in subgroup #1 had more severe allodynia symptoms during migraines (6.1 ± 5.3 vs. 3.6 ± 3.2, P = .03), more years with migraine (19.2 ± 11.3 years vs. 13 ± 8.3 years, P = .01), and higher Migraine Disability Assessment (MIDAS) scores (25 ± 22.9 vs. 15.7 ± 12.2, P = .04). There were no differences in headache frequency or migraine aura status between the two subgroups. Data-driven subclassification of brain MRIs based upon structural measurements identified two subgroups. Amongst migraineurs, the subgroups differed in allodynia symptom severity, years with migraine, and migraine-related disability. Since allodynia is associated with this imaging-based subclassification of migraine and prior publications suggest that allodynia impacts migraine treatment response and disease prognosis, future migraine diagnostic criteria could consider allodynia when defining migraine subgroups. © 2017 American Headache Society.

  18. DeDaL: Cytoscape 3 app for producing and morphing data-driven and structure-driven network layouts.

    Science.gov (United States)

    Czerwinska, Urszula; Calzone, Laurence; Barillot, Emmanuel; Zinovyev, Andrei

    2015-08-14

    Visualization and analysis of molecular profiling data together with biological networks are able to provide new mechanistic insights into biological functions. Currently, it is possible to visualize high-throughput data on top of pre-defined network layouts, but they are not always adapted to a given data analysis task. A network layout based simultaneously on the network structure and the associated multidimensional data might be advantageous for data visualization and analysis in some cases. We developed a Cytoscape app, which allows constructing biological network layouts based on the data from molecular profiles imported as values of node attributes. DeDaL is a Cytoscape 3 app, which uses linear and non-linear algorithms of dimension reduction to produce data-driven network layouts based on multidimensional data (typically gene expression). DeDaL implements several data pre-processing and layout post-processing steps such as continuous morphing between two arbitrary network layouts and aligning one network layout with respect to another one by rotating and mirroring. The combination of all these functionalities facilitates the creation of insightful network layouts representing both structural network features and correlation patterns in multivariate data. We demonstrate the added value of applying DeDaL in several practical applications, including an example of a large protein-protein interaction network. DeDaL is a convenient tool for applying data dimensionality reduction methods and for designing insightful data displays based on data-driven layouts of biological networks, built within the Cytoscape environment. DeDaL is freely available for downloading at http://bioinfo-out.curie.fr/projects/dedal/.

  19. Reproducibility of data-driven dietary patterns in two groups of adult Spanish women from different studies.

    Science.gov (United States)

    Castelló, Adela; Lope, Virginia; Vioque, Jesús; Santamariña, Carmen; Pedraz-Pingarrón, Carmen; Abad, Soledad; Ederra, Maria; Salas-Trejo, Dolores; Vidal, Carmen; Sánchez-Contador, Carmen; Aragonés, Nuria; Pérez-Gómez, Beatriz; Pollán, Marina

    2016-08-01

    The objective of the present study was to assess the reproducibility of data-driven dietary patterns in different samples extracted from similar populations. Dietary patterns were extracted by applying principal component analyses to the dietary information collected from a sample of 3550 women recruited from seven screening centres belonging to the Spanish breast cancer (BC) screening network (Determinants of Mammographic Density in Spain (DDM-Spain) study). The resulting patterns were compared with three dietary patterns obtained from a previous Spanish case-control study on female BC (Epidemiological study of the Spanish group for breast cancer research (GEICAM: grupo Español de investigación en cáncer de mama)) using the dietary intake data of 973 healthy participants. The level of agreement between patterns was determined using both the congruence coefficient (CC) between the pattern loadings (considering patterns with a CC≥0·85 as fairly similar) and the linear correlation between pattern scores (considering as fairly similar those patterns with a statistically significant correlation). The conclusions reached with both methods were compared. This is the first study exploring the reproducibility of data-driven patterns from two studies and the first using the CC to determine pattern similarity. We were able to reproduce the EpiGEICAM Western pattern in the DDM-Spain sample (CC=0·90). However, the reproducibility of the Prudent (CC=0·76) and Mediterranean (CC=0·77) patterns was not as good. The linear correlation between pattern scores was statistically significant in all cases, highlighting its arbitrariness for determining pattern similarity. We conclude that the reproducibility of widely prevalent dietary patterns is better than the reproducibility of more population-specific patterns. More methodological studies are needed to establish an objective measurement and threshold to determine pattern similarity.
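
    As an illustration of the congruence coefficient mentioned in this record, the sketch below computes it in the commonly used Tucker form (the sum of products of paired loadings divided by the square root of the product of the two sums of squares). The abstract does not state the exact formula the authors used, so this form is an assumption, and the function name and the loading vectors in the test are hypothetical, not data from the study.

```python
import math


def congruence_coefficient(loadings_a, loadings_b):
    """Tucker-style congruence coefficient between two loading vectors.

    Assumption: both vectors list loadings for the same food groups,
    in the same order. Returns a value in [-1, 1].
    """
    if len(loadings_a) != len(loadings_b):
        raise ValueError("loading vectors must have the same length")
    numerator = sum(a * b for a, b in zip(loadings_a, loadings_b))
    denominator = math.sqrt(
        sum(a * a for a in loadings_a) * sum(b * b for b in loadings_b)
    )
    if denominator == 0:
        raise ValueError("loading vectors must not be all zeros")
    return numerator / denominator


def fairly_similar(loadings_a, loadings_b, threshold=0.85):
    """Apply the CC >= 0.85 similarity rule quoted in the abstract."""
    return congruence_coefficient(loadings_a, loadings_b) >= threshold
```

    Unlike a correlation coefficient, this measure does not center the loadings, so it is sensitive to their sign and overall level as well as their shape.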

  20. A predictive estimation method for carbon dioxide transport by data-driven modeling with a physically-based data model.

    Science.gov (United States)

    Jeong, Jina; Park, Eungyu; Han, Weon Shik; Kim, Kue-Young; Jun, Seong-Chun; Choung, Sungwook; Yun, Seong-Taek; Oh, Junho; Kim, Hyun-Jun

    2017-11-01

    In this study, a data-driven method for predicting CO2 leaks and associated concentrations from geological CO2 sequestration is developed. Several candidate models are compared based on their reproducibility and predictive capability for CO2 concentration measurements from the Environment Impact Evaluation Test (EIT) site in Korea. Based on the data mining results, a one-dimensional solution of the advective-dispersive equation for steady flow (i.e., Ogata-Banks solution) is found to be most representative for the test data, and this model is adopted as the data model for the developed method. In the validation step, the method is applied to estimate future CO2 concentrations with the reference estimation by the Ogata-Banks solution, where a part of earlier data is used as the training dataset. From the analysis, it is found that the ensemble mean of multiple estimations based on the developed method shows high prediction accuracy relative to the reference estimation. In addition, the majority of the data to be predicted are included in the proposed quantile interval, which suggests adequate representation of the uncertainty by the developed method. Therefore, the incorporation of a reasonable physically-based data model enhances the prediction capability of the data-driven model. The proposed method is not confined to estimations of CO2 concentration and may be applied to various real-time monitoring data from subsurface sites to develop automated control, management or decision-making systems. Copyright © 2017 Elsevier B.V. All rights reserved.
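
    For readers unfamiliar with the Ogata-Banks solution named in this record, the sketch below evaluates its textbook form for a constant-concentration source at x = 0 in a semi-infinite column: C(x, t) = (C0/2) [erfc((x - vt) / (2 sqrt(Dt))) + exp(vx/D) erfc((x + vt) / (2 sqrt(Dt)))]. The abstract does not give the boundary conditions or parameterization the authors used, so this form, the function name, and the parameter names (velocity, dispersion, c0) are assumptions for illustration only.

```python
import math


def ogata_banks(x, t, velocity, dispersion, c0=1.0):
    """Textbook Ogata-Banks concentration at distance x and time t.

    Assumptions: steady 1-D flow, constant source concentration c0 at
    x = 0, zero initial concentration, x >= 0, t > 0, dispersion > 0.
    Units must be consistent (e.g. m, s, m/s, m^2/s).
    """
    if t <= 0 or dispersion <= 0:
        raise ValueError("t and dispersion must be positive")
    spread = 2.0 * math.sqrt(dispersion * t)
    advective_term = math.erfc((x - velocity * t) / spread)
    # Note: exp(v*x/D) can overflow for very large Peclet numbers;
    # this simple sketch does not guard against that case.
    boundary_term = math.exp(velocity * x / dispersion) * math.erfc(
        (x + velocity * t) / spread
    )
    return 0.5 * c0 * (advective_term + boundary_term)
```

    At x = 0 the two complementary error functions sum to 2, so the expression returns the source concentration c0, which is a quick sanity check on any implementation.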