WorldWideScience

Sample records for performance data-driven approach

  1. Combining engineering and data-driven approaches

    DEFF Research Database (Denmark)

    Fischer, Katharina; De Sanctis, Gianluca; Kohler, Jochen

    2015-01-01

    Two general approaches may be followed for the development of a fire risk model: statistical models based on observed fire losses can support simple cost-benefit studies but are usually not detailed enough for engineering decision-making. Engineering models, on the other hand, require many assumptions that may result in a biased risk assessment. In two related papers we show how engineering and data-driven modelling can be combined by developing generic risk models that are calibrated to statistical data on observed fire events. The focus of the present paper is on the calibration procedure... to the calibration of a generic fire risk model for single family houses to Swiss insurance data. The example demonstrates that the bias in the risk estimation can be strongly reduced by model calibration.

  2. Data-Driven Controller Design: The H2 Approach

    CERN Document Server

    Sanfelice Bazanella, Alexandre; Eckhard, Diego

    2012-01-01

    Data-driven methodologies have recently emerged as an important paradigm alternative to model-based controller design and several such methodologies are formulated as an H2 performance optimization. This book presents a comprehensive theoretical treatment of the H2 approach to data-driven control design. The fundamental properties implied by the H2 problem formulation are analyzed in detail, so that common features to all solutions are identified. Direct methods (VRFT) and iterative methods (IFT, DFT, CbT) are put under a common theoretical framework. The choice of the reference model, the experimental conditions, the optimization method to be used, and several other designer’s choices are crucial to the quality of the final outcome, and firm guidelines for all these choices are derived from the theoretical analysis presented. The practical application of the concepts in the book is illustrated with a large number of practical designs performed for different classes of processes: thermal, fluid processing a...

  3. Data-driven approach for auditory profiling

    DEFF Research Database (Denmark)

    Sanchez Lopez, Raul; Bianchi, Federica; Fereczkowski, Michal

    2017-01-01

    Nowadays, the pure-tone audiogram is the main tool used to characterize hearing loss and to fit hearing aids. However, the perceptual consequences of hearing loss are typically not only associated with a loss of sensitivity, but also with a clarity loss that is not captured by the audiogram. A detai... -in-noise perception. The current approach is promising for analyzing other existing data sets in order to select the most relevant tests for auditory profiling.

  4. Controller synthesis for negative imaginary systems: a data driven approach

    KAUST Repository

    Mabrok, Mohamed

    2016-02-17

    The negative imaginary (NI) property occurs in many important applications. For instance, flexible structure systems with collocated force actuators and position sensors can be modelled as negative imaginary systems. In this study, a data-driven controller synthesis methodology for NI systems is presented. In this approach, measured frequency response data of the plant is used to construct the controller frequency response at every frequency by minimising a cost function. Then, this controller response is used to identify the controller transfer function using system identification methods. © The Institution of Engineering and Technology 2016.

  5. Data driven approaches for diagnostics and optimization of NPP operation

    International Nuclear Information System (INIS)

    Pliska, J.; Machat, Z.

    2014-01-01

    Efficiency and heat rate are important indicators of both the health of power plant equipment and the quality of power plant operation. A powerful tool for meeting this challenge is statistical processing of the large data sets stored in data historians. These large data sets contain useful information about process quality and about equipment and sensor health. The paper discusses data-driven approaches for building models of the main power plant equipment, such as the condenser and the cooling tower, as well as the overall thermal cycle, using multivariate regression techniques based on the so-called regression triplet: data, model and method. Regression models form a basis for diagnostics and optimization tasks. Both kinds of task are demonstrated on practical cases: diagnostics of the main power plant equipment to identify equipment faults early, and optimization of the cooling circuit by cooling water flow control to achieve the highest power output for given boundary conditions. (authors)

  6. A Data-Driven Approach to Realistic Shape Morphing

    KAUST Repository

    Gao, Lin; Lai, Yu-Kun; Huang, Qi-Xing; Hu, Shi-Min

    2013-01-01

    Morphing between 3D objects is a fundamental technique in computer graphics. Traditional methods of shape morphing focus on establishing meaningful correspondences and finding smooth interpolation between shapes. Such methods however only take geometric information as input and thus cannot in general avoid producing unnatural interpolation, in particular for large-scale deformations. This paper proposes a novel data-driven approach for shape morphing. Given a database with various models belonging to the same category, we treat them as data samples in the plausible deformation space. These models are then clustered to form local shape spaces of plausible deformations. We use a simple metric to reasonably represent the closeness between pairs of models. Given source and target models, the morphing problem is cast as a global optimization problem of finding a minimal distance path within the local shape spaces connecting these models. Under the guidance of intermediate models in the path, an extended as-rigid-as-possible interpolation is used to produce the final morphing. By exploiting the knowledge of plausible models, our approach produces realistic morphing for challenging cases as demonstrated by various examples in the paper. © 2013 The Eurographics Association and Blackwell Publishing Ltd.

  7. A data-driven approach to quality risk management.

    Science.gov (United States)

    Alemayehu, Demissie; Alvir, Jose; Levenstein, Marcia; Nickerson, David

    2013-10-01

    An effective clinical trial strategy to ensure patient safety as well as trial quality and efficiency involves an integrated approach, including prospective identification of risk factors, mitigation of the risks through proper study design and execution, and assessment of quality metrics in real-time. Such an integrated quality management plan may also be enhanced by using data-driven techniques to identify risk factors that are most relevant in predicting quality issues associated with a trial. In this paper, we illustrate such an approach using data collected from actual clinical trials. Several statistical methods were employed, including the Wilcoxon rank-sum test and logistic regression, to identify the presence of association between risk factors and the occurrence of quality issues, applied to data on quality of clinical trials sponsored by Pfizer. Only a subset of the risk factors had a significant association with quality issues, and included: whether the study used placebo, whether an agent was a biologic, unusual packaging label, complex dosing, and over 25 planned procedures. Proper implementation of the strategy can help to optimize resource utilization without compromising trial integrity and patient safety.
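    The abstract above names the Wilcoxon rank-sum test as one of the methods used to relate risk factors to quality issues. As a rough illustration only, the sketch below implements the test with a normal approximation in plain Python. The function name, the absence of a tie correction in the variance, and the example numbers are assumptions made here; they are not taken from the paper or its data.

```python
import math

def rank_sum_test(x, y):
    """Wilcoxon rank-sum test using the normal approximation.

    Returns (W, z, p): the rank sum of sample x, its z-score, and a
    two-sided p-value. Ties receive midranks; the variance is not
    tie-corrected, which is a simplification of this sketch.
    """
    # Pool both samples, remembering which group each value came from.
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w, z, p

# Invented example: planned procedures per trial, without and with quality issues.
no_issue = [12, 15, 18, 20, 22, 24]
issue = [26, 28, 30, 31, 35, 40]
w, z, p = rank_sum_test(no_issue, issue)
print(f"W={w}, z={z:.2f}, p={p:.4f}")
```

    With samples this small an exact test would normally be preferred; the normal approximation is used here only to keep the sketch short.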

  8. A data-driven approach to quality risk management

    Directory of Open Access Journals (Sweden)

    Demissie Alemayehu

    2013-01-01

    Aim: An effective clinical trial strategy to ensure patient safety as well as trial quality and efficiency involves an integrated approach, including prospective identification of risk factors, mitigation of the risks through proper study design and execution, and assessment of quality metrics in real-time. Such an integrated quality management plan may also be enhanced by using data-driven techniques to identify risk factors that are most relevant in predicting quality issues associated with a trial. In this paper, we illustrate such an approach using data collected from actual clinical trials. Materials and Methods: Several statistical methods were employed, including the Wilcoxon rank-sum test and logistic regression, to identify the presence of association between risk factors and the occurrence of quality issues, applied to data on quality of clinical trials sponsored by Pfizer. Results: Only a subset of the risk factors had a significant association with quality issues, and included: whether the study used placebo, whether an agent was a biologic, unusual packaging label, complex dosing, and over 25 planned procedures. Conclusion: Proper implementation of the strategy can help to optimize resource utilization without compromising trial integrity and patient safety.

  9. A Data-Driven Approach to Realistic Shape Morphing

    KAUST Repository

    Gao, Lin

    2013-05-01

    Morphing between 3D objects is a fundamental technique in computer graphics. Traditional methods of shape morphing focus on establishing meaningful correspondences and finding smooth interpolation between shapes. Such methods however only take geometric information as input and thus cannot in general avoid producing unnatural interpolation, in particular for large-scale deformations. This paper proposes a novel data-driven approach for shape morphing. Given a database with various models belonging to the same category, we treat them as data samples in the plausible deformation space. These models are then clustered to form local shape spaces of plausible deformations. We use a simple metric to reasonably represent the closeness between pairs of models. Given source and target models, the morphing problem is cast as a global optimization problem of finding a minimal distance path within the local shape spaces connecting these models. Under the guidance of intermediate models in the path, an extended as-rigid-as-possible interpolation is used to produce the final morphing. By exploiting the knowledge of plausible models, our approach produces realistic morphing for challenging cases as demonstrated by various examples in the paper. © 2013 The Eurographics Association and Blackwell Publishing Ltd.

  10. Data-driven approach for creating synthetic electronic medical records.

    Science.gov (United States)

    Buczak, Anna L; Babin, Steven; Moniz, Linda

    2010-10-14

    New algorithms for disease outbreak detection are being developed to take advantage of full electronic medical records (EMRs) that contain a wealth of patient information. However, due to privacy concerns, even anonymized EMRs cannot be shared among researchers, resulting in great difficulty in comparing the effectiveness of these algorithms. To bridge the gap between novel bio-surveillance algorithms operating on full EMRs and the lack of non-identifiable EMR data, a method for generating complete and synthetic EMRs was developed. This paper describes a novel methodology for generating complete synthetic EMRs both for an outbreak illness of interest (tularemia) and for background records. The method developed has three major steps: 1) synthetic patient identity and basic information generation; 2) identification of care patterns that the synthetic patients would receive based on the information present in real EMR data for similar health problems; 3) adaptation of these care patterns to the synthetic patient population. We generated EMRs, including visit records, clinical activity, laboratory orders/results and radiology orders/results for 203 synthetic tularemia outbreak patients. Validation of the records by a medical expert revealed problems in 19% of the records; these were subsequently corrected. We also generated background EMRs for over 3000 patients in the 4-11 yr age group. Validation of those records by a medical expert revealed problems in fewer than 3% of these background patient EMRs and the errors were subsequently rectified. A data-driven method was developed for generating fully synthetic EMRs. The method is general and can be applied to any data set that has similar data elements (such as laboratory and radiology orders and results, clinical activity, prescription orders). The pilot synthetic outbreak records were for tularemia but our approach may be adapted to other infectious diseases. The pilot synthetic background records were in the 4

  11. Data-driven approach for creating synthetic electronic medical records

    Directory of Open Access Journals (Sweden)

    Moniz Linda

    2010-10-01

    Background: New algorithms for disease outbreak detection are being developed to take advantage of full electronic medical records (EMRs) that contain a wealth of patient information. However, due to privacy concerns, even anonymized EMRs cannot be shared among researchers, resulting in great difficulty in comparing the effectiveness of these algorithms. To bridge the gap between novel bio-surveillance algorithms operating on full EMRs and the lack of non-identifiable EMR data, a method for generating complete and synthetic EMRs was developed. Methods: This paper describes a novel methodology for generating complete synthetic EMRs both for an outbreak illness of interest (tularemia) and for background records. The method developed has three major steps: 1) synthetic patient identity and basic information generation; 2) identification of care patterns that the synthetic patients would receive based on the information present in real EMR data for similar health problems; 3) adaptation of these care patterns to the synthetic patient population. Results: We generated EMRs, including visit records, clinical activity, laboratory orders/results and radiology orders/results for 203 synthetic tularemia outbreak patients. Validation of the records by a medical expert revealed problems in 19% of the records; these were subsequently corrected. We also generated background EMRs for over 3000 patients in the 4-11 yr age group. Validation of those records by a medical expert revealed problems in fewer than 3% of these background patient EMRs and the errors were subsequently rectified. Conclusions: A data-driven method was developed for generating fully synthetic EMRs. The method is general and can be applied to any data set that has similar data elements (such as laboratory and radiology orders and results, clinical activity, prescription orders). The pilot synthetic outbreak records were for tularemia but our approach may be adapted to other infectious

  12. Using Shape Memory Alloys: A Dynamic Data Driven Approach

    KAUST Repository

    Douglas, Craig C.

    2013-06-01

    Shape Memory Alloys (SMAs) are capable of changing their crystallographic structure due to changes of either stress or temperature. SMAs are used in a number of aerospace devices and are required in some devices in exotic environments. We are developing dynamic data driven application system (DDDAS) tools to monitor and change SMAs in real time for delivering payloads by aerospace vehicles. We must be able to turn on and off the sensors and heating units, change the stress on the SMA, monitor on-line data streams, change scales based on incoming data, and control what type of data is generated. The application must have the capability to be run and steered remotely as an unmanned feedback control loop.

  13. A robust data-driven approach for gene ontology annotation.

    Science.gov (United States)

    Li, Yanpeng; Yu, Hong

    2014-01-01

    Gene ontology (GO) and GO annotation are important resources for biological information management and knowledge discovery, but the speed of manual annotation has become a major bottleneck of database curation. The BioCreative IV GO annotation task aims to evaluate the performance of systems that automatically assign GO terms to genes based on the narrative sentences in biomedical literature. This article presents our work in this task as well as the experimental results after the competition. For the evidence sentence extraction subtask, we built a binary classifier to identify evidence sentences using reference distance estimator (RDE), a recently proposed semi-supervised learning method that learns new features from around 10 million unlabeled sentences, achieving an F1 of 19.3% in exact match and 32.5% in relaxed match. In the post-submission experiment, we obtained 22.1% and 35.7% F1 performance by incorporating bigram features in RDE learning. In both development and test sets, the RDE-based method achieved over 20% relative improvement on F1 and AUC performance against classical supervised learning methods, e.g. support vector machine and logistic regression. For the GO term prediction subtask, we developed an information retrieval-based method to retrieve the GO term most relevant to each evidence sentence using a ranking function that combined cosine similarity and the frequency of GO terms in documents, and a filtering method based on high-level GO classes. The best performance of our submitted runs was 7.8% F1 and 22.2% hierarchy F1. We found that the incorporation of frequency information and hierarchy filtering substantially improved the performance. In the post-submission evaluation, we obtained a 10.6% F1 using a simpler setting. Overall, the experimental analysis showed our approaches were robust in both tasks. © The Author(s) 2014. Published by Oxford University Press.
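    The abstract above describes a ranking function that combines cosine similarity with the frequency of GO terms in documents, but it does not give the formula. The sketch below shows one possible combination, a weighted sum over bag-of-words vectors. The function names, the weight `alpha`, the additive form, and the toy terms and counts are all assumptions made for illustration and should not be read as the authors' method.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_terms(sentence, terms, doc_freq, alpha=0.5):
    """Rank candidate terms for one evidence sentence, best first.

    score = cosine(sentence, term) + alpha * relative document frequency.
    The additive form and alpha are hypothetical choices for this sketch.
    """
    sent_vec = Counter(sentence.lower().split())
    total = sum(doc_freq.values()) or 1
    scored = []
    for term in terms:
        term_vec = Counter(term.lower().split())
        score = cosine(sent_vec, term_vec) + alpha * doc_freq.get(term, 0) / total
        scored.append((score, term))
    return [term for _, term in sorted(scored, reverse=True)]

# Toy input, invented for the example.
sentence = "the protein regulates dna repair in yeast"
terms = ["dna repair", "cell cycle", "protein binding"]
doc_freq = {"dna repair": 3, "cell cycle": 5, "protein binding": 1}
print(rank_terms(sentence, terms, doc_freq))
```

    A real system would also need tokenization, term weighting, and the hierarchy-based filtering step the abstract mentions, none of which is attempted here.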

  14. Data-Driven Anomaly Detection Performance for the Ares I-X Ground Diagnostic Prototype

    Science.gov (United States)

    Martin, Rodney A.; Schwabacher, Mark A.; Matthews, Bryan L.

    2010-01-01

    In this paper, we will assess the performance of a data-driven anomaly detection algorithm, the Inductive Monitoring System (IMS), which can be used to detect simulated Thrust Vector Control (TVC) system failures. However, the ability of IMS to detect these failures in a true operational setting may be related to the realistic nature of how they are simulated. As such, we will investigate both a low fidelity and high fidelity approach to simulating such failures, with the latter based upon the underlying physics. Furthermore, the ability of IMS to detect anomalies that were previously unknown and not previously simulated will be studied in earnest, as well as apparent deficiencies or misapplications that result from using the data-driven paradigm. Our conclusions indicate that robust detection performance of simulated failures using IMS is not appreciably affected by the use of a high fidelity simulation. However, we have found that the inclusion of a data-driven algorithm such as IMS into a suite of deployable health management technologies does add significant value.

  15. Designing Data-Driven Battery Prognostic Approaches for Variable Loading Profiles: Some Lessons Learned

    Data.gov (United States)

    National Aeronautics and Space Administration — Among various approaches for implementing prognostic algorithms data-driven algorithms are popular in the industry due to their intuitive nature and relatively fast...

  16. Controller synthesis for negative imaginary systems: a data driven approach

    KAUST Repository

    Mabrok, Mohamed; Petersen, Ian R.

    2016-01-01

    -driven controller synthesis methodology for NI systems is presented. In this approach, measured frequency response data of the plant is used to construct the controller frequency response at every frequency by minimising a cost function. Then, this controller

  17. Data Driven Fault Tolerant Control : A Subspace Approach

    NARCIS (Netherlands)

    Dong, J.

    2009-01-01

    The main stream research on fault detection and fault tolerant control has been focused on model based methods. As far as a model is concerned, changes therein due to faults have to be extracted from measured data. Generally speaking, existing approaches process measured inputs and outputs either by

  18. A data driven approach for automating vehicle activated signs

    OpenAIRE

    Jomaa, Diala

    2016-01-01

    Vehicle activated signs (VAS) display a warning message when drivers exceed a particular threshold. VAS are often installed on local roads to display a warning message depending on the speed of the approaching vehicles. VAS are usually powered by electricity; however, battery and solar powered VAS are also commonplace. This thesis investigated development of an automatic trigger speed of vehicle activated signs in order to influence driver behaviour, the effect of which has been measured in ...

  19. Autonomous Soil Assessment System: A Data-Driven Approach to Planetary Mobility Hazard Detection

    Science.gov (United States)

    Raimalwala, K.; Faragalli, M.; Reid, E.

    2018-04-01

    The Autonomous Soil Assessment System predicts mobility hazards for rovers. Its development and performance are presented, with focus on its data-driven models, machine learning algorithms, and real-time sensor data fusion for predictive analytics.

  20. A data-driven approach to patient blood management.

    Science.gov (United States)

    Cohn, Claudia S; Welbig, Julie; Bowman, Robert; Kammann, Susan; Frey, Katherine; Zantek, Nicole

    2014-02-01

    Patient blood management (PBM) has become a topic of intense interest; however, implementing a robust PBM system in a large academic hospital can be a challenge. In a joint effort between transfusion medicine and information technology, we have developed three overlapping databases that allow for a comprehensive, semiautomated approach to monitoring up-to-date red blood cell (RBC) usage in our hospital. Data derived from this work have allowed us to target our PBM efforts. Information on transfusions is collected using three databases: daily report, discharge database, and denominator database. The daily report collects data on all transfusions in the past 24 hours. The discharge database integrates transfusion data and diagnostic billing codes. The denominator database allows for rate calculations by tracking all patients with a hemoglobin test ordered. A set of algorithms is applied to automatically audit RBC transfusions. The transfusions that do not fit the algorithms' rules are manually reviewed. Data from audits are compiled into reports and distributed to medical directors. Data are also used to target education efforts. Since our PBM program began, the percentage of appropriate RBC orders increased from an initial 70%-80% to 90%-95%, and the overall RBC transfusions/1000 patient-days has decreased by 67% in targeted areas of the hospital. Our PBM program has shaved approximately 3% from our hospital's blood budget. Our semiautomated auditing system allows us to quickly and comprehensively analyze and track blood usage throughout our hospital. Using this technology, we have seen improvements in our hospital's PBM. © 2013 American Association of Blood Banks.
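    The abstract above reports rates per 1000 patient-days and an automated audit that passes unremarkable orders and sends the rest to manual review. The audit rules themselves are not given, so the sketch below uses a single placeholder rule (pre-transfusion hemoglobin below a threshold) purely to show the shape of such a pipeline. The field name `hgb`, the threshold value, and all numbers are hypothetical and carry no clinical meaning.

```python
def transfusions_per_1000_patient_days(units, patient_days):
    """Rate calculation of the kind a denominator database makes possible."""
    if patient_days <= 0:
        raise ValueError("patient_days must be positive")
    return 1000.0 * units / patient_days

def percent_decrease(before, after):
    """Relative decrease from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before

def audit(orders, hgb_threshold=7.0):
    """Split orders into (auto_approved, needs_manual_review).

    Placeholder rule: an order passes automatically when the
    pre-transfusion hemoglobin (g/dL) is below hgb_threshold.
    The rule and the threshold are assumptions of this sketch.
    """
    approved, review = [], []
    for order in orders:
        (approved if order["hgb"] < hgb_threshold else review).append(order)
    return approved, review

# Invented example data.
orders = [{"id": 1, "hgb": 6.4}, {"id": 2, "hgb": 8.9}, {"id": 3, "hgb": 6.9}]
approved, review = audit(orders)
print(len(approved), "auto-approved;", len(review), "for manual review")
print(transfusions_per_1000_patient_days(units=50, patient_days=2000))
```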

  1. Data-driven performance evaluation method for CMS RPC trigger ...

    Indian Academy of Sciences (India)

    2012-10-06

    Oct 6, 2012 ... hardware-implemented algorithm, which performs the task of combining and merging information from muon ... Figure 1 shows the comparison of efficiencies obtained with the two methods containing .... [3] The CMS Collaboration, The trigger and data acquisition project, Volume 1, The Level 1. Trigger ...

  2. Data Driven Performance Evaluation of Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Antonio A. F. Loureiro

    2010-03-01

    Wireless Sensor Networks are presented as devices for signal sampling and reconstruction. Within this framework, the qualitative and quantitative influence of (i) signal granularity, (ii) spatial distribution of sensors, (iii) sensors clustering, and (iv) signal reconstruction procedure are assessed. This is done by defining an error metric and performing a Monte Carlo experiment. It is shown that all these factors have significant impact on the quality of the reconstructed signal. The extent of such impact is quantitatively assessed.
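    The abstract above assesses reconstruction quality by defining an error metric and running a Monte Carlo experiment. The sketch below illustrates that general idea for one of the listed factors, the spatial distribution of sensors, on a one-dimensional toy signal. The signal, the nearest-sensor reconstruction, the mean squared error metric, and the two deployment models are all assumptions chosen for brevity; the paper's actual setup is not described in the abstract.

```python
import math
import random

def signal(x):
    """Toy signal on [0, 1]; stands in for the sensed phenomenon."""
    return math.sin(2 * math.pi * x)

def uniform_deploy(n, rng):
    """Sensors placed uniformly at random on [0, 1]."""
    return [rng.random() for _ in range(n)]

def clustered_deploy(n, rng, center=0.5, spread=0.05):
    """Sensors bunched around one point, clipped to [0, 1]."""
    return [min(1.0, max(0.0, rng.gauss(center, spread))) for _ in range(n)]

def reconstruction_error(sensors, n_grid=200):
    """Mean squared error of nearest-sensor reconstruction on a grid."""
    readings = [signal(s) for s in sensors]
    err = 0.0
    for i in range(n_grid):
        x = (i + 0.5) / n_grid
        nearest = min(range(len(sensors)), key=lambda k: abs(sensors[k] - x))
        err += (signal(x) - readings[nearest]) ** 2
    return err / n_grid

def monte_carlo(deploy, n_sensors=20, trials=200, seed=0):
    """Average the error metric over many random deployments."""
    rng = random.Random(seed)
    total = sum(reconstruction_error(deploy(n_sensors, rng)) for _ in range(trials))
    return total / trials

print("uniform  :", monte_carlo(uniform_deploy))
print("clustered:", monte_carlo(clustered_deploy))
```

    Swapping in a different reconstruction rule or signal model would let the same loop probe the other factors the abstract lists.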

  3. A data-driven multiplicative fault diagnosis approach for automation processes.

    Science.gov (United States)

    Hao, Haiyang; Zhang, Kai; Ding, Steven X; Chen, Zhiwen; Lei, Yaguo

    2014-09-01

    This paper presents a new data-driven method for diagnosing multiplicative key performance degradation in automation processes. Different from the well-established additive fault diagnosis approaches, the proposed method aims at identifying those low-level components which increase the variability of process variables and cause performance degradation. Based on process data, features of multiplicative fault are extracted. To identify the root cause, the impact of fault on each process variable is evaluated in the sense of contribution to performance degradation. Then, a numerical example is used to illustrate the functionalities of the method and Monte-Carlo simulation is performed to demonstrate the effectiveness from the statistical viewpoint. Finally, to show the practical applicability, a case study on the Tennessee Eastman process is presented. Copyright © 2013. Published by Elsevier Ltd.

  4. Articulatory Distinctiveness of Vowels and Consonants: A Data-Driven Approach

    Science.gov (United States)

    Wang, Jun; Green, Jordan R.; Samal, Ashok; Yunusova, Yana

    2013-01-01

    Purpose: To quantify the articulatory distinctiveness of 8 major English vowels and 11 English consonants based on tongue and lip movement time series data using a data-driven approach. Method: Tongue and lip movements of 8 vowels and 11 consonants from 10 healthy talkers were

  5. A data-driven approach for retrieving temperatures and abundances in brown dwarf atmospheres

    OpenAIRE

    Line, MR; Fortney, JJ; Marley, MS; Sorahana, S

    2014-01-01

    © 2014. The American Astronomical Society. All rights reserved. Brown dwarf spectra contain a wealth of information about their molecular abundances, temperature structure, and gravity. We present a new data driven retrieval approach, previously used in planetary atmosphere studies, to extract the molecular abundances and temperature structure from brown dwarf spectra. The approach makes few a priori physical assumptions about the state of the atmosphere. The feasibility of the approach is fi...

  6. Using Two Different Approaches to Assess Dietary Patterns: Hypothesis-Driven and Data-Driven Analysis

    Directory of Open Access Journals (Sweden)

    Ágatha Nogueira Previdelli

    2016-09-01

    The use of dietary patterns to assess dietary intake has become increasingly common in nutritional epidemiology studies due to the complexity and multidimensionality of the diet. Currently, two main approaches have been widely used to assess dietary patterns: data-driven and hypothesis-driven analysis. Since the methods explore different angles of dietary intake, using both approaches simultaneously might yield complementary and useful information; thus, we aimed to use both approaches to gain knowledge of adolescents’ dietary patterns. Food intake from a cross-sectional survey with 295 adolescents was assessed by 24 h dietary recall (24HR). In hypothesis-driven analysis, based on the American National Cancer Institute method, the usual intake of Brazilian Healthy Eating Index Revised components were estimated. In the data-driven approach, the usual intake of foods/food groups was estimated by the Multiple Source Method. In the results, hypothesis-driven analysis showed low scores for Whole grains, Total vegetables, Total fruit and Whole fruits, while, in data-driven analysis, fruits and whole grains were not present in any pattern. High intakes of sodium, fats and sugars were observed in hypothesis-driven analysis with low total scores for Sodium, Saturated fat and SoFAA (calories from solid fat, alcohol and added sugar) components in agreement, while the data-driven approach showed the intake of several foods/food groups rich in these nutrients, such as butter/margarine, cookies, chocolate powder, whole milk, cheese, processed meat/cold cuts and candies. In this study, using both approaches at the same time provided consistent and complementary information with regard to assessing the overall dietary habits that will be important in order to drive public health programs, and improve their efficiency to monitor and evaluate the dietary patterns of populations.

  7. A data-driven decomposition approach to model aerodynamic forces on flapping airfoils

    Science.gov (United States)

    Raiola, Marco; Discetti, Stefano; Ianiro, Andrea

    2017-11-01

    In this work, we exploit a data-driven decomposition of experimental data from a flapping airfoil experiment with the aim of isolating the main contributions to the aerodynamic force and obtaining a phenomenological model. Experiments are carried out on a NACA 0012 airfoil in forward flight with both heaving and pitching motion. Velocity measurements of the near field are carried out with Planar PIV while force measurements are performed with a load cell. The phase-averaged velocity fields are transformed into the wing-fixed reference frame, allowing for a description of the field in a domain with fixed boundaries. The decomposition of the flow field is performed by means of the POD applied on the velocity fluctuations and then extended to the phase-averaged force data by means of the Extended POD approach. This choice is justified by the simple consideration that aerodynamic forces determine the largest contributions to the energetic balance in the flow field. Only the first 6 modes have a relevant contribution to the force. A clear relationship can be drawn between the force and the flow field modes. Moreover, the force modes are closely related (yet slightly different) to the contributions of the classic potential models in literature, allowing for their correction. This work has been supported by the Spanish MINECO under Grant TRA2013-41103-P.

  8. A data-driven approach for modeling post-fire debris-flow volumes and their uncertainty

    Science.gov (United States)

    Friedel, Michael J.

    2011-01-01

    This study demonstrates the novel application of genetic programming to evolve nonlinear post-fire debris-flow volume equations from variables associated with a data-driven conceptual model of the western United States. The search space is constrained using a multi-component objective function that simultaneously minimizes root-mean-square and unit errors for the evolution of fittest equations. An optimization technique is then used to estimate the limits of nonlinear prediction uncertainty associated with the debris-flow equations. In contrast to a published three-variable multiple linear regression equation, linking basin area with slopes greater than or equal to 30 percent, burn severity characterized as area burned moderate plus high, and total storm rainfall, the data-driven approach discovers many nonlinear and several dimensionally consistent equations that are unbiased and have less prediction uncertainty. Of the nonlinear equations, the best performance (lowest prediction uncertainty) is achieved when using three variables: average basin slope, total burned area, and total storm rainfall. Further reduction in uncertainty is possible for the nonlinear equations when dimensional consistency is not a priority and by subsequently applying a gradient solver to the fittest solutions. The data-driven modeling approach can be applied to nonlinear multivariate problems in all fields of study.

  9. Practical aspects of data-driven motion correction approach for brain SPECT

    International Nuclear Information System (INIS)

    Kyme, A.Z.; Hutton, B.F.; Hatton, R.L.; Skerrett, D.; Barnden, L.

    2002-01-01

    Patient motion can cause image artifacts in SPECT despite restraining measures. Data-driven detection and correction of motion can be achieved by comparison of acquired data with the forward-projections. By optimising the orientation of a partial reconstruction, parameters can be obtained for each misaligned projection and applied to update this volume using a 3D reconstruction algorithm. Phantom validation was performed to explore practical aspects of this approach. Noisy projection datasets simulating a patient undergoing at least one fully 3D movement during acquisition were compiled from various projections of the digital Hoffman brain phantom. Motion correction was then applied to the reconstructed studies. Correction success was assessed visually and quantitatively. Resilience with respect to subset order and missing data in the reconstruction and updating stages, detector geometry considerations, and the need for implementing an iterated correction were assessed in the process. Effective correction of the corrupted studies was achieved. Visually, artifactual regions in the reconstructed slices were suppressed and/or removed. Typically the ratio of mean square difference between the corrected and reference studies compared to that between the corrupted and reference studies was > 2. Although components of the motions are missed using a single-head implementation, improvement was still evident in the correction. The need for multiple iterations in the approach was small due to the bulk of misalignment errors being corrected in the first pass. Dispersion of subsets for reconstructing and updating the partial reconstruction appears to give optimal correction. Further validation is underway using triple-head physical phantom data. Copyright (2002) The Australian and New Zealand Society of Nuclear Medicine Inc

  10. Probing the dynamics of identified neurons with a data-driven modeling approach.

    Directory of Open Access Journals (Sweden)

    Thomas Nowotny

    2008-07-01

    In controlling animal behavior the nervous system has to perform within the operational limits set by the requirements of each specific behavior. The implications for the corresponding range of suitable network, single neuron, and ion channel properties have remained elusive. In this article we approach the question of how well-constrained properties of neuronal systems may be on the neuronal level. We used large data sets of the activity of isolated invertebrate identified cells and built an accurate conductance-based model for this cell type using customized automated parameter estimation techniques. By direct inspection of the data we found that the variability of the neurons is larger when they are isolated from the circuit than when in the intact system. Furthermore, the responses of the neurons to perturbations appear to be more consistent than their autonomous behavior under stationary conditions. In the developed model, the constraints on different parameters that enforce appropriate model dynamics vary widely from some very tightly controlled parameters to others that are almost arbitrary. The model also allows predictions for the effect of blocking selected ionic currents and to prove that the origin of irregular dynamics in the neuron model is proper chaoticity and that this chaoticity is typical in an appropriate sense. Our results indicate that data driven models are useful tools for the in-depth analysis of neuronal dynamics. The better consistency of responses to perturbations, in the real neurons as well as in the model, suggests a paradigm shift away from measuring autonomous dynamics alone towards protocols of controlled perturbations. Our predictions for the impact of channel blockers on the neuronal dynamics and the proof of chaoticity underscore the wide scope of our approach.

  11. A data-driven approach to identify controls on global fire activity from satellite and climate observations (SOFIA V1)

    Directory of Open Access Journals (Sweden)

    M. Forkel

    2017-12-01

    Vegetation fires affect human infrastructures, ecosystems, global vegetation distribution, and atmospheric composition. However, the climatic, environmental, and socioeconomic factors that control global fire activity in vegetation are only poorly understood, and in various complexities and formulations are represented in global process-oriented vegetation-fire models. Data-driven model approaches such as machine learning algorithms have successfully been used to identify and better understand controlling factors for fire activity. However, such machine learning models cannot be easily adapted or even implemented within process-oriented global vegetation-fire models. To overcome this gap between machine learning-based approaches and process-oriented global fire models, we introduce a new flexible data-driven fire modelling approach here (Satellite Observations to predict FIre Activity, SOFIA approach version 1). SOFIA models can use several predictor variables and functional relationships to estimate burned area that can be easily adapted with more complex process-oriented vegetation-fire models. We created an ensemble of SOFIA models to test the importance of several predictor variables. SOFIA models result in the highest performance in predicting burned area if they account for a direct restriction of fire activity under wet conditions and if they include a land cover-dependent restriction or allowance of fire activity by vegetation density and biomass. The use of vegetation optical depth data from microwave satellite observations, a proxy for vegetation biomass and water content, reaches higher model performance than commonly used vegetation variables from optical sensors. We further analyse spatial patterns of the sensitivity between anthropogenic, climate, and vegetation predictor variables and burned area. We finally discuss how multiple observational datasets on climate, hydrological, vegetation, and socioeconomic variables together with

  12. A data-driven approach to identify controls on global fire activity from satellite and climate observations (SOFIA V1)

    Science.gov (United States)

    Forkel, Matthias; Dorigo, Wouter; Lasslop, Gitta; Teubner, Irene; Chuvieco, Emilio; Thonicke, Kirsten

    2017-12-01

    Vegetation fires affect human infrastructures, ecosystems, global vegetation distribution, and atmospheric composition. However, the climatic, environmental, and socioeconomic factors that control global fire activity in vegetation are only poorly understood, and in various complexities and formulations are represented in global process-oriented vegetation-fire models. Data-driven model approaches such as machine learning algorithms have successfully been used to identify and better understand controlling factors for fire activity. However, such machine learning models cannot be easily adapted or even implemented within process-oriented global vegetation-fire models. To overcome this gap between machine learning-based approaches and process-oriented global fire models, we introduce a new flexible data-driven fire modelling approach here (Satellite Observations to predict FIre Activity, SOFIA approach version 1). SOFIA models can use several predictor variables and functional relationships to estimate burned area that can be easily adapted with more complex process-oriented vegetation-fire models. We created an ensemble of SOFIA models to test the importance of several predictor variables. SOFIA models result in the highest performance in predicting burned area if they account for a direct restriction of fire activity under wet conditions and if they include a land cover-dependent restriction or allowance of fire activity by vegetation density and biomass. The use of vegetation optical depth data from microwave satellite observations, a proxy for vegetation biomass and water content, reaches higher model performance than commonly used vegetation variables from optical sensors. We further analyse spatial patterns of the sensitivity between anthropogenic, climate, and vegetation predictor variables and burned area. We finally discuss how multiple observational datasets on climate, hydrological, vegetation, and socioeconomic variables together with data-driven

  13. Data-driven approach for assessing utility of medical tests using electronic medical records.

    Science.gov (United States)

    Skrøvseth, Stein Olav; Augestad, Knut Magne; Ebadollahi, Shahram

    2015-02-01

    To precisely define the utility of tests in a clinical pathway through data-driven analysis of the electronic medical record (EMR). The information content was defined in terms of the entropy of the expected value of the test related to a given outcome. A kernel density classifier was used to estimate the necessary distributions. To validate the method, we used data from the EMR of the gastrointestinal department at a university hospital. Blood tests from patients undergoing gastrointestinal surgery were analyzed with respect to second surgery within 30 days of the index surgery. The information content is clearly reflected in the patient pathway for certain combinations of tests and outcomes. C-reactive protein tests coupled to anastomosis leakage, a severe complication, show a clear pattern of information gain through the patient trajectory, where the greatest gain from the test is 3-4 days post index surgery. We have defined the information content in a data-driven and information theoretic way such that the utility of a test can be precisely defined. The results reflect clinical knowledge. In the case we used, the tests carry little negative impact. The general approach can be expanded to cases that carry a substantial negative impact, such as in certain radiological techniques. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.
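
    The idea of measuring a test's information content through entropy can be illustrated with a simplified sketch. The code below is not the method of the paper: it replaces the kernel density classifier with plain binning of the test result (an assumption made to keep the example short) and computes the information gain, i.e. the reduction in outcome entropy once the binned test result is known. All names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(test_bins, outcomes):
    """Entropy of the outcome minus its expected entropy given the test bin."""
    groups = defaultdict(list)
    for test_bin, outcome in zip(test_bins, outcomes):
        groups[test_bin].append(outcome)
    n = len(outcomes)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(outcomes) - conditional
```

    Evaluating `information_gain` separately for each post-operative day would give a day-by-day profile of how informative a test is, which is the kind of trajectory the abstract describes for C-reactive protein.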

  14. Data-driven HR how to use analytics and metrics to drive performance

    CERN Document Server

    Marr, Bernard

    2018-01-01

    Traditionally seen as a purely people function unconcerned with numbers, HR is now uniquely placed to use company data to drive performance, both of the people in the organization and the organization as a whole. Data-driven HR is a practical guide which enables HR practitioners to leverage the value of the vast amount of data available at their fingertips. Covering how to identify the most useful sources of data, how to collect information in a transparent way that is in line with data protection requirements and how to turn this data into tangible insights, this book marks a turning point for the HR profession. Covering all the key elements of HR including recruitment, employee engagement, performance management, wellbeing and training, Data-driven HR examines the ways data can contribute to organizational success by, among other things, optimizing processes, driving performance and improving HR decision making. Packed with case studies and real-life examples, this is essential reading for all HR profession...

  15. Combining engineering and data-driven approaches: Development of a generic fire risk model facilitating calibration

    DEFF Research Database (Denmark)

    De Sanctis, G.; Fischer, K.; Kohler, J.

    2014-01-01

    Fire risk models support decision making for engineering problems under the consistent consideration of the associated uncertainties. Empirical approaches can be used for cost-benefit studies when enough data about the decision problem are available. But often the empirical approaches...... a generic risk model that is calibrated to observed fire loss data. Generic risk models assess the risk of buildings based on specific risk indicators and support risk assessment at a portfolio level. After an introduction to the principles of generic risk assessment, the focus of the present paper...... are not detailed enough. Engineering risk models, on the other hand, may be detailed but typically involve assumptions that may result in a biased risk assessment and make a cost-benefit study problematic. In two related papers it is shown how engineering and data-driven modeling can be combined by developing...

  16. DOE High Performance Computing Operational Review (HPCOR): Enabling Data-Driven Scientific Discovery at HPC Facilities

    Energy Technology Data Exchange (ETDEWEB)

    Gerber, Richard; Allcock, William; Beggio, Chris; Campbell, Stuart; Cherry, Andrew; Cholia, Shreyas; Dart, Eli; England, Clay; Fahey, Tim; Foertter, Fernanda; Goldstone, Robin; Hick, Jason; Karelitz, David; Kelly, Kaki; Monroe, Laura; Prabhat; Skinner, David; White, Julia

    2014-10-17

    U.S. Department of Energy (DOE) High Performance Computing (HPC) facilities are on the verge of a paradigm shift in the way they deliver systems and services to science and engineering teams. Research projects are producing a wide variety of data at unprecedented scale and level of complexity, with community-specific services that are part of the data collection and analysis workflow. On June 18-19, 2014 representatives from six DOE HPC centers met in Oakland, CA at the DOE High Performance Operational Review (HPCOR) to discuss how they can best provide facilities and services to enable large-scale data-driven scientific discovery at the DOE national laboratories. The report contains findings from that review.

  17. A data-driven fault-tolerant control design of linear multivariable systems with performance optimization.

    Science.gov (United States)

    Li, Zhe; Yang, Guang-Hong

    2017-09-01

    In this paper, an integrated data-driven fault-tolerant control (FTC) design scheme is proposed under the configuration of the Youla parameterization for multiple-input multiple-output (MIMO) systems. With unknown system model parameters, the canonical form identification technique is first applied to design the residual observer in fault-free case. In faulty case, with online tuning of the Youla parameters based on the system data via the gradient-based algorithm, the fault influence is attenuated with system performance optimization. In addition, to improve the robustness of the residual generator to a class of system deviations, a novel adaptive scheme is proposed for the residual generator to prevent its over-activation. Simulation results of a two-tank flow system demonstrate the optimized performance and effect of the proposed FTC scheme. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.

  18. A Data-Driven Frequency-Domain Approach for Robust Controller Design via Convex Optimization

    CERN Document Server

    AUTHOR|(CDS)2092751; Martino, Michele

    The objective of this dissertation is to develop data-driven frequency-domain methods for designing robust controllers through the use of convex optimization algorithms. Many of today's industrial processes are becoming more complex, and deriving accurate physical models for these plants from first principles may be impossible. Even when a model is available, it may be too complex to use for controller design. With the increased developments in the computing world, large amounts of measured data can be easily collected and stored for processing purposes. Data can also be collected and used in an on-line fashion. Thus it would be very sensible to make full use of this data for controller design, performance evaluation, and stability analysis. The design methods proposed in this work ensure that the dynamics of a system are captured in an experiment and avoid the problem of unmodeled dynamics associated with parametric models. The devised methods consider robust designs...

  19. A Data-Driven Reliability Estimation Approach for Phased-Mission Systems

    Directory of Open Access Journals (Sweden)

    Hua-Feng He

    2014-01-01

    We attempt to address the issues associated with reliability estimation for phased-mission systems (PMS) and present a novel data-driven approach to achieve reliability estimation for PMS using the condition monitoring information and degradation data of such a system under a dynamic operating scenario. In this sense, this paper differs from the existing methods, which consider only the static scenario without using real-time information and aim to estimate the reliability for a population rather than for an individual. In the presented approach, to establish a linkage between the historical data and real-time information of the individual PMS, we adopt a stochastic filtering model to model the phase duration and obtain the updated estimate of the mission time by Bayes' law at each phase. Meanwhile, the lifetime of the PMS is estimated from degradation data, which are modeled by an adaptive Brownian motion. As such, the mission reliability can be obtained in real time through the estimated distribution of the mission time in conjunction with the estimated lifetime distribution. We demonstrate the usefulness of the developed approach via a numerical example.
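
    The last step of this abstract, combining a mission-time distribution with a lifetime distribution, can be written compactly. The expressions below are an illustrative sketch under assumptions that the abstract does not state: the mission time T_M and the lifetime L are taken to be independent, and the degradation is simplified to a Brownian motion with constant drift \mu > 0 and diffusion \sigma starting at x_0, with failure defined as the first crossing of a threshold w (all symbols are introduced here for illustration).

```latex
% Mission reliability: probability that the system outlives the mission
R \;=\; \Pr(L > T_M) \;=\; \int_0^{\infty} \Pr(L > t)\, f_{T_M}(t)\,\mathrm{d}t

% Lifetime density for X(t) = x_0 + \mu t + \sigma B(t) hitting w > x_0
% for the first time (inverse Gaussian first-passage density)
f_L(t) \;=\; \frac{w - x_0}{\sqrt{2\pi \sigma^2 t^3}}
        \exp\!\left( -\frac{(w - x_0 - \mu t)^2}{2 \sigma^2 t} \right),
        \qquad t > 0
```

    In an adaptive scheme such as the one the abstract mentions, the drift would be re-estimated as new degradation data arrive, so both f_L and R would be updated at each phase.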

  20. Estimating the Probability of Wind Ramping Events: A Data-driven Approach

    OpenAIRE

    Wang, Cheng; Wei, Wei; Wang, Jianhui; Qiu, Feng

    2016-01-01

    This letter proposes a data-driven method for estimating the probability of wind ramping events without exploiting the exact probability distribution function (PDF) of wind power. Actual wind data validates the proposed method.

  1. A data-driven approach for evaluating multi-modal therapy in traumatic brain injury.

    Science.gov (United States)

    Haefeli, Jenny; Ferguson, Adam R; Bingham, Deborah; Orr, Adrienne; Won, Seok Joon; Lam, Tina I; Shi, Jian; Hawley, Sarah; Liu, Jialing; Swanson, Raymond A; Massa, Stephen M

    2017-02-16

    Combination therapies targeting multiple recovery mechanisms have the potential for additive or synergistic effects, but experimental design and analyses of multimodal therapeutic trials are challenging. To address this problem, we developed a data-driven approach to integrate and analyze raw source data from separate pre-clinical studies and evaluated interactions between four treatments following traumatic brain injury. Histologic and behavioral outcomes were measured in 202 rats treated with combinations of an anti-inflammatory agent (minocycline), a neurotrophic agent (LM11A-31), and physical therapy consisting of assisted exercise with or without botulinum toxin-induced limb constraint. Data was curated and analyzed in a linked workflow involving non-linear principal component analysis followed by hypothesis testing with a linear mixed model. Results revealed significant benefits of the neurotrophic agent LM11A-31 on learning and memory outcomes after traumatic brain injury. In addition, modulations of LM11A-31 effects by co-administration of minocycline and by the type of physical therapy applied reached statistical significance. These results suggest a combinatorial effect of drug and physical therapy interventions that was not evident by univariate analysis. The study designs and analytic techniques applied here form a structured, unbiased, internally validated workflow that may be applied to other combinatorial studies, both in animals and humans.

  2. A data-driven predictive approach for drug delivery using machine learning techniques.

    Directory of Open Access Journals (Sweden)

    Yuanyuan Li

    In drug delivery, there is often a trade-off between effective killing of the pathogen, and harmful side effects associated with the treatment. Due to the difficulty in testing every dosing scenario experimentally, a computational approach will be helpful to assist with the prediction of effective drug delivery methods. In this paper, we have developed a data-driven predictive system, using machine learning techniques, to determine, in silico, the effectiveness of drug dosing. The system framework is scalable, autonomous, robust, and has the ability to predict the effectiveness of the current drug treatment and the subsequent drug-pathogen dynamics. The system consists of a dynamic model incorporating both the drug concentration and pathogen population into distinct states. These states are then analyzed using a temporal model to describe the drug-cell interactions over time. The dynamic drug-cell interactions are learned in an adaptive fashion and used to make sequential predictions on the effectiveness of the dosing strategy. Incorporated into the system is the ability to adjust the sensitivity and specificity of the learned models based on a threshold level determined by the operator for the specific application. As a proof-of-concept, the system was validated experimentally using the pathogen Giardia lamblia and the drug metronidazole in vitro.

  3. Analyzing the Discourse of Chais Conferences for the Study of Innovation and Learning Technologies via a Data-Driven Approach

    Science.gov (United States)

    Silber-Varod, Vered; Eshet-Alkalai, Yoram; Geri, Nitza

    2016-01-01

    The current rapid technological changes confront researchers of learning technologies with the challenge of evaluating them, predicting trends, and improving their adoption and diffusion. This study utilizes a data-driven discourse analysis approach, namely culturomics, to investigate changes over time in the research of learning technologies. The…

  4. A data-driven approach for denoising GNSS position time series

    Science.gov (United States)

    Li, Yanyan; Xu, Caijun; Yi, Lei; Fang, Rongxin

    2017-12-01

    Global navigation satellite system (GNSS) datasets suffer from common mode error (CME) and other unmodeled errors. To decrease the noise level in GNSS positioning, we propose a new data-driven adaptive multiscale denoising method in this paper. Both synthetic and real-world long-term GNSS datasets were employed to assess the performance of the proposed method, and its results were compared with those of stacking filtering, principal component analysis (PCA) and the recently developed multiscale multiway PCA. It is found that the proposed method can significantly eliminate the high-frequency white noise and remove the low-frequency CME. Furthermore, the proposed method is more precise for denoising GNSS signals than the other denoising methods. For example, in the real-world example, our method reduces the mean standard deviation of the north, east and vertical components from 1.54 to 0.26, 1.64 to 0.21 and 4.80 to 0.72 mm, respectively. Noise analysis indicates that for the original signals, a combination of power-law plus white noise model can be identified as the best noise model. For the filtered time series using our method, the generalized Gauss-Markov model is the best noise model with the spectral indices close to - 3, indicating that flicker walk noise can be identified. Moreover, the common mode error in the unfiltered time series is significantly reduced by the proposed method. After filtering with our method, a combination of power-law plus white noise model is the best noise model for the CMEs in the study region.
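
    For orientation, the simplest of the baseline methods named in this abstract, stacking filtering, can be sketched as follows. This is not the proposed adaptive multiscale method, whose details the abstract does not give. The sketch assumes detrended residual time series arranged as an epochs-by-stations array, estimates the common mode error (CME) at each epoch as the unweighted mean over stations, and subtracts it. Names and the array layout are assumptions.

```python
import numpy as np

def stacking_filter(residuals):
    """Remove common mode error by unweighted regional stacking.

    residuals: array (n_epochs, n_stations) of detrended position
               residuals in mm for one component; NaN marks missing data.
    Returns (filtered residuals, estimated CME per epoch).
    """
    cme = np.nanmean(residuals, axis=1)      # stack across stations per epoch
    filtered = residuals - cme[:, None]      # subtract the common signal
    return filtered, cme
```

    Stacking assumes the common mode is spatially uniform across the network, which is one reason PCA-based and multiscale alternatives such as those compared in the abstract are used for larger regions.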

  5. Geoscience Meets Social Science: A Flexible Data Driven Approach for Developing High Resolution Population Datasets at Global Scale

    Science.gov (United States)

    Rose, A.; McKee, J.; Weber, E.; Bhaduri, B. L.

    2017-12-01

    Leveraging decades of expertise in population modeling, and in response to growing demand for higher resolution population data, Oak Ridge National Laboratory is now generating LandScan HD at global scale. LandScan HD is conceived as a 90m resolution population distribution where modeling is tailored to the unique geography and data conditions of individual countries or regions by combining social, cultural, physiographic, and other information with novel geocomputation methods. Similarities among these areas are exploited in order to leverage existing training data and machine learning algorithms to rapidly scale development. Drawing on ORNL's unique set of capabilities, LandScan HD adapts highly mature population modeling methods developed for LandScan Global and LandScan USA, settlement mapping research and production in high-performance computing (HPC) environments, land use and neighborhood mapping through image segmentation, and facility-specific population density models. Adopting a flexible methodology to accommodate different geographic areas, LandScan HD accounts for the availability, completeness, and level of detail of relevant ancillary data. Beyond core population and mapped settlement inputs, these factors determine the model complexity for an area, requiring that for any given area, a data-driven model could support either a simple top-down approach, a more detailed bottom-up approach, or a hybrid approach.

  6. Enhanced dynamic data-driven fault detection approach: Application to a two-tank heater system

    KAUST Repository

    Harrou, Fouzi; Madakyaru, Muddu; Sun, Ying; Kammammettu, Sanjula

    2018-01-01

    on PCA approach a challenging task. Accounting for the dynamic nature of data can also reflect the performance of the designed fault detection approaches. In PCA-based methods, this dynamic characteristic of the data can be accounted for by using dynamic

  7. A Data-Driven Control Design Approach for Freeway Traffic Ramp Metering with Virtual Reference Feedback Tuning

    Directory of Open Access Journals (Sweden)

    Shangtai Jin

    2014-01-01

    ALINEA is a simple, efficient, and easily implemented ramp metering strategy. Virtual reference feedback tuning (VRFT) is well suited to many practical systems since it is a “one-shot” data-driven control design methodology. This paper presents an application of VRFT to a ramp metering problem of a freeway traffic system. When there is not enough prior knowledge of the controlled system to select a proper parameter for ALINEA, the VRFT approach is used to optimize ALINEA's parameter using only a batch of input and output data collected from the freeway traffic system. Extensive simulations are built on both the macroscopic MATLAB platform and the microscopic PARAMICS platform to show the effectiveness and applicability of the proposed data-driven controller tuning approach.
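
    A rough sketch of how a one-shot tuning of ALINEA's single gain might look is given below. It is an illustration under stated assumptions rather than the procedure of the paper: ALINEA is written in its usual integral form, rate(k) = rate(k-1) + gain * (target occupancy - measured occupancy); the desired closed-loop behaviour is taken to be a first-order reference model y(k+1) = a*y(k) + (1-a)*r(k) chosen by the designer; and the prefilter commonly used in VRFT is omitted. Function names and the reference model are hypothetical.

```python
def alinea_step(prev_rate, occupancy, target_occupancy, gain):
    """One ALINEA update: integrate the occupancy error into the ramp rate."""
    return prev_rate + gain * (target_occupancy - occupancy)

def vrft_alinea_gain(rates, occupancies, a):
    """Least-squares estimate of the ALINEA gain from one batch of data.

    rates:       measured ramp metering rates u(0..N-1)
    occupancies: measured downstream occupancies y(0..N-1)
    a:           pole of the assumed reference model, 0 <= a < 1
    """
    num = 0.0
    den = 0.0
    for k in range(1, len(occupancies) - 1):
        # Virtual reference: the set point that would have produced
        # y(k+1) from y(k) if the loop behaved like the reference model.
        virtual_ref = (occupancies[k + 1] - a * occupancies[k]) / (1.0 - a)
        virtual_err = virtual_ref - occupancies[k]
        delta_rate = rates[k] - rates[k - 1]
        # Fit delta_rate ~ gain * virtual_err in the least-squares sense.
        num += delta_rate * virtual_err
        den += virtual_err ** 2
    return num / den
```

    The fit uses only recorded input and output data and no traffic model, which is the sense in which the approach is "one-shot" and data-driven.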

  8. A data-driven approach to reverse engineering customer engagement models: towards functional constructs.

    Directory of Open Access Journals (Sweden)

    Natalie Jane de Vries

    Online consumer behavior in general and online customer engagement with brands in particular, has become a major focus of research activity fuelled by the exponential increase of interactive functions of the internet and social media platforms and applications. Current research in this area is mostly hypothesis-driven and much debate about the concept of Customer Engagement and its related constructs remains existent in the literature. In this paper, we aim to propose a novel methodology for reverse engineering a consumer behavior model for online customer engagement, based on a computational and data-driven perspective. This methodology could be generalized and prove useful for future research in the fields of consumer behaviors using questionnaire data or studies investigating other types of human behaviors. The method we propose contains five main stages; symbolic regression analysis, graph building, community detection, evaluation of results and finally, investigation of directed cycles and common feedback loops. The 'communities' of questionnaire items that emerge from our community detection method form possible 'functional constructs' inferred from data rather than assumed from literature and theory. Our results show consistent partitioning of questionnaire items into such 'functional constructs' suggesting the method proposed here could be adopted as a new data-driven way of human behavior modeling.

  9. A data-driven approach to reverse engineering customer engagement models: towards functional constructs.

    Science.gov (United States)

    de Vries, Natalie Jane; Carlson, Jamie; Moscato, Pablo

    2014-01-01

    Online consumer behavior in general and online customer engagement with brands in particular, has become a major focus of research activity fuelled by the exponential increase of interactive functions of the internet and social media platforms and applications. Current research in this area is mostly hypothesis-driven and much debate about the concept of Customer Engagement and its related constructs remains existent in the literature. In this paper, we aim to propose a novel methodology for reverse engineering a consumer behavior model for online customer engagement, based on a computational and data-driven perspective. This methodology could be generalized and prove useful for future research in the fields of consumer behaviors using questionnaire data or studies investigating other types of human behaviors. The method we propose contains five main stages; symbolic regression analysis, graph building, community detection, evaluation of results and finally, investigation of directed cycles and common feedback loops. The 'communities' of questionnaire items that emerge from our community detection method form possible 'functional constructs' inferred from data rather than assumed from literature and theory. Our results show consistent partitioning of questionnaire items into such 'functional constructs' suggesting the method proposed here could be adopted as a new data-driven way of human behavior modeling.

  10. Data-driven technology for engineering systems health management design approach, feature construction, fault diagnosis, prognosis, fusion and decisions

    CERN Document Server

    Niu, Gang

    2017-01-01

    This book introduces condition-based maintenance (CBM)/data-driven prognostics and health management (PHM) in detail, first explaining the PHM design approach from a systems engineering perspective, then summarizing and elaborating on the data-driven methodology for feature construction, as well as feature-based fault diagnosis and prognosis. The book includes a wealth of illustrations and tables to help explain the algorithms, as well as practical examples showing how to use this tool to solve situations for which analytic solutions are poorly suited. It equips readers to apply the concepts discussed in order to analyze and solve a variety of problems in PHM system design, feature construction, fault diagnosis and prognosis.

  11. Enhanced dynamic data-driven fault detection approach: Application to a two-tank heater system

    KAUST Repository

    Harrou, Fouzi

    2018-02-12

    Principal components analysis (PCA) has been intensively studied and used in monitoring industrial systems. However, data generated from chemical processes are usually correlated in time due to process dynamics, which makes the fault detection based on PCA approach a challenging task. Accounting for the dynamic nature of data can also reflect the performance of the designed fault detection approaches. In PCA-based methods, this dynamic characteristic of the data can be accounted for by using dynamic PCA (DPCA), in which lagged variables are used in the PCA model to capture the time evolution of the process. This paper presents a new approach that combines the DPCA to account for autocorrelation in data and generalized likelihood ratio (GLR) test to detect faults. A DPCA model is applied to perform dimension reduction while appropriately considering the temporal relationships in the data. Specifically, the proposed approach uses the DPCA to generate residuals, and then apply GLR test to reveal any abnormality. The performances of the proposed method are evaluated through a continuous stirred tank heater system.
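
    The pipeline outlined in this abstract (lagged variables, PCA residuals, then a likelihood-ratio test) can be sketched as below. This is a simplified illustration, not the authors' implementation: it assumes the residuals are approximately Gaussian with a common variance estimated from fault-free training data, in which case the GLR statistic for an unknown mean shift in a single residual vector reduces to its squared norm divided by that variance. Names, the number of lags and the number of components are hypothetical choices.

```python
import numpy as np

def lagged_matrix(x, n_lags):
    """Stack each sample with its n_lags predecessors: [x(t), ..., x(t-n_lags)]."""
    n = x.shape[0]
    return np.hstack([x[n_lags - l : n - l] for l in range(n_lags + 1)])

def fit_dpca(x_train, n_lags, n_components):
    """Fit a dynamic PCA model on fault-free data of shape (n_samples, n_vars)."""
    z = lagged_matrix(x_train, n_lags)
    mean, std = z.mean(axis=0), z.std(axis=0)
    zs = (z - mean) / std
    _, _, vt = np.linalg.svd(zs, full_matrices=False)
    loadings = vt[:n_components].T                 # retained principal directions
    resid = zs - zs @ loadings @ loadings.T        # part not explained by the model
    return {"mean": mean, "std": std, "loadings": loadings,
            "n_lags": n_lags, "resid_var": resid.var()}

def glr_statistic(model, x_new):
    """GLR statistic per sample for a mean shift in the DPCA residuals."""
    z = (lagged_matrix(x_new, model["n_lags"]) - model["mean"]) / model["std"]
    p = model["loadings"]
    resid = z - z @ p @ p.T
    return (resid ** 2).sum(axis=1) / model["resid_var"]
```

    A fault would be declared when the statistic exceeds a threshold chosen for a desired false-alarm rate on fault-free data; that threshold is left to the user here.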

  12. A Dynamic Remote Sensing Data-Driven Approach for Oil Spill Simulation in the Sea

    Directory of Open Access Journals (Sweden)

    Jining Yan

    2015-05-01

    In view of the fact that oil spill remote sensing could only generate the oil slick information at a specific time and that traditional oil spill simulation models were not designed to deal with dynamic conditions, a dynamic data-driven application system (DDDAS) was introduced. The DDDAS entails both the ability to incorporate additional data into an executing application and, in reverse, the ability of applications to dynamically steer the measurement process. Based on the DDDAS, combining a remote sensor system that detects oil spills with a numerical simulation, an integrated data processing, analysis, forecasting and emergency response system was established. Once an oil spill accident occurs, the DDDAS-based oil spill model receives information about the oil slick extracted from the dynamic remote sensor data in the simulation. Through comparison, information fusion and feedback updates, continuous and more precise oil spill simulation results can be obtained. Then, the simulation results can provide help for disaster control and clean-up. The Penglai, Xingang and Suizhong oil spill results showed our simulation model could increase the prediction accuracy and reduce the error caused by empirical parameters in existing simulation systems. Therefore, the DDDAS-based detection and simulation system can effectively improve oil spill simulation and diffusion forecasting, as well as provide decision-making information and technical support for emergency responses to oil spills.

  13. Alaska/Yukon Geoid Improvement by a Data-Driven Stokes's Kernel Modification Approach

    Science.gov (United States)

    Li, Xiaopeng; Roman, Daniel R.

    2015-04-01

    Geoid modeling over Alaska (USA) and Yukon (Canada), a trans-national issue, faces a great challenge primarily due to the inhomogeneous surface gravity data (Saleh et al, 2013) and the dynamic geology (Freymueller et al, 2008) as well as its complex geological rheology. A previous study (Roman and Li 2014) used updated satellite models (Bruinsma et al 2013) and newly acquired aerogravity data from the GRAV-D project (Smith 2007) to capture the gravity field changes in the target areas, primarily in the middle-to-long wavelengths. In CONUS, the geoid model was largely improved. However, the precision of the resulting geoid model in Alaska was still at the decimeter level: 19 cm at the 32 tide bench marks and 24 cm on the 202 GPS/Leveling bench marks, giving a total of 23.8 cm at all of these calibrated surface control points, where the datum bias was removed. Conventional kernel modification methods in this area (Li and Wang 2011) had limited effects on improving the precision of the geoid models. To compensate for the geoid misfits, a new Stokes's kernel modification method based on a data-driven technique is presented in this study. First, the method was tested on simulated data sets (Fig. 1), where the geoid errors were reduced by 2 orders of magnitude (Fig. 2). For the real data sets, some iteration steps are required to overcome the rank deficiency problem caused by the limited control data that are irregularly distributed in the target area. For instance, after 3 iterations, the standard deviation dropped by about 2.7 cm (Fig. 3). Modification at other critical degrees can further minimize the geoid model misfits caused either by the gravity error or the remaining datum error in the control points.

  14. Least squares approach for initial data recovery in dynamic data-driven applications simulations

    KAUST Repository

    Douglas, C.

    2010-12-01

    In this paper, we consider the initial data recovery and the solution update based on the local measured data that are acquired during simulations. Each time new data is obtained, the initial condition, which is a representation of the solution at a previous time step, is updated. The update is performed using the least squares approach. The objective function is set up based on both a measurement error as well as a penalization term that depends on the prior knowledge about the solution at previous time steps (or initial data). Various numerical examples are considered, where the penalization term is varied during the simulations. Numerical examples demonstrate that the predictions are more accurate if the initial data are updated during the simulations. © Springer-Verlag 2011.
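
    A minimal sketch of a penalized least-squares update of this kind is given below. It rests on simplifying assumptions that are not from the paper: measurements observe individual components of the initial data directly, and the penalty is a single scalar weight. The function name `update_initial_data` is hypothetical.

```python
def update_initial_data(prior, measurements, penalty):
    """Minimize  sum over measured i of (x_i - y_i)^2  +  penalty * sum over all i of (x_i - prior_i)^2.

    prior: list with the prior estimate of the initial data.
    measurements: dict mapping a component index to the value measured there.
    penalty: positive weight on staying close to the prior.
    """
    if penalty <= 0:
        raise ValueError("penalty must be positive")
    updated = []
    for i, p in enumerate(prior):
        if i in measurements:
            # Setting the derivative to zero gives a weighted average of data and prior.
            updated.append((measurements[i] + penalty * p) / (1.0 + penalty))
        else:
            # No data at this component: the penalty term keeps the prior value.
            updated.append(p)
    return updated


prior = [0.0, 0.0, 0.0]
measured = {1: 2.0}  # hypothetical local measurement at component 1
weak = update_initial_data(prior, measured, penalty=1.0)
strong = update_initial_data(prior, measured, penalty=3.0)
```

    With a larger penalty the update stays closer to the prior, which is the trade-off that varying the penalization term during a simulation explores.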

  15. Using data-driven approach for wind power prediction: A comparative study

    International Nuclear Information System (INIS)

    Taslimi Renani, Ehsan; Elias, Mohamad Fathi Mohamad; Rahim, Nasrudin Abd.

    2016-01-01

    Highlights: • Double exponential smoothing is the most accurate model in wind speed prediction. • A two-stage feature selection method is proposed to select the most important inputs. • Direct prediction shows better accuracy than indirect prediction. • Adaptive neuro fuzzy inference system outperforms data mining algorithms. • Random forest performs the worst compared to the other data mining algorithms. - Abstract: Although wind energy is intermittent and stochastic in nature, it is increasingly important in power generation owing to its sustainability and lack of pollution. Increased utilization of wind energy sources calls for more robust and efficient prediction models to mitigate uncertainties associated with wind power. This research compares two different approaches to wind power forecasting: indirect and direct prediction methods. In the indirect method, several time series models are applied to forecast the wind speed, and the logistic function with five parameters is then used to forecast the wind power. In this study, a backtracking search algorithm with novel crossover and mutation operators is employed to find the best parameters of the five-parameter logistic function. A new feature selection technique, combining mutual information and a neural network, is proposed in this paper to extract the most informative features with maximum relevancy and minimum redundancy. From the comparative study, the results demonstrate that, in the direct prediction approach where the historical weather data are used to predict the wind power generation directly, adaptive neuro fuzzy inference system outperforms five data mining algorithms, namely random forest, M5Rules, k-nearest neighbor, support vector machine and multilayer perceptron. Moreover, it is also found that the mean absolute percentage error of the direct prediction method using adaptive neuro fuzzy inference system is 1.47% which is approximately less than half of the error obtained with the
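
    For readers unfamiliar with the two quantities named above, the sketch below shows double exponential smoothing in Holt's two-parameter form together with the mean absolute percentage error. The abstract does not say which variant of double exponential smoothing the study used, so Holt's form is an assumption here, and the wind speed values and smoothing weights are made up for illustration.

```python
def holt_forecast(series, alpha, beta, horizon=1):
    """Holt's linear (double) exponential smoothing; returns the forecast `horizon` steps ahead."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = series[0]
    trend = series[1] - series[0]
    for x in series[1:]:
        previous_level = level
        # Smooth the level, then smooth the change in level to track the trend.
        level = alpha * x + (1.0 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1.0 - beta) * trend
    return level + horizon * trend


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical wind speeds (m/s) with a steady upward trend.
speeds = [2.0, 4.0, 6.0, 8.0]
next_speed = holt_forecast(speeds, alpha=0.5, beta=0.5)
error_percent = mape([10.0], [9.0])
```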

  16. Analyzing the Discourse of Chais Conferences for the Study of Innovation and Learning Technologies via a Data-Driven Approach

    Directory of Open Access Journals (Sweden)

    Vered Silber-Varod

    2016-12-01

    The current rapid technological changes confront researchers of learning technologies with the challenge of evaluating them, predicting trends, and improving their adoption and diffusion. This study utilizes a data-driven discourse analysis approach, namely culturomics, to investigate changes over time in the research of learning technologies. The patterns and changes were examined on a corpus of articles published over the past decade (2006-2014) in the proceedings of Chais Conference for the Study of Innovation and Learning Technologies, the leading research conference on learning technologies in Israel. The interesting findings of the exhaustive process of analyzing all the words in the corpus were that the most commonly used terms (e.g., pupil, teacher, student) and the most commonly used phrases (e.g., face-to-face) in the field of learning technologies reflect a pedagogical rather than a technological aspect of learning technologies. The study also demonstrates two cases of change over time in prominent themes, such as “Facebook” and “the National Information and Communication Technology (ICT) program”. Methodologically, this research demonstrates the effectiveness of a data-driven approach for identifying discourse trends over time.

  17. Prognostic and health management for engineering systems: a review of the data-driven approach and algorithms

    Directory of Open Access Journals (Sweden)

    Thamo Sutharssan

    2015-07-01

    Prognostics and health management (PHM) has become an important component of many engineering systems and products, where algorithms are used to detect anomalies, diagnose faults and predict remaining useful lifetime (RUL). PHM can provide many advantages to users and maintainers. Although the primary goals are to ensure safety, report the state of health and estimate the RUL of components and systems, there are also financial benefits such as operational and maintenance cost reductions and extended lifetime. This study aims at reviewing the current status of algorithms and methods used to underpin different existing PHM approaches. The focus is on providing a structured and comprehensive classification of the existing state-of-the-art PHM approaches, data-driven approaches and algorithms.

  18. A data-driven wavelet-based approach for generating jumping loads

    Science.gov (United States)

    Chen, Jun; Li, Guo; Racic, Vitomir

    2018-06-01

    This paper suggests an approach to generate human jumping loads using wavelet transform and a database of individual jumping force records. A total of 970 individual jumping force records of various frequencies were first collected in three experiments from 147 test subjects. For each record, every jumping pulse was extracted and decomposed into seven levels by wavelet transform. All the decomposition coefficients were stored in an information database. Probability distributions of jumping cycle period, contact ratio and energy of the jumping pulse were statistically analyzed. Inspired by the theory of DNA recombination, an approach was developed by interchanging the wavelet coefficients between different jumping pulses. To generate a jumping force time history with N pulses, wavelet coefficients were first selected randomly from the database at each level. They were then used to reconstruct N pulses by the inverse wavelet transform. Jumping cycle periods and contact ratios were then generated randomly based on their probabilistic functions. These parameters were assigned to each of the N pulses, which were in turn scaled by the amplitude factors βi to account for the energy relationship between successive pulses. The final jumping force time history was obtained by linking all the N cycles end to end. This simulation approach can preserve the non-stationary features of the jumping load force in the time-frequency domain. Application indicates that this approach can be used to generate the jumping force time history due to a single person jumping and can also be extended to stochastic jumping loads due to groups and crowds.
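
    The coefficient-interchange step can be illustrated with a small sketch. The abstract does not name the wavelet family, so a Haar transform is swapped in here purely for simplicity, with two levels instead of seven; the pulse values and the function names are hypothetical, and the scaling by period, contact ratio and amplitude factors is left out.

```python
import math
import random

SQRT2 = math.sqrt(2.0)


def haar_decompose(signal, levels):
    """Return [approximation, coarsest detail, ..., finest detail] of a Haar transform."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) % 2:
            raise ValueError("signal length must be divisible by 2**levels")
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / SQRT2 for a, b in pairs])
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return [approx] + details[::-1]


def haar_reconstruct(coeffs):
    """Invert haar_decompose."""
    approx = list(coeffs[0])
    for detail in coeffs[1:]:
        rebuilt = []
        for a, d in zip(approx, detail):
            rebuilt.append((a + d) / SQRT2)
            rebuilt.append((a - d) / SQRT2)
        approx = rebuilt
    return approx


def recombine(database, rng):
    """Build a new coefficient set by drawing each level from a randomly chosen record."""
    n_levels = len(database[0])
    return [list(rng.choice(database)[level]) for level in range(n_levels)]


# Hypothetical database of two measured pulses (8 samples each).
pulses = [
    [0.0, 1.0, 4.0, 9.0, 9.0, 4.0, 1.0, 0.0],
    [0.0, 2.0, 5.0, 7.0, 8.0, 5.0, 2.0, 0.0],
]
database = [haar_decompose(p, levels=2) for p in pulses]
new_pulse = haar_reconstruct(recombine(database, random.Random(0)))
```

    Because every level is drawn from a real record, the synthetic pulse mixes the coarse shape of one jump with the fine detail of another.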

  19. Monitoring a robot swarm using a data-driven fault detection approach

    KAUST Repository

    Khaldi, Belkacem; Harrou, Fouzi; Cherif, Foudil; Sun, Ying

    2017-01-01

    Using a swarm robotics system, with one or more faulty robots, to accomplish specific tasks may lead to degradation in performances complying with the target requirements. In such circumstances, robot swarms require continuous monitoring to detect

  20. A Data-Driven Sparse-Learning Approach to Model Reduction in Chemical Reaction Networks

    OpenAIRE

    Harirchi, Farshad; Khalil, Omar A.; Liu, Sijia; Elvati, Paolo; Violi, Angela; Hero, Alfred O.

    2017-01-01

    In this paper, we propose an optimization-based sparse learning approach to identify the set of most influential reactions in a chemical reaction network. This reduced set of reactions is then employed to construct a reduced chemical reaction mechanism, which is relevant to chemical interaction network modeling. The problem of identifying influential reactions is first formulated as a mixed-integer quadratic program, and then a relaxation method is leveraged to reduce the computational comple...

  1. Monitoring a robot swarm using a data-driven fault detection approach

    KAUST Repository

    Khaldi, Belkacem

    2017-06-30

    Using a swarm robotics system, with one or more faulty robots, to accomplish specific tasks may lead to degradation in performances complying with the target requirements. In such circumstances, robot swarms require continuous monitoring to detect abnormal events and to sustain normal operations. In this paper, an innovative exogenous fault detection method for monitoring robot swarms is presented. The method merges the flexibility of principal component analysis (PCA) models and the greater sensitivity of the exponentially-weighted moving average (EWMA) and cumulative sum (CUSUM) control charts to insidious changes. The method is tested and evaluated on a swarm of simulated foot-bot robots performing a circle formation task, via the viscoelastic control model. We illustrate through simulated data collected from the ARGoS simulator that a significant improvement in fault detection can be obtained by using the proposed method when compared to the conventional PCA-based methods (i.e., T2 and Q).
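
    To show why EWMA and CUSUM charts are sensitive to small, persistent changes, the sketch below applies the textbook forms of both statistics to a made-up residual sequence. It is an illustration only: the residuals, smoothing weight, slack value and threshold are assumptions, and the PCA model that would produce the residuals is omitted.

```python
def ewma(values, smoothing, start=0.0):
    """EWMA statistic z_t = smoothing * x_t + (1 - smoothing) * z_{t-1}."""
    z = start
    out = []
    for x in values:
        z = smoothing * x + (1.0 - smoothing) * z
        out.append(z)
    return out


def cusum_upper(values, target, slack):
    """One-sided upper CUSUM S_t = max(0, S_{t-1} + x_t - target - slack)."""
    s = 0.0
    out = []
    for x in values:
        s = max(0.0, s + x - target - slack)
        out.append(s)
    return out


def first_alarm(statistic, threshold):
    """Index of the first sample whose statistic exceeds the threshold, or None."""
    for i, value in enumerate(statistic):
        if value > threshold:
            return i
    return None


# Hypothetical residuals: fault-free at first, then a small persistent shift of 0.5.
residuals = [0.0, 0.0, 0.0, 0.0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
ewma_stat = ewma(residuals, smoothing=0.5)
cusum_stat = cusum_upper(residuals, target=0.0, slack=0.25)
alarm_index = first_alarm(cusum_stat, threshold=1.0)
```

    In this toy sequence no single residual ever exceeds 1.0, yet the CUSUM statistic accumulates the small shift and crosses that threshold a few samples after the fault begins.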

  2. A manifold learning approach to data-driven computational materials and processes

    Science.gov (United States)

    Ibañez, Ruben; Abisset-Chavanne, Emmanuelle; Aguado, Jose Vicente; Gonzalez, David; Cueto, Elias; Duval, Jean Louis; Chinesta, Francisco

    2017-10-01

    Standard simulation in classical mechanics is based on the use of two very different types of equations. The first one, of axiomatic character, is related to balance laws (momentum, mass, energy, …), whereas the second one consists of models that scientists have extracted from collected, natural or synthetic data. In this work we propose a new method, able to directly link data to computers in order to perform numerical simulations. These simulations will employ universal laws while minimizing the need of explicit, often phenomenological, models. They are based on manifold learning methodologies.

  3. A Data-Driven Approach to Responder Subgroup Identification after Paired Continuous Theta Burst Stimulation

    Directory of Open Access Journals (Sweden)

    Tonio Heidegger

    2017-08-01

    Background: Modulation of cortical excitability by transcranial magnetic stimulation (TMS) is used for investigating human brain functions. A common observation is the high variability of long-term depression (LTD)-like changes in human (motor) cortex excitability. This study aimed at analyzing the response subgroup distribution after paired continuous theta burst stimulation (cTBS) as a basis for subject selection. Methods: The effects of paired cTBS using 80% active motor threshold (AMT) in 31 healthy volunteers were assessed at the primary motor cortex (M1) corresponding to the representation of the first dorsal interosseous (FDI) muscle of the left hand, before and up to 50 min after plasticity induction. The changes in motor evoked potentials (MEPs) were analyzed using machine-learning derived methods implemented as Gaussian mixture modeling (GMM) and computed ABC analysis. Results: The probability density distribution of the MEP changes from baseline was tri-modal, showing a clear separation at 80.9%. Subjects displaying at least this degree of LTD-like changes were n = 6 responders. By contrast, n = 7 subjects displayed a paradox response with increase in MEP. Reassessment using ABC analysis as alternative approach led to the same n = 6 subjects as a distinct category. Conclusion: Depressive effects of paired cTBS using 80% AMT endure at least 50 min, however, only in a small subgroup of healthy subjects. Hence, plasticity induction by paired cTBS might not reflect a general mechanism in human motor cortex excitability. A mathematically supported criterion is proposed to select responders for enrolment in assessments of human brain functional networks using virtual brain lesions.

  4. Data-Driven Approach for Analyzing Hydrogeology and Groundwater Quality Across Multiple Scales.

    Science.gov (United States)

    Curtis, Zachary K; Li, Shu-Guang; Liao, Hua-Sheng; Lusch, David

    2017-08-29

    Recent trends of assimilating water well records into statewide databases provide a new opportunity for evaluating spatial dynamics of groundwater quality and quantity. However, these datasets are rarely analyzed rigorously to address larger scientific problems because they are massive and of lower quality. We develop an approach for utilizing well databases to analyze physical and geochemical aspects of groundwater systems, and apply it to a multiscale investigation of the sources and dynamics of chloride (Cl⁻) in the near-surface groundwater of the Lower Peninsula of Michigan. Nearly 500,000 static water levels (SWLs) were critically evaluated, extracted, and analyzed to delineate long-term, average groundwater flow patterns using a nonstationary kriging technique at the basin-scale (i.e., across the entire peninsula). Two regions identified as major basin-scale discharge zones, the Michigan and Saginaw Lowlands, were further analyzed with regional- and local-scale SWL models. Groundwater valleys ("discharge" zones) and mounds ("recharge" zones) were identified for all models, and the proportions of wells with elevated Cl⁻ concentrations in each zone were calculated, visualized, and compared. Concentrations in discharge zones, where groundwater is expected to flow primarily upwards, are consistently and significantly higher than those in recharge zones. A synoptic sampling campaign in the Michigan Lowlands revealed concentrations generally increase with depth, a trend noted in previous studies of the Saginaw Lowlands. These strong, consistent SWL and Cl⁻ distribution patterns across multiple scales suggest that a deep source (i.e., Michigan brines) is the primary cause for the elevated chloride concentrations observed in discharge areas across the peninsula. © 2017, National Ground Water Association.

  5. Data-driven battery product development: Turn battery performance into a competitive advantage.

    Energy Technology Data Exchange (ETDEWEB)

    Sholklapper, Tal [Voltaiq, Inc.

    2016-04-19

    Poor battery performance is a primary source of user dissatisfaction across a broad range of applications, and is a key bottleneck hindering the growth of mobile technology, wearables, electric vehicles, and grid energy storage. Engineering battery systems is difficult, requiring extensive testing for vendor selection, BMS programming, and application-specific lifetime testing. This work also generates huge quantities of data. This presentation will explain how to leverage this data to help ship quality products faster using fewer resources while ensuring safety and reliability in the field, ultimately turning battery performance into a competitive advantage.

  6. Modeling water quality in an urban river using hydrological factors--data driven approaches.

    Science.gov (United States)

    Chang, Fi-John; Tsai, Yu-Hsuan; Chen, Pin-An; Coynel, Alexandra; Vachaud, Georges

    2015-03-15

    Contrasting seasonal variations occur in river flow and water quality as a result of short-duration, severe-intensity storms and typhoons in Taiwan. Sudden changes in river flow caused by impending extreme events may impose serious degradation on river water quality and fateful impacts on ecosystems. Water quality is measured on a monthly/quarterly scale, and therefore an estimation of water quality on a daily scale would greatly help timely river pollution management. This study proposes a systematic analysis scheme (SAS) to assess the spatio-temporal interrelation of water quality in an urban river and construct water quality estimation models using two static and one dynamic artificial neural networks (ANNs) coupled with the Gamma test (GT) based on water quality, hydrological and economic data. The Dahan River basin in Taiwan is the study area. Ammonia nitrogen (NH3-N) is considered as the representative parameter, a correlative indicator in judging the contamination level over the study. The key factors most closely related to the representative parameter (NH3-N) are extracted by the Gamma test for modeling NH3-N concentration, and as a result, four hydrological factors (discharge, days w/o discharge, water temperature and rainfall) are identified as model inputs. The modeling results demonstrate that the nonlinear autoregressive with exogenous input (NARX) network furnished with recurrent connections can accurately estimate NH3-N concentration with a very high coefficient of efficiency value (0.926) and a low RMSE value (0.386 mg/l). Besides, the NARX network can suitably catch peak values that mainly occur in dry periods (September-April in the study area), which is particularly important to water pollution treatment. The proposed SAS suggests a promising approach to reliably modeling the spatio-temporal NH3-N concentration based solely on hydrological data, without using water quality sampling data. It is worth noticing that such estimation can be
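
    The two skill scores quoted in this abstract can be computed as sketched below. The coefficient of efficiency is assumed here to be the Nash-Sutcliffe form that is common in hydrology; the abstract does not spell out its definition, and the concentration values are made up for illustration.

```python
import math


def rmse(observed, estimated):
    """Root-mean-square error between observations and model estimates."""
    squared = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    return math.sqrt(squared / len(observed))


def coefficient_of_efficiency(observed, estimated):
    """Nash-Sutcliffe form: 1 - (sum of squared errors) / (spread of observations about their mean)."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / spread


# Hypothetical NH3-N concentrations in mg/l.
observed = [1.0, 2.0, 3.0, 4.0]
estimated = [1.0, 2.0, 3.0, 2.0]
error = rmse(observed, estimated)
efficiency = coefficient_of_efficiency(observed, estimated)
```

    A value of 1 means a perfect fit, while 0 means the model does no better than predicting the mean of the observations.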

  7. A Data-driven Approach for Forecasting Next-day River Discharge

    Science.gov (United States)

    Sharif, H. O.; Billah, K. S.

    2017-12-01

    This study focuses on evaluating the performance of the Soil and Water Assessment Tool (SWAT) eco-hydrological model, a simple Auto-Regressive with eXogenous input (ARX) model, and a Gene expression programming (GEP)-based model in one-day-ahead forecasting of discharge of a subtropical basin (the upper Kentucky River Basin). The three models were calibrated with daily flow at the US Geological Survey (USGS) stream gauging station not affected by flow regulation for the period of 2002-2005. The calibrated models were then validated at the same gauging station as well as another USGS gauge 88 km downstream for the period of 2008-2010. The results suggest that simple models outperform a sophisticated hydrological model with GEP having the advantage of being able to generate functional relationships that allow scientific investigation of the complex nonlinear interrelationships among input variables. Unlike SWAT, GEP, and to some extent, ARX are less sensitive to the length of the calibration time series and do not require a spin-up period.
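
    A first-order ARX model of the kind compared here can be fitted by ordinary least squares, as in the sketch below. The model order, the single exogenous input and the synthetic series are assumptions for illustration; the study's actual inputs and orders are not given in the abstract.

```python
def fit_arx(y, u):
    """Least-squares fit of y[t] = a * y[t-1] + b * u[t-1] via the 2x2 normal equations."""
    syy = syu = suu = ty = tu = 0.0
    for t in range(1, len(y)):
        y_prev, u_prev, y_now = y[t - 1], u[t - 1], y[t]
        syy += y_prev * y_prev
        syu += y_prev * u_prev
        suu += u_prev * u_prev
        ty += y_prev * y_now
        tu += u_prev * y_now
    det = syy * suu - syu * syu
    if abs(det) < 1e-12:
        raise ValueError("regressors are collinear; cannot identify a and b")
    a = (ty * suu - tu * syu) / det
    b = (tu * syy - ty * syu) / det
    return a, b


def forecast_next(y, u, a, b):
    """One-step-ahead forecast from the latest output and input."""
    return a * y[-1] + b * u[-1]


# Synthetic example: input u (e.g. rainfall) drives output y (e.g. discharge)
# through a known model with a = 0.5 and b = 2.0.
u = [1.0, 0.0, 2.0, 1.0, 3.0, 0.0, 1.0, 2.0]
y = [1.0]
for t in range(1, len(u)):
    y.append(0.5 * y[t - 1] + 2.0 * u[t - 1])

a_hat, b_hat = fit_arx(y, u)
next_value = forecast_next(y, u, a_hat, b_hat)
```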

  8. High-Performance Computing in Neuroscience for Data-Driven Discovery, Integration, and Dissemination

    International Nuclear Information System (INIS)

    Bouchard, Kristofer E.

    2016-01-01

    A lack of coherent plans to analyze, manage, and understand data threatens the various opportunities offered by new neuro-technologies. High-performance computing will allow exploratory analysis of massive datasets stored in standardized formats, hosted in open repositories, and integrated with simulations.

  9. Migraine Subclassification via a Data-Driven Automated Approach Using Multimodality Factor Mixture Modeling of Brain Structure Measurements.

    Science.gov (United States)

    Schwedt, Todd J; Si, Bing; Li, Jing; Wu, Teresa; Chong, Catherine D

    2017-07-01

    The current subclassification of migraine is according to headache frequency and aura status. The variability in migraine symptoms, disease course, and response to treatment suggest the presence of additional heterogeneity or subclasses within migraine. The study objective was to subclassify migraine via a data-driven approach, identifying latent factors by jointly exploiting multiple sets of brain structural features obtained via magnetic resonance imaging (MRI). Migraineurs (n = 66) and healthy controls (n = 54) had brain MRI measurements of cortical thickness, cortical surface area, and volumes for 68 regions. A multimodality factor mixture model was used to subclassify MRIs and to determine the brain structural factors that most contributed to the subclassification. Clinical characteristics of subjects in each subgroup were compared. Automated MRI classification divided the subjects into two subgroups. Migraineurs in subgroup #1 had more severe allodynia symptoms during migraines (6.1 ± 5.3 vs. 3.6 ± 3.2, P = .03), more years with migraine (19.2 ± 11.3 years vs 13 ± 8.3 years, P = .01), and higher Migraine Disability Assessment (MIDAS) scores (25 ± 22.9 vs 15.7 ± 12.2, P = .04). There were no differences in headache frequency or migraine aura status between the two subgroups. Data-driven subclassification of brain MRIs based upon structural measurements identified two subgroups. Amongst migraineurs, the subgroups differed in allodynia symptom severity, years with migraine, and migraine-related disability. Since allodynia is associated with this imaging-based subclassification of migraine and prior publications suggest that allodynia impacts migraine treatment response and disease prognosis, future migraine diagnostic criteria could consider allodynia when defining migraine subgroups. © 2017 American Headache Society.

  10. Flood probability quantification for road infrastructure: Data-driven spatial-statistical approach and case study applications.

    Science.gov (United States)

    Kalantari, Zahra; Cavalli, Marco; Cantone, Carolina; Crema, Stefano; Destouni, Georgia

    2017-03-01

    Climate-driven increase in the frequency of extreme hydrological events is expected to impose greater strain on the built environment and major transport infrastructure, such as roads and railways. This study develops a data-driven spatial-statistical approach to quantifying and mapping the probability of flooding at critical road-stream intersection locations, where water flow and sediment transport may accumulate and cause serious road damage. The approach is based on novel integration of key watershed and road characteristics, including also measures of sediment connectivity. The approach is concretely applied to and quantified for two specific study case examples in southwest Sweden, with documented road flooding effects of recorded extreme rainfall. The novel contributions of this study in combining a sediment connectivity account with that of soil type, land use, spatial precipitation-runoff variability and road drainage in catchments, and in extending the connectivity measure use for different types of catchments, improve the accuracy of model results for road flood probability. Copyright © 2016 Elsevier B.V. All rights reserved.

  11. A data-driven modeling approach to identify disease-specific multi-organ networks driving physiological dysregulation.

    Directory of Open Access Journals (Sweden)

    Warren D Anderson

    2017-07-01

    Multiple physiological systems interact throughout the development of a complex disease. Knowledge of the dynamics and connectivity of interactions across physiological systems could facilitate the prevention or mitigation of organ damage underlying complex diseases, many of which are currently refractory to available therapeutics (e.g., hypertension). We studied the regulatory interactions operating within and across organs throughout disease development by integrating in vivo analysis of gene expression dynamics with a reverse engineering approach to infer data-driven dynamic network models of multi-organ gene regulatory influences. We obtained experimental data on the expression of 22 genes across five organs, over a time span that encompassed the development of autonomic nervous system dysfunction and hypertension. We pursued a unique approach for identification of continuous-time models that jointly described the dynamics and structure of multi-organ networks by estimating a sparse subset of ∼12,000 possible gene regulatory interactions. Our analyses revealed that an autonomic dysfunction-specific multi-organ sequence of gene expression activation patterns was associated with a distinct gene regulatory network. We analyzed the model structures for adaptation motifs, and identified disease-specific network motifs involving genes that exhibited aberrant temporal dynamics. Bioinformatic analyses identified disease-specific single nucleotide variants within or near transcription factor binding sites upstream of key genes implicated in maintaining physiological homeostasis. Our approach illustrates a novel framework for investigating the pathogenesis through model-based analysis of multi-organ system dynamics and network properties. Our results yielded novel candidate molecular targets driving the development of cardiovascular disease, metabolic syndrome, and immune dysfunction.

  12. Data-Driven Approaches for Computation in Intelligent Biomedical Devices: A Case Study of EEG Monitoring for Chronic Seizure Detection

    Directory of Open Access Journals (Sweden)

    Naveen Verma

    2011-04-01

    Intelligent biomedical devices are systems that are able to detect specific physiological processes in patients so that particular responses can be generated. This closed-loop capability can have enormous clinical value when we consider the unprecedented modalities that are beginning to emerge for sensing and stimulating patient physiology. Both delivering therapy (e.g., deep-brain stimulation, vagus nerve stimulation, etc.) and treating impairments (e.g., neural prosthesis) require computational devices that can make clinically relevant inferences, especially using minimally-intrusive patient signals. The key to such devices is algorithms that are based on data-driven signal modeling as well as hardware structures that are specialized to these. This paper discusses the primary application-domain challenges that must be overcome and analyzes the most promising methods for this that are emerging. We then look at how these methods are being incorporated in ultra-low-energy computational platforms and systems. The case study for this is a seizure-detection SoC that includes instrumentation and computation blocks in support of a system that exploits patient-specific modeling to achieve accurate performance for chronic detection. The SoC samples each EEG channel at a rate of 600 Hz and performs processing to derive signal features on every two-second epoch, consuming 9 μJ/epoch/channel. Signal feature extraction reduces the data rate by a factor of over 40×, permitting wireless communication from the patient’s head while reducing the total power on the head by 14×.
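
    The figures quoted above imply a few derived quantities, worked out in the short sketch below. Only the 600 Hz rate, the two-second epoch and the 9 μJ/epoch/channel figure come from the abstract; the 18-channel montage in the usage line is a hypothetical example, and the result covers only the quoted per-epoch energy, not the whole device.

```python
SAMPLE_RATE_HZ = 600.0      # from the abstract
EPOCH_SECONDS = 2.0         # from the abstract
ENERGY_PER_EPOCH_UJ = 9.0   # microjoules per epoch per channel, from the abstract

# Samples gathered per channel in one epoch.
samples_per_epoch = SAMPLE_RATE_HZ * EPOCH_SECONDS

# Average power per channel: energy divided by time (microjoules / seconds = microwatts).
average_power_uw = ENERGY_PER_EPOCH_UJ / EPOCH_SECONDS

# Energy attributable to each sample, in nanojoules.
energy_per_sample_nj = ENERGY_PER_EPOCH_UJ * 1000.0 / samples_per_epoch


def total_power_uw(channels):
    """Average power in microwatts for a given number of EEG channels."""
    return channels * average_power_uw


# Hypothetical 18-channel montage.
montage_power_uw = total_power_uw(18)
```

    Under these numbers each channel yields 1200 samples per epoch and draws 4.5 μW on average for the quoted processing.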

  13. Data-driven batch scheduling

    Energy Technology Data Exchange (ETDEWEB)

    Bent, John [Los Alamos National Laboratory; Denehy, Tim [GOOGLE; Arpaci-Dusseau, Remzi [UNIV OF WISCONSIN; Livny, Miron [UNIV OF WISCONSIN; Arpaci-Dusseau, Andrea C [NON LANL

    2009-01-01

    In this paper, we develop data-driven strategies for batch computing schedulers. Current CPU-centric batch schedulers ignore the data needs within workloads and execute them by linking them transparently and directly to their needed data. When scheduled on remote computational resources, this elegant solution of direct data access can incur an order of magnitude performance penalty for data-intensive workloads. Adding data-awareness to batch schedulers allows a careful coordination of data and CPU allocation thereby reducing the cost of remote execution. We offer here new techniques by which batch schedulers can become data-driven. Such systems can use our analytical predictive models to select one of the four data-driven scheduling policies that we have created. Through simulation, we demonstrate the accuracy of our predictive models and show how they can reduce time to completion for some workloads by as much as 80%.

  14. Data-driven storytelling

    CERN Document Server

    Hurter, Christophe; Diakopoulos, Nicholas ed.; Carpendale, Sheelagh

    2018-01-01

    This book is an accessible introduction to data-driven storytelling, resulting from discussions between data visualization researchers and data journalists. This book will be the first to define the topic, present compelling examples and existing resources, as well as identify challenges and new opportunities for research.

  15. Performance of a data-driven technique applied to changes in wave height and its effect on beach response

    Directory of Open Access Journals (Sweden)

    José M. Horrillo-Caraballo

    2016-01-01

    In this study the medium-term response of beach profiles was investigated at two sites: a gently sloping sandy beach and a steeper mixed sand and gravel beach. The former is the Duck site in North Carolina, on the east coast of the USA, which is exposed to Atlantic Ocean swells and storm waves, and the latter is the Milford-on-Sea site at Christchurch Bay, on the south coast of England, which is partially sheltered from Atlantic swells but has a directionally bimodal wave exposure. The data sets comprise detailed bathymetric surveys of beach profiles covering a period of more than 25 years for the Duck site and over 18 years for the Milford-on-Sea site. The structure of the data sets and the data-driven methods are described. Canonical correlation analysis (CCA) was used to find linkages between the wave characteristics and beach profiles. The sensitivity of the linkages was investigated by deploying a wave height threshold to filter out the smaller waves incrementally. The results of the analysis indicate that, for the gently sloping sandy beach, waves of all heights are important to the morphological response. For the mixed sand and gravel beach, filtering the smaller waves improves the statistical fit, suggesting that low-height waves do not play a primary role in the medium-term morphological response, which is primarily driven by the intermittent larger storm waves.

  16. Performance of a data-driven technique applied to changes in wave height and its effect on beach response

    Directory of Open Access Journals (Sweden)

    José M. Horrillo-Caraballo

    2016-01-01

    In this study the medium-term response of beach profiles was investigated at two sites: a gently sloping sandy beach and a steeper mixed sand and gravel beach. The former is the Duck site in North Carolina, on the east coast of the USA, which is exposed to Atlantic Ocean swells and storm waves, and the latter is the Milford-on-Sea site at Christchurch Bay, on the south coast of England, which is partially sheltered from Atlantic swells but has a directionally bimodal wave exposure. The data sets comprise detailed bathymetric surveys of beach profiles covering a period of more than 25 years for the Duck site and over 18 years for the Milford-on-Sea site. The structure of the data sets and the data-driven methods are described. Canonical correlation analysis (CCA) was used to find linkages between the wave characteristics and beach profiles. The sensitivity of the linkages was investigated by deploying a wave height threshold to filter out the smaller waves incrementally. The results of the analysis indicate that, for the gently sloping sandy beach, waves of all heights are important to the morphological response. For the mixed sand and gravel beach, filtering the smaller waves improves the statistical fit, suggesting that low-height waves do not play a primary role in the medium-term morphological response, which is primarily driven by the intermittent larger storm waves.

  17. Data-Driven Security-Constrained OPF

    DEFF Research Database (Denmark)

    Thams, Florian; Halilbasic, Lejla; Pinson, Pierre

    2017-01-01

    In this paper we unify electricity market operations with power system security considerations. Using data-driven techniques, we address both small signal stability and steady-state security, derive tractable decision rules in the form of line flow limits, and incorporate the resulting constraints in market clearing algorithms. Our goal is to minimize redispatching actions, and instead allow the market to determine the most cost-efficient dispatch while considering all security constraints. To maintain tractability of our approach we perform our security assessment offline, examining large datasets... considerations, while being less conservative than current approaches. Our approach can be scalable for large systems, accounts explicitly for power system security, and enables the electricity market to identify a cost-efficient dispatch avoiding redispatching actions. We demonstrate the performance of our...

  18. Initial Results from an Energy-Aware Airborne Dynamic, Data-Driven Application System Performing Sampling in Coherent Boundary-Layer Structures

    Science.gov (United States)

    Frew, E.; Argrow, B. M.; Houston, A. L.; Weiss, C.

    2014-12-01

    The energy-aware airborne dynamic, data-driven application system (EA-DDDAS) performs persistent sampling in complex atmospheric conditions by exploiting wind energy using the dynamic data-driven application system paradigm. The main challenge for future airborne sampling missions is operation with tight integration of physical and computational resources over wireless communication networks, in complex atmospheric conditions. The physical resources considered here include sensor platforms, particularly mobile Doppler radar and unmanned aircraft, the complex conditions in which they operate, and the region of interest. Autonomous operation requires distributed computational effort connected by layered wireless communication. Onboard decision-making and coordination algorithms can be enhanced by atmospheric models that assimilate input from physics-based models and wind fields derived from multiple sources. These models are generally too complex to be run onboard the aircraft, so they need to be executed in ground vehicles in the field, and connected over broadband or other wireless links back to the field. Finally, the wind field environment drives strong interaction between the computational and physical systems, both as a challenge to autonomous path planning algorithms and as a novel energy source that can be exploited to improve system range and endurance. Implementation details of a complete EA-DDDAS will be provided, along with preliminary flight test results targeting coherent boundary-layer structures.

  19. Re-assessing Present Day Global Mass Transport and Glacial Isostatic Adjustment From a Data Driven Approach

    Science.gov (United States)

    Wu, X.; Jiang, Y.; Simonsen, S.; van den Broeke, M. R.; Ligtenberg, S.; Kuipers Munneke, P.; van der Wal, W.; Vermeersen, B. L. A.

    2017-12-01

    Determining present-day mass transport (PDMT) is complicated by the fact that most observations contain signals from both present-day ice melting and Glacial Isostatic Adjustment (GIA). Despite decades of progress in geodynamic modeling and new observations, significant uncertainties remain in both. The key to separating present-day ice mass change from GIA signals is to include data of different physical characteristics. We designed an approach to separate PDMT and GIA signatures by estimating them simultaneously using globally distributed interdisciplinary data with distinct physical information and a dynamically constructed a priori GIA model. We conducted a high-resolution global reappraisal of present-day ice mass balance with focus on Earth's polar regions and its contribution to global sea-level rise using a combination of ICESat, GRACE gravity, surface geodetic velocity data, and an ocean bottom pressure model. Adding ice altimetry supplies critically needed dual data types over the interiors of ice-covered regions to enhance separation of PDMT and GIA signatures, and is expected to improve accuracy by half an order of magnitude for GIA and, consequently, for ice mass balance estimates. The global data-based approach can adequately address issues of PDMT and GIA induced geocenter motion and long-wavelength signatures important for large areas such as Antarctica and global mean sea level. In conjunction with the dense altimetry data, we solved for PDMT coefficients up to degree and order 180 by using a higher-resolution GRACE data set, and a high-resolution a priori PDMT model that includes detailed geographic boundaries. The high-resolution approach solves the problem of multiple resolutions in various data types, greatly reduces aliased errors from a low-degree truncation, and at the same time, enhances separation of signatures from adjacent regions such as Greenland and Canadian Arctic territories.

  20. Blood flow quantification from 2D phase contrast MRI in renal arteries using an unsupervised data driven approach

    Energy Technology Data Exchange (ETDEWEB)

    Zoellner, Frank Gerrit [Computer Assisted Clinical Medicine, Faculty of Medicine Mannheim, Univ. of Heidelberg, Mannheim (Germany); Section for Radiology, Dept. of Surgical Sciences, Univ. of Bergen (Norway); Monssen, Jan Ankar [Dept. of Radiology, Haukeland Univ. Hospital, Bergen (Norway); Roervik, Jarie [Section for Radiology, Dept. of Surgical Sciences, Univ. of Bergen (Norway); Dept. of Radiology, Haukeland Univ. Hospital, Bergen (Norway); Lundervold, Arvid [Dept. of Radiology, Haukeland Univ. Hospital, Bergen (Norway); Dept. of Biomedicine, Univ. of Bergen (Norway); Schad, Lothar R. [Computer Assisted Clinical Medicine, Faculty of Medicine Mannheim, Univ. of Heidelberg, Mannheim (Germany)

    2009-07-01

    We present a clustering approach to segment the renal artery from 2D PC Cine MR images to measure arterial blood velocity and flow. Such information is important in grading renal artery stenosis and to support the decision on surgical interventions like percutaneous transluminal angioplasty. Results from 20 data sets (volunteers, 7 patients) show that the renal arteries could be extracted automatically and the corresponding velocity profiles were close (r = 0.977) to that obtained by manual delineations of the vessel areas. (orig.)

  1. Blood flow quantification from 2D phase contrast MRI in renal arteries using an unsupervised data driven approach

    International Nuclear Information System (INIS)

    Zoellner, Frank Gerrit; Monssen, Jan Ankar; Roervik, Jarie; Lundervold, Arvid; Schad, Lothar R.

    2009-01-01

    We present a clustering approach to segment the renal artery from 2D PC Cine MR images to measure arterial blood velocity and flow. Such information is important in grading renal artery stenosis and to support the decision on surgical interventions like percutaneous transluminal angioplasty. Results from 20 data sets (volunteers, 7 patients) show that the renal arteries could be extracted automatically and the corresponding velocity profiles were close (r = 0.977) to that obtained by manual delineations of the vessel areas. (orig.)

  2. Assessing the flexibility potential of the residential load in smart electricity grids - A data-driven approach

    NARCIS (Netherlands)

    Azari, Delaram; Torbaghan, Shahab Shariat; Cappon, Hans; Gibescu, Madeleine; Keesman, Karel; Rijnaarts, Huub

    2017-01-01

    This paper proposes a framework for assessing the sensitivity of the performance of a hypothetical demand response (DR) program to consumers' preferences, should they enable DR and become prosumers. The proposed framework contains a data analytics module and a DR simulation module. The data

  3. Quantifying and reducing model-form uncertainties in Reynolds-averaged Navier–Stokes simulations: A data-driven, physics-informed Bayesian approach

    International Nuclear Information System (INIS)

    Xiao, H.; Wu, J.-L.; Wang, J.-X.; Sun, R.; Roy, C.J.

    2016-01-01

    Despite their well-known limitations, Reynolds-Averaged Navier–Stokes (RANS) models are still the workhorse tools for turbulent flow simulations in today's engineering analysis, design and optimization. While the predictive capability of RANS models depends on many factors, for many practical flows the turbulence models are by far the largest source of uncertainty. As RANS models are used in the design and safety evaluation of many mission-critical systems such as airplanes and nuclear power plants, quantifying their model-form uncertainties has significant implications in enabling risk-informed decision-making. In this work we develop a data-driven, physics-informed Bayesian framework for quantifying model-form uncertainties in RANS simulations. Uncertainties are introduced directly to the Reynolds stresses and are represented with compact parameterization accounting for empirical prior knowledge and physical constraints (e.g., realizability, smoothness, and symmetry). An iterative ensemble Kalman method is used to assimilate the prior knowledge and observation data in a Bayesian framework, and to propagate them to posterior distributions of velocities and other Quantities of Interest (QoIs). We use two representative cases, the flow over periodic hills and the flow in a square duct, to evaluate the performance of the proposed framework. Both cases are challenging for standard RANS turbulence models. Simulation results suggest that, even with very sparse observations, the obtained posterior mean velocities and other QoIs have significantly better agreement with the benchmark data compared to the baseline results. At most locations the posterior distribution adequately captures the true model error within the developed model form uncertainty bounds. The framework is a major improvement over existing black-box, physics-neutral methods for model-form uncertainty quantification, where prior knowledge and details of the models are not exploited. This approach has

  4. Quantifying and reducing model-form uncertainties in Reynolds-averaged Navier–Stokes simulations: A data-driven, physics-informed Bayesian approach

    Energy Technology Data Exchange (ETDEWEB)

    Xiao, H., E-mail: hengxiao@vt.edu; Wu, J.-L.; Wang, J.-X.; Sun, R.; Roy, C.J.

    2016-11-01

    Despite their well-known limitations, Reynolds-Averaged Navier–Stokes (RANS) models are still the workhorse tools for turbulent flow simulations in today's engineering analysis, design and optimization. While the predictive capability of RANS models depends on many factors, for many practical flows the turbulence models are by far the largest source of uncertainty. As RANS models are used in the design and safety evaluation of many mission-critical systems such as airplanes and nuclear power plants, quantifying their model-form uncertainties has significant implications in enabling risk-informed decision-making. In this work we develop a data-driven, physics-informed Bayesian framework for quantifying model-form uncertainties in RANS simulations. Uncertainties are introduced directly to the Reynolds stresses and are represented with compact parameterization accounting for empirical prior knowledge and physical constraints (e.g., realizability, smoothness, and symmetry). An iterative ensemble Kalman method is used to assimilate the prior knowledge and observation data in a Bayesian framework, and to propagate them to posterior distributions of velocities and other Quantities of Interest (QoIs). We use two representative cases, the flow over periodic hills and the flow in a square duct, to evaluate the performance of the proposed framework. Both cases are challenging for standard RANS turbulence models. Simulation results suggest that, even with very sparse observations, the obtained posterior mean velocities and other QoIs have significantly better agreement with the benchmark data compared to the baseline results. At most locations the posterior distribution adequately captures the true model error within the developed model form uncertainty bounds. The framework is a major improvement over existing black-box, physics-neutral methods for model-form uncertainty quantification, where prior knowledge and details of the models are not exploited. This approach

  5. A program wide framework for evaluating data driven teaching and learning - earth analytics approaches, results and lessons learned

    Science.gov (United States)

    Wasser, L. A.; Gold, A. U.

    2017-12-01

    There is a deluge of earth systems data available to address cutting-edge science problems, yet specific skills are required to work with these data. The Earth analytics education program, a core component of Earth Lab at the University of Colorado Boulder, is building a data-intensive program that provides training in realms including 1) interdisciplinary communication and collaboration, 2) earth science domain knowledge including geospatial science and remote sensing, and 3) reproducible, open science workflows ("earth analytics"). The earth analytics program includes an undergraduate internship, undergraduate and graduate level courses and a professional certificate / degree program. All programs share the goal of preparing a STEM workforce for successful earth analytics driven careers. We are developing a program-wide evaluation framework that assesses the effectiveness of data-intensive instruction combined with domain science learning to better understand and improve data-intensive teaching approaches using blends of online, in situ, asynchronous and synchronous learning. We are using targeted search engine optimization (SEO) to increase visibility and in turn program reach. Finally, our design targets longitudinal program impacts on participant career tracks over time. Here we present results from evaluation of both an interdisciplinary undergraduate / graduate level earth analytics course and an undergraduate internship. Early results suggest that a blended approach to learning and teaching that includes both synchronous in-person teaching and active classroom hands-on learning combined with asynchronous learning in the form of online materials leads to student success. Further, we will present our model for longitudinal tracking of participants' career focus over time to better understand long-term program impacts. We also demonstrate the impact of SEO on online content reach and program visibility.

  6. Development of a Computational Framework for Big Data-Driven Prediction of Long-Term Bridge Performance and Traffic Flow

    Science.gov (United States)

    2018-04-01

    Consistent efforts with dense sensor deployment and data gathering processes for bridge big data have accumulated profound information regarding bridge performance, associated environments, and traffic flows. However, direct applications of bridge bi...

  7. Growing Degree Vegetation Production Index (GDVPI): A Novel and Data-Driven Approach to Delimit Season Cycles

    Science.gov (United States)

    Graham, W. D.; Spruce, J.; Ross, K. W.; Gasser, J.; Grulke, N.

    2014-12-01

    Growing Degree Vegetation Production Index (GDVPI) is a parametric approach to delimiting vegetation seasonal growth and decline cycles using incremental growing degree days (GDD), and NASA's Moderate Resolution Imaging Spectroradiometer (MODIS) normalized difference vegetation index (NDVI) 8-day composite cumulative integral data. We obtain a specific location's daily minimum and maximum temperatures from the nearest National Oceanic and Atmospheric Administration (NOAA) weather stations posted on the National Climate Data Center (NCDC) Climate Data Online (CDO) archive and compute GDD. The date range for this study is January 1, 2000 through December 31, 2012. We employ a novel process, a repeating logistic product (RLP), to compensate for short-term weather variability and data drops from the recording stations and fit a curve to the median daily GDD values, adjusting for asymmetry, amplitude, and phase shift that minimize the sum of squared errors when comparing the observed and predicted GDD. The resulting curve, here referred to as the surrogate GDD, is the time-temperature phasing parameter used to convert Cartesian NDVI values into polar coordinate pairs, multiplying the NDVI values as the radial by the cosine and sine of the surrogate GDD as the angular. Depending on the vegetation type and the original NDVI curve, the polar NDVI curve may be nearly circular, kidney-shaped, or pear-shaped in the case of conifers, deciduous, or agriculture, respectively. We examine the points of tangency about the polar coordinate NDVI curve, identifying values of 1, 0, -1, or infinity, as each of these represents a natural inflection point. Lines connecting the origin to each tangent point illustrate and quantify the parametric segmentation of the growing season based on the ostensible dependency between GDD and NDVI. Furthermore, the area contained by each segment represents the apparent vegetation production. A particular benefit is that the inflection points are determined
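
    The polar conversion described in this abstract (NDVI as the radius, surrogate GDD as the angle) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the simple averaging formula for daily GDD, the base temperature `t_base`, and the linear rescaling of surrogate GDD to an angle in `gdd_to_angle` are all assumptions introduced here, since the abstract does not specify them.

```python
import math


def growing_degree_days(t_min: float, t_max: float, t_base: float = 10.0) -> float:
    """Daily GDD by the simple averaging method (assumed; t_base is hypothetical)."""
    return max(0.0, (t_min + t_max) / 2.0 - t_base)


def gdd_to_angle(surrogate_gdd: float, cycle_total: float) -> float:
    """Hypothetical mapping: rescale surrogate GDD linearly onto [0, 2*pi]."""
    return 2.0 * math.pi * surrogate_gdd / cycle_total


def ndvi_to_polar_xy(ndvi: float, angle_rad: float) -> tuple:
    """Radial value times cosine and sine of the angle, as the abstract describes."""
    return ndvi * math.cos(angle_rad), ndvi * math.sin(angle_rad)


if __name__ == "__main__":
    theta = gdd_to_angle(500.0, 2000.0)   # a quarter of the assumed annual cycle
    print(ndvi_to_polar_xy(0.5, theta))
```

    Plotting the resulting (x, y) pairs over a full cycle would trace the closed polar NDVI curve whose shape the abstract relates to vegetation type.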

  8. Pay-for-performance policy and data-driven decision making within nursing homes: a qualitative study.

    Science.gov (United States)

    Abrahamson, Kathleen; Miech, Edward; Davila, Heather Wood; Mueller, Christine; Cooke, Valerie; Arling, Greg

    2015-05-01

    Health systems globally and within the USA have introduced nursing home pay-for-performance (P4P) programmes in response to the need for improved nursing home quality. Central to the challenge of administering effective P4P is the availability of accurate, timely and clinically appropriate data for decision making. We aimed to explore ways in which data were collected, thought about and used as a result of participation in a P4P programme. Semistructured interviews were conducted with 232 nursing home employees from within 70 nursing homes that participated in P4P-sponsored quality improvement (QI) projects. Interview data were analysed to identify themes surrounding collecting, thinking about and using data for QI decision making. The term 'data' appeared 247 times in the interviews, and over 92% of these instances (228/247) were spontaneous references by nursing home staff. Overall, 34% of respondents (79/232) referred directly to 'data' in their interviews. Nursing home leadership more frequently discussed data use than direct care staff. Emergent themes included using data to identify a QI problem, gathering data in new ways at the local level, and measuring outcomes in response to P4P participation. Alterations in data use as a result of policy change were theoretically consistent with the revised version of the Promoting Action on Research Implementation in Health Services framework, which posits that successful implementation is a function of evidence, context and facilitation. Providing a reimbursement context that facilitates the collection and use of reliable local evidence may be an important consideration to others contemplating the adaptation of P4P policies.

  9. A data-driven approach to $\pi^{0}$, $\eta$ and $\eta^{\prime}$ single and double Dalitz decays

    Science.gov (United States)

    Escribano, Rafel; Gonzàlez-Solís, Sergi

    2018-01-01

    The dilepton invariant mass spectra and integrated branching ratios of the single and double Dalitz decays $\mathscr{P}\to l^{+}l^{-}\gamma$ and $\mathscr{P}\to l^{+}l^{-}l^{+}l^{-}$ ($\mathscr{P}=\pi^{0},\eta,\eta^{\prime}$; $l=e$ or $\mu$) are predicted by means of a data-driven approach based on the use of rational approximants applied to $\pi^{0}$, $\eta$ and $\eta^{\prime}$ transition form factor experimental data in the space-like region. Supported by the FPI scholarship BES-2012-055371 (S.G-S), the Secretaria d’Universitats i Recerca del Departament d’Economia i Coneixement de la Generalitat de Catalunya under grant 2014 SGR 1450, the Ministerio de Ciencia e Innovación under grant FPA2011-25948, the Ministerio de Economía y Competitividad under grants CICYT-FEDER-FPA 2014-55613-P and SEV-2012-0234, the Spanish Consolider-Ingenio 2010 Program CPAN (CSD2007-00042), and the European Commission under program FP7-INFRASTRUCTURES-2011-1 (283286). S.G-S also received support from the CAS President’s International Fellowship Initiative for Young International Scientist (2017PM0031).

  10. Study of the Influence of Age in 18F-FDG PET Images Using a Data-Driven Approach and Its Evaluation in Alzheimer’s Disease

    Directory of Open Access Journals (Sweden)

    Jiehui Jiang

    2018-01-01

    Objectives. 18F-FDG PET scan is one of the most frequently used neural imaging scans. However, the influence of age has proven to be the greatest interfering factor for many clinical dementia diagnoses when analyzing 18F-FDG PET images, since radiologists encounter difficulties when deciding whether the abnormalities in specific regions correlate with normal aging, disease, or both. In the present paper, the authors aimed to define specific brain regions and determine an age-correction mathematical model. Methods. A data-driven approach was used based on 255 healthy subjects. Results. The inferior frontal gyrus, the left medial part and the left medial orbital part of superior frontal gyrus, the right insula, the left anterior cingulate, the left median cingulate and paracingulate gyri, and bilateral superior temporal gyri were found to have a strong negative correlation with age. For evaluation, an age-correction model was applied to 262 healthy subjects and 50 AD subjects selected from the ADNI database, and partial correlations between SUVR mean and three clinical results were carried out before and after age correction. Conclusion. All correlation coefficients were significantly improved after the age correction. The proposed model was effective in the age correction of both healthy and AD subjects.

  11. Prediction of In-hospital Mortality in Emergency Department Patients With Sepsis: A Local Big Data-Driven, Machine Learning Approach.

    Science.gov (United States)

    Taylor, R Andrew; Pare, Joseph R; Venkatesh, Arjun K; Mowafi, Hani; Melnick, Edward R; Fleischman, William; Hall, M Kennedy

    2016-03-01

    Predictive analytics in emergency care has mostly been limited to the use of clinical decision rules (CDRs) in the form of simple heuristics and scoring systems. In the development of CDRs, limitations in analytic methods and concerns with usability have generally constrained models to a preselected small set of variables judged to be clinically relevant and to rules that are easily calculated. Furthermore, CDRs frequently suffer from questions of generalizability, take years to develop, and lack the ability to be updated as new information becomes available. Newer analytic and machine learning techniques capable of harnessing the large number of variables that are already available through electronic health records (EHRs) may better predict patient outcomes and facilitate automation and deployment within clinical decision support systems. In this proof-of-concept study, a local, big data-driven, machine learning approach is compared to existing CDRs and traditional analytic methods using the prediction of sepsis in-hospital mortality as the use case. This was a retrospective study of adult ED visits admitted to the hospital meeting criteria for sepsis from October 2013 to October 2014. Sepsis was defined as meeting criteria for systemic inflammatory response syndrome with an infectious admitting diagnosis in the ED. ED visits were randomly partitioned into an 80%/20% split for training and validation. A random forest model (machine learning approach) was constructed using over 500 clinical variables from data available within the EHRs of four hospitals to predict in-hospital mortality. The machine learning prediction model was then compared to a classification and regression tree (CART) model, logistic regression model, and previously developed prediction tools on the validation data set using area under the receiver operating characteristic curve (AUC) and chi-square statistics. There were 5,278 visits among 4,676 unique patients who met criteria for sepsis. 
Of
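
    The evaluation protocol in this abstract (a random 80%/20% partition followed by comparison of models on AUC) can be illustrated with a small standard-library sketch. It is not the study's code: the function names and the fixed seed are hypothetical, and the AUC is computed here with the pairwise (Mann-Whitney) definition rather than with any particular statistics package.

```python
import random


def split_80_20(items, seed=0):
    """Randomly partition items into 80% training and 20% validation lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # seed is an illustrative choice
    n_train = round(0.8 * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    train, valid = split_80_20(range(10))
    print(len(train), len(valid))
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

    The pairwise loop is quadratic in the number of visits, which is acceptable for a sketch; a rank-based formulation would be the usual choice for data sets of the size reported in the abstract.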

  12. The Effects of Open Enrollment, Curriculum Alignment, and Data-Driven Instruction on the Test Performance of English Language Learners (ELLs) and Re-Designated Fluent English Proficient Students (RFEPs) at Shangri-La High School

    Science.gov (United States)

    Miles, Eva

    2013-01-01

    The purpose of this study was to examine the impact of open enrollment, curriculum alignment, and data-driven instruction on the test performance of English Language Learners (ELLs) and Re-designated Fluent English Proficient students (RFEPs) at Shangri-La High School. Participants of this study consisted of the student population enrolled in…

  13. Data-Driven Methods to Diversify Knowledge of Human Psychology

    OpenAIRE

    Jack, Rachael E.; Crivelli, Carlos; Wheatley, Thalia

    2017-01-01

    Psychology aims to understand real human behavior. However, cultural biases in the scientific process can constrain knowledge. We describe here how data-driven methods can relax these constraints to reveal new insights that theories can overlook. To advance knowledge we advocate a symbiotic approach that better combines data-driven methods with theory.

  14. Data driven innovations in structural health monitoring

    Science.gov (United States)

    Rosales, M. J.; Liyanapathirana, R.

    2017-05-01

    At present, substantial investments are being allocated to civil infrastructure, which is considered a valuable asset at a national or global scale. Structural Health Monitoring (SHM) is an indispensable tool required to ensure the performance and safety of these structures based on measured response parameters. The research to date on damage assessment has tended to focus on the utilization of wireless sensor networks (WSN), as these prove to be the best alternative to traditional visual inspections and tethered or wired counterparts. Over the last decade, the structural health and behaviour of innumerable structures have been measured and evaluated owing to several successful ventures in implementing these sensor networks. Various monitoring systems have the capability to rapidly transmit, measure, and store large volumes of data. The amount of data collected from these networks has eventually become unmanageable, which has raised other relevant issues such as data quality, relevance, re-use, and decision support. There is an increasing need to integrate new technologies in order to automate the evaluation processes as well as to enhance the objectivity of data assessment routines. This paper aims to identify feasible methodologies for applying time-series analysis techniques to judiciously exploit the vast amount of readily available and upcoming data resources. It continues the momentum of a greater effort to collect and archive SHM approaches that will serve as data-driven innovations for the assessment of damage through efficient algorithms and data analytics.

  15. Challenges of Data-driven Healthcare Management

    DEFF Research Database (Denmark)

    Bossen, Claus; Danholt, Peter; Ubbesen, Morten Bonde

    This paper describes the new kind of data-work involved in developing data-driven healthcare based on two cases from Denmark: The first case concerns a governance infrastructure based on Diagnosis-Related Groups (DRG), which was introduced in Denmark in the 1990s. The DRG-system links healthcare activity and financing and relies on extensive data entry, reporting and calculations. This has required the development of new skills, work and work roles. The second case concerns a New Governance project aimed at developing new performance indicators for healthcare delivery as an alternative to DRG. Here, a core challenge is to select indicators and actually be able to acquire data on them. The two cases point out that data-driven healthcare requires more and new kinds of work for which new skills, functions and work roles have to be developed.

  16. Data driven marketing for dummies

    CERN Document Server

    Semmelroth, David

    2013-01-01

    Embrace data and use it to sell and market your products Data is everywhere and it keeps growing and accumulating. Companies need to embrace big data and make it work harder to help them sell and market their products. Successful data analysis can help marketing professionals spot sales trends, develop smarter marketing campaigns, and accurately predict customer loyalty. Data Driven Marketing For Dummies helps companies use all the data at their disposal to make current customers more satisfied, reach new customers, and sell to their most important customer segments more efficiently. Identifyi

  17. Dynamic Data-Driven UAV Network for Plume Characterization

    Science.gov (United States)

    2016-05-23

    AFRL-AFOSR-VA-TR-2016-0203. Dynamic Data-Driven UAV Network for Plume Characterization. Kamran Mohseni, University of Florida. Final Report, 05/23/2016. Grant number FA9550-13-1-0090. ...studied a dynamic data driven (DDD) approach to operation of a heterogeneous team of unmanned aerial vehicles (UAVs) or micro/miniature aerial

  18. Evaluating the Performance of Wavelet-based Data-driven Models for Multistep-ahead Flood Forecasting in an Urbanized Watershed

    Science.gov (United States)

    Kasaee Roodsari, B.; Chandler, D. G.

    2015-12-01

    A real-time flood forecast system is presented to provide emergency management authorities sufficient lead time to execute plans for evacuation and asset protection in urban watersheds. This study investigates the performance of two hybrid models for real-time flood forecasting at different subcatchments of Ley Creek watershed, a heavily urbanized watershed in the vicinity of Syracuse, New York. Hybrid models include Wavelet-Based Artificial Neural Network (WANN) and Wavelet-Based Adaptive Neuro-Fuzzy Inference System (WANFIS). Both models are developed on the basis of real-time stream network sensing. The wavelet approach is applied to decompose the collected water depth time series into Approximation and Detail components. The Approximation component is then used as an input to ANN and ANFIS models to forecast water level at lead times of 1 to 10 hours. The performance of the WANN and WANFIS models is compared to that of the ANN and ANFIS models for different lead times. Initial results demonstrated greater predictive power of hybrid models.
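    The decomposition step described above can be sketched as follows. This is a minimal illustration only: the abstract does not name the wavelet family or the number of decomposition levels, so a single-level Haar transform is assumed here, and the water depth values are made up.

```python
import math

def haar_step(signal):
    """One level of the Haar discrete wavelet transform.

    Returns (approximation, detail). The input length must be even.
    The Haar wavelet is an assumption; the abstract does not name one.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    pairs = list(zip(signal[0::2], signal[1::2]))
    approximation = [(a + b) / s for a, b in pairs]  # smoothed trend
    detail = [(a - b) / s for a, b in pairs]         # local fluctuations
    return approximation, detail

def haar_inverse(approximation, detail):
    """Rebuild the original series from one level of Haar coefficients."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approximation, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out

# Hypothetical water depth readings in metres (not from the study).
depth = [0.50, 0.52, 0.61, 0.75, 1.10, 1.32, 1.05, 0.90]
approx, detail = haar_step(depth)
```

    In a setup like the one in the abstract, `approx` would be the series passed to the forecasting model, while `detail` carries the high-frequency part that is left out.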

  19. Data-driven execution of fast multipole methods

    KAUST Repository

    Ltaief, Hatem

    2013-09-17

    Fast multipole methods (FMMs) have O(N) complexity, are compute bound, and require very little synchronization, which makes them a favorable algorithm on next-generation supercomputers. Their most common application is to accelerate N-body problems, but they can also be used to solve boundary integral equations. When the particle distribution is irregular and the tree structure is adaptive, load balancing becomes a non-trivial question. A common strategy for load balancing FMMs is to use the work load from the previous step as weights to statically repartition the next step. The authors discuss in the paper another approach based on data-driven execution to efficiently tackle this challenging load balancing problem. The core idea consists of breaking the most time-consuming stages of the FMMs into smaller tasks. The algorithm can then be represented as a directed acyclic graph where nodes represent tasks and edges represent dependencies among them. The execution of the algorithm is performed by asynchronously scheduling the tasks using the QUARK (QUeueing And Runtime for Kernels) runtime environment, such that data dependencies are not violated, which preserves numerical correctness. This asynchronous scheduling results in an out-of-order execution. The data-driven FMM execution outperforms the previous strategy and shows linear speedup on a quad-socket quad-core Intel Xeon system. Copyright © 2013 John Wiley & Sons, Ltd.
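    The idea of running a task graph as soon as dependencies allow can be sketched with the Python standard library. This is not the QUARK runtime used in the paper; `run_dag` is a hypothetical helper, and the dependency chain between the FMM kernels below is a simplified illustration rather than the paper's actual task decomposition.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED
from graphlib import TopologicalSorter  # Python 3.9+

def run_dag(dependencies, work, max_workers=4):
    """Run each task as soon as all of its predecessors have finished.

    dependencies maps a task name to the set of tasks it depends on.
    work maps a task name to a zero-argument callable.
    Returns the order in which tasks completed.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    completed = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        running = {}
        while sorter.is_active():
            # Submit every task whose dependencies are already satisfied.
            for task in sorter.get_ready():
                running[pool.submit(work[task])] = task
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # re-raise any exception from the task
                task = running.pop(future)
                completed.append(task)
                sorter.done(task)  # may unlock dependent tasks
    return completed

# Simplified, illustrative dependency chain between FMM kernels.
deps = {
    "P2M": set(), "M2M": {"P2M"}, "M2L": {"M2M"},
    "L2L": {"M2L"}, "L2P": {"L2L"}, "P2P": set(),
}
log = []
work = {name: (lambda n=name: log.append(n)) for name in deps}
order = run_dag(deps, work)
```

    The completion order can differ between runs (for example, `P2P` may finish before or after `P2M`), which is the out-of-order behaviour the abstract refers to; only the dependency constraints are guaranteed.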

  20. Data-Driven Predictive Direct Load Control of Refrigeration Systems

    DEFF Research Database (Denmark)

    Shafiei, Seyed Ehsan; Knudsen, Torben; Wisniewski, Rafal

    2015-01-01

    A predictive control using subspace identification is applied for the smart grid integration of refrigeration systems under a direct load control scheme. A realistic demand response scenario based on regulation of the electrical power consumption is considered. A receding horizon optimal control...... is proposed to fulfil two important objectives: to secure high coefficient of performance and to participate in power consumption management. Moreover, a new method for design of input signals for system identification is put forward. The control method is fully data driven without an explicit use of model...... against real data. The performance improvement results in a 22% reduction in the energy consumption. A comparative simulation is accomplished showing the superiority of the method over the existing approaches in terms of the load following performance....

  1. Gold predictivity mapping in French Guiana using an expert-guided data-driven approach based on a regional-scale GIS

    Science.gov (United States)

    Cassard, Daniel; Billa, Mario; Lambert, Alain; Picot, Jean-Claude; Husson, Yves

    2008-05-01

    The realistic estimation of gold mining in French Guiana requires including the numerous illegal gold washing activities in predictivity mapping. The combination of a classical approach, based on the algebraic method of Knox-Robinson and Groves, with innovative processing of grid-type geochemical and radiometric data, as well as a cluster analysis technique, provides a better understanding of the structure of the studied mineralized areas.

  2. Data-driven workflows for microservices

    DEFF Research Database (Denmark)

    Safina, Larisa; Mazzara, Manuel; Montesi, Fabrizio

    2016-01-01

    Microservices is an architectural style inspired by service-oriented computing that has recently started gaining popularity. Jolie is a programming language based on the microservices paradigm: the main building blocks of Jolie systems are services, in contrast to, e.g., functions or objects....... The primitives offered by the Jolie language elicit many of the recurring patterns found in microservices, like load balancers and structured processes. However, Jolie still lacks some useful constructs for dealing with message types and data manipulation that are present in service-oriented computing......). We show the impact of our implementation on some of the typical scenarios found in microservice systems. This shows how computation can move from a process-driven to a data-driven approach, and leads to the preliminary identification of recurring communication patterns that can be shaped as design...

  3. Older but still fluent? Insights from the intrinsically active baseline configuration of the aging brain using a data driven graph-theoretical approach.

    Science.gov (United States)

    Muller, Angela M; Mérillat, Susan; Jäncke, Lutz

    2016-02-15

    A major part of our knowledge about the functioning of the aging brain comes from task-induced activation paradigms. However, the aging brain's intrinsic functional organization may already be a limiting factor for the outcome of an actual behavior. In order to get a better understanding of how this functional baseline configuration of the aging brain may affect cognitive performance, we analyzed task-free fMRI data of 186 older participants (mean age=70.4, 97 female) and their performance data in verbal fluency: First, we conducted an intrinsic connectivity contrast analysis (ICC) for the purpose of evaluating the brain regions whose degree of connectedness was significantly correlated with fluency performance. Secondly, using connectivity analyses we investigated how the clusters from the ICC functionally related to the other major resting-state networks. Apart from the importance of intact fronto-parietal long-range connections, the preserved capacity of the DMN for a finely attuned interaction with the executive-control network and the language network seems to be crucial for successful verbal fluency performance in older people. We provide further evidence that the right frontal regions might be more prominently affected by age-related decline. Copyright © 2015 Elsevier Inc. All rights reserved.

  4. A data-driven approach to estimating the number of clusters in hierarchical clustering [version 1; referees: 2 approved, 1 approved with reservations]

    Directory of Open Access Journals (Sweden)

    Antoine E. Zambelli

    2016-12-01

    DNA microarray and gene expression problems often require a researcher to perform clustering on their data in a bid to better understand its structure. In cases where the number of clusters is not known, one can resort to hierarchical clustering methods. However, there currently exist very few automated algorithms for determining the true number of clusters in the data. We propose two new methods (mode and maximum difference) for estimating the number of clusters in a hierarchical clustering framework to create a fully automated process with no human intervention. These methods are compared to the established elbow and gap statistic algorithms using simulated datasets and the Biobase Gene ExpressionSet. We also explore a data mixing procedure inspired by cross validation techniques. We find that the overall performance of the maximum difference method is comparable to or greater than that of the gap statistic in multi-cluster scenarios, and achieves that performance at a fraction of the computational cost. This method also responds well to our mixing procedure, which opens the door to future research. We conclude that both the mode and maximum difference methods warrant further study related to their mixing and cross-validation potential. We particularly recommend the use of the maximum difference method in multi-cluster scenarios given its accuracy and execution times, and present it as an alternative to existing algorithms.
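    The abstract does not define its two methods, so the sketch below should not be read as the authors' algorithm. It shows one common heuristic in the same spirit, assumed here for illustration: cut the dendrogram where the gap between successive merge heights is largest. The function name and the example heights are hypothetical.

```python
def estimate_clusters_by_largest_gap(merge_heights):
    """Estimate the number of clusters from dendrogram merge heights.

    merge_heights lists the distance of each successive merge in
    ascending order, so n points give n - 1 heights. The cut is placed
    in the largest gap between consecutive heights (an assumed
    heuristic, not necessarily the paper's maximum difference method).
    """
    n_points = len(merge_heights) + 1
    if n_points < 3:
        raise ValueError("need at least two merges to compare gaps")
    gaps = [later - earlier
            for earlier, later in zip(merge_heights, merge_heights[1:])]
    i = max(range(len(gaps)), key=gaps.__getitem__)
    # Cutting between merge i and merge i + 1 keeps the first i + 1 merges,
    # and each merge reduces the cluster count by one.
    return n_points - (i + 1)

# Six points forming two tight groups: four cheap merges, one expensive one.
heights = [0.1, 0.1, 0.2, 0.2, 5.0]
k = estimate_clusters_by_largest_gap(heights)
```

    A rule of this kind needs only the merge heights that hierarchical clustering already produces, which is consistent with the low computational cost the abstract reports relative to resampling-based criteria such as the gap statistic.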

  5. NEBULAS A High Performance Data-Driven Event-Building Architecture based on an Asynchronous Self-Routing Packet-Switching Network

    CERN Multimedia

    Costa, M; Letheren, M; Djidi, K; Gustafsson, L; Lazraq, T; Minerskjold, M; Tenhunen, H; Manabe, A; Nomachi, M; Watase, Y

    2002-01-01

    RD31 : The project is evaluating a new approach to event building for level-two and level-three processor farms at high rate experiments. It is based on the use of commercial switching fabrics to replace the traditional bus-based architectures used in most previous data acquisition systems. Switching fabrics permit the construction of parallel, expandable, hardware-driven event builders that can deliver higher aggregate throughput than the bus-based architectures. A standard industrial switching fabric technology is being evaluated. It is based on Asynchronous Transfer Mode (ATM) packet-switching network technology. Commercial, expandable ATM switching fabrics and processor interfaces, now being developed for the future Broadband ISDN infrastructure, could form the basis of an implementation. The goals of the project are to demonstrate the viability of this approach, to evaluate the trade-offs involved in make versus buy options, to study the interfacing of the physics frontend data buffers to such a fabric, a...

  6. Data-driven architectural production and operation

    NARCIS (Netherlands)

    Bier, H.H.; Mostafavi, S.

    2014-01-01

    Data-driven architectural production and operation as explored within Hyperbody rely heavily on system thinking implying that all parts of a system are to be understood in relation to each other. These relations are increasingly established bi-directionally so that data-driven architecture is not

  7. A Data-Driven Approach to Develop Physically Sound Predictors: Application to Depth-Averaged Velocities and Drag Coefficients on Vegetated Flows

    Science.gov (United States)

    Tinoco, R. O.; Goldstein, E. B.; Coco, G.

    2016-12-01

    We use a machine learning approach to seek accurate, physically sound predictors, to estimate two relevant flow parameters for open-channel vegetated flows: mean velocities and drag coefficients. A genetic programming algorithm is used to find a robust relationship between properties of the vegetation and flow parameters. We use data published from several laboratory experiments covering a broad range of conditions to obtain: a) in the case of mean flow, an equation that matches the accuracy of other predictors from recent literature while showing a less complex structure, and b) for drag coefficients, a predictor that relies on both single element and array parameters. We investigate different criteria for dataset size and data selection to evaluate their impact on the resulting predictor, as well as simple strategies to obtain only dimensionally consistent equations, and avoid the need for dimensional coefficients. The results show that a proper methodology can deliver physically sound models representative of the processes involved, such that genetic programming and machine learning techniques can be used as powerful tools to study complicated phenomena and develop not only purely empirical, but "hybrid" models, coupling results from machine learning methodologies into physics-based models.

  8. Data Driven Economic Model Predictive Control

    Directory of Open Access Journals (Sweden)

    Masoud Kheradmandi

    2018-04-01

    This manuscript addresses the problem of data driven model based economic model predictive control (MPC) design. To this end, first, a data-driven Lyapunov-based MPC is designed, and shown to be capable of stabilizing a system at an unstable equilibrium point. The data driven Lyapunov-based MPC utilizes a linear time invariant (LTI) model cognizant of the fact that the training data, owing to the unstable nature of the equilibrium point, has to be obtained from closed-loop operation or experiments. Simulation results are first presented demonstrating closed-loop stability under the proposed data-driven Lyapunov-based MPC. The underlying data-driven model is then utilized as the basis to design an economic MPC. The economic improvements yielded by the proposed method are illustrated through simulations on a nonlinear chemical process system example.

  9. Data-driven approach to detect common copy-number variations and frequency profiles in a population-based Korean cohort.

    Science.gov (United States)

    Moon, Sanghoon; Kim, Young Jin; Hong, Chang Bum; Kim, Dong-Joon; Lee, Jong-Young; Kim, Bong-Jo

    2011-11-01

    To date, hundreds of thousands of copy-number variation (CNV) data have been reported using various platforms. The proportion of Asians in these data is, however, relatively small as compared with that of other ethnic groups, such as Caucasians and Yorubas. Because of limitations in platform resolution and the high noise level in signal intensity, in most CNV studies (particularly those using single nucleotide polymorphism arrays), the average number of CNVs in an individual is less than the number of known CNVs. In this study, we ascertained reliable, common CNV regions (CNVRs) and identified actual frequency rates in the Korean population to provide more CNV information. We performed two-stage analyses for detecting structural variations with two platforms. We discovered 576 common CNVRs (88 CNV segments on average in an individual), and 87% (501 of 576) of these CNVRs overlapped by ≥1 bp with previously validated CNV events. Interestingly, from the frequency analysis of CNV profiles, 52 of 576 CNVRs had a frequency rate of population.

  10. Data-Driven Problems in Elasticity

    Science.gov (United States)

    Conti, S.; Müller, S.; Ortiz, M.

    2018-01-01

    We consider a new class of problems in elasticity, referred to as Data-Driven problems, defined on the space of strain-stress field pairs, or phase space. The problem consists of minimizing the distance between a given material data set and the subspace of compatible strain fields and stress fields in equilibrium. We find that the classical solutions are recovered in the case of linear elasticity. We identify conditions for convergence of Data-Driven solutions corresponding to sequences of approximating material data sets. Specialization to constant material data set sequences in turn establishes an appropriate notion of relaxation. We find that relaxation within this Data-Driven framework is fundamentally different from the classical relaxation of energy functions. For instance, we show that in the Data-Driven framework the relaxation of a bistable material leads to material data sets that are not graphs.
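    The minimization described above can be written compactly as follows. The symbols are chosen here for illustration and are not taken from the abstract: z = (ε, σ) is a strain-stress field pair in phase space, D is the material data set, E is the set of pairs with compatible strain and equilibrated stress, and the norm on phase space is left unspecified.

```latex
\min_{z \in E} \; d(z, D),
\qquad
d(z, D) = \inf_{y \in D} \lVert z - y \rVert ,
\qquad
z = (\varepsilon, \sigma).
```

    Under this reading, a classical constitutive law corresponds to the special case in which D is the graph of a stress-strain relation, which is consistent with the abstract's remark that relaxation can produce data sets that are not graphs.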

  11. Consistent data-driven computational mechanics

    Science.gov (United States)

    González, D.; Chinesta, F.; Cueto, E.

    2018-05-01

    We present a novel method, within the realm of data-driven computational mechanics, to obtain reliable and thermodynamically sound simulation from experimental data. We thus avoid the need to fit any phenomenological model in the construction of the simulation model. This kind of technique opens unprecedented possibilities in the framework of data-driven application systems and, particularly, in the paradigm of industry 4.0.

  12. Data-Driven Learning: Reasonable Fears and Rational Reassurance

    Science.gov (United States)

    Boulton, Alex

    2009-01-01

    Computer corpora have many potential applications in teaching and learning languages, the most direct of which--when the learners explore a corpus themselves--has become known as data-driven learning (DDL). Despite considerable enthusiasm in the research community and interest in higher education, the approach has not made major inroads to…

  13. Data-Driven Learning of Q-Matrix

    Science.gov (United States)

    Liu, Jingchen; Xu, Gongjun; Ying, Zhiliang

    2012-01-01

    The recent surge of interest in cognitive assessment has led to the development of novel statistical models for diagnostic classification. Central to many such models is the well-known "Q"-matrix, which specifies the item-attribute relationships. This article proposes a data-driven approach to identification of the "Q"-matrix and estimation of…

  14. Fork-join and data-driven execution models on multi-core architectures: Case study of the FMM

    KAUST Repository

    Amer, Abdelhalim

    2013-01-01

    Extracting maximum performance of multi-core architectures is a difficult task primarily due to bandwidth limitations of the memory subsystem and its complex hierarchy. In this work, we study the implications of fork-join and data-driven execution models on this type of architecture at the level of task parallelism. For this purpose, we use a highly optimized fork-join based implementation of the FMM and extend it to a data-driven implementation using a distributed task scheduling approach. This study exposes some limitations of the conventional fork-join implementation in terms of synchronization overheads. We find that these are not negligible and their elimination by the data-driven method, with a careful data locality strategy, was beneficial. Experimental evaluation of both methods on state-of-the-art multi-socket multi-core architectures showed up to 22% speed-ups of the data-driven approach compared to the original method. We demonstrate that a data-driven execution of FMM not only improves performance by avoiding global synchronization overheads but also reduces the memory-bandwidth pressure caused by memory-intensive computations. © 2013 Springer-Verlag.

  15. Data driven modelling of vertical atmospheric radiation

    International Nuclear Information System (INIS)

    Antoch, Jaromir; Hlubinka, Daniel

    2011-01-01

    In the Czech Hydrometeorological Institute (CHMI) there exists a unique set of meteorological measurements consisting of the values of vertical atmospheric levels of beta and gamma radiation. In this paper a stochastic data-driven model based on nonlinear regression and on nonhomogeneous Poisson process is suggested. In the first part of the paper, growth curves were used to establish an appropriate nonlinear regression model. For comparison we considered a nonhomogeneous Poisson process with its intensity based on growth curves. In the second part both approaches were applied to the real data and compared. Computational aspects are briefly discussed as well. The primary goal of this paper is to present an improved understanding of the distribution of environmental radiation as obtained from the measurements of the vertical radioactivity profiles by the radioactivity sonde system. - Highlights: → We model vertical atmospheric levels of beta and gamma radiation. → We suggest appropriate nonlinear regression model based on growth curves. → We compare nonlinear regression modelling with Poisson process based modeling. → We apply both models to the real data.

  16. The Structural Consequences of Big Data-Driven Education.

    Science.gov (United States)

    Zeide, Elana

    2017-06-01

    Educators and commenters who evaluate big data-driven learning environments focus on specific questions: whether automated education platforms improve learning outcomes, invade student privacy, and promote equality. This article puts aside separate unresolved (and perhaps unresolvable) issues regarding the concrete effects of specific technologies. It instead examines how big data-driven tools alter the structure of schools' pedagogical decision-making, and, in doing so, change fundamental aspects of America's education enterprise. Technological mediation and data-driven decision-making have a particularly significant impact in learning environments because the education process primarily consists of dynamic information exchange. In this overview, I highlight three significant structural shifts that accompany school reliance on data-driven instructional platforms that perform core school functions: teaching, assessment, and credentialing. First, virtual learning environments create information technology infrastructures featuring constant data collection, continuous algorithmic assessment, and possibly infinite record retention. This undermines the traditional intellectual privacy and safety of classrooms. Second, these systems displace pedagogical decision-making from educators serving public interests to private, often for-profit, technology providers. They constrain teachers' academic autonomy, obscure student evaluation, and reduce parents' and students' ability to participate in or challenge education decision-making. Third, big data-driven tools define what "counts" as education by mapping the concepts, creating the content, determining the metrics, and setting desired learning outcomes of instruction. These shifts cede important decision-making to private entities without public scrutiny or pedagogical examination. In contrast to the public and heated debates that accompany textbook choices, schools often adopt education technologies ad hoc. Given education

  17. Data-driven modeling of nano-nose gas sensor arrays

    DEFF Research Database (Denmark)

    Alstrøm, Tommy Sonne; Larsen, Jan; Nielsen, Claus Højgård

    2010-01-01

    We present a data-driven approach to classification of Quartz Crystal Microbalance (QCM) sensor data. The sensor is a nano-nose gas sensor that detects concentrations of analytes down to ppm levels using plasma polymerized coatings. Each sensor experiment takes approximately one hour hence...... the number of available training data is limited. We suggest a data-driven classification model which works from few examples. The paper compares a number of data-driven classification and quantification schemes able to detect the gas and the concentration level. The data-driven approaches are based on state...

  18. Data-driven regionalization of housing markets

    NARCIS (Netherlands)

    Helbich, M.; Brunauer, W.; Hagenauer, J.; Leitner, M.

    2013-01-01

    This article presents a data-driven framework for housing market segmentation. Local marginal house price surfaces are investigated by means of mixed geographically weighted regression and are reduced to a set of principal component maps, which in turn serve as input for spatial regionalization. The

  19. Data Driven Constraints for the SVM

    DEFF Research Database (Denmark)

    Darkner, Sune; Clemmensen, Line Katrine Harder

    2012-01-01

    We propose a generalized data driven constraint for support vector machines exemplified by classification of paired observations in general and specifically on the human ear canal. This is particularly interesting in dynamic cases such as tissue movement or pathologies developing over time. Assum...

  20. A data-driven framework for investigating customer retention

    OpenAIRE

    Mgbemena, Chidozie Simon

    2016-01-01

    This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University London. This study presents a data-driven simulation framework in order to understand customer behaviour and therefore improve customer retention. The overarching system design methodology used for this study is aligned with the design science paradigm. The Social Media Domain Analysis (SoMeDoA) approach is adopted and evaluated to build a model on the determinants of customer satisfaction ...

  1. Data-driven non-linear elasticity: constitutive manifold construction and problem discretization

    Science.gov (United States)

    Ibañez, Ruben; Borzacchiello, Domenico; Aguado, Jose Vicente; Abisset-Chavanne, Emmanuelle; Cueto, Elias; Ladeveze, Pierre; Chinesta, Francisco

    2017-11-01

    The use of constitutive equations calibrated from data has been implemented into standard numerical solvers for successfully addressing a variety of problems encountered in simulation-based engineering sciences (SBES). However, complexity keeps increasing due to the need for increasingly detailed models as well as the use of engineered materials. Data-Driven simulation constitutes a potential change of paradigm in SBES. Standard simulation in computational mechanics is based on the use of two very different types of equations. The first one, of axiomatic character, is related to balance laws (momentum, mass, energy, ...), whereas the second one consists of models that scientists have extracted from collected, either natural or synthetic, data. Data-driven (or data-intensive) simulation consists of directly linking experimental data to computers in order to perform numerical simulations. These simulations will employ laws, universally recognized as epistemic, while minimizing the need of explicit, often phenomenological, models. The main drawback of such an approach is the large amount of required data, some of which is inaccessible with today's testing facilities. Such difficulty can be circumvented in many cases, and in any case alleviated, by considering complex tests, collecting as much data as possible and then using a data-driven inverse approach in order to generate the whole constitutive manifold from a few complex experimental tests, as discussed in the present work.

  2. External radioactive markers for PET data-driven respiratory gating in positron emission tomography.

    Science.gov (United States)

    Büther, Florian; Ernst, Iris; Hamill, James; Eich, Hans T; Schober, Otmar; Schäfers, Michael; Schäfers, Klaus P

    2013-04-01

    Respiratory gating is an established approach to overcoming respiration-induced image artefacts in PET. Of special interest in this respect are raw PET data-driven gating methods which do not require additional hardware to acquire respiratory signals during the scan. However, these methods rely heavily on the quality of the acquired PET data (statistical properties, data contrast, etc.). We therefore combined external radioactive markers with data-driven respiratory gating in PET/CT. The feasibility and accuracy of this approach was studied for [(18)F]FDG PET/CT imaging in patients with malignant liver and lung lesions. PET data from 30 patients with abdominal or thoracic [(18)F]FDG-positive lesions (primary tumours or metastases) were included in this prospective study. The patients underwent a 10-min list-mode PET scan with a single bed position following a standard clinical whole-body [(18)F]FDG PET/CT scan. During this scan, one to three radioactive point sources (either (22)Na or (18)F, 50-100 kBq) in a dedicated holder were attached to the patient's abdomen. The list mode data acquired were retrospectively analysed for respiratory signals using established data-driven gating approaches and additionally by tracking the motion of the point sources in sinogram space. Gated reconstructions were examined qualitatively, in terms of the amount of respiratory displacement and in respect of changes in local image intensity in the gated images. The presence of the external markers did not affect whole-body PET/CT image quality. Tracking of the markers led to characteristic respiratory curves in all patients. Applying these curves for gated reconstructions resulted in images in which motion was well resolved. Quantitatively, the performance of the external marker-based approach was similar to that of the best intrinsic data-driven methods. Overall, the gain in measured tumour uptake from the nongated to the gated images indicating successful removal of respiratory motion

  3. Data Driven Tuning of Inventory Controllers

    DEFF Research Database (Denmark)

    Huusom, Jakob Kjøbsted; Santacoloma, Paloma Andrade; Poulsen, Niels Kjølstad

    2007-01-01

    A systematic method for criterion based tuning of inventory controllers based on data-driven iterative feedback tuning is presented. This tuning method circumvents problems with modeling bias. The process model used for the design of the inventory control is utilized in the tuning...... as an approximation to reduce time required on experiments. The method is illustrated in an application with a multivariable inventory control implementation on a four tank system....

  4. Data-Driven H∞ Control for Nonlinear Distributed Parameter Systems.

    Science.gov (United States)

    Luo, Biao; Huang, Tingwen; Wu, Huai-Ning; Yang, Xiong

    2015-11-01

    The data-driven H∞ control problem of nonlinear distributed parameter systems is considered in this paper. An off-policy learning method is developed to learn the H∞ control policy from real system data rather than the mathematical model. First, Karhunen-Loève decomposition is used to compute the empirical eigenfunctions, which are then employed to derive a reduced-order model (ROM) of the slow subsystem based on the singular perturbation theory. The H∞ control problem is reformulated based on the ROM, which can be transformed to solve the Hamilton-Jacobi-Isaacs (HJI) equation, theoretically. To learn the solution of the HJI equation from real system data, a data-driven off-policy learning approach is proposed based on the simultaneous policy update algorithm and its convergence is proved. For implementation purposes, a neural network (NN)-based action-critic structure is developed, where a critic NN and two action NNs are employed to approximate the value function, control, and disturbance policies, respectively. Subsequently, a least-square NN weight-tuning rule is derived with the method of weighted residuals. Finally, the developed data-driven off-policy learning approach is applied to a nonlinear diffusion-reaction process, and the obtained results demonstrate its effectiveness.

  5. Architectural Strategies for Enabling Data-Driven Science at Scale

    Science.gov (United States)

    Crichton, D. J.; Law, E. S.; Doyle, R. J.; Little, M. M.

    2017-12-01

    The analysis of large data collections from NASA or other agencies is often executed through traditional computational and data analysis approaches, which require users to bring data to their desktops and perform local data analysis. Alternatively, data are hauled to large computational environments that provide centralized data analysis via traditional High Performance Computing (HPC). Scientific data archives, however, are not only growing massive, but are also becoming highly distributed. Neither traditional approach provides a good solution for optimizing analysis into the future. Assumptions across the NASA mission and science data lifecycle, which historically assume that all data can be collected, transmitted, processed, and archived, will not scale as more capable instruments stress legacy-based systems. New paradigms are needed to increase the productivity and effectiveness of scientific data analysis. This paradigm must recognize that architectural and analytical choices are interrelated, and must be carefully coordinated in any system that aims to allow efficient, interactive scientific exploration and discovery to exploit massive data collections, from point of collection (e.g., onboard) to analysis and decision support. The most effective approach to analyzing a distributed set of massive data may involve some exploration and iteration, putting a premium on the flexibility afforded by the architectural framework. The framework should enable scientist users to assemble workflows efficiently, manage the uncertainties related to data analysis and inference, and optimize deep-dive analytics to enhance scalability. In many cases, this "data ecosystem" needs to be able to integrate multiple observing assets, ground environments, archives, and analytics, evolving from stewardship of measurements of data to using computational methodologies to better derive insight from the data that may be fused with other sets of data. This presentation will discuss

  6. Ensemble of data-driven prognostic algorithms for robust prediction of remaining useful life

    International Nuclear Information System (INIS)

    Hu Chao; Youn, Byeng D.; Wang Pingfeng; Taek Yoon, Joung

    2012-01-01

    Prognostics aims at determining whether a failure of an engineered system (e.g., a nuclear power plant) is impending and estimating the remaining useful life (RUL) before the failure occurs. The traditional data-driven prognostic approach is to construct multiple candidate algorithms using a training data set, evaluate their respective performance using a testing data set, and select the one with the best performance while discarding all the others. This approach has three shortcomings: (i) the selected standalone algorithm may not be robust; (ii) it wastes the resources for constructing the algorithms that are discarded; (iii) it requires the testing data in addition to the training data. To overcome these drawbacks, this paper proposes an ensemble data-driven prognostic approach which combines multiple member algorithms with a weighted-sum formulation. Three weighting schemes, namely the accuracy-based weighting, diversity-based weighting and optimization-based weighting, are proposed to determine the weights of member algorithms. The k-fold cross validation (CV) is employed to estimate the prediction error required by the weighting schemes. The results obtained from three case studies suggest that the ensemble approach with any weighting scheme gives more accurate RUL predictions compared to any sole algorithm when member algorithms producing diverse RUL predictions have comparable prediction accuracy and that the optimization-based weighting scheme gives the best overall performance among the three weighting schemes.
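    The weighted-sum combination described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' formulation: it assumes that accuracy-based weighting means weights proportional to the inverse of each member's cross-validation error, and the function names and numbers are hypothetical.

```python
def accuracy_based_weights(cv_errors):
    """Return normalized weights, assuming weight is proportional to 1 / CV error.

    The inverse-error rule is an assumption made for illustration; the
    abstract does not state the exact accuracy-based formula.
    """
    if any(e <= 0 for e in cv_errors):
        raise ValueError("CV errors must be positive")
    inverse = [1.0 / e for e in cv_errors]
    total = sum(inverse)
    return [v / total for v in inverse]


def ensemble_rul(member_predictions, weights):
    """Weighted-sum ensemble prediction of remaining useful life (RUL)."""
    return sum(w * p for w, p in zip(weights, member_predictions))


# Hypothetical k-fold CV errors for three member algorithms.
weights = accuracy_based_weights([2.0, 4.0, 4.0])
# Hypothetical RUL predictions (e.g. in cycles) from the same three members.
rul = ensemble_rul([100.0, 120.0, 80.0], weights)
```

    Members with a lower cross-validation error receive a larger share of the final prediction, and the weights always sum to one.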

  7. Retrospective data-driven respiratory gating for PET/CT

    International Nuclear Information System (INIS)

    Schleyer, Paul J; O'Doherty, Michael J; Barrington, Sally F; Marsden, Paul K

    2009-01-01

    Respiratory motion can adversely affect both PET and CT acquisitions. Respiratory gating allows an acquisition to be divided into a series of motion-reduced bins according to the respiratory signal, which is typically hardware acquired. In order that the effects of motion can potentially be corrected for, we have developed a novel, automatic, data-driven gating method which retrospectively derives the respiratory signal from the acquired PET and CT data. PET data are acquired in listmode and analysed in sinogram space, and CT data are acquired in cine mode and analysed in image space. Spectral analysis is used to identify regions within the CT and PET data which are subject to respiratory motion, and the variation of counts within these regions is used to estimate the respiratory signal. Amplitude binning is then used to create motion-reduced PET and CT frames. The method was demonstrated with four patient datasets acquired on a 4-slice PET/CT system. To assess the accuracy of the data-derived respiratory signal, a hardware-based signal was acquired for comparison. Data-driven gating was successfully performed on PET and CT datasets for all four patients. Gated images demonstrated respiratory motion throughout the bin sequences for all PET and CT series, and image analysis and direct comparison of the traces derived from the data-driven method with the hardware-acquired traces indicated accurate recovery of the respiratory signal.
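    The amplitude binning step mentioned in this abstract can be illustrated with a small sketch. It assumes equal-width amplitude bins between the minimum and maximum of the respiratory signal; the abstract does not say how bin edges were chosen, and the function names and sample values are hypothetical.

```python
from collections import defaultdict


def amplitude_bin_indices(signal, n_bins):
    """Assign each sample of a respiratory signal to an equal-width amplitude bin."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        return [0] * len(signal)  # flat signal: everything falls in one bin
    width = (hi - lo) / n_bins
    # Clamp so the maximum amplitude lands in the last bin.
    return [min(int((s - lo) / width), n_bins - 1) for s in signal]


def group_frames(bin_indices):
    """Collect the frame numbers that belong to each amplitude bin."""
    groups = defaultdict(list)
    for frame, b in enumerate(bin_indices):
        groups[b].append(frame)
    return dict(groups)


# Hypothetical normalized respiratory amplitudes, one per short time frame.
signal = [0.0, 0.5, 1.0, 0.5, 0.0, 0.25, 0.75]
bins = amplitude_bin_indices(signal, n_bins=4)
frames_by_bin = group_frames(bins)
```

    Frames that share a bin have a similar respiratory amplitude, so summing them would give one motion-reduced frame per bin.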

  8. Data-driven motion correction in brain SPECT

    International Nuclear Information System (INIS)

    Kyme, A.Z.; Hutton, B.F.; Hatton, R.L.; Skerrett, D.W.

    2002-01-01

    Patient motion can cause image artifacts in SPECT despite restraining measures. Data-driven detection and correction of motion can be achieved by comparison of acquired data with the forward-projections. By optimising the orientation of the reconstruction, parameters can be obtained for each misaligned projection and applied to update this volume using a 3D reconstruction algorithm. Digital and physical phantom validation was performed to investigate this approach. Noisy projection data simulating at least one fully 3D patient head movement during acquisition were constructed by projecting the digital Huffman brain phantom at various orientations. Motion correction was applied to the reconstructed studies. The importance of including attenuation effects in the estimation of motion and the need for implementing an iterated correction were assessed in the process. Correction success was assessed visually for artifact reduction, and quantitatively using a mean square difference (MSD) measure. Physical Huffman phantom studies with deliberate movements introduced during the acquisition were also acquired and motion corrected. Effective artifact reduction in the simulated corrupt studies was achieved by motion correction. Typically the MSD ratio between the corrected and reference studies compared to the corrupted and reference studies was > 2. Motion correction could be achieved without inclusion of attenuation effects in the motion estimation stage, providing simpler implementation and greater efficiency. Moreover the additional improvement with multiple iterations of the approach was small. Improvement was also observed in the physical phantom data, though the technique appeared limited here by an object symmetry. Copyright (2002) The Australian and New Zealand Society of Nuclear Medicine Inc

  9. Data-Driven Cyber-Physical Systems via Real-Time Stream Analytics and Machine Learning

    OpenAIRE

    Akkaya, Ilge

    2016-01-01

    Emerging distributed cyber-physical systems (CPSs) integrate a wide range of heterogeneous components that need to be orchestrated in a dynamic environment. While model-based techniques are commonly used in CPS design, they become inadequate in capturing the complexity as systems become larger and extremely dynamic. The adaptive nature of the systems makes data-driven approaches highly desirable, if not necessary. Traditionally, data-driven systems utilize large volumes of static data sets t...

  10. Dynamically adaptive data-driven simulation of extreme hydrological flows

    Science.gov (United States)

    Kumar Jain, Pushkar; Mandli, Kyle; Hoteit, Ibrahim; Knio, Omar; Dawson, Clint

    2018-02-01

    Hydrological hazards such as storm surges, tsunamis, and rainfall-induced flooding are physically complex events that are costly in loss of human life and economic productivity. Many such disasters could be mitigated through improved emergency evacuation in real-time and through the development of resilient infrastructure based on knowledge of how systems respond to extreme events. Data-driven computational modeling is a critical technology underpinning these efforts. This investigation focuses on the novel combination of methodologies in forward simulation and data assimilation. The forward geophysical model utilizes adaptive mesh refinement (AMR), a process by which a computational mesh can adapt in time and space based on the current state of a simulation. The forward solution is combined with ensemble based data assimilation methods, whereby observations from an event are assimilated into the forward simulation to improve the veracity of the solution, or used to invert for uncertain physical parameters. The novelty in our approach is the tight two-way coupling of AMR and ensemble filtering techniques. The technology is tested using actual data from the Chile tsunami event of February 27, 2010. These advances offer the promise of significantly transforming data-driven, real-time modeling of hydrological hazards, with potentially broader applications in other science domains.

  11. Dynamically adaptive data-driven simulation of extreme hydrological flows

    KAUST Repository

    Kumar Jain, Pushkar

    2017-12-27

    Hydrological hazards such as storm surges, tsunamis, and rainfall-induced flooding are physically complex events that are costly in loss of human life and economic productivity. Many such disasters could be mitigated through improved emergency evacuation in real-time and through the development of resilient infrastructure based on knowledge of how systems respond to extreme events. Data-driven computational modeling is a critical technology underpinning these efforts. This investigation focuses on the novel combination of methodologies in forward simulation and data assimilation. The forward geophysical model utilizes adaptive mesh refinement (AMR), a process by which a computational mesh can adapt in time and space based on the current state of a simulation. The forward solution is combined with ensemble based data assimilation methods, whereby observations from an event are assimilated into the forward simulation to improve the veracity of the solution, or used to invert for uncertain physical parameters. The novelty in our approach is the tight two-way coupling of AMR and ensemble filtering techniques. The technology is tested using actual data from the Chile tsunami event of February 27, 2010. These advances offer the promise of significantly transforming data-driven, real-time modeling of hydrological hazards, with potentially broader applications in other science domains.

  12. Data driven propulsion system weight prediction model

    Science.gov (United States)

    Gerth, Richard J.

    1994-10-01

    The objective of the research was to develop a method to predict the weight of paper engines, i.e., engines that are in the early stages of development. The impetus for the project was the Single Stage To Orbit (SSTO) project, where engineers need to evaluate alternative engine designs. Since the SSTO is a performance-driven project, the performance models for alternative designs were well understood. The next tradeoff is weight. Since it is known that engine weight varies with thrust levels, a model is required that would allow discrimination between engines that produce the same thrust. Above all, the model had to be rooted in data with assumptions that could be justified based on the data. The general approach was to collect data on as many existing engines as possible and build a statistical model of the engine's weight as a function of various component performance parameters. This was considered a reasonable level to begin the project because the data would be readily available, and it would be at the level of most paper engines, prior to detailed component design.

  13. Data driven CAN node reliability assessment for manufacturing system

    Science.gov (United States)

    Zhang, Leiming; Yuan, Yong; Lei, Yong

    2017-01-01

    The reliability of the Controller Area Network (CAN) is critical to the performance and safety of the system. However, direct bus-off time assessment tools are lacking in practice due to inaccessibility of the node information and the complexity of the node interactions upon errors. In order to measure the mean time to bus-off (MTTB) of all the nodes, a novel data-driven node bus-off time assessment method for CAN networks is proposed that directly uses network error information. First, the corresponding network error event sequence for each node is constructed using multiple-layer network error information. Then, the generalized zero-inflated Poisson process (GZIP) model is established for each node based on the error event sequence. Finally, the stochastic model is constructed to predict the MTTB of the node. Accelerated case studies with different error injection rates are conducted on a laboratory network to demonstrate the proposed method, where the network errors are generated by a computer-controlled error injection system. Experimental results show that the MTTB of nodes predicted by the proposed method agrees well with observations in the case studies. The proposed data-driven node time to bus-off assessment method for CAN networks can successfully predict the MTTB of nodes by directly using network error event data.
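    For readers unfamiliar with zero-inflated counts, the sketch below shows the standard zero-inflated Poisson (ZIP) distribution, in which an extra probability mass at zero is mixed with an ordinary Poisson count. It is only background: the generalized process model (GZIP) used in the paper is not specified in this abstract, and the parameter names `pi_zero` and `lam` are illustrative.

```python
import math


def zip_pmf(k, pi_zero, lam):
    """Probability of observing k events under a standard ZIP distribution.

    pi_zero: probability of a structural zero (no error can occur).
    lam:     rate of the Poisson component.
    """
    if k < 0:
        return 0.0
    poisson = math.exp(-lam) * lam**k / math.factorial(k)
    if k == 0:
        return pi_zero + (1.0 - pi_zero) * poisson
    return (1.0 - pi_zero) * poisson


def zip_mean(pi_zero, lam):
    """Expected number of events under the ZIP distribution."""
    return (1.0 - pi_zero) * lam


# Sanity check with hypothetical parameters: the probabilities sum to one.
total = sum(zip_pmf(k, 0.3, 2.0) for k in range(60))
```

    Compared with a plain Poisson model of the same rate, the ZIP model puts more probability on intervals with no error events at all, which matters when most observation windows on a network are error-free.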

  14. Minimization of energy consumption in HVAC systems with data-driven models and an interior-point method

    International Nuclear Information System (INIS)

    Kusiak, Andrew; Xu, Guanglin; Zhang, Zijun

    2014-01-01

    Highlights: • We study the energy saving of HVAC systems with a data-driven approach. • We conduct an in-depth analysis of the topology of developed Neural Network based HVAC model. • We apply interior-point method to solving a Neural Network based HVAC optimization model. • The uncertain building occupancy is incorporated in the minimization of HVAC energy consumption. • A significant potential of saving HVAC energy is discovered. - Abstract: In this paper, a data-driven approach is applied to minimize energy consumption of a heating, ventilating, and air conditioning (HVAC) system while maintaining the thermal comfort of a building with uncertain occupancy level. The uncertainty of arrival and departure rate of occupants is modeled by the Poisson and uniform distributions, respectively. The internal heating gain is calculated from the stochastic process of the building occupancy. Based on the observed and simulated data, a multilayer perceptron algorithm is employed to model and simulate the HVAC system. The data-driven models accurately predict future performance of the HVAC system based on the control settings and the observed historical information. An optimization model is formulated and solved with the interior-point method. The optimization results are compared with the results produced by the simulation models

  15. Data-driven forward model inference for EEG brain imaging

    DEFF Research Database (Denmark)

    Hansen, Sofie Therese; Hauberg, Søren; Hansen, Lars Kai

    2016-01-01

    Electroencephalography (EEG) is a flexible and accessible tool with excellent temporal resolution but with a spatial resolution hampered by volume conduction. Reconstruction of the cortical sources of measured EEG activity partly alleviates this problem and effectively turns EEG into a brain......-of-concept study, we show that, even when anatomical knowledge is unavailable, a suitable forward model can be estimated directly from the EEG. We propose a data-driven approach that provides a low-dimensional parametrization of head geometry and compartment conductivities, built using a corpus of forward models....... Combined with only a recorded EEG signal, we are able to estimate both the brain sources and a person-specific forward model by optimizing this parametrization. We thus not only solve an inverse problem, but also optimize over its specification. Our work demonstrates that personalized EEG brain imaging...

  16. Data-Driven Assistance Functions for Industrial Automation Systems

    International Nuclear Information System (INIS)

    Windmann, Stefan; Niggemann, Oliver

    2015-01-01

    The increasing amount of data in industrial automation systems overburdens the user in process control and diagnosis tasks. One possibility to cope with these challenges consists of using smart assistance systems that automatically monitor and optimize processes. This article deals with aspects of data-driven assistance systems such as assistance functions, process models and data acquisition. The paper describes novel approaches for self-diagnosis and self-optimization, and shows how these assistance functions can be integrated in different industrial environments. The considered assistance functions are based on process models that are automatically learned from process data. Fault detection and isolation is based on the comparison of observations of the real system with predictions obtained by application of the process models. The process models are further employed for energy efficiency optimization of industrial processes. Experimental results are presented for fault detection and energy efficiency optimization of a drive system. (paper)

  17. Data-driven discovery of Koopman eigenfunctions using deep learning

    Science.gov (United States)

    Lusch, Bethany; Brunton, Steven L.; Kutz, J. Nathan

    2017-11-01

    Koopman operator theory transforms any autonomous non-linear dynamical system into an infinite-dimensional linear system. Since linear systems are well-understood, a mapping of non-linear dynamics to linear dynamics provides a powerful approach to understanding and controlling fluid flows. However, finding the correct change of variables remains an open challenge. We present a strategy to discover an approximate mapping using deep learning. Our neural networks find this change of variables, its inverse, and a finite-dimensional linear dynamical system defined on the new variables. Our method is completely data-driven and only requires measurements of the system, i.e. it does not require derivatives or knowledge of the governing equations. We find a minimal set of approximate Koopman eigenfunctions that are sufficient to reconstruct and advance the system to future states. We demonstrate the method on several dynamical systems.
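    The idea of a change of variables that makes nonlinear dynamics linear can be seen on a small textbook-style toy system. The example below is not taken from this abstract and involves no deep learning: the lifting (x1, x2) -> (x1, x2, x1^2) is chosen by hand, and the parameter values are arbitrary.

```python
def nonlinear_step(x1, x2, lam=0.9, mu=0.5):
    """One step of a toy nonlinear map with a quadratic coupling term."""
    return lam * x1, mu * x2 + (lam**2 - mu) * x1**2


def lift(x1, x2):
    """Hand-picked change of variables: append the observable x1^2."""
    return (x1, x2, x1**2)


def linear_step(y, lam=0.9, mu=0.5):
    """The same dynamics written as a linear map on the lifted variables."""
    y1, y2, y3 = y
    return (lam * y1, mu * y2 + (lam**2 - mu) * y3, lam**2 * y3)


x = (1.0, -0.5)
y = lift(*x)
for _ in range(10):
    x = nonlinear_step(*x)  # advance the original nonlinear system
    y = linear_step(y)      # advance the lifted linear system

# The lifted linear trajectory tracks the lifted nonlinear state.
max_gap = max(abs(a - b) for a, b in zip(lift(*x), y))
```

    For this toy system three lifted variables suffice. The abstract's contribution is to learn such a mapping from data when no closed-form choice is known.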

  18. Econophysics and Data Driven Modelling of Market Dynamics

    CERN Document Server

    Aoyama, Hideaki; Chakrabarti, Bikas; Chakraborti, Anirban; Ghosh, Asim

    2015-01-01

    This book presents the works and research findings of physicists, economists, mathematicians, statisticians, and financial engineers who have undertaken data-driven modelling of market dynamics and other empirical studies in the field of Econophysics. During recent decades, the financial market landscape has changed dramatically with the deregulation of markets and the growing complexity of products. The ever-increasing speed and decreasing costs of computational power and networks have led to the emergence of huge databases. The availability of these data should permit the development of models that are better founded empirically, and econophysicists have accordingly been advocating that one should rely primarily on the empirical observations in order to construct models and validate them. The recent turmoil in financial markets and the 2008 crash appear to offer a strong rationale for new models and approaches. The Econophysics community accordingly has an important future role to play in market modelling....

  19. The effects of data-driven learning activities on EFL learners' writing development.

    Science.gov (United States)

    Luo, Qinqin

    2016-01-01

    Data-driven learning has been proved as an effective approach in helping learners solve various writing problems such as correcting lexical or grammatical errors, improving the use of collocations and generating ideas in writing, etc. This article reports on an empirical study in which data-driven learning was accomplished with the assistance of the user-friendly BNCweb, and presents the evaluation of the outcome by comparing the effectiveness of BNCweb and a search engine Baidu which is most commonly used as reference resource by Chinese learners of English as a foreign language. The quantitative results about 48 Chinese college students revealed that the experimental group which used BNCweb performed significantly better in the post-test in terms of writing fluency and accuracy, as compared with the control group which used the search engine Baidu. However, no significant difference was found between the two groups in terms of writing complexity. The qualitative results about the interview revealed that learners generally showed a positive attitude toward the use of BNCweb but there were still some problems of using corpora in the writing process, thus the combined use of corpora and other types of reference resource was suggested as a possible way to counter the potential barriers for Chinese learners of English.

  20. Observer and data-driven model based fault detection in Power Plant Coal Mills

    DEFF Research Database (Denmark)

    Fogh Odgaard, Peter; Lin, Bao; Jørgensen, Sten Bay

    2008-01-01

    model with motor power as the controlled variable, data-driven methods for fault detection are also investigated. Regression models that represent normal operating conditions (NOCs) are developed with both static and dynamic principal component analysis and partial least squares methods. The residual...... between process measurement and the NOC model prediction is used for fault detection. A hybrid approach, where a data-driven model is employed to derive an optimal unknown input observer, is also implemented. The three methods are evaluated with case studies on coal mill data, which includes a fault......This paper presents and compares model-based and data-driven fault detection approaches for coal mill systems. The first approach detects faults with an optimal unknown input observer developed from a simplified energy balance model. Due to the time-consuming effort in developing a first principles...
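    Residual-based detection of the kind described here can be sketched in a few lines. The sketch assumes a simple three-sigma limit on the residual between measurement and model prediction, estimated from fault-free data; the paper's regression models (PCA, PLS) and its actual detection thresholds are not reproduced, and all names and numbers are hypothetical.

```python
import statistics


def fit_noc_limit(noc_residuals, k=3.0):
    """Estimate the residual mean and a k-sigma limit from normal operation data."""
    mean = statistics.fmean(noc_residuals)
    std = statistics.stdev(noc_residuals)
    return mean, k * std


def detect_faults(measured, predicted, mean, limit):
    """Flag samples whose residual deviates from the NOC mean by more than the limit."""
    return [abs((m - p) - mean) > limit for m, p in zip(measured, predicted)]


# Hypothetical residuals recorded under normal operating conditions (NOC).
noc_residuals = [0.1, -0.2, 0.0, 0.15, -0.05, 0.05, -0.1, 0.05]
mean, limit = fit_noc_limit(noc_residuals)

# Hypothetical new measurements against NOC model predictions.
flags = detect_faults([10.0, 10.1, 12.5], [10.0, 10.0, 10.0], mean, limit)
```

    Only the last sample, whose residual of 2.5 is far outside the spread seen in normal operation, is flagged.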

  1. Data-driven simulation methodology using DES 4-layer architecture

    Directory of Open Access Journals (Sweden)

    Aida Saez

    2016-05-01

    In this study, we present a methodology to build data-driven simulation models of manufacturing plants. We go further than other research proposals and suggest structuring simulation model development around a 4-layer architecture (network, logic, database and visual reality). The Network layer includes system infrastructure. The Logic layer covers the operations planning and control system and the material handling equipment system. The Database holds all the information needed to perform the simulation, the results used for analysis, and the values that the Logic layer uses to manage the plant. Finally, the Visual Reality layer displays an augmented reality system including not only the machinery and its movement but also blackboards and other Andon elements. This architecture provides numerous advantages, as it helps to build a simulation model that consistently considers the internal logistics in a very flexible way.

  2. Data-driven system to predict academic grades and dropout

    Science.gov (United States)

    Rovira, Sergi; Puertas, Eloi

    2017-01-01

    Nowadays, the role of a tutor is more important than ever to prevent student dropout and improve students' academic performance. This work proposes a data-driven system to extract relevant information hidden in the student academic data and, thus, help tutors to offer their pupils a more proactive personal guidance. In particular, our system, based on machine learning techniques, makes predictions of dropout intention and course grades of students, as well as personalized course recommendations. Moreover, we present different visualizations which help in the interpretation of the results. In the experimental validation, we show that the system obtains promising results with data from the degree studies in Law, Computer Science and Mathematics of the Universitat de Barcelona. PMID:28196078

  3. Data-driven architectural design to production and operation

    NARCIS (Netherlands)

    Bier, H.H.; Mostafavi, S.

    2015-01-01

    Data-driven architectural production and operation explored within Hyperbody rely heavily on system thinking implying that all parts of a system are to be understood in relation to each other. These relations are established bi-directionally so that data-driven architecture is not only produced

  4. Perspectives of data-driven LPV modeling of high-purity distillation columns

    NARCIS (Netherlands)

    Bachnas, A.A.; Toth, R.; Mesbah, A.; Ludlage, J.H.A.

    2013-01-01

    This paper investigates data-driven, Linear-Parameter-Varying (LPV) modeling of a high-purity distillation column. Two LPV modeling approaches are studied: a local approach, corresponding to the interpolation of Linear Time-Invariant (LTI) models identified at steady-state purity levels,

  5. Data driven mathematical models for policy making

    OpenAIRE

    Nannyonga, Betty

    2011-01-01

    This thesis consists of two papers. 1. Betty Nannyonga, D.J.T. Sumpter, J.Y.T. Mugisha and L.S. Luboobi: The Dynamics, causes and possible prevention of Hepatitis E outbreaks. 2. Betty Nannyonga, D.J.T. Sumpter, and Stam Nicolis: A dynamical systems approach to social and economic development. The first paper deals with a deterministic approach of modelling a Hepatitis E outbreak when malaria is endemic in a population. We design three models based on the epidemiology of Hepatitis E, malaria, and...

  6. Dynamic Data Driven Applications Systems (DDDAS)

    Science.gov (United States)

    2012-05-03

    response) – Earthquakes, hurricanes, tornados, wildfires, floods, landslides, tsunamis, … • Critical Infrastructure systems – Electric-powergrid...Multiphase Flow Weather and Climate Structural Mechanics Seismic Processing Aerodynamics Geophysical Fluids Quantum Chemistry Actinide Chemistry...Alloys • Approach and Objectives:  Consider porous SMAs:  similar macroscopic behavior but mass /weight is less, and thus attractive for

  7. Data-driven wind plant control

    NARCIS (Netherlands)

    Gebraad, P.M.O.

    2014-01-01

    Each wind turbine in a cluster of wind turbines (a wind power plant) can influence the performance of other turbines through the wake that forms downstream of its rotor. The wake has a reduced wind velocity, since the turbine extracts energy from the flow, and the obstruction by the wind turbine

  8. Evidence-based and data-driven road safety management

    Directory of Open Access Journals (Sweden)

    Fred Wegman

    2015-07-01

    Over the past decades, road safety in highly-motorised countries has made significant progress. Although we have a fair understanding of the reasons for this progress, we don't have conclusive evidence for this. A new generation of road safety management approaches has entered road safety, starting when countries decided to guide themselves by setting quantitative targets (e.g. 50% fewer casualties in ten years' time). Setting realistic targets, designing strategies and action plans to achieve these targets and monitoring progress have resulted in more scientific research to support decision-making on these topics. Three subjects are key in this new approach of evidence-based and data-driven road safety management: ex-post and ex-ante evaluation of both individual interventions and intervention packages in road safety strategies, and transferability (external validity) of the research results. In this article, we explore these subjects based on recent experiences in four jurisdictions (Western Australia, the Netherlands, Sweden and Switzerland). All four apply similar approaches and tools; differences are considered marginal. It is concluded that policy-making and political decisions were influenced to a great extent by the results of analysis and research. Nevertheless, to compensate for a relatively weak theoretical basis and to improve the power of this new approach, a number of issues will need further research. This includes ex-post and ex-ante evaluation, a better understanding of extrapolation of historical trends and the transferability of research results. This new approach cannot be realized without high-quality road safety data. Good data and knowledge are indispensable for this new and very promising approach.

  9. Global retrieval of soil moisture and vegetation properties using data-driven methods

    Science.gov (United States)

    Rodriguez-Fernandez, Nemesio; Richaume, Philippe; Kerr, Yann

    2017-04-01

    Data-driven methods such as neural networks (NNs) are a powerful tool to retrieve soil moisture from multi-wavelength remote sensing observations at global scale. In this presentation we will review a number of recent results regarding the retrieval of soil moisture with the Soil Moisture and Ocean Salinity (SMOS) satellite, either using SMOS brightness temperatures as input data for the retrieval or using SMOS soil moisture retrievals as reference dataset for the training. The presentation will discuss several possibilities for both the input datasets and the datasets to be used as reference for the supervised learning phase. Regarding the input datasets, it will be shown that NNs take advantage of the synergy of SMOS data and data from other sensors such as the Advanced Scatterometer (ASCAT, active microwaves) and MODIS (visible and infrared). NNs have also been successfully used to construct long time series of soil moisture from the Advanced Microwave Scanning Radiometer - Earth Observing System (AMSR-E) and SMOS. An NN with input data from AMSR-E observations and SMOS soil moisture as reference for the training was used to construct a dataset sharing a similar climatology and without a significant bias with respect to SMOS soil moisture. Regarding the reference data to train the data-driven retrievals, we will show different possibilities depending on the application. Using actual in situ measurements is challenging at global scale due to the scarce distribution of sensors. In contrast, in situ measurements have been successfully used to retrieve SM at continental scale in North America, where the density of in situ measurement stations is high. Using global land surface models to train the NN constitutes an interesting alternative to implement new remote sensing surface datasets. In addition, these datasets can be used to perform data assimilation into the model used as reference for the training. This approach has recently been tested at the European Centre

  10. Data-driven non-Markovian closure models

    Science.gov (United States)

    Kondrashov, Dmitri; Chekroun, Mickaël D.; Ghil, Michael

    2015-03-01

    This paper has two interrelated foci: (i) obtaining stable and efficient data-driven closure models by using a multivariate time series of partial observations from a large-dimensional system; and (ii) comparing these closure models with the optimal closures predicted by the Mori-Zwanzig (MZ) formalism of statistical physics. Multilayer stochastic models (MSMs) are introduced as both a generalization and a time-continuous limit of existing multilevel, regression-based approaches to closure in a data-driven setting; these approaches include empirical model reduction (EMR), as well as more recent multi-layer modeling. It is shown that the multilayer structure of MSMs can provide a natural Markov approximation to the generalized Langevin equation (GLE) of the MZ formalism. A simple correlation-based stopping criterion for an EMR-MSM model is derived to assess how well it approximates the GLE solution. Sufficient conditions are derived on the structure of the nonlinear cross-interactions between the constitutive layers of a given MSM to guarantee the existence of a global random attractor. This existence ensures that no blow-up can occur for a broad class of MSM applications, a class that includes non-polynomial predictors and nonlinearities that do not necessarily preserve quadratic energy invariants. The EMR-MSM methodology is first applied to a conceptual, nonlinear, stochastic climate model of coupled slow and fast variables, in which only slow variables are observed. It is shown that the resulting closure model with energy-conserving nonlinearities efficiently captures the main statistical features of the slow variables, even when there is no formal scale separation and the fast variables are quite energetic. Second, an MSM is shown to successfully reproduce the statistics of a partially observed, generalized Lotka-Volterra model of population dynamics in its chaotic regime. The challenges here include the rarity of strange attractors in the model's parameter

  11. Data-Driven Exercises for Chemistry: A New Digital Collection

    Science.gov (United States)

    Grubbs, W. Tandy

    2007-01-01

    This article presents a new digital collection of data-driven exercises for teaching chemistry. Such exercises are expected to help students think in a more scientific manner.

  12. Data-Driven Model Order Reduction for Bayesian Inverse Problems

    KAUST Repository

    Cui, Tiangang; Youssef, Marzouk; Willcox, Karen

    2014-01-01

    One of the major challenges in using MCMC for the solution of inverse problems is the repeated evaluation of computationally expensive numerical models. We develop a data-driven projection-based model order reduction technique to reduce

  13. Dynamically adaptive data-driven simulation of extreme hydrological flows

    KAUST Repository

    Kumar Jain, Pushkar; Mandli, Kyle; Hoteit, Ibrahim; Knio, Omar; Dawson, Clint

    2017-01-01

    evacuation in real-time and through the development of resilient infrastructure based on knowledge of how systems respond to extreme events. Data-driven computational modeling is a critical technology underpinning these efforts. This investigation focuses

  14. A data-driven weighting scheme for multivariate phenotypic endpoints recapitulates zebrafish developmental cascades

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, Guozhu, E-mail: gzhang6@ncsu.edu [Bioinformatics Research Center, North Carolina State University, Raleigh, NC (United States)]; Roell, Kyle R., E-mail: krroell@ncsu.edu [Bioinformatics Research Center, North Carolina State University, Raleigh, NC (United States)]; Truong, Lisa, E-mail: lisa.truong@oregonstate.edu [Department of Environmental and Molecular Toxicology, Sinnhuber Aquatic Research Laboratory, Oregon State University, Corvallis, OR (United States)]; Tanguay, Robert L., E-mail: robert.tanguay@oregonstate.edu [Department of Environmental and Molecular Toxicology, Sinnhuber Aquatic Research Laboratory, Oregon State University, Corvallis, OR (United States)]; Reif, David M., E-mail: dmreif@ncsu.edu [Bioinformatics Research Center, North Carolina State University, Raleigh, NC (United States); Department of Biological Sciences, Center for Human Health and the Environment, North Carolina State University, Raleigh, NC (United States)]

    2017-01-01

    Zebrafish have become a key alternative model for studying health effects of environmental stressors, partly due to their genetic similarity to humans, fast generation time, and the efficiency of generating high-dimensional systematic data. Studies aiming to characterize adverse health effects in zebrafish typically include several phenotypic measurements (endpoints). While there is a solid biomedical basis for capturing a comprehensive set of endpoints, making summary judgments regarding health effects requires thoughtful integration across endpoints. Here, we introduce a Bayesian method to quantify the informativeness of 17 distinct zebrafish endpoints as a data-driven weighting scheme for a multi-endpoint summary measure, called weighted Aggregate Entropy (wAggE). We implement wAggE using high-throughput screening (HTS) data from zebrafish exposed to five concentrations of all 1060 ToxCast chemicals. Our results show that our empirical weighting scheme provides better performance in terms of the Receiver Operating Characteristic (ROC) curve for identifying significant morphological effects and improves robustness over traditional curve-fitting approaches. From a biological perspective, our results suggest that developmental cascade effects triggered by chemical exposure can be recapitulated by analyzing the relationships among endpoints. Thus, wAggE offers a powerful approach for analysis of multivariate phenotypes that can reveal underlying etiological processes. - Highlights: • Introduced a data-driven weighting scheme for multiple phenotypic endpoints. • Weighted Aggregate Entropy (wAggE) implies differential importance of endpoints. • Endpoint relationships reveal developmental cascade effects triggered by exposure. • wAggE is generalizable to multi-endpoint data of different shapes and scales.

  15. Data-driven imaging in anisotropic media

    Energy Technology Data Exchange (ETDEWEB)

    Volker, Arno; Hunter, Alan [TNO, Stieltjesweg 1, 2600 AD, Delft (Netherlands)]

    2012-05-17

    Anisotropic materials are being used increasingly in high performance industrial applications, particularly in the aeronautical and nuclear industries. Some important examples of these materials are composites, single-crystal and heavy-grained metals. Ultrasonic array imaging in these materials requires exact knowledge of the anisotropic material properties. Without this information, the images can be adversely affected, causing a reduction in defect detection and characterization performance. The imaging operation can be formulated in two consecutive and reciprocal focusing steps, i.e., focusing the sources and then focusing the receivers. Applying just one of these focusing steps yields an interesting intermediate domain. The resulting common focus point gather (CFP-gather) can be interpreted to determine the propagation operator. After focusing the sources, the observed travel-time in the CFP-gather describes the propagation from the focus point to the receivers. If the correct propagation operator is used, the measured travel-times should be the same as the time-reversed focusing operator due to reciprocity. This makes it possible to iteratively update the focusing operator using the data only and allows the material to be imaged without explicit knowledge of the anisotropic material parameters. Furthermore, the determined propagation operator can also be used to invert for the anisotropic medium parameters. This paper details the proposed technique and demonstrates its use on simulated array data from a specimen of Inconel single-crystal alloy commonly used in the aeronautical and nuclear industries.

  16. Data-driven diagnostics of terrestrial carbon dynamics over North America

    Science.gov (United States)

    Jingfeng Xiao; Scott V. Ollinger; Steve Frolking; George C. Hurtt; David Y. Hollinger; Kenneth J. Davis; Yude Pan; Xiaoyang Zhang; Feng Deng; Jiquan Chen; Dennis D. Baldocchi; Beverly E. Law; M. Altaf Arain; Ankur R. Desai; Andrew D. Richardson; Ge Sun; Brian Amiro; Hank Margolis; Lianhong Gu; Russell L. Scott; Peter D. Blanken; Andrew E. Suyker

    2014-01-01

    The exchange of carbon dioxide is a key measure of ecosystem metabolism and a critical intersection between the terrestrial biosphere and the Earth's climate. Despite the general agreement that the terrestrial ecosystems in North America provide a sizeable carbon sink, the size and distribution of the sink remain uncertain. We use a data-driven approach to upscale...

  17. The Role of Guided Induction in Paper-Based Data-Driven Learning

    Science.gov (United States)

    Smart, Jonathan

    2014-01-01

    This study examines the role of guided induction as an instructional approach in paper-based data-driven learning (DDL) in the context of an ESL grammar course during an intensive English program at an American public university. Specifically, it examines whether corpus-informed grammar instruction is more effective through inductive, data-driven…

  18. Data-driven Development of ROTEM and TEG Algorithms for the Management of Trauma Hemorrhage

    DEFF Research Database (Denmark)

    Baksaas-Aasen, Kjersti; Van Dieren, Susan; Balvers, Kirsten

    2018-01-01

    for ROTEM, TEG, and CCTs to be used in addition to ratio driven transfusion and tranexamic acid. CONCLUSIONS: We describe a systematic approach to define threshold parameters for ROTEM and TEG. These parameters were incorporated into algorithms to support data-driven adjustments of resuscitation...

  19. Thermodynamically consistent data-driven computational mechanics

    Science.gov (United States)

    González, David; Chinesta, Francisco; Cueto, Elías

    2018-05-01

    In the paradigm of data-intensive science, the automated, unsupervised discovery of governing equations for a given physical phenomenon has attracted a lot of attention in several branches of applied sciences. In this work, we propose a method that avoids identifying the constitutive equations of complex systems and instead works in a purely numerical manner by employing experimental data. In sharp contrast to most existing techniques, this method does not assume any particular form for the model (other than some fundamental restrictions placed by classical physics, such as the second law of thermodynamics), nor does it force the algorithm to find, among a predefined set of operators, those whose predictions best fit the available data. Instead, the method is able to identify both the Hamiltonian (conservative) and dissipative parts of the dynamics while satisfying fundamental laws such as energy conservation or positive production of entropy. The proposed method is tested against some examples of discrete as well as continuum mechanics, whose accurate results demonstrate the validity of the proposed approach.

  20. Data Driven, Force Based Interaction for Quadrotors

    Science.gov (United States)

    McKinnon, Christopher D.

    Quadrotors are small and agile, and are becoming more capable for their compact size. They are expected to perform a wide variety of tasks including inspection, physical interaction, and formation flight. In all of these tasks, the quadrotors can come into close proximity with infrastructure or other quadrotors, and may experience significant external forces and torques. Reacting properly in each case is essential to completing the task safely and effectively. In this thesis, we develop an algorithm, based on the Unscented Kalman Filter, to estimate such forces and torques without making assumptions about their source. We then show experimentally how the proposed estimation algorithm can be used in conjunction with controls and machine learning to choose the appropriate actions in a wide variety of tasks including detecting downwash, tracking the wind induced by a fan, and detecting proximity to a wall.

  1. Data-driven in computational plasticity

    Science.gov (United States)

    Ibáñez, R.; Abisset-Chavanne, E.; Cueto, E.; Chinesta, F.

    2018-05-01

    Computational mechanics is taking on enormous importance in industry nowadays. On one hand, numerical simulations can be seen as a tool that allows the industry to perform fewer experiments, reducing costs. On the other hand, the physical processes to be simulated are becoming more complex, requiring new constitutive relationships to capture such behaviors. Therefore, when a new material is to be classified, an open question still remains: which constitutive equation should be calibrated? In the present work, model order reduction techniques are exploited to identify the plastic behavior of a material, opening an alternative route with respect to traditional calibration methods. Indeed, the main objective is to provide a plastic yield function such that the mismatch between experiments and simulations is minimized. Therefore, once the experimental results and the parameterization of the plastic yield function are provided, finding the optimal plastic yield function can be seen as either a traditional optimization problem or an interpolation problem. It is important to highlight that the dimensionality of the problem is equal to the number of dimensions related to the parameterization of the yield function. Thus, the use of sparse interpolation techniques seems almost compulsory.

  2. Data-driven modeling, control and tools for cyber-physical energy systems

    Science.gov (United States)

    Behl, Madhur

    Energy systems are experiencing a gradual but substantial change in moving away from being non-interactive and manually-controlled systems to utilizing tight integration of both cyber (computation, communications, and control) and physical representations guided by first principles based models, at all scales and levels. Furthermore, peak power reduction programs like demand response (DR) are becoming increasingly important as the volatility on the grid continues to increase due to regulation, integration of renewables and extreme weather conditions. In order to shield themselves from the risk of price volatility, end-user electricity consumers must monitor electricity prices and be flexible in the ways they choose to use electricity. This requires the use of control-oriented predictive models of an energy system's dynamics and energy consumption. Such models are needed for understanding and improving the overall energy efficiency and operating costs. However, learning dynamical models using grey/white box approaches is very cost and time prohibitive since it often requires significant financial investments in retrofitting the system with several sensors and hiring domain experts for building the model. We present the use of data-driven methods for making model capture easy and efficient for cyber-physical energy systems. We develop Model-IQ, a methodology for analysis of uncertainty propagation for building inverse modeling and controls. Given a grey-box model structure and real input data from a temporary set of sensors, Model-IQ evaluates the effect of the uncertainty propagation from sensor data to model accuracy and to closed-loop control performance. We also developed a statistical method to quantify the bias in the sensor measurement and to determine near optimal sensor placement and density for accurate data collection for model training and control. Using a real building test-bed, we show how performing an uncertainty analysis can reveal trends about

  3. Data driven fault detection and isolation: a wind turbine scenario

    Directory of Open Access Journals (Sweden)

    Rubén Francisco Manrique Piramanrique

    2015-04-01

    One of the greatest drawbacks in wind energy generation is the high maintenance cost associated with mechanical faults. This problem becomes more evident in utility-scale wind turbines, where the increased size and nominal capacity come with additional problems associated with structural vibrations and aeroelastic effects in the blades. Due to the increased operation capability, it is imperative to detect system degradation and faults in an efficient manner, maintaining system integrity and reliability and reducing operation costs. This paper presents a comprehensive comparison of four different Fault Detection and Isolation (FDI) filters based on “Data Driven” (DD) techniques. In order to enhance FDI performance, a multi-level strategy is used where the first level detects the occurrence of any given fault (detection), while the second identifies the source of the fault (isolation). Four different DD classification techniques (namely Support Vector Machines, Artificial Neural Networks, K Nearest Neighbors and Gaussian Mixture Models) were studied and compared for each of the proposed classification levels. The best strategy at each level could be selected to build the final data-driven FDI system. The performance of the proposed scheme is evaluated on a benchmark model of a commercial wind turbine.
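
    The two-level detection/isolation strategy described in this record can be illustrated with a minimal sketch. It uses a plain k-nearest-neighbors vote (one of the four technique families the abstract names) at both levels; the feature vectors, fault labels and function names are hypothetical and are not taken from the paper or its benchmark.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority vote among the k training samples nearest to `query`."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def two_level_fdi(samples, query, k=3):
    """Level 1 decides fault vs. normal; level 2 names the fault source."""
    # Level 1 (detection): collapse every fault class into a single label.
    detection_set = [(x, "normal" if y == "normal" else "fault")
                     for x, y in samples]
    if knn_predict(detection_set, query, k) == "normal":
        return "normal"
    # Level 2 (isolation): classify among the fault classes only.
    isolation_set = [(x, y) for x, y in samples if y != "normal"]
    return knn_predict(isolation_set, query, k)


# Hypothetical two-feature residuals with made-up fault names.
SAMPLES = [
    ((0.0, 0.0), "normal"), ((0.1, 0.0), "normal"), ((0.0, 0.1), "normal"),
    ((1.0, 0.0), "sensor_fault"), ((1.1, 0.1), "sensor_fault"),
    ((0.9, 0.1), "sensor_fault"),
    ((0.0, 1.0), "actuator_fault"), ((0.1, 1.1), "actuator_fault"),
    ((0.1, 0.9), "actuator_fault"),
]

if __name__ == "__main__":
    for q in [(0.05, 0.05), (1.0, 0.05), (0.05, 1.0)]:
        print(q, two_level_fdi(SAMPLES, q))
```

    Because the levels are independent, a different classifier could be substituted at each one, which is the kind of per-level selection the abstract mentions.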

  4. Human body segmentation via data-driven graph cut.

    Science.gov (United States)

    Li, Shifeng; Lu, Huchuan; Shao, Xingqing

    2014-11-01

    Human body segmentation is a challenging and important problem in computer vision. Existing methods usually entail a time-consuming training phase for prior knowledge learning with complex shape matching for body segmentation. In this paper, we propose a data-driven method that integrates top-down body pose information and bottom-up low-level visual cues for segmenting humans in static images within the graph cut framework. The key idea of our approach is first to exploit human kinematics to search for body part candidates via dynamic programming for high-level evidence. Then, using body part classifiers, we obtain bottom-up cues of human body distribution for low-level evidence. All the evidence collected from the top-down and bottom-up procedures is integrated in a graph cut framework for human body segmentation. Qualitative and quantitative experimental results demonstrate the merits of the proposed method in segmenting human bodies with arbitrary poses from cluttered backgrounds.

  5. Data-driven classification of patients with primary progressive aphasia.

    Science.gov (United States)

    Hoffman, Paul; Sajjadi, Seyed Ahmad; Patterson, Karalyn; Nestor, Peter J

    2017-11-01

    Current diagnostic criteria classify primary progressive aphasia into three variants, semantic (sv), nonfluent (nfv) and logopenic (lv) PPA, though the adequacy of this scheme is debated. This study took a data-driven approach, applying k-means clustering to data from 43 PPA patients. The algorithm grouped patients based on similarities in language, semantic and non-linguistic cognitive scores. The optimum solution consisted of three groups. One group, almost exclusively those diagnosed as svPPA, displayed a selective semantic impairment. A second cluster, with impairments to speech production, repetition and syntactic processing, contained a majority of patients with nfvPPA but also some lvPPA patients. The final group exhibited more severe deficits to speech, repetition and syntax as well as semantic and other cognitive deficits. These results suggest that, amongst cases of non-semantic PPA, differentiation mainly reflects overall degree of language/cognitive impairment. The observed patterns were scarcely affected by inclusion/exclusion of non-linguistic cognitive scores. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
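
    For readers unfamiliar with k-means, the following is a minimal sketch of the standard algorithm (Lloyd's iteration) on made-up two-dimensional scores. The data, the score names and the naive "first k points" initialisation are assumptions for illustration; the study's actual features and settings are not given in this record.

```python
import math


def kmeans(points, k, iterations=50):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    # Naive deterministic initialisation (an assumption): first k points.
    centroids = [points[i] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
              for p in points]
    return centroids, labels


# Hypothetical (language score, semantic score) pairs for six patients.
SCORES = [(9, 2), (2, 9), (5, 5), (8, 3), (3, 8), (5, 6)]

if __name__ == "__main__":
    centroids, labels = kmeans(SCORES, k=3)
    print(centroids)
    print(labels)
```

    In practice the number of clusters is itself chosen from the data, for example by comparing solutions for several values of k, which is how an "optimum solution" of three groups can emerge.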

  6. EXPLORING DATA-DRIVEN SPECTRAL MODELS FOR APOGEE M DWARFS

    Science.gov (United States)

    Lua Birky, Jessica; Hogg, David; Burgasser, Adam J.; Jessica Birky

    2018-01-01

    The Cannon (Ness et al. 2015; Casey et al. 2016) is a flexible, data-driven spectral modeling and parameter inference framework, demonstrated on high-resolution Apache Point Galactic Evolution Experiment (APOGEE; λ/Δλ~22,500, 1.5-1.7µm) spectra of giant stars to estimate stellar labels (Teff, logg, [Fe/H], and chemical abundances) to precisions higher than the model-grid pipeline. The lack of reliable stellar parameters reported by the APOGEE pipeline for temperatures less than ~3550K motivates extension of this approach to M dwarf stars. Using a training set of 51 M dwarfs with spectral types ranging M0-M9 obtained from SDSS optical spectra, we demonstrate that the Cannon can infer spectral types to a precision of +/-0.6 types, making it an effective tool for classifying high-resolution near-infrared spectra. We discuss the potential for extending this work to determine the physical stellar labels Teff, logg, and [Fe/H]. This work is supported by the SDSS Faculty and Student (FAST) initiative.

  7. Biogeochemical typing of paddy field by a data-driven approach revealing sub-systems within a complex environment--a pipeline to filtrate, organize and frame massive dataset from multi-omics analyses.

    Directory of Open Access Journals (Sweden)

    Diogo M O Ogawa

    We propose the technique of biogeochemical typing (BGC typing) as a novel methodology to set forth the sub-systems of organismal communities associated with the correlated chemical profiles working within a larger complex environment. Given the intricate characteristic of both organismal and chemical consortia inherent to nature, many environmental studies employ the holistic approach of multi-omics analyses undermining as much information as possible. Due to the massive amount of data produced by multi-omics analyses, the results are hard to visualize and to process. The BGC typing analysis is a pipeline built using integrative statistical analysis that can treat such huge datasets, filtering, organizing and framing the information based on the strength of the various mutual trends of the organismal and chemical fluctuations occurring simultaneously in the environment. To test our technique of BGC typing, we chose a rich environment abounding in chemical nutrients and organismal diversity: the surficial freshwater from Japanese paddy fields and surrounding waters. To identify the community consortia profile we employed metagenomics as high throughput sequencing (HTS) for the fragments amplified from Archaea rRNA, universal 16S rRNA and 18S rRNA; to assess the elemental content we employed ionomics by inductively coupled plasma optical emission spectroscopy (ICP-OES); and for the organic chemical profile, metabolomics employing both Fourier transformed infrared (FT-IR) spectroscopy and proton nuclear magnetic resonance (1H-NMR). All these analyses comprised our multi-omics dataset. The similar trends between the community consortia and the chemical profiles were connected through correlation. The result was then filtered, organized and framed according to correlation strengths and peculiarities. The output gave us four BGC types displaying uniqueness in community and chemical distribution, diversity and richness.
We conclude therefore that

  8. Data-Driven Model Uncertainty Estimation in Hydrologic Data Assimilation

    Science.gov (United States)

    Pathiraja, S.; Moradkhani, H.; Marshall, L.; Sharma, A.; Geenens, G.

    2018-02-01

    The increasing availability of earth observations necessitates mathematical methods to optimally combine such data with hydrologic models. Several algorithms exist for such purposes, under the umbrella of data assimilation (DA). However, DA methods are often applied in a suboptimal fashion for complex real-world problems, due largely to several practical implementation issues. One such issue is error characterization, which is known to be critical for a successful assimilation. Mischaracterized errors lead to suboptimal forecasts, and in the worst case, to degraded estimates even compared to the no assimilation case. Model uncertainty characterization has received little attention relative to other aspects of DA science. Traditional methods rely on subjective, ad hoc tuning factors or parametric distribution assumptions that may not always be applicable. We propose a novel data-driven approach (named SDMU) to model uncertainty characterization for DA studies where (1) the system states are partially observed and (2) minimal prior knowledge of the model error processes is available, except that the errors display state dependence. It includes an approach for estimating the uncertainty in hidden model states, with the end goal of improving predictions of observed variables. The SDMU is therefore suited to DA studies where the observed variables are of primary interest. Its efficacy is demonstrated through a synthetic case study with low-dimensional chaotic dynamics and a real hydrologic experiment for one-day-ahead streamflow forecasting. In both experiments, the proposed method leads to substantial improvements in the hidden states and observed system outputs over a standard method involving perturbation with Gaussian noise.

  9. EEG-based functional networks evoked by acupuncture at ST 36: A data-driven thresholding study

    Science.gov (United States)

    Li, Huiyan; Wang, Jiang; Yi, Guosheng; Deng, Bin; Zhou, Hexi

    2017-10-01

    This paper investigates how acupuncture at ST 36 modulates the brain functional network. Twenty-channel EEG signals from 15 healthy subjects are recorded before, during and after acupuncture. The correlation between two EEG channels is calculated by using Pearson’s coefficient. A data-driven approach is applied to determine the threshold by considering the connected set, connected edge and network connectivity. Based on this thresholding approach, the functional network in each acupuncture period is built with graph theory, and the associated functional connectivity is determined. We show that acupuncture at ST 36 increases the connectivity of the EEG-based functional network, especially for the long-distance connections between the two hemispheres. The properties of the functional network in five EEG sub-bands are also characterized. It is found that the delta and gamma bands are affected more obviously by acupuncture than the other sub-bands. These findings highlight the modulatory effects of acupuncture on the EEG-based functional connectivity, which helps us to understand how it participates in cortical or subcortical activities. Further, the data-driven threshold provides an alternative approach to infer the functional connectivity under other physiological conditions.
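
    A rough sketch of correlation-based network construction with a data-driven threshold follows. Pearson's coefficient is computed between every pair of channels, and the threshold is taken as the largest correlation value at which the resulting graph is still connected. That connectivity criterion is a simplifying assumption chosen for illustration and is not necessarily the paper's exact rule; the three short "channels" are made-up numbers.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def is_connected(adj):
    """Depth-first search from node 0 over a boolean adjacency matrix."""
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j, linked in enumerate(adj[i]):
            if linked and j not in seen:
                seen.add(j)
                stack.append(j)
    return len(seen) == len(adj)


def connectivity_threshold(signals):
    """Largest |r| threshold that keeps the channel graph connected."""
    n = len(signals)
    corr = [[abs(pearson(signals[i], signals[j])) if i != j else 0.0
             for j in range(n)] for i in range(n)]
    candidates = sorted({corr[i][j] for i in range(n)
                         for j in range(i + 1, n)}, reverse=True)
    for t in candidates:
        adj = [[corr[i][j] >= t for j in range(n)] for i in range(n)]
        if is_connected(adj):
            return t, adj
    return None


# Hypothetical samples for three channels.
CHANNELS = [[1, 2, 3, 4], [2, 4, 6, 8], [1, 3, 2, 4]]

if __name__ == "__main__":
    threshold, adjacency = connectivity_threshold(CHANNELS)
    print(threshold)
    print(adjacency)
```

    The threshold comes from the recording itself rather than from a fixed cut-off, so networks from different periods or subjects are each built at the sparsest level that still links all channels.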

  10. Neutrino Mass Matrix Textures: A Data-driven Approach

    CERN Document Server

    Bertuzzo, E; Machado, P A N

    2013-01-01

    We analyze the neutrino mass matrix entries and their correlations in a probabilistic fashion, constructing probability distribution functions using the latest results from neutrino oscillation fits. Two cases are considered: the standard three neutrino scenario as well as the inclusion of a new sterile neutrino that potentially explains the reactor and gallium anomalies. We discuss the current limits and future perspectives on the mass matrix elements that can be useful for model building.

  11. Using Shape Memory Alloys: A Dynamic Data Driven Approach

    KAUST Repository

    Douglas, Craig C.; Calo, Victor M.; Cerwinsky, Derrick; Deng, Li; Efendiev, Yalchin R.

    2013-01-01

    Shape Memory Alloys (SMAs) are capable of changing their crystallographic structure due to changes of either stress or temperature. SMAs are used in a number of aerospace devices and are required in some devices in exotic environments. We

  12. Data-Driven Approaches for Paraphrasing across Language Variations

    Science.gov (United States)

    Xu, Wei

    2014-01-01

    Our language changes very rapidly, accompanying political, social and cultural trends, as well as the evolution of science and technology. The Internet, especially the social media, has accelerated this process of change. This poses a severe challenge for both human beings and natural language processing (NLP) systems, which usually only model a…

  13. Temporal Data-Driven Sleep Scheduling and Spatial Data-Driven Anomaly Detection for Clustered Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Gang Li

    2016-09-01

    The spatial–temporal correlation is an important feature of sensor data in wireless sensor networks (WSNs). Most of the existing works based on the spatial–temporal correlation can be divided into two parts: redundancy reduction and anomaly detection. These two parts are pursued separately in existing works. In this work, the combination of temporal data-driven sleep scheduling (TDSS) and spatial data-driven anomaly detection is proposed, where TDSS can reduce data redundancy. The TDSS model is inspired by transmission control protocol (TCP) congestion control. Based on the long and linear cluster structure in the tunnel monitoring system, cooperative TDSS and spatial data-driven anomaly detection are then proposed. To realize synchronous acquisition in the same ring for analyzing the situation of every ring, TDSS is implemented in a cooperative way in the cluster. To keep the precision of sensor data, spatial data-driven anomaly detection based on the spatial correlation and Kriging method is realized to generate an anomaly indicator. The experimental results show that cooperative TDSS can realize non-uniform sensing effectively to reduce the energy consumption. In addition, spatial data-driven anomaly detection is quite significant for maintaining and improving the precision of sensor data.
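
    The abstract says only that TDSS is inspired by TCP congestion control. One way such an analogy could work is sketched below: an additive-increase, multiplicative-decrease (AIMD) rule, which is the core of classic TCP congestion avoidance, applied to a sensor's sleep interval. The function name, parameters and constants are all hypothetical, and this is not claimed to be the paper's TDSS model.

```python
def next_sleep_interval(current, change, tolerance=0.5,
                        step=1.0, backoff=0.5, lo=1.0, hi=60.0):
    """AIMD-style update of a sleep interval (all units hypothetical).

    current: present sleep interval between two readings
    change:  difference between the two most recent readings
    """
    if abs(change) <= tolerance:
        # Data is stable: sleep a little longer (additive increase).
        proposed = current + step
    else:
        # Data is changing: sample much sooner (multiplicative decrease).
        proposed = current * backoff
    # Keep the interval inside its allowed range.
    return max(lo, min(hi, proposed))


if __name__ == "__main__":
    interval = 4.0
    for delta in [0.1, 0.2, 3.0, 0.0]:
        interval = next_sleep_interval(interval, delta)
        print(interval)
```

    A rule of this shape samples densely while readings change and sparsely while they are steady, which is one route to the non-uniform sensing the abstract reports.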

  14. Data-Driven Baseline Estimation of Residential Buildings for Demand Response

    Directory of Open Access Journals (Sweden)

    Saehong Park

    2015-09-01

    The advent of advanced metering infrastructure (AMI) generates a large volume of data related to energy service. This paper exploits a data mining approach for customer baseline load (CBL) estimation in demand response (DR) management. CBL plays a significant role in the measurement and verification process, which quantifies the amount of demand reduction and authenticates the performance. The proposed data-driven baseline modeling is based on an unsupervised learning technique. Specifically, we leverage both the self-organizing map (SOM) and K-means clustering for accurate estimation. This two-level approach efficiently reduces the large data set into representative weight vectors in SOM, and then these weight vectors are clustered by K-means clustering to find the load pattern that would be similar to the potential load pattern of the DR event day. To verify the proposed method, we conduct nationwide-scale experiments where three major cities’ residential consumption is monitored by smart meters. Our evaluation compares the proposed solution with various types of day matching techniques, showing that our approach outperforms the existing methods by up to a 68.5% lower error rate.

  15. Data-Driven Handover Optimization in Next Generation Mobile Communication Networks

    Directory of Open Access Journals (Sweden)

    Po-Chiang Lin

    2016-01-01

    Network densification is regarded as one of the important ingredients to increase capacity for next generation mobile communication networks. However, it also leads to mobility problems since users are more likely to hand over to another cell in dense or even ultradense mobile communication networks. Therefore, supporting seamless and robust connectivity through such networks becomes a very important issue. In this paper, we investigate handover (HO) optimization in next generation mobile communication networks. We propose a data-driven handover optimization (DHO) approach, which aims to mitigate mobility problems including too-late HO, too-early HO, HO to wrong cell, ping-pong HO, and unnecessary HO. The key performance indicator (KPI) is defined as the weighted average of the ratios of these mobility problems. The DHO approach collects data from the mobile communication measurement results and provides a model to estimate the relationship between the KPI and features from the collected dataset. Based on the model, the handover parameters, including the handover margin and time-to-trigger, are optimized to minimize the KPI. Simulation results show that the proposed DHO approach could effectively mitigate mobility problems.
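
    The KPI and the parameter search described in this record can be sketched as follows. The KPI is a weighted average of the five mobility-problem ratios, and the handover margin and time-to-trigger are chosen by exhaustive search over a small grid. The `toy_model` function stands in for the paper's learned model and is entirely made up, as are the weights, units and grid values.

```python
def handover_kpi(ratios, weights):
    """Weighted average of mobility-problem ratios (lower is better)."""
    total = sum(weights[name] for name in ratios)
    return sum(weights[name] * ratios[name] for name in ratios) / total


def toy_model(margin_db, ttt_ms):
    """Hypothetical stand-in for a model mapping parameters to ratios."""
    late = min(1.0, 0.03 * margin_db + 0.0002 * ttt_ms)
    early = 0.3 / (1.0 + margin_db + ttt_ms / 100.0)
    return {
        "too_late": late,
        "too_early": early,
        "wrong_cell": 0.5 * early,
        "ping_pong": early,
        "unnecessary": 0.5 * early,
    }


def best_parameters(model, margins_db, ttts_ms, weights):
    """Grid search for the (margin, time-to-trigger) pair minimising the KPI."""
    grid = [(m, t) for m in margins_db for t in ttts_ms]
    return min(grid, key=lambda p: handover_kpi(model(*p), weights))


# Hypothetical weights and search grid.
WEIGHTS = {"too_late": 2.0, "too_early": 1.0, "wrong_cell": 1.0,
           "ping_pong": 1.0, "unnecessary": 1.0}
MARGINS_DB = [0, 1, 2, 3, 4, 5, 6]
TTTS_MS = [0, 40, 80, 160, 320, 640]

if __name__ == "__main__":
    margin, ttt = best_parameters(toy_model, MARGINS_DB, TTTS_MS, WEIGHTS)
    print(margin, ttt, handover_kpi(toy_model(margin, ttt), WEIGHTS))
```

    The weights encode how costly each problem is judged to be: raising the weight on too-late handovers, for instance, pushes the search toward smaller margins and shorter trigger times.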

  16. Data-driven performance evaluation method for CMS RPC trigger ...

    Indian Academy of Sciences (India)

    level triggers, to handle the large stream of data produced in collision. The information transmitted from the three muon subsystems (DT, CSC and RPC) are collected by the Global Muon Trigger (GMT) Board and merged. A method for evaluating ...

  17. SIDEKICK: Genomic data driven analysis and decision-making framework

    Directory of Open Access Journals (Sweden)

    Yoon Kihoon

    2010-12-01

    Background: Scientists striving to unlock mysteries within complex biological systems face myriad barriers in effectively integrating available information to enhance their understanding. While experimental techniques and available data sources are rapidly evolving, useful information is dispersed across a variety of sources, and sources of the same information often do not use the same format or nomenclature. To harness these expanding resources, scientists need tools that bridge nomenclature differences and allow them to integrate, organize, and evaluate the quality of information without extensive computation. Results: Sidekick, a genomic data driven analysis and decision making framework, is a web-based tool that provides a user-friendly intuitive solution to the problem of information inaccessibility. Sidekick enables scientists without training in computation and data management to pursue answers to research questions like "What are the mechanisms for disease X" or "Does the set of genes associated with disease X also influence other diseases." Sidekick enables the process of combining heterogeneous data, finding and maintaining the most up-to-date data, evaluating data sources, quantifying confidence in results based on evidence, and managing the multi-step research tasks needed to answer these questions. We demonstrate Sidekick's effectiveness by showing how to accomplish a complex published analysis in a fraction of the original time with no computational effort using Sidekick. Conclusions: Sidekick is an easy-to-use web-based tool that organizes and facilitates complex genomic research, allowing scientists to explore genomic relationships and formulate hypotheses without computational effort. Possible analysis steps include gene list discovery, gene-pair list discovery, various enrichments for both types of lists, and convenient list manipulation. Further, Sidekick's ability to characterize pairs of genes offers new ways to

  18. Data-driven Regulation and Governance in Smart Cities

    NARCIS (Netherlands)

    Ranchordás, Sofia; Klop, Abram; Mak, Vanessa; Berlee, Anna; Tjong Tjin Tai, Eric

    2018-01-01

    This chapter discusses the concept of data-driven regulation and governance in the context of smart cities by describing how these urban centres harness these technologies to collect and process information about citizens, traffic, urban planning or waste production. It describes how several smart

  19. Data-Driven Planning: Using Assessment in Strategic Planning

    Science.gov (United States)

    Bresciani, Marilee J.

    2010-01-01

    Data-driven planning or evidence-based decision making represents nothing new in its concept. For years, business leaders have claimed they have implemented planning informed by data that have been strategically and systematically gathered. Within higher education and student affairs, there may be less evidence of the actual practice of…

  20. Data-Driven Model Order Reduction for Bayesian Inverse Problems

    KAUST Repository

    Cui, Tiangang

    2014-01-06

    One of the major challenges in using MCMC for the solution of inverse problems is the repeated evaluation of computationally expensive numerical models. We develop a data-driven projection-based model order reduction technique to reduce the computational cost of numerical PDE evaluations in this context.

  1. Data mining, knowledge discovery and data-driven modelling

    NARCIS (Netherlands)

    Solomatine, D.P.; Velickov, S.; Bhattacharya, B.; Van der Wal, B.

    2003-01-01

    The project was aimed at exploring the possibilities of a new paradigm in modelling: data-driven modelling, often referred to as "data mining". Several application areas were considered: sedimentation problems in the Port of Rotterdam, automatic soil classification on the basis of cone penetration

  2. Scalable data-driven short-term traffic prediction

    NARCIS (Netherlands)

    Friso, K.; Wismans, L. J.J.; Tijink, M. B.

    2017-01-01

    Short-term traffic prediction has a lot of potential for traffic management. However, most research has traditionally focused either on traffic models, which do not scale well computationally to large networks, or on data-driven methods for freeways, leaving out urban arterials completely. Urban

  3. Data-driven analysis of blood glucose management effectiveness

    NARCIS (Netherlands)

    Nannings, B.; Abu-Hanna, A.; Bosman, R. J.

    2005-01-01

    The blood-glucose-level (BGL) of Intensive Care (IC) patients requires close monitoring and control. In this paper we describe a general data-driven analytical method for studying the effectiveness of BGL management. The method is based on developing and studying a clinical outcome reflecting the

  4. Knowledge-Driven Versus Data-Driven Logics

    Czech Academy of Sciences Publication Activity Database

    Dubois, D.; Hájek, Petr; Prade, H.

    2000-01-01

    Vol. 9, No. 1 (2000), pp. 65-89. ISSN 0925-8531. R&D Projects: GA AV ČR IAA1030601. Grant - others: CNRS(FR) 4008. Institutional research plan: AV0Z1030915. Keywords: epistemic logic; possibility theory; data-driven reasoning; deontic logic. Subject RIV: BA - General Mathematics

  5. Developing Annotation Solutions for Online Data Driven Learning

    Science.gov (United States)

    Perez-Paredes, Pascual; Alcaraz-Calero, Jose M.

    2009-01-01

    Although "annotation" is a widely-researched topic in Corpus Linguistics (CL), its potential role in Data Driven Learning (DDL) has not been addressed in depth by Foreign Language Teaching (FLT) practitioners. Furthermore, most of the research in the use of DDL methods pays little attention to annotation in the design and implementation…

  6. Data-driven modelling of LTI systems using symbolic regression

    NARCIS (Netherlands)

    Khandelwal, D.; Toth, R.; Van den Hof, P.M.J.

    2017-01-01

    The aim of this project is to automate the task of data-driven identification of dynamical systems. The underlying goal is to develop an identification tool that models a physical system without distinguishing between classes of systems such as linear, nonlinear or possibly even hybrid systems. Such

  7. Data-driven analysis of functional brain interactions during free listening to music and speech.

    Science.gov (United States)

    Fang, Jun; Hu, Xintao; Han, Junwei; Jiang, Xi; Zhu, Dajiang; Guo, Lei; Liu, Tianming

    2015-06-01

    Natural stimulus functional magnetic resonance imaging (N-fMRI), such as fMRI acquired when participants were watching video streams or listening to audio streams, has been increasingly used to investigate functional mechanisms of the human brain in recent years. One of the fundamental challenges in functional brain mapping based on N-fMRI is to model the brain's functional responses to continuous, naturalistic and dynamic natural stimuli. To address this challenge, in this paper we present a data-driven approach to exploring functional interactions in the human brain during free listening to music and speech streams. Specifically, we model the brain responses using N-fMRI by measuring the functional interactions on large-scale brain networks with intrinsically established structural correspondence, and perform music and speech classification tasks to guide the systematic identification of consistent and discriminative functional interactions when multiple subjects were listening to music and speech in multiple categories. The underlying premise is that the functional interactions derived from N-fMRI data of multiple subjects should exhibit both consistency and discriminability. Our experimental results show that a variety of brain systems, including attention, memory, auditory/language, emotion, and action networks, are among the most relevant brain systems involved in differentiating classical music, pop music and speech. Our study provides an alternative approach to investigating the human brain's mechanism in comprehension of complex natural music and speech.

  8. A Novel Online Data-Driven Algorithm for Detecting UAV Navigation Sensor Faults

    OpenAIRE

    Rui Sun; Qi Cheng; Guanyu Wang; Washington Yotto Ochieng

    2017-01-01

    The use of Unmanned Aerial Vehicles (UAVs) has increased significantly in recent years. On-board integrated navigation sensors are a key component of UAVs’ flight control systems and are essential for flight safety. To ensure flight safety, a timely and effective navigation sensor fault detection capability is required. In this paper, a novel data-driven Adaptive Neuro-Fuzzy Inference System (ANFIS)-based approach is presented for the detection of on-board navigation sensor faults in ...

  9. Data-driven modeling and real-time distributed control for energy efficient manufacturing systems

    International Nuclear Information System (INIS)

    Zou, Jing; Chang, Qing; Arinez, Jorge; Xiao, Guoxian

    2017-01-01

    As manufacturers face the challenges of increasing global competition and energy saving requirements, it is imperative to seek out opportunities to reduce energy waste and overall cost. In this paper, a novel data-driven stochastic manufacturing system modeling method is proposed to identify and predict energy saving opportunities and their impact on production. A real-time distributed feedback production control policy, which integrates the current and predicted system performance, is established to improve the overall profit and energy efficiency. A case study is presented to demonstrate the effectiveness of the proposed control policy.
    Highlights:
    • A data-driven stochastic manufacturing system model is proposed.
    • Real-time system performance and energy saving opportunity identification method is developed.
    • Prediction method for future potential system performance and energy saving opportunity is developed.
    • A real-time distributed feedback control policy is established to improve energy efficiency and overall system profit.

  10. Towards Data-Driven Simulations of Wildfire Spread using Ensemble-based Data Assimilation

    Science.gov (United States)

    Rochoux, M. C.; Bart, J.; Ricci, S. M.; Cuenot, B.; Trouvé, A.; Duchaine, F.; Morel, T.

    2012-12-01

    Real-time prediction of a propagating wildfire remains a challenging task because the problem is both multi-physics and multi-scale. The propagation speed of wildfires, also called the rate of spread (ROS), is determined by complex interactions between pyrolysis, combustion and flow dynamics, and atmospheric dynamics occurring at vegetation, topographical and meteorological scales. Current operational fire spread models are mainly based on a semi-empirical parameterization of the ROS in terms of vegetation, topographical and meteorological properties. For the fire spread simulation to be predictive and compatible with operational applications, the uncertainty in the ROS model should be reduced. As recent progress in remote sensing technology provides new ways to monitor the fire front position, a promising approach to overcoming the difficulties found in wildfire spread simulations is to integrate fire modeling and fire sensing technologies using data assimilation (DA). For this purpose we have developed a prototype data-driven wildfire spread simulator in order to provide optimal estimates of poorly known model parameters. The data-driven simulation capability is adapted for more realistic wildfire spread: it considers a regional-scale fire spread model that is informed by observations of the fire front location. An Ensemble Kalman Filter (EnKF) algorithm based on a parallel computing platform (OpenPALM) was implemented in order to perform a multi-parameter sequential estimation in which wind magnitude and direction are estimated in addition to vegetation properties. The EnKF algorithm tracks a small-scale grassland fire experiment well and properly accounts for the sensitivity of the simulation outcomes to the control parameters. In conclusion, it was shown that data assimilation is a promising approach to more accurately forecast time-varying wildfire spread conditions as new airborne-like observations of
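
    The parameter-estimation idea in the abstract above can be illustrated with a minimal sketch of one stochastic EnKF analysis step. Everything here is an assumption made for illustration and is not taken from the authors' simulator: the toy forward model `front_position` (front position = ROS × time), the scalar parameter, the noise levels, and the ensemble size. It only shows how an ensemble of uncertain ROS values can be pulled towards observed front positions.

```python
import random
import statistics


def enkf_update(params, predicted_obs, observation, obs_var, rng):
    """One stochastic EnKF analysis step for a scalar parameter and a scalar observation."""
    n = len(params)
    mean_p = statistics.fmean(params)
    mean_h = statistics.fmean(predicted_obs)
    # Sample covariance between parameter and predicted observation,
    # and sample variance of the predicted observation.
    cov_ph = sum((p - mean_p) * (h - mean_h)
                 for p, h in zip(params, predicted_obs)) / (n - 1)
    var_h = sum((h - mean_h) ** 2 for h in predicted_obs) / (n - 1)
    gain = cov_ph / (var_h + obs_var)  # Kalman gain (scalar case)
    # Each member is updated with its own perturbed copy of the observation.
    return [p + gain * (observation + rng.gauss(0.0, obs_var ** 0.5) - h)
            for p, h in zip(params, predicted_obs)]


def front_position(ros, t):
    """Hypothetical forward model: the fire front advances at a constant rate of spread."""
    return ros * t


rng = random.Random(0)
true_ros = 1.5                                        # value used to generate synthetic data
ensemble = [rng.gauss(1.0, 0.5) for _ in range(200)]  # prior guesses of the ROS

for t in (10.0, 20.0, 30.0):
    obs = front_position(true_ros, t) + rng.gauss(0.0, 1.0)  # noisy front observation
    predicted = [front_position(r, t) for r in ensemble]
    ensemble = enkf_update(ensemble, predicted, obs, obs_var=1.0, rng=rng)

estimate = statistics.fmean(ensemble)
print(f"estimated ROS: {estimate:.3f} (synthetic truth {true_ros})")
```

    A multi-parameter version, as described in the abstract, would replace the scalar gain with a matrix gain built from the ensemble cross-covariances.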

  11. The Orion GN and C Data-Driven Flight Software Architecture for Automated Sequencing and Fault Recovery

    Science.gov (United States)

    King, Ellis; Hart, Jeremy; Odegard, Ryan

    2010-01-01

    The Orion Crew Exploration Vehicle (CEV) is being designed to include significantly more automation capability than either the Space Shuttle or the International Space Station (ISS). In particular, the vehicle flight software has requirements to accommodate increasingly automated missions throughout all phases of flight. A data-driven flight software architecture will provide an evolvable automation capability to sequence through Guidance, Navigation & Control (GN&C) flight software modes and configurations while maintaining the required flexibility and human control over the automation. This flexibility is a key aspect needed to address the maturation of operational concepts, to permit ground and crew operators to gain trust in the system and mitigate unpredictability in human spaceflight. To allow for mission flexibility and reconfigurability, a data-driven approach is being taken to load the mission event plan as well as the flight software artifacts associated with the GN&C subsystem. A database of GN&C level sequencing data is presented which manages and tracks the mission specific and algorithm parameters to provide a capability to schedule GN&C events within mission segments. The flight software data schema for performing automated mission sequencing is presented with a concept of operations for interactions with ground and onboard crew members. A prototype architecture for fault identification, isolation and recovery interactions with the automation software is presented and discussed as a forward work item.

  12. Data-driven importance distributions for articulated tracking

    DEFF Research Database (Denmark)

    Hauberg, Søren; Pedersen, Kim Steenstrup

    2011-01-01

    We present two data-driven importance distributions for particle filter-based articulated tracking; one based on background subtraction, another on depth information. In order to keep the algorithms efficient, we represent human poses in terms of spatial joint positions. To ensure constant bone le...... filter, where they improve both accuracy and efficiency of the tracker. In fact, they triple the effective number of samples compared to the most commonly used importance distribution at little extra computational cost....
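
    The "effective number of samples" mentioned above is commonly estimated from the normalized particle weights as N_eff = 1 / Σ w_i². The abstract does not say which estimator the authors used, so the sketch below assumes this standard one; the weight lists are made-up examples.

```python
def effective_sample_size(weights):
    """Standard estimate N_eff = 1 / sum(w_i^2) computed from normalized weights."""
    total = sum(weights)
    normalized = [w / total for w in weights]
    return 1.0 / sum(w * w for w in normalized)


# Made-up weight sets: equal weights versus a fully degenerate particle set.
uniform = [1.0] * 100
degenerate = [1.0] + [0.0] * 99

ess_uniform = effective_sample_size(uniform)        # every particle contributes
ess_degenerate = effective_sample_size(degenerate)  # a single particle carries all weight
print(ess_uniform, ess_degenerate)
```

    An importance distribution that matches the posterior more closely yields more even weights and therefore a larger N_eff for the same number of particles.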

  13. Authoring Data-Driven Videos with DataClips.

    Science.gov (United States)

    Amini, Fereshteh; Riche, Nathalie Henry; Lee, Bongshin; Monroy-Hernandez, Andres; Irani, Pourang

    2017-01-01

    Data videos, or short data-driven motion graphics, are an increasingly popular medium for storytelling. However, creating data videos is difficult as it involves pulling together a unique combination of skills. We introduce DataClips, an authoring tool aimed at lowering the barriers to crafting data videos. DataClips allows non-experts to assemble data-driven "clips" together to form longer sequences. We constructed the library of data clips by analyzing the composition of over 70 data videos produced by reputable sources such as The New York Times and The Guardian. We demonstrate that DataClips can reproduce over 90% of our data videos corpus. We also report on a qualitative study comparing the authoring process and outcome achieved by (1) non-experts using DataClips, and (2) experts using Adobe Illustrator and After Effects to create data-driven clips. Results indicated that non-experts are able to learn and use DataClips with a short training period. In the span of one hour, they were able to produce more videos than experts using a professional editing tool, and their clips were rated similarly by an independent audience.

  14. Pipe break prediction based on evolutionary data-driven methods with brief recorded data

    International Nuclear Information System (INIS)

    Xu Qiang; Chen Qiuwen; Li Weifeng; Ma Jinfeng

    2011-01-01

    Pipe breaks often occur in water distribution networks, imposing great pressure on utility managers to secure a stable water supply. However, pipe breaks are hard to detect with conventional methods. It is therefore necessary to develop reliable and robust pipe break models to assess a pipe's probability of failure and then to optimize the pipe break detection scheme. In the absence of deterministic physical models for pipe break, data-driven techniques provide a promising approach to investigating the principles underlying pipe break. In this paper, two data-driven techniques, namely Genetic Programming (GP) and Evolutionary Polynomial Regression (EPR), are applied to develop pipe break models for the water distribution system of Beijing City. Comparison with the recorded pipe break data from 1987 to 2005 showed that the models are capable of producing reliable predictions. The models can be used to prioritize pipes for break inspection and thereby improve detection efficiency.

  15. Data-Driven and Expectation-Driven Discovery of Empirical Laws.

    Science.gov (United States)

    1982-10-10

    occurred in small integer proportions to each other. In 1809, Joseph Gay-Lussac found evidence for his law of combining volumes, which stated that a... of Empirical Laws. Patrick W. Langley, Gary L. Bradshaw, Herbert A. Simon. The Robotics Institute, Carnegie-Mellon University, Pittsburgh, Pennsylvania... Title: Data-Driven and Expectation-Driven Discovery of Empirical Laws. Type of report and period covered: Interim Report, 2/82-10/82.

  16. A Data-Driven Response Virtual Sensor Technique with Partial Vibration Measurements Using Convolutional Neural Network

    Science.gov (United States)

    Sun, Shan-Bin; He, Yuan-Yuan; Zhou, Si-Da; Yue, Zhen-Jiang

    2017-01-01

    Measurement of dynamic responses plays an important role in structural health monitoring, damage detection and other fields of research. However, in aerospace engineering, the use of physical sensors is limited under the operational conditions of spacecraft, due to the severe environment in outer space. This paper proposes a virtual sensor model with partial vibration measurements using a convolutional neural network. The transmissibility function is employed as prior knowledge. A four-layer neural network with two convolutional layers, one fully connected layer, and an output layer is proposed as the predicting model. Numerical examples of two different structural dynamic systems demonstrate the performance of the proposed approach. The advantage of the technique is further demonstrated in a simply supported beam experiment, in comparison with a modal-model-based virtual sensor, which uses modal parameters, such as mode shapes, for estimating the responses of the faulty sensors. The results show that the presented data-driven response virtual sensor technique can predict structural response with high accuracy. PMID:29231868

  17. Using data-driven agent-based models for forecasting emerging infectious diseases

    Directory of Open Access Journals (Sweden)

    Srinivasan Venkatramanan

    2018-03-01

    Producing timely, well-informed and reliable forecasts for an ongoing epidemic of an emerging infectious disease is a huge challenge. Epidemiologists and policy makers have to deal with poor data quality, limited understanding of the disease dynamics, a rapidly changing social environment and uncertainty about the effects of the various interventions in place. In this setting, detailed computational models provide a comprehensive framework for integrating diverse data sources into a well-defined model of disease dynamics and social behavior, potentially leading to better understanding and actions. In this paper, we describe one such agent-based model framework developed for forecasting the 2014–2015 Ebola epidemic in Liberia, and subsequently used during the Ebola forecasting challenge. We describe the various components of the model and the calibration process, and summarize the forecast performance across scenarios of the challenge. We conclude by highlighting how such a data-driven approach can be refined and adapted for future epidemics, and share the lessons learned over the course of the challenge. Keywords: Emerging infectious diseases, Agent-based models, Simulation optimization, Bayesian calibration, Ebola

  18. BMI cyberworkstation: enabling dynamic data-driven brain-machine interface research through cyberinfrastructure.

    Science.gov (United States)

    Zhao, Ming; Rattanatamrong, Prapaporn; DiGiovanna, Jack; Mahmoudi, Babak; Figueiredo, Renato J; Sanchez, Justin C; Príncipe, José C; Fortes, José A B

    2008-01-01

    Dynamic data-driven brain-machine interfaces (DDDBMI) have great potential to advance the understanding of neural systems and improve the design of brain-inspired rehabilitative systems. This paper presents a novel cyberinfrastructure that couples in vivo neurophysiology experimentation with massive computational resources to provide seamless and efficient support of DDDBMI research. Closed-loop experiments can be conducted with in vivo data acquisition, reliable network transfer, parallel model computation, and real-time robot control. Behavioral experiments with live animals are supported with real-time guarantees. Offline studies can be performed with various configurations for extensive analysis and training. A Web-based portal is also provided to allow users to conveniently interact with the cyberinfrastructure, conducting both experimentation and analysis. New motor control models are developed based on this approach, including recursive least squares (RLS) based and reinforcement learning based (RLBMI) algorithms. The results from an online RLBMI experiment show that the cyberinfrastructure can successfully support DDDBMI experiments and meet the desired real-time requirements.

  19. A Data-Driven Response Virtual Sensor Technique with Partial Vibration Measurements Using Convolutional Neural Network.

    Science.gov (United States)

    Sun, Shan-Bin; He, Yuan-Yuan; Zhou, Si-Da; Yue, Zhen-Jiang

    2017-12-12

    Measurement of dynamic responses plays an important role in structural health monitoring, damage detection and other fields of research. However, in aerospace engineering, the use of physical sensors is limited under the operational conditions of spacecraft, due to the severe environment in outer space. This paper proposes a virtual sensor model with partial vibration measurements using a convolutional neural network. The transmissibility function is employed as prior knowledge. A four-layer neural network with two convolutional layers, one fully connected layer, and an output layer is proposed as the predicting model. Numerical examples of two different structural dynamic systems demonstrate the performance of the proposed approach. The advantage of the technique is further demonstrated in a simply supported beam experiment, in comparison with a modal-model-based virtual sensor, which uses modal parameters, such as mode shapes, for estimating the responses of the faulty sensors. The results show that the presented data-driven response virtual sensor technique can predict structural response with high accuracy.

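  20. Development of a Data Warehouse Using a Data-Driven Approach to Support Human Resource Management (Pengembangan Data Warehouse Menggunakan Pendekatan Data-Driven untuk Membantu Pengelolaan SDM)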

    Directory of Open Access Journals (Sweden)

    Mujiono Mujiono

    2016-01-01

    The basis of bureaucratic reform is the reform of human resources management. One supporting factor is the development of an employee database. Supporting the management of human resources requires, among other things, a data warehouse and business intelligence tools. The data warehouse is an integrated concept of reliable data storage that supports all data analysis needs. In this study, a data warehouse was developed using a data-driven approach, with source data coming from SIMPEG, SAPK and electronic presence (attendance) records. The data warehouse is designed using the nine-step methodology and unified modeling language (UML) notation. Extract, transform, load (ETL) is done using Pentaho Data Integration by applying transformation maps. Furthermore, to help human resource management, the system is built to perform online analytical processing (OLAP) to provide web-based information. This study produced a BI application development framework with a Model-View-Controller (MVC) architecture, in which OLAP operations are built using dynamic query generation, PivotTable, and HighChart to present information about PNS, CPNS, Retirement, Kenpa and Presence

  1. Simulation of shallow groundwater levels: Comparison of a data-driven and a conceptual model

    Science.gov (United States)

    Fahle, Marcus; Dietrich, Ottfried; Lischeid, Gunnar

    2015-04-01

    Despite an abundance of models aimed at simulating shallow groundwater levels, application of such models is often hampered by a lack of appropriate input data. Difficulties especially arise with regard to soil data, which are typically hard to obtain and prone to spatial variability, eventually leading to uncertainties in the model results. Modelling approaches relying entirely on easily measured quantities are therefore an alternative that makes models more widely applicable. We present and compare two models for calculating 1-day-ahead predictions of the groundwater level that are based only on measurements of potential evapotranspiration, precipitation and groundwater levels. The first model is a newly developed conceptual model that is parametrized using the White method (which estimates the actual evapotranspiration on the basis of diurnal groundwater fluctuations) and a rainfall-response ratio. Inverted versions of the two latter approaches are then used to calculate the predictions of the groundwater level. Furthermore, as a completely data-driven alternative, a simple feed-forward multilayer perceptron neural network was trained on the same inputs and outputs. Data from 4 growing periods (April to October) from a study site situated in the Spreewald wetland in North-east Germany were used to set up the models and compare their performance. In addition, response surfaces that relate model outputs to combinations of different input variables are used to reveal those aspects in which the two approaches coincide and those in which they differ. Finally, it will be evaluated whether the conceptual approach can be enhanced by extracting knowledge from the neural network. This is done by replacing, in the conceptual model, the default function that relates groundwater recharge and groundwater level, which is assumed to be linear, with the non-linear function extracted from the neural network.
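
    As a rough illustration of the White method mentioned above, the sketch below uses its classic textbook form, ET = S_y (24 r + Δs), where r is the hourly water-table rise between midnight and 04:00 and Δs is the net fall of the water table over the 24 hours. The abstract does not state which variant the authors applied, so the formula choice, the function name `white_method_et`, the specific yield and the synthetic levels are all assumptions.

```python
def white_method_et(levels_hourly, specific_yield):
    """Daily evapotranspiration (m/day) from 25 hourly groundwater levels (m).

    levels_hourly[0] is midnight and levels_hourly[24] is the following midnight.
    Levels are water-table elevations, so a falling water table gives smaller values.
    """
    if len(levels_hourly) != 25:
        raise ValueError("expected 25 hourly values (midnight to midnight)")
    # r: night-time recovery rate, assumed to represent net inflow when ET is negligible.
    recovery_rate = (levels_hourly[4] - levels_hourly[0]) / 4.0
    # Delta s: net fall of the water table over the whole day.
    net_fall = levels_hourly[0] - levels_hourly[24]
    return specific_yield * (24.0 * recovery_rate + net_fall)


# Synthetic day: the water table recovers by 2 mm/h until 04:00, then declines to 0.990 m.
levels = [1.000 + 0.002 * h for h in range(5)]
levels += [1.008 - 0.018 * (h - 4) / 20 for h in range(5, 25)]

et = white_method_et(levels, specific_yield=0.1)
print(f"estimated ET: {et * 1000:.2f} mm/day")
```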

  2. Assigning clinical codes with data-driven concept representation on Dutch clinical free text.

    Science.gov (United States)

    Scheurwegs, Elyne; Luyckx, Kim; Luyten, Léon; Goethals, Bart; Daelemans, Walter

    2017-05-01

    Clinical codes are used for public reporting purposes, are fundamental to determining public financing for hospitals, and form the basis for reimbursement claims to insurance providers. They are assigned to a patient stay to reflect the diagnosis and the procedures performed during that stay. This paper aims to enrich algorithms for automated clinical coding by taking a data-driven approach and by using unsupervised and semi-supervised techniques for the extraction of multi-word expressions that convey a generalisable medical meaning (referred to as concepts). Several methods for extracting concepts from text are compared, two of which are constructed from a large unannotated corpus of clinical free text. A distributional semantic model (in this case, the word2vec skip-gram model) is used to generalize over concepts and retrieve relations between them. These methods are validated on three sets of patient stay data, in the disease areas of urology, cardiology, and gastroenterology. The datasets are in Dutch, which limits the concept definitions available from expert-based ontologies (e.g. UMLS). The results show that when expert-based knowledge in ontologies is unavailable, concepts derived from raw clinical texts are a reliable alternative. Both concepts derived from raw clinical texts and concepts derived from expert-created dictionaries outperform a bag-of-words approach in clinical code assignment. Adding features based on tokens that appear in a semantically similar context has a positive influence on predicting diagnostic codes. Furthermore, the experiments indicate that a distributional semantics model can find relations between semantically related concepts in texts but also introduces erroneous and redundant relations, which can undermine clinical coding performance. Copyright © 2017. Published by Elsevier Inc.

  3. Data-Driven Iterative Vibration Signal Enhancement Strategy Using Alpha Stable Distribution

    Directory of Open Access Journals (Sweden)

    Grzegorz Żak

    2017-01-01

    The authors propose a novel procedure for enhancing the signal-to-noise ratio in vibration data acquired from machines operating in a mining industry environment. The proposed method performs a data-driven reduction of the deterministic, high-energy, low-frequency components. Furthermore, it provides a way to enhance the signal of interest. The procedure incorporates time-frequency decomposition, α-stable distribution based signal modeling, and the stability parameter in the time domain as a stopping criterion for the iterative part of the procedure. An advantage of the proposed algorithm is the data-driven, automatic detection of the informative frequency band as well as of the high-energy band, owing to the properties of the distribution used. Furthermore, no knowledge of kinematics, speed, and so on is needed. The proposed algorithm is applied to real data acquired from the gearbox of a belt conveyor pulley drive.

  4. Parameterized data-driven fuzzy model based optimal control of a semi-batch reactor.

    Science.gov (United States)

    Kamesh, Reddi; Rani, K Yamuna

    2016-09-01

    A parameterized data-driven fuzzy (PDDF) model structure is proposed for semi-batch processes, and its application to optimal control is illustrated. The orthonormally parameterized input trajectories, initial states and process parameters are the inputs to the model, which predicts the output trajectories in terms of Fourier coefficients. Fuzzy rules are formulated based on the signs of a linear data-driven model, while the defuzzification step incorporates a linear regression model to shift from the input domain to the output domain. The fuzzy model is employed to formulate an optimal control problem for single-rate as well as multi-rate systems. A simulation study on a multivariable semi-batch reactor system reveals that the proposed PDDF modeling approach captures the nonlinear and time-varying behavior inherent in the semi-batch system fairly accurately. The results of operating trajectory optimization using the proposed model are comparable to those obtained using the exact first-principles model, and comparable to or better than those of optimization based on a parameterized data-driven artificial neural network model. Copyright © 2016 ISA. Published by Elsevier Ltd. All rights reserved.

  5. General Purpose Data-Driven Monitoring for Space Operations

    Science.gov (United States)

    Iverson, David L.; Martin, Rodney A.; Schwabacher, Mark A.; Spirkovska, Liljana; Taylor, William McCaa; Castle, Joseph P.; Mackey, Ryan M.

    2009-01-01

    As modern space propulsion and exploration systems improve in capability and efficiency, their designs are becoming increasingly sophisticated and complex. Determining the health state of these systems, using traditional parameter limit checking, model-based, or rule-based methods, is becoming more difficult as the number of sensors and component interactions grows. Data-driven monitoring techniques have been developed to address these issues by analyzing system operations data to automatically characterize normal system behavior. System health can be monitored by comparing real-time operating data with these nominal characterizations, providing detection of anomalous data signatures indicative of system faults or failures. The Inductive Monitoring System (IMS) is a data-driven system health monitoring software tool that has been successfully applied to several aerospace applications. IMS uses a data mining technique called clustering to analyze archived system data and characterize normal interactions between parameters. The scope of IMS-based data-driven monitoring applications continues to expand with current development activities. Successful IMS deployment in the International Space Station (ISS) flight control room to monitor ISS attitude control systems has led to applications in other ISS flight control disciplines, such as thermal control. It has also generated interest in data-driven monitoring capability for Constellation, NASA's program to replace the Space Shuttle with new launch vehicles and spacecraft capable of returning astronauts to the moon, and then on to Mars. Several projects are currently underway to evaluate and mature the IMS technology and complementary tools for use in the Constellation program. These include an experiment on board the Air Force TacSat-3 satellite, and ground systems monitoring for NASA's Ares I-X and Ares I launch vehicles. The TacSat-3 Vehicle System Management (TVSM) project is a software experiment to integrate fault
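
    The monitoring scheme described above (cluster archived nominal data, then compare live data with those clusters) can be sketched in a few lines. This is a simplified stand-in and not the IMS implementation: the greedy bounding-box clustering, the `tolerance` parameter, the function names and the two-sensor sample data are all assumptions made for illustration.

```python
import math


def distance_to_box(point, low, high):
    """Euclidean distance from a point to an axis-aligned box (0.0 if the point is inside)."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2
                         for x, l, h in zip(point, low, high)))


def learn_boxes(nominal, tolerance):
    """Greedily group nominal vectors into boxes: grow the nearest box or start a new one."""
    boxes = []  # each box is a (low, high) pair of per-parameter bounds
    for point in nominal:
        best = min(boxes, key=lambda b: distance_to_box(point, b[0], b[1]), default=None)
        if best is not None and distance_to_box(point, best[0], best[1]) <= tolerance:
            for i, x in enumerate(point):
                best[0][i] = min(best[0][i], x)
                best[1][i] = max(best[1][i], x)
        else:
            boxes.append((list(point), list(point)))
    return boxes


def anomaly_score(point, boxes):
    """Distance from a live data vector to the nearest nominal box."""
    return min(distance_to_box(point, low, high) for low, high in boxes)


# Made-up archive of (temperature, current) readings from two nominal operating modes.
nominal = [(20.0, 1.0), (20.5, 1.1), (21.0, 1.0), (35.0, 2.0), (35.5, 2.1)]
boxes = learn_boxes(nominal, tolerance=1.0)

normal_score = anomaly_score((20.2, 1.05), boxes)  # lies inside a learned box
faulty_score = anomaly_score((28.0, 1.5), boxes)   # lies far from both modes
print(len(boxes), normal_score, round(faulty_score, 2))
```

    A score of zero means the live vector falls inside behaviour seen in the archive; larger scores indicate a growing departure from every nominal cluster.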

  6. Data-driven algorithm to estimate friction in automobile engine

    DEFF Research Database (Denmark)

    Stotsky, Alexander A.

    2010-01-01

    Algorithms based on the oscillations of the engine angular rotational speed under fuel cutoff and no-load were proposed for estimation of the engine friction torque. The recursive algorithm to restore the periodic signal is used to calculate the amplitude of the engine speed signal at fuel cutoff....... The values of the friction torque in the corresponding table entries are updated as new measurements of the friction moment are acquired. A new, data-driven algorithm for table adaptation on the basis of stepwise regression was developed and verified using the six-cylinder Volvo engine....

  7. Data driven information system for supervision of judicial open

    Directory of Open Access Journals (Sweden)

    Ming LI

    2016-08-01

    To address four outstanding problems in the information-based supervision of judicial publicity, judicial public data are classified using a data-driven approach to yield the data of final value. Then the functional, technical and business structures of the data processing system are put forward, including modules for data collection, data reduction, data analysis, data application and data security. A data processing system developed on these structures can effectively reduce the workload of managing judicial open information, summarize the state of the work, identify problems, and raise the level of judicial publicity.

  8. Product design pattern based on big data-driven scenario

    OpenAIRE

    Conggang Yu; Lusha Zhu

    2016-01-01

    This article discusses new product design patterns in the big data era, gives designers a new rational way of thinking, and offers a new way to understand the design of the product. Based on the key criteria of the product design process, category, element, and product are used to input the data, which comprises concrete data and abstract data as an enlargement of the criteria of product design process for the establishment of a big data-driven product design pattern model. Moreover, an exper...

  9. Data-Driven Model Reduction and Transfer Operator Approximation

    Science.gov (United States)

    Klus, Stefan; Nüske, Feliks; Koltai, Péter; Wu, Hao; Kevrekidis, Ioannis; Schütte, Christof; Noé, Frank

    2018-06-01

    In this review paper, we will present different data-driven dimension reduction techniques for dynamical systems that are based on transfer operator theory as well as methods to approximate transfer operators and their eigenvalues, eigenfunctions, and eigenmodes. The goal is to point out similarities and differences between methods developed independently by the dynamical systems, fluid dynamics, and molecular dynamics communities such as time-lagged independent component analysis, dynamic mode decomposition, and their respective generalizations. As a result, extensions and best practices developed for one particular method can be carried over to other related methods.

  10. Data-Driven Healthcare: Challenges and Opportunities for Interactive Visualization.

    Science.gov (United States)

    Gotz, David; Borland, David

    2016-01-01

    The healthcare industry's widespread digitization efforts are reshaping one of the largest sectors of the world's economy. This transformation is enabling systems that promise to use ever-improving data-driven evidence to help doctors make more precise diagnoses, institutions identify at risk patients for intervention, clinicians develop more personalized treatment plans, and researchers better understand medical outcomes within complex patient populations. Given the scale and complexity of the data required to achieve these goals, advanced data visualization tools have the potential to play a critical role. This article reviews a number of visualization challenges unique to the healthcare discipline.

  11. Management and Nonlinear Analysis of Disinfection System of Water Distribution Networks Using Data Driven Methods

    Directory of Open Access Journals (Sweden)

    Mohammad Zounemat-Kermani

    2018-03-01

    The chlorination unit is widely used to supply safe drinking water and remove pathogens from water distribution networks. A data-driven approach is one appropriate method for analyzing the performance of chlorine in a water supply network. In this study, a multi-layer perceptron neural network (MLP) with three training algorithms (gradient descent, conjugate gradient and BFGS) and a support vector machine (SVM) with an RBF kernel function were used to predict the concentration of residual chlorine in the water supply networks of Ahmadabad Dafeh and Ahruiyeh villages in Kerman Province. Daily data including discharge (flow), chlorine consumption and residual chlorine were employed from the beginning of 1391 Hijri until the end of 1393 Hijri (3 years). To assess the performance of the studied models, the Nash-Sutcliffe efficiency (NS), root mean square error (RMSE), mean absolute percentage error (MAPE) and correlation coefficient (CORR) were used; in the best modeling case, obtained with the BFGS algorithm, these were 0.9484, 0.0255, 1.081, and 0.974, respectively. The criteria indicated that the MLP model with the BFGS and conjugate gradient algorithms outperformed all other models in 90 and 10 percent of cases, respectively, while the MLP model based on the gradient descent algorithm and the SVM model were better in none of the cases. According to the results of this study, proper management of chlorine concentration can be implemented using predicted values of residual chlorine in the water supply network. Thus, the decreased performance of the perceptron network and support vector machine in the water supply network of Ahruiyeh in comparison to Ahmadabad Dafeh can be inferred from improper management of chlorination.
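
    The four criteria named in this abstract have standard textbook definitions, which can be sketched as below. This is only an illustration under stated assumptions: the function names and the sample residual-chlorine values are hypothetical and are not taken from the study, and the authors may have used slightly different conventions (for example, MAPE as a fraction instead of a percentage).

```python
import math

def nash_sutcliffe(obs, pred):
    """Nash-Sutcliffe efficiency: 1 minus squared error over variance of observations."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst

def rmse(obs, pred):
    """Root mean square error, in the units of the observations."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mape(obs, pred):
    """Mean absolute percentage error in percent; observations must be non-zero."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)

def corr(obs, pred):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs))
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    return cov / (std_o * std_p)

# Hypothetical residual chlorine values (mg/L), for illustration only.
observed = [0.50, 0.62, 0.55, 0.70, 0.48]
predicted = [0.52, 0.60, 0.57, 0.68, 0.50]
scores = {
    "NS": nash_sutcliffe(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "MAPE": mape(observed, predicted),
    "CORR": corr(observed, predicted),
}
```

    NS and CORR approach 1 for a good model, while RMSE and MAPE approach 0, which is the direction of the best-case values reported above.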

  12. Data-Driven User Feedback: An Improved Neurofeedback Strategy considering the Interindividual Variability of EEG Features

    Directory of Open Access Journals (Sweden)

    Chang-Hee Han

    2016-01-01

    It has frequently been reported that some users of conventional neurofeedback systems can experience only a small portion of the total feedback range due to the large interindividual variability of EEG features. In this study, we proposed a data-driven neurofeedback strategy considering the individual variability of electroencephalography (EEG) features to permit users of the neurofeedback system to experience a wider range of auditory or visual feedback without a customization process. The main idea of the proposed strategy is to adjust the ranges of each feedback level using the density in the offline EEG database acquired from a group of individuals. Twenty-two healthy subjects participated in offline experiments to construct an EEG database, and five subjects participated in online experiments to validate the performance of the proposed data-driven user feedback strategy. Using the optimized bin sizes, the number of feedback levels that each individual experienced was significantly increased to 139% and 144% of the original results with uniform bin sizes in the offline and online experiments, respectively. Our results demonstrated that the use of our data-driven neurofeedback strategy could effectively increase the overall range of feedback levels that each individual experienced during neurofeedback training.

  13. Data-Driven User Feedback: An Improved Neurofeedback Strategy considering the Interindividual Variability of EEG Features.

    Science.gov (United States)

    Han, Chang-Hee; Lim, Jeong-Hwan; Lee, Jun-Hak; Kim, Kangsan; Im, Chang-Hwan

    2016-01-01

    It has frequently been reported that some users of conventional neurofeedback systems can experience only a small portion of the total feedback range due to the large interindividual variability of EEG features. In this study, we proposed a data-driven neurofeedback strategy considering the individual variability of electroencephalography (EEG) features to permit users of the neurofeedback system to experience a wider range of auditory or visual feedback without a customization process. The main idea of the proposed strategy is to adjust the ranges of each feedback level using the density in the offline EEG database acquired from a group of individuals. Twenty-two healthy subjects participated in offline experiments to construct an EEG database, and five subjects participated in online experiments to validate the performance of the proposed data-driven user feedback strategy. Using the optimized bin sizes, the number of feedback levels that each individual experienced was significantly increased to 139% and 144% of the original results with uniform bin sizes in the offline and online experiments, respectively. Our results demonstrated that the use of our data-driven neurofeedback strategy could effectively increase the overall range of feedback levels that each individual experienced during neurofeedback training.
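
    The idea of sizing feedback levels by the density of a pooled database, instead of using uniform bins, can be illustrated with equal-frequency binning. This is a sketch under assumptions: the abstract does not give the authors' optimization procedure, so the equal-frequency rule, the function names and all numbers below are hypothetical.

```python
def uniform_edges(lo, hi, n_levels):
    """Bin edges that split [lo, hi] into n_levels equally wide feedback levels."""
    step = (hi - lo) / n_levels
    return [lo + i * step for i in range(n_levels + 1)]

def density_edges(samples, n_levels):
    """Equal-frequency edges: each level covers a similar share of the pooled samples."""
    ordered = sorted(samples)
    last = len(ordered) - 1
    return [ordered[(i * last) // n_levels] for i in range(n_levels + 1)]

def level_of(value, edges):
    """0-based feedback level of a value, clamped to the first and last level."""
    n_levels = len(edges) - 1
    for i in range(1, n_levels):
        if value < edges[i]:
            return i - 1
    return n_levels - 1

def levels_experienced(session, edges):
    """Number of distinct feedback levels a user reaches during one session."""
    return len({level_of(v, edges) for v in session})

# Hypothetical pooled EEG feature values: dense at low values, with a long tail.
pooled = [1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 6, 8, 12, 20, 40]
# Hypothetical user whose feature stays in the dense low range.
session = [1.5, 2.5, 3.5, 4.5]

uniform = uniform_edges(min(pooled), max(pooled), 4)
by_density = density_edges(pooled, 4)
n_uniform = levels_experienced(session, uniform)
n_density = levels_experienced(session, by_density)
```

    With uniform bins the long tail of the pooled data stretches the scale, so this user stays in a single level; the density-based edges place more levels where the data actually lie.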

  14. A data driven nonlinear stochastic model for blood glucose dynamics.

    Science.gov (United States)

    Zhang, Yan; Holt, Tim A; Khovanova, Natalia

    2016-03-01

    The development of adequate mathematical models for blood glucose dynamics may improve early diagnosis and control of diabetes mellitus (DM). We have developed a stochastic nonlinear second order differential equation to describe the response of blood glucose concentration to food intake using continuous glucose monitoring (CGM) data. A variational Bayesian learning scheme was applied to define the number and values of the system's parameters by iterative optimisation of free energy. The model has the minimal order and number of parameters to successfully describe blood glucose dynamics in people with and without DM. The model accounts for the nonlinearity and stochasticity of the underlying glucose-insulin dynamic process. Being data-driven, it takes full advantage of available CGM data and, at the same time, reflects the intrinsic characteristics of the glucose-insulin system without detailed knowledge of the physiological mechanisms. We have shown that the dynamics of some postprandial blood glucose excursions can be described by a reduced (linear) model, previously seen in the literature. A comprehensive analysis demonstrates that deterministic system parameters belong to different ranges for diabetes and controls. Implications for clinical practice are discussed. This is the first study introducing a continuous data-driven nonlinear stochastic model capable of describing both DM and non-DM profiles. Copyright © 2015 The Authors. Published by Elsevier Ireland Ltd.. All rights reserved.

  15. Locative media and data-driven computing experiments

    Directory of Open Access Journals (Sweden)

    Sung-Yueh Perng

    2016-06-01

    Over the past two decades urban social life has undergone a rapid and pervasive geocoding, becoming mediated, augmented and anticipated by location-sensitive technologies and services that generate and utilise big, personal, locative data. The production of these data has prompted the development of exploratory data-driven computing experiments that seek to find ways to extract value and insight from them. These projects often start from the data, rather than from a question or theory, and try to imagine and identify their potential utility. In this paper, we explore the desires and mechanics of data-driven computing experiments. We demonstrate how both locative media data and computing experiments are ‘staged’ to create new values and computing techniques, which in turn are used to try and derive possible futures that are ridden with unintended consequences. We argue that using computing experiments to imagine potential urban futures produces effects that often have little to do with creating new urban practices. Instead, these experiments promote Big Data science and the prospect that data produced for one purpose can be recast for another and act as alternative mechanisms of envisioning urban futures.

  16. Product design pattern based on big data-driven scenario

    Directory of Open Access Journals (Sweden)

    Conggang Yu

    2016-07-01

    This article discusses new product design patterns in the big data era, gives designers a new rational way of thinking, and offers a new way to understand the design of the product. Based on the key criteria of the product design process, category, element, and product are used to input the data, which comprises concrete data and abstract data as an enlargement of the criteria of product design process for the establishment of a big data-driven product design pattern model. Moreover, an experiment and a product design case are conducted to verify the feasibility of the new pattern. Ultimately, we conclude that data-driven product design has two patterns: one is the concrete data supporting the product design, namely the “product–data–product” pattern, and the second is based on the value of the abstract data for product design, namely the “data–product–data” pattern. Through the data, users involve themselves in the design development process. Data and product form a huge network, and data plays the role of connection or node. So the essence of the design is to find a new connection based on element, and to find a new node based on category.

  17. A data-driven emulation framework for representing water-food nexus in a changing cold region

    Science.gov (United States)

    Nazemi, A.; Zandmoghaddam, S.; Hatami, S.

    2017-12-01

    Water resource systems are under increasing pressure globally. Growing population along with competition between water demands and emerging effects of climate change have caused enormous vulnerabilities in water resource management across many regions. Diagnosing such vulnerabilities and provision of effective adaptation strategies requires the availability of simulation tools that can adequately represent the interactions between competing water demands for limiting water resources and inform decision makers about the critical vulnerability thresholds under a range of potential natural and anthropogenic conditions. Despite significant progress in integrated modeling of water resource systems, regional models are often unable to fully represent the contemplating dynamics within the key elements of water resource systems locally. Here we propose a data-driven approach to emulate a complex regional water resource system model developed for Oldman River Basin in southern Alberta, Canada. The aim of the emulation is to provide a detailed understanding of the trade-offs and interaction at the Oldman Reservoir, which is the key to flood control and irrigated agriculture in this over-allocated semi-arid cold region. Different surrogate models are developed to represent the dynamics of irrigation demand and withdrawal as well as reservoir evaporation and release individually. The non-falsified offline models are then integrated through the water balance equation at the reservoir location to provide a coupled model for representing the dynamics of reservoir operation and water allocation at the local scale. The performance of individual and integrated models is rigorously examined and sources of uncertainty are highlighted. To demonstrate the practical utility of such a surrogate modeling approach, we use the integrated data-driven model for examining the trade-off in irrigation water supply, reservoir storage and release under a range of changing climate, upstream

  18. Data-driven analysis of simultaneous EEG/fMRI reveals neurophysiological phenotypes of impulse control.

    Science.gov (United States)

    Schmüser, Lena; Sebastian, Alexandra; Mobascher, Arian; Lieb, Klaus; Feige, Bernd; Tüscher, Oliver

    2016-09-01

    Response inhibition is the ability to suppress inadequate but prepotent or ongoing response tendencies. A fronto-striatal network is involved in these processes. Between-subject differences in the intra-individual variability have been suggested to constitute a key to pathological processes underlying impulse control disorders. Single-trial EEG/fMRI analysis allows to increase sensitivity for inter-individual differences by incorporating intra-individual variability. Thirty-eight healthy subjects performed a visual Go/Nogo task during simultaneous EEG/fMRI. Of 38 healthy subjects, 21 subjects reliably showed Nogo-related ICs (Nogo-IC-positive) while 17 subjects (Nogo-IC-negative) did not. Comparing both groups revealed differences on various levels: On trait level, Nogo-IC-negative subjects scored higher on questionnaires regarding attention deficit/hyperactivity disorder; on a behavioral level, they displayed slower response times (RT) and higher intra-individual RT variability while both groups did not differ in their inhibitory performance. On the neurophysiological level, Nogo-IC-negative subjects showed a hyperactivation of left inferior frontal cortex/insula and left putamen as well as significantly reduced P3 amplitudes. Thus, a data-driven approach for IC classification and the resulting presence or absence of early Nogo-specific ICs as criterion for group selection revealed group differences at behavioral and neurophysiological levels. This may indicate electrophysiological phenotypes characterized by inter-individual variations of neural and behavioral correlates of impulse control. We demonstrated that the inter-individual difference in an electrophysiological correlate of response inhibition is correlated with distinct, potentially compensatory neural activity. This may suggest the existence of electrophysiologically dissociable phenotypes of behavioral and neural motor response inhibition with the Nogo-IC-positive phenotype possibly providing

  19. Employment relations: A data driven analysis of job markets using online job boards and online professional networks

    CSIR Research Space (South Africa)

    Marivate, Vukosi N

    2017-08-01

    Data from online job boards and online professional networks present an opportunity to understand job markets as well as how professionals transition from one job/career to another. We propose a data driven approach to begin to understand a slice...

  20. The Use of Linking Adverbials in Academic Essays by Non-Native Writers: How Data-Driven Learning Can Help

    Science.gov (United States)

    Garner, James Robert

    2013-01-01

    Over the past several decades, the TESOL community has seen an increased interest in the use of data-driven learning (DDL) approaches. Most studies of DDL have focused on the acquisition of vocabulary items, including a wide range of information necessary for their correct usage. One type of vocabulary that has yet to be properly investigated has…

  1. Data-driven Discovery: A New Era of Exploiting the Literature and Data

    Directory of Open Access Journals (Sweden)

    Ying Ding

    2016-11-01

    In the current data-intensive era, the traditional hands-on method of conducting scientific research by exploring related publications to generate a testable hypothesis is well on its way to becoming obsolete within just a year or two. Analyzing the literature and data to automatically generate a hypothesis might become the de facto approach to inform the core research efforts of those trying to master the exponentially rapid expansion of publications and datasets. Here, viewpoints are provided and discussed to help the understanding of challenges of data-driven discovery.

  2. A data driven method to measure electron charge mis-identification rate

    CERN Document Server

    Bakhshiansohi, Hamed

    2009-01-01

    Electron charge mis-measurement is an important challenge in analyses which depend on the charge of the electron. To estimate the probability of electron charge mis-measurement, a data-driven method is introduced, and good agreement with MC-based methods is achieved. The third moment of the $\phi$ distribution of hits in the electron SuperCluster is studied. The correlation between this variable and the electron charge is also investigated. Using this `new' variable and some other variables, the electron charge measurement is improved by two different approaches.

  3. Data-driven RBE parameterization for helium ion beams

    CERN Document Server

    Mairani, A; Dokic, I; Valle, S M; Tessonnier, T; Galm, R; Ciocca, M; Parodi, K; Ferrari, A; Jäkel, O; Haberer, T; Pedroni, P; Böhlen, T T

    2016-01-01

    Helium ion beams are expected to be available again in the near future for clinical use. A suitable formalism to obtain relative biological effectiveness (RBE) values for treatment planning (TP) studies is needed. In this work we developed a data-driven RBE parameterization based on published in vitro experimental values. The RBE parameterization has been developed within the framework of the linear-quadratic (LQ) model as a function of the helium linear energy transfer (LET), dose and the tissue specific parameter $(\alpha/\beta)_{\text{ph}}$ of the LQ model for the reference radiation. Analytic expressions are provided, derived from the collected database, describing the $\text{RBE}_{\alpha} = \alpha_{\text{He}}/\alpha_{\text{ph}}$ and $\text{R}_{\beta} = \beta_{\text{He}}/\beta_{\text{ph}}$ ratios as a function of LET. Calculated RBE values at 2 Gy photon dose and at 10% survival ($\text{RBE}_{10}$) are compared with the experimental ones. Pearson's correlati...
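
    Within the LQ model, the two ratios defined in this abstract determine the RBE at a given photon dose through the iso-effect condition $\alpha_{\text{He}} D_{\text{He}} + \beta_{\text{He}} D_{\text{He}}^2 = \alpha_{\text{ph}} D_{\text{ph}} + \beta_{\text{ph}} D_{\text{ph}}^2$, with RBE $= D_{\text{ph}}/D_{\text{He}}$. The sketch below only works through that standard algebra. The function name and all numerical values are hypothetical, and the LET dependence of the ratios, which is the actual content of the paper's parameterization, is not reproduced here; the ratios are simply passed in as numbers.

```python
import math

def rbe_from_lq(alpha_ph, beta_ph, rbe_alpha, r_beta, dose_ph):
    """RBE at photon dose dose_ph (Gy) from the LQ iso-effect condition.

    rbe_alpha = alpha_He / alpha_ph and r_beta = beta_He / beta_ph are inputs;
    how they vary with LET is not modelled in this sketch.
    """
    alpha_he = rbe_alpha * alpha_ph
    beta_he = r_beta * beta_ph
    # Photon effect, i.e. -ln(surviving fraction), that the ion dose must match.
    effect = alpha_ph * dose_ph + beta_ph * dose_ph ** 2
    if beta_he == 0.0:
        dose_he = effect / alpha_he
    else:
        # Positive root of beta_he * D^2 + alpha_he * D - effect = 0.
        disc = alpha_he ** 2 + 4.0 * beta_he * effect
        dose_he = (-alpha_he + math.sqrt(disc)) / (2.0 * beta_he)
    return dose_ph / dose_he

# Hypothetical tissue with (alpha/beta)_ph = 2 Gy and made-up ratio values.
example_rbe = rbe_from_lq(alpha_ph=0.1, beta_ph=0.05,
                          rbe_alpha=2.0, r_beta=1.0, dose_ph=2.0)
```

    When both ratios equal 1 the function returns an RBE of 1, as expected for a beam that behaves like the reference radiation.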

  4. Data-driven identification of potential Zika virus vectors

    Science.gov (United States)

    Evans, Michelle V; Dallas, Tad A; Han, Barbara A; Murdock, Courtney C; Drake, John M

    2017-01-01

    Zika is an emerging virus whose rapid spread is of great public health concern. Knowledge about transmission remains incomplete, especially concerning potential transmission in geographic areas in which it has not yet been introduced. To identify unknown vectors of Zika, we developed a data-driven model linking vector species and the Zika virus via vector-virus trait combinations that confer a propensity toward associations in an ecological network connecting flaviviruses and their mosquito vectors. Our model predicts that thirty-five species may be able to transmit the virus, seven of which are found in the continental United States, including Culex quinquefasciatus and Cx. pipiens. We suggest that empirical studies prioritize these species to confirm predictions of vector competence, enabling the correct identification of populations at risk for transmission within the United States. DOI: http://dx.doi.org/10.7554/eLife.22053.001 PMID:28244371

  5. Data-driven sensor placement from coherent fluid structures

    Science.gov (United States)

    Manohar, Krithika; Kaiser, Eurika; Brunton, Bingni W.; Kutz, J. Nathan; Brunton, Steven L.

    2017-11-01

    Optimal sensor placement is a central challenge in the prediction, estimation and control of fluid flows. We reinterpret sensor placement as optimizing discrete samples of coherent fluid structures for full state reconstruction. This permits a drastic reduction in the number of sensors required for faithful reconstruction, since complex fluid interactions can often be described by a small number of coherent structures. Our work optimizes point sensors using the pivoted matrix QR factorization to sample coherent structures directly computed from flow data. We apply this sampling technique in conjunction with various data-driven modal identification methods, including the proper orthogonal decomposition (POD) and dynamic mode decomposition (DMD). In contrast to POD-based sensors, DMD demonstrably enables the optimization of sensors for prediction in systems exhibiting multiple scales of dynamics. Finally, reconstruction accuracy from pivot sensors is shown to be competitive with sensors obtained using traditional computationally prohibitive optimization methods.
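
    Sensor selection by pivoted QR can be sketched with a greedy, Gram-Schmidt form of column pivoting: each candidate location is a vector of mode values, the location with the largest remaining norm is picked, and its direction is projected out of all others. This is an illustrative, unoptimized sketch; the function name and the tiny mode matrix are hypothetical, and a practical implementation would call a library QR routine with column pivoting on modes computed from flow data.

```python
import math

def qr_pivot_sensors(modes, n_sensors):
    """Greedy column-pivoted QR on the transposed mode matrix.

    modes[i][k] is the value of mode k at candidate location i.
    Returns the indices of the selected sensor locations, in pivot order.
    """
    # Each candidate location is a vector in mode space.
    cols = [list(row) for row in modes]
    chosen = []
    for _ in range(n_sensors):
        norms = [math.sqrt(sum(x * x for x in c)) for c in cols]
        pivot = max(range(len(cols)), key=lambda i: norms[i])
        if norms[pivot] < 1e-12:
            break  # remaining locations add no new information
        chosen.append(pivot)
        q = [x / norms[pivot] for x in cols[pivot]]
        # Remove the chosen direction from every candidate (Gram-Schmidt step).
        for i, c in enumerate(cols):
            proj = sum(a * b for a, b in zip(q, c))
            cols[i] = [a - proj * b for a, b in zip(c, q)]
    return chosen

# Hypothetical example: 4 candidate locations, 2 modes.
toy_modes = [
    [1.0, 0.0],
    [0.5, 0.5],
    [0.0, 2.0],
    [0.1, 0.1],
]
sensors = qr_pivot_sensors(toy_modes, 2)
```

    In this toy case the location dominated by the second mode is picked first and the one dominated by the first mode second, so the two sensors together distinguish both modes.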

  6. Objective, Quantitative, Data-Driven Assessment of Chemical Probes.

    Science.gov (United States)

    Antolin, Albert A; Tym, Joseph E; Komianou, Angeliki; Collins, Ian; Workman, Paul; Al-Lazikani, Bissan

    2018-02-15

    Chemical probes are essential tools for understanding biological systems and for target validation, yet selecting probes for biomedical research is rarely based on objective assessment of all potential compounds. Here, we describe the Probe Miner: Chemical Probes Objective Assessment resource, capitalizing on the plethora of public medicinal chemistry data to empower quantitative, objective, data-driven evaluation of chemical probes. We assess >1.8 million compounds for their suitability as chemical tools against 2,220 human targets and dissect the biases and limitations encountered. Probe Miner represents a valuable resource to aid the identification of potential chemical probes, particularly when used alongside expert curation. Copyright © 2017 The Authors. Published by Elsevier Ltd.. All rights reserved.

  7. Facilitating Data Driven Business Model Innovation - A Case study

    DEFF Research Database (Denmark)

    Bjerrum, Torben Cæsar Bisgaard; Andersen, Troels Christian; Aagaard, Annabeth

    2016-01-01

    . The businesses interdisciplinary capabilities come into play in the BMI process, where knowledge from the facilitation strategy and knowledge from phases of the BMI process needs to be present to create new knowledge, hence new BMs and innovations. Depending on the environment and shareholders, this also exposes......This paper aims to understand the barriers that businesses meet in understanding their current business models (BM) and in their attempt at innovating new data driven business models (DDBM) using data. The interdisciplinary challenge of knowledge exchange occurring outside and/or inside businesses......, that gathers knowledge is of great importance. The SMEs have little, if no experience, within data handling, data analytics, and working with structured Business Model Innovation (BMI), that relates to both new and conventional products, processes and services. This new frontier of data and BMI will have...

  8. A Transition Towards a Data-Driven Business Model (DDBM)

    DEFF Research Database (Denmark)

    Zaki, Mohamed; Bøe-Lillegraven, Tor; Neely, Andy

    2016-01-01

    Nettavisen is a Norwegian online start-up that experienced a boost after the financial crisis of 2009. Since then, the firm has been able to increase its market share and profitability through the use of highly disruptive business models, allowing the relatively small staff to outcompete powerhouse...... legacy-publishing companies and new media players such as Facebook and Google. These disruptive business models have been successful, as Nettavisen captured a large market share in Norway early on, and was consistently one of the top-three online news sites in Norway. Capitalising on media data explosion...... and the recent acquisition of blogger network ‘Blog.no’, Nettavisen is moving towards a data-driven business model (DDBM). In particular, the firm aims to analyse huge volumes of user Web browsing and purchasing habits....

  9. Freight performance measures : approach analysis.

    Science.gov (United States)

    2010-05-01

    This report reviews the existing state of the art and also the state of the practice of freight performance measurement. Most performance measures at the state level have aimed at evaluating highway or transit infrastructure performance with an empha...

  10. Helioseismic and neutrino data-driven reconstruction of solar properties

    Science.gov (United States)

    Song, Ningqiang; Gonzalez-Garcia, M. C.; Villante, Francesco L.; Vinyoles, Nuria; Serenelli, Aldo

    2018-06-01

    In this work, we use Bayesian inference to quantitatively reconstruct the solar properties most relevant to the solar composition problem using as inputs the information provided by helioseismic and solar neutrino data. In particular, we use a Gaussian process to model the functional shape of the opacity uncertainty to gain flexibility and become as free as possible from prejudice in this regard. With these tools we first readdress the statistical significance of the solar composition problem. Furthermore, starting from a composition unbiased set of standard solar models (SSMs) we are able to statistically select those with solar chemical composition and other solar inputs which better describe the helioseismic and neutrino observations. In particular, we are able to reconstruct the solar opacity profile in a data-driven fashion, independently of any reference opacity tables, obtaining a 4 per cent uncertainty at the base of the convective envelope and 0.8 per cent at the solar core. When systematic uncertainties are included, results are 7.5 per cent and 2 per cent, respectively. In addition, we find that the values of most of the other inputs of the SSMs required to better describe the helioseismic and neutrino data are in good agreement with those adopted as the standard priors, with the exception of the astrophysical factor S11 and the microscopic diffusion rates, for which data suggests a 1 per cent and 30 per cent reduction, respectively. As an output of the study we derive the corresponding data-driven predictions for the solar neutrino fluxes.

  11. Data-driven Inference and Investigation of Thermosphere Dynamics and Variations

    Science.gov (United States)

    Mehta, P. M.; Linares, R.

    2017-12-01

    This paper presents a methodology for data-driven inference and investigation of thermosphere dynamics and variations. The approach uses data-driven modal analysis to extract the most energetic modes of variations for neutral thermospheric species using proper orthogonal decomposition, where the time-independent modes or basis represent the dynamics and the time-dependent coefficients or amplitudes represent the model parameters. The data-driven modal analysis approach combined with sparse, discrete observations is used to infer amplitudes for the dynamic modes and to calibrate the energy content of the system. In this work, two different data types, namely the number density measurements from TIMED/GUVI and the mass density measurements from CHAMP/GRACE, are simultaneously ingested for an accurate and self-consistent specification of the thermosphere. The assimilation process is achieved with a non-linear least squares solver and allows estimation/tuning of the model parameters or amplitudes rather than the driver. In this work, we use the Naval Research Lab's MSIS model to derive the most energetic modes for six different species, He, O, N2, O2, H, and N. We examine the dominant drivers of variations for helium in MSIS and observe that seasonal latitudinal variation accounts for about 80% of the dynamic energy with a strong preference of helium for the winter hemisphere. We also observe enhanced helium presence near the poles at GRACE altitudes during periods of low solar activity (Feb 2007) as previously deduced. We will also examine the storm-time response of helium derived from observations. The results are expected to be useful in tuning/calibration of the physics-based models.

  12. Systemic Approach to Architectural Performance

    Directory of Open Access Journals (Sweden)

    Marie Davidova

    2017-04-01

    First-hand experiences in several design projects that were based on media richness and collaboration are described in this article. Although complex design processes are merely considered as socio-technical systems, they are deeply involved with natural systems. My collaborative research in the field of performance-oriented design combines digital and physical conceptual sketches, simulations and prototyping. GIGA-mapping is applied to organise the data. The design process uses the most suitable tools for the subtasks at hand, and the use of media is mixed according to particular requirements. These tools include digital and physical GIGA-mapping, parametric computer aided design (CAD), digital simulation of analyses, as well as sampling and 1:1 prototyping. Also discussed in this article are the methodologies used in several design projects to strategize these tools and the developments and trends in the tools employed. The paper argues that the digital tools tend to produce similar results through given pre-sets that often do not correspond to real needs. Thus, there is a significant need for mixed methods including prototyping in the creative design process. Media mixing and cooperation across disciplines is unavoidable in the holistic approach to contemporary design. This includes the consideration of diverse biotic and abiotic agents. I argue that physical and digital GIGA-mapping is a crucial tool to use in coping with this complexity. Furthermore, I propose the integration of physical and digital outputs in one GIGA-map and the participation and co-design of biotic and abiotic agents into one rich design research space, which results in an ever-evolving research-design process-result time-based design.

  13. Dynamic Data-Driven Prediction of Lean Blowout in a Swirl-Stabilized Combustor

    Directory of Open Access Journals (Sweden)

    Soumalya Sarkar

    2015-09-01

    Full Text Available This paper addresses dynamic data-driven prediction of lean blowout (LBO) phenomena in confined combustion processes, which are prevalent in many physical applications (e.g., land-based and aircraft gas-turbine engines). The underlying concept is built upon pattern classification and is validated for LBO prediction with time series of chemiluminescence sensor data from a laboratory-scale swirl-stabilized dump combustor. The proposed method of LBO prediction makes use of the theory of symbolic dynamics, where (finite-length) time series data are partitioned to produce symbol strings that, in turn, generate a special class of probabilistic finite state automata (PFSA). These PFSA, called D-Markov machines, have a deterministic algebraic structure and their states are represented by symbol blocks of length D or less, where D is a positive integer. The D-Markov machines are constructed in two steps: (i) state splitting, i.e., the states are split based on their information contents, and (ii) state merging, i.e., two or more states (of possibly different lengths) are merged together to form a new state without any significant loss of the embedded information. The modeling complexity (e.g., number of states) of a D-Markov machine model is observed to be drastically reduced as the combustor approaches LBO. An anomaly measure, based on Kullback-Leibler divergence, is constructed to predict the proximity of LBO. The problem of LBO prediction is posed in a pattern classification setting and the underlying algorithms have been tested on experimental data at different extents of fuel-air premixing and fuel/air ratio. It is shown that, over a wide range of fuel-air premixing, D-Markov machines with D > 1 perform better as predictors of LBO than those with D = 1.
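
    The symbolization and divergence steps mentioned in this abstract can be sketched roughly as follows. This is a simplified illustration, assuming a fixed block length and a fixed partition: it does not implement the state splitting and state merging of the paper's D-Markov construction, and the function names, the add-one smoothing, and the synthetic signals are assumptions made only for the example.

```python
import itertools
import math
from collections import Counter


def symbolize(series, edges):
    """Map each sample to a symbol index using partition boundaries `edges`."""
    return [sum(x > e for e in edges) for x in series]


def block_distribution(symbols, depth, alphabet_size):
    """Estimate the probability of every symbol block of length `depth`.

    Add-one smoothing (an assumption of this sketch) keeps all
    probabilities positive so that the divergence below stays finite.
    """
    counts = Counter(tuple(symbols[i:i + depth])
                     for i in range(len(symbols) - depth + 1))
    total = sum(counts.values()) + alphabet_size ** depth
    blocks = itertools.product(range(alphabet_size), repeat=depth)
    return {b: (counts[b] + 1) / total for b in blocks}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) over a shared set of blocks."""
    return sum(p[b] * math.log(p[b] / q[b]) for b in p)


# Synthetic signals (not combustor data): a varied one and a nearly constant one.
edges = [0.5]
nominal = symbolize([0.1, 0.2, 0.8, 0.9] * 25, edges)
degraded = symbolize([0.1] * 100, edges)
p_nominal = block_distribution(nominal, depth=2, alphabet_size=2)
p_degraded = block_distribution(degraded, depth=2, alphabet_size=2)
anomaly = kl_divergence(p_degraded, p_nominal)
```

    A larger `anomaly` value means the block statistics of the current signal have moved further from the nominal reference.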

  14. Central focused convolutional neural networks: Developing a data-driven model for lung nodule segmentation.

    Science.gov (United States)

    Wang, Shuo; Zhou, Mu; Liu, Zaiyi; Liu, Zhenyu; Gu, Dongsheng; Zang, Yali; Dong, Di; Gevaert, Olivier; Tian, Jie

    2017-08-01

    Accurate lung nodule segmentation from computed tomography (CT) images is of great importance for image-driven lung cancer analysis. However, the heterogeneity of lung nodules and the presence of similar visual characteristics between nodules and their surroundings make robust nodule segmentation difficult. In this study, we propose a data-driven model, termed the Central Focused Convolutional Neural Networks (CF-CNN), to segment lung nodules from heterogeneous CT images. Our approach combines two key insights: 1) the proposed model captures a diverse set of nodule-sensitive features from both 3-D and 2-D CT images simultaneously; 2) when classifying an image voxel, the effects of its neighbor voxels can vary according to their spatial locations. We describe this phenomenon by proposing a novel central pooling layer that retains much of the information at the voxel patch center, followed by a multi-scale patch learning strategy. Moreover, we design a weighted sampling to facilitate the model training, where training samples are selected according to their degree of segmentation difficulty. The proposed method has been extensively evaluated on the public LIDC dataset including 893 nodules and an independent dataset with 74 nodules from Guangdong General Hospital (GDGH). We showed that CF-CNN achieved superior segmentation performance with average dice scores of 82.15% and 80.02% for the two datasets respectively. Moreover, we compared our results with the inter-radiologist consistency on the LIDC dataset, showing a difference in average dice score of only 1.98%. Copyright © 2017. Published by Elsevier B.V.
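
    The dice score reported in this abstract measures the overlap between a predicted mask and a reference mask. The short sketch below shows the standard definition on flat binary masks; the function name and the toy masks are illustrative and are not taken from the paper.

```python
def dice_score(pred, truth):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    size = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Two empty masks agree perfectly by convention.
    return 1.0 if size == 0 else 2.0 * overlap / size


predicted = [1, 1, 0, 0]
reference = [1, 0, 1, 0]
score = dice_score(predicted, reference)  # one shared voxel out of 2 + 2
```

    Here `score` is 0.5, because the masks share one voxel and contain two voxels each.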

  15. Data-driven integration of genome-scale regulatory and metabolic network models

    Science.gov (United States)

    Imam, Saheed; Schäuble, Sascha; Brooks, Aaron N.; Baliga, Nitin S.; Price, Nathan D.

    2015-01-01

    Microbes are diverse and extremely versatile organisms that play vital roles in all ecological niches. Understanding and harnessing microbial systems will be key to the sustainability of our planet. One approach to improving our knowledge of microbial processes is through data-driven and mechanism-informed computational modeling. Individual models of biological networks (such as metabolism, transcription, and signaling) have played pivotal roles in driving microbial research through the years. These networks, however, are highly interconnected and function in concert—a fact that has led to the development of a variety of approaches aimed at simulating the integrated functions of two or more network types. Though the task of integrating these different models is fraught with new challenges, the large amounts of high-throughput data sets being generated, and algorithms being developed, means that the time is at hand for concerted efforts to build integrated regulatory-metabolic networks in a data-driven fashion. In this perspective, we review current approaches for constructing integrated regulatory-metabolic models and outline new strategies for future development of these network models for any microbial system. PMID:25999934

  16. Data-driven integration of genome-scale regulatory and metabolic network models

    Directory of Open Access Journals (Sweden)

    Saheed Imam

    2015-05-01

    Full Text Available Microbes are diverse and extremely versatile organisms that play vital roles in all ecological niches. Understanding and harnessing microbial systems will be key to the sustainability of our planet. One approach to improving our knowledge of microbial processes is through data-driven and mechanism-informed computational modeling. Individual models of biological networks (such as metabolism, transcription and signaling) have played pivotal roles in driving microbial research through the years. These networks, however, are highly interconnected and function in concert – a fact that has led to the development of a variety of approaches aimed at simulating the integrated functions of two or more network types. Though the task of integrating these different models is fraught with new challenges, the large amounts of high-throughput data sets being generated, and algorithms being developed, means that the time is at hand for concerted efforts to build integrated regulatory-metabolic networks in a data-driven fashion. In this perspective, we review current approaches for constructing integrated regulatory-metabolic models and outline new strategies for future development of these network models for any microbial system.

  17. Data driven model generation based on computational intelligence

    Science.gov (United States)

    Gemmar, Peter; Gronz, Oliver; Faust, Christophe; Casper, Markus

    2010-05-01

    The simulation of discharges at a local gauge or the modeling of large scale river catchments are effectively involved in estimation and decision tasks of hydrological research and practical applications like flood prediction or water resource management. However, modeling such processes using analytical or conceptual approaches is made difficult by both complexity of process relations and heterogeneity of processes. It has been shown many times that unknown or assumed process relations can principally be described by computational methods, and that system models can automatically be derived from observed behavior or measured process data. This study describes the development of hydrological process models using computational methods including Fuzzy logic and artificial neural networks (ANN) in a comprehensive and automated manner. Methods: We consider a closed concept for data driven development of hydrological models based on measured (experimental) data. The concept is centered on a Fuzzy system using rules of Takagi-Sugeno-Kang type which formulate the input-output relation in a generic structure like $R_i$: IF $q(t)$ = low AND ... THEN $q(t+\Delta t) = a_{i0} + a_{i1} q(t) + a_{i2} p(t-\Delta t_{i1}) + a_{i3} p(t+\Delta t_{i2}) + \dots$. The rule's premise part (IF) describes process states involving available process information, e.g. actual outlet $q(t)$ is low, where low is one of several Fuzzy sets defined over variable $q(t)$. The rule's conclusion (THEN) estimates expected outlet $q(t+\Delta t)$ by a linear function over selected system variables, e.g. actual outlet $q(t)$, previous and/or forecasted precipitation $p(t \pm \Delta t_{ik})$. In case of river catchment modeling we use head gauges, tributary and upriver gauges in the conclusion part as well. In addition, we consider temperature and temporal (season) information in the premise part. By creating a set of rules $R = \{R_i \mid i = 1,\dots,N\}$ the space of process states can be covered as concisely as necessary. Model adaptation is achieved by finding an optimal set $A = (a_{ij})$ of conclusion
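
    To make the rule structure in this abstract concrete, the sketch below evaluates two Takagi-Sugeno-Kang rules and blends their linear conclusions by rule activation. Everything numeric here is hypothetical: the membership functions `low` and `high`, the scale `q_max`, and the coefficient values are invented for illustration and are not taken from the study.

```python
def low(q, q_max=10.0):
    """Hypothetical membership of discharge q in the fuzzy set 'low'."""
    return max(0.0, min(1.0, 1.0 - q / q_max))


def high(q, q_max=10.0):
    """Hypothetical complementary membership in the fuzzy set 'high'."""
    return 1.0 - low(q, q_max)


# Each rule: (premise membership, conclusion coefficients a_i0, a_i1, a_i2).
# The coefficients are made-up values, not calibrated parameters.
RULES = [
    (low, (0.0, 0.9, 0.1)),
    (high, (1.0, 0.8, 0.5)),
]


def tsk_forecast(q_t, p_lagged):
    """Forecast q(t + dt) as the activation-weighted mean of rule outputs."""
    weighted_sum = 0.0
    activation_sum = 0.0
    for membership, (a0, a1, a2) in RULES:
        w = membership(q_t)
        weighted_sum += w * (a0 + a1 * q_t + a2 * p_lagged)
        activation_sum += w
    return weighted_sum / activation_sum


forecast = tsk_forecast(q_t=5.0, p_lagged=2.0)
```

    With these two complementary memberships the activations always sum to one, so the forecast moves smoothly from the first rule's linear model at low discharge to the second rule's model at high discharge.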

  18. qPortal: A platform for data-driven biomedical research.

    Science.gov (United States)

    Mohr, Christopher; Friedrich, Andreas; Wojnar, David; Kenar, Erhan; Polatkan, Aydin Can; Codrea, Marius Cosmin; Czemmel, Stefan; Kohlbacher, Oliver; Nahnsen, Sven

    2018-01-01

    Modern biomedical research aims at drawing biological conclusions from large, highly complex biological datasets. It has become common practice to make extensive use of high-throughput technologies that produce large amounts of heterogeneous data. In addition to the ever-improving accuracy, methods are getting faster and cheaper, resulting in a steadily increasing need for scalable data management and easily accessible means of analysis. We present qPortal, a platform providing users with an intuitive way to manage and analyze quantitative biological data. The backend leverages a variety of concepts and technologies, such as relational databases, data stores, data models and means of data transfer, as well as front-end solutions to give users access to data management and easy-to-use analysis options. Users are empowered to conduct their experiments from the experimental design to the visualization of their results through the platform. Here, we illustrate the feature-rich portal by simulating a biomedical study based on publicly available data. We demonstrate the software's strength in supporting the entire project life cycle. The software supports the project design and registration, empowers users to do all-digital project management and finally provides means to perform analysis. We compare our approach to Galaxy, one of the most widely used scientific workflow and analysis platforms in computational biology. Application of both systems to a small case study shows the differences between a data-driven approach (qPortal) and a workflow-driven approach (Galaxy). qPortal, a one-stop-shop solution for biomedical projects, offers up-to-date analysis pipelines, quality control workflows, and visualization tools. Through intensive user interactions, appropriate data models have been developed. These models build the foundation of our biological data management system and provide possibilities to annotate data, query metadata for statistics and future re-analysis on

  19. PHYCAA: Data-driven measurement and removal of physiological noise in BOLD fMRI

    DEFF Research Database (Denmark)

    Churchill, Nathan W.; Yourganov, Grigori; Spring, Robyn

    2012-01-01

    , autocorrelated physiological noise sources with reproducible spatial structure, using an adaptation of Canonical Correlation Analysis performed in a split-half resampling framework. The technique is able to identify physiological effects with vascular-linked spatial structure, and an intrinsic dimensionality...... with physiological noise, and real data-driven model prediction and reproducibility, for both block and event-related task designs. This is demonstrated compared to no physiological noise correction, and to the widely used RETROICOR (Glover et al., 2000) physiological denoising algorithm, which uses externally...

  20. Examining Data Driven Decision Making via Formative Assessment: A Confluence of Technology, Data Interpretation Heuristics and Curricular Policy

    Science.gov (United States)

    Swan, Gerry; Mazur, Joan

    2011-01-01

    Although the term data-driven decision making (DDDM) is relatively new (Moss, 2007), the underlying concept of DDDM is not. For example, the practices of formative assessment and computer-managed instruction have historically involved the use of student performance data to guide what happens next in the instructional sequence (Morrison, Kemp, &…

  1. Fork-join and data-driven execution models on multi-core architectures: Case study of the FMM

    KAUST Repository

    Amer, Abdelhalim; Maruyama, Naoya; Pericàs, Miquel; Taura, Kenjiro; Yokota, Rio; Matsuoka, Satoshi

    2013-01-01

    Extracting maximum performance of multi-core architectures is a difficult task primarily due to bandwidth limitations of the memory subsystem and its complex hierarchy. In this work, we study the implications of fork-join and data-driven execution

  2. A copula-based sampling method for data-driven prognostics

    International Nuclear Information System (INIS)

    Xi, Zhimin; Jing, Rong; Wang, Pingfeng; Hu, Chao

    2014-01-01

    This paper develops a Copula-based sampling method for data-driven prognostics. The method essentially consists of an offline training process and an online prediction process: (i) the offline training process builds a statistical relationship between the failure time and the time realizations at specified degradation levels on the basis of off-line training data sets; and (ii) the online prediction process identifies probable failure times for online testing units based on the statistical model constructed in the offline process and the online testing data. Our contributions in this paper are three-fold, namely the definition of a generic health index system to quantify the health degradation of an engineering system, the construction of a Copula-based statistical model to learn the statistical relationship between the failure time and the time realizations at specified degradation levels, and the development of a simulation-based approach for the prediction of remaining useful life (RUL). Two engineering case studies, namely the electric cooling fan health prognostics and the 2008 IEEE PHM challenge problem, are employed to demonstrate the effectiveness of the proposed methodology. - Highlights: • We develop a novel mechanism for data-driven prognostics. • A generic health index system quantifies health degradation of engineering systems. • Off-line training model is constructed based on the Bayesian Copula model. • Remaining useful life is predicted from a simulation-based approach

  3. Data-driven risk identification in phase III clinical trials using central statistical monitoring.

    Science.gov (United States)

    Timmermans, Catherine; Venet, David; Burzykowski, Tomasz

    2016-02-01

    Our interest lies in quality control for clinical trials, in the context of risk-based monitoring (RBM). We specifically study the use of central statistical monitoring (CSM) to support RBM. Under an RBM paradigm, we claim that CSM has a key role to play in identifying the "risks to the most critical data elements and processes" that will drive targeted oversight. In order to support this claim, we first see how to characterize the risks that may affect clinical trials. We then discuss how CSM can be understood as a tool for providing a set of data-driven key risk indicators (KRIs), which help to organize adaptive targeted monitoring. Several case studies are provided where issues in a clinical trial have been identified thanks to targeted investigation after the identification of a risk using CSM. Using CSM to build data-driven KRIs helps to identify different kinds of issues in clinical trials. This ability is directly linked with the exhaustiveness of the CSM approach and its flexibility in the definition of the risks that are searched for when identifying the KRIs. In practice, a CSM assessment of the clinical database seems essential to ensure data quality. The atypical data patterns found in some centers and variables are seen as KRIs under an RBM approach. Targeted monitoring or data management queries can be used to confirm whether the KRIs point to an actual issue or not.

  4. Oracle database performance and scalability a quantitative approach

    CERN Document Server

    Liu, Henry H

    2011-01-01

    A data-driven, fact-based, quantitative text on Oracle performance and scalability With database concepts and theories clearly explained in Oracle's context, readers quickly learn how to fully leverage Oracle's performance and scalability capabilities at every stage of designing and developing an Oracle-based enterprise application. The book is based on the author's more than ten years of experience working with Oracle, and is filled with dependable, tested, and proven performance optimization techniques. Oracle Database Performance and Scalability is divided into four parts that enable reader

  5. Augmenting Bag-of-Words: Data-Driven Discovery of Temporal and Structural Information for Activity Recognition

    OpenAIRE

    Bettadapura, Vinay; Schindler, Grant; Plotz, Thomaz; Essa, Irfan

    2015-01-01

    We present data-driven techniques to augment Bag of Words (BoW) models, which allow for more robust modeling and recognition of complex long-term activities, especially when the structure and topology of the activities are not known a priori. Our approach specifically addresses the limitations of standard BoW approaches, which fail to represent the underlying temporal and causal information that is inherent in activity streams. In addition, we also propose the use of randomly sampled regular ...

  6. Data driven parallelism in experimental high energy physics applications

    International Nuclear Information System (INIS)

    Pohl, M.

    1987-01-01

    I present global design principles for the implementation of high energy physics data analysis code on sequential and parallel processors with mixed shared and local memory. Potential parallelism in the structure of high energy physics tasks is identified with granularity varying from a few times 10^8 instructions all the way down to a few times 10^4 instructions. It follows the hierarchical structure of detector and data acquisition systems. To take advantage of this - yet preserving the necessary portability of the code - I propose a computational model with purely data driven concurrency in Single Program Multiple Data (SPMD) mode. The task granularity is defined by varying the granularity of the central data structure manipulated. Concurrent processes coordinate themselves asynchronously using simple lock constructs on parts of the data structure. Load balancing among processes occurs naturally. The scheme allows the internal layout of the data structure to be mapped closely onto the layout of local and shared memory in a parallel architecture. It thus allows the application to be optimized with respect to synchronization as well as data transport overheads. I present a coarse top level design for a portable implementation of this scheme on sequential machines, multiprocessor mainframes (e.g. IBM 3090), tightly coupled multiprocessors (e.g. RP-3) and loosely coupled processor arrays (e.g. LCAP, Emulating Processor Farms). (orig.)

  7. Selection of the Sample for Data-Driven $Z \\to \

    CERN Document Server

    Krauss, Martin

    2009-01-01

    The aim of this study was to improve the selection of the sample for data-driven Z → νν̄ background estimation; this background is a major contribution in supersymmetric searches in a no-lepton search mode. The estimation is based on Z → ℓ+ℓ− samples created with ATLAS simulation software. The method works if two leptons are reconstructed, but with cuts that are typical for SUSY searches the reconstruction efficiency for electrons and muons is rather low. For this reason an attempt was made to enlarge the data sample by also considering events in which only one electron was reconstructed. In this case the invariant mass of the electron and each jet was computed, and the jet giving the best match to the Z boson mass was taken as the non-reconstructed electron. In this way the sample can be extended, but it loses significant purity because background events are also selected. To improve this method, other variables that were not available for this study have to be considered. Applying a similar method to muons using ...
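
    The jet selection described in this abstract, picking the jet whose invariant mass with the reconstructed electron best matches the Z boson mass, can be sketched as below. This is an illustrative sketch only: the four-vector layout `(E, px, py, pz)` in GeV, the function names, and the toy kinematics are assumptions, and no ATLAS software is used.

```python
import math

Z_MASS_GEV = 91.19  # approximate Z boson mass


def invariant_mass(a, b):
    """Invariant mass of two four-vectors given as (E, px, py, pz) in GeV."""
    e = a[0] + b[0]
    px, py, pz = a[1] + b[1], a[2] + b[2], a[3] + b[3]
    m_squared = e * e - (px * px + py * py + pz * pz)
    # Guard against small negative values caused by rounding.
    return math.sqrt(max(m_squared, 0.0))


def best_z_partner(electron, jets):
    """Index of the jet whose mass with the electron is closest to the Z mass."""
    return min(range(len(jets)),
               key=lambda i: abs(invariant_mass(electron, jets[i]) - Z_MASS_GEV))


# Toy kinematics, treating all objects as massless.
electron = (45.6, 45.6, 0.0, 0.0)
jets = [
    (20.0, 0.0, 20.0, 0.0),    # soft jet at a right angle to the electron
    (45.6, -45.6, 0.0, 0.0),   # back-to-back jet, pair mass near the Z mass
]
chosen = best_z_partner(electron, jets)
```

    In this toy event the second jet is chosen because its pair mass with the electron is about 91.2 GeV, while the first jet gives a much lower mass.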

  8. ATLAS job transforms: a data driven workflow engine

    International Nuclear Information System (INIS)

    Stewart, G A; Breaden-Madden, W B; Maddocks, H J; Harenberg, T; Sandhoff, M; Sarrazin, B

    2014-01-01

    The need to run complex workflows for a high energy physics experiment such as ATLAS has always been present. However, as computing resources have become even more constrained, compared to the wealth of data generated by the LHC, the need to use resources efficiently and manage complex workflows within a single grid job has increased. In ATLAS, a new Job Transform framework has been developed that we describe in this paper. This framework manages the multiple execution steps needed to 'transform' one data type into another (e.g., RAW data to ESD to AOD to final ntuple) and also provides a consistent interface for the ATLAS production system. The new framework uses a data driven workflow definition which is both easy to manage and powerful. After a transform is defined, jobs are expressed simply by specifying the input data and the desired output data. The transform infrastructure then executes only the necessary substeps to produce the final data products. The global execution cost of running the job is minimised and the transform can adapt to scenarios where data can be produced along different execution paths. Transforms for specific physics tasks which support up to 60 individual substeps have been successfully run. As the new transforms infrastructure has been deployed in production many features have been added to the framework which improve reliability, quality of error reporting and also provide support for multi-process jobs.
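
    The idea of specifying only input and output data types and letting the framework pick the substeps can be illustrated with a small planner. The sketch below is not the ATLAS Job Transform code: the substep names in `SUBSTEPS`, the type label `NTUP`, and the function `plan` are hypothetical, and only the data type chain RAW to ESD to AOD to ntuple comes from the abstract.

```python
# Hypothetical substeps: name -> (required input types, produced output type).
SUBSTEPS = {
    "RAWtoESD": ({"RAW"}, "ESD"),
    "ESDtoAOD": ({"ESD"}, "AOD"),
    "AODtoNTUP": ({"AOD"}, "NTUP"),
}


def plan(available, wanted):
    """Return the ordered substeps needed to produce `wanted` from `available`.

    Works backwards from each wanted data type, so substeps whose outputs
    are not needed are never scheduled. Assumes the substep graph is acyclic
    and that each data type has a single producer.
    """
    producers = {out: (name, needs) for name, (needs, out) in SUBSTEPS.items()}
    order = []
    produced = set()

    def require(data_type):
        if data_type in available or data_type in produced:
            return
        if data_type not in producers:
            raise ValueError(f"no substep produces {data_type}")
        name, needs = producers[data_type]
        for need in sorted(needs):
            require(need)
        order.append(name)
        produced.add(data_type)

    for data_type in sorted(wanted):
        require(data_type)
    return order


steps_from_raw = plan(available={"RAW"}, wanted={"AOD"})
steps_from_esd = plan(available={"ESD"}, wanted={"NTUP"})
```

    Starting from RAW the planner schedules the reconstruction substep first, while starting from ESD it skips that substep entirely, which mirrors the statement that only the necessary substeps are executed.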

  9. Data driven parallelism in experimental high energy physics applications

    Science.gov (United States)

    Pohl, Martin

    1987-08-01

    I present global design principles for the implementation of High Energy Physics data analysis code on sequential and parallel processors with mixed shared and local memory. Potential parallelism in the structure of High Energy Physics tasks is identified with granularity varying from a few times 10^8 instructions all the way down to a few times 10^4 instructions. It follows the hierarchical structure of detector and data acquisition systems. To take advantage of this - yet preserving the necessary portability of the code - I propose a computational model with purely data driven concurrency in Single Program Multiple Data (SPMD) mode. The task granularity is defined by varying the granularity of the central data structure manipulated. Concurrent processes coordinate themselves asynchronously using simple lock constructs on parts of the data structure. Load balancing among processes occurs naturally. The scheme allows the internal layout of the data structure to be mapped closely onto the layout of local and shared memory in a parallel architecture. It thus allows the application to be optimized with respect to synchronization as well as data transport overheads. I present a coarse top level design for a portable implementation of this scheme on sequential machines, multiprocessor mainframes (e.g. IBM 3090), tightly coupled multiprocessors (e.g. RP-3) and loosely coupled processor arrays (e.g. LCAP, Emulating Processor Farms).

  10. Data driven profiting from your most important business asset

    CERN Document Server

    Redman, Thomas C

    2008-01-01

    Your company's data has the potential to add enormous value to every facet of the organization -- from marketing and new product development to strategy to financial management. Yet if your company is like most, it's not using its data to create strategic advantage. Data sits around unused -- or incorrect data fouls up operations and decision making. In Data Driven, Thomas Redman, the "Data Doc," shows how to leverage and deploy data to sharpen your company's competitive edge and enhance its profitability. The author reveals: · The special properties that make data such a powerful asset · The hidden costs of flawed, outdated, or otherwise poor-quality data · How to improve data quality for competitive advantage · Strategies for exploiting your data to make better business decisions · The many ways to bring data to market · Ideas for dealing with political struggles over data and concerns about privacy rights Your company's data is a key business asset, and you need to manage it aggressively and professi...

  11. Data driven processor 'Vertex Trigger' for B experiments

    International Nuclear Information System (INIS)

    Hartouni, E.P.

    1993-01-01

    Data Driven Processors (DDP's) are specialized computation engines configured to solve specific numerical problems, such as vertex reconstruction. The architecture of the DDP which is the subject of this talk was designed and implemented by W. Sippach and B.C. Knapp at Nevis Lab. in the early 1980's. This particular implementation allows multiple parallel streams of data to provide input to a heterogeneous collection of simple operators whose interconnections form an algorithm. The local data flow control allows this device to execute algorithms extremely quickly provided that care is taken in the layout of the algorithm. I/O rates of several hundred megabytes/second are routinely achieved thus making DDP's attractive candidates for complex online calculations. The original question was "can a DDP reconstruct tracks in a Silicon Vertex Detector, find events with a separated vertex and do it fast enough to be used as an online trigger?" Restating this inquiry as three questions and describing the answers to the questions will be the subject of this talk. The three specific questions are: (1) Can an algorithm be found which reconstructs tracks in a planar geometry and no magnetic field; (2) Can separated vertices be recognized in some way; (3) Can the algorithm be implemented in the Nevis-UMass and DDP and execute in 10-20 μs?

  12. Data-Driven Machine-Learning Model in District Heating System for Heat Load Prediction: A Comparison Study

    Directory of Open Access Journals (Sweden)

    Fisnik Dalipi

    2016-01-01

    Full Text Available We present our data-driven supervised machine-learning (ML) model to predict heat load for buildings in a district heating system (DHS). Even though ML has been used as an approach to heat load prediction in literature, it is hard to select an approach that will qualify as a solution for our case as existing solutions are quite problem specific. For that reason, we compared and evaluated three ML algorithms within a framework on operational data from a DH system in order to generate the required prediction model. The algorithms examined are Support Vector Regression (SVR), Partial Least Square (PLS), and random forest (RF). We use the data collected from buildings at several locations for a period of 29 weeks. Concerning the accuracy of predicting the heat load, we evaluate the performance of the proposed algorithms using mean absolute error (MAE), mean absolute percentage error (MAPE), and correlation coefficient. In order to determine which algorithm had the best accuracy, we conducted performance comparison among these ML algorithms. The comparison of the algorithms indicates that, for DH heat load prediction, the SVR method presented in this paper is the most efficient of the three, also when compared to other methods found in the literature.
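
    The three accuracy measures named in this abstract have standard definitions, sketched below for a pair of measured and predicted series. The function names and the toy heat load values are illustrative assumptions; the Pearson coefficient is used here as the correlation coefficient, which the abstract does not specify.

```python
import math


def mae(actual, predicted):
    """Mean absolute error, in the units of the data."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error; assumes no actual value is zero."""
    return 100.0 * sum(abs((a - p) / a)
                       for a, p in zip(actual, predicted)) / len(actual)


def pearson_r(actual, predicted):
    """Pearson correlation coefficient between the two series."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Toy heat load values (arbitrary units), not data from the study.
measured = [100.0, 200.0, 300.0]
forecast = [110.0, 190.0, 330.0]
scores = (mae(measured, forecast),
          mape(measured, forecast),
          pearson_r(measured, forecast))
```

    MAE and MAPE penalise the size of the errors, while the correlation coefficient only measures how well the forecast tracks the shape of the measured series, so the three are usually reported together.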

  13. A Data-driven Concept Schema for Defining Clinical Research Data Needs

    Science.gov (United States)

    Hruby, Gregory W.; Hoxha, Julia; Ravichandran, Praveen Chandar; Mendonça, Eneida A.; Hanauer, David A; Weng, Chunhua

    2016-01-01

    OBJECTIVES The Patient, Intervention, Control/Comparison, and Outcome (PICO) framework is an effective technique for framing a clinical question. We aim to develop the counterpart of PICO to structure clinical research data needs. METHODS We use a data-driven approach to abstracting key concepts representing clinical research data needs by adapting and extending an expert-derived framework originally developed for defining cancer research data needs. We annotated clinical trial eligibility criteria, EHR data request logs, and data queries to electronic health records (EHR), to extract and harmonize concept classes representing clinical research data needs. We evaluated the class coverage, class preservation from the original framework, schema generalizability, schema understandability, and schema structural correctness through a semi-structured interview with eight multidisciplinary domain experts. We iteratively refined the schema based on the evaluations. RESULTS Our data-driven schema preserved 68% of the 63 classes from the original framework and covered 88% (73/82) of the classes proposed by evaluators. Class coverage for participants of different backgrounds ranged from 60% to 100% with a median value of 95% agreement among the individual evaluators. The schema was found understandable and structurally sound. CONCLUSIONS Our proposed schema may serve as the counterpart to PICO for improving the research data needs communication between researchers and informaticians. PMID:27185504

  14. Data-Driven Engineering of Social Dynamics: Pattern Matching and Profit Maximization.

    Science.gov (United States)

    Peng, Huan-Kai; Lee, Hao-Chih; Pan, Jia-Yu; Marculescu, Radu

    2016-01-01

    In this paper, we define a new problem related to social media, namely, the data-driven engineering of social dynamics. More precisely, given a set of observations from the past, we aim at finding the best short-term intervention that can lead to predefined long-term outcomes. Toward this end, we propose a general formulation that covers two useful engineering tasks as special cases, namely, pattern matching and profit maximization. By incorporating a deep learning model, we derive a solution using convex relaxation and quadratic-programming transformation. Moreover, we propose a data-driven evaluation method in place of the expensive field experiments. Using a Twitter dataset, we demonstrate the effectiveness of our dynamics engineering approach for both pattern matching and profit maximization, and study the multifaceted interplay among several important factors of dynamics engineering, such as solution validity, pattern-matching accuracy, and intervention cost. Finally, the method we propose is general enough to work with multi-dimensional time series, so it can potentially be used in many other applications.

  15. Data-Driven Engineering of Social Dynamics: Pattern Matching and Profit Maximization.

    Directory of Open Access Journals (Sweden)

    Huan-Kai Peng

    Full Text Available In this paper, we define a new problem related to social media, namely, the data-driven engineering of social dynamics. More precisely, given a set of observations from the past, we aim at finding the best short-term intervention that can lead to predefined long-term outcomes. Toward this end, we propose a general formulation that covers two useful engineering tasks as special cases, namely, pattern matching and profit maximization. By incorporating a deep learning model, we derive a solution using convex relaxation and quadratic-programming transformation. Moreover, we propose a data-driven evaluation method in place of the expensive field experiments. Using a Twitter dataset, we demonstrate the effectiveness of our dynamics engineering approach for both pattern matching and profit maximization, and study the multifaceted interplay among several important factors of dynamics engineering, such as solution validity, pattern-matching accuracy, and intervention cost. Finally, the method we propose is general enough to work with multi-dimensional time series, so it can potentially be used in many other applications.

  16. Data-Driven Engineering of Social Dynamics: Pattern Matching and Profit Maximization

    Science.gov (United States)

    Peng, Huan-Kai; Lee, Hao-Chih; Pan, Jia-Yu; Marculescu, Radu

    2016-01-01

    In this paper, we define a new problem related to social media, namely, the data-driven engineering of social dynamics. More precisely, given a set of observations from the past, we aim at finding the best short-term intervention that can lead to predefined long-term outcomes. Toward this end, we propose a general formulation that covers two useful engineering tasks as special cases, namely, pattern matching and profit maximization. By incorporating a deep learning model, we derive a solution using convex relaxation and quadratic-programming transformation. Moreover, we propose a data-driven evaluation method in place of the expensive field experiments. Using a Twitter dataset, we demonstrate the effectiveness of our dynamics engineering approach for both pattern matching and profit maximization, and study the multifaceted interplay among several important factors of dynamics engineering, such as solution validity, pattern-matching accuracy, and intervention cost. Finally, the method we propose is general enough to work with multi-dimensional time series, so it can potentially be used in many other applications. PMID:26771830

  17. Data-driven design of fault diagnosis systems nonlinear multimode processes

    CERN Document Server

    Haghani Abandan Sari, Adel

    2014-01-01

    In many industrial applications early detection and diagnosis of abnormal behavior of the plant is of great importance. During the last decades, the complexity of process plants has been drastically increased, which imposes great challenges in development of model-based monitoring approaches and it sometimes becomes unrealistic for modern large-scale processes. The main objective of Adel Haghani Abandan Sari is to study efficient fault diagnosis techniques for complex industrial systems using process historical data and considering the nonlinear behavior of the process. To this end, different methods are presented to solve the fault diagnosis problem based on the overall behavior of the process and its dynamics. Moreover, a novel technique is proposed for fault isolation and determination of the root-cause of the faults in the system, based on the fault impacts on the process measurements. Contents Process monitoring Fault diagnosis and fault-tolerant control Data-driven approaches and decision making Target...

  18. Linear dynamical modes as new variables for data-driven ENSO forecast

    Science.gov (United States)

    Gavrilov, Andrey; Seleznev, Aleksei; Mukhin, Dmitry; Loskutov, Evgeny; Feigin, Alexander; Kurths, Juergen

    2018-05-01

    A new data-driven model for analysis and prediction of spatially distributed time series is proposed. The model is based on a linear dynamical mode (LDM) decomposition of the observed data which is derived from a recently developed nonlinear dimensionality reduction approach. The key point of this approach is its ability to take into account simple dynamical properties of the observed system by means of revealing the system's dominant time scales. The LDMs are used as new variables for empirical construction of a nonlinear stochastic evolution operator. The method is applied to the sea surface temperature anomaly field in the tropical belt where the El Nino Southern Oscillation (ENSO) is the main mode of variability. The advantage of LDMs versus traditionally used empirical orthogonal function decomposition is demonstrated for this data. Specifically, it is shown that the new model has a competitive ENSO forecast skill in comparison with the other existing ENSO models.

  19. Statistical multi-model approach for performance assessment of cooling tower

    International Nuclear Information System (INIS)

    Pan, Tian-Hong; Shieh, Shyan-Shu; Jang, Shi-Shang; Tseng, Wen-Hung; Wu, Chan-Wei; Ou, Jenq-Jang

    2011-01-01

    This paper presents a data-driven, model-based assessment strategy for investigating the performance of a cooling tower. To this end, the operations of a cooling tower are first characterized using a data-driven multiple-model method, which represents the tower as a set of local models in the form of linear equations. A satisfactory fuzzy c-means clustering algorithm is used to classify operating data into several groups to build the local models. The developed models are then applied to predict the performance of the system based on design input parameters provided by the manufacturer. The tower characteristics are also investigated using the proposed models via the effects of the water/air flow ratio. The predicted results tend to agree well with the tower characteristics calculated from actual measured operating data from an industrial plant. By comparison with the design characteristic curve provided by the manufacturer, the effectiveness of the cooling tower can then be obtained. A case study conducted in a commercial plant demonstrates the validity of the proposed approach. It should be noted that this is the first attempt to use operating data to assess how far the cooling efficiency of an industrial-scale process deviates from its original design value. Moreover, the evaluation does not require interrupting the normal operation of the cooling tower. This should be of particular interest in industrial applications.

  20. Idiopathic Pulmonary Fibrosis: Data-driven Textural Analysis of Extent of Fibrosis at Baseline and 15-Month Follow-up.

    Science.gov (United States)

    Humphries, Stephen M; Yagihashi, Kunihiro; Huckleberry, Jason; Rho, Byung-Hak; Schroeder, Joyce D; Strand, Matthew; Schwarz, Marvin I; Flaherty, Kevin R; Kazerooni, Ella A; van Beek, Edwin J R; Lynch, David A

    2017-10-01

    Purpose: To evaluate associations between pulmonary function and both quantitative analysis and visual assessment of thin-section computed tomography (CT) images at baseline and at 15-month follow-up in subjects with idiopathic pulmonary fibrosis (IPF). Materials and Methods: This retrospective analysis of preexisting anonymized data, collected prospectively between 2007 and 2013 in a HIPAA-compliant study, was exempt from additional institutional review board approval. The extent of lung fibrosis at baseline inspiratory chest CT in 280 subjects enrolled in the IPF Network was evaluated. Visual analysis was performed by using a semiquantitative scoring system. Computer-based quantitative analysis included CT histogram-based measurements and a data-driven textural analysis (DTA). Follow-up CT images in 72 of these subjects were also analyzed. Univariate comparisons were performed by using Spearman rank correlation. Multivariate and longitudinal analyses were performed by using a linear mixed model approach, in which models were compared by using asymptotic χ² tests. Results: At baseline, all CT-derived measures showed moderate significant correlation (P pulmonary function. At follow-up CT, changes in DTA scores showed significant correlation with changes in both forced vital capacity percentage predicted (ρ = -0.41, P pulmonary function (P fibrosis at CT yields an index of severity that correlates with visual assessment and functional change in subjects with IPF. © RSNA, 2017.

  1. Input variable selection for data-driven models of Coriolis flowmeters for two-phase flow measurement

    International Nuclear Information System (INIS)

    Wang, Lijuan; Yan, Yong; Wang, Xue; Wang, Tao

    2017-01-01

    Input variable selection is an essential step in the development of data-driven models for environmental, biological and industrial applications. Through input variable selection to eliminate the irrelevant or redundant variables, a suitable subset of variables is identified as the input of a model. Meanwhile, through input variable selection the complexity of the model structure is simplified and the computational efficiency is improved. This paper describes the procedures of the input variable selection for the data-driven models for the measurement of liquid mass flowrate and gas volume fraction under two-phase flow conditions using Coriolis flowmeters. Three advanced input variable selection methods, including partial mutual information (PMI), genetic algorithm-artificial neural network (GA-ANN) and tree-based iterative input selection (IIS) are applied in this study. Typical data-driven models incorporating support vector machine (SVM) are established individually based on the input candidates resulting from the selection methods. The validity of the selection outcomes is assessed through an output performance comparison of the SVM based data-driven models and sensitivity analysis. The validation and analysis results suggest that the input variables selected from the PMI algorithm provide more effective information for the models to measure liquid mass flowrate while the IIS algorithm provides a fewer but more effective variables for the models to predict gas volume fraction. (paper)

  2. NOvA Event Building, Buffering and Data-Driven Triggering From Within the DAQ System

    Energy Technology Data Exchange (ETDEWEB)

    Fischler, M. [Fermilab; Green, C. [Fermilab; Kowalkowski, J. [Fermilab; Norman, A. [Fermilab; Paterno, M. [Fermilab; Rechenmacher, R. [Fermilab

    2012-06-22

    To make its core measurements, the NOvA experiment needs to make real-time data-driven decisions involving beam-spill time correlation and other triggering issues. NOvA-DDT is a prototype Data-Driven Triggering system, built using the Fermilab artdaq generic DAQ/Event-building tools set. This provides the advantages of sharing online software infrastructure with other Intensity Frontier experiments, and of being able to use any offline analysis module--unchanged--as a component of the online triggering decisions. The NOvA-artdaq architecture chosen has significant advantages, including graceful degradation if the triggering decision software fails or cannot be done quickly enough for some fraction of the time-slice "events." We have tested and measured the performance and overhead of NOvA-DDT using an actual Hough transform based trigger decision module taken from the NOvA offline software. The results of these tests--98 ms mean time per event on only 1/16 of the available processing power of a node, and overheads of about 2 ms per event--provide a proof of concept: NOvA-DDT is a viable strategy for data acquisition, event building, and trigger processing at the NOvA far detector.

  3. Process analysis and data driven optimization in the salmon industry

    DEFF Research Database (Denmark)

    Johansson, Gine Ørnholt

    Aquaculture supplies around 70% of the salmon in the world and the industry is thus an important player in meeting the increasing demand for salmon products. Such mass production calls for systems that can handle thousands of tonnes of salmon without compromising the welfare of the fish...... and the following product quality. Moreover, the requirement of increased profit performance for the industry should be met with sustainable production solutions. Optimization during the production of salmon fillets could be one feasible approach to increase the outcome from the same level of incoming raw material...... and analysis of data from the salmon industry could be utilized to extract information that will support the industry in their decision-making processes. Mapping of quality parameters, their fluctuations and influences on yield and texture has been investigated. Additionally, the ability to predict the texture...

  4. A data-driven soft sensor for needle deflection in heterogeneous tissue using just-in-time modelling.

    Science.gov (United States)

    Rossa, Carlos; Lehmann, Thomas; Sloboda, Ronald; Usmani, Nawaid; Tavakoli, Mahdi

    2017-08-01

    Global modelling has traditionally been the approach taken to estimate needle deflection in soft tissue. In this paper, we propose a new method based on local data-driven modelling of needle deflection. External measurement of needle-tissue interactions is collected from several insertions in ex vivo tissue to form a cloud of data. Inputs to the system are the needle insertion depth, axial rotations, and the forces and torques measured at the needle base by a force sensor. When a new insertion is performed, the just-in-time learning method estimates the model outputs given the current inputs to the needle-tissue system and the historical database. The query is compared to every observation in the database and is given weights according to some similarity criteria. Only a subset of historical data that is most relevant to the query is selected and a local linear model is fit to the selected points to estimate the query output. The model outputs the 3D deflection of the needle tip and the needle insertion force. The proposed approach is validated in ex vivo multilayered biological tissue in different needle insertion scenarios. Experimental results in five different case studies indicate an accuracy in predicting needle deflection of 0.81 and 1.24 mm in the horizontal and vertical planes, respectively, and an accuracy of 0.5 N in predicting the needle insertion force over 216 needle insertions.
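
    The just-in-time steps described above (weight every stored observation by similarity to the query, keep the most relevant subset, fit a local linear model) can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes a scalar input and a scalar output, and the function name `jit_predict`, the Gaussian similarity weight, and the parameters `k` and `bandwidth` are hypothetical choices.

```python
import math

def jit_predict(database, query, k=5, bandwidth=1.0):
    """Estimate the output at `query` from the k most similar stored samples.

    database: list of (x, y) pairs with scalar x and y (hypothetical layout).
    """
    # Compare the query with every stored observation and weight by similarity.
    scored = []
    for x, y in database:
        dist = abs(x - query)
        weight = math.exp(-(dist / bandwidth) ** 2)  # assumed Gaussian weight
        scored.append((dist, weight, x, y))
    # Keep only the k observations closest to the query.
    scored.sort(key=lambda item: item[0])
    nearest = scored[:k]
    # Weighted least-squares fit of y = a + b * x on the selected points.
    sw = sum(w for _, w, _, _ in nearest)
    mx = sum(w * x for _, w, x, _ in nearest) / sw
    my = sum(w * y for _, w, _, y in nearest) / sw
    sxx = sum(w * (x - mx) ** 2 for _, w, x, _ in nearest)
    sxy = sum(w * (x - mx) * (y - my) for _, w, x, y in nearest)
    slope = sxy / sxx if sxx > 0 else 0.0
    # Evaluate the local line at the query point.
    return my + slope * (query - mx)
```

    Because the model is refit for every query, nothing is trained ahead of time; the cost is one pass over the database per prediction.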

  5. Data-Driven Design of Intelligent Wireless Networks: An Overview and Tutorial

    Directory of Open Access Journals (Sweden)

    Merima Kulin

    2016-06-01

    Data science or “data-driven research” is a research approach that uses real-life data to gain insight about the behavior of systems. It enables the analysis of small, simple as well as large and more complex systems in order to assess whether they function according to the intended design and as seen in simulation. Data science approaches have been successfully applied to analyze networked interactions in several research areas such as large-scale social networks, advanced business and healthcare processes. Wireless networks can exhibit unpredictable interactions between algorithms from multiple protocol layers, interactions between multiple devices, and hardware specific influences. These interactions can lead to a difference between real-world functioning and design time functioning. Data science methods can help to detect the actual behavior and possibly help to correct it. Data science is increasingly used in wireless research. To support data-driven research in wireless networks, this paper illustrates the step-by-step methodology that has to be applied to extract knowledge from raw data traces. To this end, the paper (i) clarifies when, why and how to use data science in wireless network research; (ii) provides a generic framework for applying data science in wireless networks; (iii) gives an overview of existing research papers that utilized data science approaches in wireless networks; (iv) illustrates the overall knowledge discovery process through an extensive example in which device types are identified based on their traffic patterns; (v) provides the reader the necessary datasets and scripts to go through the tutorial steps themselves.

  6. Data-driven methods towards learning the highly nonlinear inverse kinematics of tendon-driven surgical manipulators.

    Science.gov (United States)

    Xu, Wenjun; Chen, Jie; Lau, Henry Y K; Ren, Hongliang

    2017-09-01

    Accurate motion control of flexible surgical manipulators is crucial in tissue manipulation tasks. The tendon-driven serpentine manipulator (TSM) is one of the most widely adopted flexible mechanisms in minimally invasive surgery because of its enhanced maneuverability in tortuous environments. The TSM, however, exhibits high nonlinearities, and a conventional analytical kinematics model is insufficient to achieve high accuracy. To account for the system nonlinearities, we applied a data-driven approach to encode the system inverse kinematics. Three regression methods, extreme learning machine (ELM), Gaussian mixture regression (GMR) and K-nearest neighbors regression (KNNR), were implemented to learn a nonlinear mapping from the robot 3D position states to the control inputs. The performance of the three algorithms was evaluated both in simulation and physical trajectory tracking experiments. KNNR performed the best in the tracking experiments, with the lowest RMSE of 2.1275 mm. The proposed inverse kinematics learning methods provide an alternative and efficient way to accurately model the tendon-driven flexible manipulator. Copyright © 2016 John Wiley & Sons, Ltd.
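
    As a rough illustration of the K-nearest neighbors regression idea mentioned above, the sketch below maps a 3D tip position to a vector of control inputs by averaging the controls of the closest recorded positions. The data layout, the function name `knn_regress`, and the unweighted average are assumptions made for the example; the abstract does not specify the variant used.

```python
import math

def knn_regress(samples, position, k=3):
    """Predict control inputs for `position` from recorded (position, controls) pairs.

    samples: list of (position, controls) tuples, where position is a 3D point
    and controls is a sequence of actuator inputs (hypothetical layout).
    """
    # Rank recorded samples by Euclidean distance to the queried position.
    ranked = sorted(samples, key=lambda s: math.dist(s[0], position))
    nearest = ranked[:k]
    # Average each control channel over the k nearest samples.
    n_controls = len(nearest[0][1])
    return [sum(s[1][j] for s in nearest) / len(nearest) for j in range(n_controls)]
```

    A query such as `knn_regress(samples, (1.0, 0.0, 0.0), k=3)` returns one averaged value per control channel. Accuracy depends on how densely the recorded positions cover the workspace.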

  7. Automatic translation of MPI source into a latency-tolerant, data-driven form

    International Nuclear Information System (INIS)

    Nguyen, Tan; Cicotti, Pietro; Bylaska, Eric; Quinlan, Dan; Baden, Scott

    2017-01-01

    Hiding communication behind useful computation is an important performance programming technique but remains an inscrutable programming exercise even for the expert. We present Bamboo, a code transformation framework that can realize communication overlap in applications written in MPI without the need to intrusively modify the source code. We reformulate MPI source into a task dependency graph representation, which partially orders the tasks, enabling the program to execute in a data-driven fashion under the control of an external runtime system. Experimental results demonstrate that Bamboo significantly reduces communication delays while requiring only modest amounts of programmer annotation for a variety of applications and platforms, including those employing co-processors and accelerators. Moreover, Bamboo’s performance meets or exceeds that of labor-intensive hand coding. As a result, the translator is more than a means of hiding communication costs automatically; it demonstrates the utility of semantic level optimization against a well-known library.

  8. Data-driven fault detection for industrial processes canonical correlation analysis and projection based methods

    CERN Document Server

    Chen, Zhiwen

    2017-01-01

    Zhiwen Chen aims to develop advanced fault detection (FD) methods for the monitoring of industrial processes. With the ever increasing demands on reliability and safety in industrial processes, fault detection has become an important issue. Although model-based fault detection theory has been well studied in the past decades, its application to large-scale industrial processes is limited because it is difficult to build accurate models. Furthermore, motivated by the limitations of existing data-driven FD methods, novel canonical correlation analysis (CCA) and projection-based methods are proposed from the perspectives of process input and output data, less engineering effort and wide application scope. For performance evaluation of FD methods, a new index is also developed. Contents A New Index for Performance Evaluation of FD Methods CCA-based FD Method for the Monitoring of Stationary Processes Projection-based FD Method for the Monitoring of Dynamic Processes Benchmark Study and Real-Time Implementat...

  9. Data-Driven Based Asynchronous Motor Control for Printing Servo Systems

    Science.gov (United States)

    Bian, Min; Guo, Qingyun

    Modern digital printing equipment targets environmentally friendly operation with high dynamic performance, high control precision, and low vibration and abrasion, which requires a high-performance motion control system for the printing servo. A control system for an asynchronous motor based on data acquisition is proposed, and the iterative learning control (ILC) algorithm is studied. PID control is widely used in motion control, but it is sensitive to disturbances and to variation in model parameters. ILC uses the error history and the present control signals to approximate the control signal directly, so that the expected trajectory can be tracked without knowledge of the system model or structure. A motor control algorithm combining ILC and PID is constructed and simulation results are given. The results show that the data-driven control method is effective in dealing with bounded disturbances in the motion control of printing servo systems.
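
    The way ILC reuses the error of the previous trial to refine the control signal can be sketched with a P-type update law, u_next[t] = u[t] + gain * e[t+1]. This is a generic textbook form and not necessarily the scheme used in the paper; the first-order plant, its coefficients `a` and `b`, and the learning `gain` are hypothetical values chosen so that the iteration converges.

```python
def simulate(u, a=0.3, b=1.0):
    """Hypothetical first-order plant y[t+1] = a*y[t] + b*u[t] with y[0] = 0."""
    y = [0.0]
    for u_t in u:
        y.append(a * y[-1] + b * u_t)
    return y

def ilc(reference, trials=30, gain=0.8):
    """P-type ILC over repeated trials; returns the final input and error history."""
    n = len(reference) - 1
    u = [0.0] * n
    errors = []
    for _ in range(trials):
        # Run one trial with the current input and record the tracking error.
        y = simulate(u)
        e = [r - y_t for r, y_t in zip(reference, y)]
        errors.append(max(abs(v) for v in e[1:]))
        # Correct each input sample with the error it caused one step later.
        u = [u[t] + gain * e[t + 1] for t in range(n)]
    return u, errors
```

    The update itself uses only measured errors, not the plant model; the model appears here only to generate trial data. In this example the peak error shrinks from trial to trial because |1 - gain * b| < 1 and the plant's memory is short.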

  10. A new data-driven controllability measure with application in intelligent buildings

    DEFF Research Database (Denmark)

    Shaker, Hamid Reza; Lazarova-Molnar, Sanja

    2017-01-01

    and instrumentation within today's intelligent buildings enable collecting high quality data which could be used directly in data-based analysis and control methods. The area of data-based systems analysis and control is concentrating on developing analysis and control methods that rely on data collected from meters...... and sensors, and information obtained by data processing. This differs from the traditional model-based approaches that are based on mathematical models of systems. We propose and describe a data-driven controllability measure for discrete-time linear systems. The concept is developed within a data......-based system analysis and control framework. Therefore, only measured data is used to obtain the proposed controllability measure. The proposed controllability measure not only shows if the system is controllable or not, but also reveals the level of controllability, which is the information its previous...

  11. Automatic sleep classification using a data-driven topic model reveals latent sleep states

    DEFF Research Database (Denmark)

    Koch, Henriette; Christensen, Julie Anja Engelhard; Frandsen, Rune

    2014-01-01

    Latent Dirichlet Allocation. Model application was tested on control subjects and patients with periodic leg movements (PLM) representing a non-neurodegenerative group, and patients with idiopathic REM sleep behavior disorder (iRBD) and Parkinson's Disease (PD) representing a neurodegenerative group......Background: The gold standard for sleep classification uses manual scoring of polysomnography despite points of criticism such as oversimplification, low inter-rater reliability and the standard being designed on young and healthy subjects. New method: To meet the criticism and reveal the latent...... sleep states, this study developed a general and automatic sleep classifier using a data-driven approach. Spectral EEG and EOG measures and eye correlation in 1 s windows were calculated and each sleep epoch was expressed as a mixture of probabilities of latent sleep states by using the topic model...

  12. Data-driven outbreak forecasting with a simple nonlinear growth model.

    Science.gov (United States)

    Lega, Joceline; Brown, Heidi E

    2016-12-01

    Recent events have thrown the spotlight on infectious disease outbreak response. We developed a data-driven method, EpiGro, which can be applied to cumulative case reports to estimate the order of magnitude of the duration, peak and ultimate size of an ongoing outbreak. It is based on a surprisingly simple mathematical property of many epidemiological data sets, does not require knowledge or estimation of disease transmission parameters, is robust to noise and to small data sets, and runs quickly due to its mathematical simplicity. Using data from historic and ongoing epidemics, we present the model. We also provide modeling considerations that justify this approach and discuss its limitations. In the absence of other information or in conjunction with other models, EpiGro may be useful to public health responders. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  13. First-principles data-driven discovery of transition metal oxides for artificial photosynthesis

    Science.gov (United States)

    Yan, Qimin

    We develop a first-principles data-driven approach for rapid identification of transition metal oxide (TMO) light absorbers and photocatalysts for artificial photosynthesis using the Materials Project. Initially focusing on Cr, V, and Mn-based ternary TMOs in the database, we design a broadly-applicable multiple-layer screening workflow automating density functional theory (DFT) and hybrid functional calculations of bulk and surface electronic and magnetic structures. We further assess the electrochemical stability of TMOs in aqueous environments from computed Pourbaix diagrams. Several promising earth-abundant low band-gap TMO compounds with desirable band edge energies and electrochemical stability are identified by our computational efforts and then synergistically evaluated using high-throughput synthesis and photoelectrochemical screening techniques by our experimental collaborators at Caltech. Our joint theory-experiment effort has successfully identified new earth-abundant copper and manganese vanadate complex oxides that meet highly demanding requirements for photoanodes, substantially expanding the known space of such materials. By integrating theory and experiment, we validate our approach and develop important new insights into structure-property relationships for TMOs for oxygen evolution photocatalysts, paving the way for use of first-principles data-driven techniques in future applications. This work is supported by the Materials Project Predictive Modeling Center and the Joint Center for Artificial Photosynthesis through the U.S. Department of Energy, Office of Basic Energy Sciences, Materials Sciences and Engineering Division, under Contract No. DE-AC02-05CH11231. Computational resources also provided by the Department of Energy through the National Energy Supercomputing Center.

  14. Data-driven forecasting of high-dimensional chaotic systems with long short-term memory networks.

    Science.gov (United States)

    Vlachas, Pantelis R; Byeon, Wonmin; Wan, Zhong Y; Sapsis, Themistoklis P; Koumoutsakos, Petros

    2018-05-01

    We introduce a data-driven forecasting method for high-dimensional chaotic systems using long short-term memory (LSTM) recurrent neural networks. The proposed LSTM neural networks perform inference of high-dimensional dynamical systems in their reduced order space and are shown to be an effective set of nonlinear approximators of their attractor. We demonstrate the forecasting performance of the LSTM and compare it with Gaussian processes (GPs) in time series obtained from the Lorenz 96 system, the Kuramoto-Sivashinsky equation and a prototype climate model. The LSTM networks outperform the GPs in short-term forecasting accuracy in all applications considered. A hybrid architecture, extending the LSTM with a mean stochastic model (MSM-LSTM), is proposed to ensure convergence to the invariant measure. This novel hybrid method is fully data-driven and extends the forecasting capabilities of LSTM networks.
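
    For readers unfamiliar with the Lorenz 96 benchmark named above, the sketch below generates a trajectory of the standard system dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F with periodic indices. It only produces test data of the kind such forecasting studies use and does not reproduce the LSTM method; the forcing value F = 8, the fourth-order Runge-Kutta integrator, and the function names are choices made for this example.

```python
def lorenz96(x, forcing=8.0):
    """Right-hand side of the Lorenz 96 system with periodic boundary conditions."""
    n = len(x)
    # Negative indices wrap around in Python, which gives the periodic neighbors.
    return [(x[(i + 1) % n] - x[i - 2]) * x[i - 1] - x[i] + forcing for i in range(n)]

def rk4_step(x, dt, forcing=8.0):
    """Advance the state by one classical Runge-Kutta step of size dt."""
    k1 = lorenz96(x, forcing)
    k2 = lorenz96([xi + 0.5 * dt * ki for xi, ki in zip(x, k1)], forcing)
    k3 = lorenz96([xi + 0.5 * dt * ki for xi, ki in zip(x, k2)], forcing)
    k4 = lorenz96([xi + dt * ki for xi, ki in zip(x, k3)], forcing)
    return [xi + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]

def trajectory(x0, steps, dt=0.01, forcing=8.0):
    """Return the list of states visited, starting from x0."""
    states = [list(x0)]
    for _ in range(steps):
        states.append(rk4_step(states[-1], dt, forcing))
    return states
```

    Starting from the uniform state x_i = F the system stays at rest, so a small perturbation of one component is typically added to the initial condition before generating training series.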

  15. A Data-Driven Stochastic Reactive Power Optimization Considering Uncertainties in Active Distribution Networks and Decomposition Method

    DEFF Research Database (Denmark)

    Ding, Tao; Yang, Qingrun; Yang, Yongheng

    2018-01-01

    To address the uncertain output of distributed generators (DGs) for reactive power optimization in active distribution networks, the stochastic programming model is widely used. The model is employed to find an optimal control strategy with minimum expected network loss while satisfying all......, in this paper, a data-driven modeling approach is introduced to assume that the probability distribution from the historical data is uncertain within a confidence set. Furthermore, a data-driven stochastic programming model is formulated as a two-stage problem, where the first-stage variables find the optimal...... control for discrete reactive power compensation equipment under the worst probability distribution of the second stage recourse. The second-stage variables are adjusted to uncertain probability distribution. In particular, this two-stage problem has a special structure so that the second-stage problem...

  16. Data-driven mapping of the potential mountain permafrost distribution.

    Science.gov (United States)

    Deluigi, Nicola; Lambiel, Christophe; Kanevski, Mikhail

    2017-07-15

    Existing mountain permafrost distribution models generally offer a good overview of the potential extent of this phenomenon at a regional scale. They are however not always able to reproduce the high spatial discontinuity of permafrost at the micro-scale (scale of a specific landform; ten to several hundreds of meters). To overcome this limitation, we tested an alternative modelling approach using three classification algorithms belonging to statistics and machine learning: Logistic regression, Support Vector Machines and Random forests. These supervised learning techniques infer a classification function from labelled training data (pixels of permafrost absence and presence) with the aim of predicting the permafrost occurrence where it is unknown. The research was carried out in a 588 km² area of the Western Swiss Alps. Permafrost evidence was mapped from ortho-image interpretation (rock glacier inventorying) and field data (mainly geoelectrical and thermal data). The relationship between selected permafrost evidence and permafrost controlling factors was computed with the mentioned techniques. Classification performances, assessed with AUROC, were 0.81 for Logistic regression, 0.85 for Support Vector Machines and 0.88 for Random forests. The adopted machine learning algorithms have proven efficient for permafrost distribution modelling, giving results consistent with the field reality. The high resolution of the input dataset (10 m) allows elaborating maps at the micro-scale with a modelled permafrost spatial distribution less optimistic than classic spatial models. Moreover, the probability output of the adopted algorithms offers a more precise overview of the potential distribution of mountain permafrost than simple indexes of permafrost favorability. These encouraging results also open the way to new possibilities of permafrost data analysis and mapping. Copyright © 2017 Elsevier B.V. All rights reserved.

  17. The test of data driven TDC application in high energy physics experiment

    International Nuclear Information System (INIS)

    Liu Shubin; Guo Jianhua; Zhang Yanli; Zhao Long; An Qi

    2006-01-01

    In high energy physics there is a trend toward integrated, high-resolution, multi-hit time-to-digital converters (TDCs) for time measurement, of which the data-driven TDC is an important direction. Studying how to test the characteristics of a high-performance TDC, and how to improve them, helps in selecting a suitable TDC. The authors have studied the testing of a new high-resolution TDC that is planned for use in the third modification project of the Beijing Spectrometer (BESIII). This paper introduces the test platform built for the TDC and the methods used to test nonlinearity, resolution, double-pulse resolution, and other characteristics. The paper also gives the test results and introduces the compensation method used to achieve a very high resolution (24.4 ps). (authors)

  18. Data-driven gradient algorithm for high-precision quantum control

    Science.gov (United States)

    Wu, Re-Bing; Chu, Bing; Owens, David H.; Rabitz, Herschel

    2018-04-01

    In the quest to achieve scalable quantum information processing technologies, gradient-based optimal control algorithms (e.g., GRAPE) are broadly used for implementing high-precision quantum gates, but their performance is often hindered by deterministic or random errors in the system model and the control electronics. In this paper, we show that GRAPE can be taught to be more effective by jointly learning from the design model and the experimental data obtained from process tomography. The resulting data-driven gradient optimization algorithm (d-GRAPE) can in principle correct all deterministic gate errors, with a mild efficiency loss. The d-GRAPE algorithm may become more powerful with broadband controls that involve a large number of control parameters, while other algorithms usually slow down due to the increased size of the search space. These advantages are demonstrated by simulating the implementation of a two-qubit controlled-not gate.

  19. A Data-Driven Noise Reduction Method and Its Application for the Enhancement of Stress Wave Signals

    Directory of Open Access Journals (Sweden)

    Hai-Lin Feng

    2012-01-01

    Ensemble empirical mode decomposition (EEMD) has recently been used to recover a signal from observed noisy data. Typically this is performed by partial reconstruction or by a thresholding operation. In this paper we describe an efficient noise reduction method. EEMD is used to decompose a signal into several intrinsic mode functions (IMFs). The time interval between two adjacent zero-crossings within an IMF, called the instantaneous half period (IHP), is used as a criterion to detect and classify the noise oscillations. The undesirable waveforms with a larger IHP are set to zero. Furthermore, the optimum threshold in this approach can be derived from the signal itself using the consecutive mean square error (CMSE). The method is fully data driven and requires no prior knowledge of the target signals. The method is verified with a simulation program written in Matlab, and the denoising results are satisfactory. In comparison with other EEMD-based methods, it is concluded that the approach adopted in this paper is suitable for preprocessing stress wave signals in the nondestructive testing of wood.
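
    The half-period criterion described in this abstract can be illustrated with a minimal sketch. It assumes an IMF is already available as a list of samples (the EEMD step itself is not shown), and the names `half_periods`, `zero_long_half_periods`, and the fixed `threshold` argument are hypothetical; the paper derives its threshold from the CMSE, which is not reproduced here.

```python
def half_periods(samples, dt=1.0):
    """Return (start, end, duration) for each span between adjacent zero-crossings.

    A zero-crossing is taken to be a strict sign change between consecutive
    samples; samples that are exactly zero are ignored (a simplification).
    """
    crossings = []
    for i in range(1, len(samples)):
        if samples[i - 1] * samples[i] < 0:
            crossings.append(i)
    return [(a, b, (b - a) * dt) for a, b in zip(crossings, crossings[1:])]


def zero_long_half_periods(samples, threshold, dt=1.0):
    """Set to zero every half-wave whose IHP exceeds `threshold`.

    The direction of the comparison follows the abstract's wording;
    `threshold` is an illustrative fixed value, not the CMSE-derived one.
    """
    out = list(samples)
    for start, end, ihp in half_periods(samples, dt):
        if ihp > threshold:
            for k in range(start, end):
                out[k] = 0.0
    return out


imf = [1, 1, -1, -1, -1, -1, 1, -1, 1, 1]
print(half_periods(imf))
print(zero_long_half_periods(imf, threshold=2))
```

    Samples before the first and after the last zero-crossing are left untouched, because no complete half period can be measured there.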

  20. Data-driven online monitoring of wind turbines

    NARCIS (Netherlands)

    Kenbeek, T.; Kapodistria, S.; Di Bucchianico, A.

    2017-01-01

    Condition based maintenance is a modern approach to maintenance which has been successfully used in several industrial sectors. In this paper we present a concrete statistical approach to condition based maintenance for wind turbine by applying ideas from statistical process control. A specific

  1. Preface [HD3-2015: International meeting on high-dimensional data-driven science]

    International Nuclear Information System (INIS)

    2016-01-01

    A never-ending series of innovations in measurement technology and evolutions in information and communication technologies have led to the ongoing generation and accumulation of large quantities of high-dimensional data every day. While detailed data-centric approaches have been pursued in respective research fields, situations have been encountered where the same mathematical framework of high-dimensional data analysis can be found in a wide variety of seemingly unrelated research fields, such as estimation on the basis of undersampled Fourier transform in nuclear magnetic resonance spectroscopy in chemistry, in magnetic resonance imaging in medicine, and in astronomical interferometry in astronomy. In such situations, bringing diverse viewpoints together therefore becomes a driving force for the creation of innovative developments in various different research fields. This meeting focuses on “Sparse Modeling” (SpM) as a methodology for creation of innovative developments through the incorporation of a wide variety of viewpoints in various research fields. The objective of this meeting is to offer a forum where researchers with interest in SpM can assemble and exchange information on the latest results and newly established methodologies, and discuss future directions of the interdisciplinary studies for High-Dimensional Data-Driven science (HD3). The meeting was held in Kyoto from 14-17 December 2015. We are pleased to publish 22 papers contributed by invited speakers in this volume of Journal of Physics: Conference Series. We hope that this volume will promote further development of High-Dimensional Data-Driven science. (paper)

  2. A data-driven prediction method for fast-slow systems

    Science.gov (United States)

    Groth, Andreas; Chekroun, Mickael; Kondrashov, Dmitri; Ghil, Michael

    2016-04-01

    In this work, we present a prediction method for processes that exhibit a mixture of variability on slow and fast scales. The method relies on combining empirical model reduction (EMR) with singular spectrum analysis (SSA). EMR is a data-driven methodology for constructing stochastic low-dimensional models that account for nonlinearity and serial correlation in the estimated noise, while SSA provides a decomposition of the complex dynamics into low-order components that capture spatio-temporal behavior on different time scales. Our study focuses on the data-driven modeling of partial observations from dynamical systems that exhibit power spectra with broad peaks. The main result in this talk is that the combination of SSA pre-filtering with EMR modeling improves, under certain circumstances, the modeling and prediction skill of such a system, as compared to a standard EMR prediction based on raw data. Specifically, it is the separation into "fast" and "slow" temporal scales by the SSA pre-filtering that achieves the improvement. We show, in particular, that the resulting EMR-SSA emulators help predict intermittent behavior such as rapid transitions between specific regions of the system's phase space. This capability of the EMR-SSA prediction will be demonstrated on two low-dimensional models: the Rössler system and a Lotka-Volterra model for interspecies competition. In either case, the chaotic dynamics is produced through a Shilnikov-type mechanism and we argue that the latter seems to be an important ingredient for the good prediction skills of EMR-SSA emulators. Shilnikov-type behavior has been shown to arise in various complex geophysical fluid models, such as baroclinic quasi-geostrophic flows in the mid-latitude atmosphere and wind-driven double-gyre ocean circulation models. This pervasiveness of the Shilnikov mechanism of fast-slow transition opens interesting perspectives for the extension of the proposed EMR-SSA approach to more realistic situations.

  3. NOvA Event Building, Buffering and Data-Driven Triggering From Within the DAQ System

    International Nuclear Information System (INIS)

    Fischler, M; Rechenmacher, R; Green, C; Kowalkowski, J; Norman, A; Paterno, M

    2012-01-01

    The NOvA experiment is a long baseline neutrino experiment designed to make precision probes of the structure of neutrino mixing. The experiment features a unique deadtimeless data acquisition system that is capable of acquiring and building an event data stream from the continuous readout of the more than 360,000 far detector channels. In order to achieve its physics goals, the experiment must be able to buffer, correlate and extract the data in this stream with the beam spills that occur at Fermilab. In addition, the NOvA experiment seeks to enhance its data collection efficiency for rare classes of event topologies that are valuable for calibration through the use of data-driven triggering. The NOvA-DDT is a prototype Data-Driven Triggering system. NOvA-DDT has been developed using the Fermilab artdaq generic DAQ/Event-building toolkit. This toolkit provides the advantages of sharing online software infrastructure with other Intensity Frontier experiments, and of being able to use any offline analysis module, unchanged, as a component of the online triggering decisions. We have measured the performance and overhead of the NOvA-DDT framework using a Hough transform based trigger decision module developed for the NOvA detector to identify cosmic rays. The results of these tests, which were run on the NOvA prototype near detector, yielded a mean processing time of 98 ms per event, while consuming only 1/16th of the available processing capacity. These results provide a proof of concept that a NOvA-DDT based processing system is a viable strategy for data acquisition and triggering for the NOvA far detector.

  4. Testing the Accuracy of Data-driven MHD Simulations of Active Region Evolution

    Energy Technology Data Exchange (ETDEWEB)

    Leake, James E.; Linton, Mark G. [U.S. Naval Research Laboratory, 4555 Overlook Avenue, SW, Washington, DC 20375 (United States); Schuck, Peter W., E-mail: james.e.leake@nasa.gov [NASA Goddard Space Flight Center, 8800 Greenbelt Road, Greenbelt, MD 20771 (United States)

    2017-04-01

    Models for the evolution of the solar coronal magnetic field are vital for understanding solar activity, yet the best measurements of the magnetic field lie at the photosphere, necessitating the development of coronal models which are “data-driven” at the photosphere. We present an investigation to determine the feasibility and accuracy of such methods. Our validation framework uses a simulation of active region (AR) formation, modeling the emergence of magnetic flux from the convection zone to the corona, as a ground-truth data set, to supply both the photospheric information and to perform the validation of the data-driven method. We focus our investigation on how the accuracy of the data-driven model depends on the temporal frequency of the driving data. The Helioseismic and Magnetic Imager on NASA’s Solar Dynamics Observatory produces full-disk vector magnetic field measurements at a 12-minute cadence. Using our framework we show that ARs that emerge over 25 hr can be modeled by the data-driving method with only ∼1% error in the free magnetic energy, assuming the photospheric information is specified every 12 minutes. However, for rapidly evolving features, under-sampling of the dynamics at this cadence leads to a strobe effect, generating large electric currents and incorrect coronal morphology and energies. We derive a sampling condition for the driving cadence based on the evolution of these small-scale features, and show that higher-cadence driving can lead to acceptable errors. Future work will investigate the source of errors associated with deriving plasma variables from the photospheric magnetograms as well as other sources of errors, such as reduced resolution, instrument bias, and noise.

  5. Methodological approach to organizational performance improvement process

    OpenAIRE

    Buble, Marin; Dulčić, Želimir; Pavić, Ivan

    2017-01-01

    Organizational performance improvement is one of the fundamental enterprise tasks. This especially applies to the case when the term “performance improvement” implies efficiency improvement measured by indicators, such as ROI, ROE, ROA, or ROVA/ROI. Such tasks are very complex, requiring implementation by means of project management. In this paper, the authors propose a methodological approach to improving the organizational performance of a large enterprise.

  6. Methodological approach to organizational performance improvement process

    Directory of Open Access Journals (Sweden)

    Marin Buble

    2001-01-01

    Organizational performance improvement is one of the fundamental enterprise tasks. This especially applies to the case when the term “performance improvement” implies efficiency improvement measured by indicators, such as ROI, ROE, ROA, or ROVA/ROI. Such tasks are very complex, requiring implementation by means of project management. In this paper, the authors propose a methodological approach to improving the organizational performance of a large enterprise.

  7. Comparing alternative data-driven ontological vistas of natural history

    NARCIS (Netherlands)

    van Erp, M.G.J.; Lendvai, P.K.; van den Bosch, A.; Bunt, H.; Petukhova, V.; Wubben, S.

    2009-01-01

    Traditionally, ontologies are created manually, based on human experts' view of the concepts and relations of the domain at hand. We present ongoing work on two approaches to the automatic construction of ontologies from a flat database of records, and compare them to a manually constructed

  8. The HADDOCK web server for data-driven biomolecular docking

    NARCIS (Netherlands)

    de Vries, S.J.|info:eu-repo/dai/nl/304837717; van Dijk, M.|info:eu-repo/dai/nl/325811113; Bonvin, A.M.J.J.|info:eu-repo/dai/nl/113691238

    2010-01-01

    Computational docking is the prediction or modeling of the three-dimensional structure of a biomolecular complex, starting from the structures of the individual molecules in their free, unbound form. HADDOCK is a popular docking program that takes a data-driven approach to docking, with support for

  9. Data-driven efficient score tests for deconvolution hypotheses

    NARCIS (Netherlands)

    Langovoy, M.

    2008-01-01

    We consider testing statistical hypotheses about densities of signals in deconvolution models. A new approach to this problem is proposed. We constructed score tests for the deconvolution density testing with the known noise density and efficient score tests for the case of unknown density. The

  10. Data-driven design optimization for composite material characterization

    Science.gov (United States)

    John G. Michopoulos; John C. Hermanson; Athanasios Iliopoulos; Samuel G. Lambrakos; Tomonari Furukawa

    2011-06-01

    The main goal of the present paper is to demonstrate the value of design optimization beyond its use for structural shape determination in the realm of the constitutive characterization of anisotropic material systems such as polymer matrix composites with or without damage. The approaches discussed are based on the availability of massive experimental data...

  11. Data-driven gating in PET: Influence of respiratory signal noise on motion resolution.

    Science.gov (United States)

    Büther, Florian; Ernst, Iris; Frohwein, Lynn Johann; Pouw, Joost; Schäfers, Klaus Peter; Stegger, Lars

    2018-05-21

    Data-driven gating (DDG) approaches for positron emission tomography (PET) are interesting alternatives to conventional hardware-based gating methods. In DDG, the measured PET data themselves are utilized to calculate a respiratory signal that is subsequently used for gating purposes. The success of gating is then highly dependent on the statistical quality of the PET data. In this study, we investigate how this quality determines signal noise and thus motion resolution in clinical PET scans using a center-of-mass-based (COM) DDG approach, specifically with regard to motion management of target structures in future radiotherapy planning applications. PET list mode datasets acquired in one bed position of 19 different radiotherapy patients undergoing pretreatment [18F]FDG PET/CT or [18F]FDG PET/MRI were included in this retrospective study. All scans were performed over a region with organs (myocardium, kidneys) or tumor lesions of high tracer uptake and under free breathing. Aside from the original list mode data, datasets with progressively decreasing PET statistics were generated. From these, COM DDG signals were derived for subsequent amplitude-based gating of the original list mode file. The apparent respiratory shift d from end-expiration to end-inspiration was determined from the gated images and expressed as a function of the signal-to-noise ratio (SNR) of the determined gating signals. This relation was tested against 25 additional [18F]FDG PET/MRI list mode datasets where high-precision MR navigator-like respiratory signals were available as reference signals for respiratory gating of PET data, and against data from a dedicated thorax phantom scan. All original 19 high-quality list mode datasets demonstrated the same behavior in terms of motion resolution when reducing the amount of list mode events for DDG signal generation. Ratios and directions of respiratory shifts between end-respiratory gates and the respective nongated image were constant over all
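
    The idea of deriving a center-of-mass gating signal from the list mode data and then gating by amplitude can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: each event is reduced to a hypothetical `(time_s, axial_position_mm)` pair, the frame length and number of gates are arbitrary, and the function names are invented for the example.

```python
def com_signal(events, frame_length):
    """Mean axial position of the events in each time frame.

    `events` is a list of (time_s, axial_position_mm) pairs, a stand-in for
    real list mode data. Frames without events yield None.
    """
    events = list(events)
    if not events:
        return []
    n_frames = int(max(t for t, _ in events) // frame_length) + 1
    sums = [0.0] * n_frames
    counts = [0] * n_frames
    for t, z in events:
        frame = int(t // frame_length)
        sums[frame] += z
        counts[frame] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


def amplitude_gate(signal, n_gates):
    """Assign each frame to one of `n_gates` equal-width amplitude bins."""
    valid = [v for v in signal if v is not None]
    low, high = min(valid), max(valid)
    width = (high - low) / n_gates or 1.0  # avoid a zero bin width for flat signals
    return [None if v is None else min(int((v - low) / width), n_gates - 1)
            for v in signal]


events = [(0.1, 10.0), (0.4, 12.0), (0.6, 20.0), (0.9, 22.0), (1.2, 11.0)]
signal = com_signal(events, frame_length=0.5)
print(signal)
print(amplitude_gate(signal, n_gates=2))
```

    With fewer events per frame the per-frame mean becomes noisier, which is the dependence on PET statistics that the study quantifies.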

  12. Data-driven execution of fast multipole methods

    KAUST Repository

    Ltaief, Hatem; Yokota, Rio

    2013-01-01

    time-consuming stages of the FMMs into smaller tasks. The algorithm can then be represented as a directed acyclic graph where nodes represent tasks and edges represent dependencies among them. The execution of the algorithm is performed
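
    The task-graph view mentioned in this abstract can be made concrete with a small sketch. It assumes a hypothetical dependency table keyed by the conventional FMM kernel names and groups tasks into waves whose dependencies are all satisfied; the actual runtime system and task granularity used by the authors are not described in the excerpt.

```python
def ready_waves(deps):
    """Group tasks into successive waves that could run concurrently.

    `deps` maps each task name to the set of tasks it depends on.
    Raises ValueError if the graph contains a cycle.
    """
    remaining = {task: set(d) for task, d in deps.items()}
    waves = []
    while remaining:
        ready = sorted(task for task, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependency cycle detected")
        waves.append(ready)
        for task in ready:
            del remaining[task]
        for d in remaining.values():
            d.difference_update(ready)
    return waves


# Illustrative graph only: one task per FMM stage, far-field chain plus
# the independent near-field (P2P) evaluation.
deps = {
    "P2M": set(),
    "M2M": {"P2M"},
    "M2L": {"M2M"},
    "L2L": {"M2L"},
    "L2P": {"L2L"},
    "P2P": set(),
}
print(ready_waves(deps))
```

    Waves are a simplification: a data-driven runtime would typically start each task as soon as its own inputs are available rather than waiting for a whole wave to finish.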

  13. A Proposed Data Driven Architecture for Cardiology Network Application

    Directory of Open Access Journals (Sweden)

    Calin Ovidiu Cenan

    2010-04-01

    The paper presents a framework for a distributed medical system meant to bring a modern approach and enhance the quality of medical services offered to chronic patients with cardiovascular diseases, through the latest IT&C technologies. The proposed system provides online interaction between the main actors of a medical system: patients, doctors, medical entities (e.g. hospitals, clinics) and medical authorities (e.g. social services). With the aid of widely accepted medical standards, the system supplies storage for medical records and offers services for data integration between different kinds of healthcare applications and entities. The proposed ontological approach allows interchange of medical knowledge and best practices with conceptually organized patients’ records. The proposed solution allows computer-assisted diagnoses and multi-criteria medical data analysis, with the possible extension of building a data warehouse for complex medical data mining.

  14. Data driven analysis of rain events: feature extraction, clustering, microphysical/macrophysical relationship

    Science.gov (United States)

    Djallel Dilmi, Mohamed; Mallet, Cécile; Barthes, Laurent; Chazottes, Aymeric

    2017-04-01

    that a rain time series can be considered as an alternation of independent rain events and no-rain periods. The five selected features are used to perform a hierarchical clustering of the events. The well-known division between stratiform and convective events appears clearly. This classification into two classes is then refined into 5 fairly homogeneous subclasses. The data-driven analysis performed on whole rain events instead of fixed-length samples allows identifying strong relationships between macrophysical (based on rain rate) and microphysical (based on raindrops) features. We show that some of the 5 identified subclasses have specific microphysical characteristics. Obtaining information on the microphysical characteristics of rainfall events from rain gauge measurements has many implications for the development of quantitative precipitation estimation (QPE) and for the improvement of rain rate retrieval algorithms in a remote sensing context.

  15. Data-driven security analysis, visualization and dashboards

    CERN Document Server

    Jacobs, Jay

    2014-01-01

    Uncover hidden patterns of data and respond with countermeasures. Security professionals need all the tools at their disposal to increase their visibility in order to prevent security breaches and attacks. This careful guide explores two of the most powerful: data analysis and visualization. You'll soon understand how to harness and wield data, from collection and storage to management and analysis as well as visualization and presentation. Using a hands-on approach with real-world examples, this book shows you how to gather feedback, measure the effectiveness of your security methods, and ma

  16. Data-driven Demand Response Characterization and Quantification

    DEFF Research Database (Denmark)

    Le Ray, Guillaume; Pinson, Pierre; Larsen, Emil Mahler

    2017-01-01

    Analysis of load behavior in demand response (DR) schemes is important to evaluate the performance of participants. Very few real-world experiments have been carried out and quantification and characterization of the response is a difficult task. Nevertheless it will be a necessary tool for portf...

  17. Data-driven warehouse optimization : Deploying skills of order pickers

    NARCIS (Netherlands)

    M. Matusiak (Marek); M.B.M. de Koster (René); J. Saarinen (Jari)

    2015-01-01

    Batching orders and routing order pickers is a commonly studied problem in many picker-to-parts warehouses. The impact of individual differences in picking skills on performance has received little attention. In this paper, we show that taking into account differences in the skills of

  18. Approach to performance based regulation development

    International Nuclear Information System (INIS)

    Spogen, L.R.; Cleland, L.L.

    1977-06-01

    An approach to the development of performance based regulations (PBRs) is described. Initially, a framework is constructed that consists of a function hierarchy and associated measures. The function at the top of the hierarchy is described in terms of societal objectives. Decomposition of this function into subordinate functions and their subsequent decompositions yield the function hierarchy. "Bottom" functions describe the roles of system components. When measures are identified for the performance of each function and means of aggregating performances to higher levels are established, the framework may be employed for developing PBRs. Considerations of system flexibility and performance uncertainty guide the choice of the hierarchical level at which regulations are formulated. Ease of testing compliance is also a factor. To show the viability of the approach, the framework developed by Lawrence Livermore Laboratory for the Nuclear Regulatory Commission for evaluation of material control systems at fixed facilities is presented.

  19. RNA motif search with data-driven element ordering.

    Science.gov (United States)

    Rampášek, Ladislav; Jimenez, Randi M; Lupták, Andrej; Vinař, Tomáš; Brejová, Broňa

    2016-05-18

    In this paper, we study the problem of RNA motif search in long genomic sequences. This approach uses a combination of sequence and structure constraints to uncover new distant homologs of known functional RNAs. The problem is NP-hard and is traditionally solved by backtracking algorithms. We have designed a new algorithm for RNA motif search and implemented a new motif search tool RNArobo. The tool enhances the RNAbob descriptor language, allowing insertions in helices, which enables better characterization of ribozymes and aptamers. A typical RNA motif consists of multiple elements and the running time of the algorithm is highly dependent on their ordering. By approaching the element ordering problem in a principled way, we demonstrate more than 100-fold speedup of the search for complex motifs compared to previously published tools. We have developed a new method for RNA motif search that allows for a significant speedup of the search of complex motifs that include pseudoknots. Such speed improvements are crucial at a time when the rate of DNA sequencing outpaces growth in computing. RNArobo is available at http://compbio.fmph.uniba.sk/rnarobo .

  20. Methodological approach to strategic performance optimization

    OpenAIRE

    Hell, Marko; Vidačić, Stjepan; Garača, Željko

    2009-01-01

    This paper presents a matrix approach to the measuring and optimization of organizational strategic performance. The proposed model is based on the matrix presentation of strategic performance, which follows the theoretical notions of the balanced scorecard (BSC) and strategy map methodologies, initially developed by Kaplan and Norton. Development of a quantitative record of strategic objectives provides an arena for the application of linear programming (LP), which is a mathematical tech...

  1. Data-driven quantification of the robustness and sensitivity of cell signaling networks

    International Nuclear Information System (INIS)

    Mukherjee, Sayak; Seok, Sang-Cheol; Vieland, Veronica J; Das, Jayajit

    2013-01-01

    Robustness and sensitivity of responses generated by cell signaling networks have been associated with survival and evolvability of organisms. However, existing methods for analyzing robustness and sensitivity of signaling networks ignore the experimentally observed cell-to-cell variations of protein abundances and cell functions or contain ad hoc assumptions. We propose and apply a data-driven maximum entropy based method to quantify robustness and sensitivity of the Escherichia coli (E. coli) chemotaxis signaling network. Our analysis correctly rank orders different models of E. coli chemotaxis based on their robustness and suggests that parameters regulating cell signaling are evolutionarily selected to vary in individual cells according to their abilities to perturb cell functions. Furthermore, predictions from our approach regarding the distribution of protein abundances and properties of chemotactic responses in individual cells based on cell population averaged data are in excellent agreement with their experimental counterparts. Our approach is general and can be used to evaluate robustness as well as generate predictions of single cell properties based on population averaged experimental data in a wide range of cell signaling systems. (paper)

  2. Dynamic model reduction using data-driven Loewner-framework applied to thermally morphing structures

    Science.gov (United States)

    Phoenix, Austin A.; Tarazaga, Pablo A.

    2017-05-01

    The work herein proposes the use of the data-driven Loewner framework for reduced order modeling as applied to dynamic Finite Element Models (FEM) of thermally morphing structures. The Loewner-based modeling approach is computationally efficient and accurately constructs reduced models using analytical output data from a FEM. This paper details the two-step process proposed in the Loewner approach. First, a random vibration FEM simulation is used as the input for the development of a Single Input Single Output (SISO) data-based dynamic Loewner state space model. Second, an SVD-based truncation is applied to the Loewner state space model, such that the minimal, dynamically representative, state space model is achieved. For this second part, varying levels of reduction are generated and compared. The work herein can be extended to model generation using experimental measurements by replacing the FEM output data in the first step and following the same procedure. The method is demonstrated on two thermally morphing structures: a rigidly fixed hexapod in multiple geometric configurations and a low mass anisotropic morphing boom. This paper details the method and identifies the benefits of the reduced model methodology.
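
    The first step of the Loewner approach, building a matrix directly from sampled frequency-response data, can be sketched for the SISO case. The sketch assumes the standard Loewner matrix definition, with entries (v_i - w_j) / (mu_i - lambda_j) for left samples v_i = H(mu_i) and right samples w_j = H(lambda_j); the transfer function `H` below is a made-up first-order example, not data from the paper, and the SVD-based truncation step is not shown.

```python
def loewner_matrix(mu, v, lam, w):
    """Loewner matrix for SISO data.

    mu, lam : left and right sample points (complex frequencies)
    v, w    : transfer function values at mu and lam, respectively
    """
    return [[(v[i] - w[j]) / (mu[i] - lam[j]) for j in range(len(lam))]
            for i in range(len(mu))]


def H(s):
    """Hypothetical first-order system used only to generate sample data."""
    return 1.0 / (s + 1.0)


mu = [1j, 2j]    # left sample points
lam = [3j, 4j]   # right sample points
L = loewner_matrix(mu, [H(s) for s in mu], lam, [H(s) for s in lam])

# For a first-order system the Loewner matrix has rank 1, so the
# determinant of this 2x2 matrix vanishes up to rounding error.
det = L[0][0] * L[1][1] - L[0][1] * L[1][0]
print(abs(det))
```

    The rank of the Loewner matrix reflects the order of the underlying system, which is why a truncation based on its singular values can recover a minimal model from redundant data.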

  3. NERI PROJECT 99-119. TASK 2. DATA-DRIVEN PREDICTION OF PROCESS VARIABLES. FINAL REPORT

    Energy Technology Data Exchange (ETDEWEB)

    Upadhyaya, B.R.

    2003-04-10

    This report describes the detailed results for Task 2 of DOE-NERI project number 99-119 entitled "Automatic Development of Highly Reliable Control Architecture for Future Nuclear Power Plants". This project is a collaborative effort between the Oak Ridge National Laboratory (ORNL), The University of Tennessee, Knoxville (UTK) and the North Carolina State University (NCSU). UTK is the lead organization for Task 2 under contract number DE-FG03-99SF21906. Under Task 2 we completed the development of data-driven models for the characterization of sub-system dynamics for predicting state variables, control functions, and expected control actions. We have also developed the "Principal Component Analysis (PCA)" approach for mapping system measurements, and a nonlinear system modeling approach called the "Group Method of Data Handling (GMDH)" with rational functions, which includes temporal data information for transient characterization. The majority of the results are presented in detailed reports for Phases 1 through 3 of our research, which are attached to this report.

  4. Design Tools for Dynamic, Data-Driven, Stream Mining Systems

    Science.gov (United States)

    2015-01-01

    growth in technologies for sensing and computation has contributed to large increases in the volume of data that must be managed and analyzed in many...recognition, speaker identification, pattern recognition) and wireless communication (e.g., GSM, digital radio, NFC , Bluetooth), as well as control...systems for performance and energy consumption. In Proceedings of the IEEE Real-Time Technology and Applications Symposium, pages 124–132, 2003. [49

  5. Developing a Data Driven Process-Based Model for Remote Sensing of Ecosystem Production

    Science.gov (United States)

    Elmasri, B.; Rahman, A. F.

    2010-12-01

    Estimating ecosystem carbon fluxes at various spatial and temporal scales is essential for quantifying the global carbon cycle. Numerous models have been developed for this purpose using several environmental variables as well as vegetation indices derived from remotely sensed data. Here we present a data-driven modeling approach for gross primary production (GPP) that is based on the process-based model BIOME-BGC. The proposed model was run using available remote sensing data and does not depend on look-up tables. Furthermore, this approach combines the merits of both empirical and process models: empirical models were used to estimate certain input variables such as light use efficiency (LUE). This was achieved by applying remotely sensed data to the mathematical equations that represent biophysical photosynthesis processes in the BIOME-BGC model. Moreover, a new spectral index for estimating maximum photosynthetic activity, the maximum photosynthetic rate index (MPRI), is also developed and presented here. This new index is based on the ratio between the near infrared and the green bands (ρ858.5/ρ555). The model was tested and validated against the MODIS GPP product and flux measurements from two eddy covariance flux towers located at Morgan Monroe State Forest (MMSF) in Indiana and Harvard Forest in Massachusetts. Satellite data acquired by the Advanced Microwave Scanning Radiometer (AMSR-E) and MODIS were used. The data-driven model showed a strong correlation between the predicted and measured GPP at the two eddy covariance flux tower sites. This methodology produced better predictions of GPP than did the MODIS GPP product. Moreover, the proportion of error in the predicted GPP for MMSF and Harvard Forest was dominated by unsystematic errors, suggesting that the results are unbiased. The analysis indicated that maintenance respiration is one of the main factors that dominate the overall model outcome errors and improvement in maintenance respiration estimation
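
    The band ratio that defines the MPRI in this abstract (ρ858.5/ρ555) can be written out as a short sketch. Only the ratio itself comes from the abstract; the function name, the input validation, and the reflectance values are illustrative assumptions, and no scaling or calibration that the authors may apply is reproduced.

```python
def mpri(nir_reflectance, green_reflectance):
    """Ratio of near-infrared (858.5 nm) to green (555 nm) surface reflectance."""
    if green_reflectance <= 0:
        raise ValueError("green reflectance must be positive")
    return nir_reflectance / green_reflectance


# Made-up reflectance pairs (NIR, green) for three pixels.
pixels = [(0.50, 0.25), (0.30, 0.10), (0.20, 0.20)]
print([mpri(nir, green) for nir, green in pixels])
```

    A denser canopy generally reflects more strongly in the near infrared relative to the green band, so larger ratios would be expected over more photosynthetically active vegetation.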

  6. Enabling Data-Driven Methodologies Across the Data Lifecycle and Ecosystem

    Science.gov (United States)

    Doyle, R. J.; Crichton, D.

    2017-12-01

    NASA has unlocked unprecedented scientific knowledge through exploration of the Earth, our solar system, and the larger universe. NASA is generating enormous amounts of data that are challenging traditional approaches to capturing, managing, analyzing and ultimately gaining scientific understanding from science data. New architectures, capabilities and methodologies are needed to span the entire observing system, from spacecraft to archive, while integrating data-driven discovery and analytic capabilities. NASA data have a definable lifecycle, from remote collection point to validated accessibility in multiple archives. Data challenges must be addressed across this lifecycle, to capture opportunities and avoid decisions that may limit or compromise what is achievable once data arrives at the archive. Data triage may be necessary when the collection capacity of the sensor or instrument overwhelms data transport or storage capacity. By migrating computational and analytic capability to the point of data collection, informed decisions can be made about which data to keep; in some cases, to close observational decision loops onboard, to enable attending to unexpected or transient phenomena. Along a different dimension than the data lifecycle, scientists and other end-users must work across an increasingly complex data ecosystem, where the range of relevant data is rarely owned by a single institution. To operate effectively, scalable data architectures and community-owned information models become essential. NASA's Planetary Data System is having success with this approach. Finally, there is the difficult challenge of reproducibility and trust. While data provenance techniques will be part of the solution, future interactive analytics environments must support an ability to provide a basis for a result: relevant data source and algorithms, uncertainty tracking, etc., to assure scientific integrity and to enable confident decision making. Advances in data science offer

  7. Data-driven probability concentration and sampling on manifold

    Energy Technology Data Exchange (ETDEWEB)

    Soize, C., E-mail: christian.soize@univ-paris-est.fr [Université Paris-Est, Laboratoire Modélisation et Simulation Multi-Echelle, MSME UMR 8208 CNRS, 5 bd Descartes, 77454 Marne-La-Vallée Cedex 2 (France); Ghanem, R., E-mail: ghanem@usc.edu [University of Southern California, 210 KAP Hall, Los Angeles, CA 90089 (United States)

    2016-09-15

    A new methodology is proposed for generating realizations of a random vector with values in a finite-dimensional Euclidean space that are statistically consistent with a dataset of observations of this vector. The probability distribution of this random vector, while a priori not known, is presumed to be concentrated on an unknown subset of the Euclidean space. A random matrix is introduced whose columns are independent copies of the random vector and for which the number of columns is the number of data points in the dataset. The approach is based on the use of (i) the multidimensional kernel-density estimation method for estimating the probability distribution of the random matrix, (ii) a MCMC method for generating realizations for the random matrix, (iii) the diffusion-maps approach for discovering and characterizing the geometry and the structure of the dataset, and (iv) a reduced-order representation of the random matrix, which is constructed using the diffusion-maps vectors associated with the first eigenvalues of the transition matrix relative to the given dataset. The convergence aspects of the proposed methodology are analyzed and a numerical validation is explored through three applications of increasing complexity. The proposed method is found to be robust to noise levels and data complexity as well as to the intrinsic dimension of data and the size of experimental datasets. Both the methodology and the underlying mathematical framework presented in this paper contribute new capabilities and perspectives at the interface of uncertainty quantification, statistical data analysis, stochastic modeling and associated statistical inverse problems.
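
    As a rough illustration of steps (i) and (ii) only, the following Python sketch draws samples from a one-dimensional Gaussian kernel-density estimate with a random-walk Metropolis sampler. The dataset, the bandwidth `h`, the step size and all function names are assumptions made for this example. The diffusion-maps basis and the reduced-order representation of steps (iii) and (iv) are not reproduced, and the abstract does not say which MCMC variant the authors use.

```python
import math
import random


def log_kde(x, data, h):
    """Unnormalized log-density of a Gaussian KDE with bandwidth h (log-sum-exp form)."""
    exponents = [-((x - xi) ** 2) / (2.0 * h * h) for xi in data]
    top = max(exponents)
    return top + math.log(sum(math.exp(e - top) for e in exponents))


def metropolis_from_kde(data, h, n_samples, step, seed=0):
    """Random-walk Metropolis chain whose target is the KDE of `data`."""
    rng = random.Random(seed)
    x = data[0]                      # start the chain at an observed point
    log_p = log_kde(x, data, h)
    samples = []
    for _ in range(n_samples):
        candidate = x + rng.gauss(0.0, step)
        log_p_candidate = log_kde(candidate, data, h)
        # Accept with probability min(1, p(candidate) / p(x)).
        if rng.random() < math.exp(min(0.0, log_p_candidate - log_p)):
            x, log_p = candidate, log_p_candidate
        samples.append(x)
    return samples


# Hypothetical observations of a scalar quantity.
DATA = [0.8, 1.0, 1.1, 1.3, 0.9, 1.2, 1.0, 0.7]
SAMPLES = metropolis_from_kde(DATA, h=0.2, n_samples=20000, step=0.5)
SAMPLE_MEAN = sum(SAMPLES) / len(SAMPLES)
```

    The new realizations scatter around the observed points rather than repeating them, which is the sense in which they are statistically consistent with the dataset. In higher dimensions a plain random walk of this kind mixes poorly, so it should be read as a toy stand-in.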

  8. Data-Driven Visualization and Group Analysis of Multichannel EEG Coherence with Functional Units

    NARCIS (Netherlands)

    Caat, Michael ten; Maurits, Natasha M.; Roerdink, Jos B.T.M.

    2008-01-01

    A typical data-driven visualization of electroencephalography (EEG) coherence is a graph layout, with vertices representing electrodes and edges representing significant coherences between electrode signals. A drawback of this layout is its visual clutter for multichannel EEG. To reduce clutter,

  9. Short-term stream flow forecasting at Australian river sites using data-driven regression techniques

    CSIR Research Space (South Africa)

    Steyn, Melise

    2017-09-01

    Full Text Available This study proposes a computationally efficient solution to stream flow forecasting for river basins where historical time series data are available. Two data-driven modeling techniques are investigated, namely support vector regression...

  10. The Facilitation of a Sustainable Power System: A Practice from Data-Driven Enhanced Boiler Control

    Directory of Open Access Journals (Sweden)

    Zhenlong Wu

    2018-04-01

    Full Text Available An increasing penetration of renewable energy may bring significant challenges to a power system due to its inherent intermittency. To achieve a sustainable future for renewable energy, a conventional power plant is required to be able to change its power output rapidly for a grid balance purpose. However, the rapid power change may result in the boiler operating in a dangerous manner. To this end, this paper aims to improve boiler control performance via a data-driven control strategy, namely Active Disturbance Rejection Control (ADRC). For practical implementation, a tuning method is developed for ADRC controller parameters to maximize its potential in controlling a boiler operating in different conditions. Based on a Monte Carlo simulation, a Probabilistic Robustness (PR) index is subsequently formulated to represent the controller’s sensitivity to the varying conditions. The stability region of the ADRC controller is depicted to provide the search space in which the optimal group of parameters is searched for based on the PR index. Illustrative simulations are performed to verify the efficacy of the proposed method. Finally, the proposed method is experimentally applied to a boiler’s secondary air control system successfully. The results of the field application show that the proposed ADRC based on PR can ensure the expected control performance even though it works in a wider range of operating conditions. The field application depicts a promising future for the ADRC controller as an alternative solution in the power industry to integrate more renewable energy into the power grid.

  11. A priori data-driven multi-clustered reservoir generation algorithm for echo state network.

    Directory of Open Access Journals (Sweden)

    Xiumin Li

    Full Text Available Echo state networks (ESNs) with multi-clustered reservoir topology perform better in reservoir computing and robustness than those with random reservoir topology. However, these ESNs have a complex reservoir topology, which leads to difficulties in reservoir generation. This study focuses on the reservoir generation problem when an ESN is used in environments with sufficient a priori data available. Accordingly, an a priori data-driven multi-cluster reservoir generation algorithm is proposed. The a priori data in the proposed algorithm are used to evaluate reservoirs by calculating the precision and standard deviation of ESNs. The reservoirs are produced using the clustering method; only the reservoir with a better evaluation performance takes the place of a previous one. The final reservoir is obtained when its evaluation score reaches the preset requirement. The prediction experiment results obtained using the Mackey-Glass chaotic time series show that the proposed reservoir generation algorithm provides ESNs with extra prediction precision and increases the structure complexity of the network. Further experiments also reveal the appropriate values of the number of clusters and time window size to obtain optimal performance. The information entropy of the reservoir reaches the maximum when the ESN gains the greatest precision.

  12. A Data-Driven Air Transportation Delay Propagation Model Using Epidemic Process Models

    Directory of Open Access Journals (Sweden)

    B. Baspinar

    2016-01-01

    Full Text Available In air transport network management, in addition to defining the performance behavior of the system’s components, identification of their interaction dynamics is a delicate issue in both strategic and tactical decision-making processes so as to decide which elements of the system are “controlled” and how. This paper introduces a novel delay propagation model utilizing an epidemic spreading process, which enables the definition of novel performance indicators and interaction rates of the elements of the air transportation network. In order to understand the behavior of the delay propagation over the network at different levels, we have constructed two different data-driven epidemic models approximating the dynamics of the system: (a) a flight-based epidemic model and (b) an airport-based epidemic model. The flight-based epidemic model, which utilizes the SIS epidemic model, focuses on the individual flights, where each flight can be in a susceptible or an infected state. The airport-based epidemic model, in addition to the flight-to-flight interactions, allows us to define the collective behavior of the airports, which are modeled as metapopulations. In network model construction, we have utilized historical flight-track data of Europe and performed analysis for certain days involving certain disturbances. Through this effort, we have validated the proposed delay propagation models under disruptive events.
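
    To make the SIS idea concrete, the sketch below integrates the textbook mean-field SIS equation di/dt = beta*i*(1 - i) - gamma*i, reading i as the fraction of flights that are currently delayed ("infected"). The rates `beta` and `gamma`, the time step and the function name are assumptions chosen for illustration; the flight-level and metapopulation models fitted to European flight-track data in the paper are more detailed than this.

```python
def simulate_sis(beta, gamma, i0=0.01, dt=0.01, steps=20000):
    """Forward-Euler integration of the mean-field SIS model.

    beta  : rate at which a delayed flight passes its delay on to an on-time one
    gamma : rate at which a delayed flight recovers (returns to schedule)
    i0    : initial fraction of delayed flights
    """
    i = i0
    history = [i]
    for _ in range(steps):
        i += dt * (beta * i * (1.0 - i) - gamma * i)
        history.append(i)
    return history


# Hypothetical rates: delays spread faster than they are absorbed ...
ENDEMIC = simulate_sis(beta=0.5, gamma=0.2)
# ... and a case where recovery outpaces propagation.
DYING_OUT = simulate_sis(beta=0.1, gamma=0.2)
```

    In this simplified model the delayed fraction settles at 1 - gamma/beta when beta exceeds gamma and decays to zero otherwise, which is the kind of threshold behavior that makes interaction rates usable as performance indicators.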

  13. Performance improvement integration: a whole systems approach.

    Science.gov (United States)

    Page, C K

    1999-02-01

    Performance improvement integration in health care organizations is a challenge for health care leaders. Required for accreditation by the Joint Commission on Accreditation of Healthcare Organizations (Joint Commission), performance improvement (PI) can be designed as a sustainable model for performance to survive in a turbulent period. Central Baptist Hospital developed a model for PI that focused on strategy established by the leadership team, delineated responsibility through the organizational structure of shared governance, and accountability for outcomes evidenced through the organization's profitability. Such an approach integrated into the culture of the organization can produce positive financial margins, positive customer satisfaction, and commendations from the Joint Commission.

  14. Service and Data Driven Multi Business Model Platform in a World of Persuasive Technologies

    DEFF Research Database (Denmark)

    Andersen, Troels Christian; Bjerrum, Torben Cæsar Bisgaard

    2016-01-01

    companies in establishing a service organization that delivers, creates and captures value through service and data driven business models by utilizing their network, resources and customers and/or users. Furthermore, based on literature and collaboration with the case company, the suggestion of a new...... framework provides the necessary construction of how the manufacturing companies can evolve their current business to provide multi service and data driven business models, using the same resources, networks and customers....

  15. Data-Driven Identification of Risk Factors of Patient Satisfaction at a Large Urban Academic Medical Center.

    Science.gov (United States)

    Li, Li; Lee, Nathan J; Glicksberg, Benjamin S; Radbill, Brian D; Dudley, Joel T

    2016-01-01

    The Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) survey is the first publicly reported nationwide survey to evaluate and compare hospitals. Increasing patient satisfaction is an important goal as it aims to achieve a more effective and efficient healthcare delivery system. In this study, we develop and apply an integrative, data-driven approach to identify clinical risk factors that associate with patient satisfaction outcomes. We included 1,771 unique adult patients who completed the HCAHPS survey and were discharged from the inpatient Medicine service from 2010 to 2012. We collected 266 clinical features including patient demographics, lab measurements, medications, disease categories, and procedures. We developed and applied a data-driven approach to identify risk factors that associate with patient satisfaction outcomes. We identify 102 significant risk factors associating with 18 surveyed questions. The most significantly recurrent clinical risk factors were: self-evaluation of health, education level, Asian, White, treatment in BMT oncology division, being prescribed a new medication. Patients who were prescribed pregabalin were less satisfied particularly in relation to communication with nurses and pain management. Explanation of medication usage was associated with communication with nurses (q = 0.001); however, explanation of medication side effects was associated with communication with doctors (q = 0.003). Overall hospital rating was associated with hospital environment, communication with doctors, and communication about medicines. However, patient likelihood to recommend hospital was associated with hospital environment, communication about medicines, pain management, and communication with nurses. Our study identified a number of putatively novel clinical risk factors for patient satisfaction that suggest new opportunities to better understand and manage patient satisfaction. Hospitals can use a data-driven approach to

  16. Full field reservoir modeling of shale assets using advanced data-driven analytics

    Directory of Open Access Journals (Sweden)

    Soodabeh Esmaili

    2016-01-01

    Full Text Available Hydrocarbon production from shale has attracted much attention in recent years. When applied to these prolific and hydrocarbon-rich resource plays, our understanding of the complexities of the flow mechanism (sorption process and flow behavior in complex fracture systems, induced or natural) leaves much to be desired. In this paper, we present and discuss a novel approach to modeling and history matching of hydrocarbon production from a Marcellus shale asset in southwestern Pennsylvania using advanced data mining, pattern recognition and machine learning technologies. In this new approach, instead of imposing our understanding of the flow mechanism, the impact of multi-stage hydraulic fractures, and the production process on the reservoir model, we allow the production history, well log, completion and hydraulic fracturing data to guide our model and determine its behavior. The uniqueness of this technology is that it incorporates the so-called “hard data” directly into the reservoir model, so that the model can be used to optimize the hydraulic fracture process. The “hard data” refers to field measurements during the hydraulic fracturing process such as fluid and proppant type and amount, injection pressure and rate as well as proppant concentration. This novel approach contrasts with the current industry focus on the use of “soft data” (non-measured, interpretive data such as frac length, width, height and conductivity) in the reservoir models. The study focuses on a Marcellus shale asset that includes 135 wells with multiple pads, different landing targets, well length and reservoir properties. The full field history matching process was successfully completed using this data-driven approach, thus capturing the production behavior with acceptable accuracy for individual wells and for the entire asset.

  17. WIFIRE: A Scalable Data-Driven Monitoring, Dynamic Prediction and Resilience Cyberinfrastructure for Wildfires

    Science.gov (United States)

    Altintas, I.; Block, J.; Braun, H.; de Callafon, R. A.; Gollner, M. J.; Smarr, L.; Trouve, A.

    2013-12-01

    Recent studies confirm that climate change will cause wildfires to increase in frequency and severity in the coming decades especially for California and in much of the North American West. The most critical sustainability issue in the midst of these ever-changing dynamics is how to achieve a new social-ecological equilibrium of this fire ecology. Wildfire wind speeds and directions change in an instant, and first responders can only be effective when they take action as quickly as the conditions change. To deliver information needed for sustainable policy and management in this dynamically changing fire regime, we must capture these details to understand the environmental processes. We are building an end-to-end cyberinfrastructure (CI), called WIFIRE, for real-time and data-driven simulation, prediction and visualization of wildfire behavior. The WIFIRE integrated CI system supports social-ecological resilience to the changing fire ecology regime in the face of urban dynamics and climate change. Networked observations, e.g., heterogeneous satellite data and real-time remote sensor data, are integrated with computational techniques in signal processing, visualization, modeling and data assimilation to provide a scalable, technological, and educational solution to monitor weather patterns to predict a wildfire's Rate of Spread. Our collaborative WIFIRE team of scientists, engineers, technologists, government policy managers, private industry, and firefighters architects and implements CI pathways that enable joint innovation for wildfire management. Scientific workflows are used as an integrative distributed programming model and simplify the implementation of engineering modules for data-driven simulation, prediction and visualization while allowing integration with large-scale computing facilities. WIFIRE will be scalable to users with different skill-levels via specialized web interfaces and user-specified alerts for environmental events broadcasted to receivers before

  18. Scidac-Data: Enabling Data Driven Modeling of Exascale Computing

    Science.gov (United States)

    Mubarak, Misbah; Ding, Pengfei; Aliaga, Leo; Tsaris, Aristeidis; Norman, Andrew; Lyon, Adam; Ross, Robert

    2017-10-01

    The SciDAC-Data project is a DOE-funded initiative to analyze and exploit two decades of information and analytics that have been collected by the Fermilab data center on the organization, movement, and consumption of high energy physics (HEP) data. The project analyzes the analysis patterns and data organization that have been used by NOvA, MicroBooNE, MINERvA, CDF, D0, and other experiments to develop realistic models of HEP analysis workflows and data processing. The SciDAC-Data project aims to provide both realistic input vectors and corresponding output data that can be used to optimize and validate simulations of HEP analysis. These simulations are designed to address questions of data handling, cache optimization, and workflow structures that are the prerequisites for modern HEP analysis chains to be mapped and optimized to run on the next generation of leadership-class exascale computing facilities. We present the use of a subset of the SciDAC-Data distributions, acquired from analysis of approximately 71,000 HEP workflows run on the Fermilab data center and corresponding to over 9 million individual analysis jobs, as the input to detailed queuing simulations that model the expected data consumption and caching behaviors of the work running in high performance computing (HPC) and high throughput computing (HTC) environments. In particular we describe how the Sequential Access via Metadata (SAM) data-handling system in combination with the dCache/Enstore-based data archive facilities has been used to develop radically different models for analyzing the HEP data. We also show how the simulations may be used to assess the impact of design choices in archive facilities.

  19. Data-Driven Multiresolution Camera Using the Foveal Adaptive Pyramid

    Directory of Open Access Journals (Sweden)

    Martin González

    2016-11-01

    Full Text Available There exist image processing applications, such as tracking or pattern recognition, that do not need to maintain the same resolution across the whole image sensor. In fact, they must only keep it as high as possible in a relatively small region, while still covering a wide field of view. This is the aim of foveal vision systems. Briefly, they propose to sense a large field of view at a spatially-variant resolution: one relatively small region, the fovea, is mapped at a high resolution, while the rest of the image is captured at a lower resolution. In these systems, this fovea must be moved, from one region of interest to another one, to scan a visual scene. It is interesting that the part of the scene that is covered by the fovea should not be merely spatial, but closely related to perceptual objects. Segmentation and attention are then intimately tied together: while the segmentation process is responsible for extracting perceptively-coherent entities from the scene (proto-objects), attention can guide segmentation. From this loop, the concept of foveal attention arises. This work proposes a hardware system for mapping a uniformly-sampled sensor to a space-variant one. Furthermore, this mapping is tied with a software-based, foveal attention mechanism that takes as input the stream of generated foveal images. The whole hardware/software architecture has been designed to be embedded within an all-programmable system on chip (AP SoC). Our results show the flexibility of the data port for exchanging information between the mapping and attention parts of the architecture and the good performance rates of the mapping procedure. Experimental evaluation also demonstrates that the segmentation method and the attention model provide results comparable to other more computationally-expensive algorithms.

  20. Data-Driven Optimization of Incentive-based Demand Response System with Uncertain Responses of Customers

    Directory of Open Access Journals (Sweden)

    Jimyung Kang

    2017-10-01

    Full Text Available Demand response is nowadays considered as another type of generator, beyond just a simple peak reduction mechanism. A demand response service provider (DRSP) can, through its subcontracts with many energy customers, virtually generate electricity with actual load reduction. However, in this type of virtual generator, the amount of load reduction includes inevitable uncertainty, because it consists of a very large number of independent energy customers. While they may reduce energy today, they might not tomorrow. In this circumstance, a DRSP must choose a proper set of these uncertain customers to achieve the exact preferred amount of load curtailment. In this paper, the customer selection problem for a service provider that consists of uncertain responses of customers is defined and solved. The uncertainty of energy reduction is fully considered in the formulation with data-driven probability distribution modeling and stochastic programming technique. The proposed optimization method that utilizes only the observed load data provides a realistic and applicable solution to a demand response system. The performance of the proposed optimization is verified with real demand response event data in Korea, and the results show increased and stabilized performance from the service provider’s perspective.
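
    The customer selection problem can be illustrated with a deliberately tiny sketch. It assumes each customer responds independently with a known probability and, if responding, delivers a fixed reduction in kW; it then enumerates every subset and every response scenario to find the set with the smallest expected absolute deviation from the target curtailment. The Bernoulli response model, the objective and all names are assumptions made for this example, not the probability model or stochastic program used in the paper, and brute-force enumeration is only practical for a handful of customers.

```python
from itertools import combinations, product


def expected_deviation(subset, customers, target):
    """Exact E|delivered - target| for independent yes/no customer responses.

    customers: list of (response_probability, reduction_kw) tuples
    subset   : tuple of indices into `customers`
    """
    total = 0.0
    for outcome in product([0, 1], repeat=len(subset)):
        prob, delivered = 1.0, 0.0
        for idx, responds in zip(subset, outcome):
            p, kw = customers[idx]
            prob *= p if responds else (1.0 - p)
            delivered += kw * responds
        total += prob * abs(delivered - target)
    return total


def select_customers(customers, target):
    """Return (expected deviation, subset) minimizing the expected deviation."""
    best = None
    for size in range(len(customers) + 1):
        for subset in combinations(range(len(customers)), size):
            score = expected_deviation(subset, customers, target)
            if best is None or score < best[0]:
                best = (score, subset)
    return best


# Fully reliable customers: the choice reduces to hitting the target exactly.
CERTAIN = [(1.0, 3.0), (1.0, 5.0), (1.0, 7.0)]
# Hypothetical uncertain customers: (response probability, reduction in kW).
UNCERTAIN = [(0.9, 3.0), (0.6, 5.0), (0.8, 7.0), (0.5, 4.0)]

BEST_CERTAIN = select_customers(CERTAIN, target=8.0)
BEST_UNCERTAIN = select_customers(UNCERTAIN, target=8.0)
```

    With uncertain responses the best set generally contracts more nominal capacity than the target, because some customers are expected not to deliver.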

  1. Data-driven fault detection, isolation and estimation of aircraft gas turbine engine actuator and sensors

    Science.gov (United States)

    Naderi, E.; Khorasani, K.

    2018-02-01

    In this work, a data-driven fault detection, isolation, and estimation (FDI&E) methodology is proposed and developed specifically for monitoring the aircraft gas turbine engine actuator and sensors. The proposed FDI&E filters are directly constructed by using only the available system I/O data at each operating point of the engine. The healthy gas turbine engine is stimulated by a sinusoidal input containing a limited number of frequencies. First, the associated system Markov parameters are estimated by using the FFT of the input and output signals to obtain the frequency response of the gas turbine engine. These data are then used for direct design and realization of the fault detection, isolation and estimation filters. Our proposed scheme therefore does not require any a priori knowledge of the system linear model or its number of poles and zeros at each operating point. We have investigated the effects of the size of the frequency response data on the performance of our proposed schemes. We have shown through comprehensive case studies simulations that desirable fault detection, isolation and estimation performance metrics defined in terms of the confusion matrix criterion can be achieved by having access to only the frequency response of the system at only a limited number of frequencies.
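
    The first step described above, obtaining a frequency response from input/output data under multi-sine excitation, can be sketched as follows. The first-order plant, its coefficients, the excited frequency bins and all function names are hypothetical stand-ins for the engine and its operating point. A plain DFT evaluated only at the excited bins is used in place of an FFT to keep the example dependency-free, and the later steps (Markov parameter estimation and filter design) are not shown.

```python
import cmath
import math


def simulate(u, a=0.8, b=0.2):
    """Hypothetical healthy plant: y[k] = a*y[k-1] + b*u[k-1]."""
    y = [0.0] * len(u)
    for k in range(1, len(u)):
        y[k] = a * y[k - 1] + b * u[k - 1]
    return y


def dft_bin(x, m):
    """DFT coefficient of the sequence x at frequency bin m."""
    n = len(x)
    return sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))


def estimate_frequency_response(bins, n=256, periods=5):
    """Excite the plant with a sum of sinusoids and return {bin: Y(bin)/U(bin)}."""
    total = n * periods
    u = [sum(math.sin(2 * math.pi * m * k / n) for m in bins) for k in range(total)]
    y = simulate(u)
    # Keep only the last period so the start-up transient has died out.
    u_last, y_last = u[-n:], y[-n:]
    return {m: dft_bin(y_last, m) / dft_bin(u_last, m) for m in bins}


def true_response(m, n=256, a=0.8, b=0.2):
    """Analytic frequency response of the hypothetical plant at bin m."""
    z_inv = cmath.exp(-2j * math.pi * m / n)
    return b * z_inv / (1 - a * z_inv)


BINS = [3, 17, 40]
G_HAT = estimate_frequency_response(BINS)
```

    Because the input contains energy only at the chosen bins, the ratio Y/U is meaningful only there, which mirrors the abstract's point that the design relies on the frequency response at a limited number of frequencies.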

  2. A Novel Online Data-Driven Algorithm for Detecting UAV Navigation Sensor Faults.

    Science.gov (United States)

    Sun, Rui; Cheng, Qi; Wang, Guanyu; Ochieng, Washington Yotto

    2017-09-29

    The use of Unmanned Aerial Vehicles (UAVs) has increased significantly in recent years. On-board integrated navigation sensors are a key component of UAVs' flight control systems and are essential for flight safety. In order to ensure flight safety, timely and effective navigation sensor fault detection capability is required. In this paper, a novel data-driven Adaptive Neuron Fuzzy Inference System (ANFIS)-based approach is presented for the detection of on-board navigation sensor faults in UAVs. Contrary to the classic UAV sensor fault detection algorithms, based on predefined or modelled faults, the proposed algorithm combines an online data training mechanism with the ANFIS-based decision system. The main advantages of this algorithm are that it allows real-time model-free residual analysis from Kalman Filter (KF) estimates and the ANFIS to build a reliable fault detection system. In addition, it allows fast and accurate detection of faults, which makes it suitable for real-time applications. Experimental results have demonstrated the effectiveness of the proposed fault detection method in terms of accuracy and misdetection rate.

  3. Data-driven asthma endotypes defined from blood biomarker and gene expression data.

    Directory of Open Access Journals (Sweden)

    Barbara Jane George

    Full Text Available The diagnosis and treatment of childhood asthma is complicated by its mechanistically distinct subtypes (endotypes) driven by genetic susceptibility and modulating environmental factors. Clinical biomarkers and blood gene expression were collected from a stratified, cross-sectional study of asthmatic and non-asthmatic children from Detroit, MI. This study describes four distinct asthma endotypes identified via a purely data-driven method. Our method was specifically designed to integrate blood gene expression and clinical biomarkers in a way that provides new mechanistic insights regarding the different asthma endotypes. For example, we describe metabolic syndrome-induced systemic inflammation as an associated factor in three of the four asthma endotypes. Context provided by the clinical biomarker data was essential in interpreting gene expression patterns and identifying putative endotypes, which emphasizes the importance of integrated approaches when studying complex disease etiologies. These synthesized patterns of gene expression and clinical markers from our research may lead to development of novel serum-based biomarker panels.

  4. A data-driven, mathematical model of mammalian cell cycle regulation.

    Directory of Open Access Journals (Sweden)

    Michael C Weis

    Full Text Available Few of >150 published cell cycle modeling efforts use significant levels of data for tuning and validation. This reflects the difficulty of generating correlated quantitative data, and it points out a critical uncertainty in modeling efforts. To develop a data-driven model of cell cycle regulation, we used contiguous, dynamic measurements over two time scales (minutes and hours) calculated from static multiparametric cytometry data. The approach provided expression profiles of cyclin A2, cyclin B1, and phospho-S10-histone H3. The model was built by integrating and modifying two previously published models such that the model outputs for cyclins A and B fit cyclin expression measurements and the activation of B cyclin/Cdk1 coincided with phosphorylation of histone H3. The model depends on Cdh1-regulated cyclin degradation during G1, regulation of B cyclin/Cdk1 activity by cyclin A/Cdk via Wee1, and transcriptional control of the mitotic cyclins that reflects some of the current literature. We introduced autocatalytic transcription of E2F, E2F regulated transcription of cyclin B, Cdc20/Cdh1 mediated E2F degradation, enhanced transcription of mitotic cyclins during late S/early G2 phase, and the sustained synthesis of cyclin B during mitosis. These features produced a model with good correlation between state variable output and real measurements. Since the method of data generation is extensible, this model can be continually modified based on new correlated, quantitative data.

  5. A Novel Online Data-Driven Algorithm for Detecting UAV Navigation Sensor Faults

    Directory of Open Access Journals (Sweden)

    Rui Sun

    2017-09-01

    Full Text Available The use of Unmanned Aerial Vehicles (UAVs) has increased significantly in recent years. On-board integrated navigation sensors are a key component of UAVs’ flight control systems and are essential for flight safety. In order to ensure flight safety, timely and effective navigation sensor fault detection capability is required. In this paper, a novel data-driven Adaptive Neuron Fuzzy Inference System (ANFIS)-based approach is presented for the detection of on-board navigation sensor faults in UAVs. Contrary to the classic UAV sensor fault detection algorithms, based on predefined or modelled faults, the proposed algorithm combines an online data training mechanism with the ANFIS-based decision system. The main advantages of this algorithm are that it allows real-time model-free residual analysis from Kalman Filter (KF) estimates and the ANFIS to build a reliable fault detection system. In addition, it allows fast and accurate detection of faults, which makes it suitable for real-time applications. Experimental results have demonstrated the effectiveness of the proposed fault detection method in terms of accuracy and misdetection rate.

  6. Data-driven Inquiry in Environmental Restoration Studies

    Science.gov (United States)

    Zalles, D. R.; Montgomery, D. R.

    2008-12-01

    Place-based field work has been recognized as an important component of geoscience education programs for engaging students. Field work helps students appreciate the spatial extent of data and the systems operating in a locale. Data collected in a place has a temporal aspect that can be explored through representations such as photographs and maps and also through numerical data sets that capture characteristics of place. Yet, experiencing authentic geoscience research in an educational setting requires going beyond fieldwork: students must develop data literacy skills that will enable them to connect abstract representations of spatio-temporal data with place. Educational researchers at SRI International led by Dr. Daniel Zalles, developer of inquiry-based geoscience curricula, and geoscientists at the University of Washington (UW) led by Dr. David Montgomery, Professor of Earth and Space Sciences, are building educational curriculum modules that help students make these connections. The modules concern the environmental history of the Puget Sound area in Washington State and its relevance for the American Indians living there. This collaborative project relies on environmental data collected in the Puget Sound Regional Synthesis Model (PRISM) and Puget Sound River History Project. The data sets are being applied to inquiry-based geoscience investigations at the undergraduate and high school level. The modules consist of problem-based units centered on the data sets, plus geographic and other data representations. The modules will rely on educational "design patterns" that characterize geoscientific inquiry tasks. Use of design patterns will enable other modules to be built that align to the modes of student thinking and practice articulated in the design patterns. The modules will be accompanied by performance assessments that measure student learning from their data investigations. The design principles that drive this project have already been used effectively

  7. A neuroanatomical approach to exploring organizational performance

    Directory of Open Access Journals (Sweden)

    Gillingwater, D.

    2009-01-01

    Full Text Available Insights gained from studying the human brain have begun to open up promising new areas of research in the behavioural and social sciences. Neuroscience-based principles have been incorporated into areas such as business management, economics and marketing, leading to the development of artificial neural networks, neuroeconomics, neuromarketing and, most recently, organizational cognitive neuroscience. Similarly, the brain has been used as a powerful metaphor for thinking about and analysing the nature of organizations. However, no existing approach to organizational analysis has taken advantage of contemporary neuroanatomical principles, thereby missing the opportunity to translate core neuroanatomical knowledge into other, non-related areas of research. In this essentially conceptual paper, we propose several ways in which neuroanatomical approaches could be used to enhance organizational theory, practice and research. We suggest that truly interdisciplinary and collaborative research between neuroanatomists and organizational analysts is likely to provide novel approaches to exploring and improving organizational performance.

  8. Software performance and scalability a quantitative approach

    CERN Document Server

    Liu, Henry H

    2009-01-01

    Praise from the Reviewers: "The practicality of the subject in a real-world situation distinguishes this book from others available on the market." —Professor Behrouz Far, University of Calgary. "This book could replace the computer organization texts now in use that every CS and CpE student must take. . . . It is much needed, well written, and thoughtful." —Professor Larry Bernstein, Stevens Institute of Technology. A distinctive, educational text on software performance and scalability. This is the first book to take a quantitative approach to the subject of software performance and scalability

  9. Approaches towards airport economic performance measurement

    Directory of Open Access Journals (Sweden)

    Ivana STRYČEKOVÁ

    2011-01-01

    Full Text Available The paper assesses how airports use economic benchmarking as a means of performance measurement and comparison among major international airports. The study focuses on current benchmarking practices and methods, taking into account the different factors by which airport performance can be efficiently benchmarked. The methods considered are mainly data envelopment analysis and stochastic frontier analysis; other approaches that airports use for economic benchmarking are also discussed. The main objective of this article is to evaluate the efficiency of airports and to answer some open questions concerning their economic benchmarking.

  10. RWater - A Novel Cyber-enabled Data-driven Educational Tool for Interpreting and Modeling Hydrologic Processes

    Science.gov (United States)

    Rajib, M. A.; Merwade, V.; Zhao, L.; Song, C.

    2014-12-01

    Explaining the complex cause-and-effect relationships in the hydrologic cycle can often be challenging in a classroom with the use of traditional teaching approaches. With the availability of observed rainfall, streamflow and other hydrology data on the internet, it is possible to provide the necessary tools to students to explore these relationships and enhance their learning experience. From this perspective, a new online educational tool, called RWater, is developed using Purdue University's HUBzero technology. RWater's unique features include: (i) its accessibility, including the R software, from any Java-supported web browser; (ii) no installation of any software on the user's computer; (iii) all the work and resulting data are stored in the user's working directory on the RWater server; and (iv) no prior programming experience with R software is necessary. In its current version, RWater can dynamically extract streamflow data from any USGS gaging station without any need for post-processing for use in the educational modules. By following data-driven modules, students can write small scripts in R and thereby create visualizations to identify the effect of rainfall distribution and watershed characteristics on runoff generation, investigate the impacts of land use and climate change on streamflow, and explore the changes in extreme hydrologic events in actual locations. Each module contains relevant definitions, instructions on data extraction and coding, as well as conceptual questions based on the possible analyses which the students would perform. In order to assess its suitability for classroom implementation, and to evaluate users' perception of its utility, the current version of RWater has been tested with three different groups: (i) high school students; (ii) middle and high school teachers; and (iii) upper undergraduate/graduate students. The survey results from these trials suggest that RWater has potential to improve students' understanding on various

  11. Data-Driven Derivation of an "Informer Compound Set" for Improved Selection of Active Compounds in High-Throughput Screening.

    Science.gov (United States)

    Paricharak, Shardul; IJzerman, Adriaan P; Jenkins, Jeremy L; Bender, Andreas; Nigsch, Florian

    2016-09-26

    Despite the usefulness of high-throughput screening (HTS) in drug discovery, for some systems, low assay throughput or high screening cost can prohibit the screening of large numbers of compounds. In such cases, iterative cycles of screening involving active learning (AL) are employed, creating the need for smaller "informer sets" that can be routinely screened to build predictive models for selecting compounds from the screening collection for follow-up screens. Here, we present a data-driven derivation of an informer compound set with improved predictivity of active compounds in HTS, and we validate its benefit over randomly selected training sets on 46 PubChem assays comprising at least 300,000 compounds and covering a wide range of assay biology. The informer compound set showed improvement in BEDROC(α = 100), PRAUC, and ROCAUC values averaged over all assays of 0.024, 0.014, and 0.016, respectively, compared to randomly selected training sets, all with paired t-test p-values agnostic fashion. This approach led to a consistent improvement in hit rates in follow-up screens without compromising scaffold retrieval. The informer set is adjustable in size depending on the number of compounds one intends to screen, as performance gains are realized for sets with more than 3,000 compounds, and this set is therefore applicable to a variety of situations. Finally, our results indicate that random sampling may not adequately cover descriptor space, drawing attention to the importance of the composition of the training set for predicting actives.

  12. Tracking Invasive Alien Species (TrIAS): Building a data-driven framework to inform policy

    Directory of Open Access Journals (Sweden)

    Sonia Vanderhoeven

    2017-05-01

    initiatives, including citizen science with a wide taxonomic scope from marine, terrestrial and freshwater environments. Observation data will be funnelled in repeatable ways to GBIF. In parallel, a Belgian checklist of AS will be established, benefiting from various taxonomic and project-based checklists foreseen for GBIF publication. The combination of the observation data and the checklist will feed indicators for the identification of emerging species; their level of invasion in Belgium; changes in their invasion status and the identification of areas and species of concern that could be impacted upon by bioinvasions. Data-driven risk evaluation of identified emerging species will be supported by niche and climate modelling and consequent risk mapping using critical climatic variables for the current and projected future climate periods at high resolution. The resulting risk maps will complement risk assessments performed with the recently developed Harmonia+ protocol to assess risks posed by emergent species to biodiversity and human, plant, and animal health. The use of open data will ensure that interested stakeholders in Belgium and abroad can make use of the information we generate. The open science ensures everyone is free to adopt and adapt the workflow for different scenarios and regions. The checklist will be used at national level, but will also serve as the Belgian reference for international databases (IUCN - GRIIS, EASIN) and impact assessments (IPBES, SEBI). The workflow will be showcased through GEO BON, the Invasivesnet network and the COST Actions Alien Challenge and ParrotNet. The observations and outcomes of risk evaluations will be used to provide science-based support for the implementation of IAS policies at the regional, federal and EU levels. The publication of Belgian data and checklists on IAS is particularly timely in light of the currently ongoing EU IAS Regulation and its implementation in Belgium. 
By proving that automated workflows can provide

  13. Data-driven models of dominantly-inherited Alzheimer's disease progression.

    Science.gov (United States)

    Oxtoby, Neil P; Young, Alexandra L; Cash, David M; Benzinger, Tammie L S; Fagan, Anne M; Morris, John C; Bateman, Randall J; Fox, Nick C; Schott, Jonathan M; Alexander, Daniel C

    2018-03-22

    Dominantly-inherited Alzheimer's disease is widely hoped to hold the key to developing interventions for sporadic late onset Alzheimer's disease. We use emerging techniques in generative data-driven disease progression modelling to characterize dominantly-inherited Alzheimer's disease progression with unprecedented resolution, and without relying upon familial estimates of years until symptom onset. We retrospectively analysed biomarker data from the sixth data freeze of the Dominantly Inherited Alzheimer Network observational study, including measures of amyloid proteins and neurofibrillary tangles in the brain, regional brain volumes and cortical thicknesses, brain glucose hypometabolism, and cognitive performance from the Mini-Mental State Examination (all adjusted for age, years of education, sex, and head size, as appropriate). Data included 338 participants with known mutation status (211 mutation carriers in three subtypes: 163 PSEN1, 17 PSEN2, and 31 APP) and a baseline visit (age 19-66; up to four visits each, 1.1 ± 1.9 years in duration; spanning 30 years before, to 21 years after, parental age of symptom onset). We used an event-based model to estimate sequences of biomarker changes from baseline data across disease subtypes (mutation groups), and a differential equation model to estimate biomarker trajectories from longitudinal data (up to 66 mutation carriers, all subtypes combined). The two models concur that biomarker abnormality proceeds as follows: amyloid deposition in cortical then subcortical regions (∼24 ± 11 years before onset); phosphorylated tau (17 ± 8 years), tau and amyloid-β changes in cerebrospinal fluid; neurodegeneration first in the putamen and nucleus accumbens (up to 6 ± 2 years); then cognitive decline (7 ± 6 years), cerebral hypometabolism (4 ± 4 years), and further regional neurodegeneration. Our models predicted symptom onset more accurately than predictions that used familial estimates: root mean squared error of 1

  14. Limited angle CT reconstruction by simultaneous spatial and Radon domain regularization based on TV and data-driven tight frame

    Science.gov (United States)

    Zhang, Wenkun; Zhang, Hanming; Wang, Linyuan; Cai, Ailong; Li, Lei; Yan, Bin

    2018-02-01

    Limited angle computed tomography (CT) reconstruction is widely performed in medical diagnosis and industrial testing because of the size of objects, engine/armor inspection requirements, and limited scan flexibility. Limited angle reconstruction necessitates the use of optimization-based methods that utilize additional sparse priors. However, most conventional methods solely exploit sparsity priors of the spatial domain. When the CT projection suffers from serious data deficiency or various kinds of noise, obtaining reconstructed images that meet the quality requirement becomes difficult. To solve this problem, this paper developed an adaptive reconstruction method for the limited angle CT problem. The proposed method simultaneously uses a spatial and Radon domain regularization model based on total variation (TV) and a data-driven tight frame. The data-driven tight frame, derived from the wavelet transformation, aims at exploiting sparsity priors of the sinogram in the Radon domain. Unlike existing works that utilize a pre-constructed sparse transformation, the framelets of the data-driven regularization model can be adaptively learned from the latest projection data in the process of iterative reconstruction to provide optimal sparse approximations for a given sinogram. At the same time, an effective alternating direction method is designed to solve the simultaneous spatial and Radon domain regularization model. The experiments on both simulated and real data demonstrate that the proposed algorithm shows better performance in artifact suppression and detail preservation than algorithms that solely use a regularization model of the spatial domain. Quantitative evaluations of the results also indicate that the proposed algorithm, which applies a learning strategy, performs better than dual-domain algorithms without a learned regularization model

  15. Optimizing preventive maintenance policy: A data-driven application for a light rail braking system.

    Science.gov (United States)

    Corman, Francesco; Kraijema, Sander; Godjevac, Milinko; Lodewijks, Gabriel

    2017-10-01

    This article presents a case study determining the optimal preventive maintenance policy for a light rail rolling stock system in terms of reliability, availability, and maintenance costs. The maintenance policy defines one of the three predefined preventive maintenance actions at fixed time-based intervals for each of the subsystems of the braking system. Based on work, maintenance, and failure data, we model the reliability degradation of the system and its subsystems under the current maintenance policy by a Weibull distribution. We then analytically determine the relation between reliability, availability, and maintenance costs. We validate the model against recorded reliability and availability and get further insights by a dedicated sensitivity analysis. The model is then used in a sequential optimization framework determining preventive maintenance intervals to improve on the key performance indicators. We show the potential of data-driven modelling to determine optimal maintenance policy: same system availability and reliability can be achieved with 30% maintenance cost reduction, by prolonging the intervals and re-grouping maintenance actions.
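    The Weibull reliability model mentioned in this abstract can be illustrated with a minimal sketch. The two-parameter form R(t) = exp(-(t/scale)^shape) is the standard Weibull survival function; the function names and all parameter values below are hypothetical and are not taken from the article, whose actual cost and availability model is not reproduced here.

```python
import math


def weibull_reliability(t, scale, shape):
    """Probability that a component survives to time t (two-parameter Weibull)."""
    return math.exp(-((t / scale) ** shape))


def interval_for_reliability(target, scale, shape):
    """Longest maintenance interval that keeps reliability at or above `target`.

    Obtained by inverting R(t) = target for t.
    """
    if not 0.0 < target < 1.0:
        raise ValueError("target reliability must lie strictly between 0 and 1")
    return scale * (-math.log(target)) ** (1.0 / shape)


# Hypothetical subsystem: characteristic life 1000 days, wear-out behaviour (shape > 1).
interval = interval_for_reliability(0.9, scale=1000.0, shape=2.0)
print(round(interval, 1), weibull_reliability(interval, 1000.0, 2.0))
```

    Prolonging an interval, as the abstract describes, corresponds to accepting a lower target reliability at the end of each interval in exchange for fewer maintenance actions.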

  16. Effective Data-Driven Calibration for a Galvanometric Laser Scanning System Using Binocular Stereo Vision.

    Science.gov (United States)

    Tu, Junchao; Zhang, Liyan

    2018-01-12

    A new solution to the problem of galvanometric laser scanning (GLS) system calibration is presented. Under the machine learning framework, we build a single-hidden layer feedforward neural network (SLFN) to represent the GLS system, which takes the digital control signal at the drives of the GLS system as input and the space vector of the corresponding outgoing laser beam as output. The training data set is obtained with the aid of a moving mechanism and a binocular stereo system. The parameters of the SLFN are efficiently solved in a closed form by using extreme learning machine (ELM). By quantitatively analyzing the regression precision with respect to the number of hidden neurons in the SLFN, we demonstrate that the proper number of hidden neurons can be safely chosen from a broad interval to guarantee good generalization performance. Compared to the traditional model-driven calibration, the proposed calibration method does not need a complex modeling process and is more accurate and stable. As the output of the network is the space vectors of the outgoing laser beams, it costs much less training time and can provide a uniform solution to both laser projection and 3D-reconstruction, in contrast with the existing data-driven calibration method which only works for the laser triangulation problem. Calibration experiment, projection experiment and 3D reconstruction experiment are respectively conducted to test the proposed method, and good results are obtained.
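    The closed-form training step of an extreme learning machine can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' calibration code: it maps a scalar input to a scalar output (the GLS system maps control signals to beam vectors), uses tanh hidden units, and adds a small ridge term to the normal equations for numerical stability. The names `train_elm`, `predict_elm` and `solve_linear` are hypothetical.

```python
import math
import random


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def train_elm(xs, ts, n_hidden=20, ridge=1e-6, seed=0):
    """Fix random hidden weights, then solve for the output weights in closed form."""
    rng = random.Random(seed)
    w = [rng.uniform(-2.0, 2.0) for _ in range(n_hidden)]  # never trained
    b = [rng.uniform(-2.0, 2.0) for _ in range(n_hidden)]  # never trained
    h = [[math.tanh(wj * x + bj) for wj, bj in zip(w, b)] for x in xs]
    # Regularised normal equations: (H^T H + ridge * I) beta = H^T t
    gram = [[sum(row[i] * row[j] for row in h) + (ridge if i == j else 0.0)
             for j in range(n_hidden)] for i in range(n_hidden)]
    rhs = [sum(row[i] * t for row, t in zip(h, ts)) for i in range(n_hidden)]
    return w, b, solve_linear(gram, rhs)


def predict_elm(model, x):
    w, b, beta = model
    return sum(bk * math.tanh(wk * x + ck) for wk, ck, bk in zip(w, b, beta))


# Toy regression target standing in for the calibration mapping.
xs = [-3.0 + 6.0 * i / 59 for i in range(60)]
ts = [math.sin(x) for x in xs]
model = train_elm(xs, ts)
rmse = math.sqrt(sum((predict_elm(model, x) - t) ** 2 for x, t in zip(xs, ts)) / len(xs))
print(rmse)
```

    Because only the output weights are solved for, training reduces to one linear solve, which is the property the abstract refers to as solving the SLFN parameters "in a closed form".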

  17. Effective Data-Driven Calibration for a Galvanometric Laser Scanning System Using Binocular Stereo Vision

    Directory of Open Access Journals (Sweden)

    Junchao Tu

    2018-01-01

    Full Text Available A new solution to the problem of galvanometric laser scanning (GLS) system calibration is presented. Under the machine learning framework, we build a single-hidden layer feedforward neural network (SLFN) to represent the GLS system, which takes the digital control signal at the drives of the GLS system as input and the space vector of the corresponding outgoing laser beam as output. The training data set is obtained with the aid of a moving mechanism and a binocular stereo system. The parameters of the SLFN are efficiently solved in a closed form by using extreme learning machine (ELM). By quantitatively analyzing the regression precision with respect to the number of hidden neurons in the SLFN, we demonstrate that the proper number of hidden neurons can be safely chosen from a broad interval to guarantee good generalization performance. Compared to the traditional model-driven calibration, the proposed calibration method does not need a complex modeling process and is more accurate and stable. As the output of the network is the space vectors of the outgoing laser beams, it costs much less training time and can provide a uniform solution to both laser projection and 3D-reconstruction, in contrast with the existing data-driven calibration method which only works for the laser triangulation problem. Calibration experiment, projection experiment and 3D reconstruction experiment are respectively conducted to test the proposed method, and good results are obtained.

  18. VLAM-G: Interactive Data Driven Workflow Engine for Grid-Enabled Resources

    Directory of Open Access Journals (Sweden)

    Vladimir Korkhov

    2007-01-01

    Full Text Available Grid brings the power of many computers to scientists. However, the development of Grid-enabled applications requires knowledge about Grid infrastructure and low-level API to Grid services. In turn, workflow management systems provide a high-level environment for rapid prototyping of experimental computing systems. Coupling Grid and workflow paradigms is important for the scientific community: it makes the power of the Grid easily available to the end user. The paradigm of data driven workflow execution is one of the ways to enable distributed workflow on the Grid. The work presented in this paper is carried out in the context of the Virtual Laboratory for e-Science project. We present the VLAM-G workflow management system and its core component: the Run-Time System (RTS). The RTS is a dataflow driven workflow engine which utilizes Grid resources, hiding the complexity of the Grid from a scientist. Special attention is paid to the concept of dataflow and direct data streaming between distributed workflow components. We present the architecture and components of the RTS, describe the features of VLAM-G workflow execution, and evaluate the system by performance measurements and a real life use case.

  19. Data-driven classification of ventilated lung tissues using electrical impedance tomography

    International Nuclear Information System (INIS)

    Gómez-Laberge, Camille; Hogan, Matthew J; Elke, Gunnar; Weiler, Norbert; Frerichs, Inéz; Adler, Andy

    2011-01-01

    Current methods for identifying ventilated lung regions utilizing electrical impedance tomography images rely on dividing the image into arbitrary regions of interest (ROI), manually delineating ROI, or forming ROI with pixels whose signal properties surpass an arbitrary threshold. In this paper, we propose a novel application of a data-driven classification method to identify ventilated lung ROI based on forming k clusters from pixels with correlated signals. A standard first-order model for lung mechanics is then applied to determine which ROI correspond to ventilated lung tissue. We applied the method in an experimental study of 16 mechanically ventilated swine in the supine position, which underwent changes in positive end-expiratory pressure (PEEP) and fraction of inspired oxygen (FIO2). In each stage of the experimental protocol, the method performed best with k = 4 and consistently identified 3 lung tissue ROI and 1 boundary tissue ROI in 15 of the 16 subjects. When testing for changes from baseline in lung position, tidal volume, and respiratory system compliance, we found that PEEP displaced the ventilated lung region dorsally by 2 cm, decreased tidal volume by 1.3%, and increased the respiratory system compliance time constant by 0.3 s. FIO2 decreased tidal volume by 0.7%. All effects were tested at p < 0.05 with n = 16. These findings suggest that the proposed ROI detection method is robust and sensitive to ventilation dynamics in the experimental setting

  20. On the selection of user-defined parameters in data-driven stochastic subspace identification

    Science.gov (United States)

    Priori, C.; De Angelis, M.; Betti, R.

    2018-02-01

    The paper focuses on the time domain output-only technique called Data-Driven Stochastic Subspace Identification (DD-SSI); in order to identify modal models (frequencies, damping ratios and mode shapes), the role of its user-defined parameters is studied, and rules to determine their minimum values are proposed. Such investigation is carried out using, first, the time histories of structural responses to stationary excitations, with a large number of samples, satisfying the hypothesis on the input imposed by DD-SSI. Then, the case of non-stationary seismic excitations with a reduced number of samples is considered. In this paper, partitions of the data matrix different from the one proposed in the SSI literature are investigated, together with the influence of different choices of the weighting matrices. The study is carried out considering two different applications: (1) data obtained from vibration tests on a scaled structure and (2) in-situ tests on a reinforced concrete building. Referring to the former, the identification of a steel frame structure tested on a shaking table is performed using its responses in terms of absolute accelerations to a stationary (white noise) base excitation and to non-stationary seismic excitations of low intensity. Black-box and modal models are identified in both cases and the results are compared with those from an input-output subspace technique. With regards to the latter, the identification of a complex hospital building is conducted using data obtained from ambient vibration tests.

  1. Performance Optimization in Sport: A Psychophysiological Approach

    Directory of Open Access Journals (Sweden)

    Selenia di Fronso

    2017-11-01

    Full Text Available ABSTRACT In the last 20 years, there has been growing interest in the study of the theoretical and applied issues surrounding the psychophysiological processes underlying performance. Psychophysiological monitoring, which enables the study of these processes, consists of the assessment of the activation and functioning level of the organism using a multidimensional approach. In sport, it can be used to attain a better understanding of the processes underlying athletic performance and to improve it. The most frequently used ecological techniques include electromyography (EMG), electrocardiography (ECG), electroencephalography (EEG), and the assessment of electrodermal activity and breathing rhythm. The purpose of this paper is to offer an overview of the use of these techniques in applied interventions in sport and physical exercise and to give athletes, coaches and sport psychology experts new insights for performance improvement.

  2. Experimental investigation of the predictive capabilities of data driven modeling techniques in hydrology - Part 2: Application

    Directory of Open Access Journals (Sweden)

    A. Elshorbagy

    2010-10-01

    Full Text Available In this second part of the two-part paper, the data driven modeling (DDM) experiment, presented and explained in the first part, is implemented. Inputs for the five case studies (half-hourly actual evapotranspiration, daily peat soil moisture, daily till soil moisture, and two daily rainfall-runoff datasets) are identified, either based on previous studies or using the mutual information content. Twelve groups (realizations) were randomly generated from each dataset by randomly sampling without replacement from the original dataset. Neural networks (ANNs), genetic programming (GP), evolutionary polynomial regression (EPR), Support vector machines (SVM), M5 model trees (M5), K-nearest neighbors (K-nn), and multiple linear regression (MLR) techniques are implemented and applied to each of the 12 realizations of each case study. The predictive accuracy and uncertainties of the various techniques are assessed using multiple average overall error measures, scatter plots, frequency distribution of model residuals, and the deterioration rate of prediction performance during the testing phase. Gamma test is used as a guide to assist in selecting the appropriate modeling technique. Unlike two nonlinear soil moisture case studies, the results of the experiment conducted in this research study show that ANNs were a sub-optimal choice for the actual evapotranspiration and the two rainfall-runoff case studies. GP is the most successful technique due to its ability to adapt the model complexity to the modeled data. EPR performance could be close to GP with datasets that are more linear than nonlinear. SVM is sensitive to the kernel choice and if appropriately selected, the performance of SVM can improve. M5 performs very well with linear and semi linear data, which cover wide range of hydrological situations. In highly nonlinear case studies, ANNs, K-nn, and GP could be more successful than other modeling techniques. 
K-nn is also successful in linear situations, and it

  3. Experimental investigation of the predictive capabilities of data driven modeling techniques in hydrology - Part 2: Application

    Science.gov (United States)

    Elshorbagy, A.; Corzo, G.; Srinivasulu, S.; Solomatine, D. P.

    2010-10-01

    In this second part of the two-part paper, the data driven modeling (DDM) experiment, presented and explained in the first part, is implemented. Inputs for the five case studies (half-hourly actual evapotranspiration, daily peat soil moisture, daily till soil moisture, and two daily rainfall-runoff datasets) are identified, either based on previous studies or using the mutual information content. Twelve groups (realizations) were randomly generated from each dataset by randomly sampling without replacement from the original dataset. Neural networks (ANNs), genetic programming (GP), evolutionary polynomial regression (EPR), Support vector machines (SVM), M5 model trees (M5), K-nearest neighbors (K-nn), and multiple linear regression (MLR) techniques are implemented and applied to each of the 12 realizations of each case study. The predictive accuracy and uncertainties of the various techniques are assessed using multiple average overall error measures, scatter plots, frequency distribution of model residuals, and the deterioration rate of prediction performance during the testing phase. Gamma test is used as a guide to assist in selecting the appropriate modeling technique. Unlike two nonlinear soil moisture case studies, the results of the experiment conducted in this research study show that ANNs were a sub-optimal choice for the actual evapotranspiration and the two rainfall-runoff case studies. GP is the most successful technique due to its ability to adapt the model complexity to the modeled data. EPR performance could be close to GP with datasets that are more linear than nonlinear. SVM is sensitive to the kernel choice and if appropriately selected, the performance of SVM can improve. M5 performs very well with linear and semi linear data, which cover wide range of hydrological situations. In highly nonlinear case studies, ANNs, K-nn, and GP could be more successful than other modeling techniques. K-nn is also successful in linear situations, and it should

  4. Data-Driven Method for Wind Turbine Yaw Angle Sensor Zero-Point Shifting Fault Detection

    Directory of Open Access Journals (Sweden)

    Yan Pei

    2018-03-01

    Full Text Available Wind turbine yaw control plays an important role in increasing the wind turbine production and also in protecting the wind turbine. Accurate measurement of yaw angle is the basis of an effective wind turbine yaw controller. The accuracy of yaw angle measurement is affected significantly by the problem of zero-point shifting. Hence, it is essential to evaluate the zero-point shifting error on wind turbines on-line in order to improve the reliability of yaw angle measurement in real time. Particularly, qualitative evaluation of the zero-point shifting error could be useful for wind farm operators to realize prompt and cost-effective maintenance on yaw angle sensors. With the aim of qualitatively evaluating the zero-point shifting error, the yaw angle sensor zero-point shifting fault is first defined in this paper. A data-driven method is then proposed to detect the zero-point shifting fault based on Supervisory Control and Data Acquisition (SCADA) data. The zero-point shifting fault is detected in the proposed method by analyzing the power performance under different yaw angles. The SCADA data are partitioned into different bins according to both wind speed and yaw angle in order to deeply evaluate the power performance. An indicator is proposed in this method for power performance evaluation under each yaw angle. The yaw angle with the largest indicator is considered as the yaw angle measurement error in our work. A zero-point shifting fault would trigger an alarm if the error is larger than a predefined threshold. Case studies from several actual wind farms proved the effectiveness of the proposed method in detecting zero-point shifting fault and also in improving the wind turbine performance. Results of the proposed method could be useful for wind farm operators to realize prompt adjustment if there exists a large error of yaw angle measurement.
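    The binning-and-indicator procedure described in this abstract can be sketched roughly as follows. The abstract does not define the indicator, so the one used here (mean power in each yaw bin, normalised by the best yaw bin at the same wind speed and averaged over wind-speed bins) is an assumption made purely for illustration, as are the function name `estimate_yaw_offset`, the bin widths, the synthetic data and the alarm threshold.

```python
import math
from collections import defaultdict


def estimate_yaw_offset(records, wind_bin=2.0, yaw_bin=5.0):
    """Return (yaw bin centre with the largest indicator, indicator per yaw bin).

    records: iterable of (wind_speed_m_s, measured_yaw_deg, power_kw) tuples.
    The indicator is a hypothetical stand-in for the one proposed in the paper.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for wind_speed, yaw_angle, power in records:
        w = math.floor(wind_speed / wind_bin)
        y = math.floor((yaw_angle + yaw_bin / 2.0) / yaw_bin)  # nearest yaw bin
        cell = sums[(w, y)]
        cell[0] += power
        cell[1] += 1
    mean_power = {key: total / count for key, (total, count) in sums.items()}

    best_per_wind = defaultdict(float)
    for (w, _), p in mean_power.items():
        best_per_wind[w] = max(best_per_wind[w], p)

    scores = defaultdict(list)
    for (w, y), p in mean_power.items():
        if best_per_wind[w] > 0.0:
            scores[y].append(p / best_per_wind[w])

    indicator = {y * yaw_bin: sum(v) / len(v) for y, v in scores.items()}
    return max(indicator, key=indicator.get), indicator


# Synthetic SCADA-like data: power peaks when the measured yaw equals a 10 degree shift.
true_shift = 10.0
records = [
    (v, yaw, v ** 3 * math.cos(math.radians(yaw - true_shift)) ** 3)
    for v in (5.0, 7.0, 9.0)
    for yaw in range(-20, 25, 5)
]
estimate, indicator = estimate_yaw_offset(records)
alarm = abs(estimate) > 5.0  # hypothetical alarm threshold in degrees
print(estimate, alarm)
```

    Normalising within each wind-speed bin keeps high-wind records from dominating the comparison between yaw bins; other normalisations would be equally plausible readings of the abstract.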

  5. CEREF: A hybrid data-driven model for forecasting annual streamflow from a socio-hydrological system

    Science.gov (United States)

    Zhang, Hongbo; Singh, Vijay P.; Wang, Bin; Yu, Yinghao

    2016-09-01

    Hydrological forecasting is complicated by flow regime alterations in a coupled socio-hydrologic system, encountering increasingly non-stationary, nonlinear and irregular changes, which make decision support difficult for future water resources management. Currently, many hybrid data-driven models, based on the decomposition-prediction-reconstruction principle, have been developed to improve the ability to make predictions of annual streamflow. However, many problems require further investigation, chief among which is that the direction of the trend components decomposed from annual streamflow series is always difficult to ascertain. In this paper, a hybrid data-driven model, called the CEREF model, was proposed to address this issue; it combines empirical mode decomposition (EMD), radial basis function neural networks (RBFNN), and an external forces (EF) variable. The hybrid model employed EMD for decomposition and RBFNN for intrinsic mode function (IMF) forecasting, and determined future trend component directions by regression with EF as basin water demand representing the social component in the socio-hydrologic system. The Wuding River basin was considered for the case study, and two standard statistical measures, root mean squared error (RMSE) and mean absolute error (MAE), were used to evaluate the performance of the CEREF model and compare it with other models: the autoregressive (AR), RBFNN and EMD-RBFNN. Results indicated that the CEREF model had lower RMSE and MAE statistics, 42.8% and 7.6%, respectively, than did other models, and provided a superior alternative for forecasting annual runoff in the Wuding River basin. Moreover, the CEREF model can enlarge the effective intervals of streamflow forecasting compared to the EMD-RBFNN model by introducing the water demand planned by the government department to improve long-term prediction accuracy. In addition, we considered the high-frequency component, a frequent subject of concern in EMD

  6. Data-driven modeling and predictive control for boiler-turbine unit using fuzzy clustering and subspace methods.

    Science.gov (United States)

    Wu, Xiao; Shen, Jiong; Li, Yiguo; Lee, Kwang Y

    2014-05-01

    This paper develops a novel data-driven fuzzy modeling strategy and predictive controller for a boiler-turbine unit using fuzzy clustering and subspace identification (SID) methods. To deal with the nonlinear behavior of the boiler-turbine unit, fuzzy clustering is used to provide an appropriate division of the operation region and develop the structure of the fuzzy model. Then, by combining the input data with the corresponding fuzzy membership functions, the SID method is extended to extract the local state-space model parameters. Owing to the advantages of both methods, the resulting fuzzy model can represent the boiler-turbine unit very closely, and a fuzzy model predictive controller is designed based on this model. As an alternative approach, a direct data-driven fuzzy predictive control is also developed following the same clustering and subspace methods, where intermediate subspace matrices developed during the identification procedure are utilized directly as the predictor. Simulation results show the advantages and effectiveness of the proposed approach. Copyright © 2014 ISA. Published by Elsevier Ltd. All rights reserved.

  7. Protein engineering of Bacillus acidopullulyticus pullulanase for enhanced thermostability using in silico data driven rational design methods.

    Science.gov (United States)

    Chen, Ana; Li, Yamei; Nie, Jianqi; McNeil, Brian; Jeffrey, Laura; Yang, Yankun; Bai, Zhonghu

    2015-10-01

    Thermostability has been considered a requirement in the starch processing industry to maintain high catalytic activity of pullulanase at high temperatures. Four data-driven rational design methods (B-FITTER, proline theory, PoPMuSiC-2.1, and sequence consensus approach) were adopted to identify key residues potentially linked to thermostability, and 39 residues of Bacillus acidopullulyticus pullulanase were chosen as mutagenesis targets. Single mutagenesis followed by combined mutagenesis resulted in the best mutant E518I-S662R-Q706P, which exhibited an 11-fold half-life improvement at 60 °C and a 9.5 °C increase in Tm. The optimum temperature of the mutant increased from 60 to 65 °C. Fluorescence spectroscopy results demonstrated that the tertiary structure of the mutant enzyme was more compact than that of the wild-type (WT) enzyme. Structural change analysis revealed that the increase in thermostability was most probably caused by a combination of lower stability free-energy and higher hydrophobicity of E518I, more hydrogen bonds of S662R, and higher rigidity of Q706P compared with the WT. The findings demonstrated the effectiveness of combined data-driven rational design approaches in engineering an industrial enzyme to improve thermostability. Copyright © 2015 Elsevier Inc. All rights reserved.

  8. Open-source chemogenomic data-driven algorithms for predicting drug-target interactions.

    Science.gov (United States)

    Hao, Ming; Bryant, Stephen H; Wang, Yanli

    2018-02-06

    While novel technologies such as high-throughput screening have advanced together with significant investment by pharmaceutical companies during the past decades, the success rate for drug development has not yet improved, prompting researchers to look for new strategies of drug discovery. Drug repositioning is a potential approach to solve this dilemma. However, experimental identification and validation of potential drug targets encoded by the human genome is both costly and time-consuming. Therefore, effective computational approaches have been proposed to facilitate drug repositioning, and these have proved to be successful in drug discovery. Undoubtedly, the availability of openly accessible data from basic chemical biology research and the success of human genome sequencing are crucial to developing effective in silico drug repositioning methods that allow the identification of potential targets for existing drugs. In this work, we review several chemogenomic data-driven computational algorithms with publicly accessible source code for predicting drug-target interactions (DTIs). We organize these algorithms by model properties and model evolutionary relationships. We re-implemented five representative algorithms in the R programming language and compared them by means of mean percentile ranking, a new recall-based evaluation metric in the DTI prediction research field. We anticipate that this review will be objective and helpful to researchers who would like to further improve existing algorithms or need to choose appropriate algorithms to infer potential DTIs in their projects. The source code for DTI prediction is available at: https://github.com/minghao2016/chemogenomicAlg4DTIpred. Published by Oxford University Press 2018. This work is written by US Government employees and is in the public domain in the US.

  9. A review on data-driven fault severity assessment in rolling bearings

    Science.gov (United States)

    Cerrada, Mariela; Sánchez, René-Vinicio; Li, Chuan; Pacheco, Fannia; Cabrera, Diego; Valente de Oliveira, José; Vásquez, Rafael E.

    2018-01-01

    Health condition monitoring of rotating machinery is a crucial task to guarantee reliability in industrial processes. In particular, bearings are mechanical components used in most rotating devices and they represent the main source of faults in such equipment, which is why research activities on detecting and diagnosing their faults have increased. Fault detection aims at identifying whether or not the device is in a fault condition, and diagnosis is commonly oriented towards identifying the fault mode of the device after detection. An important step after fault detection and diagnosis is the analysis of the magnitude or the degradation level of the fault, because this supports the decision-making process in condition-based maintenance. However, no extensive works are devoted to analysing this problem, and some works tackle it from the fault diagnosis point of view. Roughly speaking, fault severity is associated with the magnitude of the fault. In bearings, fault severity can be related to the physical size of the fault or to a general degradation of the component. Because the literature regarding the severity assessment of bearing damage is limited, this paper discusses the recent methods and techniques used to evaluate fault severity in the main components of rolling bearings, such as the inner race, outer race, and ball. The review is mainly focused on data-driven approaches such as signal processing for extracting the proper fault signatures associated with the damage degradation, and learning approaches that are used to identify degradation patterns with regard to health conditions. Finally, new challenges are highlighted in order to develop new contributions in this field.

  10. Data-Driven Modeling of Complex Systems by means of a Dynamical ANN

    Science.gov (United States)

    Seleznev, A.; Mukhin, D.; Gavrilov, A.; Loskutov, E.; Feigin, A.

    2017-12-01

    Data-driven methods for modeling and prognosis of complex dynamical systems are becoming more and more popular in various fields due to the growth of high-resolution data. We distinguish two basic steps in such an approach: (i) determining the phase subspace of the system, or embedding, from available time series and (ii) constructing an evolution operator acting in this reduced subspace. In this work we suggest a novel approach combining these two steps by constructing an artificial neural network (ANN) with a special topology. The proposed ANN-based model, on the one hand, projects the data onto a low-dimensional manifold and, on the other hand, models a dynamical system on this manifold. In effect, this is a recurrent multilayer ANN which has internal dynamics and is capable of generating time series. A very important point of the proposed methodology is the optimization of the model, which allows us to avoid overfitting: we use a Bayesian criterion to optimize the ANN structure and estimate both the degree of evolution operator nonlinearity and the complexity of the nonlinear manifold onto which the data are projected. The proposed modeling technique will be applied to the analysis of high-dimensional dynamical systems: the Lorenz'96 model of atmospheric turbulence, producing high-dimensional space-time chaos, and a quasi-geostrophic three-layer model of the Earth's atmosphere with natural orography, describing the dynamics of synoptic vortices as well as mesoscale blocking systems. The possibility of applying the proposed methodology to the analysis of real measured data is also discussed. The study was supported by the Russian Science Foundation (grant #16-12-10198).
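
    Step (i) above, recovering a phase-space representation from a time series, can be illustrated with classical time-delay embedding. This is a swapped-in textbook technique, not the ANN-based projection the authors propose, and the abstract does not say how their embedding is built. The function name and the parameters `dim` and `lag` are assumptions made for this sketch.

```python
def delay_embed(series, dim, lag):
    """Build delay vectors [x[t], x[t + lag], ..., x[t + (dim - 1) * lag]].

    Classical time-delay embedding, shown only to illustrate what
    "embedding from a time series" can mean; it is not the ANN-based
    projection described in the abstract.
    """
    if dim < 1 or lag < 1:
        raise ValueError("dim and lag must be positive integers")
    span = (dim - 1) * lag  # distance between first and last coordinate
    return [
        [series[t + k * lag] for k in range(dim)]
        for t in range(len(series) - span)
    ]


# Hypothetical scalar time series, for illustration only.
series = [0, 1, 2, 3, 4, 5]
vectors = delay_embed(series, dim=3, lag=2)
print(vectors)
```

    An evolution operator for step (ii) would then be fitted to map each delay vector to its successor.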

  11. Weather models as virtual sensors to data-driven rainfall predictions in urban watersheds

    Science.gov (United States)

    Cozzi, Lorenzo; Galelli, Stefano; Pascal, Samuel Jolivet De Marc; Castelletti, Andrea

    2013-04-01

    Weather and climate predictions are a key element of urban hydrology, where they are used to inform water management and assist in delivering flood warnings. Indeed, the modelling of the very fast dynamics of urbanized catchments can be substantially improved by the use of weather/rainfall predictions. For example, in Singapore's Marina Reservoir catchment, runoff processes have a very short time of concentration (roughly one hour); observational data are thus nearly useless for runoff predictions, and weather predictions are required. Unfortunately, radar nowcasting methods do not allow long-term weather predictions to be carried out, whereas numerical models are limited by their coarse spatial scale. Moreover, numerical models are usually poorly reliable because of the fast motion and limited spatial extent of rainfall events. In this study we investigate the combined use of data-driven modelling techniques and weather variables observed/simulated with a numerical model as a way to improve rainfall prediction accuracy and lead time in the Singapore metropolitan area. To explore the feasibility of the approach, we use a Weather Research and Forecast (WRF) model as a virtual sensor network for the input variables (the states of the WRF model) to a machine learning rainfall prediction model. More precisely, we combine an input variable selection method and a non-parametric tree-based model to characterize the empirical relation between the rainfall measured at the catchment level and all possible weather input variables provided by the WRF model. We explore different lead times to evaluate the model reliability for different long-term predictions, as well as different time lags to see how past information could improve results. Results show that the proposed approach allows a significant improvement of the prediction accuracy of the WRF model over the Singapore urban area.
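
    The role of lead times and time lags described above can be sketched by pairing lagged predictor values with a target observed some steps ahead. This is a minimal illustration under stated assumptions: a single predictor series stands in for the many WRF state variables, and the function name, `n_lags`, `lead` and the sample values are all hypothetical.

```python
def make_lagged_samples(predictor, target, n_lags, lead):
    """Pair predictor values at t, t-1, ..., t-n_lags+1 with target[t + lead].

    `predictor` and `target` are equal-length sequences sampled at the same
    time steps. Returns a list of (features, label) tuples that could be fed
    to any regression model.
    """
    if len(predictor) != len(target):
        raise ValueError("predictor and target must have the same length")
    if n_lags < 1 or lead < 0:
        raise ValueError("n_lags must be >= 1 and lead must be >= 0")
    samples = []
    for t in range(n_lags - 1, len(target) - lead):
        features = [predictor[t - k] for k in range(n_lags)]  # most recent first
        samples.append((features, target[t + lead]))
    return samples


# Hypothetical simulated weather variable and measured rainfall.
simulated = [1, 2, 3, 4, 5]
rainfall = [10, 20, 30, 40, 50]
samples = make_lagged_samples(simulated, rainfall, n_lags=2, lead=1)
print(samples)
```

    Increasing `lead` moves the prediction target further into the future, while increasing `n_lags` gives the model more past information, at the cost of fewer usable samples.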

  12. Data-driven design of fault diagnosis and fault-tolerant control systems

    CERN Document Server

    Ding, Steven X

    2014-01-01

    Data-driven Design of Fault Diagnosis and Fault-tolerant Control Systems presents basic statistical process monitoring, fault diagnosis, and control methods, and introduces advanced data-driven schemes for the design of fault diagnosis and fault-tolerant control systems catering to the needs of dynamic industrial processes. With ever increasing demands for reliability, availability and safety in technical processes and assets, process monitoring and fault-tolerance have become important issues surrounding the design of automatic control systems. This text shows the reader how, thanks to the rapid development of information technology, key techniques of data-driven and statistical process monitoring and control can now become widely used in industrial practice to address these issues. To allow for self-contained study and facilitate implementation in real applications, important mathematical and control theoretical knowledge and tools are included in this book. Major schemes are presented in algorithm form and...

  13. Data-driven remaining useful life prognosis techniques stochastic models, methods and applications

    CERN Document Server

    Si, Xiao-Sheng; Hu, Chang-Hua

    2017-01-01

    This book introduces data-driven remaining useful life prognosis techniques, and shows how to utilize the condition monitoring data to predict the remaining useful life of stochastic degrading systems and to schedule maintenance and logistics plans. It is also the first book that describes the basic data-driven remaining useful life prognosis theory systematically and in detail. The emphasis of the book is on the stochastic models, methods and applications employed in remaining useful life prognosis. It includes a wealth of degradation monitoring experiment data, practical prognosis methods for remaining useful life in various cases, and a series of applications incorporated into prognostic information in decision-making, such as maintenance-related decisions and ordering spare parts. It also highlights the latest advances in data-driven remaining useful life prognosis techniques, especially in the contexts of adaptive prognosis for linear stochastic degrading systems, nonlinear degradation modeling based pro...

  14. Land use zones and land use patterns in the Atlantic Zone of Costa Rica : a pattern recognition approach to land use inventory at the sub-regional scale, using remote sensing and GIS, applying an object-oriented and data-driven strategy

    NARCIS (Netherlands)

    Huising, J.

    1993-01-01

    This thesis describes an approach to land use inventory at the sub-regional scale in the Guacimo-Rio Jiménez-Siquirres (GRS) area in the Atlantic Zone of Costa Rica. Therefore, the concept of "land use zones" is introduced. The land use zone (LUZ) plays a central role in the definition of

  15. Data-Driven Neural Network Model for Robust Reconstruction of Automobile Casting

    Science.gov (United States)

    Lin, Jinhua; Wang, Yanjie; Li, Xin; Wang, Lu

    2017-09-01

    In computer vision systems, it is a challenging task to robustly reconstruct complex 3D geometries of automobile castings. 3D scanning data are usually corrupted by noise and the scanning resolution is low; these effects normally lead to incomplete matching and drift. To solve these problems, a data-driven local geometric learning model is proposed to achieve robust reconstruction of automobile castings. To reduce the interference of sensor noise and to be compatible with incomplete scanning data, a 3D convolutional neural network is established to match the local geometric features of automobile castings. The proposed neural network combines the geometric feature representation with the correlation metric function to robustly match local correspondences. We use the truncated distance field (TDF) around the key point to represent the 3D surface of the casting geometry, so that the model can be directly embedded into the 3D space to learn the geometric feature representation. Finally, the training labels are automatically generated for deep learning based on the existing RGB-D reconstruction algorithm, which accesses the same global key matching descriptor. The experimental results show that the matching accuracy of our network is 92.2% for automobile castings, and the closed loop rate is about 74.0% when the matching tolerance threshold τ is 0.2. The matching descriptors performed well and retained 81.6% matching accuracy at 95% closed loop. For sparse geometric castings with initial matching failure, the 3D matching object can be reconstructed robustly by training the key descriptors. Our method performs 3D reconstruction robustly for complex automobile castings.

  16. Differentiating Performance Approach Goals and Their Unique Effects

    Science.gov (United States)

    Edwards, Ordene V.

    2014-01-01

    The study differentiates between two types of performance approach goals (competence demonstration performance approach goal and normative performance approach goal) by examining their unique effects on self-efficacy, interest, and fear of failure. Seventy-nine students completed questionnaires that measure performance approach goals,…

  17. WaveSeq: a novel data-driven method of detecting histone modification enrichments using wavelets.

    Directory of Open Access Journals (Sweden)

    Apratim Mitra

    BACKGROUND: Chromatin immunoprecipitation followed by next-generation sequencing is a genome-wide analysis technique that can be used to detect various epigenetic phenomena such as transcription factor binding sites and histone modifications. Histone modification profiles can be either punctate or diffuse, which makes it difficult to distinguish regions of enrichment from background noise. With the discovery of histone marks having a wide variety of enrichment patterns, there is an urgent need for analysis methods that are robust to various data characteristics and capable of detecting a broad range of enrichment patterns. RESULTS: To address these challenges we propose WaveSeq, a novel data-driven method of detecting regions of significant enrichment in ChIP-Seq data. Our approach utilizes the wavelet transform, is free of distributional assumptions and is robust to diverse data characteristics such as low signal-to-noise ratios and broad enrichment patterns. Using publicly available datasets we showed that WaveSeq compares favorably with other published methods, exhibiting high sensitivity and precision for both punctate and diffuse enrichment regions even in the absence of a control data set. The application of our algorithm to a complex histone modification data set helped make novel functional discoveries which further underlined its utility in such an experimental setup. CONCLUSIONS: WaveSeq is a highly sensitive method capable of accurate identification of enriched regions in a broad range of data sets. WaveSeq can detect both narrow and broad peaks with a high degree of accuracy even in low signal-to-noise ratio data sets. WaveSeq is also suited for application in complex experimental scenarios, helping make biologically relevant functional discoveries.

  18. Data-driven, Interpretable Photometric Redshifts Trained on Heterogeneous and Unrepresentative Data

    Energy Technology Data Exchange (ETDEWEB)

    Leistedt, Boris; Hogg, David W., E-mail: boris.leistedt@nyu.edu, E-mail: david.hogg@nyu.edu [Center for Cosmology and Particle Physics, Department of Physics, New York University, New York, NY 10003 (United States)

    2017-03-20

    We present a new method for inferring photometric redshifts in deep galaxy and quasar surveys, based on a data-driven model of latent spectral energy distributions (SEDs) and a physical model of photometric fluxes as a function of redshift. This conceptually novel approach combines the advantages of both machine learning methods and template fitting methods by building template SEDs directly from the spectroscopic training data. This is made computationally tractable with Gaussian processes operating in flux–redshift space, encoding the physics of redshifts and the projection of galaxy SEDs onto photometric bandpasses. This method alleviates the need to acquire representative training data or to construct detailed galaxy SED models; it requires only that the photometric bandpasses and calibrations be known or have parameterized unknowns. The training data can consist of a combination of spectroscopic and deep many-band photometric data with reliable redshifts, which do not need to entirely spatially overlap with the target survey of interest or even involve the same photometric bands. We showcase the method on the i-magnitude-selected, spectroscopically confirmed galaxies in the COSMOS field. The model is trained on the deepest bands (from SUBARU and HST) and photometric redshifts are derived using the shallower SDSS optical bands only. We demonstrate that we obtain accurate redshift point estimates and probability distributions despite the training and target sets having very different redshift distributions, noise properties, and even photometric bands. Our model can also be used to predict missing photometric fluxes or to simulate populations of galaxies with realistic fluxes and redshifts, for example.

  19. Data-driven, Interpretable Photometric Redshifts Trained on Heterogeneous and Unrepresentative Data

    International Nuclear Information System (INIS)

    Leistedt, Boris; Hogg, David W.

    2017-01-01

    We present a new method for inferring photometric redshifts in deep galaxy and quasar surveys, based on a data-driven model of latent spectral energy distributions (SEDs) and a physical model of photometric fluxes as a function of redshift. This conceptually novel approach combines the advantages of both machine learning methods and template fitting methods by building template SEDs directly from the spectroscopic training data. This is made computationally tractable with Gaussian processes operating in flux–redshift space, encoding the physics of redshifts and the projection of galaxy SEDs onto photometric bandpasses. This method alleviates the need to acquire representative training data or to construct detailed galaxy SED models; it requires only that the photometric bandpasses and calibrations be known or have parameterized unknowns. The training data can consist of a combination of spectroscopic and deep many-band photometric data with reliable redshifts, which do not need to entirely spatially overlap with the target survey of interest or even involve the same photometric bands. We showcase the method on the i-magnitude-selected, spectroscopically confirmed galaxies in the COSMOS field. The model is trained on the deepest bands (from SUBARU and HST) and photometric redshifts are derived using the shallower SDSS optical bands only. We demonstrate that we obtain accurate redshift point estimates and probability distributions despite the training and target sets having very different redshift distributions, noise properties, and even photometric bands. Our model can also be used to predict missing photometric fluxes or to simulate populations of galaxies with realistic fluxes and redshifts, for example.

  20. Telling Anthropocene Tales: Localizing the impacts of global change using data-driven story maps

    Science.gov (United States)

    Mychajliw, A.; Hadly, E. A.

    2016-12-01

    Navigating the Anthropocene requires innovative approaches for generating scientific knowledge and for its communication outside academia. The global, synergistic nature of the environmental challenges we face - climate change, human population growth, biodiversity loss, pollution, invasive species and diseases - highlights the need for public outreach strategies that incorporate multiple scales and perspectives in an easily understandable and rapidly accessible format. Data-driven story-telling maps are optimal in that they can display variable geographic scales and their intersections with the environmental challenges relevant to both scientists and non-scientists. Maps are a powerful way to present complex data to all stakeholders. We present an overview of best practices in community-engaged scientific story-telling and data translation for policy-makers by reviewing three Story Map projects that map the geographic impacts of global change across multiple spatial and policy scales: the entire United States, the state of California, and the town of Pescadero, California. We document a chain of translation from a primary scientific manuscript to a policy document (Scientific Consensus Statement on Maintaining Humanity's Life Support Systems in the 21st Century) to a set of interactive ArcGIS Story Maps. We discuss the widening breadth of participants (students, community members) and audiences (White House, Governor's Office of California, California Congressional Offices, general public) involved. We highlight how scientists, through careful curation of popular news media articles and stakeholder interviews, can co-produce these communication modules with community partners such as non-governmental organizations and government agencies. The placement of scientific and citizens' everyday knowledge of global change into an appropriate geographic context allows for effective dissemination by political units such as congressional districts and agency management units

  1. Effective Rating Scale Development for Speaking Tests: Performance Decision Trees

    Science.gov (United States)

    Fulcher, Glenn; Davidson, Fred; Kemp, Jenny

    2011-01-01

    Rating scale design and development for testing speaking is generally conducted using one of two approaches: the measurement-driven approach or the performance data-driven approach. The measurement-driven approach prioritizes the ordering of descriptors onto a single scale. Meaning is derived from the scaling methodology and the agreement of…

  2. A Machine Learning Approach to Discover Rules for Expressive Performance Actions in Jazz Guitar Music

    Science.gov (United States)

    Giraldo, Sergio I.; Ramirez, Rafael

    2016-01-01

    Expert musicians introduce expression in their performances by manipulating sound properties such as timing, energy, pitch, and timbre. Here, we present a data-driven computational approach to induce expressive performance rule models for note duration, onset, energy, and ornamentation transformations in jazz guitar music. We extract high-level features from a set of 16 commercial audio recordings (and corresponding music scores) of jazz guitarist Grant Green in order to characterize the expression in the pieces. We apply machine learning techniques to the resulting features to learn expressive performance rule models. We (1) quantitatively evaluate the accuracy of the induced models, (2) analyse the relative importance of the considered musical features, (3) discuss some of the learnt expressive performance rules in the context of previous work, and (4) assess their generality. The accuracies of the induced predictive models are significantly above base-line levels, indicating that the audio performances and the musical features extracted contain sufficient information to automatically learn informative expressive performance patterns. Feature analysis shows that the most important musical features for predicting expressive transformations are note duration, pitch, metrical strength, phrase position, Narmour structure, and tempo and key of the piece. Similarities and differences between the induced expressive rules and the rules reported in the literature were found. Differences may be due to the fact that most previously studied performance data has consisted of classical music recordings. Finally, the rules' performer specificity/generality is assessed by applying the induced rules to performances of the same pieces performed by two other professional jazz guitar players. Results show a consistency in the ornamentation patterns between Grant Green and the other two musicians, which may be interpreted as a good indicator for generality of the ornamentation rules

  3. A Machine Learning Approach to Discover Rules for Expressive Performance Actions in Jazz Guitar Music

    Directory of Open Access Journals (Sweden)

    Sergio Ivan Giraldo

    2016-12-01

    Expert musicians introduce expression in their performances by manipulating sound properties such as timing, energy, pitch, and timbre. Here, we present a data-driven computational approach to induce expressive performance rule models for note duration, onset, energy, and ornamentation transformations in jazz guitar music. We extract high-level features from a set of 16 commercial audio recordings (and corresponding music scores) of jazz guitarist Grant Green in order to characterize the expression in the pieces. We apply machine learning techniques to the resulting features to learn expressive performance rule models. We (1) quantitatively evaluate the accuracy of the induced models, (2) analyse the relative importance of the considered musical features, (3) discuss some of the learnt expressive performance rules in the context of previous work, and (4) assess their generality. The accuracies of the induced predictive models are significantly above base-line levels, indicating that the audio performances and the musical features extracted contain sufficient information to automatically learn informative expressive performance patterns. Feature analysis shows that the most important musical features for predicting expressive transformations are note duration, pitch, metrical strength, phrase position, Narmour structure, and tempo and key of the piece. Similarities and differences between the induced expressive rules and the rules reported in the literature were found. Differences may be due to the fact that most previously studied performance data has consisted of classical music recordings. Finally, the rules’ performer specificity/generality is assessed by applying the induced rules to performances of the same pieces performed by two other professional jazz guitar players. Results show a consistency in the ornamentation patterns between Grant Green and the other two musicians, which may be interpreted as a good indicator for generality of the

  4. A Machine Learning Approach to Discover Rules for Expressive Performance Actions in Jazz Guitar Music.

    Science.gov (United States)

    Giraldo, Sergio I; Ramirez, Rafael

    2016-01-01

    Expert musicians introduce expression in their performances by manipulating sound properties such as timing, energy, pitch, and timbre. Here, we present a data-driven computational approach to induce expressive performance rule models for note duration, onset, energy, and ornamentation transformations in jazz guitar music. We extract high-level features from a set of 16 commercial audio recordings (and corresponding music scores) of jazz guitarist Grant Green in order to characterize the expression in the pieces. We apply machine learning techniques to the resulting features to learn expressive performance rule models. We (1) quantitatively evaluate the accuracy of the induced models, (2) analyse the relative importance of the considered musical features, (3) discuss some of the learnt expressive performance rules in the context of previous work, and (4) assess their generality. The accuracies of the induced predictive models are significantly above base-line levels, indicating that the audio performances and the musical features extracted contain sufficient information to automatically learn informative expressive performance patterns. Feature analysis shows that the most important musical features for predicting expressive transformations are note duration, pitch, metrical strength, phrase position, Narmour structure, and tempo and key of the piece. Similarities and differences between the induced expressive rules and the rules reported in the literature were found. Differences may be due to the fact that most previously studied performance data has consisted of classical music recordings. Finally, the rules' performer specificity/generality is assessed by applying the induced rules to performances of the same pieces performed by two other professional jazz guitar players. Results show a consistency in the ornamentation patterns between Grant Green and the other two musicians, which may be interpreted as a good indicator for generality of the ornamentation rules.

  5. Redefining the Practice of Peer Review Through Intelligent Automation Part 2: Data-Driven Peer Review Selection and Assignment.

    Science.gov (United States)

    Reiner, Bruce I

    2017-12-01

    In conventional radiology peer review practice, a small number of exams (routinely 5% of the total volume) is randomly selected, which may significantly underestimate the true error rate within a given radiology practice. An alternative and preferable approach would be to create a data-driven model which mathematically quantifies a peer review risk score for each individual exam and uses this data to identify high risk exams and readers, and selectively target these exams for peer review. An analogous model can also be created to assist in the assignment of these peer review cases in keeping with specific priorities of the service provider. An additional option to enhance the peer review process would be to assign the peer review cases in a truly blinded fashion. In addition to eliminating traditional peer review bias, this approach has the potential to better define exam-specific standard of care, particularly when multiple readers participate in the peer review process.
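
    The contrast drawn above, between random sampling and risk-targeted selection, can be sketched as follows. The abstract does not define how the risk score is computed, so the scores here are taken as given inputs; the function name, exam identifiers and score values are hypothetical.

```python
def select_for_review(risk_scores, fraction=0.05):
    """Return the exam IDs with the highest risk scores.

    `risk_scores` maps exam ID -> numeric risk score (how the score is
    derived is outside this sketch). `fraction` is the share of exams to
    review; at least one exam is always selected.
    """
    if not risk_scores:
        return []
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_selected = max(1, round(len(risk_scores) * fraction))
    ranked = sorted(risk_scores, key=lambda exam_id: risk_scores[exam_id], reverse=True)
    return ranked[:n_selected]


# Hypothetical exams and scores, for illustration only.
scores = {"A": 0.1, "B": 0.9, "C": 0.4, "D": 0.7}
print(select_for_review(scores, fraction=0.5))
```

    With the same review budget as random sampling, ranking by score concentrates effort on the exams judged most likely to contain errors.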

  6. Robust Data-Driven Inference for Density-Weighted Average Derivatives

    DEFF Research Database (Denmark)

    Cattaneo, Matias D.; Crump, Richard K.; Jansson, Michael

    This paper presents a new data-driven bandwidth selector compatible with the small bandwidth asymptotics developed in Cattaneo, Crump, and Jansson (2009) for density-weighted average derivatives. The new bandwidth selector is of the plug-in variety, and is obtained based on a mean squared error...

  7. Ability Grouping and Differentiated Instruction in an Era of Data-Driven Decision Making

    Science.gov (United States)

    Park, Vicki; Datnow, Amanda

    2017-01-01

    Despite data-driven decision making being a ubiquitous part of policy and school reform efforts, little is known about how teachers use data for instructional decision making. Drawing on data from a qualitative case study of four elementary schools, we examine the logic and patterns of teacher decision making about differentiation and ability…

  8. Data-driven haemodynamic response function extraction using Fourier-wavelet regularised deconvolution

    NARCIS (Netherlands)

    Wink, Alle Meije; Hoogduin, Hans; Roerdink, Jos B.T.M.

    2008-01-01

    Background: We present a simple, data-driven method to extract haemodynamic response functions (HRF) from functional magnetic resonance imaging (fMRI) time series, based on the Fourier-wavelet regularised deconvolution (ForWaRD) technique. HRF data are required for many fMRI applications, such as

  9. Data-driven haemodynamic response function extraction using Fourier-wavelet regularised deconvolution

    NARCIS (Netherlands)

    Wink, Alle Meije; Hoogduin, Hans; Roerdink, Jos B.T.M.

    2010-01-01

    Background: We present a simple, data-driven method to extract haemodynamic response functions (HRF) from functional magnetic resonance imaging (fMRI) time series, based on the Fourier-wavelet regularised deconvolution (ForWaRD) technique. HRF data are required for many fMRI applications, such as

  10. Design and evaluation of a data-driven scenario generation framework for game-based training

    NARCIS (Netherlands)

    Luo, L.; Yin, H.; Cai, W.; Zhong, J.; Lees, M.

    Generating suitable game scenarios that can cater for individual players has become an emerging challenge in procedural content generation. In this paper, we propose a data-driven scenario generation framework for game-based training. An evolutionary scenario generation process is designed with a

  11. Teacher Talk about Student Ability and Achievement in the Era of Data-Driven Decision Making

    Science.gov (United States)

    Datnow, Amanda; Choi, Bailey; Park, Vicki; St. John, Elise

    2018-01-01

    Background: Data-driven decision making continues to be a common feature of educational reform agendas across the globe. In many U.S. schools, the teacher team meeting is a key setting in which data use is intended to take place, with the aim of planning instruction to address students' needs. However, most prior research has not examined how the…

  12. Big-Data-Driven Stem Cell Science and Tissue Engineering: Vision and Unique Opportunities.

    Science.gov (United States)

    Del Sol, Antonio; Thiesen, Hans J; Imitola, Jaime; Carazo Salas, Rafael E

    2017-02-02

    Achieving the promises of stem cell science to generate precise disease models and designer cell samples for personalized therapeutics will require harnessing pheno-genotypic cell-level data quantitatively and predictively in the lab and clinic. Those requirements could be met by developing a Big-Data-driven stem cell science strategy and community. Copyright © 2017 Elsevier Inc. All rights reserved.

  13. Exploring Techniques of Developing Writing Skill in IELTS Preparatory Courses: A Data-Driven Study

    Science.gov (United States)

    Ostovar-Namaghi, Seyyed Ali; Safaee, Seyyed Esmail

    2017-01-01

    Being driven by the hypothetico-deductive mode of inquiry, previous studies have tested the effectiveness of theory-driven interventions under controlled experimental conditions to come up with universally applicable generalizations. To make a case in the opposite direction, this data-driven study aims at uncovering techniques and strategies…

  14. A framework for the automated data-driven constitutive characterization of composites

    Science.gov (United States)

    J.G. Michopoulos; John Hermanson; T. Furukawa; A. Iliopoulos

    2010-01-01

    We present advances on the development of a mechatronically and algorithmically automated framework for the data-driven identification of constitutive material models based on energy density considerations. These models can capture both the linear and nonlinear constitutive response of multiaxially loaded composite materials in a manner that accounts for progressive...

  15. Writing through Big Data: New Challenges and Possibilities for Data-Driven Arguments

    Science.gov (United States)

    Beveridge, Aaron

    2017-01-01

    As multimodal writing continues to shift and expand in the era of Big Data, writing studies must confront the new challenges and possibilities emerging from data mining, data visualization, and data-driven arguments. Often collected under the broad banner of "data literacy," students' experiences of data visualization and data-driven…

  16. Data-driven directions for effective footwear provision for the high-risk diabetic foot

    NARCIS (Netherlands)

    Arts, M. L. J.; de Haart, M.; Waaijman, R.; Dahmen, R.; Berendsen, H.; Nollet, F.; Bus, S. A.

    2015-01-01

    Custom-made footwear is used to offload the diabetic foot to prevent plantar foot ulcers. This prospective study evaluates the offloading effects of modifying custom-made footwear and aims to provide data-driven directions for the provision of effectively offloading footwear in clinical practice.

  17. Toward Data-Driven Design of Educational Courses: A Feasibility Study

    Science.gov (United States)

    Agrawal, Rakesh; Golshan, Behzad; Papalexakis, Evangelos

    2016-01-01

    A study plan is the choice of concepts and the organization and sequencing of the concepts to be covered in an educational course. While a good study plan is essential for the success of any course offering, the design of study plans currently remains largely a manual task. We present a novel data-driven method, which given a list of concepts can…

  18. Retesting the Limits of Data-Driven Learning: Feedback and Error Correction

    Science.gov (United States)

    Crosthwaite, Peter

    2017-01-01

    An increasing number of studies have looked at the value of corpus-based data-driven learning (DDL) for second language (L2) written error correction, with generally positive results. However, a potential conundrum for language teachers involved in the process is how to provide feedback on students' written production for DDL. The study looks at…

  19. Data-Driven Hint Generation in Vast Solution Spaces: A Self-Improving Python Programming Tutor

    Science.gov (United States)

    Rivers, Kelly; Koedinger, Kenneth R.

    2017-01-01

    To provide personalized help to students who are working on code-writing problems, we introduce a data-driven tutoring system, ITAP (Intelligent Teaching Assistant for Programming). ITAP uses state abstraction, path construction, and state reification to automatically generate personalized hints for students, even when given states that have not…

  20. Improving Spoken Language Outcomes for Children With Hearing Loss: Data-driven Instruction.

    Science.gov (United States)

    Douglas, Michael

    2016-02-01

    To assess the effects of data-driven instruction (DDI) on spoken language outcomes of children with cochlear implants and hearing aids. Retrospective, matched-pairs comparison of post-treatment speech/language data of children who did and did not receive DDI. Private, spoken-language preschool for children with hearing loss. Eleven matched pairs of children with cochlear implants who attended the same spoken language preschool. Groups were matched for age of hearing device fitting, time in the program, degree of predevice fitting hearing loss, sex, and age at testing. Daily informal language samples were collected and analyzed over a 2-year period, per preschool protocol. Annual informal and formal spoken language assessments in articulation, vocabulary, and omnibus language were administered at the end of three time intervals: baseline, end of year one, and end of year two. The primary outcome measures were total raw score performance of spontaneous utterance sentence types and syntax element use as measured by the Teacher Assessment of Spoken Language (TASL). In addition, standardized assessments (the Clinical Evaluation of Language Fundamentals--Preschool Version 2 (CELF-P2), the Expressive One-Word Picture Vocabulary Test (EOWPVT), the Receptive One-Word Picture Vocabulary Test (ROWPVT), and the Goldman-Fristoe Test of Articulation 2 (GFTA2)) were also administered and compared with the control group. The DDI group demonstrated significantly higher raw scores on the TASL each year of the study. The DDI group also achieved statistically significantly higher scores for total language on the CELF-P and expressive vocabulary on the EOWPVT, but not for articulation or receptive vocabulary. Post-hoc assessment revealed that 78% of the students in the DDI group achieved scores in the average range compared with 59% in the control group.
The preliminary results of this study support further investigation regarding DDI to investigate whether this method can consistently

  1. The Application of Cyber Physical System for Thermal Power Plants: Data-Driven Modeling

    Directory of Open Access Journals (Sweden)

    Yongping Yang

    2018-03-01

    Optimal operation of energy systems plays an important role in enhancing their lifetime security and efficiency. The determination of optimal operating strategies requires intelligent utilization of massive data accumulated during operation or prediction. The investigation of these data solely without combining physical models may run the risk that the established relationships between inputs and outputs, the models which reproduce the behavior of the considered system/component in a wide range of boundary conditions, are invalid for certain boundary conditions, which never occur in the database employed. Therefore, combining big data with physical models via cyber physical systems (CPS) is of great importance for deriving highly reliable and accurate models and is becoming increasingly popular in practical applications. In this paper, we focus on the description of a systematic method to apply CPS to the performance analysis and decision making of thermal power plants. We proposed a general procedure of CPS with both offline and online phases for its application to thermal power plants and discussed the corresponding methods employed to support each sub-procedure. As an example, a data-driven model of the turbine island of an existing air-cooling based thermal power plant is established with the proposed procedure and demonstrates its practicality, validity and flexibility. To establish such a model, the historical operating data are employed in the cyber layer for modeling and linking each physical component. The decision-making procedure for the optimal frequency of the air-cooling condenser is also illustrated to show its applicability for online use. It is concluded that the cyber physical system with the data mining technique is effective and promising to facilitate the real-time analysis and control of thermal power plants.

  2. Data-Driven Diffusion Of Innovations: Successes And Challenges In 3 Large-Scale Innovative Delivery Models.

    Science.gov (United States)

    Dorr, David A; Cohen, Deborah J; Adler-Milstein, Julia

    2018-02-01

    Failed diffusion of innovations may be linked to an inability to use and apply data, information, and knowledge to change perceptions of current practice and motivate change. Using qualitative and quantitative data from three large-scale health care delivery innovations-accountable care organizations, advanced primary care practice, and EvidenceNOW-we assessed where data-driven innovation is occurring and where challenges lie. We found that implementation of some technological components of innovation (for example, electronic health records) has occurred among health care organizations, but core functions needed to use data to drive innovation are lacking. Deficits include the inability to extract and aggregate data from the records; gaps in sharing data; and challenges in adopting advanced data functions, particularly those related to timely reporting of performance data. The unexpectedly high costs and burden incurred during implementation of the innovations have limited organizations' ability to address these and other deficits. Solutions that could help speed progress in data-driven innovation include facilitating peer-to-peer technical assistance, providing tailored feedback reports to providers from data aggregators, and using practice facilitators skilled in using data technology for quality improvement to help practices transform. Policy efforts that promote these solutions may enable more rapid uptake of and successful participation in innovative delivery system reforms.

  3. Comparison of Different Approaches to Predict the Performance of Pumps As Turbines (PATs)

    Directory of Open Access Journals (Sweden)

    Mauro Venturini

    2018-04-01

    This paper deals with the comparison of different methods which can be used for the prediction of the performance curves of pumps as turbines (PATs). Four approaches are considered, i.e., one physics-based simulation model (“white box” model), two “gray box” models, which integrate theory on turbomachines with specific data correlations, and one “black box” model. More in detail, the modeling approaches are: (1) a physics-based simulation model developed by the same authors, which includes the equations for estimating head, power, and efficiency and uses loss coefficients and specific parameters; (2) a model developed by Derakhshan and Nourbakhsh, which first predicts the best efficiency point of a PAT and then reconstructs its complete characteristic curves by means of two ad hoc equations; (3) the prediction model developed by Singh and Nestmann, which predicts the complete turbine characteristics based on pump shape and size; (4) an Evolutionary Polynomial Regression model, which represents a data-driven hybrid scheme which can be used for identifying the explicit mathematical relationship between PAT and pump curves. All approaches are applied to literature data, relying on both pump and PAT performance curves of head, power, and efficiency over the entire range of operation. The experimental data were provided by Derakhshan and Nourbakhsh for four different turbomachines, working in both pump and PAT mode with specific speed values in the range 1.53–5.82. This paper provides a quantitative assessment of the predictions made by means of the considered approaches and also analyzes consistency from a physical point of view. Advantages and drawbacks of each method are also analyzed and discussed.

  4. An evaluation of data-driven motion estimation in comparison to the usage of external-surrogates in cardiac SPECT imaging

    International Nuclear Information System (INIS)

    Mukherjee, Joyeeta Mitra; Johnson, Karen L; Pretorius, P Hendrik; King, Michael A; Hutton, Brian F

    2013-01-01

    Motion estimation methods in single photon emission computed tomography (SPECT) can be classified into methods which depend on just the emission data (data-driven), or those that use some other source of information such as an external surrogate. The surrogate-based methods estimate the motion exhibited externally which may not correlate exactly with the movement of organs inside the body. The accuracy of data-driven strategies on the other hand is affected by the type and timing of motion occurrence during acquisition, the source distribution, and various degrading factors such as attenuation, scatter, and system spatial resolution. The goal of this paper is to investigate the performance of two data-driven motion estimation schemes based on the rigid-body registration of projections of motion-transformed source distributions to the acquired projection data for cardiac SPECT studies. Comparison is also made of six intensity based registration metrics to an external surrogate-based method. In the data-driven schemes, a partially reconstructed heart is used as the initial source distribution. The partially-reconstructed heart has inaccuracies due to limited angle artifacts resulting from using only a part of the SPECT projections acquired while the patient maintained the same pose. The performance of different cost functions in quantifying consistency with the SPECT projection data in the data-driven schemes was compared for clinically realistic patient motion occurring as discrete pose changes, one or two times during acquisition. The six intensity-based metrics studied were mean-squared difference, mutual information, normalized mutual information (NMI), pattern intensity (PI), normalized cross-correlation and entropy of the difference. Quantitative and qualitative analysis of the performance is reported using Monte-Carlo simulations of a realistic heart phantom including degradation factors such as attenuation, scatter and system spatial resolution. 
Further the

  5. Current Trends in the Detection of Sociocultural Signatures: Data-Driven Models

    Energy Technology Data Exchange (ETDEWEB)

    Sanfilippo, Antonio P.; Bell, Eric B.; Corley, Courtney D.

    2014-09-15

    available that are shaping social computing as a strongly data-driven experimental discipline with an increasingly stronger impact on the decision-making process of groups and individuals alike. In this chapter, we review current advances and trends in the detection of sociocultural signatures. Specific embodiments of the issues discussed are provided with respect to the assessment of violent intent and sociopolitical contention. We begin by reviewing current approaches to the detection of sociocultural signatures in these domains. Next, we turn to the review of novel data harvesting methods for social media content. Finally, we discuss the application of sociocultural models to social media content, and conclude by commenting on current challenges and future developments.

  6. An Open Framework for Dynamic Big-data-driven Application Systems (DBDDAS) Development

    KAUST Repository

    Douglas, Craig

    2014-01-01

    In this paper, we outline key features that dynamic data-driven application systems (DDDAS) have. A DDDAS is an application that has data assimilation that can change the models and/or scales of the computation and that the application controls the data collection based on the computational results. The term Big Data (BD) has come into being in recent years that is highly applicable to most DDDAS since most applications use networks of sensors that generate an overwhelming amount of data in the lifespan of the application runs. We describe what a dynamic big-data-driven application system (DBDDAS) toolkit must have in order to provide all of the essential building blocks that are necessary to easily create new DDDAS without re-inventing the building blocks.

  7. Data Driven Modelling of the Dynamic Wake Between Two Wind Turbines

    DEFF Research Database (Denmark)

    Knudsen, Torben; Bak, Thomas

    2012-01-01

    Wind turbines in a wind farm influence each other through the wind flow. Downwind turbines are in the wake of upwind turbines and the wind speed experienced at downwind turbines is hence a function of the wind speeds at upwind turbines but also the momentum extracted from the wind by the upwind turbine. This paper establishes flow models relating the wind speeds at turbines in a farm. So far, research in this area has been mainly based on first principles static models and the data driven modelling done has not included the loading of the upwind turbine and its impact on the wind speed downwind. This paper is the first where modern commercial mega watt turbines are used for data driven modelling including the upwind turbine loading by changing power reference. Obtaining the necessary data is difficult and data is therefore limited. A simple dynamic extension to the Jensen wake model is tested...
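
    For orientation, the static Jensen (Park) wake model named in the abstract above can be sketched as follows. This is only the commonly cited textbook form, not the dynamic extension tested by the authors; the function name, the parameter names and the default wake decay constant of 0.075 are assumptions made for illustration.

```python
import math


def jensen_wake_speed(u_upwind, thrust_coeff, distance, rotor_diameter, decay=0.075):
    """Wind speed at a turbine directly downwind, static Jensen wake model.

    u_upwind       free-stream wind speed at the upwind turbine (m/s)
    thrust_coeff   thrust coefficient C_T of the upwind turbine, 0 <= C_T < 1
    distance       downwind distance between the turbines (m)
    rotor_diameter rotor diameter of the upwind turbine (m)
    decay          wake decay constant k (assumed default, site dependent)
    """
    if not 0.0 <= thrust_coeff < 1.0:
        raise ValueError("thrust coefficient must lie in [0, 1)")
    if distance < 0 or rotor_diameter <= 0:
        raise ValueError("distance must be >= 0 and rotor diameter > 0")
    # Fractional velocity deficit: largest right behind the rotor and
    # shrinking as the wake expands linearly with downwind distance.
    expansion = 1.0 + 2.0 * decay * distance / rotor_diameter
    deficit = (1.0 - math.sqrt(1.0 - thrust_coeff)) / expansion ** 2
    return u_upwind * (1.0 - deficit)
```

    In this static form the loading of the upwind turbine enters only through the thrust coefficient: a lower thrust coefficient gives a smaller deficit and a higher wind speed downwind.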

  8. An Open Framework for Dynamic Big-data-driven Application Systems (DBDDAS) Development

    KAUST Repository

    Douglas, Craig

    2014-06-06

    In this paper, we outline key features that dynamic data-driven application systems (DDDAS) have. A DDDAS is an application that has data assimilation that can change the models and/or scales of the computation and that the application controls the data collection based on the computational results. The term Big Data (BD) has come into being in recent years that is highly applicable to most DDDAS since most applications use networks of sensors that generate an overwhelming amount of data in the lifespan of the application runs. We describe what a dynamic big-data-driven application system (DBDDAS) toolkit must have in order to provide all of the essential building blocks that are necessary to easily create new DDDAS without re-inventing the building blocks.

  9. Purely data-driven approaches to trading of renewable energy generation

    DEFF Research Database (Denmark)

    Mazzi, Nicolo; Pinson, Pierre

    2016-01-01

    could readily learn from market data and deduce how to offer strategically in order to maximize expected market revenues. Our analysis shows that a direct reinforcement learning algorithm can track the nominal level of the optimal quantile forecast to trade in the day-ahead market, while yielding higher...

  10. A Data-Driven Approach to Manage Charging Infrastructure for Electric Vehicles in Parking Lots

    NARCIS (Netherlands)

    J. Babic (Jurica); A. Carvalho (Arthur); W. Ketter (Wolfgang); V. Podobnik (Vedran)

    2017-01-01

    The ever-increasing number of electric vehicles (EV) on the road is in line with many governments' efforts to tackle urgent environmental challenges. This inherently means that there is a growing need for charging infrastructure as well. A potential solution to address the need for

  11. Data-driven public transport ridership prediction approach including comfort aspects

    NARCIS (Netherlands)

    Van Oort, N.; Drost, M.; Brands, T.; Yap, M.

    2015-01-01

    The most important aspects on which passengers base their choice whether to travel by public transport are the perceived travel time, costs, reliability and comfort. Despite its importance, comfort is often not explicitly considered when predicting demand for public transport. In this paper, we

  12. Understanding Democracy and Development Traps Using a Data-Driven Approach

    Science.gov (United States)

    Ranganathan, Shyam; Nicolis, Stamatios C.; Spaiser, Viktoria; Sumpter, David J.T.

    2015-01-01

    Methods from machine learning and data science are becoming increasingly important in the social sciences, providing powerful new ways of identifying statistical relationships in large data sets. However, these relationships do not necessarily offer an understanding of the processes underlying the data. To address this problem, we have developed a method for fitting nonlinear dynamical systems models to data related to social change. Here, we use this method to investigate how countries become trapped at low levels of socioeconomic development. We identify two types of traps. The first is a democracy trap, where countries with low levels of economic growth and/or citizen education fail to develop democracy. The second trap is in terms of cultural values, where countries with low levels of democracy and/or life expectancy fail to develop emancipative values. We show that many key developing countries, including India and Egypt, lie near the border of these development traps, and we investigate the time taken for these nations to transition toward higher democracy and socioeconomic well-being. PMID:26487983

  13. Testing the Utility of a Data-Driven Approach for Assessing BMI from Face Images

    DEFF Research Database (Denmark)

    Wolffhechel, Karin Marie Brandt; Hahn, Amanda C.; Jarmer, Hanne Østergaard

    2015-01-01

    Several lines of evidence suggest that facial cues of adiposity may be important for human social interaction. However, tests for quantifiable cues of body mass index (BMI) in the face have examined only a small number of facial proportions and these proportions were found to have relatively low...

  14. Comparison of data-driven and model-driven approaches to brightness temperature diurnal cycle interpolation

    CSIR Research Space (South Africa)

    Van den Bergh, F

    2006-01-01

    This paper presents two new schemes for interpolating missing samples in satellite diurnal temperature cycles (DTCs). The first scheme, referred to here as the cosine model, is an improvement of the model proposed in [2] and combines a cosine...

  15. A Data-Driven Approach to SEM Development at a Two-Year College

    Science.gov (United States)

    Pirius, Landon K.

    2014-01-01

    This article explores implementation of strategic enrollment management (SEM) at a two-year college and why SEM is critical to the long-term viability of an institution. This article also outlines the five initial steps needed to implement SEM, including identifying SEM leadership, building a SEM committee, developing a common understanding of…

  16. Least squares approach for initial data recovery in dynamic data-driven applications simulations

    KAUST Repository

    Douglas, C.; Efendiev, Y.; Ewing, R.; Ginting, V.; Lazarov, R.; Cole, M.; Jones, G.

    2010-01-01

    In this paper, we consider the initial data recovery and the solution update based on the local measured data that are acquired during simulations. Each time new data is obtained, the initial condition, which is a representation of the solution at a

  17. SPECT acquisition using dynamic projections: a novel approach for data-driven respiratory gating

    International Nuclear Information System (INIS)

    Hutton, B.F.; Hatton, R.L.; Yip, N.

    2002-01-01

    Movement of the heart due to respiration has been previously demonstrated to produce potentially serious artefacts. On-line respiratory gating is difficult, as it requires a high level of patient cooperation. We demonstrate that use of dynamic acquisition of projections permits identification of the respiratory dynamics, allowing retrospective selection of data corresponding to a fixed point in the respiratory cycle. To demonstrate the feasibility of the technique a dynamic study was acquired just prior to myocardial perfusion SPECT acquisition, using 5 frames/sec for 20 seconds (64*64 matrix) in anterior and lateral projections (using a dual-head right-angled configuration). The dynamic was processed a) by compressing frames in the transverse direction so as to illustrate time dependence, b) by plotting the centre of mass in the axial direction as a function of time. Respiratory motion was enhanced by use of temporal smoothing and intensity thresholding. In ten patients studied the cyclic pattern of motion due to respiratory dynamics was clearly visible in nine. Respiration typically resulted in around 1 cm axial translation but in some individuals, movements as large as 3 cm were identified. The respiration rate ranged from 12-18/min in agreement with independent observation of the patient's breathing pattern. These results suggest that retrospective respiratory gating is feasible without the need for any external respiratory monitoring device, provided that dynamic acquisition of SPECT projections is implemented. Correction for respiratory motion may also be feasible using this technique. Copyright (2002) The Australian and New Zealand Society of Nuclear Medicine Inc
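
    The processing described above (intensity thresholding, axial centre of mass per frame, temporal smoothing) might be sketched roughly as below. This is an illustrative reading of the abstract, not the authors' implementation: the function names, the frame layout (a list of axial rows, each a list of counts across the transverse direction) and the simple moving-average smoother are all assumptions.

```python
def axial_centre_of_mass(frame, threshold=0):
    """Axial centre of mass of one projection frame, in pixel units.

    frame is a list of rows indexed along the axial direction; each row
    holds the counts across the transverse direction. Counts at or below
    the threshold are ignored (simple intensity thresholding).
    """
    row_sums = [sum(c for c in row if c > threshold) for row in frame]
    total = sum(row_sums)
    if total == 0:
        raise ValueError("no counts above threshold in frame")
    return sum(index * counts for index, counts in enumerate(row_sums)) / total


def moving_average(values, width=3):
    """Temporal smoothing with an unweighted sliding window."""
    if width < 1 or width > len(values):
        raise ValueError("window width must be between 1 and len(values)")
    return [sum(values[i:i + width]) / width
            for i in range(len(values) - width + 1)]


def respiratory_trace(frames, threshold=0, width=3):
    """Smoothed axial centre of mass as a function of frame number."""
    raw = [axial_centre_of_mass(frame, threshold) for frame in frames]
    return moving_average(raw, width)
```

    Under these assumptions, peaks and troughs of the returned trace would mark the points in the respiratory cycle from which frames could be selected retrospectively.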

  18. EEG Analytics for Early Detection of Autism Spectrum Disorder: A data-driven approach.

    Science.gov (United States)

    Bosl, William J; Tager-Flusberg, Helen; Nelson, Charles A

    2018-05-01

    Autism spectrum disorder (ASD) is a complex and heterogeneous disorder, diagnosed on the basis of behavioral symptoms during the second year of life or later. Finding scalable biomarkers for early detection is challenging because of the variability in presentation of the disorder and the need for simple measurements that could be implemented routinely during well-baby checkups. EEG is a relatively easy-to-use, low cost brain measurement tool that is being increasingly explored as a potential clinical tool for monitoring atypical brain development. EEG measurements were collected from 99 infants with an older sibling diagnosed with ASD, and 89 low risk controls, beginning at 3 months of age and continuing until 36 months of age. Nonlinear features were computed from EEG signals and used as input to statistical learning methods. Prediction of the clinical diagnostic outcome of ASD or not ASD was highly accurate when using EEG measurements from as early as 3 months of age. Specificity, sensitivity and PPV were high, exceeding 95% at some ages. Prediction of ADOS calibrated severity scores for all infants in the study using only EEG data taken as early as 3 months of age was strongly correlated with the actual measured scores. This suggests that useful digital biomarkers might be extracted from EEG measurements.
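
    The three figures of merit quoted above follow from a standard 2x2 confusion matrix. A minimal sketch is given below; the function name is an assumption, and the counts in the usage note are invented for illustration, not taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and positive predictive value (PPV).

    tp, fp, tn, fn are the counts of true positives, false positives,
    true negatives and false negatives of a binary classifier.
    """
    if tp + fn == 0 or tn + fp == 0 or tp + fp == 0:
        raise ValueError("each metric needs a non-zero denominator")
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases detected
        "specificity": tn / (tn + fp),  # share of non-cases cleared
        "ppv": tp / (tp + fp),          # share of positive calls that are right
    }
```

    With made-up counts such as diagnostic_metrics(tp=19, fp=2, tn=38, fn=1), sensitivity and specificity are both 0.95 while PPV is 19/21, which illustrates that PPV also depends on how common the condition is in the sample.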

  19. Big Data Innovation Challenge : Pioneering Approaches to Data-Driven Development

    OpenAIRE

    World Bank Group

    2016-01-01

    Big data can sound remote and lacking a human dimension, with few obvious links to development and impacting the lives of the poor. Concepts such as anti-poverty targeting, market access or rural electrification seem far more relevant – and easier to grasp. And yet some of today’s most groundbreaking initiatives in these areas rely on big data. This publication profiles these and more, sho...

  20. Estimating the National Carbon Abatement Potential of City Policies: A Data-Driven Approach

    Energy Technology Data Exchange (ETDEWEB)

    O'Shaughnessy, Eric [National Renewable Energy Lab. (NREL), Golden, CO (United States); Heeter, Jenny [National Renewable Energy Lab. (NREL), Golden, CO (United States); Keyser, David [National Renewable Energy Lab. (NREL), Golden, CO (United States); Gagnon, Pieter [National Renewable Energy Lab. (NREL), Golden, CO (United States); Aznar, Alexandra [National Renewable Energy Lab. (NREL), Golden, CO (United States)

    2016-10-01

    Cities are increasingly taking actions such as building code enforcement, urban planning, and public transit expansion to reduce emissions of carbon dioxide in their communities and municipal operations. However, many cities lack the quantitative information needed to estimate policy impacts and prioritize city actions in terms of carbon abatement potential and cost effectiveness. This report fills this research gap by providing methodologies to assess the carbon abatement potential of a variety of city actions. The methodologies are applied to an energy use data set of 23,458 cities compiled for the U.S. Department of Energy City Energy Profile tool. The analysis develops a national estimate of the carbon abatement potential of realizable city actions in six specific policy areas encompassing the most commonly implemented city actions. The results of this analysis suggest that, in aggregate, cities could reduce nationwide carbon emissions by about 210 million metric tons of carbon dioxide (MMT CO2) per year in a 'moderate abatement scenario' by 2035 and 480 MMT CO2/year in a 'high abatement scenario' by 2035 through these common actions typically within a city's control in the six policy areas. The aggregate carbon abatement potential of these specific areas equates to a reduction of 3%-7% relative to 2013 U.S. emissions. At the city level, the results suggest the average city could reduce carbon emissions by 7% (moderate) to 19% (high) relative to current city-level emissions. In the context of U.S. climate commitments under the 21st session of the Conference of the Parties (COP21), the estimated national abatement potential of the city actions analyzed in this report equates to about 15%-35% of the remaining carbon abatement necessary to achieve the U.S. COP21 target. 
Additional city actions outside the scope of this report, such as community choice aggregation (city-level purchasing of renewable energy), zero energy districts, and multi-level governance strategies, could significantly augment the carbon abatement contributions of city actions toward national climate targets. The results suggest that cities may play a pivotal role in progress toward national climate targets. In addition to providing carbon and emissions estimates, this report estimates the national net economic impacts of policies for which cost and benefit data are available. Impact metrics include employment, worker earnings, and gross domestic product (GDP). For the policy areas studied, the economic analysis demonstrates that city carbon abatement may be achieved with only minimal and generally slightly positive economic impacts. Employment impacts range from 0.04% to 0.13% of U.S. employment during implementation and zero to 0.1% thereafter. GDP estimates show net impacts of 0.02% to 0.07% of GDP during implementation and impacts from -0.02% to zero thereafter. This report quantitatively demonstrates the material impact of a limited set of local policy areas on national carbon abatement potential. The magnitude of estimated carbon reductions from city policies, 3%-7% of national emissions by 2035, suggests an important role for city-led actions in reaching U.S. climate goals. Multi-level governance at the city, state, and national levels could augment the carbon abatement potential of city actions and make cities a key component of long-term U.S. climate strategies.

  1. Estimating the National Carbon Abatement Potential of City Policies: A Data-Driven Approach

    Energy Technology Data Exchange (ETDEWEB)

    Eric O’Shaughnessy, Jenny Heeter, David Keyser, Pieter Gagnon, and Alexandra Aznar

    2016-10-01

    Cities are increasingly taking actions such as building code enforcement, urban planning, and public transit expansion to reduce emissions of carbon dioxide in their communities and municipal operations. However, many cities lack the quantitative information needed to estimate policy impacts and prioritize city actions in terms of carbon abatement potential and cost effectiveness. This report fills this research gap by providing methodologies to assess the carbon abatement potential of a variety of city actions. The methodologies are applied to an energy use data set of 23,458 cities compiled for the U.S. Department of Energy’s City Energy Profile tool. The analysis estimates the national carbon abatement potential of the most commonly implemented actions in six specific policy areas. The results of this analysis suggest that, in aggregate, cities could reduce nationwide carbon emissions by about 210 million metric tons of carbon dioxide (MMT CO2) per year in a "moderate abatement scenario" by 2035 and 480 MMT CO2/year in a "high abatement scenario" by 2035 through these common actions typically within a city’s control in the six policy areas. The aggregate carbon abatement potential of these specific areas equates to a reduction of 3%-7% relative to 2013 U.S. emissions. At the city level, the results suggest the average city could reduce carbon emissions by 7% (moderate) to 19% (high) relative to current city-level emissions. City carbon abatement potential is sensitive to national and state policies that affect the carbon intensity of electricity and transportation. Specifically, the U.S. Clean Power Plan and further renewable energy cost reductions could reduce city carbon emissions overall, helping cities achieve their carbon reduction goals.

  2. Understanding Democracy and Development Traps Using a Data-Driven Approach.

    Science.gov (United States)

    Ranganathan, Shyam; Nicolis, Stamatios C; Spaiser, Viktoria; Sumpter, David J T

    2015-03-01

    Methods from machine learning and data science are becoming increasingly important in the social sciences, providing powerful new ways of identifying statistical relationships in large data sets. However, these relationships do not necessarily offer an understanding of the processes underlying the data. To address this problem, we have developed a method for fitting nonlinear dynamical systems models to data related to social change. Here, we use this method to investigate how countries become trapped at low levels of socioeconomic development. We identify two types of traps. The first is a democracy trap, where countries with low levels of economic growth and/or citizen education fail to develop democracy. The second trap is in terms of cultural values, where countries with low levels of democracy and/or life expectancy fail to develop emancipative values. We show that many key developing countries, including India and Egypt, lie near the border of these development traps, and we investigate the time taken for these nations to transition toward higher democracy and socioeconomic well-being.

  3. Data Driven Approach for High Resolution Population Distribution and Dynamics Models

    Energy Technology Data Exchange (ETDEWEB)

    Bhaduri, Budhendra L [ORNL; Bright, Eddie A [ORNL; Rose, Amy N [ORNL; Liu, Cheng [ORNL; Urban, Marie L [ORNL; Stewart, Robert N [ORNL

    2014-01-01

    High resolution population distribution data are vital for successfully addressing critical issues ranging from energy and socio-environmental research to public health to human security. Commonly available population data from Census is constrained both in space and time and does not capture population dynamics as functions of space and time. This imposes a significant limitation on the fidelity of event-based simulation models with sensitive space-time resolution. This paper describes ongoing development of high-resolution population distribution and dynamics models, at Oak Ridge National Laboratory, through spatial data integration and modeling with behavioral or activity-based mobility datasets for representing temporal dynamics of population. The model is resolved at 1 km resolution globally and describes the U.S. population for nighttime and daytime at 90m. Integration of such population data provides the opportunity to develop simulations and applications in critical infrastructure management from local to global scales.

  4. Data Driven Exploratory Attacks on Black Box Classifiers in Adversarial Domains

    OpenAIRE

    Sethi, Tegjyot Singh; Kantardzic, Mehmed

    2017-01-01

    While modern day web applications aim to create impact at the civilization level, they have become vulnerable to adversarial activity, where the next cyber-attack can take any shape and can originate from anywhere. The increasing scale and sophistication of attacks has prompted the need for a data driven solution, with machine learning forming the core of many cybersecurity systems. Machine learning was not designed with security in mind, and the essential assumption of stationarity, requiri...

  5. Data Driven Marketing in Apple and Back to School Campaign 2011

    OpenAIRE

    Bernátek, Martin

    2011-01-01

    The most important finding of the campaign analysis is that Data-Driven Marketing makes sense only once it is already part of the marketing plan. The team preparing the marketing plan therefore defines the goals and sets the proper measurement matrix according to those goals. This makes it possible to adjust the marketing plan to extract more value, to monitor the execution and make adjustments if necessary, and to evaluate the campaign at its end.

  6. Data-driven automatic parking constrained control for four-wheeled mobile vehicles

    OpenAIRE

    Wenxu Yan; Jing Deng; Dezhi Xu

    2016-01-01

    In this article, a novel data-driven constrained control scheme is proposed for automatic parking systems. The design of the proposed scheme only depends on the steering angle and the orientation angle of the car, and it does not involve any model information of the car. Therefore, the proposed scheme-based automatic parking system is applicable to different kinds of cars. In order to further reduce the desired trajectory coordinate tracking errors, a coordinates compensation algorithm is als...

  7. Extension of a data-driven gating technique to 3D, whole body PET studies

    International Nuclear Information System (INIS)

    Schleyer, Paul J; O'Doherty, Michael J; Marsden, Paul K

    2011-01-01

    Respiratory gating can be used to separate a PET acquisition into a series of near motion-free bins. This is typically done using additional gating hardware; however, software-based methods can derive the respiratory signal from the acquired data itself. The aim of this work was to extend a data-driven respiratory gating method to acquire gated, 3D, whole body PET images of clinical patients. The existing method, previously demonstrated with 2D, single bed-position data, uses a spectral analysis to find regions in raw PET data which are subject to respiratory motion. The change in counts over time within these regions is then used to estimate the respiratory signal of the patient. In this work, the gating method was adapted to only accept lines of response from a reduced set of axial angles, and the respiratory frequency derived from the lung bed position was used to help identify the respiratory frequency in all other bed positions. As the respiratory signal does not identify the direction of motion, a registration-based technique was developed to align the direction for all bed positions. Data from 11 clinical FDG PET patients were acquired, and an optical respiratory monitor was used to provide a hardware-based signal for comparison. All data were gated using both the data-driven and hardware methods, and reconstructed. The centre of mass of manually defined regions on gated images was calculated, and the overall displacement was defined as the change in the centre of mass between the first and last gates. The mean displacement was 10.3 mm for the data-driven gated images and 9.1 mm for the hardware gated images. No significant difference was found between the two gating methods when comparing the displacement values. The adapted data-driven gating method was demonstrated to successfully produce respiratory gated, 3D, whole body, clinical PET acquisitions.
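
    The displacement measure described in this abstract (the change in the centre of mass of a region between the first and last gates) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the voxel representation, the function names, and the use of Euclidean distance for "change" are all assumptions, and the sample data are made up.

```python
import math

def centre_of_mass(voxels):
    """Count-weighted centre of mass of a region.

    `voxels` is a list of (x, y, z, counts) tuples with positions in mm
    (a hypothetical representation chosen for this sketch).
    """
    total = sum(counts for _, _, _, counts in voxels)
    if total <= 0:
        raise ValueError("region has no counts")
    return tuple(sum(v[i] * v[3] for v in voxels) / total for i in range(3))

def gate_displacement(gates):
    """Distance between the centres of mass of the first and last gates.

    Assumption: 'change in the centre of mass' is taken as Euclidean distance.
    """
    return math.dist(centre_of_mass(gates[0]), centre_of_mass(gates[-1]))

# Made-up region: two voxels drifting 10 mm along z over three gates.
gates = [
    [(0.0, 0.0, 0.0, 5.0), (2.0, 0.0, 0.0, 5.0)],
    [(0.0, 0.0, 5.0, 5.0), (2.0, 0.0, 5.0, 5.0)],
    [(0.0, 0.0, 10.0, 5.0), (2.0, 0.0, 10.0, 5.0)],
]
print(gate_displacement(gates))
```

    Averaging this value over patients and regions would give a mean displacement comparable in kind to the figures quoted above, for either gating method.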

  8. KNMI DataLab experiences in serving data-driven innovations

    Science.gov (United States)

    Noteboom, Jan Willem; Sluiter, Raymond

    2016-04-01

    Climate change research and innovations in weather forecasting rely more and more on (Big) data. Besides increasing data from traditional sources (such as observation networks, radars and satellites), the use of open data, crowd sourced data and the Internet of Things (IoT) is emerging. To deploy these sources of data optimally in our services and products, KNMI has established a DataLab to serve data-driven innovations in collaboration with public and private sector partners. Big data management, data integration, data analytics including machine learning and data visualization techniques are playing an important role in the DataLab. Cross-domain data-driven innovations that arise from public-private collaborative projects and research programmes can be explored, experimented and/or piloted by the KNMI DataLab. Furthermore, advice can be requested on (Big) data techniques and data sources. In support of collaborative (Big) data science activities, scalable environments are offered with facilities for data integration, data analysis and visualization. In addition, Data Science expertise is provided directly or from a pool of internal and external experts. At the EGU conference, gained experiences and best practices are presented in operating the KNMI DataLab to serve data-driven innovations for weather and climate applications optimally.

  9. Data-driven analysis of collections of big datasets by the Bi-CoPaM method yields field-specific novel insights

    DEFF Research Database (Denmark)

    Abu-Jamous, Basel; Liu, Chao; Roberts, David, J.

    2017-01-01

    not commonly considered. To bridge this gap between the fast pace of data generation and the slower pace of data analysis, and to exploit the massive amounts of existing data, we suggest employing data-driven explorations to analyse collections of related big datasets. This approach aims at extracting field......Massive amounts of data have recently been, and are increasingly being, generated from various fields, such as bioinformatics, neuroscience and social networks. Many of these big datasets were generated to answer specific research questions, and were analysed accordingly. However, the scope...... clusters of consistently correlated objects. We demonstrate the power of data-driven explorations by applying the Bi-CoPaM to two collections of big datasets from two distinct fields, namely bioinformatics and neuroscience. In the first application, the collective analysis of forty yeast gene expression...

  10. Data-driven reverse engineering of signaling pathways using ensembles of dynamic models.

    Directory of Open Access Journals (Sweden)

    David Henriques

    2017-02-01

    Despite significant efforts and remarkable progress, the inference of signaling networks from experimental data remains very challenging. The problem is particularly difficult when the objective is to obtain a dynamic model capable of predicting the effect of novel perturbations not considered during model training. The problem is ill-posed due to the nonlinear nature of these systems, the fact that only a fraction of the involved proteins and their post-translational modifications can be measured, and limitations on the technologies used for growing cells in vitro, perturbing them, and measuring their variations. As a consequence, there is a pervasive lack of identifiability. To overcome these issues, we present a methodology called SELDOM (enSEmbLe of Dynamic lOgic-based Models), which builds an ensemble of logic-based dynamic models, trains them to experimental data, and combines their individual simulations into an ensemble prediction. It also includes a model reduction step to prune spurious interactions and mitigate overfitting. SELDOM is a data-driven method, in the sense that it does not require any prior knowledge of the system: the interaction networks that act as scaffolds for the dynamic models are inferred from data using mutual information. We have tested SELDOM on a number of experimental and in silico signal transduction case-studies, including the recent HPN-DREAM breast cancer challenge. We found that its performance is highly competitive compared to state-of-the-art methods for the purpose of recovering network topology. More importantly, the utility of SELDOM goes beyond basic network inference (i.e. uncovering static interaction networks): it builds dynamic (based on ordinary differential equation) models, which can be used for mechanistic interpretations and reliable dynamic predictions in new experimental conditions (i.e. not used in the training). For this task, SELDOM's ensemble prediction is not only consistently better

  11. Regional regression models of percentile flows for the contiguous United States: Expert versus data-driven independent variable selection

    Directory of Open Access Journals (Sweden)

    Geoffrey Fouad

    2018-06-01

    New hydrological insights for the region: A set of three variables selected based on an expert assessment of factors that influence percentile flows performed similarly to larger sets of variables selected using a data-driven method. Expert assessment variables included mean annual precipitation, potential evapotranspiration, and baseflow index. Larger sets of up to 37 variables contributed little, if any, additional predictive information. Variables used to describe the distribution of basin data (e.g. standard deviation) were not useful, and average values were sufficient to characterize physical and climatic basin conditions. Effectiveness of the expert assessment variables may be due to the high degree of multicollinearity (i.e. cross-correlation) among additional variables. A tool is provided in the Supplementary material to predict percentile flows based on the three expert assessment variables. Future work should develop new variables with a strong understanding of the processes related to percentile flows.

  12. A data-driven adaptive controller for a class of unknown nonlinear discrete-time systems with estimated PPD

    Directory of Open Access Journals (Sweden)

    Chidentree Treesatayapun

    2015-06-01

    An adaptive control scheme based on a data-driven controller (DDC) is proposed in this article. Unlike several DDC techniques, the proposed controller is constructed from an adaptive fuzzy rule emulated network (FREN), which is able to include human knowledge based on the controlled plant's input–output signals in the format of IF-THEN rules. Owing to this advantage, the on-line estimation of the pseudo partial derivative (PPD) and the resetting algorithms that are commonly used by DDC can be omitted here. Furthermore, a novel adaptive algorithm is introduced to minimize both tracking error and control effort, with stability analysis for the closed-loop system. An experimental system with brushed DC-motor current control is constructed to validate the performance of the proposed control scheme. Comparative results with conventional DDC and radial basis function (RBF) controllers demonstrate that the proposed controller achieves a lower tracking error and reduces the control effort.

  13. Data-Driven Nonlinear Subspace Modeling for Prediction and Control of Molten Iron Quality Indices in Blast Furnace Ironmaking

    Energy Technology Data Exchange (ETDEWEB)

    Zhou, Ping; Song, Heda; Wang, Hong; Chai, Tianyou

    2017-09-01

    Blast furnace (BF) ironmaking is a nonlinear dynamic process with complicated physical-chemical reactions, where multi-phase and multi-field coupling and large time delay occur during its operation. In BF operation, the molten iron temperature (MIT) as well as the Si, P and S contents of molten iron are the most essential molten iron quality (MIQ) indices, whose measurement, modeling and control have always been important issues in metallurgical engineering and automation. This paper develops a novel data-driven nonlinear state space modeling method for the prediction and control of multivariate MIQ indices by integrating hybrid modeling and control techniques. First, to improve modeling efficiency, a data-driven hybrid method combining canonical correlation analysis and correlation analysis is proposed to identify the most influential controllable variables as the modeling inputs from the multitude of factors that affect the MIQ indices. Then, a Hammerstein model for the prediction of MIQ indices is established using the LS-SVM based nonlinear subspace identification method. This model is further simplified by using a piecewise cubic Hermite interpolating polynomial method to fit the complex nonlinear kernel function. Compared to the original Hammerstein model, the simplified model not only significantly reduces the computational complexity but also retains almost the same reliability and accuracy for a stable prediction of MIQ indices. Last, in order to verify the practicability of the developed model, it is applied in designing a genetic algorithm based nonlinear predictive controller for multivariate MIQ indices by directly taking the established model as a predictor. Industrial experiments show the advantages and effectiveness of the proposed approach.

  14. Data-driven Development of ROTEM and TEG Algorithms for the Management of Trauma Hemorrhage: A Prospective Observational Multicenter Study.

    Science.gov (United States)

    Baksaas-Aasen, Kjersti; Van Dieren, Susan; Balvers, Kirsten; Juffermans, Nicole P; Næss, Pål A; Rourke, Claire; Eaglestone, Simon; Ostrowski, Sisse R; Stensballe, Jakob; Stanworth, Simon; Maegele, Marc; Goslings, Carel; Johansson, Pär I; Brohi, Karim; Gaarder, Christine

    2018-05-23

    Developing pragmatic data-driven algorithms for management of trauma induced coagulopathy (TIC) during trauma hemorrhage for viscoelastic hemostatic assays (VHAs). Admission data from conventional coagulation tests (CCT), rotational thrombelastometry (ROTEM) and thrombelastography (TEG) were collected prospectively at 6 European trauma centers during 2008 to 2013. To identify significant VHA parameters capable of detecting TIC (defined as INR > 1.2), hypofibrinogenemia (< 2.0 g/L), and thrombocytopenia (< 100 x 10^9/L), univariate regression models were constructed. Area under the curve (AUC) was calculated, and threshold values for TEG and ROTEM parameters with 70% sensitivity were included in the algorithms. A total of 2287 adult trauma patients (ROTEM: 2019 and TEG: 968) were enrolled. FIBTEM clot amplitude at 5 minutes (CA5) had the largest AUC and 10 mm detected hypofibrinogenemia with 70% sensitivity. The corresponding value for functional fibrinogen (FF) TEG maximum amplitude (MA) was 19 mm. Thrombocytopenia was similarly detected using the calculated threshold EXTEM-FIBTEM CA5 30 mm. The corresponding rTEG-FF TEG MA was 46 mm. TIC was identified by EXTEM CA5 41 mm, rTEG MA 64 mm (80% sensitivity). For hyperfibrinolysis, we examined the relationship between viscoelastic lysis parameters and clinical outcomes, with resulting threshold values of 85% for EXTEM Li30 and 10% for rTEG Ly30. Based on these analyses, we constructed algorithms for ROTEM, TEG, and CCTs to be used in addition to ratio driven transfusion and tranexamic acid. We describe a systematic approach to define threshold parameters for ROTEM and TEG. These parameters were incorporated into algorithms to support data-driven adjustments of resuscitation with therapeutics, to optimize damage control resuscitation practice in trauma.
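
    The idea of choosing a threshold that reaches a target sensitivity can be illustrated with a short sketch. It is not the study's procedure (the abstract mentions univariate regression and AUC but does not detail how thresholds were derived); it assumes a parameter for which lower values indicate the condition, and the function name and patient values are hypothetical.

```python
def threshold_for_sensitivity(values, has_condition, target=0.70):
    """Smallest cut-off t such that the rule 'value <= t' flags at least
    `target` of the patients who truly have the condition.

    Assumption for this sketch: lower values indicate the condition.
    """
    positives = sorted(v for v, c in zip(values, has_condition) if c)
    if not positives:
        raise ValueError("no patients with the condition")
    for t in positives:
        detected = sum(1 for v in positives if v <= t)
        if detected / len(positives) >= target:
            return t

# Made-up clot amplitudes (mm): ten patients with the condition, four without.
amplitude_mm = [4, 6, 7, 8, 9, 10, 11, 12, 14, 16, 13, 15, 18, 20]
has_condition = [True] * 10 + [False] * 4
print(threshold_for_sensitivity(amplitude_mm, has_condition))
```

    In practice the specificity at the chosen cut-off would be reported alongside it, since a threshold fixed by sensitivity alone says nothing about false positives.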

  15. Business Intelligence Approach In A Business Performance Context

    OpenAIRE

    Muntean, Mihaela; Cabau, Liviu Gabriel

    2011-01-01

    Subordinated to performance management, Business Intelligence approaches help firms to optimize business performance. Key performance indicators will be added to the multidimensional model grounding the performance perspectives. With respect to the Business Intelligence value chain, a theoretical approach was introduced and a practical example, based on Microsoft SQL Server specific services, for the customer perspective was implemented.

  16. Accounting Student's Learning Approaches And Impact On Academic Performance

    OpenAIRE

    Ismail, Suhaiza

    2009-01-01

    The objective of the study is threefold. Firstly, the study explores the learning approaches adopted by students in completing their Business Finance. Secondly, it examines the impact that learning approaches have on the student's academic performance. Finally, the study considers gender differences in the learning approaches adopted by students and in the relationship between learning approaches and academic performance. The Approaches and Study Skills Inventory for Students (ASSIST) was used...

  17. Approaching Sentient Building Performance Simulation Systems

    DEFF Research Database (Denmark)

    Negendahl, Kristoffer; Perkov, Thomas; Heller, Alfred

    2014-01-01

    Sentient BPS systems can combine one or more high precision BPS and provide near instantaneous performance feedback directly in the design tool, thus providing speed and precision of building performance in the early design stages. Sentient BPS systems are essentially combining: 1) design tools, 2......) parametric tools, 3) BPS tools, 4) dynamic databases 5) interpolation techniques and 6) prediction techniques as a fast and valid simulation system, in the early design stage....

  18. Data-driven criteria to assess fear remission and phenotypic variability of extinction in rats.

    Science.gov (United States)

    Shumake, Jason; Jones, Carolyn; Auchter, Allison; Monfils, Marie-Hélène

    2018-03-19

    Fear conditioning is widely employed to examine the mechanisms that underlie dysregulations of the fear system. Various manipulations are often used following fear acquisition to attenuate fear memories. In rodent studies, freezing is often the main output measure to quantify 'fear'. Here, we developed data-driven criteria for defining a standard benchmark that indicates remission from conditioned fear and for identifying subgroups with differential treatment responses. These analyses will enable a better understanding of individual differences in treatment responding. This article is part of a discussion meeting issue 'Of mice and mental health: facilitating dialogue between basic and clinical neuroscientists'. © 2018 The Author(s).

  19. Applying Data-driven Imaging Biomarker in Mammography for Breast Cancer Screening: Preliminary Study

    OpenAIRE

    Kim, Eun-Kyung; Kim, Hyo-Eun; Han, Kyunghwa; Kang, Bong Joo; Sohn, Yu-Mee; Woo, Ok Hee; Lee, Chan Wha

    2018-01-01

    We assessed the feasibility of a data-driven imaging biomarker based on weakly supervised learning (DIB; an imaging biomarker derived from large-scale medical image data with deep learning technology) in mammography (DIB-MG). A total of 29,107 digital mammograms from five institutions (4,339 cancer cases and 24,768 normal cases) were included. After matching patients’ age, breast density, and equipment, 1,238 and 1,238 cases were chosen as validation and test sets, respectively, and the remai...

  20. Building Data-Driven Pathways From Routinely Collected Hospital Data: A Case Study on Prostate Cancer

    Science.gov (United States)

    Clark, Jeremy; Cooper, Colin S; Mills, Robert; Rayward-Smith, Victor J; de la Iglesia, Beatriz

    2015-01-01

    Background Routinely collected data in hospitals is complex, typically heterogeneous, and scattered across multiple Hospital Information Systems (HIS). This big data, created as a byproduct of health care activities, has the potential to provide a better understanding of diseases, unearth hidden patterns, and improve services and cost. The extent and uses of such data rely on its quality, which is not consistently checked, nor fully understood. Nevertheless, using routine data for the construction of data-driven clinical pathways, describing processes and trends, is a key topic receiving increasing attention in the literature. Traditional algorithms do not cope well with unstructured processes or data, and do not produce clinically meaningful visualizations. Supporting systems that provide additional information, context, and quality assurance inspection are needed. Objective The objective of the study is to explore how routine hospital data can be used to develop data-driven pathways that describe the journeys that patients take through care, and their potential uses in biomedical research; it proposes a framework for the construction, quality assessment, and visualization of patient pathways for clinical studies and decision support using a case study on prostate cancer. Methods Data pertaining to prostate cancer patients were extracted from a large UK hospital from eight different HIS, validated, and complemented with information from the local cancer registry. Data-driven pathways were built for each of the 1904 patients and an expert knowledge base, containing rules on the prostate cancer biomarker, was used to assess the completeness and utility of the pathways for a specific clinical study. Software components were built to provide meaningful visualizations for the constructed pathways. Results The proposed framework and pathway formalism enable the summarization, visualization, and querying of complex patient-centric clinical information, as well as the

  1. Classification Systems, their Digitization and Consequences for Data-Driven Decision Making

    DEFF Research Database (Denmark)

    Stein, Mari-Klara; Newell, Sue; Galliers, Robert D.

    2013-01-01

    Classification systems are foundational in many standardized software tools. This digitization of classification systems gives them a new ‘materiality’ that, jointly with the social practices of information producers/consumers, has significant consequences on the representational quality of such ...... and the foundational role of representational quality in understanding the success and consequences of data-driven decision-making.......-narration and meta-narration), and three different information production/consumption situations. We contribute to the relational theorization of representational quality and extend classification systems research by drawing explicit attention to the importance of ‘materialization’ of classification systems...

  2. Efficacy of a Template Creation Approach for Performance Improvement

    Science.gov (United States)

    Lyons, Paul R.

    2011-01-01

    This article presents the training and performance improvement approach, performance templates (P-T), and provides empirical evidence to support the efficacy of P-T. This approach involves a partnership among managers, trainers, and employees in the creation, use, and improvement of guides to affect the performance of critical tasks in the…

  3. Enhanced Portfolio Performance Using a Momentum Approach to Annual Rebalancing

    Directory of Open Access Journals (Sweden)

    Michael D. Mattei

    2018-02-01

    After diversification, periodic portfolio rebalancing has become one of the most widely practiced methods for reducing portfolio risk and enhancing returns. Most of the rebalancing strategies found in the literature are generally regarded as contrarian approaches to rebalancing. A recent article proposed a rebalancing approach that incorporates a momentum approach to rebalancing. The momentum approach had a better risk adjusted return than either the traditional approach or a Buy-and-Hold approach. This article identifies an improvement to the momentum approach and then examines the impact of transactions costs and taxes on the portfolio performance of four active rebalancing approaches.

  4. New data-driven estimation of terrestrial CO2 fluxes in Asia using a standardized database of eddy covariance measurements, remote sensing data, and support vector regression

    Science.gov (United States)

    Ichii, Kazuhito; Ueyama, Masahito; Kondo, Masayuki; Saigusa, Nobuko; Kim, Joon; Alberto, Ma. Carmelita; Ardö, Jonas; Euskirchen, Eugénie S.; Kang, Minseok; Hirano, Takashi; Joiner, Joanna; Kobayashi, Hideki; Marchesini, Luca Belelli; Merbold, Lutz; Miyata, Akira; Saitoh, Taku M.; Takagi, Kentaro; Varlagin, Andrej; Bret-Harte, M. Syndonia; Kitamura, Kenzo; Kosugi, Yoshiko; Kotani, Ayumi; Kumar, Kireet; Li, Sheng-Gong; Machimura, Takashi; Matsuura, Yojiro; Mizoguchi, Yasuko; Ohta, Takeshi; Mukherjee, Sandipan; Yanagi, Yuji; Yasuda, Yukio; Zhang, Yiping; Zhao, Fenghua

    2017-04-01

    The lack of a standardized database of eddy covariance observations has been an obstacle for data-driven estimation of terrestrial CO2 fluxes in Asia. In this study, we developed such a standardized database using 54 sites from various databases by applying consistent postprocessing for data-driven estimation of gross primary productivity (GPP) and net ecosystem CO2 exchange (NEE). Data-driven estimation was conducted by using a machine learning algorithm, support vector regression (SVR), with remote sensing data for the 2000 to 2015 period. Site-level evaluation of the estimated CO2 fluxes shows that although performance varies in different vegetation and climate classifications, GPP and NEE at 8 days are reproduced (e.g., r2 = 0.73 and 0.42 for 8 day GPP and NEE). Evaluation of spatially estimated GPP with Global Ozone Monitoring Experiment 2 sensor-based Sun-induced chlorophyll fluorescence shows that monthly GPP variations at subcontinental scale were reproduced by SVR (r2 = 1.00, 0.94, 0.91, and 0.89 for Siberia, East Asia, South Asia, and Southeast Asia, respectively). Evaluation of spatially estimated NEE with net atmosphere-land CO2 fluxes of Greenhouse Gases Observing Satellite (GOSAT) Level 4A product shows that monthly variations of these data were consistent in Siberia and East Asia; meanwhile, inconsistency was found in South Asia and Southeast Asia. Furthermore, differences in the land CO2 fluxes from SVR-NEE and GOSAT Level 4A were partially explained by accounting for the differences in the definition of land CO2 fluxes. These data-driven estimates can provide a new opportunity to assess CO2 fluxes in Asia and evaluate and constrain terrestrial ecosystem models.
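
    The r2 values quoted in this abstract compare estimated fluxes against observations. A minimal sketch of one way such a score could be computed is given below; the abstract does not state the exact definition, so the squared Pearson correlation is assumed here, and the function name and flux values are made up for illustration.

```python
from statistics import mean

def r_squared(observed, estimated):
    """Squared Pearson correlation between two equal-length sequences.

    Assumption: 'r2' is read as the squared correlation coefficient, not
    as a regression coefficient of determination.
    """
    if len(observed) != len(estimated) or len(observed) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mo, me = mean(observed), mean(estimated)
    cov = sum((o - mo) * (e - me) for o, e in zip(observed, estimated))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_e = sum((e - me) ** 2 for e in estimated)
    if var_o == 0 or var_e == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return cov * cov / (var_o * var_e)

# Made-up 8-day GPP values: the estimate is exactly twice the observation,
# so the two series are perfectly correlated despite the bias.
observed_gpp = [1.0, 2.0, 3.0, 4.0]
estimated_gpp = [2.0, 4.0, 6.0, 8.0]
print(r_squared(observed_gpp, estimated_gpp))
```

    The example also shows why a high r2 alone does not imply an unbiased estimate: the score measures how well variations are reproduced, not whether magnitudes agree.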

  5. A new approach in performing microdiffraction analysis

    International Nuclear Information System (INIS)

    Winter, D.J.; Squires, B.A.

    1995-01-01

    Microdiffraction is defined as the x-ray diffraction analysis performed on small samples or small areas of large samples. Since smallness is a relative term, microdiffraction is considered the technique of choice when samples are too small for the optics and precision of conventional instrumentation. The limit on the size of the sample is dependent upon the accuracy of the instrumentation, which is measured by such variables as the diameter of the incident beam and the sphere of confusion of the goniometer (accuracy of the circle centers). If the sample area of interest is part of a multiphase material, it is necessary for the diameter of the incident x-ray beam to be smaller than the sample area in order to assure that the diffraction pattern produced is from the sample area of interest only. Today, microdiffraction is being performed on samples as small as a few microns in diameter. Common applications for microdiffraction include composite materials such as wafers and pads used in the semiconductor industry, inclusions on laser disks and forensic studies. The analysis is often complicated by the fact that the sample areas can be a few grains or even a single crystal. Conventional powder diffractometers are very well suited for analyzing large volumes of polycrystalline material; however, they require much longer counting times when the sample volume is very small. Ideally, what is needed is the optics of a single crystal diffractometer with the performance of a conventional powder diffractometer. 6 figs

  6. Optofluidic Approaches for Enhanced Microsensor Performances

    Directory of Open Access Journals (Sweden)

    Genni Testa

    2014-12-01

    Optofluidics is a relatively young research field able to create a tight synergy between optics and micro/nano-fluidics. The high level of integration between fluidic and optical elements achievable by means of optofluidic approaches makes it possible to realize an innovative class of sensors, which have been demonstrated to have improved sensitivity, adaptability and compactness. Many developments in this field have been made in recent years thanks to the availability of a new class of low-cost materials and new technologies. This review describes the Italian state of the art on optofluidic devices for sensing applications and offers a perspective for further advances. We introduce the optofluidic concept and describe the advantages of merging photonic and fluidic elements, focusing on sensor developments for both environmental and biomedical monitoring.

  7. General Purpose Data-Driven Online System Health Monitoring with Applications to Space Operations

    Science.gov (United States)

    Iverson, David L.; Spirkovska, Lilly; Schwabacher, Mark

    2010-01-01

    Modern space transportation and ground support system designs are becoming increasingly sophisticated and complex. Determining the health state of these systems using traditional parameter limit checking, or model-based or rule-based methods is becoming more difficult as the number of sensors and component interactions grows. Data-driven monitoring techniques have been developed to address these issues by analyzing system operations data to automatically characterize normal system behavior. System health can be monitored by comparing real-time operating data with these nominal characterizations, providing detection of anomalous data signatures indicative of system faults, failures, or precursors of significant failures. The Inductive Monitoring System (IMS) is a general purpose, data-driven system health monitoring software tool that has been successfully applied to several aerospace applications and is under evaluation for anomaly detection in vehicle and ground equipment for next generation launch systems. After an introduction to IMS application development, we discuss these NASA online monitoring applications, including the integration of IMS with complementary model-based and rule-based methods. Although the examples presented in this paper are from space operations applications, IMS is a general-purpose health-monitoring tool that is also applicable to power generation and transmission system monitoring.
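
    The idea of characterizing nominal behaviour from archived operations data and then scoring live data against it can be sketched as follows. This is a deliberately simplified illustration and not the IMS algorithm itself: it assumes a single per-sensor min/max envelope, and the names `learn_envelope` and `deviation`, as well as the sample readings, are hypothetical.

```python
import math


def learn_envelope(nominal_rows):
    """Learn per-sensor (low, high) bounds from nominal training vectors."""
    columns = list(zip(*nominal_rows))
    return [(min(col), max(col)) for col in columns]


def deviation(envelope, row):
    """Euclidean distance from a sensor vector to the nominal envelope.

    Returns 0.0 when every reading lies inside its learned bounds.
    """
    total = 0.0
    for (low, high), value in zip(envelope, row):
        if value < low:
            total += (low - value) ** 2
        elif value > high:
            total += (value - high) ** 2
    return math.sqrt(total)


# Hypothetical nominal data: (pressure, temperature) readings.
nominal = [(10.0, 50.0), (11.0, 52.0), (10.5, 51.0)]
envelope = learn_envelope(nominal)

inside = deviation(envelope, (10.2, 51.5))   # within the learned bounds
outside = deviation(envelope, (14.0, 56.0))  # outside on both sensors
```

    A nonzero deviation would be treated as a candidate anomaly; how large it must be before an alert is raised is an application-specific choice.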

  8. Data-driven CT protocol review and management—experience from a large academic hospital.

    Science.gov (United States)

    Zhang, Da; Savage, Cristy A; Li, Xinhua; Liu, Bob

    2015-03-01

    Protocol review plays a critical role in CT quality assurance, but large numbers of protocols and inconsistent protocol names on scanners and in exam records make thorough protocol review formidable. In this investigation, we report on a data-driven cataloging process that can be used to assist in the reviewing and management of CT protocols. We collected lists of scanner protocols, as well as 18 months of recent exam records, for 10 clinical scanners. We developed computer algorithms to automatically deconstruct the protocol names on the scanner and in the exam records into core names and descriptive components. Based on the core names, we were able to group the scanner protocols into a much smaller set of "core protocols," and to easily link exam records with the scanner protocols. We calculated the percentage of usage for each core protocol, from which the most heavily used protocols were identified. From the percentage-of-usage data, we found that, on average, 18, 33, and 49 core protocols per scanner covered 80%, 90%, and 95%, respectively, of all exams. These numbers are one order of magnitude smaller than the typical numbers of protocols that are loaded on a scanner (200-300, as reported in the literature). Duplicated, outdated, and rarely used protocols on the scanners were easily pinpointed in the cataloging process. The data-driven cataloging process can facilitate the task of protocol review. Copyright © 2015 American College of Radiology. Published by Elsevier Inc. All rights reserved.
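
    The percentage-of-usage analysis described above can be sketched in a few lines. This is a minimal sketch that assumes exam records have already been reduced to core protocol names; the function `protocols_needed` and the sample counts are hypothetical and are not taken from the study.

```python
from collections import Counter


def protocols_needed(exam_core_names, percent):
    """Return how many of the most-used core protocols cover `percent` % of exams."""
    counts = Counter(exam_core_names)
    total = sum(counts.values())
    covered = 0
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        # Integer comparison avoids floating-point rounding at the boundary.
        if covered * 100 >= percent * total:
            return rank
    return len(counts)


# Hypothetical exam records for one scanner.
exams = ["head"] * 50 + ["chest"] * 30 + ["abdomen"] * 15 + ["spine"] * 5
needed = {p: protocols_needed(exams, p) for p in (80, 90, 95)}
```

    Protocols that fall outside the highest coverage level are natural candidates for review as rarely used or outdated entries.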

  9. Data-driven directions for effective footwear provision for the high-risk diabetic foot.

    Science.gov (United States)

    Arts, M L J; de Haart, M; Waaijman, R; Dahmen, R; Berendsen, H; Nollet, F; Bus, S A

    2015-06-01

    Custom-made footwear is used to offload the diabetic foot to prevent plantar foot ulcers. This prospective study evaluates the offloading effects of modifying custom-made footwear and aims to provide data-driven directions for the provision of effectively offloading footwear in clinical practice. Eighty-five people with diabetic neuropathy and a recently healed plantar foot ulcer, who participated in a clinical trial on footwear effectiveness, had their custom-made footwear evaluated with in-shoe plantar pressure measurements at three-monthly intervals. Footwear was modified when peak pressure was ≥ 200 kPa. The effect of single and combined footwear modifications on in-shoe peak pressure at these high-pressure target locations was assessed. All footwear modifications significantly reduced peak pressure at the target locations compared with pre-modification levels (range -6.7% to -24.0%, P diabetic neuropathy and a recently healed plantar foot ulcer, significant offloading can be achieved at high-risk foot regions by modifying custom-made footwear. These results provide data-driven directions for the design and evaluation of custom-made footwear for high-risk people with diabetes, and essentially mean that each shoe prescribed should incorporate those design features that effectively offload the foot. © 2015 The Authors. Diabetic Medicine © 2015 Diabetes UK.

  10. Microenvironment temperature prediction between body and seat interface using autoregressive data-driven model.

    Science.gov (United States)

    Liu, Zhuofu; Wang, Lin; Luo, Zhongming; Heusch, Andrew I; Cascioli, Vincenzo; McCarthy, Peter W

    2015-11-01

    There is a need to develop a greater understanding of temperature at the skin-seat interface during prolonged seating from the perspectives of both industrial design (comfort/discomfort) and medical care (skin ulcer formation). Here we test the concept of predicting temperature at the seat surface and skin interface during prolonged sitting (such as that required of wheelchair users). As caregivers are usually busy, such a method would give them warning ahead of a problem. This paper describes a data-driven model capable of predicting thermal changes and thus having the potential to provide an early warning (15- to 25-min ahead prediction) of an impending temperature that may increase the risk of skin damage for those subject to enforced sitting and who have little or no sensory feedback from this area. Initially, the oscillations of the original signal are suppressed using the reconstruction strategy of empirical mode decomposition (EMD). Consequently, the autoregressive data-driven model can be used to predict future thermal trends based on a shorter period of acquisition, which reduces the possibility of introducing human errors and artefacts associated with longer duration "enforced" sitting by volunteers. In this study, the method had a maximum predictive error of body insensitivity and disability requiring them to be immobile in seats for prolonged periods. Copyright © 2015 Tissue Viability Society. Published by Elsevier Ltd. All rights reserved.
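
    The autoregressive prediction step can be illustrated with a first-order model. This is a minimal sketch under stated assumptions: the model order used in the study is not given in the abstract, the EMD smoothing step is omitted, and `fit_ar1`, `forecast` and the temperature series are hypothetical.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    x_prev = series[:-1]
    x_next = series[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    cov = sum((a - mean_prev) * (b - mean_next) for a, b in zip(x_prev, x_next))
    var = sum((a - mean_prev) ** 2 for a in x_prev)
    phi = cov / var
    c = mean_next - phi * mean_prev
    return c, phi


def forecast(series, steps):
    """Iterate the fitted AR(1) model `steps` samples past the end of the series."""
    c, phi = fit_ar1(series)
    value = series[-1]
    predictions = []
    for _ in range(steps):
        value = c + phi * value
        predictions.append(value)
    return predictions


# Hypothetical interface temperatures (deg C) rising towards a plateau of 36.
series = [28.0]
for _ in range(7):
    series.append(9.0 + 0.75 * series[-1])

c, phi = fit_ar1(series)
ahead = forecast(series, 3)
```

    With one sample per five minutes, for example, three steps ahead would correspond to a 15-min warning horizon; the sampling interval here is an assumption.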

  11. Practical options for selecting data-driven or physics-based prognostics algorithms with reviews

    International Nuclear Information System (INIS)

    An, Dawn; Kim, Nam H.; Choi, Joo-Ho

    2015-01-01

    This paper provides practical options for prognostics so that beginners can select appropriate methods for their fields of application. To achieve this goal, several popular algorithms are first reviewed in the data-driven and physics-based prognostics methods. Each algorithm’s attributes and pros and cons are analyzed in terms of model definition, model parameter estimation and ability to handle noise and bias in data. Fatigue crack growth examples are then used to illustrate the characteristics of different algorithms. In order to suggest a suitable algorithm, several studies are made based on the number of data sets, the level of noise and bias, availability of loading and physical models, and complexity of the damage growth behavior. Based on the study, it is concluded that the Gaussian process is easy and fast to implement, but works well only when the covariance function is properly defined. The neural network has the advantage in the case of large noise and complex models but only with many training data sets. The particle filter and Bayesian method are superior to the former methods because they are less affected by noise and model complexity, but work only when physical model and loading conditions are available. - Highlights: • A practical review of data-driven and physics-based prognostics is provided. • As common prognostics algorithms, NN, GP, PF and BM are introduced. • Algorithms’ attributes, pros and cons, and applicable conditions are discussed. • This will be helpful to choose the best algorithm for different applications.

  12. Shaping the manufacturing industry performance: MIDAS approach

    International Nuclear Information System (INIS)

    Turhan, Ibrahim M.; Sensoy, Ahmet; Hacihasanoglu, Erk

    2015-01-01

    We aim to find out whether the exchange rate (against the US dollar) or the interest rate (in local currency) is a better variable in predicting the capacity utilization rate of manufacturing industry (CUR) of Turkey after the 2008 global financial crisis. To that end, we implement a dynamic mixed data sampling (MIDAS) regression model to forecast monthly changes in CUR by using daily changes in the exchange rate and the interest rate separately. The results show that the exchange rate has a better forecast performance, suggesting that it is a stronger determinant in shaping the manufacturing industry.
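
    A MIDAS regression relates a low-frequency target to a weighted sum of high-frequency lags. The sketch below shows one common weighting scheme, the normalized exponential Almon lag; the abstract does not state which scheme the authors used, so this choice, the function names and the sample data are assumptions.

```python
import math


def exp_almon_weights(n_lags, theta1, theta2):
    """Normalized exponential Almon lag weights for lags 1..n_lags."""
    raw = [math.exp(theta1 * k + theta2 * k * k) for k in range(1, n_lags + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def midas_regressor(daily_changes, weights):
    """Collapse the most recent daily observations into one monthly regressor.

    daily_changes is ordered oldest to newest; weights[0] applies to the newest day.
    """
    recent = daily_changes[-len(weights):][::-1]
    return sum(w * x for w, x in zip(weights, recent))


# Hypothetical daily exchange-rate changes (oldest first).
daily = [0.1, -0.2, 0.3, 0.0, 0.2]

declining = exp_almon_weights(5, 0.0, -0.1)  # recent days weigh more
flat = exp_almon_weights(5, 0.0, 0.0)        # reduces to a simple average

x_declining = midas_regressor(daily, declining)
x_flat = midas_regressor(daily, flat)
```

    The monthly change in CUR would then be regressed on this aggregate, with the two shape parameters estimated jointly with the regression coefficients, typically by nonlinear least squares (not shown).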

  13. An applicable approach for performance auditing in ERP

    Directory of Open Access Journals (Sweden)

    Wan Jian Guo

    2016-01-01

    This paper addresses the practical problem of performance auditing in an ERP environment. Traditional performance auditing methods and existing approaches for performance evaluation of ERP implementation do not work well, because they are either difficult to apply or contain subjective elements. This paper proposes an applicable performance auditing approach for SAP ERP based on quantitative analysis. This approach consists of three parts: system utilization, data quality and the effectiveness of system control. For each part, we describe the main process for conducting the operation, especially how to calculate the online settlement rate of the SAP system. This approach has played an important role in practical auditing work. A practical case is provided at the end of this paper to demonstrate the effectiveness of this approach. Implementation of this approach also has some significance for the performance auditing of other ERP products.

  14. Meta-control of combustion performance with a data mining approach

    Science.gov (United States)

    Song, Zhe

    A large-scale combustion process is complex and poses challenges for optimizing its performance. Traditional approaches based on thermodynamics have limitations in finding optimal operational regions due to the time-shift nature of the process. Recent advances in information technology enable people to collect large volumes of process data easily and continuously. The collected process data contains rich information about the process and, to some extent, represents a digital copy of the process over time. Although large volumes of data exist in industrial combustion processes, they are not fully utilized to the level where the process can be optimized. Data mining is an emerging science which finds patterns or models from large data sets. It has found many successful applications in the business marketing, medical and manufacturing domains. The focus of this dissertation is on applying data mining to industrial combustion processes, and ultimately optimizing the combustion performance. However, the philosophy, methods and frameworks discussed in this research can also be applied to other industrial processes. Optimizing an industrial combustion process has two major challenges. One is that the underlying process model changes over time and obtaining an accurate process model is nontrivial. The other is that a process model with high fidelity is usually highly nonlinear, so solving the optimization problem needs efficient heuristics. This dissertation sets out to address these two major challenges. The major contribution of this 4-year research is the data-driven solution to optimize the combustion process, where process model or knowledge is identified based on the process data, then optimization is executed by evolutionary algorithms to search for optimal operating regions.

  15. Data-driven simultaneous fault diagnosis for solid oxide fuel cell system using multi-label pattern identification

    Science.gov (United States)

    Li, Shuanghong; Cao, Hongliang; Yang, Yupu

    2018-02-01

    Fault diagnosis is a key process for the reliability and safety of solid oxide fuel cell (SOFC) systems. However, it is difficult to rapidly and accurately identify faults for complicated SOFC systems, especially when simultaneous faults appear. In this research, a data-driven Multi-Label (ML) pattern identification approach is proposed to address the simultaneous fault diagnosis of SOFC systems. The framework of the simultaneous-fault diagnosis primarily includes two components: feature extraction and an ML-SVM classifier. The simultaneous-fault diagnosis approach can be trained to diagnose simultaneous SOFC faults, such as fuel leakage and air leakage at different positions in the SOFC system, using only simple training data sets that consist of single-fault data, without requiring simultaneous-fault data. The experimental results show that the proposed framework can diagnose simultaneous SOFC system faults with high accuracy while requiring only a small amount of training data and a low computational burden. In addition, Fault Inference Tree Analysis (FITA) is employed to identify the correlations among possible faults and their corresponding symptoms at the system component level.

  16. On the data-driven inference of modulatory networks in climate science: an application to West African rainfall

    Science.gov (United States)

    González, D. L., II; Angus, M. P.; Tetteh, I. K.; Bello, G. A.; Padmanabhan, K.; Pendse, S. V.; Srinivas, S.; Yu, J.; Semazzi, F.; Kumar, V.; Samatova, N. F.

    2015-01-01

    Decades of hypothesis-driven and/or first-principles research have been applied towards the discovery and explanation of the mechanisms that drive climate phenomena, such as western African Sahel summer rainfall variability. Although connections between various climate factors have been theorized, not all of the key relationships are fully understood. We propose a data-driven approach to identify candidate players in this climate system, which can help explain underlying mechanisms and/or even suggest new relationships, to facilitate building a more comprehensive and predictive model of the modulatory relationships influencing a climate phenomenon of interest. We applied coupled heterogeneous association rule mining (CHARM), Lasso multivariate regression, and dynamic Bayesian networks to find relationships within a complex system, and explored means with which to obtain a consensus result from the application of such varied methodologies. Using this fusion of approaches, we identified relationships among climate factors that modulate Sahel rainfall. These relationships fall into two categories: well-known associations from prior climate knowledge, such as the relationship with the El Niño-Southern Oscillation (ENSO) and putative links, such as North Atlantic Oscillation, that invite further research.

  17. Academic Performance: An Approach From Data Mining

    Directory of Open Access Journals (Sweden)

    David L. La Red Martinez

    2012-02-01

    The relatively low percentage of students promoted and regularized (academic success) in the Operating Systems course of the LSI (Bachelor’s Degree in Information Systems) of FaCENA (Faculty of Exact and Natural Sciences and Surveying - Facultad de Ciencias Exactas, Naturales y Agrimensura) of UNNE prompted this work, whose objective is to determine the variables that affect academic performance, considering the final status of the student according to Res. 185/03 CD (scheme for evaluation and promotion): promoted, regular or free. The variables considered are: status of the student, educational level of parents, secondary education, socio-economic level, and others. Data warehouse (DW) and data mining (DM) techniques were used to search for student profiles and identify situations of potential academic success or failure. Classifications were carried out through clustering techniques according to different criteria. Some of the criteria were the following: classification mining according to academic program, according to final status of the student and according to importance given to study, demographic clustering, and Kohonen clustering according to final status of the student. The outputs produced were partition statistics, partition details, cluster details, field details and field frequencies, the overall quality of each process and detailed quality (precision, classification, reliability), confusion matrices, gain/lift charts, trees, node distributions, field importance, field correspondence tables and cluster statistics. Once the profiles of students with low academic performance have been determined, actions aimed at avoiding potential academic failures can be taken. This work aims to provide a brief description of aspects related to the data warehouse that was built and some of the data mining processes developed on it.

  18. Fault Detection for Nonlinear Process With Deterministic Disturbances: A Just-In-Time Learning Based Data Driven Method.

    Science.gov (United States)

    Yin, Shen; Gao, Huijun; Qiu, Jianbin; Kaynak, Okyay

    2017-11-01

    Data-driven fault detection plays an important role in industrial systems due to its applicability in cases where physical models are unknown. In fault detection, disturbances must be taken into account as an inherent characteristic of processes. Nevertheless, fault detection for nonlinear processes with deterministic disturbances still receives little attention, especially in the data-driven field. To solve this problem, a just-in-time learning-based data-driven (JITL-DD) fault detection method for nonlinear processes with deterministic disturbances is proposed in this paper. JITL-DD employs a JITL scheme for process description with local model structures to cope with process dynamics and nonlinearity. The proposed method provides a data-driven fault detection solution for nonlinear processes with deterministic disturbances, and offers inherent online adaptation and high fault detection accuracy. Two nonlinear systems, i.e., a numerical example and a sewage treatment process benchmark, are employed to show the effectiveness of the proposed method.
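
    The just-in-time learning idea, building a local model around each new query from its nearest historical samples, can be sketched as below. This is an illustrative simplification and not the JITL-DD method: it assumes scalar inputs and outputs, a local straight-line fit and a fixed residual threshold, and it does not model deterministic disturbances. The names `jitl_predict` and `is_faulty` are hypothetical.

```python
def jitl_predict(history, query, k):
    """Predict y at `query` from a local linear model fitted on the k nearest samples.

    history is a list of (x, y) pairs with scalar x and y.
    """
    nearest = sorted(history, key=lambda p: abs(p[0] - query))[:k]
    n = len(nearest)
    mean_x = sum(x for x, _ in nearest) / n
    mean_y = sum(y for _, y in nearest) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in nearest)
    if sxx == 0.0:
        return mean_y  # all neighbours share one x: fall back to their mean
    slope = sum((x - mean_x) * (y - mean_y) for x, y in nearest) / sxx
    return mean_y + slope * (query - mean_x)


def is_faulty(history, x, y, k, threshold):
    """Flag a sample whose residual against the local model exceeds the threshold."""
    return abs(y - jitl_predict(history, x, k)) > threshold


# Hypothetical fault-free history of a nonlinear process, y = x**2.
history = [(i / 10, (i / 10) ** 2) for i in range(21)]

prediction = jitl_predict(history, 1.05, k=4)
normal_flag = is_faulty(history, 1.05, 1.10, k=4, threshold=0.1)
fault_flag = is_faulty(history, 1.05, 1.60, k=4, threshold=0.1)
```

    Because the local model is rebuilt for every query, adding new fault-free samples to `history` adapts the detector online without retraining a global model.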

  19. Reformulated Neural Network (ReNN): a New Alternative for Data-driven Modelling in Hydrology and Water Resources Engineering

    Science.gov (United States)

    Razavi, S.; Tolson, B.; Burn, D.; Seglenieks, F.

    2012-04-01

    Reformulated Neural Network (ReNN) has been recently developed as an efficient and more effective alternative to feedforward multi-layer perceptron (MLP) neural networks [Razavi, S., and Tolson, B. A. (2011). "A new formulation for feedforward neural networks." IEEE Transactions on Neural Networks, 22(10), 1588-1598, DOI: 10.1109/TNN.2011.2163169]. This presentation initially aims to introduce the ReNN to the water resources community and then demonstrates ReNN applications to water resources related problems. ReNN is essentially equivalent to a single-hidden-layer MLP neural network but defined on a new set of network variables which is more effective than the traditional set of network weights and biases. The main features of the new network variables are that they are geometrically interpretable and each variable has a distinct role in forming the network response. ReNN is more efficiently trained as it has a less complex error response surface. In addition to the ReNN training efficiency, the interpretability of the ReNN variables enables the users to monitor and understand the internal behaviour of the network while training. Regularization in the ReNN response can also be directly measured and controlled. This feature improves the generalization ability of the network. The appeal of the ReNN is demonstrated with two ReNN applications to water resources engineering problems. In the first application, the ReNN is used to model the rainfall-runoff relationships in multiple watersheds in the Great Lakes basin located in northeastern North America. Modelling inflows to the Great Lakes is of great importance to the management of the Great Lakes system. Due to the lack of some detailed physical data about existing control structures in many subwatersheds of this huge basin, data-driven modelling approaches such as the ReNN are required to replace predictions from a physically-based rainfall runoff model. 
Unlike traditional MLPs, the ReNN does not necessarily

  20. A comparison of Data Driven models of solving the task of gender identification of author in Russian language texts for cases without and with the gender deception

    Science.gov (United States)

    Sboev, A.; Moloshnikov, I.; Gudovskikh, D.; Rybka, R.

    2017-12-01

    In this work we compare several data-driven approaches to the task of author’s gender identification for texts with or without gender imitation. The data corpus has been specially gathered with crowdsourcing for this task. The best models are a convolutional neural network with input of morphological data (f1-measure: 88% ± 3) for texts without imitation, and a gradient boosting model with a vector of character n-gram frequencies as input data (f1-measure: 64% ± 3) for texts with gender imitation. A method to filter the crowdsourced corpus using a limited reference sample of texts to increase the accuracy of the result is discussed.
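
    For reference, the F1-measure reported above is the harmonic mean of precision and recall. The sketch below computes it for one class; the abstract does not say how the score is averaged over the two gender classes, and the function name and labels are hypothetical.

```python
def f1_score(true_labels, predicted_labels, positive):
    """F1-measure for the class `positive`: harmonic mean of precision and recall."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no correct positive predictions
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold and predicted author genders for six texts.
gold = ["f", "f", "f", "m", "m", "m"]
pred = ["f", "f", "m", "m", "m", "f"]

score_f = f1_score(gold, pred, positive="f")
score_m = f1_score(gold, pred, positive="m")
```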

  1. Data-driven soft sensor design with multiple-rate sampled data

    DEFF Research Database (Denmark)

    Lin, Bao; Recke, Bodil; Knudsen, Jørgen K.H.

    2007-01-01

    Multi-rate systems are common in industrial processes where quality measurements have slower sampling rate than other process variables. Since inter-sample information is desirable for effective quality control, different approaches have been reported to estimate the quality between samples......, including numerical interpolation, polynomial transformation, data lifting and weighted partial least squares (WPLS). Two modifications to the original data lifting approach are proposed in this paper: reformulating the extraction of a fast model as an optimization problem and ensuring the desired model...... properties through Tikhonov Regularization. A comparative investigation of the four approaches is performed in this paper. Their applicability, accuracy and robustness to process noise are evaluated on a single-input single output (SISO) system. The regularized data lifting and WPLS approaches...

  2. Nursing Theory, Terminology, and Big Data: Data-Driven Discovery of Novel Patterns in Archival Randomized Clinical Trial Data.

    Science.gov (United States)

    Monsen, Karen A; Kelechi, Teresa J; McRae, Marion E; Mathiason, Michelle A; Martin, Karen S

    The growth and diversification of nursing theory, nursing terminology, and nursing data enable a convergence of theory- and data-driven discovery in the era of big data research. Existing datasets can be viewed through theoretical and terminology perspectives using visualization techniques in order to reveal new patterns and generate hypotheses. The Omaha System is a standardized terminology and metamodel that makes explicit the theoretical perspective of the nursing discipline and enables terminology-theory testing research. The purpose of this paper is to illustrate the approach by exploring a large research dataset consisting of 95 variables (demographics, temperature measures, anthropometrics, and standardized instruments measuring quality of life and self-efficacy) from a theory-based perspective using the Omaha System. Aims were to (a) examine the Omaha System dataset to understand the sample at baseline relative to Omaha System problem terms and outcome measures, (b) examine relationships within the normalized Omaha System dataset at baseline in predicting adherence, and (c) examine relationships within the normalized Omaha System dataset at baseline in predicting incident venous ulcer. Variables from a randomized clinical trial of a cryotherapy intervention for the prevention of venous ulcers were mapped onto Omaha System terms and measures to derive a theoretical framework for the terminology-theory testing study. The original dataset was recoded using the mapping to create an Omaha System dataset, which was then examined using visualization to generate hypotheses. The hypotheses were tested using standard inferential statistics. Logistic regression was used to predict adherence and incident venous ulcer. 
Findings revealed novel patterns in the psychosocial characteristics of the sample that were discovered to be drivers of both adherence (Mental health Behavior: OR = 1.28, 95% CI [1.02, 1.60]; AUC = .56) and incident venous ulcer (Mental health Behavior

  3. Improved multi-stage neonatal seizure detection using a heuristic classifier and a data-driven post-processor.

    Science.gov (United States)

    Ansari, A H; Cherian, P J; Dereymaeker, A; Matic, V; Jansen, K; De Wispelaere, L; Dielman, C; Vervisch, J; Swarte, R M; Govaert, P; Naulaers, G; De Vos, M; Van Huffel, S

    2016-09-01

    After identifying the most seizure-relevant characteristics by a previously developed heuristic classifier, a data-driven post-processor using a novel set of features is applied to improve the performance. The main characteristics of the outputs of the heuristic algorithm are extracted by five sets of features including synchronization, evolution, retention, segment, and signal features. Then, a support vector machine and a decision making layer remove the falsely detected segments. Four datasets including 71 neonates (1023 h, 3493 seizures) recorded in two different university hospitals are used to train and test the algorithm without removing the dubious seizures. The heuristic method resulted in a false alarm rate of 3.81 per hour and good detection rate of 88% on the entire test databases. The post-processor effectively reduces the false alarm rate by 34% while the good detection rate decreases by 2%. This post-processing technique improves the performance of the heuristic algorithm. The structure of this post-processor is generic, improves our understanding of the core visually determined EEG features of neonatal seizures and is applicable for other neonatal seizure detectors. The post-processor significantly decreases the false alarm rate at the expense of a small reduction of the good detection rate. Copyright © 2016 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.

  4. Data-driven techniques to estimate parameters in a rate-dependent ferromagnetic hysteresis model

    International Nuclear Information System (INIS)

    Hu Zhengzheng; Smith, Ralph C.; Ernstberger, Jon M.

    2012-01-01

    The quantification of rate-dependent ferromagnetic hysteresis is important in a range of applications including high speed milling using Terfenol-D actuators. There exist a variety of frameworks for characterizing rate-dependent hysteresis including the magnetic model in Ref. , the homogenized energy framework, Preisach formulations that accommodate after-effects, and Prandtl-Ishlinskii models. A critical issue when using any of these models to characterize physical devices concerns the efficient estimation of model parameters through least squares data fits. A crux of this issue is the determination of initial parameter estimates based on easily measured attributes of the data. In this paper, we present data-driven techniques to efficiently and robustly estimate parameters in the homogenized energy model. This framework was chosen due to its physical basis and its applicability to ferroelectric, ferromagnetic and ferroelastic materials.

  5. Sensor fault analysis using decision theory and data-driven modeling of pressurized water reactor subsystems

    International Nuclear Information System (INIS)

    Upadhyaya, B.R.; Skorska, M.

    1984-01-01

    Instrument fault detection and estimation is important for process surveillance, control, and safety functions of a power plant. The method incorporates the dual-hypotheses decision procedure and system characterization using data-driven time-domain models of signals representing the system. The multivariate models can be developed on-line and can be adapted to changing system conditions. For the method to be effective, specific subsystems of pressurized water reactors were considered, and signal selection was made such that a strong causal relationship exists among the measured variables. The technique is applied to the reactor core subsystem of the loss-of-fluid test reactor using in-core neutron detector and core-exit thermocouple signals. Thermocouple anomalies such as bias error, noise error, and slow drift in the sensor are detected and estimated using appropriate measurement models

  6. Data-driven process decomposition and robust online distributed modelling for large-scale processes

    Science.gov (United States)

    Shu, Zhang; Lijuan, Li; Lijuan, Yao; Shipin, Yang; Tao, Zou

    2018-02-01

    With increasing attention to networked control, system decomposition and distributed models are of significant importance in implementing model-based control strategies. In this paper, a data-driven system decomposition and online distributed subsystem modelling algorithm is proposed for large-scale chemical processes. The key controlled variables are first partitioned into several clusters by the affinity propagation clustering algorithm. Each cluster can be regarded as a subsystem. The inputs of each subsystem are then selected by offline canonical correlation analysis between all process variables and the subsystem's controlled variables. Process decomposition is realised once the input and output variables have been screened. When the system decomposition is finished, online subsystem modelling can be carried out by recursively renewing the samples block-wise. The proposed algorithm was applied to the Tennessee Eastman process and its validity was verified.

  7. Beyond Crowd Judgments: Data-driven Estimation of Market Value in Association Football

    DEFF Research Database (Denmark)

    Müller, Oliver; Simons, Alexander; Weinmann, Markus

    2017-01-01

    Association football is a popular sport, but it is also a big business. From a managerial perspective, the most important decisions that team managers make concern player transfers, so issues related to player valuation, especially the determination of transfer fees and market values, are of major concern. Market values can be understood as estimates of transfer fees, that is, prices that could be paid for a player on the football market, so they play an important role in transfer negotiations. These values have traditionally been estimated by football experts, but crowdsourcing has emerged [...]’ market values using multilevel regression analysis. The regression results suggest that data-driven estimates of market value can overcome several of the crowd's practical limitations while producing comparably accurate numbers. Our results have important implications for football managers and scouts...

  8. Sequencing quality assessment tools to enable data-driven informatics for high throughput genomics

    Directory of Open Access Journals (Sweden)

    Richard Mark Leggett

    2013-12-01

    The processes of quality assessment and control are an active area of research at The Genome Analysis Centre (TGAC). Unlike other sequencing centres that often concentrate on a certain species or technology, TGAC applies expertise in genomics and bioinformatics to a wide range of projects, often requiring bespoke wet lab and in silico workflows. TGAC is fortunate to have access to a diverse range of sequencing and analysis platforms, and we are at the forefront of investigations into library quality and sequence data assessment. We have developed and implemented a number of algorithms, tools, pipelines and packages to ascertain, store, and expose quality metrics across a number of next-generation sequencing platforms, allowing rapid and in-depth cross-platform QC bioinformatics. In this review, we describe these tools as a vehicle for data-driven informatics, offering the potential to provide richer context for downstream analysis and to inform experimental design.

  9. USACM Thematic Workshop On Uncertainty Quantification And Data-Driven Modeling.

    Energy Technology Data Exchange (ETDEWEB)

    Stewart, James R. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2017-05-01

    The USACM Thematic Workshop on Uncertainty Quantification and Data-Driven Modeling was held on March 23-24, 2017, in Austin, TX. The organizers of the technical program were James R. Stewart of Sandia National Laboratories and Krishna Garikipati of University of Michigan. The administrative organizer was Ruth Hengst, who serves as Program Coordinator for the USACM. The organization of this workshop was coordinated through the USACM Technical Thrust Area on Uncertainty Quantification and Probabilistic Analysis. The workshop website (http://uqpm2017.usacm.org) includes the presentation agenda as well as links to several of the presentation slides (permission to access the presentations was granted by each of those speakers, respectively). Herein, this final report contains the complete workshop program that includes the presentation agenda, the presentation abstracts, and the list of posters.

  10. Modeling and Predicting Carbon and Water Fluxes Using Data-Driven Techniques in a Forest Ecosystem

    Directory of Open Access Journals (Sweden)

    Xianming Dou

    2017-12-01

    Accurate estimation of carbon and water fluxes of forest ecosystems is of particular importance for addressing the problems originating from global environmental change, and providing helpful information about carbon and water content for analyzing and diagnosing past and future climate change. The main focus of the current work was to investigate the feasibility of four comparatively new methods, including generalized regression neural network, group method of data handling (GMDH), extreme learning machine and adaptive neuro-fuzzy inference system (ANFIS), for elucidating the carbon and water fluxes in a forest ecosystem. A comparison was made between these models and two widely used data-driven models, artificial neural network (ANN) and support vector machine (SVM). All the models were evaluated based on the following statistical indices: coefficient of determination, Nash-Sutcliffe efficiency, root mean square error and mean absolute error. Results indicated that the data-driven models are capable of accounting for most variance in each flux with the limited meteorological variables. The ANN model provided the best estimates for gross primary productivity (GPP) and net ecosystem exchange (NEE), while the ANFIS model achieved the best for ecosystem respiration (R), indicating that no single model was consistently superior to others for the carbon flux prediction. In addition, the GMDH model consistently produced somewhat worse results for all the carbon flux and evapotranspiration (ET) estimations. On the whole, among the carbon and water fluxes, all the models produced similar highly satisfactory accuracy for GPP, R and ET fluxes, and did a reasonable job of reproducing the eddy covariance NEE. Based on these findings, it was concluded that these advanced models are promising alternatives to ANN and SVM for estimating the terrestrial carbon and water fluxes.
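
    The four statistical indices named in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name is hypothetical, and the coefficient of determination is assumed here to be the squared Pearson correlation between observed and predicted fluxes, which is one common convention.

```python
import math


def evaluation_indices(observed, predicted):
    """Return R^2, Nash-Sutcliffe efficiency, RMSE and MAE for two series.

    Assumption: R^2 is the squared Pearson correlation. Both series must
    vary (a constant series would cause a division by zero).
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("series must be non-empty and of equal length")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    errors = [p - o for o, p in zip(observed, predicted)]
    sse = sum(e * e for e in errors)                      # sum of squared errors
    sst = sum((o - mean_o) ** 2 for o in observed)        # spread of observations
    spp = sum((p - mean_p) ** 2 for p in predicted)       # spread of predictions
    cov = sum((o - mean_o) * (p - mean_p)
              for o, p in zip(observed, predicted))
    return {
        "r2": cov * cov / (sst * spp),
        "nse": 1.0 - sse / sst,          # 1 is perfect; 0 matches the mean of observations
        "rmse": math.sqrt(sse / n),
        "mae": sum(abs(e) for e in errors) / n,
    }
```

    A prediction with a constant offset shows why the indices are reported together: the correlation-based R^2 stays at 1 while NSE, RMSE and MAE all register the bias.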

  11. Using Data-Driven and Process Mining Techniques for Identifying and Characterizing Problem Gamblers in New Zealand

    Directory of Open Access Journals (Sweden)

    Suriadi Suriadi

    2016-12-01

    This article uses data-driven techniques combined with established theory in order to analyse gambling behavioural patterns of 91 thousand individuals on a real-world fixed-odds gambling dataset in New Zealand. This research uniquely integrates a mixture of process mining, data mining and confirmatory statistical techniques in order to categorise different sub-groups of gamblers, with the explicit motivation of identifying problem gambling behaviours and reporting on the challenges and lessons learned from our case study. We demonstrate how techniques from various disciplines can be combined in order to gain insight into the behavioural patterns exhibited by different types of gamblers, as well as provide assurances of the correctness of our approach and findings. A highlight of this case study is both the methodology which demonstrates how such a combination of techniques provides a rich set of effective tools to undertake an exploratory and open-ended data analysis project that is guided by the process cube concept, as well as the findings themselves which indicate that the contribution that problem gamblers make to the total volume, expenditure, and revenue is higher than previous studies have maintained.

  12. Extended dynamic mode decomposition with dictionary learning: A data-driven adaptive spectral decomposition of the Koopman operator.

    Science.gov (United States)

    Li, Qianxiao; Dietrich, Felix; Bollt, Erik M; Kevrekidis, Ioannis G

    2017-10-01

    Numerical approximation methods for the Koopman operator have advanced considerably in the last few years. In particular, data-driven approaches such as dynamic mode decomposition (DMD) and its generalization, the extended-DMD (EDMD), are becoming increasingly popular in practical applications. The EDMD improves upon the classical DMD by the inclusion of a flexible choice of dictionary of observables which spans a finite dimensional subspace on which the Koopman operator can be approximated. This enhances the accuracy of the solution reconstruction and broadens the applicability of the Koopman formalism. Although the convergence of the EDMD has been established, applying the method in practice requires a careful choice of the observables to improve convergence with just a finite number of terms. This is especially difficult for high dimensional and highly nonlinear systems. In this paper, we employ ideas from machine learning to improve upon the EDMD method. We develop an iterative approximation algorithm which couples the EDMD with a trainable dictionary represented by an artificial neural network. Using the Duffing oscillator and the Kuramoto-Sivashinsky partial differential equation as examples, we show that our algorithm can effectively and efficiently adapt the trainable dictionary to the problem at hand to achieve good reconstruction accuracy without the need to choose a fixed dictionary a priori. Furthermore, to obtain a given accuracy, we require fewer dictionary terms than EDMD with fixed dictionaries. This alleviates an important shortcoming of the EDMD algorithm and enhances the applicability of the Koopman framework to practical problems.
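
    For orientation, the fixed-dictionary EDMD step that the abstract above takes as its starting point can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the toy map x -> 0.9x and the dictionary {x, x^2} are hypothetical choices, and the trainable neural-network dictionary proposed by the authors is not reproduced here.

```python
import numpy as np


def edmd(X, Y, dictionary):
    """Least-squares Koopman matrix K with Psi(X) @ K ~= Psi(Y).

    X holds sampled states, Y their images one time step later, and
    `dictionary` maps a state to a list of observable values.
    """
    psi_x = np.array([dictionary(x) for x in X], dtype=float)
    psi_y = np.array([dictionary(y) for y in Y], dtype=float)
    K, *_ = np.linalg.lstsq(psi_x, psi_y, rcond=None)
    return K


if __name__ == "__main__":
    # Toy linear map (an assumption for illustration): x -> 0.9 x.
    states = np.linspace(-1.0, 1.0, 21)
    K = edmd(states, 0.9 * states, lambda x: [x, x ** 2])
    print(np.linalg.eigvals(K))
```

    For this toy map the observables x and x^2 are exact Koopman eigenfunctions, so the eigenvalues of K are 0.9 and 0.81. For nonlinear systems the quality of the approximation depends on the dictionary, which is the shortcoming the abstract addresses.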

  13. PubChemQC Project: A Large-Scale First-Principles Electronic Structure Database for Data-Driven Chemistry.

    Science.gov (United States)

    Nakata, Maho; Shimazaki, Tomomi

    2017-06-26

    Large-scale molecular databases play an essential role in the investigation of various subjects such as the development of organic materials, in silico drug design, and data-driven studies with machine learning. We have developed a large-scale quantum chemistry database based on first-principles methods. Our database currently contains the ground-state electronic structures of 3 million molecules based on density functional theory (DFT) at the B3LYP/6-31G* level, and we successively calculated 10 low-lying excited states of over 2 million molecules via time-dependent DFT with the B3LYP functional and the 6-31+G* basis set. To select the molecules calculated in our project, we referred to the PubChem Project, which was used as the source of the molecular structures in short strings using the InChI and SMILES representations. Accordingly, we have named our quantum chemistry database project "PubChemQC" ( http://pubchemqc.riken.jp/ ) and placed it in the public domain. In this paper, we show the fundamental features of the PubChemQC database and discuss the techniques used to construct the data set for large-scale quantum chemistry calculations. We also present a machine learning approach to predict the electronic structure of molecules as an example to demonstrate the suitability of the large-scale quantum chemistry database.

  14. Combining density functional theory calculations, supercomputing, and data-driven methods to design new materials (Conference Presentation)

    Science.gov (United States)

    Jain, Anubhav

    2017-04-01

    Density functional theory (DFT) simulations solve for the electronic structure of materials starting from the Schrödinger equation. Many case studies have now demonstrated that researchers can often use DFT to design new compounds in the computer (e.g., for batteries, catalysts, and hydrogen storage) before synthesis and characterization in the lab. In this talk, I will focus on how DFT calculations can be executed on large supercomputing resources in order to generate very large data sets on new materials for functional applications. First, I will briefly describe the Materials Project, an effort at LBNL that has virtually characterized over 60,000 materials using DFT and has shared the results with over 17,000 registered users. Next, I will talk about how such data can help discover new materials, describing how preliminary computational screening led to the identification and confirmation of a new family of bulk AMX2 thermoelectric compounds with measured zT reaching 0.8. I will outline future plans for how such data-driven methods can be used to better understand the factors that control thermoelectric behavior, e.g., for the rational design of electronic band structures, in ways that are different from conventional approaches.

  15. A Data-Driven Modeling Strategy for Smart Grid Power Quality Coupling Assessment Based on Time Series Pattern Matching

    Directory of Open Access Journals (Sweden)

    Hao Yu

    2018-01-01

    This study introduces a data-driven modeling strategy for smart grid power quality (PQ) coupling assessment based on time series pattern matching to quantify the influence of single and integrated disturbance among nodes in different pollution patterns. Periodic and random PQ patterns are constructed by using multidimensional frequency-domain decomposition for all disturbances. A multidimensional piecewise linear representation based on local extreme points is proposed to extract the pattern features of single and integrated disturbance in consideration of disturbance variation trend and severity. A feature distance of pattern (FDP) is developed to implement pattern matching on univariate PQ time series (UPQTS) and multivariate PQ time series (MPQTS) to quantify the influence of single and integrated disturbance among nodes in the pollution patterns. Case studies on a 14-bus distribution system are performed and analyzed; the accuracy and applicability of the FDP in the smart grid PQ coupling assessment are verified by comparing with other time series pattern matching methods.

  16. Performing Systematic Literature Reviews with Novices: An Iterative Approach

    Science.gov (United States)

    Lavallée, Mathieu; Robillard, Pierre-N.; Mirsalari, Reza

    2014-01-01

    Reviewers performing systematic literature reviews require understanding of the review process and of the knowledge domain. This paper presents an iterative approach for conducting systematic literature reviews that addresses the problems faced by reviewers who are novices in one or both levels of understanding. This approach is derived from…

  17. Abnormal Resting-State Functional Connectivity in Patients with Chronic Fatigue Syndrome: Results of Seed and Data-Driven Analyses.

    Science.gov (United States)

    Gay, Charles W; Robinson, Michael E; Lai, Song; O'Shea, Andrew; Craggs, Jason G; Price, Donald D; Staud, Roland

    2016-02-01

    Although altered resting-state functional connectivity (FC) is a characteristic of many chronic pain conditions, it has not yet been evaluated in patients with chronic fatigue. Our objective was to investigate the association between fatigue and altered resting-state FC in myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS). Thirty-six female subjects, 19 ME/CFS and 17 healthy controls, completed a fatigue inventory before undergoing functional magnetic resonance imaging. Two methods, (1) data driven and (2) model based, were used to estimate and compare the intraregional FC between both groups during the resting state (RS). The first approach using independent component analysis was applied to investigate five RS networks: the default mode network, salience network (SN), left frontoparietal networks (LFPN) and right frontoparietal networks, and the sensory motor network (SMN). The second approach used a priori selected seed regions demonstrating abnormal regional cerebral blood flow (rCBF) in ME/CFS patients at rest. In ME/CFS patients, Method-1 identified decreased intrinsic connectivity among regions within the LFPN. Furthermore, the FC of the left anterior midcingulate with the SMN and the connectivity of the left posterior cingulate cortex with the SN were significantly decreased. For Method-2, five distinct clusters within the right parahippocampus and occipital lobes, demonstrating significant rCBF reductions in ME/CFS patients, were used as seeds. The parahippocampal seed and three occipital lobe seeds showed altered FC with other brain regions. The degree of abnormal connectivity correlated with the level of self-reported fatigue. Our results confirm altered RS FC in patients with ME/CFS, which was significantly correlated with the severity of their chronic fatigue.

  18. A non-linear dimension reduction methodology for generating data-driven stochastic input models

    Science.gov (United States)

    Ganapathysubramanian, Baskar; Zabaras, Nicholas

    2008-06-01

    Stochastic analysis of random heterogeneous media (polycrystalline materials, porous media, functionally graded materials) provides information of significance only if realistic input models of the topology and property variations are used. This paper proposes a framework to construct such input stochastic models for the topology and thermal diffusivity variations in heterogeneous media using a data-driven strategy. Given a set of microstructure realizations (input samples) generated from given statistical information about the medium topology, the framework constructs a reduced-order stochastic representation of the thermal diffusivity. This problem of constructing a low-dimensional stochastic representation of property variations is analogous to the problem of manifold learning and parametric fitting of hyper-surfaces encountered in image processing and psychology. Denote by M the set of microstructures that satisfy the given experimental statistics. A non-linear dimension reduction strategy is utilized to map M to a low-dimensional region, A. We first show that M is a compact manifold embedded in a high-dimensional input space $\mathbb{R}^n$. An isometric mapping F from M to a low-dimensional, compact, connected set $A \subset \mathbb{R}^d$ ($d \ll n$) is constructed. Given only a finite set of samples of the data, the methodology uses arguments from graph theory and differential geometry to construct the isometric transformation F:M→A. Asymptotic convergence of the representation of M by A is shown. This mapping F serves as an accurate, low-dimensional, data-driven representation of the property variations. The reduced-order model of the material topology and thermal diffusivity variations is subsequently used as an input in the solution of stochastic partial differential equations that describe the evolution of dependent variables. A sparse grid collocation strategy (Smolyak algorithm) is utilized to solve these stochastic equations efficiently. We showcase the methodology by constructing low
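
    The idea of building an approximately isometric low-dimensional map from a finite set of samples using a graph can be sketched as below. This is an assumption-laden illustration rather than the authors' algorithm: it follows the well-known Isomap recipe (nearest-neighbour graph, shortest-path distances, classical multidimensional scaling), and the function name and parameters are hypothetical.

```python
import numpy as np


def isometric_embedding(points, n_neighbors, n_components):
    """Isomap-style embedding of samples from a manifold M into R^d."""
    points = np.asarray(points, dtype=float)
    m = len(points)
    # Pairwise Euclidean distances in the high-dimensional input space.
    diff = points[:, None, :] - points[None, :, :]
    euclid = np.sqrt((diff ** 2).sum(axis=-1))
    # Neighbourhood graph: connect each sample to its nearest neighbours
    # (assumes no duplicate samples, so index 0 of the sort is the sample itself).
    graph = np.full((m, m), np.inf)
    np.fill_diagonal(graph, 0.0)
    for i in range(m):
        nearest = np.argsort(euclid[i])[1:n_neighbors + 1]
        graph[i, nearest] = euclid[i, nearest]
        graph[nearest, i] = euclid[i, nearest]
    # Floyd-Warshall shortest paths approximate geodesic distances on M.
    for k in range(m):
        graph = np.minimum(graph, graph[:, [k]] + graph[[k], :])
    if np.isinf(graph).any():
        raise ValueError("graph is disconnected; increase n_neighbors")
    # Classical multidimensional scaling on the geodesic distances.
    centering = np.eye(m) - np.ones((m, m)) / m
    gram = -0.5 * centering @ (graph ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))
```

    Applied to samples along a half circle in the plane, a one-dimensional embedding of this kind unrolls the arc so that distances in the embedding approximate arc length rather than straight-line distance.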

  19. A non-linear dimension reduction methodology for generating data-driven stochastic input models

    International Nuclear Information System (INIS)

    Ganapathysubramanian, Baskar; Zabaras, Nicholas

    2008-01-01

    Stochastic analysis of random heterogeneous media (polycrystalline materials, porous media, functionally graded materials) provides information of significance only if realistic input models of the topology and property variations are used. This paper proposes a framework to construct such input stochastic models for the topology and thermal diffusivity variations in heterogeneous media using a data-driven strategy. Given a set of microstructure realizations (input samples) generated from given statistical information about the medium topology, the framework constructs a reduced-order stochastic representation of the thermal diffusivity. This problem of constructing a low-dimensional stochastic representation of property variations is analogous to the problem of manifold learning and parametric fitting of hyper-surfaces encountered in image processing and psychology. Denote by M the set of microstructures that satisfy the given experimental statistics. A non-linear dimension reduction strategy is utilized to map M to a low-dimensional region, A. We first show that M is a compact manifold embedded in a high-dimensional input space $\mathbb{R}^n$. An isometric mapping F from M to a low-dimensional, compact, connected set $A \subset \mathbb{R}^d$ ($d \ll n$) is constructed. Given only a finite set of samples of the data, the methodology uses arguments from graph theory and differential geometry to construct the isometric transformation F:M→A. Asymptotic convergence of the representation of M by A is shown. This mapping F serves as an accurate, low-dimensional, data-driven representation of the property variations. The reduced-order model of the material topology and thermal diffusivity variations is subsequently used as an input in the solution of stochastic partial differential equations that describe the evolution of dependent variables. A sparse grid collocation strategy (Smolyak algorithm) is utilized to solve these stochastic equations efficiently. We showcase the methodology

  20. Mapping of Agricultural Crops from Single High-Resolution Multispectral Images—Data-Driven Smoothing vs. Parcel-Based Smoothing

    Directory of Open Access Journals (Sweden)

    Asli Ozdarici-Ok

    2015-05-01

    Mapping agricultural crops is an important application of remote sensing. However, in many cases it is based either on hyperspectral imagery or on multitemporal coverage, both of which are difficult to scale up to large-scale deployment at high spatial resolution. In the present paper, we evaluate the possibility of crop classification based on single images from very high-resolution (VHR) satellite sensors. The main objective of this work is to expose the performance difference between state-of-the-art parcel-based smoothing and purely data-driven conditional random field (CRF) smoothing, which is as yet unknown. To fulfill this objective, we perform extensive tests with four different classification methods (Support Vector Machines, Random Forest, Gaussian Mixtures, and Maximum Likelihood) to compute the pixel-wise data term; and we also test two different definitions of the pairwise smoothness term. We have performed a detailed evaluation on different multispectral VHR images (Ikonos, QuickBird, Kompsat-2). The main finding of this study is that pairwise CRF smoothing comes close to the state-of-the-art parcel-based method that requires parcel boundaries (average difference ≈ 2.5%). Our results indicate that a single multispectral (R, G, B, NIR) image is enough to reach satisfactory classification accuracy for six crop classes (corn, pasture, rice, sugar beet, wheat, and tomato) in a Mediterranean climate. Overall, it appears that crop mapping using only one-shot VHR imagery taken at the right time may be a viable alternative, especially since high-resolution multitemporal or hyperspectral coverage as well as parcel boundaries are in practice often not available.

  1. A data-driven and physics-based single-pass retrieval of active-passive microwave covariation and vegetation parameters for the SMAP mission

    Science.gov (United States)

    Entekhabi, D.; Jagdhuber, T.; Das, N. N.; Baur, M.; Link, M.; Piles, M.; Akbar, R.; Konings, A. G.; Mccoll, K. A.; Alemohammad, S. H.; Montzka, C.; Kunstmann, H.

    2016-12-01

    The active-passive soil moisture retrieval algorithm of NASA's SMAP mission depends on robust statistical estimation of active-passive covariation (β) and vegetation structure (Γ) parameters in order to provide reliable global measurements of soil moisture on an intermediate level (9km) compared to the native resolution of the radiometer (36km) and radar (3km) instruments. These parameters apply to the SMAP radiometer-radar combination over the period of record that was cut short with the end of the SMAP radar transmission. They also apply to the current SMAP radiometer and Sentinel 1A/B radar combination for high-resolution surface soil moisture mapping. However, the performance of the statistically-based approach is directly dependent on the selection of a representative time frame in which these parameters can be estimated assuming dynamic soil moisture and stationary soil roughness and vegetation cover. Here, we propose a novel, data-driven and physics-based single-pass retrieval of active-passive microwave covariation and vegetation parameters for the SMAP mission. The algorithm does not depend on time series analyses and can be applied using a minimum of one pair of active-passive acquisitions. The algorithm stems from the physical link between microwave emission and scattering via conservation of energy. The formulation of the emission radiative transfer is combined with the Distorted Born Approximation of radar scattering for vegetated land surfaces. The two formulations are simultaneously solved for the covariation and vegetation structure parameters. Preliminary results from SMAP active-passive observations (April 13th to July 7th 2015) compare well with the time-series statistical approach and confirm the capability of this method to estimate these parameters. Moreover, the method is not restricted to a given frequency (applies to both L-band and C-band combinations for the radar) or incidence angle (all angles and not just the fixed 40° incidence

  2. Linear Motion Systems. A Modular Approach for Improved Straightness Performance

    NARCIS (Netherlands)

    Nijsse, G.J.P.

    2001-01-01

    This thesis deals with straight motion systems. A modular approach has been applied in order to find ways to improve the performance. The main performance parameters that are considered are position accuracy, repeatability and, to a lesser extent, cost. Because of the increasing requirements to

  3. Data-driven classification of bipolar I disorder from longitudinal course of mood.

    Science.gov (United States)

    Cochran, A L; McInnis, M G; Forger, D B

    2016-10-11

    The Diagnostic and Statistical Manual of Mental Disorders (DSM) classification of bipolar disorder defines categories to reflect common understanding of mood symptoms rather than scientific evidence. This work aimed to determine whether bipolar I can be objectively classified from longitudinal mood data and whether resulting classes have clinical associations. Bayesian nonparametric hierarchical models with latent classes and patient-specific models of mood are fit to data from Longitudinal Interval Follow-up Evaluations (LIFE) of bipolar I patients (N=209). Classes are tested for clinical associations. No classes are justified using the time course of DSM-IV mood states. Three classes are justified using the course of subsyndromal mood symptoms. Classes differed in attempted suicides (P=0.017), disability status (P=0.012) and chronicity of affective symptoms (P=0.009). Thus, bipolar I disorder can be objectively classified from mood course, and individuals in the resulting classes share clinical features. Data-driven classification from mood course could be used to enrich sample populations for pharmacological and etiological studies.

  4. Data-driven automatic parking constrained control for four-wheeled mobile vehicles

    Directory of Open Access Journals (Sweden)

    Wenxu Yan

    2016-11-01

    In this article, a novel data-driven constrained control scheme is proposed for automatic parking systems. The design of the proposed scheme only depends on the steering angle and the orientation angle of the car, and it does not involve any model information of the car. Therefore, the proposed scheme-based automatic parking system is applicable to different kinds of cars. In order to further reduce the desired trajectory coordinate tracking errors, a coordinates compensation algorithm is also proposed. In the design procedure of the controller, a novel dynamic anti-windup compensator is used to deal with the magnitude and rate saturations of the automatic parking control input. It is theoretically proven, based on the Lyapunov stability analysis method, that all the signals in the closed-loop system are uniformly ultimately bounded. Finally, a simulation comparison between the proposed scheme with coordinates compensation and a proportional-integral-derivative (PID) control algorithm is given. It is shown that the proposed scheme with coordinates compensation has smaller tracking errors and more rapid responses than the PID scheme.

  5. Forecasting success via early adoptions analysis: A data-driven study.

    Directory of Open Access Journals (Sweden)

    Giulio Rossetti

    Innovations are continuously launched over markets, such as new products over the retail market or new artists over the music scene. Some innovations become a success; others don't. Forecasting which innovations will succeed at the beginning of their lifecycle is hard. In this paper, we provide a data-driven, large-scale account of the existence of a special niche among early adopters, individuals that consistently tend to adopt successful innovations before they reach success: we will call them Hit-Savvy. Hit-Savvy can be discovered in very different markets and retain over time their ability to anticipate the success of innovations. As our second contribution, we devise a predictive analytical process, exploiting Hit-Savvy as signals, which achieves high accuracy in the early-stage prediction of successful innovations, far beyond the reach of state-of-the-art time series forecasting models. Indeed, our findings and predictive model can be fruitfully used to support marketing strategies and product placement.

  6. Applying Data-driven Imaging Biomarker in Mammography for Breast Cancer Screening: Preliminary Study.

    Science.gov (United States)

    Kim, Eun-Kyung; Kim, Hyo-Eun; Han, Kyunghwa; Kang, Bong Joo; Sohn, Yu-Mee; Woo, Ok Hee; Lee, Chan Wha

    2018-02-09

    We assessed the feasibility of a data-driven imaging biomarker based on weakly supervised learning (DIB; an imaging biomarker derived from large-scale medical image data with deep learning technology) in mammography (DIB-MG). A total of 29,107 digital mammograms from five institutions (4,339 cancer cases and 24,768 normal cases) were included. After matching patients' age, breast density, and equipment, 1,238 and 1,238 cases were chosen as validation and test sets, respectively, and the remainder were used for training. The core algorithm of DIB-MG is a deep convolutional neural network, a deep learning algorithm specialized for images. Each sample (case) is an exam composed of 4-view images (RCC, RMLO, LCC, and LMLO). For each case in a training set, the cancer probability inferred from DIB-MG is compared with the per-case ground-truth label. Then the model parameters in DIB-MG are updated based on the error between the prediction and the ground-truth. At the operating point (threshold) of 0.5, sensitivity was 75.6% and 76.1% when specificity was 90.2% and 88.5%, and AUC was 0.903 and 0.906 for the validation and test sets, respectively. This research showed the potential of DIB-MG as a screening tool for breast cancer.
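
    The reported figures (sensitivity and specificity at an operating point, plus AUC) can be computed from per-case cancer probabilities as sketched below. This is an illustration only: the function names are hypothetical, labels are assumed to be 1 for cancer and 0 for normal, and a score equal to the threshold is assumed to count as positive.

```python
def sensitivity_specificity(scores, labels, threshold=0.5):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        positive_call = score >= threshold
        if label == 1:
            tp += positive_call
            fn += not positive_call
        else:
            fp += positive_call
            tn += not positive_call
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, labels):
    """Area under the ROC curve as the fraction of (cancer, normal) pairs
    ranked correctly, counting ties as half (Mann-Whitney formulation)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))
```

    Sensitivity and specificity depend on the chosen threshold, whereas AUC summarises ranking quality across all thresholds, which is why the abstract reports both.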

  7. Data-free and data-driven spectral perturbations for RANS UQ

    Science.gov (United States)

    Edeling, Wouter; Mishra, Aashwin; Iaccarino, Gianluca

    2017-11-01

    Despite recent developments in high-fidelity turbulent flow simulations, RANS modeling is still widely used in industry owing to its inherently low cost. Since accuracy is a concern in RANS modeling, model-form UQ is an essential tool for assessing the impacts of this uncertainty on quantities of interest. Applying the spectral decomposition to the modeled Reynolds-Stress Tensor (RST) allows for the introduction of decoupled perturbations into the baseline intensity (kinetic energy), shape (eigenvalues), and orientation (eigenvectors). This constitutes a natural methodology to evaluate the model-form uncertainty associated with different aspects of RST modeling. In a predictive setting, one frequently encounters an absence of any relevant reference data. To make data-free predictions with quantified uncertainty we employ physical bounds to define maximum spectral perturbations a priori. When propagated, these perturbations yield intervals of engineering utility. High-fidelity data opens up the possibility of inferring a distribution of uncertainty, by means of various data-driven machine-learning techniques. We will demonstrate our framework on a number of flow problems where RANS models are prone to failure. This research was partially supported by the Defense Advanced Research Projects Agency under the Enabling Quantification of Uncertainty in Physical Systems (EQUiPS) project (technical monitor: Dr Fariba Fahroo), and the DOE PSAAP-II program.
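
    The split of the Reynolds-stress tensor into intensity, shape and orientation can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes the common normalisation R = 2k (a + I/3) with a = V Λ Vᵀ, and the function names and the linear interpolation of eigenvalues toward a target state are hypothetical choices.

```python
import numpy as np


def decompose(R):
    """Split a symmetric 3x3 Reynolds-stress tensor into k, eigenvalues, eigenvectors."""
    k = 0.5 * np.trace(R)                      # turbulent kinetic energy (intensity)
    anisotropy = R / (2.0 * k) - np.eye(3) / 3.0
    eigvals, eigvecs = np.linalg.eigh(anisotropy)   # shape and orientation
    return k, eigvals, eigvecs


def recompose(k, eigvals, eigvecs):
    """Rebuild the tensor from (possibly perturbed) intensity, shape, orientation."""
    anisotropy = eigvecs @ np.diag(eigvals) @ eigvecs.T
    return 2.0 * k * (anisotropy + np.eye(3) / 3.0)


def perturb_shape(eigvals, target, delta):
    """Move the eigenvalues a fraction delta in [0, 1] toward a target state.

    The target is an assumption of this sketch, e.g. zeros for the isotropic
    state; np.linalg.eigh returns eigenvalues in ascending order.
    """
    return (1.0 - delta) * np.asarray(eigvals) + delta * np.asarray(target)
```

    Because the anisotropy eigenvalues sum to zero, a shape perturbation of this form leaves the trace of the tensor, and hence the kinetic energy, unchanged; intensity and orientation can be perturbed separately through k and the eigenvector matrix.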

  8. Data-driven modelling of structured populations a practical guide to the integral projection model

    CERN Document Server

    Ellner, Stephen P; Rees, Mark

    2016-01-01

    This book is a “How To” guide for modeling population dynamics using Integral Projection Models (IPM) starting from observational data. It is written by a leading research team in this area and includes code in the R language (in the text and online) to carry out all computations. The intended audience are ecologists, evolutionary biologists, and mathematical biologists interested in developing data-driven models for animal and plant populations. IPMs may seem hard as they involve integrals. The aim of this book is to demystify IPMs, so they become the model of choice for populations structured by size or other continuously varying traits. The book uses real examples of increasing complexity to show how the life-cycle of the study organism naturally leads to the appropriate statistical analysis, which leads directly to the IPM itself. A wide range of model types and analyses are presented, including model construction, computational methods, and the underlying theory, with the more technical material in B...

  9. Lessons learned from a data-driven college access program: The National College Advising Corps.

    Science.gov (United States)

    Horng, Eileen L; Evans, Brent J; Antonio, Anthony L; Foster, Jesse D; Kalamkarian, Hoori S; Hurd, Nicole F; Bettinger, Eric P

    2013-01-01

    This chapter discusses the collaboration between a national college access program, the National College Advising Corps (NCAC), and its research and evaluation team at Stanford University. NCAC is currently active in almost four hundred high schools and, by placing a recent college graduate in the role of college adviser, provides necessary information and support for students who may find it difficult to navigate the complex college admission process. The advisers also conduct outreach to underclassmen in an effort to improve the school-wide college-going culture. Analyses include examination of both quantitative and qualitative data from numerous sources and partners with every level of the organization from the national office to individual high schools. The authors discuss balancing the pursuit of evaluation goals with academic scholarship. In an effort to benefit other programs seeking to form successful data-driven interventions, the authors provide explicit examples of the partnership and present several examples of how the program has benefited from the data gathered by the evaluation team. © WILEY PERIODICALS, INC.

  10. Forecasting success via early adoptions analysis: A data-driven study.

    Science.gov (United States)

    Rossetti, Giulio; Milli, Letizia; Giannotti, Fosca; Pedreschi, Dino

    2017-01-01

    Innovations are continuously launched over markets, such as new products over the retail market or new artists over the music scene. Some innovations become a success; others don't. Forecasting which innovations will succeed at the beginning of their lifecycle is hard. In this paper, we provide a data-driven, large-scale account of the existence of a special niche among early adopters, individuals that consistently tend to adopt successful innovations before they reach success: we will call them Hit-Savvy. Hit-Savvy can be discovered in very different markets and retain over time their ability to anticipate the success of innovations. As our second contribution, we devise a predictive analytical process, exploiting Hit-Savvy as signals, which achieves high accuracy in the early-stage prediction of successful innovations, far beyond the reach of state-of-the-art time series forecasting models. Indeed, our findings and predictive model can be fruitfully used to support marketing strategies and product placement.

  11. Data-Driven Astrochemistry: One Step Further within the Origin of Life Puzzle.

    Science.gov (United States)

    Ruf, Alexander; d'Hendecourt, Louis L S; Schmitt-Kopplin, Philippe

    2018-06-01

    Astrochemistry, meteoritics and chemical analytics represent a manifold scientific field, including various disciplines. In this review, clarifications on astrochemistry, comet chemistry, laboratory astrophysics and meteoritic research with respect to organic and metalorganic chemistry will be given. The seemingly large number of observed astrochemical molecules necessarily requires explanations on molecular complexity and chemical evolution, which will be discussed. Special emphasis should be placed on data-driven analytical methods including ultrahigh-resolving instruments and their interplay with quantum chemical computations. These methods enable remarkable insights into the complex chemical spaces that exist in meteorites and maximize the level of information on the huge astrochemical molecular diversity. In addition, they allow one to study even yet undescribed chemistry as the one involving organomagnesium compounds in meteorites. Both targeted and non-targeted analytical strategies will be explained and may touch upon epistemological problems. In addition, implications of (metal)organic matter toward prebiotic chemistry leading to the emergence of life will be discussed. The precise description of astrochemical organic and metalorganic matter as seeds for life and their interactions within various astrophysical environments may appear essential to further study questions regarding the emergence of life on a most fundamental level that is within the molecular world and its self-organization properties.

  12. Big Data-Driven Based Real-Time Traffic Flow State Identification and Prediction

    Directory of Open Access Journals (Sweden)

    Hua-pu Lu

    2015-01-01

    With the rapid development of urban informatization, the era of big data is coming. To satisfy the demand for early warning of traffic congestion, this paper studies a method of real-time traffic flow state identification and prediction based on big data-driven theory. Traffic big data has several characteristics, such as temporal correlation, spatial correlation, historical correlation, and multistate. Traffic flow state quantification, the basis of traffic flow state identification, is achieved by a SAGA-FCM (simulated annealing genetic algorithm based fuzzy c-means) traffic clustering model. Considering simple calculation and predictive accuracy, a bilevel optimization model for regional traffic flow correlation analysis is established to predict traffic flow parameters based on temporal-spatial-historical correlation. A two-stage model for correction coefficient optimization is put forward to simplify the bilevel optimization model. The first-stage model is built to calculate the number of temporal-spatial-historical correlation variables. The second-stage model is presented to calculate the basic model formulation of regional traffic flow correlation. A case study based on a real-world road network in Beijing, China, is implemented to test the efficiency and applicability of the proposed modeling and computing methods.
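    The fuzzy c-means (FCM) step at the core of the clustering model can be sketched as follows. This is plain FCM on one-dimensional data with the textbook membership and center updates; the simulated-annealing and genetic-algorithm parts of SAGA-FCM are omitted. The function name, the toy speed values and the initial centers are hypothetical.

```python
def fuzzy_c_means_1d(data, initial_centers, m=2.0, iterations=50):
    """Plain fuzzy c-means on 1-D data.

    Returns (centers, memberships), where memberships[j][i] is the degree to
    which data[j] belongs to cluster i, computed from the centers in use at
    the start of the final iteration. `m` > 1 is the fuzzifier.
    """
    centers = list(initial_centers)
    memberships = []
    power = 2.0 / (m - 1.0)
    for _ in range(iterations):
        memberships = []
        for x in data:
            dists = [abs(x - c) for c in centers]
            if 0.0 in dists:
                # Point coincides with one or more centers: split membership among them.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(hits)
                row = [h / total for h in hits]
            else:
                # u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1))
                row = [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]
            memberships.append(row)
        for i in range(len(centers)):
            weights = [row[i] ** m for row in memberships]
            # Weighted mean of the data; assumes the cluster has nonzero total weight.
            centers[i] = sum(w * x for w, x in zip(weights, data)) / sum(weights)
    return centers, memberships


# Toy link speeds in km/h (hypothetical): a congested group and a free-flow group.
speeds = [10.0, 12.0, 11.0, 50.0, 52.0, 51.0]
centers, memberships = fuzzy_c_means_1d(speeds, initial_centers=[0.0, 60.0])
print([round(c, 2) for c in centers])
```

    Each row of `memberships` sums to one, so a sample near a cluster boundary is assigned partial degrees of several traffic states instead of a single hard label.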

  13. Data-driven model-independent searches for long-lived particles at the LHC

    Science.gov (United States)

    Coccaro, Andrea; Curtin, David; Lubatti, H. J.; Russell, Heather; Shelton, Jessie

    2016-12-01

    Neutral long-lived particles (LLPs) are highly motivated by many beyond the Standard Model scenarios, such as theories of supersymmetry, baryogenesis, and neutral naturalness, and present both tremendous discovery opportunities and experimental challenges for the LHC. A major bottleneck for current LLP searches is the prediction of Standard Model backgrounds, which are often impossible to simulate accurately. In this paper, we propose a general strategy for obtaining differential, data-driven background estimates in LLP searches, thereby notably extending the range of LLP masses and lifetimes that can be discovered at the LHC. We focus on LLPs decaying in the ATLAS muon system, where triggers providing both signal and control samples are available at LHC run 2. While many existing searches require two displaced decays, a detailed knowledge of backgrounds will allow for very inclusive searches that require just one detected LLP decay. As we demonstrate for the h → XX signal model of LLP pair production in exotic Higgs decays, this results in dramatic sensitivity improvements for proper lifetimes ≳ 10 m. In theories of neutral naturalness, this extends reach to glueball masses far below the bb̄ threshold. Our strategy readily generalizes to other signal models and other detector subsystems. This framework therefore lends itself to the development of a systematic, model-independent LLP search program, in analogy to the highly successful simplified-model framework of prompt searches.

  14. Clinical review: optimizing enteral nutrition for critically ill patients - a simple data-driven formula

    Science.gov (United States)

    2011-01-01

    In modern critical care, the paradigm of 'therapeutic nutrition' is replacing traditional 'supportive nutrition'. Standard enteral formulas meet basic macro- and micronutrient needs; therapeutic enteral formulas meet these basic needs and also contain specific pharmaconutrients that may attenuate hyperinflammatory responses, enhance the immune responses to infection, or improve gastrointestinal tolerance. Choosing the right enteral feeding formula may positively affect a patient's outcome; targeted use of therapeutic formulas can reduce the incidence of infectious complications, shorten lengths of stay in the ICU and in the hospital, and lower risk for mortality. In this paper, we review principles of how to feed (enteral, parenteral, or both) and when to feed (early versus delayed start) patients who are critically ill. We discuss what to feed these patients in the context of specific pharmaconutrients in specialized feeding formulations, that is, arginine, glutamine, antioxidants, certain ω-3 and ω-6 fatty acids, hydrolyzed proteins, and medium-chain triglycerides. We summarize current expert guidelines for nutrition in patients with critical illness, and we present specific clinical evidence on the use of enteral formulas supplemented with anti-inflammatory or immune-modulating nutrients, and gastrointestinal tolerance-promoting nutritional formulas. Finally, we introduce an algorithm to help bedside clinicians make data-driven feeding decisions for patients with critical illness. PMID:22136305

  15. Design of a data-driven predictive controller for start-up process of AMT vehicles.

    Science.gov (United States)

    Lu, Xiaohui; Chen, Hong; Wang, Ping; Gao, Bingzhao

    2011-12-01

    In this paper, a data-driven predictive controller is designed for the start-up process of vehicles with automated manual transmissions (AMTs). It is obtained directly from the input-output data of a driveline simulation model constructed with the commercial software AMESim. In order to obtain offset-free control for the reference input, the predictor equation is derived using incremental inputs and outputs. Because of the physical characteristics, the input and output constraints are considered explicitly in the problem formulation. The contradictory requirements of lower friction losses and less driveline shock are included in the objective function. The designed controller is tested under nominal conditions and changed conditions. The simulation results show that, during the start-up process, the AMT clutch with the proposed controller works very well, and the process meets the control objectives: fast clutch lockup time, small friction losses, and the preservation of driver comfort, i.e., smooth acceleration of the vehicle. At the same time, the closed-loop system has the ability to reject uncertainties, such as the vehicle mass and road grade.

  16. Cloudweaver: Adaptive and Data-Driven Workload Manager for Generic Clouds

    Science.gov (United States)

    Li, Rui; Chen, Lei; Li, Wen-Syan

    Cloud computing denotes the latest trend in application development for parallel computing on massive data volumes. It relies on clouds of servers to handle tasks that used to be managed by an individual server. With cloud computing, software vendors can provide business intelligence and data analytic services for internet-scale data sets. Many open source projects, such as Hadoop, offer various software components that are essential for building a cloud infrastructure. Current Hadoop (and many others) requires users to configure cloud infrastructures via programs and APIs, and such configuration is fixed during runtime. In this chapter, we propose a workload manager (WLM), called CloudWeaver, which provides automated configuration of a cloud infrastructure for runtime execution. The workload management is data-driven and can adapt to the dynamic nature of operator throughput during different execution phases. CloudWeaver works both for a single job and for a workload consisting of multiple jobs running concurrently, and aims at maximum throughput using a minimum set of processors.

  17. An asynchronous data-driven readout prototype for CEPC vertex detector

    Science.gov (United States)

    Yang, Ping; Sun, Xiangming; Huang, Guangming; Xiao, Le; Gao, Chaosong; Huang, Xing; Zhou, Wei; Ren, Weiping; Li, Yashu; Liu, Jianchao; You, Bihui; Zhang, Li

    2017-12-01

    The Circular Electron Positron Collider (CEPC) is proposed as a Higgs boson and/or Z boson factory for high-precision measurements on the Higgs boson. The precision of secondary vertex impact parameter plays an important role in such measurements which typically rely on flavor-tagging. Thus silicon CMOS Pixel Sensors (CPS) are the most promising technology candidate for a CEPC vertex detector, which can most likely feature a high position resolution, a low power consumption and a fast readout simultaneously. For the R&D of the CEPC vertex detector, we have developed a prototype MIC4 in the Towerjazz 180 nm CMOS Image Sensor (CIS) process. We have proposed and implemented a new architecture of asynchronous zero-suppression data-driven readout inside the matrix combined with a binary front-end inside the pixel. The matrix contains 128 rows and 64 columns with a small pixel pitch of 25 μm. The readout architecture has implemented the traditional OR-gate chain inside a super pixel combined with a priority arbiter tree between the super pixels, only reading out relevant pixels. The MIC4 architecture will be introduced in more detail in this paper. It will be taped out in May and will be characterized when the chip comes back.

  18. Outcomes from the GLEON fellowship program. Training graduate students in data driven network science.

    Science.gov (United States)

    Dugan, H.; Hanson, P. C.; Weathers, K. C.

    2016-12-01

    In the water sciences there is a massive need for graduate students who possess the analytical and technical skills to deal with large datasets and function in the new paradigm of open, collaborative science. The Global Lake Ecological Observatory Network (GLEON) graduate fellowship program (GFP) was developed as an interdisciplinary training program to supplement the intensive disciplinary training of traditional graduate education. The primary goal of the GFP was to train a diverse cohort of graduate students in network science, open-web technologies, collaboration, and data analytics, and importantly to provide the opportunity to use these skills to conduct collaborative research resulting in publishable scientific products. The GFP is run as a series of three week-long workshops over two years that brings together a cohort of twelve students. In addition, fellows are expected to attend and contribute to at least one international GLEON all-hands' meeting. Here, we provide examples of training modules in the GFP (model building, data QA/QC, information management, Bayesian modeling, open coding/version control, national data programs), together with scientific outputs (manuscripts, software products, and new global datasets) produced by the fellows and the process by which this team science was catalyzed. Data-driven education that lets students apply learned skills to real research projects reinforces concepts, provides motivation, and can benefit their publication record. This program design is extendable to other institutions and networks.

  19. Automatic data-driven real-time segmentation and recognition of surgical workflow.

    Science.gov (United States)

    Dergachyova, Olga; Bouget, David; Huaulmé, Arnaud; Morandi, Xavier; Jannin, Pierre

    2016-06-01

    With the intention of extending the perception and action of surgical staff inside the operating room, the medical community has expressed a growing interest towards context-aware systems. Requiring an accurate identification of the surgical workflow, such systems make use of data from a diverse set of available sensors. In this paper, we propose a fully data-driven and real-time method for segmentation and recognition of surgical phases using a combination of video data and instrument usage signals, exploiting no prior knowledge. We also introduce new validation metrics for assessment of workflow detection. The segmentation and recognition are based on a four-stage process. Firstly, during the learning time, a Surgical Process Model is automatically constructed from data annotations to guide the following process. Secondly, data samples are described using a combination of low-level visual cues and instrument information. Then, in the third stage, these descriptions are employed to train a set of AdaBoost classifiers capable of distinguishing one surgical phase from others. Finally, AdaBoost responses are used as input to a Hidden semi-Markov Model in order to obtain a final decision. On the MICCAI EndoVis challenge laparoscopic dataset we achieved a precision and a recall of 91 % in classification of 7 phases. Compared to the analysis based on one data type only, a combination of visual features and instrument signals allows better segmentation, reduction of the detection delay and discovery of the correct phase order.

  20. A transparent and data-driven global tectonic regionalization model for seismic hazard assessment

    Science.gov (United States)

    Chen, Yen-Shin; Weatherill, Graeme; Pagani, Marco; Cotton, Fabrice

    2018-05-01

    A key concept that is common to many assumptions inherent within seismic hazard assessment is that of tectonic similarity. This recognizes that certain regions of the globe may display similar geophysical characteristics, such as in the attenuation of seismic waves, the magnitude scaling properties of seismogenic sources or the seismic coupling of the lithosphere. Previous attempts at tectonic regionalization, particularly within a seismic hazard assessment context, have often been based on expert judgements; in most of these cases, the process for delineating tectonic regions is neither reproducible nor consistent from location to location. In this work, the regionalization process is implemented in a scheme that is reproducible, comprehensible from a geophysical rationale, and revisable when new relevant data are published. A spatial classification-scheme is developed based on fuzzy logic, enabling the quantification of concepts that are approximate rather than precise. Using the proposed methodology, we obtain a transparent and data-driven global tectonic regionalization model for seismic hazard applications as well as the subjective probabilities (e.g. degree of being active/degree of being cratonic) that indicate the degree to which a site belongs in a tectonic category.

  1. Modern data-driven decision support systems: the role of computing with words and computational linguistics

    Science.gov (United States)

    Kacprzyk, Janusz; Zadrożny, Sławomir

    2010-05-01

    We present how the conceptually and numerically simple concept of a fuzzy linguistic database summary can be a very powerful tool for gaining much insight into the very essence of data. The use of linguistic summaries provides tools for the verbalisation of data analysis (mining) results which, in addition to the more commonly used visualisation, e.g. via a graphical user interface, can contribute to an increased human consistency and ease of use, notably for supporting decision makers via the data-driven decision support system paradigm. Two new relevant aspects of the analysis are also outlined which were first initiated by the authors. First, following Kacprzyk and Zadrożny, it is further considered how linguistic data summarisation is closely related to some types of solutions used in natural language generation (NLG). This can make it possible to use more and more effective and efficient tools and techniques developed in NLG. Second, similar remarks are given on relations to systemic functional linguistics. Moreover, following Kacprzyk and Zadrożny, comments are given on an extremely relevant aspect of scalability of linguistic summarisation of data, using a new concept of a conceptual scalability.
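    A fuzzy linguistic summary such as "most employees are young" is commonly scored by a degree of truth obtained by applying the quantifier's membership function to the mean membership of the records in the summarizer. The sketch below follows that standard calculation. The piecewise-linear shapes chosen for "most" and "young", the function names and the toy ages are assumptions made for illustration only.

```python
def mu_most(proportion):
    """Membership of a proportion in the fuzzy quantifier 'most' (assumed shape)."""
    if proportion >= 0.8:
        return 1.0
    if proportion <= 0.3:
        return 0.0
    return 2.0 * proportion - 0.6  # linear ramp between 0.3 and 0.8


def mu_young(age):
    """Membership of an age in the fuzzy set 'young' (assumed shape)."""
    if age <= 25:
        return 1.0
    if age >= 40:
        return 0.0
    return (40.0 - age) / 15.0  # linear ramp between 25 and 40


def truth_of_summary(records, summarizer, quantifier):
    """Degree of truth of 'Q records are S': quantifier(mean summarizer membership)."""
    mean_membership = sum(summarizer(r) for r in records) / len(records)
    return quantifier(mean_membership)


# Toy ages (hypothetical).
ages = [22, 24, 30, 45, 23]
print(round(truth_of_summary(ages, mu_young, mu_most), 4))
```

    The result is a number in [0, 1] that can be verbalised directly, for example as "it is largely true that most employees are young".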

  2. Data Driven - Android based displays on data acquisition and system status

    CERN Document Server

    Canilho, Paulo

    2014-01-01

    For years, both hardware and software engineers have struggled to acquire device information in a flexible and fast way. Numerous devices cannot have their status tested quickly because of the time needed to travel to a computer terminal. For instance, to test a scintillator's status, one has to inject beam into the device and quickly return to a terminal to see the results; this is not only time-consuming but extremely inconvenient for the person responsible, and it takes time that could be spent on more pressing matters. With this in mind, an interface was proposed to bring a stable, flexible, user-friendly and data-driven solution to this problem. As the most common operating system for mobile displays, Android proved to be the most cost-efficient choice, since it is based on open source software, and the easiest to implement, since its backend development relies on Java calls and XML for visual representation...

  3. Data-driven strategies for robust forecast of continuous glucose monitoring time-series.

    Science.gov (United States)

    Fiorini, Samuele; Martini, Chiara; Malpassi, Davide; Cordera, Renzo; Maggi, Davide; Verri, Alessandro; Barla, Annalisa

    2017-07-01

    Over the past decade, continuous glucose monitoring (CGM) has proven to be a very resourceful tool for diabetes management. To date, CGM devices are employed for both retrospective and online applications. Their use allows a better description of the patients' pathology as well as better control of patients' glycemia levels. The analysis of CGM sensor data makes it possible to observe a wide range of metrics, such as the glycemic variability during the day or the amount of time spent below or above certain glycemic thresholds. However, due to the high variability of the glycemic signals among sensors and individuals, CGM data analysis is a non-trivial task. Standard signal filtering solutions fall short when an appropriate model personalization is not applied. State-of-the-art data-driven strategies for online CGM forecasting rely upon the use of recursive filters. Each time a new sample is collected, such models need to adjust their parameters in order to predict the next glycemic level. In this paper we aim to demonstrate that the problem of online CGM forecasting can be successfully tackled by personalized machine learning models that do not need to recursively update their parameters.
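    The metrics mentioned above (time spent below or above glycemic thresholds, and glycemic variability) can be computed from a series of CGM readings as in the following sketch. The thresholds of 70 and 180 mg/dL are commonly used clinical targets and are an assumption here, as are the function name and the toy readings; evenly spaced samples are assumed, so the fraction of samples stands in for the fraction of time.

```python
import statistics


def cgm_summary(readings_mg_dl, low=70.0, high=180.0):
    """Summarise evenly sampled CGM readings.

    Returns the fraction of samples below `low`, above `high` and in range,
    plus the coefficient of variation (percent) as a simple variability measure.
    """
    n = len(readings_mg_dl)
    below = sum(1 for g in readings_mg_dl if g < low) / n
    above = sum(1 for g in readings_mg_dl if g > high) / n
    mean = statistics.fmean(readings_mg_dl)
    cv_percent = 100.0 * statistics.pstdev(readings_mg_dl) / mean
    return {
        "time_below": below,
        "time_above": above,
        "time_in_range": 1.0 - below - above,
        "cv_percent": cv_percent,
    }


# Toy readings in mg/dL (hypothetical).
toy_readings = [65, 80, 110, 150, 190, 200, 120, 95]
print(cgm_summary(toy_readings))
```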

  4. Review of the Remaining Useful Life Prognostics of Vehicle Lithium-Ion Batteries Using Data-Driven Methodologies

    Directory of Open Access Journals (Sweden)

    Lifeng Wu

    2016-05-01

    Lithium-ion batteries are the primary power source in electric vehicles, and the prognosis of their remaining useful life is vital for ensuring the safety, stability, and long lifetime of electric vehicles. Accurately establishing a mechanism model of a vehicle lithium-ion battery involves a complex electrochemical process. Remaining useful life (RUL) prognostics based on data-driven methods has become a focus of research. Current research on data-driven methodologies is summarized in this paper. By analyzing the problems of vehicle lithium-ion batteries in practical applications, the problems that need to be solved in the future are identified.

  5. Data-driven modeling of sleep EEG and EOG reveals characteristics indicative of pre-Parkinson's and Parkinson's disease.

    Science.gov (United States)

    Christensen, Julie A E; Zoetmulder, Marielle; Koch, Henriette; Frandsen, Rune; Arvastson, Lars; Christensen, Søren R; Jennum, Poul; Sorensen, Helge B D

    2014-09-30

    Manual scoring of sleep relies on identifying certain characteristics in polysomnograph (PSG) signals. However, these characteristics are disrupted in patients with neurodegenerative diseases. This study evaluates sleep using a topic modeling and unsupervised learning approach to identify sleep topics directly from electroencephalography (EEG) and electrooculography (EOG). PSG data from control subjects were used to develop an EOG and an EEG topic model. The models were applied to PSG data from 23 control subjects, 25 patients with periodic leg movements (PLMs), 31 patients with idiopathic REM sleep behavior disorder (iRBD) and 36 patients with Parkinson's disease (PD). The data were divided into training and validation datasets and features reflecting EEG and EOG characteristics based on topics were computed. The most discriminative feature subset for separating iRBD/PD and PLM/controls was estimated using a Lasso-regularized regression model. The features with highest discriminability were the number and stability of EEG topics linked to REM and N3, respectively. Validation of the model indicated a sensitivity of 91.4% and a specificity of 68.8% when classifying iRBD/PD patients. The topics showed visual accordance with the manually scored sleep stages, and the features revealed sleep characteristics containing information indicative of neurodegeneration. This study suggests that the amount of N3 and the ability to maintain NREM and REM sleep have potential as early PD biomarkers. Data-driven analysis of sleep may contribute to the evaluation of neurodegenerative patients. Copyright © 2014 Elsevier B.V. All rights reserved.

  6. Data-Driven Robust RVFLNs Modeling of a Blast Furnace Iron-Making Process Using Cauchy Distribution Weighted M-Estimation

    Energy Technology Data Exchange (ETDEWEB)

    Zhou, Ping; Lv, Youbin; Wang, Hong; Chai, Tianyou

    2017-09-01

    Optimal operation of a practical blast furnace (BF) ironmaking process depends largely on a good measurement of molten iron quality (MIQ) indices. However, measuring the MIQ online is not feasible using the available techniques. In this paper, a novel data-driven robust modeling approach is proposed for online estimation of MIQ using improved random vector functional-link networks (RVFLNs). Since the output weights of traditional RVFLNs are obtained by the least squares approach, a robustness problem may occur when the training dataset is contaminated with outliers. This affects the modeling accuracy of RVFLNs. To solve this problem, a robust RVFLN based on Cauchy distribution weighted M-estimation is proposed. Since the weights of different outlier data are properly determined by the Cauchy distribution, their corresponding contributions to modeling can be properly distinguished. Thus robust and better modeling results can be achieved. Moreover, given that the BF is a complex nonlinear system with numerous coupling variables, data-driven canonical correlation analysis is employed to identify the most influential components from the multitudinous factors that affect the MIQ indices, in order to reduce the model dimension. Finally, experiments using industrial data and comparative studies have demonstrated that the obtained model produces better modeling and estimation accuracy and stronger robustness than other modeling methods.
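    The idea of Cauchy-weighted M-estimation can be illustrated on a much simpler problem than RVFLN output weights: fitting a straight line by iteratively reweighted least squares, where each sample's weight is 1 / (1 + (r / (c s))^2) for residual r and robust scale s. This is a sketch of the general technique, not the authors' algorithm. The tuning constant c = 2.385, the MAD-based scale estimate, the function names and the toy data are assumptions.

```python
import statistics


def weighted_line_fit(xs, ys, ws):
    """Weighted least-squares fit of y = intercept + slope * x."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def cauchy_irls_line(xs, ys, c=2.385, iterations=20):
    """Robust line fit by iteratively reweighted least squares with Cauchy weights."""
    ws = [1.0] * len(xs)  # the first pass is ordinary least squares
    intercept, slope = 0.0, 0.0
    for _ in range(iterations):
        intercept, slope = weighted_line_fit(xs, ys, ws)
        residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
        # Robust scale: median absolute residual rescaled for Gaussian consistency.
        scale = max(statistics.median([abs(r) for r in residuals]) / 0.6745, 1e-12)
        # Cauchy weight: large residuals are down-weighted smoothly, never discarded.
        ws = [1.0 / (1.0 + (r / (c * scale)) ** 2) for r in residuals]
    return intercept, slope


# Toy data (hypothetical): y = 1 + 2x with one gross outlier at x = 9.
xs = [float(x) for x in range(10)]
ys = [1.0 + 2.0 * x for x in xs]
ys[9] = 100.0
print("ordinary least squares:", weighted_line_fit(xs, ys, [1.0] * len(xs)))
print("Cauchy-weighted IRLS:  ", cauchy_irls_line(xs, ys))
```

    On the toy data the ordinary least-squares slope is pulled far from 2 by the single outlier, while the reweighted fit recovers a slope close to 2.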

  7. Systems engineering approach towards performance monitoring of emergency diesel generator

    International Nuclear Information System (INIS)

    Nurhayati Ramli; Lee, Y.K.

    2013-01-01

    Systems engineering is an interdisciplinary approach and means to enable the realization of successful systems. In this study, a systems engineering approach towards the performance monitoring of an Emergency Diesel Generator (EDG) is presented. Performance monitoring is part and parcel of predictive maintenance, in which system and component conditions can be detected before they result in failures. In an effort to identify the proposal for addressing performance monitoring, the EDG boundary has been defined. Based on the Probabilistic Safety Analysis (PSA) results and industry operating experience, the most critical component is identified. This paper proposes a systems engineering concept development framework towards EDG performance monitoring. The expected output of this study is that the EDG reliability can be improved by the performance monitoring alternatives through the systems engineering concept development effort. (author)

  8. Total System Performance Assessment-License Application Methods and Approach

    Energy Technology Data Exchange (ETDEWEB)

    J. McNeish

    2002-09-13

    "Total System Performance Assessment-License Application (TSPA-LA) Methods and Approach" provides the top-level method and approach for conducting the TSPA-LA model development and analyses. The method and approach is responsive to the criteria set forth in Total System Performance Assessment Integration (TSPAI) Key Technical Issue (KTI) agreements, the "Yucca Mountain Review Plan" (CNWRA 2002 [158449]), and 10 CFR Part 63. This introductory section provides an overview of the TSPA-LA, the projected TSPA-LA documentation structure, and the goals of the document. It also provides a brief discussion of the regulatory framework, the approach to risk management of the development and analysis of the model, and the overall organization of the document. The section closes with some important conventions that are utilized in this document.

  9. Approaches for University Students and their Relationship to Academic Performance

    Directory of Open Access Journals (Sweden)

    Evelyn Fernández-Castillo

    2015-05-01

    The way students perceive learning is influenced by multiple factors. The present study aimed at establishing relationships between the learning approaches, academic performance, and the academic year in a sample of students from different courses of Universidad Central "Marta Abreu", Las Villas. For this ex post facto study, a probabilistic sample was used based on a simple random sampling of 524 university students who participated in the Study Process Questionnaire. The analysis of variance (MANOVA and ANOVA) and the cluster analysis reported associations between a deep approach to learning and a better academic performance. These analyses showed differences in the learning approach in the different courses, predominantly a soft approach.

  10. Total System Performance Assessment - License Application Methods and Approach

    International Nuclear Information System (INIS)

    McNeish, J.

    2003-01-01

    "Total System Performance Assessment-License Application (TSPA-LA) Methods and Approach" provides the top-level method and approach for conducting the TSPA-LA model development and analyses. The method and approach is responsive to the criteria set forth in Total System Performance Assessment Integration (TSPAI) Key Technical Issues (KTIs) identified in agreements with the U.S. Nuclear Regulatory Commission, the "Yucca Mountain Review Plan" (YMRP), "Final Report" (NRC 2003 [163274]), and the NRC final rule 10 CFR Part 63 (NRC 2002 [156605]). This introductory section provides an overview of the TSPA-LA, the projected TSPA-LA documentation structure, and the goals of the document. It also provides a brief discussion of the regulatory framework, the approach to risk management of the development and analysis of the model, and the overall organization of the document. The section closes with some important conventions that are used in this document.

  11. Access Control with Delegated Authorization Policy Evaluation for Data-Driven Microservice Workflows

    Directory of Open Access Journals (Sweden)

    Davy Preuveneers

    2017-09-01

    Microservices offer a compelling competitive advantage for building data flow systems as a choreography of self-contained data endpoints that each implement a specific data processing functionality. Such a ‘single responsibility principle’ design makes them well suited for constructing scalable and flexible data integration and real-time data flow applications. In this paper, we investigate microservice-based data processing workflows from a security point of view, i.e., (1) how to constrain data processing workflows with respect to dynamic authorization policies granting or denying access to certain microservice results depending on the flow of the data; (2) how to let multiple microservices contribute to a collective data-driven authorization decision; and (3) how to put adequate measures in place such that the data within each individual microservice is protected against illegitimate access from unauthorized users or other microservices. Due to this multifold objective, enforcing access control on the data endpoints to prevent information leakage or preserve one’s privacy becomes far more challenging, as authorization policies can have dependencies and decision outcomes cross-cutting data in multiple microservices. To address this challenge, we present and evaluate a workflow-oriented authorization framework that enforces authorization policies in a decentralized manner and where the delegated policy evaluation leverages feature toggles that are managed at runtime by software circuit breakers to secure the distributed data processing workflows. The benefit of our solution is that, on the one hand, authorization policies restrict access to the data endpoints of the microservices, and on the other hand, microservices can safely rely on other data endpoints to collectively evaluate cross-cutting access control decisions without having to rely on a shared storage backend holding all the necessary information for the policy evaluation.
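
    The delegated evaluation described in this abstract can be pictured with a small sketch. It is not the authors' framework: the class names `Endpoint` and `CircuitBreaker`, the failure limit, and the fail-closed behaviour are all assumptions made for illustration. Each endpoint applies its own policy, then asks its peers for their decision, and a per-peer breaker acts as the runtime toggle that stops calling a peer that keeps failing.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Request = Dict[str, str]
PolicyCheck = Callable[[Request], bool]


@dataclass
class CircuitBreaker:
    """Hypothetical breaker: opens after `max_failures` consecutive errors."""
    max_failures: int = 2
    failures: int = 0

    @property
    def is_open(self) -> bool:
        return self.failures >= self.max_failures

    def call(self, check: PolicyCheck, request: Request) -> bool:
        if self.is_open:
            # Toggle is off: deny (fail closed) instead of calling the peer.
            return False
        try:
            result = check(request)
        except Exception:
            # Count the failure and deny this request.
            self.failures += 1
            return False
        self.failures = 0  # a successful call resets the counter
        return result


@dataclass
class Endpoint:
    """Hypothetical data endpoint with a local policy and delegated peers."""
    name: str
    local_policy: PolicyCheck
    delegates: List["Endpoint"] = field(default_factory=list)
    breakers: Dict[str, CircuitBreaker] = field(default_factory=dict)

    def authorize(self, request: Request) -> bool:
        # The local policy is evaluated first, with no shared policy store.
        if not self.local_policy(request):
            return False
        # Every delegated peer must also permit the request.
        for peer in self.delegates:
            breaker = self.breakers.setdefault(peer.name, CircuitBreaker())
            if not breaker.call(peer.authorize, request):
                return False
        return True
```

    In this sketch a request is permitted only when the local policy and every reachable peer agree. Denying while a breaker is open is one possible design choice; the abstract does not say how the framework behaves in that case.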

  12. Data Science and its Relationship to Big Data and Data-Driven Decision Making.

    Science.gov (United States)

    Provost, Foster; Fawcett, Tom

    2013-03-01

    Companies have realized they need to hire data scientists, academic institutions are scrambling to put together data-science programs, and publications are touting data science as a hot, even "sexy", career choice. However, there is confusion about what exactly data science is, and this confusion could lead to disillusionment as the concept diffuses into meaningless buzz. In this article, we argue that there are good reasons why it has been hard to pin down exactly what is data science. One reason is that data science is intricately intertwined with other important concepts also of growing importance, such as big data and data-driven decision making. Another reason is the natural tendency to associate what a practitioner does with the definition of the practitioner's field; this can result in overlooking the fundamentals of the field. We believe that trying to define the boundaries of data science precisely is not of the utmost importance. We can debate the boundaries of the field in an academic setting, but in order for data science to serve business effectively, it is important (i) to understand its relationships to other important related concepts, and (ii) to begin to identify the fundamental principles underlying data science. Once we embrace (ii), we can much better understand and explain exactly what data science has to offer. Furthermore, only once we embrace (ii) should we be comfortable calling it data science. In this article, we present a perspective that addresses all these concepts. We close by offering, as examples, a partial list of fundamental principles underlying data science.

  13. Data-driven haemodynamic response function extraction using Fourier-wavelet regularised deconvolution

    Directory of Open Access Journals (Sweden)

    Roerdink Jos BTM

    2008-04-01

    Background: We present a simple, data-driven method to extract haemodynamic response functions (HRF) from functional magnetic resonance imaging (fMRI) time series, based on the Fourier-wavelet regularised deconvolution (ForWaRD) technique. HRF data are required for many fMRI applications, such as defining region-specific HRFs, efficiently representing a general HRF, or comparing subject-specific HRFs. Results: ForWaRD is applied to fMRI time signals, after removing low-frequency trends by a wavelet-based method, and the output of ForWaRD is a time series of volumes, containing the HRF in each voxel. Compared to more complex methods, this extraction algorithm requires few assumptions (separability of signal and noise in the frequency and wavelet domains and the general linear model) and it is fast (HRF extraction from a single fMRI data set takes about the same time as spatial resampling). The extraction method is tested on simulated event-related activation signals, contaminated with noise from a time series of real MRI images. An application for HRF data is demonstrated in a simple event-related experiment: data are extracted from a region with significant effects of interest in a first time series. A continuous-time HRF is obtained by fitting a nonlinear function to the discrete HRF coefficients, and is then used to analyse a later time series. Conclusion: With the parameters used in this paper, the extraction method presented here is very robust to changes in signal properties. Comparison of analyses with fitted HRFs and with a canonical HRF shows that a subject-specific, regional HRF significantly improves detection power. Sensitivity and specificity increase not only in the region from which the HRFs are extracted, but also in other regions of interest.
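
    As a rough illustration of the Fourier half of such a deconvolution, the sketch below recovers a response from a signal modelled as the circular convolution of a stimulus train with that response. It is a simplification and not the ForWaRD algorithm: the wavelet shrinkage and detrending steps are omitted, a plain Tikhonov-style constant `lam` stands in for the regularisation, and the function names and the choice of the stimulus train as the deconvolution kernel are assumptions.

```python
import cmath
from typing import List, Sequence


def dft(x: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform, adequate for short series."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def circular_convolve(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Circular convolution of two equal-length sequences."""
    n = len(a)
    return [sum(a[j] * b[(t - j) % n] for j in range(n)) for t in range(n)]


def fourier_regularised_deconvolve(signal: Sequence[float],
                                   stimulus: Sequence[float],
                                   lam: float = 1e-3) -> List[float]:
    """Estimate the response h from signal = stimulus (*) h + noise.

    Each frequency is divided by the stimulus spectrum, damped by `lam`
    so that frequencies where the stimulus is weak do not amplify noise.
    """
    spec_signal = dft(signal)
    spec_stim = dft(stimulus)
    spec_h = [s.conjugate() * y / (abs(s) ** 2 + lam)
              for s, y in zip(spec_stim, spec_signal)]
    return [value.real for value in dft(spec_h, inverse=True)]
```

    Larger values of `lam` suppress more noise at the price of a smoother, more biased estimate; in the noise-free case a very small `lam` returns the response almost exactly.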

  14. Data Driven Trigger Design and Analysis for the NOvA Experiment

    Energy Technology Data Exchange (ETDEWEB)

    Kurbanov, Serdar [Univ. of Virginia, Charlottesville, VA (United States)]

    2016-01-01

    This thesis primarily describes analysis related to studying the Moon shadow with cosmic rays, an analysis using upward-going muon trigger data, and other work done as part of MSc thesis work conducted at Fermi National Laboratory. While at Fermilab I made hardware and software contributions to two experiments - NOvA and Mu2e. NOvA is a neutrino experiment with the primary goal of measuring parameters related to neutrino oscillation. This is a running experiment, so it's possible to provide analysis of real beam and cosmic data. Most of this work was related to the Data-Driven Trigger (DDT) system of NOvA. The results of the upward-going muon analysis were presented at ICHEP in August 2016. The analysis demonstrates the proof of principle for a low-mass dark matter search. Mu2e is an experiment currently being built at Fermilab. Its primary goal is to detect the hypothetical neutrinoless conversion from a muon into an electron. I contributed to the production and tests of Cathode Strip Chambers (CSCs) which are required for testing the Cosmic Ray Veto (CRV) system for the experiment. This contribution is described in the last chapter along with a short description of the technical work provided for the DDT system of the NOvA experiment. All of the work described in this thesis will be extended by the next generation of UVA graduate students and postdocs as new data is collected by the experiment. I hope my efforts have helped lay the foundation for many years of beautiful results from Mu2e and NOvA.

  15. Data-driven nutrient analysis and reality check: Human inputs, catchment delivery and management effects

    Science.gov (United States)

    Destouni, G.

    2017-12-01

    Measures for mitigating nutrient loads to aquatic ecosystems should have observable effects, e.g., in the Baltic region after joint first periods of nutrient management actions under the Baltic Sea Action Plan (BSAP; since 2007) and the EU Water Framework Directive (WFD; since 2009). Looking for such observable effects, all openly available water and nutrient monitoring data since 2003 are compiled and analyzed for Sweden as a case study. Results show that hydro-climatically driven water discharge dominates the determination of waterborne loads of both phosphorus and nitrogen. Furthermore, the nutrient loads and water discharge are all similarly well correlated with the ecosystem status classification of Swedish water bodies according to the WFD. Nutrient concentrations, which are hydro-climatically correlated and should thus reflect human effects better than loads, have changed only slightly over the study period (2003-2013) and even increased in moderate-to-bad status waters, where the WFD and BSAP jointly target nutrient decreases. These results indicate insufficient distinction and mitigation of human-driven nutrient components by the internationally harmonized applications of both the WFD and the BSAP. Aiming for better general identification of such components, nutrient data for the large transboundary catchments of the Baltic Sea and the Sava River are compared. The comparison shows cross-regional consistency in nutrient relationships to driving hydro-climatic conditions (water discharge) for nutrient loads, and socio-economic conditions (population density and farmland share) for nutrient concentrations. A data-driven screening methodology is further developed for estimating nutrient input and retention-delivery in catchments. Its first application to nested Sava River catchments identifies characteristic regional values of nutrient input per area and relative delivery, and hotspots of much larger inputs, related to urban high-population areas.

  16. Enhancing Transparency and Control When Drawing Data-Driven Inferences About Individuals.

    Science.gov (United States)

    Chen, Daizhuo; Fraiberger, Samuel P; Moakler, Robert; Provost, Foster

    2017-09-01

    Recent studies show the remarkable power of fine-grained information disclosed by users on social network sites to infer users' personal characteristics via predictive modeling. Similar fine-grained data are being used successfully in other commercial applications. In response, attention is turning increasingly to the transparency that organizations provide to users as to what inferences are drawn and why, as well as to what sort of control users can be given over inferences that are drawn about them. In this article, we focus on inferences about personal characteristics based on information disclosed by users' online actions. As a use case, we explore personal inferences that are made possible from "Likes" on Facebook. We first present a means for providing transparency into the information responsible for inferences drawn by data-driven models. We then introduce the "cloaking device", a mechanism for users to inhibit the use of particular pieces of information in inference. Using these analytical tools we ask two main questions: (1) How much information must users cloak to significantly affect inferences about their personal traits? We find that usually users must cloak only a small portion of their actions to inhibit inference. We also find that, encouragingly, false-positive inferences are significantly easier to cloak than true-positive inferences. (2) Can firms change their modeling behavior to make cloaking more difficult? The answer is a definitive yes. We demonstrate a simple modeling change that requires users to cloak substantially more information to affect the inferences drawn. The upshot is that organizations can provide transparency and control even into complicated, predictive model-driven inferences, but they also can make control easier or harder for their users.
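
    The first question in this abstract, how much a user must cloak to change an inference, can be made concrete with a toy calculation. The sketch below assumes a linear model in which each Like adds a fixed weight to a score and a trait is inferred when the score reaches a threshold; the model form, the greedy removal order, and the function names are illustrative assumptions, not the procedure reported in the article.

```python
from typing import Dict, Iterable, List, Tuple


def score(likes: Iterable[str], weights: Dict[str, float],
          bias: float = 0.0) -> float:
    """Additive evidence score; Likes without a weight contribute nothing."""
    return bias + sum(weights.get(like, 0.0) for like in likes)


def likes_to_cloak(likes: Iterable[str], weights: Dict[str, float],
                   threshold: float, bias: float = 0.0
                   ) -> Tuple[List[str], bool]:
    """Greedily hide the strongest positive evidence first.

    Returns the cloaked Likes and whether the score fell below the
    threshold, i.e. whether the inference was inhibited.
    """
    remaining = set(likes)
    cloaked: List[str] = []
    # Strongest evidence first; ties broken by name for determinism.
    ranked = sorted(remaining, key=lambda l: (-weights.get(l, 0.0), l))
    for like in ranked:
        if score(remaining, weights, bias) < threshold:
            break
        if weights.get(like, 0.0) <= 0.0:
            break  # hiding non-positive evidence cannot lower the score
        remaining.discard(like)
        cloaked.append(like)
    return cloaked, score(remaining, weights, bias) < threshold
```

    The ratio `len(cloaked) / len(likes)` then gives the share of a user's actions that had to be hidden, which is the kind of quantity the abstract reports as usually small.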

  17. Disruption of functional networks in dyslexia: A whole-brain, data-driven analysis of connectivity

    Science.gov (United States)

    Finn, Emily S.; Shen, Xilin; Holahan, John M.; Scheinost, Dustin; Lacadie, Cheryl; Papademetris, Xenophon; Shaywitz, Sally E.; Shaywitz, Bennett A.; Constable, R. Todd

    2013-01-01

    Background: Functional connectivity analyses of fMRI data are a powerful tool for characterizing brain networks and how they are disrupted in neural disorders. However, many such analyses examine only one or a small number of a priori seed regions. Studies that consider the whole brain frequently rely on anatomic atlases to define network nodes, which may result in mixing distinct activation timecourses within a single node. Here, we improve upon previous methods by using a data-driven brain parcellation to compare connectivity profiles of dyslexic (DYS) versus non-impaired (NI) readers in the first whole-brain functional connectivity analysis of dyslexia. Methods: Whole-brain connectivity was assessed in children (n = 75; 43 NI, 32 DYS) and adult (n = 104; 64 NI, 40 DYS) readers. Results: Compared to NI readers, DYS readers showed divergent connectivity within the visual pathway and between visual association areas and prefrontal attention areas; increased right-hemisphere connectivity; reduced connectivity in the visual word-form area (part of the left fusiform gyrus specialized for printed words); and persistent connectivity to anterior language regions around the inferior frontal gyrus. Conclusions: Together, findings suggest that NI readers are better able to integrate visual information and modulate their attention to visual stimuli, allowing them to recognize words based on their visual properties, while DYS readers recruit altered reading circuits and rely on laborious phonology-based “sounding out” strategies into adulthood. These results deepen our understanding of the neural basis of dyslexia and highlight the importance of synchrony between diverse brain regions for successful reading. PMID:24124929

  18. STUDY OF THE POYNTING FLUX IN ACTIVE REGION 10930 USING DATA-DRIVEN MAGNETOHYDRODYNAMIC SIMULATION

    International Nuclear Information System (INIS)

    Fan, Y. L.; Wang, H. N.; He, H.; Zhu, X. S.

    2011-01-01

    Powerful solar flares are closely related to the evolution of magnetic field configuration on the photosphere. We choose the Poynting flux as a parameter in the study of magnetic field changes. We use time-dependent multidimensional MHD simulations around a flare occurrence to generate the results, with the temporal variation of the bottom boundary conditions being deduced from the projected normal characteristic method. By this method, the photospheric magnetogram could be incorporated self-consistently as the bottom condition of data-driven simulations. The model is first applied to a simulation datum produced by an emerging magnetic flux rope as a test case. Then, the model is used to study NOAA AR 10930, which produced an X3.4 flare, using data obtained by the Hinode/Solar Optical Telescope on 2006 December 13. We compute the magnitude of the Poynting flux ($S_{\mathrm{total}}$), the radial Poynting flux ($S_z$), a proxy for the ideal radial Poynting flux ($S_{\mathrm{proxy}}$), the Poynting flux due to plasma surface motion ($S_{\mathrm{sur}}$), and the Poynting flux due to plasma emergence ($S_{\mathrm{emg}}$), and analyze their extensive properties in four selected areas: the whole sunspot, the positive sunspot, the negative sunspot, and the strong-field polarity inversion line (SPIL) area. It is found that (1) the $S_{\mathrm{total}}$, $S_z$, and $S_{\mathrm{proxy}}$ parameters show similar behaviors in the whole sunspot area and in the negative sunspot area. The evolutions of these three parameters in the positive area and the SPIL area are more volatile because of the effect of sunspot rotation and flux emergence. (2) The evolution of $S_{\mathrm{sur}}$ is largely influenced by the process of sunspot rotation, especially in the positive sunspot. The evolution of $S_{\mathrm{emg}}$ is greatly affected by flux emergence, especially in the SPIL area.
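
    For orientation, the surface-motion and emergence terms named in this abstract are conventionally obtained by splitting the vertical Poynting flux of ideal MHD. The block below gives the textbook decomposition in Gaussian units, where $\mathbf{B}_h$ and $\mathbf{v}_h$ are the horizontal field and velocity and $B_z$, $v_z$ the vertical components. That the paper defines its surface and emergence terms in exactly this way is an assumption, and the proxy flux is not specified in the abstract, so it is left out.

```latex
\begin{align}
\mathbf{S} &= \frac{1}{4\pi}\,\mathbf{B}\times(\mathbf{v}\times\mathbf{B})
            = \frac{1}{4\pi}\left[B^{2}\,\mathbf{v}
              - (\mathbf{v}\cdot\mathbf{B})\,\mathbf{B}\right], \\
S_z &= \underbrace{\frac{1}{4\pi}\,B_h^{2}\,v_z}_{S_{\mathrm{emg}}}
       \;\underbrace{-\,\frac{1}{4\pi}\,(\mathbf{B}_h\cdot\mathbf{v}_h)\,B_z}_{S_{\mathrm{sur}}}.
\end{align}
```

    The $B_z^2 v_z$ contributions cancel between the two terms of the first line, which is why only the horizontal field appears in the emergence term.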

  19. Finding candidate locations for aerosol pollution monitoring at street level using a data-driven methodology

    Science.gov (United States)

    Moosavi, V.; Aschwanden, G.; Velasco, E.

    2015-09-01

    Finding the number and best locations of fixed air quality monitoring stations at street level is challenging because of the complexity of the urban environment and the large number of factors affecting pollutant concentrations. Data sets of such urban parameters as land use, building morphology and street geometry in high-resolution grid cells in combination with direct measurements of airborne pollutants at high frequency (1-10 s) along a reasonable number of streets can be used to interpolate concentration of pollutants in a whole gridded domain and determine the optimum number of monitoring sites and best locations for a network of fixed monitors at ground level. In this context, a data-driven modeling methodology is developed based on the application of Self-Organizing Map (SOM) to approximate the nonlinear relations between urban parameters (80 in this work) and aerosol pollution data, such as mass and number concentrations measured along streets of a commercial/residential neighborhood of Singapore. Cross-validations between measured and predicted aerosol concentrations based on the urban parameters at each individual grid cell showed satisfying results. This proof of concept study showed that the selected urban parameters are an appropriate indirect measure of aerosol concentrations within the studied area. The potential locations for fixed air quality monitors are identified through clustering of areas (i.e., group of cells) with similar urban patterns. The typological center of each cluster corresponds to the most representative cell for all other cells in the cluster. In the studied neighborhood four different clusters were identified and for each cluster potential sites for air quality monitoring at ground level are identified.
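
    The final selection step, picking the most representative cell of each cluster, can be sketched as follows. The SOM training and the clustering itself are not reproduced here: cluster labels are taken as given, and reading the "typological center" as the cell closest to the cluster mean in feature space is an assumption, as is the function name `representative_cells`.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def representative_cells(features: Dict[str, Sequence[float]],
                         labels: Dict[str, int]) -> Dict[int, str]:
    """For each cluster, return the cell nearest to the cluster centroid.

    `features` maps a grid-cell id to its urban-parameter vector and
    `labels` maps the same ids to a cluster number.
    """
    groups: Dict[int, List[str]] = defaultdict(list)
    for cell, label in labels.items():
        groups[label].append(cell)

    chosen: Dict[int, str] = {}
    for label, cells in groups.items():
        dim = len(features[cells[0]])
        # Component-wise mean of the cluster's feature vectors.
        centroid = [sum(features[c][i] for c in cells) / len(cells)
                    for i in range(dim)]
        # The cell with the smallest Euclidean distance to that mean.
        chosen[label] = min(
            cells, key=lambda c: math.dist(features[c], centroid))
    return chosen
```

    With 80 urban parameters per cell, as in the study, the vectors would normally be standardised before distances are computed so that no single parameter dominates; that preprocessing is left out of the sketch.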

  20. Investigation on the performance of bridge approach slab

    Directory of Open Access Journals (Sweden)

    Abdelrahman Amr

    2018-01-01

    In Egypt, where highway bridges are to be constructed on soft cohesive soils, the bridge abutments are usually founded on rigid piles, whereas the earth embankments for the bridge approaches are directly founded on the natural soft ground. Consequently, excessive differential settlement frequently occurs between the bridge deck and the bridge approaches resulting in a “bump” at both ends of the bridge deck. Such a bump not only creates a rough and uncomfortable ride but also represents a hazardous condition to traffic. One effective technique to cope with the bump problem is to use a reinforced concrete approach slab to provide a smooth grade transition between the bridge deck and the approach pavement. Investigating the geotechnical and structural performance of approach slabs and revealing the fundamental affecting factors have become mandatory. In this paper, a 2-D finite element model is employed to investigate the performance of approach slabs. Moreover, an extensive parametric study is carried out to appraise the relatively optimum geometries of approach slab, i.e. slab length, thickness, embedded depth and slope, that can yield permissible bumps. Different geo-mechanical conditions of the cohesive foundation soil and the fill material of the bridge embankment are examined.