WorldWideScience

Sample records for liferaft data-driven batch

  1. Data-driven batch scheduling

    Energy Technology Data Exchange (ETDEWEB)

    Bent, John [Los Alamos National Laboratory]; Denehy, Tim [GOOGLE]; Arpaci-Dusseau, Remzi [UNIV OF WISCONSIN]; Livny, Miron [UNIV OF WISCONSIN]; Arpaci-Dusseau, Andrea C [NON LANL]

    2009-01-01

    In this paper, we develop data-driven strategies for batch computing schedulers. Current CPU-centric batch schedulers ignore the data needs within workloads and execute them by linking them transparently and directly to their needed data. When scheduled on remote computational resources, this elegant solution of direct data access can incur an order of magnitude performance penalty for data-intensive workloads. Adding data-awareness to batch schedulers allows a careful coordination of data and CPU allocation thereby reducing the cost of remote execution. We offer here new techniques by which batch schedulers can become data-driven. Such systems can use our analytical predictive models to select one of the four data-driven scheduling policies that we have created. Through simulation, we demonstrate the accuracy of our predictive models and show how they can reduce time to completion for some workloads by as much as 80%.

  2. Parameterized data-driven fuzzy model based optimal control of a semi-batch reactor.

    Science.gov (United States)

    Kamesh, Reddi; Rani, K Yamuna

    2016-09-01

    A parameterized data-driven fuzzy (PDDF) model structure is proposed for semi-batch processes, and its application for optimal control is illustrated. The orthonormally parameterized input trajectories, initial states and process parameters are the inputs to the model, which predicts the output trajectories in terms of Fourier coefficients. Fuzzy rules are formulated based on the signs of a linear data-driven model, while the defuzzification step incorporates a linear regression model to shift the domain from input to output domain. The fuzzy model is employed to formulate an optimal control problem for single rate as well as multi-rate systems. Simulation study on a multivariable semi-batch reactor system reveals that the proposed PDDF modeling approach is capable of capturing the nonlinear and time-varying behavior inherent in the semi-batch system fairly accurately, and the results of operating trajectory optimization using the proposed model are found to be comparable to the results obtained using the exact first principles model, and are also found to be comparable to or better than parameterized data-driven artificial neural network model based optimization results. Copyright © 2016 ISA. Published by Elsevier Ltd. All rights reserved.

  3. Data-driven storytelling

    CERN Document Server

    Hurter, Christophe; Diakopoulos, Nicholas ed.; Carpendale, Sheelagh

    2018-01-01

    This book is an accessible introduction to data-driven storytelling, resulting from discussions between data visualization researchers and data journalists. This book will be the first to define the topic, present compelling examples and existing resources, as well as identify challenges and new opportunities for research.

  4. 46 CFR 199.203 - Marshalling of liferafts.

    Science.gov (United States)

    2010-10-01

    ... LIFESAVING SYSTEMS FOR CERTAIN INSPECTED VESSELS Additional Requirements for Passenger Vessels § 199.203 Marshalling of liferafts. (a) Each passenger vessel must have a lifeboat or rescue boat for each six liferafts when— (1) Each lifeboat and rescue boat is loaded with its full complement of persons; and (2) The...

  5. 46 CFR 160.051-5 - Design and performance of Coastal Service inflatable liferafts.

    Science.gov (United States)

    2010-10-01

    ... 46 Shipping 6 2010-10-01 2010-10-01 false Design and performance of Coastal Service inflatable... Liferafts for Domestic Service § 160.051-5 Design and performance of Coastal Service inflatable liferafts. To obtain Coast Guard approval, each Coastal Service inflatable liferaft must comply with subpart 160...

  6. Data driven marketing for dummies

    CERN Document Server

    Semmelroth, David

    2013-01-01

    Embrace data and use it to sell and market your products. Data is everywhere and it keeps growing and accumulating. Companies need to embrace big data and make it work harder to help them sell and market their products. Successful data analysis can help marketing professionals spot sales trends, develop smarter marketing campaigns, and accurately predict customer loyalty. Data Driven Marketing For Dummies helps companies use all the data at their disposal to make current customers more satisfied, reach new customers, and sell to their most important customer segments more efficiently. Identifyi

  7. Data-driven architectural production and operation

    NARCIS (Netherlands)

    Bier, H.H.; Mostafavi, S.

    2014-01-01

    Data-driven architectural production and operation as explored within Hyperbody rely heavily on system thinking implying that all parts of a system are to be understood in relation to each other. These relations are increasingly established bi-directionally so that data-driven architecture is not

  8. Data Driven Economic Model Predictive Control

    Directory of Open Access Journals (Sweden)

    Masoud Kheradmandi

    2018-04-01

    This manuscript addresses the problem of data driven model based economic model predictive control (MPC) design. To this end, first, a data-driven Lyapunov-based MPC is designed, and shown to be capable of stabilizing a system at an unstable equilibrium point. The data driven Lyapunov-based MPC utilizes a linear time invariant (LTI) model cognizant of the fact that the training data, owing to the unstable nature of the equilibrium point, has to be obtained from closed-loop operation or experiments. Simulation results are first presented demonstrating closed-loop stability under the proposed data-driven Lyapunov-based MPC. The underlying data-driven model is then utilized as the basis to design an economic MPC. The economic improvements yielded by the proposed method are illustrated through simulations on a nonlinear chemical process system example.

  9. Data-Driven Problems in Elasticity

    Science.gov (United States)

    Conti, S.; Müller, S.; Ortiz, M.

    2018-01-01

    We consider a new class of problems in elasticity, referred to as Data-Driven problems, defined on the space of strain-stress field pairs, or phase space. The problem consists of minimizing the distance between a given material data set and the subspace of compatible strain fields and stress fields in equilibrium. We find that the classical solutions are recovered in the case of linear elasticity. We identify conditions for convergence of Data-Driven solutions corresponding to sequences of approximating material data sets. Specialization to constant material data set sequences in turn establishes an appropriate notion of relaxation. We find that relaxation within this Data-Driven framework is fundamentally different from the classical relaxation of energy functions. For instance, we show that in the Data-Driven framework the relaxation of a bistable material leads to material data sets that are not graphs.

  10. Consistent data-driven computational mechanics

    Science.gov (United States)

    González, D.; Chinesta, F.; Cueto, E.

    2018-05-01

    We present a novel method, within the realm of data-driven computational mechanics, to obtain reliable and thermodynamically sound simulation from experimental data. We thus avoid the need to fit any phenomenological model in the construction of the simulation model. This kind of technique opens unprecedented possibilities in the framework of data-driven application systems and, particularly, in the paradigm of industry 4.0.

  11. Data-driven regionalization of housing markets

    NARCIS (Netherlands)

    Helbich, M.; Brunauer, W.; Hagenauer, J.; Leitner, M.

    2013-01-01

    This article presents a data-driven framework for housing market segmentation. Local marginal house price surfaces are investigated by means of mixed geographically weighted regression and are reduced to a set of principal component maps, which in turn serve as input for spatial regionalization. The

  12. Data Driven Constraints for the SVM

    DEFF Research Database (Denmark)

    Darkner, Sune; Clemmensen, Line Katrine Harder

    2012-01-01

    We propose a generalized data driven constraint for support vector machines exemplified by classification of paired observations in general and specifically on the human ear canal. This is particularly interesting in dynamic cases such as tissue movement or pathologies developing over time. Assum...

  13. Challenges of Data-driven Healthcare Management

    DEFF Research Database (Denmark)

    Bossen, Claus; Danholt, Peter; Ubbesen, Morten Bonde

    This paper describes the new kind of data-work involved in developing data-driven healthcare based on two cases from Denmark: The first case concerns a governance infrastructure based on Diagnose-Related Groups (DRG), which was introduced in Denmark in the 1990s. The DRG-system links healthcare activity and financing and relies on extensive data entry, reporting and calculations. This has required the development of new skills, work and work roles. The second case concerns a New Governance project aimed at developing new performance indicators for healthcare delivery as an alternative to DRG. Here, a core challenge is to select indicators and actually being able to acquire data upon them. The two cases point out that data-driven healthcare requires more and new kinds of work for which new skills, functions and work roles have to be developed.

  14. Data Driven Tuning of Inventory Controllers

    DEFF Research Database (Denmark)

    Huusom, Jakob Kjøbsted; Santacoloma, Paloma Andrade; Poulsen, Niels Kjølstad

    2007-01-01

    A systematic method for criterion based tuning of inventory controllers based on data-driven iterative feedback tuning is presented. This tuning method circumvents problems with modeling bias. The process model used for the design of the inventory control is utilized in the tuning as an approximation to reduce time required on experiments. The method is illustrated in an application with a multivariable inventory control implementation on a four tank system.

  15. Data-driven workflows for microservices

    DEFF Research Database (Denmark)

    Safina, Larisa; Mazzara, Manuel; Montesi, Fabrizio

    2016-01-01

    Microservices is an architectural style inspired by service-oriented computing that has recently started gaining popularity. Jolie is a programming language based on the microservices paradigm: the main building blocks of Jolie systems are services, in contrast to, e.g., functions or objects. The primitives offered by the Jolie language elicit many of the recurring patterns found in microservices, like load balancers and structured processes. However, Jolie still lacks some useful constructs for dealing with message types and data manipulation that are present in service-oriented computing...). We show the impact of our implementation on some of the typical scenarios found in microservice systems. This shows how computation can move from a process-driven to a data-driven approach, and leads to the preliminary identification of recurring communication patterns that can be shaped as design...

  16. Data-Driven Security-Constrained OPF

    DEFF Research Database (Denmark)

    Thams, Florian; Halilbasic, Lejla; Pinson, Pierre

    2017-01-01

    In this paper we unify electricity market operations with power system security considerations. Using data-driven techniques, we address both small signal stability and steady-state security, derive tractable decision rules in the form of line flow limits, and incorporate the resulting constraints in market clearing algorithms. Our goal is to minimize redispatching actions, and instead allow the market to determine the most cost-efficient dispatch while considering all security constraints. To maintain tractability of our approach we perform our security assessment offline, examining large datasets... considerations, while being less conservative than current approaches. Our approach can be scalable for large systems, accounts explicitly for power system security, and enables the electricity market to identify a cost-efficient dispatch avoiding redispatching actions. We demonstrate the performance of our...

  17. Combining engineering and data-driven approaches

    DEFF Research Database (Denmark)

    Fischer, Katharina; De Sanctis, Gianluca; Kohler, Jochen

    2015-01-01

    Two general approaches may be followed for the development of a fire risk model: statistical models based on observed fire losses can support simple cost-benefit studies but are usually not detailed enough for engineering decision-making. Engineering models, on the other hand, require many assumptions that may result in a biased risk assessment. In two related papers we show how engineering and data-driven modelling can be combined by developing generic risk models that are calibrated to statistical data on observed fire events. The focus of the present paper is on the calibration procedure... to the calibration of a generic fire risk model for single family houses to Swiss insurance data. The example demonstrates that the bias in the risk estimation can be strongly reduced by model calibration.

  18. Data driven modelling of vertical atmospheric radiation

    International Nuclear Information System (INIS)

    Antoch, Jaromir; Hlubinka, Daniel

    2011-01-01

    In the Czech Hydrometeorological Institute (CHMI) there exists a unique set of meteorological measurements consisting of the values of vertical atmospheric levels of beta and gamma radiation. In this paper a stochastic data-driven model based on nonlinear regression and on nonhomogeneous Poisson process is suggested. In the first part of the paper, growth curves were used to establish an appropriate nonlinear regression model. For comparison we considered a nonhomogeneous Poisson process with its intensity based on growth curves. In the second part both approaches were applied to the real data and compared. Computational aspects are briefly discussed as well. The primary goal of this paper is to present an improved understanding of the distribution of environmental radiation as obtained from the measurements of the vertical radioactivity profiles by the radioactivity sonde system. - Highlights: → We model vertical atmospheric levels of beta and gamma radiation. → We suggest appropriate nonlinear regression model based on growth curves. → We compare nonlinear regression modelling with Poisson process based modeling. → We apply both models to the real data.

  19. Data driven innovations in structural health monitoring

    Science.gov (United States)

    Rosales, M. J.; Liyanapathirana, R.

    2017-05-01

    At present, substantial investments are being allocated to civil infrastructures also considered as valuable assets at a national or global scale. Structural Health Monitoring (SHM) is an indispensable tool required to ensure the performance and safety of these structures based on measured response parameters. The research to date on damage assessment has tended to focus on the utilization of wireless sensor networks (WSN) as it proves to be the best alternative over the traditional visual inspections and tethered or wired counterparts. Over the last decade, the structural health and behaviour of innumerable infrastructure has been measured and evaluated owing to several successful ventures of implementing these sensor networks. Various monitoring systems have the capability to rapidly transmit, measure, and store large capacities of data. The amount of data collected from these networks has eventually become unmanageable, which paved the way to other relevant issues such as data quality, relevance, re-use, and decision support. There is an increasing need to integrate new technologies in order to automate the evaluation processes as well as to enhance the objectivity of data assessment routines. This paper aims to identify feasible methodologies towards the application of time-series analysis techniques to judiciously exploit the vast amount of readily available as well as the upcoming data resources. It continues the momentum of a greater effort to collect and archive SHM approaches that will serve as data-driven innovations for the assessment of damage through efficient algorithms and data analytics.

  20. 46 CFR 160.151-15 - Design and performance of inflatable liferafts.

    Science.gov (United States)

    2010-10-01

    ...). (g) Towing attachments (Regulation III/38.1.4.) Each towing attachment must be reinforced strongly... mm (3/8-inch), or equivalent. Each lifeline-attachment patch must have a minimum breaking strength of... inflation cylinders in place when the liferaft is dropped into the water from its stowage height and during...

  1. 46 CFR 160.151-17 - Additional requirements for design and performance of SOLAS A and SOLAS B inflatable liferafts.

    Science.gov (United States)

    2010-10-01

    ... stability appendages on its underside to resist capsizing from wind and waves. These appendages must meet...). Means must be provided for identifying the liferaft with the name and port of registry of the ship to...

  2. Data-driven architectural design to production and operation

    NARCIS (Netherlands)

    Bier, H.H.; Mostafavi, S.

    2015-01-01

    Data-driven architectural production and operation explored within Hyperbody rely heavily on system thinking implying that all parts of a system are to be understood in relation to each other. These relations are established bi-directionally so that data-driven architecture is not only produced

  3. Data-Driven Methods to Diversify Knowledge of Human Psychology

    OpenAIRE

    Jack, Rachael E.; Crivelli, Carlos; Wheatley, Thalia

    2017-01-01

    Psychology aims to understand real human behavior. However, cultural biases in the scientific process can constrain knowledge. We describe here how data-driven methods can relax these constraints to reveal new insights that theories can overlook. To advance knowledge we advocate a symbiotic approach that better combines data-driven methods with theory.

  4. Dynamic Data-Driven UAV Network for Plume Characterization

    Science.gov (United States)

    2016-05-23

    Report AFRL-AFOSR-VA-TR-2016-0203: Dynamic Data-Driven UAV Network for Plume Characterization. Kamran Mohseni, University of Florida. Final Report, 05/23/2016. Grant number FA9550-13-1-0090. ... studied a dynamic data driven (DDD) approach to operation of a heterogeneous team of unmanned aerial vehicles (UAVs) or micro/miniature aerial

  5. Data-Driven Exercises for Chemistry: A New Digital Collection

    Science.gov (United States)

    Grubbs, W. Tandy

    2007-01-01

    The analysis presents a new digital collection for various data-driven exercises that are used for teaching chemistry to the students. Such methods are expected to help the students to think in a more scientific manner.

  6. Data-Driven Model Order Reduction for Bayesian Inverse Problems

    KAUST Repository

    Cui, Tiangang; Marzouk, Youssef; Willcox, Karen

    2014-01-01

    One of the major challenges in using MCMC for the solution of inverse problems is the repeated evaluation of computationally expensive numerical models. We develop a data-driven projection-based model order reduction technique to reduce

  7. Dynamically adaptive data-driven simulation of extreme hydrological flows

    KAUST Repository

    Kumar Jain, Pushkar; Mandli, Kyle; Hoteit, Ibrahim; Knio, Omar; Dawson, Clint

    2017-01-01

    evacuation in real-time and through the development of resilient infrastructure based on knowledge of how systems respond to extreme events. Data-driven computational modeling is a critical technology underpinning these efforts. This investigation focuses

  8. The Structural Consequences of Big Data-Driven Education.

    Science.gov (United States)

    Zeide, Elana

    2017-06-01

    Educators and commenters who evaluate big data-driven learning environments focus on specific questions: whether automated education platforms improve learning outcomes, invade student privacy, and promote equality. This article puts aside separate unresolved (and perhaps unresolvable) issues regarding the concrete effects of specific technologies. It instead examines how big data-driven tools alter the structure of schools' pedagogical decision-making, and, in doing so, change fundamental aspects of America's education enterprise. Technological mediation and data-driven decision-making have a particularly significant impact in learning environments because the education process primarily consists of dynamic information exchange. In this overview, I highlight three significant structural shifts that accompany school reliance on data-driven instructional platforms that perform core school functions: teaching, assessment, and credentialing. First, virtual learning environments create information technology infrastructures featuring constant data collection, continuous algorithmic assessment, and possibly infinite record retention. This undermines the traditional intellectual privacy and safety of classrooms. Second, these systems displace pedagogical decision-making from educators serving public interests to private, often for-profit, technology providers. They constrain teachers' academic autonomy, obscure student evaluation, and reduce parents' and students' ability to participate or challenge education decision-making. Third, big data-driven tools define what "counts" as education by mapping the concepts, creating the content, determining the metrics, and setting desired learning outcomes of instruction. These shifts cede important decision-making to private entities without public scrutiny or pedagogical examination. In contrast to the public and heated debates that accompany textbook choices, schools often adopt education technologies ad hoc. Given education

  9. Temporal Data-Driven Sleep Scheduling and Spatial Data-Driven Anomaly Detection for Clustered Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Gang Li

    2016-09-01

    The spatial–temporal correlation is an important feature of sensor data in wireless sensor networks (WSNs). Most of the existing works based on the spatial–temporal correlation can be divided into two parts: redundancy reduction and anomaly detection. These two parts are pursued separately in existing works. In this work, the combination of temporal data-driven sleep scheduling (TDSS) and spatial data-driven anomaly detection is proposed, where TDSS can reduce data redundancy. The TDSS model is inspired by transmission control protocol (TCP) congestion control. Based on long and linear cluster structure in the tunnel monitoring system, cooperative TDSS and spatial data-driven anomaly detection are then proposed. To realize synchronous acquisition in the same ring for analyzing the situation of every ring, TDSS is implemented in a cooperative way in the cluster. To keep the precision of sensor data, spatial data-driven anomaly detection based on the spatial correlation and Kriging method is realized to generate an anomaly indicator. The experiment results show that cooperative TDSS can realize non-uniform sensing effectively to reduce the energy consumption. In addition, spatial data-driven anomaly detection is quite significant for maintaining and improving the precision of sensor data.

  10. Supervision of Fed-Batch Fermentations

    DEFF Research Database (Denmark)

    Gregersen, Lars; Jørgensen, Sten Bay

    1999-01-01

    Process faults may be detected on-line using existing measurements based upon modelling that is entirely data driven. A multivariate statistical model is developed and used for fault diagnosis of an industrial fed-batch fermentation process. Data from several (25) batches are used to develop a model for cultivation behaviour. This model is validated against 13 data sets and demonstrated to explain a significant amount of variation in the data. The multivariate model may directly be used for process monitoring. With this method faults are detected in real time and the responsible measurements...

  11. Data-Driven Learning: Reasonable Fears and Rational Reassurance

    Science.gov (United States)

    Boulton, Alex

    2009-01-01

    Computer corpora have many potential applications in teaching and learning languages, the most direct of which--when the learners explore a corpus themselves--has become known as data-driven learning (DDL). Despite considerable enthusiasm in the research community and interest in higher education, the approach has not made major inroads to…

  12. Data-driven Regulation and Governance in Smart Cities

    NARCIS (Netherlands)

    Ranchordás, Sofia; Klop, Abram; Mak, Vanessa; Berlee, Anna; Tjong Tjin Tai, Eric

    2018-01-01

    This chapter discusses the concept of data-driven regulation and governance in the context of smart cities by describing how these urban centres harness these technologies to collect and process information about citizens, traffic, urban planning or waste production. It describes how several smart

  13. Data-Driven Planning: Using Assessment in Strategic Planning

    Science.gov (United States)

    Bresciani, Marilee J.

    2010-01-01

    Data-driven planning or evidence-based decision making represents nothing new in its concept. For years, business leaders have claimed they have implemented planning informed by data that have been strategically and systematically gathered. Within higher education and student affairs, there may be less evidence of the actual practice of…

  14. Data-Driven Model Order Reduction for Bayesian Inverse Problems

    KAUST Repository

    Cui, Tiangang

    2014-01-06

    One of the major challenges in using MCMC for the solution of inverse problems is the repeated evaluation of computationally expensive numerical models. We develop a data-driven projection-based model order reduction technique to reduce the computational cost of numerical PDE evaluations in this context.

  15. Data mining, knowledge discovery and data-driven modelling

    NARCIS (Netherlands)

    Solomatine, D.P.; Velickov, S.; Bhattacharya, B.; Van der Wal, B.

    2003-01-01

    The project was aimed at exploring the possibilities of a new paradigm in modelling - data-driven modelling, often referred to as "data mining". Several application areas were considered: sedimentation problems in the Port of Rotterdam, automatic soil classification on the basis of cone penetration

  16. Scalable data-driven short-term traffic prediction

    NARCIS (Netherlands)

    Friso, K.; Wismans, L. J.J.; Tijink, M. B.

    2017-01-01

    Short-term traffic prediction has a lot of potential for traffic management. However, most research has traditionally focused on either traffic models (which do not scale very well to large networks, computationally) or on data-driven methods for freeways, leaving out urban arterials completely. Urban

  17. Data-driven analysis of blood glucose management effectiveness

    NARCIS (Netherlands)

    Nannings, B.; Abu-Hanna, A.; Bosman, R. J.

    2005-01-01

    The blood-glucose-level (BGL) of Intensive Care (IC) patients requires close monitoring and control. In this paper we describe a general data-driven analytical method for studying the effectiveness of BGL management. The method is based on developing and studying a clinical outcome reflecting the

  18. Data-Driven Learning of Q-Matrix

    Science.gov (United States)

    Liu, Jingchen; Xu, Gongjun; Ying, Zhiliang

    2012-01-01

    The recent surge of interest in cognitive assessment has led to developments of novel statistical models for diagnostic classification. Central to many such models is the well-known "Q"-matrix, which specifies the item-attribute relationships. This article proposes a data-driven approach to identification of the "Q"-matrix and estimation of…

  19. Knowledge-Driven Versus Data-Driven Logics

    Czech Academy of Sciences Publication Activity Database

    Dubois, D.; Hájek, Petr; Prade, H.

    2000-01-01

    Vol. 9, No. 1 (2000), pp. 65-89. ISSN 0925-8531. R&D Projects: GA AV ČR IAA1030601. Grant - others: CNRS(FR) 4008. Institutional research plan: AV0Z1030915. Keywords: epistemic logic; possibility theory; data-driven reasoning; deontic logic. Subject RIV: BA - General Mathematics

  20. Developing Annotation Solutions for Online Data Driven Learning

    Science.gov (United States)

    Perez-Paredes, Pascual; Alcaraz-Calero, Jose M.

    2009-01-01

    Although "annotation" is a widely-researched topic in Corpus Linguistics (CL), its potential role in Data Driven Learning (DDL) has not been addressed in depth by Foreign Language Teaching (FLT) practitioners. Furthermore, most of the research in the use of DDL methods pays little attention to annotation in the design and implementation…

  1. Data-driven modelling of LTI systems using symbolic regression

    NARCIS (Netherlands)

    Khandelwal, D.; Toth, R.; Van den Hof, P.M.J.

    2017-01-01

    The aim of this project is to automate the task of data-driven identification of dynamical systems. The underlying goal is to develop an identification tool that models a physical system without distinguishing between classes of systems such as linear, nonlinear or possibly even hybrid systems. Such

  2. A Data-Driven Control Design Approach for Freeway Traffic Ramp Metering with Virtual Reference Feedback Tuning

    Directory of Open Access Journals (Sweden)

    Shangtai Jin

    2014-01-01

    ALINEA is a simple, efficient, and easily implemented ramp metering strategy. Virtual reference feedback tuning (VRFT) is most suitable for many practical systems since it is a “one-shot” data-driven control design methodology. This paper presents an application of VRFT to a ramp metering problem of a freeway traffic system. When there is not enough prior knowledge of the controlled system to select a proper parameter of ALINEA, the VRFT approach is used to optimize the ALINEA's parameter by only using a batch of input and output data collected from the freeway traffic system. The extensive simulations are built on both the macroscopic MATLAB platform and the microscopic PARAMICS platform to show the effectiveness and applicability of the proposed data-driven controller tuning approach.

  3. Data-Driven Controller Design: The H2 Approach

    CERN Document Server

    Sanfelice Bazanella, Alexandre; Eckhard, Diego

    2012-01-01

    Data-driven methodologies have recently emerged as an important paradigm alternative to model-based controller design and several such methodologies are formulated as an H2 performance optimization. This book presents a comprehensive theoretical treatment of the H2 approach to data-driven control design. The fundamental properties implied by the H2 problem formulation are analyzed in detail, so that common features to all solutions are identified. Direct methods (VRFT) and iterative methods (IFT, DFT, CbT) are put under a common theoretical framework. The choice of the reference model, the experimental conditions, the optimization method to be used, and several other designer’s choices are crucial to the quality of the final outcome, and firm guidelines for all these choices are derived from the theoretical analysis presented. The practical application of the concepts in the book is illustrated with a large number of practical designs performed for different classes of processes: thermal, fluid processing a...

  4. Data-driven importance distributions for articulated tracking

    DEFF Research Database (Denmark)

    Hauberg, Søren; Pedersen, Kim Steenstrup

    2011-01-01

    We present two data-driven importance distributions for particle filter-based articulated tracking: one based on background subtraction, another on depth information. In order to keep the algorithms efficient, we represent human poses in terms of spatial joint positions. To ensure constant bone le...... filter, where they improve both accuracy and efficiency of the tracker. In fact, they triple the effective number of samples compared to the most commonly used importance distribution at little extra computational cost....

  5. A data-driven framework for investigating customer retention

    OpenAIRE

    Mgbemena, Chidozie Simon

    2016-01-01

    This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University London. This study presents a data-driven simulation framework in order to understand customer behaviour and therefore improve customer retention. The overarching system design methodology used for this study is aligned with the design science paradigm. The Social Media Domain Analysis (SoMeDoA) approach is adopted and evaluated to build a model on the determinants of customer satisfaction ...

  6. Retrospective data-driven respiratory gating for PET/CT

    International Nuclear Information System (INIS)

    Schleyer, Paul J; O'Doherty, Michael J; Barrington, Sally F; Marsden, Paul K

    2009-01-01

    Respiratory motion can adversely affect both PET and CT acquisitions. Respiratory gating allows an acquisition to be divided into a series of motion-reduced bins according to the respiratory signal, which is typically hardware acquired. In order that the effects of motion can potentially be corrected for, we have developed a novel, automatic, data-driven gating method which retrospectively derives the respiratory signal from the acquired PET and CT data. PET data are acquired in listmode and analysed in sinogram space, and CT data are acquired in cine mode and analysed in image space. Spectral analysis is used to identify regions within the CT and PET data which are subject to respiratory motion, and the variation of counts within these regions is used to estimate the respiratory signal. Amplitude binning is then used to create motion-reduced PET and CT frames. The method was demonstrated with four patient datasets acquired on a 4-slice PET/CT system. To assess the accuracy of the data-derived respiratory signal, a hardware-based signal was acquired for comparison. Data-driven gating was successfully performed on PET and CT datasets for all four patients. Gated images demonstrated respiratory motion throughout the bin sequences for all PET and CT series, and image analysis and direct comparison of the traces derived from the data-driven method with the hardware-acquired traces indicated accurate recovery of the respiratory signal.
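
    The amplitude binning step can be sketched as follows, assuming equal-width bins between the minimum and maximum of an already estimated respiratory signal. The function name and the equal-width choice are assumptions made for illustration; the abstract does not say how the bin edges were chosen.

```python
def amplitude_bins(signal, n_bins):
    """Assign each sample of a respiratory signal to an amplitude bin."""
    lo, hi = min(signal), max(signal)
    width = (hi - lo) / n_bins
    if width == 0:
        # A flat signal carries no motion information: use a single bin.
        return [0] * len(signal)
    # The maximum sample would land in bin n_bins, so clamp it to the last bin.
    return [min(int((s - lo) / width), n_bins - 1) for s in signal]


bins = amplitude_bins([0.0, 0.5, 1.0, 0.25], n_bins=4)
```

    Acquired data whose time stamps fall in the same bin would then be combined into one motion-reduced frame.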

  7. Authoring Data-Driven Videos with DataClips.

    Science.gov (United States)

    Amini, Fereshteh; Riche, Nathalie Henry; Lee, Bongshin; Monroy-Hernandez, Andres; Irani, Pourang

    2017-01-01

    Data videos, or short data-driven motion graphics, are an increasingly popular medium for storytelling. However, creating data videos is difficult as it involves pulling together a unique combination of skills. We introduce DataClips, an authoring tool aimed at lowering the barriers to crafting data videos. DataClips allows non-experts to assemble data-driven "clips" together to form longer sequences. We constructed the library of data clips by analyzing the composition of over 70 data videos produced by reputable sources such as The New York Times and The Guardian. We demonstrate that DataClips can reproduce over 90% of our data videos corpus. We also report on a qualitative study comparing the authoring process and outcome achieved by (1) non-experts using DataClips, and (2) experts using Adobe Illustrator and After Effects to create data-driven clips. Results indicated that non-experts are able to learn and use DataClips with a short training period. In the span of one hour, they were able to produce more videos than experts using a professional editing tool, and their clips were rated similarly by an independent audience.

  8. Data-Driven H∞ Control for Nonlinear Distributed Parameter Systems.

    Science.gov (United States)

    Luo, Biao; Huang, Tingwen; Wu, Huai-Ning; Yang, Xiong

    2015-11-01

    The data-driven H∞ control problem of nonlinear distributed parameter systems is considered in this paper. An off-policy learning method is developed to learn the H∞ control policy from real system data rather than the mathematical model. First, Karhunen-Loève decomposition is used to compute the empirical eigenfunctions, which are then employed to derive a reduced-order model (ROM) of the slow subsystem based on the singular perturbation theory. The H∞ control problem is reformulated based on the ROM, which can be transformed to solve the Hamilton-Jacobi-Isaacs (HJI) equation, theoretically. To learn the solution of the HJI equation from real system data, a data-driven off-policy learning approach is proposed based on the simultaneous policy update algorithm and its convergence is proved. For implementation purposes, a neural network (NN)-based action-critic structure is developed, where a critic NN and two action NNs are employed to approximate the value function, control, and disturbance policies, respectively. Subsequently, a least-square NN weight-tuning rule is derived with the method of weighted residuals. Finally, the developed data-driven off-policy learning approach is applied to a nonlinear diffusion-reaction process, and the obtained results demonstrate its effectiveness.

  9. General Purpose Data-Driven Monitoring for Space Operations

    Science.gov (United States)

    Iverson, David L.; Martin, Rodney A.; Schwabacher, Mark A.; Spirkovska, Liljana; Taylor, William McCaa; Castle, Joseph P.; Mackey, Ryan M.

    2009-01-01

    As modern space propulsion and exploration systems improve in capability and efficiency, their designs are becoming increasingly sophisticated and complex. Determining the health state of these systems, using traditional parameter limit checking, model-based, or rule-based methods, is becoming more difficult as the number of sensors and component interactions grow. Data-driven monitoring techniques have been developed to address these issues by analyzing system operations data to automatically characterize normal system behavior. System health can be monitored by comparing real-time operating data with these nominal characterizations, providing detection of anomalous data signatures indicative of system faults or failures. The Inductive Monitoring System (IMS) is a data-driven system health monitoring software tool that has been successfully applied to several aerospace applications. IMS uses a data mining technique called clustering to analyze archived system data and characterize normal interactions between parameters. The scope of IMS based data-driven monitoring applications continues to expand with current development activities. Successful IMS deployment in the International Space Station (ISS) flight control room to monitor ISS attitude control systems has led to applications in other ISS flight control disciplines, such as thermal control. It has also generated interest in data-driven monitoring capability for Constellation, NASA's program to replace the Space Shuttle with new launch vehicles and spacecraft capable of returning astronauts to the moon, and then on to Mars. Several projects are currently underway to evaluate and mature the IMS technology and complementary tools for use in the Constellation program. These include an experiment on board the Air Force TacSat-3 satellite, and ground systems monitoring for NASA's Ares I-X and Ares I launch vehicles. 
The TacSat-3 Vehicle System Management (TVSM) project is a software experiment to integrate fault

  10. Data-driven algorithm to estimate friction in automobile engine

    DEFF Research Database (Denmark)

    Stotsky, Alexander A.

    2010-01-01

    Algorithms based on the oscillations of the engine angular rotational speed under fuel cutoff and no-load were proposed for estimation of the engine friction torque. The recursive algorithm to restore the periodic signal is used to calculate the amplitude of the engine speed signal at fuel cutoff....... The values of the friction torque in the corresponding table entries are updated as new measurements of the friction moment are acquired. A new, data-driven algorithm for table adaptation on the basis of stepwise regression was developed and verified using a six-cylinder Volvo engine....

  11. Data driven information system for supervision of judicial open

    Directory of Open Access Journals (Sweden)

    Ming LI

    2016-08-01

    To address four outstanding problems in the information-based supervision of judicial publicity, judicial public data are classified with a data-driven approach to yield the data that are ultimately valuable. Then the functional, technical, and business structures of the data processing system are put forward, including data collection, data reduction, data analysis, data application, and data security modules. Developing the data processing system on these structures can effectively reduce the work intensity of judicial open information management, summarize the state of the work, find problems, and raise the level of judicial publicity.

  12. Product design pattern based on big data-driven scenario

    OpenAIRE

    Conggang Yu; Lusha Zhu

    2016-01-01

    This article discusses new product design patterns in the big data era, gives designers a new way of rational thinking, and offers a new way to understand the design of the product. Based on the key criteria of the product design process, category, element, and product are used to input the data, which comprises concrete data and abstract data as an enlargement of the criteria of product design process for the establishment of a big data-driven product design pattern’s model. Moreover, an exper...

  13. Controller synthesis for negative imaginary systems: a data driven approach

    KAUST Repository

    Mabrok, Mohamed

    2016-02-17

    The negative imaginary (NI) property occurs in many important applications. For instance, flexible structure systems with collocated force actuators and position sensors can be modelled as negative imaginary systems. In this study, a data-driven controller synthesis methodology for NI systems is presented. In this approach, measured frequency response data of the plant is used to construct the controller frequency response at every frequency by minimising a cost function. Then, this controller response is used to identify the controller transfer function using system identification methods. © The Institution of Engineering and Technology 2016.

  14. Data-Driven Model Reduction and Transfer Operator Approximation

    Science.gov (United States)

    Klus, Stefan; Nüske, Feliks; Koltai, Péter; Wu, Hao; Kevrekidis, Ioannis; Schütte, Christof; Noé, Frank

    2018-06-01

    In this review paper, we will present different data-driven dimension reduction techniques for dynamical systems that are based on transfer operator theory as well as methods to approximate transfer operators and their eigenvalues, eigenfunctions, and eigenmodes. The goal is to point out similarities and differences between methods developed independently by the dynamical systems, fluid dynamics, and molecular dynamics communities such as time-lagged independent component analysis, dynamic mode decomposition, and their respective generalizations. As a result, extensions and best practices developed for one particular method can be carried over to other related methods.

  15. Data-Driven Healthcare: Challenges and Opportunities for Interactive Visualization.

    Science.gov (United States)

    Gotz, David; Borland, David

    2016-01-01

    The healthcare industry's widespread digitization efforts are reshaping one of the largest sectors of the world's economy. This transformation is enabling systems that promise to use ever-improving data-driven evidence to help doctors make more precise diagnoses, institutions identify at-risk patients for intervention, clinicians develop more personalized treatment plans, and researchers better understand medical outcomes within complex patient populations. Given the scale and complexity of the data required to achieve these goals, advanced data visualization tools have the potential to play a critical role. This article reviews a number of visualization challenges unique to the healthcare discipline.

  16. Data Driven Broiler Weight Forecasting using Dynamic Neural Network Models

    DEFF Research Database (Denmark)

    Johansen, Simon Vestergaard; Bendtsen, Jan Dimon; Riisgaard-Jensen, Martin

    2017-01-01

    In this article, the dynamic influence of environmental broiler house conditions and broiler growth is investigated. Dynamic neural network forecasting models have been trained on farm-scale broiler batch production data from 12 batches from the same house. The model forecasts future broiler weight...... and uses environmental conditions such as heating, ventilation, and temperature along with broiler behavior such as feed and water consumption. Training data and forecasting data are analyzed to explain when the model might fail at generalizing. We present ensemble broiler weight forecasts to day 7, 14, 21...

  17. Data-driven execution of fast multipole methods

    KAUST Repository

    Ltaief, Hatem

    2013-09-17

    Fast multipole methods (FMMs) have O(N) complexity, are compute bound, and require very little synchronization, which makes them a favorable algorithm on next-generation supercomputers. Their most common application is to accelerate N-body problems, but they can also be used to solve boundary integral equations. When the particle distribution is irregular and the tree structure is adaptive, load balancing becomes a non-trivial question. A common strategy for load balancing FMMs is to use the work load from the previous step as weights to statically repartition the next step. The authors discuss in the paper another approach based on data-driven execution to efficiently tackle this challenging load balancing problem. The core idea consists of breaking the most time-consuming stages of the FMMs into smaller tasks. The algorithm can then be represented as a directed acyclic graph where nodes represent tasks and edges represent dependencies among them. The execution of the algorithm is performed by asynchronously scheduling the tasks using the queueing and runtime for kernels runtime environment, in a way such that data dependencies are not violated for numerical correctness purposes. This asynchronous scheduling results in an out-of-order execution. The performance results of the data-driven FMM execution outperform the previous strategy and show linear speedup on a quad-socket quad-core Intel Xeon system. Copyright © 2013 John Wiley & Sons, Ltd.
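
    The idea of running tasks as soon as their dependencies are satisfied can be sketched with Python's standard `graphlib` module. This is a minimal sequential sketch, not the runtime used in the paper: the kernel names (P2M, M2M, M2L, L2L, L2P, P2P) follow common FMM terminology, and the dependency graph and the `run_tasks` helper are assumptions made for illustration.

```python
from graphlib import TopologicalSorter


def run_tasks(dependencies, actions):
    """Run each task once all of its predecessors have finished.

    dependencies maps a task name to the set of tasks it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    order = []
    while sorter.is_active():
        # get_ready() returns every task whose dependencies are all done;
        # a real runtime would hand these to worker threads concurrently.
        for task in sorter.get_ready():
            actions[task]()
            order.append(task)
            sorter.done(task)
    return order


# Hypothetical, coarse-grained FMM stage graph (one task per kernel).
fmm_deps = {
    "M2M": {"P2M"},
    "M2L": {"M2M"},
    "L2L": {"M2L"},
    "L2P": {"L2L"},
    "P2P": set(),  # near-field work has no dependency on the far-field chain
}
log = []
actions = {name: (lambda n=name: log.append(n))
           for name in ["P2M", "M2M", "M2L", "L2L", "L2P", "P2P"]}
order = run_tasks(fmm_deps, actions)
```

    Because P2P has no predecessors, it becomes ready at the same time as P2M, which is the kind of overlap an asynchronous scheduler can exploit.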

  18. A data driven nonlinear stochastic model for blood glucose dynamics.

    Science.gov (United States)

    Zhang, Yan; Holt, Tim A; Khovanova, Natalia

    2016-03-01

    The development of adequate mathematical models for blood glucose dynamics may improve early diagnosis and control of diabetes mellitus (DM). We have developed a stochastic nonlinear second order differential equation to describe the response of blood glucose concentration to food intake using continuous glucose monitoring (CGM) data. A variational Bayesian learning scheme was applied to define the number and values of the system's parameters by iterative optimisation of free energy. The model has the minimal order and number of parameters to successfully describe blood glucose dynamics in people with and without DM. The model accounts for the nonlinearity and stochasticity of the underlying glucose-insulin dynamic process. Being data-driven, it takes full advantage of available CGM data and, at the same time, reflects the intrinsic characteristics of the glucose-insulin system without detailed knowledge of the physiological mechanisms. We have shown that the dynamics of some postprandial blood glucose excursions can be described by a reduced (linear) model, previously seen in the literature. A comprehensive analysis demonstrates that deterministic system parameters belong to different ranges for diabetes and controls. Implications for clinical practice are discussed. This is the first study introducing a continuous data-driven nonlinear stochastic model capable of describing both DM and non-DM profiles. Copyright © 2015 The Authors. Published by Elsevier Ireland Ltd. All rights reserved.

  19. Data driven CAN node reliability assessment for manufacturing system

    Science.gov (United States)

    Zhang, Leiming; Yuan, Yong; Lei, Yong

    2017-01-01

    The reliability of the Controller Area Network (CAN) is critical to the performance and safety of the system. However, direct bus-off time assessment tools are lacking in practice due to the inaccessibility of the node information and the complexity of the node interactions upon errors. In order to measure the mean time to bus-off (MTTB) of all the nodes, a novel data-driven node bus-off time assessment method for CAN networks is proposed that directly uses network error information. First, the corresponding network error event sequence for each node is constructed using multiple-layer network error information. Then, the generalized zero-inflated Poisson process (GZIP) model is established for each node based on the error event sequence. Finally, the stochastic model is constructed to predict the MTTB of the node. Accelerated case studies with different error injection rates are conducted on a laboratory network to demonstrate the proposed method, where the network errors are generated by a computer-controlled error injection system. Experimental results show that the MTTB of nodes predicted by the proposed method agrees well with observations in the case studies. The proposed data-driven node time to bus-off assessment method for CAN networks can successfully predict the MTTB of nodes by directly using network error event data.
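
    For orientation, the ordinary zero-inflated Poisson distribution mixes a point mass at zero with a Poisson count: P(0) = pi + (1 - pi) exp(-lam) and P(k) = (1 - pi) lam^k exp(-lam) / k! for k >= 1. The sketch below implements only this textbook form; the generalized process model (GZIP) used in the paper is not specified in the abstract and is not reproduced here. The function name and parameter values are illustrative assumptions.

```python
import math


def zip_pmf(k, lam, pi):
    """Probability of k events under a zero-inflated Poisson distribution.

    lam is the Poisson rate; pi is the extra probability of a structural zero.
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    structural_zero = pi if k == 0 else 0.0
    return structural_zero + (1.0 - pi) * poisson


# Probability of observing no error events in an interval (assumed values).
p_zero = zip_pmf(0, lam=2.0, pi=0.3)
```

    The inflation term lets the model account for intervals in which no errors can occur at all, in addition to intervals that merely happen to have a zero count.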

  20. Locative media and data-driven computing experiments

    Directory of Open Access Journals (Sweden)

    Sung-Yueh Perng

    2016-06-01

    Full Text Available Over the past two decades urban social life has undergone a rapid and pervasive geocoding, becoming mediated, augmented and anticipated by location-sensitive technologies and services that generate and utilise big, personal, locative data. The production of these data has prompted the development of exploratory data-driven computing experiments that seek to find ways to extract value and insight from them. These projects often start from the data, rather than from a question or theory, and try to imagine and identify their potential utility. In this paper, we explore the desires and mechanics of data-driven computing experiments. We demonstrate how both locative media data and computing experiments are ‘staged’ to create new values and computing techniques, which in turn are used to try and derive possible futures that are ridden with unintended consequences. We argue that using computing experiments to imagine potential urban futures produces effects that often have little to do with creating new urban practices. Instead, these experiments promote Big Data science and the prospect that data produced for one purpose can be recast for another and act as alternative mechanisms of envisioning urban futures.

  1. Dynamically adaptive data-driven simulation of extreme hydrological flows

    Science.gov (United States)

    Kumar Jain, Pushkar; Mandli, Kyle; Hoteit, Ibrahim; Knio, Omar; Dawson, Clint

    2018-02-01

    Hydrological hazards such as storm surges, tsunamis, and rainfall-induced flooding are physically complex events that are costly in loss of human life and economic productivity. Many such disasters could be mitigated through improved emergency evacuation in real-time and through the development of resilient infrastructure based on knowledge of how systems respond to extreme events. Data-driven computational modeling is a critical technology underpinning these efforts. This investigation focuses on the novel combination of methodologies in forward simulation and data assimilation. The forward geophysical model utilizes adaptive mesh refinement (AMR), a process by which a computational mesh can adapt in time and space based on the current state of a simulation. The forward solution is combined with ensemble based data assimilation methods, whereby observations from an event are assimilated into the forward simulation to improve the veracity of the solution, or used to invert for uncertain physical parameters. The novelty in our approach is the tight two-way coupling of AMR and ensemble filtering techniques. The technology is tested using actual data from the Chile tsunami event of February 27, 2010. These advances offer the promise of significantly transforming data-driven, real-time modeling of hydrological hazards, with potentially broader applications in other science domains.

  2. Product design pattern based on big data-driven scenario

    Directory of Open Access Journals (Sweden)

    Conggang Yu

    2016-07-01

    This article discusses new product design patterns in the big data era, gives designers a new way of rational thinking, and offers a new way to understand the design of the product. Based on the key criteria of the product design process, category, element, and product are used to input the data, which comprises concrete data and abstract data as an enlargement of the criteria of product design process for the establishment of a big data-driven product design pattern’s model. Moreover, an experiment and a product design case are conducted to verify the feasibility of the new pattern. Ultimately, we conclude that data-driven product design has two patterns: one is the concrete data supporting the product design, namely the “product–data–product” pattern, and the second is based on the value of the abstract data for product design, namely the “data–product–data” pattern. Through the data, users involve themselves in the design development process. Data and product form a huge network, in which data plays the role of a connection or node. So the essence of the design is to find a new connection based on element, and to find a new node based on category.

  3. Dynamically adaptive data-driven simulation of extreme hydrological flows

    KAUST Repository

    Kumar Jain, Pushkar

    2017-12-27

    Hydrological hazards such as storm surges, tsunamis, and rainfall-induced flooding are physically complex events that are costly in loss of human life and economic productivity. Many such disasters could be mitigated through improved emergency evacuation in real-time and through the development of resilient infrastructure based on knowledge of how systems respond to extreme events. Data-driven computational modeling is a critical technology underpinning these efforts. This investigation focuses on the novel combination of methodologies in forward simulation and data assimilation. The forward geophysical model utilizes adaptive mesh refinement (AMR), a process by which a computational mesh can adapt in time and space based on the current state of a simulation. The forward solution is combined with ensemble based data assimilation methods, whereby observations from an event are assimilated into the forward simulation to improve the veracity of the solution, or used to invert for uncertain physical parameters. The novelty in our approach is the tight two-way coupling of AMR and ensemble filtering techniques. The technology is tested using actual data from the Chile tsunami event of February 27, 2010. These advances offer the promise of significantly transforming data-driven, real-time modeling of hydrological hazards, with potentially broader applications in other science domains.

  4. Data-driven warehouse optimization : Deploying skills of order pickers

    NARCIS (Netherlands)

    M. Matusiak (Marek); M.B.M. de Koster (René); J. Saarinen (Jari)

    2015-01-01

    Batching orders and routing order pickers is a commonly studied problem in many picker-to-parts warehouses. The impact of individual differences in picking skills on performance has received little attention. In this paper, we show that taking into account differences in the skills of

  5. Data-driven simulation methodology using DES 4-layer architecture

    Directory of Open Access Journals (Sweden)

    Aida Saez

    2016-05-01

    In this study, we present a methodology to build data-driven simulation models of manufacturing plants. We go further than other research proposals and suggest organizing simulation model development under a 4-layer architecture (network, logic, database and visual reality). The Network layer includes system infrastructure. The Logic layer covers the operations planning and control system and the material handling equipment system. The Database layer holds all the information needed to perform the simulation, the results used for analysis, and the values that the Logic layer uses to manage the plant. Finally, the Visual Reality layer displays an augmented reality system including not only the machinery and the movement but also blackboards and other Andon elements. This architecture provides numerous advantages, as it helps to build a simulation model that consistently considers the internal logistics in a very flexible way.

  6. Data driven approaches for diagnostics and optimization of NPP operation

    International Nuclear Information System (INIS)

    Pliska, J.; Machat, Z.

    2014-01-01

    Efficiency and heat rate are important indicators of both the health of the power plant equipment and the quality of power plant operation. A powerful tool for meeting these challenges is statistical processing of the large data sets stored in data historians. These large data sets contain useful information about process quality and equipment and sensor health. The paper discusses data-driven approaches to building models of the main power plant equipment, such as the condenser, the cooling tower, and the overall thermal cycle, using multivariate regression techniques based on the so-called regression triplet: data, model, and method. Regression models form a base for diagnostics and optimization tasks. Diagnostics and optimization tasks are demonstrated on practical cases: diagnostics of main power plant equipment to identify equipment faults early, and optimization of the cooling circuit by cooling water flow control to achieve the highest power output for given boundary conditions. (authors)

  7. Data-driven RBE parameterization for helium ion beams

    CERN Document Server

    Mairani, A; Dokic, I; Valle, S M; Tessonnier, T; Galm, R; Ciocca, M; Parodi, K; Ferrari, A; Jäkel, O; Haberer, T; Pedroni, P; Böhlen, T T

    2016-01-01

    Helium ion beams are expected to be available again in the near future for clinical use. A suitable formalism to obtain relative biological effectiveness (RBE) values for treatment planning (TP) studies is needed. In this work we developed a data-driven RBE parameterization based on published in vitro experimental values. The RBE parameterization has been developed within the framework of the linear-quadratic (LQ) model as a function of the helium linear energy transfer (LET), dose and the tissue specific parameter $(\alpha/\beta)_{\text{ph}}$ of the LQ model for the reference radiation. Analytic expressions are provided, derived from the collected database, describing the $\text{RBE}_{\alpha}=\alpha_{\text{He}}/\alpha_{\text{ph}}$ and $\text{R}_{\beta}=\beta_{\text{He}}/\beta_{\text{ph}}$ ratios as a function of LET. Calculated RBE values at 2 Gy photon dose and at 10% survival ($\text{RBE}_{10}$) are compared with the experimental ones. Pearson's correlati...
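
    To show how the two ratios enter an RBE value, the sketch below uses the standard LQ isoeffect condition, alpha_He D_He + beta_He D_He^2 = alpha_ph D_ph + beta_ph D_ph^2, and returns RBE = D_ph / D_He. It is a generic LQ calculation under stated assumptions, not the paper's fitted parameterization: the function name and all numeric inputs are hypothetical, and the LET dependence of the ratios is not modelled.

```python
import math


def rbe_lq(dose_ph, alpha_ph, beta_ph, rbe_alpha, r_beta):
    """RBE at a given photon dose from the LQ isoeffect condition.

    rbe_alpha = alpha_He / alpha_ph and r_beta = beta_He / beta_ph.
    Requires beta_ph > 0 and r_beta > 0.
    """
    alpha_he = rbe_alpha * alpha_ph
    beta_he = r_beta * beta_ph
    # Biological effect (negative log survival) produced by the photon dose.
    effect = alpha_ph * dose_ph + beta_ph * dose_ph ** 2
    # Positive root of beta_he * D^2 + alpha_he * D - effect = 0.
    dose_he = (-alpha_he + math.sqrt(alpha_he ** 2 + 4.0 * beta_he * effect)) / (2.0 * beta_he)
    return dose_ph / dose_he


# Illustrative inputs only: 2 Gy photon dose, alpha_ph = 0.1 /Gy, beta_ph = 0.05 /Gy^2.
rbe_example = rbe_lq(2.0, alpha_ph=0.1, beta_ph=0.05, rbe_alpha=2.0, r_beta=1.0)
```

    When both ratios equal 1 the helium and photon doses coincide and the function returns an RBE of 1, which is a quick sanity check on the formula.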

  8. Data-driven forward model inference for EEG brain imaging

    DEFF Research Database (Denmark)

    Hansen, Sofie Therese; Hauberg, Søren; Hansen, Lars Kai

    2016-01-01

    Electroencephalography (EEG) is a flexible and accessible tool with excellent temporal resolution but with a spatial resolution hampered by volume conduction. Reconstruction of the cortical sources of measured EEG activity partly alleviates this problem and effectively turns EEG into a brain......-of-concept study, we show that, even when anatomical knowledge is unavailable, a suitable forward model can be estimated directly from the EEG. We propose a data-driven approach that provides a low-dimensional parametrization of head geometry and compartment conductivities, built using a corpus of forward models....... Combined with only a recorded EEG signal, we are able to estimate both the brain sources and a person-specific forward model by optimizing this parametrization. We thus not only solve an inverse problem, but also optimize over its specification. Our work demonstrates that personalized EEG brain imaging...

  9. Data-Driven Predictive Direct Load Control of Refrigeration Systems

    DEFF Research Database (Denmark)

    Shafiei, Seyed Ehsan; Knudsen, Torben; Wisniewski, Rafal

    2015-01-01

    A predictive control using subspace identification is applied for the smart grid integration of refrigeration systems under a direct load control scheme. A realistic demand response scenario based on regulation of the electrical power consumption is considered. A receding horizon optimal control...... is proposed to fulfil two important objectives: to secure high coefficient of performance and to participate in power consumption management. Moreover, a new method for design of input signals for system identification is put forward. The control method is fully data driven without an explicit use of model...... against real data. The performance improvement results in a 22% reduction in the energy consumption. A comparative simulation is accomplished showing the superiority of the method over the existing approaches in terms of the load following performance....

  10. Data-Driven Assistance Functions for Industrial Automation Systems

    International Nuclear Information System (INIS)

    Windmann, Stefan; Niggemann, Oliver

    2015-01-01

    The increasing amount of data in industrial automation systems overburdens the user in process control and diagnosis tasks. One possibility to cope with these challenges consists of using smart assistance systems that automatically monitor and optimize processes. This article deals with aspects of data-driven assistance systems such as assistance functions, process models and data acquisition. The paper describes novel approaches for self-diagnosis and self-optimization, and shows how these assistance functions can be integrated in different industrial environments. The considered assistance functions are based on process models that are automatically learned from process data. Fault detection and isolation is based on the comparison of observations of the real system with predictions obtained by application of the process models. The process models are further employed for energy efficiency optimization of industrial processes. Experimental results are presented for fault detection and energy efficiency optimization of a drive system. (paper)

  11. Data-driven discovery of Koopman eigenfunctions using deep learning

    Science.gov (United States)

    Lusch, Bethany; Brunton, Steven L.; Kutz, J. Nathan

    2017-11-01

    Koopman operator theory transforms any autonomous non-linear dynamical system into an infinite-dimensional linear system. Since linear systems are well-understood, a mapping of non-linear dynamics to linear dynamics provides a powerful approach to understanding and controlling fluid flows. However, finding the correct change of variables remains an open challenge. We present a strategy to discover an approximate mapping using deep learning. Our neural networks find this change of variables, its inverse, and a finite-dimensional linear dynamical system defined on the new variables. Our method is completely data-driven and only requires measurements of the system, i.e. it does not require derivatives or knowledge of the governing equations. We find a minimal set of approximate Koopman eigenfunctions that are sufficient to reconstruct and advance the system to future states. We demonstrate the method on several dynamical systems.

  12. Data-driven identification of potential Zika virus vectors

    Science.gov (United States)

    Evans, Michelle V; Dallas, Tad A; Han, Barbara A; Murdock, Courtney C; Drake, John M

    2017-01-01

    Zika is an emerging virus whose rapid spread is of great public health concern. Knowledge about transmission remains incomplete, especially concerning potential transmission in geographic areas in which it has not yet been introduced. To identify unknown vectors of Zika, we developed a data-driven model linking vector species and the Zika virus via vector-virus trait combinations that confer a propensity toward associations in an ecological network connecting flaviviruses and their mosquito vectors. Our model predicts that thirty-five species may be able to transmit the virus, seven of which are found in the continental United States, including Culex quinquefasciatus and Cx. pipiens. We suggest that empirical studies prioritize these species to confirm predictions of vector competence, enabling the correct identification of populations at risk for transmission within the United States. DOI: http://dx.doi.org/10.7554/eLife.22053.001 PMID:28244371

  13. Data-driven sensor placement from coherent fluid structures

    Science.gov (United States)

    Manohar, Krithika; Kaiser, Eurika; Brunton, Bingni W.; Kutz, J. Nathan; Brunton, Steven L.

    2017-11-01

    Optimal sensor placement is a central challenge in the prediction, estimation and control of fluid flows. We reinterpret sensor placement as optimizing discrete samples of coherent fluid structures for full state reconstruction. This permits a drastic reduction in the number of sensors required for faithful reconstruction, since complex fluid interactions can often be described by a small number of coherent structures. Our work optimizes point sensors using the pivoted matrix QR factorization to sample coherent structures directly computed from flow data. We apply this sampling technique in conjunction with various data-driven modal identification methods, including the proper orthogonal decomposition (POD) and dynamic mode decomposition (DMD). In contrast to POD-based sensors, DMD demonstrably enables the optimization of sensors for prediction in systems exhibiting multiple scales of dynamics. Finally, reconstruction accuracy from pivot sensors is shown to be competitive with sensors obtained using traditional computationally prohibitive optimization methods.
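
The pivoting idea in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it runs a greedy column-pivoted Gram-Schmidt sweep over the rows of a mode matrix and returns the pivot locations as sensor sites. The function name `qr_pivot_sensors`, the tolerance and the toy mode values are assumptions made for the example; in practice a library routine such as `scipy.linalg.qr(..., pivoting=True)` would normally be used instead.

```python
import math


def qr_pivot_sensors(modes, n_sensors, tol=1e-12):
    """Pick sensor locations by greedy column pivoting.

    modes: list of n rows; row i holds the values of the r modes
           (e.g. POD or DMD modes) at candidate location i.
    Returns the indices of the chosen locations, in pivot order.
    """
    # residual[i] is the part of row i not yet explained by chosen sensors
    residual = [list(row) for row in modes]
    chosen = []
    for _ in range(n_sensors):
        norms = [sum(v * v for v in row) for row in residual]
        pivot = max(range(len(residual)), key=lambda i: norms[i])
        if norms[pivot] <= tol:
            break  # remaining locations add no new information
        chosen.append(pivot)
        scale = math.sqrt(norms[pivot])
        q = [v / scale for v in residual[pivot]]
        # Remove the pivot direction from every row (Gram-Schmidt step)
        for i, row in enumerate(residual):
            proj = sum(a * b for a, b in zip(row, q))
            residual[i] = [a - proj * b for a, b in zip(row, q)]
    return chosen


# Toy example: 4 candidate locations, 2 modes (values are made up)
toy_modes = [[1.0, 0.0], [0.5, 0.0], [0.0, 2.0], [0.0, 0.1]]
sensors = qr_pivot_sensors(toy_modes, 2)
print(sensors)
```

Each pivot is the location whose mode values are largest after removing what earlier sensors already capture, which is why a small number of sensors can suffice when the flow is described by a few coherent structures.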

  14. Objective, Quantitative, Data-Driven Assessment of Chemical Probes.

    Science.gov (United States)

    Antolin, Albert A; Tym, Joseph E; Komianou, Angeliki; Collins, Ian; Workman, Paul; Al-Lazikani, Bissan

    2018-02-15

    Chemical probes are essential tools for understanding biological systems and for target validation, yet selecting probes for biomedical research is rarely based on objective assessment of all potential compounds. Here, we describe the Probe Miner: Chemical Probes Objective Assessment resource, capitalizing on the plethora of public medicinal chemistry data to empower quantitative, objective, data-driven evaluation of chemical probes. We assess >1.8 million compounds for their suitability as chemical tools against 2,220 human targets and dissect the biases and limitations encountered. Probe Miner represents a valuable resource to aid the identification of potential chemical probes, particularly when used alongside expert curation. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.

  15. Data-driven system to predict academic grades and dropout

    Science.gov (United States)

    Rovira, Sergi; Puertas, Eloi

    2017-01-01

    Nowadays, the role of a tutor is more important than ever to prevent student dropout and improve academic performance. This work proposes a data-driven system to extract relevant information hidden in the student academic data and, thus, help tutors to offer their pupils more proactive personal guidance. In particular, our system, based on machine learning techniques, makes predictions of dropout intention and course grades of students, as well as personalized course recommendations. Moreover, we present different visualizations which help in the interpretation of the results. In the experimental validation, we show that the system obtains promising results with data from the degree studies in Law, Computer Science and Mathematics of the Universitat de Barcelona. PMID:28196078

  16. Using Shape Memory Alloys: A Dynamic Data Driven Approach

    KAUST Repository

    Douglas, Craig C.

    2013-06-01

    Shape Memory Alloys (SMAs) are capable of changing their crystallographic structure due to changes of either stress or temperature. SMAs are used in a number of aerospace devices and are required in some devices in exotic environments. We are developing dynamic data driven application system (DDDAS) tools to monitor and change SMAs in real time for delivering payloads by aerospace vehicles. We must be able to turn on and off the sensors and heating units, change the stress on the SMA, monitor on-line data streams, change scales based on incoming data, and control what type of data is generated. The application must have the capability to be run and steered remotely as an unmanned feedback control loop.

  17. Facilitating Data Driven Business Model Innovation - A Case study

    DEFF Research Database (Denmark)

    Bjerrum, Torben Cæsar Bisgaard; Andersen, Troels Christian; Aagaard, Annabeth

    2016-01-01

    . The businesses interdisciplinary capabilities come into play in the BMI process, where knowledge from the facilitation strategy and knowledge from phases of the BMI process needs to be present to create new knowledge, hence new BMs and innovations. Depending on the environment and shareholders, this also exposes......This paper aims to understand the barriers that businesses meet in understanding their current business models (BM) and in their attempt at innovating new data driven business models (DDBM) using data. The interdisciplinary challenge of knowledge exchange occurring outside and/or inside businesses......, that gathers knowledge is of great importance. The SMEs have little, if no experience, within data handling, data analytics, and working with structured Business Model Innovation (BMI), that relates to both new and conventional products, processes and services. This new frontier of data and BMI will have...

  18. Econophysics and Data Driven Modelling of Market Dynamics

    CERN Document Server

    Aoyama, Hideaki; Chakrabarti, Bikas; Chakraborti, Anirban; Ghosh, Asim

    2015-01-01

    This book presents the works and research findings of physicists, economists, mathematicians, statisticians, and financial engineers who have undertaken data-driven modelling of market dynamics and other empirical studies in the field of Econophysics. During recent decades, the financial market landscape has changed dramatically with the deregulation of markets and the growing complexity of products. The ever-increasing speed and decreasing costs of computational power and networks have led to the emergence of huge databases. The availability of these data should permit the development of models that are better founded empirically, and econophysicists have accordingly been advocating that one should rely primarily on the empirical observations in order to construct models and validate them. The recent turmoil in financial markets and the 2008 crash appear to offer a strong rationale for new models and approaches. The Econophysics community accordingly has an important future role to play in market modelling....

  19. A Transition Towards a Data-Driven Business Model (DDBM)

    DEFF Research Database (Denmark)

    Zaki, Mohamed; Bøe-Lillegraven, Tor; Neely, Andy

    2016-01-01

    Nettavisen is a Norwegian online start-up that experienced a boost after the financial crisis of 2009. Since then, the firm has been able to increase its market share and profitability through the use of highly disruptive business models, allowing the relatively small staff to outcompete powerhouse...... legacy-publishing companies and new media players such as Facebook and Google. These disruptive business models have been successful, as Nettavisen captured a large market share in Norway early on, and was consistently one of the top-three online news sites in Norway. Capitalising on media data explosion...... and the recent acquisition of blogger network ‘Blog.no’, Nettavisen is moving towards a data-driven business model (DDBM). In particular, the firm aims to analyse huge volumes of user Web browsing and purchasing habits....

  20. Helioseismic and neutrino data-driven reconstruction of solar properties

    Science.gov (United States)

    Song, Ningqiang; Gonzalez-Garcia, M. C.; Villante, Francesco L.; Vinyoles, Nuria; Serenelli, Aldo

    2018-06-01

    In this work, we use Bayesian inference to quantitatively reconstruct the solar properties most relevant to the solar composition problem using as inputs the information provided by helioseismic and solar neutrino data. In particular, we use a Gaussian process to model the functional shape of the opacity uncertainty to gain flexibility and become as free as possible from prejudice in this regard. With these tools we first readdress the statistical significance of the solar composition problem. Furthermore, starting from a composition unbiased set of standard solar models (SSMs) we are able to statistically select those with solar chemical composition and other solar inputs which better describe the helioseismic and neutrino observations. In particular, we are able to reconstruct the solar opacity profile in a data-driven fashion, independently of any reference opacity tables, obtaining a 4 per cent uncertainty at the base of the convective envelope and 0.8 per cent at the solar core. When systematic uncertainties are included, results are 7.5 per cent and 2 per cent, respectively. In addition, we find that the values of most of the other inputs of the SSMs required to better describe the helioseismic and neutrino data are in good agreement with those adopted as the standard priors, with the exception of the astrophysical factor S11 and the microscopic diffusion rates, for which data suggests a 1 per cent and 30 per cent reduction, respectively. As an output of the study we derive the corresponding data-driven predictions for the solar neutrino fluxes.

  1. Pro Spring Batch

    CERN Document Server

    Minella, Michael T

    2011-01-01

    Since its release, Spring Framework has transformed virtually every aspect of Java development including web applications, security, aspect-oriented programming, persistence, and messaging. Spring Batch, one of its newer additions, now brings the same familiar Spring idioms to batch processing. Spring Batch addresses the needs of any batch process, from the complex calculations performed in the biggest financial institutions to simple data migrations that occur with many software development projects. Pro Spring Batch is intended to answer three questions: *What? What is batch processing? What

  2. Data-driven non-Markovian closure models

    Science.gov (United States)

    Kondrashov, Dmitri; Chekroun, Mickaël D.; Ghil, Michael

    2015-03-01

    This paper has two interrelated foci: (i) obtaining stable and efficient data-driven closure models by using a multivariate time series of partial observations from a large-dimensional system; and (ii) comparing these closure models with the optimal closures predicted by the Mori-Zwanzig (MZ) formalism of statistical physics. Multilayer stochastic models (MSMs) are introduced as both a generalization and a time-continuous limit of existing multilevel, regression-based approaches to closure in a data-driven setting; these approaches include empirical model reduction (EMR), as well as more recent multi-layer modeling. It is shown that the multilayer structure of MSMs can provide a natural Markov approximation to the generalized Langevin equation (GLE) of the MZ formalism. A simple correlation-based stopping criterion for an EMR-MSM model is derived to assess how well it approximates the GLE solution. Sufficient conditions are derived on the structure of the nonlinear cross-interactions between the constitutive layers of a given MSM to guarantee the existence of a global random attractor. This existence ensures that no blow-up can occur for a broad class of MSM applications, a class that includes non-polynomial predictors and nonlinearities that do not necessarily preserve quadratic energy invariants. The EMR-MSM methodology is first applied to a conceptual, nonlinear, stochastic climate model of coupled slow and fast variables, in which only slow variables are observed. It is shown that the resulting closure model with energy-conserving nonlinearities efficiently captures the main statistical features of the slow variables, even when there is no formal scale separation and the fast variables are quite energetic. Second, an MSM is shown to successfully reproduce the statistics of a partially observed, generalized Lotka-Volterra model of population dynamics in its chaotic regime. The challenges here include the rarity of strange attractors in the model's parameter

  3. A Data-Driven Approach to Realistic Shape Morphing

    KAUST Repository

    Gao, Lin; Lai, Yu-Kun; Huang, Qi-Xing; Hu, Shi-Min

    2013-01-01

    Morphing between 3D objects is a fundamental technique in computer graphics. Traditional methods of shape morphing focus on establishing meaningful correspondences and finding smooth interpolation between shapes. Such methods however only take geometric information as input and thus cannot in general avoid producing unnatural interpolation, in particular for large-scale deformations. This paper proposes a novel data-driven approach for shape morphing. Given a database with various models belonging to the same category, we treat them as data samples in the plausible deformation space. These models are then clustered to form local shape spaces of plausible deformations. We use a simple metric to reasonably represent the closeness between pairs of models. Given source and target models, the morphing problem is cast as a global optimization problem of finding a minimal distance path within the local shape spaces connecting these models. Under the guidance of intermediate models in the path, an extended as-rigid-as-possible interpolation is used to produce the final morphing. By exploiting the knowledge of plausible models, our approach produces realistic morphing for challenging cases as demonstrated by various examples in the paper. © 2013 The Eurographics Association and Blackwell Publishing Ltd.

  4. Data driven parallelism in experimental high energy physics applications

    International Nuclear Information System (INIS)

    Pohl, M.

    1987-01-01

    I present global design principles for the implementation of high energy physics data analysis code on sequential and parallel processors with mixed shared and local memory. Potential parallelism in the structure of high energy physics tasks is identified with granularity varying from a few times 10^8 instructions all the way down to a few times 10^4 instructions. It follows the hierarchical structure of detector and data acquisition systems. To take advantage of this, while preserving the necessary portability of the code, I propose a computational model with purely data driven concurrency in Single Program Multiple Data (SPMD) mode. The task granularity is defined by varying the granularity of the central data structure manipulated. Concurrent processes coordinate themselves asynchronously using simple lock constructs on parts of the data structure. Load balancing among processes occurs naturally. The scheme allows the internal layout of the data structure to be mapped closely onto the layout of local and shared memory in a parallel architecture. It thus allows the application to be optimized with respect to synchronization as well as data transport overheads. I present a coarse top level design for a portable implementation of this scheme on sequential machines, multiprocessor mainframes (e.g. IBM 3090), tightly coupled multiprocessors (e.g. RP-3) and loosely coupled processor arrays (e.g. LCAP, Emulating Processor Farms). (orig.)

  5. Selection of the Sample for Data-Driven $Z \\to \

    CERN Document Server

    Krauss, Martin

    2009-01-01

    The topic of this study was to improve the selection of the sample for data-driven Z → νν̄ background estimation, which is a major contribution in supersymmetric searches in a no-lepton search mode. The data is based on Z → ℓ⁺ℓ⁻ samples using data created with ATLAS simulation software. This method works if two leptons are reconstructed, but with cuts that are typical for SUSY searches the reconstruction efficiency for electrons and muons is rather low. For this reason an attempt was made to enhance the data sample by also considering events in which only one electron was reconstructed. In this case the invariant mass of the electron and each jet was computed, and the jet giving the best match to the Z boson mass was selected as the unreconstructed electron. This way the sample can be extended, but it loses significant purity because background events are also reconstructed. To improve this method other variables have to be considered which were not available for this study. Applying a similar method to muons using ...
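
The jet selection described in this abstract amounts to an invariant-mass comparison. The following is a minimal sketch under stated assumptions, not the study's actual code: four-momenta are plain `(E, px, py, pz)` tuples in GeV, the Z mass is taken as approximately 91.19 GeV, and the function names and toy momenta are hypothetical.

```python
import math

Z_MASS_GEV = 91.19  # approximate Z boson mass


def invariant_mass(p1, p2):
    """Invariant mass of two four-momenta given as (E, px, py, pz) in GeV."""
    e, px, py, pz = (a + b for a, b in zip(p1, p2))
    m2 = e * e - px * px - py * py - pz * pz
    return math.sqrt(max(m2, 0.0))  # clamp tiny negative rounding errors


def best_z_jet(electron, jets):
    """Index of the jet whose mass with the electron is closest to the Z mass."""
    return min(
        range(len(jets)),
        key=lambda i: abs(invariant_mass(electron, jets[i]) - Z_MASS_GEV),
    )


# Toy event (made-up numbers): one electron and two jets
electron = (45.6, 45.6, 0.0, 0.0)
jets = [
    (30.0, 0.0, 30.0, 0.0),    # gives a pair mass of about 52 GeV
    (45.6, -45.6, 0.0, 0.0),   # back-to-back: pair mass 91.2 GeV
]
chosen = best_z_jet(electron, jets)
print(chosen, invariant_mass(electron, jets[chosen]))
```

A real selection would also need cuts to suppress background events that happen to land near the Z mass, which is the purity loss the abstract reports.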

  6. A data-driven approach to quality risk management.

    Science.gov (United States)

    Alemayehu, Demissie; Alvir, Jose; Levenstein, Marcia; Nickerson, David

    2013-10-01

    An effective clinical trial strategy to ensure patient safety as well as trial quality and efficiency involves an integrated approach, including prospective identification of risk factors, mitigation of the risks through proper study design and execution, and assessment of quality metrics in real-time. Such an integrated quality management plan may also be enhanced by using data-driven techniques to identify risk factors that are most relevant in predicting quality issues associated with a trial. In this paper, we illustrate such an approach using data collected from actual clinical trials. Several statistical methods were employed, including the Wilcoxon rank-sum test and logistic regression, to identify the presence of association between risk factors and the occurrence of quality issues, applied to data on quality of clinical trials sponsored by Pfizer. Only a subset of the risk factors had a significant association with quality issues, and included: whether study used placebo, whether an agent was a biologic, unusual packaging label, complex dosing, and over 25 planned procedures. Proper implementation of the strategy can help to optimize resource utilization without compromising trial integrity and patient safety.

  7. A data-driven approach to quality risk management

    Directory of Open Access Journals (Sweden)

    Demissie Alemayehu

    2013-01-01

    Aim: An effective clinical trial strategy to ensure patient safety as well as trial quality and efficiency involves an integrated approach, including prospective identification of risk factors, mitigation of the risks through proper study design and execution, and assessment of quality metrics in real-time. Such an integrated quality management plan may also be enhanced by using data-driven techniques to identify risk factors that are most relevant in predicting quality issues associated with a trial. In this paper, we illustrate such an approach using data collected from actual clinical trials. Materials and Methods: Several statistical methods were employed, including the Wilcoxon rank-sum test and logistic regression, to identify the presence of association between risk factors and the occurrence of quality issues, applied to data on quality of clinical trials sponsored by Pfizer. Results: Only a subset of the risk factors had a significant association with quality issues, and included: whether study used placebo, whether an agent was a biologic, unusual packaging label, complex dosing, and over 25 planned procedures. Conclusion: Proper implementation of the strategy can help to optimize resource utilization without compromising trial integrity and patient safety.

  8. ATLAS job transforms: a data driven workflow engine

    International Nuclear Information System (INIS)

    Stewart, G A; Breaden-Madden, W B; Maddocks, H J; Harenberg, T; Sandhoff, M; Sarrazin, B

    2014-01-01

    The need to run complex workflows for a high energy physics experiment such as ATLAS has always been present. However, as computing resources have become even more constrained, compared to the wealth of data generated by the LHC, the need to use resources efficiently and manage complex workflows within a single grid job has increased. In ATLAS, a new Job Transform framework has been developed that we describe in this paper. This framework manages the multiple execution steps needed to 'transform' one data type into another (e.g., RAW data to ESD to AOD to final ntuple) and also provides a consistent interface for the ATLAS production system. The new framework uses a data driven workflow definition which is both easy to manage and powerful. After a transform is defined, jobs are expressed simply by specifying the input data and the desired output data. The transform infrastructure then executes only the necessary substeps to produce the final data products. The global execution cost of running the job is minimised and the transform can adapt to scenarios where data can be produced along different execution paths. Transforms for specific physics tasks which support up to 60 individual substeps have been successfully run. As the new transforms infrastructure has been deployed in production, many features have been added to the framework which improve reliability and quality of error reporting and also provide support for multi-process jobs.
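
The idea of specifying only input and output data and letting the framework choose the substeps can be illustrated with a small graph search. This is a sketch of the general technique, not the ATLAS Job Transform API: the substep names, the `SUBSTEPS` table and the `plan` function are all hypothetical.

```python
from collections import deque

# Hypothetical substeps: name -> (input data type, output data type)
SUBSTEPS = {
    "RAWtoESD": ("RAW", "ESD"),
    "ESDtoAOD": ("ESD", "AOD"),
    "AODtoNTUP": ("AOD", "NTUP"),
}


def plan(input_type, output_type, substeps=SUBSTEPS):
    """Return the shortest list of substeps turning input_type into output_type."""
    queue = deque([(input_type, [])])
    seen = {input_type}
    while queue:
        dtype, path = queue.popleft()
        if dtype == output_type:
            return path
        # Breadth-first search over data types reachable by one more substep
        for name, (src, dst) in substeps.items():
            if src == dtype and dst not in seen:
                seen.add(dst)
                queue.append((dst, path + [name]))
    raise ValueError(f"no substep chain from {input_type} to {output_type}")


# Starting from ESD, the RAW-to-ESD step is skipped automatically
steps = plan("ESD", "NTUP")
print(steps)
```

Because the search is driven by data types rather than a fixed step list, adding an alternative route to the table lets the same call adapt to a different execution path.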

  9. Human body segmentation via data-driven graph cut.

    Science.gov (United States)

    Li, Shifeng; Lu, Huchuan; Shao, Xingqing

    2014-11-01

    Human body segmentation is a challenging and important problem in computer vision. Existing methods usually entail a time-consuming training phase for prior knowledge learning with complex shape matching for body segmentation. In this paper, we propose a data-driven method that integrates top-down body pose information and bottom-up low-level visual cues for segmenting humans in static images within the graph cut framework. The key idea of our approach is first to exploit human kinematics to search for body part candidates via dynamic programming for high-level evidence. Then, using the body part classifiers, we obtain bottom-up cues of human body distribution for low-level evidence. All the evidence collected from top-down and bottom-up procedures is integrated in a graph cut framework for human body segmentation. Qualitative and quantitative experiment results demonstrate the merits of the proposed method in segmenting human bodies with arbitrary poses from cluttered backgrounds.

  10. Data-driven classification of patients with primary progressive aphasia.

    Science.gov (United States)

    Hoffman, Paul; Sajjadi, Seyed Ahmad; Patterson, Karalyn; Nestor, Peter J

    2017-11-01

    Current diagnostic criteria classify primary progressive aphasia into three variants, semantic (sv), nonfluent (nfv) and logopenic (lv) PPA, though the adequacy of this scheme is debated. This study took a data-driven approach, applying k-means clustering to data from 43 PPA patients. The algorithm grouped patients based on similarities in language, semantic and non-linguistic cognitive scores. The optimum solution consisted of three groups. One group, almost exclusively those diagnosed as svPPA, displayed a selective semantic impairment. A second cluster, with impairments to speech production, repetition and syntactic processing, contained a majority of patients with nfvPPA but also some lvPPA patients. The final group exhibited more severe deficits to speech, repetition and syntax as well as semantic and other cognitive deficits. These results suggest that, amongst cases of non-semantic PPA, differentiation mainly reflects overall degree of language/cognitive impairment. The observed patterns were scarcely affected by inclusion/exclusion of non-linguistic cognitive scores. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
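
For readers unfamiliar with k-means, the grouping step can be sketched as follows. This is a minimal version of Lloyd's algorithm on made-up two-dimensional score vectors, not the study's data or pipeline; seeding the centroids with the first k points is a simplification chosen to keep the example deterministic.

```python
def kmeans(points, k, iterations=100):
    """Lloyd's algorithm; centroids are seeded with the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy "patients": (language score, semantic score), values are invented
scores = [(0, 0), (10, 0), (0, 10), (1, 0), (10, 1), (1, 10)]
labels, centroids = kmeans(scores, 3)
print(labels)
```

In a real analysis the number of clusters would be chosen by comparing solutions for several values of k, as the abstract's reference to an "optimum solution" implies.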

  11. A Data-Driven Approach to Realistic Shape Morphing

    KAUST Repository

    Gao, Lin

    2013-05-01

    Morphing between 3D objects is a fundamental technique in computer graphics. Traditional methods of shape morphing focus on establishing meaningful correspondences and finding smooth interpolation between shapes. Such methods however only take geometric information as input and thus cannot in general avoid producing unnatural interpolation, in particular for large-scale deformations. This paper proposes a novel data-driven approach for shape morphing. Given a database with various models belonging to the same category, we treat them as data samples in the plausible deformation space. These models are then clustered to form local shape spaces of plausible deformations. We use a simple metric to reasonably represent the closeness between pairs of models. Given source and target models, the morphing problem is cast as a global optimization problem of finding a minimal distance path within the local shape spaces connecting these models. Under the guidance of intermediate models in the path, an extended as-rigid-as-possible interpolation is used to produce the final morphing. By exploiting the knowledge of plausible models, our approach produces realistic morphing for challenging cases as demonstrated by various examples in the paper. © 2013 The Eurographics Association and Blackwell Publishing Ltd.

  12. Data driven parallelism in experimental high energy physics applications

    Science.gov (United States)

    Pohl, Martin

    1987-08-01

    I present global design principles for the implementation of High Energy Physics data analysis code on sequential and parallel processors with mixed shared and local memory. Potential parallelism in the structure of High Energy Physics tasks is identified with granularity varying from a few times 10^8 instructions all the way down to a few times 10^4 instructions. It follows the hierarchical structure of detector and data acquisition systems. To take advantage of this, while preserving the necessary portability of the code, I propose a computational model with purely data driven concurrency in Single Program Multiple Data (SPMD) mode. The task granularity is defined by varying the granularity of the central data structure manipulated. Concurrent processes coordinate themselves asynchronously using simple lock constructs on parts of the data structure. Load balancing among processes occurs naturally. The scheme allows the internal layout of the data structure to be mapped closely onto the layout of local and shared memory in a parallel architecture. It thus allows the application to be optimized with respect to synchronization as well as data transport overheads. I present a coarse top level design for a portable implementation of this scheme on sequential machines, multiprocessor mainframes (e.g. IBM 3090), tightly coupled multiprocessors (e.g. RP-3) and loosely coupled processor arrays (e.g. LCAP, Emulating Processor Farms).

  13. Data driven profiting from your most important business asset

    CERN Document Server

    Redman, Thomas C

    2008-01-01

    Your company's data has the potential to add enormous value to every facet of the organization -- from marketing and new product development to strategy to financial management. Yet if your company is like most, it's not using its data to create strategic advantage. Data sits around unused -- or incorrect data fouls up operations and decision making. In Data Driven, Thomas Redman, the "Data Doc," shows how to leverage and deploy data to sharpen your company's competitive edge and enhance its profitability. The author reveals: · The special properties that make data such a powerful asset · The hidden costs of flawed, outdated, or otherwise poor-quality data · How to improve data quality for competitive advantage · Strategies for exploiting your data to make better business decisions · The many ways to bring data to market · Ideas for dealing with political struggles over data and concerns about privacy rights Your company's data is a key business asset, and you need to manage it aggressively and professi...

  14. Data driven processor 'Vertex Trigger' for B experiments

    International Nuclear Information System (INIS)

    Hartouni, E.P.

    1993-01-01

    Data Driven Processors (DDPs) are specialized computation engines configured to solve specific numerical problems, such as vertex reconstruction. The architecture of the DDP which is the subject of this talk was designed and implemented by W. Sippach and B.C. Knapp at Nevis Lab. in the early 1980s. This particular implementation allows multiple parallel streams of data to provide input to a heterogeneous collection of simple operators whose interconnections form an algorithm. The local data flow control allows this device to execute algorithms extremely quickly provided that care is taken in the layout of the algorithm. I/O rates of several hundred megabytes/second are routinely achieved, thus making DDPs attractive candidates for complex online calculations. The original question was "can a DDP reconstruct tracks in a Silicon Vertex Detector, find events with a separated vertex and do it fast enough to be used as an online trigger?" Restating this inquiry as three questions and describing the answers to the questions will be the subject of this talk. The three specific questions are: (1) Can an algorithm be found which reconstructs tracks in a planar geometry and no magnetic field; (2) Can separated vertices be recognized in some way; (3) Can the algorithm be implemented in the Nevis-UMass and DDP and execute in 10-20 μs?

  15. EXPLORING DATA-DRIVEN SPECTRAL MODELS FOR APOGEE M DWARFS

    Science.gov (United States)

    Lua Birky, Jessica; Hogg, David; Burgasser, Adam J.; Jessica Birky

    2018-01-01

    The Cannon (Ness et al. 2015; Casey et al. 2016) is a flexible, data-driven spectral modeling and parameter inference framework, demonstrated on high-resolution Apache Point Observatory Galactic Evolution Experiment (APOGEE; λ/Δλ~22,500, 1.5-1.7µm) spectra of giant stars to estimate stellar labels (Teff, logg, [Fe/H], and chemical abundances) to precisions higher than the model-grid pipeline. The lack of reliable stellar parameters reported by the APOGEE pipeline for temperatures less than ~3550K motivates extension of this approach to M dwarf stars. Using a training set of 51 M dwarfs with spectral types ranging M0-M9 obtained from SDSS optical spectra, we demonstrate that the Cannon can infer spectral types to a precision of +/-0.6 types, making it an effective tool for classifying high-resolution near-infrared spectra. We discuss the potential for extending this work to determine the physical stellar labels Teff, logg, and [Fe/H]. This work is supported by the SDSS Faculty and Student (FAST) initiative.

  16. Data driven fault detection and isolation: a wind turbine scenario

    Directory of Open Access Journals (Sweden)

    Rubén Francisco Manrique Piramanrique

    2015-04-01

    One of the greatest drawbacks in wind energy generation is the high maintenance cost associated with mechanical faults. This problem becomes more evident in utility scale wind turbines, where the increased size and nominal capacity come with additional problems associated with structural vibrations and aeroelastic effects in the blades. Due to the increased operation capability, it is imperative to detect system degradation and faults in an efficient manner, maintaining system integrity and reliability and reducing operation costs. This paper presents a comprehensive comparison of four different Fault Detection and Isolation (FDI) filters based on "Data Driven" (DD) techniques. In order to enhance FDI performance, a multi-level strategy is used where the first level detects the occurrence of any given fault (detection), while the second identifies the source of the fault (isolation). Four different DD classification techniques (namely Support Vector Machines, Artificial Neural Networks, K Nearest Neighbors and Gaussian Mixture Models) were studied and compared for each of the proposed classification levels. The best strategy at each level could be selected to build the final data driven FDI system. The performance of the proposed scheme is evaluated on a benchmark model of a commercial wind turbine.
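
    The two-level strategy described in this record can be sketched roughly as follows. This is only an illustration, assuming a plain k-nearest-neighbour vote at both levels; the function names, the toy feature vectors and the use of None as the "healthy" label are hypothetical and are not taken from the paper or its benchmark.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Majority vote among the k training points nearest to `query`.

    `train` is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def two_level_fdi(samples, query, k=3):
    """Level 1 decides fault vs. no fault; level 2 names the fault source.

    `samples` is a list of (feature_vector, fault_label) pairs, where
    fault_label is None for healthy operation (a hypothetical encoding).
    Returns None when no fault is detected, else the isolated fault label.
    """
    # Level 1 (detection): collapse every fault label into one "fault" class.
    detection_set = [(x, "healthy" if y is None else "fault") for x, y in samples]
    if knn_predict(detection_set, query, k) == "healthy":
        return None
    # Level 2 (isolation): classify among the faulty samples only.
    isolation_set = [(x, y) for x, y in samples if y is not None]
    return knn_predict(isolation_set, query, k)


# Toy data: three healthy points, three "sensor" faults, three "actuator" faults.
SAMPLES = [
    ((0.0, 0.0), None), ((0.1, 0.0), None), ((0.0, 0.1), None),
    ((5.0, 0.0), "sensor"), ((5.1, 0.0), "sensor"), ((5.0, 0.1), "sensor"),
    ((0.0, 5.0), "actuator"), ((0.1, 5.0), "actuator"), ((0.0, 5.1), "actuator"),
]
```

    Splitting the decision this way lets a different classifier be chosen for each level, which is the selection step the abstract mentions; here the same k-NN vote is reused at both levels purely for brevity.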

  17. Data-driven approach for creating synthetic electronic medical records.

    Science.gov (United States)

    Buczak, Anna L; Babin, Steven; Moniz, Linda

    2010-10-14

    New algorithms for disease outbreak detection are being developed to take advantage of full electronic medical records (EMRs) that contain a wealth of patient information. However, due to privacy concerns, even anonymized EMRs cannot be shared among researchers, resulting in great difficulty in comparing the effectiveness of these algorithms. To bridge the gap between novel bio-surveillance algorithms operating on full EMRs and the lack of non-identifiable EMR data, a method for generating complete and synthetic EMRs was developed. This paper describes a novel methodology for generating complete synthetic EMRs both for an outbreak illness of interest (tularemia) and for background records. The method developed has three major steps: 1) synthetic patient identity and basic information generation; 2) identification of care patterns that the synthetic patients would receive based on the information present in real EMR data for similar health problems; 3) adaptation of these care patterns to the synthetic patient population. We generated EMRs, including visit records, clinical activity, laboratory orders/results and radiology orders/results for 203 synthetic tularemia outbreak patients. Validation of the records by a medical expert revealed problems in 19% of the records; these were subsequently corrected. We also generated background EMRs for over 3000 patients in the 4-11 yr age group. Validation of those records by a medical expert revealed problems in fewer than 3% of these background patient EMRs and the errors were subsequently rectified. A data-driven method was developed for generating fully synthetic EMRs. The method is general and can be applied to any data set that has similar data elements (such as laboratory and radiology orders and results, clinical activity, prescription orders). The pilot synthetic outbreak records were for tularemia but our approach may be adapted to other infectious diseases. The pilot synthetic background records were in the 4

  18. Evidence-based and data-driven road safety management

    Directory of Open Access Journals (Sweden)

    Fred Wegman

    2015-07-01

    Over the past decades, road safety in highly-motorised countries has made significant progress. Although we have a fair understanding of the reasons for this progress, we don't have conclusive evidence for this. A new generation of road safety management approaches has entered road safety, starting when countries decided to guide themselves by setting quantitative targets (e.g. 50% fewer casualties in ten years' time). Setting realistic targets, designing strategies and action plans to achieve these targets and monitoring progress have resulted in more scientific research to support decision-making on these topics. Three subjects are key in this new approach of evidence-based and data-driven road safety management: ex-post and ex-ante evaluation of both individual interventions and intervention packages in road safety strategies, and transferability (external validity) of the research results. In this article, we explore these subjects based on recent experiences in four jurisdictions (Western Australia, the Netherlands, Sweden and Switzerland). All four apply similar approaches and tools; differences are considered marginal. It is concluded that policy-making and political decisions were influenced to a great extent by the results of analysis and research. Nevertheless, to compensate for a relatively weak theoretical basis and to improve the power of this new approach, a number of issues will need further research. This includes ex-post and ex-ante evaluation, a better understanding of extrapolation of historical trends and the transferability of research results. This new approach cannot be realized without high-quality road safety data. Good data and knowledge are indispensable for this new and very promising approach.

  19. Data-Driven Model Uncertainty Estimation in Hydrologic Data Assimilation

    Science.gov (United States)

    Pathiraja, S.; Moradkhani, H.; Marshall, L.; Sharma, A.; Geenens, G.

    2018-02-01

    The increasing availability of earth observations necessitates mathematical methods to optimally combine such data with hydrologic models. Several algorithms exist for such purposes, under the umbrella of data assimilation (DA). However, DA methods are often applied in a suboptimal fashion for complex real-world problems, due largely to several practical implementation issues. One such issue is error characterization, which is known to be critical for a successful assimilation. Mischaracterized errors lead to suboptimal forecasts, and in the worst case, to degraded estimates even compared to the no assimilation case. Model uncertainty characterization has received little attention relative to other aspects of DA science. Traditional methods rely on subjective, ad hoc tuning factors or parametric distribution assumptions that may not always be applicable. We propose a novel data-driven approach (named SDMU) to model uncertainty characterization for DA studies where (1) the system states are partially observed and (2) minimal prior knowledge of the model error processes is available, except that the errors display state dependence. It includes an approach for estimating the uncertainty in hidden model states, with the end goal of improving predictions of observed variables. The SDMU is therefore suited to DA studies where the observed variables are of primary interest. Its efficacy is demonstrated through a synthetic case study with low-dimensional chaotic dynamics and a real hydrologic experiment for one-day-ahead streamflow forecasting. In both experiments, the proposed method leads to substantial improvements in the hidden states and observed system outputs over a standard method involving perturbation with Gaussian noise.

  20. Data-driven motion correction in brain SPECT

    International Nuclear Information System (INIS)

    Kyme, A.Z.; Hutton, B.F.; Hatton, R.L.; Skerrett, D.W.

    2002-01-01

    Patient motion can cause image artifacts in SPECT despite restraining measures. Data-driven detection and correction of motion can be achieved by comparison of acquired data with the forward-projections. By optimising the orientation of the reconstruction, parameters can be obtained for each misaligned projection and applied to update this volume using a 3D reconstruction algorithm. Digital and physical phantom validation was performed to investigate this approach. Noisy projection data simulating at least one fully 3D patient head movement during acquisition were constructed by projecting the digital Hoffman brain phantom at various orientations. Motion correction was applied to the reconstructed studies. The importance of including attenuation effects in the estimation of motion and the need for implementing an iterated correction were assessed in the process. Correction success was assessed visually for artifact reduction, and quantitatively using a mean square difference (MSD) measure. Physical Hoffman phantom studies with deliberate movements introduced during the acquisition were also acquired and motion corrected. Effective artifact reduction in the simulated corrupt studies was achieved by motion correction. Typically the MSD ratio between the corrected and reference studies compared to the corrupted and reference studies was > 2. Motion correction could be achieved without inclusion of attenuation effects in the motion estimation stage, providing simpler implementation and greater efficiency. Moreover the additional improvement with multiple iterations of the approach was small. Improvement was also observed in the physical phantom data, though the technique appeared limited here by an object symmetry. Copyright (2002) The Australian and New Zealand Society of Nuclear Medicine Inc
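
    A mean square difference (MSD) comparison of the kind mentioned in this record could look like the sketch below. The abstract does not say which MSD goes in the numerator; the code assumes the ratio is MSD(corrupted, reference) divided by MSD(corrected, reference), so that values above 1 indicate improvement. The function names and the flattened voxel lists are illustrative only.

```python
def mean_square_difference(a, b):
    """Average of squared element-wise differences between two equal-length
    sequences of voxel values."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def msd_improvement_ratio(reference, corrupted, corrected):
    """MSD(corrupted, reference) / MSD(corrected, reference).

    Assumed orientation of the ratio: a value above 1 means the corrected
    study is closer to the reference than the corrupted one.
    """
    before = mean_square_difference(corrupted, reference)
    after = mean_square_difference(corrected, reference)
    if after == 0:
        return float("inf")  # perfect correction
    return before / after
```

    With a reference of [1, 2, 3, 4], a corrupted study of [3, 4, 5, 6] and a corrected study of [2, 3, 4, 5], the two MSD values are 4 and 1, giving a ratio of 4.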

  1. Architectural Strategies for Enabling Data-Driven Science at Scale

    Science.gov (United States)

    Crichton, D. J.; Law, E. S.; Doyle, R. J.; Little, M. M.

    2017-12-01

    architectural strategies, including a 2015-2016 NASA AIST Study on Big Data, for evolving scientific research towards massively distributed data-driven discovery. It will include example use cases across earth science, planetary science, and other disciplines.

  2. SIDEKICK: Genomic data driven analysis and decision-making framework

    Directory of Open Access Journals (Sweden)

    Yoon Kihoon

    2010-12-01

    Background: Scientists striving to unlock mysteries within complex biological systems face myriad barriers in effectively integrating available information to enhance their understanding. While experimental techniques and available data sources are rapidly evolving, useful information is dispersed across a variety of sources, and sources of the same information often do not use the same format or nomenclature. To harness these expanding resources, scientists need tools that bridge nomenclature differences and allow them to integrate, organize, and evaluate the quality of information without extensive computation. Results: Sidekick, a genomic data driven analysis and decision making framework, is a web-based tool that provides a user-friendly intuitive solution to the problem of information inaccessibility. Sidekick enables scientists without training in computation and data management to pursue answers to research questions like "What are the mechanisms for disease X" or "Does the set of genes associated with disease X also influence other diseases." Sidekick enables the process of combining heterogeneous data, finding and maintaining the most up-to-date data, evaluating data sources, quantifying confidence in results based on evidence, and managing the multi-step research tasks needed to answer these questions. We demonstrate Sidekick's effectiveness by showing how to accomplish a complex published analysis in a fraction of the original time with no computational effort using Sidekick. Conclusions: Sidekick is an easy-to-use web-based tool that organizes and facilitates complex genomic research, allowing scientists to explore genomic relationships and formulate hypotheses without computational effort. Possible analysis steps include gene list discovery, gene-pair list discovery, various enrichments for both types of lists, and convenient list manipulation. Further, Sidekick's ability to characterize pairs of genes offers new ways to

  3. Data-driven approach for creating synthetic electronic medical records

    Directory of Open Access Journals (Sweden)

    Moniz Linda

    2010-10-01

    Background: New algorithms for disease outbreak detection are being developed to take advantage of full electronic medical records (EMRs) that contain a wealth of patient information. However, due to privacy concerns, even anonymized EMRs cannot be shared among researchers, resulting in great difficulty in comparing the effectiveness of these algorithms. To bridge the gap between novel bio-surveillance algorithms operating on full EMRs and the lack of non-identifiable EMR data, a method for generating complete and synthetic EMRs was developed. Methods: This paper describes a novel methodology for generating complete synthetic EMRs both for an outbreak illness of interest (tularemia) and for background records. The method developed has three major steps: 1) synthetic patient identity and basic information generation; 2) identification of care patterns that the synthetic patients would receive based on the information present in real EMR data for similar health problems; 3) adaptation of these care patterns to the synthetic patient population. Results: We generated EMRs, including visit records, clinical activity, laboratory orders/results and radiology orders/results for 203 synthetic tularemia outbreak patients. Validation of the records by a medical expert revealed problems in 19% of the records; these were subsequently corrected. We also generated background EMRs for over 3000 patients in the 4-11 yr age group. Validation of those records by a medical expert revealed problems in fewer than 3% of these background patient EMRs and the errors were subsequently rectified. Conclusions: A data-driven method was developed for generating fully synthetic EMRs. The method is general and can be applied to any data set that has similar data elements (such as laboratory and radiology orders and results, clinical activity, prescription orders). The pilot synthetic outbreak records were for tularemia but our approach may be adapted to other infectious

  4. Data-driven modeling of nano-nose gas sensor arrays

    DEFF Research Database (Denmark)

    Alstrøm, Tommy Sonne; Larsen, Jan; Nielsen, Claus Højgård

    2010-01-01

    We present a data-driven approach to classification of Quartz Crystal Microbalance (QCM) sensor data. The sensor is a nano-nose gas sensor that detects concentrations of analytes down to ppm levels using plasma polymerized coatings. Each sensor experiment takes approximately one hour hence...... the number of available training data is limited. We suggest a data-driven classification model which works from few examples. The paper compares a number of data-driven classification and quantification schemes able to detect the gas and the concentration level. The data-driven approaches are based on state...

  5. Spring batch essentials

    CERN Document Server

    Rao, P Raja Malleswara

    2015-01-01

    If you are a Java developer with basic knowledge of Spring and some experience in the development of enterprise applications, and want to learn about batch application development in detail, then this book is ideal for you. This book will be perfect as your next step towards building simple yet powerful batch applications on a Java-based platform.

  6. Data driven model generation based on computational intelligence

    Science.gov (United States)

    Gemmar, Peter; Gronz, Oliver; Faust, Christophe; Casper, Markus

    2010-05-01

    The simulation of discharges at a local gauge or the modeling of large scale river catchments are effectively involved in estimation and decision tasks of hydrological research and practical applications like flood prediction or water resource management. However, modeling such processes using analytical or conceptual approaches is made difficult by both complexity of process relations and heterogeneity of processes. It has been shown many times that unknown or assumed process relations can in principle be described by computational methods, and that system models can automatically be derived from observed behavior or measured process data. This study describes the development of hydrological process models using computational methods including Fuzzy logic and artificial neural networks (ANN) in a comprehensive and automated manner. Methods: We consider a closed concept for data driven development of hydrological models based on measured (experimental) data. The concept is centered on a Fuzzy system using rules of Takagi-Sugeno-Kang type which formulate the input-output relation in a generic structure like $R_i$: IF $q(t)$ = low AND ... THEN $q(t+\Delta t) = a_{i0} + a_{i1} q(t) + a_{i2} p(t-\Delta t_{i1}) + a_{i3} p(t+\Delta t_{i2}) + \dots$. The rule's premise part (IF) describes process states involving available process information, e.g. actual outlet $q(t)$ is low, where low is one of several Fuzzy sets defined over variable $q(t)$. The rule's conclusion (THEN) estimates expected outlet $q(t+\Delta t)$ by a linear function over selected system variables, e.g. actual outlet $q(t)$, previous and/or forecasted precipitation $p(t \pm \Delta t_{ik})$. In case of river catchment modeling we use head gauges, tributary and upriver gauges in the conclusion part as well. In addition, we consider temperature and temporal (season) information in the premise part. By creating a set of rules $R = \{R_i \mid i = 1,\dots,N\}$ the space of process states can be covered as concisely as necessary. Model adaptation is achieved by finding an optimal set $A = (a_{ij})$ of conclusion
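
    To make the rule structure in this record concrete, the sketch below evaluates a small set of first-order Takagi-Sugeno-Kang rules. It assumes triangular membership functions over q(t), a single precipitation input, and the usual firing-strength-weighted average of the rule conclusions; none of these choices, nor the names and coefficient values, are stated in the abstract.

```python
def triangular(x, left, peak, right):
    """Membership degree of x in a triangular fuzzy set (0 outside its support)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def tsk_forecast(rules, q_now, p_prev):
    """Weighted average of linear rule conclusions (first-order TSK inference).

    Each rule is ((left, peak, right), (a0, a1, a2)): the premise is a
    triangular fuzzy set over q(t), and the conclusion is
    a0 + a1 * q(t) + a2 * p(t - dt).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for (left, peak, right), (a0, a1, a2) in rules:
        weight = triangular(q_now, left, peak, right)  # firing strength
        weighted_sum += weight * (a0 + a1 * q_now + a2 * p_prev)
        total_weight += weight
    if total_weight == 0.0:
        raise ValueError("no rule fires for this input")
    return weighted_sum / total_weight


# Two hypothetical rules: "q(t) is low" and "q(t) is high".
RULES = [
    ((-1.0, 0.0, 10.0), (1.0, 0.5, 0.1)),
    ((0.0, 10.0, 11.0), (2.0, 0.9, 0.2)),
]
```

    For q(t) = 5 and p = 10 both rules fire with strength 0.5, so the forecast is the mean of the two conclusions, 4.5 and 8.5. Adapting the model then amounts to tuning the coefficient tuples, which corresponds to the set A = (a_ij) in the abstract.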

  7. SPS batch spacing optimisation

    CERN Document Server

    Velotti, F M; Carlier, E; Goddard, B; Kain, V; Kotzian, G

    2017-01-01

    Until 2015, the LHC filling schemes used the batch spacing as specified in the LHC design report. The maximum number of bunches injectable in the LHC directly depends on the batch spacing at injection in the SPS and hence on the MKP rise time. As part of the LHC Injectors Upgrade project for LHC heavy ions, a reduction of the batch spacing is needed. In this direction, studies to approach the MKP design rise time of 150 ns (2-98%) have been carried out. These measurements gave clear indications that such optimisation, and beyond, could be done also for higher injection momentum beams, where the additional slower MKP (MKP-L) is needed. After the successful results from 2015 SPS batch spacing optimisation for the Pb-Pb run [1], the same concept was thought to be used also for proton beams. In fact, thanks to the SPS transverse feed back, it was already observed that lower batch spacing than the design one (225 ns) could be achieved. For the 2016 p-Pb run, a batch spacing of 200 ns for the proton beam with 100 ns bunch spacing was reque...

  8. Data-Driven Visualization and Group Analysis of Multichannel EEG Coherence with Functional Units

    NARCIS (Netherlands)

    Caat, Michael ten; Maurits, Natasha M.; Roerdink, Jos B.T.M.

    2008-01-01

    A typical data-driven visualization of electroencephalography (EEG) coherence is a graph layout, with vertices representing electrodes and edges representing significant coherences between electrode signals. A drawback of this layout is its visual clutter for multichannel EEG. To reduce clutter,

  9. Estimating the Probability of Wind Ramping Events: A Data-driven Approach

    OpenAIRE

    Wang, Cheng; Wei, Wei; Wang, Jianhui; Qiu, Feng

    2016-01-01

    This letter proposes a data-driven method for estimating the probability of wind ramping events without exploiting the exact probability distribution function (PDF) of wind power. Actual wind data validates the proposed method.

  10. Autonomous Soil Assessment System: A Data-Driven Approach to Planetary Mobility Hazard Detection

    Science.gov (United States)

    Raimalwala, K.; Faragalli, M.; Reid, E.

    2018-04-01

    The Autonomous Soil Assessment System predicts mobility hazards for rovers. Its development and performance are presented, with focus on its data-driven models, machine learning algorithms, and real-time sensor data fusion for predictive analytics.

  11. Designing Data-Driven Battery Prognostic Approaches for Variable Loading Profiles: Some Lessons Learned

    Data.gov (United States)

    National Aeronautics and Space Administration — Among various approaches for implementing prognostic algorithms data-driven algorithms are popular in the industry due to their intuitive nature and relatively fast...

  12. Short-term stream flow forecasting at Australian river sites using data-driven regression techniques

    CSIR Research Space (South Africa)

    Steyn, Melise

    2017-09-01

    This study proposes a computationally efficient solution to stream flow forecasting for river basins where historical time series data are available. Two data-driven modeling techniques are investigated, namely support vector regression...

  13. Service and Data Driven Multi Business Model Platform in a World of Persuasive Technologies

    DEFF Research Database (Denmark)

    Andersen, Troels Christian; Bjerrum, Torben Cæsar Bisgaard

    2016-01-01

    companies in establishing a service organization that delivers, creates and captures value through service and data driven business models by utilizing their network, resources and customers and/or users. Furthermore, based on literature and collaboration with the case company, the suggestion of a new...... framework provides the necessary construction of how the manufacturing companies can evolve their current business to provide multi service and data driven business models, using the same resources, networks and customers....

  14. Data-Driven Cyber-Physical Systems via Real-Time Stream Analytics and Machine Learning

    OpenAIRE

    Akkaya, Ilge

    2016-01-01

    Emerging distributed cyber-physical systems (CPSs) integrate a wide range of heterogeneous components that need to be orchestrated in a dynamic environment. While model-based techniques are commonly used in CPS design, they become inadequate in capturing the complexity as systems become larger and extremely dynamic. The adaptive nature of the systems makes data-driven approaches highly desirable, if not necessary. Traditionally, data-driven systems utilize large volumes of static data sets t...

  15. Heuristics for batching and sequencing in batch processing machines

    Directory of Open Access Journals (Sweden)

    Chuda Basnet

    2016-12-01

    In this paper, we discuss the “batch processing” problem, where there are multiple jobs to be processed in flow shops. These jobs can, however, be formed into batches, and the number of jobs in a batch is limited by the capacity of the processing machines to accommodate the jobs. The processing time required by a batch in a machine is determined by the greatest processing time of the jobs included in the batch. Thus, the batch processing problem is a mix of batching and sequencing: the jobs need to be grouped into distinct batches, and the batches then need to be sequenced through the flow shop. We apply certain newly developed heuristics to the problem and present computational results. The contributions of this paper are the derivation of a lower bound and the heuristics developed and tested.
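
    The problem structure in this record (capacity-limited batches, batch time equal to the longest job in the batch, batches sequenced through a flow shop) can be illustrated with the sketch below. The grouping rule used here, sorting jobs by total processing time and cutting the sorted list into batches, is a simple stand-in chosen for illustration and is not one of the heuristics developed in the paper; all names are hypothetical.

```python
def form_batches(jobs, capacity):
    """Group jobs into batches of at most `capacity` jobs.

    Each job is a tuple of per-machine processing times. Jobs are sorted by
    total processing time (longest first) so that similar jobs share a batch.
    """
    ordered = sorted(jobs, key=sum, reverse=True)
    return [ordered[i:i + capacity] for i in range(0, len(ordered), capacity)]


def batch_times(batch):
    """Per-machine time of a batch: the greatest time among its jobs."""
    return [max(times) for times in zip(*batch)]


def flow_shop_makespan(batches):
    """Completion time of the last batch on the last machine when batches
    visit the machines in order and keep the given sequence."""
    if not batches:
        return 0
    machines = len(batch_times(batches[0]))
    finish = [0] * machines  # finish time of the latest batch on each machine
    for batch in batches:
        times = batch_times(batch)
        for m in range(machines):
            # A batch starts on machine m once the machine is free and the
            # batch has left machine m - 1.
            ready = finish[m - 1] if m > 0 else 0
            finish[m] = max(finish[m], ready) + times[m]
    return finish[-1]


JOBS = [(3, 2), (1, 4), (2, 2), (5, 1)]  # (time on machine 1, time on machine 2)
```

    With JOBS and a capacity of 2, the batches are [(5, 1), (3, 2)] and [(1, 4), (2, 2)], their per-machine times are [5, 2] and [2, 4], and the makespan in that sequence is 11.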

  16. Prunus dulcis, Batch

    African Journals Online (AJOL)

    STORAGESEVER

    2010-06-07

    Jun 7, 2010 ... almond (Prunus dulcis, Batch) genotypes as revealed by PCR analysis. Yavar Sharafi1*, Jafar Hajilou1, Seyed AbolGhasem Mohammadi2, Mohammad Reza Dadpour1 and Sadollah Eskandari3. 1Department of Horticulture, Faculty of Agriculture, University of Tabriz, Tabriz, 5166614766, Iran.

  17. Data-driven design of fault diagnosis and fault-tolerant control systems

    CERN Document Server

    Ding, Steven X

    2014-01-01

    Data-driven Design of Fault Diagnosis and Fault-tolerant Control Systems presents basic statistical process monitoring, fault diagnosis, and control methods, and introduces advanced data-driven schemes for the design of fault diagnosis and fault-tolerant control systems catering to the needs of dynamic industrial processes. With ever increasing demands for reliability, availability and safety in technical processes and assets, process monitoring and fault-tolerance have become important issues surrounding the design of automatic control systems. This text shows the reader how, thanks to the rapid development of information technology, key techniques of data-driven and statistical process monitoring and control can now become widely used in industrial practice to address these issues. To allow for self-contained study and facilitate implementation in real applications, important mathematical and control theoretical knowledge and tools are included in this book. Major schemes are presented in algorithm form and...

  18. Observer and data-driven model based fault detection in Power Plant Coal Mills

    DEFF Research Database (Denmark)

    Fogh Odgaard, Peter; Lin, Bao; Jørgensen, Sten Bay

    2008-01-01

    model with motor power as the controlled variable, data-driven methods for fault detection are also investigated. Regression models that represent normal operating conditions (NOCs) are developed with both static and dynamic principal component analysis and partial least squares methods. The residual...... between process measurement and the NOC model prediction is used for fault detection. A hybrid approach, where a data-driven model is employed to derive an optimal unknown input observer, is also implemented. The three methods are evaluated with case studies on coal mill data, which includes a fault......This paper presents and compares model-based and data-driven fault detection approaches for coal mill systems. The first approach detects faults with an optimal unknown input observer developed from a simplified energy balance model. Due to the time-consuming effort in developing a first principles...
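
    The residual-based detection idea in this record, comparing a measurement with the prediction of a model of normal operating conditions (NOC), can be sketched as below. Ordinary least squares on a single input is swapped in for the principal component analysis and partial least squares models named in the abstract, and the three-sigma threshold, variable names and numbers are assumptions made for illustration.

```python
from statistics import mean, pstdev


def fit_noc_model(x, y):
    """Fit y = intercept + slope * x on normal-operation data.

    Returns (slope, intercept, sigma), where sigma is the standard deviation
    of the training residuals.
    """
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return slope, intercept, pstdev(residuals)


def detect_faults(model, x, y, n_sigma=3.0):
    """Flag each sample whose residual exceeds n_sigma training sigmas."""
    slope, intercept, sigma = model
    return [abs(yi - (intercept + slope * xi)) > n_sigma * sigma
            for xi, yi in zip(x, y)]


# Hypothetical normal-operation data (input, measured output).
NOC_X = [1.0, 2.0, 3.0, 4.0, 5.0]
NOC_Y = [2.1, 3.9, 6.0, 8.1, 9.9]
```

    A new sample that follows the fitted relation gives a small residual and is not flagged, while a sample far from the NOC prediction is flagged as a fault.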

  19. Data-driven remaining useful life prognosis techniques stochastic models, methods and applications

    CERN Document Server

    Si, Xiao-Sheng; Hu, Chang-Hua

    2017-01-01

    This book introduces data-driven remaining useful life prognosis techniques, and shows how to utilize the condition monitoring data to predict the remaining useful life of stochastic degrading systems and to schedule maintenance and logistics plans. It is also the first book that describes the basic data-driven remaining useful life prognosis theory systematically and in detail. The emphasis of the book is on the stochastic models, methods and applications employed in remaining useful life prognosis. It includes a wealth of degradation monitoring experiment data, practical prognosis methods for remaining useful life in various cases, and a series of applications incorporated into prognostic information in decision-making, such as maintenance-related decisions and ordering spare parts. It also highlights the latest advances in data-driven remaining useful life prognosis techniques, especially in the contexts of adaptive prognosis for linear stochastic degrading systems, nonlinear degradation modeling based pro...

  20. Modelling of Batch Process Operations

    DEFF Research Database (Denmark)

    Abdul Samad, Noor Asma Fazli; Cameron, Ian; Gani, Rafiqul

    2011-01-01

    Here a batch cooling crystalliser is modelled and simulated as is a batch distillation system. In the batch crystalliser four operational modes of the crystalliser are considered, namely: initial cooling, nucleation, crystal growth and product removal. A model generation procedure is shown that s...

  1. Robust Data-Driven Inference for Density-Weighted Average Derivatives

    DEFF Research Database (Denmark)

    Cattaneo, Matias D.; Crump, Richard K.; Jansson, Michael

    This paper presents a new data-driven bandwidth selector compatible with the small bandwidth asymptotics developed in Cattaneo, Crump, and Jansson (2009) for density-weighted average derivatives. The new bandwidth selector is of the plug-in variety, and is obtained based on a mean squared error...

  2. Ability Grouping and Differentiated Instruction in an Era of Data-Driven Decision Making

    Science.gov (United States)

    Park, Vicki; Datnow, Amanda

    2017-01-01

    Despite data-driven decision making being a ubiquitous part of policy and school reform efforts, little is known about how teachers use data for instructional decision making. Drawing on data from a qualitative case study of four elementary schools, we examine the logic and patterns of teacher decision making about differentiation and ability…

  3. Data-driven diagnostics of terrestrial carbon dynamics over North America

    Science.gov (United States)

    Jingfeng Xiao; Scott V. Ollinger; Steve Frolking; George C. Hurtt; David Y. Hollinger; Kenneth J. Davis; Yude Pan; Xiaoyang Zhang; Feng Deng; Jiquan Chen; Dennis D. Baldocchi; Beverly E. Law; M. Altaf Arain; Ankur R. Desai; Andrew D. Richardson; Ge Sun; Brian Amiro; Hank Margolis; Lianhong Gu; Russell L. Scott; Peter D. Blanken; Andrew E. Suyker

    2014-01-01

    The exchange of carbon dioxide is a key measure of ecosystem metabolism and a critical intersection between the terrestrial biosphere and the Earth's climate. Despite the general agreement that the terrestrial ecosystems in North America provide a sizeable carbon sink, the size and distribution of the sink remain uncertain. We use a data-driven approach to upscale...

  4. Data-driven haemodynamic response function extraction using Fourier-wavelet regularised deconvolution

    NARCIS (Netherlands)

    Wink, Alle Meije; Hoogduin, Hans; Roerdink, Jos B.T.M.

    2008-01-01

    Background: We present a simple, data-driven method to extract haemodynamic response functions (HRF) from functional magnetic resonance imaging (fMRI) time series, based on the Fourier-wavelet regularised deconvolution (ForWaRD) technique. HRF data are required for many fMRI applications, such as

  5. Data-driven haemodynamic response function extraction using Fourier-wavelet regularised deconvolution

    NARCIS (Netherlands)

    Wink, Alle Meije; Hoogduin, Hans; Roerdink, Jos B.T.M.

    2010-01-01

    Background: We present a simple, data-driven method to extract haemodynamic response functions (HRF) from functional magnetic resonance imaging (fMRI) time series, based on the Fourier-wavelet regularised deconvolution (ForWaRD) technique. HRF data are required for many fMRI applications, such as

  6. Perspectives of data-driven LPV modeling of high-purity distillation columns

    NARCIS (Netherlands)

    Bachnas, A.A.; Toth, R.; Mesbah, A.; Ludlage, J.H.A.

    2013-01-01

    This paper investigates data-driven, Linear-Parameter-Varying (LPV) modeling of a high-purity distillation column. Two LPV modeling approaches are studied: a local approach, corresponding to the interpolation of Linear Time-Invariant (LTI) models identified at steady-state purity levels,

  7. The Role of Guided Induction in Paper-Based Data-Driven Learning

    Science.gov (United States)

    Smart, Jonathan

    2014-01-01

    This study examines the role of guided induction as an instructional approach in paper-based data-driven learning (DDL) in the context of an ESL grammar course during an intensive English program at an American public university. Specifically, it examines whether corpus-informed grammar instruction is more effective through inductive, data-driven…

  8. Design and evaluation of a data-driven scenario generation framework for game-based training

    NARCIS (Netherlands)

    Luo, L.; Yin, H.; Cai, W.; Zhong, J.; Lees, M.

    Generating suitable game scenarios that can cater for individual players has become an emerging challenge in procedural content generation. In this paper, we propose a data-driven scenario generation framework for game-based training. An evolutionary scenario generation process is designed with a

  9. Data-driven Development of ROTEM and TEG Algorithms for the Management of Trauma Hemorrhage

    DEFF Research Database (Denmark)

    Baksaas-Aasen, Kjersti; Van Dieren, Susan; Balvers, Kirsten

    2018-01-01

    for ROTEM, TEG, and CCTs to be used in addition to ratio driven transfusion and tranexamic acid. CONCLUSIONS: We describe a systematic approach to define threshold parameters for ROTEM and TEG. These parameters were incorporated into algorithms to support data-driven adjustments of resuscitation...

  10. Teacher Talk about Student Ability and Achievement in the Era of Data-Driven Decision Making

    Science.gov (United States)

    Datnow, Amanda; Choi, Bailey; Park, Vicki; St. John, Elise

    2018-01-01

    Background: Data-driven decision making continues to be a common feature of educational reform agendas across the globe. In many U.S. schools, the teacher team meeting is a key setting in which data use is intended to take place, with the aim of planning instruction to address students' needs. However, most prior research has not examined how the…

  11. Big-Data-Driven Stem Cell Science and Tissue Engineering: Vision and Unique Opportunities.

    Science.gov (United States)

    Del Sol, Antonio; Thiesen, Hans J; Imitola, Jaime; Carazo Salas, Rafael E

    2017-02-02

    Achieving the promises of stem cell science to generate precise disease models and designer cell samples for personalized therapeutics will require harnessing pheno-genotypic cell-level data quantitatively and predictively in the lab and clinic. Those requirements could be met by developing a Big-Data-driven stem cell science strategy and community. Copyright © 2017 Elsevier Inc. All rights reserved.

  12. Exploring Techniques of Developing Writing Skill in IELTS Preparatory Courses: A Data-Driven Study

    Science.gov (United States)

    Ostovar-Namaghi, Seyyed Ali; Safaee, Seyyed Esmail

    2017-01-01

    Being driven by the hypothetico-deductive mode of inquiry, previous studies have tested the effectiveness of theory-driven interventions under controlled experimental conditions to come up with universally applicable generalizations. To make a case in the opposite direction, this data-driven study aims at uncovering techniques and strategies…

  13. A framework for the automated data-driven constitutive characterization of composites

    Science.gov (United States)

    J.G. Michopoulos; John Hermanson; T. Furukawa; A. Iliopoulos

    2010-01-01

    We present advances on the development of a mechatronically and algorithmically automated framework for the data-driven identification of constitutive material models based on energy density considerations. These models can capture both the linear and nonlinear constitutive response of multiaxially loaded composite materials in a manner that accounts for progressive...

  14. Writing through Big Data: New Challenges and Possibilities for Data-Driven Arguments

    Science.gov (United States)

    Beveridge, Aaron

    2017-01-01

    As multimodal writing continues to shift and expand in the era of Big Data, writing studies must confront the new challenges and possibilities emerging from data mining, data visualization, and data-driven arguments. Often collected under the broad banner of "data literacy," students' experiences of data visualization and data-driven…

  15. Data-driven directions for effective footwear provision for the high-risk diabetic foot

    NARCIS (Netherlands)

    Arts, M. L. J.; de Haart, M.; Waaijman, R.; Dahmen, R.; Berendsen, H.; Nollet, F.; Bus, S. A.

    2015-01-01

    Custom-made footwear is used to offload the diabetic foot to prevent plantar foot ulcers. This prospective study evaluates the offloading effects of modifying custom-made footwear and aims to provide data-driven directions for the provision of effectively offloading footwear in clinical practice.

  16. Toward Data-Driven Design of Educational Courses: A Feasibility Study

    Science.gov (United States)

    Agrawal, Rakesh; Golshan, Behzad; Papalexakis, Evangelos

    2016-01-01

    A study plan is the choice of concepts and the organization and sequencing of the concepts to be covered in an educational course. While a good study plan is essential for the success of any course offering, the design of study plans currently remains largely a manual task. We present a novel data-driven method, which given a list of concepts can…

  17. Retesting the Limits of Data-Driven Learning: Feedback and Error Correction

    Science.gov (United States)

    Crosthwaite, Peter

    2017-01-01

    An increasing number of studies have looked at the value of corpus-based data-driven learning (DDL) for second language (L2) written error correction, with generally positive results. However, a potential conundrum for language teachers involved in the process is how to provide feedback on students' written production for DDL. The study looks at…

  18. Articulatory Distinctiveness of Vowels and Consonants: A Data-Driven Approach

    Science.gov (United States)

    Wang, Jun; Green, Jordan R.; Samal, Ashok; Yunusova, Yana

    2013-01-01

    Purpose: To quantify the articulatory distinctiveness of 8 major English vowels and 11 English consonants based on tongue and lip movement time series data using a data-driven approach. Method: Tongue and lip movements of 8 vowels and 11 consonants from 10 healthy talkers were

  19. Data-Driven Hint Generation in Vast Solution Spaces: A Self-Improving Python Programming Tutor

    Science.gov (United States)

    Rivers, Kelly; Koedinger, Kenneth R.

    2017-01-01

    To provide personalized help to students who are working on code-writing problems, we introduce a data-driven tutoring system, ITAP (Intelligent Teaching Assistant for Programming). ITAP uses state abstraction, path construction, and state reification to automatically generate personalized hints for students, even when given states that have not…

  20. Progressing batch hydrolysis process

    Science.gov (United States)

    Wright, J.D.

    1985-01-10

    A progressive batch hydrolysis process is disclosed for producing sugar from a lignocellulosic feedstock. It comprises passing a stream of dilute acid serially through a plurality of percolation hydrolysis reactors charged with feedstock, at a flow rate, temperature and pressure sufficient to substantially convert all the cellulose component of the feedstock to glucose. The cooled dilute acid stream containing glucose, after exiting the last percolation hydrolysis reactor, is serially fed through a plurality of pre-hydrolysis percolation reactors, charged with said feedstock, at a flow rate, temperature and pressure sufficient to substantially convert all the hemicellulose component of said feedstock to glucose. The dilute acid stream containing glucose is cooled after it exits the last pre-hydrolysis reactor.

  1. An Open Framework for Dynamic Big-data-driven Application Systems (DBDDAS) Development

    KAUST Repository

    Douglas, Craig

    2014-01-01

    In this paper, we outline key features that dynamic data-driven application systems (DDDAS) have. A DDDAS is an application that has data assimilation that can change the models and/or scales of the computation and that the application controls the data collection based on the computational results. The term Big Data (BD) has come into being in recent years that is highly applicable to most DDDAS since most applications use networks of sensors that generate an overwhelming amount of data in the lifespan of the application runs. We describe what a dynamic big-data-driven application system (DBDDAS) toolkit must have in order to provide all of the essential building blocks that are necessary to easily create new DDDAS without re-inventing the building blocks.

  2. Data-Driven Iterative Vibration Signal Enhancement Strategy Using Alpha Stable Distribution

    Directory of Open Access Journals (Sweden)

    Grzegorz Żak

    2017-01-01

    The authors propose a novel procedure for enhancing the signal-to-noise ratio in vibration data acquired from machines working in a mining industry environment. The proposed method performs data-driven reduction of the deterministic, high-energy, low-frequency components. Furthermore, it provides a way to enhance the signal of interest. The procedure incorporates time-frequency decomposition, α-stable distribution based signal modeling, and the stability parameter in the time domain as a stopping criterion for the iterative part of the procedure. An advantage of the proposed algorithm is the data-driven, automatic detection of the informative frequency band as well as the band with high energy, owing to the properties of the distribution used. Furthermore, no knowledge of kinematics, speed, and so on is required. The proposed algorithm is applied to real data acquired from the gearbox of a belt conveyor pulley drive.

  3. Data Driven Modelling of the Dynamic Wake Between Two Wind Turbines

    DEFF Research Database (Denmark)

    Knudsen, Torben; Bak, Thomas

    2012-01-01

    Wind turbines in a wind farm influence each other through the wind flow. Downwind turbines are in the wake of upwind turbines, and the wind speed experienced at downwind turbines is hence a function of the wind speeds at upwind turbines but also of the momentum extracted from the wind by the upwind turbine. This paper establishes flow models relating the wind speeds at turbines in a farm. So far, research in this area has been mainly based on first principles static models, and the data driven modelling done has not included the loading of the upwind turbine and its impact on the wind speed downwind. This paper is the first where modern commercial megawatt turbines are used for data driven modelling including the upwind turbine loading by changing power reference. Obtaining the necessary data is difficult and data is therefore limited. A simple dynamic extension to the Jensen wake model is tested...

  4. Pipe break prediction based on evolutionary data-driven methods with brief recorded data

    International Nuclear Information System (INIS)

    Xu Qiang; Chen Qiuwen; Li Weifeng; Ma Jinfeng

    2011-01-01

    Pipe breaks often occur in water distribution networks, imposing great pressure on utility managers to secure stable water supply. However, pipe breaks are hard to detect by the conventional method. It is therefore necessary to develop reliable and robust pipe break models to assess the pipe's probability to fail and then to optimize the pipe break detection scheme. In the absence of deterministic physical models for pipe break, data-driven techniques provide a promising approach to investigate the principles underlying pipe break. In this paper, two data-driven techniques, namely Genetic Programming (GP) and Evolutionary Polynomial Regression (EPR) are applied to develop pipe break models for the water distribution system of Beijing City. The comparison with the recorded pipe break data from 1987 to 2005 showed that the models have great capability to obtain reliable predictions. The models can be used to prioritize pipes for break inspection and then improve detection efficiency.

  5. Data-driven modeling and real-time distributed control for energy efficient manufacturing systems

    International Nuclear Information System (INIS)

    Zou, Jing; Chang, Qing; Arinez, Jorge; Xiao, Guoxian

    2017-01-01

    As manufacturers face the challenges of increasing global competition and energy saving requirements, it is imperative to seek out opportunities to reduce energy waste and overall cost. In this paper, a novel data-driven stochastic manufacturing system modeling method is proposed to identify and predict energy saving opportunities and their impact on production. A real-time distributed feedback production control policy, which integrates the current and predicted system performance, is established to improve the overall profit and energy efficiency. A case study is presented to demonstrate the effectiveness of the proposed control policy. - Highlights: • A data-driven stochastic manufacturing system model is proposed. • Real-time system performance and energy saving opportunity identification method is developed. • Prediction method for future potential system performance and energy saving opportunity is developed. • A real-time distributed feedback control policy is established to improve energy efficiency and overall system profit.

  6. An Open Framework for Dynamic Big-data-driven Application Systems (DBDDAS) Development

    KAUST Repository

    Douglas, Craig

    2014-06-06

    In this paper, we outline key features that dynamic data-driven application systems (DDDAS) have. A DDDAS is an application that has data assimilation that can change the models and/or scales of the computation and that the application controls the data collection based on the computational results. The term Big Data (BD) has come into being in recent years that is highly applicable to most DDDAS since most applications use networks of sensors that generate an overwhelming amount of data in the lifespan of the application runs. We describe what a dynamic big-data-driven application system (DBDDAS) toolkit must have in order to provide all of the essential building blocks that are necessary to easily create new DDDAS without re-inventing the building blocks.

  7. A Novel Online Data-Driven Algorithm for Detecting UAV Navigation Sensor Faults

    OpenAIRE

    Rui Sun; Qi Cheng; Guanyu Wang; Washington Yotto Ochieng

    2017-01-01

    The use of Unmanned Aerial Vehicles (UAVs) has increased significantly in recent years. On-board integrated navigation sensors are a key component of UAVs’ flight control systems and are essential for flight safety. In order to ensure flight safety, timely and effective navigation sensor fault detection capability is required. In this paper, a novel data-driven Adaptive Neuron Fuzzy Inference System (ANFIS)-based approach is presented for the detection of on-board navigation sensor faults in ...

  8. Data Driven Exploratory Attacks on Black Box Classifiers in Adversarial Domains

    OpenAIRE

    Sethi, Tegjyot Singh; Kantardzic, Mehmed

    2017-01-01

    While modern day web applications aim to create impact at the civilization level, they have become vulnerable to adversarial activity, where the next cyber-attack can take any shape and can originate from anywhere. The increasing scale and sophistication of attacks, has prompted the need for a data driven solution, with machine learning forming the core of many cybersecurity systems. Machine learning was not designed with security in mind, and the essential assumption of stationarity, requiri...

  9. Data Driven Marketing in Apple and Back to School Campaign 2011

    OpenAIRE

    Bernátek, Martin

    2011-01-01

    The most important finding of the campaign analysis is that Data-Driven Marketing makes sense only once it is already part of the marketing plan. The team preparing the marketing plan therefore defines the goals and sets the proper measurement matrix according to those goals. This makes it possible to adjust the marketing plan to extract more value, to monitor the execution and make adjustments if necessary, and to evaluate the campaign at its end.

  10. Data-driven automatic parking constrained control for four-wheeled mobile vehicles

    OpenAIRE

    Wenxu Yan; Jing Deng; Dezhi Xu

    2016-01-01

    In this article, a novel data-driven constrained control scheme is proposed for automatic parking systems. The design of the proposed scheme only depends on the steering angle and the orientation angle of the car, and it does not involve any model information of the car. Therefore, the proposed scheme-based automatic parking system is applicable to different kinds of cars. In order to further reduce the desired trajectory coordinate tracking errors, a coordinates compensation algorithm is als...

  11. Extension of a data-driven gating technique to 3D, whole body PET studies

    International Nuclear Information System (INIS)

    Schleyer, Paul J; O'Doherty, Michael J; Marsden, Paul K

    2011-01-01

    Respiratory gating can be used to separate a PET acquisition into a series of near motion-free bins. This is typically done using additional gating hardware; however, software-based methods can derive the respiratory signal from the acquired data itself. The aim of this work was to extend a data-driven respiratory gating method to acquire gated, 3D, whole body PET images of clinical patients. The existing method, previously demonstrated with 2D, single bed-position data, uses a spectral analysis to find regions in raw PET data which are subject to respiratory motion. The change in counts over time within these regions is then used to estimate the respiratory signal of the patient. In this work, the gating method was adapted to only accept lines of response from a reduced set of axial angles, and the respiratory frequency derived from the lung bed position was used to help identify the respiratory frequency in all other bed positions. As the respiratory signal does not identify the direction of motion, a registration-based technique was developed to align the direction for all bed positions. Data from 11 clinical FDG PET patients were acquired, and an optical respiratory monitor was used to provide a hardware-based signal for comparison. All data were gated using both the data-driven and hardware methods, and reconstructed. The centre of mass of manually defined regions on gated images was calculated, and the overall displacement was defined as the change in the centre of mass between the first and last gates. The mean displacement was 10.3 mm for the data-driven gated images and 9.1 mm for the hardware gated images. No significant difference was found between the two gating methods when comparing the displacement values. The adapted data-driven gating method was demonstrated to successfully produce respiratory gated, 3D, whole body, clinical PET acquisitions.
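
    The spectral step described above can be illustrated with a minimal sketch. It assumes a single region's count time series is already available and simply picks the strongest frequency inside a plausible breathing band. The function name `dominant_frequency`, the band limits, and the synthetic counts are all hypothetical; this is not the authors' implementation.

```python
import cmath
import math


def dominant_frequency(counts, sample_rate_hz, band=(0.1, 0.5)):
    """Return the frequency (Hz) with the largest DFT power inside `band`."""
    n = len(counts)
    mean = sum(counts) / n
    centred = [c - mean for c in counts]  # remove the constant (DC) component
    best_freq, best_power = None, -1.0
    for k in range(1, n // 2 + 1):
        freq = k * sample_rate_hz / n
        if not (band[0] <= freq <= band[1]):
            continue  # outside the assumed respiratory band
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centred))
        power = abs(coeff) ** 2
        if power > best_power:
            best_freq, best_power = freq, power
    return best_freq


# Synthetic counts (assumed values): 60 s sampled at 2 Hz, breathing at 0.25 Hz.
sample_rate_hz = 2.0
counts = [1000 + 50 * math.sin(2 * math.pi * 0.25 * (i / sample_rate_hz))
          for i in range(120)]
respiratory_hz = dominant_frequency(counts, sample_rate_hz)
```

    In a multi-bed acquisition, the frequency found this way in the lung bed position could be used to narrow the search band for the other bed positions, which is the role the abstract assigns to it.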

  12. A data-driven approach for retrieving temperatures and abundances in brown dwarf atmospheres

    OpenAIRE

    Line, MR; Fortney, JJ; Marley, MS; Sorahana, S

    2014-01-01

    © 2014. The American Astronomical Society. All rights reserved. Brown dwarf spectra contain a wealth of information about their molecular abundances, temperature structure, and gravity. We present a new data driven retrieval approach, previously used in planetary atmosphere studies, to extract the molecular abundances and temperature structure from brown dwarf spectra. The approach makes few a priori physical assumptions about the state of the atmosphere. The feasibility of the approach is fi...

  13. Using Two Different Approaches to Assess Dietary Patterns: Hypothesis-Driven and Data-Driven Analysis

    Directory of Open Access Journals (Sweden)

    Ágatha Nogueira Previdelli

    2016-09-01

    The use of dietary patterns to assess dietary intake has become increasingly common in nutritional epidemiology studies due to the complexity and multidimensionality of the diet. Currently, two main approaches have been widely used to assess dietary patterns: data-driven and hypothesis-driven analysis. Since the methods explore different angles of dietary intake, using both approaches simultaneously might yield complementary and useful information; thus, we aimed to use both approaches to gain knowledge of adolescents’ dietary patterns. Food intake from a cross-sectional survey with 295 adolescents was assessed by 24 h dietary recall (24HR). In hypothesis-driven analysis, based on the American National Cancer Institute method, the usual intake of Brazilian Healthy Eating Index Revised components were estimated. In the data-driven approach, the usual intake of foods/food groups was estimated by the Multiple Source Method. In the results, hypothesis-driven analysis showed low scores for Whole grains, Total vegetables, Total fruit and Whole fruits, while, in data-driven analysis, fruits and whole grains were not presented in any pattern. High intakes of sodium, fats and sugars were observed in hypothesis-driven analysis with low total scores for Sodium, Saturated fat and SoFAA (calories from solid fat, alcohol and added sugar) components in agreement, while the data-driven approach showed the intake of several foods/food groups rich in these nutrients, such as butter/margarine, cookies, chocolate powder, whole milk, cheese, processed meat/cold cuts and candies. In this study, using both approaches at the same time provided consistent and complementary information with regard to assessing the overall dietary habits that will be important in order to drive public health programs, and improve their efficiency to monitor and evaluate the dietary patterns of populations.

  14. Data-Driven and Expectation-Driven Discovery of Empirical Laws.

    Science.gov (United States)

    1982-10-10

    occurred in small integer proportions to each other. In 1809, Joseph Gay-Lussac found evidence for his law of combining volumes, which stated that a...of Empirical Laws Patrick W. Langley Gary L. Bradshaw Herbert A. Simon The Robotics Institute Carnegie-Mellon University Pittsburgh, Pennsylvania...Subtitle) S. TYPE OF REPORT & PERIOD COVERED Data-Driven and Expectation-Driven Discovery Interim Report 2/82-10/82 of Empirical Laws S. PERFORMING ORG

  15. Data-driven non-linear elasticity: constitutive manifold construction and problem discretization

    Science.gov (United States)

    Ibañez, Ruben; Borzacchiello, Domenico; Aguado, Jose Vicente; Abisset-Chavanne, Emmanuelle; Cueto, Elias; Ladeveze, Pierre; Chinesta, Francisco

    2017-11-01

    The use of constitutive equations calibrated from data has been implemented into standard numerical solvers for successfully addressing a variety of problems encountered in simulation-based engineering sciences (SBES). However, the complexity keeps increasing due to the need for increasingly detailed models as well as the use of engineered materials. Data-driven simulation constitutes a potential change of paradigm in SBES. Standard simulation in computational mechanics is based on the use of two very different types of equations. The first one, of axiomatic character, is related to balance laws (momentum, mass, energy, ...), whereas the second one consists of models that scientists have extracted from collected, either natural or synthetic, data. Data-driven (or data-intensive) simulation consists of directly linking experimental data to computers in order to perform numerical simulations. These simulations will employ laws, universally recognized as epistemic, while minimizing the need for explicit, often phenomenological, models. The main drawback of such an approach is the large amount of required data, some of which is inaccessible with today's testing facilities. Such difficulty can be circumvented in many cases, and in any case alleviated, by considering complex tests, collecting as many data as possible and then using a data-driven inverse approach in order to generate the whole constitutive manifold from a few complex experimental tests, as discussed in the present work.

  16. Data-Driven Anomaly Detection Performance for the Ares I-X Ground Diagnostic Prototype

    Science.gov (United States)

    Martin, Rodney A.; Schwabacher, Mark A.; Matthews, Bryan L.

    2010-01-01

    In this paper, we will assess the performance of a data-driven anomaly detection algorithm, the Inductive Monitoring System (IMS), which can be used to detect simulated Thrust Vector Control (TVC) system failures. However, the ability of IMS to detect these failures in a true operational setting may be related to the realistic nature of how they are simulated. As such, we will investigate both a low fidelity and high fidelity approach to simulating such failures, with the latter based upon the underlying physics. Furthermore, the ability of IMS to detect anomalies that were previously unknown and not previously simulated will be studied in earnest, as well as apparent deficiencies or misapplications that result from using the data-driven paradigm. Our conclusions indicate that robust detection performance of simulated failures using IMS is not appreciably affected by the use of a high fidelity simulation. However, we have found that the inclusion of a data-driven algorithm such as IMS into a suite of deployable health management technologies does add significant value.

  17. Data-Driven User Feedback: An Improved Neurofeedback Strategy considering the Interindividual Variability of EEG Features

    Directory of Open Access Journals (Sweden)

    Chang-Hee Han

    2016-01-01

    It has frequently been reported that some users of conventional neurofeedback systems can experience only a small portion of the total feedback range due to the large interindividual variability of EEG features. In this study, we proposed a data-driven neurofeedback strategy considering the individual variability of electroencephalography (EEG) features to permit users of the neurofeedback system to experience a wider range of auditory or visual feedback without a customization process. The main idea of the proposed strategy is to adjust the ranges of each feedback level using the density in the offline EEG database acquired from a group of individuals. Twenty-two healthy subjects participated in offline experiments to construct an EEG database, and five subjects participated in online experiments to validate the performance of the proposed data-driven user feedback strategy. Using the optimized bin sizes, the number of feedback levels that each individual experienced was significantly increased to 139% and 144% of the original results with uniform bin sizes in the offline and online experiments, respectively. Our results demonstrated that the use of our data-driven neurofeedback strategy could effectively increase the overall range of feedback levels that each individual experienced during neurofeedback training.

  18. KNMI DataLab experiences in serving data-driven innovations

    Science.gov (United States)

    Noteboom, Jan Willem; Sluiter, Raymond

    2016-04-01

    Climate change research and innovations in weather forecasting rely more and more on (Big) data. Besides increasing data from traditional sources (such as observation networks, radars and satellites), the use of open data, crowd sourced data and the Internet of Things (IoT) is emerging. To deploy these sources of data optimally in our services and products, KNMI has established a DataLab to serve data-driven innovations in collaboration with public and private sector partners. Big data management, data integration, data analytics including machine learning and data visualization techniques are playing an important role in the DataLab. Cross-domain data-driven innovations that arise from public-private collaborative projects and research programmes can be explored, experimented and/or piloted by the KNMI DataLab. Furthermore, advice can be requested on (Big) data techniques and data sources. In support of collaborative (Big) data science activities, scalable environments are offered with facilities for data integration, data analysis and visualization. In addition, Data Science expertise is provided directly or from a pool of internal and external experts. At the EGU conference, gained experiences and best practices are presented in operating the KNMI DataLab to serve data-driven innovations for weather and climate applications optimally.

  19. Data-Driven User Feedback: An Improved Neurofeedback Strategy considering the Interindividual Variability of EEG Features.

    Science.gov (United States)

    Han, Chang-Hee; Lim, Jeong-Hwan; Lee, Jun-Hak; Kim, Kangsan; Im, Chang-Hwan

    2016-01-01

    It has frequently been reported that some users of conventional neurofeedback systems can experience only a small portion of the total feedback range due to the large interindividual variability of EEG features. In this study, we proposed a data-driven neurofeedback strategy considering the individual variability of electroencephalography (EEG) features to permit users of the neurofeedback system to experience a wider range of auditory or visual feedback without a customization process. The main idea of the proposed strategy is to adjust the ranges of each feedback level using the density in the offline EEG database acquired from a group of individuals. Twenty-two healthy subjects participated in offline experiments to construct an EEG database, and five subjects participated in online experiments to validate the performance of the proposed data-driven user feedback strategy. Using the optimized bin sizes, the number of feedback levels that each individual experienced was significantly increased to 139% and 144% of the original results with uniform bin sizes in the offline and online experiments, respectively. Our results demonstrated that the use of our data-driven neurofeedback strategy could effectively increase the overall range of feedback levels that each individual experienced during neurofeedback training.
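
    The idea of sizing feedback levels by the density of a group database, rather than uniformly, can be sketched as follows. The sketch assumes that "density" is realised as equal-population (quantile) bins, which the abstract does not state; the helper names and all data values are hypothetical.

```python
from bisect import bisect_right


def uniform_edges(lo, hi, n_levels):
    """Interior bin edges that split [lo, hi] into equally wide levels."""
    step = (hi - lo) / n_levels
    return [lo + i * step for i in range(1, n_levels)]


def density_edges(values, n_levels):
    """Interior bin edges at empirical quantiles of the offline database,
    so that each level covers roughly the same share of the group data."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_levels] for i in range(1, n_levels)]


def levels_experienced(samples, edges):
    """Number of distinct feedback levels reached by one user's samples."""
    return len({bisect_right(edges, x) for x in samples})


# Hypothetical skewed group database: most feature values are small.
database = [i ** 2 / 100 for i in range(100)]
# Hypothetical user whose feature values stay in the low range.
user_samples = [0.5, 2.0, 5.0, 12.0]

n_uniform = levels_experienced(user_samples, uniform_edges(0.0, 100.0, 10))
n_density = levels_experienced(user_samples, density_edges(database, 10))
```

    With these made-up numbers the user reaches more levels under the density-based edges than under uniform ones, which is the qualitative effect the abstract reports; the 139% and 144% figures come from the study's own data and are not reproduced here.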

  20. External radioactive markers for PET data-driven respiratory gating in positron emission tomography.

    Science.gov (United States)

    Büther, Florian; Ernst, Iris; Hamill, James; Eich, Hans T; Schober, Otmar; Schäfers, Michael; Schäfers, Klaus P

    2013-04-01

    Respiratory gating is an established approach to overcoming respiration-induced image artefacts in PET. Of special interest in this respect are raw PET data-driven gating methods which do not require additional hardware to acquire respiratory signals during the scan. However, these methods rely heavily on the quality of the acquired PET data (statistical properties, data contrast, etc.). We therefore combined external radioactive markers with data-driven respiratory gating in PET/CT. The feasibility and accuracy of this approach were studied for [(18)F]FDG PET/CT imaging in patients with malignant liver and lung lesions. PET data from 30 patients with abdominal or thoracic [(18)F]FDG-positive lesions (primary tumours or metastases) were included in this prospective study. The patients underwent a 10-min list-mode PET scan with a single bed position following a standard clinical whole-body [(18)F]FDG PET/CT scan. During this scan, one to three radioactive point sources (either (22)Na or (18)F, 50-100 kBq) in a dedicated holder were attached to the patient's abdomen. The list-mode data acquired were retrospectively analysed for respiratory signals using established data-driven gating approaches and additionally by tracking the motion of the point sources in sinogram space. Gated reconstructions were examined qualitatively, in terms of the amount of respiratory displacement and in respect of changes in local image intensity in the gated images. The presence of the external markers did not affect whole-body PET/CT image quality. Tracking of the markers led to characteristic respiratory curves in all patients. Applying these curves for gated reconstructions resulted in images in which motion was well resolved. Quantitatively, the performance of the external marker-based approach was similar to that of the best intrinsic data-driven methods. Overall, the gain in measured tumour uptake from the nongated to the gated images indicating successful removal of respiratory motion

  1. Data-driven Inference and Investigation of Thermosphere Dynamics and Variations

    Science.gov (United States)

    Mehta, P. M.; Linares, R.

    2017-12-01

    This paper presents a methodology for data-driven inference and investigation of thermosphere dynamics and variations. The approach uses data-driven modal analysis to extract the most energetic modes of variations for neutral thermospheric species using proper orthogonal decomposition, where the time-independent modes or basis represent the dynamics and the time-dependent coefficients or amplitudes represent the model parameters. The data-driven modal analysis approach combined with sparse, discrete observations is used to infer amplitudes for the dynamic modes and to calibrate the energy content of the system. In this work, two different data-types, namely the number density measurements from TIMED/GUVI and the mass density measurements from CHAMP/GRACE are simultaneously ingested for an accurate and self-consistent specification of the thermosphere. The assimilation process is achieved with a non-linear least squares solver and allows estimation/tuning of the model parameters or amplitudes rather than the driver. In this work, we use the Naval Research Lab's MSIS model to derive the most energetic modes for six different species, He, O, N2, O2, H, and N. We examine the dominant drivers of variations for helium in MSIS and observe that seasonal latitudinal variation accounts for about 80% of the dynamic energy with a strong preference of helium for the winter hemisphere. We also observe enhanced helium presence near the poles at GRACE altitudes during periods of low solar activity (Feb 2007) as previously deduced. We will also examine the storm-time response of helium derived from observations. The results are expected to be useful in tuning/calibration of the physics-based models.
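
    The step of inferring mode amplitudes from sparse observations can be illustrated with a deliberately simplified sketch. It assumes two spatial modes are already known and fits their amplitudes by ordinary linear least squares, whereas the abstract describes modes derived from MSIS and a non-linear solver. The modes, grid, observation indices, and function name are all hypothetical.

```python
def fit_two_amplitudes(mode0, mode1, obs_idx, obs_values):
    """Least-squares amplitudes (a0, a1) such that
    a0 * mode0[i] + a1 * mode1[i] best matches obs_values at indices obs_idx."""
    # Build the 2x2 normal equations (A^T A) a = A^T y.
    s00 = sum(mode0[i] * mode0[i] for i in obs_idx)
    s01 = sum(mode0[i] * mode1[i] for i in obs_idx)
    s11 = sum(mode1[i] * mode1[i] for i in obs_idx)
    b0 = sum(mode0[i] * y for i, y in zip(obs_idx, obs_values))
    b1 = sum(mode1[i] * y for i, y in zip(obs_idx, obs_values))
    det = s00 * s11 - s01 * s01
    if det == 0:
        raise ValueError("observations do not constrain both amplitudes")
    # Solve by Cramer's rule.
    a0 = (b0 * s11 - s01 * b1) / det
    a1 = (s00 * b1 - s01 * b0) / det
    return a0, a1


# Hypothetical time-independent modes on a 6-point grid.
mode0 = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]        # mean state
mode1 = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]      # gradient across the grid

# Sparse observations at three grid points, generated with amplitudes (3.0, 0.5).
obs_idx = [0, 3, 5]
obs_values = [3.0 * mode0[i] + 0.5 * mode1[i] for i in obs_idx]

a0, a1 = fit_two_amplitudes(mode0, mode1, obs_idx, obs_values)
# Reconstruct the full field from the fitted amplitudes.
field = [a0 * m0 + a1 * m1 for m0, m1 in zip(mode0, mode1)]
```

    Because the basis is fixed, only a handful of amplitudes has to be estimated, which is why a few scattered measurements can specify the whole field in this kind of reduced-order setting.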

  2. Data-driven criteria to assess fear remission and phenotypic variability of extinction in rats.

    Science.gov (United States)

    Shumake, Jason; Jones, Carolyn; Auchter, Allison; Monfils, Marie-Hélène

    2018-03-19

    Fear conditioning is widely employed to examine the mechanisms that underlie dysregulations of the fear system. Various manipulations are often used following fear acquisition to attenuate fear memories. In rodent studies, freezing is often the main output measure to quantify 'fear'. Here, we developed data-driven criteria for defining a standard benchmark that indicates remission from conditioned fear and for identifying subgroups with differential treatment responses. These analyses will enable a better understanding of individual differences in treatment responding. This article is part of a discussion meeting issue 'Of mice and mental health: facilitating dialogue between basic and clinical neuroscientists'. © 2018 The Author(s).

  3. Applying Data-driven Imaging Biomarker in Mammography for Breast Cancer Screening: Preliminary Study

    OpenAIRE

    Kim, Eun-Kyung; Kim, Hyo-Eun; Han, Kyunghwa; Kang, Bong Joo; Sohn, Yu-Mee; Woo, Ok Hee; Lee, Chan Wha

    2018-01-01

    We assessed the feasibility of a data-driven imaging biomarker based on weakly supervised learning (DIB; an imaging biomarker derived from large-scale medical image data with deep learning technology) in mammography (DIB-MG). A total of 29,107 digital mammograms from five institutions (4,339 cancer cases and 24,768 normal cases) were included. After matching patients’ age, breast density, and equipment, 1,238 and 1,238 cases were chosen as validation and test sets, respectively, and the remai...

  4. Building Data-Driven Pathways From Routinely Collected Hospital Data: A Case Study on Prostate Cancer

    Science.gov (United States)

    Clark, Jeremy; Cooper, Colin S; Mills, Robert; Rayward-Smith, Victor J; de la Iglesia, Beatriz

    2015-01-01

    Background Routinely collected data in hospitals is complex, typically heterogeneous, and scattered across multiple Hospital Information Systems (HIS). This big data, created as a byproduct of health care activities, has the potential to provide a better understanding of diseases, unearth hidden patterns, and improve services and cost. The extent and uses of such data rely on its quality, which is not consistently checked, nor fully understood. Nevertheless, using routine data for the construction of data-driven clinical pathways, describing processes and trends, is a key topic receiving increasing attention in the literature. Traditional algorithms do not cope well with unstructured processes or data, and do not produce clinically meaningful visualizations. Supporting systems that provide additional information, context, and quality assurance inspection are needed. Objective The objective of the study is to explore how routine hospital data can be used to develop data-driven pathways that describe the journeys that patients take through care, and their potential uses in biomedical research; it proposes a framework for the construction, quality assessment, and visualization of patient pathways for clinical studies and decision support using a case study on prostate cancer. Methods Data pertaining to prostate cancer patients were extracted from a large UK hospital from eight different HIS, validated, and complemented with information from the local cancer registry. Data-driven pathways were built for each of the 1904 patients and an expert knowledge base, containing rules on the prostate cancer biomarker, was used to assess the completeness and utility of the pathways for a specific clinical study. Software components were built to provide meaningful visualizations for the constructed pathways. Results The proposed framework and pathway formalism enable the summarization, visualization, and querying of complex patient-centric clinical information, as well as the

  5. PHYCAA: Data-driven measurement and removal of physiological noise in BOLD fMRI

    DEFF Research Database (Denmark)

    Churchill, Nathan W.; Yourganov, Grigori; Spring, Robyn

    2012-01-01

    , autocorrelated physiological noise sources with reproducible spatial structure, using an adaptation of Canonical Correlation Analysis performed in a split-half resampling framework. The technique is able to identify physiological effects with vascular-linked spatial structure, and an intrinsic dimensionality...... with physiological noise, and real data-driven model prediction and reproducibility, for both block and event-related task designs. This is demonstrated compared to no physiological noise correction, and to the widely used RETROICOR (Glover et al., 2000) physiological denoising algorithm, which uses externally...

  6. Classification Systems, their Digitization and Consequences for Data-Driven Decision Making

    DEFF Research Database (Denmark)

    Stein, Mari-Klara; Newell, Sue; Galliers, Robert D.

    2013-01-01

    Classification systems are foundational in many standardized software tools. This digitization of classification systems gives them a new ‘materiality’ that, jointly with the social practices of information producers/consumers, has significant consequences on the representational quality of such ...... and the foundational role of representational quality in understanding the success and consequences of data-driven decision-making.......-narration and meta-narration), and three different information production/consumption situations. We contribute to the relational theorization of representational quality and extend classification systems research by drawing explicit attention to the importance of ‘materialization’ of classification systems...

  7. Data-driven Discovery: A New Era of Exploiting the Literature and Data

    Directory of Open Access Journals (Sweden)

    Ying Ding

    2016-11-01

    In the current data-intensive era, the traditional hands-on method of conducting scientific research by exploring related publications to generate a testable hypothesis is well on its way to becoming obsolete within just a year or two. Analyzing the literature and data to automatically generate a hypothesis might become the de facto approach to inform the core research efforts of those trying to master the exponentially rapid expansion of publications and datasets. Here, viewpoints are provided and discussed to help the understanding of challenges of data-driven discovery.

  8. A data driven method to measure electron charge mis-identification rate

    CERN Document Server

    Bakhshiansohi, Hamed

    2009-01-01

    Electron charge mis-measurement is an important challenge in analyses that depend on the charge of the electron. To estimate the probability of electron charge mis-measurement, a data-driven method is introduced, and good agreement with MC-based methods is achieved. The third moment of the $\phi$ distribution of hits in the electron SuperCluster is studied. The correlation between this variable and the electron charge is also investigated. Using this 'new' variable and some other variables, the electron charge measurement is improved by two different approaches.

  9. Kubernetes as a batch scheduler

    OpenAIRE

    Souza, Clenimar; Brito Da Rocha, Ricardo

    2017-01-01

    This project aims at executing a CERN batch use case using Kubernetes, in order to figure out what are the advantages and disadvantages, as well as the functionality that can be replicated or is missing. The reference for the batch system is the CERN Batch System, which uses HTCondor. Another goal of this project is to evaluate the current status of federated resources in Kubernetes, in comparison to the single-cluster API resources. Finally, the last goal of this project is to implement buil...

  10. A data-driven approach to reverse engineering customer engagement models: towards functional constructs.

    Directory of Open Access Journals (Sweden)

    Natalie Jane de Vries

    Online consumer behavior in general, and online customer engagement with brands in particular, has become a major focus of research activity fuelled by the exponential increase of interactive functions of the internet and social media platforms and applications. Current research in this area is mostly hypothesis-driven, and much debate about the concept of Customer Engagement and its related constructs remains in the literature. In this paper, we propose a novel methodology for reverse engineering a consumer behavior model for online customer engagement, based on a computational and data-driven perspective. This methodology could be generalized and prove useful for future research in the fields of consumer behaviors using questionnaire data or studies investigating other types of human behaviors. The method we propose contains five main stages: symbolic regression analysis, graph building, community detection, evaluation of results and, finally, investigation of directed cycles and common feedback loops. The 'communities' of questionnaire items that emerge from our community detection method form possible 'functional constructs' inferred from data rather than assumed from literature and theory. Our results show consistent partitioning of questionnaire items into such 'functional constructs', suggesting the method proposed here could be adopted as a new data-driven way of human behavior modeling.

  11. A Data-driven Concept Schema for Defining Clinical Research Data Needs

    Science.gov (United States)

    Hruby, Gregory W.; Hoxha, Julia; Ravichandran, Praveen Chandar; Mendonça, Eneida A.; Hanauer, David A; Weng, Chunhua

    2016-01-01

    OBJECTIVES The Patient, Intervention, Control/Comparison, and Outcome (PICO) framework is an effective technique for framing a clinical question. We aim to develop the counterpart of PICO to structure clinical research data needs. METHODS We use a data-driven approach to abstracting key concepts representing clinical research data needs by adapting and extending an expert-derived framework originally developed for defining cancer research data needs. We annotated clinical trial eligibility criteria, EHR data request logs, and data queries to electronic health records (EHR), to extract and harmonize concept classes representing clinical research data needs. We evaluated the class coverage, class preservation from the original framework, schema generalizability, schema understandability, and schema structural correctness through a semi-structured interview with eight multidisciplinary domain experts. We iteratively refined the schema based on the evaluations. RESULTS Our data-driven schema preserved 68% of the 63 classes from the original framework and covered 88% (73/82) of the classes proposed by evaluators. Class coverage for participants of different backgrounds ranged from 60% to 100% with a median value of 95% agreement among the individual evaluators. The schema was found understandable and structurally sound. CONCLUSIONS Our proposed schema may serve as the counterpart to PICO for improving the research data needs communication between researchers and informaticians. PMID:27185504

  12. A copula-based sampling method for data-driven prognostics

    International Nuclear Information System (INIS)

    Xi, Zhimin; Jing, Rong; Wang, Pingfeng; Hu, Chao

    2014-01-01

    This paper develops a Copula-based sampling method for data-driven prognostics. The method essentially consists of an offline training process and an online prediction process: (i) the offline training process builds a statistical relationship between the failure time and the time realizations at specified degradation levels on the basis of off-line training data sets; and (ii) the online prediction process identifies probable failure times for online testing units based on the statistical model constructed in the offline process and the online testing data. Our contributions in this paper are three-fold, namely the definition of a generic health index system to quantify the health degradation of an engineering system, the construction of a Copula-based statistical model to learn the statistical relationship between the failure time and the time realizations at specified degradation levels, and the development of a simulation-based approach for the prediction of remaining useful life (RUL). Two engineering case studies, namely the electric cooling fan health prognostics and the 2008 IEEE PHM challenge problem, are employed to demonstrate the effectiveness of the proposed methodology. - Highlights: • We develop a novel mechanism for data-driven prognostics. • A generic health index system quantifies health degradation of engineering systems. • Off-line training model is constructed based on the Bayesian Copula model. • Remaining useful life is predicted from a simulation-based approach

  13. Data-Driven Engineering of Social Dynamics: Pattern Matching and Profit Maximization.

    Science.gov (United States)

    Peng, Huan-Kai; Lee, Hao-Chih; Pan, Jia-Yu; Marculescu, Radu

    2016-01-01

    In this paper, we define a new problem related to social media, namely, the data-driven engineering of social dynamics. More precisely, given a set of observations from the past, we aim at finding the best short-term intervention that can lead to predefined long-term outcomes. Toward this end, we propose a general formulation that covers two useful engineering tasks as special cases, namely, pattern matching and profit maximization. By incorporating a deep learning model, we derive a solution using convex relaxation and quadratic-programming transformation. Moreover, we propose a data-driven evaluation method in place of the expensive field experiments. Using a Twitter dataset, we demonstrate the effectiveness of our dynamics engineering approach for both pattern matching and profit maximization, and study the multifaceted interplay among several important factors of dynamics engineering, such as solution validity, pattern-matching accuracy, and intervention cost. Finally, the method we propose is general enough to work with multi-dimensional time series, so it can potentially be used in many other applications.

  14. General Purpose Data-Driven Online System Health Monitoring with Applications to Space Operations

    Science.gov (United States)

    Iverson, David L.; Spirkovska, Lilly; Schwabacher, Mark

    2010-01-01

    Modern space transportation and ground support system designs are becoming increasingly sophisticated and complex. Determining the health state of these systems using traditional parameter limit checking, or model-based or rule-based methods is becoming more difficult as the number of sensors and component interactions grows. Data-driven monitoring techniques have been developed to address these issues by analyzing system operations data to automatically characterize normal system behavior. System health can be monitored by comparing real-time operating data with these nominal characterizations, providing detection of anomalous data signatures indicative of system faults, failures, or precursors of significant failures. The Inductive Monitoring System (IMS) is a general purpose, data-driven system health monitoring software tool that has been successfully applied to several aerospace applications and is under evaluation for anomaly detection in vehicle and ground equipment for next generation launch systems. After an introduction to IMS application development, we discuss these NASA online monitoring applications, including the integration of IMS with complementary model-based and rule-based methods. Although the examples presented in this paper are from space operations applications, IMS is a general-purpose health-monitoring tool that is also applicable to power generation and transmission system monitoring.

  15. A data-driven approach to reverse engineering customer engagement models: towards functional constructs.

    Science.gov (United States)

    de Vries, Natalie Jane; Carlson, Jamie; Moscato, Pablo

    2014-01-01

    Online consumer behavior in general, and online customer engagement with brands in particular, has become a major focus of research activity fuelled by the exponential increase of interactive functions of the internet and social media platforms and applications. Current research in this area is mostly hypothesis-driven, and much debate about the concept of Customer Engagement and its related constructs remains in the literature. In this paper, we propose a novel methodology for reverse engineering a consumer behavior model for online customer engagement, based on a computational and data-driven perspective. This methodology could be generalized and prove useful for future research in the fields of consumer behaviors using questionnaire data or studies investigating other types of human behaviors. The method we propose contains five main stages: symbolic regression analysis, graph building, community detection, evaluation of results and, finally, investigation of directed cycles and common feedback loops. The 'communities' of questionnaire items that emerge from our community detection method form possible 'functional constructs' inferred from data rather than assumed from literature and theory. Our results show consistent partitioning of questionnaire items into such 'functional constructs', suggesting the method proposed here could be adopted as a new data-driven way of human behavior modeling.

  16. Data-driven risk identification in phase III clinical trials using central statistical monitoring.

    Science.gov (United States)

    Timmermans, Catherine; Venet, David; Burzykowski, Tomasz

    2016-02-01

    Our interest lies in quality control for clinical trials, in the context of risk-based monitoring (RBM). We specifically study the use of central statistical monitoring (CSM) to support RBM. Under an RBM paradigm, we claim that CSM has a key role to play in identifying the "risks to the most critical data elements and processes" that will drive targeted oversight. In order to support this claim, we first see how to characterize the risks that may affect clinical trials. We then discuss how CSM can be understood as a tool for providing a set of data-driven key risk indicators (KRIs), which help to organize adaptive targeted monitoring. Several case studies are provided where issues in a clinical trial have been identified thanks to targeted investigation after the identification of a risk using CSM. Using CSM to build data-driven KRIs helps to identify different kinds of issues in clinical trials. This ability is directly linked with the exhaustiveness of the CSM approach and its flexibility in the definition of the risks that are searched for when identifying the KRIs. In practice, a CSM assessment of the clinical database seems essential to ensure data quality. The atypical data patterns found in some centers and variables are seen as KRIs under a RBM approach. Targeted monitoring or data management queries can be used to confirm whether the KRIs point to an actual issue or not.

  17. Data-driven integration of genome-scale regulatory and metabolic network models

    Science.gov (United States)

    Imam, Saheed; Schäuble, Sascha; Brooks, Aaron N.; Baliga, Nitin S.; Price, Nathan D.

    2015-01-01

    Microbes are diverse and extremely versatile organisms that play vital roles in all ecological niches. Understanding and harnessing microbial systems will be key to the sustainability of our planet. One approach to improving our knowledge of microbial processes is through data-driven and mechanism-informed computational modeling. Individual models of biological networks (such as metabolism, transcription, and signaling) have played pivotal roles in driving microbial research through the years. These networks, however, are highly interconnected and function in concert—a fact that has led to the development of a variety of approaches aimed at simulating the integrated functions of two or more network types. Though the task of integrating these different models is fraught with new challenges, the large amounts of high-throughput data sets being generated, and algorithms being developed, means that the time is at hand for concerted efforts to build integrated regulatory-metabolic networks in a data-driven fashion. In this perspective, we review current approaches for constructing integrated regulatory-metabolic models and outline new strategies for future development of these network models for any microbial system. PMID:25999934

  18. Data-driven HR how to use analytics and metrics to drive performance

    CERN Document Server

    Marr, Bernard

    2018-01-01

    Traditionally seen as a purely people function unconcerned with numbers, HR is now uniquely placed to use company data to drive performance, both of the people in the organization and the organization as a whole. Data-driven HR is a practical guide which enables HR practitioners to leverage the value of the vast amount of data available at their fingertips. Covering how to identify the most useful sources of data, how to collect information in a transparent way that is in line with data protection requirements and how to turn this data into tangible insights, this book marks a turning point for the HR profession. Covering all the key elements of HR including recruitment, employee engagement, performance management, wellbeing and training, Data-driven HR examines the ways data can contribute to organizational success by, among other things, optimizing processes, driving performance and improving HR decision making. Packed with case studies and real-life examples, this is essential reading for all HR profession...

  19. Data-driven integration of genome-scale regulatory and metabolic network models

    Directory of Open Access Journals (Sweden)

    Saheed Imam

    2015-05-01

    Microbes are diverse and extremely versatile organisms that play vital roles in all ecological niches. Understanding and harnessing microbial systems will be key to the sustainability of our planet. One approach to improving our knowledge of microbial processes is through data-driven and mechanism-informed computational modeling. Individual models of biological networks (such as metabolism, transcription and signaling) have played pivotal roles in driving microbial research through the years. These networks, however, are highly interconnected and function in concert, a fact that has led to the development of a variety of approaches aimed at simulating the integrated functions of two or more network types. Though the task of integrating these different models is fraught with new challenges, the large amounts of high-throughput data sets being generated, and algorithms being developed, mean that the time is at hand for concerted efforts to build integrated regulatory-metabolic networks in a data-driven fashion. In this perspective, we review current approaches for constructing integrated regulatory-metabolic models and outline new strategies for future development of these network models for any microbial system.

  20. Data-driven CT protocol review and management—experience from a large academic hospital.

    Science.gov (United States)

    Zhang, Da; Savage, Cristy A; Li, Xinhua; Liu, Bob

    2015-03-01

    Protocol review plays a critical role in CT quality assurance, but large numbers of protocols and inconsistent protocol names on scanners and in exam records make thorough protocol review formidable. In this investigation, we report on a data-driven cataloging process that can be used to assist in the reviewing and management of CT protocols. We collected lists of scanner protocols, as well as 18 months of recent exam records, for 10 clinical scanners. We developed computer algorithms to automatically deconstruct the protocol names on the scanner and in the exam records into core names and descriptive components. Based on the core names, we were able to group the scanner protocols into a much smaller set of "core protocols," and to easily link exam records with the scanner protocols. We calculated the percentage of usage for each core protocol, from which the most heavily used protocols were identified. From the percentage-of-usage data, we found that, on average, 18, 33, and 49 core protocols per scanner covered 80%, 90%, and 95%, respectively, of all exams. These numbers are one order of magnitude smaller than the typical numbers of protocols that are loaded on a scanner (200-300, as reported in the literature). Duplicated, outdated, and rarely used protocols on the scanners were easily pinpointed in the cataloging process. The data-driven cataloging process can facilitate the task of protocol review. Copyright © 2015 American College of Radiology. Published by Elsevier Inc. All rights reserved.
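
The percentage-of-usage step in this abstract amounts to ranking core protocols by exam count and finding how many are needed to reach a coverage target. The Python sketch below illustrates that idea only. It assumes each exam record has already been reduced to a core protocol name (the abstract does not give the name-deconstruction algorithm), and the function name `protocols_needed` and the sample data are hypothetical.

```python
from collections import Counter


def protocols_needed(exam_core_names, coverage_percent):
    """Count how many of the most-used core protocols cover the target share of exams.

    exam_core_names: one core protocol name per exam record (assumed input).
    coverage_percent: target coverage as an integer percentage, e.g. 80.
    """
    counts = Counter(exam_core_names)
    total = sum(counts.values())
    covered = 0
    # Walk protocols from most to least used, accumulating exam counts.
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        # Integer comparison avoids floating-point rounding at the threshold.
        if covered * 100 >= coverage_percent * total:
            return rank
    return len(counts)


# Hypothetical exam records for one scanner.
exams = ["head"] * 50 + ["chest"] * 30 + ["abdomen"] * 15 + ["spine"] * 5
for target in (80, 90, 95):
    print(target, protocols_needed(exams, target))
```

Protocols that never appear in the counts, or appear only a handful of times, would be the natural candidates for the duplicate and rarely-used review the abstract describes.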

  1. Data-driven approach for assessing utility of medical tests using electronic medical records.

    Science.gov (United States)

    Skrøvseth, Stein Olav; Augestad, Knut Magne; Ebadollahi, Shahram

    2015-02-01

    To precisely define the utility of tests in a clinical pathway through data-driven analysis of the electronic medical record (EMR). The information content was defined in terms of the entropy of the expected value of the test related to a given outcome. A kernel density classifier was used to estimate the necessary distributions. To validate the method, we used data from the EMR of the gastrointestinal department at a university hospital. Blood tests from patients undergoing gastrointestinal surgery were analyzed with respect to second surgery within 30 days of the index surgery. The information content is clearly reflected in the patient pathway for certain combinations of tests and outcomes. C-reactive protein tests coupled to anastomosis leakage, a severe complication, show a clear pattern of information gain through the patient trajectory, where the greatest gain from the test is 3-4 days post index surgery. We have defined the information content in a data-driven and information-theoretic way such that the utility of a test can be precisely defined. The results reflect clinical knowledge. In the case we used, the tests carry little negative impact. The general approach can be expanded to cases that carry a substantial negative impact, such as in certain radiological techniques. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.

  2. Data-Driven Engineering of Social Dynamics: Pattern Matching and Profit Maximization.

    Directory of Open Access Journals (Sweden)

    Huan-Kai Peng

    In this paper, we define a new problem related to social media, namely, the data-driven engineering of social dynamics. More precisely, given a set of observations from the past, we aim at finding the best short-term intervention that can lead to predefined long-term outcomes. Toward this end, we propose a general formulation that covers two useful engineering tasks as special cases, namely, pattern matching and profit maximization. By incorporating a deep learning model, we derive a solution using convex relaxation and quadratic-programming transformation. Moreover, we propose a data-driven evaluation method in place of the expensive field experiments. Using a Twitter dataset, we demonstrate the effectiveness of our dynamics engineering approach for both pattern matching and profit maximization, and study the multifaceted interplay among several important factors of dynamics engineering, such as solution validity, pattern-matching accuracy, and intervention cost. Finally, the method we propose is general enough to work with multi-dimensional time series, so it can potentially be used in many other applications.

  3. Data-Driven Engineering of Social Dynamics: Pattern Matching and Profit Maximization

    Science.gov (United States)

    Peng, Huan-Kai; Lee, Hao-Chih; Pan, Jia-Yu; Marculescu, Radu

    2016-01-01

    In this paper, we define a new problem related to social media, namely, the data-driven engineering of social dynamics. More precisely, given a set of observations from the past, we aim at finding the best short-term intervention that can lead to predefined long-term outcomes. Toward this end, we propose a general formulation that covers two useful engineering tasks as special cases, namely, pattern matching and profit maximization. By incorporating a deep learning model, we derive a solution using convex relaxation and quadratic-programming transformation. Moreover, we propose a data-driven evaluation method in place of the expensive field experiments. Using a Twitter dataset, we demonstrate the effectiveness of our dynamics engineering approach for both pattern matching and profit maximization, and study the multifaceted interplay among several important factors of dynamics engineering, such as solution validity, pattern-matching accuracy, and intervention cost. Finally, the method we propose is general enough to work with multi-dimensional time series, so it can potentially be used in many other applications. PMID:26771830

  4. Data-driven directions for effective footwear provision for the high-risk diabetic foot.

    Science.gov (United States)

    Arts, M L J; de Haart, M; Waaijman, R; Dahmen, R; Berendsen, H; Nollet, F; Bus, S A

    2015-06-01

    Custom-made footwear is used to offload the diabetic foot to prevent plantar foot ulcers. This prospective study evaluates the offloading effects of modifying custom-made footwear and aims to provide data-driven directions for the provision of effectively offloading footwear in clinical practice. Eighty-five people with diabetic neuropathy and a recently healed plantar foot ulcer, who participated in a clinical trial on footwear effectiveness, had their custom-made footwear evaluated with in-shoe plantar pressure measurements at three-monthly intervals. Footwear was modified when peak pressure was ≥ 200 kPa. The effect of single and combined footwear modifications on in-shoe peak pressure at these high-pressure target locations was assessed. All footwear modifications significantly reduced peak pressure at the target locations compared with pre-modification levels (range -6.7% to -24.0%, P diabetic neuropathy and a recently healed plantar foot ulcer, significant offloading can be achieved at high-risk foot regions by modifying custom-made footwear. These results provide data-driven directions for the design and evaluation of custom-made footwear for high-risk people with diabetes, and essentially mean that each shoe prescribed should incorporate those design features that effectively offload the foot. © 2015 The Authors. Diabetic Medicine © 2015 Diabetes UK.

  5. Microenvironment temperature prediction between body and seat interface using autoregressive data-driven model.

    Science.gov (United States)

    Liu, Zhuofu; Wang, Lin; Luo, Zhongming; Heusch, Andrew I; Cascioli, Vincenzo; McCarthy, Peter W

    2015-11-01

    There is a need to develop a greater understanding of temperature at the skin-seat interface during prolonged seating from the perspectives of both industrial design (comfort/discomfort) and medical care (skin ulcer formation). Here we test the concept of predicting temperature at the seat surface and skin interface during prolonged sitting (such as required from wheelchair users). As caregivers are usually busy, such a method would give them warning ahead of a problem. This paper describes a data-driven model capable of predicting thermal changes and thus having the potential to provide an early warning (15- to 25-min ahead prediction) of an impending temperature that may increase the risk of skin damage for those subject to enforced sitting and who have little or no sensory feedback from this area. Initially, the oscillations of the original signal are suppressed using the reconstruction strategy of empirical mode decomposition (EMD). Consequently, the autoregressive data-driven model can be used to predict future thermal trends based on a shorter period of acquisition, which reduces the possibility of introducing human errors and artefacts associated with longer duration "enforced" sitting by volunteers. In this study, the method had a maximum predictive error of body insensitivity and disability requiring them to be immobile in seats for prolonged periods. Copyright © 2015 Tissue Viability Society. Published by Elsevier Ltd. All rights reserved.
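
The forecasting step in this abstract can be illustrated with the simplest autoregressive model. The Python sketch below fits a first-order model x[t] = c + a * x[t-1] by ordinary least squares and iterates it forward. It is an illustration under stated assumptions, not the authors' method: the model order, the function names `fit_ar1` and `forecast`, and the sample readings are all hypothetical, and the input is assumed to be already smoothed (the EMD reconstruction step is not shown).

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = c + a * x[t-1]; returns (c, a)."""
    prev, nxt = series[:-1], series[1:]
    n = len(prev)
    if n < 2:
        raise ValueError("need at least three samples")
    mean_prev = sum(prev) / n
    mean_next = sum(nxt) / n
    var_prev = sum((p - mean_prev) ** 2 for p in prev)
    if var_prev == 0:
        raise ValueError("series is constant; AR(1) slope is undefined")
    cov = sum((p - mean_prev) * (q - mean_next) for p, q in zip(prev, nxt))
    a = cov / var_prev
    c = mean_next - a * mean_prev
    return c, a


def forecast(series, steps):
    """Iterate the fitted AR(1) model `steps` samples past the end of `series`."""
    c, a = fit_ar1(series)
    x = series[-1]
    predictions = []
    for _ in range(steps):
        x = c + a * x
        predictions.append(x)
    return predictions


# Hypothetical smoothed temperature rise (arbitrary units, one sample per interval).
readings = [0.0, 1.0, 1.5, 1.75, 1.875]
print(forecast(readings, 2))
```

With a fixed sampling interval, the number of steps sets the prediction horizon, so a 15- to 25-minute lead time would correspond to however many samples span that window.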

  6. Ensemble of data-driven prognostic algorithms for robust prediction of remaining useful life

    International Nuclear Information System (INIS)

    Hu Chao; Youn, Byeng D.; Wang Pingfeng; Taek Yoon, Joung

    2012-01-01

    Prognostics aims at determining whether a failure of an engineered system (e.g., a nuclear power plant) is impending and estimating the remaining useful life (RUL) before the failure occurs. The traditional data-driven prognostic approach is to construct multiple candidate algorithms using a training data set, evaluate their respective performance using a testing data set, and select the one with the best performance while discarding all the others. This approach has three shortcomings: (i) the selected standalone algorithm may not be robust; (ii) it wastes the resources for constructing the algorithms that are discarded; (iii) it requires the testing data in addition to the training data. To overcome these drawbacks, this paper proposes an ensemble data-driven prognostic approach which combines multiple member algorithms with a weighted-sum formulation. Three weighting schemes, namely the accuracy-based weighting, diversity-based weighting and optimization-based weighting, are proposed to determine the weights of member algorithms. The k-fold cross validation (CV) is employed to estimate the prediction error required by the weighting schemes. The results obtained from three case studies suggest that the ensemble approach with any weighting scheme gives more accurate RUL predictions compared to any sole algorithm when member algorithms producing diverse RUL predictions have comparable prediction accuracy and that the optimization-based weighting scheme gives the best overall performance among the three weighting schemes.
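
    The weighted-sum idea in this abstract can be illustrated with a minimal sketch. It assumes one simple form of accuracy-based weighting, in which each member's weight is proportional to the inverse of its cross-validation error; the abstract does not give the exact formula, so the function names and the numbers below are hypothetical.

```python
def accuracy_based_weights(cv_errors):
    """Assumed scheme: weight each member by the inverse of its CV error,
    then normalize so the weights sum to one."""
    inverse = [1.0 / e for e in cv_errors]
    total = sum(inverse)
    return [v / total for v in inverse]


def ensemble_rul(member_predictions, weights):
    """Weighted-sum combination of the members' RUL predictions."""
    return sum(w * p for w, p in zip(weights, member_predictions))


# Hypothetical cross-validation errors for three member algorithms.
cv_errors = [2.0, 4.0, 4.0]
weights = accuracy_based_weights(cv_errors)

# Hypothetical RUL predictions (in cycles) from the same three members.
prediction = ensemble_rul([100.0, 120.0, 80.0], weights)
```

    In this sketch the member with the smallest cross-validation error receives the largest weight, and no separate testing data set is needed to form the combination.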

  7. Practical options for selecting data-driven or physics-based prognostics algorithms with reviews

    International Nuclear Information System (INIS)

    An, Dawn; Kim, Nam H.; Choi, Joo-Ho

    2015-01-01

    This paper provides practical options for prognostics so that beginners can select appropriate methods for their fields of application. To achieve this goal, several popular data-driven and physics-based prognostics algorithms are first reviewed. Each algorithm’s attributes and pros and cons are analyzed in terms of model definition, model parameter estimation and ability to handle noise and bias in data. Fatigue crack growth examples are then used to illustrate the characteristics of different algorithms. In order to suggest a suitable algorithm, several studies are made based on the number of data sets, the level of noise and bias, availability of loading and physical models, and complexity of the damage growth behavior. Based on the study, it is concluded that the Gaussian process is easy and fast to implement, but works well only when the covariance function is properly defined. The neural network has the advantage in the case of large noise and complex models but only with many training data sets. The particle filter and Bayesian method are superior to the former methods because they are less affected by noise and model complexity, but work only when physical model and loading conditions are available. - Highlights: • A practical review of data-driven and physics-based prognostics is provided. • As common prognostics algorithms, NN, GP, PF and BM are introduced. • Algorithms’ attributes, pros and cons, and applicable conditions are discussed. • This will be helpful to choose the best algorithm for different applications.

  8. NOvA Event Building, Buffering and Data-Driven Triggering From Within the DAQ System

    Energy Technology Data Exchange (ETDEWEB)

    Fischler, M. [Fermilab; Green, C. [Fermilab; Kowalkowski, J. [Fermilab; Norman, A. [Fermilab; Paterno, M. [Fermilab; Rechenmacher, R. [Fermilab

    2012-06-22

    To make its core measurements, the NOvA experiment needs to make real-time data-driven decisions involving beam-spill time correlation and other triggering issues. NOvA-DDT is a prototype Data-Driven Triggering system, built using the Fermilab artdaq generic DAQ/Event-building toolset. This provides the advantages of sharing online software infrastructure with other Intensity Frontier experiments, and of being able to use any offline analysis module, unchanged, as a component of the online triggering decisions. The NOvA-artdaq architecture chosen has significant advantages, including graceful degradation if the triggering decision software fails or cannot be done quickly enough for some fraction of the time-slice "events." We have tested and measured the performance and overhead of NOvA-DDT using an actual Hough transform based trigger decision module taken from the NOvA offline software. The results of these tests (98 ms mean time per event on only 1/16 of the available processing power of a node, and overheads of about 2 ms per event) provide a proof of concept: NOvA-DDT is a viable strategy for data acquisition, event building, and trigger processing at the NOvA far detector.

  9. The effects of data-driven learning activities on EFL learners' writing development.

    Science.gov (United States)

    Luo, Qinqin

    2016-01-01

    Data-driven learning has proved to be an effective approach in helping learners solve various writing problems, such as correcting lexical or grammatical errors, improving the use of collocations and generating ideas in writing. This article reports on an empirical study in which data-driven learning was accomplished with the assistance of the user-friendly BNCweb, and evaluates the outcome by comparing the effectiveness of BNCweb with that of the search engine Baidu, which is the reference resource most commonly used by Chinese learners of English as a foreign language. The quantitative results for 48 Chinese college students revealed that the experimental group, which used BNCweb, performed significantly better in the post-test in terms of writing fluency and accuracy than the control group, which used Baidu. However, no significant difference was found between the two groups in terms of writing complexity. The qualitative results from the interviews revealed that learners generally showed a positive attitude toward the use of BNCweb, but there were still some problems in using corpora in the writing process. The combined use of corpora and other types of reference resource was therefore suggested as a possible way to counter the potential barriers for Chinese learners of English.

  10. Fault Detection for Nonlinear Process With Deterministic Disturbances: A Just-In-Time Learning Based Data Driven Method.

    Science.gov (United States)

    Yin, Shen; Gao, Huijun; Qiu, Jianbin; Kaynak, Okyay

    2017-11-01

    Data-driven fault detection plays an important role in industrial systems due to its applicability in case of unknown physical models. In fault detection, disturbances must be taken into account as an inherent characteristic of processes. Nevertheless, fault detection for nonlinear processes with deterministic disturbances still receives little attention, especially in the data-driven field. To solve this problem, a just-in-time learning-based data-driven (JITL-DD) fault detection method for nonlinear processes with deterministic disturbances is proposed in this paper. JITL-DD employs a JITL scheme for process description with local model structures to cope with process dynamics and nonlinearity. The proposed method provides a data-driven fault detection solution for nonlinear processes with deterministic disturbances, and offers inherent online adaptation and high fault detection accuracy. Two nonlinear systems, i.e., a numerical example and a sewage treatment process benchmark, are employed to show the effectiveness of the proposed method.

  11. Global retrieval of soil moisture and vegetation properties using data-driven methods

    Science.gov (United States)

    Rodriguez-Fernandez, Nemesio; Richaume, Philippe; Kerr, Yann

    2017-04-01

    Data-driven methods such as neural networks (NNs) are a powerful tool to retrieve soil moisture from multi-wavelength remote sensing observations at global scale. In this presentation we will review a number of recent results regarding the retrieval of soil moisture with the Soil Moisture and Ocean Salinity (SMOS) satellite, either using SMOS brightness temperatures as input data for the retrieval or using SMOS soil moisture retrievals as reference dataset for the training. The presentation will discuss several possibilities for both the input datasets and the datasets to be used as reference for the supervised learning phase. Regarding the input datasets, it will be shown that NNs take advantage of the synergy of SMOS data and data from other sensors such as the Advanced Scatterometer (ASCAT, active microwaves) and MODIS (visible and infrared). NNs have also been successfully used to construct long time series of soil moisture from the Advanced Microwave Scanning Radiometer - Earth Observing System (AMSR-E) and SMOS. A NN with input data from AMSR-E observations and SMOS soil moisture as reference for the training was used to construct a dataset sharing a similar climatology and without a significant bias with respect to SMOS soil moisture. Regarding the reference data to train the data-driven retrievals, we will show different possibilities depending on the application. Using actual in situ measurements is challenging at global scale due to the scarce distribution of sensors. In contrast, in situ measurements have been successfully used to retrieve SM at continental scale in North America, where the density of in situ measurement stations is high. Using global land surface models to train the NN constitutes an interesting alternative to implement new remote sensing surface datasets. In addition, these datasets can be used to perform data assimilation into the model used as reference for the training. This approach has recently been tested at the European Centre

  12. Automatic translation of MPI source into a latency-tolerant, data-driven form

    International Nuclear Information System (INIS)

    Nguyen, Tan; Cicotti, Pietro; Bylaska, Eric; Quinlan, Dan; Baden, Scott

    2017-01-01

    Hiding communication behind useful computation is an important performance programming technique but remains an inscrutable programming exercise even for the expert. We present Bamboo, a code transformation framework that can realize communication overlap in applications written in MPI without the need to intrusively modify the source code. We reformulate MPI source into a task dependency graph representation, which partially orders the tasks, enabling the program to execute in a data-driven fashion under the control of an external runtime system. Experimental results demonstrate that Bamboo significantly reduces communication delays while requiring only modest amounts of programmer annotation for a variety of applications and platforms, including those employing co-processors and accelerators. Moreover, Bamboo’s performance meets or exceeds that of labor-intensive hand coding. As a result, the translator is more than a means of hiding communication costs automatically; it demonstrates the utility of semantic level optimization against a well-known library.
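
    The data-driven execution model described here, in which a task runs as soon as the tasks it depends on have finished, can be sketched with a small scheduler. This is not Bamboo's runtime or its API; it is a generic topological (Kahn-style) ordering over a hand-written dependency graph, and every task name below is hypothetical.

```python
from collections import deque


def run_data_driven(tasks, deps):
    """Run each task once all of its prerequisites have completed.

    tasks: dict mapping task name -> zero-argument callable
    deps:  dict mapping task name -> list of prerequisite task names
    Returns the order in which the tasks were executed.
    """
    remaining = {name: len(deps.get(name, [])) for name in tasks}
    dependents = {name: [] for name in tasks}
    for name, prereqs in deps.items():
        for prereq in prereqs:
            dependents[prereq].append(name)

    # Tasks with no unmet prerequisites are ready immediately.
    ready = deque(name for name, count in remaining.items() if count == 0)
    order = []
    while ready:
        name = ready.popleft()
        tasks[name]()
        order.append(name)
        for dependent in dependents[name]:
            remaining[dependent] -= 1
            if remaining[dependent] == 0:
                ready.append(dependent)

    if len(order) != len(tasks):
        raise ValueError("dependency graph contains a cycle")
    return order


# Hypothetical stencil step: interior work does not wait for the halo exchange.
tasks = {
    "recv_halo": lambda: None,
    "compute_interior": lambda: None,
    "compute_boundary": lambda: None,
    "reduce": lambda: None,
}
deps = {
    "compute_boundary": ["recv_halo"],
    "reduce": ["compute_interior", "compute_boundary"],
}
order = run_data_driven(tasks, deps)
```

    Because only a partial order is imposed, the interior computation is free to proceed while the halo data is still in flight, which is the overlap the abstract refers to.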

  13. The test of data driven TDC application in high energy physics experiment

    International Nuclear Information System (INIS)

    Liu Shubin; Guo Jianhua; Zhang Yanli; Zhao Long; An Qi

    2006-01-01

    In the high energy physics domain there is a trend to use integrated, high-resolution, multi-hit time-to-digital converters (TDCs) for time measurement, of which the data-driven TDC is an important direction. Studying how to test the characteristics of a high-performance TDC, and how to improve them, helps in selecting the proper TDC. The authors have studied the testing of a new high-resolution TDC, which is planned for use in the third modification project of the Beijing Spectrometer (BESIII). This paper introduces the test platform built for the TDC and the methods used to test nonlinearity, resolution, double-pulse resolution, etc. The paper also gives the test results and introduces the compensation method used to achieve a very high resolution (24.4 ps). (authors)

  14. Data-driven techniques to estimate parameters in a rate-dependent ferromagnetic hysteresis model

    International Nuclear Information System (INIS)

    Hu Zhengzheng; Smith, Ralph C.; Ernstberger, Jon M.

    2012-01-01

    The quantification of rate-dependent ferromagnetic hysteresis is important in a range of applications including high speed milling using Terfenol-D actuators. There exist a variety of frameworks for characterizing rate-dependent hysteresis including the magnetic model in Ref. , the homogenized energy framework, Preisach formulations that accommodate after-effects, and Prandtl-Ishlinskii models. A critical issue when using any of these models to characterize physical devices concerns the efficient estimation of model parameters through least squares data fits. A crux of this issue is the determination of initial parameter estimates based on easily measured attributes of the data. In this paper, we present data-driven techniques to efficiently and robustly estimate parameters in the homogenized energy model. This framework was chosen due to its physical basis and its applicability to ferroelectric, ferromagnetic and ferroelastic materials.

  15. Data-driven fault detection for industrial processes canonical correlation analysis and projection based methods

    CERN Document Server

    Chen, Zhiwen

    2017-01-01

    Zhiwen Chen aims to develop advanced fault detection (FD) methods for the monitoring of industrial processes. With the ever increasing demands on reliability and safety in industrial processes, fault detection has become an important issue. Although the model-based fault detection theory has been well studied in the past decades, its applications are limited to large-scale industrial processes because it is difficult to build accurate models. Furthermore, motivated by the limitations of existing data-driven FD methods, novel canonical correlation analysis (CCA) and projection-based methods are proposed from the perspectives of process input and output data, less engineering effort and wide application scope. For performance evaluation of FD methods, a new index is also developed. Contents A New Index for Performance Evaluation of FD Methods CCA-based FD Method for the Monitoring of Stationary Processes Projection-based FD Method for the Monitoring of Dynamic Processes Benchmark Study and Real-Time Implementat...

  16. Combining engineering and data-driven approaches: Development of a generic fire risk model facilitating calibration

    DEFF Research Database (Denmark)

    De Sanctis, G.; Fischer, K.; Kohler, J.

    2014-01-01

    Fire risk models support decision making for engineering problems under the consistent consideration of the associated uncertainties. Empirical approaches can be used for cost-benefit studies when enough data about the decision problem are available. But often the empirical approaches are not detailed enough. Engineering risk models, on the other hand, may be detailed but typically involve assumptions that may result in a biased risk assessment and make a cost-benefit study problematic. In two related papers it is shown how engineering and data-driven modeling can be combined by developing a generic risk model that is calibrated to observed fire loss data. Generic risk models assess the risk of buildings based on specific risk indicators and support risk assessment at a portfolio level. After an introduction to the principles of generic risk assessment, the focus of the present paper...

  17. Sensor fault analysis using decision theory and data-driven modeling of pressurized water reactor subsystems

    International Nuclear Information System (INIS)

    Upadhyaya, B.R.; Skorska, M.

    1984-01-01

    Instrument fault detection and estimation is important for process surveillance, control, and safety functions of a power plant. The method incorporates the dual-hypotheses decision procedure and system characterization using data-driven time-domain models of signals representing the system. The multivariate models can be developed on-line and can be adapted to changing system conditions. For the method to be effective, specific subsystems of pressurized water reactors were considered, and signal selection was made such that a strong causal relationship exists among the measured variables. The technique is applied to the reactor core subsystem of the loss-of-fluid test reactor using in-core neutron detector and core-exit thermocouple signals. Thermocouple anomalies such as bias error, noise error, and slow drift in the sensor are detected and estimated using appropriate measurement models

  18. Data-driven process decomposition and robust online distributed modelling for large-scale processes

    Science.gov (United States)

    Shu, Zhang; Lijuan, Li; Lijuan, Yao; Shipin, Yang; Tao, Zou

    2018-02-01

    With increasing attention to networked control, system decomposition and distributed models are becoming significantly important in the implementation of model-based control strategies. In this paper, a data-driven system decomposition and online distributed subsystem modelling algorithm is proposed for large-scale chemical processes. The key controlled variables are first partitioned by the affinity propagation clustering algorithm into several clusters. Each cluster can be regarded as a subsystem. Then the inputs of each subsystem are selected by offline canonical correlation analysis between all process variables and its controlled variables. Process decomposition is then realised after the screening of input and output variables. When the system decomposition is finished, online subsystem modelling can be carried out by recursively renewing the samples block-wise. The proposed algorithm was applied to the Tennessee Eastman process and its validity was verified.

  19. A new data-driven controllability measure with application in intelligent buildings

    DEFF Research Database (Denmark)

    Shaker, Hamid Reza; Lazarova-Molnar, Sanja

    2017-01-01

    ...and instrumentation within today's intelligent buildings enable collecting high quality data which could be used directly in data-based analysis and control methods. The area of data-based systems analysis and control is concentrating on developing analysis and control methods that rely on data collected from meters and sensors, and information obtained by data processing. This differs from the traditional model-based approaches that are based on mathematical models of systems. We propose and describe a data-driven controllability measure for discrete-time linear systems. The concept is developed within a data-based system analysis and control framework. Therefore, only measured data is used to obtain the proposed controllability measure. The proposed controllability measure not only shows if the system is controllable or not, but also reveals the level of controllability, which is the information its previous...

  20. Beyond Crowd Judgments: Data-driven Estimation of Market Value in Association Football

    DEFF Research Database (Denmark)

    Müller, Oliver; Simons, Alexander; Weinmann, Markus

    2017-01-01

    Association football is a popular sport, but it is also a big business. From a managerial perspective, the most important decisions that team managers make concern player transfers, so issues related to player valuation, especially the determination of transfer fees and market values, are of major concern. Market values can be understood as estimates of transfer fees (that is, prices that could be paid for a player on the football market), so they play an important role in transfer negotiations. These values have traditionally been estimated by football experts, but crowdsourcing has emerged... ...’ market values using multilevel regression analysis. The regression results suggest that data-driven estimates of market value can overcome several of the crowd's practical limitations while producing comparably accurate numbers. Our results have important implications for football managers and scouts...

  1. Data-driven design of fault diagnosis systems nonlinear multimode processes

    CERN Document Server

    Haghani Abandan Sari, Adel

    2014-01-01

    In many industrial applications early detection and diagnosis of abnormal behavior of the plant is of great importance. During the last decades, the complexity of process plants has been drastically increased, which imposes great challenges in development of model-based monitoring approaches and it sometimes becomes unrealistic for modern large-scale processes. The main objective of Adel Haghani Abandan Sari is to study efficient fault diagnosis techniques for complex industrial systems using process historical data and considering the nonlinear behavior of the process. To this end, different methods are presented to solve the fault diagnosis problem based on the overall behavior of the process and its dynamics. Moreover, a novel technique is proposed for fault isolation and determination of the root-cause of the faults in the system, based on the fault impacts on the process measurements. Contents Process monitoring Fault diagnosis and fault-tolerant control Data-driven approaches and decision making Target...

  2. Data-driven modeling, control and tools for cyber-physical energy systems

    Science.gov (United States)

    Behl, Madhur

    Energy systems are experiencing a gradual but substantial change in moving away from being non-interactive and manually-controlled systems to utilizing tight integration of both cyber (computation, communications, and control) and physical representations guided by first principles based models, at all scales and levels. Furthermore, peak power reduction programs like demand response (DR) are becoming increasingly important as the volatility on the grid continues to increase due to regulation, integration of renewables and extreme weather conditions. In order to shield themselves from the risk of price volatility, end-user electricity consumers must monitor electricity prices and be flexible in the ways they choose to use electricity. This requires the use of control-oriented predictive models of an energy system's dynamics and energy consumption. Such models are needed for understanding and improving the overall energy efficiency and operating costs. However, learning dynamical models using grey/white box approaches is very cost and time prohibitive since it often requires significant financial investments in retrofitting the system with several sensors and hiring domain experts for building the model. We present the use of data-driven methods for making model capture easy and efficient for cyber-physical energy systems. We develop Model-IQ, a methodology for analysis of uncertainty propagation for building inverse modeling and controls. Given a grey-box model structure and real input data from a temporary set of sensors, Model-IQ evaluates the effect of the uncertainty propagation from sensor data to model accuracy and to closed-loop control performance. We also developed a statistical method to quantify the bias in the sensor measurement and to determine near optimal sensor placement and density for accurate data collection for model training and control. Using a real building test-bed, we show how performing an uncertainty analysis can reveal trends about

  3. Automatic sleep classification using a data-driven topic model reveals latent sleep states

    DEFF Research Database (Denmark)

    Koch, Henriette; Christensen, Julie Anja Engelhard; Frandsen, Rune

    2014-01-01

    Background: The golden standard for sleep classification uses manual scoring of polysomnography despite points of criticism such as oversimplification, low inter-rater reliability and the standard being designed on young and healthy subjects. New method: To meet the criticism and reveal the latent sleep states, this study developed a general and automatic sleep classifier using a data-driven approach. Spectral EEG and EOG measures and eye correlation in 1 s windows were calculated and each sleep epoch was expressed as a mixture of probabilities of latent sleep states by using the topic model Latent Dirichlet Allocation. Model application was tested on control subjects and patients with periodic leg movements (PLM) representing a non-neurodegenerative group, and patients with idiopathic REM sleep behavior disorder (iRBD) and Parkinson's Disease (PD) representing a neurodegenerative group...

  4. Data-driven outbreak forecasting with a simple nonlinear growth model.

    Science.gov (United States)

    Lega, Joceline; Brown, Heidi E

    2016-12-01

    Recent events have thrown the spotlight on infectious disease outbreak response. We developed a data-driven method, EpiGro, which can be applied to cumulative case reports to estimate the order of magnitude of the duration, peak and ultimate size of an ongoing outbreak. It is based on a surprisingly simple mathematical property of many epidemiological data sets, does not require knowledge or estimation of disease transmission parameters, is robust to noise and to small data sets, and runs quickly due to its mathematical simplicity. Using data from historic and ongoing epidemics, we present the model. We also provide modeling considerations that justify this approach and discuss its limitations. In the absence of other information or in conjunction with other models, EpiGro may be useful to public health responders. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  5. A Data-Driven Frequency-Domain Approach for Robust Controller Design via Convex Optimization

    CERN Document Server

    AUTHOR|(CDS)2092751; Martino, Michele

    The objective of this dissertation is to develop data-driven frequency-domain methods for designing robust controllers through the use of convex optimization algorithms. Many of today's industrial processes are becoming more complex, and building accurate physical models for these plants using first principles may be impossible. Even when a model is available, it may be too complex to consider for an appropriate controller design. With the increased developments in the computing world, large amounts of measured data can be easily collected and stored for processing purposes. Data can also be collected and used in an on-line fashion. Thus it would be very sensible to make full use of this data for controller design, performance evaluation, and stability analysis. The design methods proposed in this work ensure that the dynamics of a system are captured in an experiment and avoid the problem of unmodeled dynamics associated with parametric models. The devised methods consider robust designs...

  6. Data-Driven Based Asynchronous Motor Control for Printing Servo Systems

    Science.gov (United States)

    Bian, Min; Guo, Qingyun

    Modern digital printing equipment aims at an environmentally friendly industry with high dynamic performance, high control precision, and low vibration and abrasion. This requires a high-performance motion control system for printing servo systems. A control system for an asynchronous motor based on data acquisition is proposed, and the iterative learning control (ILC) algorithm is studied. PID control is widely used in motion control; however, it is sensitive to disturbances and to variation in model parameters. ILC applies the historical error data and present control signals to approximate the control signal directly, in order to fully track the expected trajectory without knowledge of the system model or structure. A motor control algorithm based on ILC and PID is constructed, and simulation results are given. The results show that the data-driven control method is effective in dealing with bounded disturbances for the motion control of printing servo systems.
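
    How ILC reuses the error history from earlier trials can be shown with a minimal sketch. The abstract does not state the update law, so the code assumes the standard textbook P-type rule u_{k+1}(t) = u_k(t) + gain * e_k(t+1), applied to a hypothetical first-order plant y(t+1) = a*y(t) + b*u(t). The plant, the gain, and the reference are illustrative only and are not taken from the paper.

```python
def simulate(u, a=0.5, b=1.0):
    """Hypothetical first-order plant y(t+1) = a*y(t) + b*u(t), with y(0) = 0."""
    y = [0.0]
    for u_t in u:
        y.append(a * y[-1] + b * u_t)
    return y


def ilc(reference, iterations=100, gain=0.5):
    """P-type ILC: correct each input sample with the previous trial's error
    one step later. The controller never uses the plant parameters a and b."""
    n = len(reference) - 1
    u = [0.0] * n
    history = []  # worst-case tracking error of each trial
    for _ in range(iterations):
        y = simulate(u)
        error = [reference[t] - y[t] for t in range(n + 1)]
        history.append(max(abs(e) for e in error[1:]))
        u = [u[t] + gain * error[t + 1] for t in range(n)]
    return u, history


# Step reference that starts from the plant's initial state.
reference = [0.0] + [1.0] * 10
u, history = ilc(reference)
```

    With these values |1 - gain*b| = 0.5 < 1, which is the usual convergence condition for this update law, so the worst-case tracking error shrinks from trial to trial.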

  7. DOE High Performance Computing Operational Review (HPCOR): Enabling Data-Driven Scientific Discovery at HPC Facilities

    Energy Technology Data Exchange (ETDEWEB)

    Gerber, Richard; Allcock, William; Beggio, Chris; Campbell, Stuart; Cherry, Andrew; Cholia, Shreyas; Dart, Eli; England, Clay; Fahey, Tim; Foertter, Fernanda; Goldstone, Robin; Hick, Jason; Karelitz, David; Kelly, Kaki; Monroe, Laura; Prabhat; Skinner, David; White, Julia

    2014-10-17

    U.S. Department of Energy (DOE) High Performance Computing (HPC) facilities are on the verge of a paradigm shift in the way they deliver systems and services to science and engineering teams. Research projects are producing a wide variety of data at unprecedented scale and level of complexity, with community-specific services that are part of the data collection and analysis workflow. On June 18-19, 2014, representatives from six DOE HPC centers met in Oakland, CA at the DOE High Performance Computing Operational Review (HPCOR) to discuss how they can best provide facilities and services to enable large-scale data-driven scientific discovery at the DOE national laboratories. The report contains findings from that review.

  8. A data-driven fault-tolerant control design of linear multivariable systems with performance optimization.

    Science.gov (United States)

    Li, Zhe; Yang, Guang-Hong

    2017-09-01

    In this paper, an integrated data-driven fault-tolerant control (FTC) design scheme is proposed under the configuration of the Youla parameterization for multiple-input multiple-output (MIMO) systems. With unknown system model parameters, the canonical form identification technique is first applied to design the residual observer in fault-free case. In faulty case, with online tuning of the Youla parameters based on the system data via the gradient-based algorithm, the fault influence is attenuated with system performance optimization. In addition, to improve the robustness of the residual generator to a class of system deviations, a novel adaptive scheme is proposed for the residual generator to prevent its over-activation. Simulation results of a two-tank flow system demonstrate the optimized performance and effect of the proposed FTC scheme. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.

  9. Sequencing quality assessment tools to enable data-driven informatics for high throughput genomics

    Directory of Open Access Journals (Sweden)

    Richard Mark Leggett

    2013-12-01

    The processes of quality assessment and control are an active area of research at The Genome Analysis Centre (TGAC). Unlike other sequencing centres that often concentrate on a certain species or technology, TGAC applies expertise in genomics and bioinformatics to a wide range of projects, often requiring bespoke wet lab and in silico workflows. TGAC is fortunate to have access to a diverse range of sequencing and analysis platforms, and we are at the forefront of investigations into library quality and sequence data assessment. We have developed and implemented a number of algorithms, tools, pipelines and packages to ascertain, store, and expose quality metrics across a number of next-generation sequencing platforms, allowing rapid and in-depth cross-platform QC bioinformatics. In this review, we describe these tools as a vehicle for data-driven informatics, offering the potential to provide richer context for downstream analysis and to inform experimental design.

  10. A data-driven multiplicative fault diagnosis approach for automation processes.

    Science.gov (United States)

    Hao, Haiyang; Zhang, Kai; Ding, Steven X; Chen, Zhiwen; Lei, Yaguo

    2014-09-01

    This paper presents a new data-driven method for diagnosing multiplicative key performance degradation in automation processes. Different from the well-established additive fault diagnosis approaches, the proposed method aims at identifying those low-level components which increase the variability of process variables and cause performance degradation. Based on process data, features of multiplicative fault are extracted. To identify the root cause, the impact of fault on each process variable is evaluated in the sense of contribution to performance degradation. Then, a numerical example is used to illustrate the functionalities of the method and Monte-Carlo simulation is performed to demonstrate the effectiveness from the statistical viewpoint. Finally, to show the practical applicability, a case study on the Tennessee Eastman process is presented. Copyright © 2013. Published by Elsevier Ltd.

  11. Data-driven gradient algorithm for high-precision quantum control

    Science.gov (United States)

    Wu, Re-Bing; Chu, Bing; Owens, David H.; Rabitz, Herschel

    2018-04-01

    In the quest to achieve scalable quantum information processing technologies, gradient-based optimal control algorithms (e.g., GRAPE) are broadly used for implementing high-precision quantum gates, but their performance is often hindered by deterministic or random errors in the system model and the control electronics. In this paper, we show that GRAPE can be taught to be more effective by jointly learning from the design model and the experimental data obtained from process tomography. The resulting data-driven gradient optimization algorithm (d-GRAPE) can in principle correct all deterministic gate errors, with a mild efficiency loss. The d-GRAPE algorithm may become more powerful with broadband controls that involve a large number of control parameters, while other algorithms usually slow down due to the increased size of the search space. These advantages are demonstrated by simulating the implementation of a two-qubit controlled-NOT gate.

  12. USACM Thematic Workshop On Uncertainty Quantification And Data-Driven Modeling.

    Energy Technology Data Exchange (ETDEWEB)

    Stewart, James R. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2017-05-01

    The USACM Thematic Workshop on Uncertainty Quantification and Data-Driven Modeling was held on March 23-24, 2017, in Austin, TX. The organizers of the technical program were James R. Stewart of Sandia National Laboratories and Krishna Garikipati of University of Michigan. The administrative organizer was Ruth Hengst, who serves as Program Coordinator for the USACM. The organization of this workshop was coordinated through the USACM Technical Thrust Area on Uncertainty Quantification and Probabilistic Analysis. The workshop website (http://uqpm2017.usacm.org) includes the presentation agenda as well as links to several of the presentation slides (permission to access the presentations was granted by each of those speakers, respectively). Herein, this final report contains the complete workshop program that includes the presentation agenda, the presentation abstracts, and the list of posters.

  13. Linear dynamical modes as new variables for data-driven ENSO forecast

    Science.gov (United States)

    Gavrilov, Andrey; Seleznev, Aleksei; Mukhin, Dmitry; Loskutov, Evgeny; Feigin, Alexander; Kurths, Juergen

    2018-05-01

    A new data-driven model for analysis and prediction of spatially distributed time series is proposed. The model is based on a linear dynamical mode (LDM) decomposition of the observed data which is derived from a recently developed nonlinear dimensionality reduction approach. The key point of this approach is its ability to take into account simple dynamical properties of the observed system by means of revealing the system's dominant time scales. The LDMs are used as new variables for empirical construction of a nonlinear stochastic evolution operator. The method is applied to the sea surface temperature anomaly field in the tropical belt where the El Niño Southern Oscillation (ENSO) is the main mode of variability. The advantage of LDMs versus traditionally used empirical orthogonal function decomposition is demonstrated for this data. Specifically, it is shown that the new model has a competitive ENSO forecast skill in comparison with the other existing ENSO models.

  14. NOvA Event Building, Buffering and Data-Driven Triggering From Within the DAQ System

    International Nuclear Information System (INIS)

    Fischler, M; Rechenmacher, R; Green, C; Kowalkowski, J; Norman, A; Paterno, M

    2012-01-01

    The NOvA experiment is a long-baseline neutrino experiment designed to make precision probes of the structure of neutrino mixing. The experiment features a unique deadtimeless data acquisition system that is capable of acquiring and building an event data stream from the continuous readout of the more than 360,000 far detector channels. In order to achieve its physics goals, the experiment must be able to buffer, correlate and extract the data in this stream with the beam spills that occur at Fermilab. In addition, the NOvA experiment seeks to enhance its data collection efficiencies for rare classes of event topologies that are valuable for calibration through the use of data-driven triggering. The NOvA-DDT is a prototype Data-Driven Triggering system. NOvA-DDT has been developed using the Fermilab artdaq generic DAQ/Event-building toolkit. This toolkit provides the advantages of sharing online software infrastructure with other Intensity Frontier experiments, and of being able to use any offline analysis module, unchanged, as a component of the online triggering decisions. We have measured the performance and overhead of the NOvA-DDT framework using a Hough transform based trigger decision module developed for the NOvA detector to identify cosmic rays. The results of these tests, which were run on the NOvA prototype near detector, yielded a mean processing time of 98 ms per event, while consuming only 1/16th of the available processing capacity. These results provide a proof of concept that a NOvA-DDT based processing system is a viable strategy for data acquisition and triggering for the NOvA far detector.

  15. Testing the Accuracy of Data-driven MHD Simulations of Active Region Evolution

    Energy Technology Data Exchange (ETDEWEB)

    Leake, James E.; Linton, Mark G. [U.S. Naval Research Laboratory, 4555 Overlook Avenue, SW, Washington, DC 20375 (United States); Schuck, Peter W., E-mail: james.e.leake@nasa.gov [NASA Goddard Space Flight Center, 8800 Greenbelt Road, Greenbelt, MD 20771 (United States)

    2017-04-01

    Models for the evolution of the solar coronal magnetic field are vital for understanding solar activity, yet the best measurements of the magnetic field lie at the photosphere, necessitating the development of coronal models which are “data-driven” at the photosphere. We present an investigation to determine the feasibility and accuracy of such methods. Our validation framework uses a simulation of active region (AR) formation, modeling the emergence of magnetic flux from the convection zone to the corona, as a ground-truth data set, to supply both the photospheric information and to perform the validation of the data-driven method. We focus our investigation on how the accuracy of the data-driven model depends on the temporal frequency of the driving data. The Helioseismic and Magnetic Imager on NASA’s Solar Dynamics Observatory produces full-disk vector magnetic field measurements at a 12-minute cadence. Using our framework we show that ARs that emerge over 25 hr can be modeled by the data-driving method with only ∼1% error in the free magnetic energy, assuming the photospheric information is specified every 12 minutes. However, for rapidly evolving features, under-sampling of the dynamics at this cadence leads to a strobe effect, generating large electric currents and incorrect coronal morphology and energies. We derive a sampling condition for the driving cadence based on the evolution of these small-scale features, and show that higher-cadence driving can lead to acceptable errors. Future work will investigate the source of errors associated with deriving plasma variables from the photospheric magnetograms as well as other sources of errors, such as reduced resolution, instrument bias, and noise.

  16. Data-Driven Design of Intelligent Wireless Networks: An Overview and Tutorial

    Directory of Open Access Journals (Sweden)

    Merima Kulin

    2016-06-01

    Full Text Available Data science or “data-driven research” is a research approach that uses real-life data to gain insight about the behavior of systems. It enables the analysis of small, simple as well as large and more complex systems in order to assess whether they function according to the intended design and as seen in simulation. Data science approaches have been successfully applied to analyze networked interactions in several research areas such as large-scale social networks, advanced business and healthcare processes. Wireless networks can exhibit unpredictable interactions between algorithms from multiple protocol layers, interactions between multiple devices, and hardware specific influences. These interactions can lead to a difference between real-world functioning and design time functioning. Data science methods can help to detect the actual behavior and possibly help to correct it. Data science is increasingly used in wireless research. To support data-driven research in wireless networks, this paper illustrates the step-by-step methodology that has to be applied to extract knowledge from raw data traces. To this end, the paper (i) clarifies when, why and how to use data science in wireless network research; (ii) provides a generic framework for applying data science in wireless networks; (iii) gives an overview of existing research papers that utilized data science approaches in wireless networks; (iv) illustrates the overall knowledge discovery process through an extensive example in which device types are identified based on their traffic patterns; (v) provides the reader the necessary datasets and scripts to go through the tutorial steps themselves.

  17. First-principles data-driven discovery of transition metal oxides for artificial photosynthesis

    Science.gov (United States)

    Yan, Qimin

    We develop a first-principles data-driven approach for rapid identification of transition metal oxide (TMO) light absorbers and photocatalysts for artificial photosynthesis using the Materials Project. Initially focusing on Cr, V, and Mn-based ternary TMOs in the database, we design a broadly-applicable multiple-layer screening workflow automating density functional theory (DFT) and hybrid functional calculations of bulk and surface electronic and magnetic structures. We further assess the electrochemical stability of TMOs in aqueous environments from computed Pourbaix diagrams. Several promising earth-abundant low band-gap TMO compounds with desirable band edge energies and electrochemical stability are identified by our computational efforts and then synergistically evaluated using high-throughput synthesis and photoelectrochemical screening techniques by our experimental collaborators at Caltech. Our joint theory-experiment effort has successfully identified new earth-abundant copper and manganese vanadate complex oxides that meet highly demanding requirements for photoanodes, substantially expanding the known space of such materials. By integrating theory and experiment, we validate our approach and develop important new insights into structure-property relationships for TMOs for oxygen evolution photocatalysts, paving the way for use of first-principles data-driven techniques in future applications. This work is supported by the Materials Project Predictive Modeling Center and the Joint Center for Artificial Photosynthesis through the U.S. Department of Energy, Office of Basic Energy Sciences, Materials Sciences and Engineering Division, under Contract No. DE-AC02-05CH11231. Computational resources also provided by the Department of Energy through the National Energy Supercomputing Center.

  18. Modeling and Predicting Carbon and Water Fluxes Using Data-Driven Techniques in a Forest Ecosystem

    Directory of Open Access Journals (Sweden)

    Xianming Dou

    2017-12-01

    Full Text Available Accurate estimation of carbon and water fluxes of forest ecosystems is of particular importance for addressing the problems originating from global environmental change, and providing helpful information about carbon and water content for analyzing and diagnosing past and future climate change. The main focus of the current work was to investigate the feasibility of four comparatively new methods, including generalized regression neural network, group method of data handling (GMDH), extreme learning machine and adaptive neuro-fuzzy inference system (ANFIS), for elucidating the carbon and water fluxes in a forest ecosystem. A comparison was made between these models and two widely used data-driven models, artificial neural network (ANN) and support vector machine (SVM). All the models were evaluated based on the following statistical indices: coefficient of determination, Nash-Sutcliffe efficiency, root mean square error and mean absolute error. Results indicated that the data-driven models are capable of accounting for most variance in each flux with the limited meteorological variables. The ANN model provided the best estimates for gross primary productivity (GPP) and net ecosystem exchange (NEE), while the ANFIS model achieved the best for ecosystem respiration (R), indicating that no single model was consistently superior to others for the carbon flux prediction. In addition, the GMDH model consistently produced somewhat worse results for all the carbon flux and evapotranspiration (ET) estimations. On the whole, among the carbon and water fluxes, all the models produced similar highly satisfactory accuracy for GPP, R and ET fluxes, and did a reasonable job of reproducing the eddy covariance NEE. Based on these findings, it was concluded that these advanced models are promising alternatives to ANN and SVM for estimating the terrestrial carbon and water fluxes.
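
The four statistical indices named in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' evaluation code: the function name `flux_metrics` is hypothetical, and the coefficient of determination is assumed here to be the squared Pearson correlation between observed and predicted fluxes, which the abstract does not specify.

```python
import math


def flux_metrics(observed, predicted):
    """Return R^2, Nash-Sutcliffe efficiency (NSE), RMSE and MAE for paired series.

    Assumption: R^2 is taken as the squared Pearson correlation.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length series with at least two points")
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n

    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)                  # sum of squared errors
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)     # spread of observations
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)  # spread of predictions
    if ss_obs == 0 or ss_pred == 0:
        raise ValueError("constant series: R^2 and NSE are undefined")

    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    return {
        "r2": cov * cov / (ss_obs * ss_pred),
        "nse": 1.0 - ss_res / ss_obs,  # 1 is a perfect fit; 0 matches the observed mean
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }


# Illustrative numbers only: a prediction with a constant offset of one unit.
metrics = flux_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
```

In this made-up case the correlation-based R^2 is perfect while NSE is low, which is one reason studies of this kind tend to report several indices side by side.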

  19. Preface [HD3-2015: International meeting on high-dimensional data-driven science]

    International Nuclear Information System (INIS)

    2016-01-01

    A never-ending series of innovations in measurement technology and evolutions in information and communication technologies have led to the ongoing generation and accumulation of large quantities of high-dimensional data every day. While detailed data-centric approaches have been pursued in respective research fields, situations have been encountered where the same mathematical framework of high-dimensional data analysis can be found in a wide variety of seemingly unrelated research fields, such as estimation on the basis of undersampled Fourier transform in nuclear magnetic resonance spectroscopy in chemistry, in magnetic resonance imaging in medicine, and in astronomical interferometry in astronomy. In such situations, bringing diverse viewpoints together therefore becomes a driving force for the creation of innovative developments in various different research fields. This meeting focuses on “Sparse Modeling” (SpM) as a methodology for creation of innovative developments through the incorporation of a wide variety of viewpoints in various research fields. The objective of this meeting is to offer a forum where researchers with interest in SpM can assemble and exchange information on the latest results and newly established methodologies, and discuss future directions of the interdisciplinary studies for High-Dimensional Data-Driven science (HD3). The meeting was held in Kyoto from 14-17 December 2015. We are pleased to publish 22 papers contributed by invited speakers in this volume of Journal of Physics: Conference Series. We hope that this volume will promote further development of High-Dimensional Data-Driven science. (paper)

  20. A data-driven weighting scheme for multivariate phenotypic endpoints recapitulates zebrafish developmental cascades

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, Guozhu, E-mail: gzhang6@ncsu.edu [Bioinformatics Research Center, North Carolina State University, Raleigh, NC (United States); Roell, Kyle R., E-mail: krroell@ncsu.edu [Bioinformatics Research Center, North Carolina State University, Raleigh, NC (United States); Truong, Lisa, E-mail: lisa.truong@oregonstate.edu [Department of Environmental and Molecular Toxicology, Sinnhuber Aquatic Research Laboratory, Oregon State University, Corvallis, OR (United States); Tanguay, Robert L., E-mail: robert.tanguay@oregonstate.edu [Department of Environmental and Molecular Toxicology, Sinnhuber Aquatic Research Laboratory, Oregon State University, Corvallis, OR (United States); Reif, David M., E-mail: dmreif@ncsu.edu [Bioinformatics Research Center, North Carolina State University, Raleigh, NC (United States); Department of Biological Sciences, Center for Human Health and the Environment, North Carolina State University, Raleigh, NC (United States)

    2017-01-01

    Zebrafish have become a key alternative model for studying health effects of environmental stressors, partly due to their genetic similarity to humans, fast generation time, and the efficiency of generating high-dimensional systematic data. Studies aiming to characterize adverse health effects in zebrafish typically include several phenotypic measurements (endpoints). While there is a solid biomedical basis for capturing a comprehensive set of endpoints, making summary judgments regarding health effects requires thoughtful integration across endpoints. Here, we introduce a Bayesian method to quantify the informativeness of 17 distinct zebrafish endpoints as a data-driven weighting scheme for a multi-endpoint summary measure, called weighted Aggregate Entropy (wAggE). We implement wAggE using high-throughput screening (HTS) data from zebrafish exposed to five concentrations of all 1060 ToxCast chemicals. Our results show that our empirical weighting scheme provides better performance in terms of the Receiver Operating Characteristic (ROC) curve for identifying significant morphological effects and improves robustness over traditional curve-fitting approaches. From a biological perspective, our results suggest that developmental cascade effects triggered by chemical exposure can be recapitulated by analyzing the relationships among endpoints. Thus, wAggE offers a powerful approach for analysis of multivariate phenotypes that can reveal underlying etiological processes. - Highlights: • Introduced a data-driven weighting scheme for multiple phenotypic endpoints. • Weighted Aggregate Entropy (wAggE) implies differential importance of endpoints. • Endpoint relationships reveal developmental cascade effects triggered by exposure. • wAggE is generalizable to multi-endpoint data of different shapes and scales.

  1. A data-driven prediction method for fast-slow systems

    Science.gov (United States)

    Groth, Andreas; Chekroun, Mickael; Kondrashov, Dmitri; Ghil, Michael

    2016-04-01

    In this work, we present a prediction method for processes that exhibit a mixture of variability on slow and fast scales. The method relies on combining empirical model reduction (EMR) with singular spectrum analysis (SSA). EMR is a data-driven methodology for constructing stochastic low-dimensional models that account for nonlinearity and serial correlation in the estimated noise, while SSA provides a decomposition of the complex dynamics into low-order components that capture spatio-temporal behavior on different time scales. Our study focuses on the data-driven modeling of partial observations from dynamical systems that exhibit power spectra with broad peaks. The main result in this talk is that the combination of SSA pre-filtering with EMR modeling improves, under certain circumstances, the modeling and prediction skill of such a system, as compared to a standard EMR prediction based on raw data. Specifically, it is the separation into "fast" and "slow" temporal scales by the SSA pre-filtering that achieves the improvement. We show, in particular, that the resulting EMR-SSA emulators help predict intermittent behavior such as rapid transitions between specific regions of the system's phase space. This capability of the EMR-SSA prediction will be demonstrated on two low-dimensional models: the Rössler system and a Lotka-Volterra model for interspecies competition. In either case, the chaotic dynamics is produced through a Shilnikov-type mechanism and we argue that the latter seems to be an important ingredient for the good prediction skills of EMR-SSA emulators. Shilnikov-type behavior has been shown to arise in various complex geophysical fluid models, such as baroclinic quasi-geostrophic flows in the mid-latitude atmosphere and wind-driven double-gyre ocean circulation models. This pervasiveness of the Shilnikov mechanism of fast-slow transition opens interesting perspectives for the extension of the proposed EMR-SSA approach to more realistic situations.

  2. BatchJS: Implementing Batches in JavaScript

    NARCIS (Netherlands)

    D. Kasemier

    2014-01-01

    None of our popular programming languages know how to handle distribution well. Yet our programs interact more and more with each other and our data resides in databases and web services. Batches are a new addition to languages that can finally bring native support for distribution to

  3. Fork-join and data-driven execution models on multi-core architectures: Case study of the FMM

    KAUST Repository

    Amer, Abdelhalim

    2013-01-01

    Extracting maximum performance of multi-core architectures is a difficult task primarily due to bandwidth limitations of the memory subsystem and its complex hierarchy. In this work, we study the implications of fork-join and data-driven execution models on this type of architecture at the level of task parallelism. For this purpose, we use a highly optimized fork-join based implementation of the FMM and extend it to a data-driven implementation using a distributed task scheduling approach. This study exposes some limitations of the conventional fork-join implementation in terms of synchronization overheads. We find that these are not negligible and their elimination by the data-driven method, with a careful data locality strategy, was beneficial. Experimental evaluation of both methods on state-of-the-art multi-socket multi-core architectures showed up to 22% speed-ups of the data-driven approach compared to the original method. We demonstrate that a data-driven execution of FMM not only improves performance by avoiding global synchronization overheads but also reduces the memory-bandwidth pressure caused by memory-intensive computations. © 2013 Springer-Verlag.

  4. Tracking Invasive Alien Species (TrIAS): Building a data-driven framework to inform policy

    Directory of Open Access Journals (Sweden)

    Sonia Vanderhoeven

    2017-05-01

    Full Text Available Imagine a future where dynamically, from year to year, we can track the progression of alien species (AS), identify emerging problem species, assess their current and future risk and timely inform policy in a seamless data-driven workflow. One that is built on open science and open data infrastructures. By using international biodiversity standards and facilities, we would ensure interoperability, repeatability and sustainability. This would make the process adaptable to future requirements in an evolving AS policy landscape both locally and internationally. In recent years, Belgium has developed decision support tools to inform invasive alien species (IAS) policy, including information systems, early warning initiatives and risk assessment protocols. However, the current workflows from biodiversity observations to IAS science and policy are slow, not easily repeatable, and their scope is often taxonomically, spatially and temporally limited. This is mainly caused by the diversity of actors involved and the closed, fragmented nature of the sources of these biodiversity data, which leads to considerable knowledge gaps for IAS research and policy. We will leverage expertise and knowledge from nine former and current BELSPO projects and initiatives: Alien Alert, Invaxen, Diars, INPLANBEL, Alien Impact, Ensis, CORDEX.be, Speedy and the Belgian Biodiversity Platform. The project will be built on two components: 1) The establishment of a data mobilization framework for AS data from diverse data sources and 2) the development of data-driven procedures for risk evaluation based on risk modelling, risk mapping and risk assessment. We will use facilities from the Global Biodiversity Information Facility (GBIF), standards from the Biodiversity Information Standards organization (TDWG) and expertise from Lifewatch to create and facilitate a systematic workflow. Alien species data will be gathered from a large set of regional, national and international

  5. Towards Data-Driven Simulations of Wildfire Spread using Ensemble-based Data Assimilation

    Science.gov (United States)

    Rochoux, M. C.; Bart, J.; Ricci, S. M.; Cuenot, B.; Trouvé, A.; Duchaine, F.; Morel, T.

    2012-12-01

    Real-time predictions of a propagating wildfire remain a challenging task because the problem involves both multi-physics and multi-scales. The propagation speed of wildfires, also called the rate of spread (ROS), is indeed determined by complex interactions between pyrolysis, combustion and flow dynamics, and atmospheric dynamics occurring at vegetation, topographical and meteorological scales. Current operational fire spread models are mainly based on a semi-empirical parameterization of the ROS in terms of vegetation, topographical and meteorological properties. For the fire spread simulation to be predictive and compatible with operational applications, the uncertainty on the ROS model should be reduced. As recent progress made in remote sensing technology provides new ways to monitor the fire front position, a promising approach to overcome the difficulties found in wildfire spread simulations is to integrate fire modeling and fire sensing technologies using data assimilation (DA). For this purpose we have developed a prototype data-driven wildfire spread simulator in order to provide optimal estimates of poorly known model parameters [*]. The data-driven simulation capability is adapted for more realistic wildfire spread: it considers a regional-scale fire spread model that is informed by observations of the fire front location. An Ensemble Kalman Filter algorithm (EnKF) based on a parallel computing platform (OpenPALM) was implemented in order to perform a multi-parameter sequential estimation where wind magnitude and direction are estimated in addition to vegetation properties (see attached figure). The EnKF algorithm tracks a small-scale grassland fire experiment well and accounts properly for the sensitivity of the simulation outcomes to the control parameters. In conclusion, it was shown that data assimilation is a promising approach to more accurately forecast time-varying wildfire spread conditions as new airborne-like observations of
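
The parameter-estimation idea in this abstract can be illustrated with a one-parameter toy problem. The sketch below is not the authors' OpenPALM implementation: it assumes a hypothetical front model in which front position equals ROS times elapsed time, a single scalar observation of the front, and the perturbed-observation form of the EnKF analysis step. All names and numbers are illustrative.

```python
import random
import statistics


def enkf_update(params, forecasts, observation, obs_std, rng):
    """One stochastic (perturbed-observation) EnKF analysis step for a scalar
    parameter and a scalar observation."""
    n = len(params)
    mean_p = statistics.fmean(params)
    mean_f = statistics.fmean(forecasts)
    # Ensemble estimates of cov(parameter, forecast) and var(forecast).
    cov_pf = sum((p - mean_p) * (f - mean_f) for p, f in zip(params, forecasts)) / (n - 1)
    var_f = sum((f - mean_f) ** 2 for f in forecasts) / (n - 1)
    gain = cov_pf / (var_f + obs_std ** 2)  # Kalman gain
    # Each member is nudged toward its own noisy copy of the observation.
    return [p + gain * (observation + rng.gauss(0.0, obs_std) - f)
            for p, f in zip(params, forecasts)]


rng = random.Random(0)   # fixed seed so the sketch is repeatable
true_ros = 1.5           # hypothetical "true" rate of spread (m/s)
elapsed = 10.0           # seconds since ignition
obs_std = 0.5            # assumed observation error on front position (m)

# Prior ensemble of ROS values, deliberately biased low.
prior = [rng.gauss(1.0, 0.3) for _ in range(50)]
# Toy forward model (an assumption): front position = ROS * elapsed time.
forecasts = [ros * elapsed for ros in prior]
observation = true_ros * elapsed

posterior = enkf_update(prior, forecasts, observation, obs_std, rng)
prior_mean = statistics.fmean(prior)
posterior_mean = statistics.fmean(posterior)
```

After the update the ensemble mean moves toward the value that generated the observation and the ensemble spread shrinks. A sequential filter of the kind the abstract describes would repeat this step each time a new front observation arrives, over several parameters at once.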

  6. A non-linear dimension reduction methodology for generating data-driven stochastic input models

    Science.gov (United States)

    Ganapathysubramanian, Baskar; Zabaras, Nicholas

    2008-06-01

    Stochastic analysis of random heterogeneous media (polycrystalline materials, porous media, functionally graded materials) provides information of significance only if realistic input models of the topology and property variations are used. This paper proposes a framework to construct such input stochastic models for the topology and thermal diffusivity variations in heterogeneous media using a data-driven strategy. Given a set of microstructure realizations (input samples) generated from given statistical information about the medium topology, the framework constructs a reduced-order stochastic representation of the thermal diffusivity. This problem of constructing a low-dimensional stochastic representation of property variations is analogous to the problem of manifold learning and parametric fitting of hyper-surfaces encountered in image processing and psychology. Denote by M the set of microstructures that satisfy the given experimental statistics. A non-linear dimension reduction strategy is utilized to map M to a low-dimensional region, A. We first show that M is a compact manifold embedded in a high-dimensional input space R^n. An isometric mapping F from M to a low-dimensional, compact, connected set A ⊂ R^d (d ≪ n) is constructed. Given only a finite set of samples of the data, the methodology uses arguments from graph theory and differential geometry to construct the isometric transformation F: M → A. Asymptotic convergence of the representation of M by A is shown. This mapping F serves as an accurate, low-dimensional, data-driven representation of the property variations. The reduced-order model of the material topology and thermal diffusivity variations is subsequently used as an input in the solution of stochastic partial differential equations that describe the evolution of dependent variables. A sparse grid collocation strategy (Smolyak algorithm) is utilized to solve these stochastic equations efficiently. We showcase the methodology by constructing low

  7. Developing a Data Driven Process-Based Model for Remote Sensing of Ecosystem Production

    Science.gov (United States)

    Elmasri, B.; Rahman, A. F.

    2010-12-01

    Estimating ecosystem carbon fluxes at various spatial and temporal scales is essential for quantifying the global carbon cycle. Numerous models have been developed for this purpose using several environmental variables as well as vegetation indices derived from remotely sensed data. Here we present a data-driven modeling approach for gross primary production (GPP) that is based on the process-based model BIOME-BGC. The proposed model was run using available remote sensing data and it does not depend on look-up tables. Furthermore, this approach combines the merits of both empirical and process models, and empirical models were used to estimate certain input variables such as light use efficiency (LUE). This was achieved by applying remotely sensed data to the mathematical equations that represent biophysical photosynthesis processes in the BIOME-BGC model. Moreover, a new spectral index for estimating maximum photosynthetic activity, maximum photosynthetic rate index (MPRI), is also developed and presented here. This new index is based on the ratio between the near infrared and the green bands (ρ858.5/ρ555). The model was tested and validated against MODIS GPP product and flux measurements from two eddy covariance flux towers located at Morgan Monroe State Forest (MMSF) in Indiana and Harvard Forest in Massachusetts. Satellite data acquired by the Advanced Microwave Scanning Radiometer (AMSR-E) and MODIS were used. The data-driven model showed a strong correlation between the predicted and measured GPP at the two eddy covariance flux tower sites. This methodology produced better predictions of GPP than did the MODIS GPP product. Moreover, the proportion of error in the predicted GPP for MMSF and Harvard Forest was dominated by unsystematic errors, suggesting that the results are unbiased. The analysis indicated that maintenance respiration is one of the main factors that dominate the overall model outcome errors and improvement in maintenance respiration estimation
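
The band ratio that this abstract gives for MPRI (ρ858.5/ρ555) is simple enough to write out. The sketch below assumes the subscripts are band-centre wavelengths in nanometres and that the inputs are surface reflectances; the function name and the sample values are hypothetical, and any scaling or calibration applied in the original work is not reproduced.

```python
def mpri(nir_reflectance, green_reflectance):
    """Maximum photosynthetic rate index as the plain ratio of near-infrared
    reflectance to green reflectance, following the ratio quoted in the abstract."""
    if green_reflectance <= 0:
        raise ValueError("green reflectance must be positive")
    return nir_reflectance / green_reflectance


# Illustrative reflectances for a dense canopy (made-up values).
index = mpri(nir_reflectance=0.40, green_reflectance=0.08)
```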

  8. Data-driven models of dominantly-inherited Alzheimer's disease progression.

    Science.gov (United States)

    Oxtoby, Neil P; Young, Alexandra L; Cash, David M; Benzinger, Tammie L S; Fagan, Anne M; Morris, John C; Bateman, Randall J; Fox, Nick C; Schott, Jonathan M; Alexander, Daniel C

    2018-03-22

    Dominantly-inherited Alzheimer's disease is widely hoped to hold the key to developing interventions for sporadic late onset Alzheimer's disease. We use emerging techniques in generative data-driven disease progression modelling to characterize dominantly-inherited Alzheimer's disease progression with unprecedented resolution, and without relying upon familial estimates of years until symptom onset. We retrospectively analysed biomarker data from the sixth data freeze of the Dominantly Inherited Alzheimer Network observational study, including measures of amyloid proteins and neurofibrillary tangles in the brain, regional brain volumes and cortical thicknesses, brain glucose hypometabolism, and cognitive performance from the Mini-Mental State Examination (all adjusted for age, years of education, sex, and head size, as appropriate). Data included 338 participants with known mutation status (211 mutation carriers in three subtypes: 163 PSEN1, 17 PSEN2, and 31 APP) and a baseline visit (age 19-66; up to four visits each, 1.1 ± 1.9 years in duration; spanning 30 years before, to 21 years after, parental age of symptom onset). We used an event-based model to estimate sequences of biomarker changes from baseline data across disease subtypes (mutation groups), and a differential equation model to estimate biomarker trajectories from longitudinal data (up to 66 mutation carriers, all subtypes combined). The two models concur that biomarker abnormality proceeds as follows: amyloid deposition in cortical then subcortical regions (∼24 ± 11 years before onset); phosphorylated tau (17 ± 8 years), tau and amyloid-β changes in cerebrospinal fluid; neurodegeneration first in the putamen and nucleus accumbens (up to 6 ± 2 years); then cognitive decline (7 ± 6 years), cerebral hypometabolism (4 ± 4 years), and further regional neurodegeneration. Our models predicted symptom onset more accurately than predictions that used familial estimates: root mean squared error of 1

  9. WIFIRE: A Scalable Data-Driven Monitoring, Dynamic Prediction and Resilience Cyberinfrastructure for Wildfires

    Science.gov (United States)

    Altintas, I.; Block, J.; Braun, H.; de Callafon, R. A.; Gollner, M. J.; Smarr, L.; Trouve, A.

    2013-12-01

    Recent studies confirm that climate change will cause wildfires to increase in frequency and severity in the coming decades, especially in California and much of the North American West. The most critical sustainability issue in the midst of these ever-changing dynamics is how to achieve a new social-ecological equilibrium of this fire ecology. Wildfire wind speeds and directions change in an instant, and first responders can only be effective when they take action as quickly as the conditions change. To deliver information needed for sustainable policy and management in this dynamically changing fire regime, we must capture these details to understand the environmental processes. We are building an end-to-end cyberinfrastructure (CI), called WIFIRE, for real-time and data-driven simulation, prediction and visualization of wildfire behavior. The WIFIRE integrated CI system supports social-ecological resilience to the changing fire ecology regime in the face of urban dynamics and climate change. Networked observations, e.g., heterogeneous satellite data and real-time remote sensor data, are integrated with computational techniques in signal processing, visualization, modeling and data assimilation to provide a scalable, technological, and educational solution to monitor weather patterns to predict a wildfire's Rate of Spread. Our collaborative WIFIRE team of scientists, engineers, technologists, government policy managers, private industry, and firefighters architects and implements CI pathways that enable joint innovation for wildfire management. Scientific workflows are used as an integrative distributed programming model and simplify the implementation of engineering modules for data-driven simulation, prediction and visualization while allowing integration with large-scale computing facilities. WIFIRE will be scalable to users with different skill levels via specialized web interfaces and user-specified alerts for environmental events broadcast to receivers before

  10. A non-linear dimension reduction methodology for generating data-driven stochastic input models

    International Nuclear Information System (INIS)

    Ganapathysubramanian, Baskar; Zabaras, Nicholas

    2008-01-01

    Stochastic analysis of random heterogeneous media (polycrystalline materials, porous media, functionally graded materials) provides information of significance only if realistic input models of the topology and property variations are used. This paper proposes a framework to construct such input stochastic models for the topology and thermal diffusivity variations in heterogeneous media using a data-driven strategy. Given a set of microstructure realizations (input samples) generated from given statistical information about the medium topology, the framework constructs a reduced-order stochastic representation of the thermal diffusivity. This problem of constructing a low-dimensional stochastic representation of property variations is analogous to the problem of manifold learning and parametric fitting of hyper-surfaces encountered in image processing and psychology. Denote by M the set of microstructures that satisfy the given experimental statistics. A non-linear dimension reduction strategy is utilized to map M to a low-dimensional region, A. We first show that M is a compact manifold embedded in a high-dimensional input space R^n. An isometric mapping F from M to a low-dimensional, compact, connected set A ⊂ R^d (d << n) is constructed. Given only a finite set of samples of the data, the methodology uses arguments from graph theory and differential geometry to construct the isometric transformation F:M→A. Asymptotic convergence of the representation of M by A is shown. This mapping F serves as an accurate, low-dimensional, data-driven representation of the property variations. The reduced-order model of the material topology and thermal diffusivity variations is subsequently used as an input in the solution of stochastic partial differential equations that describe the evolution of dependent variables. A sparse grid collocation strategy (Smolyak algorithm) is utilized to solve these stochastic equations efficiently. We showcase the methodology

  11. Enabling Data-Driven Methodologies Across the Data Lifecycle and Ecosystem

    Science.gov (United States)

    Doyle, R. J.; Crichton, D.

    2017-12-01

    NASA has unlocked unprecedented scientific knowledge through exploration of the Earth, our solar system, and the larger universe. NASA is generating enormous amounts of data that are challenging traditional approaches to capturing, managing, analyzing and ultimately gaining scientific understanding from science data. New architectures, capabilities and methodologies are needed to span the entire observing system, from spacecraft to archive, while integrating data-driven discovery and analytic capabilities. NASA data have a definable lifecycle, from remote collection point to validated accessibility in multiple archives. Data challenges must be addressed across this lifecycle, to capture opportunities and avoid decisions that may limit or compromise what is achievable once data arrives at the archive. Data triage may be necessary when the collection capacity of the sensor or instrument overwhelms data transport or storage capacity. By migrating computational and analytic capability to the point of data collection, informed decisions can be made about which data to keep; in some cases, to close observational decision loops onboard, to enable attending to unexpected or transient phenomena. Along a different dimension than the data lifecycle, scientists and other end-users must work across an increasingly complex data ecosystem, where the range of relevant data is rarely owned by a single institution. To operate effectively, scalable data architectures and community-owned information models become essential. NASA's Planetary Data System is having success with this approach. Finally, there is the difficult challenge of reproducibility and trust. While data provenance techniques will be part of the solution, future interactive analytics environments must support an ability to provide a basis for a result: relevant data source and algorithms, uncertainty tracking, etc., to assure scientific integrity and to enable confident decision making. Advances in data science offer

  12. Simulated Batch Production of Penicillin

    Science.gov (United States)

    Whitaker, A.; Walker, J. D.

    1973-01-01

    Describes a program in applied biology in which the simulation of the production of penicillin in a batch fermentor is used as a teaching technique to give students experience before handling a genuine industrial fermentation process. Details are given for the calculation of minimum production cost. (JR)

  13. NDA BATCH 2002-02

    Energy Technology Data Exchange (ETDEWEB)

    Lawrence Livermore National Laboratory

    2009-12-09

    QC sample results (daily background checks, 20-gram and 100-gram SGS drum checks) were within acceptable criteria established by WIPP's Quality Assurance Objectives for TRU Waste Characterization. Replicate runs were performed on 5 drums with IDs LL85101099TRU, LL85801147TRU, LL85801109TRU, LL85300999TRU and LL85500979TRU. All replicate measurement results are identical at the 95% confidence level as established by WIPP criteria. Note that the batch covered 5 weeks of SGS measurements from 23-Jan-2002 through 22-Feb-2002. Data packet for SGS Batch 2002-02 generated using gamma spectroscopy with the Pu Facility SGS unit is technically reasonable. All QC samples are in compliance with established control limits. The batch data packet has been reviewed for correctness, completeness, consistency and compliance with WIPP's Quality Assurance Objectives and determined to be acceptable. An Expert Review was performed on the data packet between 28-Feb-02 and 09-Jul-02 to check for potential U-235, Np-237 and Am-241 interferences and address drum cases where specific scan segments showed Se gamma ray transmissions for the 136-keV gamma to be below 0.1 %. Two drums in the batch showed Pu-238 at a relative mass ratio more than 2% of all the Pu isotopes.

  14. Batching System for Superior Service

    Science.gov (United States)

    2001-01-01

    Veridian's Portable Batch System (PBS) was the recipient of the 1997 NASA Space Act Award for outstanding software. A batch system is a set of processes for managing queues and jobs. Without a batch system, it is difficult to manage the workload of a computer system. By bundling the enterprise's computing resources, the PBS technology offers users a single coherent interface, resulting in efficient management of the batch services. Users choose which information to package into "containers" for system-wide use. PBS also provides detailed system usage data, a procedure not easily executed without this software. PBS operates on networked, multi-platform UNIX environments. Veridian's new version, PBS Pro™, has additional features and enhancements, including support for additional operating systems. Veridian distributes the original version of PBS as Open Source software via the PBS website. Customers can register and download the software at no cost. PBS Pro is also available via the web and offers additional features such as increased stability, reliability, and fault tolerance. A company using PBS can expect a significant increase in the effective management of its computing resources. Tangible benefits include increased utilization of costly resources and enhanced understanding of computational requirements and user needs.

  15. A Data-Driven Reliability Estimation Approach for Phased-Mission Systems

    Directory of Open Access Journals (Sweden)

    Hua-Feng He

    2014-01-01

    We attempt to address the issues associated with reliability estimation for phased-mission systems (PMS) and present a novel data-driven approach to reliability estimation for PMS using the condition monitoring information and degradation data of such systems under dynamic operating scenarios. In this sense, this paper differs from existing methods, which consider only the static scenario without using real-time information and aim to estimate the reliability of a population rather than of an individual. In the presented approach, to establish a linkage between the historical data and real-time information of the individual PMS, we adopt a stochastic filtering model to model the phase duration and obtain the updated estimation of the mission time by Bayesian law at each phase. Meanwhile, the lifetime of the PMS is estimated from degradation data, which are modeled by an adaptive Brownian motion. As such, the mission reliability can be obtained in real time through the estimated distribution of the mission time in conjunction with the estimated lifetime distribution. We demonstrate the usefulness of the developed approach via a numerical example.

  16. Data-driven classification of bipolar I disorder from longitudinal course of mood.

    Science.gov (United States)

    Cochran, A L; McInnis, M G; Forger, D B

    2016-10-11

    The Diagnostic and Statistical Manual of Mental Disorder (DSM) classification of bipolar disorder defines categories to reflect common understanding of mood symptoms rather than scientific evidence. This work aimed to determine whether bipolar I can be objectively classified from longitudinal mood data and whether resulting classes have clinical associations. Bayesian nonparametric hierarchical models with latent classes and patient-specific models of mood are fit to data from Longitudinal Interval Follow-up Evaluations (LIFE) of bipolar I patients (N=209). Classes are tested for clinical associations. No classes are justified using the time course of DSM-IV mood states. Three classes are justified using the course of subsyndromal mood symptoms. Classes differed in attempted suicides (P=0.017), disability status (P=0.012) and chronicity of affective symptoms (P=0.009). Thus, bipolar I disorder can be objectively classified from mood course, and individuals in the resulting classes share clinical features. Data-driven classification from mood course could be used to enrich sample populations for pharmacological and etiological studies.

  17. Data-driven automatic parking constrained control for four-wheeled mobile vehicles

    Directory of Open Access Journals (Sweden)

    Wenxu Yan

    2016-11-01

    In this article, a novel data-driven constrained control scheme is proposed for automatic parking systems. The design of the proposed scheme depends only on the steering angle and the orientation angle of the car, and it does not involve any model information of the car. Therefore, an automatic parking system based on the proposed scheme is applicable to different kinds of cars. In order to further reduce the tracking errors of the desired trajectory coordinates, a coordinates compensation algorithm is also proposed. In the design procedure of the controller, a novel dynamic anti-windup compensator is used to deal with the magnitude and rate saturations of the automatic parking control input. It is theoretically proven, based on Lyapunov stability analysis, that all the signals in the closed-loop system are uniformly ultimately bounded. Finally, a simulation comparison between the proposed scheme with coordinates compensation and the Proportional-Integral-Derivative (PID) control algorithm is given. It is shown that the proposed scheme with coordinates compensation has smaller tracking errors and more rapid responses than the PID scheme.

  18. Data-Driven Optimization of Incentive-based Demand Response System with Uncertain Responses of Customers

    Directory of Open Access Journals (Sweden)

    Jimyung Kang

    2017-10-01

    Demand response is nowadays considered as another type of generator, beyond just a simple peak reduction mechanism. A demand response service provider (DRSP) can, through its subcontracts with many energy customers, virtually generate electricity with actual load reduction. However, in this type of virtual generator, the amount of load reduction includes inevitable uncertainty, because it consists of a very large number of independent energy customers. While they may reduce energy today, they might not tomorrow. In this circumstance, a DRSP must choose a proper set of these uncertain customers to achieve the exact preferred amount of load curtailment. In this paper, the customer selection problem for a service provider that consists of uncertain responses of customers is defined and solved. The uncertainty of energy reduction is fully considered in the formulation with data-driven probability distribution modeling and a stochastic programming technique. The proposed optimization method, which utilizes only the observed load data, provides a realistic and applicable solution to a demand response system. The performance of the proposed optimization is verified with real demand response event data in Korea, and the results show increased and stabilized performance from the service provider’s perspective.

  19. The Facilitation of a Sustainable Power System: A Practice from Data-Driven Enhanced Boiler Control

    Directory of Open Access Journals (Sweden)

    Zhenlong Wu

    2018-04-01

    An increasing penetration of renewable energy may bring significant challenges to a power system due to its inherent intermittency. To achieve a sustainable future for renewable energy, a conventional power plant is required to be able to change its power output rapidly for a grid balance purpose. However, the rapid power change may result in the boiler operating in a dangerous manner. To this end, this paper aims to improve boiler control performance via a data-driven control strategy, namely Active Disturbance Rejection Control (ADRC). For practical implementation, a tuning method is developed for ADRC controller parameters to maximize its potential in controlling a boiler operating in different conditions. Based on a Monte Carlo simulation, a Probabilistic Robustness (PR) index is subsequently formulated to represent the controller’s sensitivity to the varying conditions. The stability region of the ADRC controller is depicted to provide the search space in which the optimal group of parameters is searched for based on the PR index. Illustrative simulations are performed to verify the efficacy of the proposed method. Finally, the proposed method is experimentally applied to a boiler’s secondary air control system successfully. The results of the field application show that the proposed ADRC based on PR can ensure the expected control performance even though it works in a wider range of operating conditions. The field application depicts a promising future for the ADRC controller as an alternative solution in the power industry to integrate more renewable energy into the power grid.

  20. Data-driven analysis of functional brain interactions during free listening to music and speech.

    Science.gov (United States)

    Fang, Jun; Hu, Xintao; Han, Junwei; Jiang, Xi; Zhu, Dajiang; Guo, Lei; Liu, Tianming

    2015-06-01

    Natural stimulus functional magnetic resonance imaging (N-fMRI), such as fMRI acquired when participants were watching video streams or listening to audio streams, has been increasingly used to investigate functional mechanisms of the human brain in recent years. One of the fundamental challenges in functional brain mapping based on N-fMRI is to model the brain's functional responses to continuous, naturalistic and dynamic natural stimuli. To address this challenge, in this paper we present a data-driven approach to exploring functional interactions in the human brain during free listening to music and speech streams. Specifically, we model the brain responses using N-fMRI by measuring the functional interactions on large-scale brain networks with intrinsically established structural correspondence, and perform music and speech classification tasks to guide the systematic identification of consistent and discriminative functional interactions when multiple subjects were listening to music and speech in multiple categories. The underlying premise is that the functional interactions derived from N-fMRI data of multiple subjects should exhibit both consistency and discriminability. Our experimental results show that a variety of brain systems including attention, memory, auditory/language, emotion, and action networks are among the most relevant brain systems involved in classic music, pop music and speech differentiation. Our study provides an alternative approach to investigating the human brain's mechanism in comprehension of complex natural music and speech.

  1. Forecasting success via early adoptions analysis: A data-driven study.

    Directory of Open Access Journals (Sweden)

    Giulio Rossetti

    Innovations are continuously launched over markets, such as new products over the retail market or new artists over the music scene. Some innovations become a success; others don't. Forecasting which innovations will succeed at the beginning of their lifecycle is hard. In this paper, we provide a data-driven, large-scale account of the existence of a special niche among early adopters, individuals that consistently tend to adopt successful innovations before they reach success: we will call them Hit-Savvy. Hit-Savvy can be discovered in very different markets and retain over time their ability to anticipate the success of innovations. As our second contribution, we devise a predictive analytical process, exploiting Hit-Savvy as signals, which achieves high accuracy in the early-stage prediction of successful innovations, far beyond the reach of state-of-the-art time series forecasting models. Indeed, our findings and predictive model can be fruitfully used to support marketing strategies and product placement.

  2. A Data-Driven Response Virtual Sensor Technique with Partial Vibration Measurements Using Convolutional Neural Network

    Science.gov (United States)

    Sun, Shan-Bin; He, Yuan-Yuan; Zhou, Si-Da; Yue, Zhen-Jiang

    2017-01-01

    Measurement of dynamic responses plays an important role in structural health monitoring, damage detection and other fields of research. However, in aerospace engineering, the physical sensors are limited in the operational conditions of spacecraft, due to the severe environment in outer space. This paper proposes a virtual sensor model with partial vibration measurements using a convolutional neural network. The transmissibility function is employed as prior knowledge. A four-layer neural network with two convolutional layers, one fully connected layer, and an output layer is proposed as the predicting model. Numerical examples of two different structural dynamic systems demonstrate the performance of the proposed approach. The advantage of the novel technique is further shown in a simply supported beam experiment by comparison with a modal-model-based virtual sensor, which uses modal parameters, such as mode shapes, for estimating the responses of the faulty sensors. The results show that the presented data-driven response virtual sensor technique can predict structural response with high accuracy. PMID:29231868

  3. Optimizing preventive maintenance policy: A data-driven application for a light rail braking system.

    Science.gov (United States)

    Corman, Francesco; Kraijema, Sander; Godjevac, Milinko; Lodewijks, Gabriel

    2017-10-01

    This article presents a case study determining the optimal preventive maintenance policy for a light rail rolling stock system in terms of reliability, availability, and maintenance costs. The maintenance policy defines one of the three predefined preventive maintenance actions at fixed time-based intervals for each of the subsystems of the braking system. Based on work, maintenance, and failure data, we model the reliability degradation of the system and its subsystems under the current maintenance policy by a Weibull distribution. We then analytically determine the relation between reliability, availability, and maintenance costs. We validate the model against recorded reliability and availability and get further insights by a dedicated sensitivity analysis. The model is then used in a sequential optimization framework determining preventive maintenance intervals to improve on the key performance indicators. We show the potential of data-driven modelling to determine optimal maintenance policy: same system availability and reliability can be achieved with 30% maintenance cost reduction, by prolonging the intervals and re-grouping maintenance actions.
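
    As a rough illustration of the Weibull reliability model mentioned in the abstract above, the sketch below evaluates the two-parameter Weibull survival function and inverts it to find the longest maintenance interval that keeps reliability above a target. The function names and the shape and scale values are hypothetical and are not taken from the article, which also models availability and cost.

```python
import math

def weibull_reliability(t, shape, scale):
    """Probability that a component survives past time t: R(t) = exp(-(t/scale)^shape)."""
    return math.exp(-((t / scale) ** shape))

def max_interval_for_target(target, shape, scale):
    """Longest interval t with R(t) >= target, from inverting the formula above."""
    return scale * (-math.log(target)) ** (1.0 / shape)

# Hypothetical parameters (assumptions for illustration only).
shape, scale = 2.0, 1000.0   # shape > 1 models wear-out; scale in operating hours
interval = max_interval_for_target(0.9, shape, scale)
print(f"interval for 90% reliability: {interval:.1f} h")
```

    Under these assumptions, lengthening the interval trades reliability for fewer maintenance actions, which is the kind of trade-off the abstract describes optimizing.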

  4. Applying Data-driven Imaging Biomarker in Mammography for Breast Cancer Screening: Preliminary Study.

    Science.gov (United States)

    Kim, Eun-Kyung; Kim, Hyo-Eun; Han, Kyunghwa; Kang, Bong Joo; Sohn, Yu-Mee; Woo, Ok Hee; Lee, Chan Wha

    2018-02-09

    We assessed the feasibility of a data-driven imaging biomarker based on weakly supervised learning (DIB; an imaging biomarker derived from large-scale medical image data with deep learning technology) in mammography (DIB-MG). A total of 29,107 digital mammograms from five institutions (4,339 cancer cases and 24,768 normal cases) were included. After matching patients' age, breast density, and equipment, 1,238 and 1,238 cases were chosen as validation and test sets, respectively, and the remainder were used for training. The core algorithm of DIB-MG is a deep convolutional neural network, a deep learning algorithm specialized for images. Each sample (case) is an exam composed of 4-view images (RCC, RMLO, LCC, and LMLO). For each case in a training set, the cancer probability inferred from DIB-MG is compared with the per-case ground-truth label. Then the model parameters in DIB-MG are updated based on the error between the prediction and the ground-truth. At the operating point (threshold) of 0.5, sensitivity was 75.6% and 76.1% when specificity was 90.2% and 88.5%, and AUC was 0.903 and 0.906 for the validation and test sets, respectively. This research showed the potential of DIB-MG as a screening tool for breast cancer.
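
    The operating-point figures quoted above can be made concrete with a small sketch of how sensitivity and specificity are computed from per-case probabilities at a fixed threshold. The function name and the toy data are hypothetical, and treating a probability equal to the threshold as positive is an assumption; the abstract does not state the convention used.

```python
def sensitivity_specificity(probabilities, labels, threshold=0.5):
    """Return (sensitivity, specificity) for binary labels (1 = cancer, 0 = normal)."""
    tp = fn = tn = fp = 0
    for p, y in zip(probabilities, labels):
        predicted_positive = p >= threshold  # assumption: ties count as positive
        if y == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Toy data, invented for illustration (not from the study).
probs = [0.9, 0.7, 0.3, 0.6, 0.2, 0.1]
truth = [1, 1, 1, 0, 0, 0]
sens, spec = sensitivity_specificity(probs, truth)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```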

  5. Using data-driven agent-based models for forecasting emerging infectious diseases

    Directory of Open Access Journals (Sweden)

    Srinivasan Venkatramanan

    2018-03-01

    Producing timely, well-informed and reliable forecasts for an ongoing epidemic of an emerging infectious disease is a huge challenge. Epidemiologists and policy makers have to deal with poor data quality, limited understanding of the disease dynamics, a rapidly changing social environment and uncertainty about the effects of the various interventions in place. Under this setting, detailed computational models provide a comprehensive framework for integrating diverse data sources into a well-defined model of disease dynamics and social behavior, potentially leading to better understanding and actions. In this paper, we describe one such agent-based model framework developed for forecasting the 2014–2015 Ebola epidemic in Liberia, and subsequently used during the Ebola forecasting challenge. We describe the various components of the model and the calibration process, and summarize the forecast performance across scenarios of the challenge. We conclude by highlighting how such a data-driven approach can be refined and adapted for future epidemics, and share the lessons learned over the course of the challenge. Keywords: Emerging infectious diseases, Agent-based models, Simulation optimization, Bayesian calibration, Ebola

  6. Effective Data-Driven Calibration for a Galvanometric Laser Scanning System Using Binocular Stereo Vision.

    Science.gov (United States)

    Tu, Junchao; Zhang, Liyan

    2018-01-12

    A new solution to the problem of galvanometric laser scanning (GLS) system calibration is presented. Under the machine learning framework, we build a single-hidden layer feedforward neural network (SLFN) to represent the GLS system, which takes the digital control signal at the drives of the GLS system as input and the space vector of the corresponding outgoing laser beam as output. The training data set is obtained with the aid of a moving mechanism and a binocular stereo system. The parameters of the SLFN are efficiently solved in a closed form by using an extreme learning machine (ELM). By quantitatively analyzing the regression precision with respect to the number of hidden neurons in the SLFN, we demonstrate that the proper number of hidden neurons can be safely chosen from a broad interval to guarantee good generalization performance. Compared to the traditional model-driven calibration, the proposed calibration method does not need a complex modeling process and is more accurate and stable. As the output of the network is the space vectors of the outgoing laser beams, it costs much less training time and can provide a uniform solution to both laser projection and 3D-reconstruction, in contrast with the existing data-driven calibration method which only works for the laser triangulation problem. Calibration, projection and 3D reconstruction experiments are conducted to test the proposed method, and good results are obtained.
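
    The closed-form training step that the abstract attributes to the extreme learning machine can be sketched as follows: hidden-layer weights are drawn at random and left fixed, and only the output weights are solved by least squares. This is a generic ELM sketch under stated assumptions, not the authors' implementation; the function names, the tanh activation, the network size and the synthetic 2-input, 3-output mapping are all hypothetical stand-ins for the control-signal-to-beam-vector data.

```python
import numpy as np

def train_elm(X, Y, n_hidden, rng):
    """Fit an SLFN the ELM way: random hidden layer, least-squares output layer."""
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights, never trained
    b = rng.normal(size=n_hidden)                # random hidden biases, never trained
    H = np.tanh(X @ W + b)                       # hidden-layer activations
    beta = np.linalg.pinv(H) @ Y                 # output weights via Moore-Penrose pseudoinverse
    return W, b, beta

def predict_elm(X, W, b, beta):
    """Evaluate the trained network on new inputs."""
    return np.tanh(X @ W + b) @ beta

# Synthetic stand-in data (assumption): 2-D inputs mapped to 3-D outputs.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
Y = np.column_stack([np.sin(X[:, 0]), np.cos(X[:, 1]), X[:, 0] * X[:, 1]])

W, b, beta = train_elm(X, Y, n_hidden=50, rng=rng)
rmse = float(np.sqrt(np.mean((predict_elm(X, W, b, beta) - Y) ** 2)))
print(f"training RMSE: {rmse:.2e}")
```

    Because no iterative optimization is involved, the only tuning knob in this sketch is the number of hidden neurons, which matches the abstract's focus on choosing that number.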

  7. Effective Data-Driven Calibration for a Galvanometric Laser Scanning System Using Binocular Stereo Vision

    Directory of Open Access Journals (Sweden)

    Junchao Tu

    2018-01-01

    A new solution to the problem of galvanometric laser scanning (GLS) system calibration is presented. Under the machine learning framework, we build a single-hidden layer feedforward neural network (SLFN) to represent the GLS system, which takes the digital control signal at the drives of the GLS system as input and the space vector of the corresponding outgoing laser beam as output. The training data set is obtained with the aid of a moving mechanism and a binocular stereo system. The parameters of the SLFN are efficiently solved in a closed form by using an extreme learning machine (ELM). By quantitatively analyzing the regression precision with respect to the number of hidden neurons in the SLFN, we demonstrate that the proper number of hidden neurons can be safely chosen from a broad interval to guarantee good generalization performance. Compared to the traditional model-driven calibration, the proposed calibration method does not need a complex modeling process and is more accurate and stable. As the output of the network is the space vectors of the outgoing laser beams, it costs much less training time and can provide a uniform solution to both laser projection and 3D-reconstruction, in contrast with the existing data-driven calibration method which only works for the laser triangulation problem. Calibration, projection and 3D reconstruction experiments are conducted to test the proposed method, and good results are obtained.

  8. A priori data-driven multi-clustered reservoir generation algorithm for echo state network.

    Directory of Open Access Journals (Sweden)

    Xiumin Li

    Echo state networks (ESNs) with multi-clustered reservoir topology perform better in reservoir computing and robustness than those with random reservoir topology. However, these ESNs have a complex reservoir topology, which leads to difficulties in reservoir generation. This study focuses on the reservoir generation problem when ESN is used in environments with sufficient priori data available. Accordingly, a priori data-driven multi-cluster reservoir generation algorithm is proposed. The priori data in the proposed algorithm are used to evaluate reservoirs by calculating the precision and standard deviation of ESNs. The reservoirs are produced using the clustering method; only the reservoir with a better evaluation performance takes the place of a previous one. The final reservoir is obtained when its evaluation score reaches the preset requirement. The prediction experiment results obtained using the Mackey-Glass chaotic time series show that the proposed reservoir generation algorithm provides ESNs with extra prediction precision and increases the structure complexity of the network. Further experiments also reveal the appropriate values of the number of clusters and time window size to obtain optimal performance. The information entropy of the reservoir reaches the maximum when ESN gains the greatest precision.

  9. Data-free and data-driven spectral perturbations for RANS UQ

    Science.gov (United States)

    Edeling, Wouter; Mishra, Aashwin; Iaccarino, Gianluca

    2017-11-01

    Despite recent developments in high-fidelity turbulent flow simulations, RANS modeling is still vastly used by industry, due to its inherent low cost. Since accuracy is a concern in RANS modeling, model-form UQ is an essential tool for assessing the impacts of this uncertainty on quantities of interest. Applying the spectral decomposition to the modeled Reynolds-Stress Tensor (RST) allows for the introduction of decoupled perturbations into the baseline intensity (kinetic energy), shape (eigenvalues), and orientation (eigenvectors). This constitutes a natural methodology to evaluate the model form uncertainty associated to different aspects of RST modeling. In a predictive setting, one frequently encounters an absence of any relevant reference data. To make data-free predictions with quantified uncertainty we employ physical bounds to a-priori define maximum spectral perturbations. When propagated, these perturbations yield intervals of engineering utility. High-fidelity data opens up the possibility of inferring a distribution of uncertainty, by means of various data-driven machine-learning techniques. We will demonstrate our framework on a number of flow problems where RANS models are prone to failure. This research was partially supported by the Defense Advanced Research Projects Agency under the Enabling Quantification of Uncertainty in Physical Systems (EQUiPS) project (technical monitor: Dr Fariba Fahroo), and the DOE PSAAP-II program.

  10. BMI cyberworkstation: enabling dynamic data-driven brain-machine interface research through cyberinfrastructure.

    Science.gov (United States)

    Zhao, Ming; Rattanatamrong, Prapaporn; DiGiovanna, Jack; Mahmoudi, Babak; Figueiredo, Renato J; Sanchez, Justin C; Príncipe, José C; Fortes, José A B

    2008-01-01

    Dynamic data-driven brain-machine interfaces (DDDBMI) have great potential to advance the understanding of neural systems and improve the design of brain-inspired rehabilitative systems. This paper presents a novel cyberinfrastructure that couples in vivo neurophysiology experimentation with massive computational resources to provide seamless and efficient support of DDDBMI research. Closed-loop experiments can be conducted with in vivo data acquisition, reliable network transfer, parallel model computation, and real-time robot control. Behavioral experiments with live animals are supported with real-time guarantees. Offline studies can be performed with various configurations for extensive analysis and training. A Web-based portal is also provided to allow users to conveniently interact with the cyberinfrastructure, conducting both experimentation and analysis. New motor control models are developed based on this approach, which include recursive least square based (RLS) and reinforcement learning based (RLBMI) algorithms. The results from an online RLBMI experiment show that the cyberinfrastructure can successfully support DDDBMI experiments and meet the desired real-time requirements.

  11. Data-driven modelling of structured populations a practical guide to the integral projection model

    CERN Document Server

    Ellner, Stephen P; Rees, Mark

    2016-01-01

    This book is a “How To” guide for modeling population dynamics using Integral Projection Models (IPM) starting from observational data. It is written by a leading research team in this area and includes code in the R language (in the text and online) to carry out all computations. The intended audience are ecologists, evolutionary biologists, and mathematical biologists interested in developing data-driven models for animal and plant populations. IPMs may seem hard as they involve integrals. The aim of this book is to demystify IPMs, so they become the model of choice for populations structured by size or other continuously varying traits. The book uses real examples of increasing complexity to show how the life-cycle of the study organism naturally leads to the appropriate statistical analysis, which leads directly to the IPM itself. A wide range of model types and analyses are presented, including model construction, computational methods, and the underlying theory, with the more technical material in B...

  12. Lessons learned from a data-driven college access program: The National College Advising Corps.

    Science.gov (United States)

    Horng, Eileen L; Evans, Brent J; Antonio, Anthony L; Foster, Jesse D; Kalamkarian, Hoori S; Hurd, Nicole F; Bettinger, Eric P

    2013-01-01

    This chapter discusses the collaboration between a national college access program, the National College Advising Corps (NCAC), and its research and evaluation team at Stanford University. NCAC is currently active in almost four hundred high schools and through the placement of a recent college graduate to serve as a college adviser provides necessary information and support for students who may find it difficult to navigate the complex college admission process. The advisers also conduct outreach to underclassmen in an effort to improve the school-wide college-going culture. Analyses include examination of both quantitative and qualitative data from numerous sources and partners with every level of the organization from the national office to individual high schools. The authors discuss balancing the pursuit of evaluation goals with academic scholarship. In an effort to benefit other programs seeking to form successful data-driven interventions, the authors provide explicit examples of the partnership and present several examples of how the program has benefited from the data gathered by the evaluation team. © WILEY PERIODICALS, INC.

  13. A data-driven approach for evaluating multi-modal therapy in traumatic brain injury.

    Science.gov (United States)

    Haefeli, Jenny; Ferguson, Adam R; Bingham, Deborah; Orr, Adrienne; Won, Seok Joon; Lam, Tina I; Shi, Jian; Hawley, Sarah; Liu, Jialing; Swanson, Raymond A; Massa, Stephen M

    2017-02-16

    Combination therapies targeting multiple recovery mechanisms have the potential for additive or synergistic effects, but experimental design and analyses of multimodal therapeutic trials are challenging. To address this problem, we developed a data-driven approach to integrate and analyze raw source data from separate pre-clinical studies and evaluated interactions between four treatments following traumatic brain injury. Histologic and behavioral outcomes were measured in 202 rats treated with combinations of an anti-inflammatory agent (minocycline), a neurotrophic agent (LM11A-31), and physical therapy consisting of assisted exercise with or without botulinum toxin-induced limb constraint. Data was curated and analyzed in a linked workflow involving non-linear principal component analysis followed by hypothesis testing with a linear mixed model. Results revealed significant benefits of the neurotrophic agent LM11A-31 on learning and memory outcomes after traumatic brain injury. In addition, modulations of LM11A-31 effects by co-administration of minocycline and by the type of physical therapy applied reached statistical significance. These results suggest a combinatorial effect of drug and physical therapy interventions that was not evident by univariate analysis. The study designs and analytic techniques applied here form a structured, unbiased, internally validated workflow that may be applied to other combinatorial studies, both in animals and humans.

  14. Forecasting success via early adoptions analysis: A data-driven study.

    Science.gov (United States)

    Rossetti, Giulio; Milli, Letizia; Giannotti, Fosca; Pedreschi, Dino

    2017-01-01

    Innovations are continuously launched over markets, such as new products over the retail market or new artists over the music scene. Some innovations become a success; others don't. Forecasting which innovations will succeed at the beginning of their lifecycle is hard. In this paper, we provide a data-driven, large-scale account of the existence of a special niche among early adopters, individuals that consistently tend to adopt successful innovations before they reach success: we will call them Hit-Savvy. Hit-Savvy can be discovered in very different markets and retain over time their ability to anticipate the success of innovations. As our second contribution, we devise a predictive analytical process, exploiting Hit-Savvy as signals, which achieves high accuracy in the early-stage prediction of successful innovations, far beyond the reach of state-of-the-art time series forecasting models. Indeed, our findings and predictive model can be fruitfully used to support marketing strategies and product placement.

  15. Data-Driven Astrochemistry: One Step Further within the Origin of Life Puzzle.

    Science.gov (United States)

    Ruf, Alexander; d'Hendecourt, Louis L S; Schmitt-Kopplin, Philippe

    2018-06-01

    Astrochemistry, meteoritics and chemical analytics represent a manifold scientific field, including various disciplines. In this review, clarifications on astrochemistry, comet chemistry, laboratory astrophysics and meteoritic research with respect to organic and metalorganic chemistry will be given. The seemingly large number of observed astrochemical molecules necessarily requires explanations on molecular complexity and chemical evolution, which will be discussed. Special emphasis should be placed on data-driven analytical methods including ultrahigh-resolving instruments and their interplay with quantum chemical computations. These methods enable remarkable insights into the complex chemical spaces that exist in meteorites and maximize the level of information on the huge astrochemical molecular diversity. In addition, they allow one to study even yet undescribed chemistry as the one involving organomagnesium compounds in meteorites. Both targeted and non-targeted analytical strategies will be explained and may touch upon epistemological problems. In addition, implications of (metal)organic matter toward prebiotic chemistry leading to the emergence of life will be discussed. The precise description of astrochemical organic and metalorganic matter as seeds for life and their interactions within various astrophysical environments may appear essential to further study questions regarding the emergence of life on a most fundamental level that is within the molecular world and its self-organization properties.

  16. Big Data-Driven Based Real-Time Traffic Flow State Identification and Prediction

    Directory of Open Access Journals (Sweden)

    Hua-pu Lu

    2015-01-01

    Full Text Available With the rapid development of urban informatization, the era of big data is coming. To satisfy the demand of traffic congestion early warning, this paper studies the method of real-time traffic flow state identification and prediction based on big data-driven theory. Traffic big data holds several characteristics, such as temporal correlation, spatial correlation, historical correlation, and multistate. Traffic flow state quantification, the basis of traffic flow state identification, is achieved by a SAGA-FCM (simulated annealing genetic algorithm based fuzzy c-means) based traffic clustering model. Considering simple calculation and predictive accuracy, a bilevel optimization model for regional traffic flow correlation analysis is established to predict traffic flow parameters based on temporal-spatial-historical correlation. A two-stage model for correction coefficients optimization is put forward to simplify the bilevel optimization model. The first stage model is built to calculate the number of temporal-spatial-historical correlation variables. The second stage model is presented to calculate basic model formulation of regional traffic flow correlation. A case study based on a real-world road network in Beijing, China, is implemented to test the efficiency and applicability of the proposed modeling and computing methods.
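
    As a rough illustration of the clustering step, the sketch below runs plain fuzzy c-means on one-dimensional values. It deliberately omits the simulated annealing and genetic algorithm parts of SAGA-FCM, which the abstract does not specify; the function name `fuzzy_c_means`, the fuzzifier `m` and the sample speeds are assumptions made for the example.

```python
def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """Plain 1-D fuzzy c-means.

    Returns the final centers and the membership rows computed in the
    last iteration (one row per point, one column per cluster).
    """
    centers = list(centers)
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iterations):
        memberships = []
        for x in points:
            dists = [abs(x - c) for c in centers]
            if 0.0 in dists:
                # A point lying exactly on a center belongs fully to it.
                hit = dists.index(0.0)
                row = [1.0 if i == hit else 0.0 for i in range(len(centers))]
            else:
                row = [
                    1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
                    for d_i in dists
                ]
            memberships.append(row)
        # Each center is the membership-weighted mean of all points.
        centers = [
            sum(u[i] ** m * x for u, x in zip(memberships, points))
            / sum(u[i] ** m for u in memberships)
            for i in range(len(centers))
        ]
    return centers, memberships


if __name__ == "__main__":
    # Hypothetical link speeds in km/h: congested versus free-flowing.
    speeds = [12.0, 15.0, 14.0, 58.0, 62.0, 60.0]
    centers, memberships = fuzzy_c_means(speeds, [min(speeds), max(speeds)])
    print(centers)
    print(memberships[0])
```

    Unlike hard clustering, every observation keeps a graded membership in each traffic state, which is what makes the result usable as a quantified state rather than a label.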

  17. A data-driven predictive approach for drug delivery using machine learning techniques.

    Directory of Open Access Journals (Sweden)

    Yuanyuan Li

    Full Text Available In drug delivery, there is often a trade-off between effective killing of the pathogen, and harmful side effects associated with the treatment. Due to the difficulty in testing every dosing scenario experimentally, a computational approach will be helpful to assist with the prediction of effective drug delivery methods. In this paper, we have developed a data-driven predictive system, using machine learning techniques, to determine, in silico, the effectiveness of drug dosing. The system framework is scalable, autonomous, robust, and has the ability to predict the effectiveness of the current drug treatment and the subsequent drug-pathogen dynamics. The system consists of a dynamic model incorporating both the drug concentration and pathogen population into distinct states. These states are then analyzed using a temporal model to describe the drug-cell interactions over time. The dynamic drug-cell interactions are learned in an adaptive fashion and used to make sequential predictions on the effectiveness of the dosing strategy. Incorporated into the system is the ability to adjust the sensitivity and specificity of the learned models based on a threshold level determined by the operator for the specific application. As a proof-of-concept, the system was validated experimentally using the pathogen Giardia lamblia and the drug metronidazole in vitro.

  18. Data-Driven Baseline Estimation of Residential Buildings for Demand Response

    Directory of Open Access Journals (Sweden)

    Saehong Park

    2015-09-01

    Full Text Available The advent of advanced metering infrastructure (AMI) generates a large volume of data related to energy service. This paper exploits a data mining approach for customer baseline load (CBL) estimation in demand response (DR) management. CBL plays a significant role in the measurement and verification process, which quantifies the amount of demand reduction and authenticates the performance. The proposed data-driven baseline modeling is based on the unsupervised learning technique. Specifically we leverage both the self organizing map (SOM) and K-means clustering for accurate estimation. This two-level approach efficiently reduces the large data set into representative weight vectors in SOM, and then these weight vectors are clustered by K-means clustering to find the load pattern that would be similar to the potential load pattern of the DR event day. To verify the proposed method, we conduct nationwide scale experiments where three major cities’ residential consumption is monitored by smart meters. Our evaluation compares the proposed solution with the various types of day matching techniques, showing that our approach outperforms the existing methods by up to a 68.5% lower error rate.

  19. Data-driven model-independent searches for long-lived particles at the LHC

    Science.gov (United States)

    Coccaro, Andrea; Curtin, David; Lubatti, H. J.; Russell, Heather; Shelton, Jessie

    2016-12-01

    Neutral long-lived particles (LLPs) are highly motivated by many beyond the Standard Model scenarios, such as theories of supersymmetry, baryogenesis, and neutral naturalness, and present both tremendous discovery opportunities and experimental challenges for the LHC. A major bottleneck for current LLP searches is the prediction of Standard Model backgrounds, which are often impossible to simulate accurately. In this paper, we propose a general strategy for obtaining differential, data-driven background estimates in LLP searches, thereby notably extending the range of LLP masses and lifetimes that can be discovered at the LHC. We focus on LLPs decaying in the ATLAS muon system, where triggers providing both signal and control samples are available at LHC run 2. While many existing searches require two displaced decays, a detailed knowledge of backgrounds will allow for very inclusive searches that require just one detected LLP decay. As we demonstrate for the h → XX signal model of LLP pair production in exotic Higgs decays, this results in dramatic sensitivity improvements for proper lifetimes ≳ 10 m. In theories of neutral naturalness, this extends reach to glueball masses far below the b b̄ threshold. Our strategy readily generalizes to other signal models and other detector subsystems. This framework therefore lends itself to the development of a systematic, model-independent LLP search program, in analogy to the highly successful simplified-model framework of prompt searches.

  20. Clinical review: optimizing enteral nutrition for critically ill patients - a simple data-driven formula

    Science.gov (United States)

    2011-01-01

    In modern critical care, the paradigm of 'therapeutic nutrition' is replacing traditional 'supportive nutrition'. Standard enteral formulas meet basic macro- and micronutrient needs; therapeutic enteral formulas meet these basic needs and also contain specific pharmaconutrients that may attenuate hyperinflammatory responses, enhance the immune responses to infection, or improve gastrointestinal tolerance. Choosing the right enteral feeding formula may positively affect a patient's outcome; targeted use of therapeutic formulas can reduce the incidence of infectious complications, shorten lengths of stay in the ICU and in the hospital, and lower risk for mortality. In this paper, we review principles of how to feed (enteral, parenteral, or both) and when to feed (early versus delayed start) patients who are critically ill. We discuss what to feed these patients in the context of specific pharmaconutrients in specialized feeding formulations, that is, arginine, glutamine, antioxidants, certain ω-3 and ω-6 fatty acids, hydrolyzed proteins, and medium-chain triglycerides. We summarize current expert guidelines for nutrition in patients with critical illness, and we present specific clinical evidence on the use of enteral formulas supplemented with anti-inflammatory or immune-modulating nutrients, and gastrointestinal tolerance-promoting nutritional formulas. Finally, we introduce an algorithm to help bedside clinicians make data-driven feeding decisions for patients with critical illness. PMID:22136305

  1. A Novel Online Data-Driven Algorithm for Detecting UAV Navigation Sensor Faults.

    Science.gov (United States)

    Sun, Rui; Cheng, Qi; Wang, Guanyu; Ochieng, Washington Yotto

    2017-09-29

    The use of Unmanned Aerial Vehicles (UAVs) has increased significantly in recent years. On-board integrated navigation sensors are a key component of UAVs' flight control systems and are essential for flight safety. In order to ensure flight safety, timely and effective navigation sensor fault detection capability is required. In this paper, a novel data-driven Adaptive Neuron Fuzzy Inference System (ANFIS)-based approach is presented for the detection of on-board navigation sensor faults in UAVs. Contrary to the classic UAV sensor fault detection algorithms, based on predefined or modelled faults, the proposed algorithm combines an online data training mechanism with the ANFIS-based decision system. The main advantages of this algorithm are that it allows real-time model-free residual analysis from Kalman Filter (KF) estimates and the ANFIS to build a reliable fault detection system. In addition, it allows fast and accurate detection of faults, which makes it suitable for real-time applications. Experimental results have demonstrated the effectiveness of the proposed fault detection method in terms of accuracy and misdetection rate.

  2. A Data-Driven Response Virtual Sensor Technique with Partial Vibration Measurements Using Convolutional Neural Network.

    Science.gov (United States)

    Sun, Shan-Bin; He, Yuan-Yuan; Zhou, Si-Da; Yue, Zhen-Jiang

    2017-12-12

    Measurement of dynamic responses plays an important role in structural health monitoring, damage detection and other fields of research. However, in aerospace engineering, the physical sensors are limited in the operational conditions of spacecraft, due to the severe environment in outer space. This paper proposes a virtual sensor model with partial vibration measurements using a convolutional neural network. The transmissibility function is employed as prior knowledge. A four-layer neural network with two convolutional layers, one fully connected layer, and an output layer is proposed as the predicting model. Numerical examples of two different structural dynamic systems demonstrate the performance of the proposed approach. The excellence of the novel technique is further indicated using a simply supported beam experiment, in comparison with a modal-model-based virtual sensor, which uses modal parameters, such as mode shapes, for estimating the responses of the faulty sensors. The results show that the presented data-driven response virtual sensor technique can predict structural response with high accuracy.

  3. Data-Driven Handover Optimization in Next Generation Mobile Communication Networks

    Directory of Open Access Journals (Sweden)

    Po-Chiang Lin

    2016-01-01

    Full Text Available Network densification is regarded as one of the important ingredients to increase capacity for next generation mobile communication networks. However, it also leads to mobility problems since users are more likely to hand over to another cell in dense or even ultradense mobile communication networks. Therefore, supporting seamless and robust connectivity through such networks becomes a very important issue. In this paper, we investigate handover (HO) optimization in next generation mobile communication networks. We propose a data-driven handover optimization (DHO) approach, which aims to mitigate mobility problems including too-late HO, too-early HO, HO to wrong cell, ping-pong HO, and unnecessary HO. The key performance indicator (KPI) is defined as the weighted average of the ratios of these mobility problems. The DHO approach collects data from the mobile communication measurement results and provides a model to estimate the relationship between the KPI and features from the collected dataset. Based on the model, the handover parameters, including the handover margin and time-to-trigger, are optimized to minimize the KPI. Simulation results show that the proposed DHO approach could effectively mitigate mobility problems.
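
    The KPI definition above (a weighted average of mobility-problem ratios) can be written out as a small sketch. The function name `handover_kpi`, the dictionary keys and all numeric values are hypothetical; the abstract gives neither the weights nor the measured ratios.

```python
def handover_kpi(ratios, weights):
    """Weighted average of mobility-problem ratios (lower is better)."""
    total_weight = sum(weights[name] for name in ratios)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * ratios[name] for name in ratios) / total_weight


if __name__ == "__main__":
    # Hypothetical share of handovers affected by each problem type.
    ratios = {
        "too_late": 0.04,
        "too_early": 0.02,
        "wrong_cell": 0.01,
        "ping_pong": 0.10,
        "unnecessary": 0.03,
    }
    # Equal weights are an assumption; an operator could penalize
    # connection-dropping problems such as too-late HO more heavily.
    weights = {name: 1.0 for name in ratios}
    print(handover_kpi(ratios, weights))
```

    An optimizer over handover margin and time-to-trigger would evaluate a score of this kind for each candidate setting and keep the setting with the smallest value.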

  4. Design of a data-driven predictive controller for start-up process of AMT vehicles.

    Science.gov (United States)

    Lu, Xiaohui; Chen, Hong; Wang, Ping; Gao, Bingzhao

    2011-12-01

    In this paper, a data-driven predictive controller is designed for the start-up process of vehicles with automated manual transmissions (AMTs). It is obtained directly from the input-output data of a driveline simulation model constructed by the commercial software AMESim. In order to obtain offset-free control for the reference input, the predictor equation is gained with incremental inputs and outputs. Because of the physical characteristics, the input and output constraints are considered explicitly in the problem formulation. The contradictory requirements of less friction losses and less driveline shock are included in the objective function. The designed controller is tested under nominal conditions and changed conditions. The simulation results show that, during the start-up process, the AMT clutch with the proposed controller works very well, and the process meets the control objectives: fast clutch lockup time, small friction losses, and the preservation of driver comfort, i.e., smooth acceleration of the vehicle. At the same time, the closed-loop system has the ability to reject uncertainties, such as the vehicle mass and road grade.

  5. Data-driven quantification of the robustness and sensitivity of cell signaling networks

    International Nuclear Information System (INIS)

    Mukherjee, Sayak; Seok, Sang-Cheol; Vieland, Veronica J; Das, Jayajit

    2013-01-01

    Robustness and sensitivity of responses generated by cell signaling networks have been associated with survival and evolvability of organisms. However, existing methods analyzing robustness and sensitivity of signaling networks ignore the experimentally observed cell-to-cell variations of protein abundances and cell functions or contain ad hoc assumptions. We propose and apply a data-driven maximum entropy based method to quantify robustness and sensitivity of the Escherichia coli (E. coli) chemotaxis signaling network. Our analysis correctly rank orders different models of E. coli chemotaxis based on their robustness and suggests that parameters regulating cell signaling are evolutionarily selected to vary in individual cells according to their abilities to perturb cell functions. Furthermore, predictions from our approach regarding distribution of protein abundances and properties of chemotactic responses in individual cells based on cell population averaged data are in excellent agreement with their experimental counterparts. Our approach is general and can be used to evaluate robustness as well as generate predictions of single cell properties based on population averaged experimental data in a wide range of cell signaling systems. (paper)

  6. Cloudweaver: Adaptive and Data-Driven Workload Manager for Generic Clouds

    Science.gov (United States)

    Li, Rui; Chen, Lei; Li, Wen-Syan

    Cloud computing denotes the latest trend in application development for parallel computing on massive data volumes. It relies on clouds of servers to handle tasks that used to be managed by an individual server. With cloud computing, software vendors can provide business intelligence and data analytic services for internet scale data sets. Many open source projects, such as Hadoop, offer various software components that are essential for building a cloud infrastructure. Current Hadoop (and many others) requires users to configure cloud infrastructures via programs and APIs and such configuration is fixed during the runtime. In this chapter, we propose a workload manager (WLM), called CloudWeaver, which provides automated configuration of a cloud infrastructure for runtime execution. The workload management is data-driven and can adapt to dynamic nature of operator throughput during different execution phases. CloudWeaver works for a single job and a workload consisting of multiple jobs running concurrently, which aims at maximum throughput using a minimum set of processors.

  7. VLAM-G: Interactive Data Driven Workflow Engine for Grid-Enabled Resources

    Directory of Open Access Journals (Sweden)

    Vladimir Korkhov

    2007-01-01

    Full Text Available Grid brings the power of many computers to scientists. However, the development of Grid-enabled applications requires knowledge about Grid infrastructure and low-level API to Grid services. In turn, workflow management systems provide a high-level environment for rapid prototyping of experimental computing systems. Coupling Grid and workflow paradigms is important for the scientific community: it makes the power of the Grid easily available to the end user. The paradigm of data driven workflow execution is one of the ways to enable distributed workflow on the Grid. The work presented in this paper is carried out in the context of the Virtual Laboratory for e-Science project. We present the VLAM-G workflow management system and its core component: the Run-Time System (RTS). The RTS is a dataflow driven workflow engine which utilizes Grid resources, hiding the complexity of the Grid from a scientist. Special attention is paid to the concept of dataflow and direct data streaming between distributed workflow components. We present the architecture and components of the RTS, describe the features of VLAM-G workflow execution, and evaluate the system by performance measurements and a real life use case.

  8. An asynchronous data-driven readout prototype for CEPC vertex detector

    Science.gov (United States)

    Yang, Ping; Sun, Xiangming; Huang, Guangming; Xiao, Le; Gao, Chaosong; Huang, Xing; Zhou, Wei; Ren, Weiping; Li, Yashu; Liu, Jianchao; You, Bihui; Zhang, Li

    2017-12-01

    The Circular Electron Positron Collider (CEPC) is proposed as a Higgs boson and/or Z boson factory for high-precision measurements on the Higgs boson. The precision of secondary vertex impact parameter plays an important role in such measurements which typically rely on flavor-tagging. Thus silicon CMOS Pixel Sensors (CPS) are the most promising technology candidate for a CEPC vertex detector, which can most likely feature a high position resolution, a low power consumption and a fast readout simultaneously. For the R&D of the CEPC vertex detector, we have developed a prototype MIC4 in the Towerjazz 180 nm CMOS Image Sensor (CIS) process. We have proposed and implemented a new architecture of asynchronous zero-suppression data-driven readout inside the matrix combined with a binary front-end inside the pixel. The matrix contains 128 rows and 64 columns with a small pixel pitch of 25 μm. The readout architecture has implemented the traditional OR-gate chain inside a super pixel combined with a priority arbiter tree between the super pixels, only reading out relevant pixels. The MIC4 architecture will be introduced in more detail in this paper. It will be taped out in May and will be characterized when the chip comes back.

  9. Dynamic model reduction using data-driven Loewner-framework applied to thermally morphing structures

    Science.gov (United States)

    Phoenix, Austin A.; Tarazaga, Pablo A.

    2017-05-01

    The work herein proposes the use of the data-driven Loewner-framework for reduced order modeling as applied to dynamic Finite Element Models (FEM) of thermally morphing structures. The Loewner-based modeling approach is computationally efficient and accurately constructs reduced models using analytical output data from a FEM. This paper details the two-step process proposed in the Loewner approach. First, a random vibration FEM simulation is used as the input for the development of a Single Input Single Output (SISO) data-based dynamic Loewner state space model. Second, an SVD-based truncation is used on the Loewner state space model, such that the minimal, dynamically representative, state space model is achieved. For this second part, varying levels of reduction are generated and compared. The work herein can be extended to model generation using experimental measurements by replacing the FEM output data in the first step and following the same procedure. This method will be demonstrated on two thermally morphing structures, a rigidly fixed hexapod in multiple geometric configurations and a low mass anisotropic morphing boom. This paper is working to detail the method and identify the benefits of the reduced model methodology.

  10. Outcomes from the GLEON fellowship program. Training graduate students in data driven network science.

    Science.gov (United States)

    Dugan, H.; Hanson, P. C.; Weathers, K. C.

    2016-12-01

    In the water sciences there is a massive need for graduate students who possess the analytical and technical skills to deal with large datasets and function in the new paradigm of open, collaborative science. The Global Lake Ecological Observatory Network (GLEON) graduate fellowship program (GFP) was developed as an interdisciplinary training program to supplement the intensive disciplinary training of traditional graduate education. The primary goal of the GFP was to train a diverse cohort of graduate students in network science, open-web technologies, collaboration, and data analytics, and importantly to provide the opportunity to use these skills to conduct collaborative research resulting in publishable scientific products. The GFP is run as a series of three week-long workshops over two years that brings together a cohort of twelve students. In addition, fellows are expected to attend and contribute to at least one international GLEON all-hands' meeting. Here, we provide examples of training modules in the GFP (model building, data QA/QC, information management, Bayesian modeling, open coding/version control, national data programs), as well as scientific outputs (manuscripts, software products, and new global datasets) produced by the fellows, as well as the process by which this team science was catalyzed. Data driven education that lets students apply learned skills to real research projects reinforces concepts, provides motivation, and can benefit their publication record. This program design is extendable to other institutions and networks.

  11. Automatic data-driven real-time segmentation and recognition of surgical workflow.

    Science.gov (United States)

    Dergachyova, Olga; Bouget, David; Huaulmé, Arnaud; Morandi, Xavier; Jannin, Pierre

    2016-06-01

    With the intention of extending the perception and action of surgical staff inside the operating room, the medical community has expressed a growing interest towards context-aware systems. Requiring an accurate identification of the surgical workflow, such systems make use of data from a diverse set of available sensors. In this paper, we propose a fully data-driven and real-time method for segmentation and recognition of surgical phases using a combination of video data and instrument usage signals, exploiting no prior knowledge. We also introduce new validation metrics for assessment of workflow detection. The segmentation and recognition are based on a four-stage process. Firstly, during the learning time, a Surgical Process Model is automatically constructed from data annotations to guide the following process. Secondly, data samples are described using a combination of low-level visual cues and instrument information. Then, in the third stage, these descriptions are employed to train a set of AdaBoost classifiers capable of distinguishing one surgical phase from others. Finally, AdaBoost responses are used as input to a Hidden semi-Markov Model in order to obtain a final decision. On the MICCAI EndoVis challenge laparoscopic dataset we achieved a precision and a recall of 91 % in classification of 7 phases. Compared to the analysis based on one data type only, a combination of visual features and instrument signals allows better segmentation, reduction of the detection delay and discovery of the correct phase order.

  12. A Data-Driven Air Transportation Delay Propagation Model Using Epidemic Process Models

    Directory of Open Access Journals (Sweden)

    B. Baspinar

    2016-01-01

    In air transport network management, in addition to defining the performance behavior of the system's components, identification of their interaction dynamics is a delicate issue in both strategic and tactical decision-making processes so as to decide which elements of the system are "controlled" and how. This paper introduces a novel delay propagation model based on an epidemic spreading process, which enables the definition of novel performance indicators and interaction rates of the elements of the air transportation network. In order to understand the behavior of the delay propagation over the network at different levels, we have constructed two different data-driven epidemic models approximating the dynamics of the system: (a) a flight-based epidemic model and (b) an airport-based epidemic model. The flight-based epidemic model uses the SIS epidemic model and focuses on individual flights, where each flight can be in a susceptible or an infected state. The airport-centric epidemic model, in addition to the flight-to-flight interactions, allows us to define the collective behavior of the airports, which are modeled as metapopulations. In network model construction, we have utilized historical flight-track data of Europe and performed analysis for certain days involving certain disturbances. Through this effort, we have validated the proposed delay propagation models under disruptive events.
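
    The SIS dynamics mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' model: it assumes a simple mean-field formulation in which `i` is the fraction of delayed ("infected") flights, `beta` is a hypothetical delay-transmission rate and `gamma` a hypothetical recovery rate, integrated with a forward-Euler step.

```python
def simulate_sis(beta, gamma, i0, dt, steps):
    """Forward-Euler integration of the mean-field SIS equation
    di/dt = beta * i * (1 - i) - gamma * i.

    i is the fraction of 'infected' (delayed) flights. beta, gamma,
    i0, dt and steps are illustrative values, not taken from the paper.
    """
    i = i0
    trajectory = [i]
    for _ in range(steps):
        i += dt * (beta * i * (1.0 - i) - gamma * i)
        i = min(1.0, max(0.0, i))  # keep the fraction inside [0, 1]
        trajectory.append(i)
    return trajectory


# With beta > gamma the delayed fraction settles at 1 - gamma / beta.
fractions = simulate_sis(beta=0.6, gamma=0.2, i0=0.05, dt=0.1, steps=2000)
endemic = 1.0 - 0.2 / 0.6
```

    In this toy setting, delays die out when `gamma` exceeds `beta` and persist at the level `1 - gamma / beta` otherwise; the paper's flight-based and airport-based models are network models and considerably richer.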

  13. A transparent and data-driven global tectonic regionalization model for seismic hazard assessment

    Science.gov (United States)

    Chen, Yen-Shin; Weatherill, Graeme; Pagani, Marco; Cotton, Fabrice

    2018-05-01

    A key concept that is common to many assumptions inherent within seismic hazard assessment is that of tectonic similarity. This recognizes that certain regions of the globe may display similar geophysical characteristics, such as in the attenuation of seismic waves, the magnitude scaling properties of seismogenic sources or the seismic coupling of the lithosphere. Previous attempts at tectonic regionalization, particularly within a seismic hazard assessment context, have often been based on expert judgements; in most of these cases, the process for delineating tectonic regions is neither reproducible nor consistent from location to location. In this work, the regionalization process is implemented in a scheme that is reproducible, comprehensible from a geophysical rationale, and revisable when new relevant data are published. A spatial classification-scheme is developed based on fuzzy logic, enabling the quantification of concepts that are approximate rather than precise. Using the proposed methodology, we obtain a transparent and data-driven global tectonic regionalization model for seismic hazard applications as well as the subjective probabilities (e.g. degree of being active/degree of being cratonic) that indicate the degree to which a site belongs in a tectonic category.

  14. Modern data-driven decision support systems: the role of computing with words and computational linguistics

    Science.gov (United States)

    Kacprzyk, Janusz; Zadrożny, Sławomir

    2010-05-01

    We present how the conceptually and numerically simple concept of a fuzzy linguistic database summary can be a very powerful tool for gaining much insight into the very essence of data. The use of linguistic summaries provides tools for the verbalisation of data analysis (mining) results which, in addition to the more commonly used visualisation, e.g. via a graphical user interface, can contribute to an increased human consistency and ease of use, notably for supporting decision makers via the data-driven decision support system paradigm. Two new relevant aspects of the analysis are also outlined which were first initiated by the authors. First, following Kacprzyk and Zadrożny, it is further considered how linguistic data summarisation is closely related to some types of solutions used in natural language generation (NLG). This can make it possible to use more and more effective and efficient tools and techniques developed in NLG. Second, similar remarks are given on relations to systemic functional linguistics. Moreover, following Kacprzyk and Zadrożny, comments are given on an extremely relevant aspect of scalability of linguistic summarisation of data, using a new concept of a conceptual scalability.

  15. Data-driven classification of ventilated lung tissues using electrical impedance tomography

    International Nuclear Information System (INIS)

    Gómez-Laberge, Camille; Hogan, Matthew J; Elke, Gunnar; Weiler, Norbert; Frerichs, Inéz; Adler, Andy

    2011-01-01

    Current methods for identifying ventilated lung regions utilizing electrical impedance tomography images rely on dividing the image into arbitrary regions of interest (ROI), manually delineating ROI, or forming ROI with pixels whose signal properties surpass an arbitrary threshold. In this paper, we propose a novel application of a data-driven classification method to identify ventilated lung ROI based on forming k clusters from pixels with correlated signals. A standard first-order model for lung mechanics is then applied to determine which ROI correspond to ventilated lung tissue. We applied the method in an experimental study of 16 mechanically ventilated swine in the supine position, which underwent changes in positive end-expiratory pressure (PEEP) and fraction of inspired oxygen (FIO2). In each stage of the experimental protocol, the method performed best with k = 4 and consistently identified 3 lung tissue ROI and 1 boundary tissue ROI in 15 of the 16 subjects. When testing for changes from baseline in lung position, tidal volume, and respiratory system compliance, we found that PEEP displaced the ventilated lung region dorsally by 2 cm, decreased tidal volume by 1.3%, and increased the respiratory system compliance time constant by 0.3 s. FIO2 decreased tidal volume by 0.7%. All effects were tested at p < 0.05 with n = 16. These findings suggest that the proposed ROI detection method is robust and sensitive to ventilation dynamics in the experimental setting.

  16. On the selection of user-defined parameters in data-driven stochastic subspace identification

    Science.gov (United States)

    Priori, C.; De Angelis, M.; Betti, R.

    2018-02-01

    The paper focuses on the time domain output-only technique called Data-Driven Stochastic Subspace Identification (DD-SSI); in order to identify modal models (frequencies, damping ratios and mode shapes), the role of its user-defined parameters is studied, and rules to determine their minimum values are proposed. This investigation is carried out using, first, the time histories of structural responses to stationary excitations, with a large number of samples, satisfying the hypothesis on the input imposed by DD-SSI. Then, the case of non-stationary seismic excitations with a reduced number of samples is considered. In this paper, partitions of the data matrix different from the one proposed in the SSI literature are investigated, together with the influence of different choices of the weighting matrices. The study is carried out considering two different applications: (1) data obtained from vibration tests on a scaled structure and (2) in-situ tests on a reinforced concrete building. Referring to the former, the identification of a steel frame structure tested on a shaking table is performed using its responses in terms of absolute accelerations to a stationary (white noise) base excitation and to non-stationary seismic excitations of low intensity. Black-box and modal models are identified in both cases and the results are compared with those from an input-output subspace technique. With regard to the latter, the identification of a complex hospital building is conducted using data obtained from ambient vibration tests.

  17. Data-driven asthma endotypes defined from blood biomarker and gene expression data.

    Directory of Open Access Journals (Sweden)

    Barbara Jane George

    The diagnosis and treatment of childhood asthma are complicated by its mechanistically distinct subtypes (endotypes) driven by genetic susceptibility and modulating environmental factors. Clinical biomarkers and blood gene expression were collected from a stratified, cross-sectional study of asthmatic and non-asthmatic children from Detroit, MI. This study describes four distinct asthma endotypes identified via a purely data-driven method. Our method was specifically designed to integrate blood gene expression and clinical biomarkers in a way that provides new mechanistic insights regarding the different asthma endotypes. For example, we describe metabolic syndrome-induced systemic inflammation as an associated factor in three of the four asthma endotypes. Context provided by the clinical biomarker data was essential in interpreting gene expression patterns and identifying putative endotypes, which emphasizes the importance of integrated approaches when studying complex disease etiologies. These synthesized patterns of gene expression and clinical markers from our research may lead to development of novel serum-based biomarker panels.

  18. Data Driven - Android based displays on data acquisition and system status

    CERN Document Server

    Canilho, Paulo

    2014-01-01

    For years, both hardware and software engineers have struggled to acquire device information quickly and flexibly; the status of many devices cannot be tested quickly because of the time needed to travel to a computer terminal. For instance, in order to test the status of a scintillator, one has to inject beam into the device and quickly return to a terminal to see the results. This is not only time-consuming but also extremely inconvenient for the person responsible, and it takes time away from more pressing matters. With this in mind, an interface was proposed to bring a stable, flexible, user-friendly and data-driven solution to this problem. Android, the most common operating system for mobile displays, proved to be the most efficient choice both in cost, since it is based on open-source software, and in ease of implementation, since its backend development relies on Java calls and XML for visual representation...

  19. NERI PROJECT 99-119. TASK 2. DATA-DRIVEN PREDICTION OF PROCESS VARIABLES. FINAL REPORT

    Energy Technology Data Exchange (ETDEWEB)

    Upadhyaya, B.R.

    2003-04-10

    This report describes the detailed results for Task 2 of DOE-NERI project number 99-119 entitled "Automatic Development of Highly Reliable Control Architecture for Future Nuclear Power Plants". This project is a collaborative effort between the Oak Ridge National Laboratory (ORNL), the University of Tennessee, Knoxville (UTK) and the North Carolina State University (NCSU). UTK is the lead organization for Task 2 under contract number DE-FG03-99SF21906. Under Task 2 we completed the development of data-driven models for the characterization of sub-system dynamics for predicting state variables, control functions, and expected control actions. We have also developed the "Principal Component Analysis (PCA)" approach for mapping system measurements, and a nonlinear system modeling approach called the "Group Method of Data Handling (GMDH)" with rational functions, which includes temporal data information for transient characterization. The majority of the results are presented in detailed reports for Phases 1 through 3 of our research, which are attached to this report.

  20. Data-driven fault detection, isolation and estimation of aircraft gas turbine engine actuator and sensors

    Science.gov (United States)

    Naderi, E.; Khorasani, K.

    2018-02-01

    In this work, a data-driven fault detection, isolation, and estimation (FDI&E) methodology is proposed and developed specifically for monitoring the aircraft gas turbine engine actuator and sensors. The proposed FDI&E filters are directly constructed by using only the available system I/O data at each operating point of the engine. The healthy gas turbine engine is stimulated by a sinusoidal input containing a limited number of frequencies. First, the associated system Markov parameters are estimated by using the FFT of the input and output signals to obtain the frequency response of the gas turbine engine. These data are then used for direct design and realization of the fault detection, isolation and estimation filters. Our proposed scheme therefore does not require any a priori knowledge of the system linear model or its number of poles and zeros at each operating point. We have investigated the effects of the size of the frequency response data on the performance of our proposed schemes. We have shown through comprehensive case-study simulations that desirable fault detection, isolation and estimation performance metrics, defined in terms of the confusion matrix criterion, can be achieved with access to the frequency response of the system at only a limited number of frequencies.
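
    The first step described in this abstract, obtaining a frequency response from the transforms of the input and output signals at a limited number of excited frequencies, can be sketched as follows. This is only an illustration under stated assumptions: the two-tap averaging system, the signal length `N` and the excited bins are hypothetical stand-ins, not the engine model or the authors' procedure, and a plain DFT sum replaces an FFT routine to keep the example dependency-free.

```python
import cmath
import math


def dft_bin(signal, k):
    """Single DFT coefficient of `signal` at frequency bin k."""
    n_samples = len(signal)
    return sum(
        x * cmath.exp(-2j * math.pi * k * n / n_samples)
        for n, x in enumerate(signal)
    )


def estimate_frequency_response(u, y, bins):
    """Ratio Y[k] / U[k] at the excited bins only."""
    return {k: dft_bin(y, k) / dft_bin(u, k) for k in bins}


N = 64                 # samples in one period of the excitation (assumed)
excited_bins = [3, 7]  # the 'limited number of frequencies' (assumed)

# Multi-sine input with an integer number of cycles in the record.
u = [
    sum(math.sin(2 * math.pi * k * n / N) for k in excited_bins)
    for n in range(N)
]

# Hypothetical toy system y[n] = 0.5*u[n] + 0.5*u[n-1]; the circular
# index emulates periodic steady state.
y = [0.5 * u[n] + 0.5 * u[(n - 1) % N] for n in range(N)]

H = estimate_frequency_response(u, y, excited_bins)
```

    For this toy system the estimate at bin `k` matches the known response `0.5 + 0.5 * exp(-2j * pi * k / N)`; with measured data, noise and transients would make the ratio only approximate.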

  1. A data-driven, mathematical model of mammalian cell cycle regulation.

    Directory of Open Access Journals (Sweden)

    Michael C Weis

    Few of the >150 published cell cycle modeling efforts use significant levels of data for tuning and validation. This reflects the difficulty of generating correlated quantitative data, and it points out a critical uncertainty in modeling efforts. To develop a data-driven model of cell cycle regulation, we used contiguous, dynamic measurements over two time scales (minutes and hours) calculated from static multiparametric cytometry data. The approach provided expression profiles of cyclin A2, cyclin B1, and phospho-S10-histone H3. The model was built by integrating and modifying two previously published models such that the model outputs for cyclins A and B fit cyclin expression measurements and the activation of B cyclin/Cdk1 coincided with phosphorylation of histone H3. The model depends on Cdh1-regulated cyclin degradation during G1, regulation of B cyclin/Cdk1 activity by cyclin A/Cdk via Wee1, and transcriptional control of the mitotic cyclins that reflects some of the current literature. We introduced autocatalytic transcription of E2F, E2F regulated transcription of cyclin B, Cdc20/Cdh1 mediated E2F degradation, enhanced transcription of mitotic cyclins during late S/early G2 phase, and the sustained synthesis of cyclin B during mitosis. These features produced a model with good correlation between state variable output and real measurements. Since the method of data generation is extensible, this model can be continually modified based on new correlated, quantitative data.

  2. A Dynamic Remote Sensing Data-Driven Approach for Oil Spill Simulation in the Sea

    Directory of Open Access Journals (Sweden)

    Jining Yan

    2015-05-01

    In view of the fact that oil spill remote sensing could only generate the oil slick information at a specific time and that traditional oil spill simulation models were not designed to deal with dynamic conditions, a dynamic data-driven application system (DDDAS) was introduced. The DDDAS entails both the ability to incorporate additional data into an executing application and, in reverse, the ability of applications to dynamically steer the measurement process. Based on the DDDAS, combining a remote sensor system that detects oil spills with a numerical simulation, an integrated data processing, analysis, forecasting and emergency response system was established. Once an oil spill accident occurs, the DDDAS-based oil spill model receives information about the oil slick extracted from the dynamic remote sensor data in the simulation. Through comparison, information fusion and feedback updates, continuous and more precise oil spill simulation results can be obtained. Then, the simulation results can provide help for disaster control and clean-up. The Penglai, Xingang and Suizhong oil spill results showed our simulation model could increase the prediction accuracy and reduce the error caused by empirical parameters in existing simulation systems. Therefore, the DDDAS-based detection and simulation system can effectively improve oil spill simulation and diffusion forecasting, as well as provide decision-making information and technical support for emergency responses to oil spills.

  3. A data-driven decomposition approach to model aerodynamic forces on flapping airfoils

    Science.gov (United States)

    Raiola, Marco; Discetti, Stefano; Ianiro, Andrea

    2017-11-01

    In this work, we exploit a data-driven decomposition of experimental data from a flapping airfoil experiment with the aim of isolating the main contributions to the aerodynamic force and obtaining a phenomenological model. Experiments are carried out on a NACA 0012 airfoil in forward flight with both heaving and pitching motion. Velocity measurements of the near field are carried out with Planar PIV while force measurements are performed with a load cell. The phase-averaged velocity fields are transformed into the wing-fixed reference frame, allowing for a description of the field in a domain with fixed boundaries. The decomposition of the flow field is performed by means of the POD applied on the velocity fluctuations and then extended to the phase-averaged force data by means of the Extended POD approach. This choice is justified by the simple consideration that aerodynamic forces determine the largest contributions to the energetic balance in the flow field. Only the first 6 modes have a relevant contribution to the force. A clear relationship can be drawn between the force and the flow field modes. Moreover, the force modes are closely related (yet slightly different) to the contributions of the classic potential models in literature, allowing for their correction. This work has been supported by the Spanish MINECO under Grant TRA2013-41103-P.

  4. A Novel Online Data-Driven Algorithm for Detecting UAV Navigation Sensor Faults

    Directory of Open Access Journals (Sweden)

    Rui Sun

    2017-09-01

    The use of Unmanned Aerial Vehicles (UAVs) has increased significantly in recent years. On-board integrated navigation sensors are a key component of UAVs' flight control systems and are essential for flight safety. In order to ensure flight safety, timely and effective navigation sensor fault detection capability is required. In this paper, a novel data-driven Adaptive Neuron Fuzzy Inference System (ANFIS)-based approach is presented for the detection of on-board navigation sensor faults in UAVs. Contrary to the classic UAV sensor fault detection algorithms, based on predefined or modelled faults, the proposed algorithm combines an online data training mechanism with the ANFIS-based decision system. The main advantages of this algorithm are that it allows real-time model-free residual analysis from Kalman Filter (KF) estimates and the ANFIS to build a reliable fault detection system. In addition, it allows fast and accurate detection of faults, which makes it suitable for real-time applications. Experimental results have demonstrated the effectiveness of the proposed fault detection method in terms of accuracy and misdetection rate.

  5. Data-driven strategies for robust forecast of continuous glucose monitoring time-series.

    Science.gov (United States)

    Fiorini, Samuele; Martini, Chiara; Malpassi, Davide; Cordera, Renzo; Maggi, Davide; Verri, Alessandro; Barla, Annalisa

    2017-07-01

    Over the past decade, continuous glucose monitoring (CGM) has proven to be a very resourceful tool for diabetes management. To date, CGM devices are employed for both retrospective and online applications. Their use allows a better description of the patients' pathology as well as better control of patients' level of glycemia. The analysis of CGM sensor data makes it possible to observe a wide range of metrics, such as the glycemic variability during the day or the amount of time spent below or above certain glycemic thresholds. However, due to the high variability of the glycemic signals among sensors and individuals, CGM data analysis is a non-trivial task. Standard signal filtering solutions fall short when an appropriate model personalization is not applied. State-of-the-art data-driven strategies for online CGM forecasting rely upon the use of recursive filters. Each time a new sample is collected, such models need to adjust their parameters in order to predict the next glycemic level. In this paper we aim to demonstrate that the problem of online CGM forecasting can be successfully tackled by personalized machine learning models that do not need to recursively update their parameters.
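
    The CGM metrics named in this abstract (glycemic variability and the time spent below or above thresholds) can be computed with a few lines of code. The sketch below is illustrative only: the function name, the 70 and 180 mg/dL thresholds and the use of the coefficient of variation as the variability measure are assumptions, not details from the paper, and readings are assumed to be evenly spaced in time so that fractions of samples stand in for fractions of time.

```python
from statistics import mean, pstdev


def cgm_summary(readings_mg_dl, low=70.0, high=180.0):
    """Summarize evenly spaced CGM readings.

    low/high are assumed thresholds in mg/dL; the coefficient of
    variation (std / mean) is used as a simple variability measure.
    """
    n = len(readings_mg_dl)
    if n == 0:
        raise ValueError("need at least one reading")
    below = sum(1 for g in readings_mg_dl if g < low)
    above = sum(1 for g in readings_mg_dl if g > high)
    avg = mean(readings_mg_dl)
    return {
        "fraction_below": below / n,
        "fraction_above": above / n,
        "fraction_in_range": (n - below - above) / n,
        "cv": pstdev(readings_mg_dl) / avg,
    }


summary = cgm_summary([60, 100, 120, 200])
```

    Here one of the four readings lies below the lower threshold and one above the upper threshold, so half of the samples are in range.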

  6. Data Warehouse Development Using a Data-Driven Approach to Support Human Resource Management (Pengembangan Data Warehouse Menggunakan Pendekatan Data-Driven untuk Membantu Pengelolaan SDM)

    Directory of Open Access Journals (Sweden)

    Mujiono Mujiono

    2016-01-01

    The basis of bureaucratic reform is the reform of human resources management. One supporting factor is the development of an employee database. Supporting the management of human resources requires, among other things, a data warehouse and business intelligence tools. The data warehouse is an integrated concept of reliable data storage that supports all data analysis needs. In this study, a data warehouse was developed using a data-driven approach, with source data coming from SIMPEG, SAPK and electronic presence data. The data warehouse is designed using the nine-step methodology and unified modeling language (UML) notation. Extract, transform, load (ETL) is done by using Pentaho Data Integration by applying transformation maps. Furthermore, to help human resource management, the system is built to perform online analytical processing (OLAP) to facilitate web-based information. This study produced a BI application development framework with a Model-View-Controller (MVC) architecture; OLAP operations are built using dynamic query generation, PivotTable, and HighChart to present information about PNS, CPNS, Retirement, Kenpa and Presence.

  7. Review of the Remaining Useful Life Prognostics of Vehicle Lithium-Ion Batteries Using Data-Driven Methodologies

    Directory of Open Access Journals (Sweden)

    Lifeng Wu

    2016-05-01

    Lithium-ion batteries are the primary power source in electric vehicles, and the prognosis of their remaining useful life is vital for ensuring the safety, stability, and long lifetime of electric vehicles. Accurately establishing a mechanism model of a vehicle lithium-ion battery involves a complex electrochemical process. Remaining useful life (RUL) prognostics based on data-driven methods has become a focus of research. Current research on data-driven methodologies is summarized in this paper. By analyzing the problems of vehicle lithium-ion batteries in practical applications, the problems that need to be solved in the future are identified.

  8. NGBAuth - Next Generation Batch Authentication for long running batch jobs.

    CERN Document Server

    Juto, Zakarias

    2015-01-01

    This document describes the prototyping of a new solution for the CERN batch authentication of long running jobs. While the job submission requires valid user credentials, these have to be renewed due to long queuing and execution times. Described within is a new system which will guarantee a similar level of security as the old LSFAuth while simplifying the implementation and the overall architecture. The new system is being built on solid, streamlined and tested components (notably OpenSSL) and a priority has been to make it more generic in order to facilitate the evolution of the current system such as for the expected migration from LSF to Condor as backend batch system.

  9. PROOF on a Batch System

    International Nuclear Information System (INIS)

    Behrenhoff, W; Ehrenfeld, W; Samson, J; Stadie, H

    2011-01-01

    The 'parallel ROOT facility' (PROOF) from the ROOT framework provides a mechanism to distribute the load of interactive and non-interactive ROOT sessions on a set of worker nodes optimising the overall execution time. While PROOF is designed to work on a dedicated PROOF cluster, the benefits of PROOF can also be used on top of another batch scheduling system with the help of temporary per user PROOF clusters. We will present a lightweight tool which starts a temporary PROOF cluster on a SGE based batch cluster or, via a plugin mechanism, e.g. on a set of bare desktops via ssh. Further, we will present the result of benchmarks which compare the data throughput for different data storage back ends available at the German National Analysis Facility (NAF) at DESY.

  10. Open-source chemogenomic data-driven algorithms for predicting drug-target interactions.

    Science.gov (United States)

    Hao, Ming; Bryant, Stephen H; Wang, Yanli

    2018-02-06

    While novel technologies such as high-throughput screening have advanced together with significant investment by pharmaceutical companies during the past decades, the success rate for drug development has not yet improved, prompting researchers to look for new strategies of drug discovery. Drug repositioning is a potential approach to solve this dilemma. However, experimental identification and validation of potential drug targets encoded by the human genome is both costly and time-consuming. Therefore, effective computational approaches have been proposed to facilitate drug repositioning, which have proved to be successful in drug discovery. Doubtlessly, the availability of openly accessible data from basic chemical biology research and the success of human genome sequencing are crucial to developing effective in silico drug repositioning methods that allow the identification of potential targets for existing drugs. In this work, we review several chemogenomic data-driven computational algorithms with publicly accessible source code for predicting drug-target interactions (DTIs). We organize these algorithms by model properties and model evolutionary relationships. We re-implemented five representative algorithms in the R programming language, and compared these algorithms by means of mean percentile ranking, a new recall-based evaluation metric in the DTI prediction research field. We anticipate that this review will be objective and helpful to researchers who would like to further improve existing algorithms or need to choose appropriate algorithms to infer potential DTIs in their projects. The source codes for DTI predictions are available at: https://github.com/minghao2016/chemogenomicAlg4DTIpred. Published by Oxford University Press 2018. This work is written by US Government employees and is in the public domain in the US.
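
    The abstract names mean percentile ranking as its evaluation metric without defining it. The sketch below uses one common definition borrowed from the recommender-systems literature, which may differ from the paper's exact formulation: for each held-out true target, take its position in the drug's score-ranked target list as a percentile (0 at the top, 1 at the bottom) and average. The function name and the dictionary layout are hypothetical.

```python
def mean_percentile_ranking(scores, known_targets):
    """Average percentile rank of held-out true targets.

    scores:        {drug: {target: predicted score}} (assumed layout)
    known_targets: {drug: set of held-out true targets}
    Returns a value in [0, 1]; lower means true targets rank higher.
    """
    percentiles = []
    for drug, true_targets in known_targets.items():
        drug_scores = scores[drug]
        # Highest predicted score first.
        ranked = sorted(drug_scores, key=drug_scores.get, reverse=True)
        denom = max(len(ranked) - 1, 1)
        for target in true_targets:
            percentiles.append(ranked.index(target) / denom)
    if not percentiles:
        raise ValueError("no held-out targets to evaluate")
    return sum(percentiles) / len(percentiles)


toy_scores = {"d1": {"t1": 0.9, "t2": 0.5, "t3": 0.1}}
best = mean_percentile_ranking(toy_scores, {"d1": {"t1"}})
middle = mean_percentile_ranking(toy_scores, {"d1": {"t2"}})
worst = mean_percentile_ranking(toy_scores, {"d1": {"t3"}})
```

    Under this definition a value near 0 means the true targets sit near the top of the predictions, and a value near 0.5 is what random ordering would give on average.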

  11. Experimental investigation of the predictive capabilities of data driven modeling techniques in hydrology - Part 2: Application

    Directory of Open Access Journals (Sweden)

    A. Elshorbagy

    2010-10-01

    In this second part of the two-part paper, the data driven modeling (DDM) experiment, presented and explained in the first part, is implemented. Inputs for the five case studies (half-hourly actual evapotranspiration, daily peat soil moisture, daily till soil moisture, and two daily rainfall-runoff datasets) are identified, either based on previous studies or using the mutual information content. Twelve groups (realizations) were randomly generated from each dataset by randomly sampling without replacement from the original dataset. Neural networks (ANNs), genetic programming (GP), evolutionary polynomial regression (EPR), support vector machines (SVM), M5 model trees (M5), K-nearest neighbors (K-nn), and multiple linear regression (MLR) techniques are implemented and applied to each of the 12 realizations of each case study. The predictive accuracy and uncertainties of the various techniques are assessed using multiple average overall error measures, scatter plots, frequency distribution of model residuals, and the deterioration rate of prediction performance during the testing phase. The Gamma test is used as a guide to assist in selecting the appropriate modeling technique. Unlike the two nonlinear soil moisture case studies, the results of the experiment conducted in this research study show that ANNs were a sub-optimal choice for the actual evapotranspiration and the two rainfall-runoff case studies. GP is the most successful technique due to its ability to adapt the model complexity to the modeled data. EPR performance could be close to GP with datasets that are more linear than nonlinear. SVM is sensitive to the kernel choice and, if the kernel is appropriately selected, the performance of SVM can improve. M5 performs very well with linear and semi-linear data, which cover a wide range of hydrological situations. In highly nonlinear case studies, ANNs, K-nn, and GP could be more successful than other modeling techniques. K-nn is also successful in linear situations, and it

  12. Experimental investigation of the predictive capabilities of data driven modeling techniques in hydrology - Part 2: Application

    Science.gov (United States)

    Elshorbagy, A.; Corzo, G.; Srinivasulu, S.; Solomatine, D. P.

    2010-10-01

    In this second part of the two-part paper, the data driven modeling (DDM) experiment, presented and explained in the first part, is implemented. Inputs for the five case studies (half-hourly actual evapotranspiration, daily peat soil moisture, daily till soil moisture, and two daily rainfall-runoff datasets) are identified, either based on previous studies or using the mutual information content. Twelve groups (realizations) were randomly generated from each dataset by randomly sampling without replacement from the original dataset. Neural networks (ANNs), genetic programming (GP), evolutionary polynomial regression (EPR), support vector machines (SVM), M5 model trees (M5), K-nearest neighbors (K-nn), and multiple linear regression (MLR) techniques are implemented and applied to each of the 12 realizations of each case study. The predictive accuracy and uncertainties of the various techniques are assessed using multiple average overall error measures, scatter plots, frequency distribution of model residuals, and the deterioration rate of prediction performance during the testing phase. The Gamma test is used as a guide to assist in selecting the appropriate modeling technique. Unlike the two nonlinear soil moisture case studies, the results of the experiment conducted in this research study show that ANNs were a sub-optimal choice for the actual evapotranspiration and the two rainfall-runoff case studies. GP is the most successful technique due to its ability to adapt the model complexity to the modeled data. EPR performance could be close to GP with datasets that are more linear than nonlinear. SVM is sensitive to the kernel choice and, if the kernel is appropriately selected, the performance of SVM can improve. M5 performs very well with linear and semi-linear data, which cover a wide range of hydrological situations. In highly nonlinear case studies, ANNs, K-nn, and GP could be more successful than other modeling techniques. K-nn is also successful in linear situations, and it should
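
    The resampling step in this abstract (twelve realizations per dataset, each drawn by sampling without replacement) can be sketched as below. The function name, the default subset size of half the records and the fixed seed are illustrative assumptions; the abstract does not state how large each realization was.

```python
import random


def make_realizations(records, n_realizations=12, sample_size=None, seed=0):
    """Draw independent subsets, each sampled without replacement.

    sample_size defaults to half of the records, which is an assumed
    value; the seed only makes the illustration reproducible.
    """
    rng = random.Random(seed)
    if sample_size is None:
        sample_size = len(records) // 2
    # rng.sample never repeats an element within a single draw.
    return [rng.sample(records, sample_size) for _ in range(n_realizations)]


records = list(range(100))  # stand-in for one hydrological dataset
groups = make_realizations(records)
```

    Each realization contains no repeated record, while the same record may appear in several realizations, so every modeling technique can be fitted and tested on all twelve groups to gauge the spread of its performance.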

  13. Data-driven analysis of simultaneous EEG/fMRI reveals neurophysiological phenotypes of impulse control.

    Science.gov (United States)

    Schmüser, Lena; Sebastian, Alexandra; Mobascher, Arian; Lieb, Klaus; Feige, Bernd; Tüscher, Oliver

    2016-09-01

    Response inhibition is the ability to suppress inadequate but prepotent or ongoing response tendencies. A fronto-striatal network is involved in these processes. Between-subject differences in the intra-individual variability have been suggested to constitute a key to pathological processes underlying impulse control disorders. Single-trial EEG/fMRI analysis allows to increase sensitivity for inter-individual differences by incorporating intra-individual variability. Thirty-eight healthy subjects performed a visual Go/Nogo task during simultaneous EEG/fMRI. Of 38 healthy subjects, 21 subjects reliably showed Nogo-related ICs (Nogo-IC-positive) while 17 subjects (Nogo-IC-negative) did not. Comparing both groups revealed differences on various levels: On trait level, Nogo-IC-negative subjects scored higher on questionnaires regarding attention deficit/hyperactivity disorder; on a behavioral level, they displayed slower response times (RT) and higher intra-individual RT variability while both groups did not differ in their inhibitory performance. On the neurophysiological level, Nogo-IC-negative subjects showed a hyperactivation of left inferior frontal cortex/insula and left putamen as well as significantly reduced P3 amplitudes. Thus, a data-driven approach for IC classification and the resulting presence or absence of early Nogo-specific ICs as criterion for group selection revealed group differences at behavioral and neurophysiological levels. This may indicate electrophysiological phenotypes characterized by inter-individual variations of neural and behavioral correlates of impulse control. We demonstrated that the inter-individual difference in an electrophysiological correlate of response inhibition is correlated with distinct, potentially compensatory neural activity. This may suggest the existence of electrophysiologically dissociable phenotypes of behavioral and neural motor response inhibition with the Nogo-IC-positive phenotype possibly providing

  14. Access Control with Delegated Authorization Policy Evaluation for Data-Driven Microservice Workflows

    Directory of Open Access Journals (Sweden)

    Davy Preuveneers

    2017-09-01

    Microservices offer a compelling competitive advantage for building data flow systems as a choreography of self-contained data endpoints that each implement a specific data processing functionality. Such a ‘single responsibility principle’ design makes them well suited for constructing scalable and flexible data integration and real-time data flow applications. In this paper, we investigate microservice based data processing workflows from a security point of view, i.e., (1) how to constrain data processing workflows with respect to dynamic authorization policies granting or denying access to certain microservice results depending on the flow of the data; (2) how to let multiple microservices contribute to a collective data-driven authorization decision; and (3) how to put adequate measures in place such that the data within each individual microservice is protected against illegitimate access from unauthorized users or other microservices. Due to this multifold objective, enforcing access control on the data endpoints to prevent information leakage or preserve one’s privacy becomes far more challenging, as authorization policies can have dependencies and decision outcomes cross-cutting data in multiple microservices. To address this challenge, we present and evaluate a workflow-oriented authorization framework that enforces authorization policies in a decentralized manner and where the delegated policy evaluation leverages feature toggles that are managed at runtime by software circuit breakers to secure the distributed data processing workflows. The benefit of our solution is that, on the one hand, authorization policies restrict access to the data endpoints of the microservices, and on the other hand, microservices can safely rely on other data endpoints to collectively evaluate cross-cutting access control decisions without having to rely on a shared storage backend holding all the necessary information for the policy evaluation.

  15. A data-driven approach for denoising GNSS position time series

    Science.gov (United States)

    Li, Yanyan; Xu, Caijun; Yi, Lei; Fang, Rongxin

    2017-12-01

    Global navigation satellite system (GNSS) datasets suffer from common mode error (CME) and other unmodeled errors. To decrease the noise level in GNSS positioning, we propose a new data-driven adaptive multiscale denoising method in this paper. Both synthetic and real-world long-term GNSS datasets were employed to assess the performance of the proposed method, and its results were compared with those of stacking filtering, principal component analysis (PCA) and the recently developed multiscale multiway PCA. It is found that the proposed method can significantly eliminate the high-frequency white noise and remove the low-frequency CME. Furthermore, the proposed method is more precise for denoising GNSS signals than the other denoising methods. For example, in the real-world example, our method reduces the mean standard deviation of the north, east and vertical components from 1.54 to 0.26, 1.64 to 0.21 and 4.80 to 0.72 mm, respectively. Noise analysis indicates that for the original signals, a combination of power-law plus white noise model can be identified as the best noise model. For the filtered time series using our method, the generalized Gauss-Markov model is the best noise model with the spectral indices close to - 3, indicating that flicker walk noise can be identified. Moreover, the common mode error in the unfiltered time series is significantly reduced by the proposed method. After filtering with our method, a combination of power-law plus white noise model is the best noise model for the CMEs in the study region.
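
    The abstract above compares the proposed method against stacking filtering. As a rough illustration of that baseline only (not the authors' adaptive multiscale method), the sketch below estimates the common mode error at each epoch as the unweighted mean of the station residuals and subtracts it. The function name `stacking_filter`, the equal weighting and the toy numbers are assumptions made for this example.

```python
def stacking_filter(residuals):
    """Unweighted stacking filter (illustrative baseline, assumed form).

    residuals: list of per-station residual series (mm), all the same length.
    Returns (cme, filtered), where cme[t] is the network-wide mean residual
    at epoch t and filtered[s][t] is the residual of station s with that
    common mode estimate removed.
    """
    n_stations = len(residuals)
    n_epochs = len(residuals[0])
    # Common mode error: average over stations at each epoch.
    cme = [sum(station[t] for station in residuals) / n_stations
           for t in range(n_epochs)]
    # Remove the common mode estimate from every station series.
    filtered = [[station[t] - cme[t] for t in range(n_epochs)]
                for station in residuals]
    return cme, filtered


if __name__ == "__main__":
    # Two hypothetical stations, three epochs.
    cme, filtered = stacking_filter([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]])
    print(cme)
    print(filtered)
```

    Weighted variants (for example, weighting by formal errors or inter-station distance) are common in practice; the unweighted mean is used here only to keep the idea visible.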

  16. Dynamic Data-Driven Prediction of Lean Blowout in a Swirl-Stabilized Combustor

    Directory of Open Access Journals (Sweden)

    Soumalya Sarkar

    2015-09-01

    This paper addresses dynamic data-driven prediction of lean blowout (LBO) phenomena in confined combustion processes, which are prevalent in many physical applications (e.g., land-based and aircraft gas-turbine engines). The underlying concept is built upon pattern classification and is validated for LBO prediction with time series of chemiluminescence sensor data from a laboratory-scale swirl-stabilized dump combustor. The proposed method of LBO prediction makes use of the theory of symbolic dynamics, where (finite-length) time series data are partitioned to produce symbol strings that, in turn, generate a special class of probabilistic finite state automata (PFSA). These PFSA, called D-Markov machines, have a deterministic algebraic structure and their states are represented by symbol blocks of length D or less, where D is a positive integer. The D-Markov machines are constructed in two steps: (i) state splitting, i.e., the states are split based on their information contents, and (ii) state merging, i.e., two or more states (of possibly different lengths) are merged together to form a new state without any significant loss of the embedded information. The modeling complexity (e.g., number of states) of a D-Markov machine model is observed to be drastically reduced as the combustor approaches LBO. An anomaly measure, based on Kullback-Leibler divergence, is constructed to predict the proximity of LBO. The problem of LBO prediction is posed in a pattern classification setting and the underlying algorithms have been tested on experimental data at different extents of fuel-air premixing and fuel/air ratio. It is shown that, over a wide range of fuel-air premixing, D-Markov machines with D > 1 perform better as predictors of LBO than those with D = 1.
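
    To make the symbolic-dynamics idea in this abstract more concrete, here is a much-simplified sketch: a time series is partitioned into symbols, states are taken to be all symbol blocks of a fixed length D, and a Kullback-Leibler divergence between the state distributions of a reference window and a current window serves as an anomaly score. The state splitting and merging steps described in the abstract are not implemented, and the uniform partition, the smoothing constant `eps` and all function names are assumptions introduced for illustration.

```python
import itertools
import math
from collections import Counter


def symbolize(series, n_symbols, lo, hi):
    """Map each sample to a symbol 0..n_symbols-1 using a uniform partition of [lo, hi]."""
    width = (hi - lo) / n_symbols
    symbols = []
    for x in series:
        k = int((x - lo) / width)
        symbols.append(min(max(k, 0), n_symbols - 1))  # clamp out-of-range samples
    return symbols


def state_distribution(symbols, depth, n_symbols, eps=1e-6):
    """Smoothed probability of each length-`depth` symbol block (fixed-depth states)."""
    counts = Counter(tuple(symbols[i:i + depth])
                     for i in range(len(symbols) - depth + 1))
    states = list(itertools.product(range(n_symbols), repeat=depth))
    total = sum(counts.values()) + eps * len(states)
    # eps keeps every probability positive so the divergence stays finite.
    return {s: (counts.get(s, 0) + eps) / total for s in states}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) over a shared set of states."""
    return sum(p[s] * math.log(p[s] / q[s]) for s in p)


if __name__ == "__main__":
    reference = symbolize([0.1, 0.9] * 50, n_symbols=2, lo=0.0, hi=1.0)
    current = symbolize([0.1] * 100, n_symbols=2, lo=0.0, hi=1.0)
    p_ref = state_distribution(reference, depth=2, n_symbols=2)
    p_cur = state_distribution(current, depth=2, n_symbols=2)
    print(kl_divergence(p_cur, p_ref))  # larger values mean a larger departure
```

    Setting `depth=1` gives the D = 1 case; larger depths let the states carry more memory of the signal, at the cost of a state count that grows as `n_symbols ** depth`.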

  17. Full field reservoir modeling of shale assets using advanced data-driven analytics

    Directory of Open Access Journals (Sweden)

    Soodabeh Esmaili

    2016-01-01

    Hydrocarbon production from shale has attracted much attention in recent years. When applied to these prolific and hydrocarbon-rich resource plays, our understanding of the complexities of the flow mechanism (sorption process and flow behavior in complex fracture systems, induced or natural) leaves much to be desired. In this paper, we present and discuss a novel approach to modeling and history matching of hydrocarbon production from a Marcellus shale asset in southwestern Pennsylvania using advanced data mining, pattern recognition and machine learning technologies. In this new approach, instead of imposing our understanding of the flow mechanism, the impact of multi-stage hydraulic fractures, and the production process on the reservoir model, we allow the production history, well log, completion and hydraulic fracturing data to guide our model and determine its behavior. The uniqueness of this technology is that it incorporates the so-called “hard data” directly into the reservoir model, so that the model can be used to optimize the hydraulic fracture process. The “hard data” refers to field measurements during the hydraulic fracturing process such as fluid and proppant type and amount, injection pressure and rate as well as proppant concentration. This novel approach contrasts with the current industry focus on the use of “soft data” (non-measured, interpretive data such as frac length, width, height and conductivity) in the reservoir models. The study focuses on a Marcellus shale asset that includes 135 wells with multiple pads, different landing targets, well length and reservoir properties. The full field history matching process was successfully completed using this data driven approach thus capturing the production behavior with acceptable accuracy for individual wells and for the entire asset.

  18. Assigning clinical codes with data-driven concept representation on Dutch clinical free text.

    Science.gov (United States)

    Scheurwegs, Elyne; Luyckx, Kim; Luyten, Léon; Goethals, Bart; Daelemans, Walter

    2017-05-01

    Clinical codes are used for public reporting purposes, are fundamental to determining public financing for hospitals, and form the basis for reimbursement claims to insurance providers. They are assigned to a patient stay to reflect the diagnosis and performed procedures during that stay. This paper aims to enrich algorithms for automated clinical coding by taking a data-driven approach and by using unsupervised and semi-supervised techniques for the extraction of multi-word expressions that convey a generalisable medical meaning (referred to as concepts). Several methods for extracting concepts from text are compared, two of which are constructed from a large unannotated corpus of clinical free text. A distributional semantic model (i.c. the word2vec skip-gram model) is used to generalize over concepts and retrieve relations between them. These methods are validated on three sets of patient stay data, in the disease areas of urology, cardiology, and gastroenterology. The datasets are in Dutch, which introduces a limitation on available concept definitions from expert-based ontologies (e.g. UMLS). The results show that when expert-based knowledge in ontologies is unavailable, concepts derived from raw clinical texts are a reliable alternative. Both concepts derived from raw clinical texts and concepts derived from expert-created dictionaries outperform a bag-of-words approach in clinical code assignment. Adding features based on tokens that appear in a semantically similar context has a positive influence for predicting diagnostic codes. Furthermore, the experiments indicate that a distributional semantics model can find relations between semantically related concepts in texts but also introduces erroneous and redundant relations, which can undermine clinical coding performance. Copyright © 2017. Published by Elsevier Inc.

  19. WaveSeq: a novel data-driven method of detecting histone modification enrichments using wavelets.

    Directory of Open Access Journals (Sweden)

    Apratim Mitra

    BACKGROUND: Chromatin immunoprecipitation followed by next-generation sequencing is a genome-wide analysis technique that can be used to detect various epigenetic phenomena such as transcription factor binding sites and histone modifications. Histone modification profiles can be either punctate or diffuse, which makes it difficult to distinguish regions of enrichment from background noise. With the discovery of histone marks having a wide variety of enrichment patterns, there is an urgent need for analysis methods that are robust to various data characteristics and capable of detecting a broad range of enrichment patterns. RESULTS: To address these challenges we propose WaveSeq, a novel data-driven method of detecting regions of significant enrichment in ChIP-Seq data. Our approach utilizes the wavelet transform, is free of distributional assumptions and is robust to diverse data characteristics such as low signal-to-noise ratios and broad enrichment patterns. Using publicly available datasets we showed that WaveSeq compares favorably with other published methods, exhibiting high sensitivity and precision for both punctate and diffuse enrichment regions even in the absence of a control data set. The application of our algorithm to a complex histone modification data set helped make novel functional discoveries which further underlined its utility in such an experimental setup. CONCLUSIONS: WaveSeq is a highly sensitive method capable of accurate identification of enriched regions in a broad range of data sets. WaveSeq can detect both narrow and broad peaks with a high degree of accuracy even in low signal-to-noise ratio data sets. WaveSeq is also suited for application in complex experimental scenarios, helping make biologically relevant functional discoveries.

  20. Data Science and its Relationship to Big Data and Data-Driven Decision Making.

    Science.gov (United States)

    Provost, Foster; Fawcett, Tom

    2013-03-01

    Companies have realized they need to hire data scientists, academic institutions are scrambling to put together data-science programs, and publications are touting data science as a hot, even "sexy", career choice. However, there is confusion about what exactly data science is, and this confusion could lead to disillusionment as the concept diffuses into meaningless buzz. In this article, we argue that there are good reasons why it has been hard to pin down exactly what is data science. One reason is that data science is intricately intertwined with other important concepts also of growing importance, such as big data and data-driven decision making. Another reason is the natural tendency to associate what a practitioner does with the definition of the practitioner's field; this can result in overlooking the fundamentals of the field. We believe that trying to define the boundaries of data science precisely is not of the utmost importance. We can debate the boundaries of the field in an academic setting, but in order for data science to serve business effectively, it is important (i) to understand its relationships to other important related concepts, and (ii) to begin to identify the fundamental principles underlying data science. Once we embrace (ii), we can much better understand and explain exactly what data science has to offer. Furthermore, only once we embrace (ii) should we be comfortable calling it data science. In this article, we present a perspective that addresses all these concepts. We close by offering, as examples, a partial list of fundamental principles underlying data science.

  1. Practical aspects of data-driven motion correction approach for brain SPECT

    International Nuclear Information System (INIS)

    Kyme, A.Z.; Hutton, B.F.; Hatton, R.L.; Skerrett, D.; Barnden, L.

    2002-01-01

    Full text: Patient motion can cause image artifacts in SPECT despite restraining measures. Data-driven detection and correction of motion can be achieved by comparison of acquired data with the forward-projections. By optimising the orientation of a partial reconstruction, parameters can be obtained for each misaligned projection and applied to update this volume using a 3D reconstruction algorithm. Phantom validation was performed to explore practical aspects of this approach. Noisy projection datasets simulating a patient undergoing at least one fully 3D movement during acquisition were compiled from various projections of the digital Hoffman brain phantom. Motion correction was then applied to the reconstructed studies. Correction success was assessed visually and quantitatively. Resilience with respect to subset order and missing data in the reconstruction and updating stages, detector geometry considerations, and the need for implementing an iterated correction were assessed in the process. Effective correction of the corrupted studies was achieved. Visually, artifactual regions in the reconstructed slices were suppressed and/or removed. Typically the ratio of mean square difference between the corrected and reference studies compared to that between the corrupted and reference studies was > 2. Although components of the motions are missed using a single-head implementation, improvement was still evident in the correction. The need for multiple iterations in the approach was small due to the bulk of misalignment errors being corrected in the first pass. Dispersion of subsets for reconstructing and updating the partial reconstruction appears to give optimal correction. Further validation is underway using triple-head physical phantom data. Copyright (2002) The Australian and New Zealand Society of Nuclear Medicine Inc

  2. Data-driven haemodynamic response function extraction using Fourier-wavelet regularised deconvolution

    Directory of Open Access Journals (Sweden)

    Roerdink Jos BTM

    2008-04-01

    Background: We present a simple, data-driven method to extract haemodynamic response functions (HRF) from functional magnetic resonance imaging (fMRI) time series, based on the Fourier-wavelet regularised deconvolution (ForWaRD) technique. HRF data are required for many fMRI applications, such as defining region-specific HRFs, efficiently representing a general HRF, or comparing subject-specific HRFs. Results: ForWaRD is applied to fMRI time signals, after removing low-frequency trends by a wavelet-based method, and the output of ForWaRD is a time series of volumes, containing the HRF in each voxel. Compared to more complex methods, this extraction algorithm requires few assumptions (separability of signal and noise in the frequency and wavelet domains and the general linear model) and it is fast (HRF extraction from a single fMRI data set takes about the same time as spatial resampling). The extraction method is tested on simulated event-related activation signals, contaminated with noise from a time series of real MRI images. An application for HRF data is demonstrated in a simple event-related experiment: data are extracted from a region with significant effects of interest in a first time series. A continuous-time HRF is obtained by fitting a nonlinear function to the discrete HRF coefficients, and is then used to analyse a later time series. Conclusion: With the parameters used in this paper, the extraction method presented here is very robust to changes in signal properties. Comparison of analyses with fitted HRFs and with a canonical HRF shows that a subject-specific, regional HRF significantly improves detection power. Sensitivity and specificity increase not only in the region from which the HRFs are extracted, but also in other regions of interest.

  3. Management and Nonlinear Analysis of Disinfection System of Water Distribution Networks Using Data Driven Methods

    Directory of Open Access Journals (Sweden)

    Mohammad Zounemat-Kermani

    2018-03-01

    Chlorination units are widely used to supply safe drinking water and remove pathogens from water distribution networks. A data-driven approach is one appropriate method for analyzing the performance of chlorine in a water supply network. In this study, a multi-layer perceptron neural network (MLP) with three training algorithms (gradient descent, conjugate gradient and BFGS) and a support vector machine (SVM) with an RBF kernel function were used to predict the concentration of residual chlorine in the water supply networks of Ahmadabad Dafeh and Ahruiyeh villages in Kerman Province. Daily data including discharge (flow), chlorine consumption and residual chlorine were employed from the beginning of 1391 Hijri until the end of 1393 Hijri (for 3 years). To assess the performance of the studied models, criteria such as Nash-Sutcliffe efficiency (NS), root mean square error (RMSE), mean absolute percentage error (MAPE) and correlation coefficient (CORR) were used; in the best modeling situation, which resulted from the BFGS algorithm, these were 0.9484, 0.0255, 1.081, and 0.974 respectively. The criteria indicated that the MLP models with BFGS and conjugate gradient algorithms were better than all other models in 90 and 10 percent of cases respectively, while the MLP model based on the gradient descent algorithm and the SVM model were better in none of the cases. According to the results of this study, proper management of chlorine concentration can be implemented by predicted values of residual chlorine in the water supply network. Thus, decreased performance of the perceptron network and support vector machine in the water supply network of Ahruiyeh in comparison to Ahmadabad Dafeh can be inferred from improper management of chlorination.
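
    The four evaluation criteria named in this abstract have standard textbook definitions, which the following sketch implements in plain Python. It is not the authors' code. The function names and the sample values are illustrative assumptions, and MAPE is expressed here in percent, which may or may not match the convention behind the reported 1.081.

```python
import math


def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean of obs."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


def rmse(obs, sim):
    """Root mean square error, in the units of the observations."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mape(obs, sim):
    """Mean absolute percentage error in percent (observations must be non-zero)."""
    return 100.0 * sum(abs((o - s) / o) for o, s in zip(obs, sim)) / len(obs)


def corr(obs, sim):
    """Pearson correlation coefficient between observed and simulated values."""
    mean_obs = sum(obs) / len(obs)
    mean_sim = sum(sim) / len(sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    norm_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in obs))
    norm_sim = math.sqrt(sum((s - mean_sim) ** 2 for s in sim))
    return cov / (norm_obs * norm_sim)


if __name__ == "__main__":
    # Hypothetical residual chlorine values (mg/L): observed vs. predicted.
    observed = [0.50, 0.60, 0.70, 0.80]
    predicted = [0.52, 0.58, 0.71, 0.79]
    print(nash_sutcliffe(observed, predicted), rmse(observed, predicted),
          mape(observed, predicted), corr(observed, predicted))
```

    NS and CORR are dimensionless and best near 1, while RMSE and MAPE are best near 0, so the four numbers should be read together rather than ranked on a single scale.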

  4. Simulation of shallow groundwater levels: Comparison of a data-driven and a conceptual model

    Science.gov (United States)

    Fahle, Marcus; Dietrich, Ottfried; Lischeid, Gunnar

    2015-04-01

    Despite an abundance of models aimed at simulating shallow groundwater levels, application of such models is often hampered by a lack of appropriate input data. Difficulties especially arise with regard to soil data, which are typically hard to obtain and prone to spatial variability, eventually leading to uncertainties in the model results. Modelling approaches relying entirely on easily measured quantities are therefore an alternative to encourage the applicability of models. We present and compare two models for calculating 1-day-ahead predictions of the groundwater level that are only based on measurements of potential evapotranspiration, precipitation and groundwater levels. The first model is a newly developed conceptual model that is parametrized using the White method (which estimates the actual evapotranspiration on basis of diurnal groundwater fluctuations) and a rainfall-response ratio. Inverted versions of the two latter approaches are then used to calculate the predictions of the groundwater level. Furthermore, as a completely data-driven alternative, a simple feed-forward multilayer perceptron neural network was trained based on the same inputs and outputs. Data of 4 growing periods (April to October) from a study site situated in the Spreewald wetland in North-east Germany were taken to set-up the models and compare their performance. In addition, response surfaces that relate model outputs to combinations of different input variables are used to reveal those aspects in which the two approaches coincide and those in which they differ. Finally, it will be evaluated whether the conceptual approach can be enhanced by extracting knowledge of the neural network. This is done by replacing in the conceptual model the default function that relates groundwater recharge and groundwater level, which is assumed to be linear, by the non-linear function extracted from the neural network.

  5. Alaska/Yukon Geoid Improvement by a Data-Driven Stokes's Kernel Modification Approach

    Science.gov (United States)

    Li, Xiaopeng; Roman, Daniel R.

    2015-04-01

    Geoid modeling over Alaska (USA) and Yukon (Canada), being a trans-national issue, faces a great challenge primarily due to the inhomogeneous surface gravity data (Saleh et al, 2013) and the dynamic geology (Freymueller et al, 2008) as well as its complex geological rheology. A previous study (Roman and Li 2014) used updated satellite models (Bruinsma et al 2013) and newly acquired aerogravity data from the GRAV-D project (Smith 2007) to capture the gravity field changes in the target areas, primarily in the middle-to-long wavelengths. In CONUS, the geoid model was largely improved. However, the precision of the resulting geoid model in Alaska was still at the decimeter level: 19cm at the 32 tide bench marks and 24cm on the 202 GPS/Leveling bench marks, which gives a total of 23.8cm at all of these calibrated surface control points, where the datum bias was removed. Conventional kernel modification methods in this area (Li and Wang 2011) had limited effects on improving the precision of the geoid models. To compensate for the geoid misfits, a new Stokes's kernel modification method based on a data-driven technique is presented in this study. First, the method was tested on simulated data sets (Fig. 1), where the geoid errors have been reduced by 2 orders of magnitude (Fig 2). For the real data sets, some iteration steps are required to overcome the rank deficiency problem caused by the limited control data that are irregularly distributed in the target area. For instance, after 3 iterations, the standard deviation dropped about 2.7cm (Fig 3). Modification at other critical degrees can further minimize the geoid model misfits caused either by the gravity error or the remaining datum error in the control points.

  6. Data Driven Trigger Design and Analysis for the NOvA Experiment

    Energy Technology Data Exchange (ETDEWEB)

    Kurbanov, Serdar [Univ. of Virginia, Charlottesville, VA (United States)

    2016-01-01

    This thesis primarily describes analysis related to studying the Moon shadow with cosmic rays, an analysis using upward-going muons trigger data, and other work done as part of MSc thesis work conducted at Fermi National Laboratory. While at Fermilab I made hardware and software contributions to two experiments - NOvA and Mu2e. NOvA is a neutrino experiment with the primary goal of measuring parameters related to neutrino oscillation. This is a running experiment, so it's possible to provide analysis of real beam and cosmic data. Most of this work was related to the Data-Driven Trigger (DDT) system of NOvA. The results of the Upward-Going muon analysis were presented at ICHEP in August 2016. The analysis demonstrates the proof of principle for a low-mass dark matter search. Mu2e is an experiment currently being built at Fermilab. Its primary goal is to detect the hypothetical neutrinoless conversion from a muon into an electron. I contributed to the production and tests of Cathode Strip Chambers (CSCs) which are required for testing the Cosmic Ray Veto (CRV) system for the experiment. This contribution is described in the last chapter along with a short description of the technical work provided for the DDT system of the NOvA experiment. All of the work described in this thesis will be extended by the next generation of UVA graduate students and postdocs as new data is collected by the experiment. I hope my efforts have helped lay the foundation for many years of beautiful results from Mu2e and NOvA.

  7. Data-driven nutrient analysis and reality check: Human inputs, catchment delivery and management effects

    Science.gov (United States)

    Destouni, G.

    2017-12-01

    Measures for mitigating nutrient loads to aquatic ecosystems should have observable effects, e.g., in the Baltic region after joint first periods of nutrient management actions under the Baltic Sea Action Plan (BSAP; since 2007) and the EU Water Framework Directive (WFD; since 2009). Looking for such observable effects, all openly available water and nutrient monitoring data since 2003 are compiled and analyzed for Sweden as a case study. Results show that hydro-climatically driven water discharge dominates the determination of waterborne loads of both phosphorus and nitrogen. Furthermore, the nutrient loads and water discharge are all similarly well correlated with the ecosystem status classification of Swedish water bodies according to the WFD. Nutrient concentrations, which are hydro-climatically correlated and should thus reflect human effects better than loads, have changed only slightly over the study period (2003-2013) and even increased in moderate-to-bad status waters, where the WFD and BSAP jointly target nutrient decreases. These results indicate insufficient distinction and mitigation of human-driven nutrient components by the internationally harmonized applications of both the WFD and the BSAP. Aiming for better general identification of such components, nutrient data for the large transboundary catchments of the Baltic Sea and the Sava River are compared. The comparison shows cross-regional consistency in nutrient relationships to driving hydro-climatic conditions (water discharge) for nutrient loads, and socio-economic conditions (population density and farmland share) for nutrient concentrations. A data-driven screening methodology is further developed for estimating nutrient input and retention-delivery in catchments. Its first application to nested Sava River catchments identifies characteristic regional values of nutrient input per area and relative delivery, and hotspots of much larger inputs, related to urban high-population areas.

  8. A review on data-driven fault severity assessment in rolling bearings

    Science.gov (United States)

    Cerrada, Mariela; Sánchez, René-Vinicio; Li, Chuan; Pacheco, Fannia; Cabrera, Diego; Valente de Oliveira, José; Vásquez, Rafael E.

    2018-01-01

    Health condition monitoring of rotating machinery is a crucial task to guarantee reliability in industrial processes. In particular, bearings are mechanical components used in most rotating devices and they represent the main source of faults in such equipment, which is why research activities on detecting and diagnosing their faults have increased. Fault detection aims at identifying whether or not the device is in a fault condition, and diagnosis is commonly oriented towards identifying the fault mode of the device, after detection. An important step after fault detection and diagnosis is the analysis of the magnitude or the degradation level of the fault, because this supports the decision-making process in condition-based maintenance. However, few works are devoted to analysing this problem, and some tackle it only from the fault diagnosis point of view. Roughly speaking, fault severity is associated with the magnitude of the fault. In bearings, fault severity can be related to the physical size of the fault or a general degradation of the component. Because literature regarding the severity assessment of bearing damage is limited, this paper aims at discussing the recent methods and techniques used to achieve fault severity evaluation in the main components of rolling bearings, such as inner race, outer race, and ball. The review is mainly focused on data-driven approaches such as signal processing for extracting the proper fault signatures associated with the damage degradation, and learning approaches that are used to identify degradation patterns with regard to health conditions. Finally, new challenges are highlighted in order to develop new contributions in this field.

  9. Data-Driven Modeling of Complex Systems by means of a Dynamical ANN

    Science.gov (United States)

    Seleznev, A.; Mukhin, D.; Gavrilov, A.; Loskutov, E.; Feigin, A.

    2017-12-01

    Data-driven methods for modeling and prognosis of complex dynamical systems are becoming more and more popular in various fields due to the growth of high-resolution data. We distinguish two basic steps in such an approach: (i) determining the phase subspace of the system, or embedding, from available time series and (ii) constructing an evolution operator acting in this reduced subspace. In this work we suggest a novel approach combining these two steps by constructing an artificial neural network (ANN) with a special topology. The proposed ANN-based model, on the one hand, projects the data onto a low-dimensional manifold, and, on the other hand, models a dynamical system on this manifold. In effect, this is a recurrent multilayer ANN which has internal dynamics and is capable of generating time series. A very important point of the proposed methodology is the optimization of the model, which allows us to avoid overfitting: we use a Bayesian criterion to optimize the ANN structure and estimate both the degree of evolution operator nonlinearity and the complexity of the nonlinear manifold onto which the data are projected. The proposed modeling technique will be applied to the analysis of high-dimensional dynamical systems: the Lorenz'96 model of atmospheric turbulence, producing high-dimensional space-time chaos, and a quasi-geostrophic three-layer model of the Earth's atmosphere with natural orography, describing the dynamics of synoptic vortices as well as mesoscale blocking systems. The possibility of applying the proposed methodology to analyze real measured data is also discussed. The study was supported by the Russian Science Foundation (grant #16-12-10198).

  10. Data-driven, Interpretable Photometric Redshifts Trained on Heterogeneous and Unrepresentative Data

    Energy Technology Data Exchange (ETDEWEB)

    Leistedt, Boris; Hogg, David W., E-mail: boris.leistedt@nyu.edu, E-mail: david.hogg@nyu.edu [Center for Cosmology and Particle Physics, Department of Physics, New York University, New York, NY 10003 (United States)

    2017-03-20

    We present a new method for inferring photometric redshifts in deep galaxy and quasar surveys, based on a data-driven model of latent spectral energy distributions (SEDs) and a physical model of photometric fluxes as a function of redshift. This conceptually novel approach combines the advantages of both machine learning methods and template fitting methods by building template SEDs directly from the spectroscopic training data. This is made computationally tractable with Gaussian processes operating in flux–redshift space, encoding the physics of redshifts and the projection of galaxy SEDs onto photometric bandpasses. This method alleviates the need to acquire representative training data or to construct detailed galaxy SED models; it requires only that the photometric bandpasses and calibrations be known or have parameterized unknowns. The training data can consist of a combination of spectroscopic and deep many-band photometric data with reliable redshifts, which do not need to entirely spatially overlap with the target survey of interest or even involve the same photometric bands. We showcase the method on the i-magnitude-selected, spectroscopically confirmed galaxies in the COSMOS field. The model is trained on the deepest bands (from SUBARU and HST) and photometric redshifts are derived using the shallower SDSS optical bands only. We demonstrate that we obtain accurate redshift point estimates and probability distributions despite the training and target sets having very different redshift distributions, noise properties, and even photometric bands. Our model can also be used to predict missing photometric fluxes or to simulate populations of galaxies with realistic fluxes and redshifts, for example.

  11. Data-driven, Interpretable Photometric Redshifts Trained on Heterogeneous and Unrepresentative Data

    International Nuclear Information System (INIS)

    Leistedt, Boris; Hogg, David W.

    2017-01-01

    We present a new method for inferring photometric redshifts in deep galaxy and quasar surveys, based on a data-driven model of latent spectral energy distributions (SEDs) and a physical model of photometric fluxes as a function of redshift. This conceptually novel approach combines the advantages of both machine learning methods and template fitting methods by building template SEDs directly from the spectroscopic training data. This is made computationally tractable with Gaussian processes operating in flux–redshift space, encoding the physics of redshifts and the projection of galaxy SEDs onto photometric bandpasses. This method alleviates the need to acquire representative training data or to construct detailed galaxy SED models; it requires only that the photometric bandpasses and calibrations be known or have parameterized unknowns. The training data can consist of a combination of spectroscopic and deep many-band photometric data with reliable redshifts, which do not need to entirely spatially overlap with the target survey of interest or even involve the same photometric bands. We showcase the method on the i-magnitude-selected, spectroscopically confirmed galaxies in the COSMOS field. The model is trained on the deepest bands (from SUBARU and HST) and photometric redshifts are derived using the shallower SDSS optical bands only. We demonstrate that we obtain accurate redshift point estimates and probability distributions despite the training and target sets having very different redshift distributions, noise properties, and even photometric bands. Our model can also be used to predict missing photometric fluxes or to simulate populations of galaxies with realistic fluxes and redshifts, for example.

  12. Data-Driven Method for Wind Turbine Yaw Angle Sensor Zero-Point Shifting Fault Detection

    Directory of Open Access Journals (Sweden)

    Yan Pei

    2018-03-01

    Wind turbine yaw control plays an important role in increasing wind turbine production and also in protecting the wind turbine. Accurate measurement of yaw angle is the basis of an effective wind turbine yaw controller. The accuracy of yaw angle measurement is affected significantly by the problem of zero-point shifting. Hence, it is essential to evaluate the zero-point shifting error on wind turbines on-line in order to improve the reliability of yaw angle measurement in real time. In particular, qualitative evaluation of the zero-point shifting error could be useful for wind farm operators to realize prompt and cost-effective maintenance on yaw angle sensors. With the aim of qualitatively evaluating the zero-point shifting error, the yaw angle sensor zero-point shifting fault is first defined in this paper. A data-driven method is then proposed to detect the zero-point shifting fault based on Supervisory Control and Data Acquisition (SCADA) data. The zero-point shifting fault is detected in the proposed method by analyzing the power performance under different yaw angles. The SCADA data are partitioned into different bins according to both wind speed and yaw angle in order to evaluate the power performance in depth. An indicator is proposed in this method for power performance evaluation under each yaw angle. The yaw angle with the largest indicator is considered as the yaw angle measurement error in our work. A zero-point shifting fault would trigger an alarm if the error is larger than a predefined threshold. Case studies from several actual wind farms proved the effectiveness of the proposed method in detecting zero-point shifting fault and also in improving the wind turbine performance. Results of the proposed method could be useful for wind farm operators to realize prompt adjustment if there exists a large error of yaw angle measurement.
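
    The binning-and-indicator procedure described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not define the indicator, so the one used here (mean power in a yaw bin relative to the mean power of the whole wind-speed bin, averaged over wind-speed bins) is an assumption, as are the function name `estimate_yaw_offset` and the `min_count` parameter.

```python
import numpy as np

def estimate_yaw_offset(wind_speed, yaw_angle, power,
                        ws_edges, yaw_edges, min_count=5):
    """Return (yaw_bin_center_with_largest_indicator, indicator_per_yaw_bin).

    Hypothetical indicator (an assumption, not taken from the paper):
    for each wind-speed bin, divide the mean power of every yaw bin by the
    mean power of the whole wind-speed bin, then average these ratios over
    the wind-speed bins.
    """
    wind_speed = np.asarray(wind_speed, dtype=float)
    yaw_angle = np.asarray(yaw_angle, dtype=float)
    power = np.asarray(power, dtype=float)
    ws_edges = np.asarray(ws_edges, dtype=float)
    yaw_edges = np.asarray(yaw_edges, dtype=float)

    # Assign every SCADA sample to a wind-speed bin and a yaw-angle bin.
    ws_idx = np.digitize(wind_speed, ws_edges) - 1
    yaw_idx = np.digitize(yaw_angle, yaw_edges) - 1
    n_ws, n_yaw = len(ws_edges) - 1, len(yaw_edges) - 1

    ratios = np.full((n_ws, n_yaw), np.nan)
    for i in range(n_ws):
        in_ws = ws_idx == i
        if in_ws.sum() < min_count:
            continue  # too few samples in this wind-speed bin
        reference = power[in_ws].mean()
        if reference <= 0:
            continue
        for j in range(n_yaw):
            cell = in_ws & (yaw_idx == j)
            if cell.sum() >= min_count:
                ratios[i, j] = power[cell].mean() / reference

    # Average over wind-speed bins, ignoring empty cells.
    valid = ~np.isnan(ratios)
    counts = valid.sum(axis=0)
    if not counts.any():
        raise ValueError("no bin has enough samples")
    sums = np.where(valid, ratios, 0.0).sum(axis=0)
    indicator = np.full(n_yaw, np.nan)
    indicator[counts > 0] = sums[counts > 0] / counts[counts > 0]

    centers = 0.5 * (yaw_edges[:-1] + yaw_edges[1:])
    return centers[int(np.nanargmax(indicator))], indicator
```

    An alarm in the sense of the abstract would then correspond to comparing the absolute value of the returned yaw angle with a user-chosen threshold.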

  13. Enhancing Transparency and Control When Drawing Data-Driven Inferences About Individuals.

    Science.gov (United States)

    Chen, Daizhuo; Fraiberger, Samuel P; Moakler, Robert; Provost, Foster

    2017-09-01

    Recent studies show the remarkable power of fine-grained information disclosed by users on social network sites to infer users' personal characteristics via predictive modeling. Similar fine-grained data are being used successfully in other commercial applications. In response, attention is turning increasingly to the transparency that organizations provide to users as to what inferences are drawn and why, as well as to what sort of control users can be given over inferences that are drawn about them. In this article, we focus on inferences about personal characteristics based on information disclosed by users' online actions. As a use case, we explore personal inferences that are made possible from "Likes" on Facebook. We first present a means for providing transparency into the information responsible for inferences drawn by data-driven models. We then introduce the "cloaking device," a mechanism for users to inhibit the use of particular pieces of information in inference. Using these analytical tools we ask two main questions: (1) How much information must users cloak to significantly affect inferences about their personal traits? We find that usually users must cloak only a small portion of their actions to inhibit inference. We also find that, encouragingly, false-positive inferences are significantly easier to cloak than true-positive inferences. (2) Can firms change their modeling behavior to make cloaking more difficult? The answer is a definitive yes. We demonstrate a simple modeling change that requires users to cloak substantially more information to affect the inferences drawn. The upshot is that organizations can provide transparency and control even into complicated, predictive model-driven inferences, but they also can make control easier or harder for their users.
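
    One way to picture the cloaking idea is the sketch below. It assumes a linear scoring model over binary "Like" indicators and a greedy removal order; neither is stated in the abstract, and the function name `likes_to_cloak` and its parameters are hypothetical.

```python
def likes_to_cloak(user_likes, weights, bias, threshold):
    """Greedily pick Likes to hide until the model score is <= threshold.

    Assumes (hypothetically) a linear model: score = bias + sum of the
    weights of the Likes the user has. Returns (cloaked_likes, final_score).
    """
    active = set(user_likes)
    score = bias + sum(weights.get(like, 0.0) for like in active)
    cloaked = []
    # Hide the Likes that contribute most evidence first.
    ranked = sorted(active, key=lambda like: weights.get(like, 0.0), reverse=True)
    for like in ranked:
        if score <= threshold:
            break  # the inference is already inhibited
        contribution = weights.get(like, 0.0)
        if contribution <= 0:
            break  # hiding the remaining Likes cannot lower the score
        score -= contribution
        cloaked.append(like)
    return cloaked, score
```

    Under these assumptions, the length of the returned list is one possible measure of how much information a user must cloak to affect a given inference.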

  14. Disruption of functional networks in dyslexia: A whole-brain, data-driven analysis of connectivity

    Science.gov (United States)

    Finn, Emily S.; Shen, Xilin; Holahan, John M.; Scheinost, Dustin; Lacadie, Cheryl; Papademetris, Xenophon; Shaywitz, Sally E.; Shaywitz, Bennett A.; Constable, R. Todd

    2013-01-01

    Background Functional connectivity analyses of fMRI data are a powerful tool for characterizing brain networks and how they are disrupted in neural disorders. However, many such analyses examine only one or a small number of a priori seed regions. Studies that consider the whole brain frequently rely on anatomic atlases to define network nodes, which may result in mixing distinct activation timecourses within a single node. Here, we improve upon previous methods by using a data-driven brain parcellation to compare connectivity profiles of dyslexic (DYS) versus non-impaired (NI) readers in the first whole-brain functional connectivity analysis of dyslexia. Methods Whole-brain connectivity was assessed in children (n = 75; 43 NI, 32 DYS) and adult (n = 104; 64 NI, 40 DYS) readers. Results Compared to NI readers, DYS readers showed divergent connectivity within the visual pathway and between visual association areas and prefrontal attention areas; increased right-hemisphere connectivity; reduced connectivity in the visual word-form area (part of the left fusiform gyrus specialized for printed words); and persistent connectivity to anterior language regions around the inferior frontal gyrus. Conclusions Together, findings suggest that NI readers are better able to integrate visual information and modulate their attention to visual stimuli, allowing them to recognize words based on their visual properties, while DYS readers recruit altered reading circuits and rely on laborious phonology-based “sounding out” strategies into adulthood. These results deepen our understanding of the neural basis of dyslexia and highlight the importance of synchrony between diverse brain regions for successful reading. PMID:24124929

  15. Improving Spoken Language Outcomes for Children With Hearing Loss: Data-driven Instruction.

    Science.gov (United States)

    Douglas, Michael

    2016-02-01

    To assess the effects of data-driven instruction (DDI) on spoken language outcomes of children with cochlear implants and hearing aids. Retrospective, matched-pairs comparison of post-treatment speech/language data of children who did and did not receive DDI. Private, spoken-language preschool for children with hearing loss. Eleven matched pairs of children with cochlear implants who attended the same spoken language preschool. Groups were matched for age of hearing device fitting, time in the program, degree of predevice fitting hearing loss, sex, and age at testing. Daily informal language samples were collected and analyzed over a 2-year period, per preschool protocol. Annual informal and formal spoken language assessments in articulation, vocabulary, and omnibus language were administered at the end of three time intervals: baseline, end of year one, and end of year two. The primary outcome measures were total raw score performance of spontaneous utterance sentence types and syntax element use as measured by the Teacher Assessment of Spoken Language (TASL). In addition, standardized assessments (the Clinical Evaluation of Language Fundamentals--Preschool Version 2 (CELF-P2), the Expressive One-Word Picture Vocabulary Test (EOWPVT), the Receptive One-Word Picture Vocabulary Test (ROWPVT), and the Goldman-Fristoe Test of Articulation 2 (GFTA2)) were also administered and compared with the control group. The DDI group demonstrated significantly higher raw scores on the TASL each year of the study. The DDI group also achieved statistically significant higher scores for total language on the CELF-P and expressive vocabulary on the EOWPVT, but not for articulation nor receptive vocabulary. Post-hoc assessment revealed that 78% of the students in the DDI group achieved scores in the average range compared with 59% in the control group. 
The preliminary results of this study support further investigation regarding DDI to investigate whether this method can consistently

  16. STUDY OF THE POYNTING FLUX IN ACTIVE REGION 10930 USING DATA-DRIVEN MAGNETOHYDRODYNAMIC SIMULATION

    International Nuclear Information System (INIS)

    Fan, Y. L.; Wang, H. N.; He, H.; Zhu, X. S.

    2011-01-01

    Powerful solar flares are closely related to the evolution of magnetic field configuration on the photosphere. We choose the Poynting flux as a parameter in the study of magnetic field changes. We use time-dependent multidimensional MHD simulations around a flare occurrence to generate the results, with the temporal variation of the bottom boundary conditions being deduced from the projected normal characteristic method. By this method, the photospheric magnetogram could be incorporated self-consistently as the bottom condition of data-driven simulations. The model is first applied to a simulation datum produced by an emerging magnetic flux rope as a test case. Then, the model is used to study NOAA AR 10930, which has an X3.4 flare, the data of which has been obtained by the Hinode/Solar Optical Telescope on 2006 December 13. We compute the magnitude of Poynting flux (S_total), radial Poynting flux (S_z), a proxy for ideal radial Poynting flux (S_proxy), Poynting flux due to plasma surface motion (S_sur), and Poynting flux due to plasma emergence (S_emg) and analyze their extensive properties in four selected areas: the whole sunspot, the positive sunspot, the negative sunspot, and the strong-field polarity inversion line (SPIL) area. It is found that (1) the S_total, S_z, and S_proxy parameters show similar behaviors in the whole sunspot area and in the negative sunspot area. The evolutions of these three parameters in the positive area and the SPIL area are more volatile because of the effect of sunspot rotation and flux emergence. (2) The evolution of S_sur is largely influenced by the process of sunspot rotation, especially in the positive sunspot. The evolution of S_emg is greatly affected by flux emergence, especially in the SPIL area.

  17. Central focused convolutional neural networks: Developing a data-driven model for lung nodule segmentation.

    Science.gov (United States)

    Wang, Shuo; Zhou, Mu; Liu, Zaiyi; Liu, Zhenyu; Gu, Dongsheng; Zang, Yali; Dong, Di; Gevaert, Olivier; Tian, Jie

    2017-08-01

    Accurate lung nodule segmentation from computed tomography (CT) images is of great importance for image-driven lung cancer analysis. However, the heterogeneity of lung nodules and the presence of similar visual characteristics between nodules and their surroundings make it difficult for robust nodule segmentation. In this study, we propose a data-driven model, termed the Central Focused Convolutional Neural Networks (CF-CNN), to segment lung nodules from heterogeneous CT images. Our approach combines two key insights: 1) the proposed model captures a diverse set of nodule-sensitive features from both 3-D and 2-D CT images simultaneously; 2) when classifying an image voxel, the effects of its neighbor voxels can vary according to their spatial locations. We describe this phenomenon by proposing a novel central pooling layer retaining much information on voxel patch center, followed by a multi-scale patch learning strategy. Moreover, we design a weighted sampling to facilitate the model training, where training samples are selected according to their degree of segmentation difficulty. The proposed method has been extensively evaluated on the public LIDC dataset including 893 nodules and an independent dataset with 74 nodules from Guangdong General Hospital (GDGH). We showed that CF-CNN achieved superior segmentation performance with average dice scores of 82.15% and 80.02% for the two datasets respectively. Moreover, we compared our results with the inter-radiologists consistency on LIDC dataset, showing a difference in average dice score of only 1.98%. Copyright © 2017. Published by Elsevier B.V.
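
    The Dice score reported in this abstract is the standard overlap measure 2|A ∩ B| / (|A| + |B|) between a predicted mask A and a reference mask B. A minimal sketch of that computation is given below; the function name `dice_score` and the convention of returning 1.0 for two empty masks are choices made here, not details from the paper.

```python
import numpy as np

def dice_score(pred_mask, true_mask):
    """Dice coefficient between two binary masks of the same shape (2-D or 3-D)."""
    pred = np.asarray(pred_mask, dtype=bool)
    truth = np.asarray(true_mask, dtype=bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    overlap = np.logical_and(pred, truth).sum()
    return float(2.0 * overlap / total)
```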

  18. Telling Anthropocene Tales: Localizing the impacts of global change using data-driven story maps

    Science.gov (United States)

    Mychajliw, A.; Hadly, E. A.

    2016-12-01

    Navigating the Anthropocene requires innovative approaches for generating scientific knowledge and for its communication outside academia. The global, synergistic nature of the environmental challenges we face - climate change, human population growth, biodiversity loss, pollution, invasive species and diseases - highlights the need for public outreach strategies that incorporate multiple scales and perspectives in an easily understandable and rapidly accessible format. Data-driven story-telling maps are optimal in that they can display variable geographic scales and their intersections with the environmental challenges relevant to both scientists and non-scientists. Maps are a powerful way to present complex data to all stakeholders. We present an overview of best practices in community-engaged scientific story-telling and data translation for policy-makers by reviewing three Story Map projects that map the geographic impacts of global change across multiple spatial and policy scales: the entire United States, the state of California, and the town of Pescadero, California. We document a chain of translation from a primary scientific manuscript to a policy document (Scientific Consensus Statement on Maintaining Humanity's Life Support Systems in the 21st Century) to a set of interactive ArcGIS Story Maps. We discuss the widening breadth of participants (students, community members) and audiences (White House, Governor's Office of California, California Congressional Offices, general public) involved. We highlight how scientists, through careful curation of popular news media articles and stakeholder interviews, can co-produce these communication modules with community partners such as non-governmental organizations and government agencies. 
The placement of scientific and citizen's everyday knowledge of global change into an appropriate geographic context allows for effective dissemination by political units such as congressional districts and agency management units

  19. Data-Driven Neural Network Model for Robust Reconstruction of Automobile Casting

    Science.gov (United States)

    Lin, Jinhua; Wang, Yanjie; Li, Xin; Wang, Lu

    2017-09-01

    In computer vision systems, robustly reconstructing the complex 3D geometries of automobile castings is a challenging task. 3D scanning data are usually corrupted by noise and the scanning resolution is low; these effects normally lead to incomplete matching and drift. To solve these problems, a data-driven local geometric learning model is proposed to achieve robust reconstruction of automobile castings. To reduce the interference of sensor noise and to be compatible with incomplete scanning data, a 3D convolutional neural network is established to match the local geometric features of automobile castings. The proposed neural network combines the geometric feature representation with the correlation metric function to robustly match local correspondences. We use the truncated distance field (TDF) around the key point to represent the 3D surface of the casting geometry, so that the model can be directly embedded into 3D space to learn the geometric feature representation. Finally, the training labels are automatically generated for deep learning based on an existing RGB-D reconstruction algorithm, which accesses the same global key matching descriptor. The experimental results show that the matching accuracy of our network is 92.2% for automobile castings, and the closed loop rate is about 74.0% when the matching tolerance threshold τ is 0.2. The matching descriptors performed well and retained 81.6% matching accuracy at 95% closed loop. For sparse geometric castings with initial matching failure, the 3D matching object can be reconstructed robustly by training the key descriptors. Our method performs 3D reconstruction robustly for complex automobile castings.

  20. The Application of Cyber Physical System for Thermal Power Plants: Data-Driven Modeling

    Directory of Open Access Journals (Sweden)

    Yongping Yang

    2018-03-01

    Optimal operation of energy systems plays an important role in enhancing their lifetime security and efficiency. The determination of optimal operating strategies requires intelligent utilization of massive data accumulated during operation or prediction. The investigation of these data solely without combining physical models may run the risk that the established relationships between inputs and outputs, the models which reproduce the behavior of the considered system/component in a wide range of boundary conditions, are invalid for certain boundary conditions, which never occur in the database employed. Therefore, combining big data with physical models via cyber physical systems (CPS) is of great importance to derive highly reliable and accurate models and is becoming more and more popular in practical applications. In this paper, we focus on the description of a systematic method to apply CPS to the performance analysis and decision making of thermal power plants. We proposed a general procedure of CPS with both offline and online phases for its application to thermal power plants and discussed the corresponding methods employed to support each sub-procedure. As an example, a data-driven model of the turbine island of an existing air-cooling based thermal power plant is established with the proposed procedure and demonstrates its practicality, validity and flexibility. To establish such a model, the historical operating data are employed in the cyber layer for modeling and linking each physical component. The decision-making procedure for the optimal frequency of the air-cooling condenser is also illustrated to show its applicability for online use. It is concluded that the cyber physical system with the data mining technique is effective and promising to facilitate the real-time analysis and control of thermal power plants.

  1. Finding candidate locations for aerosol pollution monitoring at street level using a data-driven methodology

    Science.gov (United States)

    Moosavi, V.; Aschwanden, G.; Velasco, E.

    2015-09-01

    Finding the number and best locations of fixed air quality monitoring stations at street level is challenging because of the complexity of the urban environment and the large number of factors affecting the pollutants concentration. Data sets of such urban parameters as land use, building morphology and street geometry in high-resolution grid cells in combination with direct measurements of airborne pollutants at high frequency (1-10 s) along a reasonable number of streets can be used to interpolate concentration of pollutants in a whole gridded domain and determine the optimum number of monitoring sites and best locations for a network of fixed monitors at ground level. In this context, a data-driven modeling methodology is developed based on the application of Self-Organizing Map (SOM) to approximate the nonlinear relations between urban parameters (80 in this work) and aerosol pollution data, such as mass and number concentrations measured along streets of a commercial/residential neighborhood of Singapore. Cross-validations between measured and predicted aerosol concentrations based on the urban parameters at each individual grid cell showed satisfying results. This proof of concept study showed that the selected urban parameters proved to be an appropriate indirect measure of aerosol concentrations within the studied area. The potential locations for fixed air quality monitors are identified through clustering of areas (i.e., group of cells) with similar urban patterns. The typological center of each cluster corresponds to the most representative cell for all other cells in the cluster. In the studied neighborhood four different clusters were identified and for each cluster potential sites for air quality monitoring at ground level are identified.
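
    The last step of this abstract, choosing the most representative cell of each cluster, can be illustrated with the short sketch below. It assumes that cluster labels are already available (the paper obtains them with a Self-Organizing Map, which is not reproduced here), that labels are integers, and that "most representative" means closest in Euclidean distance to the cluster mean; the function name `representative_cells` is hypothetical.

```python
import numpy as np

def representative_cells(features, labels):
    """Map each cluster label to the index of the cell closest to the cluster mean.

    features: (n_cells, n_parameters) array of urban parameters per grid cell.
    labels:   (n_cells,) integer cluster labels, assumed to be given.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    representatives = {}
    for cluster in np.unique(labels):
        members = np.flatnonzero(labels == cluster)
        center = features[members].mean(axis=0)
        distances = np.linalg.norm(features[members] - center, axis=1)
        representatives[int(cluster)] = int(members[np.argmin(distances)])
    return representatives
```

    In practice the urban parameters would usually be standardized before distances are computed, since they are measured in different units.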

  2. Probing the dynamics of identified neurons with a data-driven modeling approach.

    Directory of Open Access Journals (Sweden)

    Thomas Nowotny

    2008-07-01

    In controlling animal behavior the nervous system has to perform within the operational limits set by the requirements of each specific behavior. The implications for the corresponding range of suitable network, single neuron, and ion channel properties have remained elusive. In this article we approach the question of how well-constrained properties of neuronal systems may be on the neuronal level. We used large data sets of the activity of isolated invertebrate identified cells and built an accurate conductance-based model for this cell type using customized automated parameter estimation techniques. By direct inspection of the data we found that the variability of the neurons is larger when they are isolated from the circuit than when in the intact system. Furthermore, the responses of the neurons to perturbations appear to be more consistent than their autonomous behavior under stationary conditions. In the developed model, the constraints on different parameters that enforce appropriate model dynamics vary widely from some very tightly controlled parameters to others that are almost arbitrary. The model also allows predictions for the effect of blocking selected ionic currents and to prove that the origin of irregular dynamics in the neuron model is proper chaoticity and that this chaoticity is typical in an appropriate sense. Our results indicate that data driven models are useful tools for the in-depth analysis of neuronal dynamics. The better consistency of responses to perturbations, in the real neurons as well as in the model, suggests a paradigm shift away from measuring autonomous dynamics alone towards protocols of controlled perturbations. Our predictions for the impact of channel blockers on the neuronal dynamics and the proof of chaoticity underscore the wide scope of our approach.

  3. Weather models as virtual sensors to data-driven rainfall predictions in urban watersheds

    Science.gov (United States)

    Cozzi, Lorenzo; Galelli, Stefano; Pascal, Samuel Jolivet De Marc; Castelletti, Andrea

    2013-04-01

    Weather and climate predictions are a key element of urban hydrology, where they are used to inform water management and assist in delivering flood warnings. Indeed, the modelling of the very fast dynamics of urbanized catchments can be substantially improved by the use of weather/rainfall predictions. For example, in the Singapore Marina Reservoir catchment, runoff processes have a very short time of concentration (roughly one hour), so observational data are nearly useless for runoff predictions and weather predictions are required. Unfortunately, radar nowcasting methods do not allow long-term weather predictions to be carried out, whereas numerical models are limited by their coarse spatial scale. Moreover, numerical models are usually poorly reliable because of the fast motion and limited spatial extension of rainfall events. In this study we investigate the combined use of data-driven modelling techniques and weather variables observed/simulated with a numerical model as a way to improve rainfall prediction accuracy and lead time in the Singapore metropolitan area. To explore the feasibility of the approach, we use a Weather Research and Forecast (WRF) model as a virtual sensor network for the input variables (the states of the WRF model) to a machine learning rainfall prediction model. More precisely, we combine an input variable selection method and a non-parametric tree-based model to characterize the empirical relation between the rainfall measured at the catchment level and all possible weather input variables provided by the WRF model. We explore different lead times to evaluate the model reliability for different long-term predictions, as well as different time lags to see how past information could improve results. Results show that the proposed approach allows a significant improvement of the prediction accuracy of the WRF model on the Singapore urban area.

  4. Input variable selection for data-driven models of Coriolis flowmeters for two-phase flow measurement

    International Nuclear Information System (INIS)

    Wang, Lijuan; Yan, Yong; Wang, Xue; Wang, Tao

    2017-01-01

    Input variable selection is an essential step in the development of data-driven models for environmental, biological and industrial applications. Through input variable selection to eliminate the irrelevant or redundant variables, a suitable subset of variables is identified as the input of a model. Meanwhile, through input variable selection the complexity of the model structure is simplified and the computational efficiency is improved. This paper describes the procedures of the input variable selection for the data-driven models for the measurement of liquid mass flowrate and gas volume fraction under two-phase flow conditions using Coriolis flowmeters. Three advanced input variable selection methods, including partial mutual information (PMI), genetic algorithm-artificial neural network (GA-ANN) and tree-based iterative input selection (IIS), are applied in this study. Typical data-driven models incorporating support vector machine (SVM) are established individually based on the input candidates resulting from the selection methods. The validity of the selection outcomes is assessed through an output performance comparison of the SVM based data-driven models and sensitivity analysis. The validation and analysis results suggest that the input variables selected from the PMI algorithm provide more effective information for the models to measure liquid mass flowrate while the IIS algorithm provides fewer but more effective variables for the models to predict gas volume fraction. (paper)

  5. Employment relations: A data driven analysis of job markets using online job boards and online professional networks

    CSIR Research Space (South Africa)

    Marivate, Vukosi N

    2017-08-01

    Data from online job boards and online professional networks present an opportunity to understand job markets as well as how professionals transition from one job/career to another. We propose a data driven approach to begin to understand a slice...

  6. Keys to success for data-driven decision making: Lessons from participatory monitoring and collaborative adaptive management

    Science.gov (United States)

    Recent years have witnessed a call for evidence-based decisions in conservation and natural resource management, including data-driven decision-making. Adaptive management (AM) is one prevalent model for integrating scientific data into decision-making, yet AM has faced numerous challenges and limit...

  7. The Effects of Data-Driven Learning upon Vocabulary Acquisition for Secondary International School Students in Vietnam

    Science.gov (United States)

    Karras, Jacob Nolen

    2016-01-01

    Within the field of computer assisted language learning (CALL), scant literature exists regarding the effectiveness and practicality for secondary students to utilize data-driven learning (DDL) for vocabulary acquisition. In this study, there were 100 participants, who had a mean age of thirteen years, and were attending an international school in…

  8. Data-driven drug safety signal detection methods in pharmacovigilance using electronic primary care records: A population based study

    Directory of Open Access Journals (Sweden)

    Shang-Ming Zhou

    2017-04-01

    Data-driven analytic methods are a valuable aid to signal detection of ADEs from large electronic health records for drug safety monitoring. This study finds the methods can detect known ADE and so could potentially be used to detect unknown ADE.

  9. How Instructional Coaches Support Data-Driven Decision Making: Policy Implementation and Effects in Florida Middle Schools

    Science.gov (United States)

    Marsh, Julie A.; McCombs, Jennifer Sloan; Martorell, Francisco

    2010-01-01

    This article examines the convergence of two popular school improvement policies: instructional coaching and data-driven decision making (DDDM). Drawing on a mixed methods study of a statewide reading coach program in Florida middle schools, the article examines how coaches support DDDM and how this support relates to student and teacher outcomes.…

  10. Analyzing the Discourse of Chais Conferences for the Study of Innovation and Learning Technologies via a Data-Driven Approach

    Science.gov (United States)

    Silber-Varod, Vered; Eshet-Alkalai, Yoram; Geri, Nitza

    2016-01-01

    The current rapid technological changes confront researchers of learning technologies with the challenge of evaluating them, predicting trends, and improving their adoption and diffusion. This study utilizes a data-driven discourse analysis approach, namely culturomics, to investigate changes over time in the research of learning technologies. The…

  11. The Use of Linking Adverbials in Academic Essays by Non-Native Writers: How Data-Driven Learning Can Help

    Science.gov (United States)

    Garner, James Robert

    2013-01-01

    Over the past several decades, the TESOL community has seen an increased interest in the use of data-driven learning (DDL) approaches. Most studies of DDL have focused on the acquisition of vocabulary items, including a wide range of information necessary for their correct usage. One type of vocabulary that has yet to be properly investigated has…

  12. Examining Data Driven Decision Making via Formative Assessment: A Confluence of Technology, Data Interpretation Heuristics and Curricular Policy

    Science.gov (United States)

    Swan, Gerry; Mazur, Joan

    2011-01-01

    Although the term data-driven decision making (DDDM) is relatively new (Moss, 2007), the underlying concept of DDDM is not. For example, the practices of formative assessment and computer-managed instruction have historically involved the use of student performance data to guide what happens next in the instructional sequence (Morrison, Kemp, &…

  13. Strength in Numbers: Data-Driven Collaboration May Not Sound Sexy, But it Could Save Your Job

    Science.gov (United States)

    Buzzeo, Toni

    2010-01-01

    This article describes a practical, sure-fire way for media specialists to boost student achievement. The method is called data-driven collaboration, and it's a practical, easy-to-use technique in which media specialists and teachers work together to pinpoint kids' instructional needs and improve their essential skills. The author discusses the…

  14. Fork-join and data-driven execution models on multi-core architectures: Case study of the FMM

    KAUST Repository

    Amer, Abdelhalim; Maruyama, Naoya; Pericàs, Miquel; Taura, Kenjiro; Yokota, Rio; Matsuoka, Satoshi

    2013-01-01

    Extracting maximum performance of multi-core architectures is a difficult task primarily due to bandwidth limitations of the memory subsystem and its complex hierarchy. In this work, we study the implications of fork-join and data-driven execution

  15. Unravelling abiotic and biotic controls on the seasonal water balance using data-driven dimensionless diagnostics

    Directory of Open Access Journals (Sweden)

    S. P. Seibert

    2017-06-01

    The baffling diversity of runoff generation processes, alongside our sketchy understanding of how physiographic characteristics control fundamental hydrological functions of water collection, storage, and release, continue to pose major research challenges in catchment hydrology. Here, we propose innovative data-driven diagnostic signatures for overcoming the prevailing status quo in catchment inter-comparison. More specifically, we present dimensionless double mass curves (dDMC) which allow inference of information on runoff generation and the water balance at the seasonal and annual timescales. By separating the vegetation and winter periods, dDMC furthermore provide information on the role of biotic and abiotic controls in seasonal runoff formation. A key aspect we address in this paper is the derivation of dimensionless expressions of fluxes which ensure the comparability of the signatures in space and time. We achieve this by using the limiting factors of a hydrological process as a scaling reference. We show that different references result in different diagnostics. As such we define two kinds of dDMC which allow us to derive seasonal runoff coefficients and to characterize dimensionless streamflow release as a function of the potential renewal rate of the soil storage. We expect these signatures for storage controlled seasonal runoff formation to remain invariant, as long as the ratios of release over supply and supply over storage capacity develop similarly in different catchments. We test the proposed methods by applying them to an operational data set comprising 22 catchments (12–166 km²) from different environments in southern Germany and hydrometeorological data from 4 hydrological years. The diagnostics are used to compare the sites and to reveal the dominant controls on runoff formation. The key findings are that dDMC are meaningful signatures for catchment runoff formation at the seasonal to annual scale and that the type of
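
    As a rough illustration of a dimensionless double mass curve, the sketch below scales cumulative runoff and cumulative precipitation by the total seasonal precipitation, so that the final ordinate equals a seasonal runoff coefficient. Using total precipitation as the scaling reference is an assumption made here for illustration; the record above does not specify the exact references used by the authors, and the function name is hypothetical.

```python
from itertools import accumulate


def dimensionless_double_mass_curve(precip, runoff):
    """Return (x, y): cumulative precipitation and runoff, both divided by total precipitation.

    precip and runoff are equal-length series in the same depth units (e.g. mm per step).
    The last value of y is the runoff coefficient for the period.
    """
    if len(precip) != len(runoff):
        raise ValueError("precip and runoff must have the same length")
    total_precip = sum(precip)
    if total_precip <= 0:
        raise ValueError("total precipitation must be positive")
    x = [p / total_precip for p in accumulate(precip)]
    y = [q / total_precip for q in accumulate(runoff)]
    return x, y
```

    Because both axes are dimensionless, curves from catchments of different size and wetness can be drawn on the same plot and compared directly.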

  16. qPortal: A platform for data-driven biomedical research.

    Science.gov (United States)

    Mohr, Christopher; Friedrich, Andreas; Wojnar, David; Kenar, Erhan; Polatkan, Aydin Can; Codrea, Marius Cosmin; Czemmel, Stefan; Kohlbacher, Oliver; Nahnsen, Sven

    2018-01-01

    Modern biomedical research aims at drawing biological conclusions from large, highly complex biological datasets. It has become common practice to make extensive use of high-throughput technologies that produce large amounts of heterogeneous data. In addition to the ever-improving accuracy, methods are getting faster and cheaper, resulting in a steadily increasing need for scalable data management and easily accessible means of analysis. We present qPortal, a platform providing users with an intuitive way to manage and analyze quantitative biological data. The backend leverages a variety of concepts and technologies, such as relational databases, data stores, data models and means of data transfer, as well as front-end solutions to give users access to data management and easy-to-use analysis options. Users are empowered to conduct their experiments from the experimental design to the visualization of their results through the platform. Here, we illustrate the feature-rich portal by simulating a biomedical study based on publicly available data. We demonstrate the software's strength in supporting the entire project life cycle. The software supports the project design and registration, empowers users to do all-digital project management and finally provides means to perform analysis. We compare our approach to Galaxy, one of the most widely used scientific workflow and analysis platforms in computational biology. Application of both systems to a small case study shows the differences between a data-driven approach (qPortal) and a workflow-driven approach (Galaxy). qPortal, a one-stop-shop solution for biomedical projects, offers up-to-date analysis pipelines, quality control workflows, and visualization tools. Through intensive user interactions, appropriate data models have been developed. These models build the foundation of our biological data management system and provide possibilities to annotate data, query metadata for statistics and future re-analysis on

  17. Data-driven reverse engineering of signaling pathways using ensembles of dynamic models.

    Directory of Open Access Journals (Sweden)

    David Henriques

    2017-02-01

    Despite significant efforts and remarkable progress, the inference of signaling networks from experimental data remains very challenging. The problem is particularly difficult when the objective is to obtain a dynamic model capable of predicting the effect of novel perturbations not considered during model training. The problem is ill-posed due to the nonlinear nature of these systems, the fact that only a fraction of the involved proteins and their post-translational modifications can be measured, and limitations on the technologies used for growing cells in vitro, perturbing them, and measuring their variations. As a consequence, there is a pervasive lack of identifiability. To overcome these issues, we present a methodology called SELDOM (enSEmbLe of Dynamic lOgic-based Models), which builds an ensemble of logic-based dynamic models, trains them to experimental data, and combines their individual simulations into an ensemble prediction. It also includes a model reduction step to prune spurious interactions and mitigate overfitting. SELDOM is a data-driven method, in the sense that it does not require any prior knowledge of the system: the interaction networks that act as scaffolds for the dynamic models are inferred from data using mutual information. We have tested SELDOM on a number of experimental and in silico signal transduction case-studies, including the recent HPN-DREAM breast cancer challenge. We found that its performance is highly competitive compared to state-of-the-art methods for the purpose of recovering network topology. More importantly, the utility of SELDOM goes beyond basic network inference (i.e. uncovering static interaction networks): it builds dynamic (based on ordinary differential equation) models, which can be used for mechanistic interpretations and reliable dynamic predictions in new experimental conditions (i.e. not used in the training). For this task, SELDOM's ensemble prediction is not only consistently better

  18. Current Trends in the Detection of Sociocultural Signatures: Data-Driven Models

    Energy Technology Data Exchange (ETDEWEB)

    Sanfilippo, Antonio P.; Bell, Eric B.; Corley, Courtney D.

    2014-09-15

    available that are shaping social computing as a strongly data-driven experimental discipline with an increasingly stronger impact on the decision-making process of groups and individuals alike. In this chapter, we review current advances and trends in the detection of sociocultural signatures. Specific embodiments of the issues discussed are provided with respect to the assessment of violent intent and sociopolitical contention. We begin by reviewing current approaches to the detection of sociocultural signatures in these domains. Next, we turn to the review of novel data harvesting methods for social media content. Finally, we discuss the application of sociocultural models to social media content, and conclude by commenting on current challenges and future developments.

  19. Data-driven forecasting of high-dimensional chaotic systems with long short-term memory networks.

    Science.gov (United States)

    Vlachas, Pantelis R; Byeon, Wonmin; Wan, Zhong Y; Sapsis, Themistoklis P; Koumoutsakos, Petros

    2018-05-01

    We introduce a data-driven forecasting method for high-dimensional chaotic systems using long short-term memory (LSTM) recurrent neural networks. The proposed LSTM neural networks perform inference of high-dimensional dynamical systems in their reduced order space and are shown to be an effective set of nonlinear approximators of their attractor. We demonstrate the forecasting performance of the LSTM and compare it with Gaussian processes (GPs) in time series obtained from the Lorenz 96 system, the Kuramoto-Sivashinsky equation and a prototype climate model. The LSTM networks outperform the GPs in short-term forecasting accuracy in all applications considered. A hybrid architecture, extending the LSTM with a mean stochastic model (MSM-LSTM), is proposed to ensure convergence to the invariant measure. This novel hybrid method is fully data-driven and extends the forecasting capabilities of LSTM networks.
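
    For readers unfamiliar with the Lorenz 96 benchmark named in the abstract above, the following sketch generates a trajectory of the system dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F with periodic indices, using a fixed-step fourth-order Runge-Kutta integrator. The forcing F = 8, the time step and the state dimension are illustrative assumptions, not values taken from the record, and the code does not implement the LSTM or Gaussian process forecasters themselves.

```python
def lorenz96_rhs(x, forcing):
    """Right-hand side of Lorenz 96 with cyclic boundary conditions."""
    n = len(x)
    # Negative indices wrap around in Python, which gives the periodic boundary for free.
    return [(x[(i + 1) % n] - x[i - 2]) * x[i - 1] - x[i] + forcing for i in range(n)]


def rk4_step(x, dt, forcing):
    """Advance the state by one classical Runge-Kutta (RK4) step."""
    k1 = lorenz96_rhs(x, forcing)
    k2 = lorenz96_rhs([a + 0.5 * dt * k for a, k in zip(x, k1)], forcing)
    k3 = lorenz96_rhs([a + 0.5 * dt * k for a, k in zip(x, k2)], forcing)
    k4 = lorenz96_rhs([a + dt * k for a, k in zip(x, k3)], forcing)
    return [
        a + dt / 6.0 * (p + 2.0 * q + 2.0 * r + s)
        for a, p, q, r, s in zip(x, k1, k2, k3, k4)
    ]


def simulate(x0, steps, dt=0.01, forcing=8.0):
    """Return the trajectory [x(0), x(dt), ..., x(steps * dt)] as a list of states."""
    trajectory = [list(x0)]
    for _ in range(steps):
        trajectory.append(rk4_step(trajectory[-1], dt, forcing))
    return trajectory
```

    A trajectory started from a slightly perturbed state, for example `simulate([8.0] * 39 + [8.01], 5000)`, could serve as training data for a data-driven forecaster of the kind the abstract describes.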

  20. Cognitive load privileges memory-based over data-driven processing, not group-level over person-level processing.

    Science.gov (United States)

    Skorich, Daniel P; Mavor, Kenneth I

    2013-09-01

    In the current paper, we argue that categorization and individuation, as traditionally discussed and as experimentally operationalized, are defined in terms of two confounded underlying dimensions: a person/group dimension and a memory-based/data-driven dimension. In a series of three experiments, we unconfound these dimensions and impose a cognitive load. Across the three experiments, two with laboratory-created targets and one with participants' friends as the target, we demonstrate that cognitive load privileges memory-based over data-driven processing, not group- over person-level processing. We discuss the results in terms of their implications for conceptualizations of the categorization/individuation distinction, for the equivalence of person and group processes, for the ultimate 'purpose' and meaningfulness of group-based perception and, fundamentally, for the process of categorization, broadly defined. © 2012 The British Psychological Society.

  1. Data-driven technology for engineering systems health management design approach, feature construction, fault diagnosis, prognosis, fusion and decisions

    CERN Document Server

    Niu, Gang

    2017-01-01

    This book introduces condition-based maintenance (CBM)/data-driven prognostics and health management (PHM) in detail, first explaining the PHM design approach from a systems engineering perspective, then summarizing and elaborating on the data-driven methodology for feature construction, as well as feature-based fault diagnosis and prognosis. The book includes a wealth of illustrations and tables to help explain the algorithms, as well as practical examples showing how to use this tool to solve situations for which analytic solutions are poorly suited. It equips readers to apply the concepts discussed in order to analyze and solve a variety of problems in PHM system design, feature construction, fault diagnosis and prognosis.

  2. A Data-Driven Stochastic Reactive Power Optimization Considering Uncertainties in Active Distribution Networks and Decomposition Method

    DEFF Research Database (Denmark)

    Ding, Tao; Yang, Qingrun; Yang, Yongheng

    2018-01-01

    To address the uncertain output of distributed generators (DGs) for reactive power optimization in active distribution networks, the stochastic programming model is widely used. The model is employed to find an optimal control strategy with minimum expected network loss while satisfying all......, in this paper, a data-driven modeling approach is introduced to assume that the probability distribution from the historical data is uncertain within a confidence set. Furthermore, a data-driven stochastic programming model is formulated as a two-stage problem, where the first-stage variables find the optimal...... control for discrete reactive power compensation equipment under the worst probability distribution of the second stage recourse. The second-stage variables are adjusted to uncertain probability distribution. In particular, this two-stage problem has a special structure so that the second-stage problem...

  3. Knowledge Based Cloud FE simulation - data-driven material characterization guidelines for the hot stamping of aluminium alloys

    Science.gov (United States)

    Wang, Ailing; Zheng, Yang; Liu, Jun; El Fakir, Omer; Masen, Marc; Wang, Liliang

    2016-08-01

    The Knowledge Based Cloud FEA (KBC-FEA) simulation technique allows multiobjective FE simulations to be conducted on a cloud-computing environment, which effectively reduces computation time and expands the capability of FE simulation software. In this paper, a novel functional module was developed for the data mining of experimentally verified FE simulation results for metal forming processes obtained from KBC-FE. Through this functional module, the thermo-mechanical characteristics of a metal forming process were deduced, enabling a systematic and data-driven guideline for mechanical property characterization to be developed, which will directly guide the material tests for a metal forming process towards the most efficient and effective scheme. Successful application of this data-driven guideline would reduce the efforts for material characterization, leading to the development of more accurate material models, which in turn enhance the accuracy of FE simulations.

  4. A Model-based B2B (Batch to Batch) Control for An Industrial Batch Polymerization Process

    Science.gov (United States)

    Ogawa, Morimasa

    This paper gives an overview of a model-based B2B (batch to batch) control for an industrial batch polymerization process. In order to control the reaction temperature precisely, several methods based on the rigorous process dynamics model are employed at all design stages of the B2B control, such as modeling and parameter estimation of the reaction kinetics, which is one of the important parts of the process dynamics model. The designed B2B control consists of the gain scheduled I-PD/II2-PD control (I-PD with double integral control), the feed-forward compensation at the batch start time, and the model adaptation utilizing the results of the last batch operation. Throughout the actual batch operations, the B2B control provides superior control performance compared with that of conventional control methods.

  5. Data-driven gating in PET: Influence of respiratory signal noise on motion resolution.

    Science.gov (United States)

    Büther, Florian; Ernst, Iris; Frohwein, Lynn Johann; Pouw, Joost; Schäfers, Klaus Peter; Stegger, Lars

    2018-05-21

    Data-driven gating (DDG) approaches for positron emission tomography (PET) are interesting alternatives to conventional hardware-based gating methods. In DDG, the measured PET data themselves are utilized to calculate a respiratory signal that is subsequently used for gating purposes. The success of gating is then highly dependent on the statistical quality of the PET data. In this study, we investigate how this quality determines signal noise and thus motion resolution in clinical PET scans using a center-of-mass-based (COM) DDG approach, specifically with regard to motion management of target structures in future radiotherapy planning applications. PET list mode datasets acquired in one bed position of 19 different radiotherapy patients undergoing pretreatment [18F]FDG PET/CT or [18F]FDG PET/MRI were included into this retrospective study. All scans were performed over a region with organs (myocardium, kidneys) or tumor lesions of high tracer uptake and under free breathing. Aside from the original list mode data, datasets with progressively decreasing PET statistics were generated. From these, COM DDG signals were derived for subsequent amplitude-based gating of the original list mode file. The apparent respiratory shift d from end-expiration to end-inspiration was determined from the gated images and expressed as a function of signal-to-noise ratio SNR of the determined gating signals. This relation was tested against an additional 25 [18F]FDG PET/MRI list mode datasets where high-precision MR navigator-like respiratory signals were available as reference signal for respiratory gating of PET data, and data from a dedicated thorax phantom scan. All original 19 high-quality list mode datasets demonstrated the same behavior in terms of motion resolution when reducing the amount of list mode events for DDG signal generation. Ratios and directions of respiratory shifts between end-respiratory gates and the respective nongated image were constant over all
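
    The general idea of a center-of-mass gating signal followed by amplitude-based gating can be sketched as follows. The sketch assumes that each short time frame is given as a non-empty list of axial event coordinates and that gates are equally wide amplitude bins; both are simplifying assumptions for illustration, and the function names are hypothetical rather than part of any scanner software.

```python
def com_signal(frames):
    """Axial center of mass per time frame.

    frames: list of non-empty lists, each holding the axial coordinates (e.g. in mm)
    of the list mode events recorded during one short time frame.
    """
    return [sum(events) / len(events) for events in frames]


def amplitude_gates(signal, n_gates):
    """Assign each frame to one of n_gates equally wide amplitude bins (0 = lowest)."""
    low, high = min(signal), max(signal)
    width = (high - low) / n_gates
    if width == 0:
        # A flat signal carries no motion information; everything falls into gate 0.
        return [0] * len(signal)
    # The maximum amplitude would land in bin n_gates, so clamp it to the last gate.
    return [min(int((s - low) / width), n_gates - 1) for s in signal]
```

    With fewer events per frame the per-frame mean becomes noisier, which is the link between PET statistics and gating quality that the study investigates.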

  6. Data driven analysis of rain events: feature extraction, clustering, microphysical /macro physical relationship

    Science.gov (United States)

    Djallel Dilmi, Mohamed; Mallet, Cécile; Barthes, Laurent; Chazottes, Aymeric

    2017-04-01

    that a rain time series can be considered as an alternation of independent rain events and no-rain periods. The five selected features are used to perform a hierarchical clustering of the events. The well-known division between stratiform and convective events appears clearly. This classification into two classes is then refined into 5 fairly homogeneous subclasses. The data driven analysis performed on whole rain events instead of fixed length samples allows identifying strong relationships between macrophysics (based on rain rate) and microphysics (based on raindrops) features. We show that among the 5 identified subclasses some of them have specific microphysics characteristics. Obtaining information on microphysical characteristics of rainfall events from rain gauge measurements suggests many implications in the development of quantitative precipitation estimation (QPE), and for the improvement of rain rate retrieval algorithms in a remote sensing context.

  7. Augmenting Bag-of-Words: Data-Driven Discovery of Temporal and Structural Information for Activity Recognition

    OpenAIRE

    Bettadapura, Vinay; Schindler, Grant; Plotz, Thomaz; Essa, Irfan

    2015-01-01

    We present data-driven techniques to augment Bag of Words (BoW) models, which allow for more robust modeling and recognition of complex long-term activities, especially when the structure and topology of the activities are not known a priori. Our approach specifically addresses the limitations of standard BoW approaches, which fail to represent the underlying temporal and causal information that is inherent in activity streams. In addition, we also propose the use of randomly sampled regular ...

  8. Experimental investigation of the predictive capabilities of data driven modeling techniques in hydrology - Part 1: Concepts and methodology

    Directory of Open Access Journals (Sweden)

    A. Elshorbagy

    2010-10-01

    A comprehensive data driven modeling experiment is presented in a two-part paper. In this first part, an extensive data-driven modeling experiment is proposed. The most important concerns regarding the way data driven modeling (DDM) techniques and data were handled, compared, and evaluated, and the basis on which findings and conclusions were drawn are discussed. A concise review of key articles that presented comparisons among various DDM techniques is presented. Six DDM techniques, namely, neural networks, genetic programming, evolutionary polynomial regression, support vector machines, M5 model trees, and K-nearest neighbors are proposed and explained. Multiple linear regression and naïve models are also suggested as baselines for comparison with the various techniques. Five datasets from Canada and Europe representing evapotranspiration, upper and lower layer soil moisture content, and rainfall-runoff process are described and proposed, in the second paper, for the modeling experiment. Twelve different realizations (groups) from each dataset are created by a procedure involving random sampling. Each group contains three subsets: training, cross-validation, and testing. Each modeling technique is proposed to be applied to each of the 12 groups of each dataset. This way, both prediction accuracy and uncertainty of the modeling techniques can be evaluated. The description of the datasets, the implementation of the modeling techniques, results and analysis, and the findings of the modeling experiment are deferred to the second part of this paper.
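
    The resampling design described in the abstract above, twelve randomly drawn groups each split into training, cross-validation and testing subsets, might look like the sketch below. The split fractions and the use of a full shuffle per group are assumptions, since the record only states that random sampling is involved; the function name is hypothetical.

```python
import random


def make_realizations(n_records, n_groups=12, fractions=(0.5, 0.25, 0.25), seed=0):
    """Create n_groups random partitions of record indices into three subsets.

    fractions gives the (assumed) shares for training and cross-validation;
    whatever remains after those two subsets goes to testing.
    """
    rng = random.Random(seed)
    n_train = int(fractions[0] * n_records)
    n_val = int(fractions[1] * n_records)
    groups = []
    for _ in range(n_groups):
        indices = list(range(n_records))
        rng.shuffle(indices)
        groups.append({
            "training": indices[:n_train],
            "cross_validation": indices[n_train:n_train + n_val],
            "testing": indices[n_train + n_val:],
        })
    return groups
```

    Fitting the same technique to each group and looking at the spread of its test scores across the twelve groups gives the kind of accuracy-plus-uncertainty picture the abstract refers to.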

  9. Minimization of energy consumption in HVAC systems with data-driven models and an interior-point method

    International Nuclear Information System (INIS)

    Kusiak, Andrew; Xu, Guanglin; Zhang, Zijun

    2014-01-01

    Highlights: • We study the energy saving of HVAC systems with a data-driven approach. • We conduct an in-depth analysis of the topology of developed Neural Network based HVAC model. • We apply interior-point method to solving a Neural Network based HVAC optimization model. • The uncertain building occupancy is incorporated in the minimization of HVAC energy consumption. • A significant potential of saving HVAC energy is discovered. - Abstract: In this paper, a data-driven approach is applied to minimize energy consumption of a heating, ventilating, and air conditioning (HVAC) system while maintaining the thermal comfort of a building with uncertain occupancy level. The uncertainty of arrival and departure rate of occupants is modeled by the Poisson and uniform distributions, respectively. The internal heating gain is calculated from the stochastic process of the building occupancy. Based on the observed and simulated data, a multilayer perceptron algorithm is employed to model and simulate the HVAC system. The data-driven models accurately predict future performance of the HVAC system based on the control settings and the observed historical information. An optimization model is formulated and solved with the interior-point method. The optimization results are compared with the results produced by the simulation models
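
    The occupancy model mentioned in the abstract above (Poisson arrivals, uniformly distributed departures, and an internal heat gain derived from occupancy) can be illustrated with a small simulation. The per-step interpretation, the capacity limit and the heat gain of 100 W per person are placeholder assumptions, not values from the paper, and all names are hypothetical.

```python
import math
import random


def poisson_sample(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        k += 1
        p *= rng.random()
        if p <= threshold:
            return k - 1


def simulate_occupancy(steps, arrival_rate, max_departures, capacity, seed=0):
    """Occupancy per time step with Poisson arrivals and uniform integer departures."""
    rng = random.Random(seed)
    occupancy, history = 0, []
    for _ in range(steps):
        arrivals = poisson_sample(rng, arrival_rate)
        departures = rng.randint(0, max_departures)  # uniform on {0, ..., max_departures}
        occupancy = min(capacity, occupancy + arrivals)
        occupancy = max(0, occupancy - departures)
        history.append(occupancy)
    return history


def internal_heat_gain(history, watts_per_person=100.0):
    """Internal heat gain per step, assuming a fixed (placeholder) gain per occupant."""
    return [n * watts_per_person for n in history]
```

    The resulting heat gain series could then be fed, together with control settings, into whatever data-driven HVAC model is being optimized.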

  10. A data-driven approach for modeling post-fire debris-flow volumes and their uncertainty

    Science.gov (United States)

    Friedel, Michael J.

    2011-01-01

    This study demonstrates the novel application of genetic programming to evolve nonlinear post-fire debris-flow volume equations from variables associated with a data-driven conceptual model of the western United States. The search space is constrained using a multi-component objective function that simultaneously minimizes root-mean squared and unit errors for the evolution of fittest equations. An optimization technique is then used to estimate the limits of nonlinear prediction uncertainty associated with the debris-flow equations. In contrast to a published multiple linear regression three-variable equation, linking basin area with slopes greater or equal to 30 percent, burn severity characterized as area burned moderate plus high, and total storm rainfall, the data-driven approach discovers many nonlinear and several dimensionally consistent equations that are unbiased and have less prediction uncertainty. Of the nonlinear equations, the best performance (lowest prediction uncertainty) is achieved when using three variables: average basin slope, total burned area, and total storm rainfall. Further reduction in uncertainty is possible for the nonlinear equations when dimensional consistency is not a priority and by subsequently applying a gradient solver to the fittest solutions. The data-driven modeling approach can be applied to nonlinear multivariate problems in all fields of study.

  11. The Financial and Non-Financial Aspects of Developing a Data-Driven Decision-Making Mindset in an Undergraduate Business Curriculum

    Science.gov (United States)

    Bohler, Jeffrey; Krishnamoorthy, Anand; Larson, Benjamin

    2017-01-01

    Making data-driven decisions is becoming more important for organizations faced with confusing and often contradictory information available to them from their operating environment. This article examines one college of business' journey of developing a data-driven decision-making mindset within its undergraduate curriculum. Lessons learned may be…

  12. Challenges and best practices for big data-driven healthcare innovations conducted by profit–non-profit partnerships – a quantitative prioritization

    NARCIS (Netherlands)

    Witjas-Paalberends, E. R.; van Laarhoven, L. P.M.; van de Burgwal, L. H.M.; Feilzer, J.; de Swart, J.; Claassen, H.J.H.M.; Jansen, W. T.M.

    2017-01-01

    Big data-driven innovations are key in improving healthcare system sustainability. Given the complexity, these are frequently conducted by public-private-partnerships (PPPs) between profit and non-profit parties. However, information on how to manage big data-driven healthcare innovations by PPPs is

  13. Monte Carlo simulation on kinetics of batch and semi-batch free radical polymerization

    KAUST Repository

    Shao, Jing; Tang, Wei; Xia, Ru; Feng, Xiaoshuang; Chen, Peng; Qian, Jiasheng; Song, Changjiang

    2015-01-01

    experimental and simulation studies, we showed the capability of our Monte Carlo scheme on representing polymerization kinetics in batch and semi-batch processes. Various kinetics information, such as instant monomer conversion, molecular weight

  14. Family based dispatching with batch availability

    NARCIS (Netherlands)

    van der Zee, D.J.

    2013-01-01

    Family based dispatching rules seek to lower set-up frequencies by grouping (batching) similar types of jobs for joint processing. Hence shop flow times may be improved, as less time is spent on set-ups. Motivated by an industrial project we study the control of machines with batch availability,

  15. Data-Driven Identification of Risk Factors of Patient Satisfaction at a Large Urban Academic Medical Center.

    Science.gov (United States)

    Li, Li; Lee, Nathan J; Glicksberg, Benjamin S; Radbill, Brian D; Dudley, Joel T

    2016-01-01

    The Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) survey is the first publicly reported nationwide survey to evaluate and compare hospitals. Increasing patient satisfaction is an important goal as it aims to achieve a more effective and efficient healthcare delivery system. In this study, we develop and apply an integrative, data-driven approach to identify clinical risk factors that associate with patient satisfaction outcomes. We included 1,771 unique adult patients who completed the HCAHPS survey and were discharged from the inpatient Medicine service from 2010 to 2012. We collected 266 clinical features including patient demographics, lab measurements, medications, disease categories, and procedures. We developed and applied a data-driven approach to identify risk factors that associate with patient satisfaction outcomes. We identify 102 significant risk factors associating with 18 surveyed questions. The most significantly recurrent clinical risk factors were: self-evaluation of health, education level, Asian, White, treatment in BMT oncology division, being prescribed a new medication. Patients who were prescribed pregabalin were less satisfied particularly in relation to communication with nurses and pain management. Explanation of medication usage was associated with communication with nurses (q = 0.001); however, explanation of medication side effects was associated with communication with doctors (q = 0.003). Overall hospital rating was associated with hospital environment, communication with doctors, and communication about medicines. However, patient likelihood to recommend hospital was associated with hospital environment, communication about medicines, pain management, and communication with nurse. Our study identified a number of putatively novel clinical risk factors for patient satisfaction that suggest new opportunities to better understand and manage patient satisfaction. Hospitals can use a data-driven approach to

  16. A data-driven approach to identify controls on global fire activity from satellite and climate observations (SOFIA V1)

    Directory of Open Access Journals (Sweden)

    M. Forkel

    2017-12-01

    Full Text Available Vegetation fires affect human infrastructures, ecosystems, global vegetation distribution, and atmospheric composition. However, the climatic, environmental, and socioeconomic factors that control global fire activity in vegetation are only poorly understood, and in various complexities and formulations are represented in global process-oriented vegetation-fire models. Data-driven model approaches such as machine learning algorithms have successfully been used to identify and better understand controlling factors for fire activity. However, such machine learning models cannot be easily adapted or even implemented within process-oriented global vegetation-fire models. To overcome this gap between machine learning-based approaches and process-oriented global fire models, we introduce a new flexible data-driven fire modelling approach here (Satellite Observations to predict FIre Activity, SOFIA approach version 1). SOFIA models can use several predictor variables and functional relationships to estimate burned area that can be easily adapted with more complex process-oriented vegetation-fire models. We created an ensemble of SOFIA models to test the importance of several predictor variables. SOFIA models result in the highest performance in predicting burned area if they account for a direct restriction of fire activity under wet conditions and if they include a land cover-dependent restriction or allowance of fire activity by vegetation density and biomass. The use of vegetation optical depth data from microwave satellite observations, a proxy for vegetation biomass and water content, reaches higher model performance than commonly used vegetation variables from optical sensors. We further analyse spatial patterns of the sensitivity between anthropogenic, climate, and vegetation predictor variables and burned area. We finally discuss how multiple observational datasets on climate, hydrological, vegetation, and socioeconomic variables together with

  17. Limited angle CT reconstruction by simultaneous spatial and Radon domain regularization based on TV and data-driven tight frame

    Science.gov (United States)

    Zhang, Wenkun; Zhang, Hanming; Wang, Linyuan; Cai, Ailong; Li, Lei; Yan, Bin

    2018-02-01

    Limited angle computed tomography (CT) reconstruction is widely performed in medical diagnosis and industrial testing because of the size of objects, engine/armor inspection requirements, and limited scan flexibility. Limited angle reconstruction necessitates usage of optimization-based methods that utilize additional sparse priors. However, most of conventional methods solely exploit sparsity priors of spatial domains. When CT projection suffers from serious data deficiency or various noises, obtaining reconstruction images that meet the requirement of quality becomes difficult and challenging. To solve this problem, this paper developed an adaptive reconstruction method for limited angle CT problem. The proposed method simultaneously uses spatial and Radon domain regularization model based on total variation (TV) and data-driven tight frame. Data-driven tight frame being derived from wavelet transformation aims at exploiting sparsity priors of sinogram in Radon domain. Unlike existing works that utilize pre-constructed sparse transformation, the framelets of the data-driven regularization model can be adaptively learned from the latest projection data in the process of iterative reconstruction to provide optimal sparse approximations for given sinogram. At the same time, an effective alternating direction method is designed to solve the simultaneous spatial and Radon domain regularization model. The experiments for both simulation and real data demonstrate that the proposed algorithm shows better performance in artifacts depression and details preservation than the algorithms solely using regularization model of spatial domain. Quantitative evaluations for the results also indicate that the proposed algorithm applying learning strategy performs better than the dual domains algorithms without learning regularization model

  18. A data-driven approach to identify controls on global fire activity from satellite and climate observations (SOFIA V1)

    Science.gov (United States)

    Forkel, Matthias; Dorigo, Wouter; Lasslop, Gitta; Teubner, Irene; Chuvieco, Emilio; Thonicke, Kirsten

    2017-12-01

    Vegetation fires affect human infrastructures, ecosystems, global vegetation distribution, and atmospheric composition. However, the climatic, environmental, and socioeconomic factors that control global fire activity in vegetation are only poorly understood, and in various complexities and formulations are represented in global process-oriented vegetation-fire models. Data-driven model approaches such as machine learning algorithms have successfully been used to identify and better understand controlling factors for fire activity. However, such machine learning models cannot be easily adapted or even implemented within process-oriented global vegetation-fire models. To overcome this gap between machine learning-based approaches and process-oriented global fire models, we introduce a new flexible data-driven fire modelling approach here (Satellite Observations to predict FIre Activity, SOFIA approach version 1). SOFIA models can use several predictor variables and functional relationships to estimate burned area that can be easily adapted with more complex process-oriented vegetation-fire models. We created an ensemble of SOFIA models to test the importance of several predictor variables. SOFIA models result in the highest performance in predicting burned area if they account for a direct restriction of fire activity under wet conditions and if they include a land cover-dependent restriction or allowance of fire activity by vegetation density and biomass. The use of vegetation optical depth data from microwave satellite observations, a proxy for vegetation biomass and water content, reaches higher model performance than commonly used vegetation variables from optical sensors. We further analyse spatial patterns of the sensitivity between anthropogenic, climate, and vegetation predictor variables and burned area. We finally discuss how multiple observational datasets on climate, hydrological, vegetation, and socioeconomic variables together with data-driven

  19. CEREF: A hybrid data-driven model for forecasting annual streamflow from a socio-hydrological system

    Science.gov (United States)

    Zhang, Hongbo; Singh, Vijay P.; Wang, Bin; Yu, Yinghao

    2016-09-01

    Hydrological forecasting is complicated by flow regime alterations in a coupled socio-hydrologic system, encountering increasingly non-stationary, nonlinear and irregular changes, which make decision support difficult for future water resources management. Currently, many hybrid data-driven models, based on the decomposition-prediction-reconstruction principle, have been developed to improve the ability to make predictions of annual streamflow. However, many problems still require further investigation, chief among which is that the direction of trend components decomposed from annual streamflow series is always difficult to ascertain. In this paper, a hybrid data-driven model was proposed to address this issue, which combined empirical mode decomposition (EMD), radial basis function neural networks (RBFNN), and external forces (EF) variable, also called the CEREF model. The hybrid model employed EMD for decomposition and RBFNN for intrinsic mode function (IMF) forecasting, and determined future trend component directions by regression with EF as basin water demand representing the social component in the socio-hydrologic system. The Wuding River basin was considered for the case study, and two standard statistical measures, root mean squared error (RMSE) and mean absolute error (MAE), were used to evaluate the performance of CEREF model and compare with other models: the autoregressive (AR), RBFNN and EMD-RBFNN. Results indicated that the CEREF model had lower RMSE and MAE statistics, 42.8% and 7.6%, respectively, than did other models, and provided a superior alternative for forecasting annual runoff in the Wuding River basin. Moreover, the CEREF model can enlarge the effective intervals of streamflow forecasting compared to the EMD-RBFNN model by introducing the water demand planned by the government department to improve long-term prediction accuracy. In addition, we considered the high-frequency component, a frequent subject of concern in EMD
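
    The two error measures named in this abstract, RMSE and MAE, can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the sample numbers are assumptions and are not taken from the paper or from the Wuding River data.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical annual streamflow values (arbitrary units), for illustration only.
observed = [10.0, 12.0, 9.0, 11.0]
predicted = [11.0, 10.0, 9.0, 13.0]

print("RMSE:", rmse(observed, predicted))
print("MAE:", mae(observed, predicted))
```

    Because RMSE squares the errors before averaging, it penalizes large misses more heavily than MAE, which is one reason studies often report both.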

  20. Feature Extraction for Digging Operation of Excavator Based on Data-Driven Skill-Based PID Controller

    Directory of Open Access Journals (Sweden)

    Kazushige Koiwai

    2017-11-01

    Full Text Available An aging and shrinking working population in the construction field demands improved work efficiency, so automation technologies are being applied to construction equipment such as bulldozers and excavators. However, improving work efficiency requires expert skills as well as automation technologies. This paper proposes a method for evaluating human skill using a data-driven skill-based PID controller. The proposed method is applied to the excavator digging operation. As a result, the difference between novice and skilled operation is extracted, and the numerical difference is clarified based on the result.

  1. Uneven batch data alignment with application to the control of batch end-product quality.

    Science.gov (United States)

    Wan, Jian; Marjanovic, Ognjen; Lennox, Barry

    2014-03-01

    Batch processes are commonly characterized by uneven trajectories due to the existence of batch-to-batch variations. The batch end-product quality is usually measured at the end of these uneven trajectories. It is necessary to align the time differences for both the measured trajectories and the batch end-product quality in order to implement statistical process monitoring and control schemes. Apart from synchronizing trajectories with variable lengths using an indicator variable or dynamic time warping, this paper proposes a novel approach to align uneven batch data by identifying short-window PCA&PLS models at first and then applying these identified models to extend shorter trajectories and predict future batch end-product quality. Furthermore, uneven batch data can also be aligned to be a specified batch length using moving window estimation. The proposed approach and its application to the control of batch end-product quality are demonstrated with a simulated example of fed-batch fermentation for penicillin production. Copyright © 2013 ISA. Published by Elsevier Ltd. All rights reserved.
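
    Dynamic time warping, which this abstract names as an existing way to synchronize trajectories of different lengths, can be illustrated with a minimal Python sketch. It is not the short-window PCA&PLS method the paper proposes; the function name `dtw_distance`, the absolute-difference local cost, and the sample trajectories are assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping cost between two 1-D sequences.

    cost[i][j] holds the cheapest cumulative cost of aligning the first
    i samples of `a` with the first j samples of `b`.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, step in both.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Two hypothetical batch trajectories with the same shape but different durations.
short_batch = [1.0, 2.0, 3.0]
long_batch = [1.0, 2.0, 2.0, 3.0]

print(dtw_distance(short_batch, long_batch))
```

    The quadratic table makes this sketch suitable only for short trajectories; practical uses usually add a window constraint to limit how far the alignment may stretch.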

  2. Data-driven modeling and predictive control for boiler-turbine unit using fuzzy clustering and subspace methods.

    Science.gov (United States)

    Wu, Xiao; Shen, Jiong; Li, Yiguo; Lee, Kwang Y

    2014-05-01

    This paper develops a novel data-driven fuzzy modeling strategy and predictive controller for boiler-turbine unit using fuzzy clustering and subspace identification (SID) methods. To deal with the nonlinear behavior of boiler-turbine unit, fuzzy clustering is used to provide an appropriate division of the operation region and develop the structure of the fuzzy model. Then by combining the input data with the corresponding fuzzy membership functions, the SID method is extended to extract the local state-space model parameters. Owing to the advantages of the both methods, the resulting fuzzy model can represent the boiler-turbine unit very closely, and a fuzzy model predictive controller is designed based on this model. As an alternative approach, a direct data-driven fuzzy predictive control is also developed following the same clustering and subspace methods, where intermediate subspace matrices developed during the identification procedure are utilized directly as the predictor. Simulation results show the advantages and effectiveness of the proposed approach. Copyright © 2014 ISA. Published by Elsevier Ltd. All rights reserved.

  3. EEG-based functional networks evoked by acupuncture at ST 36: A data-driven thresholding study

    Science.gov (United States)

    Li, Huiyan; Wang, Jiang; Yi, Guosheng; Deng, Bin; Zhou, Hexi

    2017-10-01

    This paper investigates how acupuncture at ST 36 modulates the brain functional network. 20 channel EEG signals from 15 healthy subjects are respectively recorded before, during and after acupuncture. The correlation between two EEG channels is calculated by using Pearson’s coefficient. A data-driven approach is applied to determine the threshold, which is performed by considering the connected set, connected edge and network connectivity. Based on such thresholding approach, the functional network in each acupuncture period is built with graph theory, and the associated functional connectivity is determined. We show that acupuncturing at ST 36 increases the connectivity of the EEG-based functional network, especially for the long distance ones between two hemispheres. The properties of the functional network in five EEG sub-bands are also characterized. It is found that the delta and gamma bands are affected more obviously by acupuncture than the other sub-bands. These findings highlight the modulatory effects of acupuncture on the EEG-based functional connectivity, which is helpful for us to understand how it participates in the cortical or subcortical activities. Further, the data-driven threshold provides an alternative approach to infer the functional connectivity under other physiological conditions.
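
    The pipeline this abstract outlines (pairwise Pearson correlation, a data-driven threshold, then a graph) can be sketched with NumPy. The abstract does not give the exact thresholding rule, so the rule used here, keeping the largest threshold at which the network is still connected, is an assumption, as are the function names and the synthetic signals.

```python
import numpy as np


def correlation_matrix(signals):
    """Absolute Pearson correlation between channels.

    signals: array of shape (n_channels, n_samples).
    The diagonal is zeroed so that self-loops are never counted as edges.
    """
    r = np.abs(np.corrcoef(signals))
    np.fill_diagonal(r, 0.0)
    return r


def is_connected(adjacency):
    """Depth-first search over a boolean adjacency matrix."""
    n = adjacency.shape[0]
    seen = {0}
    stack = [0]
    while stack:
        node = stack.pop()
        for neighbour in np.flatnonzero(adjacency[node]):
            neighbour = int(neighbour)
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return len(seen) == n


def largest_connecting_threshold(weights):
    """Assumed rule: the highest edge weight at which the graph stays connected."""
    upper = weights[np.triu_indices_from(weights, k=1)]
    for candidate in np.unique(upper)[::-1]:  # distinct weights, descending
        if is_connected(weights >= candidate):
            return float(candidate)
    return 0.0


# Synthetic stand-in for 20-channel EEG: a shared component plus channel noise.
rng = np.random.default_rng(0)
shared = rng.standard_normal(1000)
signals = 0.5 * shared + rng.standard_normal((20, 1000))

weights = correlation_matrix(signals)
threshold = largest_connecting_threshold(weights)
adjacency = weights >= threshold
print(f"threshold={threshold:.3f}, edges={int(adjacency.sum()) // 2}")
```

    A threshold chosen this way adapts to each recording, in contrast to a fixed cut-off applied to every subject and acupuncture period.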

  4. Data-Driven Diffusion Of Innovations: Successes And Challenges In 3 Large-Scale Innovative Delivery Models.

    Science.gov (United States)

    Dorr, David A; Cohen, Deborah J; Adler-Milstein, Julia

    2018-02-01

    Failed diffusion of innovations may be linked to an inability to use and apply data, information, and knowledge to change perceptions of current practice and motivate change. Using qualitative and quantitative data from three large-scale health care delivery innovations-accountable care organizations, advanced primary care practice, and EvidenceNOW-we assessed where data-driven innovation is occurring and where challenges lie. We found that implementation of some technological components of innovation (for example, electronic health records) has occurred among health care organizations, but core functions needed to use data to drive innovation are lacking. Deficits include the inability to extract and aggregate data from the records; gaps in sharing data; and challenges in adopting advanced data functions, particularly those related to timely reporting of performance data. The unexpectedly high costs and burden incurred during implementation of the innovations have limited organizations' ability to address these and other deficits. Solutions that could help speed progress in data-driven innovation include facilitating peer-to-peer technical assistance, providing tailored feedback reports to providers from data aggregators, and using practice facilitators skilled in using data technology for quality improvement to help practices transform. Policy efforts that promote these solutions may enable more rapid uptake of and successful participation in innovative delivery system reforms.

  5. Protein engineering of Bacillus acidopullulyticus pullulanase for enhanced thermostability using in silico data driven rational design methods.

    Science.gov (United States)

    Chen, Ana; Li, Yamei; Nie, Jianqi; McNeil, Brian; Jeffrey, Laura; Yang, Yankun; Bai, Zhonghu

    2015-10-01

    Thermostability has been considered as a requirement in the starch processing industry to maintain high catalytic activity of pullulanase under high temperatures. Four data driven rational design methods (B-FITTER, proline theory, PoPMuSiC-2.1, and sequence consensus approach) were adopted to identify the key residue potential links with thermostability, and 39 residues of Bacillus acidopullulyticus pullulanase were chosen as mutagenesis targets. Single mutagenesis followed by combined mutagenesis resulted in the best mutant E518I-S662R-Q706P, which exhibited an 11-fold half-life improvement at 60 °C and a 9.5 °C increase in Tm. The optimum temperature of the mutant increased from 60 to 65 °C. Fluorescence spectroscopy results demonstrated that the tertiary structure of the mutant enzyme was more compact than that of the wild-type (WT) enzyme. Structural change analysis revealed that the increase in thermostability was most probably caused by a combination of lower stability free-energy and higher hydrophobicity of E518I, more hydrogen bonds of S662R, and higher rigidity of Q706P compared with the WT. The findings demonstrated the effectiveness of combined data-driven rational design approaches in engineering an industrial enzyme to improve thermostability. Copyright © 2015 Elsevier Inc. All rights reserved.

  6. The Orion GN and C Data-Driven Flight Software Architecture for Automated Sequencing and Fault Recovery

    Science.gov (United States)

    King, Ellis; Hart, Jeremy; Odegard, Ryan

    2010-01-01

    The Orion Crew Exploration Vehicle (CEV) is being designed to include significantly more automation capability than either the Space Shuttle or the International Space Station (ISS). In particular, the vehicle flight software has requirements to accommodate increasingly automated missions throughout all phases of flight. A data-driven flight software architecture will provide an evolvable automation capability to sequence through Guidance, Navigation & Control (GN&C) flight software modes and configurations while maintaining the required flexibility and human control over the automation. This flexibility is a key aspect needed to address the maturation of operational concepts, to permit ground and crew operators to gain trust in the system and mitigate unpredictability in human spaceflight. To allow for mission flexibility and reconfigurability, a data driven approach is being taken to load the mission event plan as well as the flight software artifacts associated with the GN&C subsystem. A database of GN&C level sequencing data is presented which manages and tracks the mission specific and algorithm parameters to provide a capability to schedule GN&C events within mission segments. The flight software data schema for performing automated mission sequencing is presented with a concept of operations for interactions with ground and onboard crew members. A prototype architecture for fault identification, isolation and recovery interactions with the automation software is presented and discussed as a forward work item.

  7. A predictive estimation method for carbon dioxide transport by data-driven modeling with a physically-based data model

    Science.gov (United States)

    Jeong, Jina; Park, Eungyu; Han, Weon Shik; Kim, Kue-Young; Jun, Seong-Chun; Choung, Sungwook; Yun, Seong-Taek; Oh, Junho; Kim, Hyun-Jun

    2017-11-01

    In this study, a data-driven method for predicting CO2 leaks and associated concentrations from geological CO2 sequestration is developed. Several candidate models are compared based on their reproducibility and predictive capability for CO2 concentration measurements from the Environment Impact Evaluation Test (EIT) site in Korea. Based on the data mining results, a one-dimensional solution of the advective-dispersive equation for steady flow (i.e., Ogata-Banks solution) is found to be most representative for the test data, and this model is adopted as the data model for the developed method. In the validation step, the method is applied to estimate future CO2 concentrations with the reference estimation by the Ogata-Banks solution, where a part of earlier data is used as the training dataset. From the analysis, it is found that the ensemble mean of multiple estimations based on the developed method shows high prediction accuracy relative to the reference estimation. In addition, the majority of the data to be predicted are included in the proposed quantile interval, which suggests adequate representation of the uncertainty by the developed method. Therefore, the incorporation of a reasonable physically-based data model enhances the prediction capability of the data-driven model. The proposed method is not confined to estimations of CO2 concentration and may be applied to various real-time monitoring data from subsurface sites to develop automated control, management or decision-making systems.
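
    The Ogata-Banks solution that this abstract adopts as its data model has a closed form for a constant-concentration source at x = 0 in an initially clean, semi-infinite domain: C(x, t) = (C0 / 2) [erfc((x - v t) / (2 sqrt(D t))) + exp(v x / D) erfc((x + v t) / (2 sqrt(D t)))]. The Python sketch below evaluates it; the function name, parameter names, and numerical values are assumptions for illustration and are not taken from the EIT site data.

```python
import math


def ogata_banks(x, t, v, D, c0=1.0):
    """Concentration at distance x and time t for 1-D advection-dispersion.

    v  : steady pore-water velocity (length / time)
    D  : dispersion coefficient (length**2 / time)
    c0 : constant source concentration held at x = 0
    """
    if t <= 0:
        raise ValueError("t must be positive")
    spread = 2.0 * math.sqrt(D * t)
    advective = math.erfc((x - v * t) / spread)
    # The exponential can overflow for very large v * x / D; this sketch
    # does not guard against that case.
    boundary = math.exp(v * x / D) * math.erfc((x + v * t) / spread)
    return 0.5 * c0 * (advective + boundary)


# Hypothetical parameters: breakthrough curve at a monitoring point 1 unit downstream.
for t in (0.5, 1.0, 2.0, 4.0):
    print(t, ogata_banks(x=1.0, t=t, v=1.0, D=0.5))
```

    At the source the two complementary error functions sum to 2, so the expression returns C0 exactly, which is a quick sanity check on any implementation.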

  8. Analyzing the Discourse of Chais Conferences for the Study of Innovation and Learning Technologies via a Data-Driven Approach

    Directory of Open Access Journals (Sweden)

    Vered Silber-Varod

    2016-12-01

    Full Text Available The current rapid technological changes confront researchers of learning technologies with the challenge of evaluating them, predicting trends, and improving their adoption and diffusion. This study utilizes a data-driven discourse analysis approach, namely culturomics, to investigate changes over time in the research of learning technologies. The patterns and changes were examined on a corpus of articles published over the past decade (2006-2014) in the proceedings of Chais Conference for the Study of Innovation and Learning Technologies – the leading research conference on learning technologies in Israel. The interesting findings of the exhaustive process of analyzing all the words in the corpus were that the most commonly used terms (e.g., pupil, teacher, student) and the most commonly used phrases (e.g., face-to-face) in the field of learning technologies reflect a pedagogical rather than a technological aspect of learning technologies. The study also demonstrates two cases of change over time in prominent themes, such as “Facebook” and “the National Information and Communication Technology (ICT) program”. Methodologically, this research demonstrates the effectiveness of a data-driven approach for identifying discourse trends over time.

  9. LSF usage for batch at CERN

    CERN Multimedia

    Schwickerath, Ulrich

    2007-01-01

    Contributed poster to the CHEP07. Original abstract: LSF 7, the latest version of Platform's batch workload management system, addresses many issues which limited the ability of LSF 6.1 to support large scale batch farms, such as the lxbatch service at CERN. In this paper we will present the status of the evaluation and deployment of LSF 7 at CERN, including issues concerning the integration of LSF 7 with the gLite grid middleware suite and, in particular, the steps taken to ensure an efficient reporting of the local batch system status and usage to the Grid Information System.

  10. Fuzzy batch controller for granular materials

    OpenAIRE

    Zamyatin Nikolaj; Smirnov Gennadij; Fedorchuk Yuri; Rusina Olga

    2018-01-01

    The paper focuses on batch control of granular materials in production of building materials from fluorine anhydrite. Batching equipment is intended for smooth operation and timely feeding of supply hoppers at a required level. Level sensors and a controller of an asynchronous screw drive motor are used to control filling of the hopper with industrial anhydrite binders. The controller generates a required frequency and ensures required productivity of a feed conveyor. Mamdani-type fuzzy infer...

  11. Batch Computed Tomography Analysis of Projectiles

    Science.gov (United States)

    2016-05-01

    ARL-TR-7681 ● MAY 2016 US Army Research Laboratory Batch Computed Tomography Analysis of Projectiles by Michael C Golt, Chris M... Laboratory Batch Computed Tomography Analysis of Projectiles by Michael C Golt and Matthew S Bratcher Weapons and Materials Research... values to account for projectile variability in the ballistic evaluation of armor. 15. SUBJECT TERMS computed tomography, CT, BS41, projectiles

  12. Redefining the Practice of Peer Review Through Intelligent Automation Part 2: Data-Driven Peer Review Selection and Assignment.

    Science.gov (United States)

    Reiner, Bruce I

    2017-12-01

    In conventional radiology peer review practice, a small number of exams (routinely 5% of the total volume) is randomly selected, which may significantly underestimate the true error rate within a given radiology practice. An alternative and preferable approach would be to create a data-driven model which mathematically quantifies a peer review risk score for each individual exam and uses this data to identify high risk exams and readers, and selectively target these exams for peer review. An analogous model can also be created to assist in the assignment of these peer review cases in keeping with specific priorities of the service provider. An additional option to enhance the peer review process would be to assign the peer review cases in a truly blinded fashion. In addition to eliminating traditional peer review bias, this approach has the potential to better define exam-specific standard of care, particularly when multiple readers participate in the peer review process.

  13. Classification of iRBD and Parkinson's patients using a general data-driven sleep staging model built on EEG

    DEFF Research Database (Denmark)

    Koch, Henriette; Christensen, Julie Anja Engelhard; Frandsen, Rune

    2013-01-01

    Sleep analysis is an important diagnostic tool for sleep disorders. However, the current manual sleep scoring is time-consuming as it is a crude discretization in time and stages. This study changes Esbroeck and Westover's [1] latent sleep staging model into a global model. The proposed data-driven method trained a topic mixture model on 10 control subjects and was applied on 10 other control subjects, 10 iRBD patients and 10 Parkinson's patients. In that way 30 topic mixture diagrams were obtained from which features reflecting distinct sleep architectures between control subjects and patients were extracted. Two features calculated on basis of two latent sleep states classified subjects as “control” or “patient” by a simple clustering algorithm. The mean sleep staging accuracy compared to classical AASM scoring was 72.4% for control subjects and a clustering of the derived features resulted...

  14. Prognostic and health management for engineering systems: a review of the data-driven approach and algorithms

    Directory of Open Access Journals (Sweden)

    Thamo Sutharssan

    2015-07-01

    Full Text Available Prognostics and health management (PHM) has become an important component of many engineering systems and products, where algorithms are used to detect anomalies, diagnose faults and predict remaining useful lifetime (RUL). PHM can provide many advantages to users and maintainers. Although primary goals are to ensure the safety, provide state of the health and estimate RUL of the components and systems, there are also financial benefits such as operational and maintenance cost reductions and extended lifetime. This study aims at reviewing the current status of algorithms and methods used to underpin different existing PHM approaches. The focus is on providing a structured and comprehensive classification of the existing state-of-the-art PHM approaches, data-driven approaches and algorithms.

  15. An Interactive Platform to Visualize Data-Driven Clinical Pathways for the Management of Multiple Chronic Conditions.

    Science.gov (United States)

    Zhang, Yiye; Padman, Rema

    2017-01-01

    Patients with multiple chronic conditions (MCC) pose an increasingly complex health management challenge worldwide, particularly due to the significant gap in our understanding of how to provide coordinated care. Drawing on our prior research on learning data-driven clinical pathways from actual practice data, this paper describes a prototype, interactive platform for visualizing the pathways of MCC to support shared decision making. Created using Python web framework, JavaScript library and our clinical pathway learning algorithm, the visualization platform allows clinicians and patients to learn the dominant patterns of co-progression of multiple clinical events from their own data, and interactively explore and interpret the pathways. We demonstrate functionalities of the platform using a cluster of 36 patients, identified from a dataset of 1,084 patients, who are diagnosed with at least chronic kidney disease, hypertension, and diabetes. Future evaluation studies will explore the use of this platform to better understand and manage MCC.

  16. Data-driven modeling of sleep EEG and EOG reveals characteristics indicative of pre-Parkinson's and Parkinson's disease

    DEFF Research Database (Denmark)

    Christensen, Julie Anja Engelhard; Zoetmulder, Marielle; Koch, Henriette

    2014-01-01

    Background: Manual scoring of sleep relies on identifying certain characteristics in polysomnograph (PSG) signals. However, these characteristics are disrupted in patients with neurodegenerative diseases. New method: This study evaluates sleep using a topic modeling and unsupervised learning... patients with idiopathic REM sleep behavior disorder (iRBD) and 36 patients with Parkinson's disease (PD). The data were divided into training and validation datasets and features reflecting EEG and EOG characteristics based on topics were computed. The most discriminative feature subset for separating i... and the ability to maintain NREM and REM sleep have potential as early PD biomarkers. Data-driven analysis of sleep may contribute to the evaluation of neurodegenerative patients. (C) 2014 Elsevier B.V. All rights reserved.

  17. Big data-driven business how to use big data to win customers, beat competitors, and boost profits

    CERN Document Server

    Glass, Russell

    2014-01-01

    Get the expert perspective and practical advice on big data. The Big Data-Driven Business: How to Use Big Data to Win Customers, Beat Competitors, and Boost Profits makes the case that big data is for real, and more than just big hype. The book uses real-life examples-from Nate Silver to Copernicus, and Apple to Blackberry-to demonstrate how the winners of the future will use big data to seek the truth. Written by a marketing journalist and the CEO of a multi-million-dollar B2B marketing platform that reaches more than 90% of the U.S. business population, this book is a comprehens

  18. Regional regression models of percentile flows for the contiguous United States: Expert versus data-driven independent variable selection

    Directory of Open Access Journals (Sweden)

    Geoffrey Fouad

    2018-06-01

    New hydrological insights for the region: A set of three variables selected based on an expert assessment of factors that influence percentile flows performed similarly to larger sets of variables selected using a data-driven method. Expert assessment variables included mean annual precipitation, potential evapotranspiration, and baseflow index. Larger sets of up to 37 variables contributed little, if any, additional predictive information. Variables used to describe the distribution of basin data (e.g. standard deviation) were not useful, and average values were sufficient to characterize physical and climatic basin conditions. Effectiveness of the expert assessment variables may be due to the high degree of multicollinearity (i.e. cross-correlation) among additional variables. A tool is provided in the Supplementary material to predict percentile flows based on the three expert assessment variables. Future work should develop new variables with a strong understanding of the processes related to percentile flows.

  19. A data-driven adaptive controller for a class of unknown nonlinear discrete-time systems with estimated PPD

    Directory of Open Access Journals (Sweden)

    Chidentree Treesatayapun

    2015-06-01

    An adaptive control scheme based on a data-driven controller (DDC) is proposed in this article. Unlike several DDC techniques, the proposed controller is constructed from an adaptive fuzzy rule emulated network (FREN), which is able to include human knowledge based on the controlled plant's input–output signals in the format of IF-THEN rules. Owing to this advantage, the on-line estimation of the pseudo partial derivative (PPD) and the resetting algorithms that are commonly used by DDC can be omitted here. Furthermore, a novel adaptive algorithm is introduced to minimize both tracking error and control effort, with stability analysis for the closed-loop system. An experimental system with brushed DC-motor current control is constructed to validate the performance of the proposed control scheme. Comparative results with conventional DDC and radial basis function (RBF) controllers demonstrate that the proposed controller provides lower tracking error and minimizes the control effort.

  20. Data Driven Professional Development Design for Out-of-School Time Educators Using Planetary Science and Engineering Educational Materials

    Science.gov (United States)

    Clark, J.; Bloom, N.

    2017-12-01

    Data driven design practices should be the basis for any effective educational product, particularly those used to support STEM learning and literacy. Planetary Learning that Advances the Nexus of Engineering, Technology, and Science (PLANETS) is a five-year NASA-funded (NNX16AC53A) interdisciplinary and cross-institutional partnership to develop and disseminate STEM out-of-school time (OST) curricular and professional development units that integrate planetary science, technology, and engineering. The Center for Science Teaching and Learning at Northern Arizona University, the U.S. Geological Survey Astrogeology Science Center, and the Museum of Science Boston are partners in developing, piloting, and researching the impact of three out of school time units. Two units are for middle grades youth and one is for upper elementary aged youth. The presentation will highlight the data driven development process of the educational products used to provide support for educators teaching these curriculum units. This includes how data from the project needs assessment, curriculum pilot testing, and professional support product field tests are used in the design of products for out of school time educators. Based on data analysis, the project is developing and testing four tiers of professional support for OST educators. Tier 1 meets the immediate needs of OST educators to teach curriculum and includes how-to videos and other direct support materials. Tier 2 provides additional content and pedagogical knowledge and includes short content videos designed to specifically address the content of the curriculum. Tier 3 elaborates on best practices in education and gives guidance on methods, for example, to develop cultural relevancy for underrepresented students. Tier 4 helps make connections to other NASA or educational products that support STEM learning in out of school settings. Examples of the tiers of support will be provided.

  1. Data-Driven Nonlinear Subspace Modeling for Prediction and Control of Molten Iron Quality Indices in Blast Furnace Ironmaking

    Energy Technology Data Exchange (ETDEWEB)

    Zhou, Ping; Song, Heda; Wang, Hong; Chai, Tianyou

    2017-09-01

    Blast furnace (BF) ironmaking is a nonlinear dynamic process with complicated physical-chemical reactions, where multi-phase and multi-field coupling and large time delay occur during its operation. In BF operation, the molten iron temperature (MIT) as well as Si, P and S contents of molten iron are the most essential molten iron quality (MIQ) indices, whose measurement, modeling and control have always been important issues in the metallurgic engineering and automation field. This paper develops a novel data-driven nonlinear state space modeling for the prediction and control of multivariate MIQ indices by integrating hybrid modeling and control techniques. First, to improve modeling efficiency, a data-driven hybrid method combining canonical correlation analysis and correlation analysis is proposed to identify the most influential controllable variables as the modeling inputs from the multitudinous factors that would affect the MIQ indices. Then, a Hammerstein model for the prediction of MIQ indices is established using the LS-SVM based nonlinear subspace identification method. Such a model is further simplified by using the piecewise cubic Hermite interpolating polynomial method to fit the complex nonlinear kernel function. Compared to the original Hammerstein model, this simplified model not only significantly reduces the computational complexity, but also has almost the same reliability and accuracy for a stable prediction of MIQ indices. Last, in order to verify the practicability of the developed model, it is applied in designing a genetic algorithm based nonlinear predictive controller for multivariate MIQ indices by directly taking the established model as a predictor. Industrial experiments show the advantages and effectiveness of the proposed approach.

  2. A data-driven emulation framework for representing water-food nexus in a changing cold region

    Science.gov (United States)

    Nazemi, A.; Zandmoghaddam, S.; Hatami, S.

    2017-12-01

    Water resource systems are under increasing pressure globally. Growing population along with competition between water demands and emerging effects of climate change have caused enormous vulnerabilities in water resource management across many regions. Diagnosing such vulnerabilities and provision of effective adaptation strategies requires the availability of simulation tools that can adequately represent the interactions between competing water demands for limiting water resources and inform decision makers about the critical vulnerability thresholds under a range of potential natural and anthropogenic conditions. Despite significant progress in integrated modeling of water resource systems, regional models are often unable to fully represent the contemplating dynamics within the key elements of water resource systems locally. Here we propose a data-driven approach to emulate a complex regional water resource system model developed for Oldman River Basin in southern Alberta, Canada. The aim of the emulation is to provide a detailed understanding of the trade-offs and interaction at the Oldman Reservoir, which is the key to flood control and irrigated agriculture in this over-allocated semi-arid cold region. Different surrogate models are developed to represent the dynamic of irrigation demand and withdrawal as well as reservoir evaporation and release individually. The non-falsified offline models are then integrated through the water balance equation at the reservoir location to provide a coupled model for representing the dynamic of reservoir operation and water allocation at the local scale. The performance of individual and integrated models is rigorously examined and sources of uncertainty are highlighted. To demonstrate the practical utility of such surrogate modeling approach, we use the integrated data-driven model for examining the trade-off in irrigation water supply, reservoir storage and release under a range of changing climate, upstream

  3. Data-driven Development of ROTEM and TEG Algorithms for the Management of Trauma Hemorrhage: A Prospective Observational Multicenter Study.

    Science.gov (United States)

    Baksaas-Aasen, Kjersti; Van Dieren, Susan; Balvers, Kirsten; Juffermans, Nicole P; Næss, Pål A; Rourke, Claire; Eaglestone, Simon; Ostrowski, Sisse R; Stensballe, Jakob; Stanworth, Simon; Maegele, Marc; Goslings, Carel; Johansson, Pär I; Brohi, Karim; Gaarder, Christine

    2018-05-23

    Developing pragmatic data-driven algorithms for management of trauma induced coagulopathy (TIC) during trauma hemorrhage for viscoelastic hemostatic assays (VHAs). Admission data from conventional coagulation tests (CCT), rotational thrombelastometry (ROTEM) and thrombelastography (TEG) were collected prospectively at 6 European trauma centers during 2008 to 2013. To identify significant VHA parameters capable of detecting TIC (defined as INR > 1.2), hypofibrinogenemia (< 2.0 g/L), and thrombocytopenia (< 100 x 10^9/L), univariate regression models were constructed. Area under the curve (AUC) was calculated, and threshold values for TEG and ROTEM parameters with 70% sensitivity were included in the algorithms. A total of 2287 adult trauma patients (ROTEM: 2019 and TEG: 968) were enrolled. FIBTEM clot amplitude at 5 minutes (CA5) had the largest AUC and 10 mm detected hypofibrinogenemia with 70% sensitivity. The corresponding value for functional fibrinogen (FF) TEG maximum amplitude (MA) was 19 mm. Thrombocytopenia was similarly detected using the calculated threshold EXTEM-FIBTEM CA5 30 mm. The corresponding rTEG-FF TEG MA was 46 mm. TIC was identified by EXTEM CA5 41 mm, rTEG MA 64 mm (80% sensitivity). For hyperfibrinolysis, we examined the relationship between viscoelastic lysis parameters and clinical outcomes, with resulting threshold values of 85% for EXTEM Li30 and 10% for rTEG Ly30. Based on these analyses, we constructed algorithms for ROTEM, TEG, and CCTs to be used in addition to ratio driven transfusion and tranexamic acid. We describe a systematic approach to define threshold parameters for ROTEM and TEG. These parameters were incorporated into algorithms to support data-driven adjustments of resuscitation with therapeutics, to optimize damage control resuscitation practice in trauma.
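
    The abstract reports cutoffs chosen to reach 70% sensitivity but does not say how they were computed. The sketch below is one plausible reading, not the authors' procedure: it assumes a low reading indicates the condition, takes the readings of patients who truly have it, and returns the smallest observed cutoff that flags at least the target fraction of them. The function name and the sample values are hypothetical.

```python
def threshold_for_sensitivity(positive_values, target=0.70):
    """Return the smallest observed cutoff c such that the rule
    'value <= c' flags at least `target` of the true-positive cases.

    positive_values: readings from patients who truly have the condition
    (hypothetical data; a LOW reading is assumed to indicate the condition).
    """
    if not positive_values:
        raise ValueError("need at least one true-positive reading")
    ordered = sorted(positive_values)
    n = len(ordered)
    for cutoff in ordered:
        # Sensitivity = fraction of true positives flagged by this cutoff.
        flagged = sum(1 for v in ordered if v <= cutoff)
        if flagged / n >= target:
            return cutoff
    return ordered[-1]  # reached only if target > 1.0


# Hypothetical readings, for illustration only.
readings = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
cutoff = threshold_for_sensitivity(readings, target=0.70)
```

    A cutoff chosen this way fixes sensitivity only; specificity has to be checked separately against the patients without the condition.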

  4. Migraine Subclassification via a Data-Driven Automated Approach Using Multimodality Factor Mixture Modeling of Brain Structure Measurements.

    Science.gov (United States)

    Schwedt, Todd J; Si, Bing; Li, Jing; Wu, Teresa; Chong, Catherine D

    2017-07-01

    The current subclassification of migraine is according to headache frequency and aura status. The variability in migraine symptoms, disease course, and response to treatment suggest the presence of additional heterogeneity or subclasses within migraine. The study objective was to subclassify migraine via a data-driven approach, identifying latent factors by jointly exploiting multiple sets of brain structural features obtained via magnetic resonance imaging (MRI). Migraineurs (n = 66) and healthy controls (n = 54) had brain MRI measurements of cortical thickness, cortical surface area, and volumes for 68 regions. A multimodality factor mixture model was used to subclassify MRIs and to determine the brain structural factors that most contributed to the subclassification. Clinical characteristics of subjects in each subgroup were compared. Automated MRI classification divided the subjects into two subgroups. Migraineurs in subgroup #1 had more severe allodynia symptoms during migraines (6.1 ± 5.3 vs. 3.6 ± 3.2, P = .03), more years with migraine (19.2 ± 11.3 years vs 13 ± 8.3 years, P = .01), and higher Migraine Disability Assessment (MIDAS) scores (25 ± 22.9 vs 15.7 ± 12.2, P = .04). There were no differences in headache frequency or migraine aura status between the two subgroups. Data-driven subclassification of brain MRIs based upon structural measurements identified two subgroups. Amongst migraineurs, the subgroups differed in allodynia symptom severity, years with migraine, and migraine-related disability. Since allodynia is associated with this imaging-based subclassification of migraine and prior publications suggest that allodynia impacts migraine treatment response and disease prognosis, future migraine diagnostic criteria could consider allodynia when defining migraine subgroups. © 2017 American Headache Society.

  5. DeDaL: Cytoscape 3 app for producing and morphing data-driven and structure-driven network layouts.

    Science.gov (United States)

    Czerwinska, Urszula; Calzone, Laurence; Barillot, Emmanuel; Zinovyev, Andrei

    2015-08-14

    Visualization and analysis of molecular profiling data together with biological networks are able to provide new mechanistic insights into biological functions. Currently, it is possible to visualize high-throughput data on top of pre-defined network layouts, but they are not always adapted to a given data analysis task. A network layout based simultaneously on the network structure and the associated multidimensional data might be advantageous for data visualization and analysis in some cases. We developed a Cytoscape app, which allows constructing biological network layouts based on the data from molecular profiles imported as values of node attributes. DeDaL is a Cytoscape 3 app, which uses linear and non-linear algorithms of dimension reduction to produce data-driven network layouts based on multidimensional data (typically gene expression). DeDaL implements several data pre-processing and layout post-processing steps such as continuous morphing between two arbitrary network layouts and aligning one network layout with respect to another one by rotating and mirroring. The combination of all these functionalities facilitates the creation of insightful network layouts representing both structural network features and correlation patterns in multivariate data. We demonstrate the added value of applying DeDaL in several practical applications, including an example of a large protein-protein interaction network. DeDaL is a convenient tool for applying data dimensionality reduction methods and for designing insightful data displays based on data-driven layouts of biological networks, built within Cytoscape environment. DeDaL is freely available for downloading at http://bioinfo-out.curie.fr/projects/dedal/.

  6. Reproducibility of data-driven dietary patterns in two groups of adult Spanish women from different studies.

    Science.gov (United States)

    Castelló, Adela; Lope, Virginia; Vioque, Jesús; Santamariña, Carmen; Pedraz-Pingarrón, Carmen; Abad, Soledad; Ederra, Maria; Salas-Trejo, Dolores; Vidal, Carmen; Sánchez-Contador, Carmen; Aragonés, Nuria; Pérez-Gómez, Beatriz; Pollán, Marina

    2016-08-01

    The objective of the present study was to assess the reproducibility of data-driven dietary patterns in different samples extracted from similar populations. Dietary patterns were extracted by applying principal component analyses to the dietary information collected from a sample of 3550 women recruited from seven screening centres belonging to the Spanish breast cancer (BC) screening network (Determinants of Mammographic Density in Spain (DDM-Spain) study). The resulting patterns were compared with three dietary patterns obtained from a previous Spanish case-control study on female BC (Epidemiological study of the Spanish group for breast cancer research (GEICAM: grupo Español de investigación en cáncer de mama)) using the dietary intake data of 973 healthy participants. The level of agreement between patterns was determined using both the congruence coefficient (CC) between the pattern loadings (considering patterns with a CC≥0·85 as fairly similar) and the linear correlation between patterns scores (considering as fairly similar those patterns with a statistically significant correlation). The conclusions reached with both methods were compared. This is the first study exploring the reproducibility of data-driven patterns from two studies and the first using the CC to determine pattern similarity. We were able to reproduce the EpiGEICAM Western pattern in the DDM-Spain sample (CC=0·90). However, the reproducibility of the Prudent (CC=0·76) and Mediterranean (CC=0·77) patterns was not as good. The linear correlation between pattern scores was statistically significant in all cases, highlighting its arbitrariness for determining pattern similarity. We conclude that the reproducibility of widely prevalent dietary patterns is better than the reproducibility of more population-specific patterns. More methodological studies are needed to establish an objective measurement and threshold to determine pattern similarity.
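
    The abstract does not give the formula for the congruence coefficient. The sketch below assumes it is Tucker's congruence coefficient, i.e. the uncentred cosine between two vectors of pattern loadings, and applies the 0.85 cutoff quoted above. Function names and the sample loadings are hypothetical.

```python
import math


def congruence_coefficient(loadings_a, loadings_b):
    """Tucker's congruence coefficient between two loading vectors
    defined over the same food groups, in the same order."""
    if len(loadings_a) != len(loadings_b):
        raise ValueError("loading vectors must have the same length")
    cross = sum(a * b for a, b in zip(loadings_a, loadings_b))
    norm_a = math.sqrt(sum(a * a for a in loadings_a))
    norm_b = math.sqrt(sum(b * b for b in loadings_b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("loading vectors must be non-zero")
    return cross / (norm_a * norm_b)


def fairly_similar(cc, threshold=0.85):
    """Apply the similarity cutoff quoted in the abstract (CC >= 0.85)."""
    return cc >= threshold


# Hypothetical loadings for one pattern in two samples.
sample_1 = [0.6, 0.5, -0.2, 0.1]
sample_2 = [0.5, 0.6, -0.1, 0.2]
cc = congruence_coefficient(sample_1, sample_2)
```

    Unlike a Pearson correlation, this coefficient does not centre the loadings, so it is sensitive to their sign and overall level as well as to their shape.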

  7. A predictive estimation method for carbon dioxide transport by data-driven modeling with a physically-based data model.

    Science.gov (United States)

    Jeong, Jina; Park, Eungyu; Han, Weon Shik; Kim, Kue-Young; Jun, Seong-Chun; Choung, Sungwook; Yun, Seong-Taek; Oh, Junho; Kim, Hyun-Jun

    2017-11-01

    In this study, a data-driven method for predicting CO2 leaks and associated concentrations from geological CO2 sequestration is developed. Several candidate models are compared based on their reproducibility and predictive capability for CO2 concentration measurements from the Environment Impact Evaluation Test (EIT) site in Korea. Based on the data mining results, a one-dimensional solution of the advective-dispersive equation for steady flow (i.e., Ogata-Banks solution) is found to be most representative for the test data, and this model is adopted as the data model for the developed method. In the validation step, the method is applied to estimate future CO2 concentrations with the reference estimation by the Ogata-Banks solution, where a part of earlier data is used as the training dataset. From the analysis, it is found that the ensemble mean of multiple estimations based on the developed method shows high prediction accuracy relative to the reference estimation. In addition, the majority of the data to be predicted are included in the proposed quantile interval, which suggests adequate representation of the uncertainty by the developed method. Therefore, the incorporation of a reasonable physically-based data model enhances the prediction capability of the data-driven model. The proposed method is not confined to estimations of CO2 concentration and may be applied to various real-time monitoring data from subsurface sites to develop automated control, management or decision-making systems. Copyright © 2017 Elsevier B.V. All rights reserved.
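
    For orientation, the Ogata-Banks solution named above can be evaluated directly. The sketch below uses its standard textbook form for a constant-concentration source at x = 0 in a semi-infinite column, C(x, t) = (C0/2) [erfc((x - v t) / (2 sqrt(D t))) + exp(v x / D) erfc((x + v t) / (2 sqrt(D t)))]. The abstract does not state the boundary conditions, parameter values, or fitting procedure the authors used, so the function name and arguments here are assumptions.

```python
import math


def ogata_banks(x, t, velocity, dispersion, c0=1.0):
    """Concentration at distance x and time t for 1-D steady flow.

    velocity   : pore-water velocity v (length / time)
    dispersion : dispersion coefficient D (length^2 / time)
    c0         : constant source concentration at x = 0
    """
    if t <= 0 or dispersion <= 0:
        raise ValueError("t and dispersion must be positive")
    spread = 2.0 * math.sqrt(dispersion * t)
    advective = math.erfc((x - velocity * t) / spread)
    # The exponential factor can overflow when v * x / D is very large;
    # this sketch does not guard against that case.
    boundary = math.exp(velocity * x / dispersion) * math.erfc(
        (x + velocity * t) / spread
    )
    return 0.5 * c0 * (advective + boundary)


# Illustrative (hypothetical) parameters: v = 1, D = 1, C0 = 1.
at_source = ogata_banks(0.0, 1.0, 1.0, 1.0)
at_front = ogata_banks(1.0, 1.0, 1.0, 1.0)
```

    At the source the two terms sum to C0, and far ahead of the advective front both terms vanish, which gives two quick sanity checks on any implementation.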

  8. Data-Driven Shakespeare

    Science.gov (United States)

    Bambrick-Santoyo, Paul

    2016-01-01

    Write first, talk second--it's a simple strategy, but one that's underused in literature classes, writes Paul Bambrick-Santoyo. The author describes a lesson on Shakespeare's Sonnet 65 conducted by a middle school English teacher, who incorporates writing as an important precursor to classroom discussion. By having students write about the poem…

  9. Energy efficiency of batch and semi-batch (CCRO) reverse osmosis desalination.

    Science.gov (United States)

    Warsinger, David M; Tow, Emily W; Nayar, Kishor G; Maswadeh, Laith A; Lienhard V, John H

    2016-12-01

    As reverse osmosis (RO) desalination capacity increases worldwide, the need to reduce its specific energy consumption becomes more urgent. In addition to the incremental changes attainable with improved components such as membranes and pumps, more significant reduction of energy consumption can be achieved through time-varying RO processes including semi-batch processes such as closed-circuit reverse osmosis (CCRO) and fully-batch processes that have not yet been commercialized or modelled in detail. In this study, numerical models of the energy consumption of batch RO (BRO), CCRO, and the standard continuous RO process are detailed. Two new energy-efficient configurations of batch RO are analyzed. Batch systems use significantly less energy than continuous RO over a wide range of recovery ratios and source water salinities. Relative to continuous RO, models predict that CCRO and batch RO demonstrate up to 37% and 64% energy savings, respectively, for brackish water desalination at high water recovery. For batch RO and CCRO, the primary reductions in energy use stem from atmospheric pressure brine discharge and reduced streamwise variation in driving pressure. Fully-batch systems further reduce energy consumption by not mixing streams of different concentrations, which CCRO does. These results demonstrate that time-varying processes can significantly raise RO energy efficiency. Copyright © 2016 Elsevier Ltd. All rights reserved.

  10. PEPSI-Dock: a detailed data-driven protein-protein interaction potential accelerated by polar Fourier correlation.

    Science.gov (United States)

    Neveu, Emilie; Ritchie, David W; Popov, Petr; Grudinin, Sergei

    2016-09-01

    Docking prediction algorithms aim to find the native conformation of a complex of proteins from knowledge of their unbound structures. They rely on a combination of sampling and scoring methods, adapted to different scales. Polynomial Expansion of Protein Structures and Interactions for Docking (PEPSI-Dock) improves the accuracy of the first stage of the docking pipeline, which will sharpen up the final predictions. Indeed, PEPSI-Dock benefits from the precision of a very detailed data-driven model of the binding free energy used with a global and exhaustive rigid-body search space. As well as being accurate, our computations are among the fastest by virtue of the sparse representation of the pre-computed potentials and FFT-accelerated sampling techniques. Overall, this is the first demonstration of an FFT-accelerated docking method coupled with an arbitrary-shaped distance-dependent interaction potential. First, we present a novel learning process to compute data-driven distance-dependent pairwise potentials, adapted from our previous method used for rescoring of putative protein-protein binding poses. The potential coefficients are learned by combining machine-learning techniques with physically interpretable descriptors. Then, we describe the integration of the deduced potentials into an FFT-accelerated spherical sampling provided by the Hex library. Overall, on a training set of 163 heterodimers, PEPSI-Dock achieves a success rate of 91% mid-quality predictions in the top-10 solutions. On a subset of the protein docking benchmark v5, it achieves 44.4% mid-quality predictions in the top-10 solutions when starting from bound structures and 20.5% when starting from unbound structures. The method runs in 5-15 min on a modern laptop and can easily be extended to other types of interactions. https://team.inria.fr/nano-d/software/PEPSI-Dock sergei.grudinin@inria.fr. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e

  11. Nursing Theory, Terminology, and Big Data: Data-Driven Discovery of Novel Patterns in Archival Randomized Clinical Trial Data.

    Science.gov (United States)

    Monsen, Karen A; Kelechi, Teresa J; McRae, Marion E; Mathiason, Michelle A; Martin, Karen S

    The growth and diversification of nursing theory, nursing terminology, and nursing data enable a convergence of theory- and data-driven discovery in the era of big data research. Existing datasets can be viewed through theoretical and terminology perspectives using visualization techniques in order to reveal new patterns and generate hypotheses. The Omaha System is a standardized terminology and metamodel that makes explicit the theoretical perspective of the nursing discipline and enables terminology-theory testing research. The purpose of this paper is to illustrate the approach by exploring a large research dataset consisting of 95 variables (demographics, temperature measures, anthropometrics, and standardized instruments measuring quality of life and self-efficacy) from a theory-based perspective using the Omaha System. Aims were to (a) examine the Omaha System dataset to understand the sample at baseline relative to Omaha System problem terms and outcome measures, (b) examine relationships within the normalized Omaha System dataset at baseline in predicting adherence, and (c) examine relationships within the normalized Omaha System dataset at baseline in predicting incident venous ulcer. Variables from a randomized clinical trial of a cryotherapy intervention for the prevention of venous ulcers were mapped onto Omaha System terms and measures to derive a theoretical framework for the terminology-theory testing study. The original dataset was recoded using the mapping to create an Omaha System dataset, which was then examined using visualization to generate hypotheses. The hypotheses were tested using standard inferential statistics. Logistic regression was used to predict adherence and incident venous ulcer. 
Findings revealed novel patterns in the psychosocial characteristics of the sample that were discovered to be drivers of both adherence (Mental health Behavior: OR = 1.28, 95% CI [1.02, 1.60]; AUC = .56) and incident venous ulcer (Mental health Behavior

  12. Excellence and evidence in staffing: a data-driven model for excellence in staffing (2nd edition).

    Science.gov (United States)

    Baggett, Margarita; Batcheller, Joyce; Blouin, Ann Scott; Behrens, Elizabeth; Bradley, Carol; Brown, Mary J; Brown, Diane Storer; Bolton, Linda Burnes; Borromeo, Annabelle R; Burtson, Paige; Caramanica, Laura; Caspers, Barbara A; Chow, Marilyn; Christopher, Mary Ann; Clarke, Sean P; Delucas, Christine; Dent, Robert L; Disser, Tony; Eliopoulos, Charlotte; Everett, Linda Q; Garcia, Amy; Glassman, Kimberly; Goodwin, Susan; Haagenson, Deb; Harper, Ellen; Harris, Kathy; Hoying, Cheryl L; Hughes-Rease, Marsha; Kelly, Lesly; Kiger, Anna J; Kobs-Abbott, Ann; Krueger, Janelle; Larson, Jackie; March, Connie; Martin, Deborah Maust; Mazyck, Donna; Meenan, Penny; McGaffigan, Patricia; Myers, Karen K; Nell, Kate; Newcomer, Britta; Cathy, Rick; O'Rourke, Maria; Rosa, Billy; Rose, Robert; Rudisill, Pamela; Sanford, Kathy; Simpson, Roy L; Snowden, Tami; Strickland, Bob; Strohecker, Sharon; Weems, Roger B; Welton, John; Weston, Marla; Valentine, Nancy M; Vento, Laura; Yendro, Susan

    2014-01-01

    The Patient Protection and Affordable Care Act (PPACA, 2010) and the Institute of Medicine's (IOM, 2011) Future of Nursing report have prompted changes in the U.S. health care system. This has also stimulated a new direction of thinking for the profession of nursing. New payment and priority structures, where value is placed ahead of volume in care, will start to define our health system in new and unknown ways for years. One thing we all know for sure: we cannot afford the same inefficient models and systems of care of yesterday any longer. The Data-Driven Model for Excellence in Staffing was created as the organizing framework to lead the development of best practices for nurse staffing across the continuum through research and innovation. Regardless of the setting, nurses must integrate multiple concepts with the value of professional nursing to create new care and staffing models. Traditional models demonstrate that nurses are a commodity. If the profession is to make any significant changes in nurse staffing, it is through the articulation of the value of our professional practice within the overall health care environment. This position paper is organized around the concepts from the Data-Driven Model for Excellence in Staffing. The main concepts are: Core Concept 1: Users and Patients of Health Care, Core Concept 2: Providers of Health Care, Core Concept 3: Environment of Care, Core Concept 4: Delivery of Care, Core Concept 5: Quality, Safety, and Outcomes of Care. This position paper provides a comprehensive view of those concepts and components, why those concepts and components are important in this new era of nurse staffing, and a 3-year challenge that will push the nursing profession forward in all settings across the care continuum. There are decades of research supporting various changes to nurse staffing. Yet little has been done to move that research into practice and operations. 
While the primary goal of this position paper is to generate research

  13. Multivariate modeling of complications with data driven variable selection: Guarding against overfitting and effects of data set size

    International Nuclear Information System (INIS)

    Schaaf, Arjen van der; Xu Chengjian; Luijk, Peter van; Veld, Aart A. van’t; Langendijk, Johannes A.; Schilstra, Cornelis

    2012-01-01

    Purpose: Multivariate modeling of complications after radiotherapy is frequently used in conjunction with data driven variable selection. This study quantifies the risk of overfitting in a data driven modeling method using bootstrapping for data with typical clinical characteristics, and estimates the minimum amount of data needed to obtain models with relatively high predictive power. Materials and methods: To facilitate repeated modeling and cross-validation with independent datasets for the assessment of true predictive power, a method was developed to generate simulated data with statistical properties similar to real clinical data sets. Characteristics of three clinical data sets from radiotherapy treatment of head and neck cancer patients were used to simulate data with set sizes between 50 and 1000 patients. A logistic regression method using bootstrapping and forward variable selection was used for complication modeling, resulting for each simulated data set in a selected number of variables and an estimated predictive power. The true optimal number of variables and true predictive power were calculated using cross-validation with very large independent data sets. Results: For all simulated data set sizes the number of variables selected by the bootstrapping method was on average close to the true optimal number of variables, but showed considerable spread. Bootstrapping is more accurate in selecting the optimal number of variables than the AIC and BIC alternatives, but this did not translate into a significant difference of the true predictive power. The true predictive power asymptotically converged toward a maximum predictive power for large data sets, and the estimated predictive power converged toward the true predictive power. More than half of the potential predictive power is gained after approximately 200 samples. 
Our simulations demonstrated severe overfitting (a predictive power lower than that of predicting 50% probability) in a number of small

  14. Fuzzy batch controller for granular materials

    Directory of Open Access Journals (Sweden)

    Zamyatin Nikolaj

    2018-01-01

    The paper focuses on batch control of granular materials in production of building materials from fluorine anhydrite. Batching equipment is intended for smooth operation and timely feeding of supply hoppers at a required level. Level sensors and a controller of an asynchronous screw drive motor are used to control filling of the hopper with industrial anhydrite binders. The controller generates a required frequency and ensures required productivity of a feed conveyor. Mamdani-type fuzzy inference is proposed for controlling the speed of the screw that feeds mixture components. As related to production of building materials based on fluoride anhydrite, this method is used for the first time. A fuzzy controller is proven to be effective in controlling the filling level of the supply hopper. In addition, the authors determined optimal parameters of the batching process to ensure smooth operation and production of fluorine anhydrite materials of specified properties that can compete with gypsum-based products.

  15. History based batch method preserving tally means

    International Nuclear Information System (INIS)

    Shim, Hyung Jin; Choi, Sung Hoon

    2012-01-01

    In Monte Carlo (MC) eigenvalue calculations, the sample variance of a tally mean calculated from its cycle-wise estimates is biased because of the inter-cycle correlations of the fission source distribution (FSD). Recently, we proposed a new real variance estimation method named the history-based batch method, in which an MC run is treated as multiple runs with a small number of histories per cycle to generate independent tally estimates. In this paper, the history-based batch method based on the weight correction is presented to preserve the tally mean from the original MC run. The effectiveness of the new method is examined for the weakly coupled fissile array problem as a function of the dominance ratio and the batch size, in comparison with other schemes available

  16. Following an Optimal Batch Bioreactor Operations Model

    DEFF Research Database (Denmark)

    Ibarra-Junquera, V.; Jørgensen, Sten Bay; Virgen-Ortíz, J.J.

    2012-01-01

    The problem of following an optimal batch operation model for a bioreactor in the presence of uncertainties is studied. The optimal batch bioreactor operation model (OBBOM) refers to the bioreactor trajectory for nominal cultivation to be optimal. A multiple-variable dynamic optimization of fed...... as the master system which includes the optimal cultivation trajectory for the feed flow rate and the substrate concentration. The “real” bioreactor, the one with unknown dynamics and perturbations, is considered as the slave system. Finally, the controller is designed such that the real bioreactor...

  17. Exploring the Transition From Batch to Online

    DEFF Research Database (Denmark)

    Jørgensen, Anker Helms

    2010-01-01

    of the truly interactive use of computers known today. The transition invoked changes in a number of areas: technological, such as hybrid forms between batch and online; organisational such as decentralization; and personal as users and developers alike had to adopt new technology, shape new organizational...... structures, and acquire new skills. This work-in-progress paper extends an earlier study of the transition from batch to online, based on oral history interviews with (ex)-employees in two large Danish Service Bureaus. The paper takes the next step by analyzing a particular genre: the commercial computer...

  18. Using Data-Driven and Process Mining Techniques for Identifying and Characterizing Problem Gamblers in New Zealand

    Directory of Open Access Journals (Sweden)

    Suriadi Suriadi

    2016-12-01

    This article uses data-driven techniques combined with established theory in order to analyse gambling behavioural patterns of 91 thousand individuals on a real-world fixed-odds gambling dataset in New Zealand. This research uniquely integrates a mixture of process mining, data mining and confirmatory statistical techniques in order to categorise different sub-groups of gamblers, with the explicit motivation of identifying problem gambling behaviours and reporting on the challenges and lessons learned from our case study. We demonstrate how techniques from various disciplines can be combined in order to gain insight into the behavioural patterns exhibited by different types of gamblers, as well as provide assurances of the correctness of our approach and findings. A highlight of this case study is both the methodology which demonstrates how such a combination of techniques provides a rich set of effective tools to undertake an exploratory and open-ended data analysis project that is guided by the process cube concept, as well as the findings themselves which indicate that the contribution that problem gamblers make to the total volume, expenditure, and revenue is higher than previous studies have maintained.

  19. Data-driven simultaneous fault diagnosis for solid oxide fuel cell system using multi-label pattern identification

    Science.gov (United States)

    Li, Shuanghong; Cao, Hongliang; Yang, Yupu

    2018-02-01

    Fault diagnosis is a key process for the reliability and safety of solid oxide fuel cell (SOFC) systems. However, it is difficult to rapidly and accurately identify faults for complicated SOFC systems, especially when simultaneous faults appear. In this research, a data-driven Multi-Label (ML) pattern identification approach is proposed to address the simultaneous fault diagnosis of SOFC systems. The framework of the simultaneous-fault diagnosis primarily includes two components: feature extraction and an ML-SVM classifier. The simultaneous-fault diagnosis approach can be trained to diagnose simultaneous SOFC faults, such as fuel leakage and air leakage in different positions in the SOFC system, by using simple training data sets consisting only of single faults, without requiring simultaneous-fault data. The experimental result shows that the proposed framework can diagnose the simultaneous SOFC system faults with high accuracy while requiring only a small amount of training data and a low computational burden. In addition, Fault Inference Tree Analysis (FITA) is employed to identify the correlations among possible faults and their corresponding symptoms at the system component level.

  20. Towards efficient data exchange and sharing for big-data driven materials science: metadata and data formats

    Science.gov (United States)

    Ghiringhelli, Luca M.; Carbogno, Christian; Levchenko, Sergey; Mohamed, Fawzi; Huhs, Georg; Lüders, Martin; Oliveira, Micael; Scheffler, Matthias

    2017-11-01

    With big-data driven materials research, the new paradigm of materials science, sharing and wide accessibility of data are becoming crucial aspects. Obviously, a prerequisite for data exchange and big-data analytics is standardization, which means using consistent and unique conventions for, e.g., units, zero base lines, and file formats. There are two main strategies to achieve this goal. One accepts the heterogeneous nature of the community, which comprises scientists from physics, chemistry, bio-physics, and materials science, by complying with the diverse ecosystem of computer codes and thus develops "converters" for the input and output files of all important codes. These converters then translate the data of each code into a standardized, code-independent format. The other strategy is to provide standardized open libraries that code developers can adopt for shaping their inputs, outputs, and restart files, directly into the same code-independent format. In this perspective paper, we present both strategies and argue that they can and should be regarded as complementary, if not even synergetic. The represented appropriate format and conventions were agreed upon by two teams, the Electronic Structure Library (ESL) of the European Center for Atomic and Molecular Computations (CECAM) and the NOvel MAterials Discovery (NOMAD) Laboratory, a European Centre of Excellence (CoE). A key element of this work is the definition of hierarchical metadata describing state-of-the-art electronic-structure calculations.

  1. Extended dynamic mode decomposition with dictionary learning: A data-driven adaptive spectral decomposition of the Koopman operator.

    Science.gov (United States)

    Li, Qianxiao; Dietrich, Felix; Bollt, Erik M; Kevrekidis, Ioannis G

    2017-10-01

    Numerical approximation methods for the Koopman operator have advanced considerably in the last few years. In particular, data-driven approaches such as dynamic mode decomposition (DMD) and its generalization, the extended-DMD (EDMD), are becoming increasingly popular in practical applications. The EDMD improves upon the classical DMD by the inclusion of a flexible choice of dictionary of observables which spans a finite dimensional subspace on which the Koopman operator can be approximated. This enhances the accuracy of the solution reconstruction and broadens the applicability of the Koopman formalism. Although the convergence of the EDMD has been established, applying the method in practice requires a careful choice of the observables to improve convergence with just a finite number of terms. This is especially difficult for high dimensional and highly nonlinear systems. In this paper, we employ ideas from machine learning to improve upon the EDMD method. We develop an iterative approximation algorithm which couples the EDMD with a trainable dictionary represented by an artificial neural network. Using the Duffing oscillator and the Kuramoto-Sivashinsky partial differential equation as examples, we show that our algorithm can effectively and efficiently adapt the trainable dictionary to the problem at hand to achieve good reconstruction accuracy without the need to choose a fixed dictionary a priori. Furthermore, to obtain a given accuracy, we require fewer dictionary terms than EDMD with fixed dictionaries. This alleviates an important shortcoming of the EDMD algorithm and enhances the applicability of the Koopman framework to practical problems.
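
    As a rough illustration of the fixed-dictionary EDMD that the abstract takes as its starting point, the sketch below fits a Koopman matrix by least squares. It is not the authors' dictionary-learning algorithm: the monomial dictionary, the toy map x -> 0.5 x, and the names `dictionary` and `edmd` are all assumptions made for this example.

```python
import numpy as np

def dictionary(x):
    """Hypothetical fixed dictionary: the monomials 1, x, x^2 of a scalar state."""
    x = np.asarray(x, dtype=float)
    return np.stack([np.ones_like(x), x, x**2], axis=1)

def edmd(x_now, x_next):
    """Least-squares matrix K with dictionary(x_next) ~= dictionary(x_now) @ K."""
    psi_now = dictionary(x_now)
    psi_next = dictionary(x_next)
    K, *_ = np.linalg.lstsq(psi_now, psi_next, rcond=None)
    return K

# Toy snapshot pairs from the linear map x -> 0.5 x (an assumed test system).
rng = np.random.default_rng(0)
x_now = rng.uniform(-1.0, 1.0, 200)
x_next = 0.5 * x_now

K = edmd(x_now, x_next)
# For this map the monomials are exact Koopman eigenfunctions,
# so the eigenvalues of K are the powers of 0.5.
eigvals = np.sort(np.linalg.eigvals(K).real)
```

    For a nonlinear system the monomials would no longer span an invariant subspace, which is the dictionary-choice difficulty the abstract describes.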

  2. HOMOLOGOUS HELICAL JETS: OBSERVATIONS BY IRIS, SDO, AND HINODE AND MAGNETIC MODELING WITH DATA-DRIVEN SIMULATIONS

    Energy Technology Data Exchange (ETDEWEB)

    Cheung, Mark C. M.; Pontieu, B. De; Tarbell, T. D.; Fu, Y.; Martínez-Sykora, J.; Boerner, P.; Wülser, J. P.; Lemen, J.; Title, A. M.; Hurlburt, N. [Lockheed Martin Solar and Astrophysics Laboratory, 3251 Hanover Street Bldg. 252, Palo Alto, CA 94304 (United States); Tian, H.; Testa, P.; Reeves, K. K.; Golub, L.; McKillop, S.; Saar, S. [Harvard-Smithsonian Center for Astrophysics, 60 Garden Street, Cambridge, MA 02138 (United States); Kleint, L. [University of Applied Sciences and Arts Northwestern Switzerland, Bahnhofstr. 6, 5210 Windisch (Switzerland); Kankelborg, C.; Jaeggli, S. [Department of Physics, Montana State University, Bozeman, P.O. Box 173840, Bozeman, MT 59717 (United States); Carlsson, M., E-mail: cheung@lmsal.com [Institute of Theoretical Astrophysics, University of Oslo, P.O. Box 1029, Blindern, NO-0315 Oslo (Norway); and others

    2015-03-10

    We report on observations of recurrent jets by instruments on board the Interface Region Imaging Spectrograph, Solar Dynamics Observatory (SDO), and Hinode spacecraft. Over a 4 hr period on 2013 July 21, recurrent coronal jets were observed to emanate from NOAA Active Region 11793. Far-ultraviolet spectra probing plasma at transition region temperatures show evidence of oppositely directed flows with components reaching Doppler velocities of ±100 km s⁻¹. Raster Doppler maps using a Si IV transition region line show all four jets to have helical motion of the same sense. Simultaneous observations of the region by SDO and Hinode show that the jets emanate from a source region comprising a pore embedded in the interior of a supergranule. The parasitic pore has opposite polarity flux compared to the surrounding network field. This leads to a spine-fan magnetic topology in the coronal field that is amenable to jet formation. Time-dependent data-driven simulations are used to investigate the underlying drivers for the jets. These numerical experiments show that the emergence of current-carrying magnetic field in the vicinity of the pore supplies the magnetic twist needed for recurrent helical jet formation.

  3. Flood probability quantification for road infrastructure: Data-driven spatial-statistical approach and case study applications.

    Science.gov (United States)

    Kalantari, Zahra; Cavalli, Marco; Cantone, Carolina; Crema, Stefano; Destouni, Georgia

    2017-03-01

    Climate-driven increase in the frequency of extreme hydrological events is expected to impose greater strain on the built environment and major transport infrastructure, such as roads and railways. This study develops a data-driven spatial-statistical approach to quantifying and mapping the probability of flooding at critical road-stream intersection locations, where water flow and sediment transport may accumulate and cause serious road damage. The approach is based on novel integration of key watershed and road characteristics, including also measures of sediment connectivity. The approach is concretely applied to and quantified for two specific study case examples in southwest Sweden, with documented road flooding effects of recorded extreme rainfall. The novel contributions of this study in combining a sediment connectivity account with that of soil type, land use, spatial precipitation-runoff variability and road drainage in catchments, and in extending the connectivity measure use for different types of catchments, improve the accuracy of model results for road flood probability. Copyright © 2016 Elsevier B.V. All rights reserved.

  4. Data-driven methods towards learning the highly nonlinear inverse kinematics of tendon-driven surgical manipulators.

    Science.gov (United States)

    Xu, Wenjun; Chen, Jie; Lau, Henry Y K; Ren, Hongliang

    2017-09-01

    Accurate motion control of flexible surgical manipulators is crucial in tissue manipulation tasks. The tendon-driven serpentine manipulator (TSM) is one of the most widely adopted flexible mechanisms in minimally invasive surgery because of its enhanced maneuverability in tortuous environments. TSM, however, exhibits high nonlinearities, and a conventional analytical kinematics model is insufficient to achieve high accuracy. To account for the system nonlinearities, we applied a data-driven approach to encode the system inverse kinematics. Three regression methods: extreme learning machine (ELM), Gaussian mixture regression (GMR) and K-nearest neighbors regression (KNNR) were implemented to learn a nonlinear mapping from the robot 3D position states to the control inputs. The performance of the three algorithms was evaluated both in simulation and physical trajectory tracking experiments. KNNR performed the best in the tracking experiments, with the lowest RMSE of 2.1275 mm. The proposed inverse kinematics learning methods provide an alternative and efficient way to accurately model the tendon-driven flexible manipulator. Copyright © 2016 John Wiley & Sons, Ltd.

  5. PubChemQC Project: A Large-Scale First-Principles Electronic Structure Database for Data-Driven Chemistry.

    Science.gov (United States)

    Nakata, Maho; Shimazaki, Tomomi

    2017-06-26

    Large-scale molecular databases play an essential role in the investigation of various subjects such as the development of organic materials, in silico drug design, and data-driven studies with machine learning. We have developed a large-scale quantum chemistry database based on first-principles methods. Our database currently contains the ground-state electronic structures of 3 million molecules based on density functional theory (DFT) at the B3LYP/6-31G* level, and we successively calculated 10 low-lying excited states of over 2 million molecules via time-dependent DFT with the B3LYP functional and the 6-31+G* basis set. To select the molecules calculated in our project, we referred to the PubChem Project, which was used as the source of the molecular structures in short strings using the InChI and SMILES representations. Accordingly, we have named our quantum chemistry database project "PubChemQC" ( http://pubchemqc.riken.jp/ ) and placed it in the public domain. In this paper, we show the fundamental features of the PubChemQC database and discuss the techniques used to construct the data set for large-scale quantum chemistry calculations. We also present a machine learning approach to predict the electronic structure of molecules as an example to demonstrate the suitability of the large-scale quantum chemistry database.

  6. Data-Driven Machine-Learning Model in District Heating System for Heat Load Prediction: A Comparison Study

    Directory of Open Access Journals (Sweden)

    Fisnik Dalipi

    2016-01-01

    We present our data-driven supervised machine-learning (ML) model to predict heat load for buildings in a district heating system (DHS). Even though ML has been used as an approach to heat load prediction in literature, it is hard to select an approach that will qualify as a solution for our case as existing solutions are quite problem specific. For that reason, we compared and evaluated three ML algorithms within a framework on operational data from a DH system in order to generate the required prediction model. The algorithms examined are Support Vector Regression (SVR), Partial Least Square (PLS), and random forest (RF). We use the data collected from buildings at several locations for a period of 29 weeks. Concerning the accuracy of predicting the heat load, we evaluate the performance of the proposed algorithms using mean absolute error (MAE), mean absolute percentage error (MAPE), and correlation coefficient. In order to determine which algorithm had the best accuracy, we conducted performance comparison among these ML algorithms. The comparison of the algorithms indicates that, for DH heat load prediction, the SVR method presented in this paper is the most efficient of the three, also when compared to other methods found in the literature.
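
    The three accuracy measures named in the abstract can be written down in a few lines. The sketch below uses their standard textbook definitions; the heat-load numbers are invented for illustration and are not taken from the study.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target."""
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true, y_pred):
    """Mean absolute percentage error; assumes y_true has no zeros."""
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))

def correlation(y_true, y_pred):
    """Pearson correlation coefficient between observations and predictions."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

# Made-up heat loads (observed vs. predicted), purely illustrative.
observed = np.array([100.0, 200.0, 400.0])
predicted = np.array([110.0, 190.0, 400.0])

scores = {
    "MAE": mae(observed, predicted),
    "MAPE": mape(observed, predicted),
    "r": correlation(observed, predicted),
}
```

    MAE is scale-dependent while MAPE is relative, so reporting both, as the abstract does, guards against a model that looks good only on high-load or only on low-load hours.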

  7. A Data-Driven Modeling Strategy for Smart Grid Power Quality Coupling Assessment Based on Time Series Pattern Matching

    Directory of Open Access Journals (Sweden)

    Hao Yu

    2018-01-01

    This study introduces a data-driven modeling strategy for smart grid power quality (PQ) coupling assessment based on time series pattern matching to quantify the influence of single and integrated disturbance among nodes in different pollution patterns. Periodic and random PQ patterns are constructed by using multidimensional frequency-domain decomposition for all disturbances. A multidimensional piecewise linear representation based on local extreme points is proposed to extract the pattern features of single and integrated disturbance in consideration of disturbance variation trend and severity. A feature distance of pattern (FDP) is developed to implement pattern matching on univariate PQ time series (UPQTS) and multivariate PQ time series (MPQTS) to quantify the influence of single and integrated disturbance among nodes in the pollution patterns. Case studies on a 14-bus distribution system are performed and analyzed; the accuracy and applicability of the FDP in the smart grid PQ coupling assessment are verified by comparing with other time series pattern matching methods.

  8. Data-Driven Analysis of Virtual 3D Exploration of a Large Sculpture Collection in Real-World Museum Exhibitions

    KAUST Repository

    Agus, Marco

    2018-01-29

    We analyze use of an interactive system for the exploration of highly detailed three-dimensional (3D) models of a collection of protohistoric Mediterranean sculptures. In this system, when the object of interest is selected, its detailed 3D model and associated information are presented at high resolution on a large display controlled by a touch-enabled horizontal surface at a suitable distance. The user interface combines an object-aware interactive camera controller with an interactive point-of-interest selector and is implemented within a scalable implementation based on multiresolution structures shared between the rendering and user interaction subsystems. The system was installed in several temporary and permanent exhibitions and was extensively used by tens of thousands of visitors. We provide a data-driven analysis of usage experience based on logs gathered during a 27-month period at four exhibitions in archeological museums for a total of more than 75K exploration sessions. We focus on discerning the main visitor behaviors during 3D exploration by employing tools for deriving interest measures on surfaces and tools for clustering and knowledge discovery from high-dimensional data. The results highlight the main trends in visitor behavior during the interactive sessions. These results provide useful insights for the design of 3D exploration user interfaces in future digital installations. © 2017 ACM.

  9. Coupling physically based and data-driven models for assessing freshwater inflow into the Small Aral Sea

    Science.gov (United States)

    Ayzel, Georgy; Izhitskiy, Alexander

    2018-06-01

    The Aral Sea desiccation and related changes in hydroclimatic conditions on a regional level have been a hot topic for the past decades. The key problem of scientific research projects devoted to an investigation of the modern Aral Sea basin hydrological regime is its discontinuous nature: only a limited number of papers take into account the complex runoff formation system entirely. Addressing this challenge, we have developed a continuous prediction system for assessing freshwater inflow into the Small Aral Sea based on a coupled stack of hydrological and data-driven models. Results show a good prediction skill and confirm the possibility of developing a valuable water assessment tool which utilizes the power of both classical physically based and modern machine learning models for territories with a complex water management system and strong water-related data scarcity. The source code and data of the proposed system are available on a GitHub page (https://github.com/SMASHIproject/IWRM2018).

  10. A Data-Driven Noise Reduction Method and Its Application for the Enhancement of Stress Wave Signals

    Directory of Open Access Journals (Sweden)

    Hai-Lin Feng

    2012-01-01

    Ensemble empirical mode decomposition (EEMD) has been recently used to recover a signal from observed noisy data. Typically this is performed by partial reconstruction or thresholding operation. In this paper we describe an efficient noise reduction method. EEMD is used to decompose a signal into several intrinsic mode functions (IMFs). The time intervals between two adjacent zero-crossings within the IMF, called instantaneous half period (IHP), are used as a criterion to detect and classify the noise oscillations. The undesirable waveforms with a larger IHP are set to zero. Furthermore, the optimum threshold in this approach can be derived from the signal itself using the consecutive mean square error (CMSE). The method is fully data driven, and it requires no prior knowledge of the target signals. The method is verified by simulation in Matlab, and the denoising results are satisfactory. In comparison with other EEMD-based methods, it is concluded that the approach adopted in this paper is suitable for preprocessing the stress wave signals in wood nondestructive testing.
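
    The IHP criterion described above can be sketched on a single IMF as follows. This is only an illustration under stated assumptions: the IMF is taken as already computed by some EEMD implementation (not shown), the threshold is passed in by hand rather than derived from the CMSE, and the function names are hypothetical. The direction of the comparison (zeroing half-waves with a larger IHP) follows the wording of the abstract.

```python
import numpy as np

def half_wave_segments(imf):
    """Return (start, stop) index pairs between adjacent zero-crossings."""
    sign = np.sign(imf)
    # A crossing lies between samples i and i+1 when the sign flips.
    crossings = np.where(sign[:-1] * sign[1:] < 0)[0] + 1
    return list(zip(crossings[:-1], crossings[1:]))

def suppress_by_ihp(imf, dt, threshold):
    """Zero every half-wave whose instantaneous half period exceeds threshold."""
    out = np.array(imf, dtype=float)
    for start, stop in half_wave_segments(out):
        ihp = (stop - start) * dt  # duration of this half-wave
        if ihp > threshold:
            out[start:stop] = 0.0
    return out

# Toy "IMF": short half-waves with one long half-wave in the middle.
toy_imf = np.array([1, -1, 1, -1, -1, -1, -1, 1, -1])
cleaned = suppress_by_ihp(toy_imf, dt=1.0, threshold=2.0)
```

    Samples before the first and after the last zero-crossing are left untouched here, because their half period cannot be measured from the record.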

  11. A New Application of Dynamic Data Driven System in the Talbot-Ogden Model for Groundwater Infiltration

    KAUST Repository

    Yu, Han; Douglas, Craig C.; Ogden, Fred L.

    2012-01-01

    The Talbot-Ogden model is a mass conservative method to simulate flow of a wetting liquid in variably-saturated porous media. The principal feature of this model is the discretization of the moisture content domain into bins. This paper gives an analysis of the relationship between the number of bins and the computed flux. Under the circumstances of discrete bins and discontinuous wetting fronts, we show that fluxes increase with the number of bins. We then apply this analysis to the continuous case and get an upper bound of the difference of infiltration rates when the number of bins tends to infinity. We also extend this model by creating a two dimensional moisture content domain so that there exists a probability distribution of the moisture content for different soil systems. With these theoretical and experimental results and using a Dynamic Data Driven Application System (DDDAS), sensors can be put in soils to detect the infiltration fluxes, which are important to compute the proper number of bins for a specific soil system and predict fluxes. Using this feedback control loop, the extended Talbot-Ogden model can be made more efficient for estimating infiltration into soils.

  12. Data-Driven Analysis of Virtual 3D Exploration of a Large Sculpture Collection in Real-World Museum Exhibitions

    KAUST Repository

    Agus, Marco; Marton, Fabio; Bettio, Fabio; Hadwiger, Markus; Gobbetti, Enrico

    2018-01-01

    We analyze use of an interactive system for the exploration of highly detailed three-dimensional (3D) models of a collection of protohistoric Mediterranean sculptures. In this system, when the object of interest is selected, its detailed 3D model and associated information are presented at high resolution on a large display controlled by a touch-enabled horizontal surface at a suitable distance. The user interface combines an object-aware interactive camera controller with an interactive point-of-interest selector and is implemented within a scalable implementation based on multiresolution structures shared between the rendering and user interaction subsystems. The system was installed in several temporary and permanent exhibitions and was extensively used by tens of thousands of visitors. We provide a data-driven analysis of usage experience based on logs gathered during a 27-month period at four exhibitions in archeological museums for a total of more than 75K exploration sessions. We focus on discerning the main visitor behaviors during 3D exploration by employing tools for deriving interest measures on surfaces and tools for clustering and knowledge discovery from high-dimensional data. The results highlight the main trends in visitor behavior during the interactive sessions. These results provide useful insights for the design of 3D exploration user interfaces in future digital installations. © 2017 ACM.

  13. A perspective on bridging scales and design of models using low-dimensional manifolds and data-driven model inference

    KAUST Repository

    Tegner, Jesper; Zenil, Hector; Kiani, Narsis A.; Ball, Gordon; Gomez-Cabrero, David

    2016-01-01

    Systems in nature capable of collective behaviour are nonlinear, operating across several scales. Yet our ability to account for their collective dynamics differs in physics, chemistry and biology. Here, we briefly review the similarities and differences between mathematical modelling of adaptive living systems versus physico-chemical systems. We find that physics-based chemistry modelling and computational neuroscience have a shared interest in developing techniques for model reductions aiming at the identification of a reduced subsystem or slow manifold, capturing the effective dynamics. By contrast, as relations and kinetics between biological molecules are less characterized, current quantitative analysis under the umbrella of bioinformatics focuses on signal extraction, correlation, regression and machine-learning analysis. We argue that model reduction analysis and the ensuing identification of manifolds bridges physics and biology. Furthermore, modelling living systems presents deep challenges as how to reconcile rich molecular data with inherent modelling uncertainties (formalism, variables selection and model parameters). We anticipate a new generative data-driven modelling paradigm constrained by identified governing principles extracted from low-dimensional manifold analysis. The rise of a new generation of models will ultimately connect biology to quantitative mechanistic descriptions, thereby setting the stage for investigating the character of the model language and principles driving living systems.

  14. Examining the Relationship Between Past Orientation and US Suicide Rates: An Analysis Using Big Data-Driven Google Search Queries.

    Science.gov (United States)

    Lee, Donghyun; Lee, Hojun; Choi, Munkee

    2016-02-11

    Internet search query data reflect the attitudes of the users, using which we can measure the past orientation to commit suicide. Examinations of past orientation often highlight certain predispositions of attitude, many of which can be suicide risk factors. To investigate the relationship between past orientation and suicide rate by examining Google search queries. We measured the past orientation using Google search query data by comparing the search volumes of the past year and those of the future year, across the 50 US states and the District of Columbia during the period from 2004 to 2012. We constructed a panel dataset with independent variables as control variables; we then undertook an analysis using multiple ordinary least squares regression and methods that leverage the Akaike information criterion and the Bayesian information criterion. It was found that past orientation had a positive relationship with the suicide rate (P ≤ .001) and that it improves the goodness-of-fit of the model regarding the suicide rate. Unemployment rate (P ≤ .001 in Models 3 and 4), Gini coefficient (P ≤ .001), and population growth rate (P ≤ .001) had a positive relationship with the suicide rate, whereas the gross state product (P ≤ .001) showed a negative relationship with the suicide rate. We empirically identified the positive relationship between the suicide rate and past orientation, which was measured by big data-driven Google search query.

  15. Combining density functional theory calculations, supercomputing, and data-driven methods to design new materials (Conference Presentation)

    Science.gov (United States)

    Jain, Anubhav

    2017-04-01

    Density functional theory (DFT) simulations solve for the electronic structure of materials starting from the Schrödinger equation. Many case studies have now demonstrated that researchers can often use DFT to design new compounds in the computer (e.g., for batteries, catalysts, and hydrogen storage) before synthesis and characterization in the lab. In this talk, I will focus on how DFT calculations can be executed on large supercomputing resources in order to generate very large data sets on new materials for functional applications. First, I will briefly describe the Materials Project, an effort at LBNL that has virtually characterized over 60,000 materials using DFT and has shared the results with over 17,000 registered users. Next, I will talk about how such data can help discover new materials, describing how preliminary computational screening led to the identification and confirmation of a new family of bulk AMX2 thermoelectric compounds with measured zT reaching 0.8. I will outline future plans for how such data-driven methods can be used to better understand the factors that control thermoelectric behavior, e.g., for the rational design of electronic band structures, in ways that are different from conventional approaches.

  16. A data-driven soft sensor for needle deflection in heterogeneous tissue using just-in-time modelling.

    Science.gov (United States)

    Rossa, Carlos; Lehmann, Thomas; Sloboda, Ronald; Usmani, Nawaid; Tavakoli, Mahdi

    2017-08-01

    Global modelling has traditionally been the approach taken to estimate needle deflection in soft tissue. In this paper, we propose a new method based on local data-driven modelling of needle deflection. External measurement of needle-tissue interactions is collected from several insertions in ex vivo tissue to form a cloud of data. Inputs to the system are the needle insertion depth, axial rotations, and the forces and torques measured at the needle base by a force sensor. When a new insertion is performed, the just-in-time learning method estimates the model outputs given the current inputs to the needle-tissue system and the historical database. The query is compared to every observation in the database and is given weights according to some similarity criteria. Only a subset of historical data that is most relevant to the query is selected and a local linear model is fit to the selected points to estimate the query output. The model outputs the 3D deflection of the needle tip and the needle insertion force. The proposed approach is validated in ex vivo multilayered biological tissue in different needle insertion scenarios. Experimental results in five different case studies indicate an accuracy in predicting needle deflection of 0.81 and 1.24 mm in the horizontal and vertical planes, respectively, and an accuracy of 0.5 N in predicting the needle insertion force over 216 needle insertions.
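
    The just-in-time steps described above (compare the query with the database, weight by similarity, keep the most relevant subset, fit a local linear model) might look roughly like the sketch below. The abstract does not specify the similarity criteria, so the Euclidean distance, the Gaussian weights, and the parameters `k` and `bandwidth` are assumptions, as is the function name `jit_predict`; the data are synthetic.

```python
import numpy as np

def jit_predict(X_hist, Y_hist, query, k=10, bandwidth=1.0):
    """Local linear estimate of the output at `query` from historical data."""
    X_hist = np.asarray(X_hist, dtype=float)
    Y_hist = np.asarray(Y_hist, dtype=float).reshape(len(X_hist), -1)
    query = np.asarray(query, dtype=float)

    # 1. Compare the query with every stored observation.
    dist = np.linalg.norm(X_hist - query, axis=1)
    # 2. Keep only the k most similar observations.
    nearest = np.argsort(dist)[:k]
    # 3. Weight them by an assumed Gaussian similarity kernel.
    weights = np.exp(-(dist[nearest] / bandwidth) ** 2)
    # 4. Fit a weighted local linear model with an intercept column.
    A = np.hstack([X_hist[nearest], np.ones((len(nearest), 1))])
    sw = np.sqrt(weights)[:, None]
    coef, *_ = np.linalg.lstsq(A * sw, Y_hist[nearest] * sw, rcond=None)
    # 5. Evaluate the local model at the query point.
    return np.append(query, 1.0) @ coef

# Synthetic database: two inputs, one output that is exactly linear.
rng = np.random.default_rng(1)
X_db = rng.uniform(-1.0, 1.0, size=(50, 2))
Y_db = 2.0 * X_db[:, 0] - X_db[:, 1] + 3.0

estimate = jit_predict(X_db, Y_db, query=[0.2, -0.1])
```

    Because a new model is fitted for each query, nothing is trained in advance; the cost is a database search per prediction.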

  17. Idiopathic Pulmonary Fibrosis: Data-driven Textural Analysis of Extent of Fibrosis at Baseline and 15-Month Follow-up.

    Science.gov (United States)

    Humphries, Stephen M; Yagihashi, Kunihiro; Huckleberry, Jason; Rho, Byung-Hak; Schroeder, Joyce D; Strand, Matthew; Schwarz, Marvin I; Flaherty, Kevin R; Kazerooni, Ella A; van Beek, Edwin J R; Lynch, David A

    2017-10-01

    Purpose: To evaluate associations between pulmonary function and both quantitative analysis and visual assessment of thin-section computed tomography (CT) images at baseline and at 15-month follow-up in subjects with idiopathic pulmonary fibrosis (IPF). Materials and Methods: This retrospective analysis of preexisting anonymized data, collected prospectively between 2007 and 2013 in a HIPAA-compliant study, was exempt from additional institutional review board approval. The extent of lung fibrosis at baseline inspiratory chest CT in 280 subjects enrolled in the IPF Network was evaluated. Visual analysis was performed by using a semiquantitative scoring system. Computer-based quantitative analysis included CT histogram-based measurements and a data-driven textural analysis (DTA). Follow-up CT images in 72 of these subjects were also analyzed. Univariate comparisons were performed by using Spearman rank correlation. Multivariate and longitudinal analyses were performed by using a linear mixed model approach, in which models were compared by using asymptotic χ² tests. Results: At baseline, all CT-derived measures showed moderate significant correlation (P pulmonary function. At follow-up CT, changes in DTA scores showed significant correlation with changes in both forced vital capacity percentage predicted (ρ = -0.41, P pulmonary function (P fibrosis at CT yields an index of severity that correlates with visual assessment and functional change in subjects with IPF. © RSNA, 2017.

  18. On the data-driven inference of modulatory networks in climate science: an application to West African rainfall

    Science.gov (United States)

    González, D. L., II; Angus, M. P.; Tetteh, I. K.; Bello, G. A.; Padmanabhan, K.; Pendse, S. V.; Srinivas, S.; Yu, J.; Semazzi, F.; Kumar, V.; Samatova, N. F.

    2015-01-01

    Decades of hypothesis-driven and/or first-principles research have been applied towards the discovery and explanation of the mechanisms that drive climate phenomena, such as western African Sahel summer rainfall variability. Although connections between various climate factors have been theorized, not all of the key relationships are fully understood. We propose a data-driven approach to identify candidate players in this climate system, which can help explain underlying mechanisms and/or even suggest new relationships, to facilitate building a more comprehensive and predictive model of the modulatory relationships influencing a climate phenomenon of interest. We applied coupled heterogeneous association rule mining (CHARM), Lasso multivariate regression, and dynamic Bayesian networks to find relationships within a complex system, and explored means with which to obtain a consensus result from the application of such varied methodologies. Using this fusion of approaches, we identified relationships among climate factors that modulate Sahel rainfall. These relationships fall into two categories: well-known associations from prior climate knowledge, such as the relationship with the El Niño-Southern Oscillation (ENSO), and putative links, such as North Atlantic Oscillation, that invite further research.

  19. Performance of a data-driven technique to changes in wave height and its effect on beach response

    Directory of Open Access Journals (Sweden)

    Jose M. Horrillo-Caraballo

    2016-01-01

    In this study the medium-term response of beach profiles was investigated at two sites: a gently sloping sandy beach and a steeper mixed sand and gravel beach. The former is the Duck site in North Carolina, on the east coast of the USA, which is exposed to Atlantic Ocean swells and storm waves, and the latter is the Milford-on-Sea site at Christchurch Bay, on the south coast of England, which is partially sheltered from Atlantic swells but has a directionally bimodal wave exposure. The data sets comprise detailed bathymetric surveys of beach profiles covering a period of more than 25 years for the Duck site and over 18 years for the Milford-on-Sea site. The structure of the data sets and the data-driven methods are described. Canonical correlation analysis (CCA) was used to find linkages between the wave characteristics and beach profiles. The sensitivity of the linkages was investigated by deploying a wave height threshold to filter out the smaller waves incrementally. The results of the analysis indicate that, for the gently sloping sandy beach, waves of all heights are important to the morphological response. For the mixed sand and gravel beach, filtering the smaller waves improves the statistical fit, which suggests that low-height waves do not play a primary role in the medium-term morphological response, which is primarily driven by the intermittent larger storm waves.

  20. Performance of a data-driven technique applied to changes in wave height and its effect on beach response

    Directory of Open Access Journals (Sweden)

    José M. Horrillo-Caraballo

    2016-01-01

    In this study the medium-term response of beach profiles was investigated at two sites: a gently sloping sandy beach and a steeper mixed sand and gravel beach. The former is the Duck site in North Carolina, on the east coast of the USA, which is exposed to Atlantic Ocean swells and storm waves, and the latter is the Milford-on-Sea site at Christchurch Bay, on the south coast of England, which is partially sheltered from Atlantic swells but has a directionally bimodal wave exposure. The data sets comprise detailed bathymetric surveys of beach profiles covering a period of more than 25 years for the Duck site and over 18 years for the Milford-on-Sea site. The structure of the data sets and the data-driven methods are described. Canonical correlation analysis (CCA) was used to find linkages between the wave characteristics and beach profiles. The sensitivity of the linkages was investigated by deploying a wave height threshold to filter out the smaller waves incrementally. The results of the analysis indicate that, for the gently sloping sandy beach, waves of all heights are important to the morphological response. For the mixed sand and gravel beach, filtering the smaller waves improves the statistical fit, which suggests that low-height waves do not play a primary role in the medium-term morphological response, which is primarily driven by the intermittent larger storm waves.

  1. Geoscience Meets Social Science: A Flexible Data Driven Approach for Developing High Resolution Population Datasets at Global Scale

    Science.gov (United States)

    Rose, A.; McKee, J.; Weber, E.; Bhaduri, B. L.

    2017-12-01

    Leveraging decades of expertise in population modeling, and in response to growing demand for higher resolution population data, Oak Ridge National Laboratory is now generating LandScan HD at global scale. LandScan HD is conceived as a 90m resolution population distribution where modeling is tailored to the unique geography and data conditions of individual countries or regions by combining social, cultural, physiographic, and other information with novel geocomputation methods. Similarities among these areas are exploited in order to leverage existing training data and machine learning algorithms to rapidly scale development. Drawing on ORNL's unique set of capabilities, LandScan HD adapts highly mature population modeling methods developed for LandScan Global and LandScan USA, settlement mapping research and production in high-performance computing (HPC) environments, land use and neighborhood mapping through image segmentation, and facility-specific population density models. Adopting a flexible methodology to accommodate different geographic areas, LandScan HD accounts for the availability, completeness, and level of detail of relevant ancillary data. Beyond core population and mapped settlement inputs, these factors determine the model complexity for an area, requiring that for any given area, a data-driven model could support either a simple top-down approach, a more detailed bottom-up approach, or a hybrid approach.

  2. Data-driven adaptive fractional order PI control for PMSM servo system with measurement noise and data dropouts.

    Science.gov (United States)

    Xie, Yuanlong; Tang, Xiaoqi; Song, Bao; Zhou, Xiangdong; Guo, Yixuan

    2018-04-01

    In this paper, data-driven adaptive fractional order proportional integral (AFOPI) control is presented for a permanent magnet synchronous motor (PMSM) servo system perturbed by measurement noise and data dropouts. The proposed method directly exploits the closed-loop process data for the AFOPI controller design under unknown noise distribution and data missing probability. Firstly, the proposed method constructs the AFOPI controller tuning problem as a parameter identification problem using the modified l_p norm virtual reference feedback tuning (VRFT). Then, iteratively reweighted least squares is integrated into the l_p norm VRFT to give a consistent compensation solution for the AFOPI controller. The measurement noise and data dropouts are estimated and eliminated by feedback compensation periodically, so that the AFOPI controller is updated online to accommodate the time-varying operating conditions. Moreover, the convergence and stability are guaranteed by mathematical analysis. Finally, the effectiveness of the proposed method is demonstrated both on simulations and experiments implemented on a practical PMSM servo system. Copyright © 2018 ISA. Published by Elsevier Ltd. All rights reserved.
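
    To illustrate how iteratively reweighted least squares (IRLS) handles an l_p norm criterion, the sketch below fits a single gain theta by minimising the sum of |y_i - theta * x_i|^p. It is a generic textbook-style IRLS loop under stated assumptions, not the modified VRFT criterion of the paper: the scalar model, the residual floor `eps`, and the function name `irls_lp_slope` are all hypothetical.

```python
def irls_lp_slope(xs, ys, p=1.0, n_iter=100, eps=1e-8):
    """Estimate theta minimising sum(|y - theta * x| ** p) by IRLS."""
    # Start from the ordinary least-squares solution (all weights equal to 1).
    theta = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    for _ in range(n_iter):
        # Reweight: w_i = |r_i| ** (p - 2), with residuals floored at eps
        # so that exact fits do not produce a division by zero.
        weights = [
            max(abs(y - theta * x), eps) ** (p - 2.0) for x, y in zip(xs, ys)
        ]
        # Solve the weighted least-squares problem in closed form.
        num = sum(w * x * y for w, x, y in zip(weights, xs, ys))
        den = sum(w * x * x for w, x in zip(weights, xs))
        theta = num / den
    return theta


# Four clean samples with gain 2 and one corrupted sample.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 50.0]
robust_gain = irls_lp_slope(xs, ys, p=1.0)   # approaches 2.0
plain_gain = irls_lp_slope(xs, ys, p=2.0)    # ordinary least squares
```

    With p = 2 every weight equals 1 and the loop reduces to ordinary least squares; with p closer to 1 large residuals are down-weighted, which is why such criteria are attractive when the noise distribution is unknown.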

  3. An optimal baseline selection methodology for data-driven damage detection and temperature compensation in acousto-ultrasonics

    International Nuclear Information System (INIS)

    Torres-Arredondo, M-A; Sierra-Pérez, Julián; Cabanes, Guénaël

    2016-01-01

    The process of measuring and analysing the data from a distributed sensor network all over a structural system in order to quantify its condition is known as structural health monitoring (SHM). For the design of a trustworthy health monitoring system, a vast amount of information regarding the inherent physical characteristics of the sources and their propagation and interaction across the structure is crucial. Moreover, any SHM system which is expected to transition to field operation must take into account the influence of environmental and operational changes which cause modifications in the stiffness and damping of the structure and consequently modify its dynamic behaviour. On that account, special attention is paid in this paper to the development of an efficient SHM methodology where robust signal processing and pattern recognition techniques are integrated for the correct interpretation of complex ultrasonic waves within the context of damage detection and identification. The methodology is based on an acousto-ultrasonics technique where the discrete wavelet transform is evaluated for feature extraction and selection, linear principal component analysis for data-driven modelling and self-organising maps for a two-level clustering under the principle of local density. At the end, the methodology is experimentally demonstrated and results show that all the damages were detectable and identifiable. (paper)

  4. An optimal baseline selection methodology for data-driven damage detection and temperature compensation in acousto-ultrasonics

    Science.gov (United States)

    Torres-Arredondo, M.-A.; Sierra-Pérez, Julián; Cabanes, Guénaël

    2016-05-01

    The process of measuring and analysing the data from a distributed sensor network all over a structural system in order to quantify its condition is known as structural health monitoring (SHM). For the design of a trustworthy health monitoring system, a vast amount of information regarding the inherent physical characteristics of the sources and their propagation and interaction across the structure is crucial. Moreover, any SHM system which is expected to transition to field operation must take into account the influence of environmental and operational changes which cause modifications in the stiffness and damping of the structure and consequently modify its dynamic behaviour. On that account, special attention is paid in this paper to the development of an efficient SHM methodology where robust signal processing and pattern recognition techniques are integrated for the correct interpretation of complex ultrasonic waves within the context of damage detection and identification. The methodology is based on an acousto-ultrasonics technique where the discrete wavelet transform is evaluated for feature extraction and selection, linear principal component analysis for data-driven modelling and self-organising maps for a two-level clustering under the principle of local density. At the end, the methodology is experimentally demonstrated and results show that all the damages were detectable and identifiable.

  5. A perspective on bridging scales and design of models using low-dimensional manifolds and data-driven model inference

    KAUST Repository

    Tegner, Jesper

    2016-10-04

    Systems in nature capable of collective behaviour are nonlinear, operating across several scales. Yet our ability to account for their collective dynamics differs in physics, chemistry and biology. Here, we briefly review the similarities and differences between mathematical modelling of adaptive living systems versus physico-chemical systems. We find that physics-based chemistry modelling and computational neuroscience have a shared interest in developing techniques for model reductions aiming at the identification of a reduced subsystem or slow manifold, capturing the effective dynamics. By contrast, as relations and kinetics between biological molecules are less characterized, current quantitative analysis under the umbrella of bioinformatics focuses on signal extraction, correlation, regression and machine-learning analysis. We argue that model reduction analysis and the ensuing identification of manifolds bridge physics and biology. Furthermore, modelling living systems presents deep challenges, such as how to reconcile rich molecular data with inherent modelling uncertainties (formalism, variables selection and model parameters). We anticipate a new generative data-driven modelling paradigm constrained by identified governing principles extracted from low-dimensional manifold analysis. The rise of a new generation of models will ultimately connect biology to quantitative mechanistic descriptions, thereby setting the stage for investigating the character of the model language and principles driving living systems.

  6. Improved multi-stage neonatal seizure detection using a heuristic classifier and a data-driven post-processor.

    Science.gov (United States)

    Ansari, A H; Cherian, P J; Dereymaeker, A; Matic, V; Jansen, K; De Wispelaere, L; Dielman, C; Vervisch, J; Swarte, R M; Govaert, P; Naulaers, G; De Vos, M; Van Huffel, S

    2016-09-01

    After identifying the most seizure-relevant characteristics by a previously developed heuristic classifier, a data-driven post-processor using a novel set of features is applied to improve the performance. The main characteristics of the outputs of the heuristic algorithm are extracted by five sets of features including synchronization, evolution, retention, segment, and signal features. Then, a support vector machine and a decision making layer remove the falsely detected segments. Four datasets including 71 neonates (1023 h, 3493 seizures), recorded in two different university hospitals, are used to train and test the algorithm without removing the dubious seizures. The heuristic method resulted in a false alarm rate of 3.81 per hour and a good detection rate of 88% on the entire test databases. The post-processor effectively reduces the false alarm rate by 34% while the good detection rate decreases by 2%. This post-processing technique improves the performance of the heuristic algorithm. The structure of this post-processor is generic, improves our understanding of the core visually determined EEG features of neonatal seizures and is applicable for other neonatal seizure detectors. The post-processor significantly decreases the false alarm rate at the expense of a small reduction of the good detection rate. Copyright © 2016 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.

  7. A New Application of Dynamic Data Driven System in the Talbot-Ogden Model for Groundwater Infiltration

    KAUST Repository

    Yu, Han

    2012-06-02

    The Talbot-Ogden model is a mass conservative method to simulate flow of a wetting liquid in variably-saturated porous media. The principal feature of this model is the discretization of the moisture content domain into bins. This paper gives an analysis of the relationship between the number of bins and the computed flux. Under the circumstances of discrete bins and discontinuous wetting fronts, we show that fluxes increase with the number of bins. We then apply this analysis to the continuous case and get an upper bound of the difference of infiltration rates when the number of bins tends to infinity. We also extend this model by creating a two dimensional moisture content domain so that there exists a probability distribution of the moisture content for different soil systems. With these theoretical and experimental results and using a Dynamic Data Driven Application System (DDDAS), sensors can be put in soils to detect the infiltration fluxes, which are important to compute the proper number of bins for a specific soil system and predict fluxes. Using this feedback control loop, the extended Talbot-Ogden model can be made more efficient for estimating infiltration into soils.

  8. Production of nattokinase by batch and fed-batch culture of Bacillus subtilis.

    Science.gov (United States)

    Cho, Young-Han; Song, Jae Yong; Kim, Kyung Mi; Kim, Mi Kyoung; Lee, In Young; Kim, Sang Bum; Kim, Hyeon Shup; Han, Nam Soo; Lee, Bong Hee; Kim, Beom Soo

    2010-09-30

    Nattokinase was produced by batch and fed-batch culture of Bacillus subtilis in flask and fermentor. The effect of supplementing complex media (peptone, yeast extract, or tryptone) on the production of nattokinase was investigated. In flask culture, the highest cell growth and nattokinase activity were obtained with 50 g/L of peptone supplementation. In this condition, nattokinase activity was 630 unit/ml at 12 h. In batch culture of B. subtilis in fermentor, the highest nattokinase activity of 3400 unit/ml was obtained at 10 h with 50 g/L of peptone supplementation. From the batch kinetics data, it was shown that nattokinase production was growth-associated and culture should be harvested before stationary phase for maximum nattokinase production. In fed-batch culture of B. subtilis using pH-stat feeding strategy, cell growth (optical density monitored at 600 nm) increased to ca. 100 at 22 h, which was 2.5 times higher than that in batch culture. The highest nattokinase activity was 7100 unit/ml at 19 h, which was also 2.1 times higher than that in batch culture. Copyright 2010 Elsevier B.V. All rights reserved.

  9. Systematic Methodology for Reproducible Optimizing Batch Operation

    DEFF Research Database (Denmark)

    Bonné, Dennis; Jørgensen, Sten Bay

    2006-01-01

    This contribution presents a systematic methodology for rapid acquirement of discrete-time state space model representations of batch processes based on their historical operation data. These state space models are parsimoniously parameterized as a set of local, interdependent models. The present...

  10. Batch extractive distillation for high purity methanol

    International Nuclear Information System (INIS)

    Zhang Weijiang; Ma Sisi

    2006-01-01

    This paper first introduces the applications of high purity methanol in the chemical and microelectronic industries, its market status, and the present state of its production at home and abroad. Purification of industrial methanol to high purity methanol is feasible in China. Batch extractive distillation is the best separation technique for purifying industrial methanol. Dimethyl sulfoxide performed better as an extractant. (authors)

  11. Monitoring of batch processes using spectroscopy

    NARCIS (Netherlands)

    Gurden, S. P.; Westerhuis, J. A.; Smilde, A. K.

    2002-01-01

    There is an increasing need for new techniques for the understanding, monitoring and the control of batch processes. Spectroscopy is now becoming established as a means of obtaining real-time, high-quality chemical information at frequent time intervals and across a wide range of industrial

  12. Data-Driven Learning and Awareness-Raising: An Effective Tandem to Improve Grammar in Written Composition?

    Directory of Open Access Journals (Sweden)

    María Luisa Pérez Cañado

    2006-09-01

    The present paper, framed within the ECTS scheme currently being piloted at the University of Jaén, reports on a study carried out in the second semester of the academic year 2004-5 with English Philology freshmen at this University. One of its aims, described in an initial section of the paper, was to determine whether the use of Computer Assisted Language Learning (CALL) and Data-Driven Learning (DDL) could help raise awareness of, and thus remediate, the grammar weaknesses of such pupils under four categories (articles, verb tenses, verbal complementation, and prepositions). The procedure, outlined subsequently, involved using DDL to raise awareness of the main grammar mistakes in these headings, which had been previously identified in first year students’ production through the use of an UCLEE-error-tagged written learner corpus. Two one-hour seminars were employed weekly, each one with a group of 40 students, to raise awareness of these mistakes with the help of web-based resources. Four steps were undertaken: initial attempts on the part of the students to identify the mistakes in the seven headings; a session provided by the authors on CALL as a means to raise awareness of, identify, and solve written mistakes; use of these electronic resources to contrast their initial error identification; and explicit correction of the mistakes in each category. The results and implications, discussed in a final section, highlight that DDL and awareness-raising – albeit in some categories more than in others – indeed constitute an effective tandem when it comes to improving grammatical aspects in written composition at University level.

  13. Retrospective cost adaptive Reynolds-averaged Navier-Stokes k-ω model for data-driven unsteady turbulent simulations

    Science.gov (United States)

    Li, Zhiyong; Hoagg, Jesse B.; Martin, Alexandre; Bailey, Sean C. C.

    2018-03-01

    This paper presents a data-driven computational model for simulating unsteady turbulent flows, where sparse measurement data is available. The model uses the retrospective cost adaptation (RCA) algorithm to automatically adjust the closure coefficients of the Reynolds-averaged Navier-Stokes (RANS) k-ω turbulence equations to improve agreement between the simulated flow and the measurements. The RCA-RANS k-ω model is verified for steady flow using a pipe-flow test case and for unsteady flow using a surface-mounted-cube test case. Measurements used for adaptation of the verification cases are obtained from baseline simulations with known closure coefficients. These verification test cases demonstrate that the RCA-RANS k-ω model can successfully adapt the closure coefficients to improve agreement between the simulated flow field and a set of sparse flow-field measurements. Furthermore, the RCA-RANS k-ω model improves agreement between the simulated flow and the baseline flow at locations at which measurements do not exist. The RCA-RANS k-ω model is also validated with experimental data from 2 test cases: steady pipe flow, and unsteady flow past a square cylinder. In both test cases, the adaptation improves agreement with experimental data in comparison to the results from a non-adaptive RANS k-ω model that uses the standard values of the k-ω closure coefficients. For the steady pipe flow, adaptation is driven by mean stream-wise velocity measurements at 24 locations along the pipe radius. The RCA-RANS k-ω model reduces the average velocity error at these locations by over 35%. For the unsteady flow over a square cylinder, adaptation is driven by time-varying surface pressure measurements at 2 locations on the square cylinder. The RCA-RANS k-ω model reduces the average surface-pressure error at these locations by 88.8%.

  14. Proactive monitoring of a wind turbine array with lidar measurements, SCADA data and a data-driven RANS solver

    Science.gov (United States)

    Iungo, G.; Said, E. A.; Santhanagopalan, V.; Zhan, L.

    2016-12-01

    Power production of a wind farm and durability of wind turbines are strongly dependent on non-linear wake interactions occurring within a turbine array. Wake dynamics are highly affected by the specific site conditions, such as topography and local atmospheric conditions. Furthermore, contingencies through the life of a wind farm, such as turbine ageing and off-design operations, make prediction of wake interactions and power performance a great challenge in wind energy. In this work, operations of an onshore wind turbine array were monitored through lidar measurements, SCADA and met-tower data. The atmospheric wind field impinging on the wind farm was estimated by using synergistically the available data through five different methods, which are characterized by different confidence levels. By combining SCADA data and the lidar measurements, it was possible to estimate power losses connected with wake interactions. For this specific array, power losses were estimated to be 4% and 2% of the total power production for stable and convective atmospheric regimes, respectively. The entire dataset was then leveraged for the calibration of a data-driven RANS (DDRANS) solver for prediction of wind turbine wakes and power production. The DDRANS is based on a parabolic formulation of the Navier-Stokes equations with axisymmetry and boundary layer approximations, which allow achieving very low computational costs. Accuracy in prediction of wind turbine wakes and power production is achieved through an optimal tuning of the turbulence closure model. The latter is based on a mixing length model, which was developed based on previous wind turbine wake studies carried out through large eddy simulations and wind tunnel experiments. Several operative conditions of the wind farm under examination were reproduced through DDRANS for different stability regimes, wind directions and wind velocity. The results show that DDRANS is capable of achieving a good level of accuracy in prediction

  15. Data-Driven Approaches for Computation in Intelligent Biomedical Devices: A Case Study of EEG Monitoring for Chronic Seizure Detection

    Directory of Open Access Journals (Sweden)

    Naveen Verma

    2011-04-01

    Intelligent biomedical devices are systems that are able to detect specific physiological processes in patients so that particular responses can be generated. This closed-loop capability can have enormous clinical value when we consider the unprecedented modalities that are beginning to emerge for sensing and stimulating patient physiology. Both delivering therapy (e.g., deep-brain stimulation, vagus nerve stimulation, etc.) and treating impairments (e.g., neural prosthesis) require computational devices that can make clinically relevant inferences, especially using minimally-intrusive patient signals. The key to such devices is algorithms that are based on data-driven signal modeling as well as hardware structures that are specialized to these. This paper discusses the primary application-domain challenges that must be overcome and analyzes the most promising emerging methods for addressing them. We then look at how these methods are being incorporated in ultra-low-energy computational platforms and systems. The case study for this is a seizure-detection SoC that includes instrumentation and computation blocks in support of a system that exploits patient-specific modeling to achieve accurate performance for chronic detection. The SoC samples each EEG channel at a rate of 600 Hz and performs processing to derive signal features on every two-second epoch, consuming 9 μJ/epoch/channel. Signal feature extraction reduces the data rate by a factor of over 40×, permitting wireless communication from the patient’s head while reducing the total power on the head by 14×.
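
    The per-channel figures quoted in this abstract can be put on a common footing with a short calculation. The three input values come from the abstract; the derived quantities are simple arithmetic, and the variable names are illustrative only.

```python
# Figures quoted in the abstract (per EEG channel).
SAMPLE_RATE_HZ = 600         # sampling rate
EPOCH_S = 2                  # feature-extraction epoch length in seconds
ENERGY_PER_EPOCH_UJ = 9      # feature-extraction energy per epoch in microjoules

# Derived quantities.
samples_per_epoch = SAMPLE_RATE_HZ * EPOCH_S                           # samples
avg_power_uw = ENERGY_PER_EPOCH_UJ / EPOCH_S                           # microwatts
energy_per_sample_nj = ENERGY_PER_EPOCH_UJ * 1000 / samples_per_epoch  # nanojoules

print(samples_per_epoch, avg_power_uw, energy_per_sample_nj)
```

    Each epoch therefore covers 1200 samples per channel, and 9 μJ spent every two seconds corresponds to an average feature-extraction power of 4.5 μW per channel.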

  16. A data-driven multi-model methodology with deep feature selection for short-term wind forecasting

    International Nuclear Information System (INIS)

    Feng, Cong; Cui, Mingjian; Hodge, Bri-Mathias; Zhang, Jie

    2017-01-01

    Highlights: • An ensemble model is developed to produce both deterministic and probabilistic wind forecasts. • A deep feature selection framework is developed to optimally determine the inputs to the forecasting methodology. • The developed ensemble methodology has improved the forecasting accuracy by up to 30%. - Abstract: With the growing wind penetration into the power system worldwide, improving wind power forecasting accuracy is becoming increasingly important to ensure continued economic and reliable power system operations. In this paper, a data-driven multi-model wind forecasting methodology is developed with a two-layer ensemble machine learning technique. The first layer is composed of multiple machine learning models that generate individual forecasts. A deep feature selection framework is developed to determine the most suitable inputs to the first layer machine learning models. Then, a blending algorithm is applied in the second layer to create an ensemble of the forecasts produced by first layer models and generate both deterministic and probabilistic forecasts. This two-layer model seeks to utilize the statistically different characteristics of each machine learning algorithm. A number of machine learning algorithms are selected and compared in both layers. This developed multi-model wind forecasting methodology is compared to several benchmarks. The effectiveness of the proposed methodology is evaluated to provide 1-hour-ahead wind speed forecasting at seven locations of the Surface Radiation network. Numerical results show that, compared with the single-algorithm models, the developed multi-model framework with deep feature selection procedure has improved the forecasting accuracy by up to 30%.
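
    The two-layer structure described here (several first-layer models produce individual forecasts, a second layer blends them) can be sketched in a few lines. This is only a toy under stated assumptions: the three first-layer forecasters are simple stand-ins for the machine learning models of the paper, the second layer uses inverse-error weighting rather than a learned blending algorithm, only deterministic forecasts are produced, and every function name is hypothetical.

```python
def persistence(history):
    """Forecast the next value as the last observed value."""
    return history[-1]


def moving_average(history, window=3):
    """Forecast the next value as the mean of the last `window` values."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def linear_trend(history):
    """Forecast the next value by extending the last observed step."""
    return history[-1] + (history[-1] - history[-2])


FIRST_LAYER = [persistence, moving_average, linear_trend]


def first_layer_forecasts(series, t):
    """Forecast series[t] from series[:t] with each first-layer model."""
    return [model(series[:t]) for model in FIRST_LAYER]


def fit_blend_weights(series, start, end):
    """Second layer: weights proportional to 1/SSE on steps start..end-1."""
    sq_err = [0.0] * len(FIRST_LAYER)
    for t in range(start, end):
        for i, forecast in enumerate(first_layer_forecasts(series, t)):
            sq_err[i] += (forecast - series[t]) ** 2
    inverse = [1.0 / (e + 1e-12) for e in sq_err]
    total = sum(inverse)
    return [v / total for v in inverse]


def blended_forecast(series, t, weights):
    """Combine the first-layer forecasts for step t with the blend weights."""
    forecasts = first_layer_forecasts(series, t)
    return sum(w * f for w, f in zip(weights, forecasts))
```

    The blend weights are fitted on a validation window that precedes the step being forecast, so the second layer only ever sees first-layer errors on data that was already observed.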

  17. Abnormal Resting-State Functional Connectivity in Patients with Chronic Fatigue Syndrome: Results of Seed and Data-Driven Analyses.

    Science.gov (United States)

    Gay, Charles W; Robinson, Michael E; Lai, Song; O'Shea, Andrew; Craggs, Jason G; Price, Donald D; Staud, Roland

    2016-02-01

    Although altered resting-state functional connectivity (FC) is a characteristic of many chronic pain conditions, it has not yet been evaluated in patients with chronic fatigue. Our objective was to investigate the association between fatigue and altered resting-state FC in myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS). Thirty-six female subjects, 19 ME/CFS and 17 healthy controls, completed a fatigue inventory before undergoing functional magnetic resonance imaging. Two methods, (1) data driven and (2) model based, were used to estimate and compare the intraregional FC between both groups during the resting state (RS). The first approach using independent component analysis was applied to investigate five RS networks: the default mode network, salience network (SN), left frontoparietal networks (LFPN) and right frontoparietal networks, and the sensory motor network (SMN). The second approach used a priori selected seed regions demonstrating abnormal regional cerebral blood flow (rCBF) in ME/CFS patients at rest. In ME/CFS patients, Method-1 identified decreased intrinsic connectivity among regions within the LFPN. Furthermore, the FC of the left anterior midcingulate with the SMN and the connectivity of the left posterior cingulate cortex with the SN were significantly decreased. For Method-2, five distinct clusters within the right parahippocampus and occipital lobes, demonstrating significant rCBF reductions in ME/CFS patients, were used as seeds. The parahippocampal seed and three occipital lobe seeds showed altered FC with other brain regions. The degree of abnormal connectivity correlated with the level of self-reported fatigue. Our results confirm altered RS FC in patients with ME/CFS, which was significantly correlated with the severity of their chronic fatigue.

  18. RWater - A Novel Cyber-enabled Data-driven Educational Tool for Interpreting and Modeling Hydrologic Processes

    Science.gov (United States)

    Rajib, M. A.; Merwade, V.; Zhao, L.; Song, C.

    2014-12-01

    Explaining the complex cause-and-effect relationships in the hydrologic cycle can often be challenging in a classroom using traditional teaching approaches. With the availability of observed rainfall, streamflow and other hydrology data on the internet, it is possible to provide students with the necessary tools to explore these relationships and enhance their learning experience. From this perspective, a new online educational tool, called RWater, has been developed using Purdue University's HUBzero technology. RWater's unique features include: (i) its accessibility, including the R software, from any Java-supported web browser; (ii) no installation of any software on the user's computer; (iii) all the work and resulting data are stored in the user's working directory on the RWater server; and (iv) no prior programming experience with R software is necessary. In its current version, RWater can dynamically extract streamflow data from any USGS gaging station without any need for post-processing for use in the educational modules. By following data-driven modules, students can write small scripts in R and thereby create visualizations to identify the effect of rainfall distribution and watershed characteristics on runoff generation, investigate the impacts of landuse and climate change on streamflow, and explore the changes in extreme hydrologic events in actual locations. Each module contains relevant definitions, instructions on data extraction and coding, as well as conceptual questions based on the possible analyses which the students would perform. In order to assess its suitability for classroom implementation, and to evaluate users' perception of its utility, the current version of RWater has been tested with three different groups: (i) high school students; (ii) middle and high school teachers; and (iii) upper undergraduate/graduate students. The survey results from these trials suggest that RWater has potential to improve students' understanding on various

  19. Data-driven modeling of sleep EEG and EOG reveals characteristics indicative of pre-Parkinson's and Parkinson's disease.

    Science.gov (United States)

    Christensen, Julie A E; Zoetmulder, Marielle; Koch, Henriette; Frandsen, Rune; Arvastson, Lars; Christensen, Søren R; Jennum, Poul; Sorensen, Helge B D

    2014-09-30

    Manual scoring of sleep relies on identifying certain characteristics in polysomnograph (PSG) signals. However, these characteristics are disrupted in patients with neurodegenerative diseases. This study evaluates sleep using a topic modeling and unsupervised learning approach to identify sleep topics directly from electroencephalography (EEG) and electrooculography (EOG). PSG data from control subjects were used to develop an EOG and an EEG topic model. The models were applied to PSG data from 23 control subjects, 25 patients with periodic leg movements (PLMs), 31 patients with idiopathic REM sleep behavior disorder (iRBD) and 36 patients with Parkinson's disease (PD). The data were divided into training and validation datasets and features reflecting EEG and EOG characteristics based on topics were computed. The most discriminative feature subset for separating iRBD/PD and PLM/controls was estimated using a Lasso-regularized regression model. The features with highest discriminability were the number and stability of EEG topics linked to REM and N3, respectively. Validation of the model indicated a sensitivity of 91.4% and a specificity of 68.8% when classifying iRBD/PD patients. The topics showed visual accordance with the manually scored sleep stages, and the features revealed sleep characteristics containing information indicative of neurodegeneration. This study suggests that the amount of N3 and the ability to maintain NREM and REM sleep have potential as early PD biomarkers. Data-driven analysis of sleep may contribute to the evaluation of neurodegenerative patients. Copyright © 2014 Elsevier B.V. All rights reserved.
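
    The sensitivity and specificity reported in this record follow the usual confusion-matrix definitions. Here is a minimal Python sketch of the calculation; the counts are hypothetical, chosen only because they round to the reported percentages, and are not taken from the study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of true patients flagged as patients
    specificity = tn / (tn + fp)   # share of non-patients correctly cleared
    return sensitivity, specificity

# Hypothetical validation counts (assumption, not from the abstract).
sens, spec = sensitivity_specificity(tp=32, fn=3, tn=11, fp=5)
print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```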

  20. A data-driven modeling approach to identify disease-specific multi-organ networks driving physiological dysregulation.

    Directory of Open Access Journals (Sweden)

    Warren D Anderson

    2017-07-01

    Multiple physiological systems interact throughout the development of a complex disease. Knowledge of the dynamics and connectivity of interactions across physiological systems could facilitate the prevention or mitigation of organ damage underlying complex diseases, many of which are currently refractory to available therapeutics (e.g., hypertension). We studied the regulatory interactions operating within and across organs throughout disease development by integrating in vivo analysis of gene expression dynamics with a reverse engineering approach to infer data-driven dynamic network models of multi-organ gene regulatory influences. We obtained experimental data on the expression of 22 genes across five organs, over a time span that encompassed the development of autonomic nervous system dysfunction and hypertension. We pursued a unique approach for identification of continuous-time models that jointly described the dynamics and structure of multi-organ networks by estimating a sparse subset of ∼12,000 possible gene regulatory interactions. Our analyses revealed that an autonomic dysfunction-specific multi-organ sequence of gene expression activation patterns was associated with a distinct gene regulatory network. We analyzed the model structures for adaptation motifs, and identified disease-specific network motifs involving genes that exhibited aberrant temporal dynamics. Bioinformatic analyses identified disease-specific single nucleotide variants within or near transcription factor binding sites upstream of key genes implicated in maintaining physiological homeostasis. Our approach illustrates a novel framework for investigating the pathogenesis through model-based analysis of multi-organ system dynamics and network properties. Our results yielded novel candidate molecular targets driving the development of cardiovascular disease, metabolic syndrome, and immune dysfunction.

  1. DATA-DRIVEN RADIATIVE HYDRODYNAMIC MODELING OF THE 2014 MARCH 29 X1.0 SOLAR FLARE

    Energy Technology Data Exchange (ETDEWEB)

    Costa, Fatima Rubio da; Petrosian, Vahé [Department of Physics, Stanford University, Stanford, CA 94305 (United States); Kleint, Lucia [University of Applied Sciences and Arts Northwestern Switzerland, 5210 Windisch (Switzerland); Liu, Wei [Bay Area Environmental Research Institute, 625 2nd Street, Suite 209, Petaluma, CA 94952-5159 (United States); Allred, Joel C., E-mail: frubio@stanford.edu [NASA/Goddard Space Flight Center, Code 671, Greenbelt, MD 20771 (United States)

    2016-08-10

    Spectroscopic observations of solar flares provide critical diagnostics of the physical conditions in the flaring atmosphere. Some key features in observed spectra have not yet been accounted for in existing flare models. Here we report a data-driven simulation of the well-observed X1.0 flare on 2014 March 29 that can reconcile some well-known spectral discrepancies. We analyzed spectra of the flaring region from the Interface Region Imaging Spectrograph (IRIS) in Mg II h and k, the Interferometric BIdimensional Spectropolarimeter at the Dunn Solar Telescope (DST/IBIS) in Hα 6563 Å and Ca II 8542 Å, and the Reuven Ramaty High Energy Solar Spectroscope Imager (RHESSI) in hard X-rays. We constructed a multithreaded flare loop model and used the electron flux inferred from RHESSI data as the input to the radiative hydrodynamic code RADYN to simulate the atmospheric response. We then synthesized various chromospheric emission lines and compared them with the IRIS and IBIS observations. In general, the synthetic intensities agree with the observed ones, especially near the northern footpoint of the flare. The simulated Mg II line profile has narrower wings than the observed one. This discrepancy can be reduced by using a higher microturbulent velocity (27 km s⁻¹) in a narrow chromospheric layer. In addition, we found that an increase of electron density in the upper chromosphere within a narrow height range of ≈800 km below the transition region can turn the simulated Mg II line core into emission and thus reproduce the single peaked profile, which is a common feature in all IRIS flares.

  2. Data-Driven Derivation of an "Informer Compound Set" for Improved Selection of Active Compounds in High-Throughput Screening.

    Science.gov (United States)

    Paricharak, Shardul; IJzerman, Adriaan P; Jenkins, Jeremy L; Bender, Andreas; Nigsch, Florian

    2016-09-26

    Despite the usefulness of high-throughput screening (HTS) in drug discovery, for some systems, low assay throughput or high screening cost can prohibit the screening of large numbers of compounds. In such cases, iterative cycles of screening involving active learning (AL) are employed, creating the need for smaller "informer sets" that can be routinely screened to build predictive models for selecting compounds from the screening collection for follow-up screens. Here, we present a data-driven derivation of an informer compound set with improved predictivity of active compounds in HTS, and we validate its benefit over randomly selected training sets on 46 PubChem assays comprising at least 300,000 compounds and covering a wide range of assay biology. The informer compound set showed improvement in BEDROC(α = 100), PRAUC, and ROCAUC values averaged over all assays of 0.024, 0.014, and 0.016, respectively, compared to randomly selected training sets, all with paired t-test p-values agnostic fashion. This approach led to a consistent improvement in hit rates in follow-up screens without compromising scaffold retrieval. The informer set is adjustable in size depending on the number of compounds one intends to screen, as performance gains are realized for sets with more than 3,000 compounds, and this set is therefore applicable to a variety of situations. Finally, our results indicate that random sampling may not adequately cover descriptor space, drawing attention to the importance of the composition of the training set for predicting actives.
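
    Of the metrics named in this record, ROCAUC has a compact interpretation: the probability that a randomly chosen active compound is scored above a randomly chosen inactive one. A minimal Python sketch of that pairwise definition follows; the scores and labels are invented for illustration, and real screening pipelines would normally use a library routine rather than this quadratic loop.

```python
def roc_auc(scores, labels):
    """ROC AUC as the fraction of (active, inactive) pairs ranked correctly.

    Ties between an active and an inactive score count as half a win.
    """
    actives = [s for s, y in zip(scores, labels) if y == 1]
    inactives = [s for s, y in zip(scores, labels) if y == 0]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive compound")
    wins = sum((a > i) + 0.5 * (a == i) for a in actives for i in inactives)
    return wins / (len(actives) * len(inactives))

# Hypothetical model scores for five compounds (1 = active, 0 = inactive).
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
labels = [1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)
print(round(auc, 3))
```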

  3. Mapping of Agricultural Crops from Single High-Resolution Multispectral Images—Data-Driven Smoothing vs. Parcel-Based Smoothing

    Directory of Open Access Journals (Sweden)

    Asli Ozdarici-Ok

    2015-05-01

    Mapping agricultural crops is an important application of remote sensing. However, in many cases it is based either on hyperspectral imagery or on multitemporal coverage, both of which are difficult to scale up to large-scale deployment at high spatial resolution. In the present paper, we evaluate the possibility of crop classification based on single images from very high-resolution (VHR) satellite sensors. The main objective of this work is to expose the performance difference between state-of-the-art parcel-based smoothing and purely data-driven conditional random field (CRF) smoothing, which is yet unknown. To fulfill this objective, we perform extensive tests with four different classification methods (Support Vector Machines, Random Forest, Gaussian Mixtures, and Maximum Likelihood) to compute the pixel-wise data term; and we also test two different definitions of the pairwise smoothness term. We have performed a detailed evaluation on different multispectral VHR images (Ikonos, QuickBird, Kompsat-2). The main finding of this study is that pairwise CRF smoothing comes close to the state-of-the-art parcel-based method that requires parcel boundaries (average difference ≈ 2.5%). Our results indicate that a single multispectral (R, G, B, NIR) image is enough to reach satisfactory classification accuracy for six crop classes (corn, pasture, rice, sugar beet, wheat, and tomato) in Mediterranean climate. Overall, it appears that crop mapping using only one-shot VHR imagery taken at the right time may be a viable alternative, especially since high-resolution multitemporal or hyperspectral coverage as well as parcel boundaries are in practice often not available.

  4. Monte Carlo simulation on kinetics of batch and semi-batch free radical polymerization

    KAUST Repository

    Shao, Jing

    2015-10-27

    Based on Monte Carlo simulation technology, we proposed a hybrid routine which combines the reaction mechanism with coarse-grained molecular simulation to study the kinetics of free radical polymerization. By comparing with previous experimental and simulation studies, we showed the capability of our Monte Carlo scheme in representing polymerization kinetics in batch and semi-batch processes. Various kinetic quantities, such as instantaneous monomer conversion, molecular weight, and polydispersity, are readily calculated from the Monte Carlo simulation. Kinetic constants such as the polymerization rate k_p are determined in the simulation without the “steady-state” hypothesis. We explored the mechanism behind the variations in polymerization kinetics observed in previous studies, as well as polymerization-induced phase separation. Our Monte Carlo simulation scheme is versatile for studying polymerization kinetics in batch and semi-batch processes.

  5. Consuming America : A Data-Driven Analysis of the United States as a Reference Culture in Dutch Public Discourse on Consumer Goods, 1890-1990

    NARCIS (Netherlands)

    Wevers, M.J.H.F.

    2017-01-01

    Consuming America offers a data-driven, longitudinal analysis of the historical dynamics that have underpinned a long-term, layered cultural-historical process: the emergence of the United States as a dominant reference culture in Dutch public discourse on consumer goods between 1890 and 1990. The

  6. Multi-objective optimization of glycopeptide antibiotic production in batch and fed batch processes

    DEFF Research Database (Denmark)

    Maiti, Soumen K.; Eliasson Lantz, Anna; Bhushan, Mani

    2011-01-01

    batch operations using process model for Amycolatopsis balhimycina, a glycopeptide antibiotic producer. This resulted in a set of several pareto optimal solutions with the two objectives ranging from (0.75gl−1, 3.97g$-1) to (0.44gl−1, 5.19g$-1) for batch and from (1.5gl−1, 5.46g$-1) to (1.1gl−1, 6.34g...

  7. Medication waste reduction in pediatric pharmacy batch processes.

    Science.gov (United States)

    Toerper, Matthew F; Veltri, Michael A; Hamrock, Eric; Mollenkopf, Nicole L; Holt, Kristen; Levin, Scott

    2014-04-01

    To inform pediatric cart-fill batch scheduling for reductions in pharmaceutical waste using a case study and simulation analysis. A pre and post intervention and simulation analysis was conducted during 3 months at a 205-bed children's center. An algorithm was developed to detect wasted medication based on time-stamped computerized provider order entry information. The algorithm was used to quantify pharmaceutical waste and associated costs for both preintervention (1 batch per day) and postintervention (3 batches per day) schedules. Further, simulation was used to systematically test 108 batch schedules outlining general characteristics that have an impact on the likelihood for waste. Switching from a 1-batch-per-day to a 3-batch-per-day schedule resulted in a 31.3% decrease in pharmaceutical waste (28.7% to 19.7%) and annual cost savings of $183,380. Simulation results demonstrate how increasing batch frequency facilitates a more just-in-time process that reduces waste. The most substantial gains are realized by shifting from a schedule of 1 batch per day to at least 2 batches per day. The simulation exhibits how waste reduction is also achievable by avoiding batch preparation during daily time periods where medication administration or medication discontinuations are frequent. Last, the simulation was used to show how reducing batch preparation time per batch provides some, albeit minimal, opportunity to decrease waste. The case study and simulation analysis demonstrate characteristics of batch scheduling that may support pediatric pharmacy managers in redesign toward minimizing pharmaceutical waste.
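
    The headline figure in this record is a relative reduction, not a difference in percentage points. A short Python sketch of the arithmetic, using the rounded waste rates quoted in the abstract (the function name is only illustrative): the rounded inputs give roughly 31%, consistent with the reported 31.3% up to rounding of the underlying rates.

```python
def relative_reduction(before, after):
    """Fractional decrease of `after` relative to `before`."""
    return (before - after) / before

waste_before = 28.7   # % of doses wasted with 1 batch per day (from the abstract)
waste_after = 19.7    # % of doses wasted with 3 batches per day (from the abstract)

reduction = relative_reduction(waste_before, waste_after)
point_drop = waste_before - waste_after   # the same change in percentage points
print(f"relative reduction: {reduction:.1%}, absolute drop: {point_drop:.1f} points")
```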

  8. Optimal operation of batch membrane processes

    CERN Document Server

    Paulen, Radoslav

    2016-01-01

    This study concentrates on a general optimization of a particular class of membrane separation processes: those involving batch diafiltration. Existing practices are explained and operational improvements based on optimal control theory are suggested. The first part of the book introduces the theory of membrane processes, optimal control and dynamic optimization. Separation problems are defined and mathematical models of batch membrane processes derived. The control theory focuses on problems of dynamic optimization from a chemical-engineering point of view. Analytical and numerical methods that can be exploited to treat problems of optimal control for membrane processes are described. The second part of the text builds on this theoretical basis to establish solutions for membrane models of increasing complexity. Each chapter starts with a derivation of optimal operation and continues with case studies exemplifying various aspects of the control problems under consideration. The authors work their way from th...

  9. An architecture for a continuous, user-driven, and data-driven application of clinical guidelines and its evaluation.

    Science.gov (United States)

    Shalom, Erez; Shahar, Yuval; Lunenfeld, Eitan

    2016-02-01

    Design, implement, and evaluate a new architecture for realistic continuous guideline (GL)-based decision support, based on a series of requirements that we have identified, such as support for continuous care, for multiple task types, and for data-driven and user-driven modes. We designed and implemented a new continuous GL-based support architecture, PICARD, which accesses a temporal reasoning engine, and provides several different types of application interfaces. We present the new architecture in detail in the current paper. To evaluate the architecture, we first performed a technical evaluation of the PICARD architecture, using 19 simulated scenarios in the preeclampsia/toxemia domain. We then performed a functional evaluation with the help of two domain experts, by generating patient records that simulate 60 decision points from six clinical guideline-based scenarios, lasting from two days to four weeks. Finally, 36 clinicians made manual decisions in half of the scenarios, and had access to the automated GL-based support in the other half. The measures used in all three experiments were correctness and completeness of the decisions relative to the GL. Mean correctness and completeness in the technical evaluation were 1±0.0 and 0.96±0.03 respectively. The functional evaluation produced only several minor comments from the two experts, mostly regarding the output's style; otherwise the system's recommendations were validated. In the clinically oriented evaluation, the 36 clinicians applied manually approximately 41% of the GL's recommended actions. Completeness increased to approximately 93% when using PICARD. Manual correctness was approximately 94.5%, and remained similar when using PICARD; but while 68% of the manual decisions included correct but redundant actions, only 3% of the actions included in decisions made when using PICARD were redundant. The PICARD architecture is technically feasible and is functionally valid, and addresses the realistic

  10. Batch calculations in CalcHEP

    International Nuclear Information System (INIS)

    Pukhov, A.

    2003-01-01

    CalcHEP is a clone of the CompHEP project which is developed by the author outside of the CompHEP group. CompHEP/CalcHEP are packages for automatic calculations of elementary particle decay and collision properties in the lowest order of perturbation theory. The main idea built into the packages is to enable passing from the Lagrangian to the final distributions effectively and with a high level of automation. Accordingly, the packages were created as menu-driven, user-friendly programs for calculations in the interactive mode. On the other hand, long-running calculations should be done in a non-interactive regime. Thus, from the beginning CompHEP has had a problem with batch calculations. In CompHEP 33.23 the batch session was realized by means of an interactive menu which allows the user to formulate the task for the batch. After that the non-interactive session was launched. This way is too restrictive, not flexible, and leads to duplication in programming. In this article I discuss another approach to forcing an interactive program to work in non-interactive mode. This approach was realized in CalcHEP 2.1, available at http://theory.sinp.msu.ru/~pukhov/calchep.html

  11. Pollution prevention applications in batch manufacturing operations

    Science.gov (United States)

    Sykes, Derek W.; O'Shaughnessy, James

    2004-02-01

    Older, "low-tech" batch manufacturing operations are often fertile grounds for gains resulting from pollution prevention techniques. This paper presents a pollution prevention technique utilized for wastewater discharge permit compliance purposes at a batch manufacturer of detergents, deodorants, and floor-care products. This manufacturer generated industrial wastewater as a result of equipment rinses required after each product batch changeover. After investing a significant amount of capital in end-of-pipe wastewater treatment technology designed to address existing discharge limits, this manufacturer chose to investigate alternate, low-cost approaches to address anticipated new permit limits. Mass balances using spreadsheets and readily available formulation and production data were conducted on over 300 products to determine how each individual product contributed to the total wastewater pollutant load. These mass balances indicated that 22 products accounted for over 55% of the wastewater pollutant load. Laboratory tests were conducted to determine whether these same products could accept their individual changeover rinse water as make-up water in formulations without sacrificing product quality. This changeover reuse technique was then implemented at the plant scale for selected products. Significant reductions in wastewater volume (25%) and wastewater pollutant loading (85+%) were realized as a direct result of this approach.
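
    The mass-balance screening in this record is essentially a Pareto ranking: sort products by their contribution to the pollutant load and keep the largest ones until a target share is covered. The following Python sketch illustrates that idea; the product names, loads, and the helper `top_contributors` are all hypothetical, and the paper itself reports using spreadsheets.

```python
def top_contributors(loads, target_share):
    """Smallest set of largest contributors whose combined share reaches target_share.

    `loads` maps product name -> pollutant load (any consistent unit).
    Returns (selected product names, share of total load they cover).
    """
    total = sum(loads.values())
    ranked = sorted(loads.items(), key=lambda item: item[1], reverse=True)
    selected, cumulative = [], 0.0
    for name, load in ranked:
        if cumulative / total >= target_share:
            break
        selected.append(name)
        cumulative += load
    return selected, cumulative / total

# Hypothetical per-product pollutant loads (e.g. kg per month).
loads = {"A": 40.0, "B": 25.0, "C": 15.0, "D": 10.0, "E": 10.0}
selected, share = top_contributors(loads, target_share=0.55)
print(selected, round(share, 2))
```

    Products picked this way are the natural first candidates for the rinse-water reuse tests, since each one removes the most load per process change.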

  12. Batch-batch stable microbial community in the traditional fermentation process of huyumei broad bean pastes.

    Science.gov (United States)

    Zhu, Linjiang; Fan, Zihao; Kuai, Hui; Li, Qi

    2017-09-01

    During natural fermentation processes, a characteristic microbial community structure (MCS) is naturally formed, and it is interesting to know about its batch-batch stability. This issue was explored in a traditional semi-solid-state fermentation process of huyumei, a Chinese broad bean paste product. The results showed that this MCS mainly contained four aerobic Bacillus species (8 log CFU per g), including B. subtilis, B. amyloliquefaciens, B. methylotrophicus, and B. tequilensis, and the facultative anaerobe B. cereus at a low concentration (4 log CFU per g), besides a very small amount of the yeast Zygosaccharomyces rouxii (2 log CFU per g). The dynamic change of the MCS in the brine fermentation process showed that the abundance of dominant species varied within a small range, and at the beginning of the process the growth of lactic acid bacteria was inhibited and Staphylococcus spp. lost their viability. Also, the MCS and its dynamic change were proved to be highly reproducible among seven batches of fermentation. Therefore, the MCS naturally and stably forms between different batches of the traditional semi-solid-state fermentation of huyumei. Revealing microbial community structure and its batch-batch stability is helpful for understanding the mechanisms of community formation and flavour production in a traditional fermentation. This issue in a traditional semi-solid-state fermentation of huyumei broad bean paste was explored for the first time. This fermentation process was revealed to be dominated by a high concentration of four aerobic species of Bacillus, a low concentration of B. cereus and a small amount of Zygosaccharomyces rouxii. Lactic acid bacteria and Staphylococcus spp. lost their viability at the beginning of fermentation. Such a community structure was proved to be highly reproducible among seven batches. © 2017 The Society for Applied Microbiology.

  13. On-line Scheduling Of Multi-Server Batch Operations

    NARCIS (Netherlands)

    van der Zee, D.J.; van Harten, A.; Schuur, P.C.

    1999-01-01

    Batching jobs in a manufacturing system is a very common policy in most industries. Main reasons for batching are avoidance of setups and/or facilitation of material handling. Good examples of batch-wise production systems are ovens found in aircraft industry and in semiconductor manufacturing.

  14. On-line scheduling of multi-server batch operations

    NARCIS (Netherlands)

    Zee, Durk Jouke van der; Harten, Aart van; Schuur, Peter

    The batching of jobs in a manufacturing system is a very common policy in many industries. The main reasons for batching are the avoidance of setups and/or facilitation of material handling. Good examples of batch-wise production systems are the ovens that are found in the aircraft industry and in

  15. 7 CFR 58.728 - Cooking the batch.

    Science.gov (United States)

    2010-01-01

    ... 7 Agriculture 3 2010-01-01 2010-01-01 false Cooking the batch. 58.728 Section 58.728 Agriculture Regulations of the Department of Agriculture (Continued) AGRICULTURAL MARKETING SERVICE (Standards... Procedures § 58.728 Cooking the batch. Each batch of cheese within the cooker, including the optional...

  16. 40 CFR 63.1408 - Aggregate batch vent stream provisions.

    Science.gov (United States)

    2010-07-01

    ... 40 Protection of Environment 11 2010-07-01 2010-07-01 true Aggregate batch vent stream provisions... § 63.1408 Aggregate batch vent stream provisions. (a) Emission standards. Owners or operators of aggregate batch vent streams at a new or existing affected source shall comply with either paragraph (a)(1...

  17. An evaluation of data-driven motion estimation in comparison to the usage of external-surrogates in cardiac SPECT imaging

    International Nuclear Information System (INIS)

    Mukherjee, Joyeeta Mitra; Johnson, Karen L; Pretorius, P Hendrik; King, Michael A; Hutton, Brian F

    2013-01-01

    Motion estimation methods in single photon emission computed tomography (SPECT) can be classified into methods which depend on just the emission data (data-driven), or those that use some other source of information such as an external surrogate. The surrogate-based methods estimate the motion exhibited externally which may not correlate exactly with the movement of organs inside the body. The accuracy of data-driven strategies on the other hand is affected by the type and timing of motion occurrence during acquisition, the source distribution, and various degrading factors such as attenuation, scatter, and system spatial resolution. The goal of this paper is to investigate the performance of two data-driven motion estimation schemes based on the rigid-body registration of projections of motion-transformed source distributions to the acquired projection data for cardiac SPECT studies. Comparison is also made of six intensity based registration metrics to an external surrogate-based method. In the data-driven schemes, a partially reconstructed heart is used as the initial source distribution. The partially-reconstructed heart has inaccuracies due to limited angle artifacts resulting from using only a part of the SPECT projections acquired while the patient maintained the same pose. The performance of different cost functions in quantifying consistency with the SPECT projection data in the data-driven schemes was compared for clinically realistic patient motion occurring as discrete pose changes, one or two times during acquisition. The six intensity-based metrics studied were mean-squared difference, mutual information, normalized mutual information (NMI), pattern intensity (PI), normalized cross-correlation and entropy of the difference. Quantitative and qualitative analysis of the performance is reported using Monte-Carlo simulations of a realistic heart phantom including degradation factors such as attenuation, scatter and system spatial resolution. 
Further the

  18. Response variation in a batch of TLDS

    International Nuclear Information System (INIS)

    Burrage, J.; Campbell, A.

    2004-01-01

    Full text: At Royal Perth Hospital, LiF thermoluminescent dosimeter rods (TLDs) are handled in batches of 50. Rods in each batch are always annealed together to ensure the same thermal history and an individual batch is used with the same type and energy of radiation. A subset of a batch is used for calibration purposes by exposing them to a range of known doses and their output is used to calculate the dose received by other rods used for a dose measurement. Variation in TLD response is addressed by calculating 95% certainty levels from the calibration rods and applying this to the dose measurement rods. This approach relies on the sensitivity of rods within each batch being similar. This work investigates the validity of this assumption and considers possible benefits of applying individual rod sensitivities. The variation in response of TLD rods was assessed using 25 TLD-100 rods (Harshaw/Bicron) which were uniformly exposed to 1 Gy using 6 MeV photons in a linear accelerator on 5 separate occasions. Rods were read with a Harshaw 5500 reader. During the read process the Harshaw reader periodically checks for noise and PMT gain drift and the data were corrected for these parameters. Replicate exposure data were analysed using 1-way Analysis of Variance (ANOVA) to determine whether the between rod variations were significantly different to the variations within a single rod. A batch of 50 rods was also exposed on three occasions using the above technique. Individual TLD rod sensitivity values were determined using the rod responses from 2 exposures and these values were applied to correct charges on a rod-by-rod basis for the third exposure. ANOVA results on the 5 exposures of 25 rods showed the variance between rods was significantly greater than the within rod variance (p < 0.001). The precision of an individual rod was estimated to have a standard deviation of 2.8%. 
This suggests that the 95% confidence limits for repeated measurements using the same dose and

  19. BATCH-GE: Batch analysis of Next-Generation Sequencing data for genome editing assessment

    Science.gov (United States)

    Boel, Annekatrien; Steyaert, Woutert; De Rocker, Nina; Menten, Björn; Callewaert, Bert; De Paepe, Anne; Coucke, Paul; Willaert, Andy

    2016-01-01

    Targeted mutagenesis by the CRISPR/Cas9 system is currently revolutionizing genetics. The ease of this technique has enabled genome engineering in-vitro and in a range of model organisms and has pushed experimental dimensions to unprecedented proportions. Due to its tremendous progress in terms of speed, read length, throughput and cost, Next-Generation Sequencing (NGS) has been increasingly used for the analysis of CRISPR/Cas9 genome editing experiments. However, the current tools for genome editing assessment lack flexibility and fall short in the analysis of large amounts of NGS data. Therefore, we designed BATCH-GE, an easy-to-use bioinformatics tool for batch analysis of NGS-generated genome editing data, available from https://github.com/WouterSteyaert/BATCH-GE.git. BATCH-GE detects and reports indel mutations and other precise genome editing events and calculates the corresponding mutagenesis efficiencies for a large number of samples in parallel. Furthermore, this new tool provides flexibility by allowing the user to adapt a number of input variables. The performance of BATCH-GE was evaluated in two genome editing experiments, aiming to generate knock-out and knock-in zebrafish mutants. This tool will not only contribute to the evaluation of CRISPR/Cas9-based experiments, but will be of use in any genome editing experiment and has the ability to analyze data from every organism with a sequenced genome. PMID:27461955

  20. Data-driven analysis of collections of big datasets by the Bi-CoPaM method yields field-specific novel insights

    DEFF Research Database (Denmark)

    Abu-Jamous, Basel; Liu, Chao; Roberts, David, J.

    2017-01-01

    Massive amounts of data have recently been, and are increasingly being, generated from various fields, such as bioinformatics, neuroscience and social networks. Many of these big datasets were generated to answer specific research questions, and were analysed accordingly. However, the scope... not commonly considered. To bridge this gap between the fast pace of data generation and the slower pace of data analysis, and to exploit the massive amounts of existing data, we suggest employing data-driven explorations to analyse collections of related big datasets. This approach aims at extracting field... clusters of consistently correlated objects. We demonstrate the power of data-driven explorations by applying the Bi-CoPaM to two collections of big datasets from two distinct fields, namely bioinformatics and neuroscience. In the first application, the collective analysis of forty yeast gene expression...

  1. Production of ethanol in batch and fed-batch fermentation of soluble sugar

    International Nuclear Information System (INIS)

    Chaudhary, M.Y.; Shah, M.A.; Shah, F.H.

    1991-01-01

    In view of the demand for alternative energy sources, especially liquid fuels, and the availability of raw materials in Pakistan, we carried out biochemical and technological studies on ethanol production through fermentation of renewable substrates. Molasses and sugar cane were used as substrates for yeast fermentation. Selected yeasts were used in both batch and semi-continuous fermentation of molasses. Clarified dilute molasses was fermented with different strains of Saccharomyces cerevisiae. Ethanol concentration after 64 hours of batch fermentation reached 9.4%, with a 90% yield based on sugar content. In the fed-batch system, similar results were obtained after a fermentation cycle of 48 hours, resulting in higher productivity. Similarly, carbohydrates in fruit juices and hydrolysates of biomass can be economically fermented to ethanol for use as a feedstock for other chemicals. (author)

  2. Passing in Command Line Arguments and Parallel Cluster/Multicore Batching in R with batch.

    Science.gov (United States)

    Hoffmann, Thomas J

    2011-03-01

    It is often useful to rerun a command line R script with some slight change in the parameters used to run it - a new set of parameters for a simulation, a different dataset to process, etc. The R package batch provides a means to pass multiple command line options, including vectors of values in the usual R format, easily into R. The same script can be set up to run things in parallel via different command line arguments. The R package batch also provides a means to simplify this parallel batching by allowing one to use R and an R-like syntax for arguments to spread a script across a cluster or local multicore/multiprocessor computer, with automated syntax for several popular cluster types. Finally, it provides a means to aggregate the results of multiple processes run on a cluster.
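
The general idea in this record (override a script's default parameters from the command line, then spread a parameter sweep over several workers) can be sketched in Python. This is only an analogue under stated assumptions: it does not reproduce the interface of the R package batch, and the `name=value` token format and the function names `parse_overrides` and `split_round_robin` are hypothetical.

```python
def parse_overrides(tokens, defaults):
    """Return a copy of `defaults` updated from hypothetical 'name=value' tokens."""
    params = dict(defaults)
    for token in tokens:
        name, sep, raw = token.partition("=")
        if not sep or name not in params:
            raise ValueError(f"unexpected argument: {token!r}")
        # Reuse the type of the default so "seed=7" becomes an int
        # and "rate=0.25" becomes a float.
        params[name] = type(params[name])(raw)
    return params


def split_round_robin(items, n_workers):
    """Deal the items out to n_workers lists, one item at a time."""
    return [items[i::n_workers] for i in range(n_workers)]


defaults = {"seed": 1, "n": 100, "rate": 0.5}
# In a real script the tokens would come from sys.argv[1:].
params = parse_overrides(["seed=7", "rate=0.25"], defaults)
# Ten simulation seeds spread over three workers.
chunks = split_round_robin(list(range(1, 11)), 3)
print(params)
print(chunks)
```

Each worker would then run the same script with its own chunk of seeds, and a final step would collect the per-worker result files.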

  3. Reformulated Neural Network (ReNN): a New Alternative for Data-driven Modelling in Hydrology and Water Resources Engineering

    Science.gov (United States)

    Razavi, S.; Tolson, B.; Burn, D.; Seglenieks, F.

    2012-04-01

    Reformulated Neural Network (ReNN) has been recently developed as an efficient and more effective alternative to feedforward multi-layer perceptron (MLP) neural networks [Razavi, S., and Tolson, B. A. (2011). "A new formulation for feedforward neural networks." IEEE Transactions on Neural Networks, 22(10), 1588-1598, DOI: 10.1109/TNN.2011.2163169]. This presentation initially aims to introduce the ReNN to the water resources community and then demonstrates ReNN applications to water resources related problems. ReNN is essentially equivalent to a single-hidden-layer MLP neural network but defined on a new set of network variables which is more effective than the traditional set of network weights and biases. The main features of the new network variables are that they are geometrically interpretable and each variable has a distinct role in forming the network response. ReNN is more efficiently trained as it has a less complex error response surface. In addition to the ReNN training efficiency, the interpretability of the ReNN variables enables users to monitor and understand the internal behaviour of the network while training. Regularization in the ReNN response can also be directly measured and controlled. This feature improves the generalization ability of the network. The appeal of the ReNN is demonstrated with two ReNN applications to water resources engineering problems. In the first application, the ReNN is used to model the rainfall-runoff relationships in multiple watersheds in the Great Lakes basin located in northeastern North America. Modelling inflows to the Great Lakes is of great importance to the management of the Great Lakes system. Due to the lack of some detailed physical data about existing control structures in many subwatersheds of this huge basin, data-driven modelling approaches such as the ReNN are required to replace predictions from a physically-based rainfall-runoff model. 
Unlike traditional MLPs, the ReNN does not necessarily

  4. Developing a Metadata Infrastructure to facilitate data driven science gateway and to provide Inspire/GEMINI compliance for CLIPC

    Science.gov (United States)

    Mihajlovski, Andrej; Plieger, Maarten; Som de Cerff, Wim; Page, Christian

    2016-04-01

    ... indicators. Key is the availability of standardized metadata describing indicator data and services. This will enable standardization and interoperability between the different distributed services of CLIPC. To disseminate CLIPC indicator data, transformed data products that enable impact assessments, and climate change impact indicators, a standardized metadata infrastructure is provided. The challenge is that compliance of existing metadata with INSPIRE ISO standards and GEMINI standards needs to be extended so that the web portal can be generated from the available metadata blueprint. The information provided in the headers of netCDF files, available through multiple catalogues, allows us to generate ISO-compliant metadata, which is in turn used to generate web-based interface content as well as OGC-compliant web services, such as WCS and WMS for the front end and WPS interactions for scientific users to combine and generate new datasets. The goal of the metadata infrastructure is to provide a blueprint for creating a data-driven science portal generated from the underlying GIS data, web services and processing infrastructure. In the presentation we will present the results and lessons learned.

  5. Consuming America : A Data-Driven Analysis of the United States as a Reference Culture in Dutch Public Discourse on Consumer Goods, 1890-1990

    OpenAIRE

    Wevers, M.J.H.F.

    2017-01-01

    Consuming America offers a data-driven, longitudinal analysis of the historical dynamics that have underpinned a long-term, layered cultural-historical process: the emergence of the United States as a dominant reference culture in Dutch public discourse on consumer goods between 1890 and 1990. The ideas, values, and practices associated with the United States in public discourse remained relatively steady over time, which might explain the country’s longevity as a reference culture and its po...

  6. New data-driven estimation of terrestrial CO2 fluxes in Asia using a standardized database of eddy covariance measurements, remote sensing data, and support vector regression

    Science.gov (United States)

    Ichii, Kazuhito; Ueyama, Masahito; Kondo, Masayuki; Saigusa, Nobuko; Kim, Joon; Alberto, Ma. Carmelita; Ardö, Jonas; Euskirchen, Eugénie S.; Kang, Minseok; Hirano, Takashi; Joiner, Joanna; Kobayashi, Hideki; Marchesini, Luca Belelli; Merbold, Lutz; Miyata, Akira; Saitoh, Taku M.; Takagi, Kentaro; Varlagin, Andrej; Bret-Harte, M. Syndonia; Kitamura, Kenzo; Kosugi, Yoshiko; Kotani, Ayumi; Kumar, Kireet; Li, Sheng-Gong; Machimura, Takashi; Matsuura, Yojiro; Mizoguchi, Yasuko; Ohta, Takeshi; Mukherjee, Sandipan; Yanagi, Yuji; Yasuda, Yukio; Zhang, Yiping; Zhao, Fenghua

    2017-04-01

    The lack of a standardized database of eddy covariance observations has been an obstacle for data-driven estimation of terrestrial CO2 fluxes in Asia. In this study, we developed such a standardized database using 54 sites from various databases by applying consistent postprocessing for data-driven estimation of gross primary productivity (GPP) and net ecosystem CO2 exchange (NEE). Data-driven estimation was conducted by using a machine learning algorithm, support vector regression (SVR), with remote sensing data for the 2000 to 2015 period. Site-level evaluation of the estimated CO2 fluxes shows that although performance varies in different vegetation and climate classifications, GPP and NEE at 8 days are reproduced (e.g., r2 = 0.73 and 0.42 for 8 day GPP and NEE). Evaluation of spatially estimated GPP with Global Ozone Monitoring Experiment 2 sensor-based Sun-induced chlorophyll fluorescence shows that monthly GPP variations at subcontinental scale were reproduced by SVR (r2 = 1.00, 0.94, 0.91, and 0.89 for Siberia, East Asia, South Asia, and Southeast Asia, respectively). Evaluation of spatially estimated NEE with net atmosphere-land CO2 fluxes of Greenhouse Gases Observing Satellite (GOSAT) Level 4A product shows that monthly variations of these data were consistent in Siberia and East Asia; meanwhile, inconsistency was found in South Asia and Southeast Asia. Furthermore, differences in the land CO2 fluxes from SVR-NEE and GOSAT Level 4A were partially explained by accounting for the differences in the definition of land CO2 fluxes. These data-driven estimates can provide a new opportunity to assess CO2 fluxes in Asia and evaluate and constrain terrestrial ecosystem models.

  7. CONVERSION OF PINEAPPLE JUICE WASTE INTO LACTIC ACID IN BATCH AND FED – BATCH FERMENTATION SYSTEMS

    Directory of Open Access Journals (Sweden)

    Abdullah Mochamad Busairi

    2012-01-01

    Pineapple juice waste contains valuable components, mainly sucrose, glucose, and fructose. Recently, lactic acid has been considered an important raw material for the production of biodegradable lactide polymer. The fermentation experiments were carried out in a 3-litre fermentor (Biostat B Model) under anaerobic conditions with a stirring speed of 50 rpm, a temperature of 40 °C, and a pH of 6.00. The effect of feed concentration on lactic acid production, bacterial growth, substrate utilisation and productivity was studied. The results obtained from fed-batch culture fermentation showed that the maximum lactic acid productivity was 0.44 g/(L·h) for a feed concentration of 90 g/L at 48 hours. The lactic acid productivity obtained from fed-batch culture was 2.5-fold higher than that of batch culture.

  8. Sojourn time distributions in a Markovian G-queue with batch arrival and batch removal

    Directory of Open Access Journals (Sweden)

    Yang Woo Shin

    1999-01-01

    We consider a single server Markovian queue with two types of customers, positive and negative, where positive customers arrive in batches and arrivals of negative customers remove positive customers in batches. Only positive customers form a queue, and negative customers just reduce the system congestion by removing positive ones upon their arrivals. We derive the Laplace-Stieltjes transforms (LSTs) of sojourn time distributions for a single server Markovian queue with positive customers and negative customers by using first passage time arguments for Markov chains.

  9. Cadmium removal using Cladophora in batch, semi-batch and flow reactors.

    Science.gov (United States)

    Sternberg, Steven P K; Dorn, Ryan W

    2002-02-01

    This study presents the results of using viable algae to remove cadmium from a synthetic wastewater. In batch and semi-batch tests, a local strain of Cladophora algae removed 80-94% of the cadmium introduced. The flow experiments that followed were conducted using non-local Cladophora parriaudii. Results showed that the alga removed only 12.7 ± 6.4% of the cadmium introduced into the reactor. Limited removal was the result of insufficient algal quantities and poor contact between the algae and cadmium solution.

  10. Evaluation of vitrification factors from DWPF's macro-batch 1

    International Nuclear Information System (INIS)

    Edwards, T.B.

    2000-01-01

    The Defense Waste Processing Facility (DWPF) is evaluating new sampling and analytical methods that may be used to support future Slurry Mix Evaporator (SME) batch acceptability decisions. This report uses data acquired during DWPF's processing of macro-batch 1 to determine a set of vitrification factors covering several SME and Melter Feed Tank (MFT) batches. Such values are needed for converting the cation measurements derived from the new methods to a "glass" basis. The available data from macro-batch 1 were used to examine the stability of these vitrification factors, to estimate their uncertainty over the course of a macro-batch, and to provide a recommendation on the use of a single factor for an entire macro-batch. The report is in response to Technical Task Request HLW/DWPF/TTR-980015.

  11. Inorganic fouling mitigation by salinity cycling in batch reverse osmosis

    OpenAIRE

    Maswadeh, Laith A.; Warsinger, David Elan Martin; Tow, Emily W.; Connors, Grace B.; Swaminathan, Jaichander; Lienhard, John H

    2018-01-01

    Enhanced fouling resistance has been observed in recent variants of reverse osmosis (RO) desalination which use time-varying batch or semi-batch processes, such as closed-circuit RO (CCRO) and pulse flow RO (PFRO). However, the mechanisms of batch processes' fouling resistance are not well-understood, and models have not been developed for prediction of their fouling performance. Here, a framework for predicting reverse osmosis fouling is developed by comparing the fluid residence time in bat...

  12. Optimizing Resource Utilization in Grid Batch Systems

    International Nuclear Information System (INIS)

    Gellrich, Andreas

    2012-01-01

    On Grid sites, the requirements of the computing tasks (jobs) for computing, storage, and network resources differ widely. For instance, Monte Carlo production jobs are almost purely CPU-bound, whereas physics analysis jobs demand high data rates. In order to optimize the utilization of the compute node resources, jobs must be distributed intelligently over the nodes. Although the job resource requirements cannot be deduced directly, jobs are mapped to POSIX UID/GID according to the VO, VOMS group and role information contained in the VOMS proxy. The UID/GID then allows jobs to be distinguished, provided users are using VOMS proxies as planned by the VO management, e.g. ‘role=production’ for Monte Carlo jobs. It is possible to set up and configure batch systems (queuing system and scheduler) at Grid sites based on these considerations, although scaling limits were observed with the scheduler MAUI. In tests, these limitations could be overcome with a home-made scheduler.

  13. Sewage sludge irradiators: Batch and continuous flow

    International Nuclear Information System (INIS)

    Lavale, D.S.; George, J.R.; Shah, M.R.; Rawat, K.P.

    1998-01-01

    The potential threat to the environment posed by the high pathogenic organism content of municipal wastewater, especially the sludge, and the growing worldwide aspiration for a cleaner, salubrious environment have made it mandatory for sewage and sludge to undergo treatment prior to their ultimate disposal to nature. The inability of conventional wastewater treatments to mitigate the problem of microorganisms has made it necessary to look for alternatives, radiation treatment being the most reliable, rapid and environmentally sustainable of them. To promote the use of radiation for sludge hygienization, the Department of Atomic Energy has endeavoured to set up an indigenous Sludge Hygienization Research Irradiator (SHRI) in the city of Baroda. Designed for 18.5 PBq of 60Co to disinfect the digested sludge, the irradiator has additional provision for treatment of effluent and raw sewage. From an engineering standpoint, all the subsystems have been functioning satisfactorily since its commissioning in 1990. Prolonged studies spanning a period of six years, primarily focused on inactivation of microorganisms, revealed that a 3 kGy dose of gamma radiation is adequate to make the sludge pathogen- and odour-free. A dose of 1.6 kGy in raw sewage and 0.5 kGy in effluent reduced coliform counts down to the regulatory discharge limits. These observations reflect a possible cost-effective solution to the burgeoning problem of surface water pollution across the globe. In the past, sub-37 PBq 60Co batch irradiators have been designed and commissioned successfully for the treatment of sludge. Characterized by low dose delivery rates, they are well suited for treating low volumes of sludge in batches. Some concepts of continuous-flow 60Co irradiators having larger activities, yet simple and economic in design, are presented in the paper.

  14. A comparison of Data Driven models of solving the task of gender identification of author in Russian language texts for cases without and with the gender deception

    Science.gov (United States)

    Sboev, A.; Moloshnikov, I.; Gudovskikh, D.; Rybka, R.

    2017-12-01

    In this work we compare several data-driven approaches to the task of identifying an author's gender for texts with or without gender imitation. The data corpus has been specially gathered with crowdsourcing for this task. The best models are a convolutional neural network with morphological data as input (F1-measure: 88% ± 3) for texts without imitation, and a gradient boosting model with a vector of character n-gram frequencies as input (F1-measure: 64% ± 3) for texts with gender imitation. A method for filtering the crowdsourced corpus using a limited reference sample of texts to increase the accuracy of the result is discussed.

  15. Bioprocess iterative batch-to-batch optimization based on hybrid parametric/nonparametric models.

    Science.gov (United States)

    Teixeira, Ana P; Clemente, João J; Cunha, António E; Carrondo, Manuel J T; Oliveira, Rui

    2006-01-01

    This paper presents a novel method for iterative batch-to-batch dynamic optimization of bioprocesses. The relationship between process performance and control inputs is established by means of hybrid grey-box models combining parametric and nonparametric structures. The bioreactor dynamics are defined by material balance equations, whereas the cell population subsystem is represented by an adjustable mixture of nonparametric and parametric models. Thus optimizations are possible without detailed mechanistic knowledge concerning the biological system. A clustering technique is used to supervise the reliability of the nonparametric subsystem during the optimization. Whenever the nonparametric outputs are unreliable, the objective function is penalized. The technique was evaluated with three simulation case studies. The overall results suggest that the convergence to the optimal process performance may be achieved after a small number of batches. The model unreliability risk constraint along with sampling scheduling are crucial to minimize the experimental effort required to attain a given process performance. In general terms, it may be concluded that the proposed method broadens the application of the hybrid parametric/nonparametric modeling technique to "newer" processes with higher potential for optimization.

  16. Data-Driven Robust RVFLNs Modeling of a Blast Furnace Iron-Making Process Using Cauchy Distribution Weighted M-Estimation

    Energy Technology Data Exchange (ETDEWEB)

    Zhou, Ping; Lv, Youbin; Wang, Hong; Chai, Tianyou

    2017-09-01

    Optimal operation of a practical blast furnace (BF) ironmaking process depends largely on a good measurement of molten iron quality (MIQ) indices. However, measuring the MIQ online is not feasible using the available techniques. In this paper, a novel data-driven robust modeling method is proposed for online estimation of MIQ using improved random vector functional-link networks (RVFLNs). Since the output weights of traditional RVFLNs are obtained by the least squares approach, a robustness problem may occur when the training dataset is contaminated with outliers. This affects the modeling accuracy of RVFLNs. To solve this problem, a robust RVFLN based on Cauchy distribution weighted M-estimation is proposed. Since the weights of different outlier data are properly determined by the Cauchy distribution, their corresponding contributions to modeling can be properly distinguished. Thus robust and better modeling results can be achieved. Moreover, given that the BF is a complex nonlinear system with numerous coupling variables, data-driven canonical correlation analysis is employed to identify the most influential components from the multitudinous factors that affect the MIQ indices, in order to reduce the model dimension. Finally, experiments using industrial data and comparative studies have demonstrated that the obtained model produces better modeling and estimation accuracy and stronger robustness than other modeling methods.
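
The idea of replacing ordinary least squares with Cauchy-weighted M-estimation can be illustrated on a much simpler problem than an RVFLN. The sketch below is a generic iteratively reweighted least squares (IRLS) fit of a straight line, not the authors' algorithm. The tuning constant 2.385, the MAD-based scale estimate, the function names and the synthetic data are all assumptions made for illustration.

```python
from statistics import median


def weighted_line_fit(xs, ys, ws):
    """Weighted least squares fit of y = slope * x + intercept."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def cauchy_irls(xs, ys, c=2.385, iters=50):
    """Robust line fit: reweight each point by the Cauchy weight of its residual."""
    ws = [1.0] * len(xs)  # the first pass is ordinary least squares
    for _ in range(iters):
        slope, intercept = weighted_line_fit(xs, ys, ws)
        residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
        # Robust residual scale from the median absolute residual;
        # the fallback avoids division by zero on a perfect fit.
        scale = median(abs(r) for r in residuals) / 0.6745 or 1e-12
        # Cauchy weight function: large residuals get weights near zero.
        ws = [1.0 / (1.0 + (r / (c * scale)) ** 2) for r in residuals]
    return slope, intercept


# Synthetic data on the line y = 2x + 1, with one gross outlier at x = 9.
xs = [float(x) for x in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
ys[9] = 100.0
ols_slope, ols_intercept = weighted_line_fit(xs, ys, [1.0] * len(xs))
robust_slope, robust_intercept = cauchy_irls(xs, ys)
print(ols_slope, robust_slope)
```

The unweighted fit is pulled strongly toward the outlier, while the reweighted fit settles back on the line followed by the other nine points. In an RVFLN the same reweighting would apply to the linear output-weight solve rather than to a two-parameter line.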

  17. Development of a Data-Driven Predictive Model of Supply Air Temperature in an Air-Handling Unit for Conserving Energy

    Directory of Open Access Journals (Sweden)

    Goopyo Hong

    2018-02-01

    The purpose of this study was to develop a data-driven predictive model that can predict the supply air temperature (SAT) in an air-handling unit (AHU) by using a neural network. A case study was selected, and AHU operational data from December 2015 to November 2016 were collected. A data-driven predictive model was generated through an evolving process that consisted of an initial model, an optimal model, and an adaptive model. In order to develop the optimal model, input variables, the number of neurons and hidden layers, and the period of the training data set were considered. Since AHU data change over time, an adaptive model, which has the ability to actively cope with constantly changing data, was developed. This adaptive model determined the model with the lowest mean square error (MSE) of the 91 models, which had two hidden layers and set up a 12-hour test set at every prediction. The adaptive model used recently collected data as training data and utilized the sliding window technique rather than the accumulative data method. Furthermore, additional testing was performed to validate the adaptive model using AHU data from another building. The final adaptive model predicts SAT to a root mean square error (RMSE) of less than 0.6 °C.
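
The difference between the sliding window technique and the accumulative data method mentioned in this record can be sketched as two ways of choosing training indices before each prediction. The window sizes and function names below are hypothetical, and the sketch only generates index ranges; it does not train a neural network.

```python
def sliding_windows(n_samples, train_size, test_size):
    """Yield (train, test) index ranges; the training window has a fixed
    length and moves forward, so old samples are dropped."""
    start = 0
    while start + train_size + test_size <= n_samples:
        split = start + train_size
        yield range(start, split), range(split, split + test_size)
        start += test_size


def accumulative_windows(n_samples, initial_train_size, test_size):
    """Yield (train, test) index ranges; the training set keeps every
    earlier sample and grows with each step."""
    split = initial_train_size
    while split + test_size <= n_samples:
        yield range(0, split), range(split, split + test_size)
        split += test_size


sliding = [(list(tr), list(te)) for tr, te in sliding_windows(10, 4, 2)]
accumulative = [(list(tr), list(te)) for tr, te in accumulative_windows(10, 4, 2)]
for train, test in sliding:
    print("sliding     ", train, "->", test)
for train, test in accumulative:
    print("accumulative", train, "->", test)
```

With the sliding scheme the model is always retrained on the most recent samples only, which matches the record's motivation that AHU data change over time.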

  18. Initial Results from an Energy-Aware Airborne Dynamic, Data-Driven Application System Performing Sampling in Coherent Boundary-Layer Structures

    Science.gov (United States)

    Frew, E.; Argrow, B. M.; Houston, A. L.; Weiss, C.

    2014-12-01

    The energy-aware airborne dynamic, data-driven application system (EA-DDDAS) performs persistent sampling in complex atmospheric conditions by exploiting wind energy using the dynamic data-driven application system paradigm. The main challenge for future airborne sampling missions is operation with tight integration of physical and computational resources over wireless communication networks, in complex atmospheric conditions. The physical resources considered here include sensor platforms, particularly mobile Doppler radar and unmanned aircraft, the complex conditions in which they operate, and the region of interest. Autonomous operation requires distributed computational effort connected by layered wireless communication. Onboard decision-making and coordination algorithms can be enhanced by atmospheric models that assimilate input from physics-based models and wind fields derived from multiple sources. These models are generally too complex to be run onboard the aircraft, so they need to be executed in ground vehicles in the field, and connected over broadband or other wireless links back to the field. Finally, the wind field environment drives strong interaction between the computational and physical systems, both as a challenge to autonomous path planning algorithms and as a novel energy source that can be exploited to improve system range and endurance. Implementation details of a complete EA-DDDAS will be provided, along with preliminary flight test results targeting coherent boundary-layer structures.

  19. On transcending the impasse of respiratory motion correction applications in routine clinical imaging - a consideration of a fully automated data driven motion control framework

    International Nuclear Information System (INIS)

    Kesner, Adam L; Schleyer, Paul J; Büther, Florian; Walter, Martin A; Schäfers, Klaus P; Koo, Phillip J

    2014-01-01

    Positron emission tomography (PET) is increasingly used for the detection, characterization, and follow-up of tumors located in the thorax. However, patient respiratory motion presents a unique limitation that hinders the application of high-resolution PET technology for this type of imaging. Efforts to transcend this limitation have been underway for more than a decade, yet PET remains for practical considerations a modality vulnerable to motion-induced image degradation. Respiratory motion control is not employed in routine clinical operations. In this article, we take an opportunity to highlight some of the recent advancements in data-driven motion control strategies and how they may form an underpinning for what we are presenting as a fully automated data-driven motion control framework. This framework represents an alternative direction for future endeavors in motion control and can conceptually connect individual focused studies with a strategy for addressing big picture challenges and goals. The online version of this article (doi:10.1186/2197-7364-1-8) contains supplementary material, which is available to authorized users.

  20. Collaborative Project: The problem of bias in defining uncertainty in computationally enabled strategies for data-driven climate model development. Final Technical Report.

    Energy Technology Data Exchange (ETDEWEB)

    Huerta, Gabriel [Univ. of New Mexico, Albuquerque, NM (United States)

    2016-05-10

    The objective of the project is to develop strategies for better representing scientific sensibilities within statistical measures of model skill that can then be used within a Bayesian statistical framework for data-driven climate model development and improved measures of model scientific uncertainty. One of the thorny issues in model evaluation is quantifying the effect of biases on climate projections. While no bias is desirable, only those biases that affect feedbacks affect scatter in climate projections. The effort at the University of Texas is to analyze previously calculated ensembles of CAM3.1 with perturbed parameters to discover how biases affect projections of global warming. The hypothesis is that compensating errors in the control model can be identified by their effect on a combination of processes and that developing metrics that are sensitive to dependencies among state variables would provide a way to select versions of climate models that may reduce scatter in climate projections. Gabriel Huerta at the University of New Mexico is responsible for developing statistical methods for evaluating these field dependencies. The UT effort will incorporate these developments into MECS, which is a set of Python scripts being developed at the University of Texas for managing the workflow associated with data-driven climate model development over HPC resources. This report reflects the main activities at the University of New Mexico where the PI (Huerta) and the Postdocs (Nosedal, Hattab and Karki) worked on the project.

  1. Data-Driven Zero-Sum Neuro-Optimal Control for a Class of Continuous-Time Unknown Nonlinear Systems With Disturbance Using ADP.

    Science.gov (United States)

    Wei, Qinglai; Song, Ruizhuo; Yan, Pengfei

    2016-02-01

    This paper is concerned with a new data-driven zero-sum neuro-optimal control problem for continuous-time unknown nonlinear systems with disturbance. According to the input-output data of the nonlinear system, an effective recurrent neural network is introduced to reconstruct the dynamics of the nonlinear system. Considering the system disturbance as a control input, a two-player zero-sum optimal control problem is established. Adaptive dynamic programming (ADP) is developed to obtain the optimal control under the worst case of the disturbance. Three single-layer neural networks, including one critic and two action networks, are employed to approximate the performance index function, the optimal control law, and the disturbance, respectively, for facilitating the implementation of the ADP method. Convergence properties of the ADP method are developed to show that the system state will converge to a finite neighborhood of the equilibrium. The weight matrices of the critic and the two action networks are also convergent to finite neighborhoods of their optimal ones. Finally, the simulation results will show the effectiveness of the developed data-driven ADP methods.

  2. Modelling and Simulation of the Batch Hydrolysis of Acetic ...

    African Journals Online (AJOL)

    The kinetic modelling of the batch synthesis of acetic acid from acetic anhydride was investigated. The kinetic data of the reaction was obtained by conducting the hydrolysis reaction in a batch reactor. A dynamic model was formulated for this process and simulation was carried out using gPROMS® an advanced process ...

  3. [Batch release of immunoglobulin and monoclonal antibody products].

    Science.gov (United States)

    Gross, S

    2014-10-01

    The Paul-Ehrlich Institute (PEI) is an independent institution of the Federal Republic of Germany responsible for performing official experimental batch testing of sera. The institute decides about the release of each batch and performs experimental research in the field. The experimental quality control ensures the potency of the product and also the absence of harmful impurities. For release of an immunoglobulin batch the marketing authorization holder has to submit the documentation of the manufacture and the results of quality control measures together with samples of the batch to the PEI. Experimental testing is performed according to the approved specifications regarding the efficacy and safety. Since implementation of the 15th German drug law amendment, the source of antibody is not defined anymore. According to § 32 German drug law, all batches of sera need to be released by an official control laboratory. Sera are medicinal products, which contain antibodies, antibody fragments or fusion proteins with a functional antibody portion. Therefore, all batches of monoclonal antibodies and derivatives must also be released by the PEI and the marketing authorization holder has to submit a batch release application. Under certain circumstances a waiver for certain products can be issued with regard to batch release. The conditions for such a waiver apply to the majority of monoclonal antibodies.

  4. 21 CFR 80.37 - Treatment of batch pending certification.

    Science.gov (United States)

    2010-04-01

    ... 21 CFR (Food and Drugs), Food and Drug Administration, Department of Health and Human Services. General: Color Additive Certification, Certification Procedures. § 80.37 Treatment of batch pending certification...

  5. Solving a chemical batch scheduling problem by local search

    NARCIS (Netherlands)

    Brucker, P.; Hurink, Johann L.

    1999-01-01

    A chemical batch scheduling problem is modelled in two different ways as a discrete optimization problem. Both models are used to solve the batch scheduling problem in a two-phase tabu search procedure. The method is tested on real-world data.

  6. Dynamic Scheduling Of Batch Operations With Non-Identical Machines

    NARCIS (Netherlands)

    van der Zee, D.J.; van Harten, A.; Schuur, P.C.

    1997-01-01

    Batch-wise production is found in many industries. Good examples of production systems that process products batch-wise are the ovens found in the aircraft industry and in semiconductor manufacturing. These systems mostly consist of multiple machines of different types, given the range and volumes of

  7. Dynamic scheduling of batch operations with non-identical machines

    NARCIS (Netherlands)

    van der Zee, D.J.; van Harten, Aart; Schuur, Peter

    1997-01-01

    Batch-wise production is found in many industries. Good examples of production systems that process products batch-wise are the ovens found in the aircraft industry and in semiconductor manufacturing. These systems mostly consist of multiple machines of different types, given the range and volumes of

  8. A canned food scheduling problem with batch due date

    Science.gov (United States)

    Chung, Tsui-Ping; Liao, Ching-Jong; Smith, Milton

    2014-09-01

    This article considers a canned food scheduling problem where jobs are grouped into several batches. Jobs can be sent to the next operation only when all the jobs in the same batch have finished their processing, i.e. jobs in a batch have a common due date. This batch due date problem is quite common in canned food factories, but there is no efficient heuristic to solve the problem. The problem can be formulated as an identical parallel machine problem with batch due date to minimize the total tardiness. Since the problem is NP-hard, two heuristics are proposed to find a near-optimal solution. Computational results comparing the effectiveness and efficiency of the two proposed heuristics with an existing heuristic are reported and discussed.
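    The objective described above can be made concrete with a small sketch. It is not either of the two heuristics from the article (the abstract does not describe them); it is a plain earliest-due-date list-scheduling rule, and it assumes that tardiness is counted once per batch, measured from the completion of the last job in that batch. The function name and the input layout are hypothetical.

```python
import heapq


def edd_total_tardiness(batches, machine_count):
    """Schedule batches on identical parallel machines and return total tardiness.

    batches: list of (due_date, [processing_time, ...]) tuples.
    Assumption (not from the article): one tardiness term per batch, based on
    the time at which the last job of the batch finishes.
    """
    # Min-heap of the times at which each machine becomes free.
    free_times = [0.0] * machine_count
    heapq.heapify(free_times)

    total_tardiness = 0.0
    # Earliest-due-date order across batches.
    for due_date, jobs in sorted(batches, key=lambda b: b[0]):
        batch_completion = 0.0
        # Longest jobs first, each placed on the machine that frees up soonest.
        for processing_time in sorted(jobs, reverse=True):
            start = heapq.heappop(free_times)
            finish = start + processing_time
            heapq.heappush(free_times, finish)
            batch_completion = max(batch_completion, finish)
        total_tardiness += max(0.0, batch_completion - due_date)
    return total_tardiness


if __name__ == "__main__":
    example = [(4, [3, 2]), (5, [4, 1])]
    print(edd_total_tardiness(example, machine_count=2))
```

    A rule like this only gives a baseline value of the objective; any serious heuristic would be compared against it on the same instances.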

  9. Spatial and interannual variability in Baltic sprat batch fecundity

    DEFF Research Database (Denmark)

    Haslob, H.; Tomkiewicz, Jonna; Hinrichsen, H.H.

    2011-01-01

    in the central Baltic Sea, namely the Bornholm Basin, Gdansk Deep and Southern Gotland Basin. Environmental parameters such as hydrography, fish condition and stock density were tested in order to investigate the observed variability in sprat fecundity. Absolute batch fecundity was found to be positively related...... to fish length and weight. Significant differences in absolute and relative batch fecundity of Baltic sprat among areas and years were detected, and could partly be explained by hydrographic features of the investigated areas. A non-linear multiple regression model taking into account fish length...... and ambient temperature explained 70% of variability in absolute batch fecundity. Oxygen content and fish condition were not related to sprat batch fecundity. Additionally, a negative effect of stock size on sprat batch fecundity in the Bornholm Basin was revealed. The obtained data and results are important...

  10. Biodenitrification in Sequencing Batch Reactors. Final report

    International Nuclear Information System (INIS)

    Silverstein, J.

    1996-01-01

    One plan for stabilization of the Solar Pond waters and sludges at Rocky Flats Plant (RFP) is evaporation and cement solidification of the salts to stabilize heavy metals and radionuclides for land disposal as low-level mixed waste. It has been reported that nitrate (NO₃⁻) salts may interfere with cement stabilization of heavy metals and radionuclides. Therefore, biological nitrate removal (denitrification) may be an important pretreatment for the Solar Pond wastewaters at RFP, improving the stability of the cement final waste form, reducing the requirement for cement (or pozzolan) additives and reducing the volume of cemented low-level mixed waste requiring ultimate disposal. A laboratory investigation of the performance of the Sequencing Batch Reactor (SBR) activated sludge process developed for nitrate removal from a synthetic brine typical of the high-nitrate and high-salinity wastewaters in the Solar Ponds at Rocky Flats Plant was carried out at the Environmental Engineering labs at the University of Colorado, Boulder, between May 1, 1994 and October 1, 1995.

  11. Glucoamylase production in batch, chemostat and fed-batch cultivations by an industrial strain of Aspergillus niger

    DEFF Research Database (Denmark)

    Pedersen, Henrik; Beyer, Michael; Nielsen, Jens

    2000-01-01

    The Aspergillus niger strain BO-1 was grown in batch, continuous (chemostat) and fed-batch cultivations in order to study the production of the extracellular enzyme glucoamylase under different growth conditions. In the pH range 2.5-6.0, the specific glucoamylase productivity and the specific...

  12. Batch-to-batch quality consistency evaluation of botanical drug products using multivariate statistical analysis of the chromatographic fingerprint.

    Science.gov (United States)

    Xiong, Haoshu; Yu, Lawrence X; Qu, Haibin

    2013-06-01

    Botanical drug products have batch-to-batch quality variability due to botanical raw materials and the current manufacturing process. The rational evaluation and control of product quality consistency are essential to ensure the efficacy and safety. Chromatographic fingerprinting is an important and widely used tool to characterize the chemical composition of botanical drug products. Multivariate statistical analysis has shown its efficacy and applicability in the quality evaluation of many kinds of industrial products. In this paper, the combined use of multivariate statistical analysis and chromatographic fingerprinting is presented to evaluate batch-to-batch quality consistency of botanical drug products. A typical botanical drug product in China, Shenmai injection, was selected as the example to demonstrate the feasibility of this approach. The high-performance liquid chromatographic fingerprint data of historical batches were collected from a traditional Chinese medicine manufacturing factory. Characteristic peaks were weighted by their variability among production batches. A principal component analysis model was established after outliers were modified or removed. Multivariate (Hotelling T² and DModX) control charts were finally successfully applied to evaluate the quality consistency. The results suggest useful applications for a combination of multivariate statistical analysis with chromatographic fingerprinting in batch-to-batch quality consistency evaluation for the manufacture of botanical drug products.

  13. Batch-To-Batch Rational Feedforward Control : From Iterative Learning to Identification Approaches, with Application to a Wafer Stage

    NARCIS (Netherlands)

    Blanken, L.; Boeren, F.A.J.; Bruijnen, D.J.H.; Oomen, T.A.E.

    2017-01-01

    Feedforward control enables high performance for industrial motion systems that perform nonrepeating motion tasks. Recently, learning techniques have been proposed that improve both performance and flexibility to nonrepeating tasks in a batch-to-batch fashion by using a rational parameterization in

  14. Kinetic study of batch and fed-batch enzymatic saccharification of pretreated substrate and subsequent fermentation to ethanol

    Directory of Open Access Journals (Sweden)

    Gupta Rishi

    2012-03-01

    Background: Enzymatic hydrolysis, the rate limiting step in the process development for biofuel, is always hampered by its low sugar concentration. High solid enzymatic saccharification could solve this problem but has several other drawbacks such as low rate of reaction. In the present study we have attempted to enhance the concentration of sugars in enzymatic hydrolysate of delignified Prosopis juliflora, using a fed-batch enzymatic hydrolysis approach. Results: The enzymatic hydrolysis was carried out at elevated solid loading up to 20% (w/v) and a kinetic comparison of batch and fed-batch enzymatic hydrolysis was carried out using kinetic regimes. Under batch mode, the actual sugar concentration values at 20% initial substrate consistency were found to deviate from the predicted values and the maximum sugar concentration obtained was 80.78 g/L. Fed-batch strategy was implemented to enhance the final sugar concentration to 127 g/L. The batch and fed-batch enzymatic hydrolysates were fermented with Saccharomyces cerevisiae and ethanol production of 34.78 g/L and 52.83 g/L, respectively, was achieved. Furthermore, model simulations showed that higher insoluble solids in the feed resulted in both smaller reactor volume and shorter residence time. Conclusion: Fed-batch enzymatic hydrolysis is an efficient procedure for enhancing the sugar concentration in the hydrolysate. Restricting the process to suitable kinetic regimes could result in higher conversion rates.

  15. Kinetic study of batch and fed-batch enzymatic saccharification of pretreated substrate and subsequent fermentation to ethanol

    Science.gov (United States)

    2012-01-01

    Background: Enzymatic hydrolysis, the rate limiting step in the process development for biofuel, is always hampered by its low sugar concentration. High solid enzymatic saccharification could solve this problem but has several other drawbacks such as low rate of reaction. In the present study we have attempted to enhance the concentration of sugars in enzymatic hydrolysate of delignified Prosopis juliflora, using a fed-batch enzymatic hydrolysis approach. Results: The enzymatic hydrolysis was carried out at elevated solid loading up to 20% (w/v) and a kinetic comparison of batch and fed-batch enzymatic hydrolysis was carried out using kinetic regimes. Under batch mode, the actual sugar concentration values at 20% initial substrate consistency were found to deviate from the predicted values and the maximum sugar concentration obtained was 80.78 g/L. Fed-batch strategy was implemented to enhance the final sugar concentration to 127 g/L. The batch and fed-batch enzymatic hydrolysates were fermented with Saccharomyces cerevisiae and ethanol production of 34.78 g/L and 52.83 g/L, respectively, was achieved. Furthermore, model simulations showed that higher insoluble solids in the feed resulted in both smaller reactor volume and shorter residence time. Conclusion: Fed-batch enzymatic hydrolysis is an efficient procedure for enhancing the sugar concentration in the hydrolysate. Restricting the process to suitable kinetic regimes could result in higher conversion rates. PMID: 22433563

  16. A Novel Hybrid Data-Driven Model for Daily Land Surface Temperature Forecasting Using Long Short-Term Memory Neural Network Based on Ensemble Empirical Mode Decomposition

    Directory of Open Access Journals (Sweden)

    Xike Zhang

    2018-05-01

    Daily land surface temperature (LST) forecasting is of great significance for application in climate-related, agricultural, eco-environmental, or industrial studies. Hybrid data-driven prediction models using Ensemble Empirical Mode Decomposition (EEMD) coupled with Machine Learning (ML) algorithms are useful for achieving these purposes because they can reduce the difficulty of modeling, require less historical data, are easy to develop, and are less complex than physical models. In this article, a computationally simple, less data-intensive, fast and efficient novel hybrid data-driven model called the EEMD Long Short-Term Memory (LSTM) neural network, namely EEMD-LSTM, is proposed to reduce the difficulty of modeling and to improve prediction accuracy. The daily LST data series from the Mapoling and Zhijaing stations in the Dongting Lake basin, central south China, from 1 January 2014 to 31 December 2016 is used as a case study. The EEMD is first employed to decompose the original daily LST data series into many Intrinsic Mode Functions (IMFs) and a single residue item. Then, the Partial Autocorrelation Function (PACF) is used to obtain the number of input data sample points for LSTM models. Next, the LSTM models are constructed to predict the decompositions. All the predicted results of the decompositions are aggregated as the final daily LST. Finally, the prediction performance of the hybrid EEMD-LSTM model is assessed in terms of the Mean Square Error (MSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE), Pearson Correlation Coefficient (CC) and Nash-Sutcliffe Coefficient of Efficiency (NSCE). To validate the hybrid data-driven model, the hybrid EEMD-LSTM model is compared with the Recurrent Neural Network (RNN), LSTM and Empirical Mode Decomposition (EMD) coupled with RNN, EMD-LSTM and EEMD-RNN models, and their comparison results demonstrate that the hybrid EEMD-LSTM model performs better than the other

  17. A Novel Hybrid Data-Driven Model for Daily Land Surface Temperature Forecasting Using Long Short-Term Memory Neural Network Based on Ensemble Empirical Mode Decomposition.

    Science.gov (United States)

    Zhang, Xike; Zhang, Qiuwen; Zhang, Gui; Nie, Zhiping; Gui, Zifan; Que, Huafei

    2018-05-21

    Daily land surface temperature (LST) forecasting is of great significance for application in climate-related, agricultural, eco-environmental, or industrial studies. Hybrid data-driven prediction models using Ensemble Empirical Mode Decomposition (EEMD) coupled with Machine Learning (ML) algorithms are useful for achieving these purposes because they can reduce the difficulty of modeling, require less historical data, are easy to develop, and are less complex than physical models. In this article, a computationally simple, less data-intensive, fast and efficient novel hybrid data-driven model called the EEMD Long Short-Term Memory (LSTM) neural network, namely EEMD-LSTM, is proposed to reduce the difficulty of modeling and to improve prediction accuracy. The daily LST data series from the Mapoling and Zhijaing stations in the Dongting Lake basin, central south China, from 1 January 2014 to 31 December 2016 is used as a case study. The EEMD is first employed to decompose the original daily LST data series into many Intrinsic Mode Functions (IMFs) and a single residue item. Then, the Partial Autocorrelation Function (PACF) is used to obtain the number of input data sample points for LSTM models. Next, the LSTM models are constructed to predict the decompositions. All the predicted results of the decompositions are aggregated as the final daily LST. Finally, the prediction performance of the hybrid EEMD-LSTM model is assessed in terms of the Mean Square Error (MSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE), Pearson Correlation Coefficient (CC) and Nash-Sutcliffe Coefficient of Efficiency (NSCE). To validate the hybrid data-driven model, the hybrid EEMD-LSTM model is compared with the Recurrent Neural Network (RNN), LSTM and Empirical Mode Decomposition (EMD) coupled with RNN, EMD-LSTM and EEMD-RNN models, and their comparison results demonstrate that the hybrid EEMD-LSTM model performs better than the other five
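    The six evaluation measures named in this abstract can be sketched as follows. The formulas are the common textbook definitions and are assumed here rather than taken from the article, which may normalise some of them differently; the function name and the returned dictionary keys are illustrative only.

```python
import math


def forecast_metrics(observed, predicted):
    """Compute MSE, MAE, MAPE (%), RMSE, Pearson CC and Nash-Sutcliffe efficiency.

    Standard definitions are assumed. MAPE is undefined if any observation is
    zero, and CC/NSCE are undefined if a series is constant.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("series must be non-empty and of equal length")

    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    squared_error_sum = sum(e * e for e in errors)

    mse = squared_error_sum / n
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e / o) for e, o in zip(errors, observed)) / n
    rmse = math.sqrt(mse)

    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    covariance = sum((o - mean_obs) * (p - mean_pred)
                     for o, p in zip(observed, predicted))
    spread_obs = sum((o - mean_obs) ** 2 for o in observed)
    spread_pred = sum((p - mean_pred) ** 2 for p in predicted)

    cc = covariance / math.sqrt(spread_obs * spread_pred)
    # NSCE compares the model against always predicting the observed mean.
    nsce = 1.0 - squared_error_sum / spread_obs

    return {"MSE": mse, "MAE": mae, "MAPE": mape,
            "RMSE": rmse, "CC": cc, "NSCE": nsce}


if __name__ == "__main__":
    print(forecast_metrics([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 39.0]))
```

    Under these definitions an NSCE of 1 means a perfect fit, while a value of 0 means the model is no better than the observed mean.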

  18. A Data-Driven Voter Guide for U.S. Elections: Adapting Quantitative Measures of the Preferences and Priorities of Political Elites to Help Voters Learn About Candidates

    Directory of Open Access Journals (Sweden)

    Adam Bonica

    2016-11-01

    Internet-based voter advice applications have experienced tremendous growth across Europe in recent years but have yet to be widely adopted in the United States. By comparison, the candidate-centered U.S. electoral system, which routinely requires voters to consider dozens of candidates across a dizzying array of local, state, and federal offices each time they cast a ballot, introduces challenges of scale to the systematic provision of information. Only recently have methodological advances combined with the rapid growth in publicly available data on candidates and their supporters to bring a comprehensive data-driven voter guide within reach. This paper introduces a set of newly developed software tools for collecting, disambiguating, and merging large amounts of data on candidates and other political elites. It then demonstrates how statistical methods developed by political scientists to measure the preferences and expressed priorities of politicians can be adapted to help voters learn about candidates.

  19. Development of a data-driven algorithm to determine the W+jets background in t anti t events in ATLAS

    Energy Technology Data Exchange (ETDEWEB)

    Mehlhase, Sascha

    2010-07-12

    The physics of the top quark is one of the key components in the physics programme of the ATLAS experiment at the Large Hadron Collider at CERN. In this thesis, general studies of the jet trigger performance for top quark events using fully simulated Monte Carlo samples are presented and two data-driven techniques to estimate the multi-jet trigger efficiency and the W+Jets background in top pair events are introduced to the ATLAS experiment. In a tag-and-probe based method, using a simple and common event selection and a high transverse momentum lepton as tag object, the possibility to estimate the multijet trigger efficiency from data in ATLAS is investigated and it is shown that the method is capable of estimating the efficiency without introducing any significant bias by the given tag selection. In the second data-driven analysis a new method to estimate the W+Jets background in a top-pair event selection is introduced to ATLAS. By defining signal and background dominated regions by means of the jet multiplicity and the pseudo-rapidity distribution of the lepton in the event, the W+Jets contribution is extrapolated from the background dominated into the signal dominated region. The method is found to estimate the given background contribution as a function of the jet multiplicity with an accuracy of about 25% for most of the top dominated region with an integrated luminosity of above 100 pb⁻¹ at √s = 10 TeV. This thesis also covers a study summarising the thermal behaviour and expected performance of the Pixel Detector of ATLAS. All measurements performed during the commissioning phase of 2008/09 yield results within the specification of the system and the performance is expected to stay within those even after several years of running under LHC conditions. (orig.)

  20. Data-driven method based on particle swarm optimization and k-nearest neighbor regression for estimating capacity of lithium-ion battery

    International Nuclear Information System (INIS)

    Hu, Chao; Jain, Gaurav; Zhang, Puqiang; Schmidt, Craig; Gomadam, Parthasarathy; Gorka, Tom

    2014-01-01

    Highlights: • We develop a data-driven method for the battery capacity estimation. • Five charge-related features that are indicative of the capacity are defined. • The kNN regression model captures the dependency of the capacity on the features. • Results with 10 years’ continuous cycling data verify the effectiveness of the method.

    Abstract: Reliability of lithium-ion (Li-ion) rechargeable batteries used in implantable medical devices has been recognized as highly important by a broad range of stakeholders, including medical device manufacturers, regulatory agencies, physicians, and patients. To ensure Li-ion batteries in these devices operate reliably, it is important to be able to assess the battery health condition by estimating the battery capacity over the life-time. This paper presents a data-driven method for estimating the capacity of Li-ion battery based on the charge voltage and current curves. The contributions of this paper are three-fold: (i) the definition of five characteristic features of the charge curves that are indicative of the capacity, (ii) the development of a non-linear kernel regression model, based on the k-nearest neighbor (kNN) regression, that captures the complex dependency of the capacity on the five features, and (iii) the adaptation of particle swarm optimization (PSO) to finding the optimal combination of feature weights for creating a kNN regression model that minimizes the cross validation (CV) error in the capacity estimation. Verification with 10 years’ continuous cycling data suggests that the proposed method is able to accurately estimate the capacity of Li-ion battery throughout the whole life-time.
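    The role of the feature weights in contributions (ii) and (iii) can be illustrated with a minimal sketch. It assumes a weighted Euclidean distance, inverse-distance averaging over the k nearest neighbours, and a leave-one-out mean squared error as the cross-validation objective; none of these details are confirmed by the abstract, and the PSO search itself is not shown. All function names are hypothetical.

```python
import math


def weighted_knn_predict(train_x, train_y, query, feature_weights, k=3):
    """Predict a target for `query` from its k nearest training samples.

    Distance is a feature-weighted Euclidean distance (an assumption); the
    prediction is an inverse-distance weighted mean of neighbour targets.
    """
    scored = []
    for features, target in zip(train_x, train_y):
        distance = math.sqrt(sum(w * (a - b) ** 2
                                 for w, a, b in zip(feature_weights, features, query)))
        scored.append((distance, target))
    scored.sort(key=lambda pair: pair[0])
    neighbours = scored[:k]

    # Exact matches would give infinite weight, so average them directly.
    exact = [target for distance, target in neighbours if distance == 0.0]
    if exact:
        return sum(exact) / len(exact)

    inverse = [1.0 / distance for distance, _ in neighbours]
    weighted_sum = sum(w * target for w, (_, target) in zip(inverse, neighbours))
    return weighted_sum / sum(inverse)


def leave_one_out_error(train_x, train_y, feature_weights, k=3):
    """Mean squared leave-one-out error: the quantity a weight search would minimise."""
    total = 0.0
    for i in range(len(train_x)):
        rest_x = train_x[:i] + train_x[i + 1:]
        rest_y = train_y[:i] + train_y[i + 1:]
        prediction = weighted_knn_predict(rest_x, rest_y, train_x[i],
                                          feature_weights, k)
        total += (prediction - train_y[i]) ** 2
    return total / len(train_x)


if __name__ == "__main__":
    xs = [[0.0, 5.0], [1.0, -5.0], [2.0, 5.0], [3.0, -5.0]]
    ys = [0.0, 1.0, 2.0, 3.0]
    # The first feature is informative, the second is noise.
    print(leave_one_out_error(xs, ys, [1.0, 0.0], k=2))
    print(leave_one_out_error(xs, ys, [0.0, 1.0], k=2))
```

    In the toy data the weight vector that ignores the noisy second feature gives the lower leave-one-out error, which is the kind of signal an optimiser such as PSO could exploit when searching over weights.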